I wonder what kind of language you would need to add to your license in order to explicitly forbid the ingestion of your code by Copilot and/or like projects.
> I wonder what kind of language you would need to add to your license in order to explicitly forbid the ingestion of your code by Copilot and/or like projects.
If Microsoft's theory is correct, then under US law it is impossible to forbid this with a license. A license is just an offer of additional permissions beyond what is available automatically under law, and Microsoft's theory is that ingesting code into Copilot is fair use and therefore permitted under law as an exception to copyright, without any license from the copyright owner.
If Microsoft's theory is not correct, pretty much any license with an attribution requirement (among others, for other reasons) would work.
The idea that a special license is needed or of any use doesn't seem to have any justification, even in theory. (The same goes for the idea that hosting publicly but not on GitHub changes the legal parameters.)
There's a part that I don't understand. If some software is mirrored on github by someone that isn't the copyright owner, it seems like github shouldn't be able to use it. Yet they said nothing about that specifically. In that case, is the only option to put code somewhere else than github under a license that forbids reuploading to github, and issue DMCAs when/if people reupload your code? It also sounds like when code is removed through DMCA, it should be removed from the training set and they should retrain copilot.
> If some software is mirrored on github by someone that isn't the copyright owner, it seems like github shouldn't be able to use it.
If they don't need permission from the copyright owner, either via license or GitHub T&C, because it's fair use, which is their overt legal theory, then why would it matter legally whether the code was posted to GitHub at all, much less by whom? Ingesting only code from GitHub is a practical convenience that has nothing to do with their legal theory of the right to do it.
> Yet they said nothing about that specifically
Their theory of fair use means they have the right to ingest any code, irrespective of who owns it and what conditions (if any) it is licensed under or where (or even if) it is hosted online. They don't need a separate justification for your scenario if the theory they’ve cited is correct.
> In that case, is the only option to put code somewhere else than github under a license that forbids reuploading to github, and issue DMCAs when/if people reupload your code
Nope, that doesn't help at all, legally. It may help practically as long as they are just using GitHub-hosted code and not consuming code from other public hosting platforms, but it has no bearing on their legal theory of why they can ingest code without additional permissions.
Thanks for the clarification. So according to their theory, I could train a model on any code, even private Microsoft code, and that would be okay? That sounds surprising to me.
IANAL, but I have written licenses for that purpose. [1] (I'm trying to get them reviewed by a lawyer, but can't afford to; maybe I'll do a GoFundMe.)
What I did is say that if you feed copyrighted software to an algorithm that itself outputs software, then the license applies to the output. This covers the output of compilers and such, but it would also cover Copilot in my opinion. We'll see what a lawyer says.
However, even with a license, I wouldn't doubt that Microsoft would just put it through GitHub anyway because finding them out would be extraordinarily hard.
The “Yzena Copyleft License” states that it's a copyleft license, but it also states that it's not a viral license. According to Wikipedia, a viral license and a copyleft license are the same thing.
There is a difference between "strong" copyleft and "weak" copyleft. An example of "weak" (non-viral) copyleft is the CDDL. In fact, the CDDL's Wikipedia page talks about strong and weak copyleft.
You can read [1] for a breakdown of copyleft by an actual lawyer. Suffice it to say that Wikipedia's Copyleft page is woefully inadequate.