"jethrodaniel" does not appear to have the copyright to offer that license, but it's hard for Github to determine that in general, so I doubt they would be liable for the error.
Even if it's somehow available under an MIT license (which is questionable on jethrodaniel's part), there's still infringement. MIT isn't public domain; it still requires:
> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Replicating it without complying with those terms is still infringement.
People are being willfully blind here, like cult members staring dead-eyed at their leader and chanting "this is great" as they drink the Kool-Aid.
And from Microsoft no less, once an outcast for mass poisoning.
Actually, the legal system is evidence based. Microsoft has evidence that the code they are reproducing is licensed under MIT, as far as they can reasonably know; there's no definitive way to know who actually owns the original copyright. I could grant permission to use my repo, but maybe I got that code from someone else, who got it from someone else, and so on. It's a similar situation with stolen goods: if you unknowingly purchase stolen goods, you usually cannot be charged with theft as long as there aren't obvious signs they were stolen, such as being priced far below market value.
Microsoft has evidence that the code they are reproducing is MIT licensed, so are they intentionally violating that license or does this AI thing include the license and attribution in every snippet it generates?
Major aspects of copyright infringement are strict liability, like a lot of civil actions around damages. It doesn't matter if you thought it was OK, there's still a damaged party that needs compensation according to the law. At best you'll simply avoid the criminal and punitive penalties.
No, PornHub doesn't have liability in a lot of cases because of 17 § 512, but has still had to deal with liability in general, which is why they nuked some 80% of their library not backed by verified individuals a while back.
A huge part of 17 § 512 is the DMCA takedown process, mainly in 17 § 512(c)(3). Does Microsoft even have the ability to truly remove training data from the model? Or do they have to retrain after each DMCA takedown?
I personally don't want to have to upload proof of identity to GitHub and a signed document swearing that I own the copyright to all the code I upload to GitHub, or proof that I coded it. We need to be careful what we wish for.
> THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
If they had a reasonable basis for believing they had a license, they're in the clear. "I didn't know" might not be enough, but "I had good reasons to think otherwise" is.
I'm not a lawyer, but my understanding is that these are torts, so all you have to prove is that Microsoft is liable. I think that would be easy to prove given the way neural networks work, since the model is effectively just a way of performing a search.
Since it's a tort, I don't think you have to prove they should have known it would return copyrighted code; the fact that it does is enough to establish liability.