IANAL.

I don’t think you can prevent scraping or use in ML corpora in this way. Copyright prevents the creation of non-transformative copies of a work, other than for some protected use cases (parody, education, etc.). All OSS licenses do is grant a right to copy a work provided certain conditions (attribution, copyleft) are met. But the general legal consensus as far as I know is that most ML models meet the threshold for being a new transformative work, so copyright doesn’t apply. Accordingly, you can’t use copyright to prevent something from being part of an ML corpus.

That said, if your question is broader than the article and you’re just talking about non-transformative uses (i.e., just using open source software), I don’t see any reason why you couldn’t create a license that doesn’t allow software to be deployed into certain environments. Some examples:

https://www.cs.ucdavis.edu/~rogaway/ocb/license2.pdf

https://www.linux.com/news/open-source-project-adds-no-milit...

No idea how these would do in court though.




> But the general legal consensus as far as I know is that most ML models meet the threshold for being a new transformative work, so copyright doesn’t apply.

Has this been tested in court yet?


It hasn't yet. I think this is the central claim of the GitHub Copilot suit.

There's a prediction market on whether the suit will be successful, which is currently at 43%: https://manifold.markets/JeffKaufman/will-the-github-copilot...


> Copyright prevents the creation of non-transformative copies of a work

It also prevents transformative derivatives.

Both non-transformative copies and transformative derivative works may (in the US) qualify for the fair use exception, which is the usual argument for unlicensed use in ML training.



