
> publicly available data collected

Data implies factual information. You can not copyright factual information.

The fact that I use the word "appalling" to describe the practice of doing this results in some vector relationship between the words. That's the data, the fact, not the writing itself.

There are going to be a bunch of interesting court cases where the court is going to have to backtrack on copyrighting facts. Or we're going to have to get some really odd legal interpretations of how LLMs work (and buy into them). Or we're going to have to change the law (giving everyone else first-mover advantage).

Based on how things have been working, I am betting that it's the last one, because it pulls up the ladder.




> Data implies factual information. You can not copyright factual information

Where on Earth did you get that from?


> "data implies factual information"

They used the word DATA, not content, DATA...

The argument that is going to be made is that your copyrighted work stands. That the model doesn't care about your document; it cares that "the" was used N times and about its relationships to other words. That information isn't your work, and it is factual. That "data" only has value when it's weighted against all the "data" put into the system, again not your work at all. (We would say that's derived information, but it will be argued that it is transformed.)

> You can not copyright factual information

https://www.techdirt.com/2007/11/27/yet-again-court-tells-ml...

The MLB has been trying to copyright baseball stats forever. The court keeps saying "you can't copyright facts".


I honestly can’t tell if this is satire



