
That’s a good point. Wouldn’t OpenR1 suffer from the same problem? Or does being open somehow shield them from legal repercussions?



Some people believe they can dodge copyright issues so long as they have enough indirection in their training pipeline.

You take a terabyte of pirated college physics textbooks and train a model that can pose and answer physics 101 problems.

Then a separate, "independent" team uses that model to generate a terabyte of new, synthetic physics 101 problems and solutions, and releases this dataset as "public domain".

Then a third "independent" team uses that synthetic dataset to train a model.

The theory is that this forms a sort of legal sieve. Pass the knowledge through a grid with a million fact-sized holes and, with enough shaking, the knowledge falls through but the copyright doesn't.


Knowledge laundering



