You are assuming that laws/rights operate on basic capabilities like learning or remembering, but they actually operate on goals. For example, humans can run, yet there is no speed limit for pedestrians; if we somehow learned to run at 100mph, a limit would be introduced. The goal is to motivate creators to create on terms that are fair and useful for society. If you “generalize” their work at such scale, it is a very different situation, like robots running at 100mph on a sidewalk and demotivating everyone else from using it.
Services like search, translation, recommendations, etc. are all dependent on training data
There is an obvious benefit in being represented in search results and recommendations.
> If you “generalize” their work at such scale, it is a very different situation, like when you are using a sidewalk for robots running at 100mph and demotivate anyone from using it.
I don't understand what you're trying to state.
If you make fun of someone on national TV, they won't like it either. That doesn't mean it should be illegal.
The point of attribution and copyright is to create a creator/inventor-friendly environment. Copyright law used realistic assumptions about the problem space to provide that. These new AIs have enlarged the problem space, but that doesn’t mean the original idea behind copyright is inapplicable to them by default, or should be. It’s okay to vote for or against extending it, but the potential systemic effect of that vote should also be kept in mind.
I hope this makes more clear why I used this analogy.
> The point of attribution and copyright is to create a creator/inventor-friendly environment.
They're building a class action, but beyond that I'm not really confident there are any parties who feel they've been harmed by what's being claimed as a violation. Microsoft didn't just copy repositories of GPL-licensed code, but code from other companies too. Why didn't those companies care?
It is a good counterpoint, though: Waymo has released datasets that can only be used under a license, while GitHub did not initially exercise such discretion toward people who had agreed to its terms of service and were using its hosted repositories.
While I think the legal system might lean toward requiring consent to use the data in certain circumstances, on the technical side, if training really does produce a generalization, the initial data is meaningless in the final model. It could be argued that this did not harm the company economically, because a model trained without that data would have had an almost equivalent financial impact.
> It’s okay to vote for or against that, but the potential systemic effect of that vote should also be kept in mind.
My position above was clear: I do not think the law should be changed yet. The systemic change comes from someone asserting a violation where I do not believe one occurred. This lawsuit could result in models like Stable Diffusion becoming illegal, which I view as incredibly harmful to the future of artificial intelligence research.
What if we extrapolate a little into the future? Copilot-likes and Stable Diffusions become much better tools for knowledge/mastery extraction and application than they are today. Will people still want to create new knowledge, or will they be satisfied with reshuffling the existing knowledge? Will opinions change once people realize there is no point in creating something new (it will be generalized), while the existing is easily reproduced without them (the classic “took our jobs”)? Can all of this happen? How do you see it?