Doesn't that mean that if they used data created by (or even data about) anyone in the EU, they would want to not release that model in the EU?
This sounds like "if an EU citizen created, or has data referenced in, any piece of the data you trained on, then..."
Which, I mean, I can kind of see why US and Chinese companies prefer to just not release their models in the EU. How could a company ever make a guarantee satisfying those requirements? It would take a massive filtering effort.
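To make "massive filtering effort" concrete, here's a toy sketch (everything in it is hypothetical: the regexes, the sample corpus, and the `maybe_references_eu_person` helper). Even this crude version has to scan every single document, and nothing in the text reliably tells you anyone's citizenship, so it can never yield a guarantee:

```python
import re

# Toy stand-ins for real PII detection: flag any document that mentions
# an email address or an EU country name. A real pipeline would need
# named-entity recognition, entity resolution, and citizenship
# inference, none of which can be made reliable.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
EU_HINTS = re.compile(r"\b(Germany|France|Spain|Italy|Poland)\b")

def maybe_references_eu_person(doc: str) -> bool:
    # A regex can only guess; citizenship is not derivable from text,
    # which is exactly why a hard guarantee is impossible.
    return bool(EMAIL.search(doc) or EU_HINTS.search(doc))

corpus = [
    "Press release: ACME GmbH, based in Germany, announced...",
    "def add(a, b): return a + b",
    "Contact maria@example.eu for details.",
]
kept = [d for d in corpus if not maybe_references_eu_person(d)]
print(f"kept {len(kept)} of {len(corpus)} documents")  # kept 1 of 3
```

And that scan would have to run over trillions of tokens, with both false positives (dropping usable data) and false negatives (breaking the guarantee).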
This seems to mirror how US financial regulations (FATCA) are seen as such a hassle for foreign financial institutions to deal with that they'd prefer to just not accept US citizens as customers.
> > This sounds like "if an EU citizen created, or has data referenced in, any piece of the data you trained on, then..."
> Yes, and that should be the default for any citizen of any country in the world.
This is a completely untenable policy. Each and every piece of data in the world can be traced to one or more citizens of some country. Actively getting permission for every item is not feasible for any company, no matter its scale.
I think that’s kinda the point that is being made.
Technology-wise, it is clearly feasible to aggregate the data to train an LLM and to release a product built on it.
It seems that some would argue it was never legally feasible, because the training data could never be used legally in the first place. So it is the existence of many of these LLMs that is (legally) untenable.
Whether valid or not, the point may be moot because, as with Uber, if the laws actually do forbid this use, they will be changed as necessary to accommodate the new technology. Too many “average voters” like using things such as ChatGPT, and it’s not a hill politicians will be willing to die on.
> Actively getting permission for every item is not feasible for any company, no matter the scale of the company.
There's a huge amount of data that:
- isn't personal data
- isn't copyrighted
- isn't otherwise protected
You could argue about whether that is enough data, but neither you nor the corporations make that argument. You just go for "every single scrap of data on the planet must be made accessible to supranational trillion-dollar corporations, without limits, now and forever".
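For what it's worth, here is a minimal sketch of that counter-position (the `Document` type and its provenance flags are hypothetical; scraped web data almost never carries such labels, and assigning them is itself the filtering effort discussed upthread):

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    # Hypothetical provenance flags; in practice these would have to be
    # assigned somehow, which is the hard part of the whole dispute.
    is_personal: bool
    is_copyrighted: bool
    is_otherwise_protected: bool

def freely_usable(doc: Document) -> bool:
    # Keep only data that falls in none of the three protected buckets.
    return not (doc.is_personal
                or doc.is_copyrighted
                or doc.is_otherwise_protected)

corpus = [
    Document("US federal report (public domain)", False, False, False),
    Document("Personal blog post with the author's address", True, True, False),
    Document("Novel from 1890, copyright long expired", False, False, False),
]
usable = [d for d in corpus if freely_usable(d)]
print(f"{len(usable)} of {len(corpus)} documents are unrestricted")  # 2 of 3
```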