
> Adding an LLM abstraction layer doesn't make the existing laws (or social/moral pressure) go away.

Isn't the "abstraction" of "the model" exactly the reason we have open court filings against stable diffusion and other models for possibly stealing artist's work in the open source domain and claiming it's legal while also being financially backed by major corporations who are then using said models for profit?

Who's to say that "training a model on your data isn't actually stealing your data," that it's just "training a model," as long as you delete the original data after you finish training?

What if instead of Google snooping, they hire a 3rd party to snoop, then another 3rd party to transfer it, then another 3rd party to build the model, then another 3rd party to re-sell the model? Then they create legal loopholes around which ones are doing it for "research" and which ones are doing it for profit/hiring. All of a sudden, it gets really murky who is and isn't allowed to have a model of you.

I feel one could argue that the abstraction is exactly the kind of smoke screen many will use to legally sidestep the social/moral pressures, allowing them to do bad things and get away with it.




> for possibly stealing artists' work in the open source domain

The provenance of the training set is key. Every LLM company so far has been extremely careful to avoid using private people's data for LLM training, and for good reason.

If a company were to train an LLM exclusively on a single person's private data and then use that LLM to make decisions about that person, the intention is very clearly to access that person's private data. There is no way they could argue otherwise.


> Every LLM company so far has been extremely careful to avoid using private people's data for LLM training

No, they haven’t. (Now, if you said “people's private data” instead of “private people's data”, you’d be, at least, less wrong.)


I've spoken with a lawyer about data collection in the past and I think there might be a case if you were to:

- collect thousands of people's data

- anonymize it

- then shadow correlate the data in a web

- then trace a trail through said web for each "individual"

- then train a separate model for each "individual"

- then abstract that with a model on top of those models

Now you have a legal case that it's merely academic research into independent behaviors affecting a larger model. Even though you may have collected private data, the anonymization of it might fall under ethical data collection purposes (Meta uses this loophole for its shadow profiling).
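
To make that concrete, here's a rough sketch of what such a pipeline could look like. This is my own toy illustration in Python, with made-up records and a trivial "model" (just feature weights); it's not anyone's actual system, only the shape of the loophole:

    import hashlib
    from collections import defaultdict

    # Hypothetical raw records: (real identity, attribute, value).
    # All names and fields are invented for illustration.
    raw_records = [
        ("alice@example.com", "likes", "hiking"),
        ("alice@example.com", "buys", "tents"),
        ("bob@example.com", "likes", "chess"),
        ("bob@example.com", "buys", "chess sets"),
    ]

    def pseudonymize(identity: str) -> str:
        # "Anonymize" by replacing the identity with a stable hash.
        # The mapping is deterministic, so whoever runs the pipeline
        # can still correlate every record belonging to the same person.
        return hashlib.sha256(identity.encode()).hexdigest()[:12]

    # Build the "shadow web": group pseudonymized records per individual.
    shadow_web = defaultdict(list)
    for identity, attr, value in raw_records:
        shadow_web[pseudonymize(identity)].append((attr, value))

    # "Train" one toy per-individual model (here just feature weights).
    individual_models = {
        pid: {f"{attr}:{value}": 1.0 for attr, value in records}
        for pid, records in shadow_web.items()
    }

    # Abstract it with an aggregate model layered on top of those models.
    aggregate_model = defaultdict(float)
    for model in individual_models.values():
        for feature, weight in model.items():
            aggregate_model[feature] += weight / len(individual_models)

    print(dict(aggregate_model))

The point is that every step looks like "research" on anonymized, aggregated data, yet the per-individual models never stop existing underneath.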

Unfortunately, I don't think it is as cut and dried as you explained. As far as I know, these laws are already being sidestepped.

For the record, I don't like it. I think this is a bad thing. Unfortunately, it's still arguably "legal".


I realize that data can be de-anonymized, but if the same party anonymized and de-anonymized the data... well, IANAL, and you apparently talked to one, but that doesn't seem like something a court would like.
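
For example (a toy illustration with made-up addresses, nobody's actual pipeline): if the "anonymization" was a deterministic hash, the party that holds the hash and a list of candidate identities can reverse it with a plain dictionary lookup:

    import hashlib

    def pseudonymize(identity: str) -> str:
        # The same kind of deterministic hashing as in the sketch upthread.
        return hashlib.sha256(identity.encode()).hexdigest()[:12]

    # The "anonymizing" party still knows who could be in the dataset,
    # so re-identification is just hashing the candidates and looking them up.
    known_identities = ["alice@example.com", "bob@example.com"]
    reverse_lookup = {pseudonymize(i): i for i in known_identities}

    anonymized_record = (pseudonymize("alice@example.com"), "likes", "hiking")
    print(reverse_lookup[anonymized_record[0]])  # -> alice@example.com

Which is why "we anonymized it" carries so little weight when the same party controls both ends.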



