How about mandating that the big players feed SHA digests of their generated outputs into a HaveIBeenPwned-style service? It's easily defeated, but I'm betting that in the cases where it matters, most people won't bother lifting a finger to defeat it.
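A minimal sketch of what that lookup could look like, assuming a hypothetical registry of digests (the normalisation step and the registry itself are my own illustration, not an existing service API):

    import hashlib

    # Hypothetical registry of SHA-256 digests published by the big players.
    published_digests: set[str] = set()

    def sha256_of_text(text: str) -> str:
        # Light whitespace normalisation so trivial copy/paste differences
        # don't break the lookup; any real edit still changes the digest,
        # which is exactly the "easily defeated" weakness above.
        canonical = " ".join(text.split())
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def was_generated(text: str) -> bool:
        return sha256_of_text(text) in published_digests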
Watermarking [0] is a better solution. It keeps working even after the generated output has been edited, and anyone can independently check for a watermark. Computerphile did a video on it [1].
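For a rough idea of how detection works in the scheme from [0]: the generator pseudo-randomly favours a "green list" of tokens keyed by the preceding token, and a detector can recompute those lists and run a z-test, with no access to the model. A simplified sketch (the keyed membership test and the parameters here are my own illustration, not the paper's exact construction):

    import math
    import random

    GAMMA = 0.5  # assumed fraction of the vocabulary that is "green"

    def is_green(prev_token: int, token: int) -> bool:
        # Pseudo-random membership keyed by the preceding token, so the
        # detector can recompute it without the model or the prompt.
        rng = random.Random(hash((prev_token, token)))
        return rng.random() < GAMMA

    def watermark_z_score(tokens: list[int]) -> float:
        # Under the null hypothesis (unwatermarked text), each token is
        # green with probability GAMMA, so the green count is binomial.
        n = len(tokens) - 1
        hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
        return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

A large z-score (say, above 4) is strong evidence of the watermark, and because it is a count over the whole text, the statistic degrades gracefully when only parts of the text are edited.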
But of course, watermarking or checksums stop working once the general public runs LLMs on personal computers. And it's only a matter of time before that happens.
So in the long run, we have three options:
1. take away users' control over their personal computers with 'AI DRM' (I strongly oppose this option), or
2. legislate: legally require a disclosure for each text stating how it was created, or
3. stop assuming that texts are written by humans, and accept that we often will not know how they were created.
[0]: Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. arXiv preprint arXiv:2301.10226. Online: https://arxiv.org/pdf/2301.10226.pdf
Will the general public be running LLMs on their own hardware, or will it end up like self-hosting is today? Despite what I've written above, I would like to think it won't go that way. But at the same time, this is something big tech companies will work very hard to centralise.
In the short term, I think it's very likely that companies (including smaller ones) integrating LLMs into their products will want to run an open-source LLM locally instead of relying on an external service, because that gives them more independence and control.
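The barrier to doing so is already low: running an open-source model locally is a few lines with the Hugging Face transformers library. A minimal sketch (the model name is just a placeholder; any locally downloaded causal LM works):

    from transformers import pipeline

    # Loads the model from local disk/cache and runs it in-process;
    # no external API is involved.
    generator = pipeline("text-generation", model="gpt2")
    result = generator("Summarise this support ticket:", max_new_tokens=50)
    print(result[0]["generated_text"])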
Also, technical enthusiasts will run LLMs locally, as already happens with image generation models.
In the long term, when smartphones are faster and open-source LLMs are better (and more efficient), I can imagine LLMs running locally on smartphones.
'Self-hosting', which I would define as hosting by individuals for their own use or for others through social structures (friends/family/communities), like the hosting of internet forums, is quite small and seems to be shrinking. So it seems unlikely that that form of hosting will become relevant for LLMs.
As of today you can download LLaMA/Alpaca and run it offline on commodity hardware (if you don't mind having someone else do the quantisation for you). The cat's out of the bag with this one.
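For example, a sketch of that offline setup, assuming the llama-cpp-python bindings and an already-quantised model file on disk (the path is a placeholder):

    from llama_cpp import Llama

    # Everything below runs fully offline on CPU; the heavy lifting
    # (4-bit quantisation) was done by whoever published the file.
    llm = Llama(model_path="./models/ggml-llama-7b-q4_0.bin", n_ctx=512)
    out = llm("Q: Name the planets in the solar system. A:",
              max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])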