I hear what you're saying, and I'm not saying some of it doesn't have merit. The following is meant as an open philosophical discussion.
On the topic of 'the information isn't free' I'm curious if you have the same opinion of encyclopedia companies. You must admit there are at least some parallels in that they also consolidate a large amount of information that was 'generated' by others.
Or how about the information you and I have gained from books and the internet? Sure we might 'pay' for it once by buying a book or seeing some ad, but then we might use that information to make thousands of dollars through employment without ever going back to buy another copy of that book. An even more 'egregious' example could be teachers. They're literally taking the knowledge of others, 'regurgitating' it to our children for money, and 'not giving anything back to whoever created the information in the first place'.
> there's a distinct danger that they will simply suck their sources dry and leave the internet itself even more of a wasteland than it has already become
Maybe. There's the whole AGI/ASI argument here in that they/we might not _need_ humans to create information in the same way we don't need human-calculators any more.
Barring that though, I do hear what you're saying about the falling value of creating 'new internet information'. Personally I can't see it affecting my internet use that much though, as there are basically two categories my internet information gathering falls into:
1. I want to know something, give me the short quick answer. This category is already full of sites that are just trying to hack the search algos to show their version of copy-pasted info. I don't really care which I go to and if AI kills their business, oh well.
2. I want to follow a personality. This category is where I have bloggers/youtubers/etc in RSS feeds and the like. I want to hear what they're saying because I find them and the topics interesting. I can't see this being replaced by AI any time soon.
> Or how about the information you and I have gained from books and the internet? Sure we might 'pay' for it once by buying a book
We've never as a society needed such a concept before, but publishing a book has always come with the implicit license that people who buy the book are allowed to both read the book and learn from the knowledge inside. Authors didn't write books about facts they didn't want people to learn.
But we now have a new situation where authors who never needed to specify this in a terms-of-use are realizing that they want to allow humans to learn from their work, but not machines. Since this hasn't ever been necessary before it's a huge grey area, and ML companies are riding around claiming they have license to learn to reproduce art styles just like any human would, ignoring whether the artist would have allowed one but not the other if given the chance to specify.
It's not that different from when photocopiers and tape recorders made it easy to copy documents or music, say from the radio, and we needed to grapple with the idea that broadcasting music might come with license to make personal recordings but not allow someone to replay those recordings for commercial use. Before then, it wasn't a concept that was necessary to have.
Now with AI, the copy is not exact, but neither was it with a tape recorder.
You raise some great points and I agree that we are on tricky ideological ground. I'll try to provide sensible counter-arguments to your encyclopaedia and teacher examples, and hopefully not fall into straw men (please do object if I do):
1. First there's the motivation or intent. Teachers want to earn a living, but their purpose in some sense and (hopefully) their main intent is that of education. I argue that teachers should be paid handsomely, but I also argue that their motivation is rarely to maximize profits. This is contrary to the bog-standard Silicon Valley AI companies, which are clearly showing that they have zero scruples about breaking past promises for those sweet dollar signs.
2. My second point actually builds a bit on the first: both encyclopaedias and teachers tend to quote the source and they want their audience to expand their research horizon and reach for other sources. They don't just regurgitate information, they'll tend to show the reader where they got the information from and where to go for more and neither the teachers nor the books mind if the audience reaches for other teachers and books. LLMs and generative models are/will be/have been capable of this I'm sure, but it is not in their creators' interest to enhance or market this capability. The more the users are walled in, the better. They want a captive audience who only stays in the world of one AI model provider.
3. Scale. Never before has the reuse (I'm trying to avoid using the word theft) of content produced by others been conducted on such an industrial scale. The entire business model of LLMs and generative models has been to take information created by masses of humans and reproduce it. They seem to have zero qualms about taking all the work of professional and amateur artists and feeding it into a statistical model that trivializes replication and reproduction. You could argue that humans do this as well, but I feel scale matters here. The same way that a kitchen knife can be used to murder someone, but with a machine gun you can mow down masses of people. Please excuse the morbid example, but I'm trying to drive home a point: if we make a certain thing extremely easy, people will do it, and likely do it on a mass scale. You could argue that this is progress, but is all progress inherently beneficial?
There's value in these models, so we should use them. But I feel we are rapidly hurtling towards a walled-garden corporate dystopia in so many areas of our society. Industries which tended to have a negative impact on our lives (waste, tobacco, alcohol, drugs) have become heavily regulated and we have paid for these regulations in blood. Will we have to pay the same blood price for the harmful industries of the new age?