Yesterday Stackoverflow, today Reddit. A clear pattern emerges where open web co...

TheCoreh · on April 18, 2023

I understand where you're coming from, but can't fully agree with you.

First, Stack Overflow contributions are licensed under Creative Commons. So monetizing them is explicitly allowed.

Second, information is not "stolen" nor "goods". Copyright law is completely separate from physical property laws, so even if you could make a case about fair use of training data, copyright-ability of model weights and AI generated content (which I agree are still legal gray areas) and therefore whether or not the "Share-Alike" CC clause is enforceable in this context, it would be an entirely different argument from whether the whole industry is somehow entirely morally bankrupt.

Third, given that this is unpaid work made voluntarily by users of the platforms (Reddit, SO), why is it any more acceptable for these platforms to lock it up and monetize it than for AI companies?

I think it's completely reasonable to charge for API access, particularly above a certain volume, but not because these companies have a right to protect some sort of "intellectual capital investment", but rather because the server costs of processing the requests are not negligible.

If anything, this situation really separates the wheat from the chaff in terms of what pools of open web content are truly "open". If the platforms hosting them expect to retain control of their "investment", can they really be said to be open?

I understand the irony, given that OpenAI's own name is somewhat at odds with its practices (of merely providing open access versus truly releasing everything as open source) but I think the reasonable solution to that conundrum is something like Wikimedia Foundation, Internet Archive or maybe CERN for AI, not giving up on free, open content just because it might feed a giant private brain.

abdullahkhalids · on April 18, 2023

> First, Stack Overflow contributions are licensed under Creative Commons. So monetizing them is explicitly allowed.

The evolution of any human legal system can be described as follows.

1. Hey guys, here is a simple set of rules we have agreed upon, to make sure there are no conflicts. Please follow them in good faith.

2. 95% of people follow both the letter and spirit of the agreed rules.

3. Some bad actors come in and only comply with the letter of rules, hacking and exploiting the system to their obscene advantage.

4. The complexity of the rules is increased to shut down the bad actors. The new rules increase costs for everyone, good and bad actors.

Repeat steps 2-4 continuously till the system is completely broken and we are all much worse off. The bad actors, "We did nothing wrong, we followed the letter of the law."

AgentME · on April 19, 2023

What's the conflict? Stack Overflow content was specifically licensed under Creative Commons so that its content can be maximally used and learned from, and it seems to be working successfully in ways not envisioned before.

brokenmachine · on April 19, 2023

3.5. Bad actors lobby for the letter of the law to be changed in their favor.

4.5. Everyday people are incited to argue about distracting, trivial issues while systemic problems snowball.

eric-hu · on April 19, 2023

I'm favoriting this post. What a pithy description of the systemic breakdown of rule of law.

twelve40 · on April 18, 2023

I don't wish Microsoft to forcibly snag the profits from my (and, more significantly, many others') Stack Overflow posts - while giving nothing back to the SO community. I'm OK with SO profiting from that and giving me points in return. If/when that becomes a noticeable issue for SO I'm sure they will revisit their approach too, because nobody likes leeches.

dhruvdh · on April 19, 2023

What do you mean by Microsoft "forcibly snagging profits"? How do you profit? I am not familiar with the incentives behind posting on SO.

Does Microsoft not cite SO posts in Bing results? Do they not make it easy to find the "correct" SO question/answer?

Is the issue that someone else is helping others, vs "you" or the "SO community"?

twelve40 · on April 20, 2023

The incentive for Stack Overflow to administer the service and keep the lights on is getting paid for traffic to their website from Google search (which they monetize via a modest amount of ads and job posts).

The incentives for free contributors (SO users) to write up good questions and good answers, to debate, and to come up with and vote on better solutions in the comments are points, recognition and, yes, helping others and getting credit for it in their name, even though this credit is not monetary.

If Microsoft regurgitates my answers (just using me as an example, there are infinitely better contributors) without sending traffic to the SO website proper and without people voting for my answer or participating in debates and discussions on the SO website proper - and in many (if not most) cases there is no single smash-hit answer and things need to be worked out and voted on - then my motivation as an SO contributor drops to a complete 0. Basically, there is no reason to contribute at all, since Microsoft is going to grab my answers for itself and collect the subscription (in the case of ChatGPT and Copilot), and eventually the inevitable ad revenue from the majority of Microsoft and ChatGPT users never leaving the Microsoft properties and never contributing to the original SO activity.

Of course, there are tons of problems inside SO proper currently as well, but none of them destroys the motivation to contribute as much as third parties scraping and regurgitating the original content and keeping the traffic to themselves.

gordian-mind · on April 18, 2023

But they didn't take anything? And those two moves of SO and Reddit are mainly about greed: they want some more money just for hosting content that people generated while viewing their ads and giving them money for features.

breck · on April 19, 2023

Two options:

1) Copyleft licenses

2) Abolish copyright law

I am one of the few arguing for #2, but I think #1 is a good short term option.

sceadu · on April 19, 2023

Literally the story of Google itself... it built technology on a large corpus of existing text (the internet) for PageRank and was then able to leverage and monetize it via search and ads.

vctrm67 · on April 19, 2023

But Google itself was and still is free. It's a service they provide to you without charge and, were it not to exist, your life would be almost immeasurably more difficult (as with any search engine). And most of the time it doesn't "take" from website owners; if anything, it generates more traffic for them.

When a model trains over Reddit, it may still provide a service that is free. But the way it's going, companies are charging money for access to those models and aren't generating traffic for the underlying training data/sites.

pcthrowaway · on April 19, 2023

Free to search, though you are the product. Even ChatGPT hasn't productized their users yet in order to provide their service for free.

But make no mistake, the secret sauce in Google Search is by no means open, and possibly not even comprehensible to a single human at this point.

treis · on April 18, 2023

I wonder if AI training data can replace ads as a way to monetize web services.

sizzle · on April 19, 2023

And Twitter charging for API access.