Reddit will begin charging for access to its API

neonate · on April 18, 2023

lcnPylGDnU4H9OF · on April 18, 2023

Related to https://news.ycombinator.com/item?id=35617763 ("Reddit Wants to Get Paid for Helping to Teach Big A.I. Systems"; an aside, but I much prefer the title of this post I'm commenting on as it describes the actual change) and it's hard to find this particularly disagreeable. Especially considering:

> Reddit’s API will remain free to developers who want to build apps and bots that help people to use Reddit, as well as to researchers who wish to study Reddit for strictly academic or noncommercial purposes.

> But companies that “crawl” Reddit for data and “don’t return any of that value” to users will have to pay up, Reddit co-founder and CEO Steve Huffman told The Times.

prepend · on April 19, 2023

It’s funny because posts on Reddit don’t belong to Reddit, they belong to the users who created them.

Why would I, as someone who’s made tens of thousands of comments, care if someone scrapes and reuses my comments? I don’t want them to pay up.

This is a really rich comment from a company that relies entirely on user submitted content and has never “paid up.”

lvncelot · on April 19, 2023

> This is a really rich comment from a company that relies entirely on user submitted content

User submitted content and moderation.

wallmountedtv · on April 19, 2023

And the moderation that makes Reddit hold valuable content is done by its users on a per-subreddit basis. Only stuff that could break laws, like extremist content and hate speech, is handled by Reddit themselves.

It's really odd to call it "their" data, and this is not exclusive to Reddit.

hammyhavoc · on April 19, 2023

It isn't odd. Is it odd that content on Facebook is Meta's data? Maybe try reading the T&Cs and it won't seem so odd.

They provide the platform for free. Don't like it? Self-host or go elsewhere. This is the biz model every content silo uses.

psychphysic · on April 19, 2023

Agreed! Reddit runs on the good will of its very few good users.

It is quite the cesspit and always has been.

Training heavily on it will likely worsen the confidently-incorrect problem.

artificial · on April 19, 2023

Considering OpenAI trained on Twitter data among others, I think it'll make for more flavor that users crave, based on the popularity of both of those platforms.

brigandish · on April 19, 2023

You made the comment, Reddit built the API and the system you use for making comments. If they wish to charge for their part(s) in this, they can, or retract their work, or give it away. Just as you can for your comments.

prepend · on April 19, 2023

Certainly they can charge for their API. It was the CEO’s phrasing that was odd: that it is their data and if people want to use it, they should pay up.

I think charging for the API is bad as it will make things like user apps harder. I think Reddit’s app is bad, so other apps need to use the API in order to function.

brookst · on April 19, 2023

There's a bunch of precedent that aggregators have some IP rights. Reddit does not have exclusive rights to your posts, but they can have some rights to the collection of posts from all users.

markdown · on April 19, 2023

> Why would I, as someone who’s made tens of thousands of comments, care if someone scrapes and reuses my comments. I don’t want them to pay up

Why would I, as someone who's made tens of thousands of comments, be happy with a corporation scraping my content to create a service that they'll turn around and charge me for? I want them to pay up, so that Reddit, this wonderful service that has given me thousands of hours of entertainment and education, can be sustainable and grow.

> This is a really rich comment from a company that relies entirely on user submitted content and has never “paid up.”

Most redditors will agree that they get much more from Reddit than they give. I for one am very happy with the arrangement I have with Reddit.

mxkopy · on April 19, 2023

Because reddit is a terrible company and you don't want to subsidize their transition into a shitty ad service.

bezier-curve · on April 19, 2023

There's a lot of people in this thread defending Reddit and they don't seem to have ever had the pleasure of dealing with an actual Reddit employee. They have a culture of unchecked cronyism. Reddit doesn't care about anyone, some people will eventually figure it out the hard way.

markdown · on April 19, 2023

> they don't seem to have ever had the pleasure of dealing with an actual Reddit employee.

99.999% of redditors will never ever have to deal with a Reddit employee. Cronyism? What the hell has that got to do with my consumption of and participation on Reddit?

bezier-curve · on April 19, 2023

Most users don't interact with Reddit employees, but the moderators that maintain most of the communities you enjoy do. That's how I ended up interacting with them. Shortly after the IPO rumors, my community started being harassed by an admin through modmail.

markdown · on April 19, 2023

That's an odd way to put it. The admins are basically god. God doesn't harass you, he tells you how to live. If an admin tells you to jump, you beg to know how high.

I've been a mod for a decade and never had a problem with admins.

bezier-curve · on April 19, 2023

I was new to it though, and an admin picked on me because I was parodying another subreddit (my home city subreddit). You've been modding for a decade, great. That basically backs up my original point that it's a party of tenure and closed-mindedness (cronyism). My situation was different, and it's pointless to argue, but if you trust any company (especially in a changing economic environment), keep your eyes open for poorly motivated incentives.

Regardless, the attitude that they're "god" is a weird way to put it. They randomly IP banned me for calling out an admin's publicity issue during an April fools event. That's cronyism. I've used Reddit for 12 years and engaged in conversations in good faith for years. Paid for subscriptions most of that time. All relationships, business or otherwise should be mutual in some form.

carlhjerpe · on April 19, 2023

If companies pay reddit for "public" data the incentive to poison the platform with too many ads decreases, and there's always adblock.

mxkopy · on April 19, 2023

Having more ads and monetizing the API aren't mutually exclusive. Look at the avatar/award system for example, which evolved in tandem with dark patterns that push the user to the app where they can serve unblockable ads.

lcnPylGDnU4H9OF · on April 19, 2023

My perspective is that Reddit made the comment you submitted. In real terms the comment is a record in a database which backs a web application that is developed and administered by Reddit. The comment is your expression but, like it or not, it is by Reddit’s grace that they publish it on their website. (Consider things which would be illegal for them to host and publish; they need to keep a close eye out for such things and prevent those relatively few posts among the millions they receive daily.)

rcme · on April 19, 2023

Does the air make my speech by propagating the energy from my larynx?

lcnPylGDnU4H9OF · on April 19, 2023

Technically, yes, if you’d like to credit the effect of one hearing your speech to the workings of Earth’s atmosphere. It’s true that the speech is your expression but you correctly point out that the air brings it to my perceptions.

marpstar · on April 19, 2023

I think what GP means by "made" is "produced". Reddit provides the platform, the community, and the reach -- it's like a record label. Much like how recording artists don't own their recordings.

moshun · on April 19, 2023

Tons of recording artists own their masters, both big and small. It’s a function of their contract that determines that ownership, and those terms are clear. Just as they are clear in Reddit’s TOS. You own your content, Reddit simply has a license to use it.

https://www.redditinc.com/policies/user-agreement-april-18-2...

marpstar · on April 19, 2023

From your link:

> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

That's just about every aspect of "ownership" I can think of, minus the label "ownership". Honestly, it seems about as close as a lawyer would allow a company governed by Section 230 to have, as "ownership" would step into exposure to liability.

bjourne · on April 19, 2023

Well, this comment and all other comments you have submitted to Hacker News are records in my browser's cache. Does it mean I have the right to save them into less volatile storage and charge others for accessing them?

lcnPylGDnU4H9OF · on April 19, 2023

If I sued you for running such a service, it's likely that you would be looking to convince a judge that you got me to agree to something like this: https://www.ycombinator.com/legal/

JCharante · on April 19, 2023

Okay, so where should we start mailing checks to Reddit for hosting our comments?

eviks · on April 19, 2023

To the same address Reddit mails checks for generating revenue based on those comments

bagels · on April 19, 2023

I assume you give them a license to use your posts when you sign up.

MuffinFlavored · on April 18, 2023

> But companies that “crawl” Reddit for data

It's not possible to hide crawling at a large enough scale, right? At some point, certain IPs/user agents will (should be?) hit with CAPTCHAs to be able to have access to content and no amount of user agent/cookie/session/whatever spoofing will get around that, yeah?

pocket_cheese · on April 18, 2023

IP restrictions are easy to overcome using mobile networks. Basically, mobile networks assign your device an internal IP and NAT out to a very small pool of public IP addresses. If they block you, they also block a very large chunk of legitimate mobile users. I'm a big ol' dummy when it comes to networking, so I imagine I explained something poorly... so any mobile network nerds feel free to pile on!

Captchas are super easy! There's a gazillion captcha bypass services for every type of captcha. Just snag the captcha token, send it in an API call, and then you get a verified captcha token.

See CGNAT for more details about mobile networks. https://en.wikipedia.org/wiki/Carrier-grade_NAT#cite_note-of...

It's pretty much impossible to stop the top 1% of the most dedicated scrapers without affecting end user experience.
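To make the CGNAT point above concrete, here is a minimal sketch of why blocking one public address hits many subscribers at once. The pool size, subscriber count, and round-robin assignment are illustrative assumptions, not how any particular carrier works; `assign_public_ips` and `users_blocked_by` are hypothetical helper names.

```python
import ipaddress

def assign_public_ips(subscribers, public_pool):
    """Map each private subscriber address to a shared public address.

    Round-robin is an assumption made for illustration only; real
    carrier-grade NAT deployments use their own allocation schemes.
    """
    return {
        private_ip: public_pool[i % len(public_pool)]
        for i, private_ip in enumerate(subscribers)
    }

def users_blocked_by(mapping, blocked_public_ip):
    """Return every subscriber that shares the blocked public address."""
    return [priv for priv, pub in mapping.items() if pub == blocked_public_ip]

# 100.64.0.0/10 is the shared address space reserved for CGNAT (RFC 6598).
subscribers = [ipaddress.ip_address("100.64.0.0") + i for i in range(1, 1001)]

# 203.0.113.0/24 is a documentation-only range (RFC 5737), used as a stand-in.
public_pool = [ipaddress.ip_address("203.0.113.1") + i for i in range(4)]

mapping = assign_public_ips(subscribers, public_pool)
blocked = users_blocked_by(mapping, public_pool[0])
print(f"Blocking {public_pool[0]} affects {len(blocked)} subscribers")
```

With these made-up numbers, one scraper and a few hundred ordinary users look identical at the IP level, which is the trade-off the comment describes.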

tjohns · on April 18, 2023

> IP restrictions are easy to overcome using mobile networks.

Only if the connection is over IPv4.

The mobile networks were among the first major adopters of IPv6, and most now give each device a unique IPv6 address.

abhibeckert · on April 18, 2023

My mobile device (iPhone) relays most traffic through the nearest Akamai datacenter. So they don't get my IP address. And that datacenter has a massive number of IP addresses, which are rotated.

denkmoon · on April 18, 2023

Out of interest how do you know it's being relayed through an Akamai DC? I assume you're talking about private relay which I also use, but I thought cloudflare was the 2nd hop for that?

lathiat · on April 19, 2023

They're using multiple providers including Fastly, Akamai and Cloudflare: https://www.streamingmediablog.com/2021/06/apple-private-rel...

sjtgraham · on April 19, 2023

This is only HTTP and not HTTPS traffic, which most www traffic is these days.

SnorkelTan · on April 19, 2023

Reddit's API is IPv4 only

wswope · on April 18, 2023

Cat & mouse game. If you’re defending against a whitehat business scraping with curl from data center IPs, sure.

Against a less-savory actor using hundreds of IPs from residential proxies/compromised hosts, you’re gonna have a rough time, especially if you’re unwilling or unable to use aggressive fingerprinting or (vomit) CloudFlare. Not to mention CAPTCHAs are generally already a solved problem for scrapers.

SheinhardtWigCo · on April 19, 2023

Residential proxies are a completely solved problem, for companies that actually lose money to them (e.g. Ticketmaster, whose profit is maximized by blocking third-party scalpers so they can do the scalping themselves)

For companies that make money by having more MAUs, well, yeah, they're going to have a real "rough time" detecting inauthentic traffic

kayson · on April 18, 2023

Why is cloudflare vomit-inducing?

wswope · on April 18, 2023

While I appreciate it as an irreplaceable tool for countering DDoS, its premise is antithetical to a reliable and open web IMO, and it suffers from the same lack of accessible, customer-facing support as other big tech players. Lazy examples from HN algolia search:

https://news.ycombinator.com/item?id=32912075 https://news.ycombinator.com/item?id=17750801 https://news.ycombinator.com/item?id=22109969 https://news.ycombinator.com/item?id=30764757 https://news.ycombinator.com/item?id=29839960 https://news.ycombinator.com/item?id=22406277 https://news.ycombinator.com/item?id=23897705 https://news.ycombinator.com/item?id=34639212

(Hypocrisy disclaimer: I have sites behind CloudFlare.)

gnicholas · on April 19, 2023

Upvoted for “hypocrisy disclaimer”

tomwheeler · on April 18, 2023

I intensely dislike them taking over as gatekeepers of the web. Perhaps because my browser is configured to resist fingerprinting and to avoid running arbitrary scripts from random websites, it is very frequently blocked by Cloudflare.

As one example, I can no longer browse the site for Lowe's (big box home improvement chain). Consequently, I now buy everything from Home Depot (their competitor).

It's astonishing how Cloudflare can do such a poor job of determining the difference between a potential customer and an attacker. Life's too short to solve captchas for an intermediary, so I don't bother, I just find a competitor who wants my business.

travisjungroth · on April 18, 2023

> It's astonishing how Cloudflare can do such a poor job of determining the difference between a potential customer and an attacker.

I don’t find that astonishing at all. I can’t see how you’d disambiguate someone who is anonymous for good versus bad reasons. Not supporting the death of the anonymous internet, but it’s not happening because of incompetence.

thaumaturgy · on April 19, 2023

I don't think Cloudflare is immune to organizational incompetence even if a lot of brilliant people work there. I have similar intermittent problems as ~tomwheeler, despite a mostly unchanged residential IP and a browser configuration that's only a little bit defensive.

My outsider's impression is that Cloudflare has decided to rely much more heavily on browser fingerprinting than on classifying good/bad network activity. That puts them at odds with anyone that's taken steps to oppose being monetized by advertising firms.

tomwheeler · on April 20, 2023

> I can’t see how you’d disambiguate someone who is anonymous for good versus bad reasons.

One obvious clue would be that there are no attacks coming from my IP address.

kokanee · on April 19, 2023

I think that both Cloudflare and the Lowe's stores of the world understand that these interventions have negative side effects. The problem is that leaving them out has even worse consequences, and no one has offered a sufficient alternative.

Put another way, one could reason that they'd prefer to do business with Lowe's because they are actively investing in security measures. Perhaps your data is more likely to be compromised at Home Depot.

yazzku · on April 19, 2023

It induces vomit in anyone who is on any combination of a) a slow network, b) Tor, or c) noscript. They also fundamentally act as middlemen, the gate between users and what's supposed to be an open web. They even promote having servers run plain HTTP and they'll do the HTTPS proxying for you; you know, so that they can sniff the traffic between you and your users.

AlphaSite · on April 18, 2023

Require auth and it’ll help a ton.

thomastjeffery · on April 18, 2023

One of the core features of Reddit is that any person may create as many accounts as they like for free. Changing that would be incredibly disruptive.

You could set a minimum karma threshold, but that would only promote karma farming; which is already widespread.

asdadsdad · on April 19, 2023

Reddit might be one of the last few places on the internet that hold on to the old times of pseudonyms and mindless anonymity. I don't see how changing that would benefit the company. See Twitter.

MuffinFlavored · on April 18, 2023

> One of the core features of Reddit is that any person may create as many accounts as they like for free. Changing that would be incredibly disruptive.

I wonder what their monthly active users look like if you filter out 1 person switching through 3 usernames/accounts for example.

Zak · on April 18, 2023

Reddit wants people to visit the site, become interested in the content they see, and start participating regularly. That's not compatible with hiding enough content behind a registration wall to thwart sufficiently sophisticated scraping.

henryfjordan · on April 18, 2023

There are services out there that have a large pool of consumer IPs that are marketed at crawlers for exactly this reason. A lot of them are either using hacked hardware or one of those free VPN browser plugins so it would be very hard to distinguish the traffic from a legit user.

stingraycharles · on April 18, 2023

There are residential proxies that allow you to bypass most of these things. I’ve been using them to crawl e.g. Amazon or Instagram without any issues, but they’re expensive. IIRC something like $10/GB.

sonofhans · on April 18, 2023

Serious question — is “residential proxies” a euphemism for “botnet?”

celestialcheese · on April 18, 2023

Yes - but legal and explicitly allowed by the user.

BrightData is the biggest of them, they run the free VPN Hola, and have an SDK app owners can install in their apps that allow selling bandwidth from installs. For someone who is price sensitive, trading some free residential bandwidth for whatever service is pretty compelling.

I'm sure there are scummy ones, but Bright seems to require pretty explicit consent. Not affiliated, just looked into it for some apps I have, but the payouts weren't good and I didn't think it'd be a good fit for our users.

stingraycharles · on April 18, 2023

This is exactly the one I was using. Basically you’re piggy-backing on mobile phones and other devices using their free VPN software, and it’s incredibly hard to block for large websites. Combine this with some other clever tricks, and you’re basically able to do huge scrapes for not-that-much money with incredible convenience.

candiddevmike · on April 18, 2023

There are also a lot of rural/metro ISPs that offer this as a service (residential IPs) if you find the right person

aeyes · on April 18, 2023

No because they are only HTTP proxies. But you don’t actually know how these companies get them, rumor is that they are part of browser extensions or free VPNs which users might install on their devices.

The most “reputable” company in this space is Bright Data (formerly Luminati).

bikingbismuth · on April 18, 2023

Essentially yes. Sometimes it also includes people who have installed “free” VPNs.

sleepybrett · on April 18, 2023

Yes and also people who install certain shady "VPN" software.

lobsterthief · on April 18, 2023

I’m curious about this too.

jamesfinlayson · on April 19, 2023

Last week someone here said that some of the big VPN players use botnets to get residential IP addresses. I assumed they got residential IP addresses from ISPs but maybe not all ISPs in all parts of the world offer that.

zer0tonin · on April 18, 2023

It's possible but might end up more expensive (and definitely less reliable) than just paying whatever reddit asks for.

oceanplexian · on April 18, 2023

CAPTCHAs have been broken by primitive AI for a long time (long before GPT-4-like tools). Their only purpose is to deter the lazy bots. User agents, and any other arbitrary HTTP headers, cookies, etc. have been easy to circumvent as long as the internet has existed. The only thing that sort of works is IP reputation, but with IPv6 you can have as many legitimate IPs as you want.

tl;dr Dedicated crawlers built by sophisticated actors are more or less impossible to defeat.
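As a rough illustration of the IPv6 point above: a single /64 holds 2**64 addresses, so counting requests per address sees every rotated address as a new client. The sketch below contrasts that with counting per /64 prefix. Grouping by /64 is an assumption about how a defender might respond, not something the comment claims anyone does; `prefix_key` is a hypothetical helper and the addresses come from documentation-only ranges.

```python
import ipaddress
from collections import Counter

def prefix_key(addr, v6_prefix_len=64):
    """Collapse an IPv6 address to its enclosing prefix; leave IPv4 as-is.

    The /64 default is an illustrative assumption (a common per-subscriber
    allocation size), not a universal rule.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        network = ipaddress.ip_network(f"{ip}/{v6_prefix_len}", strict=False)
        return str(network)
    return str(ip)

# Made-up request log: one client rotating addresses inside a single /64,
# plus two unrelated clients.
requests = [
    "2001:db8:0:1::1",
    "2001:db8:0:1::2",
    "2001:db8:0:1:ffff::9",
    "2001:db8:0:2::1",
    "198.51.100.7",
]

per_address = Counter(requests)
per_prefix = Counter(prefix_key(a) for a in requests)

print(max(per_address.values()))   # every address appears only once
print(per_prefix.most_common(1))   # the rotating client stands out per prefix
```

Even so, an actor with access to many distinct prefixes would still get around this, which is consistent with the tl;dr above.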

wlesieutre · on April 18, 2023

It's hard to imagine captchas being a workable solution as better AI models get cheaper.

At this point computers are probably already better at solving them than humans are.

gloosx · on April 19, 2023

It is very easy and cheap to scale: $1 for 1,000 captchas solved, $10 for 1,000 proxies. Then you have 1,000 users, and these are kinda impossible to distinguish from your typical common users if you cared to randomize the digital fingerprints for each client to some extent. Paid APIs for publicly accessible data are not something that makes sense or works well in this world.

asdadsdad · on April 19, 2023

why would paid scraping services work then?

gloosx · on April 19, 2023

You're right... I went off-lane there. It makes total sense of course since there is a demand for data, and clearly, only a minority of people can just scrape everything at will, even if it sounds like child's play to me. And actually, it all makes sense now, since paywalling your own API is just throwing some competition to scrapers, which is totally legit. Sometimes a simple question can do a good deed, thank you :)

asdadsdad · on April 19, 2023

Ha, no worries. I ask because I've also contemplated that kind of thing and realized that I was chasing my own tail perhaps.

boredtofears · on April 18, 2023

> Reddit’s API will remain free to developers who want to build apps and bots that help people to use Reddit

I'd be fine if they didn't. The ratio of useful bots to annoying ones is very low.

faeranne · on April 19, 2023

Bots? eh... probably gonna agree. Apps, hard no. API access to third-party apps is the only way to make competition work out in the end. ICQ, AIM, and MSN Messenger all existed around the same time, and thanks to XMPP, worked equally well on an XMPP client. This made it reasonable to use all 3 if a user wanted to, plus they could use their own server too, or a friend's, or a company's, or whatever. Thanks to SMTP, email is (mostly) the same way right now. On the other hand, the perfect example of how shuttering API access to apps can completely kill any competition exists right now in the form of Discord. Sure, Guilded exists, and one could argue Slack is a competitor, but tell me, do you genuinely use all three? Would they be interchangeable? Or do you split personal and professional between Discord and Slack? If all 3 had a common standard, or at least had open client APIs, we'd already have a unified client, making all 3 easy to access at the same time, and we'd have good competition. Reddit has competition, and there are many third-party apps that allow using all of them under one roof. Killing that off would not be a welcome change.

So yeah, Reddit may not need bots, but refusing to allow apps is just pushing another nail in the coffin of competition.

sebazzz · on April 19, 2023

Unfortunately the developer of the Apollo app already got a call, and apps will need to pay. That's then the end of reddit on mobile for me. The official app is unusable and has annoying behaviour in desperate attempts to boost engagement.

https://old.reddit.com/r/apolloapp/comments/12ram0f/had_a_fe...

> There was a quote in an article about how these changes would not affect Reddit apps, that was meant in reference to “apps on the Reddit platform”, as in embedded into the Reddit service itself, not mobile apps

>

> tl;dr: Paid API coming.

jimmySixDOF · on April 19, 2023

Paid I can deal with and Reddit are certainly entitled to some rev share for enabling the content - but - if this goes down the old EEE path through to extinguish third party as a way to force their interface and tools or nothing - then nothing is what it will be. Twitter's API history and present is a great example of how bad things could potentially get.

no_carrier · on April 18, 2023

My thoughts exactly.

dang · on April 18, 2023

Ok - I'm going to merge the threads but will use the more limited title on the merged post.

(Edit: merging https://news.ycombinator.com/item?id=35618695 hither now)

ChuckNorris89 · on April 18, 2023

>> But companies that “crawl” Reddit for data and “don’t return any of that value” to users will have to pay up, Reddit co-founder and CEO Steve Huffman told The Times.

But they do return value to users. I'd much rather get my answer from a ChatGPT query than scouring through Reddit.

Maybe he meant that they're not returning value to Reddit in which case he'd be right, but I hate him trying to spin this for the users.

kulahan · on April 18, 2023

ChatGPT having info from reddit does not help redditors, it helps ChatGPTers. You can't really say they're providing a service for reddit users.

thomastjeffery · on April 18, 2023

You also can't say that it isn't helping Reddit users, because the same person may use both platforms.

So the question is: how significant is the union of those two sets?

kbelder · on April 18, 2023

>“don’t return any of that value” to users

Notice that 'to users' was outside of the quote. That was an editorial addition.

majormajor · on April 18, 2023

In the original NY Times article there's this line:

> “Crawling Reddit, generating value and not returning any of that value to our users is something we have a problem with,” Mr. Huffman said. “It’s a good time for us to tighten things up.”

So it is "to users" but more specifically it's to our users.

I would agree with Huffman here: crawling the data to build ChatGPT gives the value to ChatGPT users who aren't necessarily Reddit users, and by short-circuiting queries and processes that otherwise may have led to new Reddit users, it's taking value from all Reddit users.

lcnPylGDnU4H9OF · on April 18, 2023

E.g., the "remind me" bot uses Reddit's API and genuinely returns value to at least some of Reddit's users (unless it just plainly never reminds people). Comparing ChatGPT to things of that nature makes the difference more apparent to me.

deanCommie · on April 18, 2023

That makes you a ChatGPT user, not a Reddit user.

Just because you want to use Reddit data doesn't make you a Reddit user, does that make sense?

alfiedotwtf · on April 19, 2023

> But companies that “crawl” Reddit for data and “don’t return any of that value” to users will have to pay up, Reddit co-founder and CEO Steve Huffman told The Times

Pot-Kettle!

The elephant in the room that everyone is forgetting is: how much does Reddit pay its users for content? Reddit's value comes from its users, whose contributions are completely voluntary lol.

Espressosaurus · on April 19, 2023

Except the users are contributing, in the form of upvotes, downvotes, comments, and moderation.

Reddit wouldn't exist without the work of volunteer moderators, as ripe for abuse as the positions are.

Even search engines provide value in that they provide alternative search functionality.

rafark · on April 19, 2023

“The elephant in the room that everyone is forgetting is: how much does Reddit pay its users for content?”

I have to say I’m not a fan of reddit, but you could also ask the question: how much do users pay to access reddit?

The web has made a lot of people feel entitled to free (high quality) services. But as developers we know building and maintaining services like reddit is not cheap (let alone free).

ohgodplsno · on April 19, 2023

Reddit users pay a lot, considering just how many ads Reddit has on every page, as well as the arranged content promotions that regularly pop up "naturally".

candiddevmike · on April 18, 2023

I thought the Supreme Court found you can't stop folks from scraping data in the LinkedIn case? I think that applies here in some way.

gdulli · on April 18, 2023

As is usually the case, what they decided was something much narrower than the general case. LinkedIn still can and does make efforts to prevent/restrict automated browsing at scale. What they can't do is selectively block traffic from the plaintiff company altogether, when the content is otherwise publicly available.

zamnos · on April 18, 2023

> In a November 2022 ruling the Ninth Circuit ruled that hiQ had breached LinkedIn's User Agreement and a settlement agreement was reached between the two parties.

https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

iudqnolq · on April 19, 2023

It's closer to the exact opposite: You can try and scrape LinkedIn. But if they try and stop you you can't try to get around the block.

Sort of, generally, except it's a lot more complicated

celestialcheese · on April 18, 2023

This is the best summary of the current state of scraping laws - https://blog.ericgoldman.org/archives/2022/12/hello-youve-be...

It's complicated is the short answer.

jonny_eh · on April 18, 2023

I think you can physically block them, you can't sue them though.

lcnPylGDnU4H9OF · on April 18, 2023

If one scraped the content that's served to a browser when it's navigated to www.reddit.com I would expect that ruling to apply. If the API is considered a separate service, then I would imagine they could restrict access under separate terms.

shagie · on April 18, 2023

With the ongoing "slowly breaking old reddit" and the move to SPA and mobile app, the data for comments and posts will be via OAuth API access rather than a server rendered html page.

BeFlatXIII · on April 18, 2023

Reddit is upset at all the Reddit mirrors that preserve deleted + removed comments, simple as.

dmix · on April 18, 2023

Oh no, will they go offline because of this? Unddit is extremely useful for seeing how mods manipulate subreddits, and for general curiosity about what sort of things are no longer in the Overton window this year.

tric · on April 18, 2023

They rely on the pushshift api to get comments that were deleted. The other comments are pulled by the browser at the time the page is accessed.

Depends on if the changes affect pushshift's crawling.

dahwolf · on April 18, 2023

Yesterday Stack Overflow, today Reddit. A clear pattern emerges where open web content/communities face existential issues if the current AI paradigm continues.

It's daylight robbery. The sum of 18 years of Reddit is an enormous capital investment as well as an immeasurable amount of hours spent by its users to create the content.

It's absolutely baffling how a single entity (OpenAI, Google Bard) can just take it all without permission or compensation, and then centrally and exclusively monetize these stolen goods.

The fact that we barely even blink when this happens, and that founders confidently execute on an idea like this, tells you everything there is to know about our industry. It doesn't even pretend to do good anymore. Anything goes, really.

Anyway, get ready for an "open" web that will consist of ever more private places with ever higher walls. Understandably so: any and all incentive to do something on the open web is gone. It's not only pointless now, it actively helps to feed a giant private brain.

TheCoreh · on April 18, 2023

I understand where you're coming from, but can't fully agree with you.

First, Stack Overflow contributions are licensed under Creative Commons. So monetizing them is explicitly allowed.

Second, information is not "stolen" nor "goods". Copyright law is completely separate from physical property laws, so even if you could make a case about fair use of training data, copyright-ability of model weights and AI generated content (which I agree are still legal gray areas) and therefore whether or not the "Share-Alike" CC clause is enforceable in this context, it would be an entirely different argument from whether the whole industry is somehow entirely morally bankrupt.

Third, given that this is unpaid work made voluntarily by users of the platforms (Reddit, SO), why is it any more acceptable for these platforms to lock it up and monetize it than for AI companies?

I think it's completely reasonable to charge for API access, particularly above a certain volume, but not because these companies have a right to protect some sort of "intellectual capital investment", but rather because the server costs of processing the requests are not negligible.

If anything, this situation really separates the wheat from the chaff in terms of what pools of open web content are truly "open". If the platforms hosting them expect to retain control of their "investment" can they really be said to be open?

I understand the irony, given that OpenAI's own name is somewhat at odds with its practices (of merely providing open access versus truly releasing everything as open source) but I think the reasonable solution to that conundrum is something like Wikimedia Foundation, Internet Archive or maybe CERN for AI, not giving up on free, open content just because it might feed a giant private brain.

abdullahkhalids · on April 18, 2023

> First, Stack Overflow contributions are licensed under Creative Commons. So monetizing them is explicitly allowed.

The evolution of any human legal system can be described as follows.

1. Hey guys, here is a simple set of rules we have agreed upon, to make sure there are no conflicts. Please follow them in good faith.

2. 95% of people follow both the letter and spirit of the agreed rules.

3. Some bad actors come in and only comply with the letter of rules, hacking and exploiting the system to their obscene advantage.

4. The complexity of the rules is increased to shut down the bad actors. The new rules increase costs for everyone, good and bad actors.

Repeat steps 2-4 continuously till the system is completely broken and we are all much worse off. The bad actors: "We did nothing wrong, we followed the letter of the law."

AgentME · on April 19, 2023

What's the conflict? Stack Overflow content was specifically licensed under Creative Commons so that its content can be maximally used and learned from, and it seems to be working successfully in ways not envisioned before.

brokenmachine · on April 19, 2023

3.5. Bad actors lobby for the letter of the law to be changed in their favor.

4.5. Everyday people are incited to argue about distracting, trivial issues while systemic problems snowball.

eric-hu · on April 19, 2023

I'm favoriting this post. What a pithy description of the systemic breakdown of rule of law.

twelve40 · on April 18, 2023

I don't want Microsoft to forcibly snag the profits from my (and more significantly, many others') Stack Overflow posts while giving nothing back to the SO community. I'm ok with SO profiting from that and giving me points in return. If/when that becomes a noticeable issue for SO I'm sure they will revisit their approach too, because nobody likes leeches.

dhruvdh · on April 19, 2023

What do you mean Microsoft "forcibly snagging profits"? How do you profit? I am not familiar with incentives behind posting on SO.

Does Microsoft not cite SO posts in Bing results? Do they not make it easy to find the "correct" SO question/answer?

Is the issue that someone else is helping others, vs "you" or the "SO community"?

twelve40 · on April 20, 2023

The incentive for Stackoverflow to administer the service and keep the lights on is to get paid for traffic to their website from Google search (which they monetize via a modest amount of ads and job posts).

The incentive for free contributors (SO users) to write up good questions and good answers, to debate, and to come up with and vote on better solutions in the comments is to get points, recognition and, yes, to help others and get credit for it in their name, even though this credit is not monetary.

If Microsoft regurgitates my answers (just using me as an example, there are infinitely better contributors) without sending traffic to the SO proper website and without people voting for my answer or participating in debates and discussions on SO website proper - and in many (if not most) cases there is no single smash-hit answer and things need to be worked out and voted on - then my motivation as an SO contributor drops to a complete 0. Basically, no reason to contribute at all, since Microsoft is going to grab my answers for itself and collect the subscription (in case of ChatGPT and Copilot), and eventually the inevitable ad revenue from the majority of Microsoft and ChatGPT users never leaving the Microsoft properties and never contributing to the original SO activity.

Of course, there are tons of problems inside SO proper currently as well, but none of them destroy motivation to contribute the way third parties do by scraping, regurgitating the original content and keeping the traffic to themselves.

gordian-mind · on April 18, 2023

But they didn't take anything? And those two moves of SO and Reddit are mainly about greed: they want some more money just for hosting content that people generated while viewing their ads and giving them money for features.

breck · on April 19, 2023

Two options:

1) Copyleft licenses

2) Abolish copyright law

I am one of the few arguing for #2, but I think #1 is a good short term option.

sceadu · on April 19, 2023

literally the story of google itself... built technology on a large corpus of existing text (the internet) for pagerank and then able to leverage and monetize it via search and ads.

vctrm67 · on April 19, 2023

But Google itself was and still is free. It's a service they provide to you without charge and, were it not to exist, your life would be almost immeasurably more difficult (as with any search engine). And most of the time it doesn't "take" from website owners; if anything, it generates more traffic for them.

When a model trains over Reddit, it may still provide a service that is free. But the way it's going, companies are charging money for access to those models and aren't generating traffic for the underlying training data/sites.

pcthrowaway · on April 19, 2023

Free to search, though you are the product. Even ChatGPT hasn't productized their users yet in order to provide their service for free.

But make no mistake, the secret sauce in Google Search is by no means open, and possibly not even comprehensible to a single human at this point.

treis · on April 18, 2023

I wonder if AI training data can replace ads as a way to monetize web services.

sizzle · on April 19, 2023

And Twitter charging for api access

dahwolf · on April 18, 2023

We drastically need copyright reform for text, imagery, video. It was never designed for this AI era.

Take a concept like "fair use". Let's say I embed your photo and express an opinion about it. That's what fair use was designed for: in-context, relatively harmless usage of the content of others, for the sake of expression, culture and education.

That's not the same thing as "let me suck up all content ever created without permission, attribution or compensation, mangle it and sell it via the backdoor whilst making you obsolete".

You can't call that fair use, they are wildly different usages at wildly different scales with wildly different impact.

We need a new copyright category specifically for AI usage. If nothing is expressed, no training permission is given. One can opt-in and allow for training, allow for training under conditions, etc.

burtonator · on April 18, 2023

Honestly, I think it's completely unfair for AIs to train on this data.

I work in ML so I'm aware of the consequences but society wasn't.

My step-daughter is finally crushing it as a graphics artist and she is really pissed at tools like Midjourney.

I asked her about it and she said "yes, they steal the artwork of real artists and generate fake knockoffs" ... and I don't think her opinion is invalid.

dahwolf · on April 18, 2023

Fully agree on everything you said.

In addition, we're all kind of forced to hop on to AI whether we're a programmer or artist just to buy ourselves a little more time, delaying the inevitable. Actually, perhaps accelerating the inevitable by contributing to it.

Even in a utopian world where we would have an economic model to support this (UBI), the outcome still sucks. It wipes out human culture. There's no point in creating/producing anything as almost anything can be produced by anyone, at incredible quality, at no cost and with little skill.

Hence, your daughter being or becoming an incredible artist would have no meaning, except perhaps for herself enjoying the process of creating art.

trifurcate · on April 18, 2023

> Hence, your daughter being or becoming an incredible artist would have no meaning, except perhaps for herself enjoying the process of creating art.

There are lots of points and arguments to be made in this general area, but I have to ask, is this really so bad? I mean, what is the point of our lives and everything we do, other than to generally spend the rest of our time doing things we enjoy for their own sake?

If we're comparing "your daughter is an incredible artist, and here's a job for her designing product packaging for a multinational conglomerate" to "your daughter is an incredible artist, and the multinational conglomerate is using a diffusion model to design their packaging", I think it's really hard to say that the former is better than the latter. Of course, it all depends on the economic model, but the line I am quoting is within that assumption you made of the economic model being able to support this. In that case, I am for the latter wholeheartedly.

Economic incentives are great to get people "hustling", but they are rarely aligned with the human values you wish to protect, and mostly by chance if they are. Your daughter's artistry is better "spent" on art for art's (and personal enjoyment's) sake than on drawing clip art for an obscure HR form somewhere, IMO.

dahwolf · on April 19, 2023

Nothing would stop her from continuing to create art the human way. The most intrinsically motivated will certainly do so.

But it's only half the story. Besides the process of creating art in itself being rewarding, the other rewarding part should be how other people relate to it.

One might have trained themselves for thousands of hours and this will be reflected in the output. Most people suck at art thus the skill, dedication and creativity are recognized as such. This system has merit and scarcity.

The new system has no merit as any fool can type in a few words. Nor does it have scarcity which means an overabundance of output. Both contribute to a lost sense of meaning in creating and even consuming art.

If tomorrow we will all be as fast as the fastest runner, running will become quite pointless. There is no reward or recognition for running fast. In fact, you can't even call it fast anymore, as anybody can do it.

plutoh28 · on April 19, 2023

I wonder what would happen to a world where AI runs the economy. Not everyone has some hobby or passion that brings meaning into their lives. Some people just work, come home, and spend their free hours consuming some form of entertainment. Without work, would those people just have more free hours? The elimination of human labor could be disastrous to mental health.

dahwolf · on April 19, 2023

A good point, and I've been puzzled by how hard this split in characters is between individuals.

I know several people that without external force (work, duty) would have absolutely no idea what to do with themselves. Even their free time they organize around work-like chores or spend it on passive media.

These people seem to lack any sense of wonder, of curiosity or exploration. And it seems a permanent and fixed state. This is who they are. You can't change it.

I would not worry about this problem though because surely in the hypothetical situation of no commercial work, there's plenty of other work we can make up.

drusepth · on April 19, 2023

>There's no point in creating/producing anything as almost anything can be produced by anyone, at incredible quality, at no cost and with little skill.

Does this imply that some significant portion of art "value" is derived from scarcity (e.g. there is more value to creating/producing art when a smaller portion of the population can do so)?

From a strictly financial sense that makes sense, but it does seem morally at-odds with anything that makes art easier for humans to produce.

Is it "good" or "bad" to enable a larger population to produce more art?

Is it "good" or "bad" to enable a larger population to produce higher quality art?

Culturally, both seem like they'd be good. In our current economic model, they're probably both bad.

With an economic model that supports artists financially and removes the need to transmute "art" into "money", I don't think we'd see human culture wiped out. Without a financial incentive to create art, what's the point in creating/producing anything if not to contribute to human culture?

dahwolf · on April 19, 2023

Yes, there's value in scarcity of skill as well as scarcity of output.

Skill: if the merit part is entirely lost, surely we will value art far less compared to now. Anybody can make anything so what is the point?

Output: lots of art to admire is great, but unlimited art isn't. You can't attach value to unlimited.

alephaleph · on April 19, 2023

This perspective seems insane to me. I'm undecided, but if I were to put forward an argument that AI art will be bad for culture regardless of economic model, it would be something like "AI art will always be worse (in some way) than human art, but it will also be cheaper than human art, and thus will replace it in basically all commercial fields, which would be bad for culture." Maybe I'd say it's worse because it's inherently soulless, or just that as a practical matter AI is better at doing the bare minimum than humans are, or something like that.

If I thought that AI art would allow almost anything to be produced by anyone at incredible quality, at no cost and with little skill, that sounds like a Sci-Fi utopia to me, an almost unimaginable world in which all limitations on self-expression are lifted. A world in which making a movie or a TV show or a video game becomes a weekend project. It sounds wonderful.

dahwolf · on April 19, 2023

Nobody will watch your weekend project, because it fails to impress: anybody can make it. "Unlimited" is not the paradise that you think it is.

alephaleph · on April 21, 2023

I really don't understand the world you're describing. Are you saying that no one will be able to enjoy art anymore because art won't be impressive?

greiskul · on April 18, 2023

I think if we had an economic model to support this, we would definitely be in a way better system than we are right now. So many artists and musicians don't have the resources in our current system, and have to seek day jobs or stop making art altogether.

pcthrowaway · on April 19, 2023

> Hence, your daughter being or becoming an incredible artist would have no meaning, except perhaps for herself enjoying the process of creating art.

This is the most meaningful reason for creating art. In fact, I'd argue human expression is the defining element of art (AI output not being art in that sense of the word), and economic motivations just pervert it.

brokenmachine · on April 19, 2023

Artists don't generate art in a vacuum. Everything is a fake knockoff of everything else.

I believe the cream will still rise to the top, and the best artists will still create something totally different, and/or use AI tools to generate something better than they could create otherwise.

qumpis · on April 18, 2023

Are there artists who create in isolation? I.e. the ones who somehow can prove that their art is not based on what they've seen?

klabb3 · on April 19, 2023

No, it’s not in isolation. Doesn’t matter, because fair use applies to humans, not robots. When you go from “human that does X” to “human that operates machine that does X” you’ve changed the situation.

We’ve already been through this with cameras, which are technically just the same as using your eyes and your memory. Yet both legally and morally we all feel that operating a camera doesn’t grant you the same rights as you have by just being and looking. Strolling through the park and seeing the kids playing is very different from bringing a zoom lens and a camping chair.

That said, society could agree to a fair use that applies to ML-trained models. It could simply cover all non-commercial applications, or at the very least research.

pcthrowaway · on April 19, 2023

https://en.wikipedia.org/wiki/Outsider_art is probably the closest

eviks · on April 19, 2023

Are there artists who are capable of viewing gazillions of artworks like computers can? And then copy & paste with little effort?

qumpis · on April 19, 2023

Not sure, maybe there are some savants with photographic memory.

But likely people remember references to certain art and can look them up and then 'copy paste' stylistic elements (but with a lot of effort!)

x-complexity · on April 19, 2023

> My step-daughter is finally crushing it as an graphics artist and she is really pissed at tools like Midjourney.

> I asked her about it and she said "yes, they steal the artwork of real artists and generate fake knockoffs" ... and I don't think her opinion is invalid.

Creativity doesn't exist in a vacuum. New creations are based on long-term absorptions of existing concepts & discoveries, & the decision to advance or rebel against any combination of said concepts & discoveries.

The nature of the work will change to focus more on the final product, wherein humans still hold an advantage over art generation models in terms of errors in the produced artwork. There's the possibility that such errors will be corrected with the use of an additional model down the pipeline that's solely focused on correcting said errors, but they're not foolproof either.

There will also be a larger emphasis in some niches on documenting the creation of said artworks, as currently happens in some niche circles I'm in. Reductively, it's the knockoff Gucci handbag problem, and the remedies will be the same here:

- (Tech) Serial imprinting / rollover keys / embedded signatures for verification

- (Social) Shaming & ostracization of individuals that buy knockoffs

I'm hesitant about using the legal system to solve such a problem, as the way the current copyright system is set up, it makes it near impossible for a new artist to NOT step on an existing artist's style in some form or another, even if unconsciously doing so.

CrimsonRain · on April 19, 2023

She's doing the same. We all are (stealing stuff, blending with randomness that moves us towards the goal).

nabogh · on April 18, 2023

In my opinion copyright is a law that is always at odds with the free flow of information. I'd hate for that law to start influencing how I interact with user generated text on the internet. As we see on YouTube, nuance for copyright loses to erring on the side of enforcing copyright even when the use is fair.

breck · on April 19, 2023

There's no better thinker IMO on this topic than Stephan Kinsella. (C)opyright law started in the 1500's as a form of censorship. There is no reason for it, other than censorship (or if you are in the top 1%, a great way to extract monopoly profits from the rest).

https://www.stephankinsella.com/paf-podcast/kol236-intellect...

verall · on April 18, 2023

Hey dahwolf I finally got a chance to skim through The Witch Trials of JK Rowling based on your recommendation and I thought it was pretty bad.

nothrowaways · on April 18, 2023

> “The Reddit corpus of data is really valuable,”

Totally agree, no question about that. But data comes from users. Shouldn't they also get paid?

waselighis · on April 18, 2023

Please no. I wish the users who create quality posts could get paid. But sadly, once money is involved, people will start to gamify the system, posting as much barely-passable garbage as possible to maximize upvotes, and the quality of content will deteriorate very quickly.

CobrastanJorji · on April 18, 2023

Money's already involved. Rather than posting barely-passable garbage, the automated systems just repost things that were popular a year or two ago to various subjects, with the top few comments replacing the titles of the posts. There are a number of counter-bots that detect these posts and warn people that the content is being automatically reposted, but it doesn't really stop it from happening. Presumably Reddit chooses not to stop it because, hey, engagement, woo, metrics go up, manager of engagement look good.

I'm not sure exactly how they're monetizing (maybe they sell the accounts once they have some popular posts?), but they definitely are.

ALittleLight · on April 18, 2023

I think this is actually a reddit, and all other similar platforms, problem. The issue is that there's good content, funny memes, insightful essays, whatever, that was submitted in the past. Some of it is no longer relevant and some of it your audience has already seen and would be bored by - but lots of it would be valuable to resurface now.

Because reddit focuses mainly on what's happening recently the good content of the past that might be relevant to a user today is buried. Reposters play a valuable role in resurfacing content. I think a better paradigm, though one I can't really imagine that well, would remove the need for reposters by automatically showing the content they would repost. Maybe a recommendation algorithm?

mrguyorama · on April 18, 2023

No, the issue is that pretend internet points always end up having real value, because human brains are lazy and rate in-group signalling really high and therefore trust ads that come from "big people" more. That's like the whole thing behind the influencer advertising economy.

Reddit didn't get rid of r/hailcorporate on accident. There are literal industries that exist to make fake accounts, karma farm, and sell use of those accounts to post basically sponsored messages that maybe even reddit itself doesn't know are sponsored. Think of how many people say "I search reddit for product recommendations" and know that companies have been pushing on that button for years and years. Whether reddit is honestly trying to prevent this kind of stuff doesn't actually matter, because as long as real moderation costs money and breaking that moderation makes money, the advantage is towards those who break it. FFS, reddit still has most popular subreddits modded by one account and their sockpuppets.

philipkglass · on April 18, 2023

They got rid of HailCorporate? I'm only a casual Reddit reader so I thought that I had missed some ban drama but it's still there:

https://old.reddit.com/r/HailCorporate/

mrguyorama · on April 19, 2023

It no longer shows up on the /all tab for most users. I imagine that is due to some rule fiddling they did, similar to how one of the Donald Trump subreddits kept gaming the system to be most of the front page so they changed the rules.

sharkweek · on April 18, 2023

Hard agree.

It'd turn into a world where people would try to make money (which still happens but normally is sniffed out), instead of a place where people like LundgrensFrontKick produce content, for free, because they love doing so, like:

* Estimating how long it took The Joker to set up the giant cash pyramid in The Dark Knight

* Comparing the box office success of movies that have a snowmobile action scene vs those that have a jet ski action scene

* Objectively trying to determine which Fast and Furious movie was the fastest and most furious

https://www.reddit.com/user/LundgrensFrontKick/?sort=top

Yes I know he has like a podcast now or something, but that only came after years of doing this for no reason other than he enjoyed doing it.

evandale · on April 18, 2023

I came across this user just last week from their Vin Diesel sleeveless shirt post! It was the first time I've ever seen their content.

RealityVoid · on April 18, 2023

It is extremely interesting that money fails so hard at the one thing it should be good at, incentivizing behavior. I mean, yes, you'd get more content, but it would be hollow, as you say.

andsoitis · on April 18, 2023

> It is extremely interesting that money fails so hard at the one thing it should be good at, incentivizing behavior.

Money itself doesn't fail to incentivize behavior. Rather, what you choose to reward with money has to be carefully chosen to incentivize the behaviors you want to encourage (via monetary reward).

BobbyJo · on April 18, 2023

Money is a fantastic motivator, as evidenced by how quickly flaws in systems are gamed in order to attain it.

It's not money that's failing, it's the rule-masters. Agents can't work well in systems with bad rules.

anigbrowl · on April 18, 2023

No, it's money. Because the rule-masters you complain about are just optimizing for more money.

BobbyJo · on April 18, 2023

Sounds like a good motivator if even the rule-masters are chasing it.

anigbrowl · on April 18, 2023

It can be a good motivator and still be failing to serve any greater social purpose. People and organizations can get addicted to money exactly the same as people get addicted to sugar, nicotine, or cocaine. Addicts can be enormously tenacious, creative, and resourceful, but only to the end of feeding their addiction.

BobbyJo · on April 18, 2023

Totally agree. I was just taking issue with the blame on money specifically. Money is working as intended. The rules in which money is operating are fundamentally broken though, no doubt.

MichaelZuo · on April 18, 2023

This is an interesting line of reasoning, does it also apply to HN karma and the big acquirers of karma?

anigbrowl · on April 18, 2023

It could. Some people pursue it very aggressively, optimizing submission times and autoposting submissions of new papers or blog posts, for example. I recall one semi-spam account that was set up to submit anything relating to Ruby, including videos that happened to mention gemstones in the title.

jonny_eh · on April 18, 2023

Basically the same as the alignment problem in AI. You need to be very careful of how you define your rewards, because you'll end up incentivizing exactly what you define.

mitjam · on April 18, 2023

It is like Goodhart's Law (When a measure becomes a target, it ceases to be a good measure) with an incentive attached to the measure. It probably needs to be in constant flux by design. Maybe a good thing as it would otherwise get rigid and boring.

Karrot_Kream · on April 18, 2023

Money spent is a good way to understand revealed preferences but this goes only one-way: you can't hand out money to reveal preferences.

_ktx2 · on April 18, 2023

This is also prevalent with the way that Google incentivizes page content structure now. I have to get 3/4 of the way through a page before I find what I'm looking for because they encourage these big kitchen-sink posts.

tenebrisalietum · on April 18, 2023

When you incentivize words, you get words.

BeFlatXIII · on April 18, 2023

How is that any different from current Reddit? The users gamify themselves into garbage without money.

qumpis · on April 18, 2023

Youtubers who rise to the top seem to satisfy their viewers even though there's a hard monetary incentive.

SomeBoolshit · on April 18, 2023

As an avid redditor I can say it's already like that and they will teach it nothing of value unless they limit it to a very low number of subreddits, which will still barely teach it anything of value unless the goal is teaching it current generation humor and shitposting habits, which would be very valuable to anyone wanting to boost engagement and try to sway a fairly left leaning generation of people toward whatever they're selling.

Which is today's right-wing billionaires and their pet politicians.

spike021 · on April 18, 2023

Hate to say it but this happens already from what I've seen on Reddit. Even decent subreddits I've followed for years that don't have the issues the major subs have. People want their karma, post low quality content, and somehow people still upvote them.

pcthrowaway · on April 18, 2023

This is already an issue with the mods of the big subreddits being partial to both bias and incentives.

warkdarrior · on April 18, 2023

Welcome to the Internet. Should we ban users who try to make money from their skills?

_siis · on April 18, 2023

This isn't really users trying to make money from their skills.

This is a company taking user contributions as their own, aggregating it, and using their work for free to make money off it.

anigbrowl · on April 18, 2023

Solution, just use historical content.

nothrowaways · on April 18, 2023

Good point.

dsfyu404ed · on April 18, 2023

>people will start to gamify the system, posting as much barely-passable garbage as possible to maximize upvotes

As if this behavior isn't already rampant.

tonystubblebine · on April 18, 2023

FWIW, we at Medium feel pretty similarly to Reddit but with a yes to this question about whether authors should get paid.

AI companies are betraying basic business principles: they are taking value from datasets like Reddit and Medium without giving any value back. Fine if you can get away with it. But since AI, especially text based LLMs, relies on source material, it's pretty straightforward for the platforms that host that source material to deny access. Things like ChatGPT do need current source material.

I don't think it'll come to a war, though; I think the AI companies will instead give some value back. It could be as simple as citations that send traffic back. That's essentially the exchange of value that we all have with Google these days.

But if it's money, then I think the obligation is for platforms to pass that on to the authors. It'd be hard for an individual author to negotiate this on their own with a company like OpenAI, but platforms are in a good position to negotiate on their behalf.

datavirtue · on April 18, 2023

The content is public. Internet companies are starting to sound like Disney.

The AI is definitely giving value back.

nicbou · on April 18, 2023

It's public for humans, not for other businesses stealing it for their own profit. I don't want to be an anonymous contributor to an AI.

rootusrootus · on April 19, 2023

Another in a really long list of issues that need a clear differentiation between human consumption and machines. So many things that were innocuous or even useful before the age of ubiquitous cameras, other sensors, and computers, are now a big problem.

balls187 · on April 18, 2023

As an end user I get a lot of value from AI companies.

I'm curious how a content platform's TOS will match up against a search engine's webcrawler TOS.

“I want people to find and access my content, but I own it.”

vs

“I will send people to your content, but any public data I can access, I can store and process how I want.”

mike_hearn · on April 19, 2023

You're in no position to negotiate this with OpenAI because they already have the relevant data stored locally. So does Google/Bing. You could be in a position to negotiate it with smaller upcoming OpenAI competitors, but all that will achieve is granting OpenAI and Google/Bing a monopoly because their competitors will have new large costs that they don't.

Also, Medium has a metered paywall already. Why not just let them open up a corporate account and pay to access paywalled content the same way users do? Why are any negotiations required?

BTW I use Medium but I never use the paywall. I'm fine with my content being used to train AI for free. The payments and tax complexity involved aren't worth the tiny amount of income that any such deal might generate, nor do I want OpenAI to have a monopoly.

tonystubblebine · on April 19, 2023

Maybe no position with Google because they can bundle it with search results and threaten to take away search traffic. But OpenAI definitely does not already have all the relevant data. They need the new stuff also. That's part of the Reddit position as well.

mike_hearn · on April 20, 2023

I wonder to what extent that is true, now LLMs can search the web and read the results?

Tuna-Fish · on April 18, 2023

Not according to the Reddit user agreement:

https://www.redditinc.com/policies/user-agreement

> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

dogleash · on April 18, 2023

Parent poster asked a "shouldn't they" question, not a "legally must they" question.

janalsncm · on April 18, 2023

Thank you for saying this. It’s a classic appeal to the law fallacy I see so often online.

ohgodplsno · on April 19, 2023

The Reddit user agreement is written based on US law and would be laughed out of any court in half the world. I literally _cannot_ waive my moral rights, and any company doing that is breaking the law in my country.

bheadmaster · on April 18, 2023

Hopefully at least moderators will get paid, and finally be held accountable for their moderation.

Powertripping basement dwellers who ban anyone who refuses to worship their supreme authority are one of the worst aspects of Reddit.

eppp · on April 18, 2023

I don't understand this attitude. If you don't like how someone mods their subreddit then why do you want to be part of it in the first place? You can make your own and run it how you please.

evandale · on April 18, 2023

You really can't make your own if you're going up against an established subreddit.

This applies doubly so if it's an established region based subreddit i.e. city, state, province, or country, and IMO these are the most problematic subreddits for overmoderation. Finding non-partisan regional subreddits is damn near impossible.

MichaelZuo · on April 18, 2023

There's two popular NYC subreddits. If two, why not three?

eppp · on April 19, 2023

But then people complain about getting banned from the said subreddit for not following the rules that make it what it is. It seems like you want to engage an already established community and just ignore the people that are currently there and play by the existing rules.

bheadmaster · on April 19, 2023

> people complain about getting banned from the said subreddit for not following the rules

That's not at all what I'm saying. There are a few subreddits with fair mods who enforce the rules fairly, but the great majority doesn't - they are the rules, and if they don't like you, tough tiddies. Making it effectively a "mod and minions", not a real community with real rules.

bheadmaster · on April 18, 2023

I want to be a part of it for the community. Moderators are the (un)necessary evil. The fact that you seem to see moderators as "owners" of a subreddit doesn't really help the case.

eppp · on April 19, 2023

Who created the subreddits if not the moderators?

bheadmaster · on April 19, 2023

The programmers who wrote Reddit, and the community. A subreddit is nothing without a community, so the fact that a moderator namesquatted a URL doesn't mean he "owns" anything, only that he has the power - the power to moderate it. Which, in my opinion, goes with the responsibility of upholding publicly stated rules and enforcing them in a fair manner.

Unfortunately, many of them start thinking, like you, that the power to moderate means they are the supreme authority and that the subreddit is about them - so they behave accordingly, feeding their ego at the expense of a community. Of course, if you believe the power itself gives them ownership over a community, that's fine. It's just that I don't.

BeFlatXIII · on April 18, 2023

> If you don't like how someone mods their subreddit then why do you want to be part of it in the first place?

Access to the rest of the current community.

eppp · on April 19, 2023

The current community follows the rules of the community by definition.

bheadmaster · on April 19, 2023

No, the community follows the rules of the moderators by definition. The community itself doesn't get a say.

Entinel · on April 18, 2023

I would say no to this. Reddit is giving you a platform and in exchange they get the content. If you don't think that is a fair deal you're free to just not make the content.