Related to https://news.ycombinator.com/item?id=35617763 ("Reddit Wants to Get Paid for Helping to Teach Big A.I. Systems"; an aside, but I much prefer the title of this post I'm commenting on as it describes the actual change) and it's hard to find this particularly disagreeable. Especially considering:
> Reddit’s API will remain free to developers who want to build apps and bots that help people to use Reddit, as well as to researchers who wish to study Reddit for strictly academic or noncommercial purposes.
> But companies that “crawl” Reddit for data and “don’t return any of that value” to users will have to pay up,” Reddit co-founder and CEO Steve Huffman told The Times.
And the moderation that makes Reddit hold valuable content is done by its users on a per-subreddit basis. Only stuff that could break laws, like extremist content and hate speech, is handled by Reddit themselves.
It's really odd to call it "their" data, and this is not exclusive to Reddit.
Considering OpenAI trained on Twitter data among others, I think it'll make for more flavor that users crave, based on the popularity of both of those platforms.
You made the comment, Reddit built the API and the system you use for making comments. If they wish to charge for their part(s) in this, they can, or retract their work, or give it away. Just as you can for your comments.
Certainly they can charge for their API. It was the CEO’s phrasing that was odd. That is their data and if people want to use it, they should pay up.
I think charging for the API is bad as it will make things like user apps harder. I think Reddit’s app is bad, so other apps need to use the API in order to function.
There's a bunch of precedent that aggregators have some IP rights. Reddit does not have exclusive rights to your posts, but they can have some rights to the collection of posts from all users.
> Why would I, as someone who’s made tens of thousands of comments, care if someone scrapes and reuses my comments. I don’t want them to pay up
Why would I, as someone who's made tens of thousands of comments, be happy with a corporation scraping my content to create a service that they'll turn around and charge me for? I want them to pay up, so that Reddit, this wonderful service that has given me thousands of hours of entertainment and education, can be sustainable and grow.
> This is a really rich comment from a company that relies entirely on user submitted content and has never “paid up.”
Most redditors will agree that they get much more from Reddit than they give. I for one am very happy with the arrangement I have with Reddit.
There's a lot of people in this thread defending Reddit and they don't seem to have ever had the pleasure of dealing with an actual Reddit employee. They have a culture of unchecked cronyism. Reddit doesn't care about anyone, some people will eventually figure it out the hard way.
> they don't seem to have ever had the pleasure of dealing with an actual Reddit employee.
99.999% of redditors will never ever have to deal with a Reddit employee. Cronyism? What the hell has that got to do with my consumption of and participation on Reddit?
Most users don't interact with Reddit employees, but the moderators that maintain most of the communities you enjoy do. That's how I ended up interacting with them. Shortly after the IPO rumors, my community started being harassed by an admin through modmail.
That's an odd way to put it. The admins are basically god. God doesn't harass you, he tells you how to live. If an admin tells you to jump, you beg to know how high.
I've been a mod for a decade and never had a problem with admins.
I was new to it though, and an admin picked on me because I was parodying another subreddit (my home city subreddit). You've been modding for a decade, great. That basically backs up my original point that it's a party of tenure and closed-mindedness (cronyism). My situation was different, and it's pointless to argue, but if you trust any company (especially in a changing economic environment), keep your eyes open for poorly motivated incentives.
Regardless, the attitude that they're "god" is a weird way to put it. They randomly IP banned me for calling out an admin's publicity issue during an April fools event. That's cronyism. I've used Reddit for 12 years and engaged in conversations in good faith for years. Paid for subscriptions most of that time. All relationships, business or otherwise should be mutual in some form.
Having more ads and monetizing the API aren't mutually exclusive. Look at the avatar/award system for example, which evolved in tandem with dark patterns that push the user to the app where they can serve unblockable ads.
My perspective is that Reddit made the comment you submitted. In real terms the comment is a record in a database which backs a web application that is developed and administered by Reddit. The comment is your expression but, like it or not, it is by Reddit’s grace that they publish it on their website. (Consider things which would be illegal for them to host and publish; they need to keep a close eye out for such things and prevent those relatively few posts among the millions they receive daily.)
Technically, yes, if you’d like to credit the effect of one hearing your speech to the workings of Earth’s atmosphere. It’s true that the speech is your expression but you correctly point out that the air brings it to my perceptions.
I think what GP means by "made" is "produced". Reddit provides the platform, the community, and the reach -- it's like a record label. Much like how recording artists don't own their recordings.
Tons of recording artists own their masters, both big and small. It’s a function of their contract that determines that ownership, and those terms are clear. Just as they are clear in Reddit’s TOS. You own your content, Reddit simply has a license to use it.
> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
That's just about every aspect of "ownership" I can think of, minus the label "ownership". Honestly, it seems about as close as a lawyer would allow a company governed by Section 230 to have, as "ownership" would step into exposure to liability.
Well, this comment and all other comments you have submitted to Hacker News are records in my browser's cache. Does it mean I have the right to save them into less volatile storage and charge others for accessing them?
If I sued you for running such a service, it's likely that you would be looking to convince a judge that you got me to agree to something like this: https://www.ycombinator.com/legal/
It's not possible to hide crawling at a large enough scale, right? At some point, certain IPs/user agents will (should?) be hit with CAPTCHAs to be able to have access to content and no amount of user agent/cookie/session/whatever spoofing will get around that, yeah?
IP restrictions are easy to overcome using mobile networks. Basically, mobile networks assign your device an internal IP and NAT out to a very small pool of public IP addresses. If they block you, they also block a very large chunk of legitimate mobile users. I'm a big ol' dummy when it comes to networking, so I imagine I explained something poorly... so any mobile network nerds feel free to pile on!
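To make the collateral-damage point concrete, here is a toy model of many devices sharing a small pool of public addresses. It is only a sketch: the round-robin assignment, the device names, and the pool size are assumptions for illustration (real carrier-grade NAT allocates ports and addresses in more complicated ways), and the addresses come from the 203.0.113.0/24 documentation range.

```python
from collections import Counter

def assign_public_ips(device_ids, public_pool):
    """Map each device to a public IP from a small shared pool.

    Round-robin is an assumption made for this sketch, not how any
    particular carrier actually allocates addresses.
    """
    return {dev: public_pool[i % len(public_pool)]
            for i, dev in enumerate(device_ids)}

def users_affected_by_block(mapping, blocked_ip):
    """Return every device that loses access when one public IP is blocked."""
    return sorted(dev for dev, ip in mapping.items() if ip == blocked_ip)

# Hypothetical numbers: 1,000 phones behind 4 public addresses.
devices = [f"device-{n}" for n in range(1000)]
pool = ["203.0.113.1", "203.0.113.2", "203.0.113.3", "203.0.113.4"]

mapping = assign_public_ips(devices, pool)
per_ip = Counter(mapping.values())
blocked = users_affected_by_block(mapping, "203.0.113.1")

print(per_ip)        # how many devices sit behind each public IP
print(len(blocked))  # devices cut off by blocking a single IP
```

In this toy setup, blocking one address to stop one misbehaving client also cuts off every other device that happens to share it, which is the trade-off the site operator is stuck with.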
Captchas are super easy! There's a gazillion captcha bypass services for every type of captcha. Just snag the captcha token, send it in an API call, and then you get a verified captcha token.
My mobile device (iPhone) relays most traffic through the nearest Akamai datacenter. So they don't get my IP address. And that datacenter has a massive number of IP addresses, which are rotated.
Out of interest, how do you know it's being relayed through an Akamai DC? I assume you're talking about Private Relay, which I also use, but I thought Cloudflare was the 2nd hop for that?
Cat & mouse game. If you’re defending against a whitehat business scraping with curl from data center IPs, sure.
Against a less-savory actor using hundreds of IPs from residential proxies/compromised hosts, you’re gonna have a rough time, especially if you’re unwilling or unable to use aggressive fingerprinting or (vomit) Cloudflare. Not to mention CAPTCHAs are generally already a solved problem for scrapers.
Residential proxies are a completely solved problem, for companies that actually lose money to them (e.g. Ticketmaster, whose profit is maximized by blocking third-party scalpers so they can do the scalping themselves)
For companies that make money by having more MAUs, well, yeah, they're going to have a real "rough time" detecting inauthentic traffic
While I appreciate it as an irreplaceable tool for countering DDoS, its premise is antithetical to a reliable and open web IMO, and it suffers from the same lack of accessible, customer-facing support as other big tech players. Lazy examples from HN algolia search:
I intensely dislike them taking over as gatekeepers of the web. Perhaps because my browser is configured to resist fingerprinting and to avoid running arbitrary scripts from random websites, it is very frequently blocked by Cloudflare.
As one example, I can no longer browse the site for Lowe's (big box home improvement chain). Consequently, I now buy everything from Home Depot (their competitor).
It's astonishing how Cloudflare can do such a poor job of determining the difference between a potential customer and an attacker. Life's too short to solve captchas for an intermediary, so I don't bother, I just find a competitor who wants my business.
> It's astonishing how Cloudflare can do such a poor job of determining the difference between a potential customer and an attacker.
I don’t find that astonishing at all. I can’t see how you’d disambiguate someone who is anonymous for good versus bad reasons. Not supporting the death of the anonymous internet, but it’s not happening because of incompetence.
I don't think Cloudflare is immune to organizational incompetence even if a lot of brilliant people work there. I have similar intermittent problems as ~tomwheeler, despite a mostly unchanged residential IP and a browser configuration that's only a little bit defensive.
My outsider's impression is that Cloudflare has decided to rely much more heavily on browser fingerprinting than on classifying good/bad network activity. That puts them at odds with anyone that's taken steps to oppose being monetized by advertising firms.
I think that both Cloudflare and the Lowe's stores of the world understand that these interventions have negative side effects. The problem is that leaving them out has even worse consequences, and no one has offered a sufficient alternative.
Put another way, one could reason that they'd prefer to do business with Lowe's because they are actively investing in security measures. Perhaps your data is more likely to be compromised at Home Depot.
It induces vomit in anyone who is on any combination of a) a slow network, b) Tor, or c) NoScript. They also fundamentally act as middlemen, the gate between users and what's supposed to be an open web. They even promote having servers run plain HTTP and they'll do the HTTPS proxying for you; you know, so that they can sniff the traffic between you and your users.
Reddit might be one of the last few places on the internet that hold on to the old days of pseudonyms and mindless anonymity. I don't see how changing that would benefit the company. See Twitter.
> One of the core features of Reddit is that any person may create as many accounts as they like for free. Changing that would be incredibly disruptive.
I wonder what their monthly active users look like if you filter out 1 person switching through 3 usernames/accounts for example.
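As a rough sketch of what that filter would look like, the snippet below counts active accounts versus active people. The `account_owner` mapping is purely hypothetical: it assumes someone already knows which accounts belong to the same person, which is exactly the information a pseudonymous site does not reliably have.

```python
def count_active(events, account_owner):
    """Count distinct active accounts and distinct active people.

    events        -- iterable of (account, day) activity records
    account_owner -- hypothetical mapping of account -> person; accounts
                     missing from it are treated as their own person
    """
    active_accounts = {account for account, _day in events}
    active_people = {account_owner.get(a, a) for a in active_accounts}
    return len(active_accounts), len(active_people)

# Made-up example: one person rotating through three usernames.
account_owner = {
    "throwaway_1": "alice",
    "throwaway_2": "alice",
    "alice_main": "alice",
    "bob": "bob",
    "carol": "carol",  # has an account but was not active this month
}
events = [
    ("throwaway_1", "2023-04-02"),
    ("throwaway_2", "2023-04-09"),
    ("alice_main", "2023-04-09"),
    ("alice_main", "2023-04-15"),
    ("bob", "2023-04-20"),
]

accounts, people = count_active(events, account_owner)
print(accounts, people)  # 4 active accounts, 2 active people
```

The gap between the two numbers is the inflation being asked about; how large it is in practice is something only Reddit could estimate.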
Reddit wants people to visit the site, become interested in the content they see, and start participating regularly. That's not compatible with hiding enough content behind a registration wall to thwart sufficiently sophisticated scraping.
There are services out there that have a large pool of consumer IPs that are marketed at crawlers for exactly this reason. A lot of them are either using hacked hardware or one of those free VPN browser plugins so it would be very hard to distinguish the traffic from a legit user.
There are residential proxies that allow you to bypass most of these things. I’ve been using them to crawl e.g. Amazon or Instagram without any issues, but they’re expensive. IIRC something like $10/GB.
Yes - but legal and explicitly allowed by the user.
Bright Data is the biggest of them. They run the free VPN Hola and have an SDK that app owners can install in their apps, which allows selling bandwidth from installs. For someone who is price sensitive, trading some free residential bandwidth for whatever service is pretty compelling.
I'm sure there are scummy ones, but Bright seems to require pretty explicit consent. Not affiliated, just looked into it for some apps I have, but the payouts weren't good and I didn't think it'd be a good fit for our users.
This is exactly the one I was using. Basically you’re piggy-backing on mobile phones and other devices using their free VPN software, and it’s incredibly hard to block for large websites. Combine this with some other clever tricks, and you’re basically able to do huge scrapes for not-that-much money with incredible convenience.
No because they are only HTTP proxies. But you don’t actually know how these companies get them, rumor is that they are part of browser extensions or free VPNs which users might install on their devices.
The most “reputable” company in this space is Bright Data (formerly Luminati).
Last week someone here said that some of the big VPN players use botnets to get residential IP addresses. I assumed they got residential IP addresses from ISPs but maybe not all ISPs in all parts of the world offer that.
CAPTCHAs have been broken by primitive AI for a long time (long before GPT-4-like tools). Their only purpose is to deter the lazy bots. User agents, and any other arbitrary HTTP headers, cookies, etc. have been easy to circumvent as long as the internet has existed. The only thing that sort of works is IP reputation, but with IPv6 you can have as many legitimate IPs as you want.
tl;dr Dedicated crawlers built by sophisticated actors are more or less impossible to defeat.
It is very easy and cheap to scale: $1 for 1,000 captchas solved, $10 for 1,000 proxies. Then you have 1,000 users, and these are kinda impossible to distinguish from your typical common users if you cared to randomize the digital fingerprints for each client to some extent. Paid APIs for publicly accessible data are not something that makes sense or works well in this world.
You're right... I went off-lane there. It makes total sense of course since there is a demand for data, and clearly, just a minority of the people can just scrape everything at will, even if it sounds like child's play to me. And actually, it all makes sense now, since paywalling your own API is just throwing some competition to scrapers, which is totally legit. Sometimes a simple question can do a good deed, thank you :)
Bots? Eh... probably gonna agree. Apps, hard no. API access for third-party apps is the only way to make competition work out in the end. ICQ, AIM, and MSN Messenger all existed around the same time, and thanks to XMPP, worked equally well on an XMPP client. This made it reasonable to use all 3 if a user wanted to, plus they could use their own server too, or a friend's, or a company's, or whatever. Thanks to SMTP, email is (mostly) the same way right now. On the other hand, the perfect example of how shuttering API access to apps can completely kill any competition exists right now in the form of Discord. Sure, Guilded exists, and one could argue Slack is a competitor, but tell me, do you genuinely use all three? Would they be interchangeable? Or do you split personal and professional between Discord and Slack? If all 3 had a common standard, or at least had open client APIs, we'd already have a unified client, making all 3 easy to access at the same time, and we'd have good competition. Reddit has competition, and there are many third-party apps that allow using all of them under one roof. Killing that off would not be a welcome change.
So yeah, Reddit may not need bots, but refusing to allow apps is just pushing another nail in the coffin of competition.
Unfortunately the developer of the Apollo app already got a call, and apps will need to pay. That's then the end of Reddit on mobile for me. The official app is unusable and has annoying behaviour in desperate attempts to boost engagement.
> There was a quote in an article about how these changes would not affect Reddit apps, that was meant in reference to “apps on the Reddit platform”, as in embedded into the Reddit service itself, not mobile apps
Paid I can deal with and Reddit are certainly entitled to some rev share for enabling the content - but - if this goes down the old EEE path through to extinguish third party as a way to force their interface and tools or nothing - then nothing is what it will be. Twitter's API history and present is a great example of how bad things could potentially get.
>> But companies that “crawl” Reddit for data and “don’t return any of that value” to users will have to pay up,” Reddit co-founder and CEO Steve Huffman told The Times.
But they do return value to users. I'd much rather get my answer from a ChatGPT query than by scouring through Reddit.
Maybe he meant that they're not returning value to Reddit in which case he'd be right, but I hate him trying to spin this for the users.
In the original NY Times article there's this line:
> “Crawling Reddit, generating value and not returning any of that value to our users is something we have a problem with,” Mr. Huffman said. “It’s a good time for us to tighten things up.”
So it is "to users" but more specifically it's to our users.
I would agree with Huffman here: crawling the data to build ChatGPT gives the value to ChatGPT users who aren't necessarily Reddit users, and by short-circuiting queries and processes that otherwise may have led to new Reddit users, it's taking value from all Reddit users.
E.g., the "remind me" bot uses Reddit's API and genuinely returns value to at least some of Reddit's users (unless it just plainly never reminds people). Comparing ChatGPT to things of that nature makes the difference more apparent to me.
> But companies that “crawl” Reddit for data and “don’t return any of that value” to users will have to pay up,” Reddit co-founder and CEO Steve Huffman told The Times
Pot-Kettle!
The elephant in the room that everyone is forgetting is: how much does Reddit pay its users for content? Reddit's value comes from its users, who contribute it completely voluntarily, lol.
> The elephant in the room that everyone is forgetting is: how much does Reddit pay its users for content?
I have to say I’m not a fan of reddit, but you could also ask the question: how much do users pay to access reddit?
The web has made a lot of people feel entitled to free (high quality) services. But as developers we know building and maintaining services like reddit is not cheap (let alone free).
Reddit users pay a lot, considering just how many ads Reddit has on every page, as well as the arranged content promotions that regularly pop up "naturally".
As is usually the case, what they decided was something much narrower than the general case. LinkedIn still can and does make efforts to prevent/restrict automated browsing at scale. What they can't do is selectively block traffic from the plaintiff company altogether, when the content is otherwise publicly available.
> In a November 2022 ruling the Ninth Circuit ruled that hiQ had breached LinkedIn's User Agreement and a settlement agreement was reached between the two parties.
If one scraped the content that's served to a browser when it's navigated to www.reddit.com I would expect that ruling to apply. If the API is considered a separate service, then I would imagine they could restrict access under separate terms.
With the ongoing "slowly breaking old reddit" and the move to SPA and mobile app, the data for comments and posts will be via OAuth API access rather than a server rendered html page.
Oh no, will they go offline because of this? Unddit is extremely useful for seeing how mods manipulate subreddits, and for general curiosity about what sorts of things are no longer in the Overton window this year.
Yesterday Stack Overflow, today Reddit. A clear pattern emerges where open web content/communities face existential issues if the current AI paradigm continues.
It's daylight robbery. The sum of 18 years of Reddit is an enormous capital investment as well as an immeasurable amount of hours spent by its users to create the content.
It's absolutely baffling how a single entity (OpenAI, Google Bard) can just take it all without permission or compensation, and then centrally and exclusively monetize these stolen goods.
The fact that we barely even blink when this happens, and that founders confidently execute on an idea like this, tells you everything there is to know about our industry. It doesn't even pretend to do good anymore. Anything goes, really.
Anyway, get ready for an "open" web that will consist of ever more private places with ever higher walls. Understandably so, any and all incentive to do something on the open web is not only pointless now, it actively helps to feed a giant private brain.
I understand where you're coming from, but can't fully agree with you.
First, Stack Overflow contributions are licensed under Creative Commons. So monetizing them is explicitly allowed.
Second, information is not "stolen" nor "goods". Copyright law is completely separate from physical property laws, so even if you could make a case about fair use of training data, copyright-ability of model weights and AI generated content (which I agree are still legal gray areas) and therefore whether or not the "Share-Alike" CC clause is enforceable in this context, it would be an entirely different argument from whether the whole industry is somehow entirely morally bankrupt.
Third, given that this is unpaid work made voluntarily by users of the platforms (Reddit, SO), why is it any more acceptable for these platforms to lock it up and monetize it than for AI companies?
I think it's completely reasonable to charge for API access, particularly above a certain volume, but not because these companies have a right to protect some sort of "intellectual capital investment", but rather because the server costs of processing the requests are not negligible.
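A volume-based charge of that kind is simple to express. The sketch below is only an illustration of "free up to a quota, metered above it"; the quota and the per-1,000-request price are made-up numbers, not anything Reddit or Stack Overflow has announced.

```python
def monthly_api_bill(requests, free_quota=100_000, price_per_1000=0.25):
    """Return the monthly charge in dollars for a given request count.

    free_quota and price_per_1000 are hypothetical values chosen for
    illustration; requests at or below the quota cost nothing.
    """
    billable = max(0, requests - free_quota)
    return billable / 1000 * price_per_1000

# A hobby bot stays free; a bulk crawler pays in proportion to its load.
print(monthly_api_bill(50_000))     # 0.0
print(monthly_api_bill(1_100_000))  # 250.0
```

Under a scheme like this, small apps and bots are unaffected, and the cost only starts to bite for clients whose request volume is large enough to matter for server load.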
If anything, this situation really separates the wheat from the chaff in terms of what pools of open web content are truly "open". If the platforms hosting them expect to retain control of their "investment" can they really be said to be open?
I understand the irony, given that OpenAI's own name is somewhat at odds with its practices (of merely providing open access versus truly releasing everything as open source) but I think the reasonable solution to that conundrum is something like Wikimedia Foundation, Internet Archive or maybe CERN for AI, not giving up on free, open content just because it might feed a giant private brain.
> First, Stack Overflow contributions are licensed under Creative Commons. So monetizing them is explicitly allowed.
The evolution of any human legal system can be described as follows.
1. Hey guys, here is a simple set of rules we have agreed upon, to make sure there are no conflicts. Please follow them in good faith.
2. 95% of people follow both the letter and spirit of the agreed rules.
3. Some bad actors come in and only comply with the letter of rules, hacking and exploiting the system to their obscene advantage.
4. The complexity of the rules is increased to shut down the bad actors. The new rules increase costs for everyone, good and bad actors.
Repeat steps 2-4 continuously till the system is completely broken and we are all much worse off. The bad actors, "We did nothing wrong, we followed the letter of the law."
What's the conflict? Stack Overflow content was specifically licensed under Creative Commons so that its content can be maximally used and learned from, and it seems to be working successfully in ways not envisioned before.
I don't wish Microsoft to forcibly snag the profits from my (and more significantly, many others') Stack Overflow posts while giving nothing back to the SO community. I'm OK with SO profiting from that and giving me points in return. If/when that becomes a noticeable issue for SO, I'm sure they will revisit their approach too, because nobody likes leeches.
An incentive for Stack Overflow to administer the service and to keep the lights on is to get paid for traffic to their website from Google search (which they monetize via a modest amount of ads and job posts).
Incentives for free contributors (SO users) to write up good questions, good answers and debate and to come up with and vote on better solutions in the comments is to get points, recognition and yes to help others and get credit for it in their name, even though this credit is not monetary.
If Microsoft regurgitates my answers (just using me as an example, there are infinitely better contributors) without sending traffic to the SO proper website and without people voting for my answer or participating in debates and discussions on SO website proper - and in many (if not most) cases there is no single smash-hit answer and things need to be worked out and voted on - then my motivation as an SO contributor drops to a complete 0. Basically, no reason to contribute at all, since Microsoft is going to grab my answers for itself and collect the subscription (in case of ChatGPT and Copilot), and eventually the inevitable ad revenue from majority of Microsoft and ChatGPT users never leaving the Microsoft properties and never contributing to the original SO activity.
Of course, there are tons of problems inside SO proper currently as well, but none of them destroys the motivation to contribute the way third parties do by scraping and regurgitating the original content and keeping the traffic to themselves.
But they didn't take anything? And those two moves of SO and Reddit are mainly about greed: they want some more money just for hosting content that people generated while viewing their ads and giving them money for features.
literally the story of google itself... built technology on a large corpus of existing text (the internet) for pagerank and then able to leverage and monetize it via search and ads.
But Google itself was and still is free. It's a service they provide to you without charge that, were it not to exist, your life would be almost immeasurably more difficult (as with any search engine). And most of the time it doesn't "take" from website owners; if anything, it generates more traffic for them.
When a model trains over Reddit, it may still provide a service that is free. But the way it's going, companies are charging money for access to those models and aren't generating traffic for the underlying training data/sites.
We drastically need copyright reform for text, imagery, video. It was never designed for this AI era.
If you take a concept like "fair use". Let's say I embed your photo and express an opinion about it. That's what fair use was designed for. In-context relatively harmless usage of the content of others, for the sake of expression, culture and education.
That's not the same thing as "let me suck up all content ever created without permission, attribution or compensation, mangle it and sell it via the backdoor whilst making you obsolete".
You can't call that fair use, they are wildly different usages at wildly different scales with wildly different impact.
We need a new copyright category specifically for AI usage. If nothing is expressed, no training permission is given. One can opt-in and allow for training, allow for training under conditions, etc.
Honestly, I think it's completely unfair for AIs to train on this data.
I work in ML so I'm aware of the consequences but society wasn't.
My step-daughter is finally crushing it as a graphic artist and she is really pissed at tools like Midjourney.
I asked her about it and she said "yes, they steal the artwork of real artists and generate fake knockoffs" ... and I don't think her opinion is invalid.
In addition, we're all kind of forced to hop on to AI whether we're a programmer or artist just to buy ourselves a little more time, delaying the inevitable. Actually, perhaps accelerating the inevitable by contributing to it.
Even in a utopian world where we would have an economic model to support this (UBI), the outcome still sucks. It wipes out human culture. There's no point in creating/producing anything as almost anything can be produced by anyone, at incredible quality, at no cost and with little skill.
Hence, your daughter being or becoming an incredible artist would have no meaning, except perhaps for herself enjoying the process of creating art.
> Hence, your daughter being or becoming an incredible artist would have no meaning, except perhaps for herself enjoying the process of creating art.
There are lots of points and arguments to be made in this general area, but I have to ask, is this really so bad? I mean, what is the point of our lives and everything we do, other than to generally spend the rest of our time doing things we enjoy for their own sake?
If we're comparing "your daughter is an incredible artist, and here's a job for her designing product packaging for a multinational conglomerate" to "your daughter is an incredible artist, and the multinational conglomerate is using a diffusion model to design their packaging", I think it's really hard to say that the former is better than the latter. Of course, it all depends on the economic model, but the line I am quoting is within that assumption you made of the economic model being able to support this. In that case, I am for the latter wholeheartedly.
Economic incentives are great to get people "hustling", but they are rarely aligned with the human values you wish to protect, and mostly by chance if they are. Your daughter's artistry is better "spent" on art for art's (and personal enjoyment's) sake than on drawing clip art for an obscure HR form somewhere, IMO.
Nothing would stop her from continuing to create art the human way. The most intrinsically motivated will certainly do so.
But it's only half the story. Besides the process of creating art in itself being rewarding, the other rewarding part should be how other people relate to it.
One might have trained themselves for thousands of hours and this will be reflected in the output. Most people suck at art thus the skill, dedication and creativity are recognized as such. This system has merit and scarcity.
The new system has no merit as any fool can type in a few words. Nor does it have scarcity which means an overabundance of output. Both contribute to a lost sense of meaning in creating and even consuming art.
If tomorrow we will all be as fast as the fastest runner, running will become quite pointless. There is no reward or recognition for running fast. In fact, you can't even call it fast anymore, as anybody can do it.
I wonder what would happen to a world where AI runs the economy. Not everyone has some hobby or passion that brings meaning into their lives. Some people just work, come home, and spend their free hours consuming some form of entertainment. Without work, would those people just have more free hours? The elimination of human labor could be disastrous to mental health.
A good point, and I've been puzzled by how hard this split in characters is between individuals.
I know several people that without external force (work, duty) would have absolutely no idea what to do with themselves. Even their free time they organize around work-like chores or spend it on passive media.
These people seem to lack any sense of wonder, of curiosity or exploration. And it seems a permanent and fixed state. This is who they are. You can't change it.
I would not worry about this problem though because surely in the hypothetical situation of no commercial work, there's plenty of other work we can make up.
>There's no point in creating/producing anything as almost anything can be produced by anyone, at incredible quality, at no cost and with little skill.
Does this imply that some significant portion of art "value" is derived from scarcity (e.g. there is more value to creating/producing art when a smaller portion of the population can do so)?
From a strictly financial sense that makes sense, but it does seem morally at-odds with anything that makes art easier for humans to produce.
Is it "good" or "bad" to enable a larger population to produce more art?
Is it "good" or "bad" to enable a larger population to produce higher quality art?
Culturally, both seem like they'd be good. In our current economic model, they're probably both bad.
With an economic model that supports artists financially and removes the need to transmute "art" into "money", I don't think we'd see human culture wiped out. Without a financial incentive to create art, what's the point in creating/producing anything if not to contribute to human culture?
This perspective seems insane to me. I'm undecided, but if I were to put forward an argument that AI art will be bad for culture regardless of economic model, it would be something like "AI art will always be worse (in some way) than human art, but it will also be cheaper than human art, and thus will replace it in basically all commercial fields, which would be bad for culture." Maybe I'd say it's worse because it's inherently soulless, or just that as a practical matter AI is better at doing the bare minimum than humans are, or something like that.
If I thought that AI art would allow almost anything to be produced by anyone at incredible quality, at no cost and with little skill, that sounds like a Sci-Fi utopia to me, an almost unimaginable world in which all limitations on self-expression are lifted. A world in which making a movie or a TV show or a video game becomes a weekend project. It sounds wonderful.
I think if we had an economic model to support this, we would definitely be in a way better system than we are right now. So many artists and musicians don't have the resources in our current system, and have to seek day jobs or stop making art altogether.
> Hence, your daughter being or becoming an incredible artist would have no meaning, except perhaps for herself enjoying the process of creating art.
This is the most meaningful reason for creating art. In fact, I'd argue human expression is the defining element of art (AI output not being art in that sense of the word), and economic motivations just pervert it.
Artists don't generate art in a vacuum. Everything is a fake knockoff of everything else.
I believe the cream will still rise to the top, and the best artists will still create something totally different, and/or use AI tools to generate something better than they could create otherwise.
No, it’s not in isolation. Doesn’t matter, because fair use applies to humans, not robots. When you go from “human that does X” to “human that operates machine that does X” you’ve changed the situation.
We’ve already been through this with cameras, which are technically just the same as using your eyes and your memory. Yet both legally and morally we all feel that operating a camera doesn’t grant you the same rights as you have by just being and looking. Strolling through the park and seeing the kids playing is very different from bringing a zoom lens and a camping chair.
That said, society could agree to a fair use that applies to ML-trained models. It could simply cover all non-commercial applications, or at the very least research.
> My step-daughter is finally crushing it as a graphics artist and she is really pissed at tools like Midjourney.
> I asked her about it and she said "yes, they steal the artwork of real artists and generate fake knockoffs" ... and I don't think her opinion is invalid.
Creativity doesn't exist in a vacuum. New creations are based on long-term absorptions of existing concepts & discoveries, & the decision to advance or rebel against any combination of said concepts & discoveries.
The nature of the work will change to focus more on the final product, wherein humans still hold an advantage over art generation models in terms of errors in the produced artwork. There's the possibility that such errors will be corrected with the use of an additional model down the pipeline that's solely focused on correcting said errors, but they're not foolproof either.
There will also be a larger emphasis in some niches on documenting the creation of said artworks, as already happens in some niche circles I'm in. Reductively, it's the knockoff Gucci handbag problem, and the remedies will be the same here:
- (Tech) Serial imprinting / rollover keys / embedded signatures for verification
- (Social) Shaming & ostracization of individuals that buy knockoffs
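To make the "embedded signatures" idea a bit more concrete, here is a minimal sketch using Python's standard `hmac` module. Everything named here (`sign_artwork`, `verify_artwork`, the key, the byte strings) is hypothetical and only for illustration. HMAC needs a secret shared between signer and verifier, so a real provenance scheme would more likely use public-key signatures; this only shows the sign-then-verify shape.

```python
import hashlib
import hmac


def sign_artwork(image_bytes: bytes, artist_key: bytes) -> str:
    """Return a hex signature binding the image bytes to the artist's secret key."""
    return hmac.new(artist_key, image_bytes, hashlib.sha256).hexdigest()


def verify_artwork(image_bytes: bytes, artist_key: bytes, signature: str) -> bool:
    """Check that the signature matches the image; any change to the bytes fails."""
    expected = sign_artwork(image_bytes, artist_key)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, signature)


# Hypothetical data: stand-ins for a real key and real image file contents
artist_key = b"hypothetical-secret-key"
original = b"pretend these are the bytes of the original artwork"
knockoff = b"pretend these are the bytes of a near-identical copy"

signature = sign_artwork(original, artist_key)
print(verify_artwork(original, artist_key, signature))
print(verify_artwork(knockoff, artist_key, signature))
```

Note that this only proves a given file is the one that was signed. It says nothing about stylistic imitation, which is the harder part of the knockoff problem.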
I'm hesitant on using the legal system to solve such a problem, as the way the current copyright system is set up, it makes it near impossible for a new artist to NOT step on an existing artist's style in some form or another, even if unconsciously doing so.
In my opinion copyright is a law that is always at odds with the free flow of information. I'd hate for that law to start influencing how I interact with user generated text on the internet. As we see on YouTube, nuance for copyright loses to erring on the side of enforcing copyright even when the use is fair.
There's no better thinker IMO on this topic than Stephan Kinsella. (C)opyright law started in the 1500's as a form of censorship. There is no reason for it, other than censorship (or if you are in the top 1%, a great way to extract monopoly profits from the rest).
Please no. I wish the users who create quality posts could get paid. But sadly, once money is involved, people will start to gamify the system, posting as much barely-passable garbage as possible to maximize upvotes, and the quality of content will deteriorate very quickly.
Money's already involved. Rather than posting barely-passable garbage, the automated systems just repost things that were popular a year or two ago to various subjects, with the top few comments replacing the titles of the posts. There are a number of counter-bots that detect these posts and warn people that the content is being automatically reposted, but it doesn't really stop it from happening. Presumably Reddit chooses not to stop it because, hey, engagement, woo, metrics go up, manager of engagement look good.
I'm not sure exactly how they're monetizing (maybe they sell the accounts once they have some popular posts?), but they definitely are.
I think this is actually a reddit, and all other similar platforms, problem. The issue is that there's good content, funny memes, insightful essays, whatever, that was submitted in the past. Some of it is no longer relevant and some of it your audience has already seen and would be bored by - but lots of it would be valuable to resurface now.
Because reddit focuses mainly on what's happening recently the good content of the past that might be relevant to a user today is buried. Reposters play a valuable role in resurfacing content. I think a better paradigm, though one I can't really imagine that well, would remove the need for reposters by automatically showing the content they would repost. Maybe a recommendation algorithm?
No, the issue is that pretend internet points always end up having real value, because human brains are lazy and rate in-group signalling really high and therefore trust ads that come from "big people" more. That's like the whole thing behind the influencer advertising economy.
Reddit didn't get rid of r/hailcorporate on accident. There are literal industries that exist to make fake accounts, karma farm, and sell use of those accounts to post basically sponsored messages that maybe even reddit itself doesn't know are sponsored. Think of how many people say "I search reddit for product recommendations" and know that companies have been pushing on that button for years and years. Whether reddit is honestly trying to prevent this kind of stuff doesn't actually matter, because as long as real moderation costs money and breaking that moderation makes money, the advantage is towards those who break it. FFS, reddit still has most popular subreddits modded by one account and their sockpuppets.
It no longer shows up on the /all tab for most users. I imagine that is due to some rule fiddling they did, similar to how one of the Donald Trump subreddits kept gaming the system to be most of the front page so they changed the rules.
It'd turn into a world where people would try to make money (which still happens but normally is sniffed out), instead of a place where people like LundgrensFrontKick produce content, for free, because they love doing so, like:
* Estimating how long it took The Joker to set up the giant cash pyramid in The Dark Knight
* Comparing the box office success of movies that have a snowmobile action scene vs those that have a jet ski action scene
* Objectively trying to determine which Fast and Furious movie was the fastest and most furious
It is extremely interesting that money fails so hard at the one thing it should be good at, incentivizing behavior. I mean, yes, you'd get more content, but it would be hollow, as you say.
> It is extremely interesting that money fails so hard at the one thing it should be good at, incentivizing behavior.
Money itself doesn't fail to incentivize behavior. Rather, it is what you choose to reward with money that has to be carefully chosen to incentivize the behaviors you want to encourage (via monetary reward).
It can be a good motivator and still be failing to serve any greater social purpose. People and organizations can get addicted to money exactly the same as people get addicted to sugar, nicotine, or cocaine. Addicts can be enormously tenacious, creative, and resourceful, but only to the end of feeding their addiction.
Totally agree. I was just taking issue with the blame on money specifically. Money is working as intended. The rules in which money is operating are fundamentally broken though, no doubt.
It could. Some people pursue it very aggressively, optimizing submission times and autoposting submissions of new papers or blog posts, for example. I recall one semi-spam account that was set up to submit anything relating to Ruby, including videos that happened to mention gemstones in the title.
Basically the same as the alignment problem in AI. You need to be very careful of how you define your rewards, because you'll end up incentivizing exactly what you define.
It is like Goodhart‘s Law (When a measure becomes a target, it ceases to be a good measure) with an incentive attached to the measure. It probably needs to be in constant flux by design. Maybe a good thing as it would otherwise get rigid and boring.
This is also prevalent with the way that Google incentivizes page content structure now. I have to get 3/4 of the way through a page before I find what I'm looking for because they encourage these big kitchen-sink posts.
As an avid redditor I can say it's already like that and they will teach it nothing of value unless they limit it to a very low number of subreddits, which will still barely teach it anything of value unless the goal is teaching it current generation humor and shitposting habits, which would be very valuable to anyone wanting to boost engagement and try to sway a fairly left leaning generation of people toward whatever they're selling.
Which is today's right-wing billionaires and their pet politicians.
Hate to say it but this happens already from what I've seen on Reddit. Even decent subreddits I've followed for years that don't have the issues the major subs have. People want their karma, post low quality content, and somehow people still upvote them.
FWIW, we at Medium feel pretty similarly to Reddit but with a yes to this question about whether authors should get paid.
AI companies are betraying basic business principles: they are taking value from datasets like Reddit and Medium without giving any value back. Fine if you can get away with it. But since AI, especially text based LLMs, relies on source material, it's pretty straightforward for the platforms that host that source material to deny access. Things like ChatGPT do need current source material.
I don't think it'll come to a war though; I think the AI companies will instead give some value back. It could be as simple as citations that send traffic back. That's essentially the exchange of value that we all have with Google these days.
But if it's money, then I think the obligation is for platforms to pass that on to the authors. It'd be hard for an individual author to negotiate this on their own with a company like OpenAI, but platforms are in a good position to negotiate on their behalf.
Another in a really long list of issues that need a clear differentiation between human consumption and machines. So many things that were innocuous or even useful before the age of ubiquitous cameras, other sensors, and computers, are now a big problem.
You're in no position to negotiate this with OpenAI because they already have the relevant data stored locally. So does Google/Bing. You could be in a position to negotiate it with smaller upcoming OpenAI competitors, but all that will achieve is granting OpenAI and Google/Bing a monopoly because their competitors will have new large costs that they don't.
Also, Medium has a metered paywall already. Why not just let them open up a corporate account and pay to access paywalled content the same way users do? Why are any negotiations required?
BTW I use Medium but I never use the paywall. I'm fine with my content being used to train AI for free. The payments and tax complexity involved aren't worth the tiny amount of income that any such deal might generate, nor do I want OpenAI to have a monopoly.
Maybe no position with Google because they can bundle it with search results and threaten to take away search traffic. But OpenAI definitely does not already have all the relevant data. They need the new stuff also. That's part of the Reddit position as well.
> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
The Reddit user agreement is written based on US law and would be laughed out of any court in half the world. I literally _cannot_ waive my moral rights, and any company doing that is breaking the law in my country.
I don't understand this attitude. If you don't like how someone mods their subreddit then why do you want to be part of it in the first place? You can make your own and run it how you please.
You really can't make your own if you're going up against an established subreddit.
This applies doubly so if it's an established region based subreddit i.e. city, state, province, or country, and IMO these are the most problematic subreddits for overmoderation. Finding non-partisan regional subreddits is damn near impossible.
But then people complain about getting banned from the said subreddit for not following the rules that make it what it is. It seems like you want to engage an already established community and just ignore the people that are currently there and play by the existing rules.
> people complain about getting banned from the said subreddit for not following the rules
That's not at all what I'm saying. There are a few subreddits with fair mods who enforce the rules fairly, but the great majority doesn't - they are the rules, and if they don't like you, tough tiddies. Making it effectively a "mod and minions", not a real community with real rules.
I want to be a part of it for the community. Moderators are the (un)necessary evil.
The fact that you seem to see moderators as "owners" of a subreddit doesn't really help the case.
The programmers who wrote Reddit, and the community. A subreddit is nothing without a community, so the fact that a moderator namesquatted a URL doesn't mean he "owns" anything, only that he has the power - the power to moderate it. Which, in my opinion, goes with the responsibility of upholding publicly stated rules and enforcing them in a fair manner.
Unfortunately, many of them start thinking, like you, that the power to moderate means they are the supreme authority and that the subreddit is about them - so they behave accordingly, feeding their ego at the expense of a community.
Of course, if you believe the power itself gives them ownership over a community, that's fine. It's just that I don't.
I would say no to this. Reddit is giving you a platform and in exchange they get the content. If you don't think that is fair deal you're free to just not make the content.
Exactly. Users who contribute to the community of a for-profit enterprise that monetizes their contribution should also get paid.
I remember the good ol' days before reddit, where every community had its own forum, run for the community, not for profit. Sure, they were running some ads to keep the lights on, but those were non-targeted ads, just generic stuff based on the community.
With Dpreview dying along with so many other forums that used to serve various communities on the decline, Reddit becoming the one-stop-shop for all communities is the worst possible outcome.
I meant costs for their users. Reddit are covering their costs through ads and VC money. The users are creating the content on their platform that attracts more users. The users are the value to reddit.
Moderators should, in my opinion. They’re doing a ton of the thankless janitorial work of cleaning up Reddit’s walled garden. If Reddit mods quit for a week Reddit wouldn’t have a product.
As a Reddit addict who's spent thousands of hours on the website, I feel I am very well compensated for my content. For every post and comment I give to Reddit, Reddit serves me millions of posts and comments in return. This has massive entertainment and educational value to me, so I consider it a very favourable trade.
There is already a compensation system set up for UGC. Users are already free to evaluate what they share on Reddit for what they get in return. But users have effectively been compensated for providing their content.
Users may not get paid, but they do get free access to one of the best moderated community web sites on the planet. I get hours of enjoyment and engagement from Reddit. Totally worth it for me. Of course YMMV...
There would be far fewer objections to generative AI if these tools were being harnessed by (for example) social democracies to make societies overall more efficient and productive and to redistribute gains into greater overall security. What HN users tend to summarily dismiss as Luddism is the “golden rule” form of American Capitalism, where having the most money/compute resources entitles you to scrape and enclose the sum total of human creative and intellectual output for corporate gain. It’s the basic conflict of Capitalism - what gains should be returned to labor vs Capital owners, sublimated through a whole mess of pedantry, philosophizing, and cynical legal maneuvering to obfuscate that fact.
What a twist of fate! The social media generation companies built their success on aggregating other people’s content and offering new ways to interact with it. “No we’re just linking to what other sites provide for free”. Now, there’s a new leather jacket in town. “We’re just training on data that you already provide for free”.
Of course it’s fun to watch a turf war, and we can all cheer for our favorite team and quibble about who deserves a punch in the gut.
But, we also need to keep an eye on the horizon. This will change the world, even and especially the spaces that we currently rely on. Just look at what happened to legacy media when the aggregators came: it largely turned into blogspam and clickbait. Comment sections (like this one) aren’t perfect, but they’re a damn good pressure valve for regular people to interact with the world. What will happen to those, for instance?
There's plenty of incentive for people to shoot the breeze and shout into the void on comment sections. Typing to an LLM just isn't the same, especially with the amnesiac ones we have now.
That being said, I don't agree about Reddit comment quality... it's just generally not horrible, with the better part of it being old or in niche subs on niche topics (like fandoms or memes) that the LLM trainers are avoiding anyway.
> There's plenty of incentive for people to […] shout into the void on comment sections. Typing to an LLM just isn't the same[…]
I am concerned with the opposite, that LLMs will shout into our comment sections. For instance, building a convincing sentiment-manipulating bot network will be dirt cheap and easy. Even here on HN there’s a financial incentive to flood the place with bots to promote tech products.
> I don't agree about Reddit comment quality... it's just generally not horrible
That’s fine, but the important thing is that most comments are written by real people who wasted time to write it.
Not just that, but language is used as a marker. You can tell when someone talks about a subject they know a lot about. This ability to judge for yourself, based on the content alone, will be eroded. Anything can sound convincing, even to the trained ear. This makes content-oriented communities like Reddit and HN particularly vulnerable.
Oh yeah, that is a definite issue. Maybe mods/LLM bots in some niche subs can save them with very strict on-topic requirements, which would reduce the incentive for most spam, but the more general parts of reddit (and HN) are in trouble.
That’s such generic logic it applies to society as a whole anymore. We just extend past effort.
James Madison wrote about it, saying the future owes deference to the past by carrying on the benefits it inherits from it.
I wonder if people could just social in place more; talk, make art, rather than stare at a glass obelisk all day should social media die. You think that’s ever happened in human history? I dunno.
The cynic in me thinks this will slowly morph into charging access for third-party apps too.
Third-party apps don't show ads; there's no reason ads couldn't be included in the feed and required to be shown as a condition of using the API, but I imagine it makes tracking impressions etc far more difficult. Any new features they add also need to either be incorporated into the API or remain unavailable for those users.
My only hope is that third-party apps remain niche enough that Reddit leaves them be; the first-party experiences are all awful to the point where I would probably just stop using Reddit if third-party offerings become unavailable.
They also might be pulling a Tumblr. I really hope they don't.
> For NSFW content, they were not 100% sure of the answer, but thought that it would no longer be possible to access via the API, I asked how they balance this with plans for the API to be more equitable with the official app, and there was not really an answer but they did say they would look into it more and follow back up. I would like to follow up more about this, especially around content hosting on other websites that is posted to Reddit, as well as different types of NSFW content (a text post marked NSFW due to a gory moment in a story, for instance).
As noted in the comments, the API changes will also affect the quick .json representations of Reddit pages, which were an easy way to play with real-world data for beginners learning coding/data science.
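For anyone who never tried it, the beginner exercise being described looked roughly like the sketch below. It works on an embedded sample rather than a live request, and the sample is an assumption: a hand-written, heavily trimmed imitation of the listing shape those `.json` pages returned (a `Listing` whose `data.children` entries each wrap a post's `data`), with made-up titles and scores. The helper name `top_titles` is hypothetical.

```python
import json

# Hand-written sample imitating the shape of a listing response.
# Titles, scores, and the subreddit are invented for illustration.
SAMPLE_LISTING = """
{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"title": "First post", "score": 42, "subreddit": "example"}},
      {"kind": "t3", "data": {"title": "Second post", "score": 7, "subreddit": "example"}},
      {"kind": "t3", "data": {"title": "Third post", "score": 100, "subreddit": "example"}}
    ]
  }
}
"""


def top_titles(listing_json: str, min_score: int = 0) -> list[str]:
    """Return post titles with at least min_score points, highest score first."""
    listing = json.loads(listing_json)
    # Each child wraps the actual post fields in its own "data" object
    posts = [child["data"] for child in listing["data"]["children"]]
    kept = [post for post in posts if post["score"] >= min_score]
    kept.sort(key=lambda post: post["score"], reverse=True)
    return [post["title"] for post in kept]


print(top_titles(SAMPLE_LISTING, min_score=10))
```

Real responses carry many more fields per post, so treat the three used here as the minimum needed for the exercise.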
Soon Reddit's users will want a cut of it for content they create.
Then all the places these users are copying content from will want their share.
There's no solution here. Either the web stays (mostly) open and free-for-all like it is now or everyone sets up their own little walls and ends the party.
“The world's entire scientific ... heritage ... is increasingly being digitized and locked up by a handful of private corporations....
The Open Access Movement has fought valiantly to ensure that scientists do not sign their copyrights away but instead ensure their work is published on the Internet, under terms that allow anyone to access it.” - Aaron Swartz
So the title is not correct? API access is free. Crawling is not?
A major title change came from the New York Times source that is "Reddit Wants to Get Paid for Helping to Teach Big A.I. Systems". Now that makes it much more clear what this is all about and why it is happening right now.
* Offering an API is expensive, third party app users understandably cause a lot of server traffic
...
* To this end, Reddit is moving to a paid API model for apps. The goal is not to make this inherently a big profit center, but to cover both the costs of usage, as well as the opportunity costs of users not using the official app (lost ad viewing, etc.)
* They spoke to this being a more equitable API arrangement, where Reddit doesn't absorb the cost of third party app usage, and as such could have a more equitable footing with the first party app and not favoring one versus the other, as Reddit would no longer be losing money by having users use third party apps
* The API cost will be usage based, not a flat fee, and will not require Reddit Premium for users to use it, nor will it have ads in the feed. Goal is to be reasonable with pricing, not prohibitively expensive.
* Free usage of the API for apps like Apollo is not something they will offer, and thus me offering free usage of the app will likely be very difficult, Apollo will almost certainly have to move to an Apollo Ultra only (AKA subscription) model
Closing the barn door after the horse has already bolted.
Pulling data off Reddit now will likely give you a very large amount of polluted data from LLMs. I mean, yea it could be useful for some broad topics at this point, but still likely to contain a lot of GPT's own feedback.
It's likely that companies like OpenAI will just use their old reddit dataset, and then move to scraping things like YouTube for not just text, but audio and imagery too.
It doesn't really matter if some of the data is generated by LLMs. It's possible for LLMs to improve themselves by training on their own output, there is no strict requirement for "fresh" content. If it's mixed with human content and gets human feedback through voting that's great training data.
I feel API pricing is fine if the API money is more valuable for a platform than what's being built with it. If you have something valuable and you're not getting value from giving it away, sure.
The issue I have with Twitter's new API pricing is it's not either - it's paying a lot for a little, so feels more like an explicit move to stop companies building on Twitter. Like it's trying to kill the API altogether.
Always a rule of thumb with Elon topics - try to distill whether people are objecting to the principle or to the absurd cack-handed way Elon attempts to implement that principle. Because more often than not a good version of what Musk attempts is fine, it's just he's not competent enough to produce the good version.
Is reddit going to pay users? Or are they just going to collect the content generated by its users and then turn around and charge people to access it?
I think we all know that it's more column B than column A.
And while I'm not entirely comfortable with LLMs consuming all of that content without reimbursing the creators of that content, I don't see how Reddit charging for its API is different on any meaningful level.
Then maybe users should poison the content they post to make their data worthless unless Reddit decides to compensate its users who provide valuable info instead of updoots.
I can consume HN content, turn around and use it to derive value. That value spills onto others in various forms or maybe I keep all the value for myself (can't think of how I would hoard value without sharing because I would need to offer something to others to receive value myself).
Nonetheless, markets don't operate efficiently when people hoard shit.
Reddit already exploits unpaid moderators to create and manage its communities, so I don't imagine that Reddit is in a hurry to compensate users for selling their data to Big AI.
So this is how SkyNet or the Matrix starts, I guess. Any AI trained using the content from Reddit would obviously conclude that humankind deserves to be eradicated.
/s
We are the Borg. Lower your shields and surrender your ships. We will add your biological and technological distinctiveness to our own. Your culture will adapt to service us. Resistance is futile.
God I hope this happens someday to us. Except we are humans so we have already planned for this, and we can take out whatever adversarial alien civilization exists, however advanced they may be on paper. Even if it takes millennia we should be the dominant species in the galaxy and beyond.
Reddit should consider paying its moderators. Or employ moderators who don't use their vast unchecked powers to astroturf the site on behalf of shadowy companies.
Society would not be better off if reddit mods had more economic power.
The latter point isn't a bug it's a feature. Reddit is designed to function that way. The owners/execs have never expressed any interest in countering it outside of the limpest lip service imaginable.
I moderated two large subreddits (on various accounts). They responded to this by permanently banning both the accounts and the subreddits. They don't want users.
That said, on subreddits I see people who post content without attribution all the time. I recall in /r/aww you can't directly link to an Instagram post but you can "steal" the image and post it, and it's optional as to whether or not you link to the Instagram post within the comments. Likewise, people take videos from YouTube/TikTok and re-host it on Reddit.
In smaller subreddits people will post entire pay-walled articles as if writers only get paid in likes.
You'll have an account like "Science is amazing" or something similar which seems uplifting and does show relevant/great content. Given the positive name and quality content, they get popular quickly.
But they never attribute or give back. They gain millions of followers whilst the original creators of the content get left behind. One of many things broken on the internet.
It's hilarious to see Reddit's inept attempts to monetize the content gold mine they've squandered after a decade of devaluing product and engineering.
Ha-ha. Loving these two-faced stories here and there. “Crawling Reddit, generating value, and not returning any of that value to our users is something we have a problem with.” Very well, Mr. Huffman, but what about “Posting on Reddit, generating free content which brings multi-hundred-million advertising profits for the company, and not getting any of that value back is something which your users don't have the slightest problem with.”
The API is just a convenience to get the data, but surely you can get all the data you want for free without any dedicated API, just by fetching the pages over plain HTTP as any other generic user would do. Of course, filling up an enormous proxy pool to avoid various ingenious "protections" could cost you some 10-20 bucks, and solving captchas automatically could cost you another $1 per 1000, but from there it's even easier and more enjoyable to use than an API. I feel like launching a scrape-it-all service to avoid greedy IP-protocol customs officers could be a profitable venture these days.
A sensible choice. Now if only open source developers would update their licenses, perhaps with a new GPL license, to restrict reselling of IP through AI models. These folks need to adhere to rules if we are to have a healthy ecosystem.
Reddit does not own their users' content, however - they would also be simple resellers. All that's happening here is that they have failed to monetise where others are succeeding, and now they are positioning themselves to get a piece of that pie.
The right thing for them to do morally would be to implement content visibility/privacy controls for their users similar to what Facebook offers (strange feeling to be referring to Facebook in this context).
My hope is that large players sealing off their content will motivate individuals to protect theirs. It brings awareness that their data is harvested and sold in ways never seen before, and is then used against them. Ideally those who make free software are the first to understand the implications.
Basically what I want is that all models trained on open source data or user created content without proper licensing are also open source and free.
The argument AI businesses use is that their use of copyrighted work is fair use, which means that there is no license that would prevent your IP from being used by AI models.
If that holds up legally, the best you can do is to try to stop your content from being scraped or not release it at all.
> Reddit is moving to a paid API model for apps. The goal is not to make this inherently a big profit center, but to cover both the costs of usage, as well as the opportunity costs of users not using the official app (lost ad viewing, etc.)
I’m amazed they are willing to charge for their abomination of an API. The search functionality is terrible, returns unreliable results, and can only return 100 at once. I would happily pay for a great version of the Reddit API. I doubt anyone doing huge scraping jobs on Reddit is using their API to do so.
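For what the 100-per-request cap means in practice, here is a minimal sketch of the cursor-style paging a client ends up writing. It assumes a hypothetical `fetch_page(after, limit)` callable that returns a batch plus a cursor for the next one; `fake_fetch` is an in-memory stand-in for illustration, not Reddit's actual API, and no network calls are made.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

PAGE_LIMIT = 100  # per-request cap mentioned in the comment above

# Hypothetical fetcher signature: (after_cursor, limit) -> (items, next_cursor)
FetchPage = Callable[[Optional[str], int], Tuple[List[Dict], Optional[str]]]


def iter_results(fetch_page: FetchPage, max_items: int) -> Iterator[Dict]:
    """Yield up to max_items results, requesting at most PAGE_LIMIT per call."""
    after: Optional[str] = None
    yielded = 0
    while yielded < max_items:
        limit = min(PAGE_LIMIT, max_items - yielded)
        items, after = fetch_page(after, limit)
        for item in items[:limit]:
            yield item
            yielded += 1
        # Stop when the backend reports no further pages or returns nothing.
        if after is None or not items:
            break


def fake_fetch(after: Optional[str], limit: int) -> Tuple[List[Dict], Optional[str]]:
    """In-memory stand-in for a real endpoint: 250 fake results."""
    data = [{"id": i} for i in range(250)]
    start = 0 if after is None else int(after)
    chunk = data[start:start + limit]
    next_cursor = str(start + limit) if start + limit < len(data) else None
    return chunk, next_cursor


if __name__ == "__main__":
    print(len(list(iter_results(fake_fetch, 1000))))
```

Pulling a few hundred results therefore takes several sequential round trips, which is part of why large scraping jobs tend to avoid this route.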
Seems like the downfall of Reddit is imminent between this decision and nerfing the mobile web experience for no good reason other than to vacuum up mobile user data. What do others here think?
ChatGPT is on a trajectory to overtake Reddit in popularity.
And every interaction from users with ChatGPT is valuable content provided to OpenAI.
Most people don't realize this, but every question contains information. When a user asks "Which city is better for digital nomads, Berlin or Lisbon?", they have given out a bunch of information. That there is something called "digital nomads". That there are cities called "Berlin" and "Lisbon". That those seem to be considered good for "digital nomads".
And even more so when the chat continues. If ChatGPT praises how nice a city is for studying and the user replies "I don't study. I need a cheap apartment with fast internet", the user has provided information about the preferences of "digital nomads": that apartments can be cheap or expensive, that apartments have internet, that internet can be faster or slower.
This is not how LLMs work at all. Once your chat session ends, that's it. Updating the weights is expensive (although it's done semi-regularly). And when updating weights, the quality of the training datasets becomes an issue.
Folks are drastically underestimating the "grey goo" problem when it comes to training data. Now that AI generated content is so cheap to generate, the quality of training datasets is going to plummet.
"There’s a lot of stuff on the site that you’d only ever say in therapy" Yes that is indeed Reddit in a nutshell. May not want Reddit content in your next ChatGPT model, so not necessarily a bad thing.
i.reddit.com is gone, and they want to kill the awesome 3rd party apps next instead of improving theirs. They are definitely killing off 3rd party apps; my prediction is that they will be gone within a year.
Except when top voted comments are hivemind approved 'funny' quips/responses, or in reply to exercises in creative writing like half the posts in relationshipadvice, iwantthemanager, nuclear/pettyrevenge, etc
How do they plan to keep Google from using its search index of Reddit for training? Or keep OpenAI from using Common Crawl? Do they simply add "No AI" to their TOS?
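One conventional, non-binding lever here is robots.txt. The sketch below uses Python's standard `urllib.robotparser` to show how a per-crawler rule is evaluated. The robots.txt content is made up for illustration and is not Reddit's actual file; `CCBot` is the user agent Common Crawl's crawler identifies as. Compliance is voluntary, so this only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not Reddit's real file):
# block Common Crawl's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the sample robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.reddit.com/r/programming/"
    print("CCBot allowed:", is_allowed("CCBot", url))
    print("Other agent allowed:", is_allowed("SomeBrowser", url))
```

Anything stronger than this (rate limiting, login walls, TOS enforcement) would be a policy or legal matter rather than something a file like this can enforce.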
Yeah, what about Apollo which is probably the only reason why I still use Reddit?
EDIT: I guess it’s safe.
> Reddit’s API will remain free to developers who want to build apps and bots that help people use Reddit, as well as to researchers who wish to study Reddit for strictly academic or noncommercial purposes.
Related thread including comments from the creator of the Apollo app:
> Funny timing, given the post yesterday and my praise for how communicative Reddit has been, but today there's a comparatively much more vague post about changes to the Reddit API.
> I posted in that thread and asked a few questions which as of the time of posting have not been answered.
> Shortly after the post they emailed me about a meeting, which I've replied to and will keep you all in the loop on.
Interesting… I guess Google won't be charged because of the backlinks, but ChatGPT will be, because it just shows an answer to one’s query and doesn't actually show any of the “original content” in context, and therefore sends no traffic back to Reddit.