In a lot of ways, the statistical processing is a novel form of information retrieval. So the issue is somewhat as if, 20 years ago, Google had been indexing the web and then decided to just rehost all the indexed content on its own servers and monetize the views instead of linking to the original source of the content.
It’s not anything like rehosting though. Assume I read a bunch of web articles, synthesize that knowledge, and then answer a bunch of questions on the web. I am performing some form of information retrieval. Do I need to pay the folks who wrote those articles even though they provided them for free on the web?
It seems like the only difference between me and ChatGPT is the scale at which ChatGPT operates. ChatGPT can memorize a very large chunk of the web and keep answering millions of questions while I can memorize a small piece of the web and only answer a few questions. And maybe due to that, it requires new rules, new laws and new definitions for the better of society. But it’s nowhere near as clear cut as the Google example you provide.
"Seems like only difference between me and ChatGPT is absolutely everything".
You can't be flippant about scale not being a factor here. It absolutely is a factor. Pretending that ChatGPT is like a person synthesizing knowledge is an absurd legal argument; it is absolutely nothing like a person, it's a machine at the end of the day. Scale absolutely matters in debates like this.
Why not? A fast piece of metal is different from a slow piece of metal, from a legal perspective.
You can't just say that "this really bad thing that causes a lot of problems is just like this not-so-bad thing that hasn't caused any problems, only more so". Or at least it's not a correct argument.
When it is the scale that causes the harm, stating that the harmful thing is the same as the harmless thing except for the scale is, well, weird.
So there isn’t a legal distinction regarding fast/slow metal after all. Well that revelation certainly makes me question your legal analysis about copyright.
So in your view, when a human does it, he causes a minute amount of harm so we can ignore it, but ChatGPT causes a massive amount of harm, so we need to penalize it. Do you realize how radical your position is?
You’re saying a human who reads free work that others put out on the internet, synthesizes that knowledge, and then answers someone else’s question is committing a minute amount of evil that we can ignore. This is beyond weird, I don’t think anyone on earth/history would agree with this characterization. If anything, the human is doing a good thing, but when ChatGPT does it at a much larger scale it’s no longer good, it becomes evil? This seems more like thinly veiled logic to disguise anxiety that humans are being replaced by AI.
> This is beyond weird, I don’t think anyone on earth/history would agree with this characterization
Superlatives are a slippery slope in argumentation, especially if you invoke the whole of humanity across the whole of history. I do understand bmaco's theory, and while I'm not a lawyer, I’d bet whatever you want that there’s more than one jurisdiction that sees scale as an important factor.
The law is often imagined as an objective, cold, indifferent knife, but in practice there are also a lot of "reality" aspects, like common practice.
> So in your view, when a human does it, he causes a minute amount of harm so we can ignore it, but ChatGPT causes a massive amount of harm, so we need to penalize it. Do you realize how radical your position is?
Yes, that's my view. No, I don't think that this is radical at all. For some reason or another, it is indeed quite uncommon. (Well, not in law; our politicians are perfectly capable of making laws based on the size of a danger/harm.)
However, I haven't yet met anyone, who was able to defend the opposite position, e.g. slow bullets = fast bullets, drawing someone = photographing someone, memorizing something = recording something, and so on. Can you?
Don’t obfuscate, your view is that the stack overflow commentator, Quora answer writer, blog writer, in fact anyone who did not invent the knowledge he’s disseminating, is committing a small amount of evil. That is radical and makes no sense to me.
> Don’t obfuscate, your view is that the stack overflow commentator, Quora answer writer, blog writer, in fact anyone who did not invent the knowledge he’s disseminating, is committing a small amount of evil.
:/ No, it's not? I've written "hasn't caused any problems" and "harmless". You've changed that to "small harm", a substitution I indeed missed.
I don't think that things that don't cause any problems are evil. That's a ridiculous claim, and I don't understand why you would want me to say that. For example, I think 10 billion pandas living here on Earth with us would be bad for humanity. Does that mean I think one panda is a minute amount of evil? No, I think it's harmless, maybe even a net good for humanity. I think the same about Quora commenters.
Yes, that dichotomy is present everywhere in the real world.
You need lye to make proper bagels. It is not merely harmless but beneficial in small amounts for that purpose. We still must make sure food businesses don't contaminate food with it; it could cause severe, possibly fatal, esophageal burns. The "a little is beneficial but a lot is deleterious" principle also applies to many vitamins… water… cops?
Trying to turn this into an “it’s either always good or always bad” dichotomy serves no purpose but to make straw men.
Clearly there is nuance: society compromises on certain things that would be problematic at scale because they benefit society. Sharing learned information disadvantages people who make a career of creating and compiling that information, but, you know, humans need to learn in order to get jobs and acquire capital to live. And, eventually, they die, and that information dies along with them.
Or framing the issue another way, people living isn’t a problem but people living forever would be. Scale/time matters.
Here again I’ve fallen for the HN comment section. Defend your viewpoint if you like; I have no additional commentary on this.
If reading something is hurting your feelings, you can stop reading it.
Twitter even provides mute, block, and whatnot functionality to prevent specified things from even showing up in your line of sight to begin with. And if the app is really bothering you, you can always set it down and go outside, take a walk, meet somebody new, do something that will put a smile on your face on your deathbed.
Lumping mean comments online in with actual abuse is approaching the risible. Words have meanings; we shouldn’t dilute or distort them.
By Twitter “not taking action,” it sounds like your friend is upset that he or she can no longer co-opt the proprietors of the site into enacting punitive measures on people who draw his or her ire.
Maybe some mean things were said or whatever, but at the end of the day it’s just text on a screen isn’t it? And there’s a lot more to life than text on a screen, isn’t there?
It’s also weird how you mention the technical functioning of the site, then bring up the “Trust & Safety Org” when the legacy of “Trust & Safety” is a small cabal with extremist views arbitrarily deciding what information to censor and suppress based on their own viewpoints, whims, and influence from government agencies.
That has nothing to do with the technical functioning of the site which is a matter of reproducible, specifiable, determinate functions implemented in computer code to produce a useful product. The kind of thing that really turns the mind of an autist on.
P.S. Not to be too blasé about your friend; mean words can be an issue, especially as an ongoing pattern, but abuse from anonymous strangers online seems like less of an issue than abuse in real life. And was this really a situation where block or mute wasn’t sufficient? How so?
Man, if you can't see the difference between "so-and-so called me a mean name" and "1000 strangers all knocked on my door just to tell me, in excruciating detail, how they wish my children were raped and murdered", I don't know what to tell you.
X's systems for block and mute require the abuse to occur before you have an avenue to respond. Considering that all you need to get an X account is an email account, it's a pretty low bar for brigading. And that's to say nothing about organized campaigns to falsely report an account for abuse.
For individuals, I suppose you can make some kind of argument that those tools are sufficient, but if you're the poor social media manager for some township or minor government agency that draws the ire of the internet hate machine, you have to deal with all the abuse that goes with it. You are barred by the constitution from blocking people (and rightly so), and you have no real power to prevent them from creating sock puppet accounts to continue the abuse. PTSD is pretty common amongst (former, since they fired them all) twitter content moderators, because being consistently exposed to that stuff can eventually be pretty traumatizing.
It’s OLED in general, because of PWM flickering first of all. I strongly prefer IPS LCD for any screen that’s used for longer than brief periods of time. Btw, Apple’s IPS screens in iPads, MacBooks, etc. demonstrate that the tech is capable of producing great image quality.
Human habitation and influence on the landscape in the Amazon seems far more extensive in scope and duration than previously thought, based on recent discoveries.
The significance of the second link is that ancient peoples further north and in other parts of the Americas don’t share this genetic link, indicating a longer timeline of human habitation in the region than previously thought.
For sure! While there were definitely civilizations there, there are still huge areas that were likely never clear cut. And even the areas where pre-colonial civilizations had developed have been reforested for 600+ years, which by itself makes it interesting forest.
In considering these things, affordances must be made for the new abilities made possible by the computational tools now at our disposal.
Following your line of reasoning: if it is perfectly legal for me to walk into a coffee shop, sit down, listen to what the people next to me are talking about, commit it to memory, and even make notes about it, does it then follow that it should be perfectly legal, reasonable, and acceptable for a government agency or some other organization to put microphones everywhere, record what everyone is talking about, and then feed all this data into various databases and modeling systems?
Reciting something in a park is different than selling a copyrighted print of something in a park when you don’t hold the copyright. Which is much closer to what the NYT is accusing OpenAI of.
The training data not “existing” in the model is interesting, but at some point it’s a distinction without a difference.
If I hire an autistic savant to go to a library and read all the books, then I set up a book selling service where whenever people want to buy a book I have my savant employee type out the book for them, is it then going to pass muster in a copyright case if I tell the judge “It’s okay actually, because the books don’t actually exist in my employee’s brain, merely neuronal encodings of them.” ?
Say I have a copyrighted image on which I don’t hold the copyright, but I want to start selling it to people. Is it cool if I just run it through a lossless compression algorithm, thereby generating a new encoding of the information, and then sell this new encoding along with the software and command to reverse the compression?
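To make that concrete, here is a minimal sketch (using zlib as a stand-in for any lossless codec; the input bytes are just placeholder data): the "new encoding" is different bytes on disk, but it round-trips to the bit-identical original, so nothing about the information has actually changed.

```python
import zlib

# Placeholder stand-in for the copyrighted file's bytes.
original = b"some copyrighted image bytes ..." * 100

encoded = zlib.compress(original)    # the "new encoding" I would sell
decoded = zlib.decompress(encoded)   # the bundled "reverse the compression" step

assert encoded != original           # the bytes on disk differ...
assert decoded == original           # ...but the original is recovered exactly
```

Which is why calling the compressed file a different work seems hard to defend: the mapping is fully reversible by construction.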
Regarding the open source stuff, there I think you might find more favor to your arguments.
But the stuff we are seeing within commercial enterprises like OpenAI and Midjourney is clearly copyright infringement.
And I don’t see copyright law being insane in these cases.
It would be perfectly legal for a million government agents to go into coffee shops and record what they heard. It is the leaving of government property on private property that is the real issue, as well as transparency... not access to information (please don't do this, my government).
As far as the savant-reading-all-the-books analogy goes... it's a bit off base, mostly because the AI isn't attempting to do that. It would have to be specially prompted to generate that information (which, as far as I understand, is what's happening: people giving verbose, special prompts to 'extract' copyrighted text... though even 'extract' is the wrong word; 'regenerate' is better, considering there's no guarantee the generation will be a perfect reproduction). What is actually happening (fixing the analogy) is this: the savant reads all the books in the library, then someone asks him to generate a brand-new book... which contains some passages that happen to be like those in copyrighted works. This is 100% comparable to what human writers do all the time. Why would we ever want to punish an AI for reading and remembering better than us?
On top of that, selling an imperfect reproduction as if it were the original... that's a lot of additional assumptions to make...
Sadly, the lossless compression is also a bad analogy. It's a mathematical mapping, 100% reversible, and thus no real change to the bits, just a re-encoding. But if you compress it lossily, to the point of doing it artistically, then... if none of the bits are the same, it's not the same picture, and doesn't hold any 'bit' of the old image.
> If I hire an autistic savant to go to a library and read all the books, then I set up a book selling service where whenever people want to buy a book I have my savant employee type out the book for them, is it then going to pass muster in a copyright case if I tell the judge “It’s okay actually, because the books don’t actually exist in my employee’s brain, merely neuronal encodings of them.” ?
No, and I do think OpenAI returning copyrighted works verbatim is probably copyright infringement even if it’s “laundered” through a LLM.
However if the autistic savant only provided summaries, analyses, etc that is fair use (IANAL), and should be for LLMs too.
That probably means LLMs will need some sort of scrubbing process to ensure exact training data can’t be reproduced, or, if that’s not feasible, some type of output filter that looks for training data (although that would be a problem for open source models).
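The output-filter idea could be as simple as n-gram matching against the training corpus. A hypothetical sketch (the function names and the 8-token window size are my own choices, not any real system's): flag any generation that reproduces a long verbatim run from training text.

```python
# Hypothetical verbatim-overlap filter: flag generations that share any
# n-token window with the training corpus. Window size is an assumption;
# real systems would need hashing/sharding to scale, which is elided here.

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_index(corpus_docs, n=8):
    """Set of every n-token window seen anywhere in the training text."""
    index = set()
    for doc in corpus_docs:
        index |= ngrams(doc.split(), n)
    return index

def verbatim_overlap(generated, index, n=8):
    """True if any n-token window of the generation appears in training data."""
    return any(g in index for g in ngrams(generated.split(), n))

idx = build_index(["it was the best of times it was the worst of times"], n=8)
flagged = verbatim_overlap("it was the best of times it was", idx, n=8)  # verbatim run
clean = verbatim_overlap("totally unrelated words that never show up in that corpus", idx, n=8)
```

For open-weight models the filter is trivially removable, which is exactly the problem noted above.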
Without Google, YouTube, and Dropbox, copyrighted material would just be where the copyright holder deemed and authorized it to be.
(save some amount of piracy of course)
It would actually be more in line with the “Internet idea” of a decentralized network, not a massive hub and spoke arrangement.
Saying, “Hey, we already have some big corporations where copyright infringement plays some role in their business model, why not add a few more in the form of “Open”AI and whoever else,” is not a good argument.
The centralized server farms and behemoth corporations are in many ways representative of what the commercial internet has become, but are not a fulfillment of the original “internet idea.”
You seem to have misunderstood what I was meaning to convey.
I wasn’t even meaning to say much in the way of the virtues or not of piracy, or the legitimacy of copyright in its present form.
If Google, YouTube, Dropbox didn’t exist you could still share files with your friends.
Sharing a file with a friend and him benefiting in some way from that is a different thing entirely than massive centralizing corporations flouting copyright rules to benefit commercially and increase their power.
What’s your sense about SETI or the Fermi paradox, if a signal becomes so vastly diluted just within our solar system?
I’m sure the SETI people have thought about this and made various calculations, but with the inverse square law and the vastness of space, maybe “needle in a haystack” is optimistic.
Is it the wrong model to think that anything but maybe a galaxy-scale civilization is just going to have its signals more or less totally dissolved into seemingly random cosmic fluctuations, relative to our sensors/receivers at least?
Maybe. Recovering data over interstellar distances is hard.
Integrating a long time to see if there's a signal there above background levels is maybe not so hard (especially if it was intended for detection in this scenario).
The big issue for data recovery is energy per symbol. If you can integrate for hours, that can still be a lot of "special photons" (whether they're on a weird radio frequency or light wavelength).
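A rough photon budget illustrates the point. All numbers here are my own illustrative assumptions (a 1 MW transmitter 10 light years away, received by a 100 m dish at the 1.42 GHz hydrogen line), not a claim about any actual SETI setup:

```python
import math

P_tx = 1e6                      # transmitter power, W (assumption)
d = 10 * 9.461e15               # 10 light years, in metres
flux = P_tx / (4 * math.pi * d**2)   # inverse square law, W/m^2

area = math.pi * 50.0**2        # collecting area of a 100 m dish, m^2
p_rx = flux * area              # received power, W (absurdly tiny, ~1e-25 W)

h, f = 6.626e-34, 1.42e9        # Planck constant; hydrogen-line frequency, Hz
photons_per_s = p_rx / (h * f)  # received power expressed as photons/second
photons_per_hour = photons_per_s * 3600
```

With these numbers the receiver collects well under one photon per second, yet an hour of integration still accumulates a few hundred "special photons", which is why long integration can pull a deliberate narrowband beacon out of the noise even when the per-second signal is hopeless.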