Yes, you're allowed to make personal copies of copyrighted works that you own. IANAL, but my understanding is that if you're using them for yourself, and you're not prevented from doing so by some sort of EULA or DRM, there's nothing in copyright law preventing you from, e.g., photocopying a book and keeping a copy at home, as long as you don't distribute it. The test case here has always been CDs: you're allowed to make copies of CDs you legally own and keep one at home and one in your car.
To the best of my knowledge, no individual has ever been sued or prosecuted specifically for downloading books. As long as you're not massively sharing them with others, it's not an issue in practice. Enjoy your reading and learning.
Aaron Swartz, cofounder of Reddit, who helped create RSS 1.0 and Markdown, was hounded to death by an overzealous prosecutor for downloading articles from JSTOR with the intent to learn from them. He faced up to a million dollars in fines and up to 35 years in prison.
He and Sam Altman were in the same YC class. OpenAI is doing the same thing at a larger scale, and their technology actually reproduces and distributes copyrighted material. It's shameful that they are claiming they aren't infringing creators' rights when they have scraped the entire internet.
I'm familiar with Aaron Swartz's case, and that is actually why I phrased it as "books". In any case, while tragic, Swartz wasn't prosecuted for copyright infringement, but rather for wire fraud and computer fraud due to the manner in which he bypassed protections in MIT's network and the JSTOR API. This wouldn't have been an issue if he had downloaded the articles from a source that freely shared them, like Sci-Hub.
It would be incredibly naive to assume that the scraping done for these models did not at any point circumvent protections.
The fundamental contention is that both accessed, saved and distributed material that they didn't have a "right" to access, save, and distribute. One was made a billionaire for it and another was driven to suicide. It's not tragic, it's societal malpractice.
> It's shameful that they are making claims that they aren't infringing creator's rights when they have scraped the entire internet.
Scraping the Internet is generally very different from piracy. You are given a limited right to that data when you access it, and you can make local copies. If further use does something sufficiently non-copying, then creator rights aren't being infringed.
> Can you compress the internet including copyrighted material and then sell access to it?
Define access?
If you mean sending out the compressed copy, generally no, at least for the things people normally call compression.
If you want to run a search engine, then you should be fine.
> At what percentage of lossy compression it becomes infringement?
It would have to be very very lossy.
But some AI stuff is. For example there are image models with fewer parameters than source images. Those are, by and large, not able to store enough data to infringe with. (Copying can creep in with images that have multiple versions, but that's a small sliver of the data.)
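The capacity argument can be made concrete with a back-of-envelope calculation: divide the total size of the model's weights by the number of training images. The sketch below uses purely hypothetical numbers (a 1-billion-parameter model stored as 16-bit floats, trained on 2 billion images); they are not figures for any real model, and the result is only an average, since duplicated images can get a disproportionate share.

```python
# Back-of-envelope capacity check: how many bytes of model weights exist,
# on average, per training image? All numbers below are hypothetical.

def bytes_per_training_image(num_params: int, bytes_per_param: float, num_images: int) -> float:
    """Average weight storage per training image.

    If capacity were spread evenly, this bounds how much of any one image
    the model could store. Real memorization is uneven (e.g. duplicated
    images), so this is an average, not a guarantee for every image.
    """
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_params * bytes_per_param / num_images


# Hypothetical model: 1 billion parameters at 2 bytes each (16-bit floats),
# trained on a hypothetical 2 billion images.
budget = bytes_per_training_image(num_params=1_000_000_000, bytes_per_param=2, num_images=2_000_000_000)
print(f"{budget:.1f} bytes per image")  # 1.0 bytes per image
```

At roughly a byte per image, there is nowhere near enough room to hold a recognizable copy of a typical image, which is the point being made above.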
Commercial audio generation models were caught reproducing parts of copyrighted music in a distorted and low-quality form. This is not "learning", just "imitating".
Also, as I understand it, they didn't even buy the CDs with the music used for training; they got it somewhere else. Why don't the organizations that prosecute people for downloading a movie want to look into whether it is OK to build a business on illegal copies of copyrighted works?
When you identify where the infringing party has stored the source material in their artifact.{zip,pdf,safetensor,connectome,etc}. In ML, this discovery stage is called "mechanistic interpretability", and in humans it's called "illegal."
It's not that clear-cut. Since they're talking about taking lossy compression to the limit, there are ways to go so lossy that you're no longer infringing even if you can point exactly at where it's stored.
It was overzealous prosecution of breaking into a closet and wiring up some Ethernet cables to gain access to the materials.
Not of the downloading with intent.
And apparently the most controversial take in this community is the observation that many people would have gone through the trial, taken the plea, and done the time, regardless of how overzealous the prosecution was.
I'm glad you still have that much faith in the system. That's much more faith than I have in the system (and more faith than I had in the system back then, too).
35 years is a press-release sentence. The way DOJ calculates sentences when they write press releases ignores the alleged facts of the particular case and just uses, for each charge, the theoretical maximum sentence that someone could get for that charge.
To actually get that maximum typically requires things like the person being a repeat offender, drug dealing being involved, people being physically harmed, organized crime, terrorism, a large amount of money, or other things that make it an unusually big and serious crime.
The DOJ knows exactly what they are alleging the defendant did. They could easily look at the various factors that affect sentencing for each charge, see which apply to that case, and come up with a realistic number, but that doesn't sound as impressive in a press release.
Another thing that inflates the numbers in the press releases is that defendants are often charged with several related charges. For many crimes there are groups of related charges that for sentencing get merged. If you are charged with say 3 charges from the same group and convicted on all you are only sentenced for whichever one of them has the longest sentence.
If you've got 3 charges from such a group, the press release might take the completely bogus maximum for each, as described above, and simply add those 3 together.
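A minimal sketch of the arithmetic described above, comparing the press-release number (sum of every charge's maximum) with the cap you get when related charges merge and only the longest maximum in each group counts. The charge names, groups, and year values are hypothetical and do not correspond to any real case, and this simplifies real sentencing, which still depends on the case-specific factors mentioned earlier.

```python
# Hypothetical charges: (charge name, merge group, statutory maximum in years).
# None of these names or numbers come from a real case.
charges: list[tuple[str, str, int]] = [
    ("charge A", "group 1", 20),
    ("charge B", "group 1", 5),
    ("charge C", "group 1", 10),
    ("charge D", "group 2", 5),
]


def press_release_years(charges: list[tuple[str, str, int]]) -> int:
    """Press-release style: add up the maximum for every charge."""
    return sum(years for _, _, years in charges)


def merged_group_cap(charges: list[tuple[str, str, int]]) -> int:
    """Merged style: within each group only the longest maximum counts,
    then the per-group maxima are summed."""
    longest: dict[str, int] = {}
    for _, group, years in charges:
        longest[group] = max(longest.get(group, 0), years)
    return sum(longest.values())


print(press_release_years(charges))  # 40
print(merged_group_cap(charges))     # 25
```

Even the merged figure is still a ceiling built from statutory maxima, not a realistic estimate of the sentence.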
Here's a good article on DOJ's ridiculous sentence numbers [1].
Here are a couple of articles from an expert in this area of law that look specifically at what Swartz was charged with and what kind of sentence he was actually facing [2][3].
Why do you think Swartz was downloading the articles to learn from them? As far as I've seen, no one knows for sure what he intended.
If he wanted to learn from JSTOR articles he could have downloaded them using the JSTOR account he had through his research fellowship at Harvard. Why go to MIT and use their public JSTOR WiFi access, and then when that was cut off hide a computer in a wiring closet hooked into their ethernet?
I've seen claims that what he wanted to do was meta-research about scientific publishing as a whole, which could explain why he needed to download more than he could with his normal JSTOR account from Harvard. But again, why do that using MIT's public WiFi access? JSTOR has granted more direct access to large amounts of data for such research. Did he talk to them first to try to get access that way?
He might have wanted other people to have access to the knowledge, and for free. In comparison, AI companies want to sell access to the knowledge they got by scraping copyrighted works.
Truly wow. The sucking up to corporations is terrifying. This, when Aaron Swartz was institutionally murdered by the institutions and the state for "copyright infringement". And what he did wasn't even for profit, or even 0.00001 of the scale of the theft that OpenAI and their ilk have committed.
So it's totally OK to rip off and steal and lie through your teeth AND do it all for money, if you're a company.
But if you're a human being, doing it not for profit but for the betterment of your own fellow humans, you deserve to be imprisoned and systematically murdered and driven to suicide.
Thank you for putting my sentiment into words. THIS. It's not power to the people, it's power to the oligarchs. Once you have enough power and, more importantly, wealth, you're welcomed into the fold with open arms. Just as Spotify built a library of stolen music: as long as wealth was created, there is no problem, because wealth is just money taken from the people and given to the ruling class.
> Internet people say you can, but there's no actual legal argument or case law to support that.
Quite the opposite. The burden of proof is on you to show a single person ever, in history, who has been prosecuted for that.
If nobody in the world has ever been prosecuted for this, then that means it is either legal, or it is something else that is so effectively equivalent to "legal" that there is little point in using a different word.
If you want to take the position that, "uhhhhhhh, there is exactly 0% chance of anyone ever getting in trouble or being prosecuted for this, but I still don't think it's legal, technically!"
Then I guess go ahead. But for those in the real world, those two things are almost equivalent.
> If you want to take the position that, "uhhhhhhh, there is exactly 0% chance of anyone ever getting in trouble or being prosecuted for this, but I still don't think it's legal, technically!"
At home? Without ever sharing it with anyone? I thought making backups of things that you personally own was protected, at least in the US. Could you elaborate on my apparent misunderstanding?
This is a specific exception in Australian copyright law. It allows reproducing works in books, newspapers, and periodical publications in a different form for private and domestic use.
It seems reasonably within the bounds described by fair use, but nobody's ever tested that particular constellation of factors in a lawsuit, so there's no precedent - hand copying a book, that is.
17 U.S.C. § 107 is the fair use carveout.
Interestingly, digitizing and copying a book on your own, for your own private use, has also not been brought to court. Major rights holders seem to not want this particular fair use precedent to be established, which it likely would be, and might then invalidate crucial standing for other cases in which certain interpretations of fair use are preferred.
Digitally copying media you own is fair use. I'll die on that hill.
It doesn't grant commercial rights, you can't resell a copy as if it were the original, and so on, and so forth.
There's even a good case to be made that sharing a legally purchased work you've digitally copied, even with millions of people, should be allowed 5 years after the book is first sold: for the vast majority of books, after 5 years they've sold about 99.99% of the copies they're going to sell.
By sharing after the ~5 year mark, you're arguably doing marketing for the book, and if we cultivated a culture of direct donation to authors and content creators, it invalidates any of the reasons piracy is made illegal in the first place.
Right now publishers, studios, and platforms have a stranglehold on content markets, and the law serves them almost exclusively. It is exceedingly rare for the law to be invoked in defending or supporting an author or artist directly. It's very common for groups of wealthy lawyers LARPing as protectors of authors and artists to exploit the law and steal money from regular people.
Exclusively digital content should have a 3 year protected period, while physical works should get 5, whether it's text, audio, image, or video.
Once something is outside the protected period, it should be considered fair game for sharing until 20 years have passed, at which point it should enter public domain.
Copyright law serves two purposes - protecting and incentivizing content creators, and serving the interests of the public. Situations where a bunch of lawyers get rich by suing the pants off of regular people over technicalities is a despicable outcome.
> there's no precedent - hand copying a book, that is
Thank you! I had looked this up myself last week, so I knew this. I had long believed, as GP does, that copying anything you own without distribution is either allowed or fair use. I wanted GP to learn as I did.
For reference, here's the US legal code in question:
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
The spirit seems apparent, but in practice it's been used by awful people to destroy lives and exploit rent from artists and authors in damn near tyrannical ways.
Except you said "You can't make archival copies." and didn't provide a citation. That's quite a different claim than "there exists no precedent clearly establishing your right or lack thereof to make archival copies".
Congress expressly granted archival rights for digital media. If they wanted to do the same for books they could've done so. There's no law or legal precedent allowing it.
Given all this, "can't do it" is more likely accurate than "can do it". IANAL, but it's not like the question is finely balanced on a knife's edge and could go either way.
Congress didn't explicitly disallow it either. You left that bit out. As such it comes down to interpretation of the existing law. We both clearly agree that doesn't (yet) exist.
> IANAL but it's not like the question is finely balanced on a knife's edge and could go either way.
I agree, but my interpretation is opposite yours. It seems fairly obvious to me that the spirit of the law permits personal copies. That also seems to be in line with (explicitly legislated) digital practices.
But at the end of the day the only clearly correct statement on the matter is "there's no precedent, so we don't know". I suppose it's also generally good advice to avoid the legal quagmire if possible. Being in the right is unlikely to do you any good if it bankrupts you in the process.
That's the whole point of copyright: only the owner of a copyright has the right to make copies. I don't see how it can be more explicit than that. It's a default-deny policy.
There is an archival exception for digital media, so obviously Congress is open to granting exceptions for backup purposes. They chose not to include physical media in this exception.
> only the owner of a copyright has the right to make copies.
You are conveniently omitting the provisions about fair use, which is strange since you're clearly aware of them. The only things copyright is reasonably unambiguous about are sale and distribution. Even then there's lots of grey areas such as performance rights.
You are arguing that something is obviously disallowed but have nothing but your own interpretation to back that up. If the situation was as clear cut as you're trying to make out then where is the precedent showing that personal use archival copies of physical goods are not permitted?
> They chose not to include physical media in this exception.
That's irrelevant to the current discussion, though I'm fairly certain you realize that. Congress declined to weigh in on the matter which (as you clearly know) leaves it up to the courts to interpret the existing law.
> That's irrelevant to the current discussion, though I'm fairly certain you realize that.
I said it because it was relevant.
> where is the precedent showing that personal use archival copies of physical goods are not permitted
> Congress declined to weigh in on the matter
There was no "matter" to "weigh in on". The answer to "Can you make a full, complete copy of a copyrighted work without authorization?" has been "Almost always no" from the beginning of copyright. Even the term "fair use" arose in a US legal precedent over a century after the first copyright laws in England. It became an actual part of US copyright law in the 1970s, less than 50 years ago.
"Fair use" is plausible for a library or archive to make full copies, and keep them safe and archived, since that's their job.
Fair use isn't why we have archival rights for electronic media. That right was written into the law after electronic media became a thing.
In my comment above I gave one example why "fair use" wouldn't work for archival copies of physical media made by individuals. An actual lawyer who's paid by the copyright mafia to care about this stuff can surely find more and stronger reasons.
FWIW, someone in another comment pointed out that Australian copyright law allows making a copy of books, newspapers, and periodicals for personal, domestic use. Which means: a) it can be done, and b) even they had to spell it out specifically.
> which (as you clearly know) leaves it up to the courts to interpret the existing law.
You're repeating upthread comments. And no, you can't. There's an archival exception for electronic media. If you want to make copies of physical media you either:
1. Can't
Or
2. Rely on fair use to protect you (archival by individuals isn't necessarily fair use)
It absolutely is fair use to copy a book for your personal archives.
The fair use criteria consider whether the use is commercial in nature (in this case it is not) and “the effect of the use upon the potential market for or value of the copyrighted work,” which for a personal copy of a personally owned book is nonexistent.
> the effect of the use upon the potential market for or value of the copyrighted work
A copyright holder's lawyer would argue that having and using a photocopy of a book keeps the original from wearing out. This directly affects the potential market for the work, since the owner could resell the book in mint condition, after reading and burning their photocopies.
> You would get laughed at by the legal system trying to prosecute an individual owner for copying a book they bought just to keep.
I mean maybe this is true. But the affected individual will have a very bad year and spend a ton of money on lawyers.
Why do you interpret this to mean "absolutely can't do this"? "No precedent" seems to equally support both sides of the argument (that is, it provides no evidence; courts have not ruled). The other commenters' arguments on the actual text of the statute seem more convincing to me than what you have provided so far.
> The other commenters' arguments...seem more convincing
Because you (and I) want it to be fair use. But as I already said in my comment, it potentially fails one leg of fair use. Keeping your purchased physical copy of the book pristine and untouched while you read the photocopy allows you to later, after destroying the copies you made, resell the book as new or like-new. This directly affects the market for that book.
Do you want to spend time and money in court to find out if it's really fair use? That's what "no precedent" means.
Multiple times in this thread you have made the very confident assertion that this is not allowed, and that it is only allowed for electronic media. That is your opinion, which is fine. The argument that it is fair use is also an opinion. Until it becomes settled law with precedent, every argument about it will just be opinion on what the text of the law means. But you are denigrating the other opinions while upholding your own as truth.
And whether or not I am personally interested in testing any of these opinions is completely beside the point.
Copyright is a restriction on making unauthorized, full copies under almost all circumstances. Default deny. There's only one documented exception on the books right now which is electronic media. None of these are opinions.
The idea that photocopying a book for archival purposes is potentially fair use is an untested opinion. I'm not denigrating that opinion. I just think it's likely to fail as a legal argument in the unlikely event that it comes up. I'm not a copyright apologist.
I myself believed the "fair use for archival"/"format shifting" thing applied to all works for most of my life. I only learned there was no law or precedent like 10 days ago.
> Do you want to spend time and money in court to find out if it's really fair use?
No. I'd much rather pirate the epub followed by lobbying for severe IP law reform. (Of course by "lobby" I actually mean complain about IP law online.)
You’re now arguing from the assumption that, without a precedent, you don’t have the right to do something. That’s not how the law works. If you believe that the courts would laugh at a publisher trying to bring suit against you for doing this, then you believe you have the right to do it.
Such a case would not require a year or a ton of money to defend. In fact, the potential damages would be so small that you could sensibly do it in small claims court.
> You’re now arguing from the assumption that, without a precedent, you don’t have the right to do something. That’s not how the law works.
I mean copyright law has always been "You can't make full copies for any reason (almost)". And you were the one saying "it absolutely is fair use [to make full personal copies]", which is quite a strong statement to make in the absence of a precedent.
An archive could argue fair use to make full copies of physical works, because that's their role, and by keeping the copies locked away they don't harm the market for the works. These fair use factors don't apply to individuals. But IANAL and maybe that's wrong, who knows? I do know if it comes up the copyright mafia will fight it tooth and nail, and I'd put my money on them winning.
> the potential damages would be so small that you could sensibly do it in small claims court
The publisher would sue the infringer in small claims court? This seems very unlikely since the publisher would prefer to scare or bankrupt you into submission.
Or would the defendant have the lawsuit moved to small claims court? Are defendants allowed to do this?