As much as I love the Internet Archive, is it really that crazy? The four factors used for determining fair use are:
* the purpose and character of the use;
* the nature of the copyrighted work;
* the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
* the effect of the use upon the potential market for or value of the copyrighted work.
In the Internet Archive case, they're distributing whole, unmodified copies of copyrighted works which will of course compete with those original works.
In the AI use case, they're typically aiming not to output any significant part of the training data. So they could well argue that the use is transformative, reproducing only minimal parts of the original work and not competing in the market with the original work.
To me, the point isn’t that what the IA was doing was fair use, but that what LLMs are doing arguably is not.
> In the AI use case, they're typically aiming not to output any significant part of the training data
What they’ve aimed to do and what they’ve done are two different things. Models absolutely have produced output that closely mirrors data they were trained on.
> not competing in the market with the original work
This seems like a stretch, if only because I already see how much LLMs have changed my own behavior.
These models exist because of that data, and directly compete by making it unnecessary to seek out the original information to begin with.
But look at your own argument. LLMs are not fair use because they might be prompted into regurgitating something substantially similar to the trained data.
And yet, the IA is 100% aiming to absolutely reproduce literally every part of the work in a 100% complete manner that replaces the original use of the work.
And you cannot bring yourself to admit that the IA is wrong. When you get to that point, you have to admit to yourself that you're not making an argument; you're pushing a dogma.
I’m not arguing that the IA is right or wrong here.
More generally, the point is that there's an asymmetry in how people are thinking about these issues, and I wanted to highlight it.
If, after the various lawsuits shake out, it turns out that LLMs as they currently exist are entirely legal, there's a case to be made that the criteria for establishing fair use are quite broken. In a world where the IA gets in legal trouble for interpreting existing rules too broadly, it seems entirely unjust that LLM companies would get off scot-free for doing something that, from some perspectives, is arguably far worse.
IA was lending digital copies (only one user at a time could read each book), acting like a library lending out physical books, except that IA did it over the Internet, which is more convenient. IA is a non-profit.
What publishers argue is that you cannot treat digital books like physical ones; i.e. you cannot re-sell or lend (like IA did) a digital book.
What LLMs do is use copyrighted content for profit, and they do not lend anything.
> and not competing in the market with the original work
AI absolutely competes in the market with the original works it trains on, and with new works in those same markets. Proponents of unrestricted AI training loudly tout and celebrate that it does so.
Which would be fine, if everyone else had the same rights to completely ignore copyright. The asymmetry here seems critically broken.
> In the Internet Archive case, they're distributing whole, unmodified copies of copyrighted works which will of course compete with those original works.
Libraries would be illegal if conceived of today. If this weren't digital it would be a violation of first sale doctrine.
The actual opinion rules on the concept of controlled digital lending more broadly. From page two:
> "This appeal presents the following question: is it “fair use” for a nonprofit organization to scan copyright-protected print books in their entirety and distribute those digital copies online, in full, for free, subject to a one-to-one owned-to-loaned ratio between its print copies and the digital copies it makes available at any given time, all without authorization from the copyright-holding publishers or authors? Applying the relevant provisions of the Copyright Act as well as binding Supreme Court and Second Circuit precedent, we conclude the answer is no."
No, the IA's CDL system required them to make multiple copies of books (one to digitize the book, and one for every reader of the book), which is not a legal problem a physical library runs into.
Right. I'd like a system where that distinction matters but it seems plain how the courts will arrive at a conclusion that it doesn't, because the law is about the mechanism more than it is about the intent. Still, we were all holding on to a fig leaf of an argument that the intent would control here, and IA has burnt that leaf up, at least in NY, CT, and VT.
I don't understand how AI companies can claim that they're not aiming to output the training data when the loss function is "how well can the model memorize the dataset?".
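For readers unfamiliar with the objective being referred to: LLM pretraining is commonly described as next-token prediction scored with cross-entropy, i.e. the average negative log-probability the model assigns to the token that actually came next in the training text. Below is a minimal sketch of that loss in plain Python. The toy vocabulary, the hand-written probability tables and the function name `next_token_cross_entropy` are hypothetical illustrations, not any particular company's training code. It shows why the comment frames it this way: the loss is lowest (zero) when the model puts all its probability on the exact training continuation.

```python
import math
from typing import Sequence


def next_token_cross_entropy(
    predicted: Sequence[dict[str, float]],
    targets: Sequence[str],
) -> float:
    """Mean negative log-likelihood of the target tokens.

    predicted[i] is a (hypothetical) model's probability distribution over the
    vocabulary at position i; targets[i] is the token that actually came next
    in the training text.
    """
    if len(predicted) != len(targets):
        raise ValueError("need one prediction per target token")
    total = 0.0
    for dist, tok in zip(predicted, targets):
        p = dist.get(tok, 0.0)
        if p <= 0.0:
            # Zero probability on the true next token means infinite loss.
            return math.inf
        total += -math.log(p)
    return total / len(targets)


# Toy training snippet "the cat sat": predict "cat" after "the", "sat" after "cat".
targets = ["cat", "sat"]

# A model that puts all its mass on the exact training continuation...
memorizing = [{"cat": 1.0}, {"sat": 1.0}]
# ...versus one that spreads probability over plausible alternatives.
hedging = [{"cat": 0.5, "dog": 0.5}, {"sat": 0.5, "ran": 0.5}]

print(next_token_cross_entropy(memorizing, targets))  # 0.0
print(next_token_cross_entropy(hedging, targets))     # ~0.693 (ln 2)
```

This only illustrates what the objective rewards on the training text itself. How much a real model ends up reproducing verbatim is a separate, empirical question.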
One of my favourite releases of 2023 was a well restored edition of Laurel & Hardy's first year (1927) of films[1].
The copyright holder had somewhat neglected them, releasing them only from ancient DVD-era masters.
This new release gives the films a full digital restoration based on the best archival materials from around the world.
I genuinely think without public domain day, this never would have happened and I very much hope we see a similar edition of their 1928 films next year.
You might be in luck here! In the US, patents last for only 20 years. Since the Game Boy was released in 1989, any patents have long since expired.
Using the Nintendo logo might (NB: Not a Lawyer!) be permissible if it's required for software to run on the hardware. This was a big part of the Sega v. Accolade case.[1]
There are still new games being released on cartridge for the original Game Boy. The most recent I'm aware of is Ruby & Rusty last year.[2]
Also, it seems like there's some legal stuff around this that I need to research more extensively. But the spirit of what I was trying to say is that it would be awesome to have Nintendo officially bless it... But it's unlikely they ever will.
Didn’t EA go to court with Sega over exactly this? I believe the court said using a trademark as a device to prevent access to a device was not applicable.
Desmond Briscoe - the head of the BBC Radiophonic Workshop - enlists the help of Daphne Oram, David Cain and John Baker to explain the fundamentals of synthesised sound.
Both Dell and Lenovo will sell you laptops with Ubuntu. The selection is admittedly (significantly) smaller than for Windows laptops, but it certainly removes uncertainty over whether the hardware will be supported.
My last two personal laptops came with Ubuntu pre-installed - the first a Dell XPS 13 Developer Edition and more recently a Ryzen-based Lenovo ThinkPad X13. I've been extremely happy with both.
It seems strange to me that, over the last 15 years, hardware makers and sellers haven't tried harder to commoditize their complements and ensure that every computer they sell is at least Linux capable and ready to go.
Unfortunately the vast majority of their customers use Windows. So there is little commercial drive to add Linux support to everything. I'm glad that it's starting with a few models.
There's a lot wrong in that for such a short sentence.
The Beatles were not at all renowned for lip syncing. They grew up playing live in Hamburg and played countless live shows, both during and post-Beatles. Many of their most famous TV appearances including Ed Sullivan were performed live. I'm sure you can find times when they were required by shows to lip sync, but I find the criticism a little bizarre.
As for writing by committee, about 88% of the songs they recorded were written by the band themselves. Which committee are you referring to then? Perhaps you take issue with the Lennon-McCartney songwriting partnership? That would stretch the definition of committee to its limits. For you, any music written by more than one person is unacceptable?
What I appreciated most about 'abcde' was that just prior to the rip it would open up the CDDB output in 'vim' and allow quick and easy edits to what are, sometimes, horrific titling and track entries ...
It's a very nice workflow and avoids a lot of cleanup ...
The UK uses hand-counted paper ballots and often has multiple elections on the same day. You just have a separate ballot paper for each election.
It's not unusual to have votes for some combination of Member of Parliament, Regional Parliament, City Council, Local Mayor and Police Commissioner all on the same day. Members of the European Parliament too, until recently.
It really doesn't take that long. Polls close at 10pm and first results can come out around midnight. Depending on how close the result is, the winner is often known around 6am the next day and the new Prime Minister can be in office by the afternoon.
The UK has a similar system. You are required to go to a specific place to vote but these are normally very local and just for your neighbourhood, e.g. in the local school or church hall. My last three polling stations have been 0.1, 0.6 and 0.2 miles away.
There are 35k polling stations for 47m voters so each station has to process only ~1,300 voters in 15 hours. Queueing is unusual in my experience. Postal votes and voting by proxy are also options.
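The per-station figure is easy to check, and it helps to push it one step further to a per-hour rate. Below is a small back-of-the-envelope sketch using the numbers quoted above. It assumes polls are open 7am to 10pm (the 15 hours mentioned), that every registered voter turns up, and that arrivals are spread evenly across the day. Real turnout is lower and arrivals are bunched around commuting hours, so treat this as a rough upper-bound average.

```python
# Approximate figures from the comment above.
STATIONS = 35_000
VOTERS = 47_000_000
HOURS_OPEN = 15  # assumed 7am-10pm

# Assumes 100% turnout and evenly spread arrivals (both simplifications).
voters_per_station = VOTERS / STATIONS
voters_per_hour = voters_per_station / HOURS_OPEN
seconds_per_voter = 3600 / voters_per_hour

print(f"{voters_per_station:.0f} registered voters per station")
print(f"{voters_per_hour:.0f} voters per hour at each station")
print(f"one voter roughly every {seconds_per_voter:.0f} seconds")
```

Even at full turnout that works out to roughly one voter every 40 seconds per station, which is consistent with queueing being unusual.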
We have it similar in the Czech Republic, except it's around ~700 eligible voters per polling place. Since the upcoming parliamentary elections are likely to have a ~60% participation, a team of several people at any random polling place is facing the insurmountable task of counting ~400 ballots in several hours.
Voting in the UK is a very smooth process. It's very easy to take 5 minutes to go and vote before or after work, or during a break if you work locally.
I was searching for a good ARM board for tinkering a while back and settled on the Odroid N2+. It has a pretty good trade-off between performance and price. It retails for less than $100 and has 6 cores total: 4 performance A73 cores clocked at 2.4 GHz (compared with the 2 GHz A72s here) and 2 efficiency A53 cores at 2 GHz. It at least has good Linux support with Armbian[1] and Arch Linux ARM[2].
Outside of Apple, there are not many modern options. The A73s above are 4 generations old now. There are some ARM laptops with the faster Snapdragon 8CX chip but these still don't come close to the M1.
Amlogic boards have great mainline Linux support, but their firmware/U-Boot setup is full of blobs, can only use an old compiler toolchain, and only works with an old version of U-Boot. I am linking to the ODROID-N2 instructions below, but if you look at any of the Amlogic boards, it is a similar situation (including the beloved Khadas VIM3).