More

Intralexical · 2025-06-17T01:31:20 1750123880

I read the second paragraph as a reference to compressibility of the resulting stream, not the contents of the encoded/discarded data.

Only random noise is incompressible, so realistic scenes allow compression rates over 100X without a 100X quality loss.

strogonoff · 2025-06-17T12:18:15 1750162695

It is not even about YouTube data rates but about display media limitations. There is not going to be any sort of realistic scene data going over the wire just because of that. 99% of it has to be discarded because it cannot be displayed. It cannot be discarded automatically, because what should be discarded is a creative decision. Even if you could compress 5+ gigabit per second into 20 megabit per second losslessly, it is a pure waste of CPU.

Also, noise is desirable. Even if you magically can discern on the fly at 30 or 60 fps, and at 5 gigabit/second, what is noise and what is fine details and texture in a real scene, which is technically impossible (remember, it is a creative task because you cannot automatically determine even neutral grey), eliminating noise would result in a fake-ish washed look.

Intralexical · 2025-06-12T18:24:09 1749752649

We've basically flooded the information space with r-strategists.

In evolution, rapid reproduction gives an advantage to spamming low-quality offspring [1], and rapid selection without agglomeration [2,3] incentivizes antisocial behavior.

Ideas spread, mutate, and evolve just like animals [4]. So when the Internet made it free for anyone to transmit information to millions of people instantly, trustworthy information sources [5] and prosocial cultural values started dying [6], as literally the worst and craziest people become dominant [7,8,9,10,11].

...Presumably "AI" is going to make this even worse, and immeasurably so.

---

1. https://en.wikipedia.org/wiki/R/K_selection_theory

2. https://journals.plos.org/plosone/article?id=10.1371/journal...

3. https://en.wikipedia.org/wiki/Context_collapse

4. https://en.wikipedia.org/wiki/Memetics

5. https://en.wikipedia.org/wiki/Decline_of_newspapers

6. https://theweek.com/culture-life/third-places-disappearing

7. https://globalnews.ca/news/1157137/internet-trolls-are-sadis...

8. https://www.engadget.com/2018-03-19-study-shows-distribution...

9. https://old.reddit.com/r/slatestarcodex/comments/9rvroo/most...

10. https://www.ipr.northwestern.edu/news/2023/why-are-online-po...

11. https://en.wikipedia.org/wiki/Kakistocracy

Intralexical · 2025-06-12T18:10:54 1749751854

I think this is an unfortunate misunderstanding of why people like and use social media.

ChatGPT can already generate endless "comments". And yet you're here.

esafak · 2025-06-12T18:26:18 1749752778

I agree with you, but the owners and customers of ad-supported communities like Facebook have an incentive to inject clickbait from fake users posing as humans. Facebook is already well along that road, with a captive if aging userbase. Maybe they'd flee if they had an alternative.

throwaway290 · 2025-06-12T18:39:43 1749753583

Most people on social media don't interact, just consume. You are talking to small self selected minority

If there was an HN clone generated on the fly for you or the guy you replied to then what's the difference? Especially if you imagine you didn't know it was generated. That's the problem with this tech, for you there's no difference. But probably a difference for society.

littlestymaar · 2025-06-12T18:28:27 1749752907

You must have missed a train or two over the past fifteen years, so called “social media” have very little to do with the social part they used to, they are mostly algorithmically driven dopamine shots to capture user's attention, and TikTok is the purest form of it.

pphysch · 2025-06-12T18:26:14 1749752774

HN users that actively engage with comments are probably 0.0001% of overall social media users

Thaxll · 2025-06-12T19:43:36 1749757416

No one knows if you're a bot or a human.

csomar · 2025-06-13T12:05:00 1749816300

> And yet you're here.

Some people are talking to ChatGPT though. We are here, for now.

Intralexical · 2025-06-13T15:25:22 1749828322

Some people are stimulating themselves with opioids too. Outside the Hacker News bubble, which statistically has elevated incentives to push "the future is AI", the consensus I've seen is revulsion towards fully algorithmically generated content streams.

This is like seeing Frito-Lay's stock price rise, and concluding that restaurants are doomed in the future. There's about the same amount of equivalence.

Intralexical · 2025-06-12T18:02:43 1749751363

I think at least some of them will come up to Canada. No language barrier, and close enough geographically and culturally to keep most of your connections. We pay less but live longer on average.

Seeing all this unfold is doing amazing things for our national pride, ironically.

FuriouslyAdrift · 2025-06-12T18:54:52 1749754492

Canada is on the brink of serious economic trouble...

Recession, housing costs, restricted immigration choices coming soon. This has been brewing for years but the US trade war sped up the timeline.

https://nationalpost.com/opinion/high-immigration-is-worseni...

Intralexical · 2025-06-12T17:58:14 1749751094

The types of personality that turn into an authoritarian when given power tend to have damaged senses of both internal identity and external reality.

When Trump says "America First" or "Leftists hate our country", what he really means is himself. He's not really lying; "America", in that context, is just an extension of his own ego. Likewise when Putin talks about the "Russian state" or "Russian world", that's something that he conceptualizes as an extension of his own physical body. The channels run by Vlad Vexler on YouTube have an accessible discussion of some of this if you're more interested.

It's not that he wants to hurt federal workers, set back science, or destroy US state capacity. But he does so anyway, because his concept of "America" is one that stops at his own ego. Other people aren't really real to him.

Intralexical · 2025-06-12T02:46:02 1749696362

> Despite what I've said I for the life of me cannot understand why people trust Google with their data without first backing it up elsewhere.

Because most people are non-technical, most technically skilled people assume a basic level of good faith and transparency from society, and even people who are both technical and cynical have finite hours for setting up NextCloud.

And by non-technical, I mean "doesn't know what a folder is", not "might struggle with long Docker commands". OneDrive and Google Photos already describe themselves as a "Backup" in the UI, so why would you need another backup unless they're lying?

Big Tech can obfsucate their bad behavior behind technical complexity, simply not care about operating in good faith, and still attain market dominance through convenient offerings and manipulative practices.

Imagine your pizza shop taking back a pie they'd already delivered, or your local bank branch unilaterally deciding to incinerate your safe deposit box. They'd never get away with it. But Google can.

Instead of faulting people for trusting them, we should hold Big Tech accountable for not being trustworthy.

Intralexical · 2025-06-12T02:32:59 1749695579

You can't fault people for using a popular service that's advertised as safe.

Maybe the solution is laws requiring prominent disclosure of high-impact practices like this. How many people would use Youtube if the sign-up page said, in bold red letters:

  WE CAN AND WILL DELETE USERS' ENTIRE ACCOUNT AND LIVELIHOOD AT ANY TIME.
  YOU WILL HAVE NO RECOURSE OR EXPLANATION.
  WE DON'T REALLY EVEN EMPLOY HUMANS FOR CUSTOMER SUPPORT.

Only half joking.

Intralexical · 2025-05-12T16:18:37 1747066717

A test to apply here: If you or I did this, would it be illegal? Would we even be having this conversation?

The law is supposed to be impartial. So if the answer is different, then it's not really a law problem we're talking about.

Intralexical · 2025-05-12T15:42:40 1747064560

> The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

The direction we're going, it seems more likely it'll be recycling to murder a human.

Intralexical · 2025-05-12T15:38:49 1747064329

Why do you think the obvious analogy is LLM=Human, and not LLM=JPEG or LLM=database?

I think you're overstating the legal uniqueness of LLMs. They're covered just fine by the existing legal precedents around copyrighted and derived works, just as building a death star would be covered by existing rules around outer space use and WMDs. Pretending they should be treated differently is IMO the entire lie told by the "AI" companies about copyright.

sdenton4 · 2025-05-12T16:01:37 1747065697

LLMs are certainly not a jpeg or a database...

The google news snippets case is, in my non-lawyer opinion, the most obvious touch point. And in that case, it was decided that providing large numbers of snippets in search results was non-infringing, despite being a case of copying text from other people at-scale... And the reasons this was decided are worth reading and internalizing.

There is not an obvious right answer here. Copyright rules are, in fact, Calvinball, and we're deep in uncharted territory.

Intralexical · 2025-05-12T16:09:37 1747066177

> LLMs are certainly not a jpeg or a database...

Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material. And the output directly competes against the copyrighted source materials.

The fact they're smudgy and non-deterministic doesn't change how they relate to the rights of authors and artists.

sdenton4 · 2025-05-12T21:01:42 1747083702

Nothing in copyright law talks about 'semantic meaning' or 'character of the source material'. Really, quite the opposite - the 'expression-idea dichotomy' says that you're copyrighting the expression of an idea, not the idea itself. https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...

(Leaving aside whether the weights of an LLM does actually encode the content of any random snippet of training text. Some stuff does get memorized, but how much and how exactly? That's not the point of the LLM, unlike the jpeg or database.)

And, again, look at the search snippets case - these were words produced by other people, directly transcribed, so open-and-shut from a certain point of view. But the decision went the other way.

Suppafly · 2025-05-12T21:08:12 1747084092

>Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material.

That sounds like you're arguing that they should be legal. Copyright law protects specific expressions, not handwavy "smudgy and non-deterministic" things.

johnnyanmac · 2025-05-13T02:09:52 1747102192

Llms can't express, that's the primary issue. You can't just make a collage of copyrighted works and shield yourself from copyright with "expression".

sdenton4 · 2025-05-13T18:43:00 1747161780

That's certainly an opinion.

Suppafly · 2025-05-13T18:46:43 1747162003

>You can't just make a collage of copyrighted works and shield yourself from copyright with "expression".

And yet collage artists do that all the time.

johnnyanmac · 2025-05-14T08:16:13 1747210573

I'll remind you that all fanart is technically in a gray area of copyright infringement. Legally speaking, companies can take down and charge infringement for anything using their IP thars not under fair use. Collages don't really pass that benchmark.

Yoinnking their up and mass producing slop sure is a line to cross, though.

temporalparts · 2025-05-14T14:17:40 1747232260

I'm not an expert, but I thought fan art that people try to monetize in some form is explicitly illegal unless it's protected by parody, and any non commercial "violations" of copyright is totally legal. Disney can't stop me from drawing Mickey in the privacy of my own house, just monetizing/getting famous off of them.

SilasX · 2025-05-12T16:43:20 1747068200

The problem is, you can say all of that for human learning-from-copyrighted-works, so that point isn't definitive.

const_cast · 2025-05-12T21:18:44 1747084724

The difference is we're humans, so we get special privileges. We made the laws.

If we're going to be giving some rights to LLMs for convenient for-profit ventures, I expect some in-depth analysis on whether that is or is not slavery. You can't just anthropomorphize a computer program when it makes you money but then conveniently ignore the hundreds of years of development of human rights. If that seems silly, then I think LLMs are probably not like humans and the comparisons to human learning aren't justified.

If it's like a human, that makes things very complicated.

johnnyanmac · 2025-05-13T02:13:01 1747102381

Scales of effect always come into play when enacting law. If you spend a day digging a whole on the beach, you're probably not going to incur much wrath. If you bring a crane to the beach, you'll be stopped because we know the hole that can be made will disrupt the natural order. A human can do the same thing eventually, but does it so slowly that it's not an issue to enforce 99.9% of the time.

SilasX · 2025-05-13T05:41:36 1747114896

That's just the usual hand-wavy, vague "it's different" argument. If you want to justify treating the cases differently based on a fundamental difference, you need to be more specific. For example, they usually define an amount of rainwater you can collect that's short of disrupting major water flows.

So what is the equivalent of "digging too much" in a beach for AI? What fundamentally changes when you learn hyper-fast vs just read a bunch of horror novels to inform better horror novel-writing? What's unfair about AI compared to learning from published novels about how to properly pace your story?

These are the things you need to figure out before making a post equating AI learning with copyright infringement. "It's different" doesn't cut it.

kbelder · 2025-05-12T23:55:32 1747094132

If they were a database, they would be unquestionably legal, because they're only storing a tiny fraction of one percent of the data from any document, and even that data is not any particular replica of any part of the document, but highly summarized and transformed.

johnnyanmac · 2025-05-13T02:14:08 1747102448

Given that you can in fact prompt enough to reproduce a source image, I'm not convinced that is the actual truth of the matter.