
"However, this is not just a cyclical shortage driven by a mismatch in supply and demand, but a potentially permanent, strategic reallocation of the world’s silicon wafer capacity. [...] This is a zero-sum game: every wafer allocated to an HBM stack for an Nvidia GPU is a wafer denied to the LPDDR5X module of a mid-range smartphone or the SSD of a consumer laptop."

I wonder if this will result in writing more memory-efficient software? The trend for the last couple of decades has been that nearly all consumer software outside of gaming has moved to browsers or browser-based runtimes like Electron. There's been a vicious cycle of heavier software -> more RAM -> heavier software, but if this RAM shortage is permanent, the cycle can't continue.

Apple and Google seem to be working on local AI models as well. Will they have to scale that back due to a lack of RAM on devices? Or perhaps they think users will pay a premium for more RAM if it means they get AI?

Or is this all a temporary problem due to OpenAI's buying something like 40% of the wafers?

> I wonder if this will result in writing more memory-efficient software?

If the consumer market can't get cheap RAM anymore, the natural result is a pivot back to server-heavy technology (where all the RAM is anyway) with things like server-side rendering and thin clients. Developers are far too lazy to suddenly become efficient programmers and there's plenty of network bandwidth.


Developers would prefer to write good software; the challenge and the craftsmanship are a draw.

However, the customers do not care and will not pay more, so the business cannot justify it most of the time.

Who will pay twice (or five times) as much for software written in C instead of Python? Not many.


Well, this is patently false. For the past three decades, programmers have intentionally made choices that perform as poorly as the hardware will allow. You can pretty much draw a parallel line between hardware advancement and the bloating of software.

Displaying hypermedia hasn't gotten 100x harder than it was 20 years ago, yet applications use 10x-100x more memory and CPU than they used to. That's not good software; that's lazy software.

I just loaded "aol.com" in Firefox private browsing. It transferred 25MB, the tab is using 307MB of RAM, and the JavaScript console shows about 100 errors. Back when I actually used AOL, that would have been nearly 10x more RAM than my system had, and it would have been one of the largest applications on my machine. Aside from the one video, the entire page is just formatted text and image thumbnails.


> You can pretty much draw a parallel line with hardware advancement and the bloating of software.

I do not think it is surprising that there is a Jevons paradox-like phenomenon with computer memory, and as with other instances of it, it does not necessarily follow that this must be the result of a corresponding decline in resource-usage efficiency.


This is by design. Rent your computer, don't buy! Use GeForce Now!

There is a small part of me that wonders if my $3000 computer is worth it when that could get me about 12 years of GeForce Now gaming with an up-to-date graphics card and processor at all times. But I like to tinker, so I'll probably end up spending $10k or more by the end of that 12 years instead.

There's plenty of scope for local AI models to become more efficient, too. MoE doesn't need too much RAM: only the parameters for the experts that are active at any given time truly need to be in memory; the rest can sit in read-only storage and be fetched on demand. If you're doing CPU inference, this can even be managed automatically by mmap, whereas loading params into VRAM currently has to be managed as part of running an inference step. (This is where GPU drivers/shader languages/programming models could also see some improvement, TBH.)
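Here's a minimal sketch of the idea, assuming a made-up file layout ("experts.bin" holding one contiguous block of float32 weights per expert) and using numpy.memmap so the OS only pages in the experts that actually get touched; names and sizes are hypothetical:

    # Hypothetical layout: NUM_EXPERTS blocks of float32 weights in one file.
    # numpy.memmap maps the file without reading it; pages are faulted in
    # lazily, so only the experts we index into ever come off the disk.
    import numpy as np

    NUM_EXPERTS = 8
    EXPERT_PARAMS = 4_000_000  # params per expert -- made-up number

    weights = np.memmap("experts.bin", dtype=np.float32, mode="r",
                        shape=(NUM_EXPERTS, EXPERT_PARAMS))

    def run_active_experts(x, active_ids):
        # Only the rows touched here are paged in by the OS.
        out = np.zeros_like(x)
        for i in active_ids:
            w = weights[i][: x.size]   # stand-in for a real expert's weights
            out += x * w               # stand-in for a real expert forward pass
        return out / len(active_ids)

With mmap-based CPU inference the page cache gives you roughly this behavior for free; on a GPU you would have to stream the slices into VRAM yourself.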

But aren't the experts chosen on a token-by-token basis, which means bandwidth limitations?

Yes, and the direct conclusion from that is, tl;dr: in theory OP's explanation could mitigate RAM usage; in practice, it's worse.

(Source: I maintain an app integrated with llama.cpp. In practice, no one likes the 1 tkn/s generation speed you get from swapping, and honestly MoE makes the RAM situation worse, because model developers have servers, batch inference, and multiple GPUs wired together. They are more than happy to increase the resting RAM budget and use even more parameters; limiting the active experts is about inference speed from that lens, not anything else.)
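Rough arithmetic, with assumed and illustrative numbers only, for why paging experts in from storage caps generation speed even when the active set is small:

    # Illustrative numbers only: per-token reads vs. where the weights live.
    active_bytes_per_token = 2 * 3e9 * 1   # 2 active experts x 3B params x 1 byte (8-bit)
    ssd_bandwidth = 3e9                    # ~3 GB/s NVMe sequential read (assumed)
    ram_bandwidth = 60e9                   # ~60 GB/s dual-channel DDR5 (assumed)

    print(active_bytes_per_token / ssd_bandwidth, "s/token if experts page in from SSD")
    print(active_bytes_per_token / ram_bandwidth, "s/token if experts stay resident in RAM")
    # -> 2.0 s/token from SSD vs. 0.1 s/token from RAM, in the worst case where
    #    the router picks different experts every token.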


MoE works exactly the opposite way from what you described. MoE means that each inference pass reads a subset of the parameters, which means you can run a bigger model with the same memory bandwidth and achieve the same number of tokens per second. This means you're using more memory in the end.
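As a back-of-the-envelope comparison with made-up model sizes (ignoring shared layers and KV cache), the trade is less bandwidth per token in exchange for more resident memory:

    # Made-up sizes to show the trade-off: dense vs. MoE at fp16 (2 bytes/param).
    BYTES = 2
    dense_params = 70e9              # dense model: every param read each token
    moe_total    = 8 * 15e9          # 8 experts of 15B params, all resident
    moe_active   = 2 * 15e9          # 2 experts actually read per token

    print("dense:", dense_params * BYTES / 1e9, "GB resident,",
          dense_params * BYTES / 1e9, "GB read per token")
    print("MoE:  ", moe_total * BYTES / 1e9, "GB resident,",
          moe_active * BYTES / 1e9, "GB read per token")
    # -> dense: 140.0 GB resident, 140.0 GB read per token
    # -> MoE:   240.0 GB resident,  60.0 GB read per token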

It's not a zero-sum game, because silicon wafers are not a finite resource. Industry can and will produce more.

If the industry has even a bit of fear that demand will slow down by the time it can output a meaningful number of chips, then probably not. Time will tell.

Neither are paperclips.

I'm waiting for the good AI-powered software... Any day now.

Ideally, LLMs should be able to translate code from memory-inefficient languages to memory-efficient ones, and maybe even optimize the underlying algorithms' memory use along the way.

But I'm not going to hold my breath


This is a temporary problem driven by the AI bubble. It's going to hurt until the bubble pops, but when that happens, other things are going to hurt.

The code that AI will produce will solve the memory-usage problems, which are themselves the result of lazy or poor human coders.

Nice assertion. Perhaps you meant that AI could be directed towards less memory-intensive implementations. That would still have to be directed by those same lazy/poor coders, because the code the AI is learning from is their bad code (for the most part).

IDK, given the prevalence of Electron and other technically-correct-but-inefficient code out there, at a bare minimum it would require decent prompting to help.

> There's been a vicious cycle of heavier software -> more RAM -> heavier software but if this RAM shortage is permanent, the cycle can't continue.

What do you mean it can't continue? You'll just have to deal with worse performance is all.

Revolutionary consumer-side performance gains like multi-core CPUs and the switch to SSDs will be a thing of the distant past. Enjoy your 2-second animations, peasant.



