
"However, this is not just a cyclical shortage driven by a mismatch in supply and demand, but a potentially permanent, strategic reallocation of the world’s silicon wafer capacity. [...] This is a zero-sum game: every wafer allocated to an HBM stack for an Nvidia GPU is a wafer denied to the LPDDR5X module of a mid-range smartphone or the SSD of a consumer laptop."

I wonder if this will result in writing more memory-efficient software? The trend for the last couple of decades has been that nearly all consumer software outside of gaming has moved to browsers or browser-based runtimes like Electron. There's been a vicious cycle of heavier software -> more RAM -> heavier software, but if this RAM shortage is permanent, the cycle can't continue.

Apple and Google seem to be working on local AI models as well. Will they have to scale that back due to a lack of RAM on devices? Or perhaps they think users will pay a premium for more RAM if it means they get AI?

Or is this all a temporary problem due to OpenAI's buying something like 40% of the wafers?

> I wonder if this will result in writing more memory-efficient software?

If the consumer market can't get cheap RAM anymore, the natural result is a pivot back to server-heavy technology (where all the RAM is anyway) with things like server-side rendering and thin clients. Developers are far too lazy to suddenly become efficient programmers and there's plenty of network bandwidth.


Developers would prefer to write good software; the challenge and the craftsmanship are a draw.

However, the customers do not care and will not pay more, so the business cannot justify it most of the time.

Who will pay twice (or five times) as much for software written in C instead of Python? Not many.


Well, this is patently false. For the past three decades, programmers have intentionally made choices that perform as poorly as the hardware will allow. You can pretty much draw a parallel line between hardware advancement and the bloating of software.

Displaying hypermedia hasn't gotten 100x harder than it was 20 years ago, yet applications use 10x-100x more memory and CPU than they used to. That's not good software; that's lazy software.

I just loaded "aol.com" in Firefox private browsing. It transferred 25MB, the tab is using 307MB of RAM, and the JavaScript console shows about 100 errors. Back when I actually used AOL, that would have been nearly 10x more RAM than my system had, and it would have been one of the largest applications on my machine. Aside from the one video, the entire page is just formatted text and image thumbnails.


> You can pretty much draw a parallel line with hardware advancement and the bloating of software.

I do not think it is surprising that there is a Jevons paradox-like phenomenon with computer memory, and as with other instances of it, it does not necessarily follow that this must be the result of a corresponding decline in resource-usage efficiency.


This is by design. Rent your computer, don't buy! Use GeForce Now!

There is a small part of me that wonders if my $3000 computer is worth it when that could get me about 12 years of GeForce Now gaming with an up-to-date graphics card and processor at all times. But I like to tinker, so I'll probably end up spending $10k or more by the end of that 12 years instead.

There's plenty of scope for local AI models to become more efficient, too. MoE doesn't need too much RAM: only the parameters for the experts that are active at any given time truly need to be in memory; the rest can sit in read-only storage and be fetched on demand. If you're doing CPU inference, this can even be managed automatically by mmap, whereas loading params into VRAM currently has to be managed as part of running an inference step. (This is where GPU drivers/shader languages/programming models could also see some improvement, TBH.)
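Here's a minimal sketch of the idea, assuming a made-up file layout ("experts.bin" holding one contiguous block of float32 weights per expert) and using numpy.memmap so the OS only pages in the experts that actually get touched; names and sizes are hypothetical:

    # Hypothetical layout: NUM_EXPERTS blocks of float32 weights in one file.
    # numpy.memmap maps the file without reading it; pages are faulted in
    # lazily, so only the experts we index into ever come off the disk.
    import numpy as np

    NUM_EXPERTS = 8
    EXPERT_PARAMS = 4_000_000  # params per expert -- made-up number

    weights = np.memmap("experts.bin", dtype=np.float32, mode="r",
                        shape=(NUM_EXPERTS, EXPERT_PARAMS))

    def run_active_experts(x, active_ids):
        # Only the rows touched here are paged in by the OS.
        out = np.zeros_like(x)
        for i in active_ids:
            w = weights[i][: x.size]   # stand-in for a real expert's weights
            out += x * w               # stand-in for a real expert forward pass
        return out / len(active_ids)

With mmap-based CPU inference the page cache gives you roughly this behavior for free; on a GPU you would have to stream the slices into VRAM yourself.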

But aren't the experts chosen on a token-by-token basis, which means bandwidth limitations?

Yes, and the direct conclusion from that is, tl;dr: in theory OP's explanation could mitigate RAM usage; in practice, it's worse.

(Source: I maintain an app integrated with llama.cpp. In practice, no one likes the 1 tkn/s generation speed you get from swapping, and honestly MoE makes the RAM situation worse, because model developers have servers, batch inference, and multiple GPUs wired together. They are more than happy to increase the resting RAM budget and use even more parameters; limiting the active experts is about inference speed from that lens, not anything else.)
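Rough arithmetic, with assumed and illustrative numbers only, for why paging experts in from storage caps generation speed even when the active set is small:

    # Illustrative numbers only: per-token reads vs. where the weights live.
    active_bytes_per_token = 2 * 3e9 * 1   # 2 active experts x 3B params x 1 byte (8-bit)
    ssd_bandwidth = 3e9                    # ~3 GB/s NVMe sequential read (assumed)
    ram_bandwidth = 60e9                   # ~60 GB/s dual-channel DDR5 (assumed)

    print(active_bytes_per_token / ssd_bandwidth, "s/token if experts page in from SSD")
    print(active_bytes_per_token / ram_bandwidth, "s/token if experts stay resident in RAM")
    # -> 2.0 s/token from SSD vs. 0.1 s/token from RAM, in the worst case where
    #    the router picks different experts every token.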


MoE works exactly the opposite way from what you described. MoE means that each inference pass reads a subset of the parameters, which means you can run a bigger model with the same memory bandwidth and achieve the same number of tokens per second. This means you're using more memory in the end.
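As a back-of-the-envelope comparison with made-up model sizes (ignoring shared layers and KV cache), the trade is less bandwidth per token in exchange for more resident memory:

    # Made-up sizes to show the trade-off: dense vs. MoE at fp16 (2 bytes/param).
    BYTES = 2
    dense_params = 70e9              # dense model: every param read each token
    moe_total    = 8 * 15e9          # 8 experts of 15B params, all resident
    moe_active   = 2 * 15e9          # 2 experts actually read per token

    print("dense:", dense_params * BYTES / 1e9, "GB resident,",
          dense_params * BYTES / 1e9, "GB read per token")
    print("MoE:  ", moe_total * BYTES / 1e9, "GB resident,",
          moe_active * BYTES / 1e9, "GB read per token")
    # -> dense: 140.0 GB resident, 140.0 GB read per token
    # -> MoE:   240.0 GB resident,  60.0 GB read per token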

It's not a zero-sum game, because silicon wafers are not a finite resource. Industry can and will produce more.

If the industry has even a bit of fear that demand will slow down by the time it can output a meaningful number of chips, then probably not. Time will tell.

Neither are paperclips.

I'm waiting for the good AI-powered software... Any day now.

Ideally, LLMs should be able to translate code from memory-inefficient languages to memory-efficient ones, and maybe even optimize the underlying algorithms' memory use along the way.

But I'm not going to hold my breath


This is a temporary problem driven by the AI bubble. It's going to hurt until the bubble pops, but when that happens, other things are going to hurt.

The code that AI will produce will solve the memory-usage problems, which are themselves the result of lazy or poor human coders.

Nice assertion. Perhaps you meant that AI could be directed towards less memory-intensive implementations. That would still have to be directed by those same lazy/poor coders, because the code the AI is learning from is their bad code (for the most part).

IDK, given the prevalence of Electron and other technically-correct-but-inefficient code out there, at a bare minimum it would require decent prompting to help.

> There's been a vicious cycle of heavier software -> more RAM -> heavier software but if this RAM shortage is permanent, the cycle can't continue.

What do you mean it can't continue? You'll just have to deal with worse performance is all.

Revolutionary consumer-side performance gains like multi-core CPUs and the switch to SSDs will be a thing of the distant past. Enjoy your 2-second animations, peasant.



