
The $500B is just an aspirational figure they hope to spend on data centers to run AI models, such as GPT-o1 and its successors, that have already been developed.

If you want to compare the DeepSeek-R1 development costs to anything, you should be comparing it to what it cost OpenAI to develop GPT-o1 (not what they plan to spend to run it), but both numbers are somewhat irrelevant since they both build upon prior research.

Perhaps what's more relevant is that DeepSeek are not only open sourcing DeepSeek-R1, but have described in a fair bit of detail how they trained it, and how it's possible to use data generated by such a model to fine-tune a much smaller model (without needing RL) to significantly improve its "reasoning" performance.
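
To make the fine-tuning part concrete, here's a minimal sketch of that distillation recipe as I understand it from the R1 report: sample reasoning traces from the big model, then do plain supervised fine-tuning of a small model on them (no RL, no reward model). The student model name, toy dataset, and hyperparameters below are illustrative assumptions, not DeepSeek's actual setup.

    # Minimal distillation sketch: fine-tune a small "student" model on
    # reasoning traces sampled from a larger teacher (e.g. R1).
    # Student name and the toy dataset are made up for illustration.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    student_name = "Qwen/Qwen2.5-1.5B"  # hypothetical small student
    tok = AutoTokenizer.from_pretrained(student_name)
    student = AutoModelForCausalLM.from_pretrained(student_name)
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

    # Each record pairs a prompt with a chain-of-thought answer sampled from
    # the teacher; in practice these would be filtered for correctness.
    traces = [
        {"prompt": "What is 17 * 24?",
         "answer": "<think>17*24 = 17*20 + 17*4 = 340 + 68 = 408</think> 408"},
    ]

    student.train()
    for record in traces:
        text = record["prompt"] + "\n" + record["answer"] + tok.eos_token
        batch = tok(text, return_tensors="pt")
        # Plain next-token cross-entropy over the whole sequence; masking the
        # prompt tokens out of the loss is a common refinement, omitted here.
        out = student(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

The point is that the expensive RL stage only has to happen once, on the big model; everything downstream is ordinary supervised training.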

This is all raising the bar on the performance you can get for free, or run locally, which reduces what companies like OpenAI can charge for it.






Thinking of the $500B as only an aspirational number is wrong. It’s true that the specific Stargate investment isn’t fully invested yet, but that’s hardly the only money being spent on AI development.

The existing hyperscalers have already sunk ungodly amounts of money into literally hundreds of new data centers, millions of GPUs to fill them, chip manufacturing facilities, and even power plants, on the expectation that the amount of compute required to train and run these models would create enough demand to pay for that investment. Literally hundreds of billions of dollars have already been spent on hardware that's half (or fully) built and isn't easily repurposed.

If all of the expected demand for that stuff completely falls through because it turns out the same model training can be done on a fraction of the compute power, we could be looking at a massive bubble pop.


If the hardware can be used more efficiently to do even more work, the value of the hardware will hold, since demand will not shrink but will actually increase much faster than supply.

Efficiency going up tends to increase demand by much more than the efficiency-induced supply increase.
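
As a toy illustration with entirely made-up numbers: if efficiency gains cut the cost per token 10x but usage grows 20x in response, total spend on compute hardware goes up, not down.

    # Toy numbers only: cost per token drops 10x, usage grows 20x in response.
    cost_before, tokens_before = 1.0, 100        # arbitrary units
    cost_after, tokens_after = 0.1, 100 * 20     # 10x cheaper, 20x more usage

    print(cost_before * tokens_before)  # 100.0 -> compute spend before
    print(cost_after * tokens_after)    # 200.0 -> compute spend after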

Assuming the world is hungry for as much AI as it can get, which I think is true; we're nowhere near the peak of leveraging AI. We've barely gotten started.


Perhaps, but this is not guaranteed. For example, demand might shift from datacenter to on-site inference when high-performing models can run locally on consumer hardware. Kind of like how demand for desktop PCs went down in the 2010s as mobile phones, laptops, and iPads became more capable, even though desktops also kept getting more capable. People found that running apps on their phone was good enough. Now perhaps everyone will want to run inference on-site for security and privacy, so demand might shift away from big datacenters into desktops and consumer-grade hardware, and those datacenters will be left bidding each other down, looking for workloads.

Inference is not where the majority of this CAPEX is used. And even if it were, monetization will no doubt discourage developers from shipping the secret sauce to user-controlled devices. So I posit that data centre inference is safe for a good while.

> Inference is not where the majority of this CAPEX is used

That's what's baffling about DeepSeek's results: they spent very little on training (at least that's what they claim). If true, it's a complete paradigm shift.

And even if it's false, the wider AI usage becomes, the bigger the share of inference will be, and inference cost will become the main cost driver at some point anyway.


You're looking at one model, and you do realize it isn't even multimodal? Also, it shifts training compute to inference compute. They're shifting the paradigm for this architecture of LLMs, but I don't think this is really new either.

> it shifts training compute to inference compute

No, that's the change introduced by o1; what's different with R1 is that its use of RL is fundamentally different (and cheaper) than what OpenAI did.
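
For anyone wondering what "fundamentally different (and cheaper)" refers to: the R1 report describes GRPO, where a group of answers is sampled per prompt, scored with simple rule-based rewards (e.g. did the final answer check out), and the group-normalized rewards serve as advantages, so no separate critic/value model has to be trained. A rough sketch of just that advantage step, with made-up reward values (the helper name here is mine, not theirs):

    # Sketch of GRPO's group-relative advantage step as described in the
    # DeepSeek papers: normalize rewards within the group of answers sampled
    # for the same prompt instead of training a value network.
    import numpy as np

    def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
        """Advantage of each sampled answer relative to its own group."""
        return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # e.g. 8 answers sampled for one math prompt, reward 1 if the answer is correct
    rewards = np.array([1, 0, 0, 1, 1, 0, 0, 0], dtype=float)
    print(group_relative_advantages(rewards))
    # Correct answers get positive advantage, wrong ones negative; the policy
    # is then updated with a PPO-style clipped objective weighted by these.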


> Efficiency going up tends to increase demand by much more than the efficiency-induced supply increase.

https://en.wikipedia.org/wiki/Jevons_paradox


The mainframe market disagrees.

Like the cloud compute we all use right now to serve most of what you use online?

Which runs on PC parts, that's the point. IBM is nowhere close to Amazon or Azure in terms of cloud, and I suspect most of their customers run on x86_64 anyway.

Microsoft and OpenAI seem to be going through a slow-motion divorce, so OpenAI may well end up using whatever data centers they are building for training as well as inference, but $500B (or even $100B) is so far beyond the cost of current training clusters that it seems this number is more a reflection of what they are hoping demand will be - how much they will need to spend on inference capacity.

I agree, except on the "isn't easily repurposed" part. Nvidia's chips support CUDA and can be repurposed for many HPC projects once the AI bubble is over: meteorology, encoding, and especially any kind of high-compute research.

None of those things are going to produce a monetary return on investment, though, which is the problem. These big companies are betting a huge amount of their capital on the prospect of making significant profit off these investments, and meteorology etc. isn't going to do it.

Yes, it's going to benefit all the other areas of research like medical and meteorology, which I'm happy with.

> Literally hundreds of billions of dollars have already been spent on hardware that's half (or fully) built and isn't easily repurposed.

It's just data centers full of devices optimized for fast linear algebra, right? These are extremely repurposable.


For mining dogecoin, right?

Nobody else is doing arithmetic in fp16 though.

What is the rationale for "isn't easily repurposed"?

The hardware can train LLMs but can also be used for vision, digital twins, signal detection, autonomous agents, etc.

Military uses seem important too.

Can the large GPU-based data centers not be repurposed for that?


> If you want to compare the DeepSeek-R1 development costs to anything, you should be comparing it to what it cost OpenAI to develop GPT-o1 (not what they plan to spend to run it)

They aren't comparing the $500B investment to the cost of DeepSeek-R1 (allegedly $5 million); they are comparing the cost of R1 to that of o1 and extrapolating from that (we don't know exactly how much OpenAI spent to train it, but estimates put it around $100M, in which case DeepSeek would have been only 95% cheaper, not 99%).
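
Spelling out that arithmetic with the thread's rough figures (neither number is verified):

    # ~$100M estimated for o1 training vs ~$5M claimed for R1
    o1_cost = 100e6
    r1_cost = 5e6
    print(f"{1 - r1_cost / o1_cost:.0%} cheaper")  # 95% -- not ~99%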



