
I find it interesting because the DeepSeek stuff, while very cool, doesn't seem to invalidate the idea that more compute would translate to even _higher_ capabilities?

It's amazing what they did with a limited budget, but instead of the takeaway being "we don't need that much compute to achieve X", it could also be, "These new results show that we can achieve even 1000*X with our currently planned compute buildout"

But perhaps the idea is more like: "We already have more AI capabilities than we know how to integrate into the economy for the time being" and if that's the hypothesis, then the availability of something this cheap would change the equation somewhat and possibly justify investing less money in more compute.




Probably not. If the price of Nvidia is dropping, it's because investors see a world where Nvidia hardware is less valuable, probably because it will be used less.

You can't do the distill/magnify cycle like you do with AlphaGo. LLMs have basically stalled in their base capabilities, pre-training is basically over at this point, so the new arms race will be over marginal capability gains and (mostly) making them cheaper and cheaper.

But inference time scaling, right?

A weak model can pretend to be a stronger model if you let it cook for a long time. But right now it looks like models as strong as what we have aren't going to be very useful even if you let them run for a long, long time. Basic logic problems still tank o3 if they're not a kind that it's seen before.

Basically, there doesn't seem to be a use case for big data centers that run small models for long periods of time, they are in a danger zone of both not doing anything interesting and taking way too long to do it.

The AI war is going to turn into a price war, by my estimations. The models will be around as strong as the ones we have, perhaps with one more crank of quality. Then comes the empty, meaningless battle of just providing that service for as close to free as possible.

If OpenAI's agents had panned out we might be having another conversation. But they didn't, and it wasn't even close.

This is probably it. There's not much left in the AI game


Your implication is that we have unlimited compute and therefore know that LLMs are stalled.

Have you considered that compute might be the reason why LLMs are stalled at the moment?

What made LLMs possible in the first place? Right, compute! The Transformer is 8 years old; technically GPT-4 could have been released 5 years ago. What stopped it? Simple: the compute was way too low.

Nvidia has improved compute by 1000x in the past 8 years, but what if training GPT-5 takes 6-12 months for a single run, based on what OpenAI is trying to do?

What we see right now is that pre-training has reached the limits of Hopper, and Big Tech is waiting for Blackwell. Blackwell will easily be 10x faster in cluster training (don't look at chip performance alone), and since Big Tech intends to build 10x larger GPU clusters, they will have 100x the compute.

Let's see then how it turns out.

The limit on training is time. If you want to build something new and improve on it, you have to limit training time, because nobody will wait 5-6 months for results anymore.

It was fine for OpenAI years ago to take months to years for new frontier models. But today the expectations are higher.

There is a reason why Blackwell is fully sold out for the year. AI research is totally starved for compute.

The best thing for Nvidia is also that while AI research companies compete with each other, they all try to get Nvidia AI HW.


The age of pre-training is basically over; I think everyone has acknowledged this, and it's not for lack of a big enough cluster. The bull argument for AI is that inference-time scaling will pull us to the next step.

Except the o3 benchmarks are, seemingly, pretty solid evidence that leaving LLMs on for the better part of a day and spending a million dollars gets you... nothing. Passing a basic logic test using brute-force methods, then falling apart on a marginally easier test that it just wasn't trained on.

The returns on compute and data seem to be diminishing, with exponential increases in inputs returning only incremental increases in quality, and we're out of quality training data, so that's now much worse even if the scaling weren't plateauing.

All this, and the scale that got us this far seems to have done nothing to give us real intelligence: there's no planning or real reasoning, and this is demonstrated every time it tries to do something out of distribution, or even in distribution but just complicated. Even if we got another crank or two out of this, we're still at the bottom of the mountain here. We haven't started and we're already out of gas.

Scale doesn't fix this any more than building a mile-tall fence stops the next break-in. If it were going to work, we would have seen it work already. LLMs don't have much juice left in the squeeze, imo.


We don't know, for example, what a larger model can do with the new techniques DeepSeek is using for improving/refining it. It's possible the new models on their own failed to show progress, but a combination of techniques will enable that barrier to be crossed.

We also don't know what the next discovery/breakthrough will be like. The reward for getting smarter AI is still huge and so the investment will likely remain huge for some time. If anything DeepSeek is showing us that there is still progress to be made.


Maybe, pending me getting an understanding of what those advances actually were?

But making things smaller is different than making them more powerful, those are different categories of advancement.

If you've noticed, models of varying sizes seem to converge on a narrow window of capabilities even when separated by years of supposed advancement. This should probably raise red flags


> You can't do the distill/magnify cycle like you do with AlphaGo

are you sure? people are saying that there’s an analogous cycle where you use o1-style reasoning to produce better inputs to the next training round


KIND OF

if you've tried to get o1 to give you outputs in a specific format, it often just tells you to take a hike. It's a stubborn model, which implies a lot

This is speculation, but it seems that the main benefit of reasoning models is that they provide a dimension along which RL can be applied to make them better at math and maybe coding, things with verifiable outputs.

Reasoning models likely don't learn better reasoning from their hidden reasoning tokens; they're 1) trying to find a magic token which, when raised into its attention, makes it more effective (basically giving it room to say something that jogs its memory), or 2) trying to find a series of steps which do a better job of solving a specific class of problem than a single pass does, making it more flexible in some senses but more stubborn along others.
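To make "verifiable outputs" concrete, here's a minimal sketch in Python of the kind of reward a verifier-based RL loop can optimize (every name here is hypothetical, not anything any lab has published): a training signal only exists where the answer can be checked mechanically, which is why the gains concentrate in math and code.

    import re

    def verify_math(completion: str, expected: str) -> float:
        """Reward 1.0 only if the last number in the completion matches the known answer."""
        numbers = re.findall(r"[-+]?\d+(?:\.\d+)?", completion)
        return 1.0 if numbers and numbers[-1] == expected else 0.0

    def reward(problem: dict, completion: str) -> float:
        # A signal exists only where a checker can grade the output; open-ended
        # plans, essays, or novel reasoning give RL nothing to push against.
        if problem.get("type") == "math":
            return verify_math(completion, problem["expected"])
        return 0.0  # no verifier, no signal

    # reward({"type": "math", "expected": "42"}, "... so the answer is 42")  -> 1.0

A policy-gradient loop would then sample chains of thought, score them with reward(), and reinforce the high-scoring ones; anything without a verifier contributes nothing, which is exactly the narrow, stubborn specialization described above.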

Reasoning data as training data is a poison pill, in all likelihood, and just makes a small window of RL-amenable problems easier to answer (problems we already have systems that do better on). It doesn't really plan well, doesn't truly learn reasoning, etc.

Maybe seeing the actual output of o3 will change my mind but I'm horrifically bearish on reasoning models


So you're saying we're close to AGI? Because the game doesn't stop until we get there.


I don't think LLMs lead to AGI. It's a local maximum.


I think LLMs are getting us closer to AGI in the same way that Madame Tussauds wax museum got us closer to human cloning


This argument ignores scaling laws


It really doesn't, lol. Those laws are like Moore's law: an observation rather than something fundamental like the laws of physics.

The scaling has been plateauing, and half that equation is quality training data, which has basically run out at this point.
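For reference, the "scaling laws" in question are empirical power-law fits of roughly the Chinchilla form (the exact constants vary by fit; take this as the shape of the claim, not exact values):

    L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where N is parameter count and D is training tokens. The D term is the "half that equation" above: once the supply of quality tokens is exhausted that term stops shrinking, and each further 10x of N shaves off a smaller absolute amount of loss as you approach the irreducible term E.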

Maybe reasoning models will help produce synthetic data but that's still to be seen. So far the only benefit reasoning seems to bring is fossilizing the models and improving outputs along a narrow band of verifiable answers that you can do RL on to get correct

Synthetic data maybe buys you time, but it's one turn of the crank and not much more


They are derived from statistical laws, unlike Moore's law.

https://en.m.wikipedia.org/wiki/Chernoff_bound
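For reference, the bound behind that link: for a random variable X and any threshold a,

    P(X \ge a) \le \inf_{t > 0} e^{-ta}\, \mathbb{E}\!\left[e^{tX}\right]

Applied to sums of independent samples it gives tails that decay exponentially in the sample size, which is the sense in which more data buys predictably less error; whether that concentration argument actually underwrites LLM scaling laws is the claim being made here, not something the bound itself proves.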

I agree with you that they require data


The stock market is not the economy, Wall Street is not Main Street. You need to look at this more macroscopically if you want to understand this.

Basically: China's tech sector just made a big splash, and traders who witnessed this think other traders will sell because maybe the US tech sector isn't as hot as assumed, so they sell, while other traders think the same and sell too.

The fall will come to rest once stocks have fallen enough that traders stop thinking other traders will sell.

Investors holding for the long haul will see this fall as stocks going on sale and proceed to buy because they think other investors will buy.

Meanwhile in the real world, on Main Street, nothing has really changed.

Bogleheads meanwhile are just starting the day with their coffee, no damns given to the machinations of the stock market because it's Monday and there's work to be done.


Is it really related to China's tech sector as such, though? If this is true, then OpenAI, Google, or even companies many orders of magnitude smaller can just replicate similar methods in their own processes and provide models which are just as good or better. However, they'll need far fewer Nvidia GPUs and other hardware to do that than they did to train their current models.


Not really.

The Magnificent Seven are the only thing propping up the whole US economy.

If they go down, you go down.


The S&P 500 was still up by normal amounts during 2023 and 2024 if you exclude Big Tech. They're definitely an outsized portion of the index, but that doesn't mean the rest of the economy isn't growing. https://www.inc.com/phil-rosen/stock-market-outlook-sp500-in...


Isn't this a good reason to break these companies up and mitigate the risk?


Well said.


> doesn't seem to invalidate the idea that more compute would translate to even _higher_ capabilities?

That's how I understand it.

And since their current goal seems to be 'AGI', and their current plan for achieving it seems to be scaling LLMs (network-depth-wise, and prompt-wise at inference time), I don't see why it wouldn't hold.



