
You're talking apples and oranges. The plateau the frontier models have hit reflects the limited further gains to be had from dataset scaling (and the corresponding model/compute scaling).

These new reasoning models are taking things in a new direction basically by adding search (inference time compute) on top of the basic LLM. So, the capabilities of the models are still improving, but the new variable is how deep of a search you want to do (how much compute to throw at it at inference time). Do you want your chess engine to do a 10 ply search or 20 ply? What kind of real world business problems will benefit from this?
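
To make that "search depth" knob concrete, here is a minimal sketch of best-of-N sampling, the simplest form of inference-time search; generate() and score() are hypothetical stand-ins for a stochastic model call and a verifier, not any real API:

    import random

    def generate(prompt: str) -> str:
        # Hypothetical stand-in for one stochastic LLM call (temperature > 0).
        return f"candidate-{random.random():.3f}"

    def score(answer: str) -> float:
        # Hypothetical stand-in for a verifier / reward model.
        return random.random()

    def best_of_n(prompt: str, n: int) -> str:
        # The knob: larger n = more inference-time compute and a better
        # expected answer, with no change to the model's weights.
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=score)

    print(best_of_n("What is 17 * 23?", n=8))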



"New" reasoning models are plain LLMs with clever reinforcement learning. o1 is itself reinforcement learning on top GPT-4o.

They found a way to make test-time compute a lot more effective, and that is an advance, but the idea is not new and the architecture is not new.

And the vast majority of people convinced that LLMs had plateaued reached that conclusion without accounting for test-time compute.


The fact that these reasoning models may compute for extended durations, using exponentially more compute for linear performance gains (according to OpenAI), while producing outputs that are better but not necessarily any longer (more tokens) than before, all points to a different architecture: some type of iterative calling of the underlying model (essentially a reasoning agent using the underlying model).
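
A minimal sketch of what such an iterative wrapper could look like; generate() is a hypothetical single fixed-compute model call, and the critique/revise loop and stop condition are assumptions for illustration, not OpenAI's published method:

    def generate(prompt: str) -> str:
        # Hypothetical stand-in for one fixed-compute LLM call.
        return "draft answer for: " + prompt[:50]

    def reason(question: str, max_steps: int = 10) -> str:
        # Variable compute via iteration: the same fixed-compute model is
        # called repeatedly, critiquing and revising its own draft. Total
        # compute grows with max_steps, yet the final answer need not be
        # any longer than a single-pass answer.
        draft = generate(question)
        for _ in range(max_steps):
            critique = generate("Find flaws in this answer: " + draft)
            if "no flaws" in critique.lower():  # assumed stop signal
                break
            draft = generate("Revise to fix '" + critique + "': " + draft)
        return draft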

A plain LLM does not use variable compute: it runs a fixed number of transformer layers, spending a fixed amount of compute on every token generated.


Architecture generally refers to the design of the model. In this case, the underlying model is still a transformer-based LLM, and so is its architecture.

What's different is the method for _sampling_ from that model: it seems they have encouraged the underlying LLM to perform a variable-length chain-of-thought "conversation" with itself, as has been done with o1. In addition, they _repeat_ these chains of thought in parallel, using a tree of some sort to search and rank the outputs. This apparently scales performance on benchmarks as you scale both the length of each chain of thought and the number of chains.
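
The "repeat and rank" part, reduced to its simplest form (majority vote over parallel chains, i.e. self-consistency); sample_chain() is a hypothetical rollout, and since the actual tree search is not public this is only an assumed approximation:

    import random
    from collections import Counter

    def sample_chain(question: str) -> str:
        # Hypothetical stand-in for one variable-length chain-of-thought
        # rollout ending in a final answer; stochastic, so repeated calls
        # can disagree.
        return random.choice(["42", "42", "41"])

    def repeat_and_rank(question: str, k: int = 16) -> str:
        # Scale the second axis: k independent chains, ranked by the
        # crudest possible criterion, agreement. A learned ranker or a
        # tree search would slot in where Counter does.
        answers = [sample_chain(question) for _ in range(k)]
        return Counter(answers).most_common(1)[0][0]

    print(repeat_and_rank("What is 6 * 7?"))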


No disagreement, although the sampling + search procedure is obviously adding quite a lot to the capabilities of the system as a whole, so it really should be considered part of the architecture. It's a bit like AlphaGo or AlphaZero: generating potential moves (cf. the LLM) is only one component of the overall solution architecture, and the MCTS sampling/search is equally (or more) important.


Ah, I see. Yeah that's a fair assessment and in hindsight is probably the way architecture is being used in the article.


I think throwaway already explained what I was getting at.

That said, I probably did downplay the achievement. It may not be a "new" idea to do something like this, but finding an effective method for reflection that doesn’t just lock you into circular thinking, and that is applicable beyond well-defined problem spaces, is genuinely tough and a breakthrough.



