> Is it really that unlikely that a lab of genius engineers found a way to improve efficiency 10x
They literally published all their methodology. It's nothing groundbreaking; it's just that Western labs seem slow to adopt new research. Mixture of experts, key-value cache compression, multi-token prediction: two of these three weren't invented by DeepSeek. They did invent a new hardware-aware distributed training approach for mixture-of-experts training that helped a lot, but there's nothing super genius about it; Western labs just never even tried to adjust their models to fit the hardware available.
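For a rough sense of what mixture-of-experts routing means in practice, here is a minimal PyTorch sketch. This is purely illustrative: the sizes, expert count, and top-k routing below are generic assumptions, not DeepSeek's actual architecture.

```python
# Minimal top-k mixture-of-experts layer (a generic sketch, not
# DeepSeek's implementation; all sizes here are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.k = k
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):                # x: (n_tokens, d_model)
        scores = self.router(x)          # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token; the rest
        # stay idle, which is where the compute savings come from.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

The routing itself is simple; the genuinely hard part, and the one DeepSeek's hardware-aware training approach targets, is making the cross-GPU communication this routing implies cheap at scale.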
It's extremely cheap and efficient, and it kicks the ass of the market leader, all while being under sanctions on AI hardware.
Most of all, it can be downloaded for free, can be uncensored, and is usable offline.
China is really good at tech; it has beautiful landscapes, etc. It has its own political system, but to be fair, in some ways it's all our future.
A bit of a dystopian future, like the one in Orwell's 1984.
But the tech folks there are really, really talented, and it's been a long time since China switched from producing for Western clients to selling directly to them.
The leaderboard [1] still shows the traditional AI leader, Google, winning, with Gemini-2.0-Flash-Thinking-Exp-01-21 in the lead. No one seems to know how many parameters that model has, but random guesses on the internet put it in the low to mid tens of billions, i.e. fewer than DeepSeek-R1. Even if those guesses are wrong, they probably aren't that wrong, and at worst it's the same class of model as DeepSeek-R1.
So yes, DeepSeek-R1 appears not even to be best in class, merely best open source. The only sense in which it is "leading the market" appears to be the sense in which "free stuff leads over proprietary stuff". Which is true and all, but not a groundbreaking technical achievement.
The DeepSeek-R1 distilled models, on the other hand, might actually be leading at something... but again, it's hard to call that groundbreaking when it combines two things we already knew we could do: small models (like Llama) and thinking models.
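For context, distillation here is a well-known recipe: a big "teacher" model trains a small "student". Below is a minimal sketch of the classic logit-matching form. This is the generic technique, not DeepSeek's recipe (their paper describes fine-tuning small models on R1-generated samples rather than matching logits), and the temperature and loss weighting are arbitrary illustrative choices.

```python
# Classic knowledge-distillation loss (generic sketch; T and alpha
# are arbitrary here, not anyone's published hyperparameters).
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: push the student toward the teacher's
    # temperature-smoothed output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```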
The chatbot leaderboard seems to be very affected by things other than capability, like "how nice is it to talk to" and "how likely is it to refuse requests" and "how fast does it respond" etc. Flash is literally one of Google's faster models, definitely not their smartest.
Not that the leaderboard isn't useful; I think "is in the top 10" says a lot more than the exact position within the top 10.
I mean, sure, none of these models are being optimized for being the top of the leader board. They aren't even being optimized for the same things, so any comparison is going to be somewhat questionable.
But the claim I'm refuting here is "It's extremely cheap, efficient and kicks the ass of the leader of the market", and I think the leaderboard being topped by a cheap Google model is pretty conclusive evidence that that statement is not true. Is competitive with? Sure. Kicks the ass of? No.
Google absolutely games the LMSYS benchmark with markdown styling. R1 is better than Google's Flash Thinking; you are putting way too much faith in LMSYS.
The U.S. firms let go of everyone skeptical the second they had a marketable proof of concept, and replaced them with smart, optimistic, uncritical marketing people who no longer know how to push the cutting edge.
Maybe we don't need momentum right now and we can cut the engines.
Oh, you know how to develop novel systems for training and inference? Well, maybe you can find 4 people who can also do that by breathing through the H.R. drinking straw, and that's what you do now.
That's what they claim, at least in the paper, but that particular claim is not verifiable. The HAI-LLM framework they reference in the paper is not open sourced, and it seems they have no plans to release it.
Additionally, there are claims, such as those by Scale AI CEO Alexandr Wang on CNBC (1/23/2025, time segment below), that DeepSeek has 50,000 H100s that "they can't talk about" due to economic sanctions (implying they likely got them by circumventing export restrictions somehow while the rules were looser). His assessment is that they will be more limited moving forward.
It's amazing how different the standards are here. DeepSeek released their weights under a real open source license and published a paper describing their work, which now has independent reproductions.
OpenAI literally hasn't said a thing about how o1 even works.
DeepSeek's holding company is called High-Flyer, and they actually do open source their AI training platform as well; here is the repo: https://github.com/HFAiLab/hai-platform
They can be more open than others and still not open enough for all of their claims to be verifiable. That is the case for their optimized HAI-LLM framework.
But those approaches alone wouldn't yield the improvements claimed. How did they train the foundational model upon which they applied RL, distillation, etc.? That part is unclear, and I don't think they've released anything that explains the low cost.
It’s also curious why some people are seeing responses where it thinks it is an OpenAI model. I can’t find the post but someone had shared a link to X with that in one of the other HN discussions.