Right now DeepSeek's official hosting is cheaper than everyone else who manages to run the model, including DeepInfra. I haven't seen any good hypotheses as to why, other than their large batch sizes and speculative decoding.
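For anyone unfamiliar with the speculative-decoding part of that hypothesis, here's a toy sketch of the general technique (illustrative only, not DeepSeek's actual serving stack): a cheap draft model proposes a few tokens, and the expensive target model verifies them in one batched pass, so the big model does far fewer forward passes per generated token.

```python
# Toy sketch of why speculative decoding cuts serving cost (illustrative only,
# not DeepSeek's actual stack). A cheap draft model proposes k tokens; the big
# target model checks them in one batched pass and keeps the agreeing prefix.
import random

random.seed(0)
VOCAB = list("abcde")

def draft_model(prefix, k=4):
    # stand-in for a small, fast model: propose k cheap guesses
    return [random.choice(VOCAB) for _ in range(k)]

def target_model(prefix, proposed):
    # stand-in for the big model: one pass scores all proposed positions at once;
    # here we just simulate agreeing with each guess 80% of the time
    return [random.random() < 0.8 for _ in proposed]

def speculative_step(prefix, k=4):
    guesses = draft_model(prefix, k)
    accepted = target_model(prefix, guesses)
    kept = []
    for tok, ok in zip(guesses, accepted):
        if not ok:
            break
        kept.append(tok)
    # on rejection (or after all k are kept) the target model emits one token itself
    kept.append(random.choice(VOCAB))
    return prefix + kept

prefix, target_calls = [], 0
while len(prefix) < 64:
    prefix = speculative_step(prefix)
    target_calls += 1

print(f"generated {len(prefix)} tokens with {target_calls} target-model passes")
```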
DeepSeek-V2/V3/R1's model architecture is very different from what Fireworks/Together/... were used to.
That's also their "business" model (okay, they don't care that much about business for now, but still): you can't run it efficiently without doing the months of work we've already done, so even with all the weights open you can't compete with us.
Then a US compute provider should be able to launch a similarly-priced competitor (e.g. to capture the enterprise market concerned about the China associations) using the open-source version and drastically undercut OpenAI.
> Then a US compute provider should be able to launch a similarly-priced competitor
Right, you just need a few months to implement efficient inference for MLA + their strange-looking MoE scheme + ..., easy!
Oh wait, the inference scheme described in their tech report is pretty much an exact fit for H800s (the export-compliant H100 variant with cut-down interconnect bandwidth), so if you run their recipe on H100s you're wasting your H100s' potential. Otherwise, have fun coming up with your own variations on the serving architecture.
To be fair, we had a chance. If someone had decided to replicate the effort of serving their models back in May 2024 when DeepSeek-V2 came out, we'd have it now. But nobody was interested, since DS-V2 was pretty mediocre. They (and whoever else realized the potential) made a big bet and it's paying off.
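For anyone who hasn't read the DeepSeek-V2 paper, the MLA part really is a departure from standard multi-head attention: you cache a small per-token latent instead of full per-head K/V and up-project it at attention time, which is why off-the-shelf serving stacks didn't just work. A minimal sketch of the caching idea, with made-up dimensions and the RoPE/weight-absorption details omitted:

```python
# Minimal sketch of the Multi-head Latent Attention (MLA) caching trick from the
# DeepSeek-V2/V3 reports: cache one small latent vector per token instead of full
# per-head K/V, and up-project it at attention time. Dimensions are illustrative,
# not the real model config; RoPE and the matrix-absorption optimizations are omitted.
import numpy as np

d_model, n_heads, d_head, d_latent = 1024, 16, 64, 128  # toy sizes
rng = np.random.default_rng(0)

W_dkv = rng.normal(0, 0.02, (d_model, d_latent))            # down-projection
W_uk  = rng.normal(0, 0.02, (d_latent, n_heads * d_head))   # up-project to keys
W_uv  = rng.normal(0, 0.02, (d_latent, n_heads * d_head))   # up-project to values

def cache_token(h):
    # plain MHA would cache n_heads * d_head floats each for K and V (2 * 1024 here);
    # MLA caches only the d_latent-dim compressed vector (128 here)
    return h @ W_dkv

def attend(query, latent_cache):
    # reconstruct K/V from the latents, then do ordinary scaled dot-product attention
    K = (latent_cache @ W_uk).reshape(-1, n_heads, d_head)
    V = (latent_cache @ W_uv).reshape(-1, n_heads, d_head)
    q = query.reshape(n_heads, d_head)
    scores = np.einsum("hd,thd->ht", q, K) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("ht,thd->hd", weights, V).reshape(-1)

# cache 10 tokens, then attend with a new query
hidden = rng.normal(0, 1, (10, d_model))
latent_cache = np.stack([cache_token(h) for h in hidden])
out = attend(rng.normal(0, 1, d_model), latent_cache)
print(latent_cache.shape, out.shape)  # cache is (10, 128) vs (10, 2048) for K+V in plain MHA
```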
R1 is a mixture-of-experts (MoE) model with only 37B active parameters out of ~671B total. So while it's definitely expensive to train, it's rather light on compute during inference. What you really need is lots of memory.
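Rough back-of-envelope numbers (using the published ~671B total / 37B active figures; byte counts depend on the quantization you assume):

```python
# Back-of-envelope numbers for why an MoE like DeepSeek-V3/R1 is memory-bound:
# all ~671B parameters must sit in GPU memory, but only ~37B are active per token.
total_params    = 671e9   # rough published total for DeepSeek-V3/R1
active_params   = 37e9    # active per token
bytes_per_param = 1       # FP8 weights; use 2 for BF16

weight_memory_gb = total_params * bytes_per_param / 1e9
flops_per_token  = 2 * active_params   # ~2 FLOPs per active weight per token

print(f"weights to hold in memory: ~{weight_memory_gb:.0f} GB (plus KV cache)")
print(f"compute per generated token: ~{flops_per_token / 1e9:.0f} GFLOPs")
# For comparison, a dense ~70B model needs roughly 2x the per-token FLOPs of R1
# while fitting in about a tenth of the memory.
```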
https://api-docs.deepseek.com/quick_start/pricing/