When people are talking about $100M-$1B frontier model training runs, efficiency obviously matters!
Sure, training costs will come down over time, but if you're only using 10% of the compute of your competition (TFA: DeepSeek vs LLaMa), you could be saving hundreds of millions per training run!
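A rough back-of-envelope to make that concrete, assuming a hypothetical $300M baseline run and a 10x compute-efficiency gap (illustrative numbers, not figures from TFA):

    # Illustrative back-of-envelope only; both numbers below are assumptions.
    baseline_run_cost = 300e6      # hypothetical cost of a frontier training run, USD
    efficiency_factor = 10         # using ~10% of the competition's compute
    efficient_run_cost = baseline_run_cost / efficiency_factor
    savings = baseline_run_cost - efficient_run_cost
    print(f"Savings per run: ${savings / 1e6:.0f}M")  # ~$270M, i.e. hundreds of millions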
I was stating the perception that compute is cheap more than the fact that it is cheap; often enough it isn't! But carelessness about performance happens, well, by default really.