Hacker News

One obvious use case is that it makes per-token generation much cheaper.



That's not so much a use case, but I get what you're saying. It's nice that you can find optimizations that shift the Pareto frontier down across the cost and latency dimensions. The hard tradeoffs are for cases like inference batching, where serving is cheaper and higher-throughput but slower for the end consumer.
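The batching tradeoff mentioned above can be sketched with a toy cost model: each decode step pays a roughly fixed cost (weight loads, kernel launches) plus a small per-sequence cost, so larger batches amortize the fixed cost across more requests. All the numbers here are illustrative assumptions, not measurements of any real system.

```python
def step_time_ms(batch_size, fixed_ms=20.0, per_seq_ms=1.5):
    """Assumed cost of one decode step: a fixed overhead shared by the
    whole batch, plus a small per-sequence cost. Purely illustrative."""
    return fixed_ms + per_seq_ms * batch_size

for batch in (1, 8, 64):
    t = step_time_ms(batch)
    throughput = batch / t * 1000  # tokens/sec summed across the batch
    latency = t                    # ms between tokens for any one user
    print(f"batch={batch:3d}  {throughput:8.1f} tok/s  {latency:6.1f} ms/token")
```

Under these assumptions, going from batch 1 to batch 64 multiplies aggregate throughput (so cost per token drops), while each individual user's time-between-tokens grows — exactly the "cheaper and higher throughput but slower for the end consumer" tradeoff.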

What's a good use case for an order of magnitude decrease in price per token? Web scale "analysis" or cleaning of unstructured data?



