Hacker News

One obvious use case is that it makes per-token generation much cheaper.



That's not so much a use case, but I get what you're saying. It's nice that you can find optimizations that shift the Pareto frontier down across the cost and latency dimensions. The hard tradeoffs are for cases like inference batching, where serving is cheaper and higher-throughput but slower for the end consumer.
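The batching tradeoff mentioned above can be sketched with a toy cost model: each decode step pays a roughly fixed cost (weight loads, kernel launches) plus a small per-sequence cost, so larger batches amortize the fixed cost across more requests. All the numbers here are illustrative assumptions, not measurements of any real system.

```python
def step_time_ms(batch_size, fixed_ms=20.0, per_seq_ms=1.5):
    """Assumed cost of one decode step: a fixed overhead shared by the
    whole batch, plus a small per-sequence cost. Purely illustrative."""
    return fixed_ms + per_seq_ms * batch_size

for batch in (1, 8, 64):
    t = step_time_ms(batch)
    throughput = batch / t * 1000  # tokens/sec summed across the batch
    latency = t                    # ms between tokens for any one user
    print(f"batch={batch:3d}  {throughput:8.1f} tok/s  {latency:6.1f} ms/token")
```

Under these assumptions, going from batch 1 to batch 64 multiplies aggregate throughput (so cost per token drops), while each individual user's time-between-tokens grows — exactly the "cheaper and higher throughput but slower for the end consumer" tradeoff.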

What's a good use case for an order of magnitude decrease in price per token? Web scale "analysis" or cleaning of unstructured data?



