
I wish more places showed Time To First Token. For real-time human interaction scenarios, the important part is how long until the first token is returned, and whether tokens are generated faster than people consume them.

Sadly very few benchmarks bother to track this.
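If you want to measure it yourself, it's not much code. A minimal sketch using the official openai Python client with streaming; the model name, prompt, and helper name are just illustrative placeholders, not any particular benchmark's setup:

    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def time_to_first_token(prompt: str, model: str = "gpt-4") -> float:
        """Seconds from sending the request until the first content token arrives."""
        start = time.perf_counter()
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True,  # streaming is what makes TTFT observable at all
        )
        for chunk in stream:
            # Skip role-only/empty deltas; stop at the first real token.
            if chunk.choices and chunk.choices[0].delta.content:
                return time.perf_counter() - start
        raise RuntimeError("stream ended without producing a token")

    print(f"TTFT: {time_to_first_token('Say hello'):.3f}s")

Run it repeatedly and at different times of day; a single measurement hides a lot of variance.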

Hi, we do have this: take a look at the models page (https://artificialanalysis.ai/models) and scroll down to 'Latency'. It's also on the API host comparison pages for each model (e.g. https://artificialanalysis.ai/models/llama-2-chat-70b).


Ah so you do!

Your latency numbers for OpenAI (and Azure's equivalents) seem really high; I run my own time-to-first-token tests and see much better numbers!

(Also, are those numbers averages, p50, p99, etc.? I'd honestly expect a box plot to really see what is going on!)
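For concreteness, here's what I mean by those summary stats. A toy sketch over made-up latency samples using nearest-rank percentiles, which is roughly the information a box plot encodes; the numbers and the percentile helper are illustrative only:

    import math

    # Made-up TTFT samples in seconds; a real run would collect many more.
    samples = sorted([0.41, 0.38, 0.52, 0.44, 0.39, 0.47, 1.20, 0.43, 0.40, 0.45])

    def percentile(data: list[float], p: float) -> float:
        """Nearest-rank percentile over an already-sorted sample."""
        k = math.ceil(p / 100 * len(data))
        return data[min(k, len(data)) - 1]

    mean = sum(samples) / len(samples)
    print(f"mean: {mean:.3f}s  p50: {percentile(samples, 50):.3f}s  "
          f"p99: {percentile(samples, 99):.3f}s")

Note how the single 1.20s outlier drags the mean up to about 0.51s while the median stays at 0.43s; that's exactly the kind of thing a plain average hides.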


Hey com2kid - if you're still there, we did end up adding boxplots to show variance. They can be seen on the models page (https://artificialanalysis.ai/models) and on each model's page, where you can view hosts by clicking one of the models. They are toward the end of the page under 'Detailed performance metrics'.
