
I wish more places showed Time To First Token. For real-time human interaction scenarios, the important part is how long until the first token is returned, and whether tokens are generated faster than people consume them.

Sadly very few benchmarks bother to track this.
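If you want to measure it yourself, it's not much code. A minimal sketch using the official openai Python client with streaming; the model name, prompt, and helper name are just illustrative placeholders, not any particular benchmark's setup:

    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def time_to_first_token(prompt: str, model: str = "gpt-4") -> float:
        """Seconds from sending the request until the first content token arrives."""
        start = time.perf_counter()
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True,  # streaming is what makes TTFT observable at all
        )
        for chunk in stream:
            # Skip role-only/empty deltas; stop at the first real token.
            if chunk.choices and chunk.choices[0].delta.content:
                return time.perf_counter() - start
        raise RuntimeError("stream ended without producing a token")

    print(f"TTFT: {time_to_first_token('Say hello'):.3f}s")

Run it repeatedly and at different times of day; a single measurement hides a lot of variance.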

Hi, we do have this: take a look at the models page (https://artificialanalysis.ai/models) and scroll down to 'Latency'. It's also on the API host comparison pages for each model (e.g. https://artificialanalysis.ai/models/llama-2-chat-70b).


Ah so you do!

Your latency numbers for OpenAI (and Azure's equivalents) seem really high; I run my own time-to-first-token tests and see much better numbers!

(Also, are those numbers averages, p50, p99, etc.? I'd honestly expect a box plot to really see what is going on!)
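For concreteness, here's what I mean by those summary stats. A toy sketch over made-up latency samples using nearest-rank percentiles, which is roughly the information a box plot encodes; the numbers and the percentile helper are illustrative only:

    import math

    # Made-up TTFT samples in seconds; a real run would collect many more.
    samples = sorted([0.41, 0.38, 0.52, 0.44, 0.39, 0.47, 1.20, 0.43, 0.40, 0.45])

    def percentile(data: list[float], p: float) -> float:
        """Nearest-rank percentile over an already-sorted sample."""
        k = math.ceil(p / 100 * len(data))
        return data[min(k, len(data)) - 1]

    mean = sum(samples) / len(samples)
    print(f"mean: {mean:.3f}s  p50: {percentile(samples, 50):.3f}s  "
          f"p99: {percentile(samples, 99):.3f}s")

Note how the single 1.20s outlier drags the mean up to about 0.51s while the median stays at 0.43s; that's exactly the kind of thing a plain average hides.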


Hey com2kid - if you're still there, we did end up adding boxplots to show variance. They can be seen on the models page (https://artificialanalysis.ai/models) and on each model's page, where you can view hosts by clicking one of the models. They are toward the end of the page under 'Detailed performance metrics'.
