zackangelo 38 days ago | on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...

Related recent discussion on Twitter: https://x.com/Teknium1/status/1858987850739728635

Looks like other folks get 80 tok/s with max batch size. That's surprising to me, but vLLM is definitely more optimized than my implementation.
