
Related recent discussion on Twitter: https://x.com/Teknium1/status/1858987850739728635

Looks like other folks get 80 tok/s at max batch size. That's surprising to me, but vLLM is definitely more optimized than my implementation.



