
The real question will be how much you can quantize that while still retaining sanity. 400B at 2-bit would be possible to run on a Mac Studio - probably at multiple seconds per token, but sometimes that's "fast enough".
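
For anyone who wants to check the arithmetic behind that, here is a rough back-of-the-envelope sketch. It counts weight storage only and ignores KV cache, activations, and the per-block scale/zero-point overhead that real quantization formats add, so actual footprints would be somewhat higher. The function name and the 192 GB figure (assumed here as a high-memory Mac Studio configuration) are illustrative assumptions, not something stated in the thread.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Weights only: no KV cache, activations, or quantization metadata.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumption for illustration: a Mac Studio with 192 GB of unified memory.
ASSUMED_UNIFIED_MEMORY_GB = 192

if __name__ == "__main__":
    n_params = 400e9  # the 400B-parameter model discussed above
    for bits in (16, 8, 4, 2):
        gb = weight_memory_gb(n_params, bits)
        fits = "fits" if gb < ASSUMED_UNIFIED_MEMORY_GB else "does not fit"
        print(f"{bits:>2}-bit: {gb:.0f} GB ({fits} in {ASSUMED_UNIFIED_MEMORY_GB} GB)")
```

Under these assumptions only the 2-bit case (about 100 GB of weights) leaves headroom on such a machine; 4-bit (about 200 GB) already exceeds it.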



Yes. I expect an explosion of research and experimentation in model compression. The good news is I think there are tons of avenues that have barely been explored at all. We are at the very beginning of understanding this stuff, and my bet is that in a few years we'll be able to compress these models 10x or more.



