Hacker News



int_19h 6 months ago | on: Meta Llama 3

The real question will be how much you can quantize that while still retaining sanity. A 400B model at 2-bit would be possible to run on a Mac Studio, probably at multiple seconds per token, but sometimes that's "fast enough".
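The back-of-envelope arithmetic behind "400B at 2-bit fits on a Mac Studio" can be sketched as below. This is a rough sketch under stated assumptions, not a sizing tool: it assumes "400b" means 400e9 parameters, counts only the weights (no KV cache, activations, or the per-group scale/zero-point overhead that real quantization formats add), and uses decimal gigabytes. For reference, the M2 Ultra Mac Studio tops out at 192 GB of unified memory.

```python
# Rough weight-memory estimate for a dense model at various bit widths.
# Assumptions (not from the thread): 400e9 parameters, weights only,
# no quantization-scale overhead, decimal GB (1 GB = 1e9 bytes).

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the storage needed for the weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 400e9  # assumed parameter count for "400b"

if __name__ == "__main__":
    for bits in (16, 8, 4, 3, 2):
        print(f"{bits:>2}-bit: {weight_memory_gb(N_PARAMS, bits):6.1f} GB")
```

At 2 bits this gives 100 GB for the weights, which leaves headroom within 192 GB; at 4 bits (200 GB) it would already not fit, which is why the discussion hinges on how aggressively the model can be quantized.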

modeless 6 months ago

Yes. I expect an explosion of research and experimentation in model compression. The good news is I think there are tons of avenues that have barely been explored at all. We are at the very beginning of understanding this stuff, and my bet is that in a few years we'll be able to compress these models 10x or more.
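To put "10x" in concrete terms: assuming 16-bit weights as the baseline (the comment does not say which baseline it means), a 10x reduction implies an average of 1.6 bits per weight, i.e. below the 2-bit (8x) level discussed above. For the same assumed 400e9-parameter model, the weights alone would then take:

$$
b_{\text{target}} = \frac{16\ \text{bits}}{10} = 1.6\ \text{bits/weight},
\qquad
400\times 10^{9} \times \frac{1.6}{8}\ \text{bytes} = 80\ \text{GB}.
$$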
