
People who run 7B at 4-bit probably do so because they don't have the memory for more. And there are tests showing that, at a fixed memory budget, quantizing a bigger model is generally better than running a smaller model at full precision (e.g. LLaMA 13B at 4-bit > 7B at fp16).
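A rough back-of-the-envelope sketch of the memory side of this claim (weights only; it ignores the KV cache, activations, and quantization scale/zero-point overhead, and the helper name is just for illustration):

    def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
        """Approximate memory needed just to store the model weights."""
        total_bits = n_params_billion * 1e9 * bits_per_weight
        return total_bits / 8 / 1e9  # decimal GB

    print(f"LLaMA 7B  @ fp16: {weight_memory_gb(7, 16):.1f} GB")   # ~14.0 GB
    print(f"LLaMA 13B @ 4bit: {weight_memory_gb(13, 4):.1f} GB")   # ~6.5 GB

So on the same card, the 4-bit 13B weights take roughly half the room of the fp16 7B weights, which is why the fixed-memory comparison tends to favor the bigger quantized model.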


