
I've been trying this, and with 4-bit compression on you can fit the entire 30B model on the 3090.
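
For rough napkin math: 30B parameters at 4 bits each is about 30e9 × 0.5 bytes ≈ 15 GB of weights, which leaves headroom in the 3090's 24 GB for activations and the KV cache. Here's a minimal sketch of that kind of setup, assuming the Hugging Face transformers + bitsandbytes route and OPT-30B as the model - neither the tool nor the exact model is named above, so treat both as placeholders:

    # Requires: torch, transformers, accelerate, bitsandbytes
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "facebook/opt-30b"  # assumed model; swap in whatever you're running

    # 4-bit weight quantization; compute still happens in fp16
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # places the whole model on the GPU if it fits
    )

    prompt = "The tallest mountain on Earth is"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

If device_map="auto" ends up putting every layer on cuda:0, no CPU offloading is happening at all.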


OK, so you don't need offloading at all for the quantised model - nice.

In practice, how good is the 30B model vs 175B?


I don't have access to 175B for comparison. In a vacuum, 30B isn't very good. In the neighborhood of GPT-NeoX-20B, I think, but not good. It repeats itself easily and has a tenuous relationship with the topic. It's still much better than anything I could run locally before.
