
I've been trying this, and with 4-bit compression on you can fit the entire 30B model on the 3090.
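
For rough napkin math: 30B parameters at 4 bits each is about 30e9 × 0.5 bytes ≈ 15 GB of weights, which leaves headroom in the 3090's 24 GB for activations and the KV cache. Here's a minimal sketch of that kind of setup, assuming the Hugging Face transformers + bitsandbytes route and OPT-30B as the model - neither the tool nor the exact model is named above, so treat both as placeholders:

    # Requires: torch, transformers, accelerate, bitsandbytes
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "facebook/opt-30b"  # assumed model; swap in whatever you're running

    # 4-bit weight quantization; compute still happens in fp16
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # places the whole model on the GPU if it fits
    )

    prompt = "The tallest mountain on Earth is"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

If device_map="auto" ends up putting every layer on cuda:0, no CPU offloading is happening at all.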


OK, so you don't need offloading at all for the quantised model - nice.

In practice, how good is the 30B model vs 175B?


I don't have access to 175B for comparison. In a vacuum, 30B isn't very good. In the neighborhood of GPT-NeoX-20B, I think, but not good. It repeats itself easily and has a tenuous relationship with the topic. It's still much better than anything I could run locally before.
