
He's running a Q4-quantized 671B model. However, MoE doesn't need cluster networking, so you could probably run the full thing unquantized on two of them. Maybe the router could be kept fully resident in GPU RAM, rather than offloading a larger percentage of everything there, or is that already how his GPU offload config is set up?
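
For a rough sense of scale, here is a back-of-envelope sketch of the weight memory for 671B parameters at a few precisions. The bit widths are assumptions: real Q4 formats usually average a bit more than 4 bits per weight, and what "unquantized" means (8-bit or 16-bit) depends on how the model was released. KV cache and activations are ignored, and `weight_gb` is just an illustrative helper.

```python
# Rough weight-memory estimate for a 671B-parameter model.
# Assumption: every parameter is stored at the same bit width.
PARAMS = 671e9


def weight_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate weight storage in GB (10^9 bytes); ignores KV cache and activations."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Bit widths below are illustrative, not taken from any specific release.
    for label, bits in [("~4-bit (Q4)", 4), ("8-bit", 8), ("16-bit", 16)]:
        print(f"{label}: ~{weight_gb(bits):.1f} GB of weights")
```

Whether two machines are enough then comes down to comparing these figures with their combined memory, minus whatever the KV cache and runtime need.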




