
Forgot about R1, what hardware are you using to run it?



I haven’t run QwQ yet, but it’s a 32B model, so about 20GB of RAM with a Q4 quant, and closer to 25GB for the Q4_K_M one. You can wait a day or so for the quantized GGUFs to show up (we should see the Q4 in the next hour or so). I personally use Ollama on a MacBook Pro; it usually takes a day or two for new models to show up there. Any M-series MacBook with 32GB+ of RAM will run this.
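
If you go the Ollama route, it's just a pull and a chat call once the model lands in the library. Here's a minimal sketch with the ollama Python client (pip install ollama); the tag "qwq" is an assumption on my part, so check the Ollama library page for the actual tag once the quantized build is published:

    import ollama

    # Assumes `ollama pull qwq` has already fetched the (hypothetical) tag.
    resp = ollama.chat(
        model="qwq",
        messages=[{"role": "user", "content": "How many r's are in strawberry?"}],
    )
    print(resp["message"]["content"])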


On MacBooks with Apple Silicon, consider MLX models from the MLX community:

https://huggingface.co/collections/mlx-community/qwq-32b-pre...

For a GUI, LM Studio 0.3.x is iterating on MLX support: https://lmstudio.ai/beta-releases

When searching in LM Studio, you can narrow the search to the mlx-community models.
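
If you'd rather skip the GUI, here's a minimal sketch with the mlx-lm package (pip install mlx-lm); the repo name "mlx-community/QwQ-32B-Preview-4bit" is an assumption, so pick the actual quant from the collection linked above:

    from mlx_lm import load, generate

    # Downloads the weights from the mlx-community repo on first run.
    model, tokenizer = load("mlx-community/QwQ-32B-Preview-4bit")

    # Simple one-shot generation; QwQ tends to produce long chains of thought,
    # so give it plenty of tokens.
    print(generate(model, tokenizer, prompt="Why is the sky blue?", max_tokens=512))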


On macOS with LM Studio, is it better to use the mlx-community releases over the ones that LM Studio releases?

Also, I didn't install a beta, and mine says I'm on 0.3.5, which is what the beta also says. Is there a difference right now between the beta and the release version?


You're right, looks like 0.3.5 is now on the home page.



> 20GB RAM with Q4 quant. Closer to 25GB for the 4_K_M one

How does this math work? Are there rules of thumb that you guys know that the rest of us don't?


As a quick estimate, a Q4-quantized model's size in GB is usually around 60-70% of its parameter count in billions. You can check the quantized model size precisely from the .gguf files hosted on Hugging Face.
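
To make that concrete, here's a back-of-the-envelope sketch (the 4.85 bits/weight figure is an approximation for Q4_K_M, which keeps some tensors at higher precision; actual .gguf sizes and runtime RAM will vary):

    # Rough GGUF size estimate: parameters * average bits per weight / 8.
    def estimate_gguf_size_gb(params_billion, bits_per_weight=4.85):
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    print(estimate_gguf_size_gb(32))       # ~19.4 GB for a 32B Q4_K_M file
    print(estimate_gguf_size_gb(32, 16))   # ~64 GB at fp16, for comparison

So a 32B model ends up around 19-20GB on disk, plus a few GB more in RAM once the context/KV cache is loaded, which is where the ~20-25GB figures above come from.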




