Off topic, but I’m very interested in local LLMs. Could you point me in the right direction, both hardware specs and models?



In general for local LLMs, the more memory the better: more RAM lets you fit larger models. A faster chip will give you more tokens/second, but if you are just chatting with a human in the loop, most recent M-series Macs can generate tokens faster than you can read them.
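As a rough sizing rule (an approximation, not an exact formula), the weights alone take about parameter count x bits per weight / 8 bytes, plus headroom for the KV cache and runtime overhead. A minimal sketch, with a made-up helper name:

    # Rough rule of thumb for how much RAM a model's weights need.
    # Approximate: real usage also includes the KV cache and OS overhead.
    def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * bits_per_weight / 8

    # e.g. a 70B model at 4-bit quantization:
    print(approx_weight_gb(70, 4))  # ~35 GB just for the weights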


That also very much depends on model size. For 70B+ models, the tok/s is still fast enough for real-time chat, but it's not going to generate faster than you can read, even on an Ultra with its very high memory bandwidth.



Thanks to both of you!


Have a look at Ollama. I think there is a VS Code extension (Continue) to hook into a local LLM if you are so inclined: https://ollama.com/blog/continue-code-assistant
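If you want to script against it, Ollama also exposes a local HTTP API (by default on localhost:11434). A minimal sketch, assuming you've already pulled a model such as llama3:

    import json
    import urllib.request

    # Send one prompt to the local ollama server and print the reply.
    payload = json.dumps({
        "model": "llama3",   # any model you've pulled with `ollama pull`
        "prompt": "Explain memory bandwidth in one sentence.",
        "stream": False,     # return a single JSON object instead of a stream
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])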


Get as much RAM as you can stomach paying for.



