Off topic, but I’m very interested in local LLMs. Could you point me in the right direction, both hardware specs and models?



In general for local LLMs, the more memory the better: more RAM lets you fit larger models. A faster chip will give you more tokens/second, but if you are just chatting with a human in the loop, most recent M-series Macs can generate tokens faster than you can read them.
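As a rough sizing rule (an approximation, not an exact formula), the weights alone take about parameter count x bits per weight / 8 bytes, plus headroom for the KV cache and runtime overhead. A minimal sketch, with a made-up helper name:

    # Rough rule of thumb for how much RAM a model's weights need.
    # Approximate: real usage also includes the KV cache and OS overhead.
    def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * bits_per_weight / 8

    # e.g. a 70B model at 4-bit quantization:
    print(approx_weight_gb(70, 4))  # ~35 GB just for the weights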


That also very much depends on model size. For 70B+ models, the tok/s is still fast enough for real-time chat, but it's not going to generate faster than you can read, even on an Ultra with its very high memory bandwidth.



Thanks to both of you!


Have a look at Ollama. I think there is a VS Code extension (Continue) to hook into a local LLM if you are so inclined: https://ollama.com/blog/continue-code-assistant
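If you want to script against it, Ollama also exposes a local HTTP API (by default on localhost:11434). A minimal sketch, assuming you've already pulled a model such as llama3:

    import json
    import urllib.request

    # Send one prompt to the local ollama server and print the reply.
    payload = json.dumps({
        "model": "llama3",   # any model you've pulled with `ollama pull`
        "prompt": "Explain memory bandwidth in one sentence.",
        "stream": False,     # return a single JSON object instead of a stream
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])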


Get as much RAM as you can stomach paying for.



