If you have 128gb ram, try running MoE models, they're a far better fit for Appl...

FuriouslyAdrift · 2025-01-07T15:30:13 1736263813

Project Digits... https://www.nvidia.com/en-us/project-digits/

throwaway48476 · 2025-01-07T21:10:25 1736284225

I guess they're tired of people buying macs for AI.

memhole · 2025-01-07T19:32:01 1736278321

I haven’t had great luck with the wizard as a counter point. The token generation is unbearably slow. I might have been using too large of a context window, though. It’s an interesting model for sure. I remember the output being decent. I think it’s already surpassed by other models like Qwen.

terhechte · 2025-01-07T20:21:08 1736281268

Long context windows are a problem. I gave Qwen 2.5 70b a ~115k context and it took ~20min for the answer to finish. The upside of MoE models vs 70b+ models is that they have much more world knowledge.

logankeenan · 2025-01-07T16:30:03 1736267403

Do you have any recommendations on models to try?

stkdump · 2025-01-07T17:03:55 1736269435

Mixtral and Deepseek use MOE. Most others don't.

memhole · 2025-01-07T19:38:31 1736278711

I planted garlic this year. Thanks for documenting! I can’t wait to see what I get harvest time.

I like the Llama models personally. Meta aside. Qwen is fairly popular too. There’s a number of flavors you can try out. Ollama is a good starting point to try things quickly. You’re def going to have to tolerate things crashing or not working imo before you understand what your hardware can handle.

Terretta · 2025-01-07T17:51:04 1736272264

Mixtral 8x22b https://mistral.ai/news/mixtral-8x22b/

terhechte · 2025-01-07T18:10:13 1736273413

In addition to the ones listed by others, WizardLM2 8x22b (was never officially released by Microsoft but is available).

cma · 2025-01-07T15:59:43 1736265583

You can also run the experts on separate machines with low bandwidth networking or even the internet (token rate limited by RTT)