
With all these optimizations, isn't llama still too slow to run on a CPU with a reasonable number of parameters?



I'm GPU poor, so I use a CPU to run large models. For example, llamafile running Mixtral 8x22B on my Threadripper can chew through long legal documents and give me advice in a few minutes, using f16 weights and Kahan summation. Show me someone who has that many graphics cards.
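
For anyone wondering what Kahan summation buys you here: it's compensated summation, which carries the rounding error from each addition forward instead of silently dropping it, so long sums over f16 weights don't drift. A minimal sketch of the textbook algorithm in C (not llamafile's actual code):

    #include <stddef.h>

    /* Kahan (compensated) summation: keep a correction term so the
       low-order bits lost when adding a small value to a large running
       sum are fed back in on the next iteration. */
    float kahan_sum(const float *x, size_t n) {
        float sum = 0.0f;
        float c   = 0.0f;            /* running compensation for lost bits */
        for (size_t i = 0; i < n; i++) {
            float y = x[i] - c;      /* apply the correction from the last step */
            float t = sum + y;       /* low-order bits of y may be lost here */
            c = (t - sum) - y;       /* recover what was lost */
            sum = t;
        }
        return sum;
    }

One caveat: compile without -ffast-math, since reassociation lets the compiler cancel the (t - sum) - y term and silently turn this back into naive summation.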


Are you "GPU poor" by choice? I would have thought that you, of all people, would be "GPU wealthy" like few can imagine.


I'm a hobbyist developer. I haven't been employed in six years. In the past few months I've been fortunate enough to find work as a TVC for Mozilla, which has been wonderful. It's a good lifestyle that affords me a great deal of freedom and autonomy, but certainly not a lot of GPUs. I have an RTX 4090, a few XTX cards, and I rent stuff from Vast occasionally, but that's about it.


I love how an RTX 4090 now counts as "GPU poor".


It can't run most of the open source models being released today. It's not useful if your job is to keep up with LLM releases.


What are your tokens/s on that setup?


Models keep getting better at equal parameter counts, CPUs keep getting faster, and the good folks at Intel, AMD, and Arm are probably working very hard to catch up with Apple silicon in terms of memory architecture. I can see this being very relevant in a couple of years.


I don't see how that conclusion follows from this optimization.



