
With all these optimizations, isn't llama still too slow to run on a CPU with a reasonable number of parameters?



I'm GPU poor, so I use a CPU to run large models. For example, llamafile running Mixtral 8x22B on my Threadripper can chew through long legal documents and give me advice in a few minutes, using f16 weights and Kahan summation. Show me someone who has that many graphics cards.
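
For anyone wondering what Kahan summation buys you here: it's compensated summation, which carries the rounding error from each addition forward instead of silently dropping it, so long sums over f16 weights don't drift. A minimal sketch of the textbook algorithm in C (not llamafile's actual code):

    #include <stddef.h>

    /* Kahan (compensated) summation: keep a correction term so the
       low-order bits lost when adding a small value to a large running
       sum are fed back in on the next iteration. */
    float kahan_sum(const float *x, size_t n) {
        float sum = 0.0f;
        float c   = 0.0f;            /* running compensation for lost bits */
        for (size_t i = 0; i < n; i++) {
            float y = x[i] - c;      /* apply the correction from the last step */
            float t = sum + y;       /* low-order bits of y may be lost here */
            c = (t - sum) - y;       /* recover what was lost */
            sum = t;
        }
        return sum;
    }

One caveat: compile without -ffast-math, since reassociation lets the compiler cancel the (t - sum) - y term and silently turn this back into naive summation.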


Are you "GPU poor" by choice? I would have thought that you, of all people, would be "GPU wealthy" like few can imagine.


I'm a hobbyist developer. I haven't been employed in six years. In the past few months I've been fortunate enough to find work as a TVC for Mozilla, which has been wonderful. It's a good lifestyle that affords me a great deal of freedom and autonomy, but certainly not a lot of GPUs. I have an RTX 4090, a few XTX cards, and I rent stuff from Vast occasionally, but that's about it.


I love how an RTX 4090 now counts as "GPU poor".


It can't run most of the open source models being released today. It's not useful if your job is to keep up with LLM releases.


What are your tokens/s on that setup?


Models keep getting better at equal parameter counts, CPUs keep getting faster, and the good folks at Intel, AMD, and Arm are probably working very hard to catch up with Apple silicon in terms of memory architecture. I can see this being very relevant in a couple of years.


I don't see how that conclusion follows from this optimization.



