Hacker News

Can someone a lot smarter than me give a basic explanation of why something like this can run at a respectable speed on a CPU, whereas Stable Diffusion is next to useless on one? (That is to say, 10-100x slower, whereas I have not seen GPU-based LLaMA go 10-100x faster than the demo here.) I had assumed similar algorithms were at play.



Stable Diffusion runs pretty fast on Apple Silicon. Not sure if that uses the GPU, though.

I think one reason in this particular case may be the 4-bit quantization.
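To see why 4-bit quantization matters so much, a quick back-of-envelope calculation of the memory footprint of a 7B-parameter model (the smallest LLaMA) at different precisions:

```python
# Rough memory footprint of a 7B-parameter model at different precisions.
# Ignores activations and per-block quantization metadata, so these are
# lower bounds, but the ratios are what matter.
params = 7e9
for name, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: {gib:.1f} GiB")
# fp32: 26.1 GiB
# fp16: 13.0 GiB
# 4-bit: 3.3 GiB
```

At 4 bits the weights fit comfortably in ordinary desktop RAM, and since token generation is largely memory-bandwidth-bound, reading 8x fewer bytes per token translates fairly directly into speed.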


Quantization is the answer here. A CPU running the large models at 16 bits (which in practice means 32, since most CPUs do not support FP16 natively) would be really slow.
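For the curious, here is a minimal sketch of absmax block quantization to 4-bit integers with one float scale per block. This is similar in spirit to what llama.cpp does, but it is a simplified illustration, not the actual on-disk format (the real formats pack two 4-bit values per byte and use different block layouts):

```python
import numpy as np

def quantize_4bit(weights, block_size=32):
    """Quantize float32 weights to 4-bit integer codes, one scale per block.

    Each block of 32 values shares a single float32 scale chosen so the
    largest-magnitude value in the block maps to +/-7.
    """
    w = weights.reshape(-1, block_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    """Recover approximate float32 weights from codes and scales."""
    return (q.astype(np.float32) * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)

# Rounding error is bounded by half a quantization step per element,
# while storage drops to ~1/8 of float32 (4 bits + per-block scales).
print(np.abs(w - w_hat).max())
```

Matrix multiplies then run against the small quantized weights (dequantizing blocks on the fly), which is why a 4-bit model is both small enough and fast enough for CPU inference.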



