Is that just because nobody has made an effort yet to port them upstream, or is there something inherently difficult about making those changes work in llama.cpp?



I get the impression most llama.cpp users are interested in running models on GPU. AFAICT this optimization is CPU-only. Don't get me wrong, it's a huge one, and it opens the door to running llama.cpp on more and more edge devices.
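For what it's worth, llama.cpp already exposes a knob for this split: the -ngl / --n-gpu-layers flag controls how many layers get offloaded to the GPU, and setting it to 0 keeps inference entirely on the CPU, which is the path a CPU-only optimization would speed up. A rough sketch (the model path and thread count are placeholders, and the binary name assumes a recent build where it's called llama-cli):

    # run fully on CPU: offload 0 layers to GPU, use 8 threads
    ./llama-cli -m ./models/model.gguf -ngl 0 -t 8 -p "Hello"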



