
Even with a PCIe FPGA card you're still going to be memory bound during inference. When running llama.cpp on the CPU alone, memory bandwidth, not compute power, is always the bottleneck.

Now, if the FPGA card had a large amount of GPU-tier memory, then that would help.
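
To put rough numbers on the memory-bound argument, here's a back-of-envelope sketch. It assumes a dense model where generating each token means streaming all the weights from memory once, so the ceiling on decode speed is roughly bandwidth divided by model size. The 4 GB model size and both bandwidth figures are illustrative assumptions, not measurements, and `max_tokens_per_second` is just a hypothetical helper name.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Rough ceiling on decode speed when memory bandwidth is the limit.

    Assumes every generated token requires reading all model weights once
    and ignores caches, KV-cache traffic, and compute time.
    """
    if model_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9
model_bytes = 4 * GB  # assumed: a quantized model of about 4 GB

# Assumed, illustrative bandwidth figures (not benchmarks).
scenarios = [
    ("system RAM, assumed 50 GB/s", 50 * GB),
    ("GPU-tier memory, assumed 1000 GB/s", 1000 * GB),
]

for label, bandwidth in scenarios:
    ceiling = max_tokens_per_second(model_bytes, bandwidth)
    print(f"{label}: ceiling of about {ceiling:.1f} tokens/s")
```

Under these assumptions the ceiling scales linearly with bandwidth, which is why adding compute (FPGA or otherwise) in front of the same slow memory doesn't move it.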



