alwayslikethis on March 15, 2023
on: Llama.rs – Rust port of llama.cpp for fast LLaMA i...
Quantization is the answer here. Running the large models on a CPU at 16 bits (which in practice means 32, because most CPUs don't support FP16 natively) would be really slow.
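
To make the point concrete, here is a minimal Rust sketch of block-wise 4-bit quantization, in the general spirit of what llama.cpp-style runtimes do for CPU inference. The block size of 32, the BlockQ4 layout, and the helper names are assumptions made for this example, not the actual ggml or llama.rs format. Each block stores one f32 scale plus 32 weights packed two nibbles per byte, about 20 bytes per block instead of 128 bytes in f32.

    // Illustrative sketch of block-wise 4-bit quantization (assumed layout,
    // not the real ggml/llama.rs format).

    const BLOCK_SIZE: usize = 32;

    // One quantized block: a single f32 scale plus 32 signed 4-bit weights
    // packed two per byte (20 bytes total vs. 128 bytes as f32).
    struct BlockQ4 {
        scale: f32,
        quants: [u8; BLOCK_SIZE / 2],
    }

    fn quantize_block(weights: &[f32; BLOCK_SIZE]) -> BlockQ4 {
        // Symmetric quantization: the largest-magnitude weight maps to +/-7.
        let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
        let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };

        // Quantize one weight to a 4-bit two's-complement nibble (0..15).
        let to_nibble = |w: f32| ((w / scale).round().clamp(-8.0, 7.0) as i8 as u8) & 0x0F;

        let mut quants = [0u8; BLOCK_SIZE / 2];
        for i in 0..BLOCK_SIZE / 2 {
            quants[i] = to_nibble(weights[2 * i]) | (to_nibble(weights[2 * i + 1]) << 4);
        }
        BlockQ4 { scale, quants }
    }

    fn dequantize_block(block: &BlockQ4) -> [f32; BLOCK_SIZE] {
        // Decode a 4-bit two's-complement nibble back to a signed integer.
        let from_nibble = |n: u8| if n < 8 { n as i8 } else { n as i8 - 16 };

        let mut out = [0.0f32; BLOCK_SIZE];
        for i in 0..BLOCK_SIZE / 2 {
            out[2 * i] = from_nibble(block.quants[i] & 0x0F) as f32 * block.scale;
            out[2 * i + 1] = from_nibble(block.quants[i] >> 4) as f32 * block.scale;
        }
        out
    }

    fn main() {
        let weights: [f32; BLOCK_SIZE] = std::array::from_fn(|i| (i as f32 - 16.0) / 10.0);
        let block = quantize_block(&weights);
        let restored = dequantize_block(&block);
        println!("original[3] = {:.3}, restored[3] = {:.3}", weights[3], restored[3]);
    }

Most of the CPU-side win comes from the roughly 8x smaller weight footprint and memory bandwidth; runtimes typically fuse the dequantization into the matrix-multiply inner loop rather than expanding weights back to f32 up front.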