- we have llama.cpp (could be enough on its own, or, as mentioned in the paper, a co-processor to accelerate the computation can be added; less need for large RAM / high-end hardware)
- as most of the work is inference, we might not need as many GPUs
- consumer cards (24 GB) could possibly run the big models; see the rough sizing sketch after this list
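Rough sizing sketch for the "24 GB card" point above: a minimal back-of-the-envelope estimate of the VRAM needed to hold quantized weights plus some overhead. The parameter counts, bit-widths, and the 20% overhead factor are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope VRAM estimate for quantized inference (llama.cpp-style).
# All numbers below are assumptions for illustration, not measured values.

def approx_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate memory to hold the weights, plus ~20% for KV cache and
    activations (the overhead factor is a rough guess)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

if __name__ == "__main__":
    card_gb = 24  # typical high-end consumer card
    for params in (7, 13, 34, 70):   # hypothetical model sizes in billions of parameters
        for bits in (4, 8, 16):      # common quantization levels
            need = approx_vram_gb(params, bits)
            fits = "fits" if need <= card_gb else "does not fit"
            print(f"{params:>3}B @ {bits:>2}-bit: ~{need:5.1f} GB -> {fits} on a {card_gb} GB card")
```

By this rough estimate, models up to roughly the 30B range at 4-bit quantization stay under 24 GB, while a 70B model would need to offload part of its layers to CPU RAM, which llama.cpp supports.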