
You can run a similarly sized model - Llama 2 70B - at the 'Q4_K_M' quantisation level in about 44 GB of memory [1]. So you can just about fit it on 2x RTX 3090 (24 GB VRAM each, which you can buy used for around $1100 apiece).
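
If you want to try this, here's a minimal sketch using the llama-cpp-python bindings (the model path, prompt, and context size below are placeholders, and it assumes the library was built with CUDA support for GPU offload):

    # Minimal sketch: run a Q4_K_M GGUF quant of Llama 2 70B via llama-cpp-python.
    # Assumes the GGUF file from [1] has already been downloaded;
    # the path and prompt are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-2-70b.Q4_K_M.gguf",  # placeholder path to the GGUF file
        n_gpu_layers=-1,  # offload all layers; llama.cpp splits them across available GPUs by default
        n_ctx=4096,       # context window size
    )

    out = llm("Q: What is quantisation in the context of LLMs? A:", max_tokens=64)
    print(out["choices"][0]["text"])

With n_gpu_layers=-1 the runtime offloads every transformer layer to the GPUs, which is what lets the ~44 GB model straddle the two 24 GB cards.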

Of course, you can buy quite a lot of hosted model API access or cloud GPU time for that money.

[1] https://huggingface.co/TheBloke/Llama-2-70B-GGUF


