Hacker News

With Ollama I got the 20B model running on 8 Titan X cards (2015). Ollama distributed the model so that the 15GB of VRAM required was split evenly across the 8 cards. The tok/s were faster than reading speed.
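A quick back-of-the-envelope check of that split, assuming the Maxwell-era Titan X's 12 GB of VRAM (the per-card figure is derived here, not stated in the comment):

```python
# Rough arithmetic for the setup described above: ~15 GB of model
# weights split evenly across 8 GPUs, each a 2015 Titan X.
total_model_gb = 15
num_gpus = 8
titan_x_vram_gb = 12  # Maxwell Titan X spec (assumption, not from the comment)

per_gpu_gb = total_model_gb / num_gpus
print(f"{per_gpu_gb:.2f} GB per card")  # prints "1.88 GB per card"

# Each card only needs a small fraction of its VRAM for weights,
# leaving headroom for KV cache and activations.
assert per_gpu_gb < titan_x_vram_gb
```

So each card holds under 2 GB of weights, which is why an even split across eight 12 GB cards works comfortably.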


For the price of eight decade-old Titan X cards, someone could pick up a single modern GPU with 16GB or more of VRAM.



