
I don’t know, it’s kind of amazing how good the lighter weight self hosted models are now.

Given a 16 GB system with CPU-only inference, I'm hosting Gemma 2 9B at q8 for LLM tasks and SDXL Turbo for image work, and besides the memory usage creeping up for a second or so while I invoke a prompt, they're basically undetectable in the background.
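To see why a 9B model at q8 fits comfortably on a 16 GB box, here's a rough back-of-envelope sketch. The `overhead` factor and the ~8 bits/weight figure are assumptions on my part (real q8 GGUF files run slightly above 8 bits/weight, and KV cache size depends on context length), not numbers from any specific runtime:

```python
def model_ram_gb(n_params_billion: float, bits_per_weight: float,
                 overhead: float = 1.1) -> float:
    """Rough RAM estimate for a quantized model.

    Weights at the quantized bit width, plus ~10% headroom for
    KV cache and activations (the overhead factor is a guess).
    """
    weight_gb = n_params_billion * bits_per_weight / 8
    return weight_gb * overhead

# Gemma 2 9B at q8 (~8 bits/weight): roughly 9.9 GB,
# leaving headroom on a 16 GB machine for SDXL Turbo and the OS.
print(round(model_ram_gb(9, 8), 1))
```

Dropping to q4 (~4 bits/weight) halves the weight footprint to about 5 GB, which is why heavier quantization is the usual lever when the two models need to coexist in RAM.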


