
Is there any information on a suggested inference setup? I guess they had something different in mind than TPU v4-128 when they put it on HuggingFace?


It's Llama 7B, so anything that can run that will work.

You can quantize the KV cache and fit quite a long context on consumer GPUs: at least 75K tokens on my mere 24GB 3090, maybe 200K with a fancier quantization backend.
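
To make that concrete, here is a rough sketch (mine, not from the thread) of loading the text checkpoint with a 4-bit quantized KV cache via Hugging Face transformers. The model id, the quanto backend, and the exact generate() arguments are assumptions, not a tested LWM recipe, and the quantized-cache option needs a reasonably recent transformers plus the optimum-quanto package:

    # Assumed setup: transformers >= ~4.42, accelerate, optimum-quanto installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "LargeWorldModel/LWM-Text-Chat-1M"  # assumed HF repo name

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # spread weights across the available GPU(s)
    )

    prompt = "Summarize the following document: ..."
    inputs = tok(prompt, return_tensors="pt").to(model.device)

    # Storing the KV cache in 4-bit is what lets long contexts fit in 24GB;
    # with fp16 cache the context you can hold is far smaller.
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        cache_implementation="quantized",
        cache_config={"backend": "quanto", "nbits": 4},
    )
    print(tok.decode(out[0], skip_special_tokens=True))
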


Looking at https://github.com/LargeWorldModel/LWM, they do indeed seem to suggest using a TPU VM.


I suppose you could try a Google Colab notebook attached to a free TPU instance? It would probably be quite limited, if it worked at all.
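
If you go that route, a minimal sanity check (my own sketch, nothing LWM-specific) is just confirming that JAX can see the Colab TPU before pointing the repo's scripts at it:

    # Run inside a Colab notebook with the runtime type set to TPU.
    import jax

    # On a TPU runtime this should list TpuDevice entries rather than CPU.
    print(jax.devices())
    print("TPU cores:", jax.device_count())
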



