
Is there any information on a suggested inference setup? I guess they had something different in mind than TPU v4-128 when they put it on HuggingFace?


It's Llama 7B, so anything that can run that will work.

You can quantize the KV cache and fit quite a long context on consumer GPUs: at least 75K tokens on my mere 24GB 3090, maybe 200K with a fancier quantization backend.
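
To make that concrete, here is a rough sketch (mine, not from the thread) of loading the text checkpoint with a 4-bit quantized KV cache via Hugging Face transformers. The model id, the quanto backend, and the exact generate() arguments are assumptions, not a tested LWM recipe, and the quantized-cache option needs a reasonably recent transformers plus the optimum-quanto package:

    # Assumed setup: transformers >= ~4.42, accelerate, optimum-quanto installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "LargeWorldModel/LWM-Text-Chat-1M"  # assumed HF repo name

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # spread weights across the available GPU(s)
    )

    prompt = "Summarize the following document: ..."
    inputs = tok(prompt, return_tensors="pt").to(model.device)

    # Storing the KV cache in 4-bit is what lets long contexts fit in 24GB;
    # with fp16 cache the context you can hold is far smaller.
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        cache_implementation="quantized",
        cache_config={"backend": "quanto", "nbits": 4},
    )
    print(tok.decode(out[0], skip_special_tokens=True))
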


Looking at https://github.com/LargeWorldModel/LWM, they do indeed seem to suggest using a TPU VM.


I suppose you could try a Google Colab notebook attached to a free TPU instance? It would probably be quite limited, if it worked at all.
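
If you go that route, a minimal sanity check (my own sketch, nothing LWM-specific) is just confirming that JAX can see the Colab TPU before pointing the repo's scripts at it:

    # Run inside a Colab notebook with the runtime type set to TPU.
    import jax

    # On a TPU runtime this should list TpuDevice entries rather than CPU.
    print(jax.devices())
    print("TPU cores:", jax.device_count())
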



