You can quantize the KV cache and fit quite a bit of context on a single GPU. I get at least 75K tokens on my mere 24GB 3090, and maybe 200K would fit with a fancy quantization repo.
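For a rough feel of why cache quantization helps so much, here is a back-of-envelope sketch. The formula is the usual one (keys plus values, per layer, per KV head, per token), but the model shape and the 6 GiB cache budget are made-up numbers for illustration, not the setup from my 3090, and it ignores the small overhead of quantization scales.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Approximate KV cache size in bytes (ignores quantization scale overhead)."""
    # 2 tensors (keys and values) per layer, each holding
    # n_kv_heads * head_dim values per token.
    values = 2 * n_layers * n_kv_heads * head_dim * n_tokens
    return values * bits_per_value / 8


def max_tokens(budget_bytes, n_layers, n_kv_heads, head_dim, bits_per_value):
    """How many tokens of context fit in a given cache budget."""
    per_token = kv_cache_bytes(1, n_layers, n_kv_heads, head_dim, bits_per_value)
    return int(budget_bytes // per_token)


# Hypothetical model shape and VRAM budget, chosen only for illustration.
shape = dict(n_layers=60, n_kv_heads=8, head_dim=128)
budget = 6 * 1024**3  # VRAM assumed left over after loading the weights

for bits in (16, 8, 4):
    print(f"{bits}-bit cache: ~{max_tokens(budget, bits_per_value=bits, **shape)} tokens")
```

Halving the bits per cached value roughly doubles the context that fits in the same memory, which is where the jump from FP16 to 8-bit or 4-bit cache comes from.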