Ha, I'd recently asked about this here as well, just using some high memory AMD setup to infer.
Another thing I wonder is whether using a bunch of GeForce RTX 4060 Ti cards with 16GB could be useful - they cost only around 500 EUR each. If VRAM is the bottleneck, perhaps a couple of these could really help with inference (unless they become compute-bound, i.e. too slow).
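For a rough feel of whether a couple of 16GB cards would be enough, here's a back-of-envelope sketch. It only counts model weights plus a flat overhead margin for KV cache and activations; the 20% overhead factor and the bytes-per-parameter figures are my assumptions, not measured numbers.

```python
import math

def cards_needed(params_b: float, bytes_per_param: float,
                 vram_per_card_gb: float = 16.0, overhead: float = 1.2) -> int:
    """Estimate how many GPUs are needed just to hold the weights.

    params_b:        model size in billions of parameters
    bytes_per_param: ~2.0 for fp16, ~0.5 for 4-bit quantization
    overhead:        assumed margin for KV cache / activations
    """
    weights_gb = params_b * bytes_per_param  # 1B params at 1 byte/param ~ 1 GB
    return math.ceil(weights_gb * overhead / vram_per_card_gb)

# A hypothetical 70B model on 16GB cards:
print(cards_needed(70, 0.5))  # 4-bit quantized -> 3
print(cards_needed(70, 2.0))  # fp16           -> 11
```

So under these assumptions a 4-bit 70B model would fit on three such cards (~1500 EUR of VRAM), while fp16 would need far more - which is why quantization matters so much for this kind of budget build.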