Ha, I'd recently asked about this here as well, just using some high memory AMD setup to infer.
Another thing I wonder is whether using a bunch of GeForce RTX 4060 Ti cards with 16GB could be useful - they cost only around 500 EUR each. If VRAM is the bottleneck, perhaps a couple of these could really help with inference (unless they become compute-bound, i.e. too slow).
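For a rough feel of whether a couple of 16GB cards would be enough, here's a back-of-envelope sketch. It only counts model weights plus a flat overhead margin for KV cache and activations; the 20% overhead factor and the bytes-per-parameter figures are my assumptions, not measured numbers.

```python
import math

def cards_needed(params_b: float, bytes_per_param: float,
                 vram_per_card_gb: float = 16.0, overhead: float = 1.2) -> int:
    """Estimate how many GPUs are needed just to hold the weights.

    params_b:        model size in billions of parameters
    bytes_per_param: ~2.0 for fp16, ~0.5 for 4-bit quantization
    overhead:        assumed margin for KV cache / activations
    """
    weights_gb = params_b * bytes_per_param  # 1B params at 1 byte/param ~ 1 GB
    return math.ceil(weights_gb * overhead / vram_per_card_gb)

# A hypothetical 70B model on 16GB cards:
print(cards_needed(70, 0.5))  # 4-bit quantized -> 3
print(cards_needed(70, 2.0))  # fp16           -> 11
```

So under these assumptions a 4-bit 70B model would fit on three such cards (~1500 EUR of VRAM), while fp16 would need far more - which is why quantization matters so much for this kind of budget build.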