Ultimately yes, I'll be using a GPU. I've got 4x NVIDIA Tesla P40s, 2x A4000s and an A5000 for all of this. I'm already building some things to handle the "one client at a time" limitation in llama.cpp, but it won't matter much in practice since nobody but me will be using it as a smart home assistant.

The SBC comment was about something like an Orange Pi 5, which can actually run some models on its GPU. I want to see if I can get a very low-power but "fast enough" system going on it, and keep the bigger, power-hungry GPUs for larger tasks. It's all stuff to play with, really.
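
For the "one client at a time" part, here's a minimal sketch of the kind of thing I mean: a tiny wrapper that serializes prompts to a single llama.cpp server behind a lock. The URL, port, and endpoint are assumptions (a local llama-server instance exposing its OpenAI-compatible chat endpoint on the default port), not my actual setup.

```python
import json
import threading
import urllib.request

# Assumed endpoint: a local llama.cpp llama-server instance with its
# OpenAI-compatible chat endpoint on the default port (adjust to taste).
LLAMA_URL = "http://127.0.0.1:8080/v1/chat/completions"

# A single lock serializes requests so only one prompt is in flight
# at a time -- fine when the only "client" is me.
_llama_lock = threading.Lock()

def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send one prompt to the llama.cpp server, waiting out any in-flight request."""
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }).encode("utf-8")
    req = urllib.request.Request(
        LLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with _llama_lock:  # one client at a time
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
    # Standard OpenAI-style response shape
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Should I turn on the living room lights?"))
```

Since it's only me hitting it, a plain lock is plenty; a real queue or multiple workers would only matter with more clients.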