- we have llama.cpp (could be enough on its own, or, as mentioned in the paper, a co-processor to accelerate the computation can be added; less need for large RAM / high-end hardware)
- as most of the work is inference, we might not need as many GPUs
- consumer cards (24 GB) could possibly run the big models; see the rough sizing sketch after this list
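Rough sizing sketch for the "24 GB card" point above: a minimal back-of-the-envelope estimate of the VRAM needed to hold quantized weights plus some overhead. The parameter counts, bit-widths, and the 20% overhead factor are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope VRAM estimate for quantized inference (llama.cpp-style).
# All numbers below are assumptions for illustration, not measured values.

def approx_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate memory to hold the weights, plus ~20% for KV cache and
    activations (the overhead factor is a rough guess)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

if __name__ == "__main__":
    card_gb = 24  # typical high-end consumer card
    for params in (7, 13, 34, 70):   # hypothetical model sizes in billions of parameters
        for bits in (4, 8, 16):      # common quantization levels
            need = approx_vram_gb(params, bits)
            fits = "fits" if need <= card_gb else "does not fit"
            print(f"{params:>3}B @ {bits:>2}-bit: ~{need:5.1f} GB -> {fits} on a {card_gb} GB card")
```

By this rough estimate, models up to roughly the 30B range at 4-bit quantization stay under 24 GB, while a 70B model would need to offload part of its layers to CPU RAM, which llama.cpp supports.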