Who provides the cheapest GPU inference and hosting for fine-tuned models (7B size)? I already have the fine-tuned model ready; I'm just looking for a cheap place to host it and run inference.
I've looked at Replicate and Together.ai. They both really do have the best tools in this space, but hosting is expensive. Together costs about 1.4/hr to host a 7B model, and Replicate is more expensive.
Ideally, I'd be charged only for active time, not idle time. (Replicate already does this, but your fine-tuned model has to be based on one of a limited set of base models.)
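For a rough sense of why idle billing matters so much here, this is a back-of-the-envelope sketch in Python. The 1.4/hr rate is the Together figure quoted above; the 2 active hours per day, the 30-day month, and the `monthly_cost` helper are all made up for illustration. It also assumes the same hourly rate under both billing models, which real providers probably won't match, so treat the numbers as a ballpark only.

```python
HOURS_PER_DAY = 24


def monthly_cost(hourly_rate, active_hours_per_day, billed_when_idle, days=30):
    """Estimate monthly hosting cost, in the same currency as hourly_rate.

    Hypothetical helper: if idle time is billed, every hour of the day counts;
    otherwise only the hours the model is actually serving requests count.
    """
    if billed_when_idle:
        billed_hours_per_day = HOURS_PER_DAY
    else:
        billed_hours_per_day = active_hours_per_day
    return hourly_rate * billed_hours_per_day * days


# Assumed workload: 2 hours of real traffic per day (not a measured number).
always_on = monthly_cost(1.4, active_hours_per_day=2, billed_when_idle=True)
active_only = monthly_cost(1.4, active_hours_per_day=2, billed_when_idle=False)

print(f"always-on:   {always_on:.2f} per month")
print(f"active-only: {active_only:.2f} per month")
```

With a light workload like this, nearly all of the always-on bill is idle time, which is why I care more about the billing model than the headline hourly rate.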
Any recommendations?
Roll your own k8s? Predibase?