OpenAI aren't doing anything magic. We're optimizing Llama inference at the moment and it looks like we'll be able to roughly match GPT 3.5's price for Llama 2 70B.
Running a fine-tuned GPT-3.5 is surprisingly expensive. That's where using Llama makes a ton of sense. Once we’ve optimized inference, it’ll be much cheaper to run a fine-tuned Llama.
We're working on LLM Engine (https://llm-engine.scale.com) at Scale, our open-source, self-hostable framework for inference and fine-tuning on open-source LLMs. Our findings are similar to Replicate's: Llama 2 70B inference can be comparable in price to GPT-3.5, etc. Would be great to discuss this further!
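To make that concrete, here's roughly what fine-tuning and then querying a Llama 2 model through LLM Engine looks like with the Python client. This is a minimal sketch based on the public docs at https://llm-engine.scale.com; the exact parameter names, the model identifier, and the S3 path are assumptions/placeholders, not a definitive reference.

```python
# Minimal sketch of the LLM Engine Python client (parameter names assumed from the docs).
from llmengine import Completion, FineTune

# Kick off a fine-tune on your own data (hypothetical S3 path).
fine_tune = FineTune.create(
    model="llama-2-7b",
    training_file="s3://my-bucket/train.csv",
)

# Run inference against a hosted or self-hosted model.
response = Completion.create(
    model="llama-2-7b",
    prompt="Summarize why self-hosting a fine-tuned model can be cheaper:",
    max_new_tokens=100,
    temperature=0.2,
)
print(response.output.text)
```

The same calls work whether you hit Scale's hosted endpoint or a self-hosted deployment, which is where the cost advantage over a fine-tuned GPT-3.5 shows up.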