OpenAI aren't doing anything magic. We're optimizing Llama inference at the moment and it looks like we'll be able to roughly match GPT 3.5's price for Llama 2 70B.
Running a fine-tuned GPT-3.5 is surprisingly expensive. That's where using Llama makes a ton of sense. Once we’ve optimized inference, it’ll be much cheaper to run a fine-tuned Llama.
We're working on LLM Engine (https://llm-engine.scale.com) at Scale, our open-source, self-hostable framework for inference and fine-tuning on open-source LLMs. Our findings are similar to Replicate's: Llama 2 70B inference can be comparable in price to GPT-3.5, etc. Would be great to discuss this further!
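To make that concrete, here's roughly what fine-tuning and then querying a Llama 2 model through LLM Engine looks like with the Python client. This is a minimal sketch based on the public docs at https://llm-engine.scale.com; the exact parameter names, the model identifier, and the S3 path are assumptions/placeholders, not a definitive reference.

```python
# Minimal sketch of the LLM Engine Python client (parameter names assumed from the docs).
from llmengine import Completion, FineTune

# Kick off a fine-tune on your own data (hypothetical S3 path).
fine_tune = FineTune.create(
    model="llama-2-7b",
    training_file="s3://my-bucket/train.csv",
)

# Run inference against a hosted or self-hosted model.
response = Completion.create(
    model="llama-2-7b",
    prompt="Summarize why self-hosting a fine-tuned model can be cheaper:",
    max_new_tokens=100,
    temperature=0.2,
)
print(response.output.text)
```

The same calls work whether you hit Scale's hosted endpoint or a self-hosted deployment, which is where the cost advantage over a fine-tuned GPT-3.5 shows up.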