For a few uni/personal projects I noticed the same about LangChain: it's good at helping you use up tokens. The other use case, quickly switching between models, is still a valid reason to use it. However, I've recently started playing with OpenRouter, which seems to abstract the model away nicely.
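For what it's worth, OpenRouter exposes an OpenAI-compatible API, so switching models is mostly a matter of changing one string. A minimal sketch (the model ids are examples; check OpenRouter's catalog for current names):

```python
# Sketch: OpenRouter speaks the OpenAI wire protocol, so the
# official openai client works if you point it at a different base_url.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Switching providers is just a different model string (example ids).
for model in ["openai/gpt-4o-mini", "anthropic/claude-3-haiku"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(model, "->", resp.choices[0].message.content)
```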
I think we now know, collectively, a lot more about what’s annoying/hard about building LLM features than we did when LangChain was being furiously developed.
And some things we thought would be important and not easy turned out to be very easy, like getting GPT to give back well-formed JSON.
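(OpenAI's API now has a JSON mode that constrains output to a valid JSON object. A minimal sketch; note the API requires the word "JSON" to appear somewhere in your messages when you use it:)

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    # JSON mode: the model must emit a single valid JSON object.
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'name' and 'year'."},
        {"role": "user", "content": "Who wrote Dracula and when?"},
    ],
)
data = json.loads(resp.choices[0].message.content)  # parses cleanly
print(data)
```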
So I think there’s lots of room.
One thing LangChain is doing now that solves something that IS very hard/annoying is testing. I spent 30 minutes yesterday re-running a slow prompt because 1 in 5 runs would produce weird output. After each tweak to the prompt, I had to run it at least 10 times to be reasonably sure the change was an improvement.
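Even a crude harness helps here: run the prompt N times and count how often the output passes a validity check. (A sketch; `is_well_formed` is a placeholder for whatever check matters to you.)

```python
# Rough sketch of a repeated-run prompt check, using the openai client.
# is_well_formed() is a hypothetical validator you supply yourself.
import json
from openai import OpenAI

client = OpenAI()

def is_well_formed(text: str) -> bool:
    # Example check: output must parse as JSON. Replace with your own.
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def pass_rate(prompt: str, model: str = "gpt-4o-mini", n: int = 10) -> float:
    ok = 0
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        if is_well_formed(resp.choices[0].message.content):
            ok += 1
    return ok / n

print(pass_rate('Return {"answer": ...} for: what is 2+2?'))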
It can be faster and more effective to fall back to a smaller model (GPT-3.5 or Haiku): the weaknesses of the prompt will be more obvious on a smaller model, and your iteration time will be faster.
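e.g., with a pass-rate harness like the one in the parent comment, iterate against the cheap model and only confirm on the expensive one (model ids and threshold are illustrative):

```python
# Sketch: iterate cheap, confirm expensive. `check(prompt, model)`
# is a hypothetical pass-rate function like the one sketched above.
DEV_MODEL = "gpt-3.5-turbo"   # fast/cheap: surfaces prompt weaknesses
PROD_MODEL = "gpt-4o"         # slow/expensive: final sanity check only

def tune(prompt_variants, check):
    best = max(prompt_variants, key=lambda p: check(p, DEV_MODEL))
    assert check(best, PROD_MODEL) > 0.9  # illustrative threshold
    return best
```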
Do different versions react to prompts in the same way? I'd have imagined a prompt gets tailored to the quirks of a particular version rather than being stably optimal across versions.
I suppose that is one of the benefits of using a local model: it reduces model risk. I.e., given a certain prompt, it should always reply in the same way. With a hosted model, you don't have that operational control; the provider can swap or update the model underneath you.
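A sketch with llama-cpp-python, assuming a local GGUF file (the path is a placeholder): pin the weights, the seed, and greedy sampling, and repeated runs should match, modulo threading nondeterminism in some builds:

```python
# Sketch: reproducible local inference with llama-cpp-python.
# Assumes a GGUF model file on disk; the path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", seed=0, verbose=False)

out1 = llm("Q: What is the capital of France? A:", max_tokens=16, temperature=0.0)
out2 = llm("Q: What is the capital of France? A:", max_tokens=16, temperature=0.0)

# With fixed weights, a fixed seed, and temperature 0 (greedy decoding),
# the two completions should be identical.
assert out1["choices"][0]["text"] == out2["choices"][0]["text"]
```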