I agree with your comments and want to add, re: benchmarks: I don’t pay too much attention to them, but I have the advantage of now being retired, so I can spend time experimenting with a variety of local models that I run with Ollama, as well as commercial offerings. I spend that time building my own, very subjective, views of what different models are good for. One kind of model analysis that I do like is the circle displays on Hugging Face that show how a model benchmarks for different capabilities (word problems, coding, etc.).