I have just watched Sonnet 3.7 vs Gemini 2.5 solving the same task (fix a bug end-to-end) side by side, and Sonnet hallucinated far worse and repeatedly got stuck in dead-ends requiring manual rescue. OTOH Gemini understood the problem based on bug description and code from the get go, and required minimal guidance to come up with a decent solution and implement it.