I wonder if this has the same downsides as golden and screenshot-type tests, where you end up over-asserting, which results in tests that break for unrelated changes?
Obviously that’s a risk for hand-written tests too, but it’s easier (today… who knows what Copilot-like systems will offer soon!) for a human to reason about what’s relevant.
Yes, that is definitely a downside for these tests. The worst is when the text of some exception is printed and it includes line numbers. It does still require some discipline to think about what you're printing and avoid output that will be very noisy. This problem is mitigated quite a bit by the ease of accepting changes when these tests fail for obviously nonsense reasons though (just hit a couple buttons in an emacs buffer).
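To make the line-number case concrete, here is a rough sketch of one way to keep that noise out of the printed output: scrub the numbers before the text is compared or recorded. This is Python rather than whatever your test framework uses, and `scrub_line_numbers` is a made-up helper for illustration, not part of any expect-test library.

```python
import re


def scrub_line_numbers(text: str) -> str:
    """Replace line numbers in error text with a fixed placeholder.

    Hypothetical helper: the two patterns below are assumptions about
    what the noisy output looks like, not an exhaustive list.
    """
    # Python-traceback style: 'File "x.py", line 12, in f'
    text = re.sub(r"\bline \d+\b", "line N", text)
    # Compiler style: 'file.ext:12' or 'file.ext:12:5'
    text = re.sub(r"(\.\w+):\d+(?::\d+)?", r"\1:N", text)
    return text


if __name__ == "__main__":
    before = 'File "parser.py", line 12, in parse'
    after = 'File "parser.py", line 57, in parse'
    # Both versions print identically once scrubbed, so moving the
    # function within the file no longer changes the recorded output.
    print(scrub_line_numbers(before))
    print(scrub_line_numbers(before) == scrub_line_numbers(after))
```

The trade-off is that a scrubber like this also hides line-number changes you might actually care about, so it is probably best applied only to output where the position is known to be irrelevant.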