Do you have experience with snapshot tests? If not, you should try it out for a bit and see what we mean; you'll quickly run into the situation we described.
If you introduce a new feature that adds, say, a new button to the page, the snapshot test will fail by necessity: you've changed what's on the page, so the stored snapshot no longer matches the current version. What you have to do then is regenerate the snapshot (regenerating it will always make it pass).
The problem, in my experience, is that these kinds of tests are massively unreliable. So you either spend a lot of time fiddling with tolerances (most testing libraries let you set a threshold for what counts as a failure in a snapshot), or you have to manually hunt down and pixel-peep at whatever might have changed.
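To make the tolerance knob concrete, here's a minimal sketch of what that threshold usually looks like, assuming Playwright's visual comparison API (the URL and snapshot name here are made up; jest-image-snapshot exposes a similar failureThreshold option):

```ts
import { test, expect } from '@playwright/test';

test('dashboard matches the stored snapshot', async ({ page }) => {
  await page.goto('/dashboard'); // hypothetical page; assumes baseURL is set in the config

  // Compare against the committed baseline image and tolerate up to 1% of
  // differing pixels before the comparison is reported as a failure.
  await expect(page).toHaveScreenshot('dashboard.png', {
    maxDiffPixelRatio: 0.01,
  });
});
```

You end up turning that 0.01 up and down depending on how noisy the rendering is, which is exactly the fiddling described above.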
What usually happens is that at some middle step of the test an assert fails because some new code causes things to render slightly out of order or delayed in some way. The final product is ultimately fine, and an intelligent system would give it a pass. The unit tests pass. Even the integration and E2E tests pass, but the snapshot tests fail because something changed. The chance that this failure is a false positive is much, much higher than the chance of it being legitimate, but let's assume it's legitimate, like the good citizens we are, and try to hunt it down.
So, you go through the snapshots one by one, setting breakpoints where you think things might break. Note that this process is itself painful, slow and manual, and running snapshot tests on your local machine can break things in its own right, because everything renders differently there than it does in CI, which adds an extra set of headaches. On top of that, what you're testing is an imperfect reflection of the actual underlying app, because a lot of it (or most of it) will be mocked/hoisted/whatever. So sifting out what's a potential breakage from what's just a quirk of the snapshot setup is a gamble at best. You spin up the app locally and see that nothing looks out of the ordinary. No errors to be seen. Things look and behave as they should. You have your other tests, and they all pass. So, you chalk it up to a false positive, regen the snapshots, and go on with your day.
The next day you add some other thing to the page. Uh oh, snapshots fail again!
Rinse and repeat, and after the 50th time sitting there waiting for the snapshot to be captured, you decide there are better ways to spend your limited time on this mortal plane and just say "Fuck it, this one's probably a false positive as well" after a two-minute look at the page locally, before hitting the recapture command again.
I legitimately don't think I've ever actually seen a non-false positive snapshot test failure, to be honest.
Snapshot tests are even more sensitive to flakiness than E2E tests, but flakiness is still a bug that can (and probably should) be fixed like any other.
As an example - the different browser you have running locally and in CI? That's a bug. The fix is to run the browser in a container - the same container in both environments. You can probably get away without that for an E2E test. A snapshot test? No way.
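For what "the same container in both environments" can look like in practice, here's a rough sketch assuming Playwright; the image tag and project name are illustrative, not prescriptive:

```ts
// playwright.config.ts
// Run the suite inside the official Playwright image in BOTH local and CI runs, e.g.:
//   docker run --rm -v "$PWD":/work -w /work mcr.microsoft.com/playwright:v1.49.0-jammy npx playwright test
// so every snapshot is rendered by the exact same browser build, fonts and libraries.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    // Pin one known browser configuration instead of "whatever happens to be installed locally".
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
});
```

The config pins what gets launched; the container pins what it gets launched on.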
I've seen people decry and abandon E2E tests for the same reason - simply because they couldn't get flakiness under control and thought it was impossible to do so.
I don't think it's intrinsically impossible, though. I think it's just a hard engineering problem. There are lots of non-obvious sources of flakiness - e.g. unpinned versions, SELECTs without ORDER BYs, nondeterministic JavaScript code, etc. - but few that are simply impossible to fix.
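To make the "nondeterministic JavaScript" and "SELECTs without ORDER BYs" points concrete, here's a small sketch (the helper and its data source are hypothetical) of the kind of ordering bug that makes a snapshot flake, and the one-line fix:

```ts
// Hypothetical helper: fetch a post's tags and render them as a comma-separated list.
// If the backing query has no ORDER BY, or the API returns items in arbitrary order,
// the rendered string, and therefore the snapshot, changes from run to run.
async function renderTags(fetchTags: () => Promise<string[]>): Promise<string> {
  const tags = await fetchTags();

  // Flaky: return tags.join(', ');  // order depends on whatever the backend felt like today
  // Deterministic: impose an explicit order before rendering.
  return [...tags].sort((a, b) => a.localeCompare(b)).join(', ');
}
```

None of these fixes is hard individually; the work is in noticing them.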