
Then what's to stop you from making any unit test pass by just changing the expectations? The best testing technique in the world can't save us from developer error.


The problem with snapshot tests is that they tend to fail when something is changed.

One property of good tests is that they only fail when something is broken.

Snapshot tests aren't really automated tests, because you have to manually check the outputs to see if failures are genuine. It's a reminder to do manual checking but it's still not an automated process.

> The best testing technique in the world can't save us from developer error.

Sure, but using a testing technique that increases developer error is unwise, and snapshot testing has done that every time I've seen it used, so I don't use it anymore.


> they tend to fail when something is changed.

Then fix the test so that it fails only when something breaks? Do people not fix flaky, overly broad, or incorrect unit tests? How is a snapshot test any different?


A fix here would be to drop snapshot testing altogether, since being flaky and overly broad is a natural result of dumb diffing of your app's output.


I'm not sure you understand what snapshot testing is. It's a form of testing where you render frontend components and take a "snapshot", storing the output for later; on the next test run, the component is rendered again and compared to the saved snapshot.

Any change that modifies the rendering of any component under test will break snapshot tests. It's literally just a test that says, "Is this render function still the same as it used to be?"
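
Roughly, in Jest terms, it looks like this (a minimal sketch; `renderGreeting` is a made-up stand-in for a real render function, and it assumes Jest globals via @types/jest):

```ts
// Made-up stand-in for a real component's render function.
function renderGreeting(name: string): string {
  return `<p class="greeting">Hello, ${name}!</p>`;
}

test('greeting markup is unchanged', () => {
  // The first run writes the snapshot to disk;
  // every later run diffs the output against it.
  expect(renderGreeting('world')).toMatchSnapshot();
});
```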


I'm not sure you understand how to capture snapshots at the correct level of granularity. If you have snapshots that are breaking for irrelevant changes, then by definition those changes don't need to be snapshotted, do they? Cut down the snapshot to only the relevant part of the rendered HTML so that when it fails, you know something broke.

E.g., in htmx we can add a snapshot test for the pagination component I mentioned in this comment: https://news.ycombinator.com/item?id=42619553

This snapshot will break if I have a bug in my pagination logic implementation, and it's highly unlikely to fail for random irrelevant changes.
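
To sketch what I mean (made-up markup, not the actual component from the linked comment; assumes Jest plus jsdom):

```ts
import { JSDOM } from 'jsdom';

// Made-up stand-in for the app's full page renderer.
function renderWholePage(): string {
  return `
    <header>site chrome that changes all the time</header>
    <nav class="pagination">
      <a hx-get="/items?page=1" hx-target="#items">1</a>
      <a hx-get="/items?page=2" hx-target="#items">2</a>
    </nav>
    <footer>more unrelated markup</footer>`;
}

test('pagination fragment only', () => {
  const dom = new JSDOM(renderWholePage());
  const nav = dom.window.document.querySelector('nav.pagination');
  // Header/footer churn can't break this test;
  // a wrong hx-get URL or hx-target will.
  expect(nav?.outerHTML).toMatchSnapshot();
});
```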


It would still break if you need to modify that output in any way. Need to add a class for styling? The snapshot breaks. Need to add a new hx-attribute? The snapshot breaks. It's not tied to the logic, it's tied to whatever markup you output. You've reduced the surface area of the problem, but not eliminated it.

> If you have snapshots that are breaking for irrelevant changes, then by definition those changes don't need to be snapshotted, do they?

You're so close to getting it.


> Need to add a class for styling?

How often? All the time? Or relatively rarely? For most projects, it's the latter. We are not constantly churning the styling.

> Need to add a new hx-attribute? The snapshot breaks. It's not tied to the logic

An `hx-` attribute is logic. That's the point of htmx.

> You've reduced the surface area of the problem, but not eliminated it.

That's a risk of any kind of unit test. You can always have false positives.

> You're so close to getting it.

That tests should be fixed until they're robust? I'm not a fan of the learned helplessness of the 'We couldn't figure out how to do XYZ, therefore XYZ is a bad idea' approach.


Do you have experience with snapshot tests? If not, you should try it out for a bit and see what we mean; you'll quickly run into the situation we described.

If you introduce a new feature and there's now a new button on the page, for example, the snapshot test will fail, by necessity: you've changed what's on the page, so the saved snapshot no longer matches the current version with the new button. What you have to do then is regenerate the snapshot (regenerating it will always make it pass).
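
(With Jest, for instance, regenerating is a single `jest --updateSnapshot`, aka `jest -u`, run, which rewrites every failing snapshot in place.)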

The problem, in my experience, is that these kinds of tests are massively unreliable. So you either fiddle a lot with tolerances (most testing libraries let you set a threshold for what constitutes a failure in a snapshot), or you have to manually hunt down and pixel-peep whatever might've happened.
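
(The tolerance-fiddling usually looks something like this. A rough sketch, assuming Puppeteer plus the jest-image-snapshot package; the URL is invented:)

```ts
import puppeteer from 'puppeteer';
import { toMatchImageSnapshot } from 'jest-image-snapshot';

expect.extend({ toMatchImageSnapshot });

test('page matches within a 1% pixel budget', async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/'); // invented URL
  const image = Buffer.from(await page.screenshot());
  await browser.close();
  expect(image).toMatchImageSnapshot({
    // Up to 1% of pixels may differ before the comparison fails.
    failureThreshold: 0.01,
    failureThresholdType: 'percent',
  });
});
```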

As an example of what usually happens: at some middle step of the test, an assert fails because some new code causes things to render slightly out of order or delayed in some way. Ultimately the final product is fine, and an intelligent system would give it a pass. The unit tests pass. Even the integration and E2E tests pass, but the snapshot tests fail because something changed. The chance that this failure is a false positive is much, much higher than the chance of it being legitimate, but let's assume it's legitimate, like the good citizens we are, and try to hunt it down.

So you go through the snapshots one by one, setting breakpoints where you think things might break. Note that this process is itself painful, slow, and manual, and running snapshot tests on your local machine can often break things further, because things render differently there than they would in CI, which leads to an extra set of headaches.

What you're testing here is also an imperfect reflection of the actual underlying app, because a lot or most of it will be mocked/hoisted/whatever. So telling a potential breakage apart from a mere quirk of the snapshot test is a gamble at best. You spin up the app locally and see that nothing looks out of the ordinary. No errors to be seen. Things look and behave as they should. Your other tests all pass. So you chalk it up to a false positive, regen the snapshots, and go on with your day.

The next day you add some other thing to the page. Uh oh, snapshots fail again!

Rinse and repeat like this, and after the 50th time sitting there waiting for the snapshot to be captured, you decide there are better ways to spend your limited time on this mortal plane and just say "Fuck it, this one's probably a false positive as well" after taking a 2-minute look at the page locally, before hitting the recapture command again.

I legitimately don't think I've ever seen a snapshot test failure that wasn't a false positive, to be honest.


Snapshot tests are even more sensitive to flakiness than E2E tests, but flakiness is still a bug that can (and probably should) be fixed like any other.

As an example - the different browser you have running locally and in CI? That's a bug. The fix is to run the browser in a container - the same container in both environments. You can probably get away without this with an E2E test. Snapshot test? No way.

I've seen people decry and abandon E2E tests for the same reason: they simply couldn't get flakiness under control and thought it was impossible to do so.

I don't think it's intrinsically impossible, though. I think it's just a hard engineering problem. There are lots of nonobvious points of flakiness (e.g. unpinned versions, SELECTs without ORDER BYs, nondeterministic JavaScript code), but few that are simply impossible to fix.
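
To sketch just the nondeterministic-JavaScript one in Jest terms (illustrative values):

```ts
beforeAll(() => {
  // Freeze the clock so rendered timestamps are identical on every run.
  jest.useFakeTimers();
  jest.setSystemTime(new Date('2024-01-01T00:00:00Z'));
  // Pin Math.random so anything that shuffles or samples is deterministic.
  jest.spyOn(Math, 'random').mockReturnValue(0.42);
});

afterAll(() => {
  jest.useRealTimers();
  jest.restoreAllMocks();
});
```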

There's a big payoff if you get it right, too.




