Launch HN: DeploySentinel (YC S22) – End-to-end tests that don't flake
56 points by mikeshi42 on Aug 3, 2022 | 36 comments
Hi HN, Michael and Warren here - cofounders of DeploySentinel (https://deploysentinel.com). We make end-to-end testing easier and more reliable.

At my last job, it dawned on me how many production incidents and unhappy customers could have been avoided with more test automation - “an ounce of prevention is worth a pound of cure”. However, it wasn’t clear that you could get prevention for just an ounce. Our teams ramped up investment into testing, especially end-to-end tests (spinning up a headless browser in CI and testing as an end user would), but it quickly became clear that these were incredibly expensive to build and maintain, especially as test suite and application complexity grew. When we asked engineering teams at other companies, we consistently heard how time-intensive test maintenance was.

The worst part of end-to-end tests is when they fail occasionally in CI but never locally - a heisenbug in your test code, or what’s usually referred to as a flaky test. The conventional way to debug such an issue is to replay a video of your CI’s test browser, stepping between video frames to try to parse what could be happening under the hood. Otherwise, your CI is just a complete black box.

Anyone who finds this story familiar can probably attest to days spent trying to debug an issue like this, possibly losing some hair in the process, and “resolving” it in the end by just deleting the test and regaining their sanity. Some teams even try to put front-end monitoring tools built for production into their CI process, only to realize those tools can’t handle recording hundreds of test actions executed by a machine over just a few seconds.

After realizing how painful debugging these tests could be, we started putting together a debugger that can help developers pinpoint issues, more like how you debug issues locally. Teams have told us there’s a night and day difference between trying to debug test failures with just video, and having a tool that can finally tell them what’s happening in their CI browser, with the same information they’re used to having in their browser’s devtools.

We give you the ability to inspect DOM snapshots, network events, and console logs for any step taken in a Cypress test running in CI, to give more insight into why a particular test might be failing. It’s like Fullstory/LogRocket, but for CI failures instead of production bugs. (We’re starting with Cypress tests, with plans to extend further.)

Our tool integrates with Cypress via their plugin API, so we’re able to plug in and record tests in CI with just an NPM install and 2 lines of code. From there we’re able to hook into Cypress/Mocha events to capture everything happening within the test runner (ex. when a test is starting, when a command is fired, when an element is found, etc.), as well as open a debugger protocol port with the browser to listen for network and console events. While a test suite is running, the debugger is continuously collecting what’s happening during a test run, and uploads the information (minus user-configured censored events) after every test completes.
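
For the curious, the shape of the integration looks roughly like this - a sketch only, where the package name and exact hook are illustrative rather than our literal API:

    // cypress.config.js - illustrative sketch, not our exact API
    const { defineConfig } = require('cypress');

    module.exports = defineConfig({
      e2e: {
        setupNodeEvents(on, config) {
          // the plugin subscribes to Cypress/Mocha lifecycle events
          // and attaches to the browser's debugger protocol port to
          // capture network + console events during the run
          require('@deploysentinel/cypress-debugger')(on, config); // hypothetical package name
          return config;
        },
      },
    });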

While this may sound similar to shoving LogRocket/FullStory into your test suite, there are actually quite a few differences. The most practical one is that those tools typically have low rate limits that work well for human traffic interacting with web apps at human speeds, but break when dealing with parallelized test runner traffic interacting with web apps at machine speeds. Other differences revolve around us associating replays with test metadata as opposed to user metadata, having full access to all network requests/console messages emitted within a test at the browser level, and indexing playback information by test command rather than by timestamp (time is an unreliable concept in tests!).

Once a test fails, a GitHub PR comment is created and an engineer can immediately access our web app to start debugging their test failure; alternatively, they can check our web dashboard. Instead of playing a video of the failure in slow motion to understand the issue, an engineer can step through the test command-by-command, inspect the DOM with their browser's inspect-element tool at any point, view which elements the test interacted with and whether any console messages were emitted during the action, or take a look at every network request made, along with HTTP error codes or browser network error messages.

Typically with this kind of information, engineers can quickly find out if they have a network-based race condition, a console warning emitted in their frontend, a server-side bug, or a test failure from an edge case triggered by randomly generated test data.

We dream of a world where applications have minimal bugs and happy customers, built by engineering teams that don’t see testing as an expensive chore! Although the first pain we’re addressing is tests that fail in CI, we’re working on a bunch of things beyond that, including the second biggest issue in testing: test runtime length.

We have a free trial available for you to try out with your own tests, along with a few live demos of what our debugger looks like on an example test. You can get started here: https://deploysentinel.com/

We’re looking forward to hearing everyone else’s experiences with end to end tests, and what you think of what we’re doing!




I love the concept and I will evaluate your tool soon!

First comment, not on essence: the difference between your starter and pro plan is $5. If people have fewer than 85k runs / month, you're asking them to commit to $500/mo spend to get longer retention. That's... fine, but you can probably charge more for that. For people with more than 85k runs / month, you're only charging them $5/mo for that retention - and making the pricing page more complicated.

One option: Increase your base price from $40 to $45, and say longer retention is only available for customers paying >$500/mo. You'll get what you have today but simpler.

Another option: Charge more for longer retention! $10/1k-runs makes sense, or even $15. If retention is the differentiation between the two plans, charge for it like it's worth being an upsell.


Thank you for the feedback! Definitely agree with you that we still need to iterate on our pricing to strike a good balance. Right now we're likely going to explore the latter soon to better differentiate the business plan (ex. additionally gating richer analytics comes to mind - some analytics only make sense once you hit a certain scale and frequency of tests).

Let us know when you get a chance to play around with it as well! Would love to hear what you think when you get in app.


I saw that your Recorder works with Playwright, but it seems that the main product is only for Cypress? Am I missing separate instructions for Playwright?

Also, minor typo in the HTML title: Cypresss -> Cypress (3 s's)


My spidey senses were tingling last night, feeling like I had a typo somewhere... Maybe it's time for me to build that spell-checker Cypress plugin I've been thinking about. Fixed, thank you!

Our main product is indeed only for Cypress today. I'm assuming you're using Playwright? If so, I'm wondering if you've had a chance to try out their trace viewer feature as well (https://playwright.dev/docs/trace-viewer). We're itching to build on Playwright too, but want to do it once we've built out a set of features that provide a few-times improvement over the existing trace viewer. So I'd be curious if you've used the trace viewer, and if you already see our product as having a few legs up on it :)


I never tried the trace viewer and maybe I should. It seems more limited than what you offer, no?


I would indeed say the trace viewer is more limited (I'm obviously biased) - especially since we not only collect the telemetry but also integrate it into your workflow very easily (as opposed to setting up your own workflow via artifacts/S3 uploads in your CI pipeline). The biggest difference is that we aggregate these builds/statistics over time so they're easily retrievable :)

If you're open to it - I'd love to chat more on what your experience has been debugging Playwright tests and seeing how we could help there! I'm at mike [at] deploysentinel.com


Yes! Yes I want this!

But, at 100 full test suite runs a day, with 2000 tests each, this would cost me some $25k a month.

I think the pricing for a lot of these tools doesn’t necessarily take into account high volume testing. We’d pay 5x more for this than for all the infra used to run the tests themselves.

Also why we cannot use the hosted Cypress solution btw. Just doesn’t make financial sense.


Thank you for that feedback! Incredibly helpful to know :) I was hoping that capping out the pricing page slider at $1.5k/mo before recommending people talk to us about enterprise would help with this a bit.

Absolutely - when we're working with larger teams, we're frequently putting together a custom plan with tailored features and a custom quote that takes into consideration their volume (both current and projected), term lengths, and other misc things. Unfortunately this makes it really hard to describe pricing expectations at scale on a pricing page.

If you think we can help your team, I definitely think we can put something workable together at that volume - feel free to hit me up at mike [at] deploysentinel.com and let's chat. (Or I can shoot you a message to the email in your gh profile)


While what you have made looks cool, it seems like you are leaving out a lot of what is at the root of many hard-to-maintain, flaky e2e tests: a fundamentally wrong approach to testing, resulting in too many tests at a level of the development process where they simply do not belong.

The [test automation pyramid](https://www.google.com/search?q=test+automation+pyramid&sxsr...) or even the [testing trophy](https://twitter.com/kentcdodds/status/960723172591992832) exist for very good reason.

The earlier you can write your tests, the easier it is to break them up into the relevant individual pieces of functionality without the tests being flaky. Now you can argue the merits of unit tests, hence the testing trophy also being a thing. But the fact of the matter is that 90% of the time e2e tests are flaky because teams try to test things in them that shouldn't be tested at that level.

Having said that, the ability to inspect the DOM, network requests, etc at various places in history is a very neat ability.


Thank you for the feedback! Totally agree that the testing pyramid/trophy and the reasoning behind them are 100% the correct way to go with testing today - but they rely on the assumption that writing higher-level tests inherently means you're going to have slower tests that are more unreliable (and we all hate slow and unreliable things!).

The reason we're tackling this space is that we don't think this needs to be true of end-to-end, or even heavier integration tests. They have really awesome qualities, such as testing close to the things users actually care about (which is the most important part of testing!), identifying broken pieces across or between parts of your stack, and being fairly intuitive to write/understand (imo).

But what if we could build tooling to minimize some of the bad stuff - make tests more stable by providing better debugging, enforce best practices automatically (imagine we could tell your tests are behaving non-deterministically because of X async action, or suggest a conditional wait instead of a time-based one), and even give you knobs to trade off between test fidelity and speed (ex. click a button to turn network mocking on/off in specific tests, so you only exercise a part of your system instead of spinning up/testing the world)?
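
(As a concrete example of the conditional-wait idea - this is just standard Cypress, nothing specific to our product, and the endpoint/selectors are made up:)

    // flaky: hopes the request always finishes within 5 seconds
    cy.get('[data-test=checkout]').click();
    cy.wait(5000);
    cy.contains('Order confirmed');

    // deterministic: wait on the network event itself
    cy.intercept('POST', '/api/orders').as('createOrder'); // hypothetical endpoint
    cy.get('[data-test=checkout]').click();
    cy.wait('@createOrder');
    cy.contains('Order confirmed');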

Hopefully, if we can nail our vision, you might one day be saying "why are there so many unit tests? They cover so little surface, don't test how our app fits together at all, and require a lot of maintenance since they tend to test some form of implementation detail or another. We should be doing a lot more integration and e2e tests :)". It'll be a long road until then, but I think developers and users will be much happier the better we can make testing.


To be frank, I think you are now overselling a product that really doesn't need to be sold that hard, as it already has a lot of value in itself. E2E tests deserve better tooling to debug them not because they are flaky, but because they need to be analyzed anyway when they fail. That's basically what rubbed me slightly the wrong way initially and what mostly inspired me to comment on the flaky-e2e-test remark.

E2E tests will always be more "flaky" compared to tests further down the pyramid. That doesn't mean they are less reliable; it's inherent to the level at which they are executed, and that is absolutely fine. What your tooling here promises to do is make debugging of failed tests much more convenient and streamlined, which is awesome and in itself already a huge selling point, if I am being honest.

Some points you mention:

> but it relies on the assumption that writing higher level tests inherently means you're going to have slower tests that are more unreliable

Unreliable, to some degree, as there are just more components involved, so any disruption in any of them has the potential to make your test fail. Slower, most certainly, but that is the nature of e2e tests and not necessarily a bad thing when used within the right context.

As an extremely simplified example: when faced with a login screen in your e2e tests, you should want to test just two things:

- Happy flow where the user ends up being logged in.
- Unhappy flow where a feedback message is shown.

This is because loading the page with the login window costs time. Sending the request and waiting for the page to process the response also costs time. So anything that doesn't test the UI should not be tested through the UI. There are likely a lot of different ways logging in can fail, but those can easily be tested at the API level, as in the sketch below.
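
(Illustrative Cypress snippet - cy.request hits the API directly with no UI involved; the endpoint is made up:)

    // one of the many login failure modes, tested at the API level
    cy.request({
      method: 'POST',
      url: '/api/login', // hypothetical endpoint
      body: { username: 'user', password: 'wrong-password' },
      failOnStatusCode: false, // we expect a 401 here
    }).its('status').should('eq', 401);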

The same is true for functionality only available when users are logged in. Sure, you can set up an automatic session so you don't need to go through the login screen and all that during your test (see the sketch below). But you still need to set all that up, while the logic behind it can likely be tested much more easily further down the pyramid, limiting the need for as many UI-based e2e tests.
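
(For instance, with Cypress's cy.session - experimental around this time - to log in once over the API and reuse the cached session; again just a sketch with a made-up endpoint:)

    beforeEach(() => {
      // cache the session so logged-in tests skip the login screen
      cy.session('user', () => {
        cy.request('POST', '/api/login', { // hypothetical endpoint
          username: 'user',
          password: 'correct-password',
        });
      });
    });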

> to things users actually care about (which is the most important part of testing at all!)

Is it? It does depend on the application, of course, but I'd say that it only becomes important once the underlying logic functions correctly - certainly for applications handling and processing critical/important data. A proper foundation there is needed to be able to provide the user-facing parts in a way that is reliable. If you don't focus on that underlying base, you are effectively doing unit and API tests through a detour.

Again, I think you made a really cool product. Just nitpicking on some of the arguments you made here :)


Sorry - my response was geared more towards what I hope the future of e2e (and, bleeding into, integration) testing could entail, independent of what we manage to accomplish, as opposed to trying to sell our product. I do get excited thinking about where testing can improve - with the end goal that developers get to ship reliable apps for their users with less of the frustration of testing along the way. Our current iteration is very much just a minuscule fraction of the way there.

I also totally agree - e2e tests deserve better tooling to debug, and that alone we've found to be quite a compelling product to work on already. I appreciate that you find it cool as well :D

With regards to picking the right level of testing - I think there's a kilometers-long rabbit hole to go down there, and I totally admit I hold strong beliefs that differ from most people in testing today. I agree, though - there's always going to be a different preferred tool for different problems. It'd be pretty wild to expect e2e testing to accomplish the bulk of it! That being said, I think if e2e tests were better in a few dimensions, developers might find them useful for more use cases than they do today (some teams we talk to want to write more e2e tests but find the shortfalls a real barrier to justifying the investment).


Hi, we are running Selenium tests in our CI/CD pipeline.

We do actually generate a video of the tests running - as an example, see here: https://github.com/purton-tech/cloak/actions/runs/2787628672

We're not using Cypress; we use WebDriver connected to a Selenium Docker instance.

Is that something you can connect to?


Hey Ian! We don't today, but we're actually looking to chat with people who are on Selenium to see if we can/should support Selenium users.

If you're open to it, I'd love to hear more about the use case and where you think we can help the most in Selenium. Give me a holler at mike [at] deploysentinel.com :)


A few questions on dependencies: do you need to be using the paid Cypress Dashboard? Do you need to be using GitHub for your repo?


Ah, we really should have something on our site that explains this - no, there's no dependency on the paid Cypress Dashboard. A few teams use us alongside it; most others don't. We've recently added load balancing for parallelism, as well as some basic analytics based on user feedback, to help fill in those gaps if you're using just us.

As for GitHub - also no. We just have a GitHub app integration, if you use GitHub, to comment on PRs. But we have teams using GitLab/GitLab CI as well. We print our debugger links directly to your CI stdout, or add links into your JUnit report if that's enabled (ex. for Jenkins). There are also plenty of teams that just check our dashboard directly for test results! We just want to make it easy to access your test failures when they occur.


Thanks, so is this a drop-in replacement for paid Cypress, but better? Pricing seems similar with $6/1000 tests.


Yup!

Typically our customers feel the pain around debuggability: they currently waste dev & CI cycles rerunning tests, and see their CI as either largely ignored or a productivity bottleneck to pushing code into main/production quickly. On top of that, they usually look for the table-stakes ability to load balance tests across parallelized runners, and some basic reporting to get a health check on their suite now and again.

For teams whose primary concern is in-depth analytics and graphing reports - we have those on the roadmap, but I can't say we're better than the Cypress Dashboard there yet :) But we think our focus on debugging overall gives teams a better chance of fixing errors, as opposed to just reporting on them.

We have a free trial if you want to take it for a spin yourself! A lot of teams start off introducing us in an experimental PR to try out the product, then merge into their main branch when they see the benefits. I'm also happy to chat in depth on specifically where we might be able to help if you already have an existing Cypress setup - feel free to ping me at mike [at] deploysentinel.com


Drop in replacement for Cypress is the line that’s getting me to sign up


Ahhh thank you! That's a fantastic insight - let me get that up on our website and post.

Saw your YC profile as well - I'll shoot you an email as I'd love to get a chance to chat with you more about what you thought about the messaging/product :)


Do you have plans for a self-hosted version (ie, enterprise use case where third party hosted tools cannot be used)?


We've talked about it a few times with some of our largest teams - but so far they've been happy with staying on the SaaS side of things.

If it's absolutely a requirement at your workplace, we're happy to partner with you to make it work. We've discussed this internally, and there's nothing inherent in our infrastructure that would make it impossible to deploy within an enterprise (the primary needs are the ability to run containers and an S3-compatible object store).

If you want to dive into details, I'm happy to email you via the email in your profile, or you can give me a ping at mike [at] deploysentinel.com


Neat work and congrats on the launch!

Speaking of Cypress Dashboard drop-in replacement: https://currents.dev is a must-be-mentioned tool!

As well as the open source and free https://sorry-cypress.dev

Sorry for the shameless plug :) We have also been working for a while on time travelling. Hoping to share some results soon - your work is very inspiring.

Great to see such a variety of tools that make CI testing less painful!


This seems a bit overdone, right? If I understand, FullStory and the like are cool because they let you reproduce unknown user scenarios and bugs. But for tests, the inputs are all known (that's the point of a test), so it should be trivial to reproduce. In other words, how is this advantageous over just rerunning the failing test locally and opening your browser developer tools?


That's a great question! While in theory tests should absolutely be reproducible anywhere, in practice they can unfortunately fail in hard-to-reproduce ways (very much like production issues!), especially when testing a complex app or working in a large test suite written by many different people.

Common examples involve race conditions: your test server might be super snappy on your local machine, but run much slower in CI and expose race conditions that were unaccounted for. Other times it may be due to subtle differences between environments (a service doesn't behave properly in CI due to build step differences, or a service is initialized differently in CI than locally).

The list of scenarios where local reproduction can be hard is unfortunately quite long; that's the downside of running end-to-end tests - the surface area they test (and can fail at) is really wide and complex, much like production failures. Enough so that teams find it useful to have proper debugging tools for when tests fail in CI, just as you'd want proper debugging tools for when apps fail in production :)

Of course, there's the other side of rerunning failing tests locally, even when they are reproducible: e2e tests and their associated environment can take time to set up. Why spend 20 minutes checking out the failed commit, searching internally for how to spin up the local e2e environment, and then running the test(s) to debug? Instead you can just click through a UI and get everything you need - it's much smoother!


> it should be trivial to reproduce

Yeah, no. It should be trivial to reproduce. In practice there are hundreds of actions and sagas firing at the same time, and the order of events is anything but guaranteed.

That’s absolutely a problem we wouldn’t have if everything was well-architected, but yeah, meet the real world.


How do you compare to Rainforest QA?


Rainforest QA is focused on no-code testing (so either dispatching tests to manual testers in their community, or using their UI to drag & drop tests yourself).

We're focused on code-based tests, which are a lot more flexible in what they can test, and scale better with the complexity of your application and testing needs. (Test code can set up mock data, shortcut parts of your application to speed up tests, work with dynamic data, and be updated alongside your product changes so your tests never break.)

We've found that with the advancements in DX for e2e testing libraries today, most developers find it quicker to just write tests themselves in code, rather than having it done in a 3rd-party UI tool. This also gives the bonus that developers are aligned with delivering high quality, well-tested code during their development cycle, as opposed to encouraging a "throw-it-over-the-wall" culture of having someone else test it for them after broken code has landed into the main branch.

Last point: we're not a full testing framework ourselves - we're currently compatible with Cypress, a widely popular e2e testing library, but are definitely keeping an eye on other frameworks such as Playwright as well :)


How does this compare to Playwright?

https://playwright.dev/


Our product currently integrates on top of existing test libraries (and we're starting with Cypress).

If you're asking how Cypress compares with Playwright - I'd say the two largest differences are that Cypress has a promise-like, chained syntax for authoring tests, whereas Playwright allows async/await syntax (see the sketch below for the same step in both styles). Outside of that, Cypress provides a pretty awesome local developer experience (imo), whereas Playwright has a leg up in flexibility (multi-tab support, browser support, better simulated actions like mouse hovering).
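
(Illustrative only - the selectors and routes are made up, and the Playwright half assumes it sits inside an async @playwright/test test with the page fixture and expect in scope:)

    // Cypress: commands are enqueued onto a chain
    cy.visit('/login');
    cy.get('#email').type('user@example.com');
    cy.get('button[type=submit]').click();
    cy.contains('Welcome');

    // Playwright: plain async/await
    await page.goto('/login');
    await page.fill('#email', 'user@example.com');
    await page.click('button[type=submit]');
    await expect(page.locator('text=Welcome')).toBeVisible();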

If you're asking how our tool compares to what you get with Playwright out of the box: Playwright does offer a really awesome trace viewer (https://playwright.dev/docs/trace-viewer), and I actually had a brief chat with the original PM on Playwright a few weeks ago about it. It captures a lot of similar debugging information - where our debugger differs today is a focus on making historical runs easily accessible for engineers (no need to go to your CI run, unzip an artifact, load it up locally, or build your own analytics dashboard for trends), as well as iterating on DX improvements, like letting you scroll through/search all your network requests at once and jump to the point in time where a request was made, which isn't possible in the trace viewer.

In the future we're looking into providing deeper information beyond just browser-level telemetry. A few ideas we've kicked around with our users include capturing Redux/React state (like the Redux or React dev tools do locally today), or even syncing up backend telemetry with what's happening in the browser (ex. show me the logs that were printed to stdout in the app server container while my test was clicking the checkout button).


Obligatory alternative tool plug / comparison:

I work for https://replay.io , which is a true time-traveling debugger for JS. We have forks of Firefox, Chrome, and Node, instrumented to capture syscalls at the OS level. Those recordings are uploaded to the cloud, and devs can then use our web client (effectively the Firefox DevTools as an app + a bunch of new features) to debug the recording at _any_ point in time, including adding print statements after the fact that show values every time a line was hit, step debugging, network requests, React + Redux DevTools integration, DOM inspection, and more.

Currently, our main usage is manually recorded replays, but we're actually working on similar test integration features as well. We can record Playwright and Cypress tests, upload recordings of test runs, show results per test in our dashboard, and let you debug the full recordings of each successful and failed test (code, DOM, network, console messages, errors, etc). The test suite feature is early closed beta atm - we've been dogfooding it ourselves and it's _really_ helpful!

Based on your description + a quick glance at your home page, it sounds like we're addressing the same use case in similar ways, with some differences in the underlying recording technology and implementations.


Great to see you guys here, I've heard a few mentions of your team's e2e integration in the works for a bit!

We do indeed approach this from a different direction. Having been built around the e2e testing use case from the start, we've focused on integrating with existing CI setups without swapping browsers (some teams really love the bundled Electron runner!), as well as knowing exactly what actions are running in your test runner relative to the replay, in a likely familiar UI (answering questions like: what DOM element did Cypress match for this command? What was the DOM like exactly when Cypress started running that command? Which command executed before this network request?).

Since you're from the team - one thing I've always wondered is how the pricing will work out for CI test runs. From the current public pricing I've found, it looks like it works great for developers recording test runs manually, but is extremely cost-prohibitive at any scale if you're running it continually in CI.


Nice, yeah, I can see the focus on "what actions" being useful.

I'm an engineer, not GTM, so I'm not sure how the test runs feature plays into the listed "X recordings per month" at https://www.replay.io/pricing . Agreed that there's a distinct difference between engineers/QA making manual recordings, and CI cranking them out - right now we've got 52 E2E tests that run on every PR, often multiple times due to re-pushes, and each of those tests generates a recording per run. So, obviously that burns through real fast :)

If I had to guess we'd probably distinguish between those two use cases. I've tagged our main GTM person in case he wants to respond here.


Hahaha, yup! We get a ton of volume on our platform from teams churning out test runs per-commit in their CI, where debuggability is most important.

Glad to hear from others in the space! Hope to learn more if your team's GTM person jumps on :)


Congrats on launching! Using Session Replay in the CI space makes a lot of sense. And agreed that there’s a lot that can be done by hooking into Cypress’s events. By the way, we’re doing the same and hope to show a Cypress reporter soon as well.

Re pricing: we're still refining it, but we assume that most of the time you'll only want to debug the failing tests. And while the recordings themselves are fairly small, the larger piece is actually replaying the browser as it ran before, so you can add print statements in your application and play to the point when a network request returned or an error was thrown.


Thanks Jason - can't wait to see what your team's been working on for the Cypress end, then!



