That’s essentially why we don't have a staging environment and why “dev” is a direct copy of prod but with different backing data. I’d argue that, more often than not, it’s overwhelmingly difficult to maintain the discipline required to keep staging from becoming a crummy environment that doesn't resemble prod in any way, so I’m sympathetic. We deal with that by taking away the notion that you get to stage your changes anywhere persistent before they land in prod. Of course dev is not 1000% identical to the very last bit, I’m not going to argue that. But it is a hell of a lot better than the kind of staging environments I imagine drove the author to take such a stance. Like I said, we’ve yet to experience a prod-only bug that didn't reproduce in our dev env. So in that sense, anecdotally, the point does not hold.
Just to be a little clearer: I agree with the author that some issues happen only in prod and that you simply won’t catch them pre-prod. And I agree with the hot-take mantra that “testing in prod” is okay and shouldn't be frowned upon as much as has become trendy. But I’m also suggesting that, instead of treating the ability to test in prod as a badge of honor, you can apply the same mantra to traditional notions of a staging environment. You can cut out many of the issues and frustrations surrounding testing in staging by actually practicing continuous integration. Build mechanisms and policy that severely limit the frequency and distances that staging systems diverge from prod and I wager you’d get much much further than the comfortable status quo of merging to staging, manual and automated integrating, and then a cadence-based release to prod. So yeah: test in prod! Just don't use your real prod unless you have to.
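One concrete reading of "build mechanisms and policy that severely limit ... diverge from prod" is a CI gate that diffs the settings staging and prod are actually running and fails when anything differs beyond an explicit allow-list. A minimal sketch, assuming both environments can export their settings as a flat key/value mapping; `config_drift`, `ALLOWED_DIFFERENCES`, and the key names are hypothetical.

```python
from typing import Any, Mapping

# Hypothetical: settings that are *supposed* to differ, e.g. the backing data.
ALLOWED_DIFFERENCES = frozenset({"database_url"})
MISSING = "<missing>"  # placeholder when a key exists in only one environment


def config_drift(staging: Mapping[str, Any], prod: Mapping[str, Any],
                 allowed: frozenset[str] = ALLOWED_DIFFERENCES
                 ) -> dict[str, tuple[Any, Any]]:
    """Return {key: (staging_value, prod_value)} for every unexpected difference."""
    drift: dict[str, tuple[Any, Any]] = {}
    for key in sorted(set(staging) | set(prod)):
        if key in allowed:
            continue  # expected divergence, e.g. dev data vs. prod data
        s, p = staging.get(key, MISSING), prod.get(key, MISSING)
        if s != p:
            drift[key] = (s, p)
    return drift
```

A CI job could call this on every merge and fail whenever the result is non-empty, so "staging quietly drifted from prod" becomes a red build rather than a surprise at release time.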
> That’s essentially why we don't have a staging environment and why “dev” is a direct copy of prod but with different backing data.
That's fair, and a decent way to build a staging environment, though as echoed elsewhere, the data itself can exercise your code in ways that uncover bugs. I also think this is more feasible with, say, a monolith than with a sharded, multi-cluster service integrated with many third-party systems. But yes, if you can, you probably should have this kind of prod-replica staging in addition to incremental canary rollouts, prod-safe test suites, etc. And the article itself was explicitly suggesting that in-prod testing should be an adjunct to non-prod testing.
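"Incremental canary rollouts" typically means sending a growing slice of users to the new version. A minimal sketch, assuming users have stable string IDs and that the slice is chosen by hashing; `in_canary` and the salt value are hypothetical, not any particular feature-flag product's API.

```python
import hashlib


def in_canary(user_id: str, rollout_percent: float, salt: str = "checkout-v2") -> bool:
    """Deterministically decide whether `user_id` gets the new version.

    Each user hashes to a fixed bucket in [0, 10000); raising rollout_percent
    only ever adds users to the canary and never reshuffles existing ones.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < rollout_percent * 100  # e.g. 5% -> buckets 0..499
```

Keeping the bucket fixed per user means someone who hit a bug at 5% is still in the canary at 25%, which makes prod-only issues easier to reproduce.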
You and others also have a point. I’m now thinking of ways we could seed our dev data to be maximally similar to prod. It’s all encrypted blobs, though, so in our case it would mostly be about scale. But your point is still taken.
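If the data really is opaque encrypted blobs, matching prod "mostly about scale" could mean copying only the size distribution out of prod and filling dev with random bytes of those sizes. A minimal sketch, assuming a SQLite store, a hypothetical `blobs` table, and that nothing in the dev code path needs to decrypt these rows (random bytes won't decrypt).

```python
import os
import random
import sqlite3


def seed_blobs(conn: sqlite3.Connection, prod_sizes: list[int], count: int,
               seed: int = 0) -> None:
    """Insert `count` random blobs whose sizes are resampled from `prod_sizes`.

    Only blob sizes come from prod; contents are fresh random bytes, which
    stand in for ciphertext as far as storage and scale are concerned.
    """
    rng = random.Random(seed)  # seeded so the size distribution is reproducible
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs (id INTEGER PRIMARY KEY, data BLOB NOT NULL)"
    )
    rows = ((os.urandom(rng.choice(prod_sizes)),) for _ in range(count))
    conn.executemany("INSERT INTO blobs (data) VALUES (?)", rows)
    conn.commit()
```

Here `prod_sizes` would be a sample of blob lengths pulled from prod metadata, so no real content ever leaves prod.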
> Of course dev is not 1000% identical to the very last bit, I’m not going to argue that. … Like I said, we’ve yet to experience a prod-only bug that didn't reproduce in our dev env.
…
> You can cut out many of the issues and frustrations surrounding testing in staging by actually practicing continuous integration. Build mechanisms and policy that severely limit the frequency and distances that staging systems diverge from prod and I wager you’d get much much further than the comfortable status quo of merging to staging, manual and automated integrating, and then a cadence-based release to prod. So yeah: test in prod! Just don't use your real prod unless you have to.
I’ve seen plenty of staging envs that look nothing like prod, and that's what I’m calling the real sham.
That’s true, but it depends on what you’re building. We don’t have a million users or anything yet, but I clone the prod db every month or so, change the passwords, and use that for testing. Before, we had a staging db and a prod db, but they’d diverge: staging would have almost no data while prod would be full of it.
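The clone-then-change-passwords step might look roughly like this. A minimal sketch, assuming a SQLite database and a hypothetical `users.password_hash` column stored as `salt$digest`; a Postgres setup would more likely restore a `pg_dump` and run the same `UPDATE`. `DEV_PASSWORD` and `clone_and_reset_passwords` are illustrative names.

```python
import hashlib
import os
import sqlite3

# Hypothetical password every cloned account gets; never used in prod.
DEV_PASSWORD = b"dev-only-password"


def clone_and_reset_passwords(prod: sqlite3.Connection,
                              dev: sqlite3.Connection) -> None:
    """Copy prod into dev, then overwrite every password hash in the copy."""
    prod.backup(dev)  # full copy of the prod database into the dev connection

    # One salted PBKDF2 hash of the dev password, stored as "salt$digest".
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", DEV_PASSWORD, salt, 100_000)
    new_hash = f"{salt.hex()}${digest.hex()}"

    # Assumes a `users` table with a `password_hash` column (hypothetical schema).
    dev.execute("UPDATE users SET password_hash = ?", (new_hash,))
    dev.commit()
```

This only neutralizes credentials; names, emails, and other personal data in the copy are still real.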
How do you manage sensitive data with this workflow (i.e. do you do it manually every time, do you automate it, what scripts, etc.)?
I get changing passwords, but say that data leaks (whether through a vulnerability in the cloned environment or a dev gone rogue): how do you mitigate possible damage to real users (since you did clone from prod)?
I ask not because I question your actions, but because I've been wanting to do something similar in our staging env to allow practical testing, and I haven't had the chance to research how to do it "properly".
Not the parent poster, but when I worked in finance, multiple products had a "scrambling" feature that replaced many fields (names, addresses, etc.) with random text, and it was run whenever a non-production environment was restored. It's not proper anonymization, since all kinds of IDs (account numbers, reference numbers) remain linkable to identities and can't be changed without breaking processes that are needed even in testing, but it's a simple step that does reduce some risk.
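A scrambling pass like the one described could be as simple as the sketch below, run over each row after a restore. It assumes records are dicts and that the free-text fields are named `name`, `address`, and `email` (hypothetical); identifiers are deliberately left untouched, which is exactly why it isn't real anonymization. Replacing an email with plain letters also drops the `@`, so a real scrambler would probably keep formats valid.

```python
import random
import string
from typing import Any

# Hypothetical free-text fields to scramble. IDs such as account numbers are
# deliberately left alone so downstream processes keep working.
SCRAMBLE_FIELDS = frozenset({"name", "address", "email"})


def scramble_record(record: dict[str, Any], rng: random.Random,
                    fields: frozenset[str] = SCRAMBLE_FIELDS) -> dict[str, Any]:
    """Return a copy of `record` with the chosen string fields replaced by
    random letters of the same length; all other fields are copied as-is."""
    out = dict(record)
    for key, value in record.items():
        if key in fields and isinstance(value, str):
            out[key] = "".join(rng.choice(string.ascii_letters) for _ in value)
    return out
```

For example, `scramble_record({"account_id": "ACC-001", "name": "Alice Smith"}, random.Random(42))` keeps `account_id` as `"ACC-001"` and replaces the name with 11 random letters, so the row still links to a real account.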