
I disagree; it's much longer than it needs to be, it's filled with pseudo-technoese to hide that there's little of consequence in it, and the tiny bit of real information is buried under distractions and unnecessary detail.

As I understand it, they're telling us that the outage was caused by an unspecified bug in the "Content Validator", and that the file was shipped without testing because it worked fine last time.

I think they wrote what they did because they couldn't publish the above directly without being rightly excoriated for it. At least this way, a lot of the people reading it won't understand what they're saying, but it sounds very technical.




No, it's one of the best-written PIRs I've seen. It establishes terms and procedures after communicating that this isn't an RCA, then details the timeline of tests and deployments and what went wrong. They were neither excessively verbose nor terse. This is the right way of communicating to the intended audience: technical people, executives, and lawmakers alike will be reading this. They communicated their findings clearly, without code, screenshots, excessive historical details, or other distractions.


If you think this is good, go look at a Cloudflare postmortem. The fly.io ones are good too.

Way less obscure language, way more detail and depth, and actually owning the mistakes rather than vaguely waffling on. This write-up from CrowdStrike is close to being functionally junk.


One of the first things they state is that this isn't an RCA (a deep-dive analysis) like Cloudflare's and fly.io's. Its purpose is to brief customers and the public on their immediate post-mortem understanding of what happened. The standard for that is different from the standard for an RCA.



