packetslave on Jan 14, 2018 | on: Alert About Missile Bound for Hawaii Was Sent in E...

There are a couple of good stories about massive outages and good incident response. Mine is just one of them (and at some level, I was very lucky).

There's also the one where all the frontend servers worldwide went into a crash loop from a bad configuration push. The SRE doing the push noticed some "weirdness" and rolled back even before the full scope of the issue was known. That one's in the SRE book.

Bluestrike2 on Jan 14, 2018

Site Reliability Engineering.[0] Google's SRE book is a pretty interesting read.

0. https://landing.google.com/sre/interview/ben-treynor.html

mynewtb on Jan 14, 2018

GFE? SRE?

mikejb on Jan 14, 2018

SRE is Site Reliability Engineer; GFE is the "Google Front End".
