Hacker News
The 4 Minute Bug (kinduff.com)
50 points by kinduff on Sept 29, 2022 | 6 comments



Love it. This reminds me somewhat of a similar issue I saw in 2012, where Cheezburger.com went down over Thanksgiving weekend because we had no deploys for more than 80 hours, and an accidental per-server cache filled up in about that much time. https://jacob.jkrall.net/turkey-day-down-time


Very good story. It's one of those edge cases that only surface when something unusual happens.


It looks to me like a badly architected solution...

If you have a background job to update the data, there is no need for a TTL at all. If we're talking about currency exchange, you might prefer to store a transactional history of exchange rates alongside the current one, and also populate a fast read model (Redis) that fails over to the OLTP version...
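
Roughly what I have in mind, as a minimal sketch and not the blog author's actual code. Everything here is hypothetical: `RateStore` stands in for the OLTP table, a plain dict stands in for Redis, and the names `RateRecord`, `RateReader`, `refresh`, and `current` are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class RateRecord:
    pair: str             # e.g. "USD/MXN" (illustrative value)
    rate: Decimal
    fetched_at: datetime  # lets callers judge staleness themselves


class RateStore:
    """Hypothetical stand-in for the OLTP table: append-only rate history."""

    def __init__(self):
        self._history = []

    def append(self, record):
        self._history.append(record)

    def latest(self, pair):
        # Walk the history backwards to find the most recent rate for the pair.
        for record in reversed(self._history):
            if record.pair == pair:
                return record
        return None


class RateReader:
    """Read path: fast cache first, OLTP history as failover. No TTL."""

    def __init__(self, store, cache):
        self.store = store
        self.cache = cache  # plain dict standing in for Redis

    def refresh(self, pair, rate, now=None):
        """Called by the background job after a successful fetch."""
        record = RateRecord(pair, Decimal(rate), now or datetime.now(timezone.utc))
        self.store.append(record)  # history first, so the cache can be rebuilt
        self.cache[pair] = record  # overwritten in place, never expired

    def current(self, pair):
        record = self.cache.get(pair)
        if record is None:  # cache miss, e.g. after the cache was flushed
            record = self.store.latest(pair)
            if record is not None:
                self.cache[pair] = record  # repopulate the read model
        return record
```

With this shape, a failed fetch in the background job just means `refresh` is not called, so readers keep getting the last known rate instead of hitting an expired key. Whether a stale rate is acceptable is a business decision; `fetched_at` is there so the caller can decide.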

Maybe it's just me.


I agree with you, it is bad architecture.

I didn't mention it in the blog post, but the TTL was introduced by a dependency we used to integrate the exchange rate service.


The story falls into the category of "it won't happen until it happens". Testing is expensive, and even with all the testing in advance we observe emergent behaviors that send us running to the reams of logs to ask a very simple question: "What happened?" My takeaway: if you have to fail, and you will, fail gracefully. Good post.


Very nice way to put the takeaway, thanks for reading!



