> Operators instead relied on logs to understand what was happening and initially identified elevated internal DNS errors. Because internal DNS is foundational for all services and this traffic was believed to be contributing to the congestion, the teams focused on moving the internal DNS traffic away from the congested network paths. At 9:28 AM PST, the team completed this work and DNS resolution errors fully recovered.

Having DNS problems sounds a lot like the Facebook outage of 2021-10-04. https://en.wikipedia.org/wiki/2021_Facebook_outage


It’s quite a bit different… Facebook took themselves offline completely because of a bad BGP update, whereas AWS had network congestion due to a scaling event. DNS relies on the network, so of course it’ll be impacted if networking is also impacted.
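
A toy sketch of why that is (nothing to do with AWS's actual resolvers; the 10.0.0.2 address and the 2-second timeout are made-up stand-ins): a DNS lookup is just another UDP packet riding the same congested paths, so dropped packets surface to applications as "DNS errors".

    # Minimal, illustrative DNS A query over UDP (standard library only).
    import socket
    import struct

    def dns_a_query(name: str, server: str, timeout: float = 2.0) -> bool:
        """Send one DNS A query; return True if any reply comes back in time."""
        # Header: id=0x1234, flags=0x0100 (recursion desired), 1 question.
        header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
        # Question: length-prefixed labels, then QTYPE=A (1), QCLASS=IN (1).
        qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"
        question = qname + struct.pack(">HH", 1, 1)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)  # congestion -> dropped packets -> timeout here
            s.sendto(header + question, (server, 53))
            try:
                s.recvfrom(512)
                return True
            except socket.timeout:
                return False       # what the caller sees as a "DNS resolution error"

    if __name__ == "__main__":
        # 10.0.0.2 stands in for a VPC-style internal resolver; purely illustrative.
        ok = dns_a_query("example.com", "10.0.0.2")
        print("resolved" if ok else "resolution error (timeout)")

From the application's side the failure reads as "DNS is broken" even when the resolver itself is healthy and only the path to it is congested.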


No, it wasn't a "bad BGP update". The BGP withdrawal of anycast addresses was the desired outcome of a region (serving location) getting disconnected from the backbone. If you'd like to trivialize it, you can say it was a configuration change to the software-defined backbone.
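
Roughly the design being described, as a hedged sketch (not Facebook's actual automation; the prefixes, peer address, and ping-based health check are all invented stand-ins): each site keeps announcing its anycast prefixes only while it can still reach the backbone, so withdrawing them on disconnection is the system working as intended.

    # Conceptual only: announce anycast prefixes while the backbone is reachable,
    # withdraw them when it is not. The print() calls stand in for whatever would
    # actually drive the site's BGP speakers.
    import subprocess

    ANYCAST_PREFIXES = ["192.0.2.0/24"]   # documentation prefix, not a real one

    def backbone_reachable() -> bool:
        """Hypothetical health check: can this site still reach a backbone peer?"""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", "203.0.113.1"],  # made-up peer, Linux ping flags
            capture_output=True,
        )
        return result.returncode == 0

    def reconcile(currently_announced: bool) -> bool:
        """Return the new announcement state after one health-check pass."""
        healthy = backbone_reachable()
        if healthy and not currently_announced:
            for prefix in ANYCAST_PREFIXES:
                print(f"announce {prefix}")
            return True
        if not healthy and currently_announced:
            for prefix in ANYCAST_PREFIXES:
                print(f"withdraw {prefix}")  # by design, traffic stops arriving here
            return False
        return currently_announced

In the 2021 incident the backbone itself went down, so that kind of check tripped at every site at once and all of the anycast announcements were withdrawn together.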


The rule is that it’s always DNS.


DNS seemed to be involved with both the Spectrum business internet and Charter internet outages overnight. So much for diversifying!



