> Operators instead relied on logs to understand what was happening and initially identified elevated internal DNS errors. Because internal DNS is foundational for all services and this traffic was believed to be contributing to the congestion, the teams focused on moving the internal DNS traffic away from the congested network paths. At 9:28 AM PST, the team completed this work and DNS resolution errors fully recovered.
It’s quite a bit different… Facebook took themselves offline completely because of a bad BGP update, whereas AWS had network congestion due to a scaling event. DNS relies on the network, so of course it’ll be impacted if the network is also impaired.
No, it wasn't a "bad BGP update". The BGP withdrawal of anycast addresses was the desired outcome of a region (serving location) getting disconnected from the backbone. If you'd like to trivialize it, you can say it was a configuration change to the software-defined backbone.
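To illustrate the behavior described above, here is a minimal sketch of that kind of health-check logic: a DNS serving location stops announcing its anycast prefix when it can no longer reach the backbone, so traffic drains away from the isolated site. Everything here (the prefix, the probe target, `set_announcement`) is a hypothetical stand-in, not Facebook's or AWS's actual tooling.

```python
import socket
import time

ANYCAST_PREFIX = "192.0.2.0/24"        # documentation prefix, stand-in for a DNS anycast block
BACKBONE_PROBE = ("203.0.113.1", 179)  # hypothetical backbone BGP peer (TEST-NET-3 address)


def backbone_reachable(timeout: float = 2.0) -> bool:
    """Probe a backbone peer; a failure is treated as this site being isolated."""
    try:
        with socket.create_connection(BACKBONE_PROBE, timeout=timeout):
            return True
    except OSError:
        return False


def set_announcement(prefix: str, announce: bool) -> None:
    """Placeholder for telling the local routing daemon to announce or withdraw a route."""
    action = "announce" if announce else "withdraw"
    print(f"{action} {prefix}")


if __name__ == "__main__":
    while True:
        # The "desired outcome" from the comment above: losing backbone connectivity
        # is a signal to withdraw the anycast route and stop attracting DNS traffic here.
        set_announcement(ANYCAST_PREFIX, announce=backbone_reachable())
        time.sleep(30)
```

The catch, of course, is that when every serving location withdraws at once, the nameservers become unreachable everywhere, which is how a designed failover behavior turns into a global DNS outage.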
Having DNS problems sounds a lot like the Facebook outage of 2021-10-04. https://en.wikipedia.org/wiki/2021_Facebook_outage