AWS Experiencing Intermittent DNS Resolution Errors

t3rabytes · on Oct 22, 2019

"We want to give you further information regarding the occasional DNS resolution errors. The AWS DNS servers are currently under a DDoS attack. Our DDoS mitigations are absorbing the vast majority of this traffic, but these mitigations are also flagging some legitimate customer queries at this time. We are actively working on additional mitigations, as well as tracking down the source of the attack to shut it down. Amazon S3 customers experiencing impact from this event can update the configuration of their clients accessing S3 to specify the specific region that their bucket is in when making requests to mitigate impact. For example, instead of "mybucket.s3.amazonaws.com" a customer would instead specify "mybucket.s3.us-west-2.amazonaws.com" for their bucket in the US-WEST-2 region. If you are using the AWS SDK, you can specify the region as part of the configuration of the Amazon S3 client to make sure your requests use this region-specific endpoint name.

The DNS resolution issues are also intermittently affecting other AWS Service endpoints like ELB, RDS, and EC2 that require public DNS resolution. We are actively working on this issue and will update you as soon as the issue is resolved on our end, however at this moment I won’t be able to provide an ETA. I am keeping this case in Pending Amazon Action, will update you as soon as I get further information on the resolution of this issue."

https://www.reddit.com/r/aws/comments/dlnl28/route53_is_fail...
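For anyone patching code by hand, the hostname change described in the AWS notice can be sketched as below. This is a minimal illustration, assuming Python 3 and virtual-hosted-style bucket URLs; `s3_bucket_hostname` is a hypothetical helper, not part of any AWS SDK.

```python
from typing import Optional


def s3_bucket_hostname(bucket: str, region: Optional[str] = None) -> str:
    """Return a virtual-hosted-style S3 hostname for `bucket` (hypothetical helper).

    Without a region this is the global name (bucket.s3.amazonaws.com), the
    pattern reported as failing to resolve; with a region it is the
    region-specific name the AWS notice recommends.
    """
    if region is None:
        return f"{bucket}.s3.amazonaws.com"
    return f"{bucket}.s3.{region}.amazonaws.com"


print(s3_bucket_hostname("mybucket", "us-west-2"))
```

In boto3 the corresponding setting is the `region_name` argument to `boto3.client("s3", ...)`; whether a given SDK version then uses exactly this hostname is worth checking before relying on it.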

kache_ · on Oct 22, 2019

Maybe they should disclose when they are carrying out DDoS mitigation strategies that could have adverse effects on their customers. I can only imagine how many people wasted more than a few hours because they were misled by a green status page.

skyraider · on Oct 22, 2019

Nothing listed on AWS Personal Health Dashboard related to this incident. Also nothing abnormal listed on the AWS status page under S3, which is seemingly having DNS resolution issues for buckets with the `[bucket_name].s3.amazonaws.com` pattern.

pgroudas · on Oct 22, 2019

We've observed the same. We started seeing this at ~9am ET this morning, so this has now been going on for over 8 hours.

tyingq · on Oct 23, 2019

There's almost a business model for a 3rd party to run an AWS service status dashboard that's more accurate.

chenster · on Oct 23, 2019

https://www.datadoghq.com/ has something like that

pmccarren · on Oct 23, 2019

Think we could crowdsource the outage reports? I'll build the dashboard if there's interest.

loneranger_11x · on Oct 23, 2019

We are facing this right now for us-east-1 :/

Thankfully, production is unaffected.

nathanielkam · on Oct 22, 2019

Switching to Google DNS (8.8.8.8 or 8.8.4.4) bypasses the affected DNS servers and will resolve the simple URLs (no region in the URL). Still really surprised this isn't bigger news yet.
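To make "use a different resolver" concrete, here is a minimal sketch, assuming Python 3 and only the standard library, that builds a raw RFC 1035 A-record query and sends it over UDP straight to a chosen server (8.8.8.8 by default) instead of whatever resolver the OS is configured with. The function names are hypothetical, and the reply is returned unparsed; decoding the answer section is left out to keep it short.

```python
import random
import socket
import struct
from typing import Optional


def build_dns_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal RFC 1035 query for `name` (qtype 1 = A record)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 sets RD, "recursion desired"), QDCOUNT=1,
    # ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length byte, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def query_resolver(name: str, server: str = "8.8.8.8", timeout: float = 2.0) -> bytes:
    """Send the query to `server` on UDP port 53 and return the raw reply."""
    packet = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(512)
    # The reply must echo our query ID; otherwise it is not an answer to us.
    if reply[:2] != packet[:2]:
        raise ValueError("DNS reply ID does not match query ID")
    return reply
```

In day-to-day use, changing the system resolver (or running `dig @8.8.8.8 <name>`) achieves the same thing; the sketch just shows that the workaround amounts to pointing the query at a different server.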

BillNyeTheITguy · on Oct 23, 2019

Forgive my ignorance, but how does using Google's DNS get around Route53 not being able to resolve URLs? Does the default DNS just target Route53 DNS?

herostratus101 · on Oct 22, 2019

Tried that this morning and it solved the issue for me.

ph4 · on Oct 22, 2019

Thanks, this fixed my issues with my local dev environment.

abvdasker · on Oct 22, 2019

My company hasn't been able to deploy any updates to our Elastic Beanstalk application for about 7 hours due to this. Luckily there's nothing urgent we need to deploy, but this makes me extremely nervous about using AWS going forward. If we experienced an outage that required a rollback or forward fix, we would be totally hosed.

Aperocky · on Oct 22, 2019

In defense of AWS, this DDoS is probably one of the most massive ones, and it targets DNS servers. I fully expect the DNS to be strengthened after this incident. At the same time, using another provider doesn’t solve the problem of DDoS attacks, which can happen at any point in a public network, not just at DNS servers.

journalctl · on Oct 22, 2019

AWS’s track record is still good, though. I mean, there will always be problems with cloud technology. But would you rather try to mitigate a DDoS attempt yourself or would you rather Amazon do it? I think giving up on AWS because of this event would be kind of throwing the baby out with the bath water.

mrkurt · on Oct 23, 2019

The question is really "if you're not on AWS, do you have to mitigate a DDoS"? It's a big target for reasons unrelated to what most people run.

cthalupa · on Oct 23, 2019

In my experience, you eventually piss SOMEONE off, or a competitor thinks it can drive business to them, etc. You probably won't get a huge one, but even a small DDoS can be difficult to mitigate for a company with limited resources.

itamarst · on Oct 22, 2019

And of course Route 53 still has a green checkmark.

johanneswu · on Oct 22, 2019

Now on https://status.aws.amazon.com/

Intermittent DNS Resolution Errors

We are investigating reports of occasional DNS resolution errors with Route 53 and our external DNS providers. We are actively working towards resolution.

mlyle · on Oct 22, 2019

Yes, that's the original link. There's still a green checkmark next to route 53, though.

bowmessage · on Oct 22, 2019

Likely because Route53 doesn’t have any degraded service at the moment. Would be more apt for a yellow status next to S3, given context in threads above.

banana_giraffe · on Oct 22, 2019

It's not just S3. I'm seeing failures of DNS resolution of well-known, and very much still working, non-AWS sites from within AWS now.

It's really hit or miss, though. Most of the big stuff works.

jdelman · on Oct 22, 2019

...and even if they changed it - are you going to subscribe to 100+ RSS feeds to find out?

myroon5 · on Oct 22, 2019

https://aws.amazon.com/route53/sla/

ceejayoz · on Oct 22, 2019

Why bother?

A 30-day service credit is going to come to $0.50/domain, plus $0.10 for each one over 25. Unless you're running thousands of domains, it's not even worth applying for.
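The arithmetic behind that estimate, as a minimal sketch: it assumes only the hosted-zone prices quoted above ($0.50/month for each of the first 25 zones, $0.10 for each additional zone) and works in integer cents to avoid floating-point rounding. The function name is hypothetical.

```python
def monthly_hosted_zone_cost_cents(zones: int) -> int:
    """Monthly hosted-zone charge in cents, using the prices quoted in the comment."""
    first_tier = min(zones, 25) * 50      # $0.50 each for the first 25 zones
    extra = max(zones - 25, 0) * 10       # $0.10 each beyond 25
    return first_tier + extra


for n in (1, 25, 30, 1000):
    print(n, monthly_hosted_zone_cost_cents(n) / 100)
```

So even 1,000 zones come to $110/month in hosted-zone fees, which is why a credit on that line item alone is small.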

luhn · on Oct 23, 2019

That's just the per-hosted zone pricing. The SLA covers the per-query pricing, which can really add up.
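To see how query charges can dwarf zone fees, here is a rough sketch. The default rates are an assumption on my part (Route 53 standard-query pricing as I recall it: $0.40 per million for the first billion queries a month, $0.20 per million beyond that), so check the current pricing page before using the numbers. The function name is hypothetical; `Decimal` keeps the money math exact.

```python
from decimal import Decimal


def monthly_query_cost(
    queries: int,
    tier1_rate: Decimal = Decimal("0.40"),   # ASSUMED $/million, first tier
    tier2_rate: Decimal = Decimal("0.20"),   # ASSUMED $/million, beyond tier 1
    tier1_limit: int = 1_000_000_000,        # ASSUMED size of first tier
) -> Decimal:
    """Estimated monthly charge in dollars for `queries` standard DNS queries."""
    million = Decimal(1_000_000)
    tier1_millions = Decimal(min(queries, tier1_limit)) / million
    tier2_millions = Decimal(max(queries - tier1_limit, 0)) / million
    return tier1_millions * tier1_rate + tier2_millions * tier2_rate


print(monthly_query_cost(5_000_000_000))
```

Under those assumed rates, 5 billion queries a month is about $1,200, compared with $12.50 for 25 hosted zones.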

th582ujdj · on Oct 22, 2019

AWS is not really "Too Big To Fail".

It's more like, "Too Big, Will Fail".

ceejayoz · on Oct 22, 2019

There aren't many technological systems - big or small - that don't ever fail.

th582ujdj · on Oct 22, 2019

The internet is about to halt. SQS & SNS are not resolving from many parts of the world.

Aperocky · on Oct 22, 2019

It's amazing how much a DDoS attack can do.

I read a few days back that when attackers had trouble attacking Cloudflare, they went after the internet infrastructure (Internet Exchanges) itself. In this case, attacking a DNS service can cut off access to a much larger part of the internet.

foota · on Oct 22, 2019

FWIW, from some info posted above, it sounds like it's the automated response to the DDoS causing issues, not the DDoS itself.

Aperocky · on Oct 22, 2019

That would still be part of the DDoS, though. No mitigation is impact-free.

throw554 · on Oct 22, 2019

DNS is the weakest link for many companies.

holdenc · on Oct 22, 2019

This is affecting my websites that use CloudFront, for which Amazon apparently uses its own Route 53 for DNS (nslookup -query=ns cloudfront.net). Since the only way to load remote assets from CloudFront on a website is through xyz.cloudfront.net or a CNAME such as mysites-cdn.com that points to xyz.cloudfront.net, there seems to be no way to use CloudFront without the Route 53 lookup.

If Route 53 is down and it is required to use CloudFront, how is this not affecting more people? I have had about a dozen customers complain today.
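The dependency described above can be sketched as a CNAME chain walk: even if your own domain is hosted elsewhere, the last hop lands in cloudfront.net, whose nameservers must answer. This is a toy model, assuming an in-memory dict of CNAME records and a naive "last two labels" notion of a zone (wrong for suffixes like co.uk); `assets.mysites-cdn.com` and both function names are hypothetical.

```python
from typing import Dict, List, Set


def follow_cnames(name: str, cname_records: Dict[str, str], max_hops: int = 8) -> List[str]:
    """Follow CNAME records from `name` until reaching a name with no CNAME."""
    chain = [name]
    while chain[-1] in cname_records:
        if len(chain) > max_hops:
            raise RuntimeError(f"CNAME chain from {name} exceeds {max_hops} hops")
        chain.append(cname_records[chain[-1]])
    return chain


def zones_consulted(chain: List[str]) -> Set[str]:
    """Naively treat the last two labels of each name as the zone serving it."""
    return {".".join(n.rstrip(".").split(".")[-2:]) for n in chain}


records = {"assets.mysites-cdn.com": "xyz.cloudfront.net"}
chain = follow_cnames("assets.mysites-cdn.com", records)
print(chain, zones_consulted(chain))
```

Every zone in that set has to resolve for the asset to load, so an outage at cloudfront.net's DNS provider breaks the chain regardless of where mysites-cdn.com is hosted.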

svacko · on Oct 22, 2019

In our case the issue is affecting DNS resolution of 'only' S3-related hostnames (my-bucket.s3.amazonaws.com).

smudgymcscmudge · on Oct 22, 2019

This comment from the other thread may be useful in mitigating your issue.

https://news.ycombinator.com/item?id=21328285

Rapzid · on Oct 22, 2019

Is Route53 under attack or a customer on Route53?

By all appearances, the Route53 DDoS mitigation strategy is massive scale and distribution. This includes distributing customers and their NS records across infrastructure AND TLDs. I would have thought a blanket attack against Route53 was impractical.
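The "NS records across TLDs" point can be checked mechanically: a delegation set whose nameservers sit under several TLDs still has reachable nameservers if one TLD's servers are unreachable. A minimal sketch, assuming plain hostname strings; the example names follow the ns-N.awsdns-N pattern Route 53 uses but are made up, and the function names are hypothetical.

```python
from typing import Iterable, Set


def nameserver_tlds(nameservers: Iterable[str]) -> Set[str]:
    """Return the top-level domains (last label) the NS hostnames live under."""
    return {ns.rstrip(".").rsplit(".", 1)[-1].lower() for ns in nameservers}


def spans_multiple_tlds(nameservers: Iterable[str]) -> bool:
    """True if the delegation set does not depend on a single TLD."""
    return len(nameserver_tlds(nameservers)) > 1


# Hypothetical delegation set in the Route 53 naming style.
delegation = [
    "ns-1.awsdns-01.com",
    "ns-2.awsdns-02.net",
    "ns-3.awsdns-03.org",
    "ns-4.awsdns-04.co.uk",
]
print(sorted(nameserver_tlds(delegation)))
```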

Aperocky · on Oct 22, 2019

Nothing is impractical when you have enough TPS.

zargath · on Oct 22, 2019

Maybe a stupid question, but what to do when eu-central-1.signin.aws.amazon.com is down?

journalctl · on Oct 23, 2019

Make some coffee and do something else until it comes back up.

kache_ · on Oct 22, 2019

If you got paged and are currently dealing with this, one thing you can do is set better defaults for AWS_REGION, i.e., update the configuration of the client that is accessing the AWS resource to specify the resource's region.
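One way to apply that advice in your own code is to make region selection explicit, so a client never silently falls back to a global endpoint. A minimal sketch, assuming the precedence "explicit argument, then AWS_REGION, then AWS_DEFAULT_REGION, then a hard-coded fallback"; `resolve_region` and the `us-east-1` fallback are hypothetical choices, not SDK behavior.

```python
import os
from typing import Mapping, Optional


def resolve_region(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    fallback: str = "us-east-1",  # hypothetical default; pick your resources' region
) -> str:
    """Pick a region: explicit value, then AWS_REGION, then AWS_DEFAULT_REGION."""
    if explicit:
        return explicit
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION") or fallback


print(resolve_region(env={"AWS_REGION": "eu-central-1"}))
```

Passing the result into your client configuration (for example, boto3's `region_name`) keeps the region decision in one place instead of scattered defaults.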

haolez · on Oct 22, 2019

I’ve noticed CloudWatch Dashboards malfunctioning today (not showing any data).

gramakri · on Oct 22, 2019

We are facing this right now. Trying to push new pages to CloudFront.

jniedrauer · on Oct 22, 2019

I was getting some weird notifications about "kms: server misbehaving". No production impact so far, fingers crossed.
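Since the failures described in this thread are intermittent, retrying name resolution with exponential backoff can ride out some of them. A minimal sketch, assuming Python 3 and `socket.getaddrinfo`; `resolve_with_retry`, the attempt count, and the delays are hypothetical choices, and the injectable `resolver`/`sleep` parameters exist only to make the function testable.

```python
import socket
import time
from typing import Any, Callable


def resolve_with_retry(
    host: str,
    port: int = 443,
    attempts: int = 4,
    base_delay: float = 0.2,
    resolver: Callable[..., Any] = socket.getaddrinfo,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Resolve `host`, retrying on DNS errors with delays of base_delay * 2**n."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last DNS error
            sleep(base_delay * (2 ** attempt))
```

This only helps with transient errors; if a resolver is consistently failing, switching resolvers or endpoints (as discussed above) is the real fix.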

Bob312371 · on Oct 23, 2019

Why does the AWS status page say the issue is resolved when it obviously isn't? S3 is still down in parts of the US.

ric2b · on Oct 22, 2019

YouTube is having issues as well (via the Android app), does YouTube run partially on AWS?

whalesalad · on Oct 22, 2019

Well... I mean it’s owned by Google so I want to say that’s a negative ghost rider.

deanCommie · on Oct 22, 2019

Waze uses AWS: https://cloud.google.com/blog/products/gcp/guest-post-multi-...

You're probably not wrong, but still.

moon2 · on Oct 23, 2019

Spinnaker on EC2... how painful.

cthalupa · on Oct 23, 2019

I'm fairly sure that's where the primary developers and original creators of Spinnaker use it.

moon2 · on Oct 23, 2019

It is, their infrastructure was (or maybe still is) mostly based on EC2.

However, I think they kind of wanted to support many cloud providers, so maybe they stopped focusing on it throughout the years.

buboard · on Oct 22, 2019

Don't worry, the internet was designed to circumvent nuclear attacks like these.

thrax · on Oct 23, 2019

Is this the cache poisoning DDoS posted earlier?

rudolph9 · on Oct 23, 2019

AWS has wasted so many hours of my life troubleshooting their “throw shit at the wall and see what sticks” services.

I don’t know why developers put up, even push for, their garbage services.

highprofittrade · on Oct 23, 2019

But it's not very hard to find out that the issue is DNS.

yclept · on Oct 22, 2019

China