Hacker News
AWS Experiencing Intermittent DNS Resolution Errors (amazon.com)
147 points by pmccarren on Oct 22, 2019 | 56 comments



"We want to give you further information regarding the occasional DNS resolution errors. The AWS DNS servers are currently under a DDoS attack. Our DDoS mitigations are absorbing the vast majority of this traffic, but these mitigations are also flagging some legitimate customer queries at this time. We are actively working on additional mitigations, as well as tracking down the source of the attack to shut it down. Amazon S3 customers experiencing impact from this event can update the configuration of their clients accessing S3 to specify the specific region that their bucket is in when making requests to mitigate impact. For example, instead of "mybucket.s3.amazonaws.com" a customer would instead specify "mybucket.s3.us-west-2.amazonaws.com" for their bucket in the US-WEST-2 region. If you are using the AWS SDK, you can specify the region as part of the configuration of the Amazon S3 client to make sure your requests use this region-specific endpoint name.

The DNS resolution issues are also intermittently affecting other AWS Service endpoints like ELB, RDS, and EC2 that require public DNS resolution. We are actively working on this issue and will update you as soon as the issue is resolved on our end, however at this moment I won’t be able to provide an ETA. I am keeping this case in Pending Amazon Action, will update you as soon as I get further information on the resolution of this issue."

https://www.reddit.com/r/aws/comments/dlnl28/route53_is_fail...
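
For anyone hardcoding hostnames rather than going through an SDK, the rewrite AWS describes above can be sketched roughly like this. It is only an illustration: `regional_s3_host` is a made-up helper name, it assumes you already know which region the bucket lives in, and it only handles the `bucket.s3.amazonaws.com` form quoted in the notice.

```python
def regional_s3_host(host: str, region: str) -> str:
    """Rewrite 'bucket.s3.amazonaws.com' to 'bucket.s3.<region>.amazonaws.com'.

    Hypothetical helper: the caller must supply the bucket's actual region.
    """
    global_suffix = ".s3.amazonaws.com"
    if not host.endswith(global_suffix):
        # Already region-specific, or not a global-style S3 hostname: leave it alone.
        return host
    bucket = host[: -len(global_suffix)]
    return f"{bucket}.s3.{region}.amazonaws.com"


# The example from the AWS notice: a bucket in US-WEST-2.
print(regional_s3_host("mybucket.s3.amazonaws.com", "us-west-2"))
# prints: mybucket.s3.us-west-2.amazonaws.com
```

If you are on boto3, passing `region_name` when creating the client (for example `boto3.client("s3", region_name="us-west-2")`) is the SDK-side equivalent the notice refers to; check your own SDK's documentation for the matching setting.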


Maybe they should disclose when they are carrying out DDoS mitigation strategies that could have adverse effects on their customers. I can only imagine how many people wasted more than a few hours because they were misled by a green status page.


Nothing listed on AWS Personal Health Dashboard related to this incident. Also nothing abnormal listed on the AWS status page under S3, which is seemingly having DNS resolution issues for buckets with the `[bucket_name].s3.amazonaws.com` pattern.


We've observed the same... we started seeing this at ~9am ET this morning, so this has now been going on for over 8 hours.


There's almost a business model for a 3rd party to run an AWS service status dashboard that's more accurate.


https://www.datadoghq.com/ has something like that


Think we could crowd source the outage reports? I'll build the dashboard if there's interest


We are facing this right now for us-east-1 :/

thankfully production is unaffected.


Switching to Google DNS (8.8.8.8 or 8.8.4.4) bypasses the affected DNS servers and will resolve the simple URLs (no region in the URL). Still really surprised this isn't making bigger news yet.
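
If you want to check whether a particular resolver is answering before changing system settings, something like the following can send one A-record query straight to it. This is a rough stdlib-only sketch: `build_dns_query`, `resolver_answers` and `QUERY_ID` are names invented for the example, it speaks plain UDP on port 53, and it only looks at the reply header rather than parsing the records.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID so the reply can be matched to the query


def build_dns_query(hostname: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(hostname: str, resolver: str = "8.8.8.8",
                     timeout: float = 2.0) -> bool:
    """Return True if the resolver replies NOERROR with at least one answer."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (resolver, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable networks
            return False
    if len(reply) < 12:
        return False
    reply_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    rcode = flags & 0x000F  # low four bits of the flags hold the response code
    return reply_id == query_id_of(query) and rcode == 0 and ancount > 0


def query_id_of(packet: bytes) -> int:
    """Read the 16-bit ID from the first two bytes of a DNS packet."""
    return struct.unpack("!H", packet[:2])[0]
```

Calling `resolver_answers("mybucket.s3.amazonaws.com", "8.8.8.8")` and then again with your default resolver's address gives a quick side-by-side. Given how intermittent the failures are, a single success or failure doesn't prove much, so run it a few times.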


Forgive my ignorance, but how does using Google's DNS get around Route53 not being able to resolve URLs? Does the default DNS just target Route53 DNS?


Tried that this morning and it solved the issue for me.


Thanks, this fixed my issues with my local dev environment.


My company hasn't been able to deploy any updates to our ElasticBeanstalk application for about 7 hours due to this. Luckily there's nothing urgent we need to deploy, but this makes me extremely nervous about using AWS going forward. If we experienced an outage that required a rollback or forward fix we would be totally hosed.


In defense of AWS, the DDoS is probably one of the most massive ever, and it targets DNS servers. I fully expect the DNS to be strengthened after this incident. At the same time, using another provider doesn’t solve the problem of DDoS attacks, which can happen at any point in a public network, not just against DNS servers.


AWS’s track record is still good, though. I mean, there will always be problems with cloud technology. But would you rather try to mitigate a DDoS attempt yourself or would you rather Amazon do it? I think giving up on AWS because of this event would be kind of throwing the baby out with the bath water.


The question is really "if you're not on AWS, do you have to mitigate a DDoS"? It's a big target for reasons unrelated to what most people run.


In my experience, you eventually piss SOMEONE off, or a competitor thinks it can drive business to them, etc. You probably won't get a huge one, but even a small DDoS can be difficult to mitigate for a company with limited resources.


And of course Route 53 still has a green checkmark.


Now on https://status.aws.amazon.com/

Intermittent DNS Resolution Errors

We are investigating reports of occasional DNS resolution errors with Route 53 and our external DNS providers. We are actively working towards resolution.


Yes, that's the original link. There's still a green checkmark next to route 53, though.


Likely because Route53 doesn’t have any degraded service at the moment. Would be more apt for a yellow status next to S3, given context in threads above.


It's not just S3. I'm seeing failures of DNS resolution of well known, and very much still working, non-AWS sites from within AWS now.

It's really hit or miss, though. Most of the big stuff works.


...and even if they changed it - are you going to subscribe to 100+ RSS feeds to find out?



Why bother?

30 days service credit is gonna wind up $0.50/domain. $0.10 for each over 25. Unless you're running thousands of domains, it's not even worth applying for.


That's just the per-hosted zone pricing. The SLA covers the per-query pricing, which can really add up.


AWS is not really "Too Big To Fail".

It's more like, "Too Big, Will Fail".


There aren't many technological systems - big or small - that don't ever fail.


The internet is about to halt. SQS & SNS are not resolving from many parts of the world.


It's amazing how much a DDoS attack can do.

I read a few days back that when attackers had trouble attacking Cloudflare, they went for the internet infrastructure (Internet Exchange) itself. In this case, attacking a DNS service can block connections to a much larger part of the internet.


Fwiw from some info posted above it sounds like it's the automated response to the DDoS causing issues, not the DDoS itself.


That would still be part of the DDoS, though. No mitigation is impact-free.


DNS is the weakest link for many companies.


This is affecting my websites that use CloudFront, for which Amazon apparently uses their own Route 53 for DNS (nslookup -query=ns cloudfront.net). Since the only way to load remote assets from CloudFront on a website is to use xyz.cloudfront.net, or a CNAME such as mysites-cdn.com that maps to xyz.cloudfront.net, it seems there is no way to use CloudFront without the Route 53 lookup.

If Route 53 is down and it is required to use CloudFront, how is this not affecting more people? I have had about a dozen customers complain today.


In our case the issue is affecting DNS resolution of 'only' S3 related hostnames (my-bucket.s3.amazonaws.com)


This comment from the other thread may be useful in mitigating your issue.

https://news.ycombinator.com/item?id=21328285


Is Route53 under attack or a customer on Route53?

By all appearances the Route53 DDoS mitigation strategy is massive scale and distribution. This includes distributing customers and their NS records across infrastructure AND TLDs. I would have thought a blanket attack against Route53 impractical.


Nothing is impractical when you have enough tps.


Maybe a stupid question, but what to do when eu-central-1.signin.aws.amazon.com is down?


Make some coffee and do something else until it comes back up.


If you got paged and are currently dealing with this, one thing you can do is set better defaults for AWS_REGION, i.e., update the configuration of the client that is accessing the AWS resource to specify the resource's region.
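
One way to read "better defaults" is to make your own code pick a region explicitly instead of relying on whatever the client falls back to. A minimal sketch, assuming you keep a sensible fallback per service: `resolve_region` and the `us-west-2` fallback are made up for the example, while `AWS_REGION` and `AWS_DEFAULT_REGION` are the environment variable names AWS SDKs and the CLI commonly read (which one applies depends on the SDK).

```python
import os


def resolve_region(fallback: str = "us-west-2") -> str:
    """Pick a region from the environment, else use an explicit fallback.

    Hypothetical helper: the fallback should be the region where the
    resource being accessed actually lives.
    """
    for name in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = os.environ.get(name, "").strip()
        if value:
            return value
    return fallback
```

The result can then be passed to whatever region setting your client exposes, so requests go to the region-specific endpoint rather than the global one.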


I’ve noticed CloudWatch Dashboards malfunctioning today (not showing any data).


We are facing this right now. Trying to push new pages to Cloudfront.


I was getting some weird notifications about "kms: server misbehaving". No production impact so far, fingers crossed.


Why does the AWS status say the issue is resolved when it obviously isn't? S3 is still down in parts of the US.


YouTube is having issues as well (via the Android app), does YouTube run partially on AWS?


Well... I mean it’s owned by Google so I want to say that’s a negative ghost rider.


Waze uses AWS: https://cloud.google.com/blog/products/gcp/guest-post-multi-...

You're probably not wrong, but still.


Spinnaker on EC2... how painful.


I'm fairly sure that's where the primary developers and original creators of Spinnaker use it.


It is, their infrastructure was (or maybe still is) mostly based on EC2.

However, I think they kind of wanted to support many cloud providers, so maybe they stopped focusing on it throughout the years.


Don't worry, the internet was designed to circumvent nuclear attacks like these.


Is this the cache poisoning DDoS posted earlier?


AWS has wasted so many hours of my life troubleshooting their “throw shit at the wall and see what sticks” services.

I don’t know why developers put up, even push for, their garbage services.


But it's not very hard to find out the issue is DNS.


China



