Worldwide DNS Performance (dnsperf.com)
129 points by nish1500 on March 26, 2015 | 61 comments



Hey everybody, I'm the creator of dnsperf. I just wanted to clear some things up.

1. The point of the service is to have an objective way to compare different DNS services. dnsperf is only meant to help you in your search for the best DNS service, not to crown an absolute winner.

2. I plan to add as many locations as possible to get more reliable data. More locations will also solve the problem of "DNS at the same datacenter as my test nodes".

3. Some people are confused about the node locations. Here is the map: http://www.dnsperf.com/network

4. It's not open source yet, but it will be at some point in the future.

5. The tests I am running are designed to exclude variables such as local/ISP caching, DNS proxies, resolvers, SRTT and so on. I wanted to query the nameservers directly and get the raw performance of the provider itself. I believe that's the best way to do a fair comparison.
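
In practice each probe does something roughly like this (a simplified sketch, not the production code; the nameserver IP is a documentation-range placeholder):

  // Hypothetical direct-query timer (TypeScript/Node). Bypasses the
  // local/ISP resolver entirely by querying one nameserver IP directly.
  import { Resolver } from "node:dns/promises";
  import { performance } from "node:perf_hooks";

  async function timeQuery(nameserverIp: string, hostname: string): Promise<number> {
    const resolver = new Resolver();
    resolver.setServers([nameserverIp]); // query this one nameserver only
    const start = performance.now();
    await resolver.resolve4(hostname);   // no local/ISP cache in the path
    return performance.now() - start;
  }

  // 198.51.100.53 is a placeholder, not a real provider's nameserver.
  timeQuery("198.51.100.53", "example.com").then((ms) =>
    console.log(`query time: ${ms.toFixed(1)} ms`));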

Thank you for your comments and love. I'm glad people liked the service :)

Feel free to email me with your feedback or questions.


Hi jimaek!

Thanks for putting this service together. For an African testing location, maybe a VPS provider from ZA would work? Below are a few providers with Joburg or Cape Town presences (I haven't personally used any of them though, so YMMV):

http://www.vpsnine.com/hosting/locations.php
http://www.webafrica.co.za/hosting/vps/
http://www.webnow.co.za/vps/
http://www.web-telecoms.co.za/vps/

(I posted these in the comments on your site too, but am re-posting here in case you missed them.)


Thank you, I will look into them.


Could you please add the IP or the provider of the client performing the tests? It seems that you are using servers hosted with OVH, which makes the OVH results look better because your tests run within the provider's own network.


I use only 2 servers hosted with OVH. I see the problem here, but there is no way to avoid it: monitoring 37 providers will result in multiple "same DC" collisions. I plan to solve this by continuing to add new locations in different datacenters. The more locations I have, the less impact these issues will have.


If you provide the "same DC" information, it helps users make the right decision. Maybe you could exclude the "same DC" results or offer an option to filter them out. That of course reduces the number of probes for the given provider, but maybe no probes are better than one too-good-to-be-true probe.


I simply can't know which DCs are used by the providers I monitor, so I have no idea where we share datacenters. What happens if they change DCs? I would need to constantly monitor their infrastructure. Filtering would be too hard and pretty useless.

By simply adding more locations I fix the problem, add more data and simplify the whole process.


This! If you have DC info, you can ignore intra-DC lookups.


The word "speed" usually suggests that more is better.

What the graphs actually display is latency - in which case less is better.

That was confusing for a few moments.


Latency is a function of the speed of light - and in most engineering realms "speed" is well understood when latency is what's being described. I had no qualms understanding what the creator meant, but I agree that, strictly speaking, the graphs should be labelled "query latency" rather than "query speed". Either way, fantastic data!

Hopefully this data is trended over daily, weekly, monthly, quarterly, and yearly rollups. I see monthly when drilling into a provider, but that's all. A rolling annual graph would really tell the true story of how consistent a provider is, and that, in itself, is very important when considering outsourcing DNS.


I guess I will need to fix that. Thanks for the feedback.


This is cool and I've seen it before, but I have a hard time interpreting the results. For instance, I have a hard time believing Google's DNS is substantially slower than anyone else's, which leads me to question all the results.

Also - and I'm hoping someone here can clear this up for me - it's not clear how these results translate to actual DNS performance for my site. I only vaguely understand how the magic of the internet works, but don't the DNS results get distributed throughout the network, making the performance of the nameservers themselves less relevant? If I could chop 50-100ms off my TTFB by switching my DNS from Namecheap to someone else, I would definitely pay for that, but it's not clear to me that I could.


Why would you pay more for faster DNS? Most browsers have DNS prefetching built in; I doubt it really matters for a website.


Did you read what I wrote? I said I would pay if I were convinced it would lead to faster loads, but I'm not.


Because the first page load is what matters most.


Your DNS information should already be cached in all major ISPs' DNS caches (unless you mean the first page load ever).


How about the first subsequent load after the TTL for the previous load expires?

Admittedly, it's not much of an issue if you're a big site with hundreds or thousands of requests a minute, but it is an issue if you're a less trafficked site with a low TTL.
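
A quick way to see that expiry happening (a sketch assuming Node 18+ run as an ES module; 8.8.8.8 is just an example resolver):

  // Watch a record age out of a recursive resolver's cache (TypeScript/Node).
  import { Resolver } from "node:dns/promises";

  const resolver = new Resolver();
  resolver.setServers(["8.8.8.8"]); // any recursive resolver works here

  const first = await resolver.resolve4("example.com", { ttl: true });
  await new Promise((r) => setTimeout(r, 5000));
  const second = await resolver.resolve4("example.com", { ttl: true });

  // TTL counting down ~5s = cache hit; TTL jumping back up to the zone's
  // full value = the cached entry expired and the authoritative servers were
  // re-queried, i.e. the slow "first subsequent load" described above.
  console.log(first[0].ttl, second[0].ttl);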


Another way of gathering data would be a JS snippet that participating websites could include, which would run a random subset of the tests on some share of visitors and report back to your service (a rough sketch follows the links below).

It'll be more realistic than DC -> DC queries.

- http://www.w3.org/TR/resource-timing/

- http://googledevelopers.blogspot.se/2013/12/measuring-networ...
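
A minimal sketch of what such a snippet could look like, using the Resource Timing API linked above (the /collect endpoint is hypothetical):

  // Browser-side DNS timing collection (TypeScript). Cross-origin entries
  // report 0 unless the origin sends a Timing-Allow-Origin header.
  const samples = performance
    .getEntriesByType("resource")
    .map((entry) => {
      const r = entry as PerformanceResourceTiming;
      return { name: r.name, dnsMs: r.domainLookupEnd - r.domainLookupStart };
    })
    .filter((s) => s.dnsMs > 0); // drop cached lookups and opaque entries

  // Report back without blocking the page.
  navigator.sendBeacon("/collect", JSON.stringify(samples));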


> It'll be more realistic than DC -> DC queries.

That depends on what you're trying to measure. If you want to know what the total end-user experience is like, Resource Timing is awesome.

If you're trying to compare DNS providers without noise from the differences in client implementations and ISP competency, however, it'll be very noisy and you need a massive number of samples to avoid seeing trends which are really just sampling artifacts.

The best approach would be to do both, so you can compare the server response times with the measured client values and get an idea of whether a slow-down this week is caused by something you actually have control over.


It's realistic but not consistent or reliable. Like I said, I want to test their raw performance without any variables in between. All providers get tested directly, from the exact same locations at the exact same times. That's the data I wanted to get, and it makes comparison easy and fair.


Would you mind updating your bar charts to say "Query time" instead of "Query speed"? Some mention of the units would be nice as well, such as "Query time (ms/req)".


"Asia DNS performance"

This should really be taken with a grain of salt, as Asia is a big place and, unlike in Europe or the US, the interconnects between countries are poor. From where I am in the Middle East, it's faster to ping Europe (~180ms) than Central Asia (~250ms). I'd love to see a report that takes this and CDN performance into account on a country-by-country basis!


And, FWIW, it appears (to me, at least) that the Asia testing likely only happens from Japan, Hong Kong, and Singapore. I imagine the latency from any of those locations to "Central Asia" is probably pretty high anyways.


Same goes for "The Pacific" and probably "Europe". Very generic - depending on where the testing was actually done the results could vary a lot.


Agreed. Any plans on releasing this open source? Would be cool if others who are interested could set up country-by-country versions for Asia, for example - one for Vietnam, one for Japan, etc.


Worth noting: similar things are being done at https://atlas.ripe.net, with real-world measurements (from people's home internet accounts, workplaces, etc. - not 'sponsored' VMs, not from some random data centre) of just about anything (DNS and websites).

You can join, host a probe (get it sent to you, plug it in) and then run your own measurement tests based on the 'credits' you earn from having the probe running.


Nice! Just signed up to host a probe but looks like they'll only send you one if they don't already have good coverage of your location/network. Not holding out much hope as afaict Europe is already pretty much blanketed :(


Is this a good test? The fastest result is 5.9ms - that means that DNS provider won because they happened to be in the same city dnsperf tested from. That node is in Sao Paulo, Brazil.

There are about 400m people in South America, and this test declares a provider the fastest for South America because both the provider and the test node happened to pick Sao Paulo.


Not really, no. In general, tests for latency aren't that valuable except perhaps to show providers where they might want to place their next PoP. If you can look at a finer grain, you can identify where latency presents the greatest problem (i.e. areas where instead of a 40ms response time you see 400ms).


Not forgetting cache, and the fact that slower response times aren't horrendous in the scheme of things.


My turn to be pedantic:

Speed is the magnitude of velocity, so bigger is faster.

What is displayed here is the time taken - the reciprocal of speed.

Personally I run dnrd, a DNS caching proxy: http://dnrd.sourceforge.net/


No. Speed has a different meaning when talking about latencies.


requests per second


> Speed is is the magnitude of velocity so bigger is faster.

If you're going to be pedantic, velocity is a vector, and vectors must reside in a vector space, so what vector space do the velocities in this context reside in?

Ignoring all context to call into doubt the use of a perfectly well-understood word adds nothing to the discussion.


That vector space would be the real numbers.


Requests to and from the server travel in opposite directions, along the edges of a graph, via nodes.


[Full disclosure: I've worked on Amazon Route 53]

It's always neat to see nice data collection like this, but unfortunately the average speed to the authoritative name servers isn't a very meaningful measurement. Real-world resolvers bias heavily towards the fastest name server for your zone, and they are so latency-sensitive that they'll do things like issue concurrent queries to several name servers at the same time.

The upshot is that what really matters is the latency to the closest name server, or at worst the latency to the 3rd-fastest server for the rare bootstrapping cases. Bind, by far the most common resolver, will issue up to 3 concurrent queries to different name servers as part of its SRTT algorithm. The next most common resolvers - Unbound, OpenDNS, and Google Public DNS - perform pre-fetching, so the latencies don't contribute to the user experience except for extreme outlier queries.
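
The effect is easy to picture with a race like the one below (an illustrative sketch, not Bind's actual SRTT logic; the nameserver IPs are placeholders) - only the fastest answer ever reaches the user:

  // Race one query per nameserver and keep the first answer (TypeScript/Node).
  import { Resolver } from "node:dns/promises";

  async function fastestAnswer(nameserverIps: string[], hostname: string) {
    return Promise.any(
      nameserverIps.map(async (ip) => {
        const resolver = new Resolver();
        resolver.setServers([ip]);
        const addresses = await resolver.resolve4(hostname);
        return { ip, addresses }; // winner = lowest-latency nameserver
      })
    );
  }

  // Placeholder IPs; a real zone typically lists 4 or more nameservers.
  fastestAnswer(["198.51.100.1", "198.51.100.2", "203.0.113.1"], "example.com")
    .then((w) => console.log(`answered first: ${w.ip}`));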

Some large DNS providers design for this behaviour, and seek to increase the average distance to their DNS servers by operating the name servers for each domain in different data centers. That gives routing and path diversity for the DNS queries and responses. Since network path diversity increases with distance, this works best when you include a location or two that are quite far away - which increases the average latency to those servers but, thanks to resolver behavior, doesn't do much to the user experience.

A write up for Route 53's consideration of the trade-offs is here: http://www.awsarchitectureblog.com/2014/05/a-case-study-in-g... (there's also a video about the role this plays in withstanding DDOS attacks: https://www.youtube.com/watch?v=V7vTPlV8P3U around the 10 minute mark).

Where the average latencies are low, all of the name servers are in close proximity to the measurement point and I would wager that the network path diversity is probably quite low. A small number of link failures or ddos/congestion events, maybe even one, might make all of the servers unreachable.

A more meaningful measurement of the speed itself is to perform regular DNS resolutions using real-world DNS resolvers spread out across your users. In-browser tests like Google Analytics go a long way here, and it's fairly easy to A/B test different providers. The differences tend to be very small; caching dominates, as others here have mentioned.

Apologies if I seemed to rain on dnsperf's parade here; it's a neat visualization and measuring this stuff is tough. It's always good to see someone take an interest in measuring DNS!


[Full disclosure: I've worked on Amazon Route 53 ;)]

The RTT mechanisms in resolvers have a high degree of randomness and will aggressively try the other, slower name servers again. E.g., out of 1000 samples, my desktop in the Netherlands (via XS4All) sees low latencies from Route 53 ~60% of the time:

  $ seq 1 1000 | xargs -n 1 sh -c 'dig test-$0.trosc.com | grep "Query time" | awk "{print \$4}"' | histogram.py -f "%3d"

   18 -  59 [   609]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   59 - 100 [     1]: 
  100 - 141 [     1]: 
  141 - 182 [   229]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  182 - 224 [   102]: ∎∎∎∎∎∎∎∎∎∎∎∎
  224 - 265 [    37]: ∎∎∎∎
  265 - 306 [     1]: 
  306 - 347 [    15]: ∎
  347 - 388 [     2]: 
  388 - 430 [     2]:

This looks decent at the median (20ms), but falls off beyond that: 185ms at the 90th percentile and an 88ms average, with one >1s outlier removed.

As you pointed out, Route 53 optimizes for availability and DDoS resilience over RTT performance. There are 4 name server IPs to choose from, which gives me 4 different paths to 4 different server locations via anycast - and thus 4 different RTT buckets. Few DNS providers go to such lengths for availability. Still, 185ms is a lot. It's probably because anycast/BGP advertisements from the US reach AMS-IX in fewer hops than competing advertisements from European locations. I would guess Route 53's current striping is not heavily tuned for RTTs.

Caching solves part of this, but there are a lot of resolvers out there. As a thought example: Assume your sources of traffic are uniformly distributed among 75000 resolvers and you use 60 second TTLs (pretty standard); then you won't see significant benefit from caching until you get to >>1000 requests/s.
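
For the curious, the arithmetic behind that claim (my own back-of-envelope model: Poisson arrivals spread uniformly over the resolver caches):

  // Expected cache-hit rate: a query hits only if the same resolver cache
  // was queried for this name within the last TTL seconds (TypeScript).
  function hitRate(totalQps: number, resolvers: number, ttlSeconds: number): number {
    const perResolverQps = totalQps / resolvers;
    return 1 - Math.exp(-perResolverQps * ttlSeconds);
  }

  console.log(hitRate(1000, 75000, 60).toFixed(2));  // ~0.55 - barely half cached
  console.log(hitRate(10000, 75000, 60).toFixed(2)); // ~1.00 - caching finally wins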

Many applications also have a long tail of DNS names and basically won't benefit from caching at all. This could be motivated by availability as well (think shuffle sharding :). I'm building one where DNS query time currently dominates page load time (especially aliasing to CloudFront can be slow :). It's useful to understand that there's a general availability vs. latency trade-off in DNS that is only partially addressed by the resolver.


> Assume your sources of traffic are uniformly distributed among 75000 resolvers

I think your argument is flawed because your users are not uniformly distributed among those 75000 resolvers.

In practice just ~1% of the resolvers (Comcast, NTT, Telekom, etc.) are handling >90% of your users. Consequently the benefits of caching kick in much earlier and stronger than you suggest.


Large ISPs and public DNS resolvers typically don't use a single server, but rather a fleet of DNS resolvers, each with its own cache. Some providers like Google Public DNS use a two-layered cache, but it's still fragmented per server location. A lot of people also run their own resolvers - companies especially.

The 75000 was mostly a thought example; it's very hard to know what a good number is, although there is a Route 53-related reason for that number. In any case, the benefit of DNS caching is probably much less than you think due to short TTLs and the number of resolvers.


> Large ISPs and public DNS resolvers typically don't use a single server, but rather a fleet [...]

I assume all major ISPs use 2 or 3 layers of cache, which makes the size of their perimeter fleet largely irrelevant.

> it's very hard to know what a good number is

Could you perhaps ask your former Route53 colleagues for some log-file insight?

> the benefit of DNS caching is probably much less than you think due to short TTLs and the number of resolvers.

I don't think so. The overwhelming majority of clients use their ISP's resolver. So all it takes is one hit per major ISP per TTL to keep it zippy for almost everyone. That's why DNS works so well, after all?


> I assume all major ISPs use 2 or 3 layers of cache, which makes the size of their perimeter fleet largely irrelevant.

Not really. The resolvers tend to be geographically dispersed and use anycast. Having a multi-layered cache would probably decrease performance, except within a specific location.

There are some nice research papers studying DNS resolvers, e.g. here's one for cellular networks: http://www.aqualab.cs.northwestern.edu/component/attachments...

> Could you perhaps ask your former Route53 colleagues for some log-file insight?

They see what's behind the cache, not how much traffic the resolvers are taking. Could be the same, could be 100x more, hard to tell.

> So all it takes is one hit per major ISP per TTL to keep it zippy for almost everyone. That's why DNS works so well, after all?

Caching works great with long TTLs, e.g. as used for NS, MX, and CNAME records. The problem is the 60-second TTLs that are commonly used for A records in cloud services. Except for reasonably high-volume names, it's not very likely that your A records will be in a given cache at a given time. Many applications also use many different domain names (e.g., one per user), which creates a long tail of low-volume names.

Of course, traffic is not uniformly distributed in any way, so there might be parts of the day when your name will be constantly served from cache everywhere, or parts of the world where it is never served from cache.


The site is very clean looking and it is entertaining to look at the results.

That said, there are a number of problems with this:

1. As stated by other people here, this is only from the perspective of where the tests originated, and is somewhat irrelevant once the DNS result is cached by your ISP.

2. If you are using any of a number of modern browsers, DNS will be prefetched before you even click on any link, so the DNS time from a user perspective will be zero the majority of the time.

3. The point of this was originally for choosing a good host for shared JS files. It is much more meaningful to use a distributed host with location based DNS to point you to the nearest fastest server, rather than just making the DNS request itself as fast as possible. Why do you think Google has so many servers all around the world?


<disclaimer>I'm from Dyn.</disclaimer>

We're thrilled to see a new DNS performance monitoring tool made publicly available. It takes courage to release things like this to the world and open them up for critique; I applaud jimaek, the author, for doing just that.

As many have noted, the results differ widely from many of the other reports we see today, but more locations and tests are going to continue to improve accuracy.

The testing is synthetic - that is, it's run from within datacenters. I'd encourage anyone who can provide/sponsor a VPS or other resources to do so. Providing consistent benchmarks of authoritative queries from known locations is great, but the test network itself needs to expand to get truly meaningful comparisons between providers.

I'd hope to see real-user data added in the future too - that is, testing from the end of home cable/broadband connections. These tests behave differently from synthetic tests but represent the end-user experience more accurately.

It would also be great to see raw lookup performance between providers, exclusive of network latency.

For those interested in diving deeper into DNS performance, I'd also recommend reviewing Cloud Harmony's reports, the latest of which can be found here: https://cloudharmony.com/reports/state-of-the-cloud-dns-0315...


Interesting for me to see DigitalOcean perform so badly (in this particular metric), especially since they seem to be pretty fast for me in general.


My 2c:

DNS latency is only one factor in overall perceived performance, and it can be affected greatly by routing, traffic load, and other factors. Additionally, many requests you'll make will likely go through a resolver with caching anyhow, so that too can have a significant impact on perceived latency.

Thus DO (or any provider for that matter) only really needs to provide reasonable resolution times from their servers to be effective.

A more important matter is consistency, i.e. are they consistently returning responses, or are they dropping packets under load? With a service that's growing quickly, keeping up with the increase in load on their network probably presents a more significant problem.


They recently switched to using CloudFlare - look at the detail graphs: http://www.dnsperf.com/provider/DigitalOcean

I'm not sure why they're still so high though.


This is a great service – and I'm really happy to see that “world-wide” isn't defined as “North America and Europe”. Hopefully it won't be too hard to add Africa, the Middle East and India to the mix, as I've seen much more network variability in those regions.

For this kind of measure, it would really be better to report either a median or, better, multiple percentiles. In my experience, DNS timings have surprising extremes (e.g. most people see 50ms but 10% of visitors see 5+ seconds), and a few large samples will distort an average.

Your 2-second cap will limit the distortion - much better than e.g. Google Analytics, which will merrily record multi-hour timings - but I'd really prefer to see e.g. 50th, 75th and 95th percentiles, to tell whether a particular provider has better worst-case performance than another.
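
For illustration, percentile reporting is cheap to compute (a nearest-rank sketch; the sample timings are made up):

  // Nearest-rank percentiles over raw query timings (TypeScript).
  function percentile(sortedMs: number[], p: number): number {
    const idx = Math.ceil((p / 100) * sortedMs.length) - 1;
    return sortedMs[Math.max(0, idx)];
  }

  const timings = [12, 14, 15, 18, 22, 25, 31, 48, 180, 2000].sort((a, b) => a - b);
  for (const p of [50, 75, 95]) {
    console.log(`p${p}: ${percentile(timings, p)} ms`);
  }
  // p50: 22ms, p75: 48ms, p95: 2000ms - while the mean (~237ms) hides both
  // the typical case and the worst case.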


Throwing my $0.02 into the pool...

I've been very happy with DNSMadeEasy (although I think the name somehow diminishes its technical credibility - not sure why, but it sounds "cheesy", which is not reflective of the service at all). I am very glad to see them in the top 10 for the most part; they offer a fantastic product, a well-documented, easy-to-use API, and a great management panel.

For comparison I've tried Zerigo, PowerDNS, UltraDNS, Dyn and Route53 and dropped them all for either cost or technical reasons. For my small use cases I've found DNSMadeEasy to be a nice middle-ground.

It would be interesting if the OP could add some sort of subjective rating from outside sources, in the form of user reviews. Providers could be ranked on performance (latency), cost, functionality (API, anycast, etc.), and management.


I was surprised Dyn weren't faster. Drilling down, I wonder what was going on between 25th Feb and 11th March? http://www.dnsperf.com/provider/Dyn


<disclaimer>I'm from Dyn</disclaimer>

Dyn's early results were not representative of our network or service. This was a result of the limited number of nodes currently testing and their locations. We found a routing issue between Azure and ourselves via one carrier, leading to elevated results for a short period of time.

There's been great commentary on here about providers showing as fast because of the proximity of the tests. This will only be rectified by adding more monitoring and a broader base of test locations.

I know jimaek is working on getting new locations online - if anyone here can assist in providing or sponsoring test locations, I encourage you to reach out to him.


I can say that it was nothing related to dnsperf; the perf change was on Dyn's side.


Nice to see OVH doing well. I have been really happy with their smallest 'VPS Classic'; at a third of the price of a t2.micro or Linode (2.40€), it's a lot of bang for the buck. Plus I have a top-level domain from them (.ovh) for about one euro per year, so I can get a free SSL cert from StartSSL as well. Dirt. Cheap.


Does GoDaddy use a third-party DNS, or were they just not on the list?


100ms on Google DNS - there must be something wrong with your test...


How are OpenDNS and DNSCrypt not on the list? Am I missing something?


The tests appear to be looking at a subset of authoritative DNS providers, hence OpenDNS (and Google's public resolvers), which are resolvers rather than authoritative name servers, are not present.


I'm trying to understand here: Aren't Google's Public Resolvers authoritative for the domain names Google itself owns?


I'm fairly certain their resolver name servers are separate from their authoritative name servers.


They appear to be zone servers only.


X-axis should be labelled.



