How NAT traversal works (2020) (tailscale.com)
351 points by robocat on March 19, 2022 | 37 comments



I submitted this because it is the best summary of NAT I have read, and it goes into some of the technical details. Kudos to tailscale for writing this.


This is a really well-written article, one of the best summaries of NAT traversal I have ever read. I started to laugh mid-way, because it's amusing (and a bit sad) what lengths we have to go to in order to establish communications within the IPv4 address space.

It's good to learn about ICE too. I hadn't thought about the security implications: indeed, this whole mess makes it very easy for an unprivileged attacker to trick the endpoints into relaying the traffic to their own gateway. For example, if I understand things correctly, another user behind the NAT can easily make the NAT device relay the traffic to them by sending probes at the same time. I'm not sure how this can be defended against, or if it's even possible.

Is anyone aware of any interesting reading on the subject of MITM of NAT traversal?

Are there any good software libraries that abstract away all these details?


In most cases ARP spoofing [0] is probably a lot easier, and gives you all the traffic instead of just one connection. And if we're talking about something like a CGN where ARP spoofing isn't possible, how would you know when to try to hijack a connection if you can't already see the signaling traffic? And if you can already see the signaling traffic, you probably have easier means to MITM.

And even if you managed to get past all that, don't they layer encryption and authentication on top anyway? I know WebRTC uses DTLS and the certificates are exchanged through the signaling channel. So unless you can MITM the signaling channel already (which probably itself uses TLS), it won't get you anything.

In short, I don't think it's particularly interesting.

[0]: https://en.wikipedia.org/wiki/ARP_spoofing


Of course, but ARP spoofing has different requirements (it needs the attacker on the same L2 domain), and it can be easily defended against at the switch level.

But indeed, this attack requires us to know when (and where) to start sending probes.

> And even if you managed to get past all that, don't they layer encryption and authentication on top anyway?

That's true of most network-level MITM attacks, it doesn't mean they are useless, as the layered encryption and authentication is not always properly implemented.


The remark about cryptographic authentication in probe packets is the key. The attacker must not be able to read what your packets say, fabricate new ones, or (important!) save and re-use what it has seen before.

It involves lots of public-key encrypted random numbers carried in the probe packets. You discard packets that don't match your expectations, and you are safe(ish). If you have to limit public-key encryption and decryption because your cores are slow, it gets more complicated: you establish beachheads with public-key crypto, then use symmetric encryption (still with random numbers) between those while establishing the next level up. But cores are usually fast enough nowadays, even on cheap routers and IoT SoCs.
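
A minimal sketch of that discard-unexpected-packets idea, in the symmetric-key "beachhead" variant mentioned above, assuming the endpoints already shared a session key over the authenticated signaling channel (the function names and packet layout are illustrative, not any particular implementation):

    package main

    import (
        "crypto/hmac"
        "crypto/rand"
        "crypto/sha256"
        "fmt"
    )

    // appendMAC tags a probe payload with an HMAC under the shared key.
    func appendMAC(key, payload []byte) []byte {
        mac := hmac.New(sha256.New, key)
        mac.Write(payload)
        return mac.Sum(payload) // payload || 32-byte tag
    }

    // verifyProbe drops any packet whose tag doesn't verify. An attacker
    // racing probes at the NAT can still create mappings, but can never
    // get the endpoint to accept their packets. The payload should also
    // carry a fresh nonce or counter so old probes can't be replayed.
    func verifyProbe(key, packet []byte) ([]byte, bool) {
        if len(packet) < sha256.Size {
            return nil, false
        }
        payload, tag := packet[:len(packet)-sha256.Size], packet[len(packet)-sha256.Size:]
        mac := hmac.New(sha256.New, key)
        mac.Write(payload)
        return payload, hmac.Equal(tag, mac.Sum(nil)) // constant-time compare
    }

    func main() {
        key := make([]byte, 32)
        rand.Read(key) // in reality, shared over the signaling channel

        probe := appendMAC(key, []byte("probe nonce 1234"))
        _, ok := verifyProbe(key, probe)
        fmt.Println("genuine probe accepted:", ok)

        probe[0] ^= 0xff // a forged/tampered probe
        _, ok = verifyProbe(key, probe)
        fmt.Println("forged probe accepted:", ok)
    }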


I'm not sure I understand how cryptographic authentication can completely solve that problem, as it is only validated on the endpoints themselves, not on the router.

As long as you have a signal that probing has started, you can just start sending probes: even if their contents are not validated, the NAT device will still create mappings accordingly. The probability distribution for the birthday attack in this configuration is a bit different, but not by much: for 3 devices to land on the same port number, at 4096 probes you get a ~93% probability.
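
For intuition, here's the shape of that calculation, using the standard 1 - (1 - k/N)^n approximation for n uniform probes against k open ports out of a space of N. The exact 3-device model above isn't fully spelled out, so the k and N below are assumptions and won't necessarily reproduce the ~93% figure:

    package main

    import (
        "fmt"
        "math"
    )

    // collisionProb: chance that at least one of n uniformly random
    // probes lands on one of the k ports the target holds open, out
    // of a space of N ports.
    func collisionProb(n, k, N float64) float64 {
        return 1 - math.Pow(1-k/N, n)
    }

    func main() {
        const N = 65536.0 // assume the full 16-bit port space
        const k = 256.0   // assumed number of open candidate mappings
        for _, n := range []float64{512, 1024, 2048, 4096} {
            fmt.Printf("probes=%4.0f  p=%.3f\n", n, collisionProb(n, k, N))
        }
    }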

The only way I can see would be to block probes that match an already-received invalid probe, but that creates other problems, as it allows an attacker (or even just corrupted packets) to block the communication.


With care, a hostile router can interfere with you getting a connection, but cannot trick you into connecting to it instead.


You may also be interested in this: https://samy.pl/slipstream/

It's an attack leveraging NAT workarounds (like SIP ALG) to potentially access any device behind a NAT by having a single device load some content sent with the right packet sizes and fragmentation properties (say, by publishing a malicious ad).


Modern computing seems to be a story of "worse is better", and not just the classical "worse is better" of Gabriel's paper but something that amounts to "worse is better squared". Gabriel bemoaned the adoption of Unix and C over more "well-thought-out" designs, but Unix and C were at least designed. ("Say what you will, at least it's an ethos.") The same could go for TCP/IP's lean design as opposed to the heavyweight OSI model. But it seems like even these designs are too onerous for the modern day, and instead we build what would normally go on top of Unix or TCP/IP on the Web. All platforms are now the Web platform, which was never designed to run programs and hence is a non-cohesive mess. All networking is now client-server, or indeed is now just HTTP requests: goodbye end-to-end principle (hello NAT!), goodbye protocols other than TCP or UDP. We can build the same things on this platform as we did with native platforms, but we have to do it using kludgy methods which sacrifice a lot of performance and only work 95% of the time: so instead of TCP, we get WebRTC, and instead of assembly we have WebAssembly. And what do we gain? We gain security, and we gain ease of installation (just go to www.zombo.com vs go to zombo.com, click on "Downloads", click on "Windows x64", wait 4 minutes for it to download... and so on).

Can we do better? I think we can: if operating systems caught on to capability-based security, then the Web platform could become a legacy platform. (We see that phone OSes, which use capability-based security, still direct us to "the app" rather than "the site".) And adoption of IPv6 could void all the kludgy workarounds of NAT that we've had to develop.

But we live in the world we inherit, rather than the one we imagine, so currently all we can do is traverse NATs and write webapps.


It's stuff like what you outlined that drove me into product management. Technically better almost never translates easily into market success. You think TCP/UDP is bad? We are stuck with light water reactors because they were the first commercially viable plant design. Had a lead-cooled reactor been viable before that, we could have a much different nuclear power landscape. People can implement a new technology, but getting people to use it is the realm of sales, marketing, and the dark side of tech.


Wait til we look back at Li-ion batteries. We're blowing our wad on a technology that will likely never meet the need, which makes innovative technologies that actually do meet the need much less likely to be successful.

Worse is better.


C was not well-designed. In fact, I would say it was poorly designed: it is weakly typed, with a fair bit of undefined behavior. Many people praise Dennis Ritchie for creating C, but I don't. Ken Thompson did a great job on designing Unix and UTF-8, though. To me, Ken is the real hero of the two.


C was designed to the constraints of the time and place of its creation, and was a thoroughgoing success within those bounds.

It is our fault that we use it beyond those bounds.


C came out in 1972, but we had Pascal in 1970, which is strongly typed and doesn't have undefined behavior. Pascal wasn't perfect and did have a few issues with use in writing operating systems, but Modula-2 and its descendants were really good. The early versions of the Macintosh System were written in Pascal and used assembly to handle the low-level bits Pascal was missing. With Rust we're finally getting back to safe programming. I'm hoping to see other existing and new languages with more safety.


Apollo Aegis, a system that, like Unix, was inspired by Multics, and was better than the Unices of its time, was coded in Apollo Pascal. You could see the difference between Standard Pascal and an actually useful Pascal by looking at the Apollo Pascal extensions.

Pascal had null pointers and manual allocation, so it was not really any safer than C90, anyway. Of course, pre-C90 C lacked even function prototype declarations.


I once heard C called the JavaScript of the 1970s, in that it's a mediocre language that got really popular by tagging along with other platforms.


It certainly tagged along with Unix, but with 16-bit systems like the IBM PC, Macintosh, Atari ST, and Amiga, it began to take over. We had safe languages like Pascal and Modula-2 at that point, but computers weren't networked back then, and programmers cared more about saving a few CPU cycles than about having more stable programs. Today we know how dangerous C is for online security. Sure, safe code can be written in C, but it's a lot more work for the programmer than using a language that is designed to be safe. It's good to see Rust come along, and I'm hopeful to see safety being added to existing and new languages.


> but Unix and C were at least designed.

And instead of C, we get JavaScript.


Tailscale just keeps impressing. Love how simple their product is and how everything they touch just seems so well done.


Thank you. I live in two places; if Tailscale "just works" I could use it. I came to the comments figuring if there was actually a better product people would say so here.


ZeroTier and Nebula are the only modern alternatives I personally know about.

https://www.zerotier.com

https://github.com/slackhq/nebula

Disclosure: I wrote and founded ZeroTier. Listed Nebula too for neutrality and completeness.


I’ve used it for a year or two and it’s one of my favorite tools. Everything just works. It’s as close to magic as I’ve found.


Are you saying that you have tried it and it doesn't work, or that you're looking for alternatives before you use it? (If the latter, try Nebula, or if the ends are static and you control enough of the network stack you could just do plain wireguard)


No, I hadn't tried it. WOW. It just works. Zero hiccups in the transition. No need to look further.

I can now toss code for variant SSH config files, I no longer need port forwarding rules on each router, and I no longer need my DynDNS subscription.

??? My NY computers are inside a university IP space, which simplifies library access. It would be very convenient to sometimes access the web from CA as if I'm using one of my NY computers. I can't determine if/how Tailscale supports this. ???


Tailscale supports remote web browsing because it supports SSH tunnels: https://medium.com/maxkimambo/web-browsing-over-ssh-tunnel-a...
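
For example, a plain dynamic SOCKS forward over any reachable host does the job; the hostname below is a placeholder:

    # Open a SOCKS5 proxy on localhost:1080 through the NY machine,
    # then point the browser's SOCKS proxy settings at localhost:1080.
    ssh -N -D 1080 user@ny-desktop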


It's also built in natively: https://tailscale.com/kb/1103/exit-nodes/
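
Per the linked KB page, the setup is roughly this (flags may change between versions, so check the docs):

    # On the NY machine that should route traffic:
    tailscale up --advertise-exit-node

    # On the remote client, after approving the node in the admin console:
    tailscale up --exit-node=<ny-machine-tailscale-ip>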


I learned more from this than I did in my college networking class.


What is the "best" IP/port signaling protocol? XMPP? SIP? It would have a naming scheme which can work without DNS and would handle direct IP/port "calls" without going through a signaling server.

Such "IP phones" would use UPNP on domestic NATS to forward their port.

Which signaling protocols are able to handle IP/port handover via their signaling servers (in the various mobile network "roaming" contexts)?


This is so good. I clearly remember that in the early days of gamedev a lot of this was known mostly by word of mouth (we used to call it NAT punch-through), and it was a right pain in the ass to get working (layer dynamic host migration on top and you had one of the most fun/challenging problems in gamedev networking).
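
For anyone who hasn't seen the trick, here's a bare-bones sketch of the punch itself, assuming each peer has already learned the other's public ip:port from a rendezvous server (the peer address below is a placeholder):

    package main

    import (
        "fmt"
        "net"
        "os"
        "time"
    )

    func main() {
        // Bind a local UDP port; the first outbound packet makes our NAT
        // allocate a public mapping for it. A rendezvous server (not
        // shown) tells each peer the other's public ip:port.
        conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 9999})
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer conn.Close()

        // Placeholder: the other peer's public endpoint, as reported by
        // the rendezvous server.
        peer := &net.UDPAddr{IP: net.ParseIP("203.0.113.7"), Port: 4242}

        // Both sides fire at each other simultaneously. Each side's
        // outbound packets open its own NAT mapping; once both mappings
        // exist, packets start getting through in both directions.
        go func() {
            for i := 0; i < 10; i++ {
                conn.WriteToUDP([]byte("punch"), peer)
                time.Sleep(200 * time.Millisecond)
            }
        }()

        buf := make([]byte, 1500)
        conn.SetReadDeadline(time.Now().Add(5 * time.Second))
        n, from, err := conn.ReadFromUDP(buf)
        if err != nil {
            fmt.Println("no reply: hard NAT on both ends? fall back to a relay")
            return
        }
        fmt.Printf("got %q from %v: hole punched\n", buf[:n], from)
    }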


If only we could use IPv6 addresses, and translate them back to IPv4 at the endpoints if needed.


This is a really well-written write-up, but I do want to point out that NAT traversal like this is not a new idea at all, and this is by no means the first such write-up.

Here is one from 2014:

https://www.zerotier.com/2014/08/25/the-state-of-nat-travers...

Here's an earlier one from what seems to be the late 2000s:

https://bford.info/pub/net/p2pnat/

Here's an IETF draft from 2010 describing not only NAT traversal but a protocol for cryptographic addressing, which is another technique used by both ZeroTier and Tailscale:

https://datatracker.ietf.org/doc/html/draft-ietf-hip-nat-tra...

Here's an RFC for NAT traversal with STUN from 2008:

https://www.rfc-editor.org/rfc/pdfrfc/rfc5389.txt.pdf

I can keep going. I first learned about NAT traversal around 2002 and cryptographic addressing in the mid-2000s.

A lot of ideas in computing get invented and re-invented or at least re-popularized over and over again. Another such idea from networking is zero trust, which was originally called deperimeterisation and was developed by a group called the Jericho Forum in 2003-2005:

https://twitter.com/jonoberheide/status/1505160010371895299

It then got re-invented by Google as BeyondCorp in 2013, and then by Forrester and Gartner as Zero Trust most recently. In this case we maybe had to wait for a more confusing term: deperimeterisation more accurately describes what's happening, and there seems to be a rule in networking that prohibits clear language that is not misleading. Zero Trust is a lie, since (1) there is no such thing, and (2) the way it's usually deployed today delegates all trust to a single third party like Google or Okta that now has root on the entire universe. This is actually centralized trust.

My intent here is just to remind HN readers that what's new around here is often not new at all. Our field has an incredibly short memory and re-discovers things constantly. I've been on HN since the start and feel like I've watched several generations re-discover things that date back to the 1980s. Hell, I watched the entire history of databases get speed-run, starting with the NoSQL trend (1970s hierarchical data models) and proceeding through the re-discovery of why the RDBMS became popular.


To be fair to the author I don’t think it was really presented as novel.


Yeah, except for the DERP protocol, my impression is that the article describes what they do in Tailscale without claiming that what they have done is new. They even reference STUN's RFCs, so I don't think they are misrepresenting anything as new; they're simply describing in detail how their product works.


Yeah, and from what I remember, DERP has some design choices that differentiate it from STUN/TURN, and that's exactly how it's presented.


I wasn't expecting such a well-written article. Good job, Tailscale!


I have been using a homebrew WireGuard bounce server for a couple of years, and am now about ready to switch, wholesale, to Tailscale.


> our coordination server and fleet of DERP (Detour Encrypted Routing Protocol) servers act as our side channel.

I love the creative acronym



