We abused Slack's TURN servers to gain access to internal services (rtcsec.com)
381 points by reader_1000 on April 7, 2020 | 78 comments



So Slack's VoIP uses WebRTC, which connects via UDP/TCP to always send SRTP packets through a TURN proxy (which extends STUN via ICE) to work around the usual NAT problems. These guys scanned the TURN server and found an SSRF which allowed them to connect to Slack's VPC on AWS using IAM temporary credentials. Interesting.

For fun, read that last paragraph out loud to a non-techy nearby and watch their eyes...


This is a nice summary actually. (btw, you can read it to someone techy-but-not-in-the-field and still get the same look. I am not sure if I should be sad or proud of the fact that I understood 90% of what you said without google-fu...)


If you are into SIP this is pretty well known.


I am not :). Well, at least "not anymore".


I understood every word of that and my eyes still glazed over. :)


Could be a line on CSI: Los Angeles.


Do you mean this paragraph?

"Our recommendation here is to make use of the latest coturn which by default, no longer allows peering with 127.0.0.1 or ::1. In some older versions, you might also want to use the no-loopback-peers."


I believe GP means this paragraph:

> So Slack's VoIP uses WebRTC, which connects via UDP/TCP to always send SRTP packets through a TURN proxy (which extends STUN via ICE) to work around the usual NAT problems. These guys scanned the TURN server and found an SSRF which allowed them to connect to Slack's VPC on AWS using IAM temporary credentials. Interesting.


Thanks! I don't know where my head was.


Sounds like you're STUNned. Try not to TURN your head and maybe put some ICE on it.


OP means the paragraph they just wrote.


Thanks! I don't know where my head was.


Hey I thought he was referring to the last paragraph in the article too…


Read it to myself and my eyes glaze over, and I've spent the past couple of weeks trying to decipher all the acronyms involved in WebRTC!


I've worked with SIP and H323 but not WebRTC so I knew about STUN/TURN/ICE, but you're right about the acronym-soup, even to those who have networking experience --- VoIP is its own little niche. (Along the same lines, I've been a bystander to a group of GSM developers' meetings and it's just as incomprehensible.)


Even a techy who isn't familiar with networking protocols would start to glaze over!


Not so much networking protocols but WebRTC maybe? I did a hello-world type of implementation of WebRTC a while back and it made perfect sense to me.


Yeah, there's the WebRTC you can hack in an afternoon and there's the WebRTC that covers the most common edge cases. The problem is that the solution to one problem (for example, your websocket disconnecting mid-call while WebRTC still soldiers on) can create three new problems when a different edge case arises.


I recall Elon Musk's Acronyms Seriously Suck memo


Heh, reminds me of Neuromancer.


I read your paragraph and almost didn’t read the actual article :(


Acronym city right here.


Huh, I happen to be knee-deep in this stuff right now. This article noted that Slack seemed to be running an old TURN server (pre-coturn):

https://webrtchacks.com/slack-webrtc-slacking/

Given that the latest coturn has this vulnerability mitigated by default, perhaps all this boils down to is "Slack runs outdated software, we exploited it."?


It's not really a bug in old coturn, just a feature in the protocol. According to the article newer versions just disable routing to 127.0.0.1 by default but there are still other network addresses you might have to consider (see article for a recommended list of "denied-peer-ips").
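To make that concrete: a TURN client allocates a relay address on the server and can then ask the server to forward packets to any peer address it names; that's the protocol working as designed, and it's exactly what the deny lists exist to constrain. A rough sketch using the pion/turn client library (server address, credentials and target are placeholders, not details from the article):

    package main

    import (
        "net"

        "github.com/pion/turn/v2"
    )

    func main() {
        // Local socket used to talk to the (placeholder) TURN server.
        conn, err := net.ListenPacket("udp4", "0.0.0.0:0")
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        client, err := turn.NewClient(&turn.ClientConfig{
            STUNServerAddr: "turn.example.com:3478",
            TURNServerAddr: "turn.example.com:3478",
            Conn:           conn,
            Username:       "user",
            Password:       "secret",
            Realm:          "example.com",
        })
        if err != nil {
            panic(err)
        }
        defer client.Close()

        if err := client.Listen(); err != nil {
            panic(err)
        }

        // Allocate a relayed address on the TURN server...
        relayConn, err := client.Allocate()
        if err != nil {
            panic(err)
        }
        defer relayConn.Close()

        // ...and ask it to forward packets to a peer of our choosing.
        // Unless the server denies loopback/internal ranges, nothing in
        // the protocol stops this from being 127.0.0.1 or a VPC address.
        peer := &net.UDPAddr{IP: net.ParseIP("127.0.0.1"), Port: 8080}
        _, _ = relayConn.WriteTo([]byte("hello"), peer)
    }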


You may already know this, but it's worth getting the word out. Do not just deny routing to 127.0.0.1. 127.0.0.1 is merely the conventional "localhost" address; however, ALL 127.x.x.x is "localhost". You can check this now on your local command line with "ping 127.1.2.3".

(This just seems to be one of those bugs that every proxy goes through at some point, just like pretty much any attempt to write a web server that serves files off disk will have at least one directory traversal bug.)


Fun fact: Windows blocks Remote Desktop connections to localhost (because doing so would lock the console session, unless you break the license and modify termsrv).

Fun fact 2: this only blocks 127.0.0.1.

(Side note: if you don't mind breaking license compliance on your personal PC, you can use this to remote desktop into other user accounts on the same PC for a nice separation of workspaces (or just move to Linux).)


You are not breaking the license if you pay per-seat rather than per-device.

Ask me about CALs, I can tell you all about them.


Let's start with an easy question: what is a CAL?


Client access license. For Microsoft Windows they come in per-seat and per-machine. To OP's point, they were talking about RDP. When you are licensing per-machine (Windows Server), you are limited to one desktop user and two remote users (for Windows desktop it's one user at a time). When it is per-seat, you can have as many concurrent connections as you have licensed. Per-seat is enforced by the Microsoft licensing server.


I was mainly referring to personal PC use, not corp/server use.

It's still technically breaking the license to override termsrv.dll's protection on Home/Professional PCs, which does not allow somebody to be logged in while somebody else is already logged in.

I.e., a Home- or Professional-licensed PC cannot support one user on Remote Desktop and one user sitting in front of it at the same time; you have to pay for a server license to get that functionality. This technical limitation is considered a DRM measure and is also protected by the DMCA anti-circumvention provision.

If you don't mind breaking the law, I was pointing out how you could abuse 127.0.0.2 to connect Remote Desktop to localhost in order to use separate user accounts for separate tasks (such as putting your job hunting in a separate account so you don't get distracted by Discord/Chrome desktop notifications and Reddit in your new tab page).


Also don't forget that IPv6 ::1 is localhost too.


The IP protocol is full of security gotchas, so you'd better use a higher-level protocol: spoofing, ARP attacks, TCP session hijacking, etc.


Yes and I think you typically want to use this whole recommended list of deny-ips. Maybe that should be the default for coturn.

But running coturn in its default-config isn't a good idea anyway.
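For illustration, such a deny list in turnserver.conf looks roughly like this (the article's recommended list is the authoritative one; the ranges below are the usual loopback, RFC 1918, link-local and CGN suspects):

    no-multicast-peers
    denied-peer-ip=0.0.0.0-0.255.255.255
    denied-peer-ip=10.0.0.0-10.255.255.255
    denied-peer-ip=100.64.0.0-100.127.255.255
    denied-peer-ip=127.0.0.0-127.255.255.255
    denied-peer-ip=169.254.0.0-169.254.255.255
    denied-peer-ip=172.16.0.0-172.31.255.255
    denied-peer-ip=192.168.0.0-192.168.255.255
    denied-peer-ip=::1
    denied-peer-ip=fe80::-febf:ffff:ffff:ffff:ffff:ffff:ffff:ffff

The point is to deny whole ranges (all of 127.0.0.0/8, ::1, private space, and link-local including the cloud metadata address 169.254.169.254), not just 127.0.0.1.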


Isn’t that 127 address a bogon?


Yup, sorry if my phrasing implied it's a bug. It's just better defaults, and there's evidence to support the idea that Slack was just running the software with the old defaults.


But it's not necessarily the defaults. The article isn't clear on whether the private services are on 127/8 or whether the TURN server has access to other things in the DMZ/VPC/whatever.

I'm guessing it's actually the latter, as everyone is so fond of the 1:1 VM/container-to-service model. Meaning it's likely a config problem with the denied-peer-ips the parent here links.


Timeline—

November 2017: added TURN abuse to our stunner toolset

December 2017: discovered and reported TURN vulnerability in private customer of Enable Security

February 2018: briefly tested Slack and discovered the vulnerability

April 2018: submitted our report to Slack, helped them reproduce and address the issue through various rounds of testing

May 2018: Slack pushed patch to live servers which was retested by Enable Security

January 2020: asked to publish report

February 2020: disclosure delayed by HackerOne/Slack

March 2020: report published


Things like this are why internal mTLS is so important. If a hole is found in your firewall, services still don't trust each other until they have a valid TLS certificate.


Totally agree. I've been rolling every service out with mTLS. It was a huge PITA tho without a service mesh (which we can't use for different reasons), so I built a drop-in solution for use with any Kubernetes cluster.

I'm still developing it a bit, but my solution is open source [1]. If anybody wants to use this I'm happy to answer questions and provide quick bug fixes (as this directly benefits my work right now). If you're using Kubernetes this is a pretty easy drop-in for your pod. It's part of our default setup now.

[1] https://github.com/FreedomBen/metals


I think Istio gives you mTLS for free if you add it to your kubernetes cluster.


> I think Istio gives you mTLS for free if you add it to your kubernetes cluster.

Yes, Istio was the service mesh I referenced above that we can't install for different reasons:

>> It was a huge PITA tho without a service mesh (which we can't use for different reasons)

If you have Istio then you don't need MeTaLS (unless your client comes from outside the cluster or something, and even then I think there are ways to make it work).

I don't know that I would agree that it is "for free", as Istio still needs to be configured, and that isn't trivial in my experience. I could also see situations where something like MeTaLS, where you set a few env vars for certs and you're done, is nice to have. I would definitely recommend Istio if you can use it though.


Any kind of inter-service authentication, really. And for lots of reasons, not just SSRF. But regardless: coherent inter-service authentication is not the norm.

If you're exclusively interested in mitigating SSRF, a more targeted solution is to run your connections (HTTP or TCP) through a proxy that enforces network-level rules. That seems like it would have worked here. For HTTP SSRF, Stripe has a good tool, Smokescreen.


I worked for a large tech company that was hacked by Russians at least twice. They had a not only unencrypted but unauthenticated SQL front end and API to their inventory management system, which listed every server on the network and every piece of software and version installed on it, as well as all the user accounts on it and their privileges, plus sysadmin contact info and everything you'd need for social engineering. I realized how bad it was when I was using it to find all the servers at the company that needed to be patched for Heartbleed. Anybody on any server in our network could get to it, or just someone who plugged a laptop in at the office or got into our VPN, and you'd have the keys to the kingdom. I told the head of security and he said it would break too many things to put it behind authentication.


I wish there were an easy way to do mutual TLS auth with pre-shared keys that can be stored and copy/pasted just like normal passwords or API keys, without having to maintain a CA and handle certificate issuing and renewal (sure, technically forever-lived certificates aren't as secure, but even those would already be a major upgrade compared to the status quo).


I personally use cert-manager to run the CA, then create a cert for each app and have k8s inject it. It is more manual than a service mesh, but many applications support this strategy out of the box. (I will say that creating a certificate resource is a lot easier than the old days of some directory of shell scripts acting as your CA, though. And the same code manages my letsencrypt certs.)

For example, I have Grafana backed by Postgres, and they both understand this authentication scheme out of the box. Postgres is happy to be provided with a cert to present to connecting applications, and is happy to check the cert that applications present against the CA cert.

The main problem with my setup is that I use a ClusterIssuer CA, so really anyone in the cluster can get a valid certificate. This is not amazingly secure and things like Istio do a bit more provenance checking of the application before issuing a cert, which I like. But this is simple, and does protect against the attack this article covers -- as long as you don't go out of your way to present the application's cert when proxying a connection. (Which is probably an easy mistake to make, so be careful.)
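For anyone curious what that looks like, a minimal sketch of the cert-manager resources involved (names like internal-ca and grafana-client are made up; adapt to your cluster):

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: internal-ca                  # hypothetical CA issuer
    spec:
      ca:
        secretName: internal-ca-keypair  # Secret holding the CA cert and key
    ---
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: grafana-client               # hypothetical per-app certificate
      namespace: monitoring
    spec:
      secretName: grafana-client-tls     # Secret mounted into the Grafana pod
      commonName: grafana
      dnsNames:
        - grafana.monitoring.svc
      issuerRef:
        name: internal-ca
        kind: ClusterIssuer

Postgres then gets its own server cert from the same issuer and is pointed at the CA (ssl_ca_file) so it can verify the client certs that connecting applications present.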


Shameless plug (I really did not intend to promote this yet but it can help so figured I'd mention it in case you are interested), but if you are using Kubernetes you could get pretty darn close with MeTaLS[1]. Generate self-signed certs that last for 10 or more years and copy/paste them into environment variables for your service, and you've got it. Of course I don't recommend that as it's not as secure as doing things "the right" way but it's definitely better than no mTLS at all (as you pointed out).

MeTaLS won't provide you with client-related stuff, but most clients and client libs make it easy to set a certificate/key with a request.

[1]: https://github.com/FreedomBen/metals


Huh? TLS has PSK ciphers. They're popular in low-power IoT devices, because even ECC crypto is rather expensive compared to e.g. AES.

I'm not sure if any of the PSK modes manage to work with perfect forward secrecy though. Otherwise, leaking the PSK would also allow decrypting any previously-sniffed traffic.


Funny enough Slack itself recently open-sourced https://github.com/slackhq/nebula which does exactly that.


As a complete novice in this area, I don't understand the advantage of using a proxy-like service such as TURN.

What is the advantage over simply routing the media streams through application servers (i.e. user A connects to server which links to user B) which can then perform application-specific authentication, enforce restrictions on payloads, etc... Performance?


If you have a centralized server, you have an SFU (Selective Forwarding Unit). SFUs typically expose a range of UDP (and/or TCP) ports for communication. Peer connections are allocated on a per-port basis. So if a user is connected to your SFU, they take up a port, and they need to be able to egress over a large UDP/TCP port range to connect, since the port is assigned randomly.

However, many firewalls block port ranges, or even UDP entirely. What you really want is a way to let people speak WebRTC over a common port (443 TCP is almost never blocked). TURN facilitates this. Sometimes it's built into SFUs, sometimes not, in which case it requires coturn in front. In Slack's case (and the project I work on as well) they are running Janus, which does not have TURN built in, and hence they run coturn to facilitate TURN.

Slack's approach is particularly interesting because they always push people through TURN, instead of allowing direct connectivity to their SFU. Hard to say why exactly, but probably it's a mix of locking down the SFU onto the private network, being able to push TURN to the edge but keep the SFU on the private LAN, etc. Typically you don't do this, I don't think; you run both TURN and the SFU with public IPs, and the client connects to one or the other depending on which ICE candidates win (which is a function of their firewall rules: your browser tries to pick the 'best' candidate it can get to, ideally one over UDP without a TURN hop).
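As a rough illustration from the client side: forcing everything through TURN is just a relay-only ICE policy. A sketch with pion/webrtc (the TURN URL and credentials are placeholders; in a browser the equivalent is iceTransportPolicy: 'relay'):

    package main

    import (
        "fmt"

        "github.com/pion/webrtc/v3"
    )

    func main() {
        // Only gather relay candidates, so all media flows through the
        // TURN server (placeholder address and credentials).
        cfg := webrtc.Configuration{
            ICEServers: []webrtc.ICEServer{{
                URLs:       []string{"turns:turn.example.com:443?transport=tcp"},
                Username:   "user",
                Credential: "secret",
            }},
            ICETransportPolicy: webrtc.ICETransportPolicyRelay,
        }

        pc, err := webrtc.NewPeerConnection(cfg)
        if err != nil {
            panic(err)
        }
        defer pc.Close()
        fmt.Println("peer connection will only use relay candidates")
    }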


There is no reason an SFU couldn't run everything over one port though! Then you can just use the 3-tuple to route stuff to the proper connection.

Someone is doing this right now for Pion; really excited to see it. I am especially excited to see what it means for deploys: right now, asking people to expose port ranges adds so much overhead vs. 1 UDP and 1 TCP port for media.


Can you elaborate on what this means?


Right now most SFUs start up an ICE Agent [0] and listen on a random port. ICE is used to establish the connection between two peers. Basically both sides exchange a list of candidate addresses and try to find the best path.

With an SFU you end up having thousands of remote peers, each with their own port on your server. However, you could easily listen on a single port and then handle each inbound packet depending on what the remote 3-tuple is (the client's IP/port/protocol). Effectively you would just be running all your ICE Agents on one port, but doing one additional step of processing.

I need to fill out [1] more to fully explain the idea, but I think it could make a huge difference when making it easier to deploy WebRTC SFUs.

[0] https://github.com/pion/ice

[1] https://github.com/pion/webrtc/wiki/SinglePortMode
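A minimal sketch of the demultiplexing idea (not Pion's actual implementation; the port and per-peer handling are made up): one UDP socket, with inbound packets routed to a per-peer handler keyed by the remote address:

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // Single UDP socket shared by every peer / ICE agent.
        conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 3478})
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        // One channel per remote peer, keyed by "ip:port".
        peers := map[string]chan []byte{}

        buf := make([]byte, 1500)
        for {
            n, remote, err := conn.ReadFromUDP(buf)
            if err != nil {
                return
            }
            key := remote.String()
            ch, ok := peers[key]
            if !ok {
                // New remote tuple: spin up a (stub) per-peer handler.
                ch = make(chan []byte, 64)
                peers[key] = ch
                go func(peer string, in <-chan []byte) {
                    for pkt := range in {
                        fmt.Printf("peer %s: %d bytes\n", peer, len(pkt))
                    }
                }(key, ch)
            }
            pkt := make([]byte, n)
            copy(pkt, buf[:n])
            select {
            case ch <- pkt:
            default: // drop if this peer's queue is full
            }
        }
    }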


Yup, that's a great point. I'd love to see this approach explored further. Is there any risk of tuple collisions in some bizarro NAT situation? I'd guess not, since the remote tuple needs to route to a single destination, but there's some weird stuff out there... e.g. one could imagine a router abusing the IP protocol to somehow route packets to different destinations despite them having the same return IP/port combo. I'm no networking wizard, but in general I assume if it's possible, someone is doing it :)


SFU?



Pushing everything through a proxy does not seem ideal. Seems kind of like the easy road to adding VoIP everywhere that Slack already works.


My knowledge is about 2 years old on this but I can try to explain: TURN/STUN facilitate users communicating behind NAT and firewalls. TURN routes all traffic through a central server and pushes it to clients it has an established connection with, thus getting around the NAT/firewall. STUN is a bit more lightweight in that it really just helps users negotiate a normal P2P connection, and then they send messages directly to each other.


Thanks! That's in line with what I thought was going on. It sounds like TURN is very close to being an open proxy.

Rather than falling back from p2p to STUN to TURN, why not replace TURN with something more application/protocol-specific?

Perhaps a webrtc-only proxy that performs authentication and can perform authorization along the lines of: user A is (only) allowed to connect to user B using protocol WebRTC.


A TURN server has to do much less computation, and it also doesn't need to decrypt the payload. It's more or less a fancy packet forwarder.

In addition, only a fraction of users will need TURN; the rest can use direct peer connections with the aid of NAT traversal; the two kinds of connection are more or less the same to higher layers. Conversely, if the application depended on an application server to process data, chances are you wouldn't implement a second version of the same protocol that works without the server.

So a single TURN server can handle a lot more traffic than an application server, is potentially more secure, and is more easily shared between different applications, and even different owners.

If you want it geo-distributed for latency, the ability to share the same TURN servers between different applications and owners gives you cost-latency advantages too.


> In addition, only a fraction of users will need TURN; the rest can use direct peer connections with the aid of NAT traversal;

Is there actually any data on this or is it mostly anecdotal? Because I've done my own experiments on hole punching before and it's almost impossible with today's routers, almost all of which, it seems, implement symmetric NAT (impossible to match the ports after initial contact with the STUN server because they become assigned randomly). Compound this with the fact that some ISPs have more than one layer of NAT, and I have trouble believing that the majority of Slack users either have a direct public IP or a convenient way to conduct NAT traversal successfully.


> I have trouble believing that the majority of Slack users either have a direct public IP or a convenient way to conduct NAT traversal successfully.

Google's libjingle documentation[1] alludes to a statistic that says that "8% of connection attempts require an intermediary relay server".

This will obviously depend on the user demographic, I would guess that users on corporate connections are probably less likely to form successful p2p connections.

[1]: https://developers.google.com/talk/libjingle/important_conce...


A custom intermediary needs to perform some expensive operations, such as decryption and re-encryption of the DTLS and SRTP going through it. It's much simpler and cheaper to just forward packets.


I think many TURN servers do this, but Slack's didn't.


Edit: nvm, confused STUN and TURN


It sounds like the TURN server is effectively acting like an (open) proxy. Wouldn't that mean the operator still has to have the infrastructure to handle the connections + traffic?

I'm assuming, perhaps incorrectly, that most of these RTC connections are happening over NAT and therefore usually go over TURN rather than connecting directly. Even if that's not the case, why not try a direct p2p connection first, then fall back to routing through an application-specific proxy, which can have tighter controls on who connects to whom and what payloads they send?


Sorry, my bad, I indeed confused STUN and TURN.


Am I wrong, or are security researchers not paid well? I mean, I'm not sure how much this bug is worth, but $3500 definitely looks like a small number.


Yeah, I had the same thought. For something as big as this? Should be at least 2 more zeros imo.


I don't understand what's so big about this. It's akin to telling someone that they forgot to use passwords on their mongodb database. Does that really deserve $350k compensation?


Depending on what a black hat could do with the data in your database, it might absolutely be worth it. I understand that 350k is way more than bug bounties usually pay, but 3.5k is taking advantage of people's ethics to outsource your security.

Let's put it another way: the team who discovered this has skills WELL worth 350k for a year's worth of work. How many security issues would they have to catch for it to be "worth it"? Maybe more than 1, but 100 show-stopping vulnerabilities for 350k is crazy to me.

edit: ESPECIALLY slack, if it was possible to use this to get access to any chat logs.


No, none of this is how vulnerability research compensation works.


Instead of sending traffic anywhere, why don't they have the destination address first send a (slack-authenticated) request to the TURN server saying "I'm happy to receive traffic from [SOURCE]" and then a temporary window is opened for [SOURCE] to open a connection to that specific destination.


tldr-

November 2017: added TURN abuse to our stunner toolset

December 2017: discovered and reported TURN vulnerability in private customer of Enable Security

February 2018: briefly tested Slack and discovered the vulnerability

April 2018: submitted our report to Slack, helped them reproduce and address the issue through various rounds of testing

May 2018: Slack pushed patch to live servers which was retested by Enable Security

January 2020: asked to publish report

February 2020: disclosure delayed by HackerOne/Slack

March 2020: report published


Don't use indentation for formatting linebreaks. It breaks HN layout.

Just add extra linebreaks.


I've fixed the formatting now.


Kind of important, almost title-edit-worthy, to note that this is an exploit and research that went on from late 2017 until about mid-2018, no? Not that this is some current thing.


Published March 2020. It's not about some Slack issue that is irrelevant now. It's about misconfigured TURN servers, and at least for me it's a current thing ;-)



