I have never felt a greater sense of inadequacy in cyber security than after reading this article. The level of sophistication Eddie showed in attacking the service is simply amazing, and the mitigation techniques used were things I never would have considered. I thought zip/tar bombs were just relics of yesteryear that older folks talk about when they discuss how fun it was to prank the new hires.
Serious question: how does one begin to gain the knowledge necessary to mitigate such an attack, and is it something that developers should be more familiar with?
Zip bombs are a classic way to crash any web service that allows you to upload files. Modern AV will sometimes fuck up and bite into the file (normally in the legacy fields, since they require 'brand name', 'well known' antivirus like 'Norton').
Especially if you are aware that they open the files to "extract info" from them. You can modify the file extension to the correct type and let it rip.
I found this: https://github.com/abdulfatir/ZipBomb which I will be looking into today! Are zip bombs typically something a developer should be actively writing protections against? Or does a library like Helmet typically provide protection against these attack vectors?
Protection generally comes down to the fact that either the AV or your library will blow up on the file. You just need to ensure that such an explosion does not take your service down with it.
And try to make it explode quickly -- if it fails slowly, it can be used to DDoS you by getting all your worker threads to spend most of their time on the uploaded files.
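To make that fail-fast idea concrete, here's a rough Python sketch: cap the total decompressed size and the per-file compression ratio, and bail out as soon as either budget is exceeded. The limits are made-up numbers, not recommendations.

    import zipfile

    # Made-up limits -- tune to what your service can actually afford.
    MAX_TOTAL_BYTES = 100 * 1024 * 1024   # refuse anything expanding past ~100 MB
    MAX_RATIO = 100                        # refuse suspicious per-file compression ratios
    CHUNK = 64 * 1024

    def reject_zip_bombs(archive_path):
        """Fail fast on archives that expand past our budget, before any
        downstream library or AV scanner gets to choke on them."""
        with zipfile.ZipFile(archive_path) as zf:
            counted = 0
            for info in zf.infolist():
                # Declared sizes in the central directory can be forged,
                # so treat the ratio check only as an early exit...
                if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                    raise ValueError(f"{info.filename}: compression ratio looks like a bomb")
                # ...and count the real bytes as they stream out of the decompressor.
                with zf.open(info) as member:
                    while chunk := member.read(CHUNK):
                        counted += len(chunk)
                        if counted > MAX_TOTAL_BYTES:
                            raise ValueError("archive expands past the size budget")

The same budgeting idea applies to nested archives; running the extraction in a separate, memory- and CPU-limited worker process is the usual second layer of defence.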
Not trying to be lazy here, but what are some legitimate resources to begin learning? I'm willing to wade through the complexity, so white papers or research is also very welcome.
OWASP presentations, Black Hat presentations, etc. usually give you a general idea of what people are seeing, and then you research the specific attack(s) in depth. This isn't the sort of thing you can commit to memory.
Pretty easy explanation: you've bumped into a few of the ~dozen people who are crawling hidden services for research or law enforcement purposes.
When you publish your HSDir they'll come and crawl, and chances are none of them were expecting a 50PB archive.org mirror and just got stuck.
It's likely that once the operators of each crawler realized this HS was an archive.org mirror they stopped the crawls.
The early version of a crawler I ran across hidden services would have tripped up in exactly this way[0]
Everything else in this post is either a misunderstanding of Tor[1] or plain paranoia.
[1] The top exit nodes have little to do with who is crawling or attacking a hidden service; France and Germany feature heavily among nodes because of the many cheap Tor-friendly hosts there; there is nothing 'unusual' about unnamed nodes; and the AS confusion is just someone doing a good job of staying anonymous - thanks for reporting them.
I'd say not holding to the standards of robots.txt and 403-Forbidden is quite malicious, just not evil or bad. If you build a crawler, you should play nice. But bots A through D were easily discouraged.
Eddie however is another problem. It overloads the network, doesn't crawl and doesn't parse the responses. This is not crawler behaviour...
The rest of the post is solid inductive reasoning (from my perspective): the bot is identifiable by its behaviour. It has a faster response time than a source-relay-source round trip, so the bot must originate there.
This is supported by the fact that the anonymous relays were set up just before the attack, all at the same time, and that after the attack stopped, the majority of traffic through those relays stopped as well.
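To spell out the timing argument with some concrete numbers (purely illustrative, not measurements from the post):

    # Purely illustrative numbers -- not measurements from the post.
    observed_gap_ms = 40       # gap between our reply and the bot's next request
    min_circuit_rtt_ms = 250   # a full source -> relay -> source round trip rarely beats this

    if observed_gap_ms < min_circuit_rtt_ms:
        # The client reacts faster than a full circuit round trip allows,
        # so it is likely sitting on (or right next to) the relay itself.
        print("client is probably local to the relay")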
There are also ways to keep your registration private without resorting to fraud. Though probably a number of people think of this as the 'easy' solution.
> I'd say not holding to the standards of robots.txt and 403-Forbidden is quite malicious
Most hidden services don't publish robots files. The only ones that do are the proxy services (which are hidden services but not usually 'hidden'). The purpose of the proxying is to find, discover and monitor what are usually illegal or malicious services.
I don't think there are legitimate crawlers on hidden services - there are a couple of drug market search engines but they identify themselves outside of robots.txt
It's really difficult to run a large-scale hidden service because of this - you need to be able to throttle or block connections, but not based on the inbound circuit. You also need to set up guards (which the OP makes no mention of).
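For what it's worth, here is a rough sketch of the kind of global throttle this forces on you: every hidden-service request arrives from 127.0.0.1, so you can't key rate limits on a client address and end up capping total work instead. The rates are placeholders.

    import threading, time

    class GlobalTokenBucket:
        """Every hidden-service request looks like it comes from 127.0.0.1,
        so the only safe throttle is a global one: cap total work per second
        regardless of who is asking. Rates here are placeholders."""

        def __init__(self, rate_per_sec=50, burst=100):
            self.rate = rate_per_sec
            self.capacity = burst
            self.tokens = float(burst)
            self.updated = time.monotonic()
            self.lock = threading.Lock()

        def allow(self):
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return True
                return False  # shed load: answer 503 or just drop the connection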
> It overloads the network, doesn't crawl and doesn't parse the responses.
It's likely adding those later responses into a crawl queue that is tens of thousands of URLs long.
Overloading the network is unintentional, usually your crawling is throttled by your circuit.
> I'd say not holding to the standards of robots.txt and 403-Forbidden is quite malicious, just not evil or bad. If you build a crawler, you should play nice.
> A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to info@archive.org). As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly.
> If I could easily tear down the entire tunnel from the remote client to my hidden service, then the delay to rebuild the tunnel would mitigate the resource exhaustion attack ... For example, if I see hostile activity from 127.0.0.1:12345, then I want to close the entire Tor connection associated with this port ... forcing him to renegotiate the entire tunnel.
This seems like a great suggestion for Tor. I hope the author will get in touch via the mailing list and see what solutions might be possible.
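For anyone who wants to experiment in that direction today, the control port already lets you close circuits wholesale. A rough sketch with the stem library follows; note it does not solve the hard part the author describes, which is mapping a hostile 127.0.0.1:port stream back to its specific circuit. It assumes ControlPort 9051 with cookie or password auth configured in torrc.

    from stem import CircStatus, CircPurpose
    from stem.control import Controller

    # Blunt-instrument sketch: tear down *all* established rendezvous circuits
    # for our hidden service, forcing every remote client to rebuild.
    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        for circ in controller.get_circuits():
            if circ.status == CircStatus.BUILT and circ.purpose == CircPurpose.HS_SERVICE_REND:
                controller.close_circuit(circ.id)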
This is the second time in a week now I've seen this guy talking about Tor on HN. He just seems to not understand Tor.
> He's exploiting a vulnerability in the Tor daemon
This is a vulnerability present in literally any proxy and is a limitation of the operating system: once you have opened as many sockets as the OS allows, you can't open any more. You cannot have anonymous, unlinkable connections and tracking pseudonyms at the same time.
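The limit in question is the per-process file descriptor cap. On Linux/Unix you can inspect it and raise the soft limit at startup, for example (sketch only):

    import resource

    # Sockets count against the per-process file descriptor limit; once a
    # proxy hits it, accept() and connect() start failing.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-file limit: soft={soft} hard={hard}")

    # A service can raise its soft limit up to the hard limit at startup.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))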
> The TorStatus page has no country associated with the ASN information
I know that for atlas.torproject.org we use the MaxMind GeoLite GeoIP service. If an address is not listed there, no information will show. This is more common than you would think, especially for Tor relays, which are commonly hosted on smaller ASes than MaxMind cares about.
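That failure mode is easy to reproduce with MaxMind's geoip2 library; the database path and address below are placeholders.

    import geoip2.database
    import geoip2.errors

    # Placeholder path and address -- the point is the AddressNotFoundError branch.
    reader = geoip2.database.Reader("GeoLite2-Country.mmdb")
    try:
        print(reader.country("198.51.100.7").country.iso_code)
    except geoip2.errors.AddressNotFoundError:
        # The address simply isn't in the database, so the status page has
        # no country to show -- not necessarily anything sinister.
        print("no country data for this address")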
The mitigation goes against my instinct to tar-pit, but in this case it is the server, not the client, that is resource limited. Very interesting, and bloody annoying that someone would make such an effort to break a tool like this.
Same here - but the site was asking me to identify myself with a cert. So the other way round to what you are describing. Did not have the time to follow through - but it's certainly very weird behavior. Could be some bad guys trying to identify who is stealing their show...
Most likely there's just something hosted on the same IP address that makes legitimate use of the "client certificates" feature of TLS. In order for that to work, the server has to express an interest in client certs, and that happens early in the TLS handshake, IIRC before SNI has been resolved - so even if you only want to use them on one domain, your server will always ask for them.
The way it's meant to work, the server can specify which certificate authorities it accepts client certs from, and your browser will only prompt you to pick a cert if you have one loaded from one of those CAs - if you don't, you won't even know the server's asking; in practice, some browsers will show the dialog in any case. ISTR some versions of Safari act like that.
(I ran into that at work - we were setting up a web API authenticated with TLS client certs, and started getting bug reports from (largely non-technical) users, completely befuddled by these dialogs that had started popping up for them on our human-facing domains; we ended up provisioning a dedicated IP just for the API to work around it.)
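The "server asks, browser decides whether to prompt" behaviour corresponds to requesting, but not requiring, client certificates. A minimal sketch with Python's ssl module (file names are placeholders):

    import ssl

    # Placeholder file names; the point is the verify_mode / CA-list pairing.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server.crt", "server.key")

    # Ask for (but don't require) a client certificate. The CertificateRequest
    # goes out on every handshake for this listener, which is why a cert-using
    # service sharing the IP can trigger prompts on your other domains.
    ctx.verify_mode = ssl.CERT_OPTIONAL

    # Advertise which CAs we accept client certs from; well-behaved browsers
    # only prompt users who actually hold a cert issued by one of these.
    ctx.load_verify_locations("trusted-client-ca.pem")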
The author has strong opinions about the security of standard TLS/SSL. He activated client-side certificates for better security. The setup is mentioned in one of his blog posts, his opinion in another.
The fake IP address whois was very interesting. While fake whois records for domain names are common and not that big a deal, a fake IP address whois means there was an ISP out there willing to endanger its peering. It feels similar to a CA issuing bad certificates, and I wish RIPE would act the way browsers do. There are few enough IPv4 addresses that we can afford to be picky about them in clear cases like this.
I played around with Tor bot code at some point and got rates up to tens of thousands of requests per second and hundreds of megabits per second, so the attacks don't sound too serious in this case. Modifying Tor itself to allow higher data rates is also possible if you're already using anonymous systems; then you can avoid the slowness caused by the onion hops.
This might be a stupid question, but it sounds like those attackers access the Tor server directly (without using relays). If this is in fact the case, why does he not just ban the IPs of those offending, seemingly private relays? Wouldn't that solve the problem until they get a new IP?
This is also as I understood it. It is possible to have a single hop circuit to an exit, but I don't think you can do single hops to hidden services yet. The hidden service would also have to be explicitly configured to be a single hop hidden service (where it acts as its own rendezvous point).
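For reference, Tor 0.2.9 and later did add this as "single onion services", which have to be opted into explicitly and deliberately give up the service's own anonymity. The torrc sketch below shows the relevant options (paths and ports are placeholders):

    # Single onion service: the service-side hops are removed, so the
    # service itself is no longer anonymous. Both options must be set
    # together, and Tor refuses to run as a client (SOCKSPort must be 0).
    HiddenServiceNonAnonymousMode 1
    HiddenServiceSingleHopMode 1
    SOCKSPort 0

    HiddenServiceDir /var/lib/tor/my_service/
    HiddenServicePort 80 127.0.0.1:8080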
I don't see any strong evidence that the box itself must be a high speed relay, and in fact, I believe that it is his service that chooses the 3 hop path to the rendezvous point and his Tor daemon that cryptographically verifies that path.
I was not implying one should do otherwise. I was just pointing out that servers that respond to GET will not always respond to HEAD as expected. Some sites treat it the same as GET. Others may not allow it. For example, Amazon responds with 405 MethodNotAllowed.
Anyway, it looks like they fixed the problem or I was mistaken.
I will need to find another example.
Meanwhile, looking on Stack Exchange, one can still see people who run websites asking whether to block or "turn off" HEAD as recently as last year.
If a user expects every website to respond properly to a HEAD request, then the user might be occasionally "surprised", because not every person running a website understands or agrees on how HEAD can be useful. Sadly, GET is the only method that a user can expect to work across all websites.
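A quick way to see the difference for yourself (the URL is just an example):

    import requests

    # Compare HEAD and GET for the same URL; some sites answer HEAD with
    # 405 Method Not Allowed or otherwise behave differently from GET.
    url = "https://example.com/"
    head = requests.head(url, allow_redirects=False, timeout=10)
    get = requests.get(url, allow_redirects=False, timeout=10)
    print("HEAD:", head.status_code)
    print("GET: ", get.status_code)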