I have never felt a greater sense of inadequacy in cyber security than after reading this article. The level of sophistication Eddie showed in attacking the service is simply amazing, and the mitigation techniques used were things I never would have considered. I thought zip/tar bombs were just relics of yesteryear that older folks talk about when they discuss how fun it was to prank the new hires.
Serious question: how does one begin to gain the knowledge necessary to mitigate such an attack, and is it something that developers should be more familiar with?
Zip bombs are a classic way to crash any web service that allows you to upload files. Modern AV will sometimes fuck up and bite into the file (normally in the legacy fields, since they require 'brand name', 'well known' antivirus like 'Norton').
Especially if you are aware that they open the files to "extract info" from them. You can modify the file extension to the correct type and let it rip.
I found this: https://github.com/abdulfatir/ZipBomb which I will be looking into today! Are zip bombs typically something a developer should be actively writing protections against? Or does a library like Helmet typically provide protection against these attack vectors?
Protection generally comes down to the fact that either the AV or your library will blow up on the file. You just need to ensure that such an explosion does not take your service down with it.
And try to make it explode quickly -- if it fails slowly, it can be used to DDoS you by getting all your worker threads to spend most of their time on the uploaded files.
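To make that fail-fast idea concrete, here's a rough Python sketch: cap the total decompressed size and the per-file compression ratio, and bail out as soon as either budget is exceeded. The limits are made-up numbers, not recommendations.

    import zipfile

    # Made-up limits -- tune to what your service can actually afford.
    MAX_TOTAL_BYTES = 100 * 1024 * 1024   # refuse anything expanding past ~100 MB
    MAX_RATIO = 100                        # refuse suspicious per-file compression ratios
    CHUNK = 64 * 1024

    def reject_zip_bombs(archive_path):
        """Fail fast on archives that expand past our budget, before any
        downstream library or AV scanner gets to choke on them."""
        with zipfile.ZipFile(archive_path) as zf:
            counted = 0
            for info in zf.infolist():
                # Declared sizes in the central directory can be forged,
                # so treat the ratio check only as an early exit...
                if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                    raise ValueError(f"{info.filename}: compression ratio looks like a bomb")
                # ...and count the real bytes as they stream out of the decompressor.
                with zf.open(info) as member:
                    while chunk := member.read(CHUNK):
                        counted += len(chunk)
                        if counted > MAX_TOTAL_BYTES:
                            raise ValueError("archive expands past the size budget")

The same budgeting idea applies to nested archives; running the extraction in a separate, memory- and CPU-limited worker process is the usual second layer of defence.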
Not trying to be lazy here, but what are some legitimate resources to begin learning? I'm willing to wade through the complexity, so white papers or research is also very welcome.
OWASP presentations, Black Hat presentations, etc. usually give you a general idea of what people are seeing, and then you research the specific attack(s) in depth. This isn't the sort of thing you can commit to memory.
Pretty easy explanation: you've bumped into a few of the ~dozen people who are crawling hidden services for research or law enforcement purposes.
When you publish your HSDir they'll come and crawl, and chances are none of them were expecting a 50PB archive.org mirror and just got stuck.
It's likely that once the operators of each crawler realized this HS was an archive.org mirror they stopped the crawls.
The early version of a crawler I ran across hidden services would have tripped up in exactly this way[0]
Everything else in this post is either a misunderstanding of Tor[1] or plain paranoia.
[1] The top exit nodes have little to do with who is crawling or attacking a hidden service; France and Germany feature heavily among nodes because of the many cheap Tor-friendly hosts there; there is nothing 'unusual' about unnamed nodes; and the AS confusion is just someone doing a good job of staying anonymous - thanks for reporting them.
I'd say not holding to the standards of robots.txt and 403-Forbidden is quite malicious, just not evil or bad. If you build a crawler, you should play nice. But bots A through D were easily discouraged.
Eddie however is another problem. It overloads the network, doesn't crawl and doesn't parse the responses. This is not crawler behaviour...
The rest of the post is solid inductive reasoning (from my perspective): the bot is identifiable by its behaviour. It has a faster response time than a source-relay-source round trip, so the bot must originate there.
This is supported by the fact that the anonymous relays were set up just before the attack, all at the same time, and that after the attack stopped, the majority of traffic through those relays stopped as well.
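To spell out the timing argument with some concrete numbers (purely illustrative, not measurements from the post):

    # Purely illustrative numbers -- not measurements from the post.
    observed_gap_ms = 40       # gap between our reply and the bot's next request
    min_circuit_rtt_ms = 250   # a full source -> relay -> source round trip rarely beats this

    if observed_gap_ms < min_circuit_rtt_ms:
        # The client reacts faster than a full circuit round trip allows,
        # so it is likely sitting on (or right next to) the relay itself.
        print("client is probably local to the relay")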
There are also ways to keep your registration private without resorting to fraud. Though probably a number of people think of this as the 'easy' solution.
> I'd say not holding to the standards of robots.txt and 403-Forbidden is quite malicious
Most hidden services don't publish robots files. The only ones that do are the proxy services (which are hidden services but not usually 'hidden'). The purpose of the proxying is to find, discover and monitor what are usually illegal or malicious services.
I don't think there are legitimate crawlers on hidden services - there are a couple of drug market search engines but they identify themselves outside of robots.txt
It's really difficult to run a large-scale hidden service because of this - you need to be able to throttle or block connections, but not based on the inbound circuit. You also need to set up guards (which the OP makes no mention of).
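For what it's worth, here is a rough sketch of the kind of global throttle this forces on you: every hidden-service request arrives from 127.0.0.1, so you can't key rate limits on a client address and end up capping total work instead. The rates are placeholders.

    import threading, time

    class GlobalTokenBucket:
        """Every hidden-service request looks like it comes from 127.0.0.1,
        so the only safe throttle is a global one: cap total work per second
        regardless of who is asking. Rates here are placeholders."""

        def __init__(self, rate_per_sec=50, burst=100):
            self.rate = rate_per_sec
            self.capacity = burst
            self.tokens = float(burst)
            self.updated = time.monotonic()
            self.lock = threading.Lock()

        def allow(self):
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return True
                return False  # shed load: answer 503 or just drop the connection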
> It overloads the network, doesn't crawl and doesn't parse the responses.
It's likely adding those later responses into a crawl queue that is tens of thousands of URLs long.
Overloading the network is unintentional, usually your crawling is throttled by your circuit.
> I'd say not holding to the standards of robots.txt and 403-Forbidden is quite malicious, just not evil or bad. If you build a crawler, you should play nice.
> A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to info@archive.org). As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly.
> If I could easily tear down the entire tunnel from the remote client to my hidden service, then the delay to rebuild the tunnel would mitigate the resource exhaustion attack ... For example, if I see hostile activity from 127.0.0.1:12345, then I want to close the entire Tor connection associated with this port ... forcing him to renegotiate the entire tunnel.
This seems like a great suggestion for Tor. I hope the author will get in touch via the mailing list and see what solutions might be possible.
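For anyone who wants to experiment in that direction today, the control port already lets you close circuits wholesale. A rough sketch with the stem library follows; note it does not solve the hard part the author describes, which is mapping a hostile 127.0.0.1:port stream back to its specific circuit. It assumes ControlPort 9051 with cookie or password auth configured in torrc.

    from stem import CircStatus, CircPurpose
    from stem.control import Controller

    # Blunt-instrument sketch: tear down *all* established rendezvous circuits
    # for our hidden service, forcing every remote client to rebuild.
    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        for circ in controller.get_circuits():
            if circ.status == CircStatus.BUILT and circ.purpose == CircPurpose.HS_SERVICE_REND:
                controller.close_circuit(circ.id)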
This is the second time in a week now I've seen this guy talking about Tor on HN. He just seems to not understand Tor.
> He's exploiting a vulnerability in the Tor daemon
This is a vulnerability present in literally any proxy and is a limitation of the operating system: once you have opened as many sockets as the OS allows, you can't open any more. You cannot have anonymous, unlinkable connections and tracking pseudonyms at the same time.
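The limit in question is the per-process file descriptor cap. On Linux/Unix you can inspect it and raise the soft limit at startup, for example (sketch only):

    import resource

    # Sockets count against the per-process file descriptor limit; once a
    # proxy hits it, accept() and connect() start failing.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-file limit: soft={soft} hard={hard}")

    # A service can raise its soft limit up to the hard limit at startup.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))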
> The TorStatus page has no country associated with the ASN information
I know that for atlas.torproject.org we use the MaxMind GeoLite GeoIP service. If an address is not listed there, no information will show. This is more common than you would think, especially for Tor relays, which are commonly hosted on smaller ASes than MaxMind cares about.
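That failure mode is easy to reproduce with MaxMind's geoip2 library; the database path and address below are placeholders.

    import geoip2.database
    import geoip2.errors

    # Placeholder path and address -- the point is the AddressNotFoundError branch.
    reader = geoip2.database.Reader("GeoLite2-Country.mmdb")
    try:
        print(reader.country("198.51.100.7").country.iso_code)
    except geoip2.errors.AddressNotFoundError:
        # The address simply isn't in the database, so the status page has
        # no country to show -- not necessarily anything sinister.
        print("no country data for this address")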
The mitigation goes against my instinct to tar-pit, but in this case it is the server, not the client, that is resource limited. Very interesting, and bloody annoying that someone would make such an effort to break a tool like this.
Same here - but the site was asking me to identify myself with a cert. So the other way round to what you are describing. Did not have the time to follow through - but it's certainly very weird behavior. Could be some bad guys trying to identify who is stealing their show...
Most likely there's just something hosted on the same IP address that makes legitimate use of the "client certificates" feature of TLS. In order for that to work, the server has to express an interest in client certs, and that happens early in the TLS handshake, IIRC before SNI has been resolved - so even if you only want to use them on one domain, your server will always ask for them.
The way it's meant to work, the server can specify which certificate authorities it accepts client certs from, and your browser will only prompt you to pick a cert if you have one loaded from one of those CAs - if you don't, you won't even know the server's asking; in practice, some browsers will show the dialog in any case. ISTR some versions of Safari act like that.
(I ran into that at work - we were setting up a web API authenticated with TLS client certs, and started getting bug reports from (largely non-technical) users, completely befuddled by these dialogs that had started popping up for them on our human-facing domains; we ended up provisioning a dedicated IP just for the API to work around it.)
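The "server asks, browser decides whether to prompt" behaviour corresponds to requesting, but not requiring, client certificates. A minimal sketch with Python's ssl module (file names are placeholders):

    import ssl

    # Placeholder file names; the point is the verify_mode / CA-list pairing.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server.crt", "server.key")

    # Ask for (but don't require) a client certificate. The CertificateRequest
    # goes out on every handshake for this listener, which is why a cert-using
    # service sharing the IP can trigger prompts on your other domains.
    ctx.verify_mode = ssl.CERT_OPTIONAL

    # Advertise which CAs we accept client certs from; well-behaved browsers
    # only prompt users who actually hold a cert issued by one of these.
    ctx.load_verify_locations("trusted-client-ca.pem")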
The author has strong opinions about the security of standard TLS/SSL. He activated client-side certificates for better security. The setup is mentioned in one of his blog posts, his opinion in another.
The fake IP address whois was very interesting. While fake whois records for domain names are common and not that big a deal, a fake IP address whois means there was an ISP out there willing to endanger its peering. It feels similar to a CA issuing bad certificates, and I wish RIPE would act the way browsers do. There are few enough IPv4 addresses that we can afford to be picky about them in clear cases like this.
I played around with Tor bot code at some point and got rates up to tens of thousands of requests per second and hundreds of megabits per second, so the attacks don't sound too serious in this case. Modifying Tor itself to allow higher data rates is also possible if you're already using anonymous systems; then you can avoid the slowness caused by the onion hops.
This might be a stupid question, but it sounds like those attackers access the Tor server directly (without using relays). If this is in fact the case, why does he not just ban the IPs of those offending, seemingly private relays? Wouldn't that solve the problem until they get a new IP?
This is also as I understood it. It is possible to have a single hop circuit to an exit, but I don't think you can do single hops to hidden services yet. The hidden service would also have to be explicitly configured to be a single hop hidden service (where it acts as its own rendezvous point).
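For reference, Tor 0.2.9 and later did add this as "single onion services", which have to be opted into explicitly and deliberately give up the service's own anonymity. The torrc sketch below shows the relevant options (paths and ports are placeholders):

    # Single onion service: the service-side hops are removed, so the
    # service itself is no longer anonymous. Both options must be set
    # together, and Tor refuses to run as a client (SOCKSPort must be 0).
    HiddenServiceNonAnonymousMode 1
    HiddenServiceSingleHopMode 1
    SOCKSPort 0

    HiddenServiceDir /var/lib/tor/my_service/
    HiddenServicePort 80 127.0.0.1:8080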
I don't see any strong evidence that the box itself must be a high speed relay, and in fact, I believe that it is his service that chooses the 3 hop path to the rendezvous point and his Tor daemon that cryptographically verifies that path.
I was not implying one should do otherwise. I was just pointing out that servers that respond to GET will not always respond to HEAD as expected. Some sites treat it the same as GET. Others may not allow it. For example, Amazon responds with 405 MethodNotAllowed.
Anyway, it looks like they fixed the problem or I was mistaken.
I will need to find another example.
Meanwhile, looking on Stack Exchange, one can still see people who run websites asking whether to block or "turn off" HEAD as recently as last year.
If a user expects every website to respond properly to a HEAD request, then the user might be occasionally "surprised", because not every person running a website understands or agrees on how HEAD can be useful. Sadly, GET is the only method that a user can expect to work across all websites.
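A quick way to see the difference for yourself (the URL is just an example):

    import requests

    # Compare HEAD and GET for the same URL; some sites answer HEAD with
    # 405 Method Not Allowed or otherwise behave differently from GET.
    url = "https://example.com/"
    head = requests.head(url, allow_redirects=False, timeout=10)
    get = requests.get(url, allow_redirects=False, timeout=10)
    print("HEAD:", head.status_code)
    print("GET: ", get.status_code)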