Hacker News
Jerks on the Internet: what my first DDoS taught me (sergiomattei.com)
208 points by sergiomattei on April 3, 2019 | 91 comments



If you have some kind of expensive request, use fair queuing by IP address. If someone has a request pending, more requests from the same source go behind IP addresses with fewer requests. So each IP address competes with itself, not others.

For some reason, this isn't done much. I have it on a site of mine. I didn't notice for a week that someone was making a huge number of requests and not even waiting for the task to complete. It didn't hurt anything.
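A minimal sketch of that per-IP fair queuing idea (the structure and names are my own illustration, not the parent's actual implementation): each IP gets its own queue, and a worker services the IPs round-robin, so a client hammering the endpoint only delays itself.

```python
import threading
from collections import defaultdict, deque

class FairQueue:
    """Round-robin fair queue keyed by client IP: each IP's pending
    requests wait behind that IP's earlier requests, not behind other
    clients' backlogs."""

    def __init__(self):
        self.lock = threading.Lock()
        self.queues = defaultdict(deque)   # ip -> pending requests
        self.rotation = deque()            # IPs with work, serviced in turn

    def put(self, ip, request):
        with self.lock:
            if not self.queues[ip]:
                self.rotation.append(ip)   # first pending request for this IP
            self.queues[ip].append(request)

    def get(self):
        """Pop the next (ip, request), cycling across IPs; None when idle."""
        with self.lock:
            if not self.rotation:
                return None
            ip = self.rotation.popleft()
            request = self.queues[ip].popleft()
            if self.queues[ip]:
                self.rotation.append(ip)   # still has work; back of the line
            else:
                del self.queues[ip]
            return ip, request
```

An abusive IP that enqueues 1,000 requests still only gets one slot per rotation, so everyone else's latency stays flat.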


It’s not used because any serious attack is going to come from multiple unrelated sources, think a botnet full of compromised IoT devices hitting your server with 20TB a second worth of requests. So you might as well plan for that scenario instead.


That's an entirely different kind of attack. If your server is being flooded with 20TB/s of traffic, there's nothing you can do on the box itself to fix things. Whatever you do, the legitimate requests won't be able to get through.

If however your DOS attack is an attacker making lower-volume CPU-expensive requests on your site, there's plenty of things to help mitigate the assault.


That's the key difference between a DOS and a DDOS, yes. Since I can't load the OP article on this machine for some reason, I can't review the symptoms described. The title of the article does say DDOS, however, which is focused more on saturating bandwidth, not CPU.


The headline and article itself do indeed (incorrectly) call it a DDOS attack. The attack itself was very much non-distributed, and involved a single malicious actor `curl`ing an expensive API endpoint in a loop.


> The title of the article does say DDOS, however, which is focused more on saturating bandwidth, not CPU.

Sadly DDoS has at times been used as a blanket term that also includes DoS.


Those requests seldom get far enough to start significant server activity. It's the ones that look like legit requests that are the problem.


Only a small percentage of a volumetric attack has to get through to take you down. Also, depending on the attack, they probably all are "legit requests."


If the source IP is fake, the request can't get beyond the first packet. Those get filtered out easily. That's Cloudflare's main offering.


Seldom isn’t good enough. When it comes to security you have to be right 100% of the time. An attacker only has to be right once. Good luck.


That's not true. Security is always a trade-off between effort invested and probability of a possible breach.

There is no 100% secure system.


This is not true though. Security is about mitigating threats at some cost. There are some threats you can't mitigate cost-effectively, and some you can't mitigate at all.


It's about layers, which include real security and sometimes trickery. The statement is true, though, about 100% and the attacker only needing to succeed once, but some tact may be in order. There are procedures that can reduce attack vectors, minimizing damage and minimizing successes.


True, fair queuing won't solve the problem completely, but it will solve the elephant- vs. mice-flow problem, i.e. keep your interactive traffic from suffering queueing latency behind a simultaneous big transfer.


Has anyone seen a 20TB attack?


The biggest DDoS attack to date took place in February of 2018. This attack targeted GitHub, a popular online code management service used by millions of developers. At its peak, this attack saw incoming traffic at a rate of 1.3 terabits per second (Tbps), sending packets at a rate of 126.9 million per second.

https://www.cloudflare.com/learning/ddos/famous-ddos-attacks...


Never mind that the units would probably be 20Tb/s, it wouldn't be impossible. We saturated our gigabit Ethernet with 8 image-upload workers from a Flask app. We could have saturated a 20Tb/s line (if such a thing existed) with 160,000 workers. I've seen botnets for rent with something like 30k bots, which means you might be able to rent enough machines with enough bandwidth to saturate 20Tb/s for maybe $1k/hr; multiply by 8 if you really mean 20TB/s. I do think 20TB starts getting into big-player territory though, especially since anyone with that much bandwidth is going to have teams dedicated to mitigating these kinds of problems.


And even if you haven't had the time or knowledge to set up fair queuing, this attack was very light, and an awk log-parsing script that appends IPs to a firewall would have been more than sufficient.

Certainly no need for cloudflare
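A rough sketch of such an awk-based approach (the log path, threshold, and the iptables ban are illustrative assumptions, and the log format is assumed to have the client IP in the first field):

```shell
# Ban any IP with more than THRESHOLD hits in the access log.
LOG=/var/log/nginx/access.log   # assumed path; adjust for your server
THRESHOLD=1000

awk -v t="$THRESHOLD" \
    '{ hits[$1]++ } END { for (ip in hits) if (hits[ip] > t) print ip }' \
    "$LOG" 2>/dev/null |
while read -r ip; do
    # Drop further traffic from the offender (requires root).
    iptables -I INPUT -s "$ip" -j DROP
done
```

Run it from cron every few minutes and it covers the same ground as a very basic fail2ban jail.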


How have you set it up, technology-wise?


Hope 10% of this might be useful to you.

1) The three most important metrics for any endpoint are error rate, latency, and throughput. So I hope you've learned not to be surprised that abnormal throughput (whether from DDoS attacks or friendly n+1 queries) is a common error condition.

2) Banning IP addresses is useless and often counterproductive. If possible, short-circuit requests from an IP so you can isolate them and then determine how much damaging knowledge they have (while letting them think they're still hitting your real service).

3) Feature development is a great goal, but don't forget that the most important feature is availability. Spending an extra 30 minutes to consider things like pagination ends up being worth it if you think you might get attacked more than 0 times per year.


>short-circuit requests from an ip

What does that mean?


I’d guess serving them a static placeholder or cached result to prevent them from hammering the DB?


In an electrical circuit you have the defined path a current is supposed to take. A short circuit is when the current takes a shorter path to the ground. So A->B is the normal route a request would make. Short circuiting in this regard means A->C for that one ip. The goal is to reduce load on your server (IE you return a page that the attacker think is still valid, but has reduced load on your infra).
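As a hypothetical illustration of that A->C routing, here is a WSGI-style middleware sketch where flagged IPs get a cheap canned response instead of reaching the app and database (the flagged set and placeholder payload are made up for the example):

```python
import json

FLAGGED_IPS = {"203.0.113.7"}  # hypothetical; fed by whatever detects abuse

def short_circuit(app):
    """WSGI middleware sketch: flagged IPs receive a plausible-looking
    canned 200 instead of reaching the real app (and the DB behind it)."""
    def wrapper(environ, start_response):
        if environ.get("REMOTE_ADDR") in FLAGGED_IPS:
            # Looks like a normal response to the attacker, costs ~nothing.
            body = json.dumps({"results": [], "count": 0}).encode()
            start_response("200 OK", [("Content-Type", "application/json")])
            return [body]
        return app(environ, start_response)
    return wrapper
```

The attacker keeps seeing 200s, so they have no signal to rotate IPs, while your infrastructure load for their traffic drops to near zero.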


Good steps to take in the article. But I'd also add that Django REST Framework, which they seem to be using, has throttling capabilities built-in which I would have attempted before changing the API: https://www.django-rest-framework.org/api-guide/throttling/

Adding pagination seems reasonable, but it may have broken clients who didn't expect pagination to be there.
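For reference, the DRF throttling linked above is mostly a settings change; a sketch of the global configuration (the rates shown are arbitrary examples, not recommendations):

```python
# settings.py: enable DRF's built-in request throttling globally.
REST_FRAMEWORK = {
    "DEFAULT_THROTTLE_CLASSES": [
        "rest_framework.throttling.AnonRateThrottle",  # keyed by client IP
        "rest_framework.throttling.UserRateThrottle",  # keyed by user id
    ],
    "DEFAULT_THROTTLE_RATES": {
        "anon": "60/min",    # unauthenticated clients
        "user": "1000/day",  # authenticated clients
    },
}
```

Per-view `throttle_classes` can override this where one endpoint is notably more expensive than the rest.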


Hi! Thanks for the recommendation, I did go with that approach


Consider limiting HTTP access to Cloudflare's IP range. Looking up DNS history reveals the real IP address for direct attacks.

    curl -k https://134.209.46.107/products/ -H "Host: api.getmakerlog.com"
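One way to sketch that restriction with ufw (only two Cloudflare ranges are shown as examples; pull the full, current list from cloudflare.com/ips before relying on this):

```shell
# Allow web traffic only from Cloudflare's published ranges,
# then refuse direct hits on the origin IP.
for range in 173.245.48.0/20 103.21.244.0/22; do
    ufw allow proto tcp from "$range" to any port 80,443
done
ufw deny 80/tcp
ufw deny 443/tcp
```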


Seconded. Argo is also great and makes this even easier (for $5/month).

https://www.cloudflare.com/products/argo-tunnel/


>With growth also come the assholes

That pretty much describes the whole history of the internet.


... the whole history of our species.


> Therefore, when requesting the endpoint, a massive SQL request would be made, freezing the server while the items were fetched + serialized into JSON (a Django REST Framework performance weak point).

No caching?


I was surprised to not see that in the things I will do / fix list. Caching those JSON API endpoints would have dramatically changed the ability to absorb that DDoS. If merely adding pagination brought it back to functioning (from 100% CPU to ~60%), it wasn't a very large attack and caching would have trivially handled it. Either way, even with Cloudflare and pagination, they should prioritize adding caching on the API at some point in the near-term. The relief on the database will be considerable and it'll buy a lot of API usage growth runway at almost no cost.

Since they're already using Nginx, if they don't want to bother with learning anything else, it's a couple of hours of research to learn how to set up rock solid basic caching using Nginx. It'll quickly get you 85% of the way on caching, until you need something better. Set Nginx loose to do one of the things it's very good at.
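A minimal sketch of what that basic Nginx caching might look like (paths, zone sizes, TTLs, and the upstream address are placeholder values, not the site's actual config):

```nginx
# http context: define a small on-disk cache zone.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m
                 max_size=100m inactive=10m;

server {
    location /api/ {
        proxy_cache api_cache;
        proxy_cache_key "$scheme$request_method$host$request_uri";
        proxy_cache_valid 200 10s;   # even a 10s TTL absorbs a flood
        proxy_cache_use_stale error timeout updating;
        add_header X-Cache-Status $upstream_cache_status;
        proxy_pass http://127.0.0.1:8000;   # example upstream
    }
}
```

Microcaching with a short TTL like this keeps data effectively fresh while collapsing thousands of identical requests per second into one database hit.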


Can anyone shed some light on why someone would go out of their way to conduct an attack like this? Is DoSing production web applications just a hobby for black hat jackasses with nothing better to do?


Everyone has mentioned the destructive nature of some people. However I would also point out that some use a DoS as a means to cover up trails in logs and distract the admin from another type of attack. If you're paying attention to the DoS, you might not notice any logs or alerts about someone downloading 6GB of data from your database.


Lumping DDoS in with hacking has always seemed disingenuous. skiddies DDoS things. Why? Who knows. Boredom? lulz? Perceived personal slight? "Justice"?


Pretty sure it's jackasses with nothing better to do. Either for revenge or the lolz. Probably an impulsive decision since the attacker gave up so fast.

There's not much you can achieve for your own gains with a DDoS. I've heard of rare cases of extortion or underhanded business practices to hurt competitors.


It's a rush. It's why some people with money still shoplift.

I guess humans are closer to monkeys than we like to believe and still like throwing poop at each other.


I don't know much about "blackhats" but I know people, so the answer is yes.


I helped build a website for a cryptocurrency. We were DDoSed, with the attacker contacting us via Telegram to demand around $10,000 for him to stop. We told him to stop acting like a fucking child and implemented Cloudflare.


I've seen people with substantial IT skills who simply have destructive instincts. This one guy I know was hyperactive and made clearings in forested areas to hang out in, in his spare time.


In general, about 10% of people are just drizzleshits. Plain and simple.


It's fun. Or maybe you are angry. Or maybe you just want to test whether you can break it. Or maybe you just want the programmers to feel sad, maybe because they were boring.


Fun in a same way as trashing your neighbor's car just because you like to see sparks and glass shards flying all around. Rather an indication of a sad frustrated life


>> to see sparks and glass shards flying all around

Well, no. If you _only_ liked sparks and glass shards flying all around, you'd buy yourself a car and destroy it yourself.

Destroying the car of a neighbour implies something going between you and the neighbour.


It's usually kids doing it for the same reason they graffiti or vandalise other people's stuff.


>which hosts other in-development apps too

Don't do that

>I trust my users completely.

Don't do that

>Prioritize bugfixes over new features

while that is a nice thought, it is unlikely to be so simply followed. "the road to hell is paved with good intentions"

>people editing other’s tasks for example, haha

That's a very cavalier attitude to take. Quite frankly, that should have been baked in from the get-go, with tests to verify.

You see this situation as someone being a jerk. Someone could have accidentally done the same thing due to the lack of planning on multiple levels.


It was actually all unit tested - bugs happen though, and that one was a particularly nasty one.

It was patched though and there was no evidence that anyone ever used it.


Glad we at Cloudflare could help!


I will also say Cloudflare has saved sites I’ve managed numerous times. I’ve been null routed by major data centers for a few Gbps and they wouldn’t give us any options. It was always “wait.”

Most of the DDoS providers in this space are insanely expensive so I’m very glad Cloudflare has existed!


I hope another takeaway from this was to check at a lower level much sooner. At least the way it reads, you spent quite some time suspecting your app was at fault or the tech stack was goofing out. htop would have shown high CPU usage from the DB process right away, and traffic was probably higher than usual; access logs are always a good thing to check too.


The lesson here appears to be that the balance between feature development and fixing technical debt isn't obvious: these things are measured historically, and you only know that you've got it wrong after the fact. In fact, if you ship the exact same codebase and don't suffer the DDoS, did you get the balance right after all?


-A INPUT -p tcp --dport <port> -m limit --limit 25/minute --limit-burst 100 -j ACCEPT
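Worth noting: on its own, that ACCEPT rule just stops matching once the limit is exceeded, so the overflow falls through to whatever comes next in the chain. A fuller sketch with an explicit drop (port and rates are examples):

```shell
iptables -A INPUT -p tcp --dport 443 -m limit --limit 25/minute --limit-burst 100 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j DROP
# Caveat: -m limit is a single global packet rate; per-source limiting
# needs something like -m hashlimit with --hashlimit-mode srcip.
```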


Why not just block all visitors with user agent curl?

Let me just say, that's a really dumb DDOS attack.

Edit: one other thing, we spell it psych and not sike.


>we spell it psych and not sike.

I have been informed by someone a generation younger than me that "the kids" are intentionally spelling it "sike" these days.

Wikionary lists it as a variant of "psych."

https://en.wiktionary.org/wiki/sike


I guess that depends on who is doing the spelling. Seems to me that the 80's spelling of it was "sike". This website seems to agree:

http://www.inthe80s.com/glossary.shtml


oh..gosh. well, that makes sense, in a way.

I confess I did have to google how to spell 'psych' properly. started from 'pysch'.


I always was sure it was spelled psych, since it's basically short for "psyched out" and I had always seen it spelled that way.

But the topic recently came up in conversation with my teenaged niece, she claims it's definitely spelled "sike" even though she's fully aware of the etymology.

I chocked it up to a "kids these days" generational kinda thing, like how we got "phat" in the 70s and "kewl" in the 90s, both of which are now in the Oxford English Dictionary.


>I chocked it up

chalked it up ;)


I mostly learned how to spell it recently from looking up filming location for the show Psych.


Spelling it sike is leet.


Can anyone recommend monitoring solutions to help identify these issues?


1. CloudFlare. That's like the first thing I normally do for any new project. (which he then switched to).

2. Your clients should not be the first to tell you that your site is down. Pingdom does a great job of alerting you before your customers/users do.

3. The author brought up that some of the queries weren't paginated and ran expensive SQL, so there are a few options. Since he's using Django, there are some Django-specific options in this list.

A) Implement a backend cache that will return back the JSON query (throw it on a redis). Cache and return that from the backend.

B) Add a Django throttle to the view (can be done via IP / username).

C) Enable logged in users only to access endpoint (harder to do on the fly though, since you need to make changes to your frontend). If a logged in user is causing you hell, turn off signups and kick that user off.

D) Have CloudFlare cache a public response for you on endpoints and return it (you need to make sure the JSON should always be the same for every API call though, which is very very risky).

E) Author brought up DRF JSON serialization is slow. Another alternative is to use Serpy which sees a 50-100x speedup. I'd only recommend that for complex JSON payloads. Not because it's hard, but because it's additional complexity.

The author is also using Dokku which is fine for most projects, but you'd imagine at some point it'll probably be switched onto a load balancer + web machines. Alerting can also be set on the load balancer level if it goes above % threshold.

Since he's using Dokku (so by that definition docker), they could probably use a log aggregation service that would allow him to access his logs much faster to see what's going on. Papertrail, etc.

Monitoring CPU usage would also be helpful here, but I'm not sure if Dokku allows that.


Strongly recommend Cloudflare in combination with some uptime monitoring service. Cloudflare gives you so many options on ways to mitigate attacks, and you get CDN and other services for free. Pretty great.


I recently started reading the book "Release It", which includes a lot of great techniques to avoid problems like the one described in the article at "design time". [1]

1: https://pragprog.com/book/mnee/release-it


Website doesn't work with JS turned off. It's even worse: It's one of those sites that redirects you to another page to tell you to turn JS on but doesn't allow you to go back in your browser. So even if you do turn on JS you have to go get the link again. (Many scientific journals have the same problem but with cookies instead of JS.)


Why is the text for this article 2.5" wide in a normal browser? It makes it annoying to read :(


Was the site designed only for smart phones?


Thanks for the feedback! Will modify and enlarge the font a little bit.


The font size is fine. The width of the text block is absolutely ridiculous.

http://webtypography.net/2.1.2

https://practicaltypography.com/line-length.html


I opened the print dialog, and discovered that this short piece would take 65 pages to print. Yes, that is a sign that the width of the text block is absolutely ridiculous.

I couldn't figure out what's going on with my laptop, because the inspector short-circuited this goofy behavior. On a larger screen, I notice that the ".card-content" div has a ridiculous 150px of padding. It is nested in a ".card.blog-content-card" div, which has an atrocious 200px of margin. That in turn is nested in a ".blog-post-container.container" div, which has a merely unseemly 93px margin.

After 886px is used for white space, there's not much screen left for text. Might want to fix that?


Looks much better now, thanks for the update.


turning off #blog-post's padding and the .blog-content-card's left & right margins makes the layout nice on a desktop.


Hello, this is a response to an earlier comment regarding Thorne/Blandford’s book. If a reading group exists by now, please let me know.


Nginx rate limit the endpoint, log offenders, auto fail2ban.
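The nginx side of that is a couple of directives; a sketch (zone size, rate, endpoint path, and upstream are example values):

```nginx
# http context: per-IP token bucket, 10 requests/second.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /products/ {
        limit_req zone=api burst=20 nodelay;  # absorb small spikes
        limit_req_status 429;                 # logged, easy to grep for fail2ban
        proxy_pass http://127.0.0.1:8000;
    }
}
```

The rejected requests show up in the error log with a distinctive message, which is what the fail2ban filter would key on.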


The UFW rule not having worked for you may have been Docker's fault.

If your gateway/webserver is running in a Docker container and you've published port 80/443, Docker will set up its own iptables rules, bypassing anything you've set up using UFW.
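For anyone hitting this: Docker documents the DOCKER-USER iptables chain as the supported place for user rules on container traffic, since Docker's own rules are evaluated after it (the address below is a placeholder):

```shell
# Blocks the source before Docker's NAT/forwarding rules ever see it.
iptables -I DOCKER-USER -s 203.0.113.7 -j DROP
```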


Good read. Some thoughts:

- Add throttling at nginx level

- Proactive monitoring and alerts needed

- Should fool the attacker into querying a fake endpoint


Hah. I remember the first time my server got DDoS'ed. I was scared shitless and I was stressing so much. Glad everything turned out OK in the end.


What doesn’t kill you makes you stronger.


My version of the saying is - what does not kill you cripples you.


“What doesn’t kill you makes you smaller” - Super Mario


get yourself a low orbit ion cannon & render all your enemies baseless


This is ironic. I'm just now getting a DDoS from a botnet in China.


That's not ironic.


Does anybody have experience of getting DDoS'd? All I see are 3 offending IP addresses in the screenshot, and it makes me wonder how many are typical.

I have never been DDoS'd, and all I ever receive are failed SSH attempts with simple passwords, pretty much the easiest thing to tackle. But I'd love to hear from DDoS'd people what their attacks looked like.

From the Cloudflare logs all I see is a single IP address being blocked (multiple times? Or is it their multiple actions being blocked?).

EDIT: Thanks all, these were helpful answers!


I've gotten a good number of DDoSes sent my way. For the ones I've noticed, there's usually pretty good IP diversity. Volumetric attacks are either like tcp syn floods, spoofed from everywhere, or udp reflection spoofed from you to reflecting hosts, which have pretty good diversity. If you want to survive these, you need to have either a big connection, or packet filtering by someone with a big connection. As of a few years ago, 10Gbps was enough to ignore casual attacks, as long as your IP stack is up for it -- you may need to do a bit of tuning and make sure you've got recent syn handling. On the other hand, if you're running a 10Gbps connection, be sure you're not a reflection target -- be extremely careful about running UDP servers that send significantly larger replies than the requests, if they're exposed on public ips.
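For the "bit of tuning" and "recent syn handling" mentioned above, the usual Linux knobs look something like this (illustrative values, not tuned recommendations):

```shell
sysctl -w net.ipv4.tcp_syncookies=1          # SYN cookies when the backlog overflows
sysctl -w net.ipv4.tcp_max_syn_backlog=65536 # room for half-open connections
sysctl -w net.core.somaxconn=65536           # accept-queue ceiling per socket
```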

Layer 7 attacks are different; you can't spoof those, so you don't get perfect distribution -- but there are lots of ways to distribute simple requests. If the requests are coming from a botnet, there's usually a lot of control about what the requests look like, but if they're coming in through tricking other software (which is unfortunately common), then at least you'll likely have some identifying information; it's dumb to block things by user-agent, but it can be pretty effective. The way to handle these is really to try to make sure the effort your server spends is roughly on par with the effort the client spends; and try to make sure you're running the best optimized TLS handshakes you can (ECC certs are easier on servers than RSA).


I don't know where the line goes between DOS and DDOS, but in my youth I had a virtual server attacked by an acquaintance for a few days. They still won't admit to doing it, but all available evidence at the time pointed to them.

It started as a TCP SYN flood, which I had never seen before -- but since it originated from only two addresses, I could block them. Once I did, more addresses joined in on the attack. I figured out what was happening and how to prevent the connection table from filling up completely under the SYN flood, and then the attack changed shape into a UDP flood, from yet more addresses.

(Many of the addresses corresponded to free shell hosts and managed webhosts with easily exploitable PHP scripts. I tried to inform the people running those hosts that they were being used for an attack, but the vast majority of them seemed to ignore that. A couple of people responded and we could figure out kinda what had happened and what scripts were involved. Remember, kids, that connecting a machine to the internet is a great responsibility.)


They tend to look like that in your Apache logs when they target at the web application level.

Once you're flooded and your process load spikes, the logs won't show you the requests your web server has run out of workers to handle.

Netstat will give you a better picture as far as IPs/connections once you get the process load down so you can actually run anything.


Anybody know why blocking the offending IP didn’t work?


Most likely because he was trying to block the IP at a point behind some other forwarding service. At that point, the offending IP only exists in the X-Forwarded-For HTTP header, not as the source address of the request.
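If the server behind the proxy is nginx, its real_ip module can restore the client address from that header so IP-level blocks work again; a sketch (the trusted proxy range is an example, and with Cloudflare specifically, their CF-Connecting-IP header is another option):

```nginx
set_real_ip_from 103.21.244.0/22;   # range(s) of your trusted proxy/CDN
real_ip_header X-Forwarded-For;
real_ip_recursive on;               # skip trusted hops, keep the client IP
```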


Yup! This was it.


I have had this happen before, in a situation similar to the OP's: they found a heavy page that hammered the database until it went down. Banning their IP at the firewall stopped them, but they came back a few days later using a bunch of proxies, about 1000 of them. I wrote a script like Fail2Ban that detected the IPs and blocked them, and it worked a treat.

I was also attacked by a single IP that sent 10Gbit of nonsense to Apache. I had to contact my ISP to get them to block the IP downstream from me; if they had used a botnet, I don't think I would have been able to stop it.


By definition, DDoS is distributed. From my experience working on a Layer 4 DDoS protection solution, a typical case often ranges from 1,000 to 100k flows.



