I wonder why the author uses a 404 error response. I usually configure NGINX with "return 444;" which closes the connection without response. Scanners don't deserve a response. I may have wasted bytes receiving the request, but I won't waste any more once I know the request is garbage.
That was mostly just for the blog post. In reality my default vhost 301's back to the IP that sent the request. I doubt it ever does anything, but I like to think it makes hackers attack themselves in the confusion :p
I also have a fake /admin path that just contains a bunch of offensive/illegal phrases in 10 ish languages, but it was out of character for the post.
444 is a good idea though, I didn't know about that response code!
Beware though -- nginx 444 doesn't actually close the connection. At the packet level, it just does not respond.
This distinction is important if you have a load balancer in front of nginx. The LB will wait until timeout for a response, occupying a bit of stateful memory and probably causing an error which is indistinguishable from "backend application server is offline".
Yep it's great for tarpitting if you are not behind an LB.
The other problem, if you are behind an LB, is that the client (DoS attacker) will get a 503 from the LB after timeout. So, no gain even if your timeouts are reasonable.
It'd be great if you could return a custom response from nginx that would tell the LB to drop the request -- or you could move the exploit-detection logic to the LB instead of nginx, and the LB could do its own 444 equivalent.
For others who aren't interested in reading the whole thing:
The author of the post used zip-bombs, which are compressed HTTP responses that expand to 1000 times the size of the compressed data. He could send relatively small responses that would fill the requester's memory and crash the process. Beautiful.
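To get a feel for the ratio, here is a minimal Python sketch. It is not the author's actual setup; the function names and the choice of an all-zero payload are my own assumptions for illustration. It only shows how little data a highly repetitive gzip body needs on the wire.

```python
import gzip


def make_gzip_bomb(uncompressed_bytes: int) -> bytes:
    """Return a gzip stream that expands to `uncompressed_bytes` zero bytes.

    Hypothetical helper: a run of identical bytes is close to the best case
    for DEFLATE, so the output is roughly a thousandth of the input size.
    """
    return gzip.compress(b"\0" * uncompressed_bytes, compresslevel=9)


def compression_ratio(uncompressed_bytes: int) -> float:
    """Uncompressed size divided by compressed size for the payload above."""
    payload = make_gzip_bomb(uncompressed_bytes)
    return uncompressed_bytes / len(payload)


if __name__ == "__main__":
    size = 10 * 1024 * 1024  # 10 MiB once decompressed
    payload = make_gzip_bomb(size)
    print(f"{len(payload)} bytes on the wire, {size} bytes after decompression")
```

Served with a "Content-Encoding: gzip" header, a body like this is only expanded by clients that decompress responses automatically, so a scanner that ignores the body is unaffected.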
I use "402 Payment Required" right now, which is sent to the client. Didn't know about 444, which isn't listed on the Wikipedia page about HTTP return codes ...
I always feel like a git telling people "it's nftables now!", but it's been over a decade and folks keep using iptables as the common identifier. It's slow to change language. You're right, many of those iptables commands are utilities/scripts around nftables now.
I work in IT and for probably 25 years now I have kept hearing that IPv6 is round the corner. "Adoption" is ~35%, but what this means is that in 35% of cases you can get to a place through IPv6. This does not mean that you must, or do. It is just the capacity.
When a technology takes 25 or so years to become mainstream it means that there is a problem somewhere ("too complicated", ...) or that there is no problem in the first place ("iptables work fine for me", "I NAT my 10.x network", ...)
I started with ipfw, then ipchains, then iptables and now nftables. That is just on Linux.
To be fair: I just downloaded an ipfw setup and didn't give it much thought. I spent several weeks hand crafting several ipchains scripts. I spent ages with iptables and wrote a rubbish multi WAN effort and eventually ditched it for pfSense for edge. More ages for a host based effort. I also use ufw quite a bit for iptables. I use firewalld for nftables, these days.
I understand the first part -- sending requests with no host header to a spam log (or even better, don't log).
What I don't understand is the second part -- blocking those hosts. Seems pointless now that you've de-noised your logs. They're still sending packets. Saves thousands of bytes on outbound?
What about all the scan-spam on sites WITH host headers? Whatevs.
Serves a few purposes, but as you said the main objective is already done by de-noising.
The other reason I do this is because it's easy to detect that kind of scanning in HTTP logs, but not as easy for other services (ssh, ftp, smtpd, etc) without something like fail2ban, and the blanket ban applies to all of them. So, if a bot scans your HTTP server enough times, they can't go after "softer" targets later.
For scan-spam that does hit your "real" site, it's a bit more tricky as there absolutely will be false positives. You can grep for all 401/403's and add them to the list, but that will sooner or later hit a real user. So it's much more specific to the application you're hosting, where this works for just about any site. The other nice thing is that even when they scan your "real" site, they'll often hit the default host via IP scans at the same time, so you can still manage to ban them.
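For anyone who wants to try the "grep for 401/403" idea, a rough Python sketch follows. It assumes the common/combined access log format (IP first, status code right after the quoted request line). The regex, the function name and the threshold parameter are my own illustrative choices, not something from the post. The threshold is there precisely because of the false-positive problem: one stray 403 should not get a real user banned.

```python
import re
from collections import Counter

# Assumed log layout: IP - user [time] "request" status size ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) '
)


def suspicious_ips(lines, statuses=("401", "403"), threshold=3):
    """Return a sorted list of IPs with at least `threshold` matching responses."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") in statuses:
            hits[match.group("ip")] += 1
    return sorted(ip for ip, count in hits.items() if count >= threshold)
```

The resulting list could then be fed into whatever ban mechanism you already use; what threshold is safe depends entirely on the application.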
Running your own mail server is a great source of data to identify botnet-compromised hosts.
When I started banning IPs that send "HELO <myhostname>" for 24 hours, I cut the number of fake login/registration attempts on a bunch of my web-based projects by ~50%.
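The check itself is tiny. Here is a sketch in Python, assuming you can see the raw greeting line from the SMTP session; the function name is made up, and treating EHLO the same as HELO is my assumption rather than something stated above.

```python
def claims_our_identity(smtp_line: str, our_names: set[str]) -> bool:
    """True if a HELO/EHLO greeting announces one of our own hostnames.

    A remote client has no legitimate reason to introduce itself with the
    receiving server's name, so a match is a strong bot signal.
    """
    parts = smtp_line.strip().split(None, 1)
    if len(parts) != 2 or parts[0].upper() not in {"HELO", "EHLO"}:
        return False
    # Compare case-insensitively and ignore a trailing root dot.
    announced = parts[1].strip().rstrip(".").lower()
    return announced in {name.lower() for name in our_names}
```

For example, `claims_our_identity("HELO mail.example.org", {"mail.example.org"})` flags the sender, while a greeting with any other name passes through.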
It works the other way, too. Temporary bans on hosts that try to access /wp-admin (I don't run Wordpress anywhere) cut my email spam significantly.
(Some day, I'll get around to implementing a real reputation tracking system, with exponential ban lengths.)
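A possible shape for the exponential part, as a Python sketch: the 24-hour base matches the ban length mentioned above, while the doubling factor, the 30-day cap and the function name are assumptions picked for illustration.

```python
from datetime import timedelta


def ban_length(offenses: int,
               base: timedelta = timedelta(hours=24),
               cap: timedelta = timedelta(days=30)) -> timedelta:
    """Ban duration that doubles with each repeat offense, up to `cap`."""
    if offenses < 1:
        return timedelta(0)
    length = base
    for _ in range(offenses - 1):
        length *= 2
        if length >= cap:
            # Stop early so repeat offenders cannot overflow timedelta.
            return cap
    return min(length, cap)
```

With these defaults the first offense gets 24 hours, the second 48 hours, the third 4 days, and anything from the sixth onward is held at the 30-day cap.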
What are some best practices to deal with it on a PC? I mean, by default pretty much everything is closed and it's not like there is any "legitimate traffic" at all, but over time it still accumulates some open ports by running stuff in docker and elsewhere: a jupyter console here, an MPD UI there. Most of the time I don't even think about the fact that I'm constantly scanned by someone, and remember only after I see some logs and get disturbed by the number of rude guests.
On my internet facing hosts, I use the firehol level 2 and level 3 block sets along with blocking all CN IP space that I can accurately identify. My logs are eerily quiet.
I tried firehol for some time and quite liked it (much more than iptables). This was after shorewall started to fade out (and is now abandoned or so).
I had some problems getting community support and it seems that activity around firehol is fading away. I am not sure whether this is because it is a complete, finished product, or because it is abandoned.
ipfilter requires separate kernel modules for various options. On most common distros they have been built and installed by default and will just be loaded at runtime, I'd assume. If you run a highly customized kernel you have probably had the issue before when doing something with the firewall.