Block web scanners with ipset and iptables (nbailey.ca)
110 points by yabones on Nov 10, 2022 | 44 comments



I wonder why the author uses a 404 error response. I usually configure NGINX with "return 444;" which closes the connection without response. Scanners don't deserve a response. I may have wasted bytes receiving the request, but I won't waste any more once I know the request is garbage.
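
For reference, a minimal sketch of the 444 approach (the location is illustrative, not anyone's exact config):

    # Close the connection without sending any HTTP response
    location /wp-login.php {
        return 444;
    }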


That was mostly just for the blog post. In reality my default vhost 301's back to the IP that sent the request. I doubt it ever does anything, but I like to think it makes hackers attack themselves in the confusion :p
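
Roughly like this, for the curious (an illustrative sketch, not the exact config):

    # Bounce the scanner back to whatever IP the request came from
    return 301 http://$remote_addr$request_uri;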

I also have a fake /admin path that just contains a bunch of offensive/illegal phrases in 10-ish languages, but it was out of character for the post.

444 is a good idea though, I didn't know about that response code!


Beware though -- nginx 444 doesn't actually close the connection. At the packet level, it just does not respond.

This distinction is important if you have a load balancer in front of nginx. The LB will wait until timeout for a response, occupying a bit of stateful memory and probably causing an error which is indistinguishable from "backend application server is offline".


That is actually cool: it's a tarpit for these bots!

On a well-configured site the LB timeouts should be short enough anyway.

But it is a risk, especially under classic DoS attacks.


Yep, it's great for tarpitting if you are not behind an LB.

The other problem, if you are behind an LB, is that the client (DoS attacker) will get a 503 from the LB after timeout. So, no gain even if your timeouts are reasonable.

It'd be great if you could return a custom response from nginx that would tell the LB to drop the request -- or you could move the exploit-detection logic to the LB instead of nginx, and the LB could do its own 444 equivalent.


You might like these more elaborate attacks you can run on those bots: https://www.hackerfactor.com/blog/index.php?/archives/762-At...


Good read, thanks.

For others who aren't interested in reading the whole thing:

The author of the post used zip-bombs, which are compressed HTTP responses that expand to 1000 times the size of the compressed data. He could send relatively small responses that would fill the requester's memory and crash the process. Beautiful.
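
Roughly, the trick looks like this (sizes and names are illustrative):

    # ~10 GB of zeros compresses down to roughly 10 MB
    dd if=/dev/zero bs=1M count=10240 | gzip -9 > bomb.gz
    # Serve it with a "Content-Encoding: gzip" response header so the
    # client inflates the whole thing while parsing the body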


I use "402 Payment Required“ right now, which is sent to the client. Didn't know about 444, which isn't listed on the Wikipedia page about HTTP return codes ...


It is listed on Wikipedia, but under "Unofficial Codes -> nginx", as it is nginx-specific and not standardized.


Not to detract from the article, but we should be using nftables in 2022. :-)

https://wiki.nftables.org/wiki-nftables/index.php/Moving_fro...


He probably is, in a way. The iptables command is now just a wrapper around nftables that translates the old syntax into nftables rules.


I always feel like a git telling people "it's nftables now!", but it's been over a decade and folks keep using iptables as the common identifier. Language is slow to change. You're right, many of those iptables commands are utilities/scripts around nftables now.


This is not very different from "use IPv6".

I work in IT, and for probably 25 years now I have kept hearing that IPv6 is just around the corner. "Adoption" is ~35%, but what this means is that in 35% of cases you can reach a place through IPv6. It does not mean that you must, or that you do. It is just the capability.

When a technology takes 25 or so years to go mainstream, it means either that there is a problem somewhere ("too complicated", ...) or that there was no problem in the first place ("iptables works fine for me", "I NAT my 10.x network", ...)


If you've set up your rules from scratch using nftables, then they are not compatible with the iptables wrapper.


Where is that wrapper? In user space or in the kernel?


AIUI the iptables-nft wrapper is in user space; it translates iptables syntax into nftables and applies the resulting nftables rules.


I started with ipfw, then ipchains, then iptables and now nftables. That is just on Linux.

To be fair: I just downloaded an ipfw setup and didn't give it much thought. I spent several weeks hand-crafting several ipchains scripts. I spent ages with iptables and wrote a rubbish multi-WAN effort, eventually ditching it for pfSense at the edge. More ages went into a host-based effort. I also use ufw quite a bit for iptables. These days I use firewalld for nftables.


I've been using fail2ban to kill this for years. Seems to be quite effective: https://github.com/fail2ban/fail2ban
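
For the curious, the config is pleasantly small; an illustrative jail.local (values are examples, not recommendations):

    # /etc/fail2ban/jail.local
    [sshd]
    enabled  = true
    maxretry = 5
    findtime = 10m
    bantime  = 1h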


If possible, report the hosts you block using f2b to AbuseIPDB or similar projects. That way we'd be collectively better able to hinder this abuse.


There's CrowdSec for sharing info about IPs in a collaborative way: https://www.crowdsec.net/


This.

Trusted combo: Fail2Ban + 7G firewall

https://perishablepress.com/7g-firewall-nginx/#download


Actually, you don't need to respond to bogus HTTP clients at all: https://gist.github.com/radupotop/2aef0bdc0ccbd3a706044e3598...
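
The gist boils down to a catch-all default server, roughly (log path illustrative):

    server {
        listen 80 default_server;
        server_name _;
        # Keep scanner noise out of the real access logs
        access_log /var/log/nginx/scanners.log;
        # Close the connection without any response
        return 444;
    }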


I understand the first part -- sending requests with no host header to a spam log (or even better, don't log).

What I don't understand is the second part -- blocking those hosts. Seems pointless now that you've de-noised your logs. They're still sending packets. Saves thousands of bytes on outbound?

What about all the scan-spam on sites WITH host headers? Whatevs.


Serves a few purposes, but as you said the main objective is already done by de-noising. The other reason I do this is because it's easy to detect that kind of scanning in HTTP logs, but not as easy for other services (ssh, ftp, smtpd, etc) without something like fail2ban, and the blanket ban applies to all of them. So, if a bot scans your HTTP server enough times, they can't go after "softer" targets later.
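
The blanket ban part is roughly this (a sketch; set name and timeout are arbitrary):

    # One set, one rule, covers every service on the host
    ipset create scanners hash:ip timeout 86400
    iptables -I INPUT -m set --match-set scanners src -j DROP
    # Entries expire on their own after the timeout
    ipset add scanners 203.0.113.7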

For scan-spam that does hit your "real" site, it's a bit more tricky as there absolutely will be false positives. You can grep for all 401/403's and add them to the list, but that will sooner or later hit a real user. So it's much more specific to the application you're hosting, where this works for just about any site. The other nice thing is that even when they scan your "real" site, they'll often hit the default host via IP scans at the same time, so you can still manage to ban them.

It's not perfect, but it's good enough :)


Running your own mail server is a great source of data to identify botnet-compromised hosts.

When I started banning IPs that send "HELO <myhostname>" for 24 hours, I cut the number of fake login/registration attempts on a bunch of my web-based projects by ~50%.
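
Roughly like this (a sketch assuming Postfix-style logs and an existing ipset named "scanners"; the hostname and log path are placeholders):

    # Hosts that HELO with *our own* hostname are spoofing bots
    grep 'helo=<mail.example.com>' /var/log/mail.log \
      | grep -oE '\[[0-9.]+\]' | tr -d '[]' | sort -u \
      | while read ip; do ipset add scanners "$ip" timeout 86400; done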

It works the other way, too. Temporary bans on hosts that try to access /wp-admin (I don't run WordPress anywhere) cut my email spam significantly.

(Some day, I'll get around to implementing a real reputation tracking system, with exponential ban lengths.)


This is an important aspect of it - you can use information on one angle of attack to protect other devices.

Do note that doing this kind of thing can block people on Tor, because Tor is quite often used for attacks, too.


Another neat trick is to add a disallowed path to robots.txt and instruct bots to stay away from it. If they visit it anyway, you add them to your blocklist.


I was confused at what you were saying at first. For those that may also be confused:

You can add something like this to your robots.txt:

    User-agent: *
    Disallow: /some/unguessable/url
And then you ban any IPs/bots that visit that URL.
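
The banning half can then be as simple as (a sketch, assuming the combined log format and an existing ipset named "scanners"):

    # Anyone fetching the trap URL ignored robots.txt on purpose
    grep '/some/unguessable/url' /var/log/nginx/access.log \
      | awk '{print $1}' | sort -u \
      | xargs -r -n1 ipset add scanners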


What are some best practices for dealing with this on a PC? I mean, by default pretty much everything is closed and there isn't really any "legitimate traffic" at all, but over time it still accumulates some open ports from running stuff in Docker and elsewhere: a Jupyter console here, an MPD UI there. Most of the time I don't even think about the fact that I'm constantly being scanned, and remember only after I see some logs and get disturbed by the number of rude guests.


I have used wail2ban and just started using IPBan.


On my internet facing hosts, I use the firehol level 2 and level 3 block sets along with blocking all CN IP space that I can accurately identify. My logs are eerily quiet.


I tried firehol for some time and quite liked it (much more than iptables). This was after shorewall started to fade out (it now seems abandoned).

I had some problems getting community support, and activity around firehol seems to be fading as well; I am not sure whether that is because it is a complete, finished product or because it too is abandoned.


I don't actually use the firehol scripts - I use the source lists with my own custom iptable/pf scripts.
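
Loading one of the source lists into an ipset is only a few lines, roughly (URL and set name are assumptions):

    curl -sO https://iplists.firehol.org/files/firehol_level2.netset
    ipset create firehol2 hash:net
    # netset files are one CIDR per line, plus comment lines
    grep -Ev '^(#|$)' firehol_level2.netset \
      | sed 's/^/add firehol2 /' | ipset restore -!
    iptables -I INPUT -m set --match-set firehol2 src -j DROP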


Just added some Digital Ocean IP blocks to my firewall config.


This seems like a neat solution; however, my issue is that all my websites are behind Cloudflare nowadays. Hence, iptables is useless ¯\_(ツ)_/¯.


Could this be done more easily without installing the ipset command, by modifying /proc/net/xt_recent/ directly?

https://ipset.netfilter.org/iptables-extensions.man.html
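
Something like this is what I mean (list name is arbitrary; note xt_recent's default table only holds 100 entries unless you raise the ip_list_tot module parameter):

    # The /proc file only appears once a rule references the list
    iptables -I INPUT -m recent --name badguys --rcheck --seconds 86400 -j DROP
    # Then addresses can be added by writing to the proc interface
    echo +203.0.113.7 > /proc/net/xt_recent/badguys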


Why not use the way easier to configure i(f)tables? It's so much more straightforward and flexible.


What are i(f)tables? Google does not suggest anything.


Sorry, I meant to say nftables.


I get the impression the author missed out on zcat for reading gzipped files.
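
e.g. something like (paths illustrative):

    # Search live and rotated (gzipped) logs in one pass
    zcat -f /var/log/nginx/access.log* | grep 'wp-login'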


fail2ban is excellent. No need for anything else. Configure it for all your server logs. It'll handle the iptables or nftables config for you.



ipset requires a separate kernel module.


ipfilter requires separate kernel modules for various options. On most common distros they are built and installed by default and just get loaded at runtime, I'd assume. If you run a highly customized kernel, you've probably hit this issue before when doing something with the firewall.



