But how much of this "bad actor" interaction is countered with tracking? And how...

Szpadel · 2024-12-30T14:40:35 1735569635

I can tell you about my experience with blocking traffic from scalpers bots that were very active during pandemic.

All requests produced by those bots were valid ones, nothing that could be flagged by tools like fail2ban etc (my assumption is that it would be the same for financial systems).

Any blocking or rate limiting by IP is useless, we saw about 2-3 requests per minute per IP, and those actors had access to ridiculous number of large CIDRs, blocking any IP caused it instantly replace it with another.

blocking by AS number was also mixed bag, as this list growed up really quickly, most of that were registered to suspicious looking Gmail addresses. (I feel that such activity might own significant percentage of total ipv4 space)

This was basically cat and mouse game of finding some specific characteristic in requests that matches all that traffic and filtering it, but the other side would adapt next day or on Sunday.

aggregated amount of traffic was in range of 2-20k r/s to basically heaviest endpoint in the shop, with was the main reason we needed to block that traffic (it generated 20-40x load of organic traffic)

cloudflare was also not really successful with default configuration, we had to basically challenge everyone by default with whitelist of most common regions from where we expected customers.

So best solution is to track everyone and calculate long term reputation.

berkes · 2024-12-31T11:37:05 1735645025

TBC: I wasn't saying that F2B is a silver bullet. Not at all.

But that protection depends on the use case. And that in many of my use-cases, a simple f2b with a large hardcoded list of URL paths I guarantee to never have, will drop bot-traffic with 90% or more. The last 10% then split into "hits because the IP is new" and "other, more sophisticated bots". Bots, in those cases are mostly just stupid worms, just trying out known WP exploits, default passwords on often used tools (nextcloud, phpmyadmin, etc) and so on.

I've done something similar with a large list of known harvest/scraper bots, based on their user-agent (the nice ones), or their movements. Nothing complex, just things like "/hidden-page.html that's linked, but hidden with css/js.

And with spam bots, where certain post-requests can only come from repeatedly submitting the contact form.

This, obviously isn't going to give any protection against targeted attacks. Nor will it protect against more sophisticated bots. But in some -in my case, most- use-cases, it's enough to drop bot-traffic significantly.

codingminds · 2024-12-30T20:58:34 1735592314

I've learned that Akamai has a service that deals with this specific problem, maybe this might interest you as well: https://www.akamai.com/products/content-protector

stareatgoats · 2024-12-30T15:45:25 1735573525

Blocking scalper bot traffic by any means, be it by source or certified identification seems a lost cause, i.e. not possible because it can always be circumvented. Why did you not have that filter at point of sale instead? I'm sure there are reasons, but to have a battery of captchas and a limit on purchases per credit card seems on the surface much more sturdy. And it doesn't require that everyone browsing the internet announce their full name and residential address in order to satisfy the requirements of a social score ...

Szpadel · 2024-12-30T16:03:26 1735574606

The product they tried to buy what not in stock anyways, but their strategy was to constantly try anyways, so in case it would become in stock they would be the first to get it. It was all for guest checkout, so no address yet to validate nor credit card. Because they used API endpoints used by the frontend we could not use any captcha at this place because of technical requirements.

As stated before the main reason we needed to block it was volume of the traffic, you migh imagine identical scenario for dealing with DDoS attack.

jsdwarf · 2024-12-30T23:50:15 1735602615

Disabling guest checkout would have been my weapon of choice or at least requiring the user to enter an email address to so that they are notified when the product becomes available.

dspillett · 2024-12-31T00:23:23 1735604603

> Because they used API endpoints used by the frontend we could not use any captcha at this place because of technical requirements

A time sensitive hash validating each request makes it a bit harder for them without significant extra work on your part. Address sensitive is much more effective but can result in issues for users that switch between networks (using your site on the move and passing between workers networks, for instance).

bornfreddy · 2024-12-30T16:18:33 1735575513

> Because they used API endpoints used by the frontend we could not use any captcha at this place because of technical requirements.

That doesn't compute... Captcha is almost always used in such setups.

It also looks like you could just offer an API endpoint which would return if the article is in stock or not, or even provide a webhook. Why fight them? Just make the resource usage lighter.

I'm curious now though what the articles were, if you are at liberty to share?

Szpadel · 2024-12-30T17:01:06 1735578066

We had captcha, but it was at later stage of the checkout process. This API endpoint needed to work from cached pages, so it could not contain any dynamic state in request.

Some bots checked product page where we had info if product is in stock (although they tried heavenly to bypass any caches by putting garbage in URL). This kind of bots also scaled instantly to thousands checkout requests when product become available with gave no time for auto scaling to react (this was another challenge here)

This was easy to mitigate so it didn't generate almost any load on the system.

I believe we had email notification available, but it could be too high latency way for them.

I'm not sure how much I can share about articles here, but I can say that those were fairly expensive (and limited series) wardrobe products.

shaky-carrousel · 2024-12-30T21:55:09 1735595709

Hm, is probably too late, but you could have implemented in your API calls some kind of proof of work. Something that's not too onerous for a casual user but it is hard for someone trying multiple requests.

Szpadel · 2024-12-31T01:46:08 1735609568

This was actually one of my ideas how to solve it, observed behaviour strongly suggested that all those thousands of IP addresses where used by single server. Even small PoW with this volume should heavly influence their capacity. But we decided that we did not want to affect performance of mobile users. We later learned that such strategy is also used by cloudflare js check

miki123211 · 2024-12-30T18:59:14 1735585154

> Why fight them? Just make the resource usage lighter.

Because you presumably want real, returning customers, and that means those customers need to get a chance at buying those products, instead of them being scooped up by a scalper the millisecond they appear on the website.

thatcat · 2024-12-30T23:38:08 1735601888

maybe offer the item for preorder ?

geysersam · 2024-12-30T20:55:46 1735592146

Sounds like a dream having customers scooping up your products the millisecond the appear on the website. They should increase their prices.

dspillett · 2024-12-31T00:26:37 1735604797

No matter what the price, they would still have the "As stated before the main reason we needed to block it was volume of the traffic" problem that was stated above, for a popular item. In fact increasing the base price might attract even more scalpers and such.

sesm · 2024-12-30T17:52:46 1735581166

I remember people doing this with PS5 when they were in short supply after release.

jillyboel · 2024-12-31T04:53:52 1735620832

The best solution is to put everyone in a little cage and point and keep a permanent record of everything they do. This doesn't mean it's a desirable solution.

shwouchk · 2024-12-30T16:47:59 1735577279

Require a verified account to buy high demand items.

cute_boi · 2024-12-30T16:31:51 1735576311

why not charge people? This is the only solution I can think of.

throwaway99210 · 2024-12-30T14:22:02 1735568522

> E.g. a large fai2ban rule to just ban anything that attempts to HTTP GET /admin.php or /phpmyadmin etc, even just once, gets rid of almost all nefarious bot traffic.

unfortunately fail2ban wouldn't even make a dent in the attack traffic hitting the endpoints in my day-to-day work, these are attackers utilizing residential proxy infrastructure that are increasingly capable of solving JS/client-puzzle challenges.. the arms race is always escalating

JohnMakin · 2024-12-30T15:05:44 1735571144

we see the same thing, also with a financial company, the most successful strategies we’ve seen is making stuff like this extremely expensive for whoever it is if we see it, and they stop or slow down to a point it becomes not worth it and they move on. sometimes that’s really all you can do without harming legit traffic.

josephcsible · 2024-12-30T19:21:26 1735586486

Such a rule is a great way to let malicious users lock out a bunch of your legitimate customers. Imagine if someone makes a forum post and includes this in it:

  [img]https://example.com/phpmyadmin/whatever.png[/img]

RiverCrochet · 2024-12-31T03:41:47 1735616507

That would be in the body of the request. OP is talking about URLs in the actual request, which is part of the header.

While I don't have experience with a great number of WAFs I'm sure sophisticated ones let you be quite specific on where you are matching text to identify bad requests.

As an aside, another "easy win" is assuming any incoming HTTP request for a dotfile is malicious. I see constant unsolicitied attempts to access `.env`, for example.

berkes · 2024-12-31T11:00:32 1735642832

A lot of modern standards rely on .well-known urls to convey abilities, endpoints, related services and so on.

In my case, I never run anything PHP so I'll just plain block out anything PHP (same for python, lua, activedirectory etc). And, indeed, .htaccess, .env etc. A rather large list of hardcoded stuff that gets an instant-ban. It drops the bot-traffic with 90% or more.

These obviously aren't targeted attacks. Protecting against those is another issue alltogether.

josephcsible · 2024-12-31T04:47:12 1735620432

When legitimate users viewed that forum post, their browsers would, in the course of loading the image, attempt to HTTP GET /phpmyadmin/whatever.png, with that being the URL in the actual request in the header.

mattpallissard · 2024-12-30T15:26:14 1735572374

That's not the same type of bot net. Fail 2 ban simply is not going to work when you have a popular unauthenticated endpoint. You have hundreds of thousands of rps spread across thousands of legitimate networks that. The requests are always modified to look legitimate in a never ending game of whack-a-mole.

You wind up having to use things like tls fingerprinting with other heuristics to identify what to traffic to reject. These all take engineering hours and require infrastructure. It is SO MUCH SIMPLER to require auth and reject everything else outright.

I know that the BigCo's want to track us and you originally mentioned tracking not auth. But my point is yeah, they have malicious reasons for locking things down, but there are legitimate reasons too.

fijiaarone · 2024-12-30T22:34:24 1735598064

Easy solution to rate limit. Require initial request to get 1 time token with a 1 second delay And then require valid requests to include the token. The token returned has a salt with something like timestamp and ip. That way they can only bombard the token generator.

get /token

Returns token with timestamp in salted hash

get /resource?token=abc123xyz

Check for valid token and drop or deny.

int0x29 · 2024-12-31T00:28:07 1735604887

As at least one person working on this has pointed out in this thread: their adversaries have IP blocks and ASNs.

sangnoir · 2024-12-30T19:28:15 1735586895

> You wind up having to use things like tls fingerprinting

...and we've circled back to the post's subject - a version of curl that impersonates browsers TLS handshake behavior to bypass such fingerprinting.

miki123211 · 2024-12-30T18:56:39 1735584999

This depends on what you're fighting.

If you're fighting adversaries that go for scale, AKA trying to hack as many targets as possible, mostly low-sophistication, using techniques requiring 0 human work and seeing what sticks, yes, blocking those simple techniques works.

Those attackers don't ever expect to hack Facebook or your bank, that's just not the business they're in. They're fine with posting unsavory ads on your local church's website, blackmailing a school principal with the explicit pictures he stores on the school server, or encrypting all the data on that server and demanding a ransom.

If your company does something that is specifically valuable to someone, and there are people whose literal job it is to attack your company's specific systems, no, those simple techniques won't be enough.

If you're protecting a Church with 150 members, the simple techniques are probably fine, if you're working for a major bank or a retailer that sells gaming consoles or concert tickets, they're laughably inadequate.

jsnell · 2024-12-30T15:39:10 1735573150

The question is a bit of a non sequitur, since this is not tracking. The TLS fingerprint is not a useful tracking vector, by itself nor as part of some composite fingerprint.

fijiaarone · 2024-12-30T22:37:19 1735598239

The point is that you have to use an approved client (eg browser, os) with an approved cert authority that goes through approved gatekeepers (eg Cloudflare, Akamai)

jsnell · 2024-12-31T06:54:15 1735628055

That seems pretty unlikely to be the original point of https://news.ycombinator.com/item?id=42549415, which mentions none of that, and doesn't even have directionally the same concerns.

But also, what you wrote is basically nonsense. Clients don't need "an approved cert authority". Nor are there any "approved gatekeepers", all major browsers are equally happy connecting to your Raspberry Pi as they are connecting to Cloudflare.

tialaramex · 2024-12-30T14:39:18 1735569558

A big problem is that where we have a good solution you'll lose if you insist on that solution but other people get away with doing something that's crap but customers like better. We often have to mandate a poor solution that will be tolerated because if we mandate the better solution it will be rejected, and if we don't mandate anything the outcomes are far worse.

Today for example I changed energy company†. I made a telephone call, from a number the company has never seen before. I told them my name (truthfully but I could have lied) and address (likewise). I agreed to about five minutes of parameters, conditions, etc. and I made one actual meaningful choice (a specific tariff, they offer two). I then provided 12 digits identifying a bank account (they will eventually check this account exists and ask it to pay them money, which by default will just work) and I'm done.

Notice that anybody could call from a burner and that would work too. They could move Aunt Sarah's energy to some random outfit, assign payments to Jim's bank account, and cause maybe an hour of stress and confusion for both Sarah and Jim when months or years later they realise the problem.

We know how to do this properly, but it would be high friction and that's not in the interests of either the "energy companies" or the politicians who created this needlessly complicated "Free Market" for energy. We could abolish that Free Market, but again that's not in their interests. So, we're stuck with this waste of our time and money, indefinitely.

There have been simpler versions of this system, which had even worse outcomes. They're clumsier to use, they cause more people to get scammed AND they result in higher cost to consumers, so that's not great. And there are better systems we can't deploy because in practice too few consumers will use them, so you'd have 0% failure but lower total engagement and that's what matters.

† They don't actually supply either gas or electricity, that's a last mile problem solved by a regulated monopoly, nor do they make electricity or drill for gas - but they do bill me for the gas and electricity I use - they're an artefact of Capitalism.