
AFAICT this isn't possible, unless you're OK with showing everyone the poisoned text.

TTBOMK there's nothing here that "detects botness" of an individual request, because in the limit, that's impossible -- if an attacker has access to many different IPs to make requests from (and many do), then any sequence of bot-generated requests from different IPs is indistinguishable from the same set of requests made by actual living, breathing humans (and vice versa).

So how does Anubis work against bots if it can't actually detect them? Because of the economics behind them: To justify creating a bot in the first place, you need to scrape a lot of pages, so paying a small electricity cost per page means you will need to pay a lot overall. Humans pay this too, but because we request a much smaller number of pages, the overall cost is negligibly low.
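The per-page cost comes from a proof-of-work challenge. A minimal sketch of that kind of scheme (a generic SHA-256 partial-preimage puzzle, not Anubis's exact algorithm or parameters; `DIFFICULTY` is illustrative):

```python
import hashlib
import itertools

DIFFICULTY = 4  # required leading hex zeros; illustrative, not Anubis's setting

def solve(challenge: str) -> int:
    # Client side: brute-force a nonce until the hash has the required prefix.
    # Expected cost is ~16**DIFFICULTY hash evaluations.
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce

def verify(challenge: str, nonce: int) -> bool:
    # Server side: checking costs a single hash, so the asymmetry is the point.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)
```

Solving is expensive, verifying is one hash: a human pays it once per site, a scraper pays it per fresh session.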




You glossed over how it works. Bots don't maintain session cookies to avoid rate limits, so they will have to do the challenge over and over again, whereas humans keep the session cookie and amortise the cost of the challenge over multiple requests.


You're right, Anubis sets a week-long cookie:

> exp: The token's expiry, one week after the token was issued

This is surprising to me because it effectively nullifies Anubis: a bot that keeps the cookie only has to solve one challenge per site per week.

> Bots don't maintain session cookies to avoid rate limits

Maybe they don't today, but there's absolutely nothing stopping them from adding it, so if Anubis gets any traction, they will within days.


You can turn Anubis into Proof of Storage by appending large amounts of random data to the session cookie and then hashing it on each request. Combined with per-session-cookie limits, you can effectively force the LLM bots to store all the data you want if they want to scrape your site.
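A toy sketch of that idea, assuming (naively) that the server keeps its own copy of the blob to check against; the blob size and cookie scheme here are made up for illustration:

```python
import hashlib
import secrets

BLOB_SIZE = 10 * 1024 * 1024  # 10 MiB per session; illustrative

# Session setup: the server hands the client a large random blob
# alongside the session cookie, and keeps a copy for verification.
blob = secrets.token_bytes(BLOB_SIZE)

def client_proof(stored_blob: bytes, nonce: bytes) -> str:
    # On each request the client hashes its stored copy with a fresh
    # server-supplied nonce, proving it still holds all the bytes.
    return hashlib.sha256(nonce + stored_blob).hexdigest()

def server_verify(nonce: bytes, proof: str) -> bool:
    return client_proof(blob, nonce) == proof
```

The obvious cost is that the server stores as much as it forces the client to store, which is what the seed trick further down the thread addresses.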

Maybe suckerpinch can work it into a sequel to Harder Drive: Hard drives we didn't want or need[0].

[0] https://www.youtube.com/watch?v=JcJSW7Rprio


Great clip! I love the idea of making internet miscreants reluctantly store your data!

Slightly more seriously though, I think for the Proof of Storage idea to pack enough punch to be a deterrent, you'd need the cookies to be quite large. Is there a way to avoid needing them to send you all the bytes each time? Because that will cost you (the site owner) too.

I had the idea of sending requestors a challenge like "Flip the bit at position i of the large chunk of stored data, hash the result and include the hash in your headers". Instead of the site owner keeping the full stored data, they would just keep the RNG seed used to generate it -- this saves on storage, though it still requires them to do the same time-consuming hash computation done by the requestor.
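A minimal sketch of that seed-based scheme (names and sizes are my own, and it assumes Python 3.9+ for `Random.randbytes`):

```python
import hashlib
import random

BLOB_SIZE = 1024 * 1024  # 1 MiB of stored data; illustrative

def make_blob(seed: int) -> bytes:
    # Both sides regenerate the blob deterministically from the seed,
    # so the server only stores the seed, not the blob itself.
    return random.Random(seed).randbytes(BLOB_SIZE)

def respond(blob: bytes, i: int) -> str:
    # Client side: flip bit i of the stored data and hash the result.
    flipped = bytearray(blob)
    flipped[i // 8] ^= 1 << (i % 8)
    return hashlib.sha256(flipped).hexdigest()

def verify(seed: int, i: int, answer: str) -> bool:
    # Server side: regenerate the blob and redo the same computation.
    # Storage is saved, but the hashing cost is paid again, as noted above.
    return respond(make_blob(seed), i) == answer
```

Flipping a different bit per request means the client can't cache one hash and discard the data.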


Anubis remains effective because the token is subject to a rate limit, and you could additionally cap a token's lifetime at some maximum number of requests if you wanted to.

All of these factors (total requests, rate of requests, associated IPs, associated browser fingerprints) tie in to detecting bad players, who should receive more frequent and larger challenges.
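A sketch of per-token limiting along those lines: a token bucket for request rate plus a hard lifetime cap, after which the holder gets a fresh challenge. All constants are illustrative, not Anubis's actual behavior:

```python
import time
from collections import defaultdict

RATE = 2.0        # refill: allowed requests per second per token; illustrative
BURST = 10.0      # bucket capacity (short burst allowance)
MAX_TOTAL = 1000  # hard cap on requests over the token's lifetime

_buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic(), "total": 0})

def allow(token: str) -> bool:
    # Returns False when the token should be re-challenged instead of served.
    b = _buckets[token]
    now = time.monotonic()
    b["tokens"] = min(BURST, b["tokens"] + (now - b["last"]) * RATE)
    b["last"] = now
    b["total"] += 1
    if b["total"] > MAX_TOTAL or b["tokens"] < 1.0:
        return False
    b["tokens"] -= 1.0
    return True
```

A human browsing at a few pages a minute never notices the limit; a scraper replaying one cookie at high volume burns through the bucket and hits challenges again.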


You could make the poisoned text extremely small, so that it isn't visible to humans who perceive the web page optically, but is still visible to crawlers.



