Hacker News | xena's comments

I can claim that my car is able to fly. That does not mean pressing the gas pedal makes it generate lift.

What a disingenuous comparison. The contention here is organizational politics, not physics.

Patches are welcome!

This looks like a marine biologist desperately wanted to keep their job in spite of the "nothing that's not AI" mandate so they made up some bullshit.

They’ve been working on decoding dolphin sounds for a long time. Thad was telling me about this project in 2015, and it had been ongoing for a while by then. One challenge is that they want to do this in real time, which is extremely difficult because of the frequency range dolphin speech occurs in. The other challenge, on the AI side, is that traditional AI is done using supervised learning, whereas dolphin speech would require unsupervised learning. It would be interesting to learn more about how Gemma is helping here.

That is a surprisingly cynical take; the marine biologists in question seemed pretty enthusiastic in the video!

I'm not saying this is the case here, but every time I've been in internal or promotional videos related to my work, I've been performing for a camera. I'm not playing a theater character, but it's also not what you'd get if you dropped by my desk and asked me the same questions. Calling it acting might seem strong. But it's not not acting. So it's acting.

Does the general principle "we're always performing, in a particular costume, for our audience" help confirm the excited marine biologist desperately wanted to keep their job in spite of a "nothing that's not AI" mandate, so they made up some bullshit?

Separately, could invoking it anytime someone appears excited be described as distrustful of human sincerity or integrity?

After working through these exercises, my answers are no/yes, which leaves me having to agree it's clearly cynical (because "define:cynical" returns "distrustful of human sincerity or integrity").


Xe here. If I had to guess in two words: timing and luck. As the G-man said: the right man in the wrong place can make all the difference in the world. I was the right shitposter in the right place at the right time.

And then the universe blessed me with a natural 20. Never had these problems before. This shit is wild.


Squeeze that lemon as far as it'll go, mate. Godspeed, and may the good luck continue.

Honestly it's a fair assumption on bot filtering software that no more than like 8 people will share an IPv4. This is going to make IP reputation solutions hard. Argh.

You mean with the art assets extracted?

  $ mkdir -p ./tmp/anubis/static && anubis --extract-resources=./tmp/anubis/static

Apparently user-agent switchers don't work for fetch() requests, which means that Anubis can't work for people who use them. I know of someone who set up a version of Brave from 2022 with a user-agent saying it's Chrome 150 and then complained about it not working for them.

Main author of Anubis here:

Basically what they said. This is a hack, and it's specifically designed to exploit the infrastructure behind industrial-scale scraping. They usually have a different IP address do the scraping for each page load _but share the cookies between them_. This means that if they use headless chrome, they have to do the proof of work check every time, which scales poorly with the rates I know the headless chrome vendors charge for compute time per page.
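For readers struggling with the concept, here is a minimal sketch of a hash-based proof-of-work check of the kind described above. It is not Anubis's actual code: the function names, the SHA-256 leading-zeros scheme, and the idea of binding the challenge to the client's IP and user agent are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def make_challenge(secret: str, client_ip: str, user_agent: str) -> str:
    """Derive a challenge string from client metadata (hypothetical scheme).

    Because the IP is part of the input, a client that changes IP address
    gets a different challenge and has to redo the work.
    """
    material = f"{secret}|{client_ip}|{user_agent}"
    return hashlib.sha256(material.encode()).hexdigest()


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check that the digest has enough leading zeros."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce. Expected cost grows 16x per difficulty step."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    challenge = make_challenge("server-secret", "203.0.113.5", "ExampleBrowser/1.0")
    nonce = solve(challenge, difficulty=3)
    print(nonce, verify(challenge, nonce, difficulty=3))
```

The asymmetry is the point: verifying costs the server one hash, while solving costs the client thousands. A single human visitor pays that once, but a scraper that rotates IP addresses on every page load would, under this assumed binding, pay it on every page.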


Is there any particular date/time you'll introduce a no-JS solution?

And are you going to support older browsers? I tested Anubis with https://www.browserling.com with its (I think) standard configuration at https://git.xeserv.us/xe/anubis-test/src/branch/main/README.... and apparently it doesn't work with Firefox versions before 74 and Chromium versions before 80.

I wonder if it works with something like Pale Moon.


It will be sooner if I can get paid enough to be able to quit my day job.

I used to have an ISP that would load balance your connection between different providers, which meant that pretty much every single request would use a different IP. I know it's not that common, but it would mean real users would find pages using Anubis unusable.

Do you think that, if this behavior of Anubis becomes well known and scrapers start handling Anubis cookies specifically to avoid pathological PoW checks, Anubis will need a significant rework? If that's true, this hack won't last much longer, and I have no further ideas for avoiding user-visible annoyances.

Well, if they rework things so that requests all originate from the same IP address or a small set of addresses, then regular IP-based rate limits should work fine right?

The point is just to stop what is effectively a DDoS because of shitty web crawlers, not to stop the crawling entirely.


> Well, if [...], then regular IP-based rate limits should work fine right?

I'm not sure. IP-based rate limits have a well-known issue with shared public IPs, for example. Technically they are also more resource-intensive than cryptographic approaches (but I don't think that's a big issue in IPv4).


> then regular IP-based rate limits should work fine right?

These are also harmful to human users, who are often behind CGNAT and may be sharing a pool of IPs with many thousands of other ISP subscribers.
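To make the CGNAT concern concrete, here is a rough sketch of a fixed-window, per-IP rate limiter. The class name, the limit, and the window size are made up for illustration and do not come from any particular server or from Anubis.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per IP in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        # (ip, window index) -> requests seen; never pruned in this sketch
        self.counts = defaultdict(int)

    def allow(self, ip: str, now: float) -> bool:
        key = (ip, int(now // self.window))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=5, window_seconds=60)
    shared_ip = "100.64.0.1"  # one public address standing in for many subscribers
    # Three different people, two requests each, all inside the same minute.
    for person in ("alice", "bob", "carol"):
        for attempt in range(2):
            print(person, limiter.allow(shared_ip, now=10.0))
```

The limiter only ever sees the address, so the third person is refused even though nobody individually exceeded anything. A real deployment would also need to expire old windows, which this sketch skips.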


> Weigh the soul of incoming HTTP requests using proof-of-work to stop AI crawlers

Based on the comments here, it seems like many people are struggling with the concept.

Would calling Anubis a "client-side rate limiter" be accurate (enough)?


Probably not

OMG lol I forgot that I left that in. Hilarious. I think I'm gonna keep it.

I didn’t even blink at this, my inner monologue just did a little “well, naturally” in a Redditor voice and kept reading.

BTW Xe, https://xeiaso.net/pronouns is 404 since sometime last year, but it is still linked to from some places like https://xeiaso.net/blog/xe-2021-08-07/ (I saw "his" above and went looking).

I'm considering making it come back, but it's just gotten me too much abuse so I'm probably gonna leave it 404-ing until society is better.

Maybe there is some space on the market for a Proof of Empathy widget.

I have seen some projects that require acknowledging certain politically-charged statements before they will allow you to participate, like "you must agree that sovereign country X is at war with aggressor country Y".

That's what route-specific Anubis is for.

parent is referring to a different kind of abuse

Or you're just not cranking up the required proof-of-work effort enough.

Do you respect robots.txt so administrators can block this tool?

Should I be blocked if I ask Claude Desktop to lower the prices in all of my Craigslist ads by 10%?

Do user agents doing work for users need to respect robots.txt? If yes, does chrome?

Any scraper is also a “user agent doing work for users”. Which ones should respect robots.txt?

Does the user agent fit the definition of a web crawler? If so, then observe robots.txt. This one does not, see https://en.m.wikipedia.org/wiki/Web_crawler
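For reference, this is roughly what "observing robots.txt" looks like for a well-behaved crawler, sketched with Python's standard-library parser. The robots.txt content, the `ExampleCrawler` name, and the `may_fetch` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked entirely,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch(ROBOTS_TXT, "ExampleCrawler", "https://example.com/page"))
    print(may_fetch(ROBOTS_TXT, "SomeBrowser", "https://example.com/page"))
    print(may_fetch(ROBOTS_TXT, "SomeBrowser", "https://example.com/private/x"))
```

Note that the check is entirely voluntary and keyed on whatever name the client chooses to report, which is why the question of who counts as a crawler matters in the first place.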
