xena · 2025-04-15T19:09:28 1744744168

I can claim that my car is able to fly. That does not mean pressing the gas pedal makes it generate lift.

handfuloflight · 2025-04-15T19:28:04 1744745284

What a disingenuous comparison. The contention here is organizational politics, not physics.

xena · 2025-04-15T17:08:25 1744736905

Patches are welcome!

xena · 2025-04-14T16:04:59 1744646699

This looks like a marine biologist desperately wanted to keep their job in spite of the "nothing that's not AI" mandate so they made up some bullshit.

vlovich123 · 2025-04-14T16:35:18 1744648518

They’ve been working on decoding dolphin sounds for a long time - Thad was telling me about this project in 2015 and it had been ongoing for a while. One challenge is that they want to do this in real time, which is extremely difficult because of the frequency range dolphin speech occurs in. The other challenge on the AI side is that traditional AI is done using supervised learning whereas dolphin speech would require unsupervised learning. It would be interesting to learn more about how Gemma is helping here.

Philpax · 2025-04-14T18:09:48 1744654188

That is a surprisingly cynical take; the marine biologists in question seemed pretty enthusiastic in the video!

dogleash · 2025-04-14T19:21:21 1744658481

I'm not saying this is the case here, but every time I've been in internal or promotional videos related to my work, I've been performing for a camera. I'm not playing a theater character, but it's also not what you'd get if you dropped by my desk and asked me the same questions. Calling it acting might seem strong. But it's not not acting. So it's acting.

refulgentis · 2025-04-14T21:05:34 1744664734

Does the general principle "we're always performing, in a particular costume, for our audience" help confirm the excited marine biologist desperately wanted to keep their job in spite of a "nothing that's not AI" mandate, so they made up some bullshit?

Separately, could invoking it anytime someone appears excited be described as distrustful of human sincerity or integrity?

After working through these exercises, my answers are no/yes, which leaves me having to agree it's clearly cynical (because "define:cynical" returns "distrustful of human sincerity or integrity").

xena · 2025-04-13T09:57:18 1744538238

Xe here. If I had to guess in two words: timing and luck. As the G-man said: the right man in the wrong place can make all the difference in the world. I was the right shitposter in the right place at the right time.

And then the universe blessed me with a natural 20. Never had these problems before. This shit is wild.

underdeserver · 2025-04-13T10:48:23 1744541303

Squeeze that lemon as far as it'll go, mate. Godspeed, and may the good luck continue.

xena · 2025-04-13T08:57:03 1744534623

Honestly it's a fair assumption on bot filtering software that no more than like 8 people will share an IPv4. This is going to make IP reputation solutions hard. Argh.

xena · 2025-04-13T04:26:08 1744518368

You mean with the art assets extracted?

  $ mkdir -p ./tmp/anubis/static && anubis --extract-resources=./tmp/anubis/static

xena · 2025-04-13T00:20:45 1744503645

Apparently user-agent switchers don't work for fetch() requests, which means that Anubis can't work for people who use them. I know of someone who set up a version of Brave from 2022 with a user-agent saying it's Chrome 150 and then complained about it not working for them.

xena · 2025-04-13T00:05:52 1744502752

Main author of Anubis here:

Basically what they said. This is a hack, and it's specifically designed to exploit the infrastructure behind industrial-scale scraping. They usually have a different IP address do the scraping for each page load _but share the cookies between them_. This means that if they use headless chrome, they have to do the proof of work check every time, which scales poorly with the rates I know the headless chrome vendors charge for compute time per page.
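To make the cost argument concrete, here is a minimal sketch of a hashcash-style proof-of-work check. It is not Anubis's actual code: the use of SHA-256, the "leading zero hex digits" difficulty rule, and the names `solve` and `verify` are all assumptions made for illustration. The point it shows is the asymmetry: the client has to try many nonces, while the server verifies with a single hash.

```python
import hashlib
from itertools import count


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check the client's answer.

    Assumption: the answer is valid when SHA-256(challenge + nonce)
    starts with `difficulty` zero hex digits.
    """
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until one passes verify().

    Expected work grows by about 16x per extra hex digit of difficulty.
    """
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    # Hypothetical challenge string; a real server would issue a fresh one.
    nonce = solve("example-challenge", 3)
    print(nonce, verify("example-challenge", nonce, 3))
```

A client that keeps its cookie pays this once; a scraper that cannot reuse the result across its workers would have to pay it on every page load.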

ArinaS · 2025-04-13T08:12:24 1744531944

Is there any particular date/time you'll introduce a no-JS solution?

And are you going to support older browsers? I tested Anubis with https://www.browserling.com with its (I think) standard configuration at https://git.xeserv.us/xe/anubis-test/src/branch/main/README.... and apparently it doesn't work with Firefox versions before 74 and Chromium versions before 80.

I wonder if it works with something like Pale Moon.

xena · 2025-04-13T08:43:51 1744533831

It will be sooner if I can get paid enough to be able to quit my day job.

vhcr · 2025-04-13T04:22:30 1744518150

I used to have an ISP that would load balance your connection between different providers, which meant that pretty much every single request would use a different IP. I know it's not that common, but it would mean real users would find pages using Anubis unusable.

lifthrasiir · 2025-04-13T01:09:26 1744506566

Do you think Anubis would need a significant rework if this behavior becomes well known and Anubis cookies get handled specifically to avoid pathological PoW checks? If that happens, this hack wouldn't last much longer, and I have no further ideas for avoiding user-visible annoyances.

solid_fuel · 2025-04-13T01:35:14 1744508114

Well, if they rework things so that requests all originate from the same IP address or a small set of addresses, then regular IP-based rate limits should work fine right?

The point is just to stop what is effectively a DDoS because of shitty web crawlers, not to stop the crawling entirely.
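For reference, a "regular IP-based rate limit" can be sketched roughly like this. It is a minimal in-memory sliding-window limiter, assuming a single process and a caller-supplied clock; the class name `SlidingWindowLimiter` and its parameters are hypothetical, not taken from any particular server or proxy.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each IP."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps an IP string to the timestamps of its recent accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0):
        print(t, limiter.allow("203.0.113.7", now=t))
```

Note that the key is the address alone, so everyone who shares that address also shares the budget.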

lifthrasiir · 2025-04-13T02:49:04 1744512544

> Well, if [...], then regular IP-based rate limits should work fine right?

I'm not sure. IP-based rate limits have a well-known issue with shared public IPs, for example. Technically they are also more resource-intensive than cryptographic approaches (but I don't think that's a big issue in IPv4).

dharmab · 2025-04-13T03:07:50 1744513670

> then regular IP-based rate limits should work fine right?

These are also harmful to human users, who are often behind CGNAT and may be sharing a pool of IPs with many thousands of other ISP subscribers.

specialist · 2025-04-13T04:20:02 1744518002

> Weigh the soul of incoming HTTP requests using proof-of-work to stop AI crawlers

Based on the comments here, it seems like many people are struggling with the concept.

Would calling Anubis a "client-side rate limiter" be accurate (enough)?

runxiyu · 2025-04-13T07:28:30 1744529310

Probably not

xena · 2025-04-12T23:55:49 1744502149

OMG lol I forgot that I left that in. Hilarious. I think I'm gonna keep it.

didgeoridoo · 2025-04-13T00:12:23 1744503143

I didn’t even blink at this, my inner monologue just did a little “well, naturally” in a Redditor voice and kept reading.

mkl · 2025-04-13T00:05:48 1744502748

BTW Xe, https://xeiaso.net/pronouns is 404 since sometime last year, but it is still linked to from some places like https://xeiaso.net/blog/xe-2021-08-07/ (I saw "his" above and went looking).

xena · 2025-04-13T00:24:07 1744503847

I'm considering making it come back, but it's just gotten me too much abuse so I'm probably gonna leave it 404-ing until society is better.

IsTom · 2025-04-13T21:40:38 1744580438

Maybe there is some space on the market for a Proof of Empathy widget.

ranger_danger · 2025-04-14T20:30:56 1744662656

I have seen some projects that require acknowledging certain politically-charged statements before they will allow you to participate, like "you must agree that sovereign country X is at war with aggressor country Y".

cendyne · 2025-04-13T02:16:26 1744510586

That's what route-specific Anubis is for.

frontalier · 2025-04-13T05:13:26 1744521206

parent is referring to a different kind of abuse

1oooqooq · 2025-04-13T09:06:44 1744535204

or you're just not cranking up the required proof-of-work effort enough.

xena · 2025-04-07T19:05:58 1744052758

Do you respect robots.txt so administrators can block this tool?

canogat · 2025-04-07T19:19:09 1744053549

Should I be blocked if I ask Claude Desktop to lower the prices in all of my Craigslist ads by 10%?

randunel · 2025-04-07T19:16:26 1744053386

Do user agents doing work for users need to respect robots.txt? If yes, does chrome?

what · 2025-04-08T04:07:49 1744085269

Any scraper is also a “user agent doing work for users”. Which ones should respect robots.txt?

randunel · 2025-04-08T07:16:51 1744096611

Does the user agent fit the definition of a web crawler? If so, then observe robots.txt. This one does not, see https://en.m.wikipedia.org/wiki/Web_crawler
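For what "observing robots.txt" looks like in practice, here is a small sketch using Python's standard `urllib.robotparser`. The bot names and the example.com URL are made up for illustration, and the rules are parsed from an inline list rather than fetched, so nothing here describes how any specific tool behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an administrator might publish to block one bot.
ROBOTS_TXT = [
    "User-agent: ExampleBot",
    "Disallow: /",
]


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("ExampleBot", "https://example.com/ads/1"))
    print(may_fetch("OtherBot", "https://example.com/ads/1"))
```

Whether a given agent runs a check like this before each request is exactly the policy question being argued above; the mechanism itself is cheap.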