More

xboxnolifes · 2025-04-17T04:22:15 1744863735

They really aren't anything alike at all.

xboxnolifes · 2025-04-14T03:02:57 1744599777

Hosting a few higher CPU requirement video game servers would put you there. Something like 2 dedicated cores, 8-16GB RAM, and 100-200GB Disk for $30/mo.

xboxnolifes · 2025-04-13T17:59:44 1744567184

If you clean the floors 10 times less often you can.

xboxnolifes · 2025-04-13T09:08:14 1744535294

For starters, awareness of not letting yourself get so hungry in the first place before eating something? Tons of people in my own life push off eating until they're starving and then just grab the most convenient option. They aren't in an economically vulnerable position, It's self inflicted.

xboxnolifes · 2025-04-13T02:34:10 1744511650

Regular users usually aren't page hopping 10 pages per second. A regular user is usually 100 times less than that.

pabs3 · 2025-04-13T03:38:43 1744515523

I tend to get blocked by HN when opening lots of comment pages in tabs with Ctrl+click.

xboxnolifes · 2025-04-13T06:17:16 1744525036

Yes, HN has a fairly strict slow down policy for commenting. But, that's irrelevant to the context.

pabs3 · 2025-04-13T12:55:49 1744548949

I meant to say article pages not comment pages, but ack.

xboxnolifes · 2025-04-13T02:31:36 1744511496

> As long as a given client is not abusing your systems, then why do you care if the client is a human?

Well, that's the rub. The bots are abusing the systems. The bots are accessing the contents at rates thousands of times faster and more often than humans. The bots also have access patterns unlike your expected human audience (downloading gigabytes or terabytes of data multiples times, over and over).

And these bots aren't some being with rights. They're tools unleashed by humans. It's humans abusing the systems. These are anti-abuse measures.

immibis · 2025-04-13T09:07:55 1744535275

Then you look up their IP address's abuse contact, send an email and get them to either stop attacking you or get booted off the internet so they can't attack you.

And if that doesn't happen, you go to their ISP's ISP and get their ISP booted off the Internet.

Actual ISPs and hosting providers take abuse reports extremely seriously, mostly because they're terrified of getting kicked off by their ISP. And there's no end to that - just a chain of ISPs from them to you and you might end with convincing your ISP or some intermediary to block traffic from them. However, as we've seen recently, rules don't apply if enough money is involved. But I'm not sure if these shitty interim solutions come from ISPs ignoring abuse when money is involved, or from not knowing that abuse reporting is taken seriously to begin with.

Anyone know if it's legal to return a never-ending stream of /dev/urandom based on the user-agent?

zinekeller · 2025-04-13T09:44:36 1744537476

> Then you look up their IP address's abuse contact, send an email and get them to either stop attacking you or get booted off the internet so they can't attack you.

You will be surprised on how many ISPs will not respond. Sure, Hetzner will respond, but these abusers are not using Hetzner at all. If you actually studied the actual problem, these are residential ISPs in various countries (including in US and Europe, mind you). At best the ISP will respond one-by-one to their customers and scan their computers (and at this point the abusers have already switched to another IP block) and at worst the ISP literally has no capability to control this because they cannot trace their CGNATted connections (short of blocking connections to your site, which is definitely nuclear).

> And if that doesn't happen, you go to their ISP's ISP and get their ISP booted off the Internet.

Again, the IP blocks are rotated, so by the time that they would respond you need to do the whole reporting rigomarole again. Additionally, these ISPs would instead suggest to blackhole these requests or to utilize a commercial solution (aka using Cloudflare or something else), because at the end of the day the residential ISPs are national entites that would quite literally trigger geopolitcal concerns if you disconnected them.

immibis · 2025-04-13T14:24:09 1744554249

These the same residential providers that people complain cut them off for torrenting? You think they wouldn't cut off customers who DDoS?

op00to · 2025-04-13T14:58:45 1744556325

They’re not cutting you off for torrenting because they think it’s the right thing to do. They’re cutting you off for torrenting because it costs them money if rights holders complain.

zinekeller · 2025-04-14T04:51:04 1744606264

> They’re cutting you off for torrenting because it costs them money if rights holders complain.

Yup, I'm assuming that immibis thinks that the ones using Anubis are those ones with high legal budgets, but this is not necessarily the case here.

fc417fc802 · 2025-04-14T04:20:01 1744604401

If it's a cable company then there's also a conflict of interest.

zinekeller · 2025-04-14T05:04:07 1744607047

> These the same residential providers that people complain cut them off for torrenting?

Assume that you are in the shoes of Anubis users. Do you have a reasonable legal budget? No? From experience, most ISPs would not really respond unless either their network has become unstable as a consequence, or if legal advised them to cooperate. Realistically, at the time that they read your plea the activity has already died off (on their network), and the best that they can do is to give you the netflows to do your investigation.

> You think they wouldn't cut off customers who DDoS?

This is not your typical DDoS where the stability of the network links are affected (this is at the ISP level, not specifically your server), this is a very asymmetrical one where it seemingly blends out as normal browsing. Unless you have a reasonable legal budget, they would suggest to use RTBH (https://www.cisco.com/c/dam/en_us/about/security/intelligenc...) or a commercial filtering solution if need be. This even assumes that they're symphatetic to your pleas, at worst case you're dealing with state-backed ISPs that are known not to respond at all.

bayindirh · 2025-04-13T10:18:38 1744539518

When I was migrating my server, and checking logs, I have seen a slew of hits in the rolling logs. I reversed the IP and found a company specializing in "Servers with GPUs". Found their website, and they have "Datacenters in the EU", but the company is located elsewhere.

They're certainly positioning themselves for providing scraping servers for AI training. What will they do when I say that one of their customers just hit my server with 1000 requests per second? Ban the customer?

Let's be rational. They'll laugh at that mail and delete it. Bigger players use "home proxying" services which use residental blocks for egress, and make one request per host. Some people are cutting whole countries off with firewalls.

Playing by old rules won't get you anywhere, because all these gentlemen took their computers and work elsewhere. Now we all have are people who think they need no permission because what they do is awesome, anyway (which is not).

immibis · 2025-04-13T14:20:19 1744554019

A startup hosting provider you say - who's their ISP? Does that company know their customer is a DDoS-for-hire provider? Did you tell them? How did they respond?

At the minimum they're very likely to have a talk with their customer "keep this shit up and you're outta here"

sussmannbaka · 2025-04-13T09:12:05 1744535525

Please, read literally any article about the ongoing problem. The IPs are basically random, come from residential blocks, requests don’t reuse the same IP more than a bunch of times.

immibis · 2025-04-13T14:23:29 1744554209

Are you sure that's AI? I get requests that are overtly from AI crawlers, and almost no other requests. Certainly all of the high-volume crawler-like requests overtly say that they're from crawlers.

And those residential proxy services cost their customer around $0.50/GB up to $20/GB. Do with that knowledge what you will.

mrweasel · 2025-04-13T16:31:24 1744561884

> Then you look up their IP address's abuse contact, send an email

Good luck with that. Have you ever tried? AWS and Google have abuse mails. Do you think they read them? Do you think they care? It is basically impossible to get AWS to shutdown a customers systems, regardless of how much you try.

I believe ARIN has an abuse email registered for a Google subnet, with the comment that they believe it's correct, but no one answer last time they tried it, three years ago.

47282847 · 2025-04-13T22:38:23 1744583903

ARIN/Internet registries doesn’t maintain these records themselves, owners of IP netblocks do. Some registries have introduced mandatory abuse contact information (I think at least RIPE) and send a link to confirm the mailbox exists.

The hierarchy is: abuse contact of netblock. If ignored: abuse contact of AS. If ignored: Local internet registry (LIR) managing the AS. If ignored: Internet Registry like ARIN.

I see a possibility of automation here.

Also, report to DNSBL providers like Spamhaus. They rely on reports to blacklist single IPs, escalate to whole blocks and then the next larger subnet, until enough customers are affected.

bbor · 2025-04-13T03:59:23 1744516763

Well, that's the meta-rub: if they're abusing, block abuse. Rate limits are far simpler, anyway!

In the interest of bringing the AI bickering to HN: I think one could accurately characterize "block bots just in case they choose to request too much data" as discrimination! Robots of course don't have any rights so it's not wrong, but it certainly might be unwise.

inejge · 2025-04-13T04:17:02 1744517822

> Rate limits are far simpler, anyway!

Not when the bots are actively programmed to thwart them by using far-flung IP address carousels, request pacing, spoofed user agents and similar techniques. It's open war these days.

parineum · 2025-04-13T05:40:10 1744522810

Request pacing sounds intentionally unabusive.

j16sdiz · 2025-04-13T07:56:36 1744530996

It is not bringing down your server, but they are taking 80%+ of your bandwidth budget. Does this count as abuse?

immibis · 2025-04-13T09:09:13 1744535353

Are you at a hoster with extortionately expensive bandwidth, such as AWS, GCP, or Azure?

ithkuil · 2025-04-13T08:43:08 1744533788

Isn't that what a rate limiter would address?

mkl · 2025-04-13T09:02:19 1744534939

Not when the traffic is coming from 10s of thousands of IP addresses, with very few requests from each one: https://drewdevault.com/2025/03/17/2025-03-17-Stop-externali...

KronisLV · 2025-04-13T09:57:28 1744538248

That very much reads like the rant of someone who is sick and tired of the state of things.

I’m afraid that it doesn’t change anything in of itself and any sorts of solutions to only allow the users that you’re okay with are what’s direly needed all across the web.

Though reading about the people trying to mine crypto on a CI solution, it feels that sometimes it won’t just be LLM scrapers that you need to protect against but any number of malicious people.

At that point, you might as well run an invite only community.

bayindirh · 2025-04-13T10:24:03 1744539843

Source Hut implemented Anubis, and it works so well. I mostly never see the waiting screen. And after it whitelists me for a very long time, so I work without any limitations.

KronisLV · 2025-04-13T10:35:55 1744540555

That’s great to hear and Anubis seems cool!

I just worry about the idea of running public/free services on the web, due to the potential for misuse and bad actors, though making things paid also seems sensible, e.g. what was linked: https://man.sr.ht/ops/builds.sr.ht-migration.md

ithkuil · 2025-04-14T10:03:24 1744625004

ok, but my answer was about was how to react to request pacing.

If the abuser is using request pacing to make less request then that's making the abuser less abusive. If you're still complaining that request pacing is not pacing the requests down enough because the pacing is designed to just not bring your server down and instead make you consume money, then you can counteract that just by tuning the rate limiting even further down.

The 10s of thousands distinct IP address is another (and perfectly valid) issue, but it was not the point I answered to.

rollcat · 2025-04-13T07:47:54 1744530474

It's called DDoS. DDoS is abusive.

xboxnolifes · 2025-04-13T02:24:09 1744511049

> just sit down. nobody will know...

People don't pee standing because of some sense of masculinity or something. It's just more convenient. Only those over 50 or so by now grew up with that stigma.

tgsovlerkhgsel · 2025-04-13T02:35:19 1744511719

It's only more convenient if someone else does the cleaning.

xboxnolifes · 2025-04-13T06:18:39 1744525119

Without being too crude, there's still cleaning to be done even after sitting down. Body hair doesn't pick itself up.

fsckboy · 2025-04-15T02:43:12 1744684992

even in spring with my winter coat, i don't shed much, but your point is well taken

fsckboy · 2025-04-13T03:04:22 1744513462

unless the flag of this enlightened masculinity you are flying means "real men leave the door open", it is still true, nobody will know.

xboxnolifes · 2025-04-13T18:01:33 1744567293

There's no flag. Saying "nobody will know" in thus context implies that one would care if they did know. Thus, my response.

xboxnolifes · 2025-04-13T02:23:29 1744511009

> just sit down. nobody will know...

People don't stand pee because of some sense of masculinity or something. It's just more convenient.

xboxnolifes · 2025-04-11T03:30:15 1744342215

It's not provably correct if the comment is made toward 2025 models.

simonw · 2025-04-11T04:35:59 1744346159

Gemini 2.5 came out just over two weeks ago (25th March) and is a very significant improvement on Gemini 2.0 (5th February), according to a bunch of benchmarks but also the all-important vibes.

xboxnolifes · 2025-04-10T19:29:18 1744313358

If this isn't what you meant, then what did you mean? To me, it's exactly how I read what you said.

namaria · 2025-04-10T19:54:35 1744314875

I am sorry but that's nonsense.

I quoted the paper "Evolution through Large Models" written in collaboration between OpenAI and Anthropic researchers

"In other words, the model learns to predict plausible changes to code from examples of changes made to code by human programmers."

https://arxiv.org/pdf/2206.08896

> The idea that models can only write code if they've seen code that does the exact same thing in the past

How do you get "code that does the exact same thing" from "predicting plausible changes?"

simonw · 2025-04-10T20:18:41 1744316321

That paper describes an experimental diff-focused approach from 2022. It's not clear to me how relevant it is to the way models like Claude 3.7 Sonnet (thinking) and o3-mini work today.

namaria · 2025-04-10T21:02:11 1744318931

If do not you think past research by OpenAI and Anthropic on how to use LLMs to generate code is relevant to how Anthropic LLMs generate code 3 years later I really don't think it is possible to have a reasonable conversation about this topic with you.

simonw · 2025-04-10T21:34:50 1744320890

Can we be sure that research became part of their mainline model development process as opposed to being an interesting side-quest?

Are Gemini and DeepSeek and Llama and other strong coding models using the same ideas?

Llama and DeepSeek are at least slightly more open about their training processes so there might be clues in their papers (that's a lot of stuff to crunch through though).