
Same thing happened to me and my service (https://next-episode.net) almost 2 years ago.

I wrote an HN post about it as well: https://news.ycombinator.com/item?id=26105890, but to spare you all the irrelevant details and the digging through the comments for updates - here is what worked for me. You can block all their IPs, even though they may have A LOT and can change them on each call:

1) I prepared a fake URL that no legitimate user will ever visit (like website_proxying_mine.com/search?search=proxy_mirroring_hacker_tag)

2) I loaded that URL like 30 thousand times

3) from my logs, I extracted all IPs that searched for "proxy_mirroring_hacker_tag" (which, from memory, was something like 4 or 5k unique IPs)

4) I blocked all of them (a rough sketch of steps 3 and 4 follows below)
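
Something like this, conceptually (not my actual tooling; the log path and combined log format are assumptions):

    import re

    HONEYPOT_TAG = "proxy_mirroring_hacker_tag"   # the fake search term from step 1
    LOG_FILE = "/var/log/nginx/access.log"        # assumed path and format

    ips = set()
    with open(LOG_FILE) as log:
        for line in log:
            if HONEYPOT_TAG in line:
                match = re.match(r"^(\S+)", line)  # combined log format starts with the client IP
                if match:
                    ips.add(match.group(1))

    # One deny rule per offending IP; include this file from your nginx config.
    with open("blocked_proxy_ips.conf", "w") as out:
        for ip in sorted(ips):
            out.write(f"deny {ip};\n")

    print(f"{len(ips)} unique IPs requested the honeypot URL")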

After doing the above, the offending domains were showing errors for 2-3 days and then they switched to something else and left me alone.

I still go back and check them every few months or so ...

P.S. My advice is to remove their URL from your post here. Linking to it only helps search engines pick up their domain and rank it with your content ...




Might I suggest a spin on this: instead of blocking the IPs, consider serving up different content to those IPs.

You could make a page that shames their domain name for stealing content. You could make a redirect page that redirects people to your website. Or you could make a page with absolutely disgusting content. I think it would discourage them from playing the cat-and-mouse game with you and working around your blocks by getting new IPs.
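
For example, a minimal Flask-style sketch (the IP list and domain are placeholders; if you're behind a load balancer you'd look at X-Forwarded-For instead):

    from flask import Flask, request

    app = Flask(__name__)

    PROXY_IPS = {"203.0.113.7", "203.0.113.8"}   # IPs harvested via the honeypot URL
    REAL_SITE = "https://next-episode.net"       # the legitimate domain

    SHAME_PAGE = f"""<html><body>
    <h1>This site is an unauthorized mirror.</h1>
    <p>The original lives at <a href="{REAL_SITE}">{REAL_SITE}</a>.</p>
    </body></html>"""

    @app.before_request
    def serve_shame_page_to_proxies():
        if request.remote_addr in PROXY_IPS:
            return SHAME_PAGE, 200   # returning here skips the normal view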


One possibility: Serve different content, but only if the user agent is a search engine scraper. Wait a bit to poison their search rankings, then block them.
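
As a sketch (the user-agent substrings are illustrative, not an exhaustive list):

    SEARCH_BOT_MARKERS = ("Googlebot", "bingbot", "DuckDuckBot")

    def is_search_crawler(user_agent: str) -> bool:
        # the proxy relays whatever user agent hit it, so a crawler indexing the
        # mirror shows up at your server with a bot UA from one of the proxy IPs
        return any(marker in user_agent for marker in SEARCH_BOT_MARKERS)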


... be careful with this.

Assuming you've monetized your content with ads, depending on your ads provider, this may have deleterious effects on your account with that provider, as they may then assume you're trying to game ads revenue.


The mirror is almost certainly running their own ads, given they strip the JavaScript out.


I've tried this with zip bombs, but I can't tell how well it worked out.


Wait, what? Care to elaborate on this, please?


Zip bombs are files that, when unzipped, expand to enormous sizes. I'm not sure if OP put one up to be downloaded so the offender kills their disk space, or if you could stream one hoping the client browser/scraper would attempt to decompress it and crash from memory or disk exhaustion.

That's my read on it anyway.
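
For the streaming interpretation, a sketch might look like this (sizes and framework are illustrative, not necessarily what the parent did): gzip a long run of zeros, which compresses absurdly well, and serve it with Content-Encoding: gzip, so a client that naively inflates the body balloons its memory use.

    import gzip
    import io

    from flask import Flask, Response

    app = Flask(__name__)

    def make_gzip_bomb(uncompressed_mib: int = 1024) -> bytes:
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
            chunk = b"\0" * (1024 * 1024)            # 1 MiB of zeros per write
            for _ in range(uncompressed_mib):
                gz.write(chunk)
        return buf.getvalue()                         # ~1 GiB inflates from ~1 MiB on the wire

    @app.route("/honeypot")
    def honeypot():
        return Response(make_gzip_bomb(),
                        headers={"Content-Encoding": "gzip",
                                 "Content-Type": "text/html"})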


Did the same thing for spam bots :p


> Or you could make a page with absolutely disgusting content.

Not if you value the people who might move to the real domain.


You could do this without affecting normal traffic, depending on the uniqueness of the IPs doing the scraping.

Love the idea.


I think you missed the point - if people show up at $PROXY expect nice stuff but see junk, then they won't move over to $REAL and instead blame $REAL.

E.g. you'd like some way to redirect people from $PROXY site to $REAL site, and disgusting content on $PROXY won't do that - it'll reflect poorly on $REAL


If you can identify the crawler - you can provide 'dynamic' content for that specific user context.


It's a proxy, so there's no "crawler". It's just an agent relaying to the user. Passing something to this proxy agent just passes it directly to the user.


If those IPs are VPN services, you might be negatively affecting all VPN users in addition to the proxy.


"Or you could make a page with absolutely disgusting content." You've never heard of Rule 34, have you...


obviously somebody too young to have seen the method of using an http redirect to the goatse hello.jpg for unwanted requests

edit: or when somebody embed-links your image inside some forum, replace the original filename with the contents of hello.jpg


As soon as you have a few of their IPs, look them up on ipinfo.io/1.2.3.4 and you'll find they probably belong to a handful of hosting firms. You can get each firm's entire IP list on that page and add all of those CIDRs to your block list. Saves you needing to make 30K web requests.

In most countries in the western world, there are 3-4 major ISPs and this is where 99% of your legit traffic comes from. Regular people don't browse the web proxying via hosting centres as Cloudflare will treat them with suspicion on all the websites they protect.
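
As a sketch of that grouping step (the /json endpoint is ipinfo's public API; the example IPs and CIDRs are placeholders - the real ranges you'd copy from the firms' pages):

    import ipaddress
    from collections import Counter

    import requests

    def org_for_ip(ip: str) -> str:
        # e.g. returns "AS16276 OVH SAS" for an OVH address
        return requests.get(f"https://ipinfo.io/{ip}/json", timeout=10).json().get("org", "unknown")

    harvested_ips = ["203.0.113.7", "198.51.100.23"]          # from the honeypot logs
    print(Counter(org_for_ip(ip) for ip in harvested_ips))    # which hosting firms dominate?

    # Once you have each firm's ranges, membership checks are cheap:
    blocked_networks = [ipaddress.ip_network(c) for c in ("203.0.113.0/24", "198.51.100.0/24")]

    def is_blocked(ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in blocked_networks)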


The site seems to be hosted on OVH cloud. OP should report this to them.

https://www.ovh.com/abuse/

Found the hosting information from here: https://host.io/us.to


Consider reaching out to Afraid.org first, https://freedns.afraid.org/contact/

They are the ones providing the subdomain


THIS ^^


For 2) you mean you loaded it from the adversary's proxy site, just to clarify?


Yes, I constructed the honeypot URL using the proxy site and called it (thousands of times) so I could get them to fetch it from my server through their IPs and log them.


They literally proxy your website? I thought they'd cache it... Your statement that you hit their website with a specially formatted URL makes more sense now. Since they pass that through to you, you can filter on that.

Also: since you say 4k-5k IPs... are any of them from cloud providers? Any specific location?


No cloud providers as far as I'm aware.

They were all from the same 4-5 ASN networks, all based in Russia.


If you happen to use Cloudflare.... Cloudflare -> Firewall rules -> Russia JS Challenge (or block)


Residential proxy botnet.


Why do they bother doing this domain proxy stuff in the first place?


High quality content with a good standing in Google => unique and quality impressions => more revenue from the ads they insert in the content.


There is also the potential to use it as a watering hole for more sophisticated or subversive measures where they subtly change what you post to promote something you don't actually promote (so at some point they deviate from pure proxy to mitm).


Also for (2), any worries that your own providers might imagine you're trying to mount some half-baked DOS campaign?


Wasn't really worried about that.

I didn't do it as a super quick burst, but over the space of multiple hours.

First, because the proxy servers were super slow, and second, I couldn't automate it - their servers had some kind of bot detection which would catch me calling the URLs through a script.

Instead, I installed a browser extension which would automatically reload a browser tab after a specified timeout (I set it to 10 seconds or something), opened like 50 tabs of the honeypot URL, and left them there to reload for hours ...


Look out, as this is not optimal, since they can fingerprint your browser.

But it looks like they were people with low IQ, so you were fine.


Side note: great idea for a website. This could be really helpful. You got a new user here.


I have to agree, my SO has been looking for something like this for a long time. Signing up today!


Wow, hadn't seen this before. Awesome site!


Thanks!


>4) I blocked all of them

Don't block them. Show dicks instead


Once you have their IP addresses you can make them serve anything you want. Set your imagination free.

For starters: copyright-infringing material.


Unless you hold the necessary rights to the copyrighted material, that would make you a copyright infringer yourself.


How would they prove that, when they label all content as if it were their own?


Presumably they'll tell the copyright holder that sues them where they got it from, provide evidence for that, and then the copyright holder will (also) sue the original source.


Makes me wonder if you could switch the content you serve based on the URL, so they redirect back to your website, or display images marked as copyrighted.


I tried but couldn't redirect back to my website as they stripped / rewrote all JS.


You could have a "stolen content" pure HTML/CSS banner that gets removed by Javascript. Only proxy site visitors will see the banner because the proxy deleted the Javascript.
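
A minimal sketch of that, with the wording and domain as placeholders:

    from flask import Flask

    app = Flask(__name__)

    PAGE = """<html><body>
    <div id="stolen-warning" style="background:#c00;color:#fff;padding:1em">
      This content was copied without permission. The original site is https://next-episode.net
    </div>
    <script>
      // runs for JS-enabled visitors on the real site; the proxy strips all
      // JavaScript, so its visitors keep seeing the banner
      document.getElementById("stolen-warning").remove();
    </script>
    <p>...rest of the page...</p>
    </body></html>"""

    @app.route("/")
    def index():
        return PAGE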


Some people, like me, will see the "stolen content" banner on the original website. And attackers can trivially remove it as soon as they become aware of it.


Would it be possible to hide a hash/encoded URL somewhere in JS and delete the site/redirect if the hash/encoded URL contained something unexpected?


Thanks for the advice. I will give some of these a go. P.S. I can't remove the URL as the post is not editable anymore. I'm just waking up... in Australia.


The mod can though, if you email him at hn@ycombinator.com.


8chan, like every forum ever, has dumb moderators who don't know how to do their job / overextend their hand (and the moderation position of web forums seems to attract people with certain mental disorders that make them seek out perceived microinjustices, the definition of which changes from day to day)

there were a bunch of sites mirroring 8chan to steal content

These were useful because they had a simpler / lighter / better user interface (aside from images being missing), and posts / threads that were deleted would stay on the mirrors. Being able to see deleted posts / threads was highly useful, as the moderation on such sites tends to be utterly useless and the output of a random number generator. It was hilarious reading "zigforum" instead of "8chan" in all the posts, as the mirror replaced certain words to thinly veil their operation. They even had a reply button that didn't seem to work or was just fake.

tl;dr the web is broken and only is good when "abused" by proxy/mirrors


Instead of blocking by IP, just check the SERVER_NAME/HTTP_HOST variables in your backend/web server (or, in the page's JavaScript, check window.location.hostname), and if they contain anything but the original hostname, redirect to the original website (or serve different content with a warning to the visitor). With apache2/nginx this can easily be achieved by creating a default virtualhost (which is not your website) and additionally creating your website's virtualhost explicitly. The default virtualhost can then serve a proper redirect for any other hostname.

Those variables are populated from the request the browser sends; unless the proxying server is rewriting them, your web server will be able to detect the imposter and serve a redirect. If rewrites are indeed in place, then check on the frontend. Blocking by IP is the last option if nothing else works.
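
In application terms it's a few lines - here a Flask sketch with the hostname as a placeholder; the apache2/nginx default-virtualhost setup above does the same thing at the web server layer:

    from flask import Flask, redirect, request

    app = Flask(__name__)

    REAL_HOST = "next-episode.net"

    @app.before_request
    def redirect_imposter_hosts():
        host = request.host.split(":")[0]      # Host header minus any port
        if host != REAL_HOST:
            # only works while the proxy forwards the client's Host header untouched
            return redirect(f"https://{REAL_HOST}{request.full_path}", code=301)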


As the OP mentioned, JS is stripped and URLs are rewritten, so I doubt either of those approaches will work.


Making js essential is not that hard, right? Just "display: none" on the root element, which is removed by js :)

More sophisticated options can be found in other comments.


Forcing all users of your website to use JavaScript to get around a scammer is pretty heavy-handed.


Presumably OP would only have to do this for a limited time, until the scammer gives up and moves on to an easier target. It's not the best, but I don't think it's as bad as you say.


Just explain why, in a way that vanishes with JS enabled. Like others have said, it won't need to be used for long.


most websites these days already use javascript, and all modern browsers already support it. unless you're some really niche turbonerd website, nobody is going to notice or care...


Show me one website today that really works without JavaScript.



I've been surfing without javascript since 2015. Most websites continue to work fine without it (though some aesthetic breakage is pretty standard). About 25% of sites become unusable, usually due to some poorly implemented cookie consent popup. I don't feel like I'm missing out on anything by simply refusing to patronize these sites. I will selectively turn JS on in some specific cases where dynamic content is required to deliver the value prop.


Same, I even wrote a chrome extension to enable js on the current domain using a keyboard shortcut; but it has gotten to be more of a pain especially on landing pages.


> Most websites continue to work fine without it

> About 25% of sites become unusable

These two statements seem pretty contradictory. 75% feels like a low threshold for "most."


Most is more than half.


In casual conversation, I would never interpret most as being solely more than half. However, it seems like perhaps most people agree with you :)

https://english.stackexchange.com/questions/55920/is-most-eq...


In my entirely casual understanding of English most means the set that has more members than any other. When the comparison is binary (sites that work vs sites that don't) then "more than half" is both necessary and sufficient as a definition.

When comparing more than two options most could be significantly less than half (e.g. if I have two red balls, and one ball each of blue, purple, green, orange, pink, and yellow, then the color I have the most of is red, despite representing only one quarter of the total balls.)

That said, any attribute attaining more than half of the pie must be most.


In retrospect, the never in my previous comment was certainly an overstatement. While I agree with your reasoning, there is often a distinction between technically correct use of language, and what the hearer is likely to understand from what is said.


Even JS-heavy websites are moving towards being usable without Javascript with server side rendering.


The other kind of problem is if the website is not really proxied but rather dumped, patched, and re-served. In that case the only option (if a JavaScript frontend redirect doesn't work) is blocking the dumping server by IP.

To identify the IPs, as pointed out in the root comment of this thread, you can create a one-pixel link to a dummy page, which dumping software would visit but a human wouldn't. Then you will see who visited that specific page and can block those IPs for good.
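
A minimal sketch of that trap (paths and the IP store are placeholders; a robots.txt disallow on the trap path helps keep legitimate crawlers out of it):

    from flask import Flask, request

    app = Flask(__name__)

    # embed this somewhere in your pages; no human will click a 1x1 invisible link
    HIDDEN_LINK = '<a href="/trap-page" style="display:block;width:1px;height:1px;overflow:hidden">.</a>'

    @app.route("/trap-page")
    def trap_page():
        with open("dumper_ips.txt", "a") as f:   # feed this file into your block list
            f.write(request.remote_addr + "\n")
        return "", 204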


I would think you'd want to be careful about search engines with that approach. Assuming the OP wants their site indexed, you could end up unintentionally blocking crawlers.


Tail wagging the dog is never a good answer.


It's trivial to strip that "display: none" out, too.


Yea, if they're already rewriting content to serve ads (likely, since they're probably not doing this for altruistic reasons), you're just putting off the inevitable. While blocking or captcha'ing source IPs is also a cat-and-mouse game, it's much more effective for a longer period of time.


Maybe an html <meta> redirect tag that bounces through a tertiary domain before redirecting to your real one? If they noticed you were doing it they could mitigate it, but they might deem it too much effort and just go away.

You might also start with the hypothesis that they're using regex for JS removal and try various script injection tricks...


If they're already stripping JS, I can't imagine it would be a lot of work to also remove the <meta> redirect.



