My website is being stolen in real time and I don't know what to do
90 points by joeyjones on Feb 14, 2014 | 97 comments
I launched the site http://altexplorer.net at the start of January as a Block Explorer and information hub for alternative cryptographic currencies. This morning I found a site http://4co.in which is ripping off my site in real time; every time a page is loaded on 4co.in it uses PHP to load the corresponding page from http://altexplorer.net, removes analytics and ad tags, replaces the site name, and replaces the link URLs.

I've put a lot of effort into building this site and keeping it running, and now someone in India is stealing it in real-time. Every page load to 4coin causes an identical page load in the nginx logs of http://altexplorer.net. What can I do besides blocking the source IP address to stop this?

Screen shots: Alt Explorer home page: https://d1eem2029tdth0.cloudfront.net/img/altexplorer-home.png

4coin home page: https://d1eem2029tdth0.cloudfront.net/img/4coin-home.png

Alt Explorer profitability page: https://d1eem2029tdth0.cloudfront.net/img/altexplorer-prof.png

4coin profitability page: https://d1eem2029tdth0.cloudfront.net/img/4coin-prof.png




Lot of good suggestions already. I am not sure if you are interested in contacting the perpetrator directly and asking them to stop this but I did a little research for you.

looking up the whois info, it says that the registrant's email was bgrf@ymail.com

When I put this email in google, I came across another spammy site called baklinks.blogspot.com. This site asks you to swap back links. At the bottom of the blog post, I found the name of the person "Naveen K R"

I then looked up google with "Naveen K R + bgrf". I was able to find a site he (probably) runs called www.zokali.com

More googling combos, I finally found his linkedin profile and his name "Naveen K Ramanand"

https://www.linkedin.com/in/krnaveen.

Maybe you can contact this guy directly. It seems like he is the one doing this, or at least he knows who is.


People usually use the same username, too, so using the LinkedIn username on Twitter: https://twitter.com/krnaveen. There's this tweet from November:

  I started to earn money on 4co.in short links. It’s easy -
  make the short links and earn the biggest money. http://4co.in


Thanks, I hadn't thought of trying to find the person behind it. This is verified by checking a cached version of 4co.in with Google: http://webcache.googleusercontent.com/search?q=cache:Gfp7QK3... as the username for the "Hello World!" post is "krnaveen".


Extending that investigation, this Google Plus profile shares the same name and a similar face. https://plus.google.com/108991124177710449415/about


You seem to have a particular set of skills


If you end up trying to block his IP, don't just DROP or REJECT his packets. TARPIT [1] them! This way not only would you be denying him access, but you would also be draining his resources.

Another thing to try is to see just how much data his server will take. See if you can send him a GB-sized response.

[1] http://www.netfilter.org/projects/patch-o-matic/pom-external...


Please try contacting them directly and simply asking to stop before doing this!


Because it could just be doing this by mistake, right?

Dan, I need to show that I'm over 18 on a couple of sites, and they want me to verify by entering a credit card number. But I pay everything in cash, so I don't have any credit cards. Could you email me your card #s, please?


The javascript solution has already been suggested, but take a step back and think about it: the same way the leech worked out your links, domain name, logo and all the stuff that brands your website, he can easily figure out the simple JS code suggested here.

<img src="x" onerror= "if(document.location.hostname==='4co.in')document.location='//xxxxxx.xxxx';">

So I say, go a step further:

- do not send his users to a black hole, instead show a banner warning them about the leech and then after a few seconds redirect the user to your website.

- The JS code for the above should go in the same JS file that provides core functionality to your website. Once that's done, run your JS through http://closure-compiler.appspot.com/home or, better still, install the yuicompressor CLI (http://yui.github.io/yuicompressor/) on your machine. The resulting code will be minified/compressed and seriously obfuscated, so trying to defeat it will take the leech hours if not days, depending on his experience.

- encode/obfuscate the warning string (1st topic) to make it harder to find within the JS code.

- and finally do a daily spot check on the website, following @jarrett's comment below


You found out the right first step yourself: Block the source IP address. Sure it will turn into a game of whack-a-mole with them changing their IP but eventually, their customers will get fed up with their downtime.

Second idea: JavaScript redirect all of your pages to your own subdomain. Again, it's just a step in an arms race, but this would be a little too hard/expensive to take to court. You can win an arms race if you try.


If you have a hard time determining their IP, here's a trick that might work. Visit their site with a unique but innocuous-looking path or query that would never be accessed by a normal user. For example:

http://4co.in/?q=1

If the query string is being passed through, which I suspect it is, you can use the query string to easily locate the corresponding entry in your own logs. Or, if the query string isn't being passed through, you can use a path instead:

http://4co.in/q

You probably already thought of this technique. I decided to post it anyway in case you hadn't, or in case anyone else is facing a similar challenge.


Building on this, you could create a script on your server that requests a random url, then greps that url in your logs to figure out the IP and then add that to the banned list. It'll be an auto banning machine!
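A rough Python sketch of that auto-detection idea, for anyone who wants to try it. It assumes the access log uses the usual layout (client IP first, request line in quotes). `make_marker` and `find_requesters` are hypothetical helper names; fetching the marker URL from the copycat site and feeding the result into a ban list are left out.

```python
import re
import secrets

# Leading client IP and request path of an access log line
# (assumed layout: ip - - [time] "METHOD path PROTOCOL" ...).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*"'
)

def make_marker() -> str:
    """Return an innocuous-looking query path that no real visitor would request."""
    return "/?q=" + secrets.token_hex(8)

def find_requesters(log_lines, marker_path):
    """Return the set of client IPs that requested exactly marker_path."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == marker_path:
            ips.add(match.group("ip"))
    return ips
```

The idea would be to request the copycat domain plus the marker path, wait a moment, then run `find_requesters` over the tail of your own log. Whatever IP shows up is the one relaying the request, provided the query string is passed through.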


countermove: he works it out and blocks your magic query - query needs to be made from somewhere indistinguishable from normal requests. Changing IP would be best... maybe via tor.

A unique sequence of legitimate requests might be more difficult for the other side to detect and it won't result in 404s. You could randomise the sequence, and each request can come from a different IP as long as they are synchronised properly.


> countermove: he works it out and blocks your magic query

He very well might. But my estimation of the thief's skills is low. I could be wrong, of course.

> A unique sequence of legitimate requests might be more difficult to for the other side to detect and it won't result in 404s. Could randomise the sequence and each can come from a different IP as long as they were synchronised properly.

That's probably the best bet. A legitimate but very winding path through actual links on the site would work quite well. Given enough steps, it would almost certainly be unique. Because you'd be varying the path each time, the thief would find it hard or impossible to block you.


Clever solution! I like it


The more subtle response is to feed them bad data until they can't trust you.


Bad data would have my vote. It's nasty, subtle, harder to detect, and it would undermine people's trust in the culprit.


For now I have added a news post with a link to the proper site and am debating between blocking the IP or delivering a static page with a link to the proper URL or a javascript redirect to the proper site.


I'd do a javascript redirect so you get the ad-views and the users of the site see the new URL.

(for now)


interesting that they didn't change the donation addresses. so if someone uses theirs, and likes it, sends some BTC to them, it will go to you?


Yup, donations should still go to me.


Why do you jump to the assumption that they're malicious then?

Put a few more branding elements into your page design, so that visitors to their site will understand that they're redirecting your data. Then take advantage of the extra publicity that they're giving you.


Having a duplicate site would do a lot of damage to my Google search rankings.


Not sure if he's stripping it, but serving a 'canonical' link tag (rel="canonical") would solve that if he's not.


Detect their IP and 301 their requests to goatse. Or something worse, if you're bent like that. :)


Why do that to people who probably don't know 4coin is being a thief?


The point is to make them stop going to 4co.in at all. No credibility = no traffic = dead site. And hopefully the thief learns a lesson.


goatse as it was is no more. They had planned to offer vanity email addresses, but I am not sure if it took off. It looks like they're doing something with dogecoin now.

But an image search should help you find the image.


Don't punish users. The goal here shouldn't be to silently redirect or deceive them with fake data or throw up goatse.

Instead, make it annoyingly clear to anyone that visits 4co.in that the content is stolen. 4co.in users aren't visiting 4co.in to spite you. They just don't know and will gladly use your website instead.

The game of whack-a-mole is strongly in your favor because you're on the right side of a trapdoor.


<img src="x" onerror= "if(document.location.href!='http://altexplorer.net')document.location='//goatse.cx';">


This won't work as it seems they're replacing any mention of "altexplorer" with "4coin"


Oh,

<img src="x" onerror= "if(document.location.hostname==='4co.in')document.location='//goatse.cx';">

Maybe like this then?


<img src="x" onerror="eval(atob('aWYoZG9jdW1lbnQubG9jYXRpb24uaHJlZiE9J2h0dHA6Ly9hbHRleHBsb3Jlci5uZXQnKWRvY3VtZW50LmxvY2F0aW9uPScvL2dvYXRzZS5jeCc7'))" />


Question answered. Topic closed.


You, sir, win.


but 4coin can add some simple code (e.g. a regex) just to remove that line from the scraped content before sending it to the user though.


You could defeat regex by sending a different base64 string each time by including a call to void(server side random number).
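Here is a minimal sketch of how the per-request variation might be generated server side, in Python. It assumes the page is rendered by something that can call a helper per response. `obfuscated_snippet` and `decode_snippet` are made-up names, and the example payload points back at the real site rather than anywhere unpleasant.

```python
import base64
import secrets

def obfuscated_snippet(js: str) -> str:
    """Wrap js in an <img onerror> tag, base64-encoded behind a random no-op prefix."""
    nonce = secrets.randbelow(10**9)       # the "server side random number"
    payload = "void(" + str(nonce) + ");" + js
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return '<img src="x" onerror="eval(atob(\'' + encoded + '\'))" />'

def decode_snippet(tag: str) -> str:
    """Recover the JavaScript from a tag produced by obfuscated_snippet (for checking)."""
    encoded = tag.split("atob('", 1)[1].split("'", 1)[0]
    return base64.b64decode(encoded).decode("utf-8")

if __name__ == "__main__":
    js = ("if(document.location.hostname!=='altexplorer.net')"
          "document.location='http://altexplorer.net/';")
    tag = obfuscated_snippet(js)
    print(tag)
    print(decode_snippet(tag))
```

One caveat: for nonces with the same number of digits, everything after the first few base64 characters comes out identical, so the tail of the string is still a stable target. That is the weakness raised in the reply below.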


yes, you can, but since the core of the above line is unchanged (because you need the functionality of that line), they can still regex the core part (just not the whole line, since there are some random chars or random numbers you put in)


Look for either the php user agent and/or the source ip. Why not use mod_redirect or something and redirect him to some bizarre internet meme site? I would suggest tub girl or goatse. It will get the point across very loud and clear. Or, just serve a different copy of your site to him that makes it loud and clear what he is doing is not ok. Either way, you can use mod_rewrite to cause this guy agony and prevent him from perpetrating this.


I noticed that OP put up a link to the legitimate site. How about serving a version of the site that redirects to the corresponding page on your own?


recommendation: respond with fake data, based on source IP. the problem will take care of itself.


this is probably better than banning their source ip, as it will take longer to detect and piss off their customers.

also, report them to adsense and anyone else serving their ads.


Gigabytes of fake data.

Let them eat /dev/urandom to their heart's content.


> Let them eat /dev/urandom to their heart's content.

No! You can't just give them purely random data. No, sir. That would be easy enough to detect.

What you need is plausible randomness. Shift the value of every transaction by a small percent. Trending everything downward over time, but making it plausible, would be far more entertaining with random upward trends. Best buy now before it gets too expensive! Oh, I'm sorry? That wasn't the actual price? Well, you'd best use a reputable source!

If you're going to poison the well, you don't want to be caught. You want them to wonder at what point their data set diverged and for how long they've been serving incorrect data. Sinister points for interspersing legitimate data with munged data.

The trick with being evil in this case is to be subtle about it. They want to scrape all your metrics? Let them. You just can't guarantee the accuracy of the data they're scraping, right? [wink, wink]
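One way to read "plausible randomness" is a small, bounded percentage shift per value. A minimal Python sketch, assuming the metrics are plain floats and that this would only be applied to responses going to the scraper's IP; `perturb`, `max_pct` and `seed` are made-up names, not anything from the actual site.

```python
import random

def perturb(values, max_pct=2.0, seed=None):
    """Shift each value by a random percentage in [-max_pct, +max_pct].

    A fixed seed makes the shifts repeatable, so the same page can be
    served with the same altered numbers on every request.
    """
    rng = random.Random(seed)
    return [
        round(v * (1 + rng.uniform(-max_pct, max_pct) / 100), 8)
        for v in values
    ]

if __name__ == "__main__":
    real = [100.0, 250.5, 0.0421]
    print(perturb(real, max_pct=2.0, seed=42))
```

Keeping `max_pct` small is what makes the result hard to spot. Passing untouched values through for some requests would give the "interspersed with legitimate data" variant.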


I am going to look into this later today through nginx. I am planning on having every request from their scraping IP return a static page linking to the proper site.


Excellent. Keep us posted. Good luck.


Yes...yes...yes.


Could we make him pay a few bucks?

Specifically, can we multiply his traffic? I wonder what exactly he is doing with request headers... maybe this could work:

1) set up page /fluffy with wildly compressible contents, say 50MB of $£€$£€$£€$£€$£€.., and always force gzip encoding

2) set up a few bots (amazon?) to download that page from his site, but do not accept any compression

Start the attack at a time when the guy is probably sleeping; it might go on for a few hours before he notices, costing him a couple of hundred bucks in bandwidth.

Or maybe just some cpu waste in same vein: the guy has to open the gzip before forwarding to do string replace and re-zip it afterwards, so you can make sure that the content REALLY balloons..


Instead of blocking source IP. Detect and send "unwanted information".


I agree, this may be a more effective approach than trying to block the IP and the whole whack-a-mole issue.

Essentially, they trust the data you're providing and are trying to make a buck off that info. But if they lose that trust because they don't know whether the data is legit or not, you win!

I would also try to mask the fact that the data is not accurate. If they immediately see everything simply zeroed out, it would be a huge red flag that you're on to them. If you provide them ALMOST correct data, it would be harder for them to determine what's going on, and their users will realize the disparity and (hopefully) get burned and never come back.

Essentially, the trick is to destroy the site's credibility so there's no financial benefit to continue to steal from you.

Good Luck!


This is a great approach.

I'd also recommend only borking SOME of the data - an intermittent bug is harder to fix than a consistent one!


You can use javascript frame busting techniques to redirect back to the main page. You can also use mod_rewrite or some proxy setups to make it so a completely different set of pages shows up for people coming from that site. This is better than just blocking it because it's a bit more subtle and lets you tell that site's users what's happening.


This exact same thing happened to me a couple years ago.

This is how I got it resolved within a day:

http://pzxc.com/internet-is-still-wild-west


I dealt with a somewhat similar situation a while back: https://news.ycombinator.com/item?id=4291454

I issued a DMCA takedown notice to their host and it was taken care of in a couple of days. I suggest doing the same.


If you have time, go to war.

Have a page that spits the IP/hostname of referrer in a hidden section. Using that you can identify the IP/hostnames, so if he changes, you can always detect it.

Now that you can detect him, when he crawls your site, feed him garbage info for every single page, then constantly check his page for the hidden ip/hash in case he changes his IP/host. Hide that in a minified js. You can also feed his page bogus links that violate Google's SEO guidelines so he can get blacklisted.
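A small Python sketch of the hidden-marker part, under the assumption that the requester is on IPv4 and that a lightly obfuscated token in the markup is enough. `KEY`, `ip_to_token`, `token_to_ip` and `hidden_marker` are all hypothetical names, and the XOR step is only obfuscation, not encryption.

```python
import ipaddress

KEY = bytes.fromhex("5ac3917e")  # hypothetical 4-byte obfuscation key

def ip_to_token(ip: str) -> str:
    """Turn a dotted IPv4 address into an 8-character hex token."""
    packed = ipaddress.IPv4Address(ip).packed
    return bytes(b ^ k for b, k in zip(packed, KEY)).hex()

def token_to_ip(token: str) -> str:
    """Reverse ip_to_token: recover the dotted IPv4 address."""
    raw = bytes.fromhex(token)
    return str(ipaddress.IPv4Address(bytes(b ^ k for b, k in zip(raw, KEY))))

def hidden_marker(ip: str) -> str:
    """Markup to embed in each served page, carrying the requester's token."""
    return '<span style="display:none" data-build="' + ip_to_token(ip) + '"></span>'

if __name__ == "__main__":
    marker = hidden_marker("162.222.227.123")
    print(marker)
    print(token_to_ip(ip_to_token("162.222.227.123")))
```

Loading any page on the copy and decoding the token in its source would then show which address fetched it from the real site, even after the scraper moves hosts.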


The thing is that he isn't scraping the site ahead of time, he scrapes for content in real time. When a request is made to 4co.in he requests the corresponding page from altexplorer.net, does string replacement on the site name and url, and then outputs it to his users.


First post here at HN... but I would try a shame tactic (per codegeek's helpful name research). In a nice bright box just above your normal content, send the following text back to his IP address ...

"Hello, my name is <insert his name here once you are certain> and I've stolen the content that you are viewing right now -- someone's hard work. I stole it in a very intentional and fairly disrespecful way. Sometimes we get life lessons and this may well be one of mine. Instead of using my skills to do good with the precious time that I have in this beautiful world, I've chosen to write a fairly nefarious script to copy every single page of someone else's website and suck it back into my website, so that I can profit from someone else's work. The message you are reading right now may go away for a day or two, if I change my IP address. But rest assured, it will be back once my IP address is rediscovered. This event will also follow me forever on search engines when people search my name -- future employers, friends, family. I have been doing this for <x> days and have been asked to stop. I haven't yet, but time will tell.... (<insert-pretty-date-here>)

In the meantime, if you would like to visit the real website go <here>..."


The JavaScript frame busting methods are not the right approach; you have no control over what his users see. There is no reason he can't filter out any JavaScript or other HTML. In fact he might not even display your live HTML. He might have copied it to make his page templates and be scraping just the data from your site; you just don't know. If he isn't doing this now, he will if he gets in an arms race with you.

You need to return bad data to his site by IP address and possibly user-agent. Don't make the data bad to mess with the users, just make it return unusable data, for example all numbers are zeros. Then what you do is make a scheduled task that scrapes his website (using his domain name). If you start getting HTTP requests in your logs that correspond to the scheduled job you created, then you add the new requesting IP to the blacklist of funny data, then make a second request to his website to validate the IP you blacklisted. You could set up your scraping tool to use random Tor exit nodes and cycle the user-agent info.

He could do the same (random ips) but might not... Really you need some type of accountability which you can never have on a public website but requiring registration/authentication would help some if it becomes that important to you.


Sample log excerpt:

162.222.227.123 - - [14/Feb/2014:18:18:48 +0000] "GET / HTTP/1.1" 200 23271 "-" "-" "162.222.227.123"

162.222.227.123 - - [14/Feb/2014:18:37:51 +0000] "GET /chain/42 HTTP/1.1" 200 76170 "-" "-" "162.222.227.123"

162.222.227.123 - - [14/Feb/2014:17:40:58 +0000] "GET /block/0e67dcf5f6797840a98061af7581138f2347feb168d78f7138d4268c6f854748 HTTP/1.1" 200 15719 "-" "-" "162.222.227.123"

162.222.227.123 - - [14/Feb/2014:18:38:21 +0000] "GET /tx/6c636ebff9674f4168b80b415f8a9097509802992b0422a4fa98c543da9c068e HTTP/1.1" 200 15898 "-" "-" "162.222.227.123"

162.222.227.123 - - [14/Feb/2014:17:41:05 +0000] "GET /address/GRjc357hnC7THEUPVJmpMmCjSAGn54CJnx HTTP/1.1" 200 14034 "-" "-" "162.222.227.123"

162.222.227.123 - - [14/Feb/2014:18:13:21 +0000] "GET /news HTTP/1.1" 200 16675 "-" "-" "162.222.227.123"

162.222.227.123 - - [14/Feb/2014:18:19:12 +0000] "GET /profitability HTTP/1.1" 200 188354 "-" "-" "162.222.227.123"


Since the faker's requests don't have a User-Agent, you could block all requests lacking a valid User-Agent HTTP header.
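To see how many requests that rule would catch before turning it on, the log can be scanned offline. A Python sketch, assuming lines shaped like the excerpt above (referer and user agent as the two quoted fields after the byte count); `lacks_user_agent` is a hypothetical helper name.

```python
import re

# ip - - [time] "request" status bytes "referer" "user-agent" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def lacks_user_agent(line: str) -> bool:
    """True if the line parses and its user-agent field is empty or '-'."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # unparseable lines are skipped rather than flagged
    return match.group("agent") in ("", "-")

if __name__ == "__main__":
    sample = ('162.222.227.123 - - [14/Feb/2014:18:18:48 +0000] '
              '"GET / HTTP/1.1" 200 23271 "-" "-" "162.222.227.123"')
    print(lacks_user_agent(sample))
```

Counting the flagged lines per IP first is a cheap way to check that no legitimate clients would be caught, since some monitoring tools and feed readers also send no User-Agent.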


Use imagemagick to watermark all image requests on the fly so you can keep changing the position of a url watermark on all images.

edit - actually, don't do this as it is trivially easy to get around by doing 2 or 3 requests and keeping anything that hasn't changed.

Or if you do do this, add a low level noise filter on top so that the attacker can't just directly equate pixel values.


Currently 4co.in is showing this:

--- Site is down!

Sorry everyone! i really apologize for what happend!!

It all happend because of my silly mistake and misconfiguration and it was affected for at max 10hrs.

Instead of making a scene somebody would have contacted me!

Now i understand the risks of live development. It was not my intension to steal anything. ---


Before you react, try to estimate how much money this is costing you, then determine how much money you're willing to spend to combat the problem. Try to keep the costs of your response in line with the damage inflicted.


I think mentioning the short URL provider auto-killed fragmede's comment, and my copy&paste of it. Here goes again:

fragmede's comment below is [dead], but has very good advice.

---

Nice bit of news you added to the top, which 4co.in is putting on their own site.

One piece of advice though: Drop the short link and link directly to altexplorer.net, otherwise it looks like 4co.in was 'hacked' and the short link is a phishing/some other sort of scam and not legit.

You should be able to pick up the 4co.in domain as the referrer if you want metrics for how many people were using 4co.in.

---


I think they have to use a short-url or it gets re-written. He might be able to ajax in a page from the site (using the short-url to get it so it's not re-written) that contains the real URL, as I doubt they are injecting something to re-write what is added via JS.

[0] > This won't work as it seems they're replacing any mention of "altexplorer" with "4coin"

[0] https://news.ycombinator.com/item?id=7240554


True, I didn't think it through before commenting.


I think the text of "altexplorer" is being replaced automatically, hence the need for a shortener.


Report them to their web host and the ad networks they use. Don't troll them with different content just go for the kill - some accounts like AdSense carry lifetime bans.


I have reported it to the abuse contact of the web host, colocation provider, and cloudflare. I am however not expecting a response in a timely manner from any of them.

Luckily they are stripping out the ad tags before displaying my site, so it shouldn't affect AdSense.


Do you know who their web host is? If so, block the whole Class B from fetching your site.


Class B? What is this, 1992?


You're right. Thanks for keeping me honest.

Don't just block the /24 ("Class C" IP block), block the /16 ("Class B" IP block) the IP resides in. You'll reduce your audience in India, but they'll have to switch webhosts to continue leeching your site.

http://www.oav.net/mirrors/cidr.html
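For a sense of scale, Python's standard `ipaddress` module can show what each prefix covers. This sketch uses the address from the log excerpt posted elsewhere in the thread; `enclosing_network` is a hypothetical helper name.

```python
import ipaddress

def enclosing_network(ip: str, prefix_len: int):
    """Return the network of the given prefix length that contains ip."""
    # strict=False lets us pass a host address instead of a network address.
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)

scraper = "162.222.227.123"
net24 = enclosing_network(scraper, 24)
net16 = enclosing_network(scraper, 16)

print(net24, net24.num_addresses)  # 162.222.227.0/24 256
print(net16, net16.num_addresses)  # 162.222.0.0/16 65536
```

So the /16 blocks 256 times as many addresses as the /24, which is where the trade-off between hitting the host and hitting legitimate visitors comes from.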


I would contact the other site first and find out WTF. It is unlikely, but they might have a good reason for it. If they are just trying to rip you off, solution might be as simple as just asking them to stop.


The thing is that the coin information isn't unique and the block, tx, address, etc info is all on the block chain. The unique content is the profitability and historic information, but the charting is broken in what they scrape.


Problem is contacting him first tips him off. If you build a technological solution, you can reuse it if this ever happens again.


You should be able to apply behavioural detection here even if the IP address changes - they'd have to be polling your site regularly. Is there a discernible pattern in the logs?


In addition to the other ideas here, I'd also recommend feeding a completely fake site to the source IP of the thief. Possibly including some political ideas that could get him in trouble in his host country (up to you depending on how mean you wish to be).


That's dark...


It was meant as a joke (since it'd be, well, evil) but I suppose it could be read otherwise.


there are companies offering services to deal with this, just depends on how much your time is worth. Here is one option. www.distilnetworks.com ... in case you tire of whack-a-mole


Serve a different website for that ip with fake data.


And here I thought "copying is not theft"


This isn't copying - it's actually much closer to stealing.

The difference is that copying doesn't take anything from the original author. Here, the original author is still hosting 4co.in.


Why not feed his visitors subtly incorrect data?


Because his users are not at fault and shouldn't be penalized.


In a word: Goatse.


Ok, time for a reality check

If you can't imagine what to do in this situation you shouldn't be running a website of this nature

This type of thing can (and does) happen and it's up to you to know how to defend yourself.

The others have given plenty of ideas, but I guess there are more specific things that can be done depending on their page structure/ads etc


This is just terrible advice. "You shouldn't run a website unless you know how to deal with this"... and yet the only way you would learn to deal with this is if you were running a website...

You are an idiot.


You don't accelerate your car to 100mph and then ask "how do I brake"

If he launched the site, he must have some technical knowledge; to then not know (as in, to not have any idea) what to do seems strange.

Thanks for the offence, but it's not me who's hopeless about their website.


So, you're suggesting him just giving up on his site and moving on? One issue that wasn't even his fault and he should walk away?

I guess trying to ask for advice and acting on that advice, all the while learning more and more about the potential attack vectors one should be aware of when dealing with these relatively new cryptocurrency services, is a shit idea.

BTW, love your intolerant handle. I guess you'd be bashing me then. Sorry my taste in music differs from yours, please don't "bash" me.

cowers in fear as the old, bitter hippie grabs his cane


No, I expect him to sit, analyze the situation and learn things instead of a hopeless "I don't know what to do"

The issue is not "Please advise me what to do", it is saying it in the spirit of someone who doesn't know how he got into the situation in the first place. Someone who builds a site like that should've known better.


Read his post again. He knows what is happening. He even mentions that he could block the IP address. He simply wants to know if there are better solutions.

So the problem is the way he titles the post. Sure, okay. So make that your argument. You'll sound even pettier but at least you're not fighting a strawman.


He's not really asking what to do - I'm sure he can think of 10 things. He's really asking 'what would you do'. And he's getting some awesome feedback from the community. What made you so bitter!


You shouldn't bother with all the personal attacks.


Do I need more karma to down vote people? Because, I would totally down vote this guy if I could.


I think you need 500 to downvote.


worst advice ever.



