
> If I change certain things, they counter it immediately and make it work.

If the content is markup based, do your countermeasures involve changing the IDs, classes, or overall tag structure of the markup you serve? I was wondering whether you could maintain several variations of the above and serve your content via a random one each time, all visually indistinguishable to a human viewer. The person maintaining the scraper would have to have seen and adapted to every variant to get all your new content reliably. Not an impossible hurdle, but they might move on to easier targets if too many barriers are in the way.
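For what it's worth, here's a minimal sketch of the idea in Python. The templates, class-name scheme, and function names are all made up for illustration; in practice you'd also have to generate matching per-response CSS or use inline styles so the variants stay visually identical.

    # Sketch: serve one of several visually equivalent markup variants per request,
    # with random per-response class names. All names here are hypothetical.
    import random
    import secrets

    # Logically equivalent templates with different tag structure.
    TEMPLATES = [
        '<div class="{wrap}"><span class="{title}">{heading}</span><p class="{body}">{text}</p></div>',
        '<section class="{wrap}"><h2 class="{title}">{heading}</h2><div class="{body}">{text}</div></section>',
        '<article class="{wrap}"><p class="{title}">{heading}</p><span class="{body}">{text}</span></article>',
    ]

    def random_class() -> str:
        # Fresh class name each response, so scrapers can't hard-code selectors.
        return "c" + secrets.token_hex(4)

    def render_article(heading: str, text: str) -> str:
        template = random.choice(TEMPLATES)
        return template.format(
            wrap=random_class(),
            title=random_class(),
            body=random_class(),
            heading=heading,
            text=text,
        )

    if __name__ == "__main__":
        # Two renders of the same content produce different markup.
        print(render_article("Quarterly results", "Revenue was flat."))
        print(render_article("Quarterly results", "Revenue was flat."))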




Since the seemingly coordinated disappearance of freely available finance data APIs (IEX excepted), both Yahoo and Google Finance employ this method of randomizing class names on pages with finance data (quotes, fiscal data, etc.). Inspect element on those pages for a good example of the tactic. I feel like this could make it much harder for your content to be stolen.


Easily bypassed: just retry on failure until the scraper gets syntax it likes.


The other end retrying until it gets what it wants will dramatically change its usage pattern in ways that may be easy to detect, unless they have an enormous pool of IPs to connect from.

There are enough suggestions in here to provide a bunch of useful options, and while the site itself may not be making money, the experience dealing with this may be very useful on a resume or for building a client base with similar issues.

Possible approach: look for abnormal usage patterns to ID opponent systems. Randomize format and possibly other steps to assist that. Build that randomization in marginally effective ways that are easy to improve later. Build a way to feed bad/poison/"test" data to specific source IPs. At a time chosen to maximize impact, start feeding poison data to the suspect IPs using the marginally effective randomization, while feeding regular data to most visitors but with much improved randomization. Basically make your opponent's site visibly unreliable.
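A rough sketch of what the detection-plus-poisoning half of that could look like, assuming a Flask-style app. The thresholds, endpoint, payload values, and in-memory suspect list are all placeholders you'd tune and persist properly in a real setup.

    # Sketch: flag IPs with abnormal request rates, then serve them poisoned data.
    # Thresholds, payloads, and storage here are illustrative placeholders.
    import time
    from collections import defaultdict, deque

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    WINDOW_SECONDS = 60
    MAX_REQUESTS_PER_WINDOW = 120          # tune from real traffic
    request_log = defaultdict(deque)       # ip -> timestamps of recent requests
    suspect_ips = set()

    def note_request(ip: str) -> None:
        # Keep a sliding window of timestamps per IP; flag IPs that exceed the cap.
        now = time.time()
        log = request_log[ip]
        log.append(now)
        while log and log[0] < now - WINDOW_SECONDS:
            log.popleft()
        if len(log) > MAX_REQUESTS_PER_WINDOW:
            suspect_ips.add(ip)

    def real_price(symbol: str) -> float:
        return 101.25  # stand-in for a real lookup

    def real_volume(symbol: str) -> int:
        return 48213   # stand-in for a real lookup

    @app.route("/api/quotes/<symbol>")
    def quotes(symbol: str):
        ip = request.remote_addr
        note_request(ip)
        if ip in suspect_ips:
            # Poisoned response: plausible-looking but wrong numbers.
            return jsonify({"symbol": symbol, "price": 13.37, "volume": 0})
        return jsonify({"symbol": symbol,
                        "price": real_price(symbol),
                        "volume": real_volume(symbol)})

    if __name__ == "__main__":
        app.run()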

If you feel particularly vicious and know something about the opponent's infrastructure, make the poison vicious, e.g. feed SQL injections. Be aware that this may have costs: you'd likely be fine legally ("I'm not responsible for their crappy sanitizing of inputs they shouldn't have had anyway"), but you might still end up paying for a lawyer if sued.

Edit: also, anyone going to serious lengths to continue scraping after you act against it may be inclined to DDoS your site if you fully block them.


The content is served via an API that is open for everyone to use, and that makes it difficult for me to protect it. I've tried changing the structure of the API response several times, but they counter it within a few hours. I tried adding unique headers to requests, etc. (that worked for quite some time), but they figured it out eventually. I come up with a solution, they figure it out in a day or two. That went on for a few weeks.
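One way to make the response structure change per session instead of per manual edit is to randomize the JSON key names and have your own frontend fetch the key map first. This is just a sketch under that assumption; the routes, key list, and names are hypothetical, and a determined scraper could of course fetch the key map too, so it only raises the bar.

    # Sketch: per-session randomized JSON key names; the legitimate frontend fetches
    # the mapping once, while a scraper with hard-coded keys breaks every session.
    # All names here are hypothetical.
    import secrets

    from flask import Flask, jsonify, session

    app = Flask(__name__)
    app.secret_key = secrets.token_hex(16)

    LOGICAL_KEYS = ["title", "price", "updated_at"]

    def key_map() -> dict:
        # Create (and cache) a random alias for each logical key, per session.
        if "key_map" not in session:
            session["key_map"] = {k: "k" + secrets.token_hex(3) for k in LOGICAL_KEYS}
        return session["key_map"]

    @app.route("/api/keymap")
    def keymap():
        # The legitimate frontend calls this once per session.
        return jsonify(key_map())

    @app.route("/api/item/<int:item_id>")
    def item(item_id: int):
        m = key_map()
        data = {"title": f"Item {item_id}", "price": 9.99, "updated_at": "2024-01-01"}
        return jsonify({m[k]: v for k, v in data.items()})

    if __name__ == "__main__":
        app.run()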



