> The core of webXray is a python program which ingests addresses of webpages, passes them to the headless web browser PhantomJS, and parses requests in order to determine those which go to domains which are exogenous to the primary (or first-party) domain of the site.
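To make the quoted description concrete, here is a minimal sketch of the "exogenous domain" check. It is not webXray's actual code: `naive_site` and `third_party_requests` are hypothetical names, and comparing the last two hostname labels is a simplifying assumption (a real tool would consult the Public Suffix List so that domains like `example.co.uk` are handled correctly).

```python
from urllib.parse import urlsplit


def naive_site(url):
    """Return the last two hostname labels, a rough stand-in for the registrable domain."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def third_party_requests(page_url, request_urls):
    """Keep only requests whose site differs from the page's own (first-party) site."""
    first_party = naive_site(page_url)
    result = []
    for url in request_urls:
        site = naive_site(url)
        # Skip URLs without a hostname (e.g. data: URIs); they never leave the page.
        if site and site != first_party:
            result.append(url)
    return result


# Hypothetical request log for a page load of https://www.example.com/
requests_seen = [
    "https://www.example.com/style.css",
    "https://cdn.example.com/app.js",
    "https://tracker.example.net/pixel.gif",
]
third_party = third_party_requests("https://www.example.com/", requests_seen)
```

In this sketch `cdn.example.com` counts as first-party because it shares the last two labels with the page, while `tracker.example.net` is flagged.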
Naturally, this means a different user agent and fingerprint, which could ultimately mean the script is fed a different page altogether. The odds of that are probably low, but still: someone could have a really shitty website that uses hundreds of trackers yet serves webXray a version without any of them.
I would like to see this kind of tool as a web browser extension. That way we would get exactly what a real browser is served, which is the most correct information possible. It would also simplify a semi-convoluted build process that seems to have tripped up a few readers.
Firefox has an extension called Lightbeam which does this. The author of webXray, tilbert, has already noted in a comment below that his software can be run in batch mode on multiple sites, which is an advantage over the extension.
The major ad-blocking extensions could do this. They already know about the requests. It's also possible to change the user agent (say, cycle through the top 10) and quickly gather the data using a bunch of cloud servers.
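A rough sketch of the user-agent idea, using only the standard library. The strings in `USER_AGENTS` are shortened placeholders rather than a real "top 10" list, and `build_requests` is a hypothetical helper; nothing is sent over the network until a request is passed to `urllib.request.urlopen`.

```python
from urllib.request import Request

# Placeholder user-agent strings (assumed for illustration, not a real ranking).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]


def build_requests(url, user_agents):
    """Prepare one request per user agent for the same URL, without sending any."""
    return [Request(url, headers={"User-Agent": ua}) for ua in user_agents]


prepared = build_requests("https://www.example.com/", USER_AGENTS)
```

Each prepared request could then be handed to a separate worker or server, and the responses compared to see whether the site serves different trackers to different browsers.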
Yes, but they can figure out which entries on the blacklist were triggered for a given site. For example, Ghostery has Ghostrank, which is pretty similar to this: it reports back to advertisers what was blocked.
Yeah, I already spoof the UA string. I've done a bunch of testing to verify that it is working and getting me the right code. A more likely way to get banned is that I'm hammering ad networks from the same IP address.