The current state of the art in ad fraud detection has basically become "here is...

hansvm · on Feb 22, 2021

Speaking as someone who knows approximately nothing about ad fraud, what additional protections exist? The scheme you described could be easily thwarted by appropriately sandboxing their script to modify a shadow DOM instead of the real one (and countermeasures like checking the page once in awhile would just as well apply to a JSON approach).

geocar · on Feb 22, 2021

Ad fraud takes a number of different forms, including:

* Buying a low-value ad (like a banner) and cramming a high-value ad (like a video) in there, and lying to the ad server about the visibility, sound, etc, using JavaScript. Sandboxing is typically stymied by a number of cross-domain limitations in real-browsers we can detect server-side.

* Buying installs for a older/hacked browser (or browser extensions) that has been scripted up to load the ads. People would embed these in screen-savers making real users visit these ad pages when the pc owner was unlikely to be around. They won't have the protection real browsers have, and so can trivially modify the network profile.

* Making a headless browser call ads on pages. These pages "look" valid, and you can visit them to see the ads, but the headless browser has collected cookies from various shopping sites, and uses a number of home/DSL proxy services to obscure detection. For any single impression, they look indistinguishable from legitimate traffic.

These are detected in different ways: JavaScript helps some for the first two, but in the second it's mostly that you're looking for bugs in the implementation (and it's just JavaScript gives you a wider search area). Usually these things are "home grown", so if you've got a wide view of the industry, and can change your scripts frequently, you can "detect" them being built in real-time.

However that last one is tricky, and outside of bugs[1], you're left with timing attacks which I won't enumerate because their obscurity is the strongest protection for continued utility, but in general they work on the principle of leaking some identifying data in HTTP and DNS responses, and relying on the fact that that headless browser needs to call lots of ads to pay for the electricity and Internet that it uses, so we get lots of opportunities for a collision.

[1]: https://geocar.sdf1.org/browser-verification.html