Analyzing Analytics (Featuring: The FBI) (exploits.run)
286 points by benryon on April 24, 2020 | 43 comments



Looking at the Siberian husky site... stdLauncher.js is part of Verint ForeSee, one of those "would you like to take a survey about our website" solutions. The AAM analytics code right above the survey and urchin code lists, as its domain, an IP address associated with Sungard AS, an outfit that holds a number of federal contracts for IT services. This IP, 209.235.0.153, hosted the FBI website at some point in time. It's oddly easy to figure this out, even without something like a DomainTools subscription, because there are a lot of people scraping and archiving the FBI most wanted pages due to their cultural significance.

Some searching on code samples shows that the AAM section of analytics code is an exact match for analytics code served up by an older version of the FBI's most wanted website. It was likely also used on older versions of other FBI websites.

In the end I find it unlikely that this website has anything to do with the FBI, and more likely that the website owner copy-pastad a large section of source code and accidentally ended up with this result.

One bit of commonality I've noticed is that a lot of websites with the FBI tracking code were built with FrontPage. I'm not sure if this is causal or coincidental, but it may matter that FrontPage lets you open a webpage that you saved from IE and edit it... which might lead to some websites being complete duplicates of FBI websites, except for visible content, simply because websites like the FBI most wanted were relatively prominent parts of the early internet.

Edit: I spent a little time riding the Wayback Machine to some of the other webpages when they were apparently using FBI analytics code. The results are odd but they're so inconsistent that it's hard to think it was at all intentional. One interesting finding is that both ohthx.com and ppc-guy.com, at the time they supposedly had the FBI analytics ID, were apparently hosting an analytics package called Prosper202 that redirected the Wayback Machine crawler from the login page to fbi.gov. I have a suspicion that this was a partially-joking way to deter crawling of the admin interface of the software. The record that they used the FBI analytics code is presumably just an artifact of the crawler following the redirect. It seems that this exact Prosper202 behavior accounts for the majority of the old hits.


This technique was recently used by some redditors to uncover that the multi-state COVID reopen protest is being pushed by some guy who uses an antique shop in FL as a front for his shell LLCs.

They are the websites that are being used on the Facebook pages that are primarily pushing 'reopen' content, and the GA accounts on those pages link them to a bunch of pro-firearm shell corps as well.

Here's the thread. It got deleted since it was deemed as doxxing (a reddit no-no) even though Whois data is public:

http://removeddit.com/r/maryland/comments/g3niq3/i_simply_ca...


Krebs also mentioned this in his recent post https://krebsonsecurity.com/2020/04/whos-behind-the-reopen-d...

A very interesting way to associate the same site owners!


Look at the updates on that post, nothing is so clear cut. The problem with internet sleuthing is that everyone gets very excited and innocent people can be injured in the unnecessary witch-hunt.

>Update, April 21, 6:40 a.m. ET: Mother Jones has published a compelling interview with Mr. Murphy, who says he registered thousands of dollars worth of “reopen” and “liberate” domains to keep them out of the hands of people trying to organize protests. KrebsOnSecurity has not been able to validate this report, but it’s a fascinating twist to this tale: How an ‘Old Hippie’ Got Accused of Astroturfing the Right-Wing Campaign to Reopen the Economy

Update, April 22, 1:52 p.m. ET: Mr. Murphy told Jacksonville.com he did not register reopenmn.com or reopenpa.com, contrary to data in the spreadsheet linked above. I looked up each of the records in that spreadsheet manually, but did have some help from another source in compiling and sorting the information. It is possible the registration data for those domains got transposed with reopenmd.com and reopenva.com, which included Mr. Murphy’s information prior to being redacted by the domain registrar.


Right, and this is exactly why reddit bans doxxing. The original reddit poster was correct that there was a single individual buying most of these domains; however, other than purchasing the domains, there was no evidence that the individual was using those domains to promote protests. Let's not forget reddit's Boston bombing debacle[1].

[1] https://en.wikipedia.org/wiki/Sunil_Tripathi#Misidentificati...


Sadly I get “Could not connect to Reddit“ when visiting that link


You'd need to disable tracking protection or whatever equivalent in your browser. It's a known issue with removeddit.

If you check your console log, you'll see:

> Tracking Prevention blocked an XHR request to https://www.reddit.com/api/v1/access_token.


Worth noting that, since the Analytics ID is publicly visible, anyone can load Google Analytics on their own site using that ID. No FBI connection required.

This is called Analytics hijacking and it was once (still is) a common spam technique: Create site buy-my-stuff.net, load a bunch of hijacked analytics scripts there, and then the owners of those accounts will see “buy-my-stuff.net” in their analytics reports.

Edit: As commenter lmgk reminded me, you don’t even need to make a site, just use the API to make pageview calls.


Is it not possible to whitelist your own domains in Google Analytics? Forgive my ignorance, I don't use it at all.


You don't need to host a site. The data format to send data into Google Analytics is an open API (called the Measurement Protocol). You can just ping Google's servers directly with the appropriate payload, which includes crafted URL parameters.
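To make the point above concrete, here is a minimal sketch of what such a hit looks like, assuming the Universal Analytics version of the Measurement Protocol (the `/collect` endpoint with `v`, `tid`, `cid`, `t`, `dh` and `dp` parameters). The function name, the placeholder tracking ID and the hostname are made up for illustration, and the code only builds the URL without sending anything.

```python
import uuid
from urllib.parse import urlencode

# Universal Analytics collection endpoint (assumed; this is the v1 protocol).
COLLECT_ENDPOINT = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id, hostname, path, client_id=None):
    """Build the URL for a single pageview hit (hypothetical helper).

    Nothing here proves ownership of `hostname`: the reported host and
    path are just parameters chosen by whoever constructs the request.
    """
    params = {
        "v": "1",                                # protocol version
        "tid": tracking_id,                      # tracking ID, e.g. UA-XXXXX-Y
        "cid": client_id or str(uuid.uuid4()),   # anonymous client ID
        "t": "pageview",                         # hit type
        "dh": hostname,                          # document hostname (self-reported)
        "dp": path,                              # document path (self-reported)
    }
    return f"{COLLECT_ENDPOINT}?{urlencode(params)}"


# Placeholder ID and host; the URL is printed, not requested.
print(build_pageview_hit("UA-XXXXX-Y", "buy-my-stuff.net", "/", client_id="555"))
```

Because the hostname is just another parameter, a report showing a given domain under an account says nothing certain about who runs that domain.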


The Google Analytics account itself has a setting that admins can use to accept data only from specific domains, though this can be faked.

Also, usually these IDs are copied when someone clones a website they want to steal the design of but they don’t bother updating the style or JS.


Any info about what domain is being visited would be client side, which could be easily changed.


In his book "Permanent Record" Edward Snowden[1] describes fake websites used by government agencies to disguise internet traffic that is actually used for spycraft.

eg: maybe a website about siberian huskies actually has a hidden login or hosts another service when contacted on port 80/443 in just the right way?

Now, that would make more sense for the CIA than the FBI, but I think it illustrates another avenue of interpretation

[1]: https://www.goodreads.com/book/show/46223297-permanent-recor...


That doesn't make sense: why would they even let people know that there's a connection? The hidden login part may be true, but just not on sites that are so obviously related. It could be a smokescreen of some kind though.


I agree that having FBI Google Analytics code would be a gaffe.


"Pet scam" is a big business [1]

In this example it's quite possible the FBI set traps to better understand which third parties are involved: who is visiting the site, and probably some admin management page behind it. Sort of like getting the contacts of a criminal and going from there.

[1] https://www.ipata.org/current-pet-scams


Interesting. I've got his book on my reading list, but haven't gotten to it yet.

I just made a tenuous mental connection between this concept and a Reddit phenomenon called "Lake City quiet pills". I heard about it on the podcast "Stuff They Don't Want You To Know" and it held my interest for a few hours' worth of investigation.

The short version is that a Redditor died. He was a stereotypical grumpy old dude, and someone hopped on Reddit and posted that he'd passed. Someone got interested and tied that poster to some websites, one of which had a bunch of stuff hidden in the public source. It definitely seems like a clandestine group of some kind communicating, but who it was and to what end isn't clear. The Reddit conspiracist belief seems to be that it was a group of assassins-for-hire.

Podcast: https://www.iheart.com/podcast/182-stuff-they-dont-want-you-...

Subreddit: https://www.reddit.com/r/LakeCityQuietPills/


Google Analytics used to be called Urchin (they bought Urchin and made it Analytics). So all the urchin.js code is probably just really old Google Analytics tracking code.


The original Urchin was used for log analysis https://en.wikipedia.org/wiki/Urchin_(software). Which might explain a 'self-hosted' version of the software as well.


It had a hybrid log/JS approach around the time Google acquired it, I believe one of the first. It was the best product around. For a shared web hosting provider in the early/mid 2000s, offering it was becoming more than a competitive advantage.


The Google Analytics 'trick' (to identify all the sites someone owns) has been around for quite a while. All you have to do is use a code search engine like publicwww to search for the snippet of code or the analytics ID.

It's not just the Google Analytics ID or GTM Id, you can also use the Adsense pub-id or just about anything else that you might think sites have in common. When you start to also look at backlinks and IP neighborhoods, things can get interesting, as well.
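The search described above can be sketched locally, assuming you already have the HTML of the candidate pages (for example from a code search engine or an archive). The regular expressions are assumptions about the usual shape of these identifiers (`UA-<account>-<property>` and a 16-digit AdSense `pub-` number), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed ID shapes; real-world IDs may differ in length.
UA_ID = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")
ADSENSE_ID = re.compile(r"\b(?:ca-)?pub-(\d{16})\b")


def extract_ids(html):
    """Return the account-level identifiers found in one page's source."""
    # Keep only the account part of UA IDs so that different properties
    # under the same account still group together.
    ids = {f"UA-{account}" for account, _prop in UA_ID.findall(html)}
    ids |= {f"pub-{pub}" for pub in ADSENSE_ID.findall(html)}
    return ids


def group_sites_by_id(pages):
    """Map each identifier to the sites sharing it (only IDs seen on 2+ sites).

    `pages` is a dict of site name -> HTML source.
    """
    groups = defaultdict(set)
    for site, html in pages.items():
        for ident in extract_ids(html):
            groups[ident].add(site)
    return {ident: sites for ident, sites in groups.items() if len(sites) > 1}
```

A shared ID is only a hint. As other comments in this thread point out, IDs get copied along with cloned pages and can be used by anyone.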


On a related note, I wonder if there are/were common patterns in the sting sites set up by Dept. of Homeland Security, such as U of Northern New Jersey [0] and U of Farmington [1]. Both of those were initiated during the Obama administration and featured fairly nice modern designs, similar in aesthetic to much of the Obama-era digital overhauls (though a quick skim shows that they don't share similar CSS naming semantics).

[0] https://www.nytimes.com/2016/05/06/nyregion/students-at-fake...

https://web.archive.org/web/20160327093120/http://unnj.edu/

[1] https://www.freep.com/story/news/local/michigan/2019/11/27/i...

https://web.archive.org/web/20180414235355/http://university...

http://archive.is/qLrUi


On brief review, one (UNNJ) is running WordPress while the other (Farmington) doesn't show any evidence of a dynamic CMS. That suggests to me totally separate provenance. My guess would be that two different contracts were awarded to two different companies to build the websites, which would both be consistent with common federal contracting behavior and a good idea from an OPSEC perspective since it would minimize any similarity in these "sting" websites.


Now this is the Hacker News I want to see. Just a mere observation using known meta-analytics with entertaining implications.


Or maybe they just stole code from the FBI website to get a feature and pulled in way more code than required without even knowing what it does.


A coworker sysadmin once told me that when he was inspecting the web server access logs (for an unrelated reason) he noticed that many requests to a resource on our website had a strange referer URL that was never present in requests to pages. He inspected that site and found that they were using our resource. We didn't really care about it, but that was really interesting.

Maybe it's the same with these sites?


The article says all three fbi.js files were on the Wayback Machine. I was only able to download urchin and the other ones are not there. Anyone have a mirror? Besides the author? pastebin or mega


All three are from commodity commercial software, finding other websites of the same period that used Urchin/GA and ForeSee should get you more or less the same files.


In the Wayback Machine archived version of triggerParams.js there is an OMB parameter of “1505-0186” if the client is section 508 compliant (US accessibility guidelines). A search of that OMB number turns up a Customer Satisfaction Measure of Government Websites survey from 2008/2009 (which makes sense if the archived js is from the FBI site). What isn’t clear is if the same version was used on all of the sites (some of the parameters are hard-coded) and how it got copied across to a mixture of hobbyist sites, plumbers, Most-Wanted pages etc. A quick peek at the page source of a random sampling of the sites in the Wayback Machine shows very little similarity between them (e.g. style of code, page layout etc.), which strongly suggests that it wasn’t people just ripping off the FBI page and wrangling it with a text editor. It is curious.


The big takeaway from this article for me is that I should probably look for or write a browser extension that tracks changes to analytics tools and IDs on sites. If a site is silently taken over, the state actor would either need to separately gain access to the analytics tool accounts, or would need to modify the IDs to connect to a new account. I'd love to see how often tracking IDs change on high-profile sites.
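The comparison at the heart of that idea can be sketched as below. This is not a browser extension, only the diffing step, assuming you have two saved snapshots of a page's source. The ID patterns (`UA-…` and `GTM-…`) and the function names are assumptions for illustration.

```python
import re

# Assumed shapes for Universal Analytics and Google Tag Manager IDs.
TRACKER_ID = re.compile(r"\b(UA-\d{4,10}-\d{1,4}|GTM-[A-Z0-9]{4,8})\b")


def tracker_ids(html):
    """Return the set of tracker IDs that appear in a page's source."""
    return set(TRACKER_ID.findall(html))


def diff_tracker_ids(old_html, new_html):
    """Report which tracker IDs appeared or disappeared between snapshots."""
    old, new = tracker_ids(old_html), tracker_ids(new_html)
    return {"added": new - old, "removed": old - new}


# Usage: a changed ID shows up as one removal plus one addition.
before = "<script>ga('create', 'UA-11111-1', 'auto');</script>"
after = "<script>ga('create', 'UA-22222-1', 'auto');</script>"
print(diff_tracker_ids(before, after))
```

A changed ID is not evidence of a takeover by itself; sites also change IDs during redesigns or account migrations.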


>If a site is silently taken over, the state actor would either need to ...

Why would they need to do that?


Google Analytics IDs are tied to the account that created them.

Presumably the FBI doesn't all share just one massive "fbi@gmail.com" email address.

Even if a bunch of FBI employees decided foolishly to use google analytics on their honeypot sites, one would expect them to all separately sign up using different google accounts - either using their real email addresses, or hopefully throwaway ones.


I think you're confusing Google accounts (email addresses) with Google Analytics accounts (tracking ID prefixes). A single user can create dozens of GA accounts.
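For readers unfamiliar with the format, a Universal Analytics tracking ID has the shape `UA-<account number>-<property index>`, so the account number is the prefix being discussed here. A small sketch of splitting it apart follows; the helper names are hypothetical.

```python
import re
from typing import NamedTuple


class TrackingId(NamedTuple):
    account: str          # GA account number (the shared prefix)
    property_index: int   # index of the property within that account


_UA = re.compile(r"^UA-(\d+)-(\d+)$")


def parse_tracking_id(tid):
    """Split 'UA-12345-2' into account '12345' and property index 2."""
    match = _UA.match(tid)
    if match is None:
        raise ValueError(f"not a UA-style tracking ID: {tid!r}")
    return TrackingId(match.group(1), int(match.group(2)))


def same_account(a, b):
    """True if two tracking IDs belong to the same GA account number."""
    return parse_tracking_id(a).account == parse_tracking_id(b).account
```

Two IDs with the same account number belong to one GA account, but that account number says nothing about which Google login created it.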


But by default, it's a 1:1 mapping...


Pointing back to a government domain is not how nation state monitoring infrastructure is set up.


Sure, this isn't a comprehensive strategy, but you'd be amazed at how far behind some of those agencies are in terms of day-to-day operations for investigations.

A relative of mine works at the FBI and several years back he told me a story about how an investigation into an organized crime syndicate was blown up because an agent on the case was dumb enough to check out the target's LinkedIn profile while he was logged into his own real account. So the target got a notification that Joe Blow from the FBI had just viewed his profile. Over a year of work down the drain with a single GET request, crazy.


My issue is the confidence with which the author presupposes that the existence of this code on sites indicates seizure or utilization in an investigation. It is a lazy position that leaves others (i.e. HN readers in this thread) with a little more intellectual horsepower to evaluate the other - and frankly more realistic - alternatives.


What are the more realistic alternatives?



Please refer to the (current) top comment.


That is a fascinating read. It sounds like it is also prudent to use separate analytics IDs on your websites if you choose to go that route.


This is stupid. People working on investigations obviously aren’t going to have access to fbi.gov or the analytics accounts for that.


I mean, you cannot track with Google Analytics. Why would anyone use it for that?



