The Quantum Ad-List contains domains used by ads, trackers, malware (gitlab.com/the_quantum_alpha)
158 points by URfejk on Dec 22, 2020 | 52 comments



TBH, at this point we may as well start using whitelists: 1st party domains and known 3rd party CDNs for static content and maybe media, 1st party scripts for frequently visited sites and a special button hidden in a safe place to enable 3rd party scripts for those who want to live dangerously.
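
A minimal sketch of the allowlist decision described above, in Python. The host names, the `KNOWN_CDNS` set, and the `is_allowed` helper are all hypothetical, and "first party" is approximated here as "the page's host or one of its subdomains", which is cruder than what a real browser extension would do.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of third-party CDNs trusted for static content.
KNOWN_CDNS = {"cdn.example.net", "static.example-cdn.com"}


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def is_first_party(page_host: str, request_host: str) -> bool:
    """Treat the page's host and its subdomains as first party (an approximation)."""
    return request_host == page_host or request_host.endswith("." + page_host)


def is_allowed(page_url: str, request_url: str, allow_third_party: bool = False) -> bool:
    """Default deny: allow only first-party hosts and known CDNs unless overridden."""
    page_host = host_of(page_url)
    request_host = host_of(request_url)
    if not page_host or not request_host:
        return False
    if is_first_party(page_host, request_host):
        return True
    if request_host in KNOWN_CDNS:
        return True
    return allow_third_party  # the "live dangerously" button
```

With these assumptions, a request from `https://shop.example.org/` to `https://tracker.example.com/t.js` is denied unless `allow_third_party=True` is passed.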


I've been using whitelisting techniques for almost a decade now and it's made my online experience massively better than the average person's. Firefox + NoScript + uBlock Origin, and default deny everything. Then simply whitelist what is strictly required to get sites you care about to work. You can combine this to great effect as well with using similar policies for your system firewall. I run a local firewall on all of my devices and default deny all inbound and outbound traffic, whitelisting only specific ports (and in some cases IPs), and layer an application firewall on top of that to only allow specific applications to make calls on specific ports to specific endpoints.

The combination results in a situation in which I have control as a user over what my computer is doing, and I can interact with online services without being in complete submission to the current heinous model. It's not perfect, and it doesn't make you impossible to track or anything of the sort. But it definitely cuts down on the crap you deal with and drastically reduces your risks of being infected by malicious software.


I have the same methodology, but I'd say it's made my browsing massively worse than the average person's. That's mostly because of NoScript whitelisting; uBlock improves things immediately out of the box.

Here are some typical examples of my interactions with the modern web:

I'll open a website and nothing will appear because nowadays it's normal to load all content through Javascript. So I'll whitelist only the site and its CDN. It still won't appear because the site will be misconfigured to not load unless googletagservices has finished loading. So then I allow google. But then some other scripts from the tens of third parties will also be necessary to load it so I'll try to randomly temporarily whitelist the least scammy looking ones until it works.

For e-commerce websites, I'll allow all scripts in the tab because I know it's a lost battle to shop without it and then during checkout I'll often be redirected to another part of the site or to a payment processor in a new tab without script privileges which won't work, so I allow scripts on this tab and reload the page but it breaks their process and now I have to start over.

With some websites I get memory leaks because of NoScript. Alibaba and Vinted were like that for a long time, I don't know if NoScript or Firefox fixed something on their end or if the websites stopped doing the stupid shit they were probably doing, but for a long time I couldn't open those without using 10s of gigabytes of ram and crashing Firefox.

I put up with all of it because I don't like being spied on but, man, am I mad that I have to go through this.


>I'll open a website and nothing will appear because nowadays it's normal to load all content through Javascript.

I've found this behavior to be strongly correlated with low-quality content, another waste of my time.

Unless I already know I want to read the site, I'll often close the tab at this point and move on.

Chances are, nothing of value is lost.


Yeah it's a pain, but have you tried turning NoScript off? It's a nightmare of "give us your email address!" "accept our cookies!" "check out our whizbang menus that barely function!" "watch our article text slowly fade in!" "look at this image scroll in from the sides for some reason!"

The web with NoScript is a chore; the web without NoScript is unusable.


> misconfigured to not load unless googletagservices has finished loading.

Yep, this would be a website I'd just not use. For e-commerce, there's always another place. Ironically, sites which do their own tracking first party (Amazon.com) are much better than sites which do not, unless they're built on a standard platform like Shopify or BigCommerce.

I'd say your model for what makes an enjoyable web experience is different than mine. I care about content, not just getting sites to load. If a site is badly built enough and lacks so much respect for me as a user, I don't care what they have to say anymore and I move on. Your approach isn't necessarily wrong, but it certainly isn't conducive to whitelisting, because you'll end up whitelisting most of the things I'd actively want to block.

It's 2020, writing a website that works properly is easier than ever, and these businesses have no excuse for how bad their sites are.


You're well on your way to default-deny mode on uBlock, which is a great way to run uBlock Origin / uMatrix if you don't mind a couple of refreshes and a couple of false starts when it comes to ordering stuff on e.g. Shopify:

https://github.com/gorhill/uBlock/wiki/Dynamic-filtering:-de...


Yeah, I agree. I've been using uBlock/uMatrix on all browsers/profiles for many years. Everything is disabled by default. Works very well if you know what you are doing :-)


That was a nice rabbit hole. Thanks. I'm using medium mode now and I've disabled uMatrix, because I think I didn't really understand how to use uMatrix and I think it made me less safe.


> I think it made me less safe.

That's unlikely, unless it increased your own risky behavior. If you install uMatrix and then set it to allow all content, you would be in the same situation as if you didn't have uMatrix at all, i.e. no less safe.


You're right. When I think about it now, it's clear I'm really comparing these two states:

1. uBO defaults + uMatrix which is mostly in "allow all mode"

2. uBO in medium mode (as defined by the above video) w/o uMatrix installed

I think (2) is safer than (1). I've been in (1) mode pretty much since I started using uMatrix (2 years?).


You may already realise this (and perhaps just haven't switched yet, like me), but uMatrix is no longer maintained; the repo is archived. There is a fork intending to continue maintaining it, called nuTensor.


All of my computers are default deny. Annoys the hell out of my wife. Default deny is still staying on. She has learned that if she wants a strange website she doesn't normally frequent, she should hit the "default allow with caution" button, and only rarely hit the "okay, let's go incognito and enable everything" mode.


As a regular user this is the sort of issue I face too. It's fine to lock things down for me, but less technically inclined members of the household then have a severely degraded experience and there are many more 'support calls'. This leads to a policy which is much more liberal than it would be if it were just me, simply to maintain usability.

Even just installing uBO on an elderly parent's computer is a tough call, as low-resistance usability is very important and complex use flows get harder to teach/learn.


Known 3rd party CDNs are used for all kinds of content from CSS needed to make a site work to ads and tracking scripts.


That's uMatrix (at least in-browser)... Sadly it appears to have been discontinued by gorhill.

IP whitelisting isn't fool-proof though. For example, I run an anti-censorship DoH resolver "IP fronted" by Cloudflare specifically to be a hard target to block at IP layer. I mean, you can block Cloudflare's IPs but that'd mean you'd end up blocking other websites too.


uMatrix is vital. Did anyone pick up the project after gorhill?


Don't think u/gorhill would be keen on the idea of handing over the reins after the uBlock.org fiasco. Forks may survive, but the extension looks to be all but dead.


That's basically what I do with uMatrix.


Yup. I was going to post this too. Why keep Peter Lowe's 34,000-site ad-blocking list when a 50-site whitelist will do the same?

But uMatrix is complex. uBlock is good enough. I just wish I could whitelist all CDNs.


uMatrix makes more sense to me than ublock origin's advanced mode. Frankly I don't see the supposed complexity of umatrix's interface. If you call it a 'matrix' maybe that sounds exotic and scary, but really it's just a table, the sort people are taught to read in grade school.


This is me with LittleSnitch.

I really resent the fact that Apple hamstrung LS with Big Sur. I get it that packet-level firewalls still work as intended, but it was really nice to be able to kill telemetry on a network level—not to mention disabling pinging data-hungry Apple services (e.g. iCloud Sync) when tethered on limited data.


You can run Catalina for at least two years and then move to a hardware firewall solution and any other OS of your choice. Actually, Apple's move against LS and VPNs was the final straw for me.


There is, it's built into your browser. Disable JavaScript by default and enable it manually for sites of your choosing.

For Chrome, you can do so here: chrome://settings/content/javascript

Unfortunately this list does not sync across installations. :(


That's not enough. Half of the sites are broken without JS, but you don't really want to open the floodgates to malware/adware/etc. by enabling all JS - you only want to unbreak the site itself.


I do this in conjunction with an actual ad-blocker. Some websites do just break without JS and need to be white-listed (e.g. Google Maps, OpenStreetMap), but, more often than not, disabling JavaScript improves the site by disabling the ad-blocker-blocker and the cookie pop-ups.


A lot of the JS-heavy sites are broken in some way no matter what you do. Nowadays I even need to semi-regularly refresh Google Image results because now and then the page breaks and clicking on the image thumbnails doesn't do anything. If one of the richest IT companies can't get a simple click event to always work correctly what hope do the others have?


I wish there were more details about how this list of domains was compiled.

This is the explanation on the Gitlab:

> We were testing an AI that could show some basic emotions about internet content, and turns out it was very precise at getting “annoyed” by ads and “unsolicited” third party connections…

> From that, I forked our own project and tweaked it in a specific way to basically only focus on ads, trackers, etc. and act like a web crawler, turns out to be very effective!


I agree. I think it's very likely 'import a ton of other block lists and then apply some sort of logic to whitelist potential problem domains', but I would love to be wrong.


Explanation is what our current AI tech struggles with most.


Feedback: Perhaps add info about Pi-Hole usage.


Shouldn't just adding the URL[0] work?

[0] https://gitlab.com/The_Quantum_Alpha/the-quantum-ad-list/-/r...


I had 1,217,258 blocked domains on my Pi-Hole to begin with, and this added 515,284 new domains.


Yes, I have seen a rough add of ~500K domains indeed.

Had to whitelist api.instabug.com since with it blocked my 9GAG app goes crazy and tries to connect to it non-stop, lighting my phone on fire.


I'm surprised to see that there is any overlap between 9GAG and HN :)


Next you'll tell me you are surprised that a 9gag and an HN regular gets laid! ;)

In all seriousness, I casually browse it for 30-40 minutes a day. Helps put my mind at ease here and there (although it too became heavily politicized and not as much fun as it was years ago).


You leave him alone, he is a renowned Field Marshal, though as OP, also known to suck a lot of #!@%

P.S. 9gag in-joke, you wouldn't get it.


I went from 2.039 million to 2.043 million, so about 5,000 new domains.
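
The reports in this thread differ widely (515,284 new domains in one case, about 5,000 in another). One plausible explanation, offered here as an assumption rather than something the thread confirms, is that the number of new entries depends on how much the added list overlaps with lists already loaded. A minimal Python sketch of that arithmetic, using made-up domains and a hypothetical helper `count_new_domains`:

```python
def count_new_domains(existing: set[str], added: set[str]) -> int:
    """Number of domains in `added` that are not already in `existing`."""
    return len(added - existing)


# Made-up data: two setups add the same three-domain list.
new_list = {"ads.example.com", "track.example.net", "pixel.example.org"}
small_setup = {"ads.example.com"}
large_setup = {"ads.example.com", "track.example.net", "other.example.io"}

print(count_new_domains(small_setup, new_list))  # 2
print(count_new_domains(large_setup, new_list))  # 1
```

The setup that already carries more overlapping entries gains fewer new domains from the same list.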


Yes, just log in to the web interface, go to group management, and add that URL. I just did that on my Pi-hole running on a Model B Rev 2.


Yep, just did the same. Thanks for confirming that it also works for you.


I was asking the same and then looked at my pihole and came to the same conclusion.


Seems to be at least partially a compilation of other lists, but without giving them fair credit. [0] does make some pretty convincing claims.

0: https://gitlab.com/The_Quantum_Alpha/the-quantum-ad-list/-/i...


Is there a way we can block part of a subdomain, like a proxy firewall does? I used to see this thing called McAfee Web Gateway. It used to block all ads, banner ads and even Instagram ads! Is there an open source project like that out there, where we can download and add Fanboy-type block lists to block the ad crap?


DNS and IP blocks are like hitting a tiny finishing nail with a 10 pound hammer. Sure, you smash it, but you smash everything around it too.

Try Squid. It's a dynamic higher layer HTTP/web proxy that can block specific URL content, etc.

https://en.wikipedia.org/wiki/Squid_(software)
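
To illustrate the hammer analogy: a DNS-level block can only decide per hostname, while an HTTP proxy that sees the full URL can block one path and leave the rest of the host reachable. A minimal Python sketch of the two decisions follows. The rule sets and function names are hypothetical, and this is not Squid's configuration syntax or matching logic.

```python
from urllib.parse import urlsplit

# Hypothetical rules for illustration only.
BLOCKED_DOMAINS = {"ads.example.com"}
BLOCKED_URL_PREFIXES = ("https://cdn.example.net/ads/",)


def dns_style_block(url: str) -> bool:
    """Block if the host or any parent domain is listed; the path is invisible."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in BLOCKED_DOMAINS for i in range(len(labels)))


def proxy_style_block(url: str) -> bool:
    """Block on the full URL, so a single path on a shared host can be targeted."""
    return dns_style_block(url) or url.startswith(BLOCKED_URL_PREFIXES)
```

Here `https://cdn.example.net/ads/banner.js` passes the DNS-style check but is caught by the proxy-style one, while `https://cdn.example.net/lib/app.css` passes both. Note that for HTTPS traffic a proxy only sees full URLs if it intercepts TLS.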


Awesome work!

For anyone interested in using The Quantum Ad-List in their AdGuard setups, I've added this to my self-updating AdGuard block list generator (smashblock, which is also based on hblock).

https://github.com/smashah/smashblock


This really makes me wish there was a way to include files in /etc/hosts files...


Just generate /etc/hosts from a cron job... Pretty easy to concat, sort, and filter however you like in a short script.
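
A minimal sketch of such a script in Python, assuming the fragments live as `*.hosts` files in one directory and use the usual `IP hostname` hosts format. The directory layout, the file suffix, and the `merge_hosts` / `write_hosts` helpers are hypothetical; the script writes to whatever path you pass in rather than hard-coding `/etc/hosts`.

```python
from pathlib import Path


def merge_hosts(fragment_dir: Path, allowlist: frozenset[str] = frozenset()) -> str:
    """Concatenate hosts fragments, drop comments, duplicates and allowlisted names, sort."""
    entries: set[tuple[str, str]] = set()
    for fragment in sorted(fragment_dir.glob("*.hosts")):
        for raw in fragment.read_text(encoding="utf-8").splitlines():
            line = raw.split("#", 1)[0].strip()  # strip trailing comments
            parts = line.split()
            if len(parts) < 2:
                continue  # blank line or malformed entry
            ip, names = parts[0], parts[1:]
            for name in names:
                name = name.lower()
                if name not in allowlist:
                    entries.add((ip, name))
    # Sort by hostname, then IP, for stable diffs between runs.
    ordered = sorted(entries, key=lambda e: (e[1], e[0]))
    return "".join(f"{ip} {name}\n" for ip, name in ordered)


def write_hosts(fragment_dir: Path, output: Path) -> None:
    """Write the merged result to a temp file, then rename it into place."""
    tmp = output.with_name(output.name + ".tmp")
    tmp.write_text(merge_hosts(fragment_dir), encoding="utf-8")
    tmp.replace(output)
```

A cron job could then call `write_hosts` with the fragment directory and the target file; writing to the real `/etc/hosts` requires root, and your own static entries (such as `localhost`) need to be in one of the fragments.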


You can also set up a systemd service to monitor files in a directory and refresh the /etc/hosts when they are changed.


If you are okay with a GUI on Windows, Mac or Linux, I can recommend Hozz. https://blog.zhangruipeng.me/Hozz/ No longer under development but works just fine.


Having used it for a while, one thing it seems to pick up that the standard Pi-hole lists don't is

>clients.l.google.com

>clients4.google.com

Thousands of requests though so pushed up the overall block rate by a good 5%


Interesting. My company is on there, and one of our listed domains is not used in any way for ad serving. It's the URL of our customer portal.




