> Majestic promotes their list as the "top 1 million websites of the world"
Well, the source URL provided by the article author initially claims, “The million domains we find with the most referring subnets”. It then contradicts itself by calling them ‘websites’. At best we can say Majestic is vague and/or confused about what they’re providing, but given the author’s results, I suspect this list is simply domains, with no guarantee that Majestic ever saw a live HTTP service on any of them.
> Citation needed, because if you do this, you'll also cut yourself off every search engine in existence.
How about I cite HN user ~gojomo, who for nearly a decade wrote & managed web crawling software for the Internet Archive. He says: “Sites that don’t want to be crawled use every tactic you can imagine to repel unwanted crawlers, including unceremoniously instant-dropping open connections from disfavored IPs and User-Agents. Sadly, given Google’s dominance, many give a free pass to only Google IPs & User-Agents, and maybe a few other search-engines.”