Hacker News

The funny thing is that the sites which should have been removed, like the Stack Overflow spam clones, or sites like Canva and Pinterest that make thousands of similar-looking pages with slightly different headings, are still allowed and rank on Google.

I also hate the top-10 pages you get whenever you search for the best something, like the best domain name registrar. I don't want to read a spam blog post with affiliate links; I wish Google would show me the actual domain registrars instead (like ChatGPT does when I ask it). Google has been gamed so badly, and they have been doing nothing about it just because the spam blog posts contain their ads.

There are some tricks I learned on HN to use uBlock Origin to filter these spam sites but Google really needs to fix this. There is only so much an adblocker can do to fix search. And right now all the useful content is getting blocked while the spam content is not only allowed but ranking on top of everything.




The moment I saw the term "SEO" it was like a stopwatch ticked on until search was dead. It used to be frowned upon to do little tricks, like keywords in a tiny, transparent font picked up by crawlers, but not seen by users.

When gaming search engines became a profession, the end of search appeared on the horizon. Guess we're headed back to web rings and link indexes (which will be consolidated, heavily monetized, and abandoned). If we're lucky, we'll be back to dialup BBSes by the 2040s.


Yes. We can point to the exact moment in time when Google turned to the dark side. It was on August 9, 2006, when Eric Schmidt, CEO of Google, addressed the Search Engine Strategies conference.[1] This was the moment that SEO, now officially endorsed by Google's CEO, became respectable. Until then, SEO was considered to be a branch of the spam industry. There were conferences such as the Web Search Spam Squashing Summit in 2005 on how to kill it.

[1] https://www.google.com/press/podium/ses2006.html


There was/is whitehat SEO: properly creating links to relevant content within a site, using keywords that help bots find relevant content, traversing your own site to make sure there aren't dead zones with content that will never be found.

Google should have leaned into just this small segment of tooling and brought the ban hammer down much harder on the bad actors. They didn't, and a bunch of people have left. I use DuckDuckGo mostly and sometimes Google but never by default. I don't even like DDG that much but it's good enough.


> web rings and link indexes

The popular "Awesome ${whatever}" collections are very much that, link indexes for relatively narrow areas.

Curation matters. This is something computers are still bad at, or are too expensive to deploy for a mass-market service.


In general, listicles suck. (Business) Insider is a good example of this.


SEO sends the message that you have a content writer and, thus, that you are a well-established business. (The content that is required to be produced is more nefarious than having hamster marketing guys spin in wheels, but the nature of the work is irrelevant to Google.)


I feel like the turning point for me was when Google removed the ability to always exclude certain sites from searches. I had a number of sites configured to always be excluded because the results were always useless. Ever since, the list of useless sites in my search results has been slowly creeping up.


Yup, same for me.

It really seems high-quality search is fundamentally in opposition to serving ads, alas. (At least once every page in existence probably serves ads via the search operator's network.)


That's exactly what Brin and Page said in the paper where they presented Google...

"[W]e expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers"

and

"[W]e believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm."

Both from The Anatomy of a Large-Scale Hypertextual Web Search Engine by Sergey Brin and Lawrence Page (1998)


Have they ever addressed this in later years? Like, what was their justification for Google becoming the AdMonster it did?


They're now richer than god. Why would they care about anything other than money? Societal rules don't affect them very much at this point.


Yes but you can't hear them because they are on yachts


Sounds like a job for the orcas…


Lol, hope springs eternal :).

... but more seriously: Yeah, their reasoning (now) isn't exactly going to be unbiased and/or unblemished by their own experience. They probably have the most extreme survivorship bias ever. (Not their fault, they just do because of whatever factors got them into the position they're in.)


Totally. Advertising is fundamentally about distracting someone so you can put money in your pocket without regard to the impact on that someone.

I have a lot of complaints about, say, McDonald's, but they make their money through giving people something of value to them. Advertising legitimizes the making of money in a way unrelated to value delivery. (The same is true about a lot of finance.)

When you combine that with up-and-to-the-right numerical goals and standard executive incentives, over time you pretty much guarantee what Doctorow calls "enshittification". Delivering value becomes at best a side effect of the system.


Aren't McDonald's one of the biggest advertisers in the world? They sell poor quality food on the basis of brand recognition.


That's unrelated to the point GP is making. The point is that they are not incentivized to put poison into burgers to get money from the poison industry.


Well put. Ads are the poison of information.


This is a good point. The problem is that it is very hard to make a living serving high-quality results, which is likely why ad-funded search still dominates. The vast majority of the world will likely never want to pay for search on its own.

There are, of course, a few relatively successful paid general-purpose search engines, but these serve a niche demographic if you consider the worldwide scale of Google et al. Possibly specialized search (we build one) will be able to thrive in the future, but these engines also serve niche markets in the end.

Thus, the real competition to ad-based search is not high quality search and that is likely why search results don't get better.


I've used uBlocklist to filter these things since Google removed that feature


I think that there are current SEO practices that make sense for a search engine to use as ranking signals, like accessibility (mobile friendliness, screen reader compatibility, etc.), time to load (performance, image optimization, no JS bloat), security (HTTPS), no stuff appearing on top of the content (modals), use of h tags and breadcrumbs to order content, structured data for bots, and more.

Of course people game the system, apply shady practices, and sell courses with tips. It is a nuanced topic. The name is now bastardized by marketing companies doing shady things, but at its core it is just a set of practices to create good websites and provide a good user experience.


Oh man, can you imagine Yahoo, playing the long game, bringing back their user-managed indexes and becoming the dominant search engine again?


You might have stumbled onto a practical and useful application of LLMs. Yahoo (or whoever) could crawl the web, categorise each page, and provide a genuine "Index".

Like in the back of a book, or an old subject-based card catalogue ... Dewey-decimalise the web :)

I've always felt that an index by subject would be more useful than string-match based searching. Of course, the index might rank links within each sub-sub-sub-sub...category with something like the original PageRank.

Now if Yahoo (or whoever) could avoid the enshittification trap ... imagine what a fabulous resource that could be.
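
For what it's worth, the "index by subject, ranked within each category" part is easy to sketch. This is only a toy under my own assumptions: the category labels are taken as given (in the idea above they would come from an LLM or a human curator), the link graph is a plain dict, and the ranking is a textbook power-iteration PageRank rather than whatever any real search engine runs. The names `pagerank` and `build_index` are made up for the example.

```python
from collections import defaultdict


def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def build_index(categories, links):
    """Group pages by category, best-ranked first within each category."""
    rank = pagerank(links)
    index = defaultdict(list)
    for page, category in categories.items():
        index[category].append(page)
    for pages in index.values():
        pages.sort(key=lambda p: rank.get(p, 0.0), reverse=True)
    return dict(index)


# Hypothetical data: three pages in one category, two in another.
links = {"a": ["b"], "c": ["b"], "b": ["a"], "d": ["e"], "e": ["d", "b"]}
categories = {"a": "modems", "b": "modems", "c": "modems", "d": "bbs", "e": "bbs"}
index = build_index(categories, links)
```

The hard parts are of course the ones this skips: getting trustworthy category labels at web scale, and stopping people from gaming the link graph.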


> Like in the back of a book, or an old subject-based card catalogue ... Dewey-decimalise the web :)

Now that is an awesome idea. I really do think we need to have category-based indexing as well as page search.


Until ChatGPT came along, I figured it was inevitable that human curated search came back into ascendancy as the crawler model has become such a failure.

Now we can use ChatGPT to filter through Google's infested mess, but this double edged Sword of Damocles will be able to create infinite attempts to bury genuine content with ad spam.


I might pay like max. 3 EUR a year for this to get a search engine that gives the good results without ads, SEO spam and bogus clone sites.

That amount of money is probably more than Google now makes from my online presence because I adblock, block 3rd party cookies, tend to click "block" on everything including the idiotic "legitimate interest" and never ever click on ads.


Can't remember exactly where I saw it, but the last number I saw said that Google makes ~$12 a year per user. Which begs the question... why have they not at least tried a "Google Premium"?

Fuck it, I'd pay $15 a year to have a Google search that puts as much effort into finding me the shit I actually want as Google does today in serving me BS ads I never pay attention to.


> Which begs the question... why have they not at least tried a "Google Premium"

They announced "Google Contributor" at one point but it never went anywhere


First thought is that it's the most profitable users who would choose adless.

The ads industry would not like their reach to be limited to those not paying for premium search.


I always wanted to see just a "price transparency" aspect.

Tell me exactly how much the advertiser paid for his placement, and that's a hugely important signal here.

If I'm searching for weird hobby parts, even though it's a high purchase intent query, they're probably paying pennies per click.

But if you start searching, say, financial stuff and the ad placement figures start showing multiple dollars per click, it's a warning "these people are willing to spend THIS MUCH MONEY to present a message to you, this probably means there's something sketchy involved."

I know, for example, anything pertaining to insurance and financial products is highly likely to turn into a farm of cross-selling and personal-information harvesting, because the cost per acquisition is so high and the tendency for everyone to sell the information to everyone else is so great.


I'd argue most people do not even realize they click on ads. My mother, for one! The ad call-outs got so inconspicuous that they're almost indistinguishable from ordinary search results. And if you don't realize that and just click on the top results, you're among the most profitable users.


>I might pay like max. 3 EUR a year

wow, Search really isn't worth developing.


I might pay more if they didn't sniff any other data from me. Search is just one color of light from the prism, so to speak.

Edit: not beam but color


I would pay 10 EUR. :)

Would 3 EUR or even 10 EUR / year for each customer be enough to run operations?


This is an interesting point. User managed or curated indices offer unique advantages, especially when 'depth of coverage' is more important than 'breadth of coverage'. I believe that we are witnessing people shift away from demanding 'search breadth' as we speak, so someone might possibly decide to do this.


> User managed or curated indices …

Do you mean self-managed?

Everything else is effectively the influencer scene. Which is increasingly deplorable as well.

Anything with wide enough reach becomes cost-effective for gaming.

So one would have to return to a highly fragmented world to make gaming the system cost prohibitive.

And that would get us to a pre-Internet world. But then again, it’s not entirely unthinkable that we’re headed towards increasing Internet fragmentation if various governments get their way.


I thought Altavista was the dominant search engine, not Yahoo.


Brilliant idea!


> If we're lucky, we'll be back to dialup BBSes by the 2040s.

Unfortunately, there's no going back - the closest we can get is BBS-via-SSH. The entire landline phone infrastructure is crumbling around us (or in many cases, completely gone). Voice calls are packet-switched now, rather than circuit-switched as in the past. The upshot is, fancy modulation techniques that made full-duplex 33.6k possible over voice-grade connections aren't going to work, and even good old Bell 103 (300 baud) may end up being problematic.

I'm not sure I can even get new landline phone service, and if so, it's going to be expensive - and the wire plant is an unmaintained mess. When I got my folks off their landline and onto VoIP some years back, their old landline had so much hum it was nearly unusable. Once the inside wiring was disconnected from the landline and connected to an ATA, the hum was gone. It wasn't our wiring.


I am trying something different that might work for you with aisearch.vip

The challenge will be staying true to not showing ads, respecting user privacy, and not requiring a subscription. So far, the only thing that works is free daily quota + pre-paid


> Pinterest

Kagi recently released a leaderboard for per-domain customization: https://i.imgur.com/ViLamx7.png

They are 2nd for "lowered" and first for "blocked".


Kagi is amazing but the costs are very non-negligible. :F


More than 1.5 cents per search is a lot; their pricing model really discourages me from making them my default search engine. A $10 unlimited search would be more justifiable. $15 would be stretching it.

If I'm not on a functionally unlimited plan (rate limiting is fine if it's reasonable) then I have to think about it every time I search. Should I really be searching here or should I be using the search box on Stack Overflow or GitHub or Wikipedia? It just introduces some constant cognitive friction to the experience.
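
To put numbers on that, here is a back-of-the-envelope sketch. The 1.5 cents per search is just the figure from this comment, used as an assumption and not as a quote of any real price list; `break_even_searches` is a made-up helper name.

```python
from decimal import Decimal, ROUND_FLOOR


def break_even_searches(flat_price_usd, per_search_usd):
    """Whole searches you can make at the metered rate without exceeding the flat price."""
    ratio = Decimal(flat_price_usd) / Decimal(per_search_usd)
    return int(ratio.to_integral_value(rounding=ROUND_FLOOR))


# Assumed rate of $0.015 per search, compared against the two flat prices mentioned.
for flat_price in ("10", "15"):
    searches = break_even_searches(flat_price, "0.015")
    print(f"${flat_price}/month flat matches about {searches} metered searches")
```

At that assumed rate, a $10 flat plan only wins if you search more than about 666 times a month, and a $15 plan needs about 1000.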


There is Ultimate which is not capped. That seems to be what you want?

Personally, my record was below 900 searches and usually below 800, so the 1000 searches (technically 1500 because early adopter) are absolutely fine for me.


Kagi ultimate seems to be multiple times the price that poster mentioned they thought reasonable for unlimited search


Sometimes reading is hard, thanks.


I agree, but then I have very few subscriptions, so there's space in the budget ;)


Well that's a feature that I might be willing to pay for.

Add Twitter to that list and I'll start finding accessible content again.


They have block/lower/raise/pin for domains, and additionally, they support regex rewriting of domain names. All my Twitter results on Kagi redirect to Nitter (though I don’t know if that still works; I was gone last week).
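
If you have not seen domain rewriting before, the general idea looks roughly like this in Python. To be clear, this is not Kagi's rule syntax (which I am not reproducing here); the rule list, the `rewrite_url` helper, and the choice of `nitter.net` as the target host are all assumptions for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rule list: (pattern matched against the hostname, replacement host).
REWRITE_RULES = [
    (re.compile(r"^(?:www\.|mobile\.)?twitter\.com$"), "nitter.net"),
]


def rewrite_url(url, rules=REWRITE_RULES):
    """Return url with its hostname rewritten by the first matching rule."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased, without port or credentials
    for pattern, replacement in rules:
        if pattern.search(host):
            new_host = pattern.sub(replacement, host)
            # Note: replacing netloc drops any port or userinfo from the original URL.
            return urlunsplit(parts._replace(netloc=new_host))
    return url


print(rewrite_url("https://twitter.com/someuser/status/1"))
print(rewrite_url("https://example.com/page?x=1"))
```

The pattern is anchored on both ends so that lookalike hosts such as `nottwitter.com` are left alone.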


Ohhh, regex rewriting for domain names? I've been a paying customer for two months but I hadn't discovered that feature yet. Thanks!


Why is w3 so low? I always found it at least marginally helpful.


It is better now; before the revamp, it had incredibly outdated information. Even now, there is very little reason for anyone not to go to the same page on MDN instead, unless they are a literally-just-starting-out beginner (as MDN has more information). That, coupled with them ranking high on organic results, makes people want to block them.


I always have a feeling they could get surprisingly far by having one person doing random searches, then tagging all the obvious spam to be deindexed. But Google is very adamant about never hiring a person to do a job well, if AI can be trained to do the same job badly


Don't they have whole teams of people looking at sites through the lens of the leaked 'quality rater guidelines'? I thought it was like a few thousand people in the Philippines downranking sites for thousands of reasons, without being transparent about possible biases and such. Maybe I'm thinking of somewhere else. And wasn't there a story about a bunch of people who were doing similar work as 1099 contractors in the US striking because they were not getting a living wage or something?


Looking at the quality of current Google searches, I highly doubt it. I mean, I could single-handedly remove thousands of domains in a month and improve Google search results. I have a hard time believing that they have 1000s of people doing it and the quality is this shitty.


They were fined 2.4 billion euros for downranking a shitty SEO spam operation: https://techcrunch.com/2022/02/07/google-shopping-lawsuit/am....

How many of those thousand domains will result in billion dollar fines?


Wow, that is not what that article is saying at all.


I'm aware, but that's not the point. I went to that company's website around the time that case was in the news, and 100% agreed with google downranking it.

The point is, downranking decisions by google have a non-trivial chance of being litigated in court, and at that point, I doubt that the decision will actually be made on the merits of the particular site. A sympathetic website owner with a sympathetic regulator is frequently going to win against google, regardless of how shady the site actually is.


A simple Yahoo search: https://search.yahoo.com/search?p=google+quality+raters&fr=u...

The first ten results include:

theworkathomewife dot com › google-ads-quality-rater
How to Get Paid to Be a Google Ads Quality Rater

Sep 18, 2022 · As a quality rater, you ensure ads on Google – and other search engines – are both accurate and visually appealing. You will be given sample search terms and potential

Forbes - After Months Of Protest, Google Search Quality Raters Finally Get A Raise

ARS - 2017 - https://arstechnica.com/features/2017/04/the-secret-lives-of...

and the Google note: https://support.google.com/websearch/answer/9281931?hl=en

with a hand-wavy claim that these raters don't actually affect the results.

I read dozens of pages of the raters guidelines PDF years ago, and I can see how they are being sneaky with things like 'down rank things that lack trust signals', which in and of itself sounds okay, especially in some situations. But that exact thing can be used to push down entertainment sites, and can help AdWords make more money by forcing lots of sites to pay to show up in the top 10 or miss 90% of the internet traffic.

So I agree that they are pushing shitty results in many search verticals - but yeah it does seem they are using thousands of people to create them this way on purpose.

It's no longer a better search engine - it once was - today it's popular because it's a default on many devices, and the network effects make it so people build their sites and update their listings like it's the only yellow pages in town.

Their high-brow censoring leaves much entertainment and other things better found using other portals to search and find imho.


Yeah, why can't I just remove these spammy Stack Overflow clones? It's wasting my precious time whenever I need some answers. Seems like such an easy feature, but yeah, Google already lost its way a long time ago…


For those on Google or DDG, here is my uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter

It blocks copycats and hides them from multiple search engines. You may also use the list with uBlacklist.
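
Conceptually, all a list like this does is match result URLs against a set of domains. Here is a minimal Python sketch of that idea. It is not uBlock Origin or uBlacklist filter syntax, and the domains below are placeholders I made up, not entries from the list.

```python
from urllib.parse import urlsplit

# Placeholder blocklist; a real one would be loaded from a maintained list.
BLOCKED_DOMAINS = {"so-clone.example", "gh-mirror.example"}


def is_blocked(url, blocked_domains=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def filter_results(urls, blocked_domains=BLOCKED_DOMAINS):
    """Drop search results whose host is on the blocklist, keeping order."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]


results = [
    "https://stackoverflow.com/questions/1",
    "https://www.so-clone.example/questions/1",
    "https://gh-mirror.example/repo",
]
print(filter_results(results))
```

Matching on the dot-prefixed suffix keeps `www.so-clone.example` blocked without also catching an unrelated host that merely ends in the same letters.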


These filters are light enough that they work well in Manifest V3 adblockers. The "Google - Global" filter only adds 1376 dynamic rules for example.


I googled something.

One of the results was "title (recommended)".

That should be enough for Google to ban the result.


So Google should have a list of titles that, if found, cause them to ban the result? Frankly, that might affect people with little personal sites who aren't very good at making sites but just want to put up some stuff they really know a lot about. Anyway, are there any other items on the list?

I'm pretty sure SEO sites will be able to figure out not to title their pages "title (recommended)".


When all these search engine 'things' came about, websites were indexed organically.

Then websites started to optimize for what Google actually parsed. Meta tags, no things behind JavaScript, semantic markup etc. And that worked really well. Stuff was easy to find.

Those were the best days. There was some luring, but at least if you were looking for something technical you found it (provided it existed and was indexed).

This is no longer the case. To give you an example:

Any search term for Windows+Error+KB<number> is a nuclear wasteland of companies having copy-paste pages only to sell you their services at the end. Lately the same has been popping up on YouTube.

Any search for an issue with formatting a harddrive lists 20 pages which sell formatting tools.

A search for checking whether a certain animal is dangerous yields wildly inaccurate results just to sell me pest control. Spoiler: it was a Jerusalem Bug. You don't get a fever at all. The thing just has huge mouthparts, and that can physically hurt.

Putting "(recommended)" in your title because you know it's then displayed verbatim in Google is nasty.

All driven by these websites which 'recommend' things to you.


You can use the AdGuard Annoyances list in the uBlock Origin settings to block certain types of content. This includes blocking cookie scripts that pop up every time you opt out of non-essential cookies. It also has an unbreak filter list that, when used in combination with EasyList, can remove anti-adblock warnings.


> The funny thing is that the sites which should have been removed, like the Stack Overflow spam clones, or sites like Canva and Pinterest that make thousands of similar-looking pages with slightly different headings, are still allowed and rank on Google.

I recommend this userscript https://github.com/vladgba/Back2source for avoiding Stackexchange clones, it saved me a lot of time.


It's also not clear why Google killed Blogger.

More investment in Blogger with better linking between blogs (the way you can @someone on Twitter) and discovery (#tag searches allowing you to see posts from multiple blogs, curated blogrolls), more realtime/live logging functionality, easy microblogging functionality, etc. would probably have led to a lot more content existing on Google owned spaces, as opposed to being siloed behind Facebook and Twitter paywalls.


> It's also not clear why Google killed Blogger.

Because they kept chasing "the next big thing" aka "the next product to give me promotion inside the company".

Remember Google Wave? Knol? Plus? Orkut? Spaces? iGoogle and Sites? Slide? Jaiku? Buzz? Aardvark?...

(I had to scroll through https://killedbygoogle.com to remember some of them)


> The funny thing is that the sites which should have been removed, like the Stack Overflow spam clones

Absolutely. I have a browser extension installed specifically to block domains that are this type of spam from my Google search results, and I’m adding to it almost weekly.

Come on Google, you’re seriously ranking that as front page worthy?


> There are some tricks I learned on HN to use uBlock Origin to filter these spam sites but Google really needs to fix this.

Oh, I did use uBlacklist, which blocks in both Google search and Google image search. But I'm curious: what filter do you have in your uBlock Origin?


I'm not OP, but here is my own uBlock filter with hundreds of GitHub/StackOverflow copycats: https://github.com/quenhus/uBlock-Origin-dev-filter


thank you!


thanks


Is there a good list of spammy sites for uBlock Origin?


May I ask which filter list you are using?


What are stack overflow spam clones?


There are various sites that seem to crawl Stack Overflow and present the content verbatim in their own UI. They're annoying because they are sometimes out of date, they generate duplicate results and they're practically uninhabited.



