I have some experience in this field. Around two years ago I was a DevOps engineer for the company running Dagbladet, Norway's #2 newspaper. One of the things I did was keep an eye on mysterious traffic.
I managed to find a huge spam network that set up a proxy service that delivered normal content, but injected "you can win an iPhone!" spam to all users visiting them.
Since I was in a position to monitor their proxy traffic towards many of the sites I managed, I could easily document their behaviour.
At the same time, I wrote a crawler that visited their sites over a long, long period. I learned that they kept injecting hidden links to other sites in their network, so I let my bot follow those as well.
By this time, I also had a journalist with me who started to look at the money flow to try and find the organisation behind it.
My bot found in excess of 100K domains being used for this operation, targeting all of western Europe. All 100K sites contained proxied content and were hidden behind Cloudflare, but thanks to the position I had, I managed to find their backend anyway.
We reported the sites to both CF and Google, and to my knowledge, not a single site was removed before the people behind it took it down.
Oh, and the journalist? He did find a Dutch company that was not happy to see either him or the photographer :)
> We reported the sites to both CF and Google, and to my knowledge, not a single site was removed before the people behind it took it down.
As someone who tried reporting spam sites because they were using content scraped from my website, I'm not surprised.
Cloudflare has a policy that they will not stop providing their IP hiding/reverse proxy services to anyone, regardless of complaints. The best they do is forward your complaint to the owner of the website, who is free to ignore it.
They say "we're not a hosting provider" as if that's an excuse that they can't refuse to offer their service. I'm sure many spam websites would go away if they couldn't hide behind Cloudflare.
> The best they do is forward your complaint to the owner of the website, who is free to ignore it.
Or worse. Since I have no way to know beforehand who I would be dealing with, this is actively dangerous - what if the mobster running this site is having a bad day and chooses to retaliate?
Also, what a stupid fucking policy that is. Even if you are not legally compelled to block content, what is the point of actively helping distribute harmful content?
What they are doing is worse than just saying "We are not a hosting provider" - because while that is true, they are actively distributing content that is hosted elsewhere while hiding who is hosting it.
One can easily write an email to abuse@hoster.example.com, and usually these people do not want garbage on their networks. CF is making it impossible to notify them, and they refuse to implement an alternative procedure.
I still do not understand the moral position of profiting off of enabling criminal scum, when it would be so easy not to...
I do not think that it is up to Google or Cloudflare to police the internet. If a site is doing something illegal, then report it to the appropriate government agency. If the government agency doesn't do anything, then get involved in the political process to fix that.
If Google, or CF, or whoever are fronting illegal activity with their services, they are absolutely responsible for damages caused by the party they are proxying.
Platforms must be responsible for the content they are hosting, broadcasting, and publishing.
One to one communications between two people exchanging ideas and having a private discussion is different from mass broadcasting.
> Platforms must be responsible for the content they are hosting, broadcasting, and publishing.
> One to one communications between two people exchanging ideas and having a private discussion is different from mass broadcasting.
The highway is used both by those visiting their friends and those doing mass deliveries. Is it the job of the highway maintenance crew to control for what purpose their network is used?
The "owner" of the highway is the government, who regulates commercial traffic differently to personal traffic. The government places strict rules on who is allowed to use the highway, and how it is used.
The highway maintenance crew is akin to the person installing racks for CloudFlare.
Highways are a poor analogy for information broadcast systems in general. A highway is closer to a one-to-one transmission system than to a broadcast system of one-to-many.
They're already removing things they don't like. I see no reason why they shouldn't remove things that are objectively 100% harmful.
Like seriously, is there a single person on the planet that's going to defend online scams? It's immoral, it's illegal, it benefits no one and harms thousands. And it's not like it's very hard to detect and block either.
If someone were to tell AT&T that this call center customer of theirs is in the business of extorting people for money, they'd at least look at it and help law enforcement accordingly. Cloudflare has a talk-to-the-hand attitude until actively forced by law enforcement. That's an important difference right there.
At this point I'm convinced that at least 10% of all legitimate economic activity is actually money laundering for crime organizations, in various forms. I imagine that percentage goes even higher in the financial capitals of the world.
> what if the mobster running this site is having a bad day and chooses to retaliate?
I wonder if someone with malicious intent could set up a site designed to generate complaints (how exactly would be an exercise for the reader), put it behind Cloudflare, and purposely use the information in the forwarded complaints to harass, abuse, dox, or otherwise harm people.
> One can easily write an email to abuse@hoster.example.com, and usually these people do not want garbage on their networks. CF is making it impossible to notify them, and they refuse to implement an alternative procedure.
But that's exactly what CF does. They forward your abuse complaints to the abuse contact of the IP address hosting the content.
The retaliation is quite real—CF keeps your entire e-mail address and name in there, so you are essentially doxxing yourself. Pretty sure 8chan posted a lot of the reports they got back in the day.
They might take that stance, to avoid liability and complication.
At the moment, they have a very clear rule. If they stop providing services to obvious spammers, they will create lots of grey areas, and they will also implicitly make a judgement that the clients they still serve are _good_ in some way, which an enterprising lawyer or muckraker might exploit.
This may have had something to do with the fact that the Daily Stormer was claiming, prior to that, that their lack of suspension was an implicit endorsement by Cloudflare of their site and content.
Misuse of trademarks is a thing.
I agree, however, that CF's policies are applied arbitrarily.
Better known for being linked to the Christchurch and El Paso shootings, being the origin of the QAnon movement and having a history of hosting child pornography.
Facebook, reddit, MySpace and Twitter have all been linked to mass shootings and child pornography. None of these sites condone, enable or remotely desire such content.
> None of these sites condone, enable or remotely desire such content.
Yeah, and that's the difference, isn't it? 8kun might not condone any of these things, officially, but it very much enables and desires them.
This kind of discourse is seen as the "price of freedom", its presence a demonstration of absolute tolerance and blind faith in freedom of speech. Facebook, reddit, MySpace and Twitter are more strictly moderated and impose actual terms of service on their users' freedom of expression.
But of course this also means the people most motivated to join networks that offer guarantees of free speech absolutism are those whose discourse is not tolerated by these mainstream alternatives. And their presence will almost guarantee an absence of "normies" who don't run into the limits of their freedom of speech on the moderated networks much and feel uncomfortable around the former group.
Heck, the only reason 8chan ever became large enough to be widely known was because 4chan evicted Gamergate. And 4chan isn't exactly known for its strict moderation and suppression of political views.
Moot was trying to be friends with the people in that circle. A girl he was trying to date didn't like it. He was taking awkward babysteps towards his lackluster job at Google where he would never be promoted or accomplish anything meaningful again.
How is that different from a hosting provider that has to address legal complaints regarding spam, copyright infringement, etc. on their servers? Just like a hosting provider, they specifically have a relationship with the website owner to provide the reverse proxy service. It's not like they can say "we don't know who or how our service is being used".
It seems to me that if they want to be in this business they have to deal with these liabilities and complications, not hide behind some vague "our hands are tied" language.
Presumably if illegal content is not taken down by the customer then the host cancels the service, right? Otherwise the host risks liability. That's different from revealing the IP of a customer which requires a court order.
If their argument is "we only retransmit what we get, with caching" then they are in the same place liability-wise as the phone providers ("We only retransmit what we get, with caching").
In other words, a common carrier.
Hosting is different. For example, YouTube is not liable for what their users upload. They comply with takedown notices because it is they who host the content, not the user.
But in a way, they actively host the content. The fact that their server periodically retrieves new content from a different backend makes no difference. The page sits on their hard drives and is served by their servers when I visit that domain. It's always been a very, very thin argument, and it has gotten even thinner with the likes of Cloudflare Pages and Workers.
Cloudflare is just a huge company actively ignoring abuse complaints and somehow they are getting away with it. It even helps their PR to a certain market segment.
They even still host Kiwi Farms, a board that is primarily known for its vicious harassment of people and is known to have driven multiple innocent people to suicide.
I consider CloudFlare a bad actor at this point and I wish the other big names around them would too. They are subsidizing crime with VC money.
I've been reporting hundreds of spam sites to Cloudflare, but always get the same lame excuse. GoDaddy is the same. Meanwhile, good content drops in Google rankings and spam moves to the top.
You are missing the point of the complaint, which is that it's a private decision to hold that policy. Maybe it was a bad idea to use the word "line", but the intent still stands unaddressed.
Yes. Also to the incident where they stopped hosting 8chan after they vowed to never take sides again.
You can agree with Cloudflare not providing services to those sites as much as you want, but you cannot pretend that Cloudflare hosts everyone equally. They cannot use that as an excuse not to deal with spammers.
> All I'm saying is that we won't know until they come under pressure again
That's also true of people who haven't murdered anyone yet, though.
Whom do you trust more? The person who did something and vowed to never do it again, or the person who didn't vow anything? I tend to prefer the former.
When it comes to murdering someone, I'm going to prefer the person who has never murdered anyone yet.
When it comes to service providers, I would tend towards your direction. They did a thing that had conflicting ethics on each side, weighed the outcome and their ethics, and then made a hard decision for the future. What they did could be reversed, too, and didn't cause much permanent damage.
Murdering someone is very permanent and should take a lot more initial consideration.
This is very rational of them. They position themselves as a pipe for "bytes", not "content".
By ignoring the content they serve, they rid themselves of the necessity to analyze and judge what they serve. Not only would this require a brain the size of a planet and the expense of running it, but also would inevitably conflict with someone else's judgments, and bring various PR woes.
They don't analyze the internals of their traffic the way internet backbone providers don't analyze the internals of the traffic they pass around.
I frankly find this position superior: imho it does more good by preventing censorship than harm by serving good-intentioned and bad-intentioned customers alike.
In fact, I completely agree with that stance. It's not Cloudflare's job to police the content. They provide a simple service. If something is unlawful, law enforcement should go after the owners.
That sounds like a hell of an investigation, and now my curiosity is running. 100K domains sounds like a huge amount of logistics on their side to keep it all running. It would be interesting to read about how a spam company manages that kind of infrastructure compared to a "legit" company.
A legit company will always have internal struggles between dev/sales/marketing, so things just take longer and are much more draining to accomplish. I'd imagine a spam org just needs the bare minimum up to satisfy whatever need they have, knowing that humans won't necessarily be perusing those domains - yet it's 100K domains. I could almost see something like this running more smoothly. I can also see it being run by a small number of people who let things lapse so it's just barely hanging together. So many questions...
It is not very difficult to manage: a company of mine was bought by a squatter (I found out after dealing with a broker for the sale; I had to integrate it with their 'tech team' and walked away after), and for many years already this has all been fairly easy to automate. The registrars have APIs, Cloudflare has APIs. There was one tech guy keeping it all up and running, and he didn't have to do anything. It would register and provision with content automatically. There is really almost no work involved besides keeping money in the registrar account, and the costs are probably only the domains - maybe they have a little Hetzner load-balanced setup with two machines, but that's likely it.
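To give an idea, the provisioning half can be a few lines of Python against Cloudflare's public v4 API. This is just a rough sketch of the kind of automation I mean - the token and domain are placeholders, and account details and error handling are omitted:

    import requests

    API = "https://api.cloudflare.com/client/v4"
    HEADERS = {"Authorization": "Bearer <api-token>"}  # placeholder token

    def provision(domain, origin_ip):
        # Create the zone, i.e. put the freshly registered domain behind Cloudflare
        zone = requests.post(f"{API}/zones", headers=HEADERS,
                             json={"name": domain}).json()["result"]
        # Point the root record at the backend; proxied=True keeps the origin IP hidden
        requests.post(f"{API}/zones/{zone['id']}/dns_records", headers=HEADERS,
                      json={"type": "A", "name": domain,
                            "content": origin_ip, "proxied": True})

    provision("newly-registered.example", "203.0.113.7")

Wire the registrar's API into the same loop and one person can babysit 100K domains.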
The reason you found so many domains is that they intentionally take down their spam sites and reload them under a new domain every few hours. They do this so they can't be taken down by people reporting them as spam. They literally set up the next domain while the current one is in use, so they can do a live swap to the next one without interruptions to their spam operations. This is typically done in an effort to spread Trojan malware to anybody running computers with out-of-date operating systems and browsers. Windows getting people off of Internet Explorer has been a huge hit for them, as it reduces the number of possible vulnerabilities someone might have when they get sent to one of these Trojan spam sites.
Did the article resonate with Norwegians? I assume the report probably answered many of the populace's questions about Google's malfunction, even if nothing came of it.
So there is/are organisations that a) scrape legitimate sites for content, b) host that content on their own 100K domains, c) sit behind Cloudflare, d) do some SEO??? e) when someone finds their site, inject an ad or similar rubbish, and f) do this enough that they make money off the ads / competitions / porn?
That seems like the problem that the "original-source" meta tag was supposed to stop?
Canonical URLs help with marking your own purposeful duplicated content. But that meta tag goes on the duplicated content, so it doesn't help with scrapers, who strip it out.
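For reference, this is the standard tag in question - it sits in the head of the duplicate copy and points back at the original (URL is a made-up example):

    <!-- in the <head> of the duplicated/syndicated copy -->
    <link rel="canonical" href="https://original-site.example/article">

Which is exactly why it's useless against scrapers: the party serving the copy controls whether the tag is there at all.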
But I thought that it was useful for Google - who could find two caches with the same content, one from 2018 and one from 2020, both saying "this is canonical". At that point the 2018 version is real and the other rejected.
Then again, you could just do it with publication dates ...
Google search has progressively deteriorated in quality over the last 10 years, to the point where I see it becoming useless in the relatively near future. And it's mainly not even their fault.
I've been using Google search for all kinds of research for 15 years. There used to be a time when you could find the answer to pretty much anything. I could find leaked source code on public FTP servers, links to pirated software and keygens, detailed instructions for a variety of useful things. That was the golden age of the web.
These days, all the "interesting" data on the Internet is inside closed Telegram chats, Facebook groups, Discords, or the rare public website here and there that Google doesn't want to index (like Sci-Hub, or other piracy sites).
The data that remains on SERPs is now also heavily censored for arbitrary reasons. "For your health", "For your protection". Google search is done.
It's precisely their fault: they've created an environment that incentivizes low-quality, irrelevant content and are actively hostile towards users. Two examples just off the top of my head: ignoring the country website - previously, if you wanted to search only local news, it was very easy to do - and completely ignoring exact-phrase search with double quotes.
Ignoring double quotes drives me crazy. That's the last straw that sent me to DDG, although I have to say that DDG isn't much better either.
What made me really angry about Google Search was when they removed the function to search in discussion forums. But even then you could more or less filter out crap.
Nowadays it feels very hard. I find myself using the site: flag many times, but you need to know the site beforehand, which is another problem.
I also feel like some product manager decided that having a blank results page is horrible. So even if I put terms in quotes, and there are no results with those quoted terms, Google decides to show me results that have virtually nothing to do with what I want to see.
Except a blank page is exactly what I want to see if there are no results or if I mistyped my query. These shoehorned-in results throw me off every time because it takes extra mental effort to reinterpret them as "oh, google has no results for what I typed in past this point, so they're showing me random crap". I miss the old days when search was as precise as a scalpel.
Maybe I'm naive about the complexity of the problem (every article I read about the difficulty of what Google's doing certainly suggests so), but I honestly believe we've reached the point where a talented and well-funded startup could outplay Google at their own game.
I literally couldn't agree more. I can't stand how bad searching has become.
While we're at it, you know what else I really hate? How google switches the order of the buttons for Images, News, Shopping, Video, etc. on EACH QUERY. Who in the world ever thought this was a good idea?
I've always assumed that this is a bug in their A/B tests, because I cannot even imagine how utterly degenerate their product design process must have become to come up with this on purpose.
Exactly, it's very easy to see how Google doesn't leave the user unspammed. YouTube's search works the same way, and even if there are useful results on top, they quickly trail off into clickbait garbage. Plus the unrelated lists of ‘people also watch’, injected every few items. The search filters are barely enough to dial in when you want to skip obvious trash, but give up on anything slightly complicated. On Play Store, it's worse: you just get troves of what Google thinks you should be getting, with no control on your side—because if people could skip apps with payment inside, they would, and who in Google wants that.
I feel like YouTube's goal is to always get you to watch something else. Scroll down in search results? See unrelated videos. As soon as the video starts? See a 'Recommended' badge. Pause the video? See an overlay with other videos. Leave the video running? Autoplay fixates on something else.
I think there is a market here. "Dumb" search engines, that search exactly the words you type, maybe with advanced features like regex, metadata search, etc... It won't replace Google's guesswork, but sometimes, I just want to grep the internet.
Non-Google engines are all about privacy, which is nice, and almost a requirement if you want to compete with Google, but I'd like to see features that actually improve search too. DDG gets an honorable mention with its bangs and applets.
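For illustration, the core of such a "dumb" engine is just an exact-term inverted index. A toy sketch, where the pages dict stands in for a real crawl:

    import re
    from collections import defaultdict

    # Stand-in for a real crawl
    pages = {
        "https://a.example": "install nginx on debian",
        "https://b.example": "apache2 virtual host setup",
    }

    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)

    def search(query):
        # Every term must literally appear; no synonyms, no stemming, no guessing
        hits = [index.get(t, set()) for t in query.lower().split()]
        return set.intersection(*hits) if hits else set()

    print(search("nginx debian"))   # {'https://a.example'}
    print(search("nginx apache2"))  # set() - empty means empty

The hard part obviously isn't this, it's crawling and spam-filtering at scale - but the ranking philosophy really can be this literal.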
I'd like to go a step further and hope for "dumb" search engines that are tailored towards indexing specific subsets of the internet as a whole. As an example, imagine a search engine that is specifically tailored towards programming questions. Or one that specifically omits some of the more annoying SEO optimized results, like Livestrong and USA Today.
My current product started off as precisely that kind of search engine.
User adoption is a huge problem - almost no users made it their default search engine because even programmers need to do non-programming searches and it's too easy to go into your browser, hit ALT+d, bang out your search query, and hit enter.
And because google and ddg do a good job on most programming related searches, they get to be the default search engines.
I think so too. It'd really be nice to get a search capability that doesn't take my past searching into account. I want an unbiased search, and give me good tools to filter.
Yeah, I also try DDG first, and only if it's not okay do I go !g. Now I wonder, why did GOO break such a useful thing? They could have shown advertisements in their "classic" search too (let's call it that), so I'm really at a loss - what was in it for them to change??? Germans have a word for that, "verschlimmbessern" (or even a second word - "kaputtreparieren"), which means breaking something by trying to make it better.
> Now I wonder, why did GOO break such a useful thing?
As far as I understand it, they want to catch synonyms and different tenses for the words.
But they do a remarkably shit job. nginx and apache2 aren't synonyms; they're completely different tools for the same job. Yet apache2 instructions appeared as a match when I used "nginx" in my query (the word apache2 was in bold in the result snippet).
I guess it has something to do with Google's internal dynamics. But I'm not the one being paid big bucks to think about such stuff. Maybe they've done their due diligence and made their tradeoffs, but I'm clearly not the target of the search engine anymore.
I'm only a dumb nobody, so if this is a problem for me, I wonder how it is for all the smart people that hang out here at HN.
It just feels very uphill to use Google right now. No matter how many flags or tricks.
The median Google search user probably never learned to use any of these "advanced" features in the first place. For them, having Google ignore the precise wording of their query and show results for more common related terms is almost certainly an improvement.
> It's precisely their fault: they've created an environment that incentivizes low-quality, irrelevant content and are actively hostile towards users.
I think this is an overly harsh take. I strongly suspect that any algorithm for ranking search results is open to gaming and manipulation by malicious users.
Google changed SEO from a seedy practice to something they actively encourage, promote and support.
Google stopped shitcanning sites that present different things to Googlebot and regular users, including sites that require a login for normal users but show content to Googlebot.
Google imposed arbitrary ranking criteria that favor long-winded blogspam over concise articles that immediately tell you what you want to know.
Nowadays you need to VPN to the target country. For a reason too complicated to explain here, I searched for local businesses in the city of Melun, France. There were no reasonable hits. Well, my IP was Finnish (to the best of my knowledge they have no other means of localizing me), and "melu" means noise in Finnish, with "melun" being a common inflected form. No addition of French shopping terms could convince Google that I was not interested in noise abatement. The Accept-Language header did not help. After switching to a French IP it worked like a charm. And one would guess searching for shopping and businesses would be Google's strength.
I'm sure public human SEO manipulation is at least partly to blame. The only thing that is surprising is that it isn't worse than it is. At least the first half page is usually close to what you want.
One of the last use cases for Google is being a proper search engine for Reddit. But I think they are aware of their downfall; that's why the top of the page is increasingly taken up by their widgets providing the information directly.
On the other hand, Youtube is the second most popular search engine and I don't see it slowing down. What an insight they had when they bought it.
Edit: I entirely agree that valuable information is found more in communities nowadays. I also predict that in 5 years the web will be mostly explored through communities.
When I’m looking for reviews of a product I usually type XXX review Reddit to avoid the XXX top 10 list blog spam that google returns. I don’t want a review from someone who just jumbled together a top 10 list without ever looking at the product in person.
I've noticed this a lot lately. There are words on the page that look like a description of the product and a review, but once you really read them you see that they could be generated by a bot and they don't actually review the product, just describe the basic properties of it. Then they provide that button.
True, although Google has known about this issue and released guidelines alongside a "Product Review" algorithm update three or four months ago.
I think it depends on what you’re searching for, for dev related stuff no other search engine I’ve tried comes close. But there are whole industries now that are so heavily SEO’d that finding useful information without knowing the exact keyword to search for is incredibly frustrating
I agree, and I've read the opinion too that it's a problem people have with DDG. Yet Google doesn't feel excellent at that either. Could it be worth competing with Google there? I'm not gonna say it's "easy", but maybe worthwhile and possible?
I don't think I have used more than 1000 different sites across all my development searches ever. It's the Stack Exchange network, GitHub, official documentation, non-GitHub official issue trackers/communities, and some high-quality blogs. That seems very manageable. You could probably index all that into one Elasticsearch and one Sourcegraph instance. Add a little more specific faceted search, add back powerful and precise query syntax, and still maintain the "just paste in whatever and hit the first result" functionality. I'm likely underestimating the breadth of other developers' needs compared to my own. I don't know.
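Even the query side is off-the-shelf. Something like this against a local Elasticsearch already gives back exact-phrase search over a hand-picked site list (a sketch; the index name, field names, and example sites are assumptions):

    import requests

    # Exact-phrase search over a curated set of dev sites
    query = {
        "query": {
            "bool": {
                "must": [{"match_phrase": {"body": "address already in use"}}],
                "filter": [{"terms": {"site": ["stackoverflow.com", "github.com"]}}],
            }
        }
    }
    resp = requests.get("http://localhost:9200/devdocs/_search", json=query).json()
    for hit in resp["hits"]["hits"]:
        print(hit["_source"]["url"])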
I think a tool like that could be very valuable, as you said in most cases you end up in the same few common locations. Most of the time the reason I fall back on google is because I'm not sure whether what I'm looking for is going to be in a github issue, in a bug tracker, in a forum, in a stackoverflow answer, in a mailing list, etc.
There was a docs aggregation site I tried at one point that was quite useful, but without search across issues/forums etc. I didn't end up sticking with it
Google seems to also place less emphasis on search phrases. When searching for exact article names I easily find them on DuckDuckGo, but not on Google. Two recent search-term examples:
In general, google no longer primarily searches what you asked them but for what they think you want. This might be better for the average user but can be extremely frustrating when you are trying to find something more niche.
I think it is also the result of the whole "Ok, google" voice assistant push. It seems like Google switched to natural language processing and the old-school system of keyword searching is no longer effective.
Whether or not it's Google's fault depends on how much you attribute the development of the advertising-driven distraction-factory internet to Google's business. We can debate whether or not Google was ever really in the search engine business – certainly at one point the search was a useful tool. Today, Google search is a sort of glorified Yellow Pages*. Their main product is selling ads in this Nouveau YP. The results their search engine returns are now heavily skewed towards revenue-generating sites. Such sites may incidentally be informative, but they are generally selling something.
This is not to say that all search results are bought, although of course those are present now, too. But overall Google presumes that whatever the user is searching for, the best result is one where the answer is "buy this thing".
For those search results that don't lead directly to commercial products, the revenue generation is indirect: through the collection of user preferences and activity, Google can refine its search results towards maximizing revenue. At the very least, the result is likely to be a site that has ads, some of which generate revenue directly for Google.
*In the old-fashioned Yellow Pages book, you couldn’t really “search,” but there was an index by category. It had many of the issues inherent in categories, but it didn’t take an expert to find things. Google search eliminates the need for anyone to understand a taxonomy of businesses.
Google only recently started to totally butcher the Swiss search results. For some reason I could still find direct download links to movies and music a few years ago (kinda legal here).
Now such search results often don't even get a second page...
I've seen the same here in Germany, but they only appear if you use the "results within the last 24h" functionality.
It looks like the German content is generated by GPT-2 or -3. It makes no real sense if you read it.
If you go on the page you are immediately redirected to a scam just like the article mentions.
Interestingly, they use ".it" domains here. It also looks like the domains might have been hacked, or are expired domains that were bought.
I've checked the domain on Ahrefs and it has almost no backlinks. But if you look closely you will see that all the results that rank very well have been added very recently.
On the screenshots in the article you can see things like "for 2 timer siden" which means 2 hours ago.
It looks like google is ranking pages that have a very recent publishing date higher.
Typically Google has a warming/trial period for new large content sites, after their search bot is introduced to the content and has spidered its way through the site.
For example, there used to be a very common content farm system, structured as one auto-generated page per domain name.
So when people searched for sites by domain name, the zillions of low traffic long-tail results of this farm system would be all over Google's results.
What it would present on the page is a mess of data about nytimes.com: traffic, keywords pulled from the site header, maybe a manufactured description (or one pulled right from the site head), sometimes images/screenshots of the site - anything that could be stuffed in to fill up enough content to keep Google from applying an automatic shallow-content kill penalty to the content farm. This worked very successfully for several years, until Google's big algorithm updates 9-10 years ago or whatever it is now (Penguin et al.). You could just build a large index of the top million domains (e.g. Alexa and Quantcast used to provide that index in a zip file), spider and scrape info from the domains, and build a content farm index out of it, and you'd have a million pages of content to hand off to Googlebot.
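The generation side of such a farm is almost embarrassingly simple - a sketch of the idea (the file name, two-column rank,domain format, and template are assumptions based on the old Alexa-style lists):

    import csv, os

    TEMPLATE = """<html><head><title>{d} - traffic, keywords, screenshots</title></head>
    <body><h1>Stats for {d}</h1><p>Rank #{rank}. Scraped description goes here.</p></body></html>"""

    os.makedirs("pages", exist_ok=True)
    with open("top-1m.csv") as f:           # the old rank,domain list
        for rank, domain in csv.reader(f):
            with open(f"pages/{domain}.html", "w") as out:
                out.write(TEMPLATE.format(d=domain, rank=rank))

A million "articles" in one afternoon, which is exactly why the shallow-content penalties had to exist.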
So initially such a farm will boom into the search rankings, Google would give them a trial period and let out the flood gates of traffic to the site. Then Google would promptly kill off the content farm after the free run period expired and they had figured out it was a garbage site.
I still occasionally see this model of content farm burst up into traffic rankings, and it's usually very short lived. It makes me wonder if that's not more or less what's going on with the Mermaid farm.
This definitely looks like an expired domain that was bought. Havfruen seems to be a restaurant in the city of Korsør - which conveniently has the postal code 4220.
.it pages are used in Norway too, but I'm not sure it's something GPT-ish being used. Whole sentences are copied word for word from other articles. (Might be a small dataset it's trained on?)
It could of course be that it's something similar to GPT, trained on all the content they could find, that then writes articles, because it's clearly messing up sometimes, judging from the small piece of content visible on the search results page.
I'm not sure if this is an ML race, and the reason we're not seeing the same thing in English is that Google understands English better than the spammers do, while in Norwegian and German it's the other way around?
Clearly freshness is a large part of it. Google seems to have indexed millions upon millions of pages tied to this in the last 24 hours.
Seems like this is not a new thing. Here is a warning tweet from the beginning of July from Danish cybersec guy @peterkruse, who saw his name coming up for a different domain owned by the same registrant as havfruen4220.dk
Interesting - I've been seeing the same spam for Norwegian searches, but with the domain nem-multiservice dot dk, or nem-varmepumper dot dk - presumably another legitimate business's domain that expired and was grabbed by the scammers. Visiting those domains shows the same graphic as shown in the article.
Almost any search in Norwegian will have obvious scam sites like these in the top 10 results.
Other domains part of the same scam that show up in my results today: mariesofie dot dk, bvosvejsogmontage dot dk
Yup. Those domains are the same thing, and redirect to the same thing. There are even more domains.
Never seen anything on this scale before. I can search for basically anything (tax rules, baking, stocks, property, hygiene...) and Google will most likely show those domains somewhere.
The content seems taken from other websites and mixed in a nonsensical way. It comes up frequently in my search results.
www.xspdf.com has completely unrelated content and seems a separate business.
Would be interesting to see the actual content. Based on the small snippets in the search results, it takes content from other sites, like large Norwegian news sites, and somehow outranks them hard.
I wonder what the Google Search Console looks like for that domain, considering that it's probably getting millions worth of free traffic.
EDIT:
After looking more at it, it's insane how much it ranks for, and how well. Straight-up brand names seem to be the hardest for it to compete with, at least larger ones. Those seem to be around page 4-5 for me.
Some brands I was unable to find at all, but ironically another .dk domain showed up in their place doing the same thing. There are also some .it domains using the same content.
I've found that it takes content from multiple sources and glues it together in sometimes great ways. Like one sentence from this page, another thing from that page.
Maybe this is some ML that collects content and pieces sentences or half-sentences together into one large article? It's clearly from completely different sources, but about the same thing.
Example:
"wash car"
Result in google:
"A dark winter with snow and salt is hard on the car, and it's extra important to wash the car" - Collected from one article.
<some other text>
"Keep the pressure washer at 30-50 cm from the car..." - From another article.
Ironically, there are like 11 results all tied to this thing outranking the original articles (those come last), even when they're from medium to large well-known companies selling for billion(s) of dollars each year in Norway.
Sometimes it goes from one thing and switches to something completely unrelated, so I guess the spammers still have something to improve.
Some data on their traffic from some SEO tools I pay for:
Ahrefs: 230k organic traffic valued at $124k
SEMRush: 558k organic traffic valued at $355k
These are estimates and can be wildly under- or overestimated, but they show that this is happening on a very large scale.
For a quick idea of how this is possible, I looked at their top pages (according to Ahrefs). Their top page is ranking #2 for the keyword "interia", which has 207k searches per month in Norway and is rated 0 (out of 100) for ranking difficulty. Usually when a keyword has that amount of searches it would be incredibly hard to rank for; I've never seen anything like this. So what is happening here looks like they are just taking advantage of a market with really low-competition keywords.
Interia is a large Polish web portal, from what I could find. Norwegians don't know it, but Polish people might - roughly 2% of people in Norway are Polish. It also ranks #1 for me. It's in Polish too, so basically only ~2% of Norway would understand it.
However, the weird thing is that it steals content from articles, and then outranks them. Most pages seem to be boosted, maybe as a result of being new. (Most content is just hours old.)
Could you check these too? (exactly the same thing, but newer, it seems)
www.mariesofie.dk
nem-varmepumper.dk
The keyword data was based on searches in Norway alone, it is an order of magnitude higher in Poland. In Norway almost anybody could rank for that keyword if they tried due to the difficulty being different based on location and language.
Side note, but what do you think about Ahrefs? I'm doing some tests to see how easy it is to get ranked for keywords (with actual helpful content, not crap like this thread is about), but I find the AdSense keyword tool not that helpful, as they delete many keywords when you search for them, which kinda voids that tool.
But I currently feel that paying $100/mo for Ahrefs for something I do as a side project is a tad wasteful.
You need a tool like Ahrefs or SEMRush for competitor analysis and keyword research. One trick with Ahrefs if you want to be frugal is to pay for the $7 trial and use it as much as possible during the trial to do your keyword research and cancel. Technically if you are efficient enough that trial could get you months worth of content at least.
Pet theory (disclaimer: I know very little about SEO): the website with the cloned content loads fast and does not load 4 MiB of JavaScript, thus beating the original content in ranking mostly because of speed, which I believe is an important factor in Google rankings (and getting more important).
And add to that some link spam, plus preventing visitors from going back so there's no bounce-back signal...
Either way, I can't help but be a bit impressed by the SEO spammers outsmarting the people at Google. (Edit: and I don't mean to say they are smarter or anything, just that they only need to find one weakness in the algorithm, while the people working to improve it need to make it work for everything.)
Once the hard requirement on speed impacts the quality of results it no longer helps me as a user. I'd rather have the sites invest their time in good content and wait a few seconds rather than get fast but low quality SEO-ed results. Same with AMP, the quest for speed doesn't make my experience faster if I still load the original page (which is often still necessary).
Not sure how relevant this is, but the animal characters in the top image are from a hit Russian children's cartoon, "The Smesharicks" (literally "The Laughballs").
I hate dealing with this, and I now refuse to use Google after seeing these patterns in search results while researching common things (like housing) in Norwegian, here in Norway. I rarely use Google these days, but I thought for a second that Google might be better than DDG for results in Norwegian, and this stuff is aggravating. This is one of those cases where they screw around with history such that you just have to start fresh on whatever you were doing instead of going back.
edit: one other thing I have seen, though it doesn't always mean spam: All The Words In A Title Are Capitalized. It's something to pay attention to when judging whether something is spam or not. Conventionally, titles are usually not written like that in Norwegian.
> edit: one other thing I have seen, though it doesn't always mean spam: All The Words In A Title Are Capitalized. It's something to pay attention to when judging whether something is spam or not. Conventionally, titles are usually not written like that in Norwegian.
Another big one is that Norwegians, like Germans, write compound words together. Just one example from one of the stupid ads: "Spesial Reportasje" is a dead giveaway, and not only because of the capitalization.
(Oh well, sadly, because of years of pressure from Word's incompetent spell checker and lenient teachers, this is getting worse. I fear we are seeing compound damage here, as kids who got away with this are now becoming teachers...)
For someone learning German as a foreign language separating the words would really help.
Even if it leads to things like "Trink Wasser fur Hunde" (as mentioned in the Wikipedia article).
Hyphens or spaces are still better than those long words...
> What languages did you speak before learning Finnish
In order of skill, or chronological? C: German, Swedish, English, and French. S: Swedish, English, German, and French.
> and what level are you currently at?
"Level"... Lived and worked here 26 years (longer than in any other country), usually speak only Finnish with colleagues. (Exception: my previous job, 2014-18, at an unusually international company; quite a lot of English there.) What number is that, on whatever scale you were thinking of?
[EDIT:] IOW, it's gotten to the point where I fear my actual native language is only my fourth-best any more, Finnish having pushed it off the podium. [/EDIT]
Anyway, my point was: I may have improved a bit since, but was probably quite close to my current "level" after two or three years. It really isn't all that humongously difficult as it's made out to be.
The logic of a multi-inflected agglutinative language may feel unusual at first, but once one gets used to it, it's just that: logical. The orthography and especially the pronunciation of Finnish is the most straightforward of all the languages I've dabbled in (smatterings of perhaps half a dozen more besides the ones I speak). And I think, above all, it has the fewest cases of "this is the rule, but the exceptions are this, that, and the other", where you just have to learn by rote that "this word works that way, but that word works this way", of any language I've come across. Learn the rules and you know it; no exceptions to learn.
Just want to add to my comment that this is not limited to havfruen4220.dk but follows a general pattern. I tried a couple of search terms like 'mattilbud rema 1000' and found more .dk domains on the second page (nem-varmepumper.dk, humanrebels.dk) - two domains that have nothing to do with food.
For all that Google search has been utter crap for going on a decade now, I have to admit part of the reason is that they get targeted relentlessly by SEO spam operations like this. I like DuckDuckGo for now, but I imagine as they get bigger they're going to be a target for these kinds of spam just the same.
> they get targeted relentlessly by SEO spam operations like this.
Why, though? There is an arbitrary ranking system that seems increasingly independent of what I actually searched for. Google has created a game where the winner isn't necessarily relevant or at all useful. It's inevitable that spammers will play that game.
> I have to admit part of the reason is that they get targeted relentlessly by SEO spam operations like this.
A bit of it is probably that.
Outright ignoring my queries (+, double quotes, "verbatim" and all) takes more than SEO tactics; it takes someone inside Google, either malicious or, more probably, incompetent.
Or more probably: someone was so busy trying to use AI in search that they haven't had time in the last ten years to consider whether it was smart.
Is there really any difference between DDG and Google when it comes to SEO spam? If there is, I sure haven't noticed in spite of using both, often for the same search terms.
It seems to me that the techniques used to spam Google's index work just as well on Bing's index.
That's an advantage. Since Google tuned up their engine to treat authoritative results as better, their searches became absolute dogshit.
You search for a very specific thing, and all the results are big sites that have said something containing two of the six words you searched for, in a completely generic article that helps you none.
My favorite is when your query contains a word that is the very essence of what you're searching for, and Google chooses to display results without it, so you have to do an extra click: "yes, I actually want to search for what I said I want to search for".
The one thing I want more than anything from Google or DuckDuckGo or anyone, really, is the ability to give a list of domains and never have their results show up in my searches. I know I can do this on a per-search basis, but I want it to be a configurable setting.
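Until then, the per-search workaround can at least be scripted - a trivial sketch (the blocklist entries are just examples; the -site: operator works on both Google and DDG):

    BLOCKLIST = ["pinterest.com", "livestrong.com", "quora.com"]

    def with_blocklist(query):
        # Append one exclusion per blocked domain
        return query + " " + " ".join(f"-site:{d}" for d in BLOCKLIST)

    print(with_blocklist("sourdough starter"))
    # sourdough starter -site:pinterest.com -site:livestrong.com -site:quora.com

Wire that into a custom search keyword in the browser and it's almost the setting I want - but it really should live server-side.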
I installed it and it's—ok? For search results where the spam overwhelms the signal (it used to be able to do a decent reverse phone lookup by putting a phone number into Google), you end up with empty pages or mostly empty pages in the search results. Better than nothing, but it really should be a feature from the search engine, not a browser plugin.
I figured sooner or later Google would pick up the signal, but I think instead they just started ignoring my "-" requests, so I stopped using them. edit: or maybe they fixed the problem. Spam sites used to be a problem during the early decline of Google. I think what happened was that that problem almost disappeared for me and was replaced by irrelevant results from non-spam sites.
Yeah, I've seen this domain a lot lately. But I've complained about the Norwegian results for years [0]. For most searches there will be a result that's just keyword spam ranking high. I retried my "pes anserinus bursitt" search now, 2 years later, and two results are spam from havfruen, and there are some other results from https://no.amenajari .org - which is also just translated and scraped content for all languages, which Google seems to love, as I've seen it for years. A third domain I often see is "nem-varmepumper". Apparently a site about heat pumps has content on everything.
With DDG, I found this thread. Google set to Norway as region/language found nothing from havfruen4220.dk, unless I specifically added site:havfruen4220.dk in the search.
Almost all my searches from the last few days still show havfruen as a result somewhere. My pes anserinus search above. Or "obos fellesgjeld" from about a week ago, which was when I first noticed the pattern. "monstera jord" gives lots of translated blogspam, and then a row of havfruen results.
Switching language on Google has basically no effect. Sometimes I want to find Swedish results for a thing with the same name, but no matter what I do I get Norwegian results ranked first. So I don't think this is easily emulated from abroad.
The last time I accidentally installed malware on my computer was when the top Google result pointed me to a site masquerading as the official site for the software. That taught me a lesson: pay attention to the domain name.
...but, you know. Can you see anything else they're doing that would give them that kind of ranking? These pages are just piles of crap, and google is pretty good at filtering that sort of stuff out.
If it was that easy, google would be filled with spam everywhere.
The chance that someone did something random that's very uncommon (blocking the back button) and it happened to be a super effective signal to Google seems:
a) like an edge case they didn't think of
b) like it'll get fixed pretty fast
c) not that unlikely.
Compared to, say, the idea that some random spammers have built a network of incredibly sophisticated ML-generated pages that can subvert Google's algorithms, which seems:
a) not substantiated by any obvious content on the pages
b) requires a very high level of sophistication which seems totally lacking
c) very unlikely
...but I mean, who knows right?
We're all just speculating. I guess it'll get fixed soon, and we'll never know.
I agree. Tons of sites employ these bounce-back avoidance tactics, and they don't particularly help their ranking (in fact, lots of non-ranking sites do it — presumably just to keep you on the page).
You could do a slightly more difficult, less direct sequence check on the specific user.
If they circle back around to Yandex within N time and go hunting for the same or a similar query, then you can rank the prior attempts as not having been ideally helpful (downrank the result(s) they clicked through to when they last searched for that query two minutes ago).
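A minimal sketch of that signal (all names and thresholds made up; a real system would fuzzy-match queries rather than demand exact equality):

    import time

    WINDOW = 120       # seconds within which a repeat counts as a bounce-back
    last_search = {}   # user_id -> (timestamp, query, clicked_urls)
    penalty = {}       # url -> accumulated downrank score

    def record_search(user_id, query, clicked_urls):
        now = time.time()
        prev = last_search.get(user_id)
        if prev and now - prev[0] < WINDOW and prev[1] == query:
            # Same user, same query, minutes apart: whatever they clicked
            # last time evidently didn't answer the question
            for url in prev[2]:
                penalty[url] = penalty.get(url, 0) + 1
        last_search[user_id] = (now, query, clicked_urls)

Note this needs some way to tie the two searches to the same user, which is where the tracking question below comes in.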
Makes sense. Does DDG do the same? If yes, isn't that against their "We don't track users" mantra? If no, how do they improve their results while missing such a powerful signal?
I would be fairly certain that DDG isn't itself tracking users in a manner that they can use that ranking approach. They don't need to.
If you do a search for the same terms in DDG vs Bing, you'll find that the results are very similar. DDG lets Microsoft do the dirty work of abusively tracking users to max out on ranking factors, and then DDG reaps the benefit. DDG doesn't need to get its hands dirty, because someone else is doing so much of that for them.
By leaning so heavily on Bing, DDG is a blood diamond merchant of privacy. They might not own the mines or directly command the labor, however they're quite happy to buy the blood diamonds to further their own profit afterward. And DDG's users go along with the scheme, because buying into the con helps them sleep better at night. It works like this mentally: those users over there (at Bing) are having their human right to privacy violated, I know it's going on, and I directly benefit from the search data training as I use DDG, but hey at least it's not me being abused, so I can do my virtue signal dance and sleep well at night comfortable in my compartmentalization.
I've also seen this, but from a different side. I have Google Alerts for many open source projects that I run, but in the past few years these alerts have become all but useless. Spammers scrape genuine pages from all over the place (including ones containing references to my projects) and put them into scammy ".it" domains. These appear both in Google Alerts and high up in Google Search. So alerts and search both become useless. The scam appears to be that when you visit these web pages they say you're the billionth (or whatever) visitor to Google and you've won a prize, just type in your bank details.
This has been going on for years now, so I don't have much confidence that Google is able or willing to fix it.
They are using different images. A month or so ago it was some guy tied up on a chair with some Russian text on top of the image.
There are a lot of these domains (ptsdforum.dk, verdes.dk, momentsbykruuse.dk off the top of my mind). Always Danish domains, and always registered by the same person in Riga.
Somewhat related: has anyone else noticed a massive change in the breadth of results? I was searching for reviews of diving equipment and some less niche items, and I feel like I'm being spoonfed results from the same comparison engines. Since when did algo content become king?
I feel the same. Looking for specialized topics on Google is now very difficult. It is now impossible to look for phones, uncommon words, or anything that is not the mainstream result.
I'm not sure if the culprit is BERT or neural ranking in general. But in the last few years I've found it more and more common that I leave Google search without useful information. The worst part is that all the competing search engines use the same algorithms, which are only useful for mainstream results.
I noticed this in my country when searching for somewhat less common parts (electronics, car parts, tools, etc). The first few results are for online retailers in my country, and then after that it's full of domains with paths such as /sale_12345678. The domain sounds somewhat promising, and the description sounds good - other than it often being a quantity of 10 - but when you click the link it just redirects to AliExpress.
I find that using another search engine in that kind of situation is extremely useful! If I'm searching for more mainstream stuff google usually is great but when I'm going for more specialized topics duck duck go will usually bring up some different links!
Search engines seem to be stuck between serving two roles: 1. An easily accessible directory of mainstream information, and 2. A specialized tool to find the diamond in the rough. It seems like it has to be a tradeoff, it can't serve both roles equally well.
"The alteration or outright disappearance of inconvenient or embarrassing documents, photographs, transcripts, or other records, such as from a web site or other archive. Its origin comes from George Orwell's "1984", in which the memory hole was a small incinerator chute used for censoring, (through destroying), things Big Brother deemed necessary to censor."
> I'm not sure if the culprit is BERT or using neural ranking
Tools are not to blame here; it's like blaming the compiler for the behaviour of an application. From the training data all the way to how the model is used in deployment, it's the fault of the people who made it, not of the neural architecture. The architecture itself can learn anything you throw at it, good or bad.
At least you get the results you are looking for... I search for three keywords, and it chooses to ignore the two specific ones and show only the one general one (while putting a line under the search result saying that the result does not contain some keywords).
Basically, like searching for diving suit thickness, and Google ignoring "suit" and "thickness" (until I specifically put those two words in quote marks), and only showing me results for diving.
I play a game with google search: I take something very mainstream like a movie title, let's say 'Reservoir Dogs'
And change something in it - say, to 'Reservoir Cats' for example.
Google search 'reservoir cats' and it will completely ignore what you actually searched for in favor of the mainstream result. The effect is basically that you can't search for 'reservoir cats'!
Even putting something opposite or unrelated to the highly mainstream result will have no effect.
It's completely, entirely ridiculous and makes the search engine seem like a facade.
Although I'm familiar with your point, I literally just searched for the term you mentioned, navigated all the way to page 6, and every single result was specifically for Reservoir Cats proper; none of them even mentioned Reservoir Dogs in the title, only in the description for some of them.
Same here - maybe someone at Google read this and fixed it reaaalllly quick.
If I search for Palp Fiction, it shows me Pulp Fiction results but asks if I really meant Palp Fiction. If I say yes, I really meant that, it shows me Palp Fiction with a message "did you really mean Pulp Fiction".
on edit: some of the Palp Fiction results are headlined Palp Friction.
We ignored your search query and showed you results our highest paying customers / advertisers paid us to show you instead.
I don't know of a good general internet search engine, so I tend to stick to the sites I know will provide answers that'll work for me, which is a shame for discovering new content.
I searched diving suit thickness and it provides the correct information as the first result... the diving suit thickness for different temperatures. Not sure why you are not getting that information.
Odd. When I try that search[1] I'm seeing good results. There's a onebox telling me how thick a suit I need for different temperatures, followed by a bunch of articles on the topic.
This exactly. I’ve been researching specific house repair issues and just get nothing but content spam. Whenever I want specific information I find myself adding “reddit” to the query string, which will usually turn up a thread with links out to the actual answer.
Said it before and I'll say it again: when Reddit finally becomes inaccessible via searches, we will have lost a huge and very useful database of succinct information.
You haven't noticed that reddit became substantially inaccessible to search a number of months back?
Every reddit page, while not logged in, is full of hidden content from other, unrelated pages. When you search, you'll get hits in those unrelated pages -- but when you follow the link, the content isn't there (because it's on the unrelated pages).
Worse, the pages with the correct content aren't necessarily in the results at all, because the content was low enough in the thread that it was collapsed and wasn't visible to the search indexer.
It's not a total loss, but I'd say about 80% of my own comments are now difficult-to-impossible to find via search when they were easy previously.
Searching in Google has become all about shopping. Pure and relevant content is hard to find.
Even today there are bloggers out there who have no commercial affiliation with the goods/items/things they blog about. Such content is practically impossible to find next to all the Amazon-affiliated pseudo-information spoof sites.
Same here; I can no longer find anything sensible on Google, regardless of how much I try to customize the search expression.
Additionally, as a polyglot it is very irritating that Google tries to helpfully translate queries for me, so I have to go to other search engines to actually find articles in the language I want.
Pinterest is the worst, as you cannot see the results without registering… How can that be a relevant search result! Fortunately, -site:pinterest.com makes it usable.
I couldn't agree more. More and more lately it's felt like the Altavista days. I know the information I'm looking for is out there; it's just not in the Google results page, which is plastered with unreadable stuff (paywalls, content farms), crap "content cards", and sneakier and sneakier ads.
I'm not sure what the beginning of the end was for Google Search, but I think the day where they changed the ad background to white is a good candidate.
Google Search used to be like Chrome or Gmail: we know it's wrong in the long term, but it's hard to stop using it because it just works so well.
But these days, not anymore. Search is a lot less sticky, and it is their golden goose they are messing with here.
I have been struggling with the same issue recently. Results are much narrower and seem to lean towards consumer goods. Though I don't remember ever buying something I found via Google search.
Catch-22, though. If you eliminate the bounce back, you first have to rank to get the ranking signal into Google. So how did they rank in the first place? I haven't tried to reverse-engineer what they're doing, but I don't think the author quite figured it out. Interesting phenomenon, though.
I had a look at this, and it looks to me like it's a 301 from another domain. Typically when domains get a manual penalty (primarily for spam), they drop in rankings overnight. To counter this, you register a new domain and redirect it, and overnight your rankings bounce back. This technique is super common for blackhat sites like illegal streaming sites.
If the redirect is done as a meta refresh, you can block it in your robots.txt from being picked up by SEO tools like Ahrefs, SEMrush etc.
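To make that concrete, here is a sketch of what such a setup could look like (domain names hypothetical; AhrefsBot and SemrushBot are those tools' real crawler user agents, and both honour robots.txt):

```html
<!-- Page on the redirecting domain: a meta refresh that sends visitors
     (and Googlebot) straight on to the target domain. -->
<meta http-equiv="refresh" content="0; url=https://target-domain.example/">
```

```
# robots.txt on the same domain: the SEO tools' crawlers obey this and
# never fetch the page, so the redirect stays out of their link indexes.
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /
```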
These types of sites are called doorway pages and have been around for ages. They are most popular in Russia and on Yandex, but you do see them on Google for super longtail keywords with 0 competition.
The other important thing to remember is that doing SEO in any language that's not English is a walk in the park. Lots of SEO influencer types have case studies showing how much extra traffic they get by translating their content. [1]
The mermaid mentioned in the article seems to be either a terribly amateurish operation or a very sophisticated sting.
They can easily be traced to a block of flats in Latvia, but since their registered phone number is a toy store in Riga... I'm going to go with a probably stolen identity and a sense of humour on their part, rather than this being the real operation of some 12-year-old in Riga...
This is only tangentially related but has anyone else started getting more obviously spam emails in their gmail inbox lately? I feel like for a long time I never got spam in my inbox but lately I’ll get ones that seem like they should be easy to detect, talking about gifts and stuff and uSiNg wEirD capitals or s p a c i n g. Is it just me?
Yes, and more non-spam email is getting filtered as spam. Also, a mailing list I was unable to unsubscribe from and marked as spam at least 5 times kept being delivered to my inbox.
I'll chime in as well. I forward everything from gmail to another account I have. I pretty much never got any forwarded email for years because the gmail account is only really used as an identity for google services. A few months ago I suddenly started to get a significant amount of spam forwarded for no known reason.
My guess is they get away with it because it's a non-English query, and most of the people working on these problems aren't looking at their localization. A big issue in general for global tech companies is that they don't usually handle things outside the US/English context particularly well. This often crops up in the political space where, for instance, something contentious like gun sales might get pulled from Google globally even though the political concern is mostly limited to the US.
An SEO-fighting Googler glancing at it might have no reason to doubt that it's a genuinely relevant or popular site in your country.
> I think that Google uses stats on whether the user continued checking more results for that specific search query to determine if the visited result answered the user.
God, I hope not. If Google does do this, it sounds like a really dumb idea, one that will ultimately create widespread usability issues. I can already envision SEO consultants recommending tricks to exploit it to their clients if this is believed.
Same here. But sadly the title would have been 'Google purchases "The Mermaid" for $X'... Given their near $2 trillion market cap, I doubt any search engine would be allowed to stay hot for too long.
"We help you to receive high-quality visitors from search engines, generate conversions and build your brand.
To achieve these results, we ensure your website / company is recommended for specific keywords by the search engine's autocomplete function."
I've noticed another trend recently where some websites seem to write content for Google SEO instead of optimizing it for human readability.
E.g., I've seen my exact search phrase repeated multiple times, followed by a very long article about the topic, when what I searched for was a simple question with a few-word answer.
More reasons why a global search monopoly is suboptimal. Smaller markets like this just get neglected, maintained only well enough that a better alternative can't compete. Google search is basically useless in any language other than English.
I've lately noticed that searching Google for topics related to gardening in Finnish often gives me some scraped and machine translated pages from Russia. Really annoying that totally useless content is so high up in the results.
Google search seems to have gotten significantly worse lately (sometimes to the point that it's barely usable): from scams like these (I've seen others) somehow getting a foothold, to a lot of internal "unbiasing" skewing the results towards Google's political stance (usually totally irrelevant to my query). It's gotten to the point that I barely google anymore, other than for things where I already know what the results will be.
I’m in Norway, and I tried the first search “rema 1000” without getting any spam results on the first two pages…
That doesn't entirely eliminate the other possibilities, though; Google search isn't deterministic, and the domain could have been reported since the article went up.
It's not showing for the example search (Rema 1000) for me right now, but I did a search yesterday about a person/company where the results were news-related content, and I ended up on a site with the same image. However, I can't find havfruen (mermaid) in my browser history, so they must be using other domains as well.
It is not happening with the example from the article for me, but I have seen this practice ruin my search results to varying degrees over the past 6 months. Sometimes entire keywords are just broken because there are so many fake sites.
> The simple solution would be to test sites regularly with an unknown IP and common user agent to check that a site isn’t just showing content to Google and gives real users something completely different. That would stop this.
Surely Google does this, right? Given that - in theory - showing different content to Google versus non-Google should result in a penalty, anyway ...
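For illustration, a naive version of that test is easy to sketch. This is not Google's actual pipeline, the URL and thresholds are invented, and it only catches sites that cloak on the user agent; catching IP-based cloaking would require crawling from IPs the site doesn't recognize:

```python
# Naive cloaking check: fetch the same URL as a normal browser and as
# "Googlebot" (by user agent only), then compare what comes back.
import urllib.request

BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36")
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch(url: str, user_agent: str) -> str:
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def looks_cloaked(url: str) -> bool:
    """Flag pages whose 'Googlebot' version differs wildly from the
    browser version in size or vocabulary (thresholds arbitrary)."""
    as_browser = fetch(url, BROWSER_UA)
    as_bot = fetch(url, GOOGLEBOT_UA)
    ratio = len(as_bot) / max(len(as_browser), 1)
    shared = len(set(as_bot.split()) & set(as_browser.split()))
    overlap = shared / max(len(set(as_bot.split())), 1)
    return ratio > 3 or ratio < 1 / 3 or overlap < 0.3

print(looks_cloaked("https://example.com/"))  # hypothetical target
```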
Google is garbage. I once complained that a website stealing my content (and other people's content) was ranking very highly in Google. I was told I'd better fix my own website before looking at "competitors". Part of that was true, but at the time the person did not seem to care at all about the spammy content Google was delivering.
I'm surprised to find out people actually return to the search results page using the back button. Whenever I am serious enough about finding something (enough to keep looking after the first link I click does not satisfy me), I always middle-click or Ctrl+click the links to open them in new tabs.
Just tried Google.no from my computer (Norwegian IP (Larvik area)). Nothing similar. I see “normal” search results.
Anyway, I stopped using Google stuff 5 years ago and never looked back, so my search history is kind of clean; maybe that changes the algorithm's behavior.
I'm kind of curious why he's so concerned about this. They've never managed better than ninth most relevant, and in most cases they didn't even make the first page of results. Any advertising person will tell you: if you aren't in the top 3 results (basically the top result now that paid ads automatically get the top 2 spots on nearly all searches), your odds of being seen and clicked on drop to almost nothing.
Are they potentially doing harm? Sure. Have they successfully managed to trick anybody with this? I'd be extremely surprised if they're getting more than a dozen people a day clicking through from being the ninth result, and when people see they've been redirected to an advertisement, the majority immediately click away.
This isn't like clicking on a fake porn site that redirects to cam girls with viruses hidden in all the downloads. It's random unrelated searches redirecting you to blatant ads for cryptocurrency. The kind of people who are young enough to know what cryptocurrency is and how to buy it also know how to spot a redirect to a fake website.
These kinds of scams are a stochastic process. They don't work on your average person; they only work on vulnerable people. Here's the catch though: everyone is vulnerable at some point in their lives. This is where the stochastic process comes in. They don't need to get you when you're strong; they just need to test enough people enough times to catch them in a vulnerable moment.
It is interesting, as you cannot see the content that is being indexed; I suspect only the bot does. If I understand correctly, this is the sequence of events from the bot's perspective:
Each location in the sitemap has a "lastmod" of today/yesterday, so the bot returns there every day. In addition, each webpage has a "<meta name="robots" content="noarchive">".
But if you visit each of those pages, it shows you a cartoon image. It seems the actual indexed content is visible only to the bot.
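As a sketch, a sitemap entry of the kind described would look something like this (URL path and date made up); combined with the noarchive tag quoted above, there is never a public cached copy to inspect:

```xml
<!-- One entry in sitemap.xml: "lastmod" is kept at the current date
     so the bot keeps coming back daily. -->
<url>
  <loc>https://havfruen4220.dk/some-scraped-article</loc>
  <lastmod>2021-12-06</lastmod>
</url>
```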
## But how is the actual content being rendered?
The question is: what conditions (request params/headers) result in the actual content being rendered? The bot needs to evaluate it. I suspect it is some combo of checking whether the requester is an actual Google bot, maybe by looking up the IP: https://developers.google.com/search/docs/advanced/crawling/...
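That doc describes a two-step check: reverse DNS on the visiting IP, verify the hostname is under googlebot.com or google.com, then a forward lookup to confirm it resolves back to the same IP. A minimal sketch of how a cloaking site could implement it (the sample IP is just a plausible crawler address):

```python
import socket

def is_real_googlebot(ip: str) -> bool:
    """Verify a visitor claiming to be Googlebot via reverse + forward DNS."""
    try:
        # Step 1: reverse DNS lookup on the visiting IP.
        hostname, _, _ = socket.gethostbyaddr(ip)
        # Step 2: genuine crawlers resolve under these Google domains.
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Step 3: the forward lookup must map back to the original IP,
        # otherwise the PTR record could simply be spoofed.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:  # no PTR record, or forward lookup failed
        return False

# e.g. is_real_googlebot("66.249.66.1") -> likely True; serve the "real"
# content only in that case, and the cartoon image to everyone else.
```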
I imagine it's done in a similar way to how reddit circumvents searching for results from certain dates. I don't like anyone messing with google results.
That's because it's real content that they have stolen and simply republished. In SEO circles, people like to say that original content is king. Well, not so much after all.
I live in Norway and don't have this problem now. I had a similar problem about a year ago on my MacBook Air because of some software that altered my Google results in all of my browsers. I don't remember the name of it, but something smelled fishy when the results were different from the ones on my phone.
Pretty sure it affects you too, as it's the same for me on multiple networks, multiple user agents, multiple devices and so on.
Simply try one of the examples, like "hvordan regne ut prosent" (how to calculate percentages) or, I don't know, "DNB aksje" (DNB stock, DNB being the biggest bank in Norway). Sure enough, both rank on the first page or among the top results. (One is now using the www.nem-varmepumper.dk domain; it is the same thing.)
EDIT: Now the DNB one has moved from 2nd and 3rd place to page 2. Things are moving around quickly.
It seems Google has lost its ability to block spam effectively. For a few months now, I have noticed an increasing amount of outright scams being promoted on YT. I even got an ad with a fake Musk telling people to invest in a shady bitcoin scheme. Knowing that Google is willing to let these slip through just to maximize its ad revenue is really a warning sign that this company, no matter how large it might be by now, should not be trusted anymore.
TFA talks about Google testing with "unknown IP", but doesn't mention any testing done by the author with cookies cleared or in incognito mode. This seems basic.
What do you expect incognito to change? That would presumably show the same content the author is seeing. Only Google sees the content that drives the ranking.
It is Google that needs "incognito" mode, not the author.
I stopped using google for search because I noticed the filter bubble it was building around me. Perhaps that wasn't maintained by cookies, but in that case I wonder what it was...
For all we know (as the OP doesn't mention trying incognito), the OP could have malicious software on their device that hijacks their browser to manipulate search results.
that's easily verifiable by doing the searches yourself;
a search for "hvor ofte bør man dusje" (how often should you shower) in my English Google, connecting from Brazil, shows havfruen4220.dk as the 6th and 7th result, which is pretty high for a spam website
"hvordan regne prosent" (how to calculate percentage) shows two .dk websites, www.humanrebels.dk and havfruen4220.dk, as the 9th and 10th results
it could be that, since the OP clicked on these links to find out, Google made his personal algorithm show even more of this stuff
thus, I imagine a never-ending cycle of even more spam could easily be generated, especially for an innocent user
To expand on this: a very strong ranking signal is how many of the users who click a search result are sufficiently satisfied with the information they have found to end their search.
A good proxy for this is how many people don't click the 'back' button to see other results.
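As a toy illustration of that proxy (the log format and numbers below are entirely made up, not how Google actually stores this), the signal boils down to a per-URL "didn't bounce back" rate:

```python
from collections import defaultdict

# Hypothetical click log: (query, clicked URL, user returned to results?)
clicks = [
    ("dnb aksje", "https://havfruen4220.dk/page", True),
    ("dnb aksje", "https://www.dnb.no/", False),
    ("dnb aksje", "https://www.dnb.no/", False),
]

def satisfied_click_rate(log):
    """Fraction of clicks per URL where the user did NOT come back."""
    totals, satisfied = defaultdict(int), defaultdict(int)
    for _query, url, returned in log:
        totals[url] += 1
        if not returned:
            satisfied[url] += 1
    return {url: satisfied[url] / totals[url] for url in totals}

print(satisfied_click_rate(clicks))
# A page that hijacks the back button inflates this rate artificially,
# which is exactly why the hijack described below is worth the trouble.
```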
Google is already aware of sites that hijack the back button. Their crawler detects this, and if they find it, they throw out the figures for how many people click the back button on that site.
So if you can find a way to hook the back button so nobody can click back, while stopping Google from realizing you have hooked it, your page will keep creeping up the rankings.
Google detects back-button hijacking with their crawler (by rendering the page in Chromium and seeing the effect when hitting the actual back button), but this is circumvented by presenting the crawler with different HTML (or by making sure the page behaves differently in their crawler, potentially by checking things like the model of the graphics card: Google's crawlers don't yet support most of WebGL 2.0, and they also simulate playing audio at the wrong rate).
Google also tracks how many real users click back. If that number is zero, that's a warning flag, so I'd guess the back-hijacking logic is only activated ~80% of the time.
I doubt it's some crazy sophisticated SEO hijacking operation. It's probably the result of a small data set (Norwegian-language web pages), specific search terms (Norwegian brands and companies), and lots of keyword stuffing. Most of the examples the author pointed out were from pages 5-10 of the Google results, which are probably worthless for ad revenue anyway.
It does rate a pretty good chuckle, recalling old Google blog posts about their various uber-sophisticated anti-spam ML algorithms and how black-hat SEO just wasn't possible anymore.
This type of scraped-content website was common in English-language searches back in 2010 or so. I believe the 'Panda' algorithm update eliminated them from English searches.