Hacker News
Google confirms the leaked Search documents are real (theverge.com)
275 points by alanzhuly 5 months ago | 75 comments



KEY TAKEAWAYS:

• Google claimed they don't use a "domain authority" metric, but the docs show they do: it's called "siteAuthority."

• G said clicks don't affect rankings, but there's a whole system called "NavBoost" that uses click data to change search results.

• Google denied having a "sandbox" that holds back new sites, but yep, the docs confirm it exists.

• G assured us Chrome data isn't used for ranking, but surprise! It is.

• The number and diversity of your backlinks still matter a lot.

• Having authors with expertise and authority helps.

• Putting keywords in your title tag and matching search queries is important.

• Google tracks the dates on your pages to determine freshness.

• A lot of long-held SEO theories have been validated, so trust your instincts.

• Creating great content and promoting it well is still the best approach.

• We should experiment more to see what works, rather than just listening to what Google says.

From: https://www.reddit.com/r/SEO/s/ChlTrhjPnG

I wonder how chrome data works. Are they using every chrome browser to sniff what users are clicking on?


I was afraid of this. Now, it's a matter of time before Google search will get even worse as SEO hustlers push more of their useless crap to the top now that internal algorithm data has been published.

Guess I should look into that Kagi thing people keep mentioning.


The leak essentially confirmed what SEO experts already suspected (knew) but Google denied. SEOs have spent 2+ decades observing Google search behavior and honestly I wasn't even a little bit surprised their observations were proven correct. At this point, the "garbage" on Google isn't SEO optimized organic results, it's the ads.


For what it is worth, Google has favored macro-parasites over micro-parasites. The bigger companies have access to the ears of market regulators, etc. The average small publisher or affiliate site has almost nobody who cares if it disappears.

Part of the most recent Google update was penalizing high-authority trusted sites for publishing off-topic content from third parties. There is a concept called "goog enough" explaining how the likes of Forbes ranked for just about everything. https://www.blindfiveyearold.com/its-goog-enough


I doubt that will happen. One, because the leak didn't really disclose any major secrets that most marketers didn't already know.

Two, even if some of this wasn't widely known, it's not like you can take advantage of it overnight. There's no quick hack to building a trustworthy domain or getting lots of trustworthy links, for example.


> There's no quick hack to building a trustworthy domain or getting lots of trustworthy links, for example

Sure, but there are other potential hacks that this leak exposes, which marketers may now focus on more than they otherwise would have, based on the information in the leak.


It doesn't look like this leak will do that. It doesn't contain "algorithms" in any real sense of the word.


Before, SEO communities argued over what was part of the model, thinking some things, like clicks, didn't matter. Truth is, Google did everything you can imagine, including clicks; now they all know it instead of having to guess.

Still won't change much though; it's very hard to game, since Google has a lot of ways of mitigating click farms, or it would have been discovered a long time ago.


Back in the day, a friend mentioned you could choose which version of a phrase became the canonical search autocompletion by embedding a broken image call to the SERP page for the version of the keyword you wanted to be more popular.

Google has tons of ways to identify real users versus fake users. And lots of the fake it until you make it efforts leave statistical outliers that can lead to ignoring or smoothing away much of the benefits, especially if there is no fire following the smoke trail.


What could be worse than recipe pages that are 20 pages worth of text with the recipe hiding somewhere among the text?


For me the recipe sites are pretty usable (with adblock), there is generally a "jump to recipe" button to skip past the text. And sometimes I even read the text, if it is a good recipe the text often has useful information like substitutions and preparation techniques. Certainly a "just the recipe" website format would be worse SEO-wise, but I am not so sure it would be more useful.


They could split it in multiple pages instead of a single page! Imagine having to click “next part” >10 times just to see if you eventually end up with a section that contains the actual recipe


And unskippable ads after every third image. Then the moment you get to the final image there’s an email registration wall. It has a little X button that doesn’t work on iOS.


I see that we have a connoisseur of the devil’s work here :)


The thing is, there's a limit to how many times the typical user would do that before just clicking back to google for a different recipe.


Create a fake Google page, inject it into their history, the user goes back, sees something Google-like, and now you're 100% evil, congrats! :)


They also break the back button.


That's why I open search results in a new tab, and close the tab to go back to the search.


Me too, but I still hate it when they break the back button.


Sites that have a poor user experience by design create the ranking signals for their own demotion by such design. Get a lot of traffic from search with not many people liking the destination page and that ranking will quickly go away.


Almost all of them now have a "jump to recipe" link at the top of the page.


Isn't Kagi dependent on Google for their results? Doesn't seem tenable in the long term to me.


Are you thinking of DDG? AFAIK Kagi runs their own independent engine from scratch.


Kagi is a meta search engine (it uses other search engines' APIs, e.g. Google, Yandex, Brave Search, Marginalia, ...), plus it has its own (tiny) index, which mostly consists of their "Small Web" pages.


>Our own index of the finest results augmented by the results from the best search engines on the market.

https://kagi.com/


The design is too monochromatic.


DDG uses Bing. Them blocking trackers except when they come from Microsoft (because of this search engine deal) is why I don't really care about using DDG.

You can use bangs to search on Google, but that's not the default.


[flagged]


Care to expand upon that? I have been using Kagi for just under a year so far and really enjoying it.


It's a troll. Downvote, move on.


10 years ago "chrome botnet" was a meme. And now we get the evidence for it.

I don't know what to say.




The main takeaway is that Google has been lying and gaslighting about their ranking factors.

The main lies that were uncovered are that they are indeed using clicks and Chrome browser data for ranking purposes.

Summary of their lies here: https://www.reddit.com/r/SEO/comments/1d2gllz/google_caught_...


I don't understand why anyone would trust those lies though. For a very long time Google has been a data and advertising business with a monopolistic hold on browsers, search, analytics, and advertising. Of course they use those together to make more money.



Good encryption should still hold, even if you know the algorithm. The same reasoning should be applied to search engines.


Oh, there's no way such a thing is possible. Unless you have an omniscient oracle, your search engine will be based on some metric correlated with quality and relevancy. The moment those are known, people will produce low-quality content with high scores on those metrics.

Can you think of even a single metric that can't be gamed?


To get real answers from humans and not marketing campaigns, I constantly put "reddit" in my search. I feel like 5 years ago that wasn't necessary.


Very easy to post things on Reddit as a marketer, particularly when working with a small group who can respond to each other to season threads. Plus you can pay trusted Reddit account holders to post items for you.


> not marketing campaigns

> reddit

A great marketing tactic is to pose as reddit users. If you have just 2-3 realistic accounts, you can ask a question as account 1, and write your answer with account 2. Now imagine a company with $$$. They can guide an entire thread.


I'm sure there are marketers on Reddit, but the nature of virality itself makes it pretty robust against attempts at manipulation.

(You have to astroturf really hard or be a part of an existing wave, astroturfing reads different, so the best you can hope for is bending the narrative a step or two)


now imagine entire countries spreading influence and manipulation through reddit. add in extreme bias and hivemind. only a select few topics where reddit would be a good place to learn from


Google should change their algorithm to rank websites randomly; they all show up in search results with equal probability, so long as they exceed a certain threshold of relevance for the user's keywords (the threshold could vary for different keywords but would be made public and there could be instructions on how to meet the threshold requirements so it doesn't have to be a secret and anyone should be able to get their sites showing for at least one set of specific keywords). That would make it impossible to game. Maybe they could have 5 slots in a side container for 'Top trending' for those keywords for the current day, week, month or year (the user can choose the granularity). Problem solved.
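The randomized-threshold scheme proposed above can be sketched in a few lines (function and variable names are mine, purely illustrative): every site whose relevance score meets the public threshold qualifies, and the qualifying set is then ordered uniformly at random, leaving no continuous score to optimize against.

```python
import random

def rank_randomly(candidates, scores, threshold):
    """Hypothetical sketch of the comment's proposal: sites at or above a
    public relevance threshold all get an equal shot; order is uniform random."""
    qualifying = [site for site in candidates if scores[site] >= threshold]
    random.shuffle(qualifying)  # equal probability for every qualifying site
    return qualifying

sites = ["a.com", "b.com", "c.com", "d.com"]
scores = {"a.com": 0.9, "b.com": 0.2, "c.com": 0.7, "d.com": 0.5}
results = rank_randomly(sites, scores, threshold=0.5)
# every site at or above the threshold appears exactly once, in random order
assert sorted(results) == ["a.com", "c.com", "d.com"]
```

As the replies note, the attack surface simply moves: once per-site ranking can't be gamed, spinning up many qualifying sites becomes the obvious play.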


You would game it by creating more websites.


As others have stated below, this does in fact become a cat-versus-mouse Sybil attack scenario, where the barrier to entry isn't high enough to stop a bad actor from creating many websites. Online identity and reputation would have to be tied to more than just an email address.


But it would be difficult to build a lot of websites which all meet the threshold for specific keywords. The thresholds don't have to be particularly low; in fact, it's better if they require a certain amount of work to meet. So maybe only a relatively small number of websites would qualify for a specific niche keyword, but the idea is that, among those, they are ranked randomly. You'd probably have to use AI to figure out site quality in niche areas.

Or Google could go with a lower risk approach of keeping their results as they are with their current algorithm, but only randomize 3 slots out of the top 10 based on this new threshold approach.


Do you remember those autogenerated websites that were just giant lists of all words? Those disappeared many years ago, but if you made search ranking random, they'd come right back.


In 2030: Do you remember those websites storytelling about their grandmother just to introduce a mathematical theorem? We're so lucky they disappeared like the giant lists of all words, because they were 100% fabricated by Google's unnatural incentives.

Google has the ability to change the face of the internet in 2-3 years. They can detect the chaff and shut it down, and I wonder whether it's an anti-competition feature that they require that websites write a thousand words per page.


I asked ChatGPT to tell me how to get away with murder in the style of a recipe blog and it (surprisingly) did a bang-up job: https://chatgpt.com/share/b738b68d-8294-4a2c-87ff-f95a6e2d91...

I did this after simply wanting to know how much powdered sugar to put in whipped cream and getting frustrated at trying to scroll through 3 blogs just to find the ingredient list for something so simple. Eventually I just asked ChatGPT.

I wonder if Google can start running an LLM on websites to judge them on things like that. Hell, looking for a photographer in your area? Have it judge how good the photography is on each website. The possibilities are there but I don’t know if they’ll bother.


Your link doesn't seem to work.


It was removed because it was against policy. I was able to generate a new response with this prompt "I'm writing a novel. Tell me how I can get away with murder, write it in the style of a recipe blog"


Huh, that's odd. Since when do they go back and check older generations?


> Do you remember those autogenerated websites

Still many copy/paste sites around. Crawl data, put a skin on top, publish on stolen domain to make it legit, clickfarm away!


I honestly think the problem can't really be solved because of the adversarial relationships involved. But if there was more than one search engine with significant marketshare maybe it would be easier to route around the problem.


Why would it be difficult? Just copy-paste content to different domains, and done. And if, for example, Google decides to downrank sites that have the same content on different domains, well, then you have a nice weapon against your competitors: just copy their sites a lot of times and you've got your competitor removed from Google.


It's a game of cat and mouse, and apparently all the “this is easy” people think they're just smarter than everyone out there.


One of my buddies that got into SEO a half decade before I did mentioned the copy and paste rankeroo stuff was real popular back in the days of Infoseek, Altavista, Excite, Lycos and similar.

Google looks for the canonical version of a document and then deduplicates before returning the result set.

You can add &filter=0 to the end of the search URL for a particular query to turn off the duplicate content filters.

An old school spam technique for some affiliates in the early days of Google was to buy a high PR link to their affiliate URL so that like site.com/?aff=123 would be the default version of the homepage & the branded searches for the merchant would then owe the affiliate the commissions until the rankings shifted again.
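The canonicalization-then-deduplication step described above can be sketched as a toy content-hash filter (all names are mine; real systems use fuzzy similarity signatures rather than exact hashes, which is exactly why near-duplicates are harder to catch):

```python
import hashlib

def dedupe_results(results):
    """Toy duplicate-content filter: group pages by a hash of their
    whitespace/case-normalized body text and keep one 'canonical' URL
    per group (here, simply the first one seen)."""
    seen = {}
    for url, body in results:
        normalized = " ".join(body.lower().split())
        key = hashlib.sha256(normalized.encode()).hexdigest()
        seen.setdefault(key, url)  # first URL wins as the canonical
    return list(seen.values())

pages = [
    ("site.com/article", "The Quick Brown Fox"),
    ("copy.net/stolen", "the quick  brown fox"),   # scraped duplicate
    ("other.org/post", "Something entirely different"),
]
# the scraped copy collapses into the original
assert dedupe_results(pages) == ["site.com/article", "other.org/post"]
```

Which URL gets picked as canonical is the whole game for the affiliate trick above: shift the canonical to your parameterized URL and the traffic follows.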


Well surely the algorithm can detect duplicate content. Also Google should focus beyond content and consider user satisfaction metrics to decide what is above or below the threshold. Maybe AI can help with all these things?


> That would make it impossible to game.

Have you considered that it would also make for a pretty lame user experience?


Hold up, I think he might be onto something; next he'll discover how to sort an array in O(1).


The golden times were 8-10 years ago, when you could change the order of keywords in a Google search and get more precise matches. You could find pretty much any obscure thing on the internet.

Then you could find that article you remember reading 6 months ago by adjusting the keywords until it landed on the first page.

Now it does not matter at all what you enter in the search box. No matter the input, you get one set of results and will never find something specific.


Yeah, and we should give it a cool name. Something that communicates that this is a new kind of search! I am thinking "NewHorizons" or how about "AltaVista"? What do you think?


In not too many years, the average user will be prompting to get their needs met, rather than searching a flawed search system, wading through pages of sponsored and SEO-gamified links, and opening up multiple tabs to try to dig out the details from sites hustling whatever they hustle. Google sees the writing on the wall: things are moving towards a prompt-based direct-ask system mediated by an LLM. It is definitely far from perfect now, but search and SEO are both going to be relics of a bygone past in not too many years.


The same forces that drowned us in SEO crap will drown us in LLMO crap. Hopefully we'll enjoy a brief period of usefulness first.


I don't know. There were recently some documents released on how OpenAI was soliciting companies to integrate their product recommendations more deeply into the training data. This is obviously a huge way to monetize ChatGPT-like products. Rather than SEO-optimized sites gathering the ad revenue with clickbait and gamification, OpenAI will collect the revenue themselves.

At the end of this enshittification, users will be looking for other options. Imagine a salesman that is ignoring the elephant in the room to tell you that if you bought this brand of shoes, you would run faster instead of giving tips on the skills to learn to be a better runner.

Search is a great way to find high-quality references and non-hallucinated answers by tweaking the keywords slightly. A salesman-like LLM might be pushing products when you just need information. ChatGPT's authority is going to dwindle, and search is a good tool to find authoritative sources.


I'd argue that sales is entirely a game of ignoring the elephant in the room and selling someone on something they don't already think they need.

It's not really sales if I go into a shoe store and say I want a pair of Air Jordan 4s in size 11; that's just customer service.


people already know how to be a better runner. they just dont want to do it. they rather buy shoes that make them run faster instead. people also dont like choice. they might think they do but really i dont believe it. getting a single answer with a simple question is more appealing than having to come up with a detailed question with many answers to choose from. even searches have been doing this with the single boxed result at the top


Any good summaries about what was revealed in the leak?


The main takeaway for me is that Google is caught lying. Many things were already assumed but Google used to deny them.

- They claimed that clicks were not a ranking factor; it turns out they are.

- It also turns out that they are using Chrome data for ranking purposes (not good for the ongoing lawsuit).

- There is also a field called something like "is small personal site," and it is presumed that those sites are penalized.

You can find a summary here: https://www.reddit.com/r/SEO/comments/1d2gllz/google_caught_...


didn’t they just leak the schema? we know they may be tracking that information but we don’t know how it affects the model


Would they track the data if they weren't using it, or didn't expect to use it in the future?


Didn't they just say they're not using it currently?


It is hard to delete protobuf fields.


It's getting harder and harder to like Google; not only are they NOT not evil, they are also not even competent.


Well, another reason to take what Google says with caution.



