
The reason for that is actually rational: when Amit Singhal was in charge, the search rules were written by hand. Once he was fired, the Search Quality team switched to machine learning. The ML was better in many ways: it produced higher-quality results with a lot less effort. It just had one possibly fatal flaw: if some result was wrong, there was no recourse. And that's what you are observing now: search quality is good or excellent most of the time, while sometimes it's very bad and G can't fix it.



I wouldn't call that rational. There is no reason you can't apply human weighting on top of ML.

Honestly, I don't believe for a minute they "can't fix it." They do this sort of thing all the time; for instance, when ML showed dark-skinned people for a search for "gorilla", they obviously had recourse.


You do know that Google basically slapped a patch on that one, right?

https://www.theverge.com/2018/1/12/16882408/google-racist-go...


I’m confused. I read that article and it has this:

> But, as a new report from Wired shows, nearly three years on and Google hasn’t really fixed anything. The company has simply blocked its image recognition algorithms from identifying gorillas altogether — preferring, presumably, to limit the service rather than risk another miscategorization.

Is that not an example of human intervention in ML?


Yes, but then they fixed it right.


Fixing it right would mean re-training the ML algo... instead they basically told the algo to never ID anything as a gorilla (even actual gorillas).
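
To make the contrast concrete, here's roughly what that patch amounts to. The classifier interface below is made up; only the shape of the workaround (filter the label at the output) matches what the article describes:

    interface Prediction { label: string; score: number }

    // The "patch": suppress the offending label at the output, so the
    // model can never emit it, not even for actual gorillas.
    const BLOCKED_LABELS = new Set(["gorilla"]);

    function patchedPredict(raw: Prediction[]): Prediction[] {
      return raw.filter(p => !BLOCKED_LABELS.has(p.label));
    }

    // Fixing it right would instead happen upstream: curate and rebalance
    // the training data for the confusable classes and retrain, so no
    // output filter is needed at all.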


> G can't fix it.

Yes, they can. They should simply stop measuring only positives and start measuring negatives, e.g. people who press the back button of their browser, or who click the second, third, or fourth result afterwards, which should hint to the ML classifiers that the first result was total crap in the first place.
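
A minimal sketch of what deriving such a negative signal could look like, assuming an invented click-log schema and threshold (the real pipelines are obviously far more involved):

    // Sketch only: aggregating quick back-button returns ("pogo-sticking")
    // into a per-result badness score. Schema and threshold are invented.
    interface ClickEvent {
      query: string;
      url: string;         // result the user clicked
      clickedAt: number;   // epoch ms
      returnedAt?: number; // epoch ms, if the user came back to the SERP
    }

    const SHORT_DWELL_MS = 10_000; // under ~10s on the page = likely a bad result

    function badClickRate(events: ClickEvent[]): Map<string, number> {
      const clicks = new Map<string, { total: number; bad: number }>();
      for (const e of events) {
        const key = `${e.query}\u0000${e.url}`;
        const c = clicks.get(key) ?? { total: 0, bad: 0 };
        c.total += 1;
        const dwell = e.returnedAt === undefined ? Infinity : e.returnedAt - e.clickedAt;
        if (dwell < SHORT_DWELL_MS) c.bad += 1; // user bounced straight back
        clicks.set(key, c);
      }
      const rates = new Map<string, number>();
      for (const [key, c] of clicks) rates.set(key, c.bad / c.total);
      // A high rate for a (query, url) pair is a candidate negative label.
      return rates;
    }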

But I guess this is exactly what happens when your business model is built on sending leads to sites where you serve the ads: it gives you weird ethics, since your company profits more from those scammers than from legit websites.

From an ML point of view, Google's search results are a perfect example of overfitting. Kinda ironic that they lead the data science research field and teach about this flaw everywhere, yet don't recognize it in their own product.


They have already been doing this for a loooong time; it's low-hanging fruit.

Take a look sometime at the wealth of data the Google SERP sends back about your interactions with it.


The fact that they do collect data does not mean that they use that data in any meaningful way or at all.

They ought to see humongous bounce rates on those fake SEO'd pages. Normally, that would suggest shit-tier quality and black-hat SEO, which is in theory punishable. Yet they throw that data away and still rank those sites higher up.

You mean to say that no one at Google has even heard of "external SEO", which is nothing more than a fancy way of saying link farming? They do know; it is punishable according to their own rules, yet it works, because either they cannot fix it or they do not care to.


They'll never tell how they use the data, for obvious reasons, and I also can't go into any details. But any obvious thing you can think of has almost certainly been tried; they've been doing it for 20+ years, and ranking alone is staffed with several hundred smart engineers. Mining clickthrough logs is a fairly old topic in itself, around since at least the early 2000s.


Please provide proof for this theory that Google measures this as well.


I worked in ranking for two major search engines. They all measure this; it is really low-hanging fruit. How much time did it take you to come up with this idea? Why do you think so lowly of people who have put decades of their lives into these systems that they wouldn't have thought of it?

Technically: just open a Google SERP in developer tools, go to the network tab, set the preserve/persist logs option, and watch the requests flowing back. All your clicks and back navigations are reported for analysis. Same on other search engines. Only DDG doesn't collect your clicks/dwell time, but that's a distinguishing feature of their brand: they stripped themselves of this valuable data on purpose.
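
If you want to see the mechanism itself, this is roughly its shape on the client side. The endpoint, payload, and selector here are invented; the technique (beacons fired on result clicks) is the part you can observe in the network tab:

    // Hypothetical sketch of SERP click reporting, not Google's actual code.
    function reportInteraction(event: string, url: string, position: number): void {
      const payload = JSON.stringify({ event, url, position, ts: Date.now() });
      // sendBeacon is fire-and-forget and survives page unload, which is
      // why a click on a result still gets reported as you navigate away.
      navigator.sendBeacon("/log/interaction", payload);
    }

    // Wire it to every result link (selector is made up):
    document.querySelectorAll<HTMLAnchorElement>("a.result").forEach((a, i) => {
      a.addEventListener("click", () => reportInteraction("click", a.href, i + 1));
    });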


Again, this is not about data being collected; we know how much data Google collects. It is all about what is done with that data and, by extension, how good the end result is.

This touches the broader subject of systems engineering, and especially validation. As far as I am aware, there are currently no tools or models for validating machine learning models, and the task gets exponentially harder with the degrees of freedom given to the ML system. The more data Google collects and tries to use in ranking, the less bounded the ranking task is, therefore the less validatable, and therefore the more prone to errors.

Google is such a big player in the search space that they can quantify and qualify the behavior of their ranking system, publish that as SEO guidelines, and have the majority of good-faith actors behave in accordance with them, reinforcing the quality of the model: the more good-faith actors actively compete for the top spot, the more the top results come from good-faith actors. However, as evidenced by the OP and other black-hat SEO stories, the ranking system can be gamed, and data points which should produce a negative ranking score are either not weighted appropriately or in some cases contribute to a positive score.
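
As a toy illustration of that last point, imagine a linear ranker where a signal that should plausibly hurt a page comes out of training with a positive weight. All feature names and numbers here are invented:

    // Toy linear scorer; "thinAffiliateContent" should plausibly be
    // negative, but the learned weight came out positive.
    const weights: Record<string, number> = {
      queryMatch: 2.0,
      freshness: 0.5,
      backlinks: 1.2,
      thinAffiliateContent: 0.3,
    };

    function score(features: Record<string, number>): number {
      let s = 0;
      for (const [name, value] of Object.entries(features)) {
        s += (weights[name] ?? 0) * value;
      }
      return s;
    }

    // score({ queryMatch: 1, thinAffiliateContent: 1 }) = 2.3, so the
    // spam signal raised the score instead of lowering it.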

Google search results are notoriously plagued with Pinterest results, shop-looking sites which redirect to Chinese marketplaces, and the like. It looks like the only tool Google has to combat such actors is manual domain-based blacklisting, because otherwise they would have done something systematic about it by now. It seems to me that the ranking algorithm at Google is given so many different inputs that it essentially lives a life of its own, and changes are no longer proactive but reactive, because Google does not have sufficient tools to monitor black-hat SEO activity and punish sites accordingly.


So they do collect it, they just ignore it? Like the 10 to 30 (or more) clicks I've spent on the tiny tiny [x] in the top corner of the scammy-looking-dating-site-slash-mail-order-bride ads that they served me for a decade?


My impression is that the ML algorithms at Google have the goal of increasing profitability from search. If that is the case, the quality of search will tend to be secondary to displaying pages that bring more revenue.


It's blatantly false that Google has "no recourse": Google can apply a penalty and bring domains down.


"Request manual review of search results" button?


Since this is now the top spot here on HN, I suspect it just got the attention of some Googlers, who I'm sure will review it.

They may not give the site a manual action, though. They’d rather tweak the algorithm so it naturally doesn’t rank. Google’s algo should be able to see stuff like this.

I know that I’ve seen sites tank in the rankings because they got too many links too quickly. It could be that the link part of the algorithm hasn’t fully analyzed the links yet.
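
If the link-velocity guess is right, the check could be as simple as comparing the newest week of backlinks against a site's historical baseline. This is a made-up heuristic to show the idea, not the actual algorithm:

    // weeklyNewBacklinks[i] = new referring domains gained in week i.
    // Threshold and data shape are invented for illustration.
    function linkVelocitySpike(weeklyNewBacklinks: number[], factor = 5): boolean {
      if (weeklyNewBacklinks.length < 5) return false; // not enough history
      const history = weeklyNewBacklinks.slice(0, -1);
      const baseline = history.reduce((a, b) => a + b, 0) / history.length;
      const latest = weeklyNewBacklinks[weeklyNewBacklinks.length - 1];
      // A sudden multiple of the historical average looks like link buying
      // or a farm, so the site gets dampened until the links are vetted.
      return latest > factor * Math.max(baseline, 1);
    }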

I’d be interested in seeing what the Majestic link graph says about this site; Ahrefs doesn’t have tier 2 and tier 3 link data.



