
My guess is that suppressing spammy pages got too hard. So they applied some kind of big hammer that has a high false positive rate. You're getting the best of what's left.

Maybe also some quality decline from their gradual shift to fewer hand-weighted attributes and more ML.




My guess is that Google et al are all hell-bent on not telling you that your search returned zero results. They seem to go to great lengths to make sure that your results page has something on it by any means necessary, including: searching for synonyms of the words I searched for instead of the specific words I chose, excluding words to increase the number of results (even though the words they exclude are usually the most important to the query), and trying to figure out what they think I asked for instead of what I actually asked for.

I further suppose a lot of that is that The Masses(tm) don't use Google like I do. I put in key words for something I'm looking for. I suspect that The Masses(tm) type in vague questions full of typos that search engines have to try to parse into a meaningful search query. If you change your search engine to cater to The Masses(tm), then you're necessarily going to annoy the people who knew what they were doing, since the things they knew how to do don't work like they used to (see also: Google removing the + and - operators).


I was going to reply with something along the same lines. Dropping the keyest keywords is a particularly big pet peeve of mine.

For those "needle in a haystack" type queries, instead of pages that include both $keyword1 and $keyword2, I often get a mix of the top results for each keyword. The problem is compounded by news sites that include links to other recent stories in their sidebars. So I might find articles about $keyword1 that just happen to have completely unrelated but recent articles about $keyword2 in the sidebar.
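To make the difference concrete, here is a minimal sketch of strict "all keywords" matching versus relaxed "any keyword" matching over a toy inverted index. Everything in it (the `DOCS` data, `build_index`, `search_all`, `search_any`) is made up for illustration; it is not a claim about how Google or DDG actually rank or match.

```python
from collections import defaultdict

# Hypothetical toy corpus: only page "d" mentions both keywords.
DOCS = {
    "a": "keyword1 launch review",
    "b": "keyword2 market news",
    "d": "keyword1 and keyword2 compared in depth",
}

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search_all(index, terms):
    """Strict match: only documents that contain every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_any(index, terms):
    """Relaxed match: documents that contain at least one term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.union(*sets) if sets else set()

INDEX = build_index(DOCS)
STRICT = search_all(INDEX, ["keyword1", "keyword2"])
RELAXED = search_any(INDEX, ["keyword1", "keyword2"])
```

With this toy data, `STRICT` contains only page "d", while `RELAXED` also pulls in "a" and "b", which is the "mix of the top results for each keyword" effect. Note that even strict matching would still be fooled by a sidebar that happens to mention the second keyword, since the token is on the page either way.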

It also appears that Google and DDG both often ignore "advanced" options like putting exact phrase searches in quotation marks, using a - sign to exclude keywords, etc.

None of this seems to have cut down on SEO spam results either, especially once you get past the first page or two of results.

I suspect it all comes down to trying to handle the most common types of queries. Indeed, if I'm searching for something uncomplicated, like the name of the CEO of a company or something like that, the results come out just fine. Longtail searches probably aren't much of a priority, especially when there's not much competition.


Surely most engineers want the power of strict searching and less of the comfort of always getting filler results, right?

So... is there an internal service at Google that works correctly but they're hiding from the world?

It might be useful for Google to make different search engines for different types of people. The behaviors of people are probably multi-modal, rather than normally distributed along some continuum where you should just assume the most common behavior and preferences.

It would even be easier to target ads...

Or maybe this doesn't exist and spam is too hard.


> They seem to go to great lengths to make sure that your results page has something on it by any means necessary

You just described how YouTube's search has been working lately. When you type in a somewhat obscure keyword - or any keyword, really - the search results include not only the videos that match, but videos related to your search. And searches related to your keywords. Sometimes it even shows you a part of the "for you" section that belongs to the home page! The search results are so cluttered now.


Searching gibberish to try to get as few results as possible.

I got down to one with "qwerqnalkwea"

"AEWRLKJAFsdalkjas" returns nothing, but YouTube helpfully replaces that search with the likewise nonsensical "AEWR LKJAsdf lkj as" which is just full of content.


> I put in key words for something I'm looking for. I suspect that The Masses(tm) type in vague questions full of typos that search engines have to try to parse into a meaningful search query.

Yeeaap, sometime in grade school - I think somewhere around 5th grade, age 11 or so, which would be around 1999 - we had a section on computers, where we'd learn the basics about how to use them. One of the topics I remember was "how to do web searches", where a friend was surprised at how easily I found what I was looking for - the other kids had to be trained to use keywords instead of asking it questions.


It's surprisingly easy to get zero results when pasting cryptic error messages. It doesn't mean there is nothing, though. Omit half the string, and there's the dozen Stack Overflow threads with the error. Maybe it didn't read over the line break on Stack Overflow or something, but I haven't tested anything.
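The line-break guess can be illustrated with a small sketch. This is purely hypothetical: the `page` and `query` strings and the helper names are invented, and nothing here is known about how any real search engine tokenizes pages. It only shows why a literal phrase match could miss a message that wraps across lines while a shorter prefix still hits.

```python
import re

def normalize(text):
    """Collapse every run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()

def phrase_match_raw(page, query):
    """Literal substring match; a newline in the page is not a space."""
    return query.lower() in page.lower()

def phrase_match_normalized(page, query):
    """Substring match after whitespace normalization on both sides."""
    return normalize(query) in normalize(page)

# Hypothetical page text where the error message wraps onto a second line.
page = "TypeError: cannot unpack\nnon-iterable NoneType object"
query = "TypeError: cannot unpack non-iterable NoneType object"
half_query = "TypeError: cannot unpack"

full_raw = phrase_match_raw(page, query)
half_raw = phrase_match_raw(page, half_query)
full_normalized = phrase_match_normalized(page, query)
```

Here the full message fails the raw match, the first half succeeds, and the full message succeeds once whitespace is normalized, which is consistent with the "omit half the string" workaround.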


Tyranny of the minimum viable user.


Two anecdotes: It’s really fascinating.

1. My work got some attention at CES so I tried to find articles about it. Filtering for items from the last X days and searching for a product name found pages and pages of plagiarized content from our help center. Loading any one of the pages showed an OS-appropriate fake “your system is compromised! Install this update” box.

What’s the game here? Is someone trying to suppress our legit pages, or piggybacking on the content, or is that just what happens now?

2. I was looking for some OpenCV stuff and found a blog walking through a tutorial - except my spidey sense kept going off because the write-up simply didn’t make sense with the code. Looking a bit further I found that some guy’s really well-written blog had been completely plagiarized and posted on some “code academy tutorial” sort of site - with no attribution. What have we come to?


The first seems big right now, on weird subdomains of clearly hacked sites. E.g. some embedded Linux tutorial on a subdomain of a small-town football club.


Yup. Entertainingly I just saw an example of the “lying date” the original article pointed out: according to Google the page is from 17 hours ago. However right next to this it says June, xx 2018. Really?


Well that “big hammer” so to speak is that they tend to favor sites that have a lot of trust and authority.

Someone mentioned that the sites that have the answer are typically buried in the results. That’s because Google tends to favor big brands and authoritative sites. And those sites oftentimes don’t have the answer to the search query.

Google’s results have gotten worse and worse over the years.


This! I think this is the biggest piece of the puzzling issue.

Was it the Panda update, or that one plus the one after? It took out so much of the web and replaced it with "better netizens" who weren't doing this bad thing or that bad thing.

Several problems with that - 1 - they took out a lot of good sites. Many good sites did things to get ranked and did things to be better once they got traffic.

The overbroad ban hammer took many down - including many people who likely paid an SEO firm not knowing that SEO firms were bad in Google's eyes (at the time) - so lots of mom and pops and larger businesses got smacked down and put out of the internet business - just like how many blogs have shut down.

Of course local results taking a lot of search space and the instant answers (50% of searches never get a click cuz Google gives them the answer right on the results page, often stolen from a site) are compounding this.

They tried having the disavow tool to make amends - but the average small business doesn't know about these things, and getting help on the webmaster forum is a joke even if you are tech-inclined; imagine what an experience it is for small business owners.

I miss the days of Matt Cutts warning people "get your Press Releases taken down or nofollowed or it's gonna crush you soon" - problem is most of the people who were profiting from no-longer-allowed SEO techniques were not reading Matt's words.

I also appreciated his saying 'tell your users to bookmark you, they may not find you in Google results soon' - yeah, at least we were warned about it.

The web has not been the same since those updates, and it's gotten worse since. This does help sell AdWords, and helps the big companies that can afford them, though.

In these ways Google has been kind of like the Walmart of the internet, coming in, taking out small businesses, taking what works from one place and making it cheap at their place.

I'd much rather have the pre-Penguin results and let the surfers decide by choosing to remain on a site that may be good even though it also had press releases and blog links... rather than losing all the sites that had links on blogs. I am betting most of the users out there would prefer the results of days past as well.



