One thing I find annoying is that they still return results from sites that seem to register a load of terms all pointing to the same page. You see this with telephone numbers, song lyrics, etc., where the result looks like "Lyrics for Stairway to Heaven" but you click through and there is no content, just a page that says "Upload some lyrics to this song", etc.
These sites should be heavily penalised for click-baiting and they have been doing it for years.
I've found my result quality has gone up dramatically now that I use extensions allowing me to block domains from google search results. It seems silly that I ought to have to do this, but Google has finally become useful again as a result.
Google used to let you do this itself when you were logged in. I never understood why they removed that; surely the information on what domains were deemed unwanted was valuable!?
So I assume by customer you mean advertisers and by product you mean the users carrying out searches.
For their commercial search results I'll grant you that there's an incentive. But for their non-commercial search results why would they care? That's how it used to work, you couldn't delete a domain from the paid results, but you could make it disappear from the unpaid ones.
Are these client-side only extensions that are manipulating the DOM, or are they somehow feeding parameters to google to exclude domains (by e.g., automatically suffixing a series of "-site:blah.com"s to the query)?
They manipulate the search result DOM. uBlacklist is the one I use; it manipulates the search result page, tells you when something has been blocked, lets you unblock that site, etc.
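Conceptually it's something like this (an untested sketch, not uBlacklist's actual code; the "div.g" selector is a guess, and Google changes its markup all the time):

```typescript
// Untested sketch of the DOM-manipulation approach (not uBlacklist's actual code).
// The "div.g" result selector is a guess; Google changes its markup frequently.
const blockedDomains = ["spam-lyrics.example", "copied-stackoverflow.example"];

function hideBlockedResults(): void {
  document.querySelectorAll<HTMLAnchorElement>("div.g a[href^='http']").forEach((link) => {
    const host = new URL(link.href).hostname;
    if (blockedDomains.some((d) => host === d || host.endsWith("." + d))) {
      const result = link.closest("div.g") as HTMLElement | null;
      if (result) result.style.display = "none"; // or grey it out and add an "unblock" button
    }
  });
}

// Re-run whenever Google injects more results (instant search, infinite scroll).
new MutationObserver(hideBlockedResults).observe(document.body, { childList: true, subtree: true });
hideBlockedResults();
```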
I use the userscript "Google Hit Hider by Domain (Search Filter / Block Sites)" by Jefferson Scher, whom you may know as one of the top support specialists at Mozilla: http://www.jeffersonscher.com/gm/google-hit-hider/
"GHHbD" is a precursor to uBlacklist; and was for many years THE replacement to blocking sites on Google Search after Google removed the built-in function.
It also has a 'Block' button next to the search results (remember, the script existed long before uBlacklist), which lets you grey out or hide results, from specific subdomains down to the base domain.
Don't forget to use ungoogled chromium if possible
The only downside is you will have to load unpacked extensions instead of using the Chrome "store" and you will have to manually install Chromium updates from the same site:
I worked on a search engine at a startup that did exactly this, you could up and downvote each result. The main feature was that we essentially "sharded" the search engine so it could be embedded on different sites and give different results based on each community. So a search for "casting" on a fishing website would give different results than the same search on a metallurgy site, as voted by each community. We could also learn passively by watching which links users clicked on and where they "disappeared" from the search engine - presumably the last clicked link was the result they were looking for.
Google did copy the voting feature on their results page briefly but abandoned it. [0] This was back in 2006-7. We learned the hard way that it's pretty much impossible to compete with Google in search even when you're innovating. They either copy you, or can just blackhole you out of existence.
You can work around that by using javascript (checking for tab losing focus) and checking how quickly the links are being clicked. If you middle-click a bunch of links, then don't click any more links, then it can be inferred that one of the links that you clicked was "good".
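Something along these lines (a rough sketch; the selector, the /feedback endpoint, and the thresholds are all made up for illustration):

```typescript
// Rough sketch of the passive-feedback idea: log result clicks and tab focus changes,
// then infer which click "satisfied" the query. Selector, endpoint and thresholds are made up.
type Click = { url: string; at: number };
const clicks: Click[] = [];

document.querySelectorAll<HTMLAnchorElement>("a.result-link").forEach((a) =>
  a.addEventListener("mousedown", () => clicks.push({ url: a.href, at: Date.now() }))
);

document.addEventListener("visibilitychange", () => {
  if (document.visibilityState !== "visible" || clicks.length === 0) return;
  const last = clicks[clicks.length - 1];
  const awayMs = Date.now() - last.at;
  // A burst of middle-clicks only tells us that *one* of the opened links was good;
  // a single click followed by a long absence suggests that specific link was the answer.
  const burst = clicks.filter((c) => last.at - c.at < 3000);
  if (burst.length === 1 && awayMs > 30_000) {
    navigator.sendBeacon("/feedback", JSON.stringify({ good: last.url }));
  }
});
```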
The lack of an ability to vote on search results seems like a baffling omission. Normally Google loves to crowdsource their work, but for this one area where we would actually want it, they decide that they know better when they clearly don't.
It would be abused into the ground by the same people who set up these scam sites in the first place. For most things on the internet, the crappiness of the average person is the limiting factor.
They don't have to make it a public vote. I'm fine to do this work myself for a week or so, and then enjoy crap-free searches. Or link my votes account to someone who I trust and use their opinion.
Idk why everyone is praising public anything, because local communities and knowledge webs worked fine pre-internet. Most of the bullshit came with globalization.
I wish there was some standardized way to say "I trust you" on the internet, and share part of their information bubble (e.g. reviews, up/downvoted websites, youtube recommendations, etc.), and this bubble could have some transitive properties (if I trust X and X trusts Y, then to some extent I trust Y too).
But it's probably too privacy sensitive (if I see which sites you upvoted, then I have some information about you). Hence, that's probably why this has to be either completely private or completely public.
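To make the transitive part concrete, a toy sketch of trust with decay (the names and the decay factor are made up; this is just the idea, not a real protocol):

```typescript
// Toy sketch of transitive trust with decay: if I trust X at weight w and X trusts Y,
// then I trust Y at w * DECAY. Names and the decay factor are made up for illustration.
const DECAY = 0.5;
const trustEdges = new Map<string, string[]>([
  ["me", ["alice"]],
  ["alice", ["bob"]],
  ["bob", ["carol"]],
]);

function trustScores(root: string, maxDepth = 3): Map<string, number> {
  const scores = new Map<string, number>([[root, 1]]);
  let frontier = [root];
  for (let depth = 1; depth <= maxDepth; depth++) {
    const next: string[] = [];
    for (const user of frontier) {
      for (const friend of trustEdges.get(user) ?? []) {
        const w = (scores.get(user) ?? 0) * DECAY;
        if (w > (scores.get(friend) ?? 0)) {
          scores.set(friend, w);
          next.push(friend);
        }
      }
    }
    frontier = next;
  }
  return scores;
}

// trustScores("me") -> me: 1, alice: 0.5, bob: 0.25, carol: 0.125
console.log(trustScores("me"));
```

Those scores could then weight how much someone else's upvotes and downvotes influence my own results.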
Well, every social media network tries to capture and monetize this, of course.
But yes, it would be nice to have this in a less proprietary way. But I fear it's either going to be a privacy issue (because the person you trust has to publish every page they like and dislike), or it's going to be anonymous and therefore easily gamed by bad actors.
If the group of trusted people in a bubble is large enough (say 1000s of people), then it wouldn't be a problem, I suppose.
For example, I wouldn't mind sharing my upvoted websites/videos/products with everyone on HN, as long as it is anonymized. Bad actors can be distrusted by the community, I suppose (though moderation seems to be largely an unsolved problem still).
But the common notion is that it's probably too hard for "regular" users, so we have the internet of nonsense instead. A few more years and we won't find anything good.
What if they took only votes from people who pay and have had an account for X time? Also, what's the problem with adding per-user reverts? With some event-based architecture it should be trivial. They could even train AI on this data to automatically suggest suspects.
Here's an easy thought experiment: for any proposal of the type "what if they had a system that did X?", imagine that someone tells you they'll give you 10 million dollars if you can figure out a way to game that system that can't be detected by Google.
If you spend 5 minutes on it, can you think of a way? If you can, congratulations, now imagine millions of other people thought about the same tricks and you get the reason why Google can't ever really win against SEO.
Google might not be able to 'win', but they can stay ahead.
Eg if they can get to a place where SEO efforts point in the same direction as making your website genuinely more useful to people, then that's good enough for Google.
It's totally fine if they only use my own votes to modify my results. In fact, if we're going to do that, I'd really like to block some sites from my search results entirely. Though ideally, I would also like to include the votes from people I trust.
And when those accounts suddenly start behaving like bad accounts, their votes get treated as such. There will be a window of confusion, but it might not be that big. And there would be an economic incentive for people to establish trusted accounts with high-quality voting behaviour.
Their entire business (advertising) is clustering. They've already done the hard bit.
Just cluster people by downvotes and whatever other thousand metrics are already being tracked, and allow them to see results given by other clusters by showing which areas are dense on a PCA or something or saying 'I want my results to only be influenced by people who have downvoted github.org' or whatever.
Not really. In these areas there's a general principle that the moment you start using a signal to enforce against bad actors, that signal's value decays. It's sad; I actually think web3 could help curb abuse at scale by making it too expensive to scale horizontally. Imagine paying a "stamp" to send an email on the blockchain; spam email would dramatically drop. Maybe similar mechanics could be in play for other things, but then you lose democratization of information.
The idea is you'd have an email client but all data is stored encrypted on chain and you'd have to pay tx costs to send things. By having this sort of stamp tax, spammers wouldn't be able to scale mass messaging as it would bankrupt them.
in meatspace though, my real mailbox is mostly filled with spam that the sender paid to have delivered.
granted, it would at least act as a limiting factor.
Isn't Google's monetization for Search results the primary reason for not allowing crowdsourced quality metrics? If only good results get to the top how do they make money? Google's Search business model appears to be predicated on poor results getting disproportionately prominent listing.
The actual sponsored links end up at the top anyway. I don't see how bad regular results would contribute to their revenue; it would just drive people away.
>I don't see how bad regular results would contribute to their revenue
Let's say you run a good song-lyrics site that has the correct lyrics for everyone's favorite songs. You happen to be on page 2 of google results for common queries; all of page 1 is taken up by spammers, fake pages, etc.
How can you possibly drive traffic to your site? Maybe you can invest in SEO but no promises there. You'd be competing against people whose whole focus is SEO and nothing else. The only option left is to buy ads.
I agree that it would work as long as Google has an absolute monopoly on search. Google wouldn't care how bad their search results are, because there's nowhere else to go. But if there are alternatives, users should stop using Google and use the alternatives instead, and then Google has an incentive to improve their search results for users again.
Let's say we have three search results, good, average and bad, displayed at one result per page in that order. The user is happy; Google is unhappy, as there is no motivation for the site owner of the best search result to pay Google for their listing. The only way Google gets money is if it enables the result order to be modified, i.e. allows the owners of the average or poor sites to disproportionately affect their listing prominence by appearing on page 1. Google do this by allowing paid adverts. Bad or average search results thus get disproportionately high prominence in the listing, with the knock-on effect that the owner of the best search result now has motivation to pay Google to regain their primary listing. And even further, if the site owner of the best search results now has to pay Google for the prominence of their listing, why waste money on continuing to provide the best results? The user is unhappy, Google is happy.
But you do vote on search results. Google knows if you stay on a page or leave it quickly for another result. That's an implicit vote that they definitely take into account.
Having that explicit button might not really add any additional value.
Which heavily penalizes sites that make information easy to find.
Especially for use cases like reviews where you are looking for multiple opinions.
The hard-to-navigate site full of waffle and SEO duckspeak nonsense gets a positive signal, while the site with clear, concise information the user can absorb in 2s gets penalized.
That's not how I use the results page, though. I open the top x number of results in separate tabs, and then check those pages. I only go back to the tab with the results page after I've checked them all, so that might be a while, even if they all suck.
So if they look at my behaviour the way you say, then the feedback they'd get from me would be that the top x-1 results are always bad, while the xth result is always good. That sounds like a poor algorithm for them, but it might explain why my Google results always suck.
Thanks to things like Google Analytics, Google knows what site you're viewing at any moment in time. It can see you go through the results sequentially and stop at some point.
Luckily for search, it is how the vast majority of users use things. Less than 1% of users doing things differently makes the stats a little messier, but still works quite well.
If I can block the 5 word soup gpt2 fake support sites that make up the top 20 results for obscure debugging messages from my own personal account, then my signal to noise ratio went from 0 to some number above 0. This is an infinite improvement for the first page.
No one ever wants to go to xypdf.com for any reason unless they want to feel like they just had a stroke, so how is (was? it made me stop using google except as a last resort in 2019) it often 3 of the top 5 results?
I wish I could just invert their SEO quality metric (there was a golden window around 2018 where you could just type -best into search engines that still respected subtraction to get only good reviews but sadly quality sites have fallen into line with the duck speak). I feel it's a pretty reliable indicator of garbage.
The signal:noise ratio would certainly improve for anyone voting even if they didn’t use it to adjust other people’s search results.
I’m also skeptical that all of Google’s enormous investment in ML and staffing is completely powerless to identify bad actors with atypical usage patterns. What seems far more plausible is that they’ve decided it isn’t costing them more in ad sales than it brings in. There are individual domains which would improve results by being blocked but they also pay for search ads so … unsolved grand challenge of computer science it is!
> I’m also skeptical that all of Google’s enormous investment in ML and staffing is completely powerless to identify bad actors with atypical usage patterns.
Isn't that exactly what people are complaining about here though? At least part of the problem with search results is that google seems powerless to recognize and remove useless bad actors (stack overflow copies, etc.) from their index.
My position is that they could do far more if they cared - simply blocking spam domains like it’s the previous century would make my experience better – but that they have made a business decision not to devote more resources to the problem.
"These sites should be penalised for click-baiting and they have been doing it for years."
If SEO works and a result appears closer to result #1 in the SERPs, then the "true", non-SEO-assisted result it is displacing would appear further from result #1. Apply this across the board and what we have are many, many non-SEO results that are pushed down in Google's ranking. No one is "penalising" these pages; however, they suffer visibility problems because they have not engaged in SEO. The incentives created by Google's secretive ranking system and online advertising commercial focus are perverse, or at least in conflict with the user's goals. Google discourages and even prevents any user from looking at results that were hits but were not ranked high. Pages that have not succumbed to the influence of such incentives may "disappear".
What if a user understands this and wants to ignore the Google ranking system? What if the user wants to see the true, non-SEO results? Google actively limits the user's ability to see those displaced results. For example, if a user searches for a common term, such as "example", she will not be able to view more than 200-300 results. Elsewhere in this thread someone also noted that even with a paid API, Google limits users to 1000 results. If the user wants to see the full range of pages that have hits for the word "example", she cannot do so. If the user would like to perform a single search for all pages containing the term "example" and then sort by some other objective criterion such as alphabetical by domain name, date, page size, etc., she cannot do so.
Under Google's model of the web, pages that do not acquiesce to an online advertising company's secretive ranking system may become nondiscoverable, despite the fact that they may indeed match the user's query. Computers assist us in searching through data but "relevance" is ultimately decided by the user. That is why we can have HN threads that claim search result quality is declining. Though they may be slower, humans can determine relevance better than any computer. From the disclosures of Matt Cutts and others we know that humans are involved in Google's ranking implementation. Penalties are used. The search process is not 100% math/computer-based. However, in Google's model of the web, filtering results is the exclusive domain of the online advertising company and only the humans on its payroll, not the user performing the search. There is no option to disable the online advertising company's "assistance" in filtering.
Ironically, my work on my own search engine has led me to be a bit more patient with Google's problems. At least I think I understand them better. Search engines fail in weird ways.
I think in part that Google just has gotten a spectacularly confusing failure mode. If it can't find good matching contents, it starts second-guessing your query and producing other results, which makes you think it's not even considering what you entered. It may even be "better" in the sense that it's more likely to return at least something relevant, but in practice it's bad UX because it's so unintuitive what's happening. It's probably one of those unfortunate optimizations that are invisible when they work and frustrating when they don't.
There is so much stuff on the Internet it's easy to start thinking there is guaranteed to be good results for any search, and that just doesn't seem to be the case. Especially with highly specified searches with 6-8 terms, you quickly enter the domain where you're reasonably unlikely to find an exact match.
> I think in part that Google just has gotten a spectacularly confusing failure mode. If it can't find good matching contents, it starts second-guessing your query and producing other results, which makes you think it's not even considering what you entered.
This is probably part of it but not the whole explanation:
Try to search on Google for:
slack ngrok
When I and others did earlier today there were a number of pages that contained both words, including some from the slack.com domain.
The top result however was a page that didn't contain ngrok at all.
I saw a specialist at another search engine comment that it was because it was a very popular result (at least that's what I read into it).
Here's my problem with Google: they are either just really bad at QA or they don't care or they consistently overestimate their dumb AI and underestimate me.
I'm fed up.
Not including pages that don't contain the search terms or anything similar isn't hard when there are multiple good results at the exact same domain / pagerank, is it?
Given the term "ngrok", I can't say I blame them/it. It looks like a typo to me. I imagine for well over 90% of people searching, it would be a typo. Perhaps, it is a common typo?
Well, it is a well known tool for temporary tunneling of http.
Google knows and the rest of the results confirm it.
Now, if your explanation was the entire explanation, it would be kind of ok if they did what Kagi does and what Google themselves did at some point, and asked nicely:
did you mean <something else>?
or
we included results for <something else>. Please use double quotes if you want exact matches.
Of course with Google this would be pointless as, as far as I can see, they ignore double quotes anyway these days...
I've taken this possibility into account after the numerous recent threads about this, but I don't think it holds up. I'm very often able to find exactly what I'm looking for after 15 minutes of massaging search terms on 3-4 different search engines. The reason it feels like search has gone to hell, for me, is that for 20 years I took for granted that I could type the first terms that came into my mind into one search engine (Google) and the result I wanted was the first result.
Outside of programming-related topics, anything I search returns pages of pop psychology listicles or news articles. Since I am literally never looking for pop psych listicles, Google (and, to be fair, the other search engines as well) has become a lot less usable.
I agree that the open web has deteriorated, with crap drowning out real content. But I maintain that Google et al have failed, or been beat. The content is there, they just can't find it and/or rank it anymore.
Yes, the more popular terms you search for, the more muddy the waters get.
If you search for anything related to sexuality and psychology, the results are littered with sites that were squatted for serving ads (i.e. no relevant content, just ads), poor quality articles with very low quality content (e.g. poorly formed Quora questions with no expert answers).
As with anything, you have to know what search terms for a given subject are good. For example, you'll get more objective answers the more academic sounding you are, because the fewer people have tried to occupy those search terms.
I agree with your experience - to me it also seems I used to enter a search phrase and get back a page of results that exactly match it, these days, I often get things that only vaguely thematically relate to my search phrase instead. Sometimes I would change a word in my search phrase or add an extra word to make the intent much more specific and still get essentially the same search results!!! Terrible, just terrible.
If I'm searching for a development issue, and my first results page contains 2-3 sites that 1-1 copy stackoverflow/github issues, then Google has failed. I doubt those can be more authoritative than the original sources.
Even programming related topics now return heavily SEOd sites instead of high quality programming content. Often the SEOd content is not completely awful for these searches, but it’s usually not great & never as good as the best sites.
> I'm very often able to find exactly what I'm looking for after 15 minutes of massaging search terms on 3-4 different search engines.
If you are adjusting your query, then you are going to get different results, possibly including ones that contain the information you want (but not your original search terms).
We've been tweaking search terms to find what we were looking for since day one; what's changed the most is that it's gotten far less likely that you'll realize you need to do this.
I think this is one of the real drawbacks of ML algorithms: their failure modes are completely incomprehensible. Dumb algorithms we can grok, and learn to help along the way when they don't work. There is really no point where it will always work.
I think this is my biggest problem with how Google now works. It's always been disappointing when you didn't find what you were looking for. But you used to be able to examine the results and see how your search terms might not have been optimal, and adjust accordingly. It was the expectation that you'd have to tweak. Now, changing your exact search terms hardly seems to make a difference.
I think the major difference is that the algorithm used to highly weight matching of specific words and phrases from the search terms, so adding a word, re-ordering, and swapping for synonyms would drastically change the results. Now it seems they're using ML and natural language processing to try to actually understand what you're looking for and give it to you. You can change your search terms, but the language embedding doesn't change much, so the system is actually working as intended. I could see that this might actually be desirable for a large segment of the population who wants their search engine to "just work" in response to natural language queries. If the corpus being indexed was high quality, maybe this would be a good experience. But due to the ads, affiliate marketing, and blogspam that make up a large part of modern internet content, it's simply frustrating.
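A toy illustration of that difference (hand-made vectors standing in for a real language model, nothing like Google's actual pipeline): swapping one word for a near-synonym sharply changes exact keyword overlap, but barely moves an embedding-based similarity, so the ranked results barely change.

```typescript
// Toy illustration only: hand-made 3-dimensional "embeddings" standing in for a real
// language model, showing why swapping a synonym barely changes a semantic score
// while it can sharply change exact-keyword overlap.
const toyEmbedding: Record<string, number[]> = {
  cheap: [0.9, 0.1, 0.0],
  inexpensive: [0.85, 0.15, 0.05], // near-synonym of "cheap"
  laptop: [0.1, 0.9, 0.2],
};

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function queryVector(words: string[]): number[] {
  // Average the word vectors -- a crude but common way to embed a short query.
  const sum = [0, 0, 0];
  for (const w of words) toyEmbedding[w]?.forEach((v, i) => (sum[i] += v));
  return sum.map((v) => v / words.length);
}

const q1 = ["cheap", "laptop"];
const q2 = ["inexpensive", "laptop"];

// Exact keyword overlap (Jaccard) drops to 1/3 after swapping one word...
const overlap = q1.filter((w) => q2.includes(w)).length / new Set([...q1, ...q2]).size;
// ...while the embedding similarity stays close to 1, so the "understood" query is nearly identical.
console.log(overlap, cosine(queryVector(q1), queryVector(q2)));
```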
I wouldn't be surprised if they've done user testing that validates their approach. Programmers tend to be comfortable with the concept that a computer will do what you ask, even if it's not what you meant, but most people want to get the right results on the first try. The natural language/ML approach may be much more intuitive and forgiving in that regard. It's just not an approach that's compatible with the low average quality of the content being indexed, in that it takes away the authority of the user to improve their search results.
I think there's somewhat of a tradeoff in search performance between quality of results on the first try and ability to improve the results on subsequent tries, and google is now optimizing for the former at great cost to the latter. And honestly they're failing at both.
Either google failed, or the pop science listicles have won ... they're not passively being indexed so that you stumble on them by mistake. They actively and aggressively create completely artificial content, tailored not at humans but at google, so that they can push the latest stupid ad for some telephone company on you.
It's like bitcoin if you want - they compete for something so useless the entire concept becomes a huge waste of time. Search engine SEO is like hashrate-dependent token mining: the only people who win are the farmers at the cost of burning their entire ecosystem.
> There is so much stuff on the Internet it's easy to start thinking there is guaranteed to be good results for any search, and that just doesn't seem to be the case.
I'm increasingly of the opinion that Google (the advertising engine) has destroyed Google (the search engine), by the two step process of making it profitable to produce blogspam then forcing search to remove blogspam - and a lot of the useful content has gone out with that bathwater.
Not to mention the rise of unsearchable platforms. Google can't search inside Discord.
> I'm increasingly of the opinion that Google (the advertising engine) has destroyed Google (the search engine), by the two step process of making it profitable to produce blogspam then forcing search to remove blogspam - and a lot of the useful content has gone out with that bathwater.
If you're looking for someone to blame google seems like the wrong party here. Surely the people abusing the system (i.e. blog spam) should at least share a good chunk of the blame.
Sure, but emergent bad actors exploiting features of a system that pay them are kind of inevitable, faceless, and there's an endless supply of them. The internet would be a very different place if it wasn't for this phenomenon of human behavior.
Google are just the most powerful and most visible actor in this, and replacing the early less profit orientated web with a dark forest of advertising and tracking is to a great extent on them.
(I don't think the web3 people have realised how important it is that the cost/benefit ratio of "ham" (good content) needs to be above the cost/benefit ratio of spam, by quite a large margin, or spam drives out ham)
I’ve been on the internet since the 80s and the bad actors were already around back then. USENET was made unusable at one point by people spamming, mass cross posting, and playing cancel wars. This was well before ads or profit existed on the net.
This is all true but Google and later Facebook made it easier to monetize spam and were clearly unconcerned about the impact on communities unless advertisers stopped buying. An unethical immigration law firm still needed to sign up clients but a teenager in Moldova could scrape someone else’s content, SEO it, and make a decent income without knowing who the money came from.
I think pjc’s point is extremely important: micropayments could dramatically change things but it’s quite hard to set values which will deter enough spam to be effective without excluding innocent people or simply increasing the damages when someone is compromised. That was what killed proof-of-work email spam concepts decades ago: even if you could get adoption, it wouldn’t have hurt the people using botnets to spam as much as many legitimate users.
Anything that facilitates commerce makes it easier to monetize spam, just as it makes it easier to monetize porn, or even illegal activity (eg bitcoin). You can't really fault Google or Facebook for this, as soon as the internet moved away from hobbyists, academia, and scientists towards the general public, it was absolutely inevitable.
The only way to have prevented something like that would have been to make the internet a giant closed-garden AOL like system with total control over users, identity, and content.
Fundamentally, if you make something frictionless to join, you end up with parasites. If you impose a cost to join, you end up hurting average users, because it's hard to raise the price high enough to keep out bad actors, especially if the expected value is positive and high. (If I need to spend $5000 in order to get one sucker for a MAKE MONEY FAST scheme, it's still worth it)
Part of Apple's value proposition is a tightly controlled walled garden with a high cost to entry. If you're willing to pay those costs, you're protected from a lot of spam, at the cost of a lot of friction to enter the ecosystem and the loss of full autonomy over your devices.
Can anyone, even in theory? Are there open APIs to all systems of discord? Does Discord have one? Wouldn't that open up all of these systems to systematic classification of all users? Also, is there a web link you can construct that will open up e.g. Discord's desktop app when you click it?
Searching Discord externally is not possible AFAIK. It's an actively user-hostile platform, which is doubly irritating because most community efforts (e.g. modding of games) mandate Discord and allow for nothing else. Frequently they don't even accept communication from another space, such as bug reports via email or GitHub.
It's practically mandatory to "verify" your account with a phone number for many servers. Discord is incredibly anti-privacy, the fact that it is not transparent to people that aren't currently using it is a vendor lock-in measure.
You're probably much less likely to leak PII to non-community members through Discord than a forum.
I also think ephemeral-by-default seems to result in much fewer long-running fights than you would have in public-record-by-default communities like forums.
Like when a thread gets heated in a forum, it keeps getting bumped and none of the people involved can resist picking at the sore.
On Discord it seems like someone steps away, the channel moves on, and the fight actually dies
No - anonymous accounts are good for privacy. If you're contributing information about something, having that information index-able is just common sense.
The world wide web is not the world. If it exists you can search it. Even if you need to open a client, run Discord search, and execute OCR to interpret the results.
In tech, it's never what's possible, it's just what costs are reasonable.
On Amazon I searched in "CDs & Vinyl" for the band "The Birdstones". It showed me results in books.
Criminy. Amazon does this all the time. If I wanted a book, I'd have searched under "Books". I am not confused about the difference between "CDs" and "Books".
What's the good of having categories if they're completely ignored?
I now avoid Amazon for this reason. I'll search for something like "4k HDMI capture card" and half the results are 1080p (and don't say 4k or a synonym anywhere) and a tenth do DisplayPort. I end up having to open every linked product and use my browser search to confirm that my requirements are actually present.
Same with any sort of requirement like "plastic" or a colour...
It just wastes my time so I shop elsewhere.
I think a much better UX would be saying: No results but consider searching for "4k capture card" or "HDMI capture card".
It depends how you view the problem. If one assumes that humans make mistakes - it's probably reasonable to show people things outside their selected search scope. If you're an exacting user, this is rather frustrating.
Realistically neither of the above were the concern. It's an attempt to boost engagement in the hopes that you'll find a product for eventual purchase, nothing more.
You're right, it is there. It is not in headline font, however. It's in a small font, and I did not see it. I just looked at the search results, not seeing all the other noise on the page.
A funny example is "Eclipse": Do you mean Eclipse the IDE? Or the vampire movies? Maybe you mean the astronomical event? Or the Mitsubishi Eclipse car? Or did you misspell ellipse?
I'm fine with Google not reading my mind at the first try, but at least offer me alternatives and exact text matching. For example, for the last couple of years looking for phones or symbols has been completely broken.
And I really admire how Google is able to guess the encoding of the websites, detect what is text, do language detection and drop all the porn. Writing a crawler that actually works is HARD.
When I search for "databricks series b valuation" in Google (from Argentina, using Google.com in English) result #6 is:
"Python get value from database - Büro Jorge Schmidt", which judging by title and preview seems to be a Python + MySQL tutorial. It returns a 403 error and might be a hacked site, since the home page is for a graphic design studio in Munich.
Result #8 is something similar:
"Intellij flatten packages - Músicos de Viaje". This is definitely a hacked site (from Spain, apparently) that redirects me somewhere else.
Result #10:
"How to calculate tax percentage in sql query". Another hacked site, this time for an evangelical church from Brazil.
Now... how can Google think that any of these sites are relevant? Even if it doesn't realize the pages are hacked, and even if its crawler has been fed content that included the keywords:
A - The sites themselves don't match the query at all.
B - No legit site about the subject would link to these sites.
C - The results themselves (title, url, preview), as Google shows them, have nothing to do with the search!
I just tried that search, all of the results look relevant and I definitely don't get any of the results you are getting.
I wonder if you have some malware that is hijacking the results? I once had some malware (chrome extension) that was corrupting my search results. It was surprisingly difficult to remove (given that it was a chrome extension...).
No, I get the same results on Safari on iOS (iPhone), so I think we can rule out malware.
Google results are personalized, based on location, search history, etc. The fact that I'm in Argentina has been adding a lot of noise to results on searches where my location is not relevant at all.
In this case, I suspect that Google thinks these hacked sites with developer target content are relevant to me, because of my regular search history.
It sounds like their location based targeting sucks. Spotify has the same problem for me - it spams my playlists & recommendations with songs that I have never, ever shown the slightest interest in purely based on location.
This seems to be a common theme in the industry. Recommenders heavily overweight location - an incredibly general factor, even in the presence of troves of specific, individual level data. Goes to show how little basic reasoning really goes into how these systems work.
All online tools should have a “pretend I’m in Silicon Valley” toggle so you can get the same results the engineers get.
Localized search is useful at times (restaurants for example, I like getting the local McDonald’s or China Wok rather than the biggest one in New York) but it’s completely useless for many terms. But maybe not the ones Google makes the most money on.
How about spamming me with artists I have taken the time to mark as 'Do not play'? The FIRST recommended album in Album Picks is such an artist. Good reminder that I should get rid of my premium subscription, thanks!
If I use an incognito window I don't get the crappy results either.
My feeling is that since that query is SO unusual for me, based on my search history (I can't even say what it means), it raises the "likelihood" that the hacked spam sites with programming terms that also include those keywords are good results for me.
If I search for something more typical, like "Spider-Man No Way Home" or "Ruby rails tutorial" the results don't include hacked sites.
I see some people are having trouble replicating the results, and I wonder if you somehow got thrown into a really bad user test group--if they do that.
Yes, I did. No trash.
Safari on Mac (or iOS), once I log in, spews the same garbage results.
Copied from another answer:
---
My feeling is that since that query is SO unusual for me, based on my search history (I can't even say what it means), it raises the "likelihood" that the hacked spam sites with programming terms that also include those keywords are good results for me.
If I search for something more typical, like "Spider-Man No Way Home" or "Ruby rails tutorial" the results don't include hacked sites.
Rule #1 of Google usage: turn off results based on search history. Think about it, how could that possibly improve your search results when you search for something you have never searched for before? I (sort of) remember the day that was announced, almost everyone turned it off immediately.
Almost nobody turned it off. And, to answer your question: If your history shows hundreds of queries for Java, PHP, and Ruby, odds are your query about Perl or Crystal or Go isn't about the species or stone or game.
Just tried that search in English. First result is "The data- and AI-focused company has secured a $1.6 billion round at a $38 billion". And if you click the search box you get the first search option as databricks valuation history where the first result is funding every round.
Google has been so much better for search for me than other search engines. At least for what I search for: programming, news, etc.
- I checked the cached (by Google) copies of the hacked pages and they include mentions of a "Databricks SQL Connector". So if I search for "Databricks" Google thinks "it must be a programming thing".
- If I now search for "databricks series a valuation" I don't get the spam results, for some reason. I thought that if I repeated the search Google would produce the exact same results... but internally, since I first searched, it might have realized that those sites were not good.
That being said, I have issued queries in the past for which I just found absolute walls of malicious results. I haven't really invested much attention span in entertaining this sort of thing, so I just move on and modify my query, but I'll keep an eye out for it going forward.
The article mostly talks about IA (instant answers) which are notoriously hard. The recent advances in machine learning have made the technology more approachable, so startups like Kagi Search (disclaimer: founder) can also leverage latest advances in NLP and compete on this ground.
Both engines use the same article for source, but Google completely misses the context.
These examples show that a search startup has a chance to go neck and neck with Google and compete even in technology as sophisticated as instant answers. We invested considerable resources in the Kagi Search AI capabilities, discussed in some detail here https://kagi.ai/last-mile-for-web-search.html
What is mind boggling though from a product management perspective is that Google had nearly a decade head start and a cash purse of hundreds of billions of dollars to get this right.
To be fair, it is likely that the vast majority of queries are answered correctly, but only the outliers get the public attention. Also Kagi is not without its own share of silly mistakes too, but just being able to be considered in the same basket as Google is already a huge thing for us.
My favorite is, given that I have a baby and I am a trained scientist and I live in the US, I find myself converting milliliters to fluid ounces a lot.
Right now the Google Assistant will correctly transcribe the request to a Google search... Only for the search to interpret “ml” as “miles” and, faced with the discrepancy between the length and a volume, cube the miles. So I am expecting an amount that is like 1 oz because I am converting like 30 mL... and I instead get 4 quadrillion ounces (exact number is 1 mi³ = 140,942,994,870,857 + 1/7 oz because of course it's got that extra 7th in there what were you expecting from our ridiculous US system, haha).
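The arithmetic the widget should be doing, versus the "miles cubed" reading it apparently does, is trivial; a quick sketch using the standard US definitions:

```typescript
// The conversion the answer box should be doing: millilitres to US fluid ounces.
const ML_PER_US_FL_OZ = 29.5735295625; // exact by definition

const mlToFlOz = (ml: number): number => ml / ML_PER_US_FL_OZ;
console.log(mlToFlOz(30)); // ~1.01 fl oz -- the number a parent actually wants

// What you get if "ml" is read as "miles" and then cubed to reconcile a length with a volume:
const METERS_PER_MILE = 1609.344;
const flOzPerCubicMile = (METERS_PER_MILE ** 3) * 1e6 / ML_PER_US_FL_OZ; // m^3 -> mL -> fl oz
console.log(30 * flOzPerCubicMile); // ~4.2e15, the "quadrillions of ounces" nonsense
```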
> No one at Google is responsible for these half baked and largely irrelevant widgets or wants to stake their career fixing them.
You're just ... wrong about this. There's an entire team of dozens of people (maybe hundreds now) focusing on this specific web answer feature. I personally worked on the team (not this feature, though).
I don't understand why people say things they know nothing about.
I think it largely stems from this very popular comment from a couple of years ago: https://news.ycombinator.com/item?id=19553840 From an external point of view it certainly seems to explain a lot of Google's issues with the stagnation and death of many of its products.
The fact that there have been plenty of other comments from Googlers to back it up since shows that, in some parts of Google at least, there's a grain of truth there. It might not be the whole story, many teams might be proud of their part, and many Googlers may not be focusing on promotion and shiny new things, but that doesn't really matter to Google's users. What we see has been plenty of Google products being killed off, or left to rot. Now even Search has people complaining about it. Startups are getting traction competing against it. A decade ago that would have been unthinkable.
If you're right and teams in Google do actually care about the older, less shiny things they build then Google has a significant brand and reputation problem. If you're wrong then Google has a massive engineering culture problem. Either way, Google has a problem.
I think people say this because some problems stick out like a sore thumb and stay that way for months if not years. The line of inference is that there must be no incentive inside the company to fix them.
Isn't this just the software engineering equivalent of the fundamental attribution problem? My backlog is long because I have important stuff(TM) to address, whereas their backlog is sign of dysfunction.
> "At the bottom of every question page we provide a link to answers-support@google.com. We encourage you to use this link whenever you see questionable content posted to the site. In your email, please provide information about the question, its ID number, and the reason you find the content questionable."
And yet despite all of those people, the quality is bad and seems to get worse. A startup seems to be outcompeting them.
Sometimes industries or teams have effectively negative value, or their preconceived notions about how something works based on teachings from their field is wrong. This is the case for chiropractors (the whole field is useless / a net negative), vs back doctors.
We see this happening today with Google moving away from traditional and high quality techniques like direct keyword ranking, BM25, and PageRank and moving towards lower quality methods based on hype, such as BERT and other "semantic search" based on LMs, query rewriting, and using these dense vectors directly in ranking (and thus degrading it).
The amazing power of language models in certain domains (text generation) has unfortunately caused a proliferation of them in a place where they are still pretty bad (information retrieval and search).
Google is full of search chiropractors when they need search back doctors.
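For anyone who hasn't seen it, the "traditional" scoring I'm referring to is simple enough to sketch in a few lines; a toy BM25 ranker over a made-up corpus (k1 and b are the usual textbook defaults):

```typescript
// Toy BM25 ranker, just to make the "traditional" keyword signal concrete.
// The corpus and query are made up; k1 and b are the common default parameters.
const docs = [
  "stairway to heaven lyrics full song",
  "upload some lyrics to this song",
  "heaven is a place on earth lyrics",
];
const k1 = 1.2;
const b = 0.75;

const tokenize = (s: string) => s.toLowerCase().split(/\s+/);
const corpus = docs.map(tokenize);
const avgdl = corpus.reduce((sum, d) => sum + d.length, 0) / corpus.length;

// Inverse document frequency: rarer terms count for more.
function idf(term: string): number {
  const n = corpus.filter((d) => d.includes(term)).length;
  return Math.log((corpus.length - n + 0.5) / (n + 0.5) + 1);
}

function bm25(query: string, doc: string[]): number {
  let score = 0;
  for (const t of tokenize(query)) {
    const f = doc.filter((w) => w === t).length; // term frequency in this document
    score += idf(t) * (f * (k1 + 1)) / (f + k1 * (1 - b + (b * doc.length) / avgdl));
  }
  return score;
}

const ranked = corpus
  .map((doc, i) => ({ doc: docs[i], score: bm25("stairway to heaven lyrics", doc) }))
  .sort((x, y) => y.score - x.score);
console.log(ranked); // the exact-match lyrics page scores highest, by construction
```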
> The article mostly talks about IA (instant answers)
Not really. TFA is discussing the search results. If google/bing want to put instant answers then that is what will get judged. If your ML/AI is not good enough yet to provide natural language answers, don't make it the most prominent part of the search results.
I think that Google is optimizing for the "average user" to the detriment of power users such as the HN crowd. Most people treat Google as an internet oracle and send queries like "how do I do X" while power users will search for keywords. One example of this optimization is the automatic answer boxes that show up for certain questions, which are wrong disturbingly often or don't include important details.
The average user absolutely does this. I see this with family members and friends. Most just type in full questions.
From my experience the best way to get good results is to start typing keywords for your question and then creating the query based on the autocomplete results. If I notice I don't get autocompletion for a certain query I'll restructure it until I do. This has proven very useful in providing good results.
For technical stuff, using the quotation marks is almost essential.
I miss Altavista. I could generally either find exactly what I was looking for inside of three iterations of refining the search, or find that it wasn't to be found.
I still just want a blazing fast full text search of the reachable WWW that understands regexes and a basic predicate calculus. Unfortunately the overhead and small potential user base means that under the current regime such a thing will never be made.
Speaking of, if any government actually wants competition, they don't need to break up Google, they just need to force them to offer full access to their cache and compute at some reasonable rate, much like how the ILECs were made to carry the CLECs' traffic.
This. And let's not forget that it is the "average user" who mostly naively clicks on ads, not the power user. And selling ads is still Google's core business.
Looks like Google is slowly turning into a big Nigerian scam.
I think this is overall a good thing. Power users have trained their behaviour to work in the way that simple systems can deal with. While average users ask the question exactly how they would ask another human. Google has now reached a level where it works best when you deal with it in a natural and human level.
There is nothing actually better about the way we originally used search engines, it was just required at the time.
No, I disagree. With keyword search, I am confident that eventually, with enough included and excluded terms, I will find what I’m looking for.
With natural language search, sometimes it works great, but it’s a crapshoot and when you don’t get the results you need you’re stuck.
Several times in recent memory Google has returned results so bad, I completely gave up searching. Most recently was when i was trying to look up a Windows 11 BSOD error code (where even pasting the error code verbatim only brought up pages of garbage sites with no useful technical information).
Google results for Windows error codes have been gamed to hell and back by the likes of Easus and Drivereasy, where their tools are somehow the answer to every single fault Windows could possibly manifest.
For some queries, being able to ask off the top of your head without thinking is good. Think 'what day of the week was June 14th, 2002?' or 'Who is the mayor of Los Angeles?' For quick questions with clear answers, the current system is a huge advantage over what we had before.
For other, more complicated queries, the act of composing your search and considering your keywords, etc. is a step in the process that helps a searcher mentally understand the results they're going to receive along with what they mean. Having to stop and consider makes you aware that you're working within a system and its constraints, which makes it more suitable for questions that are complex or not socially settled.
Instant answers (IA) caused a shift in the way content is written. Content optimized for IA tends to be repetitive and shallow. Viewing content written for IA is a frustrating experience, and it tends to dominate the result page now.
The reason you are seeing a lot of that sort of content is because Google is looking for that sort of content, in part because of Google's peculiarities, but also because of how refined the art of black hat SEO has become.
Meaningful websites still exist. The bulk of the content on the Internet is older than IA, and it's still out there (not that you can find it with Google).
Hey, free advertising for you and your search engine:
Marginalia search, by punishing ad- and tracking-heavy pages and by being strict in how it interprets my queries sometimes surfaces better results than Google and DDG.
In particular I have found great resources about Linux partitioning and git usage after giving up mainstream search engines and trying them in marginalia.
That says quite something about how badly broken the situation is given that marginalia is one person and a tower pc in a living room.
I keep getting reminded about a Linux quote on how they managed to go forward by studying the latest 20 years of OS research and throwing it all away :-)
No idea if you're right or not, but I think it's an interesting take. It might be because of SEO, it might be more information in walled gardens, it might be the death of the personal web page.
Both Google search and the internet are changing, as are our perceptions, so it's really hard to say why search quality is better or worse.
Whether search quality is deteriorating really depends what we expect from a search engine. Many of us on HN use search engine as a way to look for web pages with specific words. Search engines did start out performing word searches, but somewhere along the way, they started trying to answer questions. Many of the complaints such as Google dropping search terms are actually the result of the search engine trying to answer questions instead of performing keyword searches. Search engines these days are doing well for the easy questions, but of course they completely misunderstand questions with more nuances.
A bit off topic: Some of the search queries in the article used Google like a keyword matching tool, while other queries in the article were using Google to answer questions. That's because we only have one search box trying to do everything. Would we be better served if we have a checkbox to tell the search engine that we just want to perform word searches?
It's definitely deteriorating, and the worst part is that it completely ignores quotes if it thinks you meant something else, and shows the results for what it thinks you want. Completely useless in a lot of cases
I have an Intel Realsense camera, which sometimes reports the error "Failed to recconect" (there being a typo in the drivers) [1] - that's a pretty unique error, so in combination with the product name that should be a very easy keyword search, right?
But no, when I search for realsense "failed to recconect" Google returns pages that contain neither realsense nor recconect [2]. They offer me a supreme court opinion, a review of a car dealership, and a facebook church service.
Correcting the spelling of a query is one thing - but also completely ignoring other keywords? I can see why there are so many people posting about the poor quality of Google's search results.
This has only happened to me twice so far, but it has given me the "Did you mean:" even when using quotes for searches that definitely have results (as I found them with duckduckgo).
I can't remember what they were now
If you search for anything political on Google, you'll notice that the results are clearly slanted in one direction, towards the opinion of a handful of pre-approved news outlets. This leads me to seek alternatives whenever I need neutral sources, for instance Yahoo search.
I wonder if they're getting cash kickbacks from established corporate media outlets for pushing their material to the top of the search results. That would actually be less creepy than if its being done as some kind of information manipulation program.
It's high time Google and other search engines were forced to expose the inner workings of their ranking algorithms to the public, particularly now that they have near-monopoly power in the sector. People should also be able to adjust the dials on the algorithm themselves.
In Australia, Google is being blackmailed into boosting Fairfax, Newscorp and Seven West Media content higher up in their index. It's reached a point where most queries are useless if they contain a word or even a synonym for that word that has been recently used in a major media site owned by those three.
I use Google through a VPN to avoid it. That breaks maps integration.
I'm curious, why go through the hassle of a VPN rather than another search engine? Are the other results so poor or do the others do the same as Google?
Try "what countries are using ivermectin" in google.com and then try Yandex.com. For me, the third site on Google (the kitchen sisters) appears broken and the rest are all some variation of "why ivermectin is bad" articles. Yandex actually answers the question.
I think it is related to the fight against "fake news", "hate speech", etc... People don't tolerate a truly neutral search engine, because it will reflect human nature and human nature is not always pretty. I remember the time when Google returned antisemitic websites when searching for "jew"; they refused to do anything about it because "jew" was used mostly by antisemites, and therefore an antisemitic website was what people searching for that term most likely wanted - the search engine did its job. I don't think it would fly today.
So search engines now have to get the "truth", preferably the politically correct one, and since you can't rely on the crowd for that, you have to introduce bias, and "pre-approved news outlets" are the most obvious choice.
I find these responses fascinating as the "clearly slanted" results tend to change direction depending on the political affiliation of the person making the claim! Having said that, I'd love to be proved wrong if you have any evidence to show a particular bias one way or the other?
Search "mass formation psychosis" on Google and DuckDuckGo. This is a trending phrase popularized by a doctor that has been canceled and banned from the mainstream due to his criticism of the world's COVID response.
DDG shows the author's substack as #1 result and is neutral otherwise. The other doesn't even have it on the first SERP, and is overwhelmingly critical.
If you argue that covid response is not politics, I will disagree strongly.
Search politically controversial topics incognito with a VPN on using google vs yandex vs duckduckgo? Ivermectin, January 6th arrests, BLM protest deaths, VAED, mRNA studies before 2020, Robert Malone, Geert Vanden Bossche. I mean there's an endless list of things you can experiment with.
My experience is that it's slanted whichever side of the aisle you're on. The evidence is really clear when you measure results between various search providers, and especially when searching up contentious or controversial topics. So it doesn't really change direction so much as it confirms one particular set of beliefs depending on which search engine you're on. I think this is clearly in Google's disfavour, because people have started to notice and they're actively searching for alternatives to Google in order to avoid it.
As a corollary, search on Google News (as in, browsing to news.google.com and searching there, or !gn via DuckDuckGo) is really bad. The index seems to update really slowly, so breaking events are usually missing entirely, and the grouping of articles into single events is also quite broken.
Non-factual answers are especially interesting, because otherwise reasonable and intelligent people believe in them. Everyone would be better off studying those things and see how they've drawn those conclusions we don't subscribe to, so that we can figure out what we are wrong about.
It's absolutely catastrophic if you are not allowed to draw your own conclusions about things. This is your inalienable prerogative as an adult in a free country. Even at the risk of some people being wrong sometimes, you simply cannot have authorities distributing doctrine and call yourself a democracy.
The most frustrating part of using Google these days (for me anyhow) is Google returning results that don't match terms that I specifically wrap in quotes. If I search for:
"gamakatsu octopus hooks"
I expect to only receive results for that. Instead I get bombarded by results that match a portion, or when Google thinks I tangentially might have meant something else. There was a time when it respected the quote characters, but those days have long since passed.
What's galling is that they've actively gone out of their way to make it worse, instead of just letting it regress through neglect.
For example, a few weeks ago, I image searched for a meme that I created years ago on 4chan. A dozen or so results were returned, none of them relevant. But if you tack on the name of a 4chan archive, for example "4plebs" (not even "site:4plebs..."), all of the sudden it turns up.
Google in general seems to penalize 4chan and its archives, which is ironic since it's one of the few places where actual humans post OC. Meanwhile Pinterest spam, AI-generated blog posts, and reddit threads full of bots and shills abound in its results.
Speaking of 4chan and google's declining result relevancy, a particular instance of the latter was discussed there (one of the few places it could be discussed, given the amount of censorship that prevails everywhere else these days):
This is still the case today, at least in the US (I just checked). Instead of emphasizing the painting of Beethoven we all know, the one that was actually done during his lifetime, the one featured in the infobox of his Wikipedia page (which is also the top link result), it instead emphasizes a much more obscure painting that was done posthumously, for no obvious reason other than it giving him a noticeably darker skin tone. I'm not even offended by it, I just find it ridiculous that Google actually went out of its way (probably for pc reasons) to train their algorithm to return less relevant results.
When I search “tim lee food blogger age” Google actually shows results with “age” struck out (so it shows top results as if age wasn’t part of the queried string).
Trying to think why/how it’d conclude that age wasn’t necessary for good results.
Query rewriting, a useless and user-hostile technique, strikes again! (No, I am not talking about trying to fix spelling mistakes with "did you mean". I mean straight-up removal or rewriting of correct English words.)
I am embarrassed by the whole field of NLP for making such a big deal out of a task that is fundamentally bad.
Google optimizing for the "average user" comes at the expense of the whole world, because the "power users" who are optimized against literally build the internet for the normies. Cater to the normies for too long, and we see the status quo.
Force the normies to get better at writing their queries. The NLP field will unironically have a large number of people with the title "query engineer" soon anyway, due to the proliferation of increasingly large foundation language models. Welcome to the brave new world of botched semantic search!
> Google optimizing for the "average user" comes at the expense of the whole world, because the "power users" who are optimized against literally build the internet for the normies. Cater to the normies for too long, and we see the status quo.
If I were feeling cynical, I would point out that catering to us was a great idea for early Google, but not so much now. The problem with power users is that the more control you give them, the more they learn about how your product works and what decisions you're making. Power users can pick out dark UI and other user-hostile patterns more easily.
Early Google needed the power users, because we were an essential component of building Google's early dominance: It was us gesturing the normies over and saying "as the computer person (TM) in your life, you should be using Google. They're cleaner and their results are better." We were needed to both scale their user base and plug gaps since at the time there weren't dozens of engineers working on issues and search + web indexing was still in its infancy.
Now, though? There's no benefit to Google in engaging with power users. They have the numbers of normies they need for profit, and all engaging power users would do now is lead to more conversations like this in the real world, which is not what they want.
"Oh, you're still using Google? They suck now. Use X, Y, and Z." = conversations Google doesn't want.
We cater to power users. If you absolutely, positively need a word or phrase to be on a page, use quotes. "age" or "tim lee" would tell us that exact word or that exact phrase has to be on the page. Which still doesn't help with this query, given that his age doesn't appear to be known, but you can do it for others.
Even quotes don't actually work. I've witnessed Google rewrite the query even with full quotes. I've also seen posts on HN from others who've had this happen to them.
Quotes do work. They really, really do. Every time I look into a complaint about this, I find that the text really is on the page. It just might not be readily visible. Twitter link is in my bio here. If you ever get an example where you find quotes don't seem to be working, do pass it on.
We show this to indicate to you that we found a page that might be relevant but which doesn't contain that exact word. And why we would conclude that is explained better if you search for ["tim lee" food vlogger "age"].
That's telling us to find only pages that have "tim lee" on them in that order, as well as the word "age"; "food vlogger" is nice to have but not required. And it turns out there aren't a lot of pages. Certainly not many pages, it seems, about the Tim Lee you're looking for that include his age. And that's because not everything we want is actually out there. There might not be a page that has his age.
So back to why we dropped it. By showing you some pages that don't have the word age on it, we're able to show some other pages that are generally about him, which might get you closer to the answer.
BTW, there is a Wikipedia page I've seen suggested as having the right answer for his age. But that's for a different Tim Lee -- not the food vlogger -- and it also only lists his birth year, so knowing the exact age is hard.
It no longer does this because now your comment is the only result for the query. Before, it could not find a single result, said as much, and gave you the wikipedia page that included the age. What's wrong with that, again?
I realized after testing some more that the query might not have had many relevant search results. It's still a strange tactic. Age is contextually important in my query; Google didn't recognize this.
Censoring terms of the search vector to find relevant results is a legitimate approach that works well. In this example, the article author gives a Wikipedia page as the desired top result. That document does not include the term "age" anywhere. Therefore you can see how the term is not necessary to the search.
As a user, I can’t think of a time when censoring a term has given me relevant results. It just serves to frustrate me. If Google doesn’t want to return a blank page, they ought to have a small section at the top that says something like, “No results found for [search terms]” and then another section following that clearly states it is excluding a term.
Facebook Marketplace and Craigslist do something similar when there are no local results. They show results from outside your area but clearly call it out.
While I agree this isn't technically "censorship" from one perspective, I do think it's important to consider this: by forcibly omitting search terms against your wishes, it is actively and deliberately preventing you from searching for what you wanted to search for. This is both frustrating for the user and has a detrimental effect on the quality of the search results.
When most people search, they're generally looking for information that matches a topic, not necessarily every single word in a query. You can imagine the number of misses that would happen if we didn't compensate for misspellings, for example. Or for synonyms or plural forms. But in no way are people prevented from searching for exactly the words indicated, if they feel that's best. Use Tools > All Results > Verbatim from the search toolbar. Or put quotes around the word or words you want exactly sought.
It works like trash when searching for obscure technical problems, e.g. debugging some little used tool in linux. It seems more often than not the most important word in the search is omitted, making the results completely irrelevant.
We wouldn't say the word is censored. It's just an indication that we found a page that might be relevant even though it doesn't contain one or more of the search terms, and we want to help the user understand that. But yes, overall, that's exactly why the approach can be useful, in case a page with the right information uses a slightly different related word. Also, that Wikipedia page is for a different Tim Lee -- a comedian, not the food vlogger.
Google's search results are often wrong because of corporate choices. In Greece there is a completely independent news website which for some reason just isn't registered as a news website with Google. As a result, not only is this website shown less than others (in the Google feed or in search results), but in the past there have been cases where this website was the first to publish a story and Google Search only returned other news sites that reproduced the story, even using the original material!
In my opinion, google has become too big and has lost focus on actual quality/engineering.
I believe the theory that Google is optimizing for ad revenue. Sites without ads get ranked lower. The biggest example I can think of is Wikipedia. When I search a proper noun with a Wikipedia article, I almost always want to go look at that article. Recently, I feel like I really have to dig for it.
Anecdotal evidence - but I feel like this could be completely valid.
One of my biggest side projects for many years was a student tool centered around test scores. It was a niche use case with a huge amount of students using the tool on one day per year (1m+). There was exactly one competitor. I had a better domain but a much worse site in terms of design, speed etc. We were nearly the same in traffic, until I decided to monetize the site with a lot of Google ads. Immediately, Google shot my site up in the rankings, and actually seemed to penalize the competitive site. My traffic went up 10x and the competitive site remained flat. This happened for 3 years, then the niche use of both of our tools was “patched”.
This seems accurate to me too. And it also seems the most likely - a profit maximizing corporation doing profit maximizing things? Not really a huge surprise.
Yup. I've noticed recently that when I search for a public figure (whether a politician, businessperson or celebrity), their Wikipedia page is now at the bottom of page 1, or not there at all. That never used to be the case 5 years ago.
Do you have an adblocker that's accidentally blocking the side-bar? I find it difficult to find a proper-noun search that doesn't include a side-bar link to Wikipedia.
I share the view that Google SERPS have dropped in quality the last 5-7 years. Of great annoyance to me is the amateurish way that a search results page will find relevant Twitter results but then clicking on the results takes you to the root page of the Twitter user and not the result. Since many Twitterers are prolific posters, it can be very time-consuming or even impossible to find the result listed. Thankfully Inoreader takes me to the exact Twitter result.
One of the worst things I've noticed recently about Google Search is how it is very anti-startup because of the concept of the Google Sandbox, an essentially arbitrary length of time they put a huge negative penalty on your site to try to entice you to buy paid ads instead before your funding runs out waiting for organic traffic.
Perhaps that's my biased opinion on their motivations as I've recently launched https://grizzlybulls.com and yet even though Bing has tiny market share, I'm getting 10x more organic traffic from Bing rather than Google...
One reason it might have deteriorated is that Google is constantly battling people 'optimizing' their content for Google, while competitors likely see less than 1/1000 of this.
That would make sense if so much easy-to-detect, low-hanging fruit like blog spam, scraped Stack Overflow pirates, and listicles didn't make it to the top. Those are easy for Google to de-rank, and yet they don't.
What's the source for it being easy? It seems easy, from the perspective of a human looking at results, but I'm not sure how much that's worth given the scale and complexity of the problem.
Just block the domain. At first, you can block manually, but we know Google doesn't like doing things that way. Fortunately, they have a lot of heuristics to find sites like that; usually the content is just copied from another source. And since they scrape the web all the time, they should know which content has appeared first where.
But the issue isn't that they can't; the issue is that they don't want to. Why do the sites with copied content exist? To earn money through ads. What earns Google money? Ads!
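To illustrate what such a heuristic could look like: a minimal sketch, assuming you already have crawl timestamps and page text in hand. The shingling approach, the 0.8 threshold, and the record fields are illustrative, not anything Google is known to use.

    import hashlib

    def shingles(text, k=8):
        # Split text into overlapping k-word shingles and hash each one.
        words = text.lower().split()
        return {
            hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest()
            for i in range(max(len(words) - k + 1, 1))
        }

    def likely_copy(page, corpus, overlap_threshold=0.8):
        # Flag a page as a probable copy if most of its shingles already
        # appear on a page that was crawled earlier.
        page_shingles = shingles(page["text"])
        for other in corpus:
            if other["url"] == page["url"]:
                continue
            if other["first_crawled"] >= page["first_crawled"]:
                continue  # only pages seen earlier can be the original
            overlap = len(page_shingles & shingles(other["text"]))
            if overlap / max(len(page_shingles), 1) >= overlap_threshold:
                return other["url"]  # probable original
        return None

    # hypothetical crawl records, just to show the call shape
    corpus = [
        {"url": "https://original.example", "text": "ten unique words " * 5, "first_crawled": 1},
        {"url": "https://copy.example", "text": "ten unique words " * 5, "first_crawled": 2},
    ]
    print(likely_copy(corpus[1], corpus))  # -> https://original.example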
Any simple heuristic has false positives, meaning they'll end up taking down legitimate sites that had repeated content for a good reason.
Say, for example, two sites quoting text from the US Constitution. The second one to be crawled would be considered spam copying the first one and removed from web results. Then you'll get comments on Hacker News complaining that Google is censoring it for political reasons.
And any simple heuristic is quickly reverse engineered by SEOs, who will find a way to mask it as legitimate.
They could use the heuristics to build a list of domains to block and then have someone review it. After doing it for a long time, they could build a neural model on top of that, and automate it.
As I have said, the reason they don't do it is not because they don't have the skills and know-how.
Captchas are way harder to solve than it is to detect these sorts of poor results. Google should have absolutely no problem building a classifier that could scale to solve the problem. But as you said, it's not worth it to them, but for the reasons of losing revenue rather than scale and complexity.
There are billions of dollars at stake on both sides, search engines and spammers, in an endless arms race that has been going on for more than 20 years.
Trust me, it's beyond naive to say fighting webspam is a low hanging fruit problem.
Why should I trust you? I trust my own eyes. I regularly see spam sites that get to the top of results and are seen by many people for months. These could be filtered with a one-line change.
That's a good point! A lot of online content is created for search clicks only, not quality. For example, I often search Google in Lithuanian and I keep getting a whole lot of auto-translated blog/article farms as results. Those translations are almost always shoddy, but these pages keep popping up in the first Google results because a) there is not a lot of native-language content to push out these clickbaits and b) those clickbaits are on domains that are also, well, in the first results for other smaller languages... I think these auto-translated pages started to pop up two or three years ago.
I don't know if search is deteriorating as a whole, but certain searches seems to be manipulated for political reasons. The famous example is an Image search of "white couple" - really, try it, it is like only 50% correct. But I don't believe the image search itself would be that bad, rather certain queries are given manipulated results.
I'm not convinced "white couple" showing interracial couples is a political move, or even on purpose.
I think it simply is because very few people would describe in text a white couple as a "white couple" and not just as a "couple". In a majority white culture there is no reason to specify skin colour when both are white. It's just a couple.
On the contrary, I think it would be very normal to write "black and white couple" for an interracial couple, and because "black and white" also contains "white" thus those images would show up when you search for "white couple".
This is easy to verify by looking for the word "white" around the images Google returns for "white couple", and they are definitely there - often in the image title itself.
However, if you just search for "couple" then you'll find what you are looking for. At least on my google 9 out of 10 results are a white couple.
You're overestimating the smarts of google: Image search does not classify image contents. It uses the site text for ranking of the images on that site. It finds lots of "Black And White Couple Stock Photo" images - the image description in fact contains the string "white couple".
I'm fairly certain it actually is classifying image contents since at least a year or two ago (or maybe a little bit even longer – I can't remember when exactly I first noticed this), because I've noticed getting image search results that can't be explained otherwise (the keyword I've been searching for most certainly doesn't appear anywhere on the page containing the image in question).
It can be turned off by doing a verbatim search (i.e. surrounding the search term with quotes), but otherwise it definitively happens, and it's occasionally been quite useful actually.
Likewise, since at least around the same time (if not somewhat earlier), Google has also been running OCR on all images it indexes.
While that's certainly possible, this feels like a mistake AltaVista would make 20 years ago, it seems weirdly out of place for Google doing that in 2022. They used to be a bit more context aware and not just present you results due to some bit of text matching.
If I try the search on flickr.com, "white people" returns black&white pictures of people, while "white couple" returns swans. Meanwhile shutterstock.com returns mostly correct results.
Either your theory is wrong, or the other search engines are also manipulating results for political reasons, or they are copying the results from Google.
One theory is that Google has made a substantial change to a neural network based search, and they are still working out the kinks in getting it to work. How could it not be A/B tested such that we wouldn't notice the bad searches? The answer to that I am not sure. I read their research publications, and the NLP research coming out of Google is far beyond any other company. I can only imagine what they aren't publishing.
Google is not meaningfully ahead of other megacorps like Facebook in NLP, and based on this thread and others is likely starting to backslide.
Google may publish more than others, but only because if you write "we trained our models on 24 specialized TPUs for 1 month" then the reviewers instantly know you work for DeepMind despite "double-blind anonymity", and thus you are much more likely to be accepted.
A/B testing with the decision criteria being search ad revenue will prefer lower quality search results ... since you have to search more (and see more ads) to get what you're looking for. :(
Honestly, it's pure speculation. But I could see scenarios where neural-network-driven search is great 90% of the time and misses the last 10%, but in aggregate improves their revenue and product. Users on HN complain about the results, whereas normal searchers don't. The SEO spam works since it answers the question / fulfills whatever distance metric they are using for the search query.
Basic explanation of how one of these systems works:
(offline) the text of each page is run through a model and represented as word embeddings (e.g. BERT).
(online) a user query comes in, is embedded the same way, and the algorithm computes the distance between the query and the meaning of the text on each page.
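Purely to illustrate the shape of that kind of dense retrieval -- this is not Google's stack; the sentence-transformers library and the all-MiniLM-L6-v2 model here are just convenient stand-ins for a BERT-style embedding model:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small BERT-family model

    # offline: embed the documents once
    docs = [
        "Databricks closes $33M Series B funding round",
        "Best ceramic cookware of 2022 - top 10 picks",
    ]
    doc_embeddings = model.encode(docs, normalize_embeddings=True)

    # online: embed the query and rank documents by cosine similarity
    query = "databricks series b valuation"
    query_embedding = model.encode(query, normalize_embeddings=True)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
        print(f"{score:.3f}  {doc}")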
Maybe the SEO spam has a good distance metric there, causing flaws, but the search works very well for a large number of users when trying to actually extract an answer to a search.
I work in NLP, but could easily be 100% wrong. My knowledge of their NLP comes literally just from reading all of Google's most recent research.
Interesting. I wonder how they might be approaching continuous learning, i.e. the UI has mechanisms for user feedback that might be useful in some fine-tuning process.
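Nobody outside Google knows whether or how they fold that feedback into ranking, but a standard way to use click signals is a pairwise learning-to-rank update. A toy sketch with made-up features, just to show the idea:

    import numpy as np

    def pairwise_update(w, clicked_features, skipped_features, lr=0.01):
        # One RankNet-style gradient step: push the score of the clicked
        # result above the score of a result the user skipped over.
        diff = clicked_features - skipped_features
        # probability the model already ranks the clicked result higher
        p = 1.0 / (1.0 + np.exp(-w @ diff))
        # logistic loss gradient for the pair (clicked should beat skipped)
        w += lr * (1.0 - p) * diff
        return w

    # toy example: 3 hand-picked features per result
    w = np.zeros(3)
    clicked = np.array([0.9, 0.2, 0.5])   # result the user clicked
    skipped = np.array([0.4, 0.8, 0.1])   # result shown above it but skipped
    for _ in range(100):
        w = pairwise_update(w, clicked, skipped)
    print(w @ clicked > w @ skipped)      # True: model now prefers the clicked one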
The methodology in this article is terrible. It makes me doubt that the people at Surge HQ understand even the most basic scientific concepts.
This is like doing a taste test between two sodas where one is clearly labeled "Coke" and the other labeled "Pepsi". It will end up measuring branding and public perception instead of anything empirical or even objective.
This isn't a measurement of search quality, it's a public opinion poll with a sample size of 250. In fact the whole thing is a poorly disguised advertisement, and I don't think it serves them well.
I've been gone from G for 4 1/2 years now. When I was there, the weekly meetings often featured "search quality" measurements that were rigorous in their objectivity (I thought). They bent over backwards to be non-self-deluding.
I distinctly remember Udi Manber saying "if the web is slow, it's our fault" (actually, the speech was that everything is "our fault"), meaning, really, "take responsibility for problems and don't throw up your hands."
However, the natural tendency of any organization is to reward the suckups and promote mediocre people who just get along with everyone. It wouldn't surprise me if that's what's happened with Google, too.
For example, my first week at Google/YouTube, I was in a New Hire meeting with our VP. Someone asked about profitability, and he responded that Larry said we didn't have to worry about revenue yet, since the main goal was user growth/happiness, and revenue could come later. Which I thought was fascinating, considering how big YouTube already was at the time (in 2013)!
Though I think this changed a year later, and I find YouTube ads a poor experience compared to Instagram and TikTok -- which aren't merely "better than the rest", but stuff I actually enjoy watching.
In 10+ years of targeted advertising, I still have yet to see a single ad for something that is actually relevant to what I want to buy.
I'm actually baffled that these companies, which have my entire purchasing history and metadata, still cannot do a better job at making me spend money than absolute randos on the internet.
Indeed - it really surprised me that we didn't see big ad companies saying: "we've crunched all the data about you, and here is a list of the products you'll be wanting to buy this week. Buy them all? [Yes] [No]".
The company should be smart enough to know I only buy certain brands of dog food. It should know I need a gift for my mum (and what she likes). It knows I need new trainers but will only buy the cheapest. Etc.
The opportunity for this has pretty much closed with the era of data privacy coming in, but I think it is both surprising and a shame that this didn't happen in the last decade.
I was in Ads (the first time) from 2008 - 2010. At that time, there was a "user-specific" (I forget the name) group that specialized in modifying ads based on the user themself. They were VERY very restricted by the lawyers in what personal data they could use. I imagine it's been considerably loosened by now, but I don't know that.
To someone else's point: yes, they are restricted by who bid for the keywords, but honestly, that's the whole magic of Google Ads. You have a "motivated buyer", someone who actually wants to send flowers or stay in Duluth. How do you know that? They searched for "flowers" or "duluth hotels", duh.
Is an advertiser willing to throw money at people who probably want to send flowers, based on their past behavior? Well, maybe, but not on the Search Results page. There are lots of other web venues where they can place creepy ads like that.
This is a fundamental issue with the ad business - the ads are selected from those who bid, not from those that are relevant. For example - if I'm listening to a podcast with a captivating guest, the relevant links would point to their books or blogs. But since they didn't bid for ads, I'm shown some crypto drivel or date-Thai girls scams
I believe that would have been during the time that Google broke the ability to + words (without initially providing another verbatim alternative) in order to use + for Google Plus somehow.
I can't square that change with the claimed commitment to search engine quality.
For the first two, I think that black people use the phrase “white people” more than white people use the phrase “black people”, so “white people” includes black people in the results due to using quotes associated with people in the search ranking. (Whether things people have said should factor into image search is another question.)
But if you just search for “person” or “couple” you get results showing mostly white people and couples. I don’t think what you’ve observed is saying what you think it is…
Well, for the first ("white people") search, I can see that the first two images of black persons have the words "white people" in them, specifically "Opinion: white people know racism..." (haven't clicked to see the rest) and "Why I'm no longer talking to White..." where the next word is "people"; it's a Guardian article, so that explains the high ranking, I guess. What is your point?
Image search does not classify image contents. It uses the site text for ranking of the images on that site.
Do a google (not image) search for "white people" and you'll see that this phrase is mostly used in pages that are in fact about racism and therefore likely to contain images of black people.
It most certainly does, and has been for at least a year or two, or possibly even a bit longer (can't remember when I first noticed this behaviour).
You can e.g. do a search along the lines of site:<domain of online clothing store> <hair style/hair colour/…>, and at least for the most common and recognisable kinds of hair styles, it will actually return relatively reasonable image results, even though online shops most certainly don't have the habit of annotating the hair styles worn by their models on their product pages.
Along the same lines, Google is now also in the habit of OCRing any text content it can find in images and indexing that for search, too.
It's true that it'll still also take the text surrounding the image into account, but it's no longer true that image search is only based on that.
Nothing that can’t be easily explained, of course; yet anyone coming up with a reasonable explanation is being downvoted by the “critical thinkers” of HN, who can only provide low-effort quips instead.
Not completely sure what this “phenomenon” is. There are a few things I can imagine you are insinuating, but they all had simple explanations, so I’m not sure.
Don’t disagree with the main theory that search quality is deteriorating. I have to use increasingly contrived queries to get anything but bullshit blog spam, and indexing seems really odd at times.
It seems weird to me that Google would take "search quality" seriously for 15+ years (including nearly a decade as one of the biggest companies in the world) and then suddenly stop. Are you able to share any of those objective measures of quality? Because it seems to me that most of the discussions around the declining quality of Google search amount to anecdata backed up by reasoning that doesn't really make a lot of sense to me (e.g., "Google only cares about short-term revenue!")
> It seems weird to me that Google would take "search quality" seriously for 15+ years (including nearly a decade as one of the biggest companies in the world) and then suddenly stop.
Is anybody claiming that it’s sudden? Everybody I’ve seen complain about Google search results has seemed to think that it’s been getting slowly worse over a long period of time.
I don’t think google search has gotten worse, I think the SEO abusers have just started winning. So much content on the web is automatically generated and made to look just like real hand crafted content. So much of it has titles and headings which have little relation to the search.
I was recently searching for a piece of ceramic cookware, specifically looking to avoid non stick coatings. And google search showed me lots of listings with my exact search terms, but when I click the page it shows their generic product range which has nothing to do with ceramic despite the title saying so.
The vast majority of these SEO abusers are monetized through Google ads. For every advertising dollar spent through Google, Google takes 30-50%. And 80% of Google's income is from internet ads.
I would be surprised if google's search quality would not deteriorate - they have very strong incentive for that.
I still get strange results from flaky websites that have strange TLDs and the content is literal garbage. Like just words mangled together. Ads on there are the worst and reach into malware territory. Results like these are on the first page. I have no explanation for it, but 100% of my searches for personal use are DDG or Bing now.
>I don’t think google search has gotten worse, I think the SEO abusers have just started winning. So much content on the web is automatically generated and made to look just like real hand crafted content. So much of it has titles and headings which have little relation to the search.
Right, I mean we have nearly daily posts here about the almost-human-quality text you can generate with various machine learning solutions.
Add to that: sites that serve users' needs must focus on doing that, while sites that just game Google search rankings can focus entirely on that.
I mean maybe the problem is that the SEO abusers are only 35% of the way towards being constructive, a la XKCD https://xkcd.com/810/ - so the thing that wasn't considered is as your spammers get closer and closer to being constructive the more stress is placed on the systems that have to deal with fighting non-constructive spam and the worse it is for the users.
But maybe at some point there will be a breakthrough, with SEO abusers able to be as constructive as the actually useful sites. At that point there will be a short window when the SEO abusers are the only sites returning results, causing all other sites to go out of business, and then the downfall of the SEO sites themselves, which rely on actually useful sites for a great deal of their stolen content / content-seeding corpora.
Not sudden for me.
It has been progressively worse over time, and unfortunately the competition is not much better.
Searching today is not as bad as it was during the Altavista/Netscape/56k modem era, but it is getting dangerously close, from my personal and anecdotal experience.
It seems Google started decaying when they gave up their social platform G+
Nowadays all those tweaks to the algorithms, especially search and Gmail spam filters, are like a desperate attempt to make the ship move faster by removing parts of the hull while keeping the water out. It will just sink at some point.
I mean, there are two ex-Googlers in this thread claiming firsthand that search quality was considered sacrosanct as of a few years ago, so it seems like it'd have to have been sudden if indeed it has happened at all.
Well, it's possible they attempted to keep search quality high, and they were unsuccessful but they thought they were successful.
After all, if me, my team and my boss are rewarded when an imperfect measure of search quality goes up and it's going up, why would I fight to switch to a different measure that wasn't rising?
I didn't notice a rapid decline until early 2019, when the search results stopped containing the keywords I used. Before then there was a blend of synonyms and actual keyword matches.
This was roughly when Google's new AI began "interpreting" the "meaning" of the specific jargon and product serial numbers I was inputting, and then decided what I actually needed was song lyrics.
To me, it seems crazy that a company would actively try to self-sabotage their bread-and-butter product in search of a bit of extra revenue when they're already making record profits. Not to mention that, as the author of the article points out, Google published a well-known quantitative paper on why that's a bad tradeoff.
That happens when companies have monopolies. They can sacrifice quality for profits. Happens frequently. Obviously you can't sacrifice quality to zero, but you can a bit. It would make sense that they would keep pushing it down little by little (returning higher and higher profits) to find the turning point. I don't think they're yet in danger though because there isn't a real uptick in the competition.
It happens all the time though. Look at Amazon: they flood their marketplace with cheap knockoff products that hurt the credibility of their ecommerce offerings, but the money generated is too much to shut off. So in the short run, Amazon makes more money for a worse reputation, and in the long run ??? It's unclear.
My dad frequently says that sooner or later the finance guys get in charge of the company and then they cut the fat and keep cutting into the product. Sooner or later they go too far. Seems fairly inevitable, seems to happen a lot. I assume that what happens is that the product matures, so the obvious thing to do is make things more financially efficient. Problem is, the company sells a product and the finance CEO's core competency is reducing costs, and reducing costs isn't something that customers actually buy.
I don't think that's analogous: you'd have to argue that Amazon spent a ton of time and effort on keeping cheap knockoff products off their marketplace and then did a sudden u-turn and decided not to do that anymore. Even then, I don't think the analogy holds, because the quality of Amazon's marketplace listings isn't anywhere near as synonymous with Amazon as the quality of Google's search results is with Google.
>Even then, I don't think the analogy holds, because the quality of Amazon's marketplace listings isn't anywhere near as synonymous with Amazon as the quality of Google's search results is with Google.
What? Why not? And anyway, with Amazon I'm actually giving them money. I'd think that quality there would be more deeply linked to the brand than anywhere I'm just giving some of my time.
The CEO and upper management might simply not care about the long-term viability of the company. Cashing out in the short-term sounds more like what these executives generally do.
Sundar Pichai is probably the weakest of the FAANG executives - if not one of the weakest Fortune 100 executives. Google's search result quality, reduction in overall prestige and lack of new major products are just a few indicators of that.
But speaking in general, there is limited correlation between executive compensation and company performance. You can DuckDuckGo it if you want to find the evidence for this statement.
For whatever it's worth: I had a talk with a WSJ reporter, who said he's met all the FAANG chief executives, and Sundar was the least impressive of them all.
Having everyone you've worked with love you and find you brilliant is not at all the same thing as being brilliant.
It's rational and commonplace behavior for a monopoly to prioritize profit maximization over product quality; there must be thousands of case studies over the years. If a firm has no viable competitors, why would they care whether their product is good or not?
The cases you mention involve maximizing profits by saving on costs. The theory above is that lower quality leads to an increase in revenue, and it is tenuous at best.
I think just attributing this to the right people not being in charge oversimplifies the problem.
Search is a complexity beast and simply continued to grow in complexity during the several years I worked directly on it. Folks were proud of the fact no one could even enumerate all the features in the system (attempts were made and abandoned).
The tools to change search safely weren't keeping up with the complexity of the system. Understanding impact with evals and experiments became much harder. Gwsdiff and friends grew flakier. Debugging had so many different entry points depending on what you needed to do.
The search stack deserves some really deep cleanups and refactoring, the eval and devtools are similarly in need of a ton of love.
> the weekly meetings often featured "search quality" measurements that were rigorous in their objectivity (I thought).
I wish this part got discussed, but every time I've attempted it, the discussion has been shut down by "lol they're experts at search and you're not and you don't know what they know."
I wouldn't put it past Google to be blindsided thinking their own metrics are objective (perhaps they are objective measurements of something but not of what they actually want to measure). If anything, the battle with SEO just shows how hard it is to do something right and avoid getting gamed. If they can't rank SEO spam off the front page, why would I believe their measurements are any better than the rankings?
There's also always a small possibility that the metrics are worse than wrong; they could actually say everything is fine, and keep serving these long-form SEO spam articles that people click and read for far too long before realizing they don't have the answers they seek.
But slow/fast isn't quality. What were the rigorous measurements? Latency of results to click? User clicked above the fold?
I have wondered about this now for several years, because from my point of view Google search has steadily degraded since around 2010. The specific degradation isn't that it returns nothing, or irrelevant pages, but that it returns mostly _recent_ content, and returns very little _older_ content which is still assuredly on the web.
I can see how Google's approach will work for the average search query - after all, the average query is probably about something happening now - but that doesn't equate to _quality_.
> But slow/fast isn't quality. What were the rigorous measurements? Latency of results to click? User clicked above the fold?
Slow and fast is what we experience. We live in a phenomenological world, not a world of millisecond metrics. While it isn't rigorous, for humans, numbers are pointless. We can't experience them. What matters is whether it feels fast.
That isn’t quality. That’s like saying a crappy textbook full of errors that ships prime same day is better than a good textbook that takes a week to arrive.
Just an anecdote, but I see quite a few people in my LinkedIn network who are now at Google within the last few years. People I wouldn't consider "Google quality".
I tried the query "databricks series b valuation" on Kagi (just a beta user there) and the results were:
1. Databricks Funding Rounds, Valuation and Investors (https://craft.co/databricks/funding-rounds) - not directly to the point but does include information about all rounds.
2. Databricks Raises $1B at $28B Valuation, Plans Massive - not answering the question at all.
3. *Databricks Closes $33M Series B Funding - FinSMEs* (https://www.finsmes.com/2014/07/databricks-closes-33m-series-b-funding.html) - Direct hit! Didn't even have to click into the page.
In my mind this is yet another proof of Google's search quality decline. I remember being so excited when I saw the first structured search result but now I tend to use other engines first.
The article forgot to mention censorship. All kinds of results are censored/banned from Google. This deterioration is just as bad, maybe worse, than the one caused by paid-for results or simply inaccurate results.
Yes, this is the reason I resorted to bing a few times in the past week. Google seems to penalize certain news sources to such an extent that vaguely remembering the content of the article does not suffice any more to find it again, whereas the article I have in mind usually shows up in the top results on bing with the same query.
I actually want this in my search engine. I want it to exclude low quality news sources and sites with cloned content. But I want to control the exclusion list myself, like I used to be able to do.
What if someone wanted to get a large number of results, more than 300, for a common term found on the web. Is Google suitable for such a task.
How could this be proven. Search for the term "example" and see how many results can be accessed. Is it greater than 300.
By limiting the number of results that can be accessed, Google is "hiding" a portion of the web from the user. How do we explain this practice. Perhaps that portion is not deemed useful for user data collection or advertising purposes hence it is excluded. Perhaps Google wishes to prevent users from accessing large chunks of its index. Who knows.
We need a search engine that exposes the full web and does not try to guess what someone is searching for. If the user requests all pages with the term example, then that is what the search engine returns. Google is far too limited.
If one goes to a library and searches an academic database, she is never precluded from viewing all results. Even though she may only access the first few pages, she is always allowed to see all the results. She can view results that were low on the relevance scale to understand why they scored low. She can then subsequently narrow her search. I have seen this implemented in non-academic databases as well. First a broad search is performed. It returns all results, not just a portion. These results are stored. Every single result is accessible by the user. (Not possible with Google. User only sees 200-300 max.) Then the user can narrow her search and search within those results. The user repeats using different searches until she has what she wants. The user controls the number of database items she wishes to search, based on the initial broad search. With Google, the user has no such access to all results from a broad search, nor the ability to search exclusively within that set. Google is extremely limited. Everything is geared toward user data collection and online advertising. It truly detracts from any search functionality they may have to offer. The user is the product, not the database.
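For what it's worth, the iterate-and-narrow workflow described above is trivial when you actually have the full result set in hand; here is a toy sketch with a hypothetical three-document corpus, obviously nothing like web scale:

    def search(corpus, query):
        # Broad search: return every document containing all query terms.
        terms = query.lower().split()
        return [doc for doc in corpus if all(t in doc.lower() for t in terms)]

    corpus = [
        "An example of a Series B term sheet",
        "Worked example: oil vs oil-less air compressors",
        "Example sentences for the word 'ubiquitous'",
    ]

    # 1. broad search -- the user keeps the *entire* result set, however large
    results = search(corpus, "example")

    # 2. narrow by searching only within the saved results
    narrowed = search(results, "compressor")

    print(len(results), len(narrowed))   # 3 1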
Incidentally, I noticed very recently (in the last month or so) that searches in google maps (at least on mobile) are also starting to return completely irrelevant results. Something is seriously wrong at google.
The real question about G is this: who is really in charge there? Answer that, and the explanations for the whys of their behavior will fall into place. I'm not pretending to know the answer, but this question has helped me figure out what's going on with other companies and their strange directions taken.
It started happening to me around November. I can understand if it shows me an establishment/location with a similar name but the results it gives have nothing in common with the location I'm searching for.
This drives me nuts when I'm trying to navigate somewhere, no pun intended. For example, I'll tell it I want to go to a trailhead for a lake but it will switch it to the literal middle of the lake itself.
That also happened to me. I once wanted to drive to a port, and Google decided to direct me to drive straight into the water. (The residents even put up a sign that says 'If Google directed you here, you are going the wrong way'; it seems this had been wrong for many years and never got fixed.) Who on earth thinks listing a port/lake/ocean as a driving destination is a good idea???
Yesterday I was googling for “rub fingers genie magic”. I was looking whether it was a cup, kettle or a vase you “rub fingers” to invoke the magic genie.
But Google would automatically change it to “jenie” and bring up Amazon crap. I tried adding “alladin” and it changed the query to bring up Alibaba crap.
Google had no idea about the famous alladin and genie story.
Google not only misunderstood me, they thought I was an idiot for asking a non e-commerce query.
The top 5 results above the fold were all ads. It was truly frustrating. I called a friend to ask them that question and get an answer.
It might be because it's spelled "Aladdin", it's a lamp, and you don't necessarily rub it with your fingers, you just rub it. Your queries were way off.
If you had just Googled "Aladdin Genie" you would have every single result in the first page be relevant.
There's one context I am aware of that "rub" and "genie" go together, and I would certainly expect Google to suggest something with Aladdin pretty high. There is that song with the lyrics "I'm a genie in a bottle, you have to rub me the right way", but I expect Aladdin to overwhelm the results. Not getting Aladdin seems to be a big failure.
I tried the query in Firefox incognito, logged in, and in Safari on iOS using the cell network (not my wifi), and got identical results:
1: Youtube video from the "Aladdin" movie (didn't watch, so not sure)
2: Quora question about genie not giving three wishes
3: Movie Quotes - Aladdin
4: Etsy lamp product
5: Stock photo of man hand rubbing lamp
6: Amazon product
7: Genies in popular culture - Wikipedia
8: Iconfinder result
9: Genie - Disney Wiki
This seems rather different than the poster. It's got way more commercial stuff than I'd like (but how does Google know I'm not looking to buy a stuffed lamp toy or something), but the answer to their question would be in the Wikipedia entry, I expect. It might be hard to find the original story, though, since the search term "Aladdin" is probably overwhelmed by the movie.
Lately Google has been giving me results that ignore my operators like quotation marks and - signs. I knew it was all downhill from here when Image Search became 50% video results.
Google search is OK for me in English. In my native language (central Europe), it's almost impossible to find content that is not eshop or someone selling something. I'm looking for HTML4 articles, phpbb forum discussion, etc. where people actually share knowledge, but I only get super-SEO optimized crap. The best I get is a blog spam from an eshop.
Does the content you're looking for exist in your language, and if so, can you find it on some other search engine? It seems like a lot of 'knowledge forums' are in English.
Indeed it does. These are mostly queries for non-computer stuff, like when I'm interested in the difference between various variants of pepper sprays, or the merits of oil vs oil-less air compressors, etc. All the queries I can think of that would have a Wikipedia page at the top in the English version give me only eshops and blogspam from sellers in my local language.
For tech stuff Google is usually pretty good; sometimes I have to change my query a little, but it works.
For non-tech stuff it's a disaster. I recently looked for a specific brand and model of car tyre datasheet (I know it may be a niche), but still I got ONLY ads. Not a single result was about data, let alone a datasheet; all were ads.
>For tech stuff Google is usually pretty good; sometimes I have to change my query a little, but it works.
I rarely search programming topics nowadays. However, anything related to electronics/mechanics/disassembly - teardowns/etc. is a rather futile experience.
I just tried to find this article on google by searching "Can cats eat blueberries bad search results" (Can Cats eat blueberries is very relevant to the article) and it showed up on the first page of results, albeit at the end.
I'm very confused by this article. They seem to just assume that breaking up Google would fix the problem, but they don't explain why. To me it seems that a fundamental problem of search on the internet is that it's adversarial, which means Google needs to make these ranking changes to try to make things better in the SEO war. How does breaking Google up fix this? Wouldn't all the new search engines just be worse at fighting SEO spam, leading to even more spam?
Disclaimer: I work at Google, though not on search.
Hah, I love that DDG, Bing and Google all return some article about cats being able to eat blueberries, while Kagi delivers your article at the first position :D
I hate when I search for tech issues and I get doc pages but the deprecated docs get top billing. Ironically, this often happens when looking up Google's documentation for their own products. Also, sometimes I am looking up a term out of curiosity / intellectual interest but the results are dominated by people selling products for that term. I agree that search could use an overhaul; maybe even niche search designed around certain domains could be a target for producing something better.
On a slightly different tangent, has anyone noticed that predictive text while typing on a phone (iPhone for me) seems to be getting progressively worse? Like it predicts the most random words that have little connection to what I typed on the screen.
I don't know if all my mistakes over the years are being remembered and so the prediction algorithm is accumulating more errors and performing worse, or if something else is at play (clumsy fingers as I get older?).
I never understood why they didn't expand the "did you maybe mean" section when they pushed into ML based query interpretation.
Let me search what I want to search the way pagerank trained me to. Keywords over sentences.
But if you want to start pushing in that direction why not just add a "you may find better results when phrased as" section which these models appear to be preferential towards.
From the examples (ngrok, for instance), search didn't even care to include the keyword in all results. Even if the other page is more "popular", the fact that you'd have to (and I have repeatedly had to) quote the term to insist it be included is nuts for a search engine.
Finding obscure bug solutions was already barely possible, but has become impossible when even explicit error messages are "interpreted" to 'why won't <unrelated program> do <unrelated keywords>' on StackOverflow because it noticed that the program uses the same library and has had some searches with the same process name in the error line.
I think there's more quality content on the web now than there's ever been.
It seems like Google wants to prioritize the commercialized web, though, by boosting BuzzFeed-like online publications that post ad-laden articles full of referral links in their search results, along with social media. Even in the development space, Google will prioritize w3schools and content farms that constantly churn out how-to guides. Blogs and forums seem to get deprioritized in search results these days.
I think it's complete BS that when I search for something like Basecamp, they show the paid advertisement before the organic result, even though they're the same.
Feels so slimy. The organic result would have sufficed.
Nothing but a glorified 1997 Yahoo Index at this point.
I never click the advert. I know I'm probably in the minority, but I won't give them the infinitesimal revenue.
While Google results may well be deteriorating, this example strikes me as a poor one. Unless you're trying to demonstrate poor results, why wouldn't you put quotation marks around "series b"? There's no point in making the algorithm guess what you mean when you can be explicit.
Putting "series b" in quotes doesn't actually help! https://imgur.com/a/zNNTK3d And "series b" is a common enough phrase that I wouldn't guess you should have to.
Personally I'm at full scale war with ads on the internet and I make sure I see almost none (nowadays this can be achieved in a lot of ways - Pi-hole, NextDNS, uBlock etc.)
So when recently my wife and I were searching for a holiday destination online on her laptop (with no blocking as she gets very annoyed if blocking makes even one page unusable - she'd rather drown in ads all day) I was pretty shocked that for almost all search queries, on a 13" laptop with 1280x800 resolution and the browser running full screen at 100% zoom, the >>entire visible Google results page is all ads!<< There is literally no organic search result visible "above the fold"!
So, Google could improve their search engine dramatically and the typical searcher would not even see the results of it...
Would you agree that every business needs to find ways to earn money?
If we agree on that, could you agree that using a "free" service might be free only in the sense that you don't pay any money but instead they use your attention to sell ad space which in turn pays for their bills?
Me personally, I don't see anything wrong with that.
For instance, I'm perfectly fine with a website earning a commission on the products it reviews. I go out of my way to make sure the website that convinced me gets their commission.
However, a lot of the content is now written specifically for the commission, with zero effort spent on the content itself. That's how you get "best products of 2022" lists on January 1 2022, which contain zero useful research about the products. They add no value.
I also see more and more websites that just rephrase other websites' content and replace the affiliate links with their own. In the end, you get dozens of websites feeding off the same 2-3 original articles.
My favorite quote on the subject is this one by Banksy:
“People are taking the piss out of you everyday. They butt into your life, take a cheap shot at you and then disappear. They leer at you from tall buildings and make you feel small. They make flippant comments from buses that imply you’re not sexy enough and that all the fun is happening somewhere else. They are on TV making your girlfriend feel inadequate. They have access to the most sophisticated technology the world has ever seen and they bully you with it. They are The Advertisers and they are laughing at you. You, however, are forbidden to touch them. Trademarks, intellectual property rights and copyright law mean advertisers can say what they like wherever they like with total impunity. Fuck that. Any advert in a public space that gives you no choice whether you see it or not is yours. It’s yours to take, re-arrange and re-use. You can do whatever you like with it. Asking for permission is like asking to keep a rock someone just threw at your head. You owe the companies nothing. Less than nothing, you especially don’t owe them any courtesy. They owe you. They have re-arranged the world to put themselves in front of you. They never asked for your permission, don’t even start asking for theirs.”
My personal philosophy is: if Google wants my money they should ask for it. Put their search engine behind a paywall. If they try to coerce me to give them money by selling my personality, needs and wants to advertisers I’ll try to prevent them any way I can.
I’m shocked that the original author didn’t think to put “series b” in quotes instead of as separate words. Doing that returns a table format of every databricks round as the second result.
In general, “term” is like the old +term format which we all used to use back when we thought search was good until google killed the +.
In reality the web has gotten worse, with far more bad spam actors, and our search habits have changed, partially because Google steered us towards more natural-language-style queries and away from the old-style queries many people in this community used to use.
In other words, we got used to the convenience of free-form text without special operators, and when the webspam increased, we feel that having to go back and futz with the query using these operators means search has degraded.
On the other hand, Bing got it right. I’m the sort of person that would happily see AND and OR back as search operators, but most people aren’t, and they shouldn’t have to be trained to put things in quotes just because.
If we could only give feedback on search results, things would improve. I am tired of finding the same articles over and over again that just update the year every year and give you a top list of product "reviews" with a link to Amazon.
If you are looking for an alternative, try Neeva. Being ad free means we aren’t beholden to advertisers, and unlike the other alternatives such as DDG, Neeva isn’t just a wrapper around Bing results. Hope you enjoy it.
Disclaimer: I started you.com, a private search engine with summarization.
I've seen many users complain about Google's quality and wanting to have some control over their sources. That's why we added source preferences as a feature (you have to log in).
You can also change the order for a query and similar queries directly within the search results page so that you eg see StackOverflow or Code Completion higher than web results.
Why only 2022, really? Putting all other subtle things aside, I myself feel it's getting harder and harder to distinguish ads from genuine results without reading the contents carefully.
Google's primary differentiation used to be that they were by far the best search engine. Since then, they have moved upstream:
1. Android
2. Chrome
3. A deal with Apple
Even if a much better search engine were to emerge, without an extremely large delta in search quality, they may struggle to compete against Google because Google controls the entry points.
Apple & Microsoft still control the platforms (OSes) upon which Google reaches the majority of consumers, though. Seems likely that Apple will try to displace Google eventually.
I cannot rate any of the results anymore as being good.
I want my search engine to search for the keywords I typed in, not what the search engine thinks I want to search for.
If Google shows me a different person than the one I'm looking for, I am at fault, not the search engine. I should be more specific.
So yes, Google's search quality sucks big time, but not in the way the author thinks. He thinks Google should know what you are searching for, which to me is a really bad approach.
I've been saying this since 2016! So what can be done about it? There are certainly better search engine alternatives out there. But the market doesn't seem to want it? The average Joe and Jane still go right to Google for search and Chrome for browser. And the truth is that as soon as a solid alternative comes out, the entrenched incumbents will squash it like a bug through acquisition or other means.
I recently changed over to DDG as my primary search engine (for the nth time...) and this time it seems like it sticks. The quality of Google search results has been dwindling; that and a lot of ads just pushed me into switching over again. I have to say that DDG has improved its service as well, so hats off to them.
Yesterday on HN someone claimed that they were more or less happy with code that they wrote 10 years ago. My question: is this a brilliant senior dev that I could learn something from, or a grad student who doesn't know what they don't know? My impression is that 2015 Google could answer this question but 2022 Google cannot.
Worse, when one is polyglot, it will just automatically translate search results or return local results (after automatic translation) when I want the original content I am looking for.
Then there are all the shitty URL redirects, added for analytics, on any kind of document it returns.
There's been a lot of HN discussion [1][2][3] about Google Search recently, and whether it’s gone downhill.
I used to work on Search and Search Measurement at YouTube, Twitter, and Microsoft, so I thought it would be fun to move beyond anecdotes, grab some data, and do a quantitative analysis.
tl;dr I didn't have historical data to see how Google Search has trended over time. But compared to Bing, Google still generally outperforms -- although some of its failures are pretty surprising!
If there are particular areas of Google Search that people are interested in digging into, give a shout -- I love running these kinds of search / human eval analyses.
I think the broader theme that these outrage bait posts miss is that the web itself is rotting / deteriorating. Most interesting content isn't on the web anymore and is definitely not in text.
Content that would have been on an open web forum 10 years ago is now hidden behind various data silos/walled gardens - whether it's Reddit, YouTube, TikTok, Instagram, Twitter or Discord. Each of these walled gardens has a different level of tolerance for the open web, from using it as an SEO channel (e.g. Reddit) to being completely opaque to it (e.g. Discord).
Some car and photography forums I was on ~10 years ago are all barren now as the old users have moved on and new users prefer to communicate in a Facebook group or something like that.
Interesting self-published blogs and recipes aren't on the web anymore; they're on YouTube channels or on someone's Instagram channel (or whatever it's called).
There's no less interesting information on the internet than there was a few years ago - to the contrary there's much more. I agree that the ratio of good to trash content has plummeted but it's the job of Google and its competitors to find the good stuff. If they're failing at their core competency then it's worth discussing.
The experiment I want to see is one where (1) the same set of test queries is run on multiple search engines (2) for each query result, the result links from all engines are combined into a single document (without including information about the search engine from which they came) (3) the links are ranked blindly for fitness in relation to the search query (4) the fitness rank of the results is then reunited to the source search engines.
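A minimal sketch of what steps (2)-(4) could look like, assuming the per-engine result lists from step (1) have already been fetched; the judge function here is a hypothetical stand-in for a human rater who never sees which engine a link came from:

    # Pooled, blinded evaluation of results from several search engines.
    # Step (1), running the queries and collecting per-engine URL lists,
    # is assumed to have happened already.
    import random
    from collections import defaultdict

    def blind_eval(query, results_by_engine, judge, seed=0):
        # (2) Pool the result links from all engines into one de-duplicated set.
        pooled = {url for urls in results_by_engine.values() for url in urls}

        # (3) Present the pooled links in a random order so the rater cannot
        #     infer which engine a link came from, then collect fitness scores.
        pooled = sorted(pooled)
        random.Random(seed).shuffle(pooled)
        scores = {url: judge(query, url) for url in pooled}

        # (4) Reunite the blind fitness scores with the source engines and
        #     report a simple mean fitness per engine.
        per_engine = defaultdict(list)
        for engine, urls in results_by_engine.items():
            for url in urls:
                per_engine[engine].append(scores[url])
        return {engine: sum(s) / len(s) for engine, s in per_engine.items() if s}

    if __name__ == "__main__":
        # Toy data and a stand-in rater, purely for illustration.
        results = {
            "engine_a": ["https://example.org/good", "https://example.com/spam"],
            "engine_b": ["https://example.org/good", "https://example.net/ok"],
        }
        fake_judge = lambda query, url: 0 if "spam" in url else 1
        print(blind_eval("databricks series b", results, fake_judge))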
Good list. To add to it: I'd like to see the same queries run from a) multiple IP addresses (where you search from seems to affect results), and b) what the differences in results are for those with Google accounts and those doing history-/accountless searches.
YouTube now seems to provide me only ~10 relevant results and then random other videos I might like, with no way to delve further into results based on my search term. This makes it almost impossible for me to find certain things, especially videos without a lot of views, unless I know the exact title of the video ahead of time.
What are some of the decisions that go into this? Is it just cost savings?
I'm pretty sure the term of art for submitting a headline in the form of a question when the article does not and cannot answer the question is "clickbait".
To answer this, Google's search results need to be compared before and after. The OP is talking only about the current search quality and a comparison to Bing.
Even on Google Maps, many of the bad reviews are getting automatically filtered. My guess is that it's to protect businesses. Many of the Google services just keep getting worse and worse.
One thing that's recently been driving me crazy with Google: how the heck do GitHub-aggregating websites get better SEO than GitHub itself when I search for a repo?
I don't understand how you people are getting bad results. My first link for clam antivirus is clamav.net, and the remainder on the first page are download links and Wikipedia links about ClamAV.
I'm not sure if you people are trolling or what.
With sincere respect to the author, I stopped reading the article. Even though I found the topic interesting and am definitely inclined to believe that Google's search results are declining in quality, I felt that the search terms and criticisms were distractingly unfair.
If someone were evaluating hammers and handed a bunch of hammers to beginners and judged the hammers by those results, I'd tune out there, too, for the same reason.
A suggestion for improvement would be to have the users RTFM (read the fine manual) first, and then take a reading a week later on their everyday search results. Google is a tool, like any other. Know your tools.
Always search for a multi-word proper name, especially a common name like tim lee, with quotes. Put "tim lee" into quotes, and Wikipedia shows up on the first page:
Is vlogger really a common search term? Personally, I'm okay with Google suggesting blogger because I don't think vlogger is all that common. But maybe? Perhaps a better search term would yield better results?
Anyway, this kind of criticism works best for me when the critic gives the subject the most reasonably charitable chance and then talks about the bad. Expecting great results out of bad search terms isn't reasonable, in my opinion.
I understand your point but I find it reasonable to expect google to do some of this stuff on its own by now. I assume that only a small portion of its users know how to use "advanced" searches such as using quotes or ~ etc. In the same way that google uses search completion, they could try to group important parts together.
But I agree with you that the example search strings seem somewhat fabricated, or at least especially selected to produce bad results. In my experience the best way to get good results is to use as few relevant search terms as possible. "databricks series b" has the right answer at position 1.
> I assume that only a small portion of its users know how to use "advanced" searches such as using quotes or ~ etc
That's true, and I'm not doubling down here when I say:
I tune out when I'm watching a horror movie, and the characters behave in a way that's designed to bring themselves sorrow. As soon as the characters know they're in a horror movie but decide to split up or go down to the basement alone anyway, I stop watching.
The article totally misses the point. Yes, for the digital morons (their turks) Google may work perfectly fine. However, for people who actually look for real content, Google gets worse and worse. All the good material gets ranked far lower than the repetitive, low-quality Medium posts with identical and non-existent content.
Far worse than this: it is destroying the quality of content produced by well-meaning creators.
Every review (and almost every site now) has to have the word 'best' wedged in 1000 times, plus text descriptions of what would be far better communicated by graphs or diagrams. The vocabulary and grammar have to be reduced to duckspeak (which becomes more verbose, less clear, harder to read, and more ambiguous). There need to be a hundred repetitions of whatever keywords are popular. And there needs to be 50MB of JavaScript and anti-responsive nonsense for what would be more legible, more attractive, more accessible, and actually usable on a phone if it were plain HTML with no styling.
Then you need a further 20MB of JS libraries to progressively or dynamically load 100kB images.
It's not just highlighting terrible content, it's actively destroying good content.
Not sure the article misses the point, then. If Google satisfies your "digital morons" by making it slightly harder for the more advanced users to get the results they want, couldn't that be the right thing to do? For example, over time Google has been using synonyms more, so that often the word you searched for doesn't appear in the page. This works very well for most people, but makes the self-proclaimed experts complain.
This reply is based on my own biased experience, but I run a small website to share public trail data, and I've found that Google (in my opinion) artificially suppresses my site's results on really basic searches. Within Google's Search Console you can easily check if a page has been indexed. For example, I've published new trails/hikes in the past where Google's index claims it includes the page, but when I search "myhikes <name of trail>", it sometimes doesn't show up - even after clicking through multiple pages! If I change my search to "site:myhikes.org <name of trail>" it'll show up... weird? I think so.
I understand how keywords can be confused by search engines and "myhikes" is fairly generic as many people might post a blog with the string "my hikes", etc. Now if I search a popular trail that Google likes to serve up regularly (i.e. "myhikes <name of popular-indexed-trail>") it comes up as 1st in the list.
Additionally, what pisses me off even more, is that I've searched for "myhikes <name of trail>" and have been served Google's own map / shitty trail tiles ranked as #1, then my site is ranked #2. Doesn't that last bit feel a bit anti-competitive? It does to me, but maybe I'm biased.
Thanks for the reply! It's actually up and running, but if the response says "forbidden" it's likely because I blanket-block a lot of non-US IPs, AWS IP ranges, etc. because of annoying crawlers (roughly along the lines of the sketch below). This is bad practice, but I do it for several reasons. I've turned off some blanket-blocking for now.
If you see this comment, would you mind sharing if you were making a request from a US-based IP, VPN, or outside the US? Just curious - it'll help me understand things a bit better.
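For what it's worth, here's a rough Python sketch (not my actual setup) of one way the AWS part of that blanket-blocking could work, using Amazon's published ip-ranges.json; blocking non-US IPs would additionally need a GeoIP database, which I've left out:

    # Rough sketch: reject requests whose client IP falls inside one of the
    # AWS IPv4 ranges that Amazon publishes. In practice you'd bake these
    # ranges into your web server or firewall config, not check per request.
    import ipaddress
    import json
    import urllib.request

    AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

    def load_aws_networks():
        # Amazon keeps its current IPv4 prefixes at this well-known URL.
        with urllib.request.urlopen(AWS_RANGES_URL) as resp:
            data = json.load(resp)
        return [ipaddress.ip_network(p["ip_prefix"]) for p in data["prefixes"]]

    def is_blocked(client_ip, aws_networks):
        ip = ipaddress.ip_address(client_ip)
        return any(ip in net for net in aws_networks)

    if __name__ == "__main__":
        networks = load_aws_networks()
        print(is_blocked("52.95.110.1", networks))   # should fall in an AWS range
        print(is_blocked("192.0.2.10", networks))    # documentation range, not AWS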