Search engines and SEO spam (twitter.com/paulg)
592 points by iamjbn on Jan 3, 2022 | 534 comments



This was in response to mwseibel's thread, which had a big discussion yesterday:

Google no longer producing high quality search results in significant categories - https://news.ycombinator.com/item?id=29772136 - Jan 2022 (1167 comments, spread over multiple pages - note the "X more comments" links at the bottom)


To some extent, I worry that the problem with search engines is that there isn't any data worth returning. Yesterday's thread talked a lot about reviews. Writing a review is hard work that requires deep domain expertise, experience with similar products, and months of testing. If you want a review for something that came out today, there is no way that work could have been done, so there simply isn't anything to find. Instead you'll get a list of "Best TVs 2021" or whatever, with some blurb and an affiliate link, not an actual review. That's what people can make for free with a day's notice, so if you write a search engine that discards those sites, that's fine, you'll just return "no results" for every interesting query.

I guess what I'm saying is that if you want better reviews, you probably want to start writing reviews and figuring out how to sell them for money. Many have tried, few have succeeded. But there probably isn't some Javascript that will fix this problem.


I think one of the fundamental things that made search work well 1-2 decades ago was that web sites would link to each other, and those links vaguely correlated with reputation. There were link spammers, but there was some decent organic content as well.

What's happened since then is that almost all of the normal "people linking to things they like" activity has gone behind walled gardens (chiefly Facebook), and the vast majority of what remains on the open web is SEO spammers.


Why have blogs and articles stopped linking to things? I'm reading a restaurant review site, and they won't link to the restaurant. The chef's name is a link to a list of all articles tagged with the chef's name, rather than a Wikipedia link or something useful that could tell me who that person is.


The average website's goal is now to keep you on it as long as possible. According to some metrics folks, the longer you stay on a website, the more money you spend there. Linking to another website destroys that metric.

Also, if you are going to make a purchase somewhere, any website will try to get a cut of the money you spend by using referral links to the product. So small websites that do not offer this service will not get linked as much.

On a meta level, the point is that links and connections between items are information. Information is money. And as soon as that became evident, links and connections became scarcer.


> According to some metric folks, the longer you stay on a website the more money you spend there.

That is really sad. Metrics folks inventing metrics for the sake of metrics, which correlate only dubiously with the company's profitability.


Yup, and developers have been allowing the marketing and product teams to break the back button as well, opening every external link in a new window so users have to keep something open to their site. You always had middle-click to do this, but now it's being forced on users.


I just noticed even the goobers at GitHub break the back button when you click a project link too. I don't know why people champion this brand when they have dark patterns and shoehorn 'social' functions into the proprietary platform.


This is a prisoner's dilemma of sorts, and the whole free web is losing.


Because, years ago, linking to lower-reputation sites would drain your PageRank.

So everyone worried about SEO became afraid to link to anything except:

1) Their own website

2) High-reputation sites like the NYTimes, etc.

It's sad. Makes it harder to navigate the web.


Bang on. Saying that "there isn't anything out there anymore" is missing the point: Google's algorithms created this situation, intentionally or not. Before Google, people linked to what they wanted and communities would naturally cluster around topics of interest. Google came in and made reputation into a currency which effectively destroyed all these communities through incentivizing selfishness.


"When a measure becomes a target, it ceases to be a good measure"

-- Goodhart's Law.

Google's algorithms didn't create this situation; people chasing high Google rankings did. Had Google used completely different algorithms yet become equally dominant, people still would have poured their hearts and souls into getting higher rankings.

Basically, an application of the tragedy of the commons. Or: "why we can't have nice things".


But that's taking for granted that Google would have become dominant. Perhaps if they hadn't chosen the algorithm they did then they wouldn't have been as overwhelmingly successful. Instead, I could imagine a world in which there are multiple search engines and none of them are all that good. In fact, that's the world I remember from before Google existed. Search was bad but communities were strong and life was good.

Then Google came along and we all found it a lot more convenient than the bad search engines we were used to. And of course, we all know where that led. In some sense, Google built an 8-lane superhighway and bypassed all the small towns.

We all traded away paradise in exchange for convenience. Now we have neither.


On the glass-half-full side of this: we're getting those communities again! Here on HN, on reddit, for certain topics on various social media (there are pearls there too), on Mastodon, various blog authors, Ars Technica, Quanta, etc. [1]

It's just fragmented - i.e., catering to a specific group. Because if it isn't, it's awesome for 5 minutes and then monetization rot sets in.

[1] None of these work for everyone; conversely, all of these are seen as great things by some and have people who prefer that one thing over others for its quality.


The trouble is, you are no longer "surfing" the Web; you are digging through your RSS feeds, links to interesting sites, fediverse subscriptions, etc. That's not good UX, period.


>Google's algorithms didn't create this situation; people chasing high Google rankings did.

You're technically right. You'd be more right if you said people chased the highest spots on search engines for the widest breadth of queries.

If there were an implicit alphabetical ordering of search results, I guarantee you'd end up with a bias toward A's, Z's, or whatever, in people trying to get top spots.


>Google's algorithms didn't create this situation; people chasing high Google rankings did.

But Google lowkey incentivized such behaviour by not being open and transparent about how exactly their algorithms work.


That would have allowed people to artificially chase rankings even faster and more efficiently. It makes the problem worse, not better.


How is transparency worse than the smoke screen we have today? For example, healthy and good websites could rank according to good content, good optimization, variety of multimedia content, decent design and UI, etc. You can't have too much of those good qualities; that would be like writing a book that's too good or making a product that's too good.


Because the rank algorithms are subjective heuristics, not absolute metrics. All rank algorithms always have been. It started with the link metrics, then people started gaming that. It's been a signal/noise war ever since.

It's also dangerous to ask for the exact criteria because they are ever changing. Google et al don't want to be prescriptive about what a good site is, they want to recognize what a good one is. You make a good one, they'll figure out how to recognize it.

They can't sit down and publish "The Definitive Guide to a Good Website". That's just not their role and it will be out of date before it's published.


I understand that Google cannot prescribe and direct what websites should look like, but more transparency on their part wouldn't hurt.


A big problem is that a lot of community content went behind Facebook. Instead of creating webpages or forums, people started using Facebook pages and Facebook groups. This is the main reason I have been anti-Facebook for over a decade: not because of privacy, as for many people, but because I saw that Facebook would put the web behind its closed doors. Even today, some of the best reviews of any product or service are usually in enthusiast community forums. But a lot of that activity has gone behind the closed doors of Facebook and now Reddit. Most of the currently thriving forums are those that predate Facebook.


Surely there is just a different algo that could bring about better communities?


Different, but not better.

The incentives to game the algo remain. People adapt to the environment.


That's why mechanism design [1] exists as a field of study. The whole idea of that field is to provide the proper incentives to steer the participants towards your objective. Yes, considering they will try to "game" the system however they can.

I'm pretty sure Google could do strictly better (i.e., better by all reasonable measures) than they do now if they focused on the users' experience instead of revenue for a couple of terms.

[1] https://en.wikipedia.org/wiki/Mechanism_design


> The incentives to game the algo remain. People adapt to the environment.

Perhaps it could work if the search engine changed its algorithm all the time.


You don’t think Google does this already?


Paul Graham says Google doesn't want to follow anybody down that road (human intervention in search). But ISTM the problem is that even though they don't, they could just throw a giant pile of money at it if they needed to crush a competitor. Also, VCs will refuse to invest in anybody doing it, because Google.


PaulG doesn’t know what he’s talking about.


I disagree, he's a pretty good guide to how VCs think, though not always in the ways he wants.


Only if implemented by the monopolist.

People's best chance is to stop using Google and push for it to be broken up.


I wonder if punishing presence of advertisements would filter out most pages that are SEO'd to the max and instead promote "labors of love" type pages.
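
A minimal sketch of what that heuristic could look like, assuming a crawler already has each page's raw HTML; the ad-network domain list and weights here are purely illustrative, not anyone's real ranking signal:

    import re

    # Hypothetical list of ad-network domains; purely illustrative.
    AD_DOMAINS = ["doubleclick.net", "googlesyndication.com", "adnxs.com", "taboola.com"]

    def ad_penalty(html: str) -> float:
        """Penalty in [0, 1] proportional to how many ad-network references appear."""
        hits = sum(len(re.findall(re.escape(d), html, re.IGNORECASE)) for d in AD_DOMAINS)
        return min(1.0, hits / 10.0)

    def adjusted_score(base_relevance: float, html: str) -> float:
        """Downrank a page in proportion to its detected ad presence."""
        return base_relevance * (1.0 - ad_penalty(html))

    # A page stuffed with ad scripts loses most of its relevance score.
    spammy = '<script src="https://x.googlesyndication.com/x.js"></script>' * 8
    print(adjusted_score(0.9, spammy), adjusted_score(0.9, "<p>a labor of love</p>"))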


This is an interesting idea because it would create a type of non- or anti-commercial SEO that could counteract the commercial one. However, Google would never do it because they sell most of the ads that would be (not) hosted on these sites.


Google owns the largest online advertising network though, so that’s definitely not where their bread is buttered.


Wouldn't it be reasonable for Google to show how their ranking algorithms work, so that all webmasters and content creators know how to behave on the web? Now we have a black box that's causing confusion and misdirecting websites and web users.


I don't think so. If the problem is people gaming the system, making it easier to game isn't going to improve the situation. It's not going to put good content creators on a level playing field with spammers, because good content creators simply don't care as much as spammers about search engine gaming.


But people are already gaming the system, and on any search from product reviews to code snippets, I see SEO-optimized spam populating the top results. Good content creators don't have the time or the technical inclination to reverse-engineer the ranking algorithm (or to brute-force it by creating tens of thousands of sites and seeing what sticks in a giant multivariate test).

Knowing the actual rules might give them a fighting chance, since the bad guys already know these rules anyway.


> It's sad. Makes it harder to navigate the web.

Some would even say it killed the web by centralizing all the content in the hands of a few [0].

Which is the direct consequence of everybody optimizing to better show up on Google/Facebook/Amazon/Microsoft and ultimately even migrating all their hosting to these companies.

[0] https://staltz.com/the-web-began-dying-in-2014-heres-how.htm...


There aren't as many blogs now as there used to be.


That will most likely get worse yet. Younger people no longer produce public text to the extent they did prior to the smartphone-heavy era. The supply of that blog-style content will continue to dwindle as the producers age out. I'm sure there's a stability point it may reach, of course, because some tiny percentage of people will always want to write long-form.

Younger people TikTok, they Instagram, they chat in private conversations with each other, they occasionally post short messages in walled gardens like Facebook, they YouTube, they listen to music, they watch Netflix & Co. That's what they do. They do not persistently write LiveJournals, Tumblrs, blogs. That pre-video/audio-focused era is over and it's not coming back (even if there's occasionally a bubbling up of hipster fakery centered around how cool it is to write text).


I heard an interesting theory the other day: blog viability declined because Google killed Reader. Which indirectly ends up poisoning Google's biggest well, since blogs are an important source of relevant cross-domain links.

I'm somewhat skeptical, it seems a little too poetic to blame Google's ultimate downfall on a decision that was notably hated at the time. But it's plausible. If you want it to be a conspiracy theory, you can posit that killing off independent blogs was the intent, to convince bloggers to migrate to Google Plus.


Google used to prioritize blogs and original content like forum posts in search results, but they don't anymore.


What do they prioritize now, "reputable" news organizations enrolled in Trusted News Initiative?


Blog spam and pages filled with AdSense ads.


I find that claim surprising considering how many more people there are simply using the internet at all.

Fewer unique blog domains due to “blogging” sites that aggregate users? Sounds plausible. Fewer people blogging overall? I’m not convinced yet.


I'd believe it. As an IT consultant, I interact with a lot of people who are semi-techs themselves: mostly small business owners who are used to wearing a lot of hats, and also the type to have been motivated to run their own personal blogs about diving/photography/conlangs/quilting/gardening/whatever their personal hobbies are.

Ten years ago, the majority(!) had at least something up and running, where they would post essays, thoughts, whatever came to mind.

Nowadays? All gone. All! When asked why, the answer is almost always some mix of ever-increasing negative feedback and harassment from randos, and aggressive automated spamming of their forums. Loss of pseudo-anonymity plays a large role as well. Many have deleted years' worth of work, simply because they are afraid of someone trawling through their posts to find something to harass them with.

I was never a blogger myself, but I am sad about the change. There was a lot of good stuff out there for a while, and sometimes it just plain made me happy to read someone joyfully nerding out on a favorite subject of theirs.


I think a lot of people are still writing this kind of content, but you have to look elsewhere for it: Reddit, Facebook, Twitter, to name the obvious ones. It's also harder to find, but you can find all kinds of personal content written in comments and posts on these sites.


I realize that this is a hard thing to 'prove', but I am personally certain that the amount and quality of such things has dropped significantly from a decade ago.

Not to zero. You can still find things tucked away in a post on reddit or the like. Almost never, as far as I have experienced, on Facebook or its ilk, as the affordances are different. I genuinely think there has been a loss.


It used to have positive utility: you were getting acquainted with people you would literally have had nil chance of meeting before.

Now?

Nope. Putting anything out there is basically just doing the rest of the world's Open Source Intel for them. Maybe it isn't the Net that changed. It's just there's way more sharks out there that can't just leave well enough alone.


I frequently append site:reddit.com to searches for a niche search term these days. I think a lot of people who would have blogged or commented on blogs are posting there instead.


I wonder if they'll do a walled garden after their IPO. I've always found the site pretty useless, outside the 'old.reddit.com' version. On the bright side, maybe this will open up space for one of the federated clones to grow.


I think the bigger issue now is that more content is inside social media "silos" like twitter, instagram or youtube. I don't have the numbers though.


Why is this a problem? Can't Google index social media silos?


Which ones? They can index their own, but for the others, only the public stuff. Facebook has a lot of things set private so nobody can see them except your friends. (They are by no means perfect, but a lot of things are private and only seen by friends. Most of it isn't of interest to a search engine anyway, but comments of the form "I love X product" could, in a perfect world, be indexed as a sign of what people find good.)


> I find that claim surprising considering how many more people there are simply using the internet at all.

Most of these many more people are mobile users, for whom creating long-style text content can be quite bothersome.

What ain't bothersome with a smartphone is taking pictures and videos to slap filters over them; alas, that's why we are where we are, with TikTok, Instagram and Twitter dominating large parts of the web.

It's even noticeable in a lot of text-based online discussions outside of these communities; the average length of forum posts feels like it's gotten way shorter over the decades. People have less attention for reading anything that looks longer than a few sentences, often declaring it a "wall of text" based on the quantity of text alone.

Imho it's a big part of what drives misinformation; doing any kind of online research on a small phone screen is extremely bothersome compared to the workspace an actual computer/laptop gives, particularly with multiple monitors.

There's also the difference in attention; When I sit down at my laptop/desktop, I actively decide to spend and focus my attention on that task and device.

While smartphone usage is mostly dominated by short bursts of "can't do anything else right now", I don't choose to take out my phone and surf the web; it's something I do when I'm stuck in some place with nothing else to do and no access to an actual computer.

But for the majority of web-users [0], that smartphone access to the web is all they know, which then ends up heavily shaping the ways they consume and contribute to it.

[0] https://techjury.net/blog/what-percentage-of-internet-traffi...


> TikTok, Instagram and Twitter dominating large parts of the web

For my part, I'm glad these fora aren't indexed well; I don't want my search results dominated by single-sentence posts and photos. In particular, I don't have accounts on any of these services.

I'd be happy if search engines would decline to index sites behind paywalls. Links to Medium, Substack and Washpo are very common, and if the first thing I see is a popup demand for payment, that browser-tab gets closed.


I wonder if it would be possible to have a big filter button, “commercial” or “non-profit” or something along those lines, so you get results that are or are not deemed commercial.

Don’t know how hard it would be to tell which is which. Maybe non-commercial: doesn’t run ads, doesn’t sell a product or service, and provides information only.


I wonder if the majority are moving to vlogging instead?


The original Google algorithm was a clever hack, but it relied on the web being hypertext, with links being used contextually.

The algorithm made linking valuable. So instead of writing hypertext, people tried to create isolated sites and boost their rank. Remember rel="nofollow"?

Eventually bigger sites took over the small ones.
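
To make the "linking was valuable" point concrete, here's a toy power-iteration PageRank over a made-up four-page link graph: every outgoing link donates a share of the linker's rank, which is exactly the flow that rel="nofollow" cuts off by excluding a link from the graph. A rough sketch, not Google's actual implementation:

    def pagerank(links: dict[str, list[str]], d: float = 0.85, iters: int = 50) -> dict[str, float]:
        """Toy power iteration: each page splits its rank among its outlinks."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new = {p: (1.0 - d) / n for p in pages}
            for p, outs in links.items():
                if outs:
                    share = d * rank[p] / len(outs)
                    for q in outs:
                        new[q] += share
                else:  # dangling page: spread its rank evenly
                    for q in pages:
                        new[q] += d * rank[p] / n
            rank = new
        return rank

    # "blog" gets linked by everyone, so it accumulates the most rank.
    print(pagerank({"a": ["blog"], "b": ["blog"], "c": ["blog", "a"], "blog": ["a"]}))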


I've noticed this with online newspapers too, to the extent that they'll report about a website or product and not include a URL to it.


I agree very much with this. It seems that between the walled gardens and also people being so reluctant to have “their” audience leave their site/page/etc the discoverability of the web has dropped dramatically.


That's an interesting observation. IMO, we stopped linking to good content because Google was good at finding it. Now Google is suffering, and we need to go back to doing more links.


Yup, early Google relied on a lot of unpaid, unseen human intervention and choices. I ran some weblink and curatorial sites during the search wars, and PageRank could only work because there were people behind the sites choosing links based on their usefulness to their audience.


I wish FB would be more open, but since they have all this walled-garden info, are they well placed to start a competing search engine? It would be interesting if their activity data could help filter out SEO hackers.


FB search seems to have gotten worse and worse. Unless I can remember the specific Group where I saw something, it's very unlikely that I can find it again. And they know which posts I've been highly engaged on...


This explains why running a search engine on the original Google PageRank algorithm would not work as well today as it did back then.

But Google doesn’t run on PageRank anymore. PageRank is merely one of hundreds of signals they use to sort results.


Does this mean that Facebook is the only company well poised to take on google search?


The main drivers of SEO spam, and of online scams in general, are countries that have little to no opportunity for economic growth. There are literally millions of Internet-savvy people who would be able to survive on what we would consider barely anything, profit-wise, in AdSense revenue, which also usually pays out in US dollars. In this currently terrible global economy, desperation turns the most intelligent minds bound to poverty into bootleg SEO engineers, online catfishers, scammers, and ransomware creators, and God bless their creativity...

Companies rarely create income opportunities or crowdsource people in foreign countries for the common (more positive) good; the people who would benefit instead turn into spammers and scammers, and that's what creates an endless army of people who constantly destroy online communities like SoundCloud, Facebook, Twitter, and TikTok with stolen content, trend scams, fake news, and spam messages.

Google search has been invalidating and subverting its most accurate search results based on abstract SEO rules for quite some time now, likely so that they could implant paid ads into content first, because that makes them the most profit. Doing that has destroyed their reliability and reputation as a search-service leader, and they're never going to admit it, but payola is the undertone that is ruining their search results... There is a certain type of corruption that occurs when a company turns away from upholding customer service and value towards a monopolistic "profit-first economic stranglehold" business model... That strategy never ultimately works out well for companies OR users in the long run. The next leader will likely be a search engine that avoids the same pitfalls, until it too becomes a profit-driven monopoly.

There is no algorithm that will usefully and fairly counter spam born of desperation. Companies need to realize that creating opportunities for people to operate equally on their platforms is the best move; otherwise, spam will drive any community of rule-abiding users away, or into madness.


>The main driver of SEO spam, and online scams in general are countries that have little to no opportunity for economic growth.

Not quite right, because cybercrime (hacking, cracking, spamming, etc.) originated in the US, not in Eastern Europe, Russia, and the third-world countries that dominate the hacking and spamming scene today. The main motivation of cybercriminals is quick money and the ease of getting away with it, since you are not committing a crime physically but digitally/electronically.


It is quite right. Those are the main drivers, and it's due to a lack of economic opportunities.

Hacking heavily originated in the US because the US practically built the entire modern tech universe from the ground up. The US was far out in front when it came to utilizing the Internet and the Web, so of course unethical people in the US pioneered various types of online crime, the US was the early adopter.

If you're an elite engineer in the US, you can make millions of dollars doing legal work for big tech. That helps in a big way to drain the labor pool for criminal activity online. You generally can't do that today in the countries that dominate SEO spam, online scams, etc. In those countries, elite engineers suffer terrible wages doing legal work compared to what they should be able to earn for their abilities; commonly they can earn a lot more doing illegal work instead. It's a very potent lure.

You're an elite engineer in Russia, top ~1%-3% globally. What do you do? Earn several thousand dollars per month doing legit software development in Russia (with either zero or little consequential equity compensation); flee Russia for a more affluent market; or do illegal work where the rewards can be dramatically greater. It would be difficult to resist if you were unable or unwilling to leave Russia.


Furthermore, the penalties for cybercrime, and the authorities' ability to track footprints, are much more developed and accessible in the US and UK for citizens who hack and abuse US & UK systems, which makes enforcement against US and some EU hackers much more harsh/severe, less complex, and more likely to end in apprehension... Many users in economically disadvantaged countries use older devices: PCs running older software, routers that don't allow MAC-level reporting, and well-past-EOL cell phones that don't leave the kinds of footprints modern devices do (on top of the well-documented, now-legacy security measures they can take).


>You're an elite engineer in Russia, top ~1%-3% globally. What do you do? Earn several thousand dollars per month doing legit software development in Russia

Become a software entrepreneur?

And many international software companies have software development teams and presence in Russia.


In places with a less-established legal system it's harder to make money by above-board entrepreneurship and keep it instead of handing it over to local strongmen (two colorful examples that have stayed in my memory and unlike many others have become public and have also been described in non-Russian media - https://www.independent.co.uk/news/world/europe/valery-pshen... and https://abcnews.go.com/International/wireStory/us-embassy-ru... , but of course those are the exceptions because the usual result is complying with threats and handing over your business or most of it). But it's not really about Russia, it's a general issue with parallels in other countries as well. And of course, there's the issue of the local market; the financial advantages for a skilled tech person going towards entrepreneurship legitimately are less attractive in most places compared to USA; heck, even EU potential tech entrepreneurs often just go 'across the pond' to start their business.

If you can't get a work visa to a first-world country, you have fewer options than someone already living there; and the salaries offered by first-world "international software companies" in their remote subsidiaries tend to be 'according to local market rates' (the same "several thousand dollars per month" mentioned by the parent poster is a decent rate) and thus not as competitive with "black entrepreneurship", which pays according to global standards.


> Become software entrepreneur?

Exactly. Hacking for hire, making cheats, botnets, SEO farms, selling exploits and hacked social media accounts; practically anything you can think of that US software engineers can't be bothered with, as they already earn a healthy salary. That is entrepreneurism.


It's hardly only shady stuff; Kaspersky, ABBYY FineReader, VKontakte, Telegram are Russian software products that come to mind.

Russia also has its own SaaS enterprise sector with companies like SKB Kontur or Diasoft.

Just like the US has to this day warez and cracking groups, where it's for the longest time mostly been about scene prestige, and not making the big bucks.


I wasn't speaking about that kind of entrepreneurism, but about making legal software and legal web services that solve problems and are useful. So many Russian hackers got arrested when they travelled somewhere outside Russia, and now they are serving 10- or 20-year sentences in US jails.


> So many Russian hackers got arrested when they travelled somewhere outside Russia

How many? 20? 30? 50? IMHO the cases are rare (and get widely publicized whenever they happen, creating disproportionate visibility); you get a couple of captures per year, but the number is just a tiny fraction of the actual participants, more an exception than the rule.


My assumption is beyond 20. The US is only after big-time cybercriminals[0]; smaller ones get away.

[0] https://www.fbi.gov/wanted/cyber


> making legal software and legal web services

SEO is perfectly legal. So is spamming, regrettably.

I think Goo no longer cares about the quality of search results; they have other business priorities, so SEO works. Spam is another thing again; we still, after 30 years, don't have an agreed definition of spam. We still don't have a flawless spam filter - far from it. So it astonishes me how much email spam I get promoting SEO services for my non-existent website.


>SEO is perfectly legal. So is spamming, regrettably.

SEO is legal, but spam is not, at least not in the US and many other jurisdictions.

>I think Goo no longer cares about the quality of search results; they have other business priorities, so SEO works.

Google cares about spam, but there is so much data and information on the web that it is impossible to figure out what is spam and what is not. Another big problem is fake data and information, which is also very hard to detect. Generally, Google prefers popularity over quality, because it is easier to detect what is popular than what is of good quality.

>Spam is another thing again; we still, after 30 years, don't have an agreed definition of spam. We still don't have a flawless spam filter - far from it.

The definition of spam is an unsolicited message. So if I get pharma emails in my inbox that I didn't ask for, that's considered spam. SMS messages are another example: if I get an SMS promoting free coupons that I didn't ask for, then it is spam.

Considering how much spam Google has seen in the last 20 years, both on the web and in Gmail, they should have some decent machine learning/AI algorithms that could flag spam pretty efficiently.


> Definition of spam is unsolicited message

That may be your definition. As I observed, not everyone agrees with it. Most definitions of spam include the word "bulk", for example.


Bulk unsolicited messages?


Reviews are a special category. They suffer from a number of issues:

1. You need to have enthusiastic reviewers (people who care enough about a product category to review them semi-throughly.)

2. Proper reviews can take time and may need domain knowledge.

3. Competition. When there were one or two people doing reviews on some category of products, maybe the economics worked out. Once you have hundreds or thousands competing with you, the time demand may be overwhelming and not worth it.

4. If you are a trusted reviewer or site, you will get economic pressure to review a particular thing or brand you may not like very much but the money may be good. So you will begin to experience conflicts of interest.

5. If reviews are just a hobby and not a way to make money, eventually you will slow down or move on, opening a hole that gets filled up by spammers.

6. Some things are timeless (a pipe wrench, let's say) and some are seasonal (consumer electronics, toys, etc.). The former deserves a thorough review; the latter doesn't deserve as much, but may get the bulk of interest due to seasonal demand. Does it really matter if the latter's latest iteration has 2% more battery life to discuss?

I'm sure there is a lot I didn't think of. But it's a doomed category, unless people are willing to pay for professional reviews (Consumer Reports types and other independents).


> If reviews are just a hobby and not a way to make money, eventually you will slow down or move on, opening a hole that gets filled up by spammers.

In my experience, the best reviewers are hobbyists. The thing is, it's not reviews that are their hobby. Rather, they review the products that go along with their hobby.

So, for my hobbies (espresso and aquariums), there are tons of easily accessible reviews of all kinds of aquarium gear and coffee machines, grinders, etc. On the other hand, nobody does plumbing or HVAC as a hobby (that I know of), so it's very difficult to find high-quality reviews of water softeners or furnaces. It takes a very special, rare sort of person who would install these things just to review them. The closest thing I could find was this video [1] on a DIY water filtration system by an RV/off-the-grid type hobbyist (from what I can tell).

[1] https://www.youtube.com/watch?v=WCC4TOYYGF8


People like to talk about their work too. There are plenty of those sorts of reviews out there, mostly on Reddit, because, as others have mentioned, organic search results are completely gamed.


> Reviews are a special category. It suffers from a couple of issues

Review sites suffer from a singular problem: they are overwhelmingly SEO-spam content farms. People find some product niche and pay some Fiverr/whatever people to write literally fake reviews of products. Because they're pulling all the SEO tricks and are in a niche category, they shoot to the top of search results for that niche.

Their reviews sound realistic and viable but they're pure fantasy. The writers never touch the products being reviewed. Many times they'll pull details from Amazon listings (including factual errors) and even other "review" sites.

Once they get established in their niche they'll accept paid placement from product manufacturers without marking it as such. A single scammer might own dozens of these sites, even supposedly competing ones.


I pay for Consumer Reports. I'd encourage more people to do so, too. I don't trust it completely, but it's a good companion to manual searches on Reddit/HN/car forums, etc.


Someone pointed out yesterday on that other search thread that [most?] libraries provide free access to Consumer Reports through a membership. I just looked at the San Francisco Public Library and it does indeed give me access to the magazine and a searchable database.


One way out is to grow a community of enthusiastic reviewers. LibraryThing succeeded for book reviews. LibraryThing book reviews are better because LibraryThing caters to bookworms.


Goodreads also has a community of enthusiastic reviewers, but because Amazon owns it I'm afraid they don't have much incentive to improve the site or change anything.


While I agree with everything you said in general, Ken Rockwell seems to buck the trend for photography gear, especially lenses.


> If you want a review for something that came out today, there is no way that work could have been done, so there simply isn't anything to find.

That's not strictly true, given that reviewers are often sent pre-release versions of things in order to do that work before release day.


Not sure why you're being downvoted, as you're correct. However, there seems to be a trend where reviewers are only given pre-release versions if they practically always give favourable reviews to the products, especially if they're provided the product for free. There doesn't have to be an express relationship or contract between a reviewer and a company either. It's the reverse of how Bill Gates has apparently given $200 million+ to different news channels/media organizations, which are then less likely to freely share negative news about him or his organizations. This makes me think, similarly to how stock sales by CEOs (etc.) must be pre-planned to avoid shenanigans like market manipulation, that anyone giving large sums of money to any media/journalism organization should have to divide the amount up over 20-40+ years, so that the organization at least has a runway and isn't dependent on larger "dopamine hits" at shorter intervals.


Yeah, that's been a problem with reviews for a long time. In fact it's what Consumer Reports used initially to differentiate themselves: their "thing" was that they only reviewed products bought anonymously at retail (no free samples or manufacturer-provided review items) and didn't accept any advertising from manufacturers either.

Sites that receive free review samples and are supported by affiliate links are kind of the exact opposite model.


It does in a funny way provide something of a metric for how willing the site is to be critical. Several video game reviewers I follow have stopped receiving product from some studios, which I think is a badge of honest review. Although it's not something you'd know easily so it doesn't help much in terms of finding good reviewers.


I trust DC Rainmaker's reviews of fitness tech products because he always returns products back to the manufacturers after writing reviews. So there's no conflict of interest based on free products.

https://www.dcrainmaker.com/product-reviews


If companies don't like his reviews, they'll stop sending review units. That hits both in the pocketbook and the race to be one of the earlier reviewers of a new product. Reduced conflict, perhaps, but not none.


If companies don't send him review units then he just buys them retail. He has already done this for many products.


Yes, I'm aware. That's less money in his pocket, and less ability to have the review be available on or before the product launch. There's still some conflict of interest, even if it's lessened.

Only purchasing review units at retail would remove this conflict.


Incorrect. He can then return them.


Depends; if someone is popular, you can't afford not to have them review your things. At a certain point, a bad review will still generate more money than no review at all. Few reach that level though; most reviewers don't have that much of a following.


This presupposes that companies think their products are bad. If you have (what you believe to be) a good product, you definitely want DC Rainmaker to review it. I think this is a reasonably general point across industries - companies want to get their products into the hands of the most reputable reviewers.


In DC Rainmaker's case it is probably the opposite. A fitness product not reviewed by him is a bad signal.


Any serious publisher has that policy. Here's Wirecutter's (NYT) take on it: https://www.nytimes.com/wirecutter/blog/yes-i-work-at-wirecu...


Search engines are pretty good at solving the problem they were designed to solve, which is "finding pages which contain all the query words". But they are pretty bad at solving the much harder problem of rating the trustworthiness & authenticity, intentions of the owner, monetization of the site, etc.

One possible solution to this could be:

- Let the community vote on the most trusted sources

- Include results from enthusiasts that have little incentive to write biased reviews (Reddit, HN, expert forums)

- Look at the ownership of the site and how transparent they are about it

- Regularly reassess these criteria

This wouldn't scale for a generic search engine, but I'm working on a service that does this for many product verticals/niches.
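
A toy sketch of how those criteria might combine into a single score; the weights and field names here are made up for illustration, not how any real engine (or my service) necessarily does it:

    from dataclasses import dataclass

    @dataclass
    class Source:
        community_votes: float         # 0..1, normalized community trust votes
        enthusiast_origin: bool        # from a low-incentive forum (Reddit, HN, ...)
        ownership_transparency: float  # 0..1, how clearly ownership is disclosed

    def trust_score(s: Source) -> float:
        """Blend the signals; 'regularly reassess' just means recomputing this."""
        score = 0.5 * s.community_votes + 0.3 * s.ownership_transparency
        if s.enthusiast_origin:
            score += 0.2
        return score

    print(trust_score(Source(0.8, True, 0.6)))  # 0.78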


Agreed here, but in your second bullet, people have great incentive to write good quality reviews on Reddit, HN, expert forums... karma/recognition etc. It just so happens that these "forums" have built in voting systems that they spend time preventing from being gamed so the search engine doesn't have to.

Not sure if this is a good model for a search engine, but it does work to a small degree in those forums.


> people have great incentive to write good quality reviews on Reddit, HN, expert forums... karma/recognition

Internet points are a terrible reason to write anything. They're completely meaningless. We should all judge comments on their own merit and not because the author has a lot of karma. Apart from mine, obvs.


This resonates with my experience. A couple of years ago I invested more time than I'm proud of into buying the right Bluetooth headset for me. I found a site with pretty detailed reviews and tested their reviews by standing in stores and trying dozens of headsets out. I also bought 3 headsets on Amazon and later sent all of them back. My impression was that the reviews on this particular site were 100% unbiased, whereas all the other reviews I read just wanted to sell whatever product was in focus.

I wonder how a search engine could distinguish between "honest & professional" and "fake & amateur" headset reviews without having a head and two ears?


This resonates with me a lot. A few months back I upgraded my desktop's insides. New motherboard, CPU, graphics card, etc. That was the first time in about seven years I'd gone looking for reviews of that sort of stuff.

I remember doing the exact same thing in the past and being overwhelmed with information. The detail and data in reviews would take a long time to collate and make sense of. But this time even the big-name sites seem much shallower: fewer models reviewed, less testing and benchmarking, more regurgitated press releases and other news.

Last time it took me a while to sort out all of the information; this time, all my time was spent trying to find any that wasn't 100% fluff.


To address your issue, we can simply accept that certain information cannot ever be available day and date. The best results I've ever gotten from Google are by doing

> site:reddit.com <QUERY>

If I want to know about headphones or TVs, I'll find better answers in the sidebar than anywhere else on the Internet. But I will not be able to find those quality answers until a reasonable period of time has passed, in which products can be tried and reviewed by real people.

The issue is the immediacy. We want answers now, but we won't ever have them until later. This requires a cultural shift in consumerism that companies will not like: "Wait and see."

The same problem happens for people pre-ordering video games that end up releasing in a state of complete trash. You can only get reliable information once the early adopters have tested it. If you are an early adopter, you are shit out of luck, but you are doing the rest of us a service that we appreciate greatly.


I do the same reddit.com trick, but for TVs and headphones (and anything else they cover), I can wholeheartedly recommend rtings.com.

They're in Montreal. They buy their products retail. They get their funding through subscriptions and affiliate links. Overall ratings are formulaic based on measurements applicable to the category. The formulas are available on the site. So are all the test designs.

They're surprisingly thorough too--when I was shopping for some over-ear headphones I really appreciated that they measured the clamping force (when you wear glasses, something that clamps onto the arms too hard gets really uncomfortable pretty quickly) and breathability (temperature differential between your ears and ambient when wearing the headphones). These are pretty important for all-day comfort but don't really factor in most of the time.

Their methodology is all available:

Breathability: https://www.rtings.com/headphones/tests/design/breathability

Clamping Force: https://www.rtings.com/headphones/tests/design/comfort#compa...


Maybe there's a hardware engineer out there with a decade of experience shipping and reviewing TVs, publishing his thoughts on his blog. He's heard about the latest and greatest, and he's offering his expectations based on the promotional material, his friends at the company, history from the brand, whatever. Maybe, if he's built a reputation for good reviews, he's got a big audience. Big audience? TV brands give him an early review model.

Modern Google actually makes the content problem worse. When our notional TV blogger is starting out in our world, he publishes two or three essays, nobody reads them, he stops putting in so much effort, posts occasionally, and dwindles off. In a world with a perfect search engine, his early essays get some attention, encouraging him to post more; a feedback loop starts, and before you know it he's a full-time TV reviewer.


> If you want a review for something that came out today, there is no way that work could have been done, so there simply isn't anything to find.

I think in practice this is largely untrue: with technology products, video games, movies, and just about anything else I can think of, most well-known reviewers are given early access to the product so that reviews can come out on or before day 1 of general availability. That said, this does create a dearth of 100% trustworthy reviews on day 1, since companies are naturally disincentivized from giving early access to reviewers who they know are going to write a negative review.


There are still good reviews. For TVs, RTINGS produces high-quality reviews (although they're not listed super straightforwardly). For computer internals, AnandTech does even better. You don't have to talk about the absolute latest product out that same day for you to have quality reviews of other options in the meantime.

Everyone just makes blogspam because it's far less work than actually buying products, developing expertise, testing them, and writing out a whole thorough review. Google's algorithms just can't tell a quality review from a surface-level, uneducated take.


It's a double-edged sword. Reviews take effort, so you want to make them easier for customers to write. But making them easier for your users also makes them easier for those trying to game the system. This is why Amazon's product reviews are useless, as is pretty much any other community-based review system.

But on top of that, you have the problem of whether or not someone is really qualified to write a review. So Joe User thinks product X is good; what is his metric for good? It reminds me of an LTT review of the Amazon TV from a few months ago. They gave it an awful rating but noted that the reviews on the product page were generally very positive. Their reasoning was that the people buying and reviewing these TVs didn't have a good comparison point for what a good TV actually is; they were probably comparing it to a much older and less advanced product, not to a contemporary one.

So then you think the answer must be get reviews from industry related media. But then you fall into the classic problems of unethical journalists or simply ones that are out of touch.


It's not a question of whether someone is qualified or not; everybody is more than qualified to write about their own feelings toward a product they bought or a service they used. In fact, no one is more qualified to talk about their own feelings and experience. How useful that review is to you is a combination of the writing quality and depth, and how similar the reviewer's experience and preferences are to your own.

Professional critics usually try to distinguish themselves by producing well-written, in-depth reviews, written not from their own perspective but from that of a hypothetical everyman who, ideally, is similar enough to a critical mass of their audience.

So it always interests me when people complain about popular gaming review sites being out of touch because almost always it's the reader that's out of touch but doesn't realize their bubble. It's not an absolute rule but I'm in enough niche hobbies to realize that my desires for products are way out of whack.


There are a lot of good-quality reviews on YT on the launch date of pretty much anything these days.

It’s not a problem of doing the review, it’s that there’s not much of a market for written reviews, most people would rather watch a video instead.


Interesting. I never watch video reviews. They're painfully slow and impossible to search.


> It’s not a problem of doing the review, it’s that there’s not much of a market for written reviews, most people would rather watch a video instead.

I'd say it's more that YouTube offers a clearer path to content monetization than text does. YT is a much more lucrative platform for the same level of effort as SEO for their text blogs.


There's not much money in written reviews, and people can't find them amongst the automatically written SEO/affiliate crap.


How do you compare 3 or 4 videos before watching them? Watching video reviews of the reviewers?


Like most things, it’s a reputation ladder.

There’s top channels like LTT and if what you are looking for is out of their niche, you look for the biggest channels in that niche and go mostly by association (who they have made collaborations with,..).

EDIT: of course the big win of video reviews is that you can see the thing working.


I mentioned subreddit searching in another post - this is a good way to find reviews. I often have other forums I will Google-search inside for opinions on something. It's more effort than a review site, but it is due diligence. Since there is money in it for someone lying to you, it's unfortunately a "why we can't have nice things" scenario. It is why you can't ask the car dealer what the best car is for you. If you want a good understanding of something, you need to dig. If I see a site with Amazon links, I close it in < 500ms.


Developing common standards/protocols for everything required for a quality review, versus a "candy" or shallow hype review, would be a good place to start: making it part of the culture that everyone educated knows about and follows, so that people only go to or support reviewers who list which testing protocols they follow.

Industry has already done this with the "food pyramid": influencing and capturing governments to base the food pyramid more on economic reasons and much less on science, with the government putting it out and distributing it into schooling at different levels, giving it an unearned and undeserved authority which people then blindly trust/follow, not understanding when systems and their output or oversight have been captured. This is why the pandemic bringing the classroom home via Zoom, so parents could see/hear the learning material, has outraged many parents. Examples I've heard include white children being taught to feel guilty about their 'white privilege', or parents being upset that their children are taught at a very young age that they can decide what gender they are; I'm not stating what I believe here, just giving examples I've heard of.

This capturing of the government is why I think the government should ultimately be developing and maintaining such platforms, as a matter of law, requiring individuals and organizations to add and update their data in real time (a simple example being restaurants: their menus' ingredients, their opening hours). To de-risk the government gaining unnatural power as "the single arbiter" of truth, and to de-risk capture, the government could instead fund multiple independent organizations at the federal level, with States deciding which ones they follow, if necessary; that is part of why States exist, to de-risk the potential capture of the Federal umbrella. However, the system is currently in an imbalanced, broken, captured state: the duopoly has evolved to be more extreme, led or formed by the establishment; the voting system is broken in arguably most countries of the world; and mainstream media is captured by for-profit industrial complexes that fund MSM through ad revenues, which further mould our culture, narratives/talking points, and beliefs, whether truthful or not. Without fixing these, the other platforms/systems excelling won't be possible.


Regarding finding reviews, I think we have to look at the problem in a different way.

Instead of having one search engine which returns the same results for everyone (modulo interests, etc., as Google does), we could have trust networks. E.g. you trust a few people, and those people trust other people. From this network you could build something like PageRank, which computes a kind of transitive closure of trust for one given person. That would then determine the search ordering for that person.
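
A rough sketch of that idea as personalized PageRank over a trust graph; the names and restart weight are made up, but restarting the random walk at "me" is what biases the scores toward my own web of trust, so two users get different orderings:

    def trust_scores(trust: dict[str, list[str]], me: str,
                     restart: float = 0.15, iters: int = 50) -> dict[str, float]:
        """Personalized PageRank: walk the trust edges, restart at `me`."""
        people = list(trust)
        score = {p: 0.0 for p in people}
        score[me] = 1.0
        for _ in range(iters):
            new = {p: (restart if p == me else 0.0) for p in people}
            for p, trusted in trust.items():
                if trusted:
                    share = (1.0 - restart) * score[p] / len(trusted)
                    for q in trusted:
                        new[q] += share
                else:  # someone who trusts nobody returns their mass to `me`
                    new[me] += (1.0 - restart) * score[p]
            score = new
        return score

    # carol ranks high for alice (trusted via bob); dave's self-trust loop decays to nothing.
    print(trust_scores({"alice": ["bob"], "bob": ["carol"], "carol": [], "dave": ["dave"]}, "alice"))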


Well said - it is among my biggest annoyances with the web. Reviews are almost always packaged into best-of or top-X lists. The quality of the Wirecutter is gradually trending down but it is still the website I use to find the "best" of something. I don't have to waste time sorting through hundreds of list-spam sites.


I always liked the Wirecutter for just kind of cutting through the crap and saying “this is the one”. I wonder if we need some sort of thing for reviews where humans filter the sites down to those that are credible.

It’s a bit funny, because this was sort of done by Jason Calacanis’ Mahalo back in the day - but maybe he was just ahead of the SEO curve.


Jason Calacanis was never ahead of anything, except a few more recent idiots.

"Obvious, unactionable, or wrong" is a term of art created specifically to talk about Calacanis' advice.

"Mahalo", a Hawaiian term loosely translated as "who is this freak and why does he think our language is there to promote his shitty startup?" was the answer to the single most terrible thing VCs could identify at the time: that someone would create Wikipedia without a plan to monetize every aspect of it. And yet Calacanis did a worse job than the "original fool", Jimmy Wales, whose similar attempt at least didn't end up as a parked domain of SEO spammers.


Companies like Sweetwater do this right. They have “sales engineers” that help you find what you’re looking for over the phone or text message or email. It probably doesn’t scale but as a customer, I don’t care as it saves me so much research time and I consistently get what I’m looking for.


"SEO" spam is "Google SEO" problem. So SE ranking Optimization is not (yet) so much a problem for other Crawler/index SEs (Bing, Mojeek, Gigablast). You might say that Amazon (in eCommerce) and TRIP (in Travel) have cracked the problem of combining good/deep Content/Reviews and Category expertise with Search.

We regularly see partnership opportunities with customers interested in our API [0]. I presume Bing sees the same, though their terms are more fixed and require you to share more data. There are definitely big opportunities in other categories, which are often squandered through the naive, if understandable, choice of a scrape-and-index route.

[0] https://www.mojeek.com/support/api/


> Why not try writing a search engine specifically for some category dominated by SEO spam?

Back in the olden days, there were lots of organizations that collated high quality content from the best writers. They nurtured expert writers and paid them well. They fact-checked the content and employed diligent editors and proofreaders so it was accurate and well-written. Over the years, they'd build a reputation for reliability and trustworthiness that kept people coming back for more. If you wanted to learn about fitness, or cars, or cooking, or science, you'd find a reputable author and publisher and buy their magazines or books.

But then, in the early 2000s, the geniuses from SV "disrupted" the publishing industry and its financial model. They brought us a much better way to find content, the search engine. Because they were so much better than the old-fashioned publishers, search engines gobbled up the advertising money and became the dominant gateway to content. Publishers had to abandon expensive high-quality writing because rankings and eyeballs now mattered more than quality and trustworthiness. Instead of investing in writers, they invested in marketers and SEO specialists.

The result: worthless content, writers banging out garbage for peanuts, and useless search engines.

Two decades later, looking at the barren wasteland they had created, the SV geniuses thought: I know what we need, more search engines, but smaller ones that collate high-quality content from the best writers. There must be money in that, right?


It's been really sad to grow up and watch the cool techie optimism of the 2000s internet get sucked dry by profit motives and left to rot. The change has occurred pretty much entirely within my adult lifetime (I'm only 27, and I still remember when Google was the cool new thing on the internet).

It went from "search engines and the web will usher in a new era of wisdom and democracy" to "useful content is dying at the hands of monetization schemes, and also the internet will be the death of liberal democracy, woe unto us all" in about 15 years.


I tried creating a search engine for recipes. It works well and people like it, but the struggle is no one remembers that it exists and Google is just their default for search.

So from an individual developer perspective, it's very hard to get people to change their habits. And Google/duck/Bing is the one stop shop.

It's still out there, but I haven't worked on it much lately. I always think that if I had some good advertisers, a better UI, and a salary coming in, maybe it could take over some of Google's usage!


> I tried creating a search engine for recipes. It works well and people like it, but the struggle is no one remembers that it exists and Google is just their default for search.

Link please



Just an idea, but what about making it easier for folks to remember to use your search somehow‽

I like and use Duck Duck Go‘s !bangs [1] all the time; maybe try to add your site [2] with a memorable name. May I suggest !garlic?

[1] - https://duckduckgo.com/bang [2] - https://duckduckgo.com/newbang


Just submitted it, let's see what happens!


This is really cool! Bookmarked!

One piece of feedback: it seems to fuzz the term somewhat liberally, without telling me or offering a toggle in the UI. I searched for "Natto" and got some great results at the top (Spicy Kimchi Natto!). However, a few results down, the recipes start including "Nattu", which seems to be an Indian chicken dish, and then quickly have nothing to do with the term at all, giving me things like Tafelspitz.


Yeah, that's the typo detection in search. The key thing is that "Natto" is prioritized, so that part is working correctly!

Tweaking search can be a full-time job in itself. I'll write it down and maybe look into whether I can tweak the search so that exact matches suppress the alternatives.
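To sketch what I mean (a toy example using Python's difflib, not the site's actual code; the titles are made up): only fall back to fuzzy matching when the query has no exact hits.

    import difflib

    def search(query, titles, cutoff=0.7):
        q = query.lower()
        # exact substring hits win: no typo-style alternatives shown
        exact = [t for t in titles if q in t.lower()]
        if exact:
            return exact
        # no exact hits: fall back to titles containing a close word
        return [t for t in titles
                if difflib.get_close_matches(q, t.lower().split(), cutoff=cutoff)]

    titles = ["Spicy Kimchi Natto", "Nattu Chicken Fry", "Tafelspitz"]
    print(search("Natto", titles))   # ['Spicy Kimchi Natto'] only
    print(search("Nattoo", titles))  # typo: both Natto and Nattu titles return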


This is great. I've been juicing a lot lately, and every site you go to is full of ads and useless content, and you have to click `Jump To Recipe`.

One thing that would be helpful is to add in the ingredient amounts.


Yeah, that was part of the long term monetization plan. Paid for user accounts ($3ish/month) that allow people to clip recipes, view previews, discuss, modify, and share.

Never quite got around to it unfortunately, as I wanted more users before new features.


Have bookmarked. I shall try to remember to use it.


Always nice to see other sites using Svelte!


(1) What is it?

(2) Do you have a bang on DuckDuckGo? I'm pretty aggressive with bangs, and I suspect a lot of DDG users end up being aggressive with them as well.


Linked above. I didn't know you could just submit a random site to DDG to be included in bangs.


I think the name may be part of the problem there. Most things people want recipes for probably don't contain or go with garlic. E.g. if I wanted to make a cake, garlic is one of the last things I'd think of.


Something else that has largely disappeared is the organization of content. Now a lot of content is just thrown into a big pile, and the user is left to go fishing on their own with search engines, whose ability to search seems to be declining (e.g., Google often no longer seems to honor mandatory include/exclude search operators). Generally speaking, the result seems to be decreasing order and increasing chaos.

Of course, the massive volume of content creates a fundamental problem, but user curation and categorization on sites like YouTube would be possible, were Google to provide the software support for people to do it. Whether this and similar decisions are deliberate or accidental is likely one of those things we will never know.


> Google often seems to no longer support mandatory include/exclude search parameters

I've noticed this, and it's frustrating. I have assumed it's intentional. I am left to guess as to what a change in this behavior would accomplish.


> Because they were so much better than the old-fashioned publishers, search engines gobbled up the advertising money

No, what happened is that the publishing industry lost their monopoly. They could no longer extract monopoly rents from advertisers.

One problem with search engines is affiliate marketing. If Google de-indexed the junk affiliate sites, the web would be much less polluted with affiliate spam.


> No, what happened is that the publishing industry lost their monopoly. They could no longer extract monopoly rents from advertisers.

Publishing was not a monopoly. Google/FB are a duopoly. If publishers capture rent from advertisers, they plow it back into content, aka the thing consumers actually want. If Google/FB capture rent, they don’t provide a living wage to content creators and plow the money into buying other startups and whatever “metaverse” is.


An "industry" can, almost by definition, not be a monopoly.


It can if there are only a few big players and they're all in cahoots.


Popular HN post circa 2015: “Content Marketing Handbook“

Popular HN post in 2022: “Search engines and SEO spam”.

Inevitable popular HN post 2025: “How to avoid getting flagged when content marketing“.


I cannot find the link now, but there is a great one-page chart that shows all the categories of Craigslist that have been cloned/displaced by start-ups. Does anyone have a link to that report?


I think you're unfairly putting blame on Silicon Valley. Publishers were only able to produce high-quality content because, with no conversion metrics, advertisers were willing to overpay for placement. Tech undermined publishers' revenue, but what it revealed was that people don't actually want high-quality journalism, they want entertainment, and they're definitely not going to pay a premium for it. This was hidden behind publishers' business model.


> Publishers were only able to produce high-quality content because, with no conversion metrics, advertisers were willing to overpay for placement.

This implies that big-budget advertisers (the CPGs, like Coke and P&G) are buying Google/FB because they have better conversion metrics. That isn't true today; only SMBs and gaming companies care about conversion metrics. There are interns in LA/NY probably collectively spending millions on FB for P&G and only reporting the number of likes back to their bosses. Google and FB have never meaningfully delivered on conversions past anything like app downloads.

Tech undermined publishers' revenue because the internet cratered distribution costs. Advertising revenues for big media crashed because the eyeballs moved away, not because it was any less efficient.


> the CPGs, like Coke and P&G

Did any of these heavily buy newspaper ads before 2000? Definitely TV, possibly magazine, but newspaper? I just don't remember seeing ads for Tide in newspapers.


I thought for a moment about your reply. Do you remember the Saturday/Sunday coupon sections? Those are essentially adverts, and they're 100% consumables -- branded health, food, and cleaning products.


That's a fair point; I forgot about those. That said, the earlier comment was on conversion metrics. Unlike brand advertising, brands track conversion metrics on coupons closely.

I stand by my point that tech just exposed a bad business model. Newspapers were only viable because they were the ~only game in town.

Now what actually killed newspapers wasn't search, it was Craigslist killing the classifieds...at least according to a study Google paid for.


I guess there's also the rise of influencers in the mix here. The commoditization of publishing means content creators can more easily work independently.


Those guys covered literally nothing compared to what I can get recommendations for with “product type Reddit”. No thanks.


You may not be aware, but the written word can be used for more than product reviews.


Oh, they were just usually wrong on everything else. Fortunately, these days we have individuals debunking the nonsense. Back then, people just uncritically believed total horseshit.

The invariant has always been: find people who make falsifiable predictions and improve. Back then the pool was small and you had no choice. Now, fortunately we have a choice.


What does SV stand for?


Greed, mostly.


Silicon Valley, i.e., the California tech scene.


Silicon Valley


I've been troubled by the just plain awful results delivered by Google search over the last few years. I think these are genuinely hard problems, and ones Google is not incentivized to solve. Google wants you to click on ads at the end of the day, full stop.

Oftentimes I find myself searching for "best ($product|$thing_to_do)", which I think many other people do as well because we all want the best. Other times I'm looking for a music or book recommendation with some depth. This of course nearly always leads to SEO'd trash. There is no relevance, nor is there trust. So I, like others, use keywords like "reddit" or "forum" to get to real humans whom I trust and whose intention is not to sell via affiliate links.

These issues often lead to the need to find trust in real, human-centered recommendations that stem from real human interests and needs. I've never found an algorithmic solution to this problem. This is why I think college radio stations, or those south of the dial, end up being so, so much better. And why beer recommendations from your local brew-shop owner are better than anything you can find on the net.

I think building hand-curated search verticals would be very interesting to see. But I also think we need to build more communities which allow recommendations to be shared without an incentive to get hits via search, that aren't paid for by large corporations, and where community impact/quality _is_ incentivized. I do worry that those days may be gone and there just may not be enough folks (not in tech) willing to spend so much time online contributing to niche communities. A lot of folks spend much of their time in walled gardens like Facebook, Instagram or Twitter, so it'll be challenging to be sure.


> Oftentimes I find myself searching for "best ($product|$thing_to_do)", which I think many other people do as well because we all want the best.

I do too. I'm wondering why Google didn't invest some effort into "best X" searches. I bet they could extract such information from the web and correlate various sources. They already answer all sorts of semantic knowledge questions.


> So I, like others, use keywords like "reddit" or "forum" to get to real humans whom I trust and whose intention is not to sell via affiliate links.

And therein lies the problem. Reddit makes very little money. Forums probably make negative money nowadays. Google has decided to demonetize the organic internet and subsidizes SEO crap and AMP or whatever dumb thing their signals consider valuable. We get what we incentivize, and right now the incentives in almost all of tech are pretty atrocious.


Did forums ever make money?


Yes, I think, until they were demonetized.


What do you mean by demonetizing forums?


> I think building hand-curated search verticals would be very interesting to see.

That was my inspiration behind a side project I made a few years ago — a decentralized, hand-curated "search engine" [0]. It never got beyond the side-project stage, but I see promise in this for the future. Eventually we'll figure out that moderated, crowd-sourced curation is better than the best machine learning. The filtering capabilities have to be pretty sophisticated to make it work, though.

[0] https://github.com/emwalker/digraph


In some ways, paid search disincentivizes Google from delivering quality organic results.

The larger the gap between paid results vs organic results, the more users click the paid results.

Not sure how to solve this problem.


But paid results rarely, if ever, answer the search query better in any shape or form.

So this would end up with displeased users and bounce-backs.


> This may not just be a problem with Google but possibly also the recipe for beating Google. A startup usually has to start with a niche market. Why not try writing a search engine specifically for some category dominated by SEO spam?

> You might need to do a lot of manual spam fighting initially. That could be both the thing-that-doesn't-scale, and the thing that differentiates you by being alien to Google's DNA. (They must hate manual interventions; so inelegant).

Is he describing...Yahoo circa 1994? A manually curated directory service.


Some of the examples used in the Twitter thread Paul was referring to would be better served by a manually curated directory service with a possible addition of a search engine only surfacing content from the sites in the directory.

For health information and recipes in particular, there are only a handful of really high-quality sites that have quality content for 95% of the information most people need. I bet if you wanted to increase the coverage to 99%, that list would expand to fewer than a thousand sites. At those numbers, manually curating the information would be easily achievable.

Getting people to use your top-notch Google replacement instead of Google, however, is the hard problem.


Isn't that what Google Programmable Search is?

https://cse.google.com/cse?cx=dc408db269da4e769 (try searching for something you want a review of)

Make a search, whitelist the domains. Every time you run into a good review site, add it to the searchable list.
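For anyone who wants to script it, the same engine is also queryable through the Custom Search JSON API. A rough sketch (YOUR_API_KEY and YOUR_CX are placeholders for your own credentials and engine ID):

    import json
    import urllib.parse
    import urllib.request

    def cse_search(query, api_key="YOUR_API_KEY", cx="YOUR_CX"):
        # only the domains whitelisted in the engine's settings are searched
        params = urllib.parse.urlencode({"key": api_key, "cx": cx, "q": query})
        url = "https://www.googleapis.com/customsearch/v1?" + params
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        return [(item["title"], item["link"]) for item in data.get("items", [])]

    for title, link in cse_search("best air purifier review"):
        print(title, "->", link)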


That's all fine and dandy, but the goal isn't just to make some good sites a bit easier to find; it's to keep the top of your search results from being interspersed with, or superseded by, SEO spam. Unless I misunderstood your suggestion.


It's really hard to get people to use something other than Google. If you were to launch such a product, it would have to be so much better that people recommend it organically to other people.


That was what Google was to Yahoo/Altavista back in the day: a 10x improvement. Reading this thread, people feel enough pain to do all sorts of hacky stuff - appending 'reddit' or 'forum' to queries, blacklisting spam domains, switching search engines depending on topic. If G keeps declining and a new product does things better, the penny will drop and people will swap.

Seibel and PG see blood in the water, no doubt; they see G's market share and want to fund companies to take some of it.


It should be able to run on top of Google as a browser extension that inserts itself only when the topic allows.


And that makes me think that StumbleUpon had a similar curation ability, in that the value qualifier is how often [hopefully] real people interact with content - tracked across those using SU who agreed to allow tracking; I can't remember if sharing that was optional or not.

Gaming the system would then have to come through onboarding fake users that pretend to be real ones, mimicking real user behaviour to send that signal into the system; I'm not sure if SU ever ran into that problem, or whether it actively tried to identify and remove fake or suspicious signals from its output.

I feel a much better system is easily within reach; it's simply a matter of getting the right structure, the right foundation, and then it will quickly take off due to the quality difference. I've already figured out a design pattern that Twitter and Facebook have indoctrinated us with, making us think it is normal - keeping us blind to a more natural way of organizing and communicating, one that isn't conducive to control or ad revenue - and so extending my future plans to include a better search-directory system would fit snugly into my efforts.


SU was a great way to surface random interesting stuff. I bet most blog entries today could be picked from Twitter, even if they are unlinked.


I always used DMOZ more than Yahoo! Directory. It looks like [dmoz](https://en.wikipedia.org/wiki/DMOZ) became https://curlie.org/ which is still active.



He's right as often as he's wrong.


What has he been wrong about?


Okay. I was gonna respond with something snarky about what a crappy mod he was, or how he was a Saint and you don't deserve to worship at his functional feet. But I'll tell you what he was wrong about: He was, as a leader and a human and a mod, petty. He pied pipered himself into a sweet spot and no one would deny he's a good coder, but there the ego took off and forever left behind a skidmark. The cool exterior, the sense of self-importance, the punching down, above all the love of spreading one's revelatory wisdom to the poor little guy; you can love that sort of thing too much, and he did. Perhaps you weren't here or didn't interact with him directly. In my view he became dismissive and derogatory toward people who worshiped him (like you) once he acquired a small degree of fame.


What in the world are you babbling about? Who here does worship who?


I'm going to guess Paul Graham, who wrote the tweet. One of the founders of YCombinator and HN (just in case people who read this don't know).


I'm starting to think Yahoo circa 1994 might be better than Google today.


I wouldn't just complain about Google. Google search results mostly reflect a deeper problem with the web today. I do miss the simplicity of the 2000s.


The funny thing is that if the people who worked on spam at Google were free to talk about it, I'm sure it would become evident that they know more about spam and anti-spam efforts than anybody else in existence. It's a ridiculously hard problem, especially when people are targeting you directly. But they aren't free to talk about it, because if they did it would just give more assistance to the spammers, and make the problem worse.

I'm not saying that curated search results for particular verticals is a terrible idea (though I'm sure like anything the devil is in the details), but on the whole Google search is very, very good considering the constant assault they are under from spammers (which most other search engines are not, at least directly).


The problem isn't that Google doesn't employ these people or invest in their activities.

It's that Google has destroyed their own search results in order to continue to expand their revenue opportunities.

If Google:

- Enabled downvoting on results, like YT videos. (Has its own spam problems, just like YT)

- Allowed you to block certain domains from your search results, like YT videos. (If they added some kind of "coordinated network detection" and down-ranked domains coordinating with ones you've blocked, that'd be pretty cool).

- Allowed you to create your own custom search engines, like "Programmable Search Engine".

That would be incredibly valuable. They already have most of the tech, and they could even build a subscription service around custom search engines if they wanted; plenty of people would pay for something like that.

Anyhow, buried in there is your startup idea. Remember: your startup doesn't have to generate the same revenue or profit as the incumbent on day one to be successful.
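Until then, the domain-blocking idea is easy enough to approximate client-side, as post-processing of the results you get back. A toy sketch (the blocklist and URLs are just examples, not a real product):

    from urllib.parse import urlparse

    BLOCKED = {"pinterest.com", "gitmemory.com"}

    def filter_results(urls, blocked=BLOCKED):
        # drop any result whose host is a blocked domain or a subdomain of one
        kept = []
        for url in urls:
            host = urlparse(url).hostname or ""
            if not any(host == d or host.endswith("." + d) for d in blocked):
                kept.append(url)
        return kept

    results = [
        "https://www.pinterest.com/pin/12345/",
        "https://developer.mozilla.org/en-US/docs/Web/JavaScript",
    ]
    print(filter_results(results))  # only the MDN link survives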


The biggest perverse incentive for Google is that better search results can mean fewer clicks on ads (clicking an ad because the results are crap, or going through more pages of results, means more ad impressions). Clicks are revenue, which is much easier to optimise for.

Internal Search owners can push for better algorithms, but what if an algorithm causes revenue to fall? Are there forces strong enough within the organisation to ensure that search quality prevails?

If this is the case, the problem is existential. It can only be arrested at the very top.

https://en.wikipedia.org/wiki/Perverse_incentive


> The biggest perverse incentive for Google is that better search results can mean fewer clicks on ads

This gets close to the real root of the issue -- attention is monetizable independently of the quality of content. There would be much less incentive to create SEO spam if search engines negatively weighted pages with ads and affiliate links, and if manufacturers were barred (e.g. by the FTC) from owning or imitating reviewers.


> The biggest perverse incentive for Google is that better search results can mean fewer clicks on ads

This is also something that Google can control if competitors come along.

i.e. If a reasonable competitor comes along that is willing to sacrifice ad revenue for better search quality than Google, Google can just adjust its search quality upwards to knock them out (and then adjust it back once the competitive threat is gone).

Perverse incentives at Google are all over the place. Searching for the delivery business "Just Eat" in the UK, for instance, returns an ad for their competitor Deliveroo above the legitimate organic result for me, and I can see that Just Eat are having to pay for their own brand name just to compete. IMO this sort of behaviour is anti-competitive, borderline extortion considering Google is the de facto way of searching for a business, and shocking from a search-quality perspective (the wrong result is intentionally shown at the top because someone paid more money).


If I had to pay $10/month for good search results, I absolutely would. I think most people would. Get rid of the ads and spam, and you have a service worth a premium. The solution is to make it user-centric instead of advertiser(spammer)-centric.


Some kind of browser or extension that re-ranks and filters search results on the web.


The custom search engine is harder than you'd think.

Google's search algorithm is tuned up for searching the whole web. It turns out the heuristics you need are very different depending on the size of the collection.

When Gerard Salton was doing IR experiments with punched cards, he was working with collections of as few as 70 documents, and in that case you are going to be very concerned about recall, not precision. Maybe there is 1 relevant document, and if you miss it, you failed.

If you had 70 billion documents, you might have 10,000 relevant documents, and if you lost 60% of them you'd still have 4,000. The end user gets more results than they can sift through.
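A back-of-the-envelope version of that tradeoff (just restating the numbers above as a toy calculation):

    def relevant_retrieved(n_relevant, recall):
        # expected number of relevant documents the engine returns
        return n_relevant * recall

    # Salton-scale collection: ~1 relevant doc, 40% recall -> probably missed
    print(relevant_retrieved(1, 0.4))        # 0.4

    # web-scale collection: 10,000 relevant docs, same 40% recall -> plenty
    print(relevant_retrieved(10_000, 0.4))   # 4000.0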

Thus I always groan when I see a site using "Google Site Search", because the relevance is usually worse than you'd get with the alternatives.

Connected with that is the tuning work: Google has sufficient data to tune up a big model for everybody but true personalized search eludes them because they don't have enough data from you to tune up a model for you.


I agree with you that "true personalized search eludes them because they don't have enough data from you to tune up a model for you". That's what Larry Page said as well: "Google doesn't know what you know." His ultimate goal is an answer machine powered by AI, but that's not happening anytime soon. I think the internet search engines we are using today are primitive compared to what we will have in the future.


The problem with all of this is it would help us greatly, but it would be useless to the 99% that the internet is increasingly being designed for. Modern UI trends are becoming obsessed with removing as many options and features as possible so the dumbest humans bordering on smartest vegetables can still use the service.


And customization breaks caching.


It does not, if there are common interests and characteristics among users. Say I'm a young African-American girl who wants to learn how to code, and I query "how can an African-American girl learn coding?"; Google shows me Black Girls Code, a non-profit organization that focuses on providing technology education for African-American girls. Given that Google knows I'm an African-American girl who wants to learn coding, how many other African-American girls want the same? Probably many. So caching doesn't break customization and personalization, as long as Google knows my characteristics and interests and those of people similar to me.


It doesn’t have to, actually; there are some pretty advanced caching mechanisms that let you combine cached elements together. For the web, at least, you could do this back in the day with Server Side Includes, and a place I worked at used it to cache logged-in content.
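A minimal sketch of the idea (my own illustration, not any particular framework): cache the expensive shared fragment once, and render only the small personalized fragment per request.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def shared_fragment(page_id):
        # expensive to render, but identical for every user -> cached once
        return "<article>content for %s</article>" % page_id

    def personalized_fragment(user):
        # cheap per-user part, rendered fresh on every request
        return "<nav>logged in as %s</nav>" % user

    def render_page(page_id, user):
        return personalized_fragment(user) + shared_fragment(page_id)

    print(render_page("home", "alice"))
    print(render_page("home", "bob"))  # shared part comes from the cache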


> That would be incredibly valuable. They already have most of the tech. They could even create a subscription service around custom search engines if they really wanted. Plenty of people would find something like that incredibly valuable.

Why would they do this? Google's customers are the advertisers, not the end-users. And no one is going to pay for a search engine, it's been tried and has failed.


> And no one is going to pay for a search engine, it's been tried and has failed.

Always curious about claims like this. I certainly would pay for it, and it sounds like many other people here would as well. I'm curious whether the constraint is that there aren't enough people to actually pay for the investment the service requires, or that there aren't enough people willing to pay to meet the standard VC notion of success. We seem to have a problem with building and supplying services for niche (read: "not expressible as an integer percent of the world's population") customer bases, and I'm never sure if that's a business problem or a cultural problem.


The people most able to pay for a service like this are the people advertisers most want, because they’re the people with enough discretionary budget to spend on things like better Google search results. Allowing someone to buy their way out also reduces your attractiveness to your advertising clients.


I think you have to look at it more like Amazon Prime.

Nobody is going to pay for /just/ a search engine. But they might pay for, say, a /better/ search engine, plus additional features around gmail/gcal/gdrive.

Think of it more as subscribing "to google" and less as subscribing to "google search".

Regardless, the point isn't to "fix" google. It's to highlight a possible path for a new market entrant.

... If an existing player wanted to make a move here, I would say that both Mozilla and Apple are well positioned to add "personalized search" to a subscription service. Same with Microsoft. DDG could also make moves here if they expanded beyond search.


You only need ~1/10000th of Google's revenue to be a financially successful startup. 1/1000th and you'll have a great business, and at 1/100th you'll be somewhere between a unicorn and a decacorn.


Sure, but you'd need a better search engine if people are going to pay for it.

A company with an objectively superior search engine could make even more money with ads, so now you’re back to the beginning.


I don’t know if that follows. Google has been maximizing revenue at the expense of the search result quality.

An “objectively superior” search engine, from an end user’s perspective, might have to make engineering choices that come at the expense of ad revenue.

But we’re all just talking hypotheticals; it’d be cool to see someone launch a startup to get some answers.


"only"


If you think about it, Google provides advertisers a customized search engine to find customers. So it is not you searching the web; it is the web's advertisers searching for leads.


There is a smaller niche market. SEMrush, a tool used in the digital marketing industry, is now public with a multi-billion-dollar market cap. It originally started as a search engine. When they didn't gain traction, they used the tech to monitor Google and interface it for customers tracking their performance in search results (and much more).


Can I ask what you mean by public?

It's not open source as far as I know, and there's only the free trial way to try it.


Probably that they completed an IPO (initial _public_ offering).


How do you fight brigading, the organization of groups elsewhere to vote collectively on something? E.g., white supremacist groups get together and vote down everything by people of color, and vote up their pages about how great they are.


Randomly select which votes are actually recorded. Then add metavoting that votes on the votes, again with random sampling. At Google's scale, with sufficiently random sampling, you'd be extremely hard pressed to successfully brigade or spam the voting.

Google could easily use its current fingerprinting to constrain (to an extent) multiple votes. Even knowing only a portion of the population will participate in the voting, they can use a Wilson confidence interval [0] or similar to properly weight votes.

Random sampling works here since you're not guaranteed one vote per user per page, and the outcome is binomial: seen-and-downvoted or seen-and-not-downvoted.

[0] https://www.mikulskibartosz.name/wilson-score-in-python-exam...
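For the curious, the lower bound of the Wilson interval is straightforward to compute; a sketch (the sample numbers are made up):

    import math

    def wilson_lower_bound(downvotes, views, z=1.96):
        # lower bound of the Wilson score interval for the downvote
        # rate, at ~95% confidence by default
        if views == 0:
            return 0.0
        p = downvotes / views
        denom = 1 + z * z / views
        centre = p + z * z / (2 * views)
        margin = z * math.sqrt((p * (1 - p) + z * z / (4 * views)) / views)
        return (centre - margin) / denom

    # 8 downvotes from 20 sampled views outranks 5 from 200: the smaller
    # sample is weighted by its uncertainty, not just its raw rate
    print(wilson_lower_bound(8, 20))   # ~0.22
    print(wilson_lower_bound(5, 200))  # ~0.01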


Easy: voting blocs. You assign yourself to the results of people who vote similarly to you, and additionally there'd be local and regional blocs. I can't think of a reason the naive approach, where everyone sees everything everyone else is doing, would work in the long run. That's Twitter, and it's garbage.


This is a great point. I would think Google could rank users from low quality to high quality based on the quality of the websites they recommend or downvote. Tricky business and could be difficult to control, but it's basically the same thing they currently do for websites, extended to humans.


How does Google already handle this exact problem on YT?


They don't. There's a lot of pretty obvious manipulation that goes on in YouTube recommendations and search results.


> - Enabled downvoting on results, like YT videos. (Has its own spam problems, just like YT)

They're going in the OPPOSITE DIRECTION from this!! They recently removed all downvote visibility on YouTube videos, so now downvotes only feed into their algorithm. So in the last line of defense against me ending up watching a shitty video, one of the most valuable tools has been removed by my betters. It's preposterous that people think Google is doing a good job. They're actively getting worse, and ignoring everyone saying so.


They're doing a great job. I'm so happy dislike visibility has been removed. It removes the effectiveness of coordinated pile-on harassment, which many YouTubers have fallen victim to.


Except the ratio is visible to the video creator, so they know nonetheless. And there's the possibility of disabling the like/dislike for each video.

All the big channels I follow are mad at this change, and there's a coordinated effort to bring it back.


> Enabled downvoting on results, like YT videos.

You mean the dislike counter they just disabled to force people to sit through more low-quality content and pre-roll ads, to claim an increase in platform engagement and viewership?

The only thing that matters is revenue, and Google reported increases in acquisition costs in prior revenue reports. Expect the engagement and viewership data points to be highlighted in the earnings announcement, and a record quarter for YT coming out of the change.


> Allowed you to block certain domains from your search results

I would love for Google to build this in. Until they do, there is a WebExtension that does this: https://addons.mozilla.org/en-US/firefox/addon/hohser/ ("Block or Highlight Search Engine Results"). I use it to block stuff like W3Schools so when I search for something, MDN is always #1. Saves me a lot of time having to add "MDN" to the end of every query.


Those shitty SEO spam sites exist only to serve ads, and Google has a monopoly on internet ads. So there is no real incentive for them to solve the problem.


Google has a 28.9% share, Facebook 25.2%, and Amazon 10% and growing fast. Not a monopoly. And the incentive is there: if search results are consistently bad, people will stop searching as much, and revenue and market share decline.


Google had the same incentives in 2011, 2012 when they built and released Panda and Penguin.


Real review sites serve ads too. I don't think Google has any incentive to make things worse, and they still want people to google reviews instead of just asking friends or people on reddit for reviews.


Some kinds of "spam" can improve search results.

Things have changed in the past few years, now that Google has developed advanced transformer models, but for a long time Google's question-answering facility amounted to: "let spammers make 10^8 pages where the title is the question and the answer is in the page".

The trouble is that there's a fine line between "the answer is in the page" and word salad.


> - Enabled downvoting on results, like YT videos. (Has its own spam problems, just like YT)

Not convinced this would help. The spammers would just hire people to dislike competitors.

> - Allowed you to block certain domains from your search results

This I would use. Never show me results from Collider, WatchMojo, Ranker...

> - Allowed you to create your own custom search engines, like "Programmable Search Engine".

I think this would lead to people writing highly polarized engines - the Red Pill engine, for example - and we'd have a new problem: the proliferation of popular, highly biased results. Of course, that's not to say Google's results aren't already biased, but they are at least trying to cover everyone.


> Enabled downvoting on results, like YT videos. (Has its own spam problems, just like YT)

Are there any search engines that do this? It's a great, simple idea.


Not really that simple; I see a lot of potential for abuse - using bots and brigading to mass-downvote your competitors or political opponents.

A couple of positions up or down in Google results for somewhat popular and valuable keywords can mean a difference of thousands of dollars per day in ad or affiliate revenue. I suspect it would get pretty wild if Google launched something like this. There already are black-hat SEO methods and services, but something so simple and direct would turn it up to 11.


> I see a lot of potential for abuse

They already have the tech to fight this on YT. In theory, they are supposed to be doing the same thing to detect inauthentic behavior in ad placement and click abuse.


Downvotes could apply just to future recommendations for search results you see, and not apply to advertisements.


> Allowed you to block certain domains from your search results

Blocking Pinterest would be a dream come true.


This, 100%. In travel, we see Google constantly tweaking its algorithms, and compared to Bing, Google surfaces a ton more small, well-written travel blogs [1].

Not only that: Paul and Michael have seen plenty of startups, and at least in recent memory, the number of vertical search and consumer startups that Y Combinator has funded hasn't been that high.

As a consumer startup, I know this issue firsthand. Paul and Michael assume that if you build a better product, users will come! That's simply not true these days.

Instead, you need to:

- Build a better product

- Option 1: Figure out a channel with enough growth on an existing platform. This likely means you're doing SEO for your new search engine

- Option 2: Get your customer lifetime value high enough that you can pay for ads. This is tough, since it's a bit of a chicken-and-egg problem: most search engines are monetized with ads

As the founder of Wanderlog (YC W19; https://wanderlog.com), a consumer vacation-planning app [1], I definitely remember the idealistic days when I thought the best consumer product would win on its own! But growth doesn't just come, and the same can be said of vertical-specific search engines.

[1] Try searching "[your city] itinerary" on Google vs. Bing: it's much more likely you'll find a small blog rather than Lonely Planet or the local travel bureau as the top result


Hi! I used Wanderlog to plan a recent month-long group trip, which was definitely the most complex vacation I've had to plan. For context I am very active when traveling (e.g. multiple activities each day); so not sure how my experiences map to others.

The best part of it was (going to a foreign country) being able to find and identify all the attractions relative to each other, so I could go to cluster A on Monday, cluster B on Tuesday, etc.

The hardest part of it (and why I needed a separate Google Sheet anyway) was--once I had figured out the opening hours of different locations and the hard-to-book activities with limited reservations--the ease of moving things around more fluidly, e.g. cluster B on Monday, cluster A on Tuesday, and having a more information-dense view so I could see larger portions of the itinerary at once.

It would be cool to have an "input everything" --> "input time restrictions / unmovable things" --> "output planned activity clusters" type of workflow.


[1]: both signed in, but with the profile image removed

Bing: https://i.judge.sh/ShareX/2022/01/www.bing.com_search_q%3Dat...

Google: https://i.judge.sh/ShareX/2022/01/www.google.com_search_q%3D...

Interestingly Google didn't have a top-result ad and the google.com/travel carousel is 4th from the bottom.

For the actual results, both thefearlessforeigner.com and paigemindsthegap.com seem to be actual travel blogs (the pictures didn't appear in a reverse image search, so they are probably organic), but they're clearly geared towards being a 'faq' for visiting the city and have affiliate links where appropriate. Bing went straight for discoveratlanta.com, and frommers.com is well-thought-out but not a personal travel blog.


>> - Option 2: Get your customer lifetime value high enough so you can pay for ads. This is tough, since it's a bit of a chicken and the egg problem since most search engines are monetized with ads

nonoonononooonono. No. Don't monetize anything for the first 10 years. That's the only way it can work. Then you can monetize it, buy an island, and not give a shit if you destroy what you created.

Oh, but don't worry. You'll have investors.


Also, I'd be very surprised if they didn't have tens of thousands of workers aiding in spam review already.

The hard part in all of this isn't finding and stopping spam - it's defining what spam is. Are all the pie recipes with a 2,000-word essay about grandma at the top "spam"? They still have the recipe, and Google Home devices pick up the recipe instructions just fine so people end up not reading the essay, but many people would still consider it spam since it adds such an obstacle to getting the information you want. The same goes for CNET articles like "Best smart home devices to buy in 2022" - it's a reputable brand with a list of smart home devices, but it's hardly a review and exists to funnel people to Amazon affiliate links.


> The hard part in all of this isn't finding and stopping spam - it's defining what spam is.

This is one area where Google could use personalised results to provide a better experience for the user. Let me decide what spam is for me. Let me mark results as good or bad, so that the algorithm knows what kind of pages should be prioritised or filtered out the next time. Google SearchWiki was a step towards this but they killed it off.
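To illustrate (a toy sketch of my own, not how SearchWiki actually worked): keep a per-user score for each domain from the good/bad marks, and re-order results with it.

    from collections import defaultdict
    from urllib.parse import urlparse

    user_scores = defaultdict(int)  # domain -> my personal score

    def mark(url, good):
        user_scores[urlparse(url).hostname] += 1 if good else -1

    def rerank(urls):
        # stable sort: my preferred domains float up, marked-bad ones sink
        return sorted(urls, key=lambda u: -user_scores[urlparse(u).hostname])

    mark("https://www.seriouseats.com/some-recipe", good=True)
    mark("https://spamfarm.example/10-best", good=False)
    print(rerank([
        "https://spamfarm.example/10-best",
        "https://www.seriouseats.com/some-recipe",
    ]))  # the Serious Eats link comes first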


Is conservative-leaning info spam or not spam? What about liberal-leaning info?

We have seen what this leads to inside the social networks as well as YouTube, and at a macro scale I think we might want to have a shared concept of what constitutes a good search result for a given query.

At the micro scale, it can seem more optimal to get exactly the type of result you want, but if we take an absurd example like an apple pie recipe, shouldn't we all have a shared understanding of what types of ingredients make an apple pie?

The shared understanding, I believe, is core to communication. If we each have our own specific idea of apple pie, then who is actually right about what an apple pie really is? What happens when your search results insist that an apple pie doesn't actually have apples in it, but pears instead?


Let's have niches where the content is hand curated by human beings instead of pure statistics by machines.

Hmm, why stop there? Let's actually make the users do the curating, and even the content creation, by rewarding them with social validation. Let’s have hard-working moderators who work on the community full time.

Then we could just build a search engine over it. We could call it Reddit. Or HackerNews.

Maybe the users aren't all as good as professionals at curating the information. So let's hire professionally trained curators, pay them well, and call them newspapers. Then we can come in, disrupt them, and replace them with an algorithmic marketplace that eventually becomes infested with clickbait.


AFAIK the 2,000-word essays in recipes are Google's fault - it prioritizes pages with a lot of content, so you have to add that junk to the top in order to rank highly. While I'm sure there's more going on behind the scenes than I'm aware of, it does seem like the rules could be altered on a category-specific basis where a lot of text isn't necessarily a positive.


This reminds me of the page inflation that struck tech books during the late 1990s / early aughts. The Marketing Wisdom was that fat books sold (or took up more shelf space), so texts got padded with weak writing, gratuitous puffery, and other elements which (much like the recipe essays) simply got in the way of delivering actual informative content.

(The fact that many of these books were rushed out with very poor quality control also didn't help.)


Recipe intro text is useful for contextualizing the recipe and for copyright purposes. In the RSS days, it was a way to get readers to click through so the author got the ad views. Also, people who write recipes like to write about food.


I'd say it can be useful, but that's not often the case (especially not to the tune of 1000+ words).


This one is hard, because it does actually seem to be the case that the cruft around the recipe is valuable if the content is right. Most recipe-blog filler is garbage, but if you look at YouTube, it's clear that creators who add extra flair around the recipe are a powerful force.


"Prioritizes" is correct, but in some ways it's not the best descriptor.

Google's algos, while advanced, still rely a ton on text to tell what a page is about. They need it.

If they relied only on other factors (title, links, website, etc.), they would end up with worse results for users. I'm sure they've tested it.

Google's core algo in a lot of ways is much simpler than people think (in other ways of course it's very complex).


While I think the essays are excessive, I appreciate that some of them document that the blogger actually made the recipe, with progress pictures. With the more basic recipe websites, I wonder if anyone's actually made the dish before, or if the recipe is from some scraped database of unknown origin and quality.


Years ago I wanted to pursue micro-blogging, but this "feature" of Google search stopped me.

What's the point of writing succinct, to-the-point mini articles about problems and solutions if nobody finds them on Google?


This is largely because micro-blogging means less content per page: you could write five 300-word blog posts instead of one 1,500-word post.

I've been blogging for the last 10+ years, many of them spent as a freelancer working with startups/brands/editorials. Everyone is after "word count", and I absolutely hate it.

Whenever I work on articles for my own blog, I just don't consider word count at all. I think if your content is great and informative, readership will follow naturally.


This is a very interesting approach. Do you collect traffic data on your blog?


I collect post views, but not with Google Analytics or anything like that. I built a pretty substantial developer blog (tips, resources, etc.) back in 2014. I think it peaked at around 350,000 monthly visitors after 12 months.

Later on, I sold it because I needed the money, not so much because I didn't want to keep working on it. Unfortunately, the new owners didn't have any idea how to maintain a "healthy" content blog, and it has plummeted to around 30,000 monthly visitors. All the content they're publishing now is thin, headline-clickbait bullshit.

I even gave them free advice on how to fix it, but I think a lot of people just don't care and will mindlessly pump out as many pieces of content as possible. Such blogs can be identified from a mile away.

And therein lies the problem with Google SEO at the moment. Even I, someone who has done SEO work for more than a decade, can see that results are getting worse. In some niches, the same crappy articles that dominated 6-7 years ago are still dominant today.

I guess we're stuck in time, or so Google thinks.


Could it also be due to a reduction in public interest in blogs over the past few years? Most stuff is now published in the form of vlogs instead of blogs. I do miss the good old blog era, though, and I wish there were still high-quality blogs around.


It's two-fold. Google prioritizing pages with a lot of content is one thing, but longer content also means more space for ads, more scroll events to trigger ads, etc.

Incidentally, prioritizing long content seems odd to me; in my experience the best pages are short and get right to the point, at least in the context of something like a recipe or other how-to resources.


Yeah, the newest nuisance seems to be sites that clone GitHub Issues and StackOverflow with a crappier interface. Somehow they rank higher than the original sources. I'd call it spam, but it's definitely not traditional spam.


I'm not going to say solving spam programmatically is easy, but the gitmemory garbage site (for one example) has been around long enough that there's no excuse for not downranking or removing it. How hard could it possibly be for humans to spot these few sites and nuke 'em? I'm sure Google engineers see them all the time.


And the strange Wikipedia mirrors that are shown in Google Verbatim searches instead of the original. If I disable Verbatim, they disappear and I get regular Wikipedia instead.


Somehow you got downvoted by their creators here :)


> and Google Home devices pick up the recipe instructions just fine so people end up not reading it

I think this isn't entirely related, but that's perhaps the beginning of a bias: assuming everyone experiences technology the same way as it marches on. I've yet to encounter a Google Home in the wild; I imagine far more people are consuming recipes on phones, tablets and PCs.


Compounding the problem, the 2,000-word essay is sometimes really useful if it's describing a technique used in the recipe (cf. Stella Parks' recipe for homemade bagels on Serious Eats: https://www.seriouseats.com/homemade-bagels-recipe). But somehow only spammy blogs with plagiarized recipes, AI-generated "essays," and affiliate links for every ingredient and tool make it onto the first page of results on Google (or DDG, for that matter).

At some point, Google must have moved away from using site-level reputation in search rankings, as I almost never see recipes from reputable sources like King Arthur Baking, Serious Eats, or Food52 in the first page of results.


Your point is good, but I'm not sure I'd say "very good" given how easily the same SEO spam domains stay at the top of search results for ages simply by scraping someone else's content. What I'd be most interested in knowing is how their success metrics are defined - for example, how much of a problem does Google's management consider it if someone searches, finds the answer they were looking for on someone's Stack Overflow rip-off, and stops searching? I could easily believe that a significant amount of what we're seeing is that they're focused on some kind of user-frustration metric which doesn't account for things like damage to other businesses.


Yes, I've noticed this particularly with technical results. A lot of sites seem to have scraped StackOverflow and GitHub issues, put a crappy ad-loaded interface around them, and somehow outrank the original SO/GitHub content.

It's like the bad old days of ExpertsExchange, which somehow was never delisted by Google for its shady SEO tactics.


You just have to look at Google's profit motive here. Their motive isn't to provide quality search results; it's to show users ads, either in the search results themselves or on the destination sites via their ad network. The SEO spam sites aren't a bug; they are a feature of Google's profit algorithm. Google's search quality will never improve so long as their motivation is to show you ads. Why should it? Competition may help here, either from an outsider like the OP suggests, or by breaking Google up with antitrust enforcement, or both (my preference).

As a user, your best personal and ethical move is to install an ad-blocker, to make ad-based business models less viable, which will help promote business models that don't abuse the customer.


The core problem, I guess, is that search engines view all their results as ads. That’s why they got into the ad business in the first place.


>" The core problem, I guess, is that search engines view all their results as ads. That’s why they got into the ad business in the first place. "

This seems a bit overly cynical. Some search engines only served ads, but they're long gone. The survivors are those who dedicated themselves to finding links which were responsive to people's search intent. They seem to have gotten into ads because it was the best business model in this market.


> It's like the bad-old-days of ExpertsExchange, which somehow was never delisted by Google for its shady SEO tactics.

This is really what made me suspect that Google is teetering on the edge of the MBA death spiral: these problems run for years when they'd be easy to block, which suggests that whatever metric gets you a bonus or promotion doesn't account for long-term threats to the core business, even when they sell a lot of ads short-term.


> A lot of sites seem to have scraped StackOverflow and GitHub issues, put a crappy ad-loaded interface around them, and somehow out-rank the original SO/GitHub content.

Some even made slideshows of SO screen captures and put them on YouTube, with a fake video or spoken intro to make it seem like actual content will be discussed... A number of shameless people will go to any length to grab bits of money anywhere and anyhow, and I've hit those links a couple of times.


They outrank the original content because Google is corrupt.


I said this in a similar thread yesterday, but I think this is an unsolvable problem because much of the content either no longer exists in website form or is old.

To put it simply, the new generation of the people who used to make reliable niche websites - the ones that not just answered your questions but also helped you learn a topic - has moved to YouTube instead.

Google search is hollowing out as a result: the meat is going, leaving SEO'd fluff that kinda answers the question - but ONLY the direct question being asked, with none of the wider expertise that more educated writers used to bring to what you were searching for.

Of course, Google owns YouTube as well... so perhaps they just see it as an inevitable transition.


Just a note on that: YouTube search is finally getting better. Yesterday I noticed it was able to find keywords in the middle of a lecture that appeared nowhere in the title or comments. I always wonder about their AI transcription service; it's gotten so good. If they're storing all that audio as text, I guess their search is going to get excellent?


Is that...essentially a Proof-of-Work system...


The problem, IMO, might be the monoculture we have around search. Because Google is so big, it's enough for spammers to target it alone and they get the vast majority of search visibility. If we had better, more diverse competition, spamming would become a tradeoff: competing engines would presumably use diverse criteria, so you would probably not be the top result on _all_ of them. SEO spam needs upkeep and attention to the latest algos, else it decays. Competing algos would yield better results for everyone. Maybe Google is just ripe for a shakeup.


Doesn't your model predict that Bing would have substantially less SEO-gamed results?

(Disclosure: I work at Google, but not on search)


Well... yes, it should. But no, it seems it does not. I thought about this when typing it but said it anyway, maybe because I think there is still something worthwhile there.

I still think the model could work if the algorithm were sufficiently different from Google's. Ideally, people would go "I did not find anything I cared about on Google; I know, I'll use Bing!" - but nobody does this, because the results are consistently worse.

Don't get me wrong, I like G as a company; I think they do worthwhile things! But they have let things slip and need competition in this field - I mean real competition - then maybe they would actually address these issues.

Maybe the issue is at the incentive level as well. More searches mean more eyeballs and more money for Google. If someone searches once and is done, that is less interaction! I hope they don't work like this, but it's possible.

And another possible problem is the opposite: maybe Google is optimizing search for what it thinks people want, but using the wrong metric. Or it gives people what they want but not what they need.


This x100000.

There is no scenario - none - where thousands of engineers at Google working on search wake up in the morning and say, "We sure have made it good enough with respect to spam. I think I'll have another Danish."


I agree with this and the grandparent comment wholeheartedly. That said, there's a kind of institutional blindness that can build up in companies—especially ones that dominate their sector. It may have roots in intransigent upper management, ossified and inflexible process, wide-scale burnout, a culture of passing the buck, or any number of other pathologies.

I don't claim that Google has any of these, and I certainly have no insight into their search group. But I've personally been at powerful companies with best-of-the-best talent that were blind to the decay in their own living room, so I would caution against immediately dismissing PG's take.


Especially since "made a change that improved search result relevance by X%" is an extremely compelling story for promotions. If indeed there is a launch-driven culture for promos at Google then there'd be extra incentive for new mechanisms to reduce low quality search results.


When the cafes were open, you can bet they said, "I'll have another Danish, and then get back to work on this problem that never seems to go away."


I sure wish I had problems that were totally unsolvable, they are so easy to measure progress on. /sarcasm

I think it’s more likely that they are just building hundreds of tiny tweak experiments, and it’s someone else who decides what to build and whether it even worked. Search quality is such a meta-problem that there's no real hope of working on it in anything beyond a piecemeal, trial-and-error fashion on their dataset.


What is a danish?


A breakfast pastry something like a donut.

https://en.wikipedia.org/wiki/Danish_pastry


The curated search results business model doesn't work. Google gives "aggregators" and other search engines the death sentence for organic search traffic from economically meaningful queries, so you'd get no free traffic. This is one of the major antitrust complaints against Google in the EU. Since you get no organic search traffic, you need to build a brand using advertising, and once you start down that road you need to monetize the first click which compromises the quality of your site.


> This is one of the major antitrust complaints against Google in the EU.

The complaints I've read are from exactly the kind of generated content farms people are complaining about in this thread.


I'm sure they know all about it, but are prevented from doing anything by the business model. Pinterest has been spamming up my search results for years. Maybe other people find it helpful, but I do not. It's obvious I am never going to get value from Pinterest. Let me click a button to add it to my block list. One single click would have given me years of massively improved results.

The fact that this feature does not exist shows that there is something deep within Google's core that is preventing them from addressing SEO spam, just like there is something deep within Airbnb that makes it difficult to filter out Airbnbs with problem reviews.

Google has been coasting for a good long time and now major players are realizing they are wide open for disruption.


> But they aren't free to talk about it, because if they did it would just give more assistance to the spammers, and make the problem worse.

The reality is more that some Google engineer will come up with an algorithm change that makes the result 40% better, but it will come at the expense of making that search 3ms slower so the change won't get merged. Or it will make the results worse for some niche set of queries that the business team really cares about, so again it won't get merged.

There are lots of consumers who would gladly pay $1 a month or whatever in order to use a couple extra milliseconds of compute power per search in exchange for drastically better results, so there is lots of room for a startup to compete.


> There are lots of consumers who would gladly pay $1 a month or whatever in order to use a couple extra milliseconds of compute power per search in exchange for drastically better results

Google has a paid-for Search API, so they could do that if they chose to pursue it. And then they could let Google One users opt-in to the same thing via ordinary Search. I'm not sure whether Bing has anything equivalent.


I think the problem is just that the solution isn't in Google's wheelhouse: There is no algorithmic ranking system that can't be gamed. Human moderation and curation is the only way to provide true quality, and Google is allergic to solutions that don't automate and scale.

I think a really good search engine would still algorithmically search its index, but the content library should be human-curated, with a goal of ingesting content via author, not via platform. Once a given author was human-approved as a quality source of information, content they produce could be automatically ingested going forwards, and conditionally re-reviewed by a human if there were reports the quality had decreased.


This was Yahoo in the late '90s and early 2000s. They had a human-curated directory search where one could look up something like "kayaking" and find a bunch of sites on kayaking. Then if you wanted to search by keyword, it was outsourced to AltaVista and later Google. AltaVista results were terrible, almost nothing more than a keyword match (i.e. the word you were searching for appeared on the page). Google got much better at general search, and the rest was history.

I think the death of the directory search dramatically dropped the number of self-curated, informative sites from a domain expert that were common in the early internet. Now instead of making a website, many people are on content silos like Reddit/FB


I do still think we could adapt this model on top of content silos... assuming we can index them! Consider that rather than ingesting all of Reddit, one could ingest new posts only from particular users who write quality posts there.

Assuming a method also existed for an author to authenticate themselves with the search engine, one could also enable an author to help identify their content across multiple platforms, as well as suggest other quality authors to consider.


> The funny thing is that if the people who worked on spam at Google were free to talk about it, I'm sure it would become evident that they know more about spam and anti-spam efforts than anybody else in existence.

That may be true, but I think one of the good points made on the OP is that it might actually be cultural constraints that keep them from solving the problem:

https://twitter.com/paulg/status/1477761335412809729:

> You might need to do a lot of manual spam fighting initially. That could be both the thing-that-doesn't-scale, and the thing that differentiates you by being alien to Google's DNA. (They must hate manual interventions; so inelegant).

Google has some very smart and knowledgeable people, but the things they do have to fit into certain boxes, which means there are some problems they just can't fix, e.g.

* Everything has to be automated at scale, which leads to consistent poor user experience (unappealable account closures initiated by inscrutable algorithms, SEO spam).

* You get promoted by building new products, not maintaining existing ones, which leads to self-defeating churn outside of core areas (e.g. abandoning Google Talk and squandering their position in the messenger market).

* etc.


I understood PG's point differently. My understanding is that he is suggesting an angle of attack in which carefully crafted manual reviews (that do not scale) can be used to bootstrap a product that does scale thanks to something else (e.g. collaborative filtering). All of this being in a niche domain where you can drive a wedge into the mediocre performance of Google (online shopping probably being the worst possible choice, but there are many others).


But why is Google even dealing with spam? What if they (or someone else) curated top websites for a given category? For instance, when I search for a programming-related term, I already know that I want to see the answer on either Stack Overflow or one of a few reference documentation sites. It is possible that some other site could have the answer instead, but in practice the random sites that often show up at the top of the results are usually SEO spam. A search engine that figured out or let you select the semantic space you are in and then promoted known websites - maybe ones you curate yourself! - would be a big improvement.

Of course you can always hardcode the site you want in the Google search results but this is hacky and not very expressive.
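
To make that concrete, here's a minimal sketch of per-topic domain curation as a re-ranking layer; the topics, domain lists, and function names are all invented for illustration, not any real search API:

    # Toy re-ranker: stable-sort results so user-curated domains
    # for the active topic float to the top. All data is hypothetical.
    from urllib.parse import urlparse

    CURATED = {
        "programming": {"stackoverflow.com", "docs.python.org"},
        "cooking": {"allrecipes.com", "seriouseats.com"},
    }

    def domain(url):
        host = urlparse(url).hostname or ""
        return host[4:] if host.startswith("www.") else host

    def rerank(urls, topic):
        trusted = CURATED.get(topic, set())
        # False sorts before True, so trusted domains come first;
        # sorted() is stable, so original order is kept otherwise.
        return sorted(urls, key=lambda u: domain(u) not in trusted)

    hits = ["https://seo-farm.example/python-lists",
            "https://stackoverflow.com/q/123"]
    print(rerank(hits, "programming")[0])  # the Stack Overflow result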


Legitimate sites could help a lot by adding machine-readable descriptions of their content, per the schema.org spec. The richness of these descriptions means that this is effectively a "hard", non-forgeable claim to being a worthwhile, non-spam source (quite unlike the old META tags that got abused to death pre-Google). Of course spam sites could simply lie in their schema.org tags, but the lies are easy to spot (with combined machine- and human-review) and then they just get banned. It makes it a lot harder (and hopefully infeasible) to SEO-spam by just copying random content.
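
As a rough sketch of the machine-review side, here's a toy JSON-LD extractor with one trivial consistency check; the HTML, the check, and the names are invented, and a real pipeline would cross-check the structured claims against the page text itself:

    # Pull schema.org JSON-LD blocks out of a page and sanity-check them.
    import json, re

    HTML = """<html><head>
    <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "Review",
     "itemReviewed": {"@type": "Product", "name": "Miter Saw 3000"},
     "author": {"@type": "Person", "name": "Jane Doe"}}
    </script></head><body>...</body></html>"""

    def extract_jsonld(html):
        pattern = r'<script type="application/ld\+json">(.*?)</script>'
        return [json.loads(m) for m in re.findall(pattern, html, re.DOTALL)]

    for block in extract_jsonld(HTML):
        # Trivial check: Review markup with no named author looks suspicious.
        author = block.get("author", {}).get("name")
        if block.get("@type") == "Review" and not author:
            print("flag for human review: anonymous Review markup")
        else:
            print("found", block.get("@type"), "by", author)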


A lot of what counts as spam these days isn't something like "I search for bicycle reviews and get penis enlargement pills", it's more like "I search for bicycle reviews and get some blog who searched Amazon for the 5 most popular bikes and posted links to them with a little blurb and called it a 'Review'".

These sorts of things are easy to spot, but only if you actually have a basic amount of familiarity with the topic. It's hard to spot with "AI" or super-cheap labor.


You talk about this "constant assault from spammers" like it's not Google's fault and it's an intractable problem. That is not a correct characterization. There is plenty of low-hanging fruit that could easily be detected and deranked, for instance scraped Stack Overflow spam. But Google chooses not to deprioritize these results. The reason they don't is that they make money on ad clicks, which many responses have already elaborated on.


The search results have markedly worsened in the last 5 years. Why could they keep up with SEO spam until 5 years ago, but now they can't? Their revenue has been growing dramatically, so they could proportionally increase the allocation. It's probably because the focus of their HR/changing workforce is now elsewhere: maybe fighting "disinformation", both COVID and political. Those efforts were non-existent 5 years ago.


I think it is also no longer in their interest. If you look at their mobile results now, there are sometimes no web page results at all, just ads and their automatically extracted data. So it is now in their interest for search to be bad for non-advertisers. Eventually people will consider those results junk and just use the Google-extracted data and the people who paid to go up.


Most comments focus on the technical side of things, whereas I'm sure there are also legal restrictions involved in this. If Google delists a website on the grounds that it's a copycat of stack overflow, or because they have low quality content according to Google's taste, there might be lawsuits filed against Google, claiming that the company is discriminating.


In which countr[y|ies] does Google not have the discretion to decide that certain sites / pages / etc. "belong considerably further down" in Google's search results pages? Seems to me that sorting the search results to #1, #2, #3, etc. is pretty well baked into their basic product.


I think the issue is that these crappy results are kind of good for revenue. It’s not just organic results impacted but all the affiliate ads.

Google is smart so I assume they crunched the numbers and figured out they make more money from people filtering through crappy results that include viewing and clicking ads than by surfacing good content.

I think Google is optimizing for ad revenue, not for good search.


I agree it’s a hard problem. I don’t agree it’s “really really good”. I regularly encounter obviously scammy websites. With Google’s JS execution capabilities I’d assume they can detect that. I’m talking about the VPN-install pop-ups and so on. Right now there’s a whole bunch of github.io-hosted sites doing that. It’s not even porn. It’s home decoration stuff.


> I'm sure it would become evident that they know more about spam and anti-spam efforts than anybody else in existence

Really?

I can point you to Hard Problems that have been solved better at little startups than at Google - or, indeed, at any other bigco. That's why acquisitions happen.

Why does Google having 1000 engineers working on a problem automatically mean they are the smartest?


Spam, yes, but Google has also made meaningful shifts that are clearly directed from the top-down. It's much harder now (imo) to get specific results, they've overall started looping SERPs into broad answers.

This is def a user-engagement strategy -- but it has cons as well.

Some of the complaints in the thread were spam related; others were something deeper.


I don’t doubt it is hard, but I’m pretty much forced to sign into Google now, so just let me rate results and ban domains again, etc. You will solve the SEO problem really quickly and start giving me results I want.


Highly OT, but if a technical person (not at a managerial level) involved in tackling spam at Google were to leave the team, are they allowed to work on the similar problem space at a different company?


I agree. You can say a lot of bad things about Google, but they definitely have some of the smartest and highest paid engineers working on their search. Plus there are already a lot of people trying to compete with Google and so far, no one seems to provide consistently better results.

The only advantage a startup might have is that they could do completely new concepts, such as specifying what area you search in, allow you to modify their classification of your query and/or moderating sites you include - which is probably necessary anyway, since you'll hardly have the budget to fully index the web. I'm not saying it's impossible, but it's not going to be easy at all.

And after all of that, you still need a way to make some money.


Really? I don't think the bureaucratic bloat at Google cares, and the original authors of the search engine in its current incarnation are probably long gone. It is in maintenance mode and I don't think they dare touch too much.

It takes time and effort to build up a spam site's ranking but it is trivial to blacklist those who get to the top.


A lot of people forget that one of the inputs to the Google ranking algorithm is input from human quality raters, who work off of an extensive 172-page guide that Google publishes and updates for anyone to read: https://static.googleusercontent.com/media/guidelines.raterh...


Apparently the "human quality raters" never found the sites reported in this thread.


How hard is spam, really, if you're Google? Here's what I would do as a heuristic (uh, not evil?): We know everything about you and everywhere you've visited and everyone you've talked to in the last 60 days. We know all their phone numbers and email addresses. We even know the phone number of the girl you met at the bar, who didn't give you her number. So if any of those people email you, we'll categorize that as "not spam". Also, if it's your boss or a coworker, "not spam". If it's a major company that's existed for more than ten years, not spam. Everyone else, spam. Done.

This is hyperbolic, right? But they can solve spam in a split second, if they just admit they're watching you all the time.

[edit] /s thx for reading to the end, folks.


    Why not try writing a search engine specifically
    for some category dominated by SEO spam?
I like to compare search engine results and wrote this tool to make it easy:

https://www.gnod.com/search

There are in fact many vertical search engines. You can click on "more engines" to see the whole list.


Okay that's sort of what the !bang in DDG is for, and why it's a meta-engine. What's the blue sky ideal for a real, no-bullshit, everything search engine that doesn't fall prey to the constant flood of garbage?

I have an exterminator who comes to my house every couple months, and sets up traps here, poison there. I don't have any rats in my house. I do see rats running across the yard sometimes. The exterminator explains it like this: Rat pressure. The rats overpopulate and there's "pressure" (like, uh, "memory pressure", which is also a fluid concept) so they try to get into your house more, through smaller holes, as a function of how much outside drama is going on, how many they are and how overpopulated, how scarce their food supply is, how cold it is outside, and whatever else drives rats into your house. (I love the dude who's my exterminator).

Anyway, this is the same problem every search engine faces. The more surface area they expose, the more pressure they have building, the more ways people have to fake out their systems.

We have to go back to the 1990s Yahoo! model. Curated content. A list of websites that are reputable. 1990s Yahoo is the future.


Creating a "trustless" search crawler, where anybody can participate, and then applying an algorithm to determine trust or value feels like it'd be a never-ending arms race - that'd require AI and extensive/expensive resources that is likely better invested in developing real trust networks and curation; curators are corruptible and regulatory capture of policy is possible if the organization is infiltrated or poorly overseen.

Carte blanche opening your system up to anyone to inject data seems like the wrong foot to start off on. Better to curate a moderator: someone I personally know and feel good about, trust to whatever level, and hire - ideally someone you respect and who respects you. Pay them well, and at scale it will be able to pay for itself. This did just bring to mind big pharma and the structure of pharmaceutical trials, and how that system can be/is/has been captured - so perhaps the pressures when dealing with multi-billion-dollar market categories will always lead to shenanigans if you centralize too much, without de-risking and broader resource distribution via sales/profits to more parties than the "5-star" rated products.

In a thread on HN, I think it was yesterday, a few people posted about review sites where some product reviews are free - but others you have to pay for. A system to facilitate such organizations could allow a highly competitive environment, where organizations develop a brand - build trust for their brand as being competent and thorough - so that over the lifetime of a reader/customer, perhaps they'll spend $1,000 buying reviews (say 333 big purchases over 40 years that you're willing to pay $3 a hit for) to make sure they're buying the right thing. Mind you, there will be organizations that could be captured to, say, promote one conglomerate of products over another, perhaps even regionally, but I'm beginning to think it's a necessary layer to combat the shit show that is Amazon (et al) reviews. Ideally these systems - how the reviews present the information, and how thorough the technical depth, breadth, and testing are - will help educate those who dive into using them, which will sharpen readers while keeping reviewers on their toes, arguably strengthening their organizations and competency as well.


> Creating a "trustless" search crawler, where anybody can participate, and then applying an algorithm to determine trust or value feels like it'd be a never-ending arms race - that'd require AI and extensive/expensive resources

Not necessarily: https://yacy.net


Legitimate question: if this is a real business model (and I believe it could be), then why the fuck does Yahoo.com look like a dead clickbait aggregator instead of, y'know, what it used to look like? i.e. FINANCE [Stocks] [etc.] ENTERTAINMENT [Movies] [TV]. Where's that site?


Visionary leadership left the company? And arguably it lost its soul and excitement. When Marissa Mayer was brought in as CEO and announced they'd bought Tumblr for $1.1 billion in cash, I genuinely thought she could actually turn Yahoo! around - that she understood platforms and holistic systems; perhaps she did, but her hands were tied, and then they made terrible decisions like banning porn on Tumblr - so bureaucracy, politics, and arguably the ad industrial complex and "mainstream" pressures (perhaps like the billing/financial system being used as a tool to suppress freedom/sexuality/porn, because suppressing it politically hasn't been successful) were pressures she or the Board of Directors couldn't counter.

Then Google came along and was a better search engine, for a time, and was a traffic leak for Yahoo! - and then Google has now devolved. I also thought Google had a good shot at competing with Facebook, but whoever's pulling the strings there, judging by the launches of their various platforms, doesn't seem to understand it can take 5-10 years after the MVP of a product is launched for it to mature - for whatever reason their executives or managers haven't been comfortable pulling the trigger, arguably because anyone with that entrepreneurial spirit just takes their idea, gets funding, and owns a large portion of whatever they've done; but then you can never develop a full-breadth, holistic ecosystem that can grow into every crevice, as broadly or as nuanced as possible - so they're stuck being Search, Gmail, Calendar, etc.

I'm quite certain I've figured out the foundational MVP facilitating an "infinitely" growing system, one that would allow 3rd parties to integrate; however I have severe chronic pain that messes up my executive function, so it's difficult for me to actually self-direct and execute - I'm stuck mostly in a low-activity, stream-of-consciousness, go-with-the-flow life of routine - otherwise I would try to launch my plans, which I've done plenty of UX/UI for, as that is simple enough that it somehow bypasses higher executive function (moving a pixel and then responding to the visual feel of it isn't complex) - but organizing to turn that into adequate specs, to get solid estimates or fixed-price quotes for work, is extremely difficult for me.

On January 11th I do have a surgery that may or may not reduce my pain by 50%+, may or may not improve my executive function... I've even attempted to write draft "Show HN:" posts to explain what I am doing, the starting feature sets, the reasoning behind the design decisions I've made - but it just gets too complicated too quickly for me mentally to organize further or polish it. I think I have the perfect domain name for it too: ENGN (engine). What makes me smile every time I notice it in my layout/mockup is that the search input box says "Search ENGN". My username on HN is actually an older incarnation of a plan I had, loceng being a short form of "local engine" - and ENGN coming from engine, a name I brainstormed after Tumblr sold for $1.1 billion, when I realized I'd eventually want to try my "local engine" idea but that the name was too long for a brand. Fortunately engn.com was for sale at the time - I can't remember if it was $2,000 USD or $4,000 USD - either way, not a bad price for a 4-letter .com that's pronounceable to something with meaning.

I've wanted to write a book too - on health, health systems, and on these systems we're talking about here. I'm 38 now and I taught myself to program when I was 11, learned SEO at 15, evolved to design as I'm more creative and programming became mind-numbing to me, and eventually thought I'd need (or want) VC money - so I started engaging on Fred Wilson of USV.com's blog, AVC.com - so I have plenty of self-taught experience. The problem is, even going back to my shorter or longer writings, or comments of mine on HN or elsewhere, it's nearly impossible for me to do the organization of it all - to compile parts, etc.

Maybe this surgery goes well and I can begin to do more, or maybe it doesn't; I've tried to hire people or get help over the years but 1) no one has been willing to engage enough as I'd need due to my executive dysfunction, and maybe that's a moot point as 2) it's extremely difficult for me to even manage someone or an ongoing project - whereas I could explain things and direct if people are initiating, if others are directing the conversation, then I could respond - but otherwise I can't do normal oversight and management any longer.

The most accurate odds I can give that this surgery will help (piriformis syndrome: my sciatic nerve goes through the piriformis muscle rather than around it, so there's constant compression that's worsened with use/engagement of the muscle) is 50/50. There's a high probability that this surgery isn't related to the primary source of my pain, which is from LASIK eye surgery I did 7 years ago - I got arguably the worst of the worst symptoms: central sensitization and hyperalgesia, a hypersensitivity to pain where all sensations, pain especially, are amplified to what seems the maximum possible; and why I must highly limit my activity level, as any little stress on the body, likely even normal natural muscle use which causes micro-tears, compounds the problem and takes many days of very low activity to return to a still difficult, dysfunctional baseline. But perhaps the sciatic nerve having been compressed for most of my life was a level of pain/sensitization my mind and nervous system could handle - and the damage to the cornea that happens in 100% of LASIK surgeries was what finally broke the camel's back.

If this surgery doesn't go well, doesn't help - which it took me 1.5 years to even find a surgeon who does this type of specialized surgery - then I'm afraid I may end my life because this pain, the lack of productivity, of being stuck, of quite little social interaction overall - HN is likely the most stimulated my mind gets, only possible to write this fluidly when it's been at least 3 days of eating a very low inflammatory diet and very low activity level - and only if I've been mostly inactive, primarily sitting, since waking and getting out of bed - to not trigger any pain in my body. Anyway, it gets boring, repetitive.

I've thought of trying to find an Elixir/Phoenix/React/etc developer or agency on Upwork before surgery and try to struggle to get them on at least developing the initial foundation of ENGN, but aside from the struggles I listed above that I'll encounter, it also will cost additional money - and I've not worked in 5 years, I've spent $250,000+ on stem cell treatments to heal old high school football injuries, that I didn't even know I mostly had and only weren't tolerable after LASIK made my nervous system super hypersensitive - and to pay for this surgery my mother is taking $27,000+ USD out of her retirement; I'm in Ontario, Canada, but the healthcare system has been practically useless to me. Even if the foundation for ENGN would cost just $5,000 to $10,000 to get the ball rolling in terms of starting to get users to signup and bringing in revenues - it's more money, but even thinking about that additional stress would put on my mother, then adds to my already overwhelmed nervous system - so there's plenty of resistance there to overcome on its own. There's also always the potential I'd somehow hire a bad contractor or agency, bad in one or many ways, and then the MVP wouldn't get finished - primarily because of my own incompetency-dysfunction, and then I will just be reminded, again, of how stuck my life is and how it barely moves forward - personally or professionally.

I'm living a version of the Groundhog Day movie that keeps repeating itself, except where I'm in pain, and so far where I can tell people my story and ask as many people as possible to help and nothing happens. It's why last thing I try is this surgery, though I am supposed to do another PICL stem cell treatment - where they treat tissues inside of my neck - the first one did reduce my neck pain and migraine some - for whiplash related issues, in part from football - because they only treat one side of the tissues, not all of the tissues, the first treatment - and so the second treatment they target the remaining high yield tissues. But I'm certain I'll know after the surgery if there's any improvement or hope that my life can start to become different, and even though there is a stromal stem cell treatment that was developed at University of Pittsburgh - that had very successful human clinical trials, that were fast tracked under compassionate grounds in India, to heal/regenerate deeper corneal tissue for severe scarring and chemical burns - the ETA for it being FDA approved was 5 years to be clinically available in the US, perhaps less time before available in India, but I'll have nothing else significant treatment wise for further pain reduction to look forward to in the near term after getting this surgery - and so why I'm afraid I won't be around much longer if it doesn't help much.


Hey man, try some breathing exercises and cold showers if you can - I watched this documentary https://www.youtube.com/watch?v=8cvhwquPqJ0

and then the original vice documentary: https://www.youtube.com/watch?v=VaMjhwFE1Zw

and then tried the breathing: https://www.youtube.com/watch?v=tybOi4hjZFQ and cold showers and they've been helping a lot with inflammation!

There are tons of benefits of cold showers, so if you can start doing that, that might already help a lot! Here's a bunch of results you could check out: https://www.youtube.com/results?search_query=cold+showers+be...


Another frustrating thing for me: well-intentioned people like yourself will chime in and offer advice or suggestions for things they've found highly helpful - and instead of being able to compile/organize a list of things I've tried and their outcomes, explaining why they didn't help, why I didn't continue them, why there wasn't a net benefit, I get reminded that writing up a complete list is an extremely difficult task for me. I'm only replying to you this morning because I couldn't actually begin responding yesterday, even though I saw your reply, because 1) I had already opened my right eye, which allowed the post-LASIK eye pain and symptoms to trigger fully (central sensitization/hyperalgesia), and 2) I had begun some activity yesterday - namely driving a short distance to my mother's for a late lunch and then taking my puppy to an off-leash dog park for an hour - that relatively small amount of movement being enough to compound/amplify the seemingly exponentially growing pain/sensitization from the eye pain.

Central sensitization and hyperalgesia are really a different kind of beast when it comes to pain; people, including most doctors, really seem to have a hard time understanding it. That added stress to my nervous system from opening my right eye and triggering/ramping up the sensitization/pain from just low activity and careful movement was enough to lock up my thinking, and arguably my emotions - more specifically, the locking/the eye pain and increased pain from movement add much more friction to fluid thinking. One relatable thought experiment for a "normal"/unaffected person may be to imagine the feeling when you get something in your eye and desperately react to get it out because it's so painful. Now imagine that foreign-object feeling is immense because it's permanent and broad - "a lot of objects" in your eye(s), because your cornea and nerves were sliced across 90%+ of the cornea (and so are abnormally/constantly signaling as such) - and imagine how your brain/nervous system tries to cope with such an overwhelming/overriding system constantly firing to draw your full attention to what your eye is telling your body is an active, present-moment object in your eye - potentially as if you'd just walked into a sharp object and your eye is signalling for you to literally freeze still, because it thinks you've just done something to critically wound your eye and moving another millimetre would threaten your survival (as the evolutionary strength of the reaction has come to dictate).

Well, the answer is some people after LASIK get this severe reaction; their nervous system gets overwhelmed - arguably the more naturally sensitive, creative, healthy, and grounded people will have stronger/more detrimental symptoms/reactions to the eye damage; central sensitization and hyperalgesia that the LASIK industry for years completely swept under the rug as even being possible, only recently admitting it is a potential "side" effect. In fact they purposefully mislabeled what should be called corneal neuralgia syndrome as "dry eye syndrome", to mislead people away from learning that the "dry eye" part is actually on a spectrum of symptoms caused by the damaged cornea/nerves, damage that happens in 100% of their surgeries; research not run by the LASIK industry has shown up to 40% of people have permanent problems after LASIK, and in 2011 one of the expert FDA advisors, who had voted yes to approving LASIK, published a letter to the FDA asking them to immediately recall LASIK because there was data they were ignoring that they should never have been ignoring, and that it should never have been approved in the first place. Part of the medical industrial complex, where arguably regulatory and institutional capture has occurred in the name of profits.

So again, I'm writing this out first thing in the morning with my right eye still shut, my eyes having had a reduced level of pain while I slept, allowing my nervous system to calm down some, along with recovering from the movement yesterday that compounds with the eye pain and worsens the mental/executive symptoms/dysfunction. There are a few tangents I know I lost track of while writing above; I'm not going to be able to go back to insert them, so if what I wrote above doesn't flow well or seems to be missing something, that's why.

To answer you specifically - I've done breath work and cold showers in the past, daily to multiple times daily for weeks, and there was no net benefit and arguably it was added stress to my nervous system that's already overwhelmed from active overriding [eye] pain that can't be reduced. That same eye pain and the friction it added once I'd opened my right eye, and in part I had used a lot of mental energy already writing longer comments on HN - that mental energy otherwise goes to trying to maintain focus/distraction on anything to try to keep me from getting completely sucked in/lost to the pain, the friction that will block mental (including emotional abilities/processing), lead to that small inconsequential reminder or level of frustration triggered from a well-intentioned person offering advice - where I was too blocked, too much friction/resistance from the pain at that point that it took me at least 20 minutes of going back and forth to reading parts of what you wrote then having to leave because of frustration/irritability being triggered because of the very slight pressure/stress that was added to my already overwhelmed nervous system; I'm not sure I'm describing this well here but it's best I can do at the moment.

That's why logically, emotionally I'm already certain it's the compassionate thing to do and I'll forgive myself for not being able to handle the burden nor for being able to handle knowing the burden/consequences I'd be leaving behind, logically I'm only doing the piriformis syndrome surgery as the last thing I try - so if it doesn't have a dramatic reduction in pain and cause a noticeable reduction in my executive dysfunction (my argument or hypothesis is the eye pain, a/the major source of pain could be compounding with another potentially major source of pain [piriformis syndrome, similar to sciatica], so eye pain could be say 33%, the PS could be say 33%, and the compounding could potentially cause runaway/cascading/feedback loop pain of 33%+ - for potentially a 66%+ reduction in pain/sensitization) then I'm gone. Of course I have arguments and proof points from my prior experiences with healing injuries with stem cell treatments as to why the surgery may or may not help, so the outcome is completely uncertain if it will help enough, if my executive function will improve at all, if my quality of life can improve at all, and so the most accurate odds I can give it is a 50/50 shot.

I've also done plenty of water fasting, and I did a carnivore/high-fat red-meat-only diet for 8 months - now I try to stick to just organic red meat, kale, and raspberries. I've also done many Ayahuasca ceremonies, MDMA therapeutic sessions, massage, acupuncture, etc. There are pros and cons to all of it; some I've been able to maintain because it's at least neutral - other things like acupuncture are intolerable, for example: after sessions to clear my Wood meridian energy line [3 end points at the right eye] my body is completely calm, but the pain becomes completely localized at my right eye, and for the following 6-8 hours I feel an intense burning sensation there - meanwhile feeling the pain and stress referring from my eye and building up into my body until there's an equilibrium of sorts, where the pain/hypersensitivity level in my body matches the level of pain in my eyes.

My nervous system is very healthy; it's the pain that's overwhelming, short-circuiting or disrupting different processes. I can't do anything more to reduce the eye pain because the stromal stem cell treatment is up to 5 years away from being clinically available, everything being delayed due to the pandemic as well, so my last hope is the piriformis syndrome surgery - for which I have no real or solid reference for how much it might be sensitizing my nervous system. All I know is that "sciatica" symptoms bothered me enough 15 years ago for me to first try to investigate it, but that my nervous system could handle that pain, along with the vast majority of high school football related injuries that I wasn't even aware I had - and it was only after LASIK that my nervous system, and therefore my mind, was disrupted.


Lot to digest.

Listen, first of all, do not consider ending your life. Seriously, you're way too smart for that. I'm sure you've got it worse, but I've had enormous sciatic problems in my life, I've had 3 herniated discs; they're behaving for the moment after massive doses of cortisone and without any painkillers, but I know what it feels like to cough them all out of my back at the same time. Not to be able to put a foot in front of another or turn your neck for weeks. (I'm a huuuuge fan of intramuscular cortisone injections, though. Like 5 or 6 large cortisone over a week, with some B-12. Every couple years. Not in the spine... fuck that. Alternating butt cheeks. You won't feel any benefit until the third day at least. If you can convince a doctor to give you that for a week, you will be fucking superman. They won't do this in America unless you know a doctor personally, but they'll do it in Mexico or Spain. I had it the last time my discs went out and it's been 6 years and the inflammation has not come back. They thought it would).

Anyway, before you off yourself, do try a fuckton of intramuscular steroids. The fifth day I levitated off a bed in the hospital; I hadn't walked in a week; I felt so good I went to a club; I got drunk and spent the night on a beach drinking and making out with an 18 year old model from Denmark. Seriously. There were wild cats walking around; it was winter on the Spanish coast. If you do one thing before you die, go get five cortisone shots in your ass, in a week.

I also got the hiccups for 24 hours and couldn't sleep, but that's neither here nor there. And I got temporary blindness in my left eye from fluid behind the retina, caused probably by too much testosterone. But. Goddamn it, I'm ok. You can be okay.

Enough about that.

About Yahoo and Google. That entrepreneurial spirit is, in my experience, way too often just about getting the funding and fucking off. We all know why these companies go downhill, but somehow it's always such a shock when they actually deteriorate in front of our eyes, huh? Google's search results, for instance. I would have expected their core business to stay more or less fine, not collapse a couple years after all the competition was eliminated.

It would be fine if they didn't grow into every crevice. Get search right, that's all we ask. I don't want Google to be my chat room or my shopping site. Why do they need to? Search is huge. They own 90% of the market.

>> but organizing to turn that into adequate specs to get solid estimates or fixed price quotes for work is extremely difficult to me.

That's always the worst. The business side. I've always just built things and hoped for the best. It sounds like you've got something interesting going there, although I have no damn clue what you're building, that's an exciting feeling. ENGN is killer. If you own ENGN.com, hell, money well spent.

I don't understand what you mean about "executive function", since you obviously have the capacity to write well-crafted email and think pretty clearly; perhaps I lack the executive function to discern your lack of executive function (I'm a brutally self-punishing alcoholic, but otherwise a damn good programmer)

Anyway I don't know if you're trying to ask for pointers to workers for this concept, I'm probably not it; I'm $200/hr and I'm already covered for the next year. This, however, should be your symphony. And I think you know how to do it.


The fact that you included Reddit, SO, Google Scholar etc is awesome (I thought it was only for main search engines). Thanks for sharing. Bookmarked.


I have a version of fixing this that I would personally enjoy a lot. Leave google alone, let it crawl the web, prioritize what it wants to via algorithms. But, give me a version of that which ONLY surfaces results from discussion forums (including SO, Reddit, HN, etc). For most of the stuff where I am actively searching and not just looking stuff up, discussion forums of motivated, self-selected contributors have the stuff I need with the context I need. It used to be that blogs had answers, but that media has been categorically ruined by SEO.

Now, one of the deficiencies here has been examples. Try this: "best miter saw". You will not find any websites that actually discuss the answer to this question, despite it being a product category with a lot of price variability and performance tradeoffs (weight, capacity, power, cord vs cordless, accuracy).

Nearly any product reviews for large purchases follow the same pattern unless consumer reports has decided to dig deep (e.g. washing machines).

How about guitar strings? Sandpaper? Printers? google's algorithm has allowed profit motivated websites to displace the commons to too great an extent.


How can they tell it’s a discussion forum? Does the scraped search spam that copies stack overflow content look enough like a discussion it fools their heuristics? Is it a manual process (in which case you can bet it “doesn’t scale” and won’t be built) this is the problem they face. Literally nothing is simple given the size of their dataset, the scope of their user base, and the adversarial nature of the very world around them generating new data they must work with in order to do the job.

There’s definitely an element of “we got our profit so fuck it” with respect to the search engine advertising business and Google’s incentives to make search quality better, but that doesn’t change the difficulty of the underlying problem. If I wanted to pay Google $5 per search for super high quality results, even $50, they can’t just make the product better to get my money. They are fighting an ongoing war against adversarial SEO, which prevents this from ever being better than a stalemate at best - or, more likely due to economics, the slow slide into declining quality we see, the SEO side having more money with which to pay for engineering brainpower.


You may be interested in searX - it refers to data sources as Engines; you have the ability to run your own instance (or use a public shared one) and only enable engines you want results from (reddit, stackoverflow etc.). Build your own meta-engine recipe, basically.

https://searx.space to learn / get started. Find one and visit it, click Preferences upper right then follow your schnoz.


The results include all the SEO spam that infected Google, ie. SO clones. How is it better?


The GP commented about using curated sources. One can disable all those (google, bing, etc.) and choose to only enable results from reddit, wikipedia and so forth in searX, which directly queries based on a config inside the project.


My current solution for this is to just tack `site:reddit.com` onto the beginning of Google searches. A Google search for `site:reddit.com best miter saw` has a lot of relevant results.

Marketers/SEO people are starting to infiltrate this as well, but since they can't control and SEO the content on Reddit nearly as much, this still works pretty well for now.
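
If you want to automate that across a handful of forums, here's a tiny sketch (the site list and helper name are just examples; Google has historically supported `OR` between `site:` operators, though that behavior isn't guaranteed):

    # Build a forum-only Google query instead of typing site: by hand.
    from urllib.parse import quote_plus

    FORUMS = ["reddit.com", "news.ycombinator.com", "practicalmachinist.com"]

    def forum_query(terms):
        sites = " OR ".join(f"site:{s}" for s in FORUMS)
        return "https://www.google.com/search?q=" + quote_plus(f"({sites}) {terms}")

    print(forum_query("best miter saw"))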


Of course! This is a tip I got from HN years ago. It gets old to add Reddit, PracticalMachinist, Fine Woodworking, etc. I really want a proxy for user-generated content where nobody got paid to write it.

Fun story: I knew a person who worked for a home building website part time. She got paid to write stories on home renovations. She had never done _anything_ she wrote about. Mostly, she gathered up other blogspam and recycled and rewrote it without citation. Sometimes she went to forums, sometimes Reddit, sometimes YouTube. But the universal part of it was that she had to produce two pieces of content per week, endlessly. Just for a local LA builder. Most of the content wasn't "wrong", but it also wasn't exactly incisive and didn't include any details that would have been useful. Instead it was just filler. The worst part is that it consistently improved that company's search ranking.

Content farming needs to die.


https://search.marginalia.nu has a very interesting approach to this:

Use tracking especially, and JS generally, as a weight in ranking, so sites that contain much of either need to be exceptionally high quality to float to the top.

This means sites with limited ads and tracking, typically enthusiast driven pages float to the top.

Now, whenever someone discusses a novel way to combat webspam, someone will immediately counter: if this becomes popular, SEO hackers will immediately start doing this.

Well - if reducing page size and removing tracking becomes a leading SEO trick I can deal with a bit of SEO hacking :-)

Yet for some reason I feel Google won't start using this very simple metric :-)
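
For illustration only (made-up weights and tracker list, and certainly not Marginalia's actual code), the core of such a metric can be tiny:

    # Toy quality penalty: the more scripts and known tracker hosts a
    # page embeds, the more relevance it needs in order to rank.
    import re

    TRACKER_HOSTS = ("googletagmanager.com", "doubleclick.net", "facebook.net")

    def quality_penalty(html):
        scripts = len(re.findall(r"<script\b", html, re.IGNORECASE))
        trackers = sum(html.count(h) for h in TRACKER_HOSTS)
        return 1.0 + 0.1 * scripts + 0.5 * trackers

    def final_score(relevance, html):
        return relevance / quality_penalty(html)

    lean = "<html><p>Hand-written kayaking guide</p></html>"
    heavy = "<script src='https://googletagmanager.com/gtm.js'></script>" * 5
    print(final_score(1.0, lean), final_score(1.0, heavy))  # 1.0 vs 0.25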


Another big factor in what I do is prioritizing the opinions of certain indieweb sites when ranking domains, basically a segment of the graph consisting of humans with a particular dislike for seo spam. This makes ranking manipulation much less effective.
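
Roughly speaking, the textbook version of this family of ideas is seed-personalized PageRank (as in TrustRank): random jumps land only on hand-picked, spam-averse seed domains, so trust propagates outward along links. A toy sketch with an invented link graph:

    # Seed-personalized PageRank over a tiny, made-up domain graph.
    LINKS = {
        "indieblog.example": ["hobbywiki.example", "seofarm.example"],
        "hobbywiki.example": ["indieblog.example"],
        "seofarm.example": ["seofarm2.example"],
        "seofarm2.example": ["seofarm.example"],
    }
    SEEDS = {"indieblog.example"}

    def trust_rank(links, seeds, damping=0.85, iters=50):
        nodes = list(links)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iters):
            # Teleport mass goes only to seeds, not uniformly to all nodes.
            nxt = {n: (1 - damping) / len(seeds) if n in seeds else 0.0
                   for n in nodes}
            for src, outs in links.items():
                for dst in outs:
                    nxt[dst] += damping * rank[src] / len(outs)
            rank = nxt
        return rank

    for site, score in sorted(trust_rank(LINKS, SEEDS).items(),
                              key=lambda kv: -kv[1]):
        print(f"{site:20s} {score:.3f}")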


Good to see you in this thread! I just added Marginalia to the recommended search engines in a new search tool I'm building, to get programming answers faster. The search assistant builds queries for specific sites with "site:targetsite.com , programming question" (that comma is not a typo). When doing a query like that I get no results but these warnings:

/!\ The term "," contains characters that are not currently supported

/!\ Try rephrasing the query, changing the word order or using synonyms to get different results. Tips.

sample: https://search.marginalia.nu/search?query=site%3Astackoverfl...

Please make your engine ignore the comma, it shouldn't affect the search.

Either ignore the site:... expression or filter sites accordingly.

Thanks a lot for creating Marginalia!


Ignoring comma seems doable, I'll have it fixed in a few days, currently away from my work computer.

site:-queries are supported, but only at the first domain level. (e.g. site:marginalia.nu; not site:search.marginalia.nu). I might tune it so that it strips subdomains automatically, that is pretty trivial.


I've made the changes to the syntax (as well as fixed a bug that broke site:-queries), although I think it's probably not a great search engine for searching big mainstream websites like SO, the crawler is designed to de-prioritize them over blogs and what have you. Disregard what I said in the other comment about the domain names, use the full domain name.

e.g. this works fine: https://search.marginalia.nu/search?query=site%3Amemex.margi...

If you just do a site:-query without any search terms, you'll get a dump outlining the crawling status of the site. Like this:

https://search.marginalia.nu/search?query=site%3Astackoverfl...

You can see it's crawled 80 pages but not gone much farther since SO ranked so low.


I think the complaints about SEO spam are valid - but - I think msweibel and pg misdiagnose the challenge. The challenge is that you are dealing with an adversarial system: the better your search engine is and the more widely it is used, the more valuable it becomes for your adversaries to find ways to game your rankings.

Any new niche search engine will go through a small window of time where they have the luxury that none of the sites they are indexing are spending all their effort trying to reverse engineer their signals and optimize against them. I'm incredibly skeptical that they can remain useful once all the SEO efforts of various marketers start to be turned against them.


The main motivation to SEO crappy content seems to be ads and affiliate links. What if you take that motivation away by deprioritizing sites that contain ads or affiliate links? Of course Google would never do this, but someone else could.
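
A sketch of what the detection side could look like; the URL patterns below are common affiliate-link shapes (e.g. Amazon Associates' `tag=` parameter), not an exhaustive or authoritative list:

    # Crude affiliate-link counter for down-weighting monetized pages.
    import re

    AFFILIATE_PATTERNS = [
        r"amazon\.[a-z.]+/[^\"']*[?&]tag=",  # Amazon Associates tag
        r"[?&](aff|affiliate|ref)_?id=",     # generic affiliate ids
    ]

    def affiliate_link_count(html):
        return sum(len(re.findall(p, html)) for p in AFFILIATE_PATTERNS)

    page = '<a href="https://amazon.com/dp/B01?tag=spamblog-20">Best TV</a>'
    print(affiliate_link_count(page))  # 1 -> candidate for down-ranking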


> The main motivation to SEO crappy content seems to be ads and affiliate links.

Maybe the main observed motivation. But I'd argue that a lot of that is just a fraudulently-profitable front to much more devious problems.


This doesn't work because all of the "good" content has ads and affiliate links too.


Just give users the ability to blacklist domains when searching; pretty soon you'll have a decent list of what users consider worthless.

And Pinterest would die.


Google used to have such a feature.

It would be nice if Google would ask you the simple question: "Did you find what you're looking for?" Instead they rely on the assumption that users only stop looking when they've found what they're looking for.

These days, there's a reasonably high chance that I quit looking because I gave up in futility--not because I found what I was looking for.

It's also the case that there's no way to train Google not to omit search terms or generalize them to the point of uselessness.

I really wish abusive SEO were the only problem but it's far from the case. Search results being crappy is a cumulative effect. You could solve SEO spam and I'll still not be able to find a USB SuperSpeed cable because it gets generalized to "usb cable" and there are a gazillion more charging cables than there are SuperSpeed cables.

Used to be that you could quote things to indicate that you really meant it. That's fuzzy now too. Every time we figure out how to circumvent the bad results, features are removed.


uBlock Origin static filters to the rescue!

Block results from specific domains on Google or DDG:

    google.*##.g:has(a[href*="thetopsites.com"])
    duckduckgo.*##.results > div:has(a[href*="thetopsites.com"])
And it's even possible to target element content with regex with the `:has-text(/regex/)` selector.

    google.*##.g:has(*:has-text(/bye topic of noninterest/i))
    duckduckgo.*##.results > div:has(*:has-text(/bye topic of noninterest/i))
Bonus content: Ever tried getting rid of Medium's obnoxious cookie notification? Just nuke it from orbit:

    *##body>div:has(div:has-text(/To make Medium work.*Privacy Policy.*Cookie Policy/i))


>And Pinterest would die.

They're friends with the king, so don't hold your breath.


Google knows how to surface relevant results and they choose not to because they aren't optimizing for relevant results, they're optimizing for revenue or profit within some constraints (don't lose too many users, privacy, avoid actually terrible or completely irrelevant results).

All the various suggestions in this thread plus far more complex and insightful solutions are known to Google. Most of it boils down to using automated user feedback to improve or measure search result relevancy.

Google doesn't need to solicit user upvotes / downvotes to improve rankings. They can monitor user clicks on results in addition to analytics on the sites the users visit to determine which sites are relevant to which searches.

Google doesn't optimize for search relevancy.
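
For example (invented numbers and names; detecting "good abandonment" in real systems is far subtler), a dwell-time satisfaction signal is only a few lines:

    # Estimate per-result satisfaction from clicks: a click that
    # "sticks" (long dwell) counts as satisfied; a quick bounce doesn't.
    from collections import defaultdict

    clicks = [  # (query, url, dwell_seconds) - an invented log
        ("best miter saw", "https://forum.example/saw-thread", 240),
        ("best miter saw", "https://spamblog.example/top-10-saws", 5),
        ("best miter saw", "https://spamblog.example/top-10-saws", 3),
    ]

    def satisfaction(log, min_dwell=30):
        stats = defaultdict(lambda: [0, 0])  # (query, url) -> [satisfied, total]
        for query, url, dwell in log:
            stats[(query, url)][1] += 1
            stats[(query, url)][0] += dwell >= min_dwell
        return {k: sat / total for k, (sat, total) in stats.items()}

    for (query, url), score in satisfaction(clicks).items():
        print(f"{score:.0%}  {url}")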


Likewise YouTube doesn't optimize for relevant results, only engagement, to maximize ad exposure. The side effect is that polarizing content gets returned more than relevant content (polarizing content being more engaging than relevant content, apparently).


This is already happening for a bunch of verticals:

Travel - Expedia, Hotels.com, Kayak, etc.

Consumer Goods - Amazon, WalMart, EBay, Etsy, etc.

Automobile Purchase - Cars.com, Autotrader, etc.

Career/Job - Indeed, LinkedIn, etc.

As Google continues to lose search volume in these big revenue categories, fighting spam is going to get much more difficult, since they are left working to sort out long-tail spam. Way harder.


Yeah the comments in this thread are baffling (or they didn't read the 100 character tweet lol). The tweet is just describing domain specific database-like websites. Have people not heard of allrecipes.com? Yummly? Or one of the other thousands of recipe db sites? No blogspam, just structured recipe search. You can even search by ingredient!

Mayo Clinic, Harvard Health, and PubMed do a great job with health info. IMDb for movies, Goodreads for books, *gearlab.com for reviews, booking.com for accommodations.

I think the biggest threat to Google isn't a better general search engine, it's user behavior switching to more domain-specific websites as the top of the funnel. E.g. people going directly to Amazon to search for products instead of first searching Google.

To some extent, Google has figured this out, which is why they now have a dedicated flight search, hotel search, product search (Google shopping still exists and it's pretty good!), etc.


Wow so that's completely opposite of my feelings on this topic. I would never, ever use Expedia for travel search over Google flights/hotels. Google travel is the meta-search engine for this vertical. Expedia etc. are all-in on spam and scams, trying at every opportunity to take an extra dollar from you.

The same with Amazon. You'd think that with all the armchair search quality experts cropping up lately there might be more vocal complaints about the fact that Amazon's own search can't find basic consumer products sold by Amazon itself. If I want to find stuff on Amazon, I search Google for it.


Except almost all of these aren't search engines, but their own walled gardens. You can't search for an item on Ebay and find a link to the product's Amazon page, for example.


I'm almost certain that pg gave "compete with Google by competing in some niche" advice 10+ years ago.

In any case, I'm not sure that competing in search is a very attractive notion. AdWords is the only meaningfully profitable search business. Even if you steal 10% of Google's market, that absolutely doesn't translate into 10% of the revenue.

That said, recipes. Someone make a search engine where the top results don't start with 500 words on the history & etymology of butter, because that's what Google wants.


It's usually more than 500 words lol


The quote-Tweeted thread mentioned recipes as one of the things that has been SEObliterated. It's a great example of the problem, and also a great example of the problems any solution will encounter.

Recipes have become a bellwether Internet problem. In the past, your great-grandmother had a card file with a bunch of 3x5 index cards with the ingredients and instructions on how to make everything, and they pretty much all fit on one side. There was a great deal of domain knowledge required (e.g. "whip to stiff peaks"), but these things reveled in their terseness.

Internet recipes all begin with 9 paragraphs of the author's first time encountering the dish in a Moroccan bazaar in 1997, and the life story of the chef. There are two embedded 10-minute videos of the lifecycle of the vanilla bean. And then you get to the ingredient list. Then two more 10-minute videos, then instructions.

The drive to make recipes full-contact Internet content has changed what it means to be a recipe. This is similar to how cooking shows evolved from Julia Child working on a sound stage to a carnival-barker presentation with vivid personalities dominating the scene.

I'm not sure there is any technological solution to a problem that has fundamentally changed what it means to be a recipe, short of establishing a new informational silo in the form of a new Web site devoted to recipes only. You could encourage an RSS-like format for recipes, but that requires buy-in from places that profit from the new evolution. This new status quo may be good or bad--you can make the argument either way--but it is what it is. A cultural change is required more than tweaking algorithms.

(Unless tweaking algorithms can be foundational to cultural change, in which case we really, really, really need to take a hard look at the corporate behemoths and their algorithms, and sooner better than later.)
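
For what it's worth, a machine-readable recipe format already exists (schema.org/Recipe, which many sites embed for Google's recipe cards); the missing piece is cultural - sites leading with it instead of the life story. A toy sketch of rendering such a structured record back into index-card terseness (the record itself is invented):

    # Collapse a structured recipe back into 3x5-card form.
    recipe = {
        "name": "Meringue",
        "ingredients": ["4 egg whites", "1 cup sugar", "1 tsp vanilla"],
        "steps": ["Whip whites to stiff peaks.",
                  "Fold in sugar gradually.",
                  "Bake at 225F for 90 min."],
    }

    def index_card(r):
        lines = [r["name"].upper(), ", ".join(r["ingredients"])]
        lines += [f"{i}. {s}" for i, s in enumerate(r["steps"], 1)]
        return "\n".join(lines)

    print(index_card(recipe))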


The recipe problem is mostly because actual recipes are not copyrightable. See e.g. https://www.plagiarismtoday.com/2015/03/24/recipes-copyright...


Neither are ideas. Hence article spinning and book summaries.

We need a semantic dupe filter: If it doesn’t add new facts or new ideas, treat it as an identical copy.
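
A minimal sketch of that filter using word shingles and Jaccard similarity; production systems use MinHash/SimHash or embeddings to make this scale, but the principle is the same (the 0.6 threshold is invented):

    # If two pages share almost all their 3-word shingles, the later
    # one adds nothing new -> treat it as a copy and derank it.
    def shingles(text, k=3):
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    original = "Whip the egg whites to stiff peaks then fold in the sugar"
    spun = "Whip the egg whites to stiff peaks then fold in the sugar slowly"
    if jaccard(original, spun) > 0.6:
        print("near-duplicate: derank the copy")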


>I'm not sure there is any technological solution

The technological solution would be to stop rewarding them for these monstrosities. One of the main motivators for turning a short recipe into a 19-page essay about the chef's life is that more words = better ranking.


But the technology is solely controlled by a single multinational advertising corporation. The motivations are controlled by that same advertising corporation.

Which is the same as there being no technological solutions.


And funny enough, it's obvious that Google's engineers know this because they're adding more small, self-hosted featured results to the top of the page all the time.


And the end game is that ads pay for it. Nuke ads or downrank them and the incentive goes away, making space for enthusiast non-profit-driven websites.


I think a lot of this is due to Google both owning search and the ads on the websites (AdSense). There’s an incentive for them to prioritize click farms (and other sites filled with their ads). I think in general there may be a correlation between the number of ads on a site and its usefulness to me, which is inverse to its usefulness to google.

I’m curious what would happen if those products were split up into 2 separate companies.


I also can't help but wonder this. I'm certain people at Google Search want to provide the best quality search results and do this with integrity. But at some point in the business hierarchy you are at a level where people set objectives for both these departments (search & ads) and are trying to optimise for things like total revenue/profit.


Yes. In fact if you wanted to cut out the SEO spam then delisting anything with AdSense would probably be a good start for a competitor.


Good idea. You could start with fitness. Lots of high-quality information out there that’s entirely, 100% inaccessible via google.

Over COVID, I did the whole fitness thing from a few different angles (overhauled diet, trained for a marathon, now lifting weights a lot). I found I could only find good info by going directly to a trusted source - literally, typing http://www. like I’m in the 90s or something. This is the exact issue a search engine should solve, but Google doesn’t.


Is your trusted source the same as my trusted source which is the same as my neighbor’s trusted source which is the same as my Australian cousin’s trusted source?

If not, at least one of us is going to hate this new search engine.


This is a problem I'm working on.

What sorts of unique things did you do that Google failed at? Maybe you read through discussion sites or got tips from books or something like that


Social media and trial and error. I knew a bit, and asked some friends for advice. I used that to find people who said stuff I thought made sense on Quora, Tik Tok, and Instagram (ie they agreed with what I knew to be true and false, so I could assume the other stuff they said was likely to be true as well). I tried what they said, found what worked, and went all in when I saw results.

Importantly, this was bottom up: it was largely recommendation engines suggesting people I then filtered through for what I wanted (running, bodybuilding) vs didn’t (traditional weight loss). I couldn’t specify what I wanted, or it would be garbage SEO spam.


That's a new angle to me. I rely on pseudonymous content like Reddit and HN. It does make sense to look for people or groups focused on a larger topic like fitness.

That helps a lot! Thank you.


Could you share your trusted sources? Human search engine :P



JPG on TikTok, and Geoffrey Verity Schofield on Instagram/Quora/buy his book.


He’s just describing Webrings, except in a reactive tense (“filter out spam sites”) rather than a proactive tense (“associate your site with other worthwhile sites”). Google’s ranking algorithm only works when someone is proactively curating, and only SEO spammers do so these days. Reactive curation is not a viable way to manage information.

The simplest way to compete with Google is to create a DIY Webrings site that disallows harvesting of data by Google. Charge curators to create a webring, and let curators select three hashtags and a description that represent their list of fifty or fewer sites. Use the revenue to pay a human to curate the list of hashtags, and let users tip a webring curator in gratitude with an Apple Pay button.

This is how to make a million dollars, Pinboard-style, out of the ashes of the original curated Yahoo idea and the information structures of hashtagging. It doesn’t work if you allow free-for-all infinite-sized lists, it doesn’t work if you allow free-for-all hashtags, but with clear limits and moderation of tags (instead of webrings), it would thrive. By moderating tags, users can keep the webring they paid for, and SEO rings will stick out for having no shared network with any other rings, which allows for easier detection and culling of malicious non-participatory actors. Plus, with the curation networks in place, it becomes possible to bubble up rings that have unusual content for positive human moderation activity.

I tried to find some good podcast lists yesterday and each site I visited had a really interesting cross-section, but there were so many duplicates. I wish the ring site existed, so that it could remember what it had shown me already, and I could say “show me rings that intersect with this podcast and have something new I haven’t seen before”.

That’s where the theory of pagerank and the practice of curation and the capabilities of search align, and given that moderation of hashtags scales very cheaply, is a billion dollar opportunity that Google and Amazon cannot compete with if handled properly. It’s not about trying to get a cut of every visit’s revenue potential. It’s about giving human beings a directory that respects their time and remembers what they’ve seen.


If I understand correctly, you're saying you'd create an exclusive webring, but the rule to joining is that you have to Disallow: Google, Bing, etc. in your robots.txt file. That sounds outrageous, but speaking as a content creator, I wouldn't be giving up much. My blog gets about 3% of its traffic from search engines. I have no idea who these visitors are or what they searched for, since browsers no longer send referral links. If a webring offered me the benefit of positive regularly engaged community, then having my blog part ways with search engines would be a no brainer. That is after all the Facebook model, except tailored for the open web. Believe me when I say that we bloggers are waiting to be rescued.


No, you can join the webring and be still indexed by Google, but the sitelist of the webring cannot be. That’s all. It prevents Google from leeching off the human curation and not paying a fair market value for it. Given that Google makes billions of dollars a year on pagerank data, webring curators got screwed over pretty hard already once by Google twenty years ago, so no reason to allow it again.


So the webring would use rel="nofollow" hyperlinks? Blog authors might see that as an insult.


I think pg is missing something important here. The reason Google was able to beat Yahoo, Altavista, Ask, etc. was not just because they had a better formula — it was also because they started in the era where 'search' was still seen as secondary to 'portals' by the big guys. Had these companies known how important search is to the internet back then, they would've copied Google's secret sauce and crushed it long before it could suck up their traffic.

This isn't going to happen again. Google isn't going to sit around twiddling its thumbs while a competitor develops a better algorithm.

You have to attack the problem from a different angle entirely (make something that looks nothing like a search engine), I don't think a niche market is going to be enough.

Perhaps you just want to make something that scares Google into acquiring you, rather than actually bettering the situation. If that's the case, I implore you to think of better ways to spend your life.


Altavista was the only search engine in the same league as Google. So Google hired the guy who built it. Altavista infrastructure couldn't scale beyond a single server, because that's how DEC was, so it was a smart move for him.


> I don't think a niche market is going to be enough.

To displace a giant gobbling 180 billion dollars a year? Yeah, no kidding. But nobody is asking for that. They are just asking for decent search results.


For medical search the answer is pubmed. Not only is the collection of documents clean (of low-grade scammers, pharma companies have to pay big $ to play) but the NIH has done a large amount of search quality and ontology work -- the system knows "Tylenol" is synonymous with "Paracetamol", "Acetaminophen", etc.


> the system knows "Tylenol" is synonymous with "Paracetamol", "Acetaminophen", etc.

This is exactly the kind of thing that Google cannot fathom doing manually. As if entering facts into a computer were morally wrong somehow. They'd much rather launch the equivalent of a shell script that harnesses face-melting amounts of computational power, processing literally trillions of webpages in bulk, signal and noise together, junk, spam, and misdirection alike, to learn bad associations and then serve them up with no human review and then put the full force of their reputation behind a results page that apparently people never check and certainly can't correct because of the inscrutability of a machine-learned model that has few to no levers to adjust.


Google has long lied about what they do.

I had a chance to debrief people who had left their relevance team and they told me things that were outright contradictory to what rank-and-file Google employees have told me. (What they told me did make sense in terms of my experience as an IR system developer, SEO publisher, etc.)

Microsoft bought a company called PowerSet that had extracted a large database of entities and relationships from Wikipedia and used the technology to make the "Bing" search engine.

Earlier Microsoft engines were a joke, but Bing was so good that Google saw it as a threat so they bought Freebase to get a similar kind of database, then they killed it to incorporate it into the "Google Knowledge Graph".

For all of their hating on semantics, note that they hired R. V. Guha as their chief scientist, who worked with Doug Lenat on the notorious

https://en.wikipedia.org/wiki/Cyc


Isn’t that kind of easy to do? I mean, you can do that kind of thing as a one-man-show on consumer hardware using GloVe or fastText.
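Something like this, with gensim and pretrained GloVe vectors, does get surprisingly far (assuming the terms are in the vocabulary):

    # Nearest neighbours in word-embedding space surface synonyms and
    # related terms, e.g. "tylenol" -> "acetaminophen", "paracetamol".
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-100")  # pretrained GloVe
    for word, score in vectors.most_similar("tylenol", topn=5):
        print(f"{word}\t{score:.3f}")

The catch is that embedding neighbours also include related-but-wrong terms (other painkillers, say), which is exactly why a curated ontology like PubMed's still beats the one-man-show version.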


A search engine that only indexed Reddit, Stack Exchange, Wikipedia, and a small number of other "good" sites would get 80% of the way there.

No, DDG bang operators don't let you do this. I want a SERP, not a shortcut to a single site's on-site search.


For business news, I do this with https://yup.is


https://twitter.com/mwseibel/status/1477707884632834049

> I’m pretty sure the engineers responsible for Google Search aren’t happy about the quality of results either. I’m wondering if this isn’t really a tech problem but the influence of some suit responsible for quarterly ad revenue increases.

Please no more of this. Two men, Page and Brin, together have basically unfettered control over Google.* If Google does something bad then, unless it's genuinely something small enough that those two could not be expected to hear about it, it's happening with—at the very least—their acquiescence. And low overall search quality is not something that some "suit" is successfully hiding from Good Czar Larry. They could fire the "suit", or command him or her to make other decisions. This is—again, at the very least—something that they have chosen not to do. The responsibility lies with them.

* There is the risk of lawsuits from the minority shareholders, I assume. But IIUC this is not realistically that big a restraint on what shareholders with a majority of votes can do. However IANAL.


What I am looking for is control over the results. Personalized blacklists and lists of sites to be (de)prioritized and also the ability to subscribe to community curated versions of the same.

And to be clear I want to be able to control these myself, not algorithm trying to guess my preferences. No guessing, just do what I tell you to.

Multiple search profiles with different priorities would be nice too.

I would like the search algorithm to be transparent, I should be able to tell why I got a certain result and how I can avoid such results in the future.


You can use uBlock Origin to blacklist domains from your search results. I do this for example with codegrepper and other sites that just copy-paste Stack Overflow comments in a less readable format.
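For reference, filters along these lines in uBlock Origin's "My filters" pane do the trick; the ".g" result-container class is an assumption that changes whenever Google updates its markup, and the domains are just examples:

    ! Hide Google results that link to unwanted domains (illustrative)
    google.*##.g:has(a[href*="codegrepper.com"])
    google.*##.g:has(a[href*="pinterest."])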


Thanks. I've been using other browser extensions for this, doing it from ublock origin is a good idea, will simplify things.


FYI - Google hires 10,000+ search result raters [1], who are contractors, to evaluate search result quality.

In an ideal world, you build a thing, and it's done. It runs automatically and prints out money.

In reality, you still need human labor for manual tasks, even in the tech industry.

[1] https://www.searchenginejournal.com/google-eat/quality-rater...


Thanks for posting this. An acquaintance of mine did this job 6 years ago and I wasn't sure if it still existed.

Crowd-sourced humans are making Google appear more intelligent than it actually is. I always envisioned that spam efforts would just immediately set off an alarm that would be handled by a bot to blacklist you without a human even knowing your site existed, but there still seem to be at least a few ways to game Google's search ratings.


Just a minute ago, I made a small typo in a non-obscure programming-related search term.

    Showing results for searchterm
    No results found for searchterm
Followed by an unending list of random celebrities I don't know nor care about, businesses I've never been to selling items I have absolutely no use for, and random foreign news articles.

Failure to recognise the typo is unexpected but forgivable. But then, rather than helping me with my search, they attempt to distract and lead me away from it - using triggers that you'd think they should have known wouldn't work.

I really don't understand how this is even possible, and it's not a rare occurrence.


Was this linked for the irony of everyone spamming replies advertising their startups which don't solve the problem but kinda-sorta do, resulting in something hard to read and understand?


Interestingly, Google Maps doesn't suffer as much from the issues with Google Search. Maybe because it has those community-driven curation features that PG is talking about? Google Maps is fantastic at finding places to go to (and getting you there).

Also, why hasn't Apple built a search engine yet? It baffles me that they chose to go head-to-head with Google on Maps, yet outsourced their search engine. I would've liked it the other way around: Google Maps and Apple Search.


It’s much harder to get SEO-spam-style content into Maps, given the geographic region limits involved. But it happens. My favourite example is searching for suppliers of something basic, say structural aluminium extrusions: big and heavy, so you ideally don’t want to ship it far, which makes it an ideal thing to search for a local supplier of. In Australia it’s basically a given that I’ll get results for my city, because suppliers list it as a delivery area. But when you actually try to find them on the map as a pin, nope: it’s either not there, or it’s just a sales office rather than a warehouse or workshop. They have tricked the system into listing them as local to my area in a way that pollutes my Maps search results.

But this only works for certain industries. It’s much less common to see this kind of tactic if you’re searching for, say, a coffee shop, because the sheer number of local results lets Google be “hyper local” with those kinds of results.


Google Maps is spammed by fake locksmiths and other trades that make it look like they're local but all route to the same boiler room (probably right next to the tech-support or IRS scammers), from where they dispatch a crooked and most likely unlicensed tradesman who will do a poor job and overcharge you (destroying the lock so they can sell you an overpriced replacement instead of picking it, etc).

For licensed trades the solution is to go to your official trade licensing body (for the UK it's the NICEIC for electricians and the Gas Safe Register for gas/HVAC technicians), for unlicensed ones it's more difficult. There are "review" sites that claim to provide good results but their business incentives & vulnerability to spam/fake reviews are unknown.


Apple makes $10+ billion a year from Google by setting it as the default search engine on the iPhone.

I agree they should make their own search engine. But currently they’re being paid a ton of money not to.


I'm seeing a lot of comments along the lines of, "Google shows ads on the SEO-gamed sites that show up in results, so their incentive is to give spammy results". But wouldn't this predict that results would be much better on Bing and other search engines that don't have much presence in the "put ads on random sites" market?

(Disclosure: I work on ads at Google, speaking only for myself)


> But wouldn't this predict that results would be much better on Bing and other search engines that don't have much presence in the "put ads on random sites" market?

Honestly what I think is every search engine sucks these days, and Google manages to suck a little bit less.

The reason is how easy it has become to publish low-quality content. It's rare to find high-quality content, and the issue with search engines is that they don't surface it. It isn't recommended by default. It's hidden.

What happened is that the recommendation system is broken! If there weren't neural networks making the decisions, it would be a different issue. But with modern search engines deploying recommendation systems, it's a rich-get-richer scheme: you can't recommend new or fresh but high-quality content because it was never visited. When the entire backend relies on user data, the system is being fed crappy data, because most users don't care and those who do are few in number.

As long as the system is making revenue, it will stay this way. Most people never care at all and would never be bothered, because they only have simple queries. If you deviate from the norm, Google's search results are pretty bad, like every other search engine's.


10 years ago, the original engineer of Google's search engine told me what he now wanted was asynchronous, human-powered search with curated results, e.g. a Google-like interface, but queries cost $5 and take 15 minutes.

Money's no object for him, so he wanted to outsource the filtering, ranking, and interpreting of results. Would be even more useful today (albeit a tiny TAM.)


I'm seeing something similar to this when it comes to lead generation for sales teams. The market right now is full of tools where you search for companies/prospects that you want to target, and you end up with a list of outdated or incorrect data.

One of the solutions is to hire people remotely that will build a list from searching and verifying that the information is correct. There are a ton of scaling issues with a person building the lists, and you will still run into errors, but it has been the best way to verify and have the most up-to-date data.

A problem I noticed is that people are so used to getting tons of results from Google or another database site that they expect the same with human-powered results. So I think it takes a lot of expectation setting, and also working with the TAM that understands the impact a curated list provides over an outdated database. Both have their costs, but for higher impact, human curation will always win over automation. (I have an example of 3 different lead-gen sites having a different city for 1 company; the AI on the websites mixed up the location, while a human would be able to catch that.)


I would have told him not to quit his day job. That sounds ridiculous.


I don't agree, plus he's a billionaire.


Why’s he not building it now, then? He has the means.


Building costs more than money.


True. I’m wondering though if he hasn’t given up on the idea. It’s been ten years. Maybe he doesn’t believe anymore that a human could do better, even if given 15 minutes of time?

If such a demand exists, should we not be seeing an active “secondary marketplace” for people offering to do 15-minute human meta search & research tasks?


I said "wanted it" not "wanted to build/finance it." Doubt he's given up wanting it.

A human can always do better: take Google's results, then remove SEO spam/duplicates, extract more relevant snippets, combine results from multiple nearby queries, etc.

Demand exists, but someone has to build it. And it's unclear how big the market is.


> the original engineer of Google's search engine

you mean Larry Page?


No, the guy who re-wrote Larry's research code into Python and put it in production.


> You might need to do a lot of manual spam fighting initially

This is why I am very hyped about Brave Search's Goggles feature: it will let you share exclude/include site lists to use with the search engine. Hopefully it will empower these niche communities to curate lists of non-spam sites (like the ad blockers do with ads today).
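For a flavour of what a goggle might look like, here is a sketch based on Brave's published quickstart; the exact directive syntax is an assumption on my part and may have changed:

    ! name: Less recipe spam
    ! description: Boost a trusted cooking site, drop a known content farm.
    $boost=2,site=seriouseats.com
    $discard,site=spammy-recipes.example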


I believe that the only moat protecting $100B of AdWords revenue is the quality of the Google Search results. There is no meaningful switching cost to using a new search engine, and the spend inertia in ad spend is not very significant (i.e. any online marketing manager will happily spend 5% of their budget in a different search engine adwords-like program if they get better ROI, there is no incentive the be a "Google Ads only shop").

On the other hand, Google needs to maintain the ballistic trajectory of its revenue growth. So how can they fix search quality when they've minmax'd themselves into this situation in the first place? If they were to make the ads background yellow again, that would have negative short term effects that I doubt any career exec can stomach.


No.

The other moats are lock-in to advertising networks, website metrics, and effective control over Web standards through the Chrome browser.

An alternative search platform might provide better search. It would be fighting Google on at least three other fronts. It might have some success, but it would be challenging. (As history largely demonstrates.)

Even a rival tech monopolist, Microsoft, barely holds even with its own search offering (I use that indirectly via DDG), and scrapped its own web-browser development.


I mean, Google is big and it has that advantage like any big enterprise, but the search engine market is very permeable compared to what people traditionally refer to as a moat (say, trying to compete with YouTube as a video hosting platform or with Salesforce as a CRM).

If you have a good search engine, people will flock to it, and search ads will be valuable. That's it. That's how Google became Google.

The fact that Microsoft couldn't do it honestly doesn't mean much. Microsoft also couldn't do a phone OS, a portable music player, and many other things. They have a complex web of conflicting interests that $SEARCH_ENGINE_STARTUP does not.


Just for the record, I'm in at least mild agreement that search is looking increasingly vulnerable. Google are falling down here.

It's just that "search" is really a web of interrelated services, capabilities, and revenue streams, and they tend to reinforce each other strongly. I'd like to see the monopoly disrupted.[1] But I don't think it's just a matter of "build a better mousetrap^Wsearch engine." Attack one corner, and Google will snipe at you from the others.

And with the AdWords cash cow, they've got an immense revenue stream.

________________________________

Notes:

1. Well, mostly. Google's acquired so goddamned much personal data that the premise is frankly kind of terrifying as well --- a weakened Google with neither the revenues nor talent to defend that pile.... And I'd really like to see the toppling occur without simply raising a new monopoly in its place.


“There is no meaningful switching cost to using a new search engine.”

Unfortunately, defaults matter, and Google is spending billions of dollars yearly to make sure they are the default search engine wherever they can. Most people don’t switch from the default.


The way I see it that's only a problem if you want to be bigger than Google. But you can get to $1B in revenue with just the deliberate adopters (i.e. under 1% of the market).


search ads responsible for the rise of google search[1], content ads (seo spam) responsible for google search's fall?

my guess is the rate of spam content production far outpaces the rate of original content creation. so the power law concentrates even further in the tiny percentage of OC and a moat forms around them (highest ad $, highest authority/authenticity).

where do we end up 5 years from now? further consolidation and the continued return to aol style portals (telco/media giants and fast-lane to own content?) pay-to-access silos dominating the internet?

[1] oversimplifying a bit of course, there was a novel ranking method that was more than accurate enough, and it scaled, which allowed for the search ad business to go gangbusters.



Sadly the only way of fixing it is making search results unattractive for cracking. Ranking of a page in search results is a metric, and every metric is a hackable metric. Only way they won't be hacked is if there's no incentive to.

Sure a search engine that specialises on narrow area of knowledge without much money in it, can be very relevant and bullshit-free.

But there's no way to make it work for the general web search. People hack things. If they didn't we'd have Communism built by now (yes the "good" - classless, stateless one).


All of the amateur search quality experts forget to mention the regulatory environment. Obviously, Google could nuke Pinterest from orbit, dramatically improving image search results. Clearly, Google could effectively take down Statista, technically. But various Eurocracies have shown an extreme willingness to take the side of Yelp, Pinterest, and whatever other spam/scam mills are able to form a shadow alliance with Microsoft's astroturf campaigns like "fairsearch" and whatever.


big part of it. and the scrutiny is only going to get more intense


If this is a real concern they can side step it by simply allowing users to block specific domains, like they have in the past.


Being able to flag Fandom, Quora, and Pinterest results would bring me great joy.


I don't think this problem can be solved by another search engine as they currently exist. The problem must be solved by a new kind of search engine that exclusively searches Internet communities (HN, Reddit, etc.). The content must be community produced, since all other forms of writing have monetary incentive. You will find the best results in communities of enthusiasts with respected moderation teams.

So, a curated strategy where users can UP/DOWN sources they trust for particular topics. Relevancy and user ranking of answers determine the score. Of course, the user can search any sources they like, but a voting system would control the defaults.

I think this avoids slanted results as well, because topics are objective. The subjectivity will be in the comments, where they belong. That's in contrast to today where SEO scams can determine how high up results are.

So, to game my proposed search engine, you need to infiltrate the users. I believe this is harder to do across numerous sources compared to the current system, where you game the algorithm.
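A toy sketch of that scoring, with made-up numbers and names, just to pin down the idea:

    # Community votes on (topic, source) pairs scale ordinary relevance.
    from collections import defaultdict

    source_votes = defaultdict(int)  # net up/down votes per (topic, source)
    source_votes[("fitness", "reddit.com/r/running")] = 412
    source_votes[("fitness", "spam-blog.example")] = -95

    def score(topic, source, relevance):
        votes = source_votes[(topic, source)]
        trust = 1.0 + votes / (abs(votes) + 100.0)  # squashed into (0, 2)
        return relevance * trust

An unknown source gets trust 1.0, a heavily upvoted one approaches 2.0, and a heavily downvoted one approaches 0, so gaming it requires sustained vote manipulation across many users rather than on-page SEO tricks.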


Oh so it's not just me... Most recently I was trying to find a way to reset a printer and also fix a certain error code. I searched on Google and it's filled with irrelevant content unrelated to the model number I put in, scam websites, and ink sellers, even though I used the correct syntax such as the + sign, which is a joke if you think about it. A billion-dollar company and this is the best they can do. The advanced search is buried, and their syntax is explained by third parties or buried somewhere in the options.

I jumped onto YouTube and put in the model number; same thing, pretty much. I get unrelated videos mixed in with the model number I put in, and I'm pretty sure some videos with the same model number aren't being shown. Ironically, there's a video explaining a certain solution and warning people not to fall victim to another video scamming people; its dislike count has been removed and comments obviously deleted, so some people may be calling and getting scammed.


I have seen several alternative search engine projects posted on HN every other week: your privacy-focused, open-source Google alternative for "insert niche here", with big hopes of disruption.

I will give you my two cents. I have used DuckDuckGo, Bing, Searx, etc. for extended periods of time and hated every one of them. The problem is that what you search for is essentially the gateway to the wild west of the internet. I understand the proposition of spam control in search engines, but at least to me, the early days of Google, without DMCA takedowns and copyright bans, were Google at its best.

I fear SEO spam control will only bring the worst of the moderated internet. It would not be the first time big tech tried to douse a gasoline fire with more gasoline because they thought the new fire would suffocate the old one by starving it of oxygen(?). Rather than using "AI" as a crutch to solve SEO as a problem, I want to see an option that is true to 2005-era Google.


Go read any default subreddit on Reddit to see what this idea would look like long-term, especially the "amateur police" part.


The whole thing seems "simple" to me – graph of identities with url vetting/liking/approve-this-message-like actions, you don't need anything else.

Reputation, non-fakeness etc. can be derived from it for anybody - you just list identities you trust/follow (with weights?) and anything you look at can be scored.

Virtual identities can also be created, i.e. an identity listing all links mentioned on HN (with positive sentiment only?), links from Wikipedia, etc., so people can follow those to create their reality graphs.

The interesting part is that it doesn't claim universal truth: depending on who you follow, your results will be skewed towards their view of the world. I.e. if you follow MIT, Wikipedia and E. Musk you'll see a different version of truth than somebody following Fox News and the Flat Earth Society, for example.

It could be interesting to focus on "dislike" marking (only?) as it may be much more lightweight to approach it from blacklisting side.
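A minimal sketch of the scoring this implies; all data here is illustrative:

    # Each identity endorses (+1) or flags (-1) URLs. Your score for a
    # URL is the weighted sum over the identities you choose to follow.
    follows = {"me": {"wikipedia": 1.0, "hn-links": 0.5, "emusk": 0.3}}
    endorsements = {
        "wikipedia": {"https://example.org/deep-dive": +1},
        "hn-links":  {"https://example.org/deep-dive": +1,
                      "https://spam.example/listicle": -1},
        "emusk":     {"https://spam.example/listicle": +1},
    }

    def url_score(viewer, url):
        return sum(w * endorsements.get(ident, {}).get(url, 0)
                   for ident, w in follows.get(viewer, {}).items())

    print(url_score("me", "https://spam.example/listicle"))  # 0.3 - 0.5 = -0.2

Note that the same URL scores differently for different viewers, which is the point: no universal ranking, just your graph's opinion.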


Searching code is also impossible on Google. If there’s a competing search engine for that I’ll use it at least for this use case.


Give Neeva a try. We have improved ranking and some nice features around tech queries.


hot take: why would I need to enter my email to do a search online? You already lost me :o


* We're adding a code finder as a topic search engine on Breeze.
* publicwww.com is also a good source code search engine.


If I remember correctly, it used to be called About.com - with categorized and human curated links.

It was big during the dot com days, but withered after Google.

Interestingly, I do think that that model may need to be revisited.

Edit: I feel that Reddit is filling some of this need, at least for things like vacuum and espresso machines with dedicated spaces.


It's funny you mentioned About.com. Or as it's known today, Dotdash, AKA IAC.

It's easily one of the largest SEO players out there, and they've been on a buying spree recently with their purchase of Meredith. The quality of the content has gotten better, but it's still a monster of an SEO optimized content machine.


Yes, I know nothing about what they’re doing today. In fact my pihole blocks them, must be in some list. So I’m pretty sure they’re crap today.

I just remember the concept from way back.


What are some search categories that are so dominated by spam that they are unusable?

I'll start: "how to rent a car" [0]

[0] worth noting that I personally get somewhat reasonable results for this, with a 3rd result from nerdwallet.com and a 4th from wikihow.com, both of which seem to answer the question in an unbiased way


Nerdwallet and wikiHow are both SEO spam content farms. They just happen to have above-average quality content.

They don't exist without a search engine.


Not sure how you're differentiating "seo spam content farm" from "website"


That is part of the problem


Google is not important because it has all the information - it's important because it has hardly any.

A major complaint is that there used to be good free reviews of commercial products that could be easily found.

That is not "all the information". Information about the current round of commerically advertised products is something like 5-10% of all commerce (or less).

And we are entering a world where "all the information" is what we do all day, what we say, how we react to different stimuli.

Those are the real review sites: why do people take this train and not that one, why is that park safe and this one full of muggings.

We need to solve the Google problem not because we want blogging like it's 2009 but because epidemiology is about to open humanity's eyes. And it's going to hurt if we don't make it free and open.


Google’s job is to serve its customers, and it does that really, really well.

The problems being discussed today (and yesterday in the similar thread) come from the fact that for Google user != customer.

When you have incentives that are misaligned like this, you can only go so far! We seem to have reached that point with Google, where there is not much more that can be done on the search experience front without jeopardizing customer experience (ad revenue).

Disclosure: I’m working on a paid search engine to solve this problem on a fundamental level, by aligning the incentives and making user also the customer so we can best serve them and their needs. It is called Kagi and is currently in closed beta accepting beta-testers.

https://kagi.com


Big money in SEO, I had an acquaintance all the way back in the early 2000s that had tens of thousands of domains that he ran experiments on to reverse engineer how the algorithm worked. He also had tens of thousands he ran SEO/link networks on. He made a lot of money for a long time by being front page for a lot of terms.

Same thing is happening today, there are just more of these actors doing it. They just game the algorithm for terms related to products. Notice you still get decent serps in Google for terms that don’t relate to something that can be sold using an affiliate link.

Fairly easy problem to fix, but Google would have to hire a black hat to help solve it. But the good ones ain’t gonna work there.


Google is so spammy I now instinctively use other search methods at times. Which is very interesting, because doing so is high friction. But it's so spammy out there that pain(spam) > pain(friction).

It is not to be "un-Google" but because I get better results.

For example searching in a good subreddit can be more fruitful, giving answers from genuine people in moderated parts of the internet. If you get crap then try another subreddit - some mods are better than others.

Is this a business opportunity - I think so, although I have no idea how you would go about it. Maybe a decent search engine for programmers would be a good start! E.g. "Exception Message XYZ" + site with decent answers.


In 2013 I elaborated on this topic: http://blog.databigbang.com/letters-from-the-future-challeng... I would add that in 2021 we can easily do Natural Language Understanding (NLU) and Natural Language Generation (NLG) and can build zillions of web pages that don't follow the original page-ranking concept of Google. Probably important sites share fewer low-rank pages, and there are many more link rings and clusters. Decentralized blogs seem a thing of the past (expecting to be rebooted in the future).


Could this issue be related to Gmail's spam filtering? For approximately 2 years now it's been downright porous. I'm getting on average 1 obvious spam message in my inbox that is something like:

c0nGrats-You_HaVe_Won_ThE_Pr1ze!

..Or some silly variation of this that takes literally 0.1 ms for a human to discern that it's spam. Yet something happened to Gmail's spam algorithm in the last couple years that has been consistently letting these through. To be fair, it does catch most spam but it's only batting something like 75% and the spam it does catch is often times much less obvious to human eyes than the stuff it lets through.


A problem is that good SEO doesn’t meant good quality. And assessing quality is hard, so people lean on other people to assess quality (either by appending something like “Reddit” to their search queries or asking friends irl or on twitter/discord).

I wrote a bit more about search engines competition and problems/opportunities here - How Alternative Search Engines Can Win Users https://konaraddi.com/writing/2021/2021-08-05-on-search-comp...


People don't want to search anymore. they want to see well-categorized data. For example, instead of searching for cheap vacuum cleaners, I think they want a site that lists vendors that sell cheap vacuums.


I for one am optimistic what a "post-search" world might look like. Maybe a lot like the early web. I don't think affiliate links themselves are necessarily the problem. I'll gladly use a link from a high-quality reviewer to give them a little kickback. SEO seems to be the issue. Maybe we end up with trusted brands for reviewing specific things. For example, I trust outdoorgearlab.com for pretty much anything camping related, and no purchase comes to mind that I've regretted yet.


One example: the website called gitmemory, which crawls GitHub data regularly and has better SEO than GitHub, so you will usually find its results above the original GitHub links.


Hey, smart people: It's called CURATION BY HUMANS.


I'm not very well versed in SEO but isn't this just good old Goodhart's Law?

Come up with criteria to determine which websites are "better quality". Measure them, rank them, put the ones that fit the criteria best at the top.

On the other side, there's the people promoting their websites. Do what you can to get as close to Google's ideal as possible through whatever means. Profit.

At this point the criteria becomes useless for any real quality analysis.


Affiliate links are environmental toxic waste and it would be only logical to tax such affiliate payments to fund cleanup and mitigation efforts.


Who would write good or even decent content for free?


I guess Paul's definition of "beating Google" is "creating a startup without a clear revenue path, aiming to be acquired by Google or a competitor", as I can't think of any meaningful way a niche search engine would provide a good enough value proposition against existing Google competitors or embeddable search engines (as well as SaaS like Algolia).


I just switched to Duck Duck Go yesterday and was not impressed. When I searched "define hot take" Google gives me a canonical, prominent definition with bold-font title and button to hear the pronunciation. DDG gives me multiple definitions in a normal search results page with none prominent, and no way to hear the pronunciation.

I'll be switching back to Google


Key difference you seem to miss: Google doesn't want you to leave the search engine (unless it's ads), so whether you look for a translation, sport results, weather or definitions they hoard content and shove it on their page. I agree it's often convenient, but realize Google does more than provide links.


Good idea, Paul; I had a similar one, but there's no way you could do it manually. Machine-learning algorithms need to detect spam, not people, because otherwise the search engine can't scale. If people were marking what's good content and what's not, such a search engine would be reduced to content curation, not organic search and discovery.


A lot of the spam results just seem to be copy pasted content.

I wonder how difficult it is to compare the main body of text in search results, then, if it is over a 95% match with another site (i.e. it has been copy-pasted), demote it in the search results. If a site generates too many of these demotions then it gets blacklisted from the index.


I have experimented with using LSH (Locality-Sensitive Hashing) to identify similar documents, among 50k documents in total.

My LSH implementation is here: https://github.com/loda-lang/loda-rust/blob/develop/script/t...

Example of the 100 most similar documents: https://github.com/neoneye/loda-identify-similar-programs/bl...

There can be false positives, so after LSH then do a more in-depth comparison.
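For anyone curious, here is the same idea as a minimal sketch using the datasketch library (the 5-word shingles and 0.8 threshold are arbitrary choices):

    # Shingle the page text, MinHash it, and use LSH to pull candidate
    # near-duplicates cheaply; verify candidates with a full comparison.
    from datasketch import MinHash, MinHashLSH

    def minhash(text, k=5, num_perm=128):
        tokens = text.split()
        m = MinHash(num_perm=num_perm)
        for i in range(max(len(tokens) - k + 1, 1)):
            m.update(" ".join(tokens[i:i + k]).encode("utf8"))
        return m

    original = "how to make a white sauce melt butter add flour whisk in milk"
    copy = "how to make a white sauce melt butter add flour whisk in milk slowly"

    lsh = MinHashLSH(threshold=0.8, num_perm=128)  # ~80% Jaccard similarity
    lsh.insert("original", minhash(original))
    print(lsh.query(minhash(copy)))  # candidate dupes; may be false positives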


How would you avoid throwing the original site out with the bathwater?


Maybe try to timestamp the page; presumably the earliest page is the original source. Could also combine it with a site reputation rating or something similar.


I believe Google tracks click-throughs from search results pages, which in theory should provide plenty of insight into which links aren't really working for specific keywords and which are, thus helping improve or reduce the rankings of SEO-laden sites.

Wonder if someone can shed light on why this isn't effective.


> Maybe ultimately you open up spam fighting to your users. If you managed this well, you could harness a lot of energy.

Doesn't Google already consider that if a user returns to the results page (or clicks a second link) then the first link visited was not satisfactory. Seems like a pretty elegant solution.


That's to Google's benefit though, they get another chance to present some adverts to the user.


It'd be really interesting if Google allowed upvote/downvote on search results... but it'd be super hard to imagine them ever taking the votes into account much versus ad revenue.

And the upvote/downvote would be very tricky to implement in a way that the SEO crowd couldn't just game it horribly.


Clicking a result is essentially an upvote.

Immediately returning to the results page is essentially a downvote.

You can't really crowdsource this stuff, because the problem of brigading and other forms of abuse is way too high. Just imagine what the crowdsourced results for "trump won" or "trump lost" would look like, or hydroxychloroquine, or ivermectin, or to go with some older cults of personality, Hitler or Ataturk.
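For what it's worth, the implicit signal is easy to state; here is a toy version (the 30-second dwell cutoff is an arbitrary assumption):

    # Implicit vote from click logs: a quick "pogo-stick" back to the
    # results page counts against the result, a long stay counts for it.
    DWELL_CUTOFF_S = 30.0

    def implicit_vote(clicked, dwell_seconds, returned_to_serp):
        if not clicked:
            return 0
        if returned_to_serp and dwell_seconds < DWELL_CUTOFF_S:
            return -1  # bounced straight back
        return 1       # clicked and stayed

Even this is noisy: a good page can answer the question in five seconds, and as other commenters note, people reflexively mash the back button.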


I have wondered about this.

When I run web sites I frequently look at the log and find a large fraction of the traffic is from search engines. This is a problem because it costs me money to serve that traffic. It might not be initially obvious but it costs more than serving real users because the search engines will scan everything and break the cache.

Google sends a significant amount of traffic. Bing sends a detectable amount of traffic. Baidu's crawler might be more active than the two of those together but I never get hits from Baidu. Other crawlers deliver me trouble instead of value: even if I'm not interested in hosting pirate or plagiarized content, a crawler that is looking for trouble is only going to bring me trouble.

I hate doing it but I turn off crawlers other than Google and Bing both at the robots.txt and web server level because I just can't afford to serve Baidu queries.
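Concretely, the robots.txt side looks something like this; well-behaved bots honour it, while the troublesome ones need blocking at the web-server level too:

    # Admit only the crawlers that send traffic; an empty Disallow
    # means "allow everything" for that agent.
    User-agent: Googlebot
    Disallow:

    User-agent: Bingbot
    Disallow:

    User-agent: *
    Disallow: /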

I'd like to sign an exclusivity contract with a search engine such that they get exclusive access to crawl it and in turn I get a privileged position in search results. This would give the search engine and myself an incentive to deliver end-to-end quality results.


We still need a search engine that actually blacklists everything serving ads. Google beat altavista, now we need to beat google.

I mean no mincing about- recipe sites that are ads are blocked. Results with pixel tracker etc are blocked. Hell, results that are paywalled are blocked because they're useless.


Google's ranking algorithm shapes the web.

And the web now looks like a 1500-2000 word listicle with 3 images, because that is what the ranking algorithm favours.

If you find the info you need and leave quickly, that actually down-ranks the page. That is idiotic. Pages that give you what you want quickly are punished!


I think it will be hard to create a great search engine while the web works as it does today. Maybe there could be something like a sitemap, but for text content: content structured, indexed, and signed by a trusted party in a way that makes it easy to analyze for plagiarism and so on.


For developers, you can remove some spam websites from Google and other search engines, with these uBlock filters: https://github.com/quenhus/uBlock-Origin-dev-filter


Most of their complaints are not related to Google Ads, which means the poor results are not there because of profit motives.

Moreover, they mostly concern a specific type of search query, likely a result of broad-based ranking algorithms that are, loosely, the most efficient ranking system.


Google search results are garbage, at least from a developer's perspective.

Most of the results are poorly formatted content "gathered" from stackoverflow, github, quora, etc.

And from a "person who wants to see an image" perspective, Google is purely a gateway to Pinterest or Gettyimages.


How about a platform for curation. Curators who know a subject well can link to content that looks good to them. Search goes through the curators, people can favorite certain curators. Lots of people like to curate. This is a better idea than trying to go after spam.


301 Battlefield moved: curator spam.


Amazon’s search results and scammy third party sellers are a similar trust problem. When possible I try to purchase directly from the product manufacturer’s website. Similarly, I don’t search Google for product reviews, I go directly to trustworthy review websites.


Improve your search engine results with this one weird trick!

Just block any domain containing the word pinterest


Another thing that does not help is how some sites gate content from being scraped. Also, forums are not as popular today, again reducing the amount of indexable content. Consider how some sites have migrated from forums to something like Discord.


Good. In fact, if we want people to visit websites other than google.com (and then read the answer in the snippet or the box in the sidebar) then it's good that google results are crap. Use google less.


Is there a search engine for programming? One that not only searches stackoverflow, github, relevant subreddits and the other big sites, but also finds programming articles in personal blogs?

That would be valuable to me.


Here is a handy list of alternatives to google search

https://fabform.io/a/alternative-search-engines


If anyone has the skills to work on something like this please email me (email address in my profile.)

I can show you a demo. Just to show I am not screwing around: if you don't like the demo I will pay you $500.


What are the pros and cons of a user generated tagging system? If you have a community of dedicated individuals who maintain a group of tags, searching those tags should yield high quality results.


> And boy would Google find it hard to follow you down that road.

This is a good perspective. Where can Google not go? Places that don't lead to profit. They will try (cough Wave cough) but will give up.


But isn't profit the point of any company? If you're going down a path that doesn't lead to profit, you'll fail whether Google follows you or not.


No. That's why Non-Profits, 501(c) corps in the US, exist.

E.g., Linus wasn't looking for profit, and Linux ate the world.


> Lots of people want to be amateur police. And boy would Google find it hard to follow you down that road.

Kinda like they tried with YouTube Heroes?

But then, who’s to say you won’t get the same kind of backlash?


Ah, so everyone wanted to move from carefully crafted personal websites, where every detail counts and low-effort publications are harshly punished, to platforms with guaranteed readership, and now we have a curation problem?

Someone (who probably doesn't have a website) said that comment moderation on your own website is too much work. Perhaps the whole internet is too much work?

But I like the spam search engine, by and for spammers, as a way of finding the latest and greatest affiliate marketing and blockchain swindles.


Several people mention DuckDuckGo in that Twitter thread. I use DuckDuckGo for my main search engine, and it's not obviously any better than Google regarding SEO spam.


I don't think they apply much spam fighting to the results they get from the underlying search index (Bing), but I could be wrong.


If you were to start a search engine, what stack would you use?


Typesense looks easy to use.

But with 3x memory needed for the indexes, the server costs probably aren't going to be bootstrap'able.

Especially for a "small" crawl of a billion web pages, even at just 10 KB per page.


Eh, I run a 100M-index off consumer hardware in my living room. Very doable if you avoid bloated off the shelf solutions.


What search software do you run?

What sort of memory and space do you have on the single server?

What's the average document size that you index?

Genuinely curious on how doable a modern search engine is on modern hardware.


I run all custom software; I feel most off-the-shelf solutions aren't very resource-effective.

The server has 128 GB RAM and the index currently fits on a single 1 TB SSD plus a 480 GB Optane drive.

I find the average document clocks in at 7 KB, in terms of raw HTML. In the index that's, dunno, probably less than 1 KB/doc.


Google CSEs, Bing API, or our own YaCY instances are more or less what we do atm for various topic search engines as we bootstrap Breeze.


www.neeva.com and www.kagi.com: two privacy-oriented search engines with results and features surpassing Google's (did I mention that they're ad-free?)


Okay, given that we have pretty successful examples of Wikipedia as general crowdsourced information storage and Stack Overflow as a specialized-domain crowdsourced Q&A site, would it be impossible to build a crowdsourced search engine? Not even scraping the web: I would just type my search term, and if it had already been searched and the results voted on, I would see those. If it was a completely new search term, I would get no immediate results, but my search would be displayed on a "new searches" page, which some volunteers would be following and trying to add relevant results to.


“You might need to do a lot of manual spam fighting initially“

How would this be limited to "initially"? Wouldn't it be a lot, initially, and then only get worse?


... and the moment you gain some traction, the SEO monster will train its eye on you like Sauron; and without a billion-dollar budget, you will be toast.


It would work until you got big enough, then you'd end up following the same path as Google, as that's where the money is.


> Lots of people want to be amateur police. (pg)

This is very true. How many times have I clicked on a site met with ads so bad that the browser slows down, and after 10 seconds the page gets covered up by more and more crap and then a paywall shows up sometimes too. Now here's the thing - a competitor to Google might detect you clicking back and then pop-up a special set of controls near the search result that lets you say: "too many ads" or "paywall".

However, if such an engine were to start beating Google, I'm sure Google would implement it in their own way: automatically detect why you clicked back in such a short timespan.


> However, if such an engine were to start beating Google, I'm sure Google would implement it in their own way: automatically detect why you clicked back in such a short timespan.

Do you seriously believe that Google doesn't use that as a datapoint already?


Google already detects immediate returns and knocks that link off for you. What's problematic for me is that I tend to reflexively mash back and forward, and the link just disappears on me.


Sooner or later you’ll have to deal with ballot-stuffing - companies trying to bury their competitors by casting lots of negative votes.

Perhaps ML will help in detecting such campaigns.


The issue is and will always be monetizing. Anyone competing with Google will need to have a robust monetizing strategy to survive.


When the search engine is funded by ads, there is incentive to produce results that people who click ads like.


One approach would be to have moderators from the community who are allowed to make decisions about results.


> What would a paid version of Google Search results look like - where Google can just try to give me the best possible results and not be worried about generating revenue?

God please no. YouTube premium shows what Google would do, i.e., they would further ruin the free experience by ramping up the amount of ads you see to "incentivize" the premium search.


Premium offerings like that are amazing simply for the fact that you can 'return' to the days where you obtained services by paying for them directly, not by looking at ads and paying with your mindshare. Google and YT aren't free services, and it's a miracle they continue to be accessible with ad blockers enabled.


Orthogonal to my point. Deliberately worsening the free experience after you introduced a premium service is a dark pattern for UX.


Eventually search will become a decentralized activity (No, not a web3/crypto/coin type decentralization, I am talking about the useful type).

Is there any particular reason why internet search has to have a distorting gatekeeper to the global commons (one that pretends to play Maxwell's demon)? For chrissake, the stuff being indexed is public.


>Eventually search will become a decentralized activity (No, not a web3/crypto/coin type decentralization, I am talking about the useful type).

People care about UX, not about technology; remember that, unless people are willing to sacrifice good UX in order to have greater security and privacy. These things are tricky and there is no right formula.


Not all that Google indexes is public: paywalls/loginwalls allow Google's IPs to crawl information unhindered, but as a user you cannot, so a new search engine would have to get to a scale popular enough for others to open up. Quick example: Google can index many news sites, or LinkedIn profiles, that a regular user with no account cannot.


That's true, but it's probably something that can be tackled later, and in any case it would not be a show-stopper for creating a valuable alternative. (There are similar thorny issues around IP, e.g. for news sites.)


His suggestion is basically to become DMOZ.org, if you are old enough to remember it.


As I pointed out to paulg yesterday, this was exactly the business model/concept that Blekko was created to address. The idea being that one could use "slashtags" to curate web sites that were "good" on a topic (not spammy) and pull results from those rather than the general web. Guess what? It works great! Also, it doesn't make enough money to support the company using advertising.

For a couple of years, Blekko ran a "3 card monte" game where we white listed the results from Google, Bing, and our own index. For every "contested" query, Blekko consistently beat the others by a significant margin. If the query wasn't contested, Bing and Google did about the same, and if the query was obscure, typically Google did better than Bing or Blekko.

What is a "contested" query? That is one where there is a lot of money on the line. My favorite one was "best credit card" (which is search engine shorthand for "What is the best credit card?" because the stop words "What", "is", and "the" are removed).

Why is it contested? Because if you put an advertisement into the results of that query, and the person making it clicked on that link and signed up for a credit card, you could be paid $50 or more. For a single click. Other queries that advertisers would pay well for getting the traffic of the user were, car dealerships, hotel chains, jewelry retailers, and university "referral" services (like the one that was busted for getting people into Ivy League schools by faking academic records).

Extremely few people click on an ad put onto a page of search results for the query "what is shoe rubber made of?"[1]. However it is required to serve queries like that so that people will come back when they are looking to spend money on something.

So using the same exact idea that Paul proposed Blekko built an English language index which allowed you to curate the crap out of your search results and return much better data. The "value" of that was not considered to be high enough to insist on people logging in to use the engine. Knowing an id for the person making the query allowed for user specific blacklists of spammers (so if for example you never wanted to see a Pinterest link in your results you could make that happen).

Without sufficient traffic, feedback-loop ranking algorithms of the form "of these documents, which one was clicked as the 'best' answer?" fail to converge rapidly enough for decent ranking.
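
For illustration, here is roughly what that feedback loop looks like with Bayesian-style smoothing (all names and numbers are made up); the point is that low-traffic documents barely move away from the prior, which is the convergence problem:

  def smoothed_ctr(clicks: int, impressions: int,
                   prior_ctr: float = 0.1, prior_weight: int = 100) -> float:
      # With few impressions, the estimate stays pinned near the prior.
      return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)

  docs = {"doc_a": (3, 20), "doc_b": (40, 500), "doc_c": (0, 2)}
  ranked = sorted(docs, key=lambda d: smoothed_ctr(*docs[d]), reverse=True)
  # doc_c has zero clicks but only 2 impressions, so it still scores
  # near the prior: you need traffic before the click signal means anything.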

Without a credible threat that exclusion from your index will greatly reduce a site's traffic, it is difficult to negotiate with web sites to permit crawling rather than deny your crawler via their robots.txt file.
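
That gate is literally a few lines of code on the crawler's side, checked here with Python's standard urllib.robotparser (example.com and the bot names are placeholders):

  from urllib.robotparser import RobotFileParser

  rp = RobotFileParser()
  rp.set_url("https://example.com/robots.txt")
  rp.read()  # fetch and parse the site's robots.txt

  # An incumbent's bot is often allowed; an unknown upstart may not be.
  print(rp.can_fetch("Googlebot", "https://example.com/article"))
  print(rp.can_fetch("UpstartSearchBot", "https://example.com/article"))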

Blekko's best customers and most ardent fans? Reference librarians. Yup, people who needed web search to do their jobs, not to find the movie times for the latest feature. Blekko never did try to create a subscription service, but I think such a service, priced somewhere between free and the $$$ of LexisNexis, has a shot, at least as a lifestyle business. You still need to get rights to the data, and that gets harder and harder.

[1] Okay, bots do, but humans don't


We are working on exactly this problem.

If anyone wants to see a demo, please email me.


Why not Searx or YaCy?


How will these search engines interoperate?


I also consider paywalls to be spam. Clicking on a link and finding out that it is paywalled is a massive waste of time.


Agreed, that is annoying, but if you start excluding such results, then how does one find that type of content?


You clearly label paywalled content with a symbol or image.


I think the only issue there might be that Google may be unaware that content is paywalled, given how many sites allow crawlers access to content but not users (based on crawler IP ranges). Agreed, such a flag would save time when available, as would a search filter option to skip those results.
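
One rough heuristic a search engine could use to detect this, sketched under the assumption that the site keys on User-Agent (as noted, many sites key on crawler IP ranges instead, which this would miss):

  import urllib.request

  def fetch(url: str, user_agent: str) -> str:
      req = urllib.request.Request(url, headers={"User-Agent": user_agent})
      with urllib.request.urlopen(req, timeout=10) as resp:
          return resp.read().decode("utf-8", errors="replace")

  def looks_paywalled(url: str) -> bool:
      as_crawler = fetch(url, "ExampleCrawler/1.0")  # hypothetical bot UA
      as_user = fetch(url, "Mozilla/5.0")            # ordinary browser UA
      # If anonymous visitors get far less content than the crawler did,
      # flag the result with a paywall symbol in the UI.
      return len(as_user) < 0.5 * len(as_crawler)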


How about Bing?

Is it viable competition?


Do Google search engineers use ad blockers?


Ranking search results on popularity is flawed. It may improve search engine performance and the effectiveness of online advertising but it penalises users who can think critically and independently. There is an underserved market that has been left behind by Google and its "competitors".

PageRank seemed to borrow from the concept of citation count: the idea that "importance" could be measured by the number of times a webpage, like a paper published in a peer-reviewed academic journal, was referenced by other webpages, like other papers published in peer-reviewed academic journals. The initial name for the project, before "Google", was "Backrub", referring to the reliance on "backlinks" to quantify importance.
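
To make the citation-count analogy concrete, here is a minimal PageRank power-iteration sketch over a toy three-page web (illustrative only, not Google's production algorithm):

  def pagerank(links: dict[str, list[str]], damping: float = 0.85,
               iterations: int = 50) -> dict[str, float]:
      pages = list(links)
      rank = {p: 1.0 / len(pages) for p in pages}
      for _ in range(iterations):
          # Each page passes its score to its outlinks, split evenly,
          # damped toward a uniform prior.
          new_rank = {p: (1 - damping) / len(pages) for p in pages}
          for page, outlinks in links.items():
              for target in outlinks:
                  new_rank[target] += damping * rank[page] / len(outlinks)
          rank = new_rank
      return rank

  toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
  print(pagerank(toy_web))  # "c" has the most in-links and ranks highest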

An index of a commercially-oriented www full of sites supported by online advertising is nothing like Web of Science or some other database collection that allows ranking by citation count. The www has no peer-review and no limits on commercial activity.

Google succeeded in creating something highly profitable and sometimes useful, but the founders never delivered on their original promise. That was a search engine in the academic realm, where the technical details were public, and one that would be free from the influence of advertising.^1 Instead the project was turned into an online advertising business. A 180-degree pivot.

The moral/ethical debate went from the question of being advertising-supported to the question of invading the personal privacy of users for the benefit of advertising. Whatever ideas the founders held in 1998 regarding the influence of advertising on web search were overtaken by the lure of pure financial success. Once opposed to the idea of using cookies for advertising purposes, the founders were persuaded to purchase DoubleClick, a company with a terrible privacy record that uses cookies and purchasing data to profile users as ad targets,^2 for almost double what they paid for YouTube. Not sure what, if any, moral/ethical debate remains today. While the company is being sued simultaneously by hundreds of plaintiffs, including the US government, one of the founders is "hiding out" on a small island in the South Pacific. Whatever motivations he had to make an open, academic search engine free from the influence of advertising, they seem to be gone.

In sum, the world still needs a decent web search engine free from the influence of online advertising.

1. https://infolab.stanford.edu/~backrub/google.html

Excerpts:

"Up until now most search engine development has gone on at companies with little publication of technical details. This causes search engine technology to remain largely a black art and to be advertising oriented (see Appendix A). With Google, we have a strong goal to push more development and understanding into the academic realm.

Appendix A: Advertising and Mixed Motives

Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users.

For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

Furthermore, advertising income often provides an incentive to provide poor quality search results.

[T]here will always be money from advertisers who want a customer to switch products, or have something that is genuinely new. But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm."

2. https://www.nytimes.com/2000/02/17/technology/us-investigati...

https://slate.com/technology/2005/11/why-web-surfers-love-to...


What a great idea.


I'm honestly not trying to take a potshot against PG or YC here, but it's kinda funny to see him saying this after I worked for a YC-backed startup years ago that built its core revenue streams around generating SEO spam; we just marketed it as something else. Just to be clear, I don't think PG or YC are responsible for all or even most SEO spam, but I know firsthand that they've profited from it through at least one of their incubated companies.

I never considered the possibility that an incubator would support a specific product, then later on call for alternatives that would essentially freeze out the original product that they supported. I'm sure this very rarely happens, but it's interesting to see a real-world example in action.


I don't think it's as counterintuitive as it sounds - just because they're playing the game doesn't mean that they think it's right. If your options are to not be successful at SEO or to do the SEO spam thing, I don't think it's necessarily wrong to do the latter - it's your job to make your startup successful, not to make a stand against the way Google does things.

I view it as something like the rich folks who call for additional taxation of the rich. They're not going to just pay extra money that they don't have to under the current tax rules, both because it's not particularly fair and because one person paying extra taxes, even if they're very wealthy, isn't going to make a big impact. That doesn't mean they can't lobby to change the rules and be totally fine with it if everyone is paying additional taxes.


Don't know how rarely it happens. After all, weapons dealers frequently arm both sides of a conflict. They don't really need to care who wins - they're just making twice the amount of money.



