The Age of PageRank Is Over (2022) (kagi.com)
76 points by dotcoma 80 days ago | 66 comments



Can the script on this then be flipped? Build a search engine, clearly smaller in scope and commercial utility, that heavily de-ranks any site linking to a payment or ad network. The end result should, in theory, be filled with what one would consider the "old" internet: primarily blogs and sites not trying to sell you things or abuse your data.

None of the large companies would do it, but that would be the point.


Introducing: Kagi.


That's not what Kagi does. Nor does its "Small Web" search mode, as it only searches blogs that have been manually added to a specific GitHub repo (so for the most part it is a collection of US tech blogs, not very diverse at all).

No VC-backed or commercial search engine would do what OP is talking about. But I can see a use for a niche search engine that ranks websites inversely proportional to the number of trackers and ad networks they depend on. Heck, I would pay for that, but I'm a nerd.
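To make the "inversely proportional" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Page` fields, the `1 / (1 + tracker_count)` formula, and the example URLs are hypothetical, and it is not how Kagi or any other engine is known to rank.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float    # base relevance from any text-matching step (assumed given)
    tracker_count: int  # distinct tracker/ad-network hosts the page loads (assumed given)


def adjusted_score(page: Page) -> float:
    """Scale relevance inversely with the tracker count (illustrative formula)."""
    return page.relevance / (1 + page.tracker_count)


def rank(pages: list[Page]) -> list[Page]:
    """Return pages ordered from best to worst adjusted score."""
    return sorted(pages, key=adjusted_score, reverse=True)


pages = [
    Page("https://blog.example/post", relevance=0.70, tracker_count=0),
    Page("https://spam.example/top-10", relevance=0.90, tracker_count=12),
    Page("https://forum.example/thread", relevance=0.80, tracker_count=1),
]
ranked_urls = [p.url for p in rank(pages)]
print(ranked_urls)
```

With these made-up numbers the tracker-free blog ends up first and the tracker-heavy page last, even though the latter has the highest raw relevance. A softer penalty (for example a logarithm of the count) would keep lightly monetized sites from dropping as far.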

Maybe marginalia.nu would like this idea.


> not what Kagi does

Kagi maintains a "non-commercial index (Teclis) and non-commercial news index (TinyGem)" [1]. They also "prioritize non-commercial sources," implicitly downranking monetized sites. (Also, you can manually downrank problem domains [2].)

My devices have randomly switched back to Google or DDG from time to time. The first thing I did was check my ad blocker was working--I was simply stunned by the amount of blogspam puke.

> No VC-backed or commercial search engine would do what OP is talking about

No ad based, i.e. free, engine can, not sustainably. A paid one obviously can and does.

[1] https://help.kagi.com/kagi/search-details/search-quality.htm...

[2] https://help.kagi.com/kagi/features/website-info-personalize...


> That's not what Kagi does.

We do actually. We penalize the sites with lots of ads/trackers on them in our results and boost non-monetized pages. It is one of the main reasons Kagi results have a specific 'flavor' to them. (Kagi CEO here)


You surely have huge exceptions to this rule, or major silos like Reddit, Twitter, StackOverflow, Youtube, Facebook etc. would rank towards the bottom.

Which the average Joe wouldn't like, hence my comment that no commercial search engine would implement this feature correctly, on purpose.


Not really exceptions (all sites are treated equally); there is just nuance in the algorithm. Having some ads doesn't mean you go straight to the bottom. And when searching for youtube you clearly want youtube.com. So nuance is applied, but that is the principle driving it.



What’s the reason that search engine would ask for a login? (Like Kagi does)


Because it's a paid service? That's the entire point.

And that also enables tons of user-centric features they talked about, starting with the earliest ones and my favorites: being able to uprank and downrank domains (like, "ban pinterest", "pin Wikipedia", "downrank w3schools", "uprank cppreference", etc.), and adding rewrite rules to results (like "reddit.com" -> "old.reddit.com"). Both of these are personalizations tied to your account, so they're active on any device as long as you're logged in.
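A rough sketch of how per-account domain weights and rewrite rules like these could be applied to a result list. This is not Kagi's implementation: the `DOMAIN_WEIGHTS` and `REWRITES` tables, the numeric weights, and the `personalize` function are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical per-account preferences; names and weights are illustrative only.
DOMAIN_WEIGHTS = {
    "pinterest.com": 0.0,     # "ban": dropped entirely
    "w3schools.com": 0.5,     # downrank
    "cppreference.com": 2.0,  # uprank
    "wikipedia.org": 10.0,    # "pin": pushed toward the top
}
REWRITES = {"reddit.com": "old.reddit.com"}


def match(host, table):
    """Return the table value for host or any parent domain, else None."""
    for domain, value in table.items():
        if host == domain or host.endswith("." + domain):
            return value
    return None


def personalize(results):
    """results: list of (url, score) pairs. Returns rewritten, re-sorted pairs."""
    out = []
    for url, score in results:
        parts = urlsplit(url)
        host = parts.hostname or ""
        weight = match(host, DOMAIN_WEIGHTS)
        if weight == 0.0:
            continue  # banned domain
        if weight is not None:
            score *= weight
        new_host = match(host, REWRITES)
        if new_host is not None:
            # Simplification: replaces the whole netloc, so any port is dropped.
            url = urlunsplit(parts._replace(netloc=new_host))
        out.append((url, score))
    return sorted(out, key=lambda r: r[1], reverse=True)


results = [
    ("https://www.pinterest.com/pin/1", 0.9),
    ("https://www.w3schools.com/cpp/", 0.8),
    ("https://en.cppreference.com/w/cpp/container/vector", 0.6),
    ("https://www.reddit.com/r/cpp/", 0.5),
]
personalized = personalize(results)
print(personalized)
```

Because the tables are keyed to the account rather than the browser, the same adjustments would apply on any device, which is the point the comment makes about needing a login.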

They've added more cool stuff since, but these two alone were what has kept me a paying customer for the past years.


OK, the paid part of course makes sense.

However I still find it a bit creepy to know that they know all about my searches and even up- and down votes.

Google at least can be used in incognito mode.


Google knows what you search, even in "incognito" mode, and even when you're logged out. They correlate your IP address with your search profile and use everything you search for in your ad profile.

Kagi does not track search history, period. There's no history attached to your account even if you wanted it. The login is purely for authentication.


Those sites don't exist any more. Not literally of course, but effectively. There are probably only a couple tens of thousands left if I had to guess (and many of them are abandoned blogs with a couple dozen pages of no note, hosted by some server that has not yet been unplugged because it's physically lost). Also, good luck trying to find them (either electronically or physically).

(And I say that as someone who owns such a site.)

Case in point, I wanted to link to http://bash.org/?5273, but bash.org no longer exists.


You can use Wikipedia's built in search for that I guess.


Always found the concept of PR intriguing and the original paper to be “foundational”.

I wrote an alternate reality fiction short story about PR and SEO. It is here: https://github.com/jaronilan/stories/blob/main/Duplicitous.p...

Not the most “popular topic”, but thought people around HN might find it interesting.


I'm not convinced Google has stopped using backlinks and other classic pagerank attributes for search ranking.


Nobody said they stopped. The article argues that the practise has become infinitely less effective, and Google is optimizing for their customers as opposed to their users.


For many queries, it seems the top result is just Stackoverflow, Wikipedia or something similar, then a couple of organic links, and then just junk. And on the second or third page, the results just stop. I have a feeling that at this point, it is a mixture of white- and blacklisting, and single-page classifiers based on heuristics or "AI". PageRank is surely in there, but probably not dominant. (Sidenote: I wonder if they secretly do ignore rel="nofollow" because so many links use it.)

Funnily, I've used PageRank for some smaller search projects, and it still works very well if you use it in a vertical, say "educational resources" or "programming" or similar. I just did a broad crawl, 3-4 steps away from some known "good" seed URLs, calculated PageRank and mixed it in with Solr's classifiers.
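For readers who have not seen it, the approach described above can be sketched roughly as follows: compute PageRank over the crawled link graph by power iteration, then blend it with a text-relevance score. The tiny graph, the `blended` helper, and its 0.3 weight are assumptions for illustration; the commenter's actual crawl and Solr setup are not shown in the thread.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def blended(text_score, pr, max_pr, weight=0.3):
    """Mix a text-relevance score with normalized PageRank (weight is a guess)."""
    return (1 - weight) * text_score + weight * pr / max_pr


# Toy link graph standing in for a crawl a few hops out from a seed URL.
graph = {
    "seed": ["a", "b"],
    "a": ["b"],
    "b": ["seed"],
    "c": ["b"],
}
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)
print(top, ranks)
```

In a real setup the PageRank value would typically be stored as a numeric field on each indexed document and folded into the query-time score; how exactly that is wired into Solr is left open here.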


I think you are right. They might be using a plethora of other factors in addition to backlinks; however, genuine links from authoritative sources still seem to be the best "voting" mechanism to bubble up good content/results.

I can imagine that Reddit could join the search war - first, they have so much user gen content that people are deliberately looking for; second, they could use the voting mechanisms that are already in place to give preference to the "better" content. Of course, I might be wrong :).


The part where you might be wrong is the part where Reddit does it. IIRC they have been “working really hard on a better search” for Reddit for…15 years? Every couple years spez will pop up and claim they almost have it!

I wouldn’t hold my breath.


There are some subjects that are personal that no one will be comfortable searching for while it's tied to a billing address (i.e. any login).

Abortion in some states for example. It doesn't matter what any company says right now, the position is clear that the tune can and often does change the moment it's inconvenient for them.

Paying for the privilege of being targeted is crazy talk.


> while it's tied to a billing address (i.e. any login)

Total false equivalence. Kagi accepts Bitcoin [1]. Logins are not identifiable in the way billing addresses are.

> doesn't matter what any company says right now, the position is clear that the tune can and often does change the moment it's inconvenient for them

You think the free engines don't know who you are?

[1] https://blog.kagi.com/accepting-paypal-bitcoin


Bitcoin is traceable. Not a false equivalence at all. Search history is kept tied to a profile/semaphore/data lake and then also tied financially to the way you pay, as well as the devices you use.

When data is collected, it will be trawled through for anything and everything. Absence of data is just as unique. The only way is to blend into the masses generically.

Free engines have a very hard time differentiating me from the mass of other web traffic.

I compile my own browser to make it that way though, even my phone doesn't register an overnight location, and runs GrapheneOS.

Importantly this should be a fundamental right of every person not just those that have the money and expertise to enforce it. The data collection shouldn't be allowed; it's the equivalent of quartering a digital soldier in every home (something forbidden by the constitution).

Any business funded and dependent on arbitrary preferential loans (from the printer, regardless of how indirect) is state-run and nationalized industry, and should be bound by the same constitutional requirements of government.

I take my privacy seriously because of my job, and the indisputable fact that sensitive roles get targeted.


> this should be a fundamental right of every person not just those that have the money and expertise to enforce it

Sure. But it isn’t, and as long as the choice is free, ad-supported and paid, privacy contractually (but not technically) guaranteed, there is an obvious answer as to where the smart money goes. And that’s before we get to search quality.

A unicorn third option would be nice. But if even people who can pay for this aren't willing to, it condemns the idea of a public option in the cradle--it screams people don't value privacy in search enough to a quantifiable degree.


If you're a big fan of privacy and Kagi, then I'm not sure if you can appreciate the content of this article. After all, it describes their desire to build a search agent that knows everything about you, from your religion to your politics to your favorite brands.


People post video evidence of them literally committing felonies to their own personal public social media accounts. I know "no-one" is hyperbole but risk perception and tolerance is much more variable than it would need to be for this to be persuasive.


What if you don't care about privacy and only care for search quality? For example, when you're doing work related searches at work on your work device. Is that allowed?


When colleagues ask questions in Slack, I sometimes paste Kagi's search summary. Quick and usually spot-on.

Funny thing is, I've told the team about Kagi, but not everyone's willing to pay/see the benefits yet. Meanwhile, I'm wondering why they're asking if a good search engine could answer it so easily.


Amazingly prescient. The original version of that article was written in 2019


I used to get sent a monthly cheque for $1K around the turn of the millennium simply to have a link on my home page, such was the power of Pagerank back then. This was handy as it allowed me to invest time into learning web development, plus the site was educational and paid for its existence.

Producing content was harder back then and the web was a lot smaller. Google most definitely still use links to rank but are much more likely to discount/devalue links than they did historically.


Buying backlinks is still around and costly.


No doubt. Search still struggles to measure intent from the content creator PoV.


I like the idea to pay for quality content - or search results. If you don't pay for it, someone else will and the "sponsor" will probably have different objectives than you. So, you won't get the best information and it would be likely to waste money because of that (e.g. buying an advertised low-quality product).


Discussed at the time:

The Age of PageRank Is Over - https://news.ycombinator.com/item?id=33537513 - Nov 2022 (373 comments)


I will do you one better though, given my conversation with teenagers this weekend: Pagerank is dead, because webpages are dead.

The only moment these kids use a search engine is when they do homework. In any and every other moment they just search "locally" in TikTok, or insta.

It scares the shit out of me.

(Edit: paying kagi customer here! Keep it up, kagi! I still love and need you!)


>because webpages are dead.

I think it's more accurate to say that the internet and perhaps personal computing in general is dead.

Practically everything about what makes a computer tick has been abstracted away, because rightfully or otherwise people today just don't care. When was the last time you saw someone actually using the address bar in a web browser instead of Googling? Or indeed TikTok'ing or Insta'ing. Nobody knows what a file or folder is either.


The majority of people never did personal computing. Remember the Eternal September where internet users were saddened by hordes of normies messing up the net? I feel like there was probably a spike around the early 2000s in number of people doing personal computing. Since then it's gone back to normal. Most people aren't interested and have no need for it. Tiktok etc is just the new TV: something to mindlessly rot away in front of. We also need to remember that even those people who did use computers never used them how we use them. They used Windows and Microsoft stuff. Hardly in control of their own computing. Part of me is saddened by it, but then I wonder if that isn't the case for everything: cyclists are sad about all the cars, cooks are sad about all the ready meals etc.


> We also need to remember that even those people who did use computers never used them how we use them. They used Windows and Microsoft stuff. Hardly in control of their own computing.

That's just elitism.

MS and Windows gave access to computers to millions of people who otherwise wouldn't have been interested. MS products allowed them to be productive, and enabled thousands of businesses to function. MS was instrumental in the explosion of the internet and WWW in the 90s. No niche hacker-oriented or consumer OS had as large of an impact as Windows.

We can argue whether MS has lost its way since then, but claiming that people weren't in control of their computing in the 90s because they used MS products is silly. The aggressive tracking, SaaS business models and everything else MS is criticized for today came much later.


Hmm.. this is the line where it becomes elitism? Google enables people to be productive and thousands of businesses to function. What's the difference between using Google and a PC in the 90s? That you own the hardware? The hardware is useless to most people without the software. OK, so it's the data. Well, Microsoft were (and are) champions of vendor lock in. Even when they claimed to use open standards they messed it up (see the Office XML formats). If anything, Microsoft conditioned people to believing that personal computing was just running the software in your own house. It doesn't mean you are meaningfully in control of it, though. For me, it's only a short jump from there to what we see today where people just do the computing on someone else's computer too.


> What's the difference between using Google and a PC in the 90s?

Well, you answered that yourself. Software in the 90s was running locally, and the data it used was also local. That's an important difference from running software on someone else's computer with data you don't control. That access can be cut off at any point, and there's nothing you can do about it. With local software and data you would at least have the option to export the data to another format (assuming the software supports it), or to read the data with other software. For the specific case of MS Office formats, even the closed ones prior to the XML formats were readable by 3rd party software. I would qualify this as being in "control of their computing".

Vendor lock-in doesn't deprive users of this control. You have exactly the same issue with F/LOSS today. Whether you choose to use PostgreSQL/MySQL or Oracle/Access, you're always locked in to that specific vendor. Hopefully that vendor has good integrations and interoperability features so that you can migrate to something else if you wanted to, but the actual license has no bearing on this.

I suppose our disagreement is with what it means to be in "control of computing". If you define it as being able to read, modify and share the source code, then proprietary software doesn't fit that definition. But most users don't practically need this level of control. As long as the software has decent interoperability, and you actually have control over how and where you run it, then proprietary software can qualify. SaaS, OTOH, does not.


> Even when they claimed to use open standards they messed it up (see the Office XML formats).

Yeah, sure -- but that came long after the 1980s / early '90s, which was the period at issue here.


> When was the last time you saw someone actually using the address bar in a web browser instead of Googling?

I used to laugh when people Googled a website's name instead of entering it manually. But these days, I find myself often either Googling (or DuckDuckGoing usually, but as a verb that just doesn't have the same ring to it) for the name, or relying on autocomplete from my bookmarks or history.

I feel website names have become less predictable mainly because of the explosion of possible top level domains: even if I know the exact name of a website, I can't reliably remember which TLD to use. Plus I'm more and more worried of accidentally using the wrong URL (through a misspelling or a wrong TLD or whatever) for fear of ending up on some scam site instead of the real thing.


>Plus I'm more and more worried of accidentally using the wrong URL (through a misspelling or a wrong TLD or whatever) for fear of ending up on some scam site instead of the real thing.

I also Google for websites I really should know by heart more often than I want to admit.

Why? Because I've accidentally typed googl.com or google.colm and then immediately slammed ALT+F4 enough bloody times that I actually no longer trust myself to type straight.

At least if I typo "googl" into Google I'll either get an autocorrected result or utter gibberish instead of a drive-by trojan to my face.


> everything about what makes a computer tick has been abstracted away

It’s always been abstractions. Even when we didn’t know it. (We mastered the technological use of transistors before we understood why they work.)


> people today just don't care.

When did people 'care'? Complex technologies have always been pushed on people. Regulation, consumer protection etc. is supposed to be the informed, delegated 'caring' filter.

The interesting long-term dynamic is that the problem is self-correcting. If you use amazing technologies to breed masses of addicted, exploited idiots instead of informed empowered citizens, eventually your walled garden will collapse onto itself.


>When did people 'care'?

Back when using a computer required having some idea of what was going on. Even if all you really cared about was playing Doom, you still needed to make sense of all those levers with nonsensical labels on them and boy did we figure them out.

Mind you, I don't see where we are today as necessarily a bad thing. It's a very good thing that most people can just use a computer as just another tool like a screwdriver or a car.

But on the other hand, we've also lost the joys and miseries of getting our hands dirty.


Yes, I also think the phenomenal adoption of computing is fundamentally a good thing as it unlocks new levels for society, at least in principle.

In the short term it triggered a race to a low-information, consumerist bottom in many respects (privacy invasion, addictive dark patterns, locked-down platforms, general enshittification etc.).

But this state of affairs, while unfortunate and a lamentably gross waste, does not feel terminal. It is very unfulfilling beyond superficial sugar rushes and essentially hostile to the user-product being exploited. Especially so for the talented people that itch to get their hands dirty.

Informed and able minorities are what moves the needle, not dazed and confused masses. The history of (online) computing does not end here. It is still very early days.


I think it's a fair point to say that most of humanity will probably be dead within 20-30 years. You have far more destructive people in positions that only make survival less likely, compared to the intelligent ones.

The people who made things work will die of old age or withdraw their support (on strike), and LLMs will prevent new workers from developing the same expertise (since entry level positions will be removed).

Systems that have stood strong as oaks for centuries will suddenly fail, and with that collapse so too goes the food production.

Non-market socialist systems don't work, but you get those same systems during currency collapse (where ponzi outflows exceed inflows, or debt growth exceeds gdp).

No one knows a thing because socialism has done its dirty work, having captured academia over the past 50+ years, and indoctrinated the masses.

The benefits were front-loaded (as all Ponzi schemes are), and it happened slowly enough that no one noticed over multiple generations, and the generation that got the most benefit won't cede political power (they took power in the 1990s, and remain the majority today). They'll give it up only once it's pried from their dead hands, which will come from natural aging.

Menticide from the Totalitarian state has a deleterious effect, making it harder for people to see the problems to take any action. Joost wrote extensively about this with regards to the Nazis and Mao.

What we are seeing today is hubris and a natural consequence of ignoring lessons learned.


"I think it's a fair point to say that most of humanity will probably be dead within 20-30 years."

Why do I even read this stupid fucking website


To be exposed to ideas and observations that may be outside your normal perception.

Forewarning of the issues when it matters, following rational principle, is being forearmed in preparation (should you choose to act on that warning and hedge your survival bets for yourself and loved ones).

Those that accurately predict and prepare survive. Those that don't are culled when the environment becomes disadvantaged towards survival.

There is always an element of chance, but adaptation and flexibility greatly bias towards continued persistence.


> When was the last time you saw someone actually using the address bar in a web browser instead of Googling?

I see the guy in the mirror every once in a while, why?


To be fair, I usually type the first few characters of the website I want to go to, and then pick the thing I want from the autocomplete results.


I run a website and that doesn’t seem to be the case. Some things don’t fit into a TikTok video. Some things can’t be answered by ChatGPT. People still search for things and find me.


> paying kagi customer here! Keep it up, kagi! I still love and need you!

What is it with the Kagi sycophancy on this site? Right now there's yet another discussion on the front page full of glowing Kagi adoration interspersed with some reality checks about the actual quality of the search results. I understand that Kagi is a Y-Combinator company but does it have to be laid on so thickly?

As to what I use for search: a self-hosted SearxNG instance (a meta-search engine, i.e. it proxies search results from other search engines; I started with Searx but followed when development moved there) combined with Recoll for local search, plus recoll-webui and the recoll 'engine' to integrate results into SearxNG. I also experimented with YaCy (a fully self-hosted search engine with its own web crawler) but have not gotten usable results yet; the system seems to get bogged down once the index grows beyond a certain size.


Just try it, I guess. I was also sceptical and ran my own search engine and at one time I was even pitching people a paid indexing service that would allow people to self-host their search engine and by pooling money they gain world-class crawling.

But in the end, I noticed that for me, improving search results is mostly about suppressing garbage. Kagi lets me filter out Pinterest and some of the worst SEO spam farms. And with them gone, the results already feel much better.

I'd guess Kagi is popular here because they sell what people crave.


It really reminds me of Roam Research, which was also an overpriced, niche product that was unavoidable for quite some time in these circles. Now if you google it (or kagi it?) you just see a bunch of reddit posts asking if Roam is dead.

You'd think Kagi has millions of paying users, and not ~33k, seemingly half of which are on HN.

Hopefully this time around they don't actually start referring to their community as a cult like Roam did.


Kagi is not a YC company. They're privately funded, and their last round was for less than a million dollars and was from private investors.


Didn't even know they were a ycombinator company and now that I do I like them a lot less.

I am not a sycophant. Just a happy customer. And happy to have the chance to be a customer, not just a data point. I love kagi trying to serve me, with actual usefulness, instead of serving some dark marketplace of advertisers that have no interest in my wellbeing.


> Didn't even know they were a ycombinator company and now that I do I like them a lot less.

You didn't know it, because they aren't.


Maybe it's a good product or maybe they paid dang off. Two possibilities. Which one is more likely?


I've come to realize the same thing. I found out that looking for restaurants on Google Maps is the "boomer" thing to do, then I realized younger people use TikTok for this now. And here I thought TikTok was just for memes and funny videos.


In China, there's WeChat, which is basically the everything app, from chat, to food delivery, to ecommerce, payment, and even navigation. Normally, that would violate App Store guidelines, but they are dominant enough in the Chinese market that Apple caved in and granted them special permissions.


If they didn't, Apple would have been banned. That's the power of the CCP.


Don't forget indoctrination and demoralization for thought reform as well. Beams marxist reeducation right into every teen's life in subtle ways.

There's good reason it's being considered a national security threat when all user data is being shipped through China, and their businesses are a government partnership.

Political Warfare and Subversion are real and difficult issues to deal with given our 'open' society.


> It scares the shit out of me.

It should, because any time austerity or other adverse circumstances happen where life is on the line, these are the first people that end up dying, and worse, the responsibility for the travesty lies entirely with the previous generation in terms of political power. These are summer children, and winter is coming.

People aren't naturally stupid. It's a long process of drugs and torture that makes them like that. Only the drugs are fluoride and dopamine, and the torture is arbitrary struggle sessions built into every process to eliminate rational reason and thought where they have to interact with it in society during critical identity formation periods (permanently damaging them).

No doubt there will likely be a great dying in the near future. You can only kick the can so many different times in cycles before those cycles all line up at the same time. Mother nature is a bitch, and the unprepared die when safety nets fail.

The first most important and crucial tool for survival is having your brain and knowing how to reason following rational first principles and thought.


(2022)



