
If the spammers only want to be indexed, then NPM should disable indexing for major search engines, while still allowing it to be indexed in other ways that don't surface on Google search.

Another idea: don't index new packages until they've garnered enough downloads.
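For what it's worth, the Robots Exclusion Protocol already supports blocking specific crawlers. A minimal sketch of what an npmjs.com robots.txt could look like, assuming the goal is to keep package pages away from the major engines (Googlebot and Bingbot are the real crawler tokens; the policy itself is hypothetical):

  # Hypothetical robots.txt for npmjs.com: keep package pages away from
  # the major engines' crawlers, but leave the site open to everyone else.
  User-agent: Googlebot
  Disallow: /package/

  User-agent: Bingbot
  Disallow: /package/

  # All other crawlers may still crawl and index package pages.
  User-agent: *
  Allow: /

Note that Disallow stops crawling rather than indexing as such; a per-page noindex meta tag is the stricter tool if the goal is to vanish from results entirely.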




As a developer, I want npm package information and docs to show up in search. I frequently prefer PyPI or CRAN results over others because then I can easily tell if it’s a usable package vs. just some snippet.

Especially CRAN, because it has pretty rigorous entry requirements, so being on CRAN is a signal of at least some minimal quality.


> As a developer, I want npm package information and docs to show up in search.

What case is there where you want to find a package in NPM, and information about that package, using Google? If you want information about the package, then it's fine if the NPM package page is missing from the results - so long as you're getting the package's homepage or git repo, that's plenty. From there you can get to its NPM page. If you know the package you're looking for, or if you know what you want to do, then searching NPM itself alone is fine.

Essentially, there is no overlap in the Venn diagram of "searching for a package" and "searching for information about a package". You want one or the other, not a results page with links to both.

If people realized this about their searches more often, Google could fix a lot of spam problems.


> there is no overlap in the Venn diagram of "searching for a package" and "searching for information about a package".

I don't know; if I want information about something, it seems pretty reasonable that I might search for that something.


If that's the case, then you're doing the second of the two searches, and if the NPM package wasn't in the results but its GitHub repo or homepage was, you're still getting the results you wanted.

For any search where you don't know what you want, Google without NPM pages works fine.

For any search where you do know what you want, NPM's search function works fine.

There isn't a case where you need Google to interleave pages from the wider internet with pages from NPM. You only think you want that because it's what you're used to, or because you use Google to do searches that you should really do on NPM instead.


> if the NPM package wasn't in the results but its Github repo or homepage was you're still getting the results you wanted

Or you're getting a GitHub page with a similar name, or worse, a malicious GitHub page that instructs you to install a typosquatted version of the npm package you're looking for.


Also, NPM is the only source that can show you the code you're actually going to get, whether you download and inspect the tarball or use NPM's built-in code explorer.

A GitHub page really isn't what I want at all when asking questions about an npm package, except that I'm used to its code browser, so I tend to click it out of habit.


I, like many other developers, am lazy. When I search for a package, or when I want information about a package, I just search using Google, same as when I search for anything else. No cognitive overhead to decide exactly where I should search.

Sometimes the top result for a package is its GitHub page; sometimes it's NPM. I don't particularly care which one it is, except that the NPM page very clearly shows the package name. But I do care that the results are there. And if NPM results disappeared from Google, I wouldn't remember to use NPM's search all the time.

Additionally, what part of the argument doesn't apply to GitHub as well? Perhaps they're better at filtering out spam repositories, but otherwise it's the same thing - free hosting on a domain with presumably high ranking on Google. And if that were also removed from Google's results, NPM packages wouldn't show up anywhere in search.


> What case is there when you want to find a package in NPM, and information about that package, using Google?

Because you might want results not only from the docs but from Stack Overflow and other places?

> Essentially, there is no overlap in the Venn diagram of "searching for a package" and "searching for information about a package". You want one or the other, not a results page with links to both.

Of course there is. I want docs, examples, and maybe comparisons with alternatives if I'm looking to solve problem X with an external dependency.


> Because you might want results not only from the docs but from Stack Overflow and other places?

> Of course there is. I want docs, examples, and maybe comparisons with alternatives if I'm looking to solve problem X with an external dependency.

You don't need the link to the package in NPM to be in the results for either of these examples.


You're trying real hard to tell people how they _shouldn't_ be doing their work. Maybe accept that your opinion, while valid, is just that - your opinion - and that others have their own equally valid ways of approaching their work and their searching?


I'm suggesting that it wouldn't be a problem if NPM switched off indexing, and if the article is correct that half of packages are spam, then it'd actually be significantly beneficial.

The broader point is that by expecting Google to be a single interface to the entire internet, and refusing to accept that there might be some places you need to go to directly, we make the problem of spam worse. Using Google for navigation when you know what site you want rather than using that site's search feature incentivizes spammers to abuse things they would otherwise ignore.


But I want a single search bar that just magically gives me the right results. Given the enthusiasm for GPTn I think a lot of people do too.

Whether that incentivizes spammers to spam, or Google et al. to improve their software (or risk being outcompeted), doesn’t really seem like a “me” problem. I can’t change these things.


I sometimes run topical searches like <protobuf site:npmjs.com> to discover packages if I don't know the package name ahead of time. It would be annoying if NPM were not indexed at all.


> It would be annoying if NPM were not indexed at all

More or less annoying than NPM being used for hosting spam?


That seems like a false choice. Deindexing is not the only way to solve spam. Plenty of other websites have found solutions and didn't have to pull themselves from Google.


What? Nearly every time I search a package name in Google, I'm trying to get to the npm page. And I want to find the matching npm page so I can click from there to the associated GitHub, since it's the most trustworthy way to know I'm browsing the source of that specific package.


> Nearly every time I search a package name in Google, I'm trying to get to the npm page.

This is exactly the point I'm making. It's very rare that you want both NPM package pages and internet results. If NPM weren't indexed, it'd solve the spam problem, and the only cost would be that people would need to think about what they're looking for and use NPM's search instead when they want the package page.


Ok, I see your point, but this creates another risk: you could end up on the GitHub page of an imposter repository that directs you to npm install a typosquatted, malicious version of the package you're looking for.


As opposed to Google serving a typosquatted, malicious version of the package above the one you're looking for, directly from the npm registry?


At least when you get to that page you can see download metrics, etc., which are not available on GitHub.

That's not to say you don't have a point. It's kind of a damned-if-you-do, damned-if-you-don't situation with multiple underlying and partially conflicting causes (typosquatting vs. SEO spam).

IMO, the best solution to the SEO spam is for npm to increase the burden of automated signup: add more CAPTCHAs or even phone verification, and trigger alerts when there are suddenly thousands of new signups or thousands of packages pushed from one account.
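A minimal sketch of the alerting half of that, assuming a simple sliding-window counter (the threshold and the notify() hook are made up for illustration):

  // Sketch: flag signup bursts with a sliding-window counter.
  // WINDOW_MS, MAX_SIGNUPS_PER_WINDOW, and notify() are illustrative,
  // not anything npm actually runs.
  const WINDOW_MS = 60 * 60 * 1000;    // one hour
  const MAX_SIGNUPS_PER_WINDOW = 500;  // tune to normal traffic
  const signupTimes: number[] = [];

  function notify(message: string): void {
    console.warn(message); // stand-in for paging, Slack, etc.
  }

  function recordSignup(now: number = Date.now()): void {
    signupTimes.push(now);
    // Evict timestamps that have fallen out of the window.
    while (signupTimes.length > 0 && signupTimes[0] < now - WINDOW_MS) {
      signupTimes.shift();
    }
    if (signupTimes.length > MAX_SIGNUPS_PER_WINDOW) {
      notify(`Signup spike: ${signupTimes.length} signups in the last hour`);
    }
  }

The same counter keyed by account would cover the packages-pushed-from-one-account case.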

Also, they could add rel=nofollow to all links on the page. This would make it less of an attractive target for SEO spam (but not entirely, since the page itself might still rank highly and the spammer doesn't necessarily care about getting link juice out of it, so much as getting traffic to the npm page itself).
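Concretely, that's one attribute on every outbound link npm renders; a sketch in TypeScript (the helper names are made up):

  // Sketch: render user-supplied links with rel="nofollow" so search
  // engines don't pass ranking credit to the target URL.
  function escapeHtml(s: string): string {
    return s
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;');
  }

  function renderOutboundLink(url: string, text: string): string {
    return `<a href="${escapeHtml(url)}" rel="nofollow">${escapeHtml(text)}</a>`;
  }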


There are a few, but a significant one is that I’m familiar with how npm organizes information, whereas package pages organize it in many different ways and sometimes put marketing spin on it.

So I like finding npm in my search results so I can see release history and other package metadata.

Also, like I said, npm is more trusted than lots of different developer pages so knowing something is a package is useful and not immediately apparent from going to a project page or GitHub repo.

It’s not that it’s impossible to find this info outside of npm, it’s that it’s easier to mix npm results in.

Also, generally I want to be able to search all relevant info in the universe. Trying to keep track of what exists and is excluded, especially if excluded to prevent spammers, is a waste of my thoughts.


Google search is an extremely common way to discover packages. Disabling indexing entirely isn’t a valid solution.

Downloads are very easy to fake. Usually package managers don’t allow indexing until the package and its author reach a certain age. This allows the team to discover and remove the package before it is indexed.
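A minimal sketch of that gate at page-render time, assuming a 30-day threshold and a hypothetical Package shape (neither is npm's real policy):

  // Sketch: emit a noindex meta tag for young packages or young accounts.
  // The 30-day threshold and the Package interface are assumptions.
  interface Package {
    createdAt: Date;       // when the package was first published
    authorCreatedAt: Date; // when the publishing account was created
  }

  const MIN_AGE_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

  function robotsMetaTag(pkg: Package, now: Date = new Date()): string {
    const oldEnough =
      now.getTime() - pkg.createdAt.getTime() >= MIN_AGE_MS &&
      now.getTime() - pkg.authorCreatedAt.getTime() >= MIN_AGE_MS;
    // Keep young packages out of search indexes so moderators have time
    // to catch spam before it can rank.
    return oldEnough ? '' : '<meta name="robots" content="noindex">';
  }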


That seems pretty extreme. Why not just add nofollow to links? That's what websites like Wikipedia do.


How do you garner enough downloads without being discoverable by Google?


It's a fair question. Most JS libraries I've discovered weren't reached via Google -> npmjs.com, but via the library's own page, GitHub, Hacker News, etc.

If I Google a library and end up on npmjs.com I usually just click on a link to the library's repository or home page first.

Of course, it would disenfranchise new packages a bit, but what other option is there?


Not npmjs.org's problem. Most languages' dependency managers don't give away indexed, flashy web pages for free either, yet discoverability is usually not a problem.


Which languages have dependency managers with a public registry that is not indexed in Google? pypi.org and docs.rs are both indexed in Google, for example. With docs.rs it's even kind of annoying because often the indexed page is for an outdated version of the package.

There's really no reason why the same spammer couldn't target those sites too.


> Other ideas include: do not index new packages before they've garnered enough downloads.

Which would be trivial to automate.



