We're talking about packages that don't even come with code
> More than half of all new packages that are currently (29 Mar 2023) being submitted to npm are SEO spam. That is - empty packages, with just a single README file that contains links to various malicious websites.
Yeah, once you cut the obvious ones they'll get smarter, but at least some will leave to look for an easier target.
Spammers just try to find something that ranks high in SEO and costs them nothing; if the repository stops being that, most will leave. Most other package repositories don't have that problem to such a degree.
> unlisting a valid package could break project
... and about packages that are most likely NOT used as a dependency anywhere
> Let’s say there’s 10 spam uploads per hour and it takes you 1 second to verify a package is spam and remove it. That’s 30 minutes a week just dealing with spam. While I was on the .NET package manager, we had the on-call engineer handle this thankless chore.
No need. Just add a flag button so a package can be flagged for a check. Users will do the flagging, so at least you won't have too many valid packages to verify.
> Could you detect these packages at upload time? Yes, but spammers will change their patterns once the package ecosystem gets too effective at detecting current patterns. Perhaps machine learning could help, but often times package manager teams are small and don’t have expertise in this area.
With AI, I'm afraid it might get awfully close to "newbie user just publishing a package full of shit code".
> Spammers just try to find something that ranks high in SEO and costs them nothing; if the repository stops being that, most will leave.
This is not true. Spammers will continue trying even if you are very good about removing spam packages. Source: worked on a package manager for 5 years.
> Most other package repositories don't have that problem to such a degree
They do, you’re just not seeing it because they’re actively removing packages. That said, NPM is the largest package ecosystem and likely receives the most spam.
> Users will do the flagging, so at least you won't have too many valid packages to verify
The trick is to have detection that’s accurate enough that you feel confident removing packages without human intervention.
Package managers have likely already built lots of tooling to detect potential spam and then bulk remove it. That’s how they manage thousands of spam removals per week in a reasonable amount of time. Nonetheless, human verification is necessary due to the “left pad problem”: unlisting a valid package could break the projects that depend on it. This takes time due to the sheer quantity of spam.
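To make that concrete, here's a rough sketch of what a first-pass triage could look like, using only the signals mentioned in this thread: no code beyond a README, a README stuffed with links, and no dependents. Everything here is made up for illustration (the `Package` fields, `spam_score`, `triage`, the link threshold); it is not how npm or any real registry does it, and the output is a queue for a human to review, not an auto-delete list.

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")


@dataclass
class Package:
    # Hypothetical record; a real registry would expose different fields.
    name: str
    files: list[str]
    readme: str
    dependents: int


def spam_score(pkg: Package) -> int:
    """Count how many of the three spam signals from the thread apply."""
    score = 0
    # Signal 1: nothing in the package except a README and the manifest.
    other_files = [
        f for f in pkg.files
        if not f.lower().startswith("readme") and f != "package.json"
    ]
    if not other_files:
        score += 1
    # Signal 2: README carries several links (threshold of 3 is arbitrary).
    if len(URL_RE.findall(pkg.readme)) >= 3:
        score += 1
    # Signal 3: nobody depends on it, so unlisting can't break a project.
    if pkg.dependents == 0:
        score += 1
    return score


def triage(packages: list[Package], threshold: int = 3) -> list[str]:
    """Return names to put in the human review queue, not to delete outright."""
    return [p.name for p in packages if spam_score(p) >= threshold]
```

Requiring all three signals keeps anything with real dependents out of the queue, which is the whole point of the left pad concern. Spammers would adapt to rules this simple, so treat it as a filter that shrinks the manual workload rather than something that replaces the human check.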