Ask YC: How come spammers are not attacking YC News?
27 points by adityakothadiya on Oct 31, 2008 | 19 comments
I run a niche social news site part-time, and it has a very small community. It's growing slowly, but I'm fine with that. What I'm worried about is how to stop spammers from submitting irrelevant stories to my site.

Adding a CAPTCHA is one option, but then I noticed that YC News doesn't have any CAPTCHA protection either. So how come spammers don't submit advertising and other irrelevant links to YC News?

Does the YC News algorithm detect such links? Is there any manual intervention? Or is it just that the community is so good that nobody attacks it?

In any case, your input on how I can tackle this situation would be very helpful. Currently I manually go and delete all the irrelevant submissions (there are at least 5-10 such submissions daily).

-Aditya




They are. We currently get about 30-40 spam submissions a day. Turn on showdead in your profile and you'll see it all. The reason we don't get more is that we're very aggressive about killing spam. Most spammers give up eventually when they realize that submitting here generates near-zero traffic.


Thanks PG for your advice.

BTW, when you say "aggressive about killing spam", you mean killing it manually, right?


Some spam gets killed automatically. Some gets flagged, either by filters or by users (there is a flag button on stories after you get over a certain karma), and killed manually by editors. It's very rare now for a spam submission not to at least get flagged.
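Roughly, a flag-and-review workflow like the one described here might look like the Python sketch below. This is not HN's actual code; the karma and flag thresholds and the Story/User types are made up purely for illustration.

    from dataclasses import dataclass, field

    FLAG_KARMA_THRESHOLD = 30   # assumed: only established users see the flag button
    AUTO_REVIEW_FLAGS = 5       # assumed: this many flags sends a story to the editors

    @dataclass
    class User:
        name: str
        karma: int

    @dataclass
    class Story:
        url: str
        flags: set = field(default_factory=set)
        dead: bool = False       # dead stories are hidden unless "showdead" is on

    editor_queue = []            # stories awaiting a manual kill/approve decision

    def flag(user: User, story: Story) -> None:
        """Record a flag from a sufficiently trusted user and escalate if needed."""
        if user.karma < FLAG_KARMA_THRESHOLD:
            return                                   # new users don't get the button
        story.flags.add(user.name)
        if len(story.flags) >= AUTO_REVIEW_FLAGS and story not in editor_queue:
            editor_queue.append(story)               # an editor then kills it manually

    def editor_kill(story: Story) -> None:
        story.dead = True                            # visible only with showdead enabled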


I think there are algorithms in place that auto-flag spammy submissions to bring them to the attention of moderators more quickly, but as far as I know, actual deletions are manual.


30-40 a day? The OP has it right then. Reddit probably gets 30-40 a second.


Pretty much every site with user-generated content is overwhelmed with people trying to post spam. For my site, I use a combination of JavaScript human detection, Bayesian filtering, and aggressive human intervention (including single-click "spam this" links on every piece of content when logged in as an admin).

It's worth noting that since late 2007, a significant portion of comment spam has been human-powered. CAPTCHA-style bot filtering doesn't work against it, since it's not bots doing the posting. Bayesian filtering and good moderation tools are essential these days.
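For what it's worth, a bare-bones version of the kind of Bayesian word scoring I mean might look like the Python sketch below. It's not my actual filter; the tokenizer, the smoothing, and the 0.9 hold-for-review threshold are all assumptions for illustration.

    import math
    import re
    from collections import Counter

    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(spam_docs, ham_docs):
        """Count word occurrences in known-spam and known-good submissions."""
        spam, ham = Counter(), Counter()
        for doc in spam_docs:
            spam.update(tokens(doc))
        for doc in ham_docs:
            ham.update(tokens(doc))
        return spam, ham, len(spam_docs), len(ham_docs)

    def spam_probability(text, spam, ham, n_spam, n_ham):
        """Naive-Bayes-style score in [0, 1]; Laplace smoothing keeps unseen words harmless."""
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        spam_total, ham_total = sum(spam.values()), sum(ham.values())
        for word in set(tokens(text)):
            p_spam = (spam[word] + 1) / (spam_total + 2)
            p_ham = (ham[word] + 1) / (ham_total + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(min(log_odds, 50.0), -50.0)   # keep exp() from overflowing
        return 1 / (1 + math.exp(-log_odds))

    # Usage: anything scoring above ~0.9 goes to a moderation queue instead of straight up.
    # spam, ham, ns, nh = train(known_spam, known_good)
    # if spam_probability(submission, spam, ham, ns, nh) > 0.9:
    #     hold_for_review(submission)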


Sounds like a good business opportunity.


The sites most vulnerable to spam are ones that a) have a critical mass of readership, especially dumb readership that will click on ridiculous spam links, and b) are not run by people who are active contributors to the field of spam filtering.


I can't see how not being an active contributor to the field of spam filtering makes your site vulnerable to spammers. Not being vigilant against spammers, yes, but you don't need to be in the industry to combat this problem. The dumb readership comment speaks for itself; let's get off the high horse, bro.


I think the point was merely that pg has spent a lot of time thinking about the problem of spam (it's one of the things he's famous for), and he also wrote the software that runs HN; when those two facts combine, you end up with software that has many mechanisms for automatically preventing spam. Being involved in the fight against spam means his site probably makes use of more cutting-edge techniques than sites built by folks who have never dealt with spam before. HN probably also has a much higher editor-to-submitter ratio than most sites, so a human with the privileges needed to kill spam usually sees it long before it hits the front page.


They're all afraid of Paul Graham.


I think a lot of it also has to do with the community itself. A site of this size would probably get 300-400 spam messages a day if it weren't for the fact that its audience would see right through them. Tech people are so conscious of spam that they ignore it on principle, which makes spamming a tech site pointless.

As for suggestions...

1. Obviously, CAPTCHA. It just makes sense.

2. I find keyword blocking very effective. For example, if I were running Hacker News I'd block any news item containing the word Viagra that was submitted by a user under a certain feedback level (like no feedback at all). With one caveat, which is to give them a way to manually verify it (say, an email sent to them that lets them verify they are an actual person and have the item approved).

3. Use email spam blocklists. Lists like the SBL, CBL, and XBL give IP addresses that generate massive amounts of email spam. Many of those same IP addresses generate web spam. (A rough sketch of points 2 and 3 follows below.)

4. I've never been a fan of this particular method, because I think it's discriminatory to an extent I'm uncomfortable with, but many places have special requirements for countries that are famous for spam generation (Russia, China, etc.), like making users from those IPs jump through special registration hoops.
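Here's a rough Python sketch of points 2 and 3: keyword blocking for low-reputation users plus a DNS blocklist lookup. The keyword list, the karma threshold, and the choice of zen.spamhaus.org as the zone are assumptions for illustration, not a complete policy.

    import socket

    SPAM_KEYWORDS = {"viagra", "cialis", "casino"}    # assumed high-certainty terms
    MIN_TRUSTED_KARMA = 10                            # assumed feedback threshold

    def hits_keyword_block(text, user_karma):
        """Only low-reputation users are subject to the keyword check."""
        if user_karma >= MIN_TRUSTED_KARMA:
            return False
        return bool(set(text.lower().split()) & SPAM_KEYWORDS)

    def listed_in_dnsbl(ip, zone="zen.spamhaus.org"):
        """True if an IPv4 address is listed in the given DNS blocklist."""
        reversed_ip = ".".join(reversed(ip.split(".")))
        try:
            socket.gethostbyname(f"{reversed_ip}.{zone}")
            return True                               # any A record means "listed"
        except socket.gaierror:
            return False                              # NXDOMAIN: not listed

    # A submission that trips either check gets held, and the user is emailed a
    # verification link (the caveat in point 2) rather than being silently dropped.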

Hope it helps!


I don't see how captchas "just make sense", especially in the most common image-based incarnation. I have worked with visually impaired people, and the most popular request was always "I want to do something on this website, but they have a captcha I can't see (and occasionally an audio captcha that makes no sense); can you sign me up/comment for me/do whatever task?"

As a sighted person, I've even run across captchas that were impossible to decipher, both from some third-party solutions and from something like reCAPTCHA, the latter of which bothers me to no end because sometimes both words are ambiguous.

Whether or not they make sense depends on your audience, your site, and your implementation.


Point 2 sounds powerful, but it would make the submission process less simple and maybe less user-friendly for new users.


Well, you'd only do it for words that are almost certainly spam, like Viagra or male impotence or... well, you get the picture. It works on the theory that "this word would almost never be used legitimately in a post, so it's almost certainly spam."

I use this on my mail server and with 200 users I've yet to ever get a false positive.


> I use this on my mail server and with 200 users I've yet to ever get a false positive.

How do you know? I don't see how you would measure that; if you can figure out that something is a false positive, you have discovered a better filter. You might get user complaints, but the absence of user complaints doesn't prove you have no false positives. (Although the presence of user complaints could prove that you do.)

Also: The assertion that everything to do with viagra is spam makes it very difficult to have a discussion about viagra or spam. For example, this posting would be rejected.


If you read my initial post, I said specifically that it can't be just a flat-out block. What you do is hold it and send an email to the person who posted it, asking them to verify they are an actual person.

That's both why it works even if you want to discuss viagra and how you can tell if you are getting too many false positives.
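A minimal Python sketch of that hold-and-verify flow is below; the mailer, the token store, and the example.com URL are placeholders rather than anything from an actual system.

    import secrets

    held_posts = {}   # token -> post awaiting human verification
    approved = []     # posts confirmed by a real person

    def hold_and_notify(post, author_email, send_email):
        """Hold a suspect post and mail the author a one-click confirmation link."""
        token = secrets.token_urlsafe(16)
        held_posts[token] = post
        link = f"https://example.com/verify/{token}"   # placeholder URL
        send_email(author_email, f"Confirm you're a real person to publish your post: {link}")

    def verify(token):
        """Called when the link is clicked; confirmations also double as a false-positive count."""
        post = held_posts.pop(token, None)
        if post is None:
            return False
        approved.append(post)
        return True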


Thanks a bunch for your excellent suggestions. I'll look into them.


We have a problem with comment spam on our site (a news and prediction market site, using Drupal). We introduced CAPTCHAs and activated nofollow, all to no avail: there are some very persistent spammers who will still go through the trouble of entering captchas just to have their stupid links show up at the bottom of comment threads. It's not a huge issue, but it's definitely an irritant and an added cost in terms of the staff time required to clear it out.





