
This sounds like an early version of DCC: https://www.rhyolite.com/dcc/

At first glance, I don't see anything that DCC didn't do. What did I miss?

It seems DCC isn't using word vectors at all? With word vectors you can tell that viagra and v14gr4 are effectively the same word, because they are used in the same way in messages. That in turn means you don't need word lists, and can instead build on large pre-trained embedding sets like GloVe.
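To make the "used in the same way" point concrete, here's a minimal sketch. It is not GloVe and not how DCC or any real filter works: it just counts neighbouring words in a made-up toy corpus and compares the resulting context vectors with cosine similarity. The messages, the window size, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(messages, window=2):
    """Map each token to a Counter of tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for message in messages:
        tokens = message.lower().split()
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus: the obfuscated spelling appears in the same contexts.
TOY_MESSAGES = [
    "buy cheap viagra online today",
    "buy cheap v14gr4 online today",
    "order cheap viagra pills now",
    "order cheap v14gr4 pills now",
    "meeting notes attached for review today",
    "please review the attached meeting notes",
]

vectors = context_vectors(TOY_MESSAGES)
sim_obfuscated = cosine(vectors["viagra"], vectors["v14gr4"])
sim_unrelated = cosine(vectors["viagra"], vectors["meeting"])
print(f"viagra vs v14gr4:  {sim_obfuscated:.2f}")
print(f"viagra vs meeting: {sim_unrelated:.2f}")
```

The two spellings never appear on a shared word list, yet they end up with matching context vectors because their neighbours are the same. Learned embeddings are dense and trained on far more text, but the intuition carries over.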

That, and the fact that a message is sent in bulk isn't actually a very strong indicator that the message is spam, at least in the email world. As one input to a filtering system, it can be useful, but not as a rule applied on its own without consideration for other factors.
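As a rough sketch of "one input, not a rule on its own": a filter can fold the bulk signal into a weighted score alongside other evidence. The signal names, weights, and bias below are hypothetical values picked for illustration, not taken from DCC or any real system.

```python
import math

# Hypothetical weights: bulk sending counts for something, but much less
# than failed sender authentication or spammy content.
WEIGHTS = {
    "sent_in_bulk": 0.8,
    "failed_sender_auth": 2.0,
    "spammy_content": 2.5,
}
BIAS = -3.0  # assumed prior: most mail is not spam


def spam_probability(signals):
    """Logistic score over boolean signals; returns a value between 0 and 1."""
    score = BIAS + sum(WEIGHTS[name] for name, present in signals.items() if present)
    return 1.0 / (1.0 + math.exp(-score))


# A legitimate newsletter is bulk mail but trips nothing else.
newsletter = {"sent_in_bulk": True, "failed_sender_auth": False, "spammy_content": False}
# A spam run is bulk mail that also fails the other checks.
spam_blast = {"sent_in_bulk": True, "failed_sender_auth": True, "spammy_content": True}

print(f"newsletter: {spam_probability(newsletter):.2f}")
print(f"spam blast: {spam_probability(spam_blast):.2f}")
```

With these made-up numbers the newsletter stays well under 0.5 even though it is bulk mail, while the message that also fails the other checks lands well above it.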

