Modern anti-spam and E2E crypto

petercooper · on Sept 6, 2014

It's amazing how little sender reputation can count for with Gmail in the face of other features, however. I have a good reputation as a sender but also send almost a million mails a month and I spend a lot of time investigating oddities in Gmail deliverability.

All of my mails are newsletters containing 10-30 links, and more than once I've found the mere inclusion of a single link to a certain domain can get something into spam versus a version without that link, often with no clear reason why (domains that are particularly new are one marker, though). Or.. how about using a Unicode 'tick' symbol in a mail? That can get a reputable sender into Spam versus a version without the same single character (all double tested against a clean, new Gmail account) :-) Or how about if you have a link title that includes both ALL CAPS words and ! anywhere? Your risk goes up a good bit, but just go with one of them, you're fine..

I now have a playbook based around numerous findings like this, some based on gut feelings looking at the results and some truly proven, and even with my solid reputation as a sender, I'm having to negotiate a lot content-wise each week. But do I like it? Yeah, in a way, because it's also what stops everyone else being a success at it.. Gmail sets the bar high! :-)

(Oh, a bonus one.. include a graphic over a certain size? Your chance of ending up in the Promotions folder just leapt up. Remove it, you're good. It doesn't seem to be swayed much by actual content. So I've stopped using images where at all possible now and open rates stay up because of it.)

higherpurpose · on Sept 6, 2014

I think the author has developed too much the kind of thinking he needed to fight spam at regular e-mail companies.

I think there could be multiple, relatively easy methods to avoid encrypted spam.

Someone here already suggested the first email being a "poke". And only if you send a poke back, would that user be allowed to send you an e-mail.

The user could also have some description about him, from his profile, appear when you hover over his profile image or whatever. If you receive an e-mail say from a company you're expecting to receive email from, then you could poke back, so they can send you that email. I mean there should be ways to make it easy for people to know who's a total stranger that could be a spammer, or someone trying to reach out to them for good reasons.

Then you could also have the emails under different labels by default. All the trusted e-mails would come to the regular Inbox, while the rest will go under a different label.

As you said, the email provider could also see the user's reputation over time, and if he's a spammer or not.

And these are just some easy solutions we can come up with almost immediately. I'm sure there can be others with a little bit more thought put into it. I certainly don't see encrypted email as some kind of "doomsday scenario" like the author predicts in the post.

petercooper · on Sept 6, 2014

Google does a little of this already although the mechanism is not as direct as your version. E-mail from certain senders gains "importance" based on your interactions with that sender, such as if you'd first sent a mail to that address, if you'd ever replied to that address, if you open a certain amount of mail from that address, etc. Mail from senders considered "important" is then more likely to hit your inbox.

It seems to work reasonably well, although there are some interesting ways you can game it. One I learnt from the Internet marketing world was some list builders (using legit methods, but perhaps promoting things that often get caught by spam filters) hire people or implement techniques to encourage new list signups to reply to mails sent from the same address as the list by asking them questions, etc.

mike_hearn · on Sept 6, 2014

Actually the "poke" method would work and I suggested it on a different thread on that mailing list. It's the S/MIME model although these days you'd just stick an ECC key into a header and sign it with DKIM, then upgrade the clients. Doesn't have to be technically complicated.

There are at least three major downsides:

1) You still leak lots of metadata and the full data of the poke including most obviously the subject line.

2) Do users understand that their spangly new "encrypted mail" actually fails to protect a lot of important data? What if they (gasp) came to rely on it? I'd want to see usability studies showing a clear understanding of what is protected and what isn't.

3) You break other features that rely on the server being able to see content, like search, and the ads that pay for all of this.

chii · on Sept 6, 2014

This mechanism would cause so much phishing that a whole new type of war would begin based on it.

DanBC · on Sept 7, 2014

How are pokes different to regular challenge response whitelists, and how does poking avoid the problems of CR?

haroldp · on Sept 6, 2014

Gmail deliverability and rendering is the new IE6.

TeMPOraL · on Sept 7, 2014

I'd say GMail deliverability is the new SEO - as long as you're honest, kind, and don't try to cheat/abuse people, you'll be fine.

RadioactiveMan · on Sept 6, 2014

Can you share that playbook? In particular the truly proven findings?

petercooper · on Sept 6, 2014

I need to codify it as it's just notes and numbers scattered across experiments for now, but it's something I plan to do as I want to blog about each example (along with all of the other weird things I've learnt in the e-mail business so far).

I realized I should add a note, however, that everything I've said only applies to bulk e-mail (and sent through systems with a reputation for such) and not transactional or manual e-mail which suffers from fewer oddities for obvious reasons.

hnha · on Sept 7, 2014

make it an ebook and you might get some money out of it ;)

baudehlo · on Sept 6, 2014

You think spammers aren't going to read it too?

RadioactiveMan · on Sept 6, 2014

I suspect that spammers already have their own.

baudehlo · on Sept 6, 2014

My real suggestion is that you share this play book and the spammers stop doing those things, making it worthless.

runeks · on Sept 6, 2014

> A possibly better approach is to use money to create deposits. There is a protocol that allows bitcoins to be sacrificed to miners fees, letting you prove that you threw money away by signing challenges with the keys that did so.

This wouldn't work, because a miner can easily pay himself any amount of bitcoins that he has saved up in fees, and include this transaction in his own block (not broadcasting it). Thus he can basically create these "deposits" for free, and sell them for a profit.

That's the thing: whatever you try as a counter-measure, you always come back to money: in the above scenario, money would replace "deposits" because "deposits" would just be sold on the open market for money. Proof-of-work becomes money: if something important requires proof-of-work, you can be sure that a web app would surface that performs proof-of-work in exchange for money.

It always comes back to money, because whatever restriction you put on something, whether it be "pay fee to Bitcoin miners", "Solve proof-of-work puzzle", or something else entirely, these things will always end up being sold for money in an efficient market, because of the increased efficiency of division of labor: why should I use my inefficient smartphone to calculate proof-of-work, when I can pay a service with custom ASICs to do the job for me at a fraction of the cost?

As far as I can see, the only alternative that can work besides money is something that cannot be sold for money. And I can't come up with anything that fits this requirement.

thefreeman · on Sept 6, 2014

Not sure why you were downvoted. While I may or may not agree with your opinion I think you expressed it in a completely reasonable fashion and made a number of interesting points.

tomjen3 · on Sept 6, 2014

But your smartphone is something you already have. If it takes ten seconds then you already paid for those ten seconds when you brought it.

Of course spammers can buy their services but when the price for normal sending is effectively free for normal users you can jack up the price for spammers to make it too expensive for them.

A nice benefit is that it forces sites to use something other than email, such as RSS, since they can't afford to send newsletters anymore.

comex · on Sept 6, 2014

Maybe, but the performance difference between a power-strapped smartphone CPU and an ASIC tailored for the specific task is so massive I doubt you could make it expensive enough for the latter while maintaining a reasonable experience for the former.

IgorPartola · on Sept 7, 2014

I see two solutions to it. First: have the phone include a chip specifically for doing this type of work. Second: make the calculation such that it cannot be performed faster on cheap hardware specialized for the task. I believe that is the peppery of scrypt, and the reason why we do not see LiteCoin ASIC's yet.

tomjen3 · on Sept 7, 2014

You just need to include scrypt in it somehow. You can't hardware paralize that because it is memory hard and memory isn't as easy to parallize.

alexjeffrey · on Sept 6, 2014

having everything come back to money is a good thing though - a user can afford to pay e.g. 1 microdollar per email sent but if a spammer is sending 10 million emails a day, they can't afford that level of operational expense.

salmonellaeater · on Sept 8, 2014

It's sufficient to just destroy the bitcoins by sending them to a non-existent address. Alternatively, they could be donated to a third party such as a charity or an open-source software foundation.

sounds · on Sept 6, 2014

One important concept that seems to be missing from the discussion is Sender Stores.

Email currently uses a Receiver Stores model. SMTP servers can relay messages, but in almost all cases the message is transmitted directly from the originator's network to the recipient's network. The storage of the message only effectively changes _ownership_ once, even if the message headers say it was forwarded many times.

That makes email a Receiver Stores model: the recipient's network is expected to accept the message at any time and then hold it until the recipient comes to look at it.

Some of the bitcoin messaging protocols propose a Sender Stores model. That is, the message may be transmitted any number of times but the recipient's network is not responsible for long-term storage. The sender's network must be able to provide the message at any time up to the point when the recipient actually looks at the message.

There are some obvious restrictions such as requiring that the message be encrypted with a Diffie-Helman key (negotiated when the message is first transmitted to the receiver's network) to reduce the feasibility of de-duplicating millions of messages. And in order to prevent revealing exactly when the recipient reads the message, the recipient's network doesn't ack the message for a while.

Ultimately all of this is just designed to make bulk email (slightly) more expensive. Spammers run on very, very thin margins. But it doesn't do anything to solve the problem of account termination or blacklisting.

fanf2 · on Sept 6, 2014

The problem with the sender stores model is that the sender does not need to store anything: they just generate the spam message at retrieval time. So it does not actually increase their costs. Spam moves to the notification mechanism that tells receivers that a message is available: this is just as unsolicited as in current junk mail, and needs to contain enough information for the receiver to know if it is worth retrieving the message. All the current spam and anti-spam techniques will apply fairly exactly to these notification messages.

dredmorbius · on Sept 7, 2014

A fair point, but it does require that the sending host persist in its network location (or have continuously updated DNS which reports its present location).

Since early recipients of the spam will likely report it, it's fairly likely that subsequent retrieval attempts will find a downed (or sanitized) host, no longer delivering spam.

This will reduce the amount of spam actually delivered, and the spammer's production / revenue margins.

gioele · on Sept 6, 2014

What you call "Sender Stores" is at the basis of djb's IM2000 that is supposed to replace SMTP and email delivery in general.

See http://cr.yp.to/im2000.html and http://en.wikipedia.org/wiki/Internet_Mail_2000

sounds · on Sept 6, 2014

Thanks, I hadn't seen that before.

End-to-end encryption is needed to increase the load on a spammer as much as possible. Even if the spammer tries to re-generate the message "at retrieval time" the receiver should request retrieval several times (to obfuscate when the message is actually read) and the message should use multiple iterations of a cipher (and possibly HMAC) after an initial DH negotiation, or any other means to increase the cost for a spammer _and_ tie a message to a unique sender for reputation-tracking purposes.

patio11 · on Sept 5, 2014

Worth reading for confirmation regarding the importance of reputation in deliverability, which is something that is not widely understood by non-experts but which has really toothy consequences for many HNers' businesses.

idlewords · on Sept 6, 2014

This is an incredible write-up. Can someone who knows the author plead with him to write up the long history of the Spam Wars that he mentions in this document? I could read this stuff all day.

baudehlo · on Sept 6, 2014

Read Spam Kings. It covers a lot of the history.

ireflect · on Sept 6, 2014

Agreed. I'd buy the book.

beloch · on Sept 6, 2014

I'm not too knowledgeable about this stuff, but would it work if end-to-end encryption was only initiated after the first time somebody replies to an address? e.g. If somebody contacts you for the first time, they lack your public key (and/or a shared secret for authentication) and must send you plaintext. Then, if you reply, you automatically provide them with your public key and/or authentication info to send you encrypted messages in the future. Thus, most spam would be in plain-text, anyone who knows how the system works would avoid discussing sensitive info in the first email they send somebody, and everybody else wouldn't know the difference.

thefreeman · on Sept 6, 2014

Not a bad idea.

One issue I could see though is the initial email would essentially devolve into a "poke". Nobody would bother writing anything in it, which would mean the spam filters would have nothing to filter on.

y0ghur7_xxx · on Sept 6, 2014

>One issue I could see though is the initial email would essentially devolve into a "poke". Nobody would bother writing anything in it, which would mean the spam filters would have nothing to filter on.

that is a good thing: if the first message contains something else that is not just "poke", it's spam.

thefreeman · on Sept 6, 2014

But spammers would just send "pokes" as well. The system would have nothing to go on besides the reputation of the sender when a poke is initiated, so this is no different then just sending encrypted text to begin with.

skybrian · on Sept 6, 2014

It can't just be a poke because why would you reply to someone you don't know? The email has to be sufficiently interesting to convince you to reply, while not containing any confidential information.

thefreeman · on Sept 6, 2014

Because of my original point. That because it is required to start every email conversation, no one will actually put anything in it, or certainly much less context then you get currently. Which just makes spam prevention and detection harder meaning you will have to wade through more attempts which get through.

The person I replied to said that "anything that isn't a poke would be spam". Do you really think spammers are that stupid?

superuser2 · on Sept 6, 2014

MITMing the message with the public key attached would be pretty straightforward and impossible to catch without verification over some other secure channel

macrael · on Sept 6, 2014

This would be solved by public key encrypting and signing both sides of those messages. Nothing stops people from sharing your public key, so you could develop some kind of one off token for everyone instead, that way you can kill those tokens after a time.

superuser2 · on Sept 6, 2014

The question is how do you know that it's actually their public key.

The usual approaches are:

1) verification of they key fingerprint by some other channel, such as the PSTN, but this is obnoxious and feels like tradecraft; you are unlikely to get normal people to do this for normal communications.

2) certification of trust based on 3rd-party verification of government identity documents or control of some address.

3) the web of trust. Might work well for a bunch of security-conscious HN types, but unlikely to be a good solution for people such as our mothers who have neither the cryptographic background to make intelligent decisions about signing keys, nor the inclination to care.

click170 · on Sept 6, 2014

While what you say is true, it would at least be a step in the right direction towards better privacy.

It would provide protection against passive snooping (NSA/GCHQ) even if it wouldn't prevent active attacks.

Edit: Typo

alexjeffrey · on Sept 6, 2014

in my mind, the first email would be encrypted using a public address obtained by asking for the key from the receiver's domain's server, or otherwise leveraging the DNS for the receiver's mailserver.

thaumaturgy · on Sept 6, 2014

Well this is pretty neat.

I've been working on custom software to improve the spam filtering on my mail server for the last year (side project). It currently works by letting hosted users forward spam messages to a flytrap account, and then the daemon runs, reads the forwarded message, tracks down the original in the user's mail directory, does a whois on the origin in the mail headers, consults its logs, and then adds a temporary network-wide blackhole to iptables.

Originally it was intended to work alongside SpamAssassin and SQLGrey and all that, but last night I started considering replacing SpamAssassin altogether. I love SA, but the spammers are beating it regularly now. My TODO notes in the code actually say, "reputation tracking for embedded URLs, domains, ccTLDs and gTLDs, sender addresses, and content keywords." I wrote the first bits of code for reputation tracking this morning.

It's not much of a step for the software really, because it already uses embedded URLs in a message as part of the profile "fingerprint" for finding the original message from a forwarded version.

But I'm a bit chuffed to hear that I'm on the right track, considering how effective Gmail's tactics have been. :-)

Small service providers have it really tough right now. Users don't tolerate any spam at all. A few years ago, the state of the art for small independent services was SpamAssassin + SQLGrey (or other greylisting) plus a few other tricks; that's not sufficient anymore, and most of us smallfry lack the resources to come up with something much better.

After just 6 weeks in production, the software already has 20+million IPs blocked at any given time.

baudehlo · on Sept 6, 2014

I think SA has suffered from some of the original core developers (myself included) moving on to other projects in a completely different tech area. The good news is that other projects have taken up some of the mantle, like Haraka, check out the karma plugin. It does some amazing blocking of spam and penalizing clients.

Beyond that also one of the things SA doesn't do well is actually rejecting hard on sensible blacklists like the CBL. We worked hard to make everything heuristic based but it wasn't always the right choice. Some things need to be black and white. There's some code on SA for short circuiting now but it's not really the best solution. In my own spam filtering I have a bunch of hard rejects and they work really well.

Anyway, check out Haraka or Qpsmtpd for solid anti spam mail serving solutions. They work really well.

patio11 · on Sept 6, 2014

Hey wait, I did vaguely recognize your username from somewhere. Thanks for SpamAssassin! I spent more hours spelunking through your guys' source code and community rulesets than you'd want to know, about a decade ago, while working for the anti-spam group in our R&D department at a tech incubator.

xorcist · on Sept 6, 2014

Thank you for SA. I do not share the experience that it is outsmarted, I still achieve > 98% accuracy just as I did ten years ago. That is really a testament to all the hard work of its developers.

baudehlo · on Sept 6, 2014

This sounds a lot like DCC. Have you considered adding it into your filtering layer?

oalders · on Sept 6, 2014

Do you have any plans to release this software? I've got a small hosting business and SpamAssassin seems to be getting beaten on a very regular basis.

thaumaturgy · on Sept 6, 2014

Yeah, I've been thinking about that recently.

There are some challenges though. Probably the biggest one is that it's designed for a specific mail server setup: postfix + dovecot (configured for maildir) + fail2ban + php (the code is in php, because it was convenient) + mysql. I don't know yet how portable it will be.

If you'd like to try it anyway, let me know and I'll post what I've got to GitHub in the next few days.

edit: alternatively, I've been more seriously considering making the current network ban list available as an RBL. Since I already have DNS servers, it would be pretty trivial to do.

baudehlo · on Sept 6, 2014

Make sure you read the DNSBL rfc if you implement that :)

bgaluszka · on Sept 6, 2014

I run personal email server with postfix + dovecot and was thinking about replacing SA with something custom written in php. I would love to see what you have written.

oalders · on Sept 6, 2014

Your setup sounds pretty much like mine, so it would be portable in my case. :)

Having said that, if this were available as an RBL that would also be really great.

zokier · on Sept 6, 2014

One thing nice about E2E crypto in messaging is that it implies strong identities, which most importantly allow building whitelists with high level of confidence. And of course if we can make those identities costly to acquire/burn, either by proof-of-work or even just with a CA model, that alone should cut spam significantly.

ch · on Sept 6, 2014

Couldn't some form of proof-of-work system be used to increase the cost of sending a message without it having much of an economic impact on a casual sender? Was that what he was alluding to with the "burning bitcoin" reference?

diafygi · on Sept 6, 2014

Funnily enough, proof-of-work was originally invented to combat spam and denial of service attacks[1][2]. I asked that in a reply, and it seems that the large gap between server compute power and mobile compute power would make a reasonably taxing proof-of-work system too costly for mobile phones[3].

[1] - https://en.wikipedia.org/wiki/Hashcash

[2] - http://hashcash.org/papers/pvp.pdf

[3] - https://moderncrypto.org/mail-archive/messaging/2014/000782....

drfuchs · on Sept 6, 2014

Just to be clear, "Pricing via Processing or Combatting Junk Mail" (Crypto 1992), by Dwork and Naor, invented the idea of Proof Of Work, suggested it for fighting Spam (before the term was even coined!), provided actual functions, and more. This predates HashCash by half a decade. [Edited to de-obfuscate.]

diafygi · on Sept 6, 2014

Ummm, that's the paper referenced in [2]…

marcell · on Sept 6, 2014

I've been working on a system that implements this, called "bitnet".

The anti-spam system is basically what you said: you spend a small amount of bitcoin, and the server grants you some large number of "tokens". These tokens can be used to perform actions which use resources on the server, either storing messages, or getting messages. The exact price is configurable, and currently set low [1].

There is a second mechanism that I use, which is intended to lower costs for the average user, but still prevent spam. Essentially, in each bitcoin transaction, a small "transaction fee" is required, which goes to the miner. It is possible to prove, via a cryptographic signature, that you were the originator of a transaction, and therefore the person who spent that transaction fee. The bitnet system grants tokens to people who can prove they spent money on transaction fees [2]. The idea being, a typical bitcoin user will accumulate transaction fees anyways, but a spammer will have to go out of his way to send bulk messages.

[1] https://github.com/ortutay/bitnet/blob/master/bitnet.go#L50 [2] https://github.com/ortutay/bitnet/blob/master/bitnet.go#L38

ams6110 · on Sept 6, 2014

Heh. Bitnet was an early (1980s) point-to-point network built out of leased lines between educational institutions.

http://en.wikipedia.org/wiki/BITNET

zrm · on Sept 6, 2014

> Couldn't some form of proof-of-work system be used to increase the cost of sending a message without it having much of an economic impact on a casual sender? Was that what he was alluding to with the "burning bitcoin" reference?

The idea seems to be that you provably "burn" a small amount of Bitcoin to get an identity. An innocent person can then carry on using that identity forever without doing any more computation. Meanwhile a spammer will ruin that identity's reputation almost immediately and then have to pay again to get another one.

chii · on Sept 6, 2014

so the war's front moves to botnets - where spammer first installs malware/virus to take over a user's machine, in order to send email _from_ that user's identity. Given that a user's machine is more easily compromised, the burned identity wouldn't cost more than current botnet acquisition.

bradleyjg · on Sept 6, 2014

The problem is smartphones don't have much processing power, and what they do have uses batteries. So either they are trivial for servers to generate en masse or prohibitive for smartphones.

There's also botnets, whose resources hijackers are happy to exploit.

ch · on Sept 6, 2014

Sure but certainly one could still make it costly enough to prohibit mass sending but not too costly to drain a cell battery under casual use. Or am I underestimating the level of complexity needed for a viable proof of work?

idlewords · on Sept 6, 2014

Something lightweight enough to be feasible on a cell phone is likely enough to be trivially computable in bulk by a decent bit of hardware. Think about all those obsolete bitcoin miners out there.

ch · on Sept 6, 2014

It sounds like proof-of-work alone is insufficient. The bit* type systems sound promising but that seems to preclude mobile. Interesting. What a funny problem we have built for ourselves.

Interesting that SMS appears to have simply legislated the problem from existence (or so says the article), I suppose if anyone can connect to a telephony network for cheap the problem would persist there too.

pmorici · on Sept 6, 2014

https://en.bitcoin.it/wiki/Proof_of_burn

OpenBazzar a bit-torrent like p2p version of ebay uses proof of burn to make bad behavior expensive.

awt · on Sept 6, 2014

AKA Bitmessage

ch · on Sept 6, 2014

Yeah like bitmessage, but the network overhead (all users see all messages) of bitmessage might be prohibitive for battery powered clients, no?

awt · on Sept 6, 2014

Probably not a huge deal as phones become more and more powerful.

ch · on Sept 6, 2014

I don't see how its not a big deal.

If all messages are seen by my phone, just to be inspected and discarded, then my phone is consuming bandwidth and processing cycles on irrelevant data. And unless my phone has free bandwidth, and the processing cycles don't consume power (or it has infinite power), how can it ever not be a factor?

awt · on Sept 6, 2014

It'll get figured out.

sgentle · on Sept 6, 2014

I wonder if this would be an interesting application for Homomorphic Encryption. True FHE is still wildly inefficient, but there are some interesting applications like CryptDB where sort-of-Homomorphic-Encryption is feasible for certain restricted operations (keyword search being one).

In a system like that, maybe you could send your encrypted message along with some encrypted keywords that you consider to be spammy to some centralised service. That would, at least, avoid some of the client-side-filtering-is-too-hard problem.

As far as reputation, this might be one of the rare times where a Web of Trust seems like a good idea. Generating lots of false positives and negatives would be a lot less powerful if the value of those reports was filtered by how much you trust the account that made them. With email you already have an implicit source of trust, in that anyone you mutually email with is unlikely to be a spammer.

Seems like a really interesting problem space to be involved in.

fdsary · on Sept 5, 2014

Btw, this is written by Mike Hearn, who'd I'd like to nominate to hacker of the year. Super cool guy, mad respect to him :)

skrebbel · on Sept 7, 2014

I don't understand the objection against email costing money. I send you a mail? I pay $0.0001 to you. You reply? You pay $0.0001 back.

There is idea that this somehow blocks access to email for people who have a hard time paying for things on the internet (for whatever reason), but it is misguided: everybody who has access to the internet pays for it. ISPs could easily give every subscribed 10000 free emails every month.

Texting costs money and yet people do it.

What am I missing?

mike-cardwell · on Sept 7, 2014

"What am I missing?"

The specs and the software for a start.

Then there's the fact that it's not the spammers that will end up paying, but the people running the systems that are compromised and abused to send spam, be they shared hosting servers or home computers.

Then there's the network effect. I'm not going to feel good telling my friends and family that it will now cost them money to email me. Especially when they can just contact me using Facebook instead for free and without having to set anything new up. Especially when the email service they're already using probably wont even support this new fangled paid-email system.

It would be a massive task to add this functionality to email, and it wouldn't stop the spam, so it's not worth it.

anon4 · on Sept 6, 2014

So why not use one key per source, kind of something like this:

Alice wants to receive mail from Bob. Alice generates a public/private key pair and gives the public half to Bob. When Bob wants to send mail to Alice, Bob uses the public key Alice gave him. If Alice receives spam, she marks the public key it was encrypted with as "fuck it, the spammers got it" and never receives mail with that key again. Then she notifies Bob that the key he had has been compromised and sends him a new one. Alice could then, after Bob has lost her key to spammers one too many times, simply decide not to talk to someone like him.

This would give mailing list operators a large incentive never to share your email with anyone, otherwise you could just block them forever.

On the flip side, if the mailing list is really important to you, the operator could reject your new key and tell you you'll either receive their spam or you won't be part of the mailing list. Though I don't see why someone would do that in favour of just including ads in the mails themselves.

chii · on Sept 6, 2014

Lets suppose Bob was a spammer pretending to be a mailing list operator.

Alice gives her key to Bob, in the expectation that Bob would not be sending her spam. Bob then sends both spam, as well as legit mail that alice did want. Assuming that alice does not want to stop receiving the legit mail, but want to stop the spam, how does she do it in this scenario?

If alice blacklist the key for bob, but sends a new one, the situation didn't improve. If she doesn't send a new one, she stops receiving legit mail (that she wants, and cannot go without).

PaulHoule · on Sept 6, 2014

I think reputations are part of it but there are other aspects to.

I switched to gmail because my mail with every other provider and client was choked with phishing messages from major banks. So much work has been done on preventing origin spoofing in 2014 that accepting phony mail from chase.com is a sign of gross incompetence.

p4bl0 · on Sept 6, 2014

The discussion here is already quite long so maybe I missed it, but I don't see anyone asking (or answering) the first question that came to me while reading the linked email:

Why is the cost of end-to-end crypto never taken into account?

I just can't believe that we have reached a point where it is possible to cheaply mass mail the way spammers do if you need to encrypt each email for each recipient. That alone should be disuasive enough, at least that's always what I thought. If I'm right, all the discussion about the need for client to extract features from emails and send them to a necessarily trusted centralized third party is useless. But I may be missing something, where am I wrong?

hueving · on Sept 6, 2014

Even at a reduced rate, end to end crypto means spammers will have a much higher success rate since they don't have to fight a centralized spam system with global knowledge. This more than makes up for the extra time to encrypt a message.

p4bl0 · on Sept 6, 2014

Mh, okay.

But let's make the crazy assumptions that we are effectively in a world where end-to-end crypto is massively used, virtually by everyone.

I guess it would be stupid to assume that the message are encrypted but not signed.

This means that it would be easy to have a list of identities (i.e., keys) which are sending spams, for instance using a web of trust, without users having to disclose any other informations that "I trust these identities, not these ones" (which could be as easy to do as clicking "spam" or "not spam" for the users).

Now that means that the spammers not only have to encrypt every single email for every single recipient but also to generate new key-pairs for almost each encryption.

Of course it is also very easy to mark as spam any email signed with a key that is considered too small (i.e., too quick to generate).

Now if you tell be that still won't do it without a "centralized spam system with global knowledge", I have to seriously rethink a lot of my assumptions about the cost of some computations.

Perseids · on Sept 6, 2014

Some numbers:

- Public key encryption: 2048bit RSA can achieve up to about 200k encryptions per second on a high end cpu [1]. ElGamal-like encryption schemes using elliptic curve cryptography can get to about 100k encryptions [2].

- Public key signatures: RSA is hopelessly slow for good keys. There is no reason to use strong keys for this case, though. If you reuse primes you can use a 1000 primes to generate a million RSA modules (you need two primes per public key), effectively eliminating the costly prime search. Using elliptic key cryptography the key generation step is dirt cheap: With ed25519 you get 200k key generations per second [3]

[1] http://bench.cr.yp.to/results-encrypt.html [2] http://bench.cr.yp.to/results-dh.html [3] http://bench.cr.yp.to/results-sign.html

Formula used for each of the above numbers: 3500x10^6/[median for Intel Xeon E3]x4

p4bl0 · on Sept 6, 2014

Thanks for the number, it's very informative.

I guess there's still the solution to only accept message from trusted keys but that means that 1- we have to be able to detect signature-rings from spammers in the graph (that clearly should not be a big problem), and 2- that there is an easy and realistic way to have a legitimate key in the trusted network before it can be used to sends emails…

Anyway, all that is very interesting from a theoretical point of view, but we are far from being in a world where end-to-end crypto is massively used.

pyre · on Sept 6, 2014

> trusted keys

The problem with Web-of-Trust is that once you try to roll it out mainstream, most people will just click "accept" or "I trust this key" without thinking about it. Instead of "signature-rings from spammers" you would have spammers duping less-aware real people into trusting their keys.

That said, it would seem to be a longer feedback loop to get a number of actual people to open emails and trust their keys.

p4bl0 · on Sept 7, 2014

> most people will just click "accept" or "I trust this key" without thinking about it

What if they don't have to do that because it's transparently done for them when they use the "flag as spam", "not spam", and "reply" buttons for instance?

xorcist · on Sept 6, 2014

You could set up your MUA today so that anyone you've already emailed with gets put in a priority folder. No need to mess with end to end encryption just for that functionality.

Theoretically, any spammer could research you enough to send to mail purporting to be from your friends, but realistically no one would bother.

rando289 · on Sept 6, 2014

Encryption is pretty darn cheap in cpu cost. And they have botnets to do it for them.

p4bl0 · on Sept 6, 2014

Of course using end-to-end crypto does not only mean "encryption", see my other response.

dochtman · on Sept 6, 2014

I submitted this without the ?hn at approximately the same time. Pretty weird that this one gained traction while my submission did not.

https://news.ycombinator.com/item?id=8275787

hendzen · on Sept 5, 2014

Mike Hearn is also a core Bitcoin developer, as well as an HN commenter. Hi Mike!

lazylizard · on Sept 6, 2014

could there be a antispam gateway that replies to 'maybe'(as in spam, ham and maybe) mails with a temporary url that hosts a webform, before they reach the inbox? the webform could even limit message length, prevent attachments, be protected by akismet and so on. let the message from the form be actually relayed to the real mail server. and once the recipient replies, automatically whitelist that sender or possibly even the domain?

zerr · on Sept 5, 2014

>we had put sufficient pressure on spammers that they were unable to make money using their older techniques

Could anyone comment how spammers make money actually?

skizm · on Sept 5, 2014

One way is referral links. If you click on a link in my email and buy something on, say, Amazon, I could potentially make up to 10% of the purchase price (Amazon is actually normally around 5% I think).

aslewofmice · on Sept 5, 2014

Direct email marketing

Zigurd · on Sept 6, 2014

Some of my contacts have been using verification gateways/whitelists for email for decades. If spam were to become a problem, I would use one.

loup-vaillant · on Sept 6, 2014

> Botnets appeared as a way to get around RBLs, and in response spam fighters mapped out the internet to create a "policy block list" - ranges of IPs that were assigned to residential connections and thus should not be sending any email at all.

So basically, I can't send email from home? This is… unfortunate. If we want freedom, we need decentralization, and this kills it.

FourthProtocol · on Sept 6, 2014

Hey I've no idea of anything, and don't have a dog in this fight either, but reading everything here I think decentralization is actually the answer. Yes, you'd not get that global reach, and your network of contacts would be severely limited (presumably to those you know). But a decentralised system could do E2E and P2P and run from home. Running that beside traditional, clear-text and consequently spam-proof email strikes me as a sensible balance.

One (open, centralized) system for global comms. Another (closed, decentralized) for secure comms. Maybe even more than one "another", if contexts require different audiences.

loup-vaillant · on Sept 6, 2014

I'd rather not have most of my communications (even mundane ones) read by a third party that also reads everything else. Gmail is bad enough, but this is a really Big Brother.

anon4 · on Sept 6, 2014

You can, but it goes in the spam folder. I receive mails from smartmon and mdadm and have just marked my IP as "definitely not spam, you guys".

loup-vaillant · on Sept 6, 2014

Many corporate email systems don't have personal spam folders. They just redirect suspect email /dev/null, or otherwise make it inaccessible. It has been a problem for me in the past, and I don't even send my email from home.

And there's the case of looking for work. I'm somewhat proud of showing my personalized domain name, but if using something other than a huge webmail can cause it to fall into a spam folder… Fortunately that has yet to happen, but this is one of the reason I hesitate to switch from remote SSH to a physical server at home.

baudehlo · on Sept 6, 2014

You can't send unauthenticated (anonymous) email from home machines, at least not directly. It's not really as bad as it sounds.

zokier · on Sept 6, 2014

Afaik outgoing port 25 (ie smtp) is blocked in large portion of residential connections. And it is a very good thing in the current state of affairs.

loup-vaillant · on Sept 6, 2014

It forces me to trust a remote host. Not good.

Email is supposed to go from the sender's machine to the receiver's machine. That's how it should work by default, that's how TLS connections makes the communication vaguely secure, and that's how it makes it difficult for powerful third parties to have a peek at everyone's communications.

As far as I know, Gmail accounts are only a subpoena away from the US government. But a sheeva plug (or R-Pi) hosted at my home? They need a warrant. Even for countries that don't need warrants, wire-tapping everyone is expensive: it must be done one home at a time.

Now maybe the botnet situation is so bad that it is worth sacrificing our ability to send e-mail. Still, this strikes me as the wrong solution. Blocking outgoing 25 by default is fine, but we need to be able to lift the restriction if we want.

bilalhusain · on Sept 5, 2014

I wish Google provided an API to lookup a sender's reputation so that even a locally deployed spam filter could use the information.

JohnTHaller · on Sept 5, 2014

It wouldn't be in Google's best interest to do so. First, it could be used by spammers to lookup their own reputation for a given IP or domain before deciding whether to move on. Second, it would enable competitors to use some of Google's hard work in their products.

serf · on Sept 6, 2014

a central authority for email reputation is one of the ideas put out by Mike during his presentation at RIPE, not that his ideas are Googles'.

https://ripe64.ripe.net/archives/video/25/

dpweb · on Sept 6, 2014

Wait! Maybe not so fast dismissing this.. G already does this, with web search. It ranks reputations and its publicly available.

Trying to game the system in web search is a huge thing, cause there's a big reward. But G fights it pretty well.

Not such a bad idea G publicly ranks mail senders.

Someone1234 · on Sept 5, 2014

Too easy for spammers to utilise to game the system. Just make a new domain (reputation 0) then keep on sending different emails to see how the reputation changes. The linked reply kind of covers this (e.g. security through obscurity).

ill0gicity · on Sept 5, 2014

I don't believe Google wants to turn GMail into a data broker service, especially one that powerful. Why? Besides the obvious (spammers use it to learn how to beat the system, and other companies use it to compete with GMail) you'll be creating an insane amount of workload in having to deal with all the complaints. The current system of "spam is what our users say it is" leaves very little for debate. As someone who's run large email systems this appeals to my laziness.

superuser2 · on Sept 6, 2014

GMail's size and centralization is its competitive advantage. It's not in Google's interests for you to benefit from its data mining efforts without also contributing to them.

rkuykendall-com · on Sept 5, 2014

I wonder if Google uses "shadowbanning." If a spam account emails another spam account, is the email filtered?

I know that's probably not why it's private, but it got me thinking.

crazypyro · on Sept 5, 2014

They use shadowbanning on newly created accounts at the very least. If you don't have an active token on gmail creation (this is referenced in the OP by the randomized javascript), your account gets tagged and wiped in banwaves.

rwallace · on Sept 7, 2014

> When we started gmails were about $25 per 1000 so we were able to quadruple the price. Going higher than that is hard because all big websites use phone verification to handle false positives and at these price levels it becomes profitable to just buy lots of SIM cards and burn phone numbers.

How does that work? Don't SIM cards cost more than 10 cents?

cjg · on Sept 8, 2014

Not all the accounts need verification by phone.

orf · on Sept 5, 2014

The Gmail spam filter is indeed impressive, but on several occasions I have found 'real' emails being triggering it. Those times were just me browsing the spam folder randomly and I hate to think what else it has swallowed.

_delirium · on Sept 6, 2014

Yes, I think it's too aggressive, at least for my risk-sensitivity preferences. I've actually never gotten spam in my Gmail inbox, but I've had two serious false-positives that caused me problems, along with a number of less serious false positives, like mailing list subscriptions disappearing. That level of aggressiveness is too much for me, and doesn't seem to be configurable so I can tell it to err more on the side of avoiding false positives.

The first incident was that Gmail flagged an important email from my landlord as spam because it contained a forwarded message written in Danish, which the filter deemed to be a language I don't normally correspond in (it is nice, to be fair, that the filter actually tells me why it flagged the message). True enough. But I do live in Denmark, and in fact a mail containing Danish is a very good signal, for me, that it should not be spam-filtered.

I've moved to hosting my own mail as a result, and it's been going well so far. I use a fairly conservative host-based filtering approach. Just blocking hosts whose DNS doesn't match their rDNS rejects >70% of spam attempts, and adding Spamhaus's DNSBL brings that up to >95%. As far as I can tell from perusing the logs, it's quite conservative, and they're all true positives. And at least it rejects (if it's going to) in the SMTP session, so the sender will get a bounce rather than get silently filed into a spam folder, like Gmail does.

I do still get some spam, almost all of it from legitimate free-mail hosts who I can't feasibly filter by host (mostly Yahoo and Gmail). But it's fairly infrequent.

raverbashing · on Sept 6, 2014

Even worse, I've seen Google originating emails being flagged in GMail as spam/fishing

(in mailing lists, and yes, I checked, it seemed to be a legitimate email, so I let the sender know, but never heard back)

ams6110 · on Sept 6, 2014

I have had gmail flag a message I sent to myself as spam.

Oculus · on Sept 6, 2014

Really interesting article until it gets into the Bitcoin talk. I feel like his passion towards Bitcoins seeped a little too much into the article towards the end.

awt · on Sept 6, 2014

No mention of Bitmessage, which provides E2E crypto and anti-spam.

Canada · on Sept 6, 2014

Bitmessage is hardly immune from spam. I've seen it there.

And cranking up the proof of work isn't going to do anything to prevent it. The only thing that prevents Bitmessage from becoming a cesspool is its obscurity.

awt · on Sept 6, 2014

I would not call the random strings showing up in the general chan spam. I highly doubt it is profitable for whoever sent it. Industrial scale spam seems unlikely on Bitmessage.

Canada · on Sept 6, 2014

I get it on some chans I run. I'm not talking about the annoying "test" message either, I'm telling you there is already actual spam.

And the only reason there isn't tons of it is because Bitmessage doesn't have the kind of user base that it's profitable to spam to.

It is much eaiser to spam Bitmessage on an industrial scale and profit than it is to do so against Gmail's anti-spam systems today. If Bitmessage had a hundred million users, then it would be profitable to spam there and it absolutely would happen a lot. In my opinion spam would quickly become the dominant form of traffic on the network, and the broadcast message feature would go the way of VRFY and open relays in SMTP.

joelthelion · on Sept 6, 2014

Can someone explain botguard? I'm not sure I get it.

chii · on Sept 6, 2014

my understanding (which may be wrong - please i welcome all corrections) is that an obfuscated piece of code that is somewhat randomized is generated on the signup page. If you ran it, it produced a token, which is submitted as part of the signup.

If you didn't obtain a fresh piece of this script, but instead either reused, or tried to guess the token, then your signup still succeeds, but is marked as bad. Then, in an undetermined amount of time, a wave of bans would occur on all accounts that got marked as bad.

This prevents signups via scripts, but does so via making signup scripts untrustworthy, and so no one would be willing to put money in for they cannot be sure that it actually works.

joelthelion · on Sept 8, 2014

But what prevents you from requesting another token from Google?

chii · on Sept 8, 2014

I suppose nothing does, except for the resources required to do so (i.e., you need to run your script in a browser, which means lots more overhead and resources).

danso · on Sept 5, 2014

Fascinating read, and as amazing as email is, the OP manages to still make me realize how much I take it for granted:

> So I think we need totally new approaches. The first idea people have is to make sending email cost money, but that sucks for several reasons; most obviously - free global communication is IMHO one of humanities greatest achievements, right up there with putting a man on the moon. Someone from rural China can send me a message within seconds, for free, and I can reply, for free! Think about that for a second.

drzaiusapelord · on Sept 5, 2014

I remember using dial-up to connect to my college's unix system. I fired up the email client (mail? mailx?) and was hesitant for a moment to send an email to England. I thought back to my days using BBS's and worrying about dialing out to per minute charges. I just couldn't believe I could email anyone in the world for free using smtp email.

Obviously, both of us need computers, email accounts, network access, etc but there's no per region metering or anything. The cost of sending an email to someone sitting 10 feet from me or 10,000 miles from me is exactly the same. Mike's right, this is revolutionary.

_delirium · on Sept 6, 2014

In the actual BBS days, I don't recall that being a common point of confusion, oddly enough. I dialed up to local BBSs, and it was obvious where I was dialing because I actually entered the digits, and the modem audibly dialed them. Then sometimes I would correspond with people in other states or countries, through FidoNet echoes or mail. But it was clear that I wasn't dialing them to do so. I transmitted my message to my local BBS, and the BBS relayed my message onwards. I'm not sure at the time I entirely understood what mechanism the BBS used to do so, but I knew that I wasn't myself dialing Norway or Germany to do it, nor paying any kind of destination-based charge. I even played some multi-country multiplayer door games, all for free. So once I got an internet account, it didn't seem too magical!

nsns · on Sept 5, 2014

    Someone from rural China can send me a message within seconds, 
    for free, and I can reply, for free!

Yeah, you both "only" need some machine that can go online, an internet connection, electricity, the required technical knowledge, and to (implicitly) agree to your personal data getting harvested... otherwise it's "free".

scrollaway · on Sept 5, 2014

I am willing to give you £50 in cash for free as long as you come pick it up at my place in Sweden.

This offer is only available for you alone and in person. Are you interested? Because I'd really like to know how your pedantic definition of "free" works out for you when it's obvious that things that require infrastructure (such as online communications, or traveling to a different country)... require infrastructure.

(TLDR: If you don't find it amazing that with a very small indirect investment we can actually communicate with people anywhere in the world for free, you and I need to have a long chat about being spoiled.)

meepmorp · on Sept 5, 2014

Well, you could mail the money. Using the national postal systems that exist in both countries that have established means of transferring mail long distances. Same as the farmer in rural China could send a message to the Google engineer.

It costs money, yes, but if you're sending a one off message and it's not time citical, it's not unreasonable to say that sending the message via email is cheaper overall.

Edit: which isn't to say that communication via the Internet isn't really cool, just that it's not actually free, nor would it necessarily be cheaper than sending a message via other means, depending on the circumstances.

scrollaway · on Sept 5, 2014

You missed the critical "within seconds" in the original message. Email is near-instantaneous.

Are we seriously debating the merits of email over snail-mail on hacker news? I hope this is merely people enjoying being thick and provocative, because the alternative is that there's some massive bubble-blindness going on right now.

meepmorp · on Sept 5, 2014

And I was considering nsns' point about how the costs associated with sending email aren't nonexistent, and how, given a certain set of concerns and constraints, might be higher than other means.

I also thought your £50 example was a bit silly. Of course infrastructure costs money. There's means of communicating and delivering things like money that use infrastructure that require a far lower initial investment for a user, and that might be a better deal for that user, circumstances dependent. And not all communications need to be near-instantaneous.

scrollaway · on Sept 6, 2014

> the costs associated with sending email aren't nonexistent

> Of course infrastructure costs money.

You do realize you're.. well, you're not contradicting yourself, but on the one hand you agree it's obvious and on the other hand you insist it must be pointed out.

My example about money was just that. I could use Western Union or a bank transfer. Those things also have non-nonexistent (sic) costs. If you start pointing out everywhere that "Oh but there's this cost you didn't think about!", you won't be done by next week. You know all those libraries that let you check out books for zero cost? Reading those books requires education, which in turn is no free (No matter how much you wish to contradict that statement, the road you've taken requires you to admit to an unending set of cost upon cost which you don't think about in your day to day life).

You live in a society which expects you to have access to certain things. In the case you can't, there usually is infrastructure in place to help you out at little to no cost.

So again, please, just take this in: You can communicate with people at the other end of the globe. For free. This is not a legal forum and we don't need asterisks everywhere.

angersock · on Sept 5, 2014

Yep. Pretty amazing, isn't it?