I run several of the smallest email sending services on the internet. Been doing...

naasking · 2024-05-18T04:27:10 1716006430

If email didn't need fixing, spam wouldn't exist. There's more spam than there is legitimate email traversing the internet. That is a problem worth solving.

You could solve it with existing infrastructure to some extent, e.g., your email address is actually a cryptographically generated guid rather than something easily guessed or harvested. If you combine that with a background handshake procedure for introductions, so that all of your contacts get their own guid alias mapped to your canonical one, then you can revoke any of those if they get compromised at any time. Spam is effectively solved.
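A minimal sketch of the per-contact alias idea, assuming one mail server per domain that keeps a table from random alias to (canonical account, contact). `AliasBook` and its methods are hypothetical names for illustration; the only real API used is Python's `uuid.uuid4()`, which is built from `os.urandom()`.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class AliasBook:
    """Hypothetical table of unguessable per-contact aliases for one domain."""

    domain: str
    # alias local-part -> (canonical account, contact the alias was handed to)
    live: dict[str, tuple[str, str]] = field(default_factory=dict)

    def mint(self, account: str, contact: str) -> str:
        # uuid4() is built from os.urandom(), so 122 of its 128 bits are random.
        alias = uuid.uuid4().hex
        self.live[alias] = (account, contact)
        return f"{alias}@{self.domain}"

    def resolve(self, address: str) -> str | None:
        # Map an incoming recipient back to the canonical account if still live.
        local, _, domain = address.partition("@")
        entry = self.live.get(local) if domain == self.domain else None
        return entry[0] if entry else None

    def revoke(self, address: str) -> None:
        # Revoking an alias cuts off only the one contact that holds it.
        local, _, domain = address.partition("@")
        if domain == self.domain:
            self.live.pop(local, None)


book = AliasBook("example.org")
for_bob = book.mint("alice", "bob")
for_carol = book.mint("alice", "carol")
book.revoke(for_carol)          # Carol's copy leaked to a spammer
print(book.resolve(for_bob))    # alice
print(book.resolve(for_carol))  # None
```

Revoking Carol's alias leaves Bob's untouched, which is the property the comment relies on.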

This is basically like the web of trust, but for email.

rakoo · 2024-05-18T13:42:19 1716039739

> If email didn't need fixing, spam wouldn't exist.

Spam is not a technical problem, it's a societal problem. As usual the tech reflex is to find a technical solution, but it is mistaken, once again. Societal problems require societal solutions, not more tech. The only thing you will achieve with more tech is more segregation between those who have and those who don't; you'll create more issues than already exist.

kavok · 2024-05-18T14:47:36 1716043656

Email being so cheap and easy is a significant component of spam, which is a technical problem to a degree. Spam on Signal, text message, voicemail, Discord, etc… is significantly less present for various reasons (cost, complexity, etc…).

thebeardisred · 2024-05-18T16:36:26 1716050186

:sigh: speaking of getting into the mud pit...

The other side of that balance is that capitalism creating artificial incentives for bad behavior is a significant component of spam.

wizzwizz4 · 2024-05-18T19:14:07 1716059647

But that itself is just a special-case of the principal-agent problem.

rakoo · 2024-05-19T14:12:26 1716127946

Being cheap and easy is not a problem, quite the contrary. That we as a society make this a good thing for spam is a problem, but I don't want to make the system shittier just because of the bad use of it.

kavok · 2024-05-19T17:21:05 1716139265

My work email is virtually useless at this point due to the absurd quantities of spam I receive. I think the OP suggestions would actually make email less shitty.

Any communication medium that is cheap and easy will be relentlessly abused by spammers.

naasking · 2024-05-18T17:44:51 1716054291

> Spam is not a technical problem

Spam is 100% a technical problem. Any "societal solution" will be orders of magnitude more expensive and less robust than technical solutions.

withinboredom · 2024-05-18T18:42:44 1716057764

I suppose snake oil salesmen were a technical problem too?

> Any "societal solution" will be orders of magnitude more expensive and less robust than technical solutions.

Yes, yes, it is quite expensive, though you'd find it would actually be much more robust.

naasking · 2024-05-18T21:07:55 1716066475

> I suppose snake oil salesmen were a technical problem too?

Snake oil salesmen were not created because of lax technical infrastructure. Spam is not a technical problem because scammers exist; it's a technical problem because the technology is what lets them reach you. The technology can then be amended so they can't reach you, and spam is solved. I'm not purporting to solve scams, but to solve the technical mistakes that let scammers spam you.

withinboredom · 2024-05-19T07:32:11 1716103931

wut. I guess we just need to wait for a big enough CME and we don't have to worry about spam anymore; so in that respect, I guess you are right. Though it's like using a hammer on a window to screw in a picture frame.

naasking · 2024-05-19T13:12:44 1716124364

Nothing about my proposal requires centralization, the federated status of email remains exactly as-is.

deanishe · 2024-05-18T18:19:43 1716056383

Junk mail is a lot older than the Internet.

naasking · 2024-05-18T21:05:54 1716066354

So? What does that have to do with email spam?

felsokning · 2024-05-18T06:41:19 1716014479

> ...your email address is actually a cryptographically generated guid rather than something easily guessed or harvested. If you combine that with a background handshake procedure for introductions, so that all of your contacts get their own guid alias mapped to your canonical one, then you can revoke any of those if they get compromised at any time...

Here, you're kicking the problem further down the road, though, to another known attack vector: Directory Harvest Attack[1].

In this case, though, the directory (presumably) contains the guid mapping (which, by definition, would have to be a different guid than the object's) and would have to parse these guids and match them against the users. (This already happens on recipient receive for some SMTP servers [just before BDAT/DATA] via the email address.)

What would one bad email to an email guid do? Would it force rotation of the guid[s] throughout the entire forest? If so, how would that be communicated externally? How would you communicate it for just the one address, if you just changed the one guid?

Would you, instead, have to keep a guid history to check against -- or lose all of the email between possible compromise and the sender's database update? Would you just keep it in the Transport Queue, until manual intervention could check out email between the possible compromise of the guid and new mail would be received for the new guid? That wouldn't scale for large enterprises.

Keep in mind that nothing has to be sent for recipient validation to occur. The SMTP server[s] just respond[s] to the recipient block with the next step, but the caller doesn't have to complete the SMTP negotiation from that point: they already know whether the addresses (even these proposed guids) are valid.

Tarpitting is somewhat of a viable option, here, but it isn't foolproof.

[1] - https://en.wikipedia.org/wiki/Directory_Harvest_Attack
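Tarpitting, as mentioned above, usually means stalling replies to a session that keeps probing bad recipients. A minimal sketch of one possible policy, assuming a per-session count of invalid `RCPT TO` attempts; the function name, the 1-second base, and the 60-second cap are hypothetical choices, not taken from any particular MTA.

```python
def tarpit_delay(invalid_rcpts: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to stall before replying, given the session's invalid-recipient count.

    Hypothetical policy: no delay for the first miss, then exponential
    backoff (base, 2*base, 4*base, ...) up to a fixed cap.
    """
    if invalid_rcpts <= 1:
        return 0.0
    return min(cap, base * 2 ** (invalid_rcpts - 2))


print([tarpit_delay(n) for n in range(1, 9)])
# [0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

This also shows why it isn't foolproof: the counter is per session, so a harvester that spreads probes across many short sessions or source addresses never accumulates much delay.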

naasking · 2024-05-18T12:29:58 1716035398

> Here, you're kicking the problem further down the road, though, to another known attack vector: Directory Harvest Attack[1].

Dictionary and brute force attacks don't work against cryptographic ids, so I don't see how this is relevant.
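A back-of-the-envelope check of the brute-force claim, assuming aliases come from `uuid4` (122 random bits) and picking deliberately generous attacker numbers: a billion live aliases on one domain and a trillion `RCPT TO` probes. Both figures are illustrative assumptions.

```python
RANDOM_BITS = 122      # uuid4: 128 bits minus 6 fixed version/variant bits
live_aliases = 10**9   # assumption: a billion live aliases on the target domain
guesses = 10**12       # assumption: a trillion RCPT probes

# Each random guess hits a live alias with probability live / 2**122.
p_hit_per_guess = live_aliases / 2**RANDOM_BITS
expected_hits = guesses * p_hit_per_guess
print(f"{expected_hits:.1e}")  # roughly 1.9e-16 expected valid addresses found
```

This only covers guessing. Addresses that leak (e.g. from a compromised address book) are the case the thread handles with revocation instead.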

> What would one bad email to an email guid do?

I assume you mean, what would happen if you received a spam message and had to revoke a guid? First, revocation means the guid is no longer valid: to any incoming message, it acts as if the guid simply doesn't exist.

Second, the idea here is that every entity gets their own guid designating you, so the same guid is not known by more than one entity. This is the purpose of the handshake protocol during introductions. If A and B know each other, B and C know each other, and A and C want an introduction, B triggers the introduction protocol, which mints new guids for both A and C that are then exchanged with each other. This can happen transparently without the user seeing what's going on under the hood. Revocation is just a "mark as spam" button, and introduction is triggered by CC'ing more than one person in your address book (introduction is the trickiest part).

So if A gets a spam message from C, you just revoke the guid sent to C and you're done, any message from C now acts as if A's address no longer exists. This doesn't affect any connections to anyone else.

If B's guid for A is compromised in some way, you can trigger the introduction protocol again to mint a new guid after the compromise is resolved, then revoke the old one.

There is simply no way for spam to gain a real foothold here: they can't guess ids, and if they somehow obtain someone's address book, those addresses are valid only for one or two messages at best before they get revoked. The revocation and introduction protocols can happen using the existing protocols in a few different ways, like by exchanging some message types that are not seen by the user. There are definitely some details still to work out but I don't see any real roadblocks.

The only real "problem" is that now all email addresses are effectively private, e.g., no globally addressable emails, which is not great for business purposes like info@mycompany.com. You could of course keep running the old email system for this.

withinboredom · 2024-05-18T18:45:27 1716057927

Email can be delayed ... for days, hours, even weeks. What if I set up a dead-man email to you, you revoke the id, then I die? Would you somehow magically receive my email for a revoked id?

naasking · 2024-05-18T19:08:37 1716059317

Well, obviously I wouldn't get an email at a revoked address any more than I would get messages at an email account that I closed. If you want to set up a dead man email, set it up with an address that isn't shared with anyone else; then there would never be a reason to revoke it.

withinboredom · 2024-05-18T21:23:34 1716067414

I don’t see that really working. I regularly delete personal tokens off of GitHub, especially if they haven’t been used in a while. I could see the same cleanup happening (or even being forced by disk space usage).

Anyway, I don’t think this idea would work with normal human patterns. At work, we regularly saw people opening emails years after we sent them. Hell, I’ve emailed people years after not talking to them. I just don’t see this working.

naasking · 2024-05-18T21:56:40 1716069400

Why would disk space be an issue? Guids are 16 bytes each. Even if you have 10k contacts, that's only 10k guids your email server has to store. That's 160kB. What's the big deal? You get more spam than that daily. Why wouldn't you persist 160kB to never get spam again?
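The per-user figure is easy to check with Python's standard `uuid` module: a guid's raw form (`UUID.bytes`) is exactly 16 bytes, so packing one per contact for the 10k contacts assumed in the comment gives 160,000 bytes. This counts only the raw identifiers and ignores whatever index or mapping structure a real server would add.

```python
import uuid

CONTACTS = 10_000  # the contact count used in the comment above

# Pack one raw 16-byte guid per contact, with no index or mapping overhead.
table = b"".join(uuid.uuid4().bytes for _ in range(CONTACTS))
print(len(table))  # 160000 bytes, i.e. 160 kB
```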

> work, we regularly saw people opening emails years after we sent them

So? There just really isn't a need to revoke anything until you receive spam on that address. Maybe we're just not on the same page about how this works. Here's a more detailed overview of what I have in mind:

https://news.ycombinator.com/item?id=40402046

withinboredom · 2024-05-19T10:15:43 1716113743

That's ~15gb per billion contacts. There's an estimated ~2 billion gmail users, so we're talking 30gb just to have one guid per user, and you're suggesting multiple guids per user (unbounded). So, let's assume each user has at least two services, we're now at a 60gb table, and that doesn't even include a mapping between users and guids, which will probably double the table size even more.

At scale, you're probably looking at a multiple-terabyte table, right from the start, and spending compute-days, or even compute-weeks, just running migrations; just to get some dubious returns and a lot of additional end-user complexity.
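The arithmetic above can be reproduced directly. A minimal sketch counting only 16 bytes per guid and reporting GiB (2^30 bytes); the ~2 billion users, two guids per user, and "double it for the mapping" factors are the commenter's assumptions, not measurements, and real tables would add index and row overhead on top.

```python
GIB = 2**30       # bytes per GiB
GUID_BYTES = 16   # a guid is a 128-bit value


def table_gib(aliases: int, bytes_per_alias: int = GUID_BYTES) -> float:
    """Raw size of a table holding only the guids, ignoring index/row overhead."""
    return aliases * bytes_per_alias / GIB


print(round(table_gib(10**9), 1))          # 14.9  (one guid per user, 1 billion users)
print(round(table_gib(2 * 10**9), 1))      # 29.8  (~2 billion users)
print(round(table_gib(4 * 10**9), 1))      # 59.6  (two guids per user)
print(round(2 * table_gib(4 * 10**9), 1))  # 119.2 (doubled for the user mapping)
```

These match the ~15/30/60 GB figures in the comment, read as GiB.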

naasking · 2024-05-19T13:09:39 1716124179

> So, let's assume each user has at least two services, we're now at a 60gb table, and that doesn't even include a mapping between users and guids, which will probably double the table size even more.

That's literally nothing. As I said, each user gets 10x more spam than that daily.

> and spending compute-days, or even compute-weeks, just running migrations

Migrations for what?

> just to get some dubious returns and a lot of additional end-user complexity.

There is no additional user complexity.

Supposing your math is correct, each user has a relatively fixed but larger-than-normal storage overhead for their address book and an inbox that grows slowly because there's no spam, rather than a small but fixed storage overhead for their address book and an inbox that grows 10x-100x faster due to mountains of spam.

I just really don't think you're comparing the storage requirements correctly.

withinboredom · 2024-05-20T06:49:27 1716187767

Here's the magical thing about bulk emails ... you only have to store the body once.
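One way a server can store a bulk message's body only once is content addressing: key each body by a hash and give every mailbox a reference instead of a copy. A minimal sketch using SHA-256 from Python's `hashlib`; `BodyStore` is a hypothetical class, not how any particular mail server is known to work. Exact-match deduplication only helps when bodies are byte-identical, so per-recipient personalization defeats it, and the per-recipient references still grow with the number of recipients.

```python
import hashlib


class BodyStore:
    """Hypothetical content-addressed store: identical bodies are kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}           # sha256 hex -> body
        self.mailboxes: dict[str, list[str]] = {}   # recipient -> body hashes

    def deliver(self, recipient: str, body: bytes) -> str:
        digest = hashlib.sha256(body).hexdigest()
        self.blobs.setdefault(digest, body)          # stored only the first time
        self.mailboxes.setdefault(recipient, []).append(digest)
        return digest


store = BodyStore()
blast = b"Buy now!"
for i in range(1000):
    store.deliver(f"user{i}@example.org", blast)
print(len(store.blobs), len(store.mailboxes))  # 1 1000
```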

felsokning · 2024-05-18T18:49:40 1716058180

> ...the idea here is that every entity gets their own guid designating you, so the same guid is not known by more than one entity

Ok, now you're sending a list of guids that _can_ be emailed to, per negotiation? Otherwise, how are they sending to that specific guid? A guid is not a hash of an object but an identifier object (a 16-byte array, if I recall correctly) - it has to map to the recipient _somehow_.

In other words, in each SMTP exchange, that information would have to be stored in some form of look-up table, _somewhere_, on both the sending and receiving servers.

How do you enforce the senders destroying that table, so that many versions of it don't expose your half of the signature? Do you generate a new key per session? If so, where are you storing that key, in memory? How would you prevent the heap from exposing those keys in a process crash (say, where a dump is automatically generated - like in Windows)? How do you prevent a nefarious actor using A, B, or C from generating a flood of SMTP sessions and creating a tonne (yes, the metric kind) of these look-up tables in memory? What happens when back-pressure is hit? Do you force everything else to paging but keep the tables in memory?

naasking · 2024-05-18T20:56:40 1716065800

> it has to map to the recipient _somehow_.

I think you're overthinking it. For simplicity, instead of [human-readable]@mydomain.com, let's use [guid]@mydomain.com, a dynamic set of unguessable aliases for your account. Your guids that have been handed out are completely under your control and stored on your server.

There are no cryptographic keys to manage here, just cryptographically secure identifiers that are stored on a server.

If you and I had been introduced, you would have a guid@naasking-domain.com designating me in your address book, and I would have a guid@felsokning-domain.com for your address in my address book.

So revoking the guid you have for me is an operation that happens on my server and simply invalidates the only address that you have. This part is simple and why spam is easily stopped in its tracks.
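To make "stored on your server" concrete: a sketch of the check a receiving server might run at `RCPT TO`, assuming aliases are 32-hex-character `uuid4().hex` local-parts and the live ones sit in a set. `rcpt_reply` is a hypothetical name, not part of any SMTP library; 250 and 550 are the standard SMTP success and mailbox-unavailable reply codes.

```python
import re

# Assumed alias format: 32 lowercase hex characters, i.e. uuid4().hex.
ALIAS_RE = re.compile(r"^([0-9a-f]{32})@(.+)$")


def rcpt_reply(address: str, domain: str, live: set[str]) -> str:
    """Hypothetical RCPT TO handler: accept only live aliases under our domain."""
    m = ALIAS_RE.match(address.strip("<>").lower())
    if m and m.group(2) == domain and m.group(1) in live:
        return "250 OK"
    # Same reply for malformed, unknown, and revoked aliases.
    return "550 Mailbox unavailable"


live_aliases = {"0f" * 16}
print(rcpt_reply("<" + "0f" * 16 + "@example.org>", "example.org", live_aliases))
```

Because unknown and revoked aliases get the same 550, a revoked address looks exactly like one that never existed.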

The introduction protocol is the tricky part, because C and B would have different guids for A, so if B CC's A when messaging C, then there should be a way for C to resolve their guid for A. This is done via a petname system.

If C does not already have a mapping for A (and so doesn't know A), then it can request an introduction from B. C sends B an "introduce-me as C[guid-intro]" message with a new guid for C, and B then sends to A "here's who I call C [guid-intro]". Guid-intro is a use-once guid for introduction purposes.

A then sends to C[guid-intro], "hi, I'm A[new-guid]". C replies, "hi, I'm C[new-guid]". C then revokes guid-intro since it was used, and we're done. A, B and C each have their own guid addresses for each other. You can keep the audit trail of where you got a guid introduction in a database, but that's not strictly necessary for this to work.

This introduction protocol happens transparently to the user just by exchanging specific message types the server recognizes. It's a protocol that can be built atop SMTP just to manage the database of addresses that the SMTP server accepts.
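The introduction exchange above can be simulated in a few lines. A minimal sketch, assuming each party's server keeps a table of aliases it has handed out and an address book of aliases it has received; `Party`, `deliver`, `befriend`, and `introduce` are hypothetical names, and `deliver` stands in for an SMTP hop that only checks the recipient alias is live. It does not authenticate who sent each step or define message formats, and A accepts the introduction automatically.

```python
import uuid


class Party:
    """Hypothetical user plus mail server; aliases are uuid4 hex strings."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.live: dict[str, str] = {}  # alias I handed out -> who holds it
        self.book: dict[str, str] = {}  # contact name -> alias they handed me

    def mint(self, holder: str) -> str:
        alias = uuid.uuid4().hex
        self.live[alias] = holder
        return alias


def deliver(to: Party, alias: str) -> None:
    # Stand-in for an SMTP hop: the recipient's server rejects unknown aliases.
    if alias not in to.live:
        raise LookupError("550 mailbox unavailable")


def befriend(x: Party, y: Party) -> None:
    # Existing relationship: each side mints a private alias for the other.
    x.book[y.name] = y.mint(x.name)
    y.book[x.name] = x.mint(y.name)


def introduce(a: Party, b: Party, c: Party) -> None:
    """C, who does not know A, asks their mutual contact B for an introduction."""
    # 1. C -> B: "introduce-me as C[guid-intro]", a use-once alias.
    intro = c.mint(f"intro via {b.name}")
    deliver(b, c.book[b.name])
    # 2. B -> A: "here's who I call C [guid-intro]".
    deliver(a, b.book[a.name])
    # 3. A -> C[guid-intro]: "hi, I'm A[new-guid]".
    deliver(c, intro)
    c.book[a.name] = a.mint(c.name)
    # 4. C -> A[new-guid]: "hi, I'm C[new-guid]".
    deliver(a, c.book[a.name])
    a.book[c.name] = c.mint(a.name)
    # 5. guid-intro has been used, so C revokes it.
    del c.live[intro]


a, b, c = Party("A"), Party("B"), Party("C")
befriend(a, b)
befriend(b, c)
introduce(a, b, c)
print(sorted(c.live.values()))     # ['A', 'B']
print(a.book["C"] != b.book["C"])  # True: each holder has its own alias for C
```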

felsokning · 2024-05-19T08:39:35 1716107975

> If you and I had been introduced, you would have a guid@naasking-domain.com designating me in your address book, and I would have a guid@felsokning-domain.com for your address in my address book.

The description, here, is no different than S/MIME encryption exchange - in that the guid exchange has to be done before it could be used.

You still have the issue of [A]Guid and [C]Guid correlating between themselves _and_ it being "unique" per SMTP session (after all, you said before that each guid has to be unique per session). This is where my earlier reference to generated guids being exchanged during the SMTP session comes into play. However, leaving that aside...

A single-use guid is no different than an SMTP address, if we're going based on the single guid inferred from that line -- and that guid has to be stored elsewhere in your system for it to be resolved to a recipient. So, you need something like a guid history array on the object for the forest to be able to resolve that guid (on recipient resolve) to a mail object inside your forest.

You have no mechanism (from your description) for B sending to [A]Guid or [C]Guid junk mail (assuming they've been able to discover the guids) using those guids. You say you would invalidate [A]Guid or [C]Guid -- but this doesn't resolve the issue of [A] and [C] now having to re-exchange Guids, for something that B has done.

So, now, all valid email between [A]Guid and [C]Guid is invalidated (per your description) and they're calling into your helpdesk, trying to understand why valid email isn't being delivered.

Do you tell them to re-exchange guids? How do they re-exchange guids when the mail system is dependent (directly) on those guids already being established on both sides? How do they "re-introduce" themselves, in other words, in that scenario?

naasking · 2024-05-19T13:41:17 1716126077

> The description, here, is no different than S/MIME encryption exchange - in that the guid exchange has to be done before it could be used.

There are formal connections between some encryption protocols and what I'm describing here (effectively a system based on capability security, i.e. this is modelling spam as an access control problem for an unbounded set of actors). Basically encryption lets you do away with extra storage requirements for the guids, but the cost is additional complexity around key management and revocation, and more compute cost. I haven't thought about it enough to see if there's a formal correspondence with S/MIME, but my proposal is very simple so I don't think you need to try to understand it through that lens.

> You still have the issue of [A]Guid and [C]Guid correlating between themselves _and_ it being "unique" per SMTP session

No, these guids are not per-session, they are persisted in a user's address book.

> and that guid has to be stored elsewhere in your system for it to be resolved to a recipient

Yes, each user's address book contains the guid address for a contact just like right now it contains an email address. Just take the existing address book and make the emails cryptographically unguessable guids. If you and I have the exact same set of contacts, none of our guid addresses will match. That's literally it.

> you need something like a guid history array on the object for the forest to be able to

No such history is needed. I really think you're overcomplicating this.

> You have no mechanism (from your description) for B sending to [A]Guid or [C]Guid junk mail (assuming they've been able to discover the guids) using those guids.

I don't understand what you're trying to describe here.

> You say you would invalidate [A]Guid or [C]Guid -- but this doesn't resolve the issue of [A] and [C] now having to re-exchange Guids, for something that B has done.

If A and C have been introduced per the protocol I described, then anything B does has no impact on the relationship between A and C. If B sends them junk mail, the user (A or C) could decide to revoke B's access to them, or may opt to not revoke if they think it was an accident.

You could opt to track who introduced you and make revocation decisions based on that extra info too, but it's not strictly necessary.

> How do they "re-introduce" themselves, in other words, in that scenario?

In the case that a guid address has to be revoked but you want to keep the connection (perhaps the guid address leaked somehow), the mail agent would have to renew the connection by re-running the introduction protocol before revoking the previous guid, or they would have to request another introduction through someone they both know.

This is as simple as having a "mark as spam" button, and when the user clicks it, it asks if they want to block the user entirely or if this was accidental (or something). If the former, the system revokes immediately, if the latter the system re-runs the introduction protocol using the existing guids to get new ones, then revokes the old ones.
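The two branches of that button can be sketched over a plain table of alias to contact. `mark_as_spam` is a hypothetical handler; the step of actually sending the replacement alias to the contact (re-running the introduction protocol over the old alias) is left out and only noted in a comment.

```python
from __future__ import annotations

import uuid


def mark_as_spam(live: dict[str, str], alias: str, block: bool) -> str | None:
    """Hypothetical 'mark as spam' handler over a table alias -> contact.

    block=True: revoke immediately.
    block=False (accident or leak): mint a replacement for the same contact
    first, then revoke the old alias. The replacement would be handed to the
    contact by re-running the introduction protocol before the revocation.
    """
    contact = live.get(alias)
    if contact is None:
        return None
    replacement = None
    if not block:
        replacement = uuid.uuid4().hex
        live[replacement] = contact
    del live[alias]
    return replacement
```

Renewing before revoking is what keeps the relationship intact: the contact never holds zero valid aliases.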

anamax · 2024-05-19T04:26:08 1716092768

> If email didn't need fixing, spam wouldn't exist.

Let's start with something easy - [1] define spam. [2] Can spam be identified with a purely mechanical (no humans involved) process?

Do you have a definition of "spam" that is substantially different from "mail that I don't want."

It's also interesting that you assume that spammers can't generate new identities. (Yes, you seem to think that introductions solves everything, but it doesn't)

I ask that because "I don't want" requires mind-reading by the sender.

naasking · 2024-05-19T05:12:59 1716095579

I don't think you've actually read the details of my proposal because the fact that spammers can generate new identities is irrelevant.

Nevertheless, to answer your question, spam is generally understood to be unsolicited email. The fact that computers can't read minds is exactly why they shouldn't try and should instead simply remove the core mechanism that enables spam to begin with: the easy ability to reach you because of guessable and harvestable global identifiers, and the difficulty of changing a compromised address means collecting and reselling addresses has value.

Both of these properties are violated in this system. Minting new email guids is a trivial core operation that literally happens all of the time, and addresses cannot be guessed/brute-forced; therefore addresses have almost no value to spammers or brokers.

anamax · 2024-05-19T06:05:01 1716098701

You're right - I thought that you'd made one fatal error (sender registries) when you'd actually made a different one (introducers).

> spam is generally understood to be unsolicited email

Not so fast. If spam is universally disliked and should be eradicated, it can't include all unsolicited email.

That's because people LIKE some unsolicited email. Unless you can distinguish unsolicited email that someone will like from unsolicited email that they won't like ...

FWIW, you come across like the "advertising should be banned" people. (That's another group that confuses the existence of a problem with the existence of a solution that doesn't have any downsides. They fixate on their solution and try to define-away its downsides.)

Advertising is a lot like spam. It is product information. If you don't care about the information, it is unwanted, but if you do.... The thing is, there's always someone who cares.

FWIW, every time I've physically met someone who claims to be anti-advertising, said someone has happily displayed several pieces of advertising for products that said someone liked. When I point that out, "that's different" or "I don't mean that kind of advertising."

naasking · 2024-05-19T13:48:53 1716126533

> That's because people LIKE some unsolicited email. Unless you can distinguish unsolicited email that someone will like from unsolicited email that they won't like ...

Then they can opt into a service that sends them products they might like. Defaulting everyone to opted-in, which is the current situation, is always terrible. Spam makes up a huge fraction of mail sent over the Internet, the vast majority of people don't want it, and it's a huge security problem (phishing, viruses, etc.).

If Gmail implements this kind of protocol, then they can opt you into their advertising list as part of the sign up process.

brightball · 2024-05-18T19:02:06 1716058926

Spam exists because email can be sent from any domain on the internet by default without requiring any validation.

The moment enforced DMARC with p=reject becomes mandatory, a lot of problems will go away, because you will be required to "turn on" email for your domain with SPF and DKIM. In the meantime, every domain that has ever been registered is subject to being used for spam.
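A DMARC policy is published as a DNS TXT record at `_dmarc.<domain>`, as semicolon-separated `tag=value` pairs; `p=reject` asks receivers to reject mail that fails DMARC (neither SPF nor DKIM passes with an aligned domain). A simplified sketch of checking whether a record enforces reject; the helper names are hypothetical, and this is not a full RFC 7489 validator (it ignores tag-ordering rules, `sp=`, `pct=`, and the alignment checks themselves).

```python
def parse_dmarc(txt: str) -> dict[str, str]:
    """Split a DMARC TXT record into tags (tag=value pairs separated by ';')."""
    tags = {}
    for part in txt.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


def enforces_reject(txt: str) -> bool:
    tags = parse_dmarc(txt)
    return tags.get("v") == "DMARC1" and tags.get("p", "").lower() == "reject"


record = "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.org"
print(enforces_reject(record))  # True
```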

naasking · 2024-05-18T19:05:27 1716059127

That helps, but they will find a way around it. They always find ways around half measures.

rpbiwer2 · 2024-05-21T13:37:16 1716298636

I just want to thank you for your contribution to this thread. I've long thought email should go in a direction similar to what you're describing, and I appreciate the specificity you've provided.

I wish I could say I'm surprised by the animosity (and relative lack of substance) in some of the comments you've received, but I guess that's a problem with social media and/or humanity that's unlikely to have a technical solution.

ttul · 2024-05-18T18:10:41 1716055841

If your email address was a secret known only to your existing contacts then how does someone reach you if they don’t already know you?

naasking · 2024-05-18T21:03:01 1716066181

I've described the introduction protocol in more detail here:

https://news.ycombinator.com/item?id=40402046

anamax · 2024-05-19T04:33:00 1716093180

That relies on introducers/mutual acquaintances, which isn't enough.

Here's an easy problem that it doesn't solve. I'm giving a public talk and want to provide an email address so that people can contact me.

How do you [1] let legitimate responses get to me AND [2] block illegitimate responses. (Note - I have no control over the audience.)

naasking · 2024-05-19T05:26:30 1716096390

Each attendee gets their own guid introduction as usual, just by scanning a QR code or something, just before or after the talk. The same basic introduction protocol works here, the QR code just names a guid address that has constraints on it (time-limited maybe, or that limits introductions by number of attendees).

Corner cases like this have solutions, and even if they're a bit more awkward, who cares? 99.9% of people who suffer from spam don't encounter these corner cases. Should a solution solve the biggest part of the problem, or prioritize making the corner cases easy?
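The constrained QR-code alias for a talk could look something like this. A minimal sketch, assuming the alias carries an expiry time and a use budget (e.g. the number of attendees); `IntroAlias` and `make_intro_alias` are hypothetical names. Each successful `try_use` would then kick off the normal introduction exchange, minting a private alias for that attendee.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass


@dataclass
class IntroAlias:
    """Hypothetical introduction-only alias, e.g. printed as a QR code at a talk."""

    alias: str
    expires_at: float  # Unix time after which it stops accepting
    uses_left: int     # e.g. capped at the number of attendees

    def try_use(self, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        if now > self.expires_at or self.uses_left <= 0:
            return False
        self.uses_left -= 1
        return True


def make_intro_alias(ttl_seconds: float, max_uses: int) -> IntroAlias:
    return IntroAlias(uuid.uuid4().hex, time.time() + ttl_seconds, max_uses)


qr = make_intro_alias(ttl_seconds=7 * 24 * 3600, max_uses=2)
print([qr.try_use() for _ in range(3)])  # [True, True, False]
```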

anamax · 2024-05-19T05:59:21 1716098361

Attendee? What year is it in your world? (Also, time-limited is stupid for this case.)

This isn't an edge case - people pass out addresses expecting to be contacted by randoms all the time.

And, most contacts are second-order or further, so introductions aren't even possible because there's no common link. (I pass on "you should contact {other person}" to groups that I don't control all the time.)

I get that you're proud of your introductions hammer, but you don't get to ignore the existence of screws.

naasking · 2024-05-19T13:59:36 1716127176

> Attendee? What year is it in your world?

Perhaps you're unaware that most public talks take place at conferences, which yes, still have attendees.

> (Also, time-limited is stupid for this case.)

Time-limited is perfectly fine policy for some cases. Maybe you have a different case in mind, but that wasn't specified in your scenario.

> This isn't an edge case - people pass out addresses expecting to be contacted by randoms all the time.

And? If you want to open yourself to a flood of possible spam, then create an address just for that and publish it. Nobody's stopping you.

> And, most contacts are second-order or further, so introductions aren't even possible because there's no common link

What does "second order" even mean if not "someone I know knows them"?

There's nothing stopping the use of public addresses, the point of the proposal is that most people don't need it, and the default public nature of email is what creates the security and spam nightmare that it is.

Maxion · 2024-05-18T18:48:07 1716058087

By buying your address from a broker, just like now.

naasking · 2024-05-19T05:38:22 1716097102

Which address are they buying? Every contact you know has a different guid address for you, and as soon as one of those is used for spam, you can revoke it. What value does such an address have to a broker or spammer? Addresses have value now because they are global, easily guessable and harvestable, and difficult for the user to change. The system I'm describing violates all of those properties, thus devaluing the whole spam enterprise.

chrisandchris · 2024-05-18T09:17:11 1716023831

The first paragraph sounds exactly like paper mail.

And yet, we didn't solve it. It just got worse.

viraptor · 2024-05-18T09:38:14 1716025094

That seems region specific. I don't get paper spam in Australia. I did get some for a week, until I put a "no junk mail" sticker on my box, which is respected here. It's less of an issue of paper mail and more of applicable regulation.