
Hashing emails into Merkle buckets by arrival time is an interesting idea. It may distribute emails unevenly across buckets, though, which increases the bandwidth required to sync a bucket in the worst case.

There are other optimizations you can bring in when you sync the tree:

1. If you are working your way down the remote's tree and you notice that your local signature for the equivalent node is zero, then you know that your entire subtree is empty, and you can short-circuit and start downloading entire buckets.

2. Conversely, if you are working your way down the remote's tree and you notice that your local signature is non-zero but the remote's signature for the equivalent node is zero, then you know that your subtree has data while the remote's entire subtree is empty, and you can short-circuit and start uploading entire buckets.

3. As you work your way down sections of the tree, you can start to build an idea of the average email-to-bucket ratio. If the remaining buckets are each likely to contain at most one email, then you can short-circuit again.
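
To make the first two short-circuits concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration, not the scheme from the article: the tree is a complete binary tree in heap layout (root at index 1, leaves are buckets), the bucket count is a power of two, a signature of 0 means "empty subtree", and the names `combine`, `build_tree` and `plan_sync` are made up. It does not attempt the third optimization.

```python
import hashlib

def combine(left, right):
    # Assumed convention: a parent is 0 only when both children are 0 (empty).
    if left == 0 and right == 0:
        return 0
    data = left.to_bytes(32, "big") + right.to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def build_tree(bucket_sigs):
    # Heap layout: index 0 unused, root at 1, leaves at n .. 2n-1.
    n = len(bucket_sigs)  # assumed to be a power of two
    tree = [0] * n + list(bucket_sigs)
    for i in range(n - 1, 0, -1):
        tree[i] = combine(tree[2 * i], tree[2 * i + 1])
    return tree

def plan_sync(local, remote, num_buckets):
    """Return (download, upload, diff) lists of bucket indices."""
    download, upload, diff = [], [], []

    def buckets_under(node):
        # Follow the leftmost and rightmost paths down to the leaf level.
        lo = hi = node
        while lo < num_buckets:
            lo, hi = 2 * lo, 2 * hi + 1
        return range(lo - num_buckets, hi - num_buckets + 1)

    stack = [1]
    while stack:
        node = stack.pop()
        mine, theirs = local[node], remote[node]
        if mine == theirs:
            continue  # identical subtrees, nothing to sync
        if mine == 0:
            download.extend(buckets_under(node))  # short-circuit 1
        elif theirs == 0:
            upload.extend(buckets_under(node))    # short-circuit 2
        elif node >= num_buckets:
            diff.append(node - num_buckets)       # leaf: compare bucket contents
        else:
            stack.extend((2 * node + 1, 2 * node))  # visit left child first
    return download, upload, diff

# Bucket signatures here are arbitrary small integers chosen for the example.
local_tree = build_tree([0, 0, 0, 0, 5, 6, 8, 0])
remote_tree = build_tree([1, 2, 3, 4, 0, 0, 7, 9])
download, upload, diff = plan_sync(local_tree, remote_tree, 8)
```

In this example the left half of the local tree is empty, so buckets 0 to 3 are queued for download after comparing a single node instead of walking all the way down to the leaves.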




I think the low-order bits of arrival time actually give a really good distribution long-term (remember: we want the distribution to be uneven in the short term, but even in the long term). Maybe you could improve it a little bit by making each bin a prime number of seconds, so that automated messages sent at the same time every day can eventually hit each bucket. For example, instead of 8-second granularity, use 7 or 11.
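
A quick way to check this is to count which buckets a once-a-day message lands in. The sketch below is only a guess at the scheme: it assumes the bucket is the bin number modulo a fixed bucket count, and the 256-bucket default and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def bucket_for(arrival_ts, bin_seconds=8, num_buckets=256):
    # Assumed scheme: drop the sub-bin seconds, then keep the low-order part.
    return (arrival_ts // bin_seconds) % num_buckets

def buckets_hit_by_daily_message(first_ts, days, bin_seconds, num_buckets=256):
    # Set of buckets reached by a message sent at the same time every day.
    return {
        bucket_for(first_ts + day * SECONDS_PER_DAY, bin_seconds, num_buckets)
        for day in range(days)
    }

hits_with_8s_bins = buckets_hit_by_daily_message(0, 1000, bin_seconds=8)
hits_with_7s_bins = buckets_hit_by_daily_message(0, 1000, bin_seconds=7)
```

Under these assumed parameters the daily message reaches 16 of the 256 buckets with 8-second bins and 14 with 7-second bins, so whether a prime width helps seems to depend on the bucket count as much as on the bin width. It is worth measuring with the real numbers.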



