
Wow, that thread is nuts. Scrolled up just a bit and saw this.

My new public key search system is almost ready. I had to reinvent my binary database system because, although the database was lightweight (https://bitcointalk.org/index.php?topic=5475626), I had efficiency issues with binary search. This is now a thing of the past. I have designed a system that stores 100 million public keys in an 80 KB file. Yes, you read that right: 80 KB! (In the future it will be smaller.) It achieves maximum efficiency; we would only be limited by the current speed of secp256k1 when generating the 100 million or more public keys while creating the database. I am finishing designing the search script. After months of being stuck due to personal issues, I am finally back on track.

I love these kinds of mad-inventor rabbit-hole corners of the Internet. Kind of brings back the '90s for me, when everything was exciting.




> This is now a thing of the past. I have designed a system that stores 100 million public keys in an 80 KB file

That’s 0.0064 bits per public key - so either there are lots of duplicates, or something is amiss here?
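For anyone checking the arithmetic, a quick Python verification (reading 80 KB as 80,000 bytes):

    bits = 80_000 * 8          # 80 KB in bits
    keys = 100_000_000
    print(bits / keys)         # 0.0064 bits per key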

Edit: they don’t actually store the keys, so the quote is misleading.


Presumably there is a generator function that maps key IDs into actual keys that can be re-computed at will.
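For illustration, a minimal sketch of that idea in Python, using the third-party ecdsa package (an assumption; the linked posts don't say how keys are derived). The ID itself serves as the secret exponent, so any public key can be recomputed on demand instead of stored:

    import ecdsa  # pip install ecdsa

    def key_for_id(key_id: int) -> bytes:
        # Hypothetical generator: public key = key_id * G on secp256k1.
        sk = ecdsa.SigningKey.from_secret_exponent(key_id, curve=ecdsa.SECP256k1)
        return sk.verifying_key.to_string("compressed")

    print(key_for_id(1).hex())  # recomputable at will, nothing stored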


How could this work with less than 1 bit of data per key?

Assuming there are no duplicates, which is a sensible assumption, you’d need a minimum of 100,000,000 bits: even with a perfect hash function, marking each of 100,000,000 unique entries as present costs at least one bit.
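A minimal sketch of that lower bound: one presence bit per possible slot. Here perfect_hash is a hypothetical function mapping each key to a unique slot in [0, N); the point is that even this best case is a ~12.5 MB bitmap, not 80 KB.

    N = 100_000_000

    def make_bitmap(keys, perfect_hash):
        # One bit per possible hash slot; set the bit for each present key.
        bitmap = bytearray(N // 8)
        for key in keys:
            i = perfect_hash(key)
            bitmap[i // 8] |= 1 << (i % 8)
        return bitmap

    print(N // 8, "bytes")  # 12500000, i.e. ~12.5 MB just for presence bits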


In general, when you're storing a list of numbers, there are many situations where you can go below 1 bit per number.

The easiest one to think about is storing the deltas between consecutive numbers. Let's say 80% of your deltas are 5. If you use arithmetic coding, then storing a 5 takes only about a third of a bit. It's not hard to come up with probability distributions where the average number of bits per entry is less than 1.
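A minimal sketch of that arithmetic-coding argument, with a made-up delta distribution (80% fives, 20% sixes). An arithmetic coder approaches the Shannon entropy of the source, so computing the entropy gives the average cost per entry:

    from math import log2

    dist = {5: 0.8, 6: 0.2}  # hypothetical delta probabilities
    entropy = -sum(p * log2(p) for p in dist.values())
    print(f"{-log2(0.8):.3f} bits to code one 5")      # ~0.322, about 1/3
    print(f"{entropy:.3f} bits per entry on average")  # ~0.722, under 1 bit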

Also, back in the realm of perfect hashes, once you're more than half full it becomes more efficient to store the missing numbers. If your perfect hash has 100,003,000 possible outputs, then your worst case is around 50 million unique entries. By the time you reach 100 million unique entries you only need to keep track of the 3,000 you haven't seen yet.
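A back-of-the-envelope sketch of that complement trick, using the numbers above (the ~27 bits per absent value is just log2 of the output space, an upper bound rather than an optimal encoding):

    from math import ceil, log2

    N = 100_003_000                 # possible perfect-hash outputs
    present = 100_000_000
    missing = N - present           # 3,000 values never seen

    bitmap_bytes = ceil(N / 8)                  # full presence bitmap
    absent_bytes = ceil(missing * log2(N) / 8)  # just list the absentees
    print(bitmap_bytes)  # 12500375 bytes, ~12.5 MB
    print(absent_bytes)  # ~9966 bytes, about 10 KB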


Thank you for taking the time to explain this - it makes sense, and it’s interesting to invert the problem by storing missing numbers.


Maybe they're storing ranges or something similar.


Assuming no duplicates, the only case that would make sense is if every record were identical except for a single byte that varies sequentially. Even then you’d end up with more than the number of bytes we’re talking about, even excluding the size of the index (which would be non-trivial).


Why don’t you just read what the guy said by following the links in the forum? Surely you can find more explanation there that will answer some of your questions. Hahaha! :)


His thing has collisions, so it answers none of the questions.

Also, they already did follow the link. That's why they said "they don’t actually store the keys, so the quote is misleading", which you responded to with a laugh and nothing else. And that happened many hours before you made this new comment.


Heh, yeah. "Store" can have multiple meanings.

I'm not sure that guy really understood what was going on. If he'd followed the links he would've found the code, or at least a technical description. So why play dumb and ask here while trying to control the discussion?

I don't like that kind of thing. If you're okay with it, alright. But that's not me.


“That quote is misleading” Hahaha! :)



