
Wow, that thread is nuts. Scrolled up just a bit and saw this.

My new public key search system is almost ready. I had to reinvent my binary database system because, although the database was lightweight (https://bitcointalk.org/index.php?topic=5475626), I had efficiency issues with binary search. This is now a thing of the past. I have designed a system that stores 100 million public keys in an 80 KB file. Yes, you read that right: 80 KB! (In the future it will be smaller.) It achieves maximum efficiency; we would only be limited by the current speed of secp256k1 when generating the 100 million or more public keys while creating the database. I am finishing designing the search script. After months of being stuck due to personal issues, I am finally back on track.

I love these kinds of mad-inventor rabbit-hole corners of the Internet. Kind of brings back the '90s for me, when everything was exciting.




> This is now a thing of the past. I have designed a system that stores 100 million public keys in an 80 KB file

That’s 0.0064 bits per public key - so either there are lots of duplicates, or something is amiss here?
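For anyone checking the arithmetic, a quick Python verification (reading 80 KB as 80,000 bytes):

    bits = 80_000 * 8          # 80 KB in bits
    keys = 100_000_000
    print(bits / keys)         # 0.0064 bits per key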

Edit: they don’t actually store the keys, so the quote is misleading.


Presumably there is a generator function that maps key IDs into actual keys that can be re-computed at will.
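For illustration, a minimal sketch of that idea in Python, using the third-party ecdsa package (an assumption; the linked posts don't say how keys are derived). The ID itself serves as the secret exponent, so any public key can be recomputed on demand instead of stored:

    import ecdsa  # pip install ecdsa

    def key_for_id(key_id: int) -> bytes:
        # Hypothetical generator: public key = key_id * G on secp256k1.
        sk = ecdsa.SigningKey.from_secret_exponent(key_id, curve=ecdsa.SECP256k1)
        return sk.verifying_key.to_string("compressed")

    print(key_for_id(1).hex())  # recomputable at will, nothing stored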


How could this work with less than 1 bit of data per key?

Assuming there are no duplicates, which is a sensible assumption, you’d need a minimum of 100,000,000 bits: even with a perfect hash function, marking each of 100,000,000 unique entries as present costs at least one bit.
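A minimal sketch of that lower bound: one presence bit per possible slot. Here perfect_hash is a hypothetical function mapping each key to a unique slot in [0, N); the point is that even this best case is a ~12.5 MB bitmap, not 80 KB.

    N = 100_000_000

    def make_bitmap(keys, perfect_hash):
        # One bit per possible hash slot; set the bit for each present key.
        bitmap = bytearray(N // 8)
        for key in keys:
            i = perfect_hash(key)
            bitmap[i // 8] |= 1 << (i % 8)
        return bitmap

    print(N // 8, "bytes")  # 12500000, i.e. ~12.5 MB just for presence bits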


In general, when you're storing a list of numbers, there are many situations where you can go below 1 bit per number.

The easiest one to think about is storing the deltas between consecutive numbers. Let's say 80% of your deltas are 5. If you use arithmetic coding, then storing a 5 takes only about a third of a bit. It's not hard to come up with probability distributions where the average number of bits per entry is less than 1.
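A minimal sketch of that arithmetic-coding argument, with a made-up delta distribution (80% fives, 20% sixes). An arithmetic coder approaches the Shannon entropy of the source, so computing the entropy gives the average cost per entry:

    from math import log2

    dist = {5: 0.8, 6: 0.2}  # hypothetical delta probabilities
    entropy = -sum(p * log2(p) for p in dist.values())
    print(f"{-log2(0.8):.3f} bits to code one 5")      # ~0.322, about 1/3
    print(f"{entropy:.3f} bits per entry on average")  # ~0.722, under 1 bit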

Also, back in the realm of perfect hashes, once you're more than half full it becomes more efficient to store the missing numbers. If your perfect hash has 100,003,000 possible outputs, then your worst case is around 50 million unique entries. By the time you reach 100 million unique entries you only need to keep track of the 3,000 you haven't seen yet.
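A back-of-the-envelope sketch of that complement trick, using the numbers above (the ~27 bits per absent value is just log2 of the output space, an upper bound rather than an optimal encoding):

    from math import ceil, log2

    N = 100_003_000                 # possible perfect-hash outputs
    present = 100_000_000
    missing = N - present           # 3,000 values never seen

    bitmap_bytes = ceil(N / 8)                  # full presence bitmap
    absent_bytes = ceil(missing * log2(N) / 8)  # just list the absentees
    print(bitmap_bytes)  # 12500375 bytes, ~12.5 MB
    print(absent_bytes)  # ~9966 bytes, about 10 KB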


Thank you for taking the time to explain this - it makes sense, and it’s interesting to invert the problem by storing missing numbers.


Maybe they're storing ranges or something similar.


Assuming no duplicates, the only case that would make sense is if every record were identical except for a single byte that varies sequentially. Even then you’d end up with more than the number of bytes we’re talking about, even excluding the size of the index (which would be non-trivial).


Why don’t you just read what the guy said by following the links in the forum? Surely you can find more explanation there that will answer some of your questions. Hahaha! :)


His thing has collisions, so it answers none of the questions.

Also, they already did follow the link. That's why they said "they don’t actually store the keys, so the quote is misleading", which you responded to with a laugh and nothing else. And that happened many hours before you made this new comment.


Heh, yeah. "Store" can have multiple meanings.

I'm not sure that guy really understood what was going on. If he'd followed the links he would've found the code, or at least a technical description. So why play dumb and ask here while trying to control the discussion?

I don't like that kind of thing. If you're okay with it, alright. But that's not me.


“That quote is misleading” Hahaha! :)



