The next obvious solution is to have something that works fast for non-colliding...

posix86 · on Jan 4, 2024

But how would you detect that a station name is colliding or not colliding? With a hash set?

viraptor · on Jan 4, 2024

Search for cuckoo hashing. It's a whole thing for data structures.

posix86 · on Jan 11, 2024

Cuckoo hashing doesn't help detect collisions; the algorithm is about what happens IF a collision is found. The whole reason we're hashing in the first place is to not have to linearly compare station names when indexing.

In other words: cuckoo hashing is a strategy for reacting when two values map to the same hash. But how do you know whether you have two different values, or two of the same, if they have an identical hash?
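
To make the question concrete, here is a minimal Python sketch of the conventional answer: keep the name in the slot and compare bytes whenever the slot is already occupied. It is an illustration only, not code from any actual solution. `StationTable`, its linear probing, and the deliberately weak `hash_fn=len` in the usage lines are all assumptions made for the example, and the table is assumed to have more slots than there are distinct names.

```python
class StationTable:
    """Toy open-addressing table (hypothetical, for illustration only)."""

    def __init__(self, capacity=1024, hash_fn=hash):
        # Assumption: capacity exceeds the number of distinct names,
        # otherwise the probe loops below would never terminate.
        self.slots = [None] * capacity
        self.hash_fn = hash_fn

    def add(self, name: bytes, value: float) -> None:
        i = self.hash_fn(name) % len(self.slots)
        while True:
            slot = self.slots[i]
            if slot is None:
                # Empty slot: first time this name is seen.
                self.slots[i] = [name, value, 1]
                return
            if slot[0] == name:
                # Same bytes: genuinely the same station.
                slot[1] += value
                slot[2] += 1
                return
            # Occupied by a different name: a collision, probe onward.
            i = (i + 1) % len(self.slots)

    def get(self, name: bytes):
        i = self.hash_fn(name) % len(self.slots)
        while self.slots[i] is not None:
            if self.slots[i][0] == name:
                return self.slots[i]
            i = (i + 1) % len(self.slots)
        return None


# Usage: len() stands in for a weak hash so that b"Oslo" and b"Rome" collide.
table = StationTable(capacity=8, hash_fn=len)
for name, temp in [(b"Oslo", 1.0), (b"Rome", 2.0), (b"Oslo", 3.0)]:
    table.add(name, temp)
```

The byte comparison `slot[0] == name` is exactly the per-lookup cost the faster schemes discussed here are trying to avoid.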

anonymoushn · on Jan 4, 2024

It seems like you have to run the slow collision-resistant thing over the whole file.

londons_explore · on Jan 4, 2024

Collision detection is far cheaper. One method:

Simply add up all the bytes of all seen placenames while running your fast algorithm. This is effectively a checksum of all bytes of placename data.

Then at the end, calculate the 'correct' name-sum (which can be done cheaply). If it doesn't match, a collision occurred.
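
A rough Python sketch of one way to read this, under stated assumptions: the fast path keys its table on the hash alone and remembers only the first name seen per hash, while a running total adds up the bytes of every name encountered. The names `aggregate_with_checksum` and `hash_fn` are made up for the example, and `len` is used as a deliberately bad hash to force a collision.

```python
def aggregate_with_checksum(rows, hash_fn):
    """Count rows per hash bucket and check a byte-sum checksum afterwards.

    Returns (table, ok). ok is False when the checksum reveals that some
    bucket absorbed rows belonging to a different name.
    """
    table = {}    # hash -> [first name seen, row count]
    running = 0   # sum of the bytes of every name actually read

    for name in rows:
        running += sum(name)          # iterating bytes yields ints
        h = hash_fn(name)
        entry = table.get(h)
        if entry is None:
            table[h] = [name, 1]
        else:
            entry[1] += 1             # no name comparison on the fast path

    # The 'correct' name-sum: what the total would be if every row in a
    # bucket really carried that bucket's stored name.
    expected = sum(sum(name) * count for name, count in table.values())
    return table, running == expected


rows = [b"Oslo", b"Rome", b"Oslo"]
_, ok_weak = aggregate_with_checksum(rows, len)               # both names hash to 4
_, ok_exact = aggregate_with_checksum(rows, lambda n: n)      # no collisions possible
print(ok_weak, ok_exact)  # False True
```

The final pass touches each distinct name once rather than each row, which is why the check is cheap compared with the main loop.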

anonymoushn · on Jan 4, 2024

You can design inputs that defeat this scheme though. I thought the point was to be correct for all valid inputs.
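
A small Python sketch of such an input, assuming the plain byte-sum reading of the scheme above: two names with the same bytes in a different order (or any two names whose bytes add up to the same total) leave the checksum unchanged. The helper `checksum_flags_collision`, the made-up names `b"ab"`, `b"ba"`, `b"ad"`, `b"bc"`, and the use of `len` as a stand-in for a fast hash that these names happen to collide under are all assumptions for illustration.

```python
def checksum_flags_collision(rows, hash_fn):
    """Return True if the byte-sum check notices a collision."""
    first_name = {}   # hash -> first name seen for that hash
    count = {}        # hash -> number of rows with that hash
    running = 0       # byte sum over every name actually read

    for name in rows:
        running += sum(name)
        h = hash_fn(name)
        first_name.setdefault(h, name)
        count[h] = count.get(h, 0) + 1

    expected = sum(sum(first_name[h]) * count[h] for h in first_name)
    return running != expected


# b"ab" and b"ba" collide under len() and have identical byte sums,
# so the merged bucket goes unnoticed.
anagram_rows = [b"ab", b"ba", b"ab"]
print(checksum_flags_collision(anagram_rows, len))   # False

# Not only anagrams: 97 + 100 == 98 + 99.
equal_sum_rows = [b"ad", b"bc"]
print(checksum_flags_collision(equal_sum_rows, len))  # False
```

A real attack would also need the names to collide under the actual fast hash, but once that is arranged the sum alone cannot tell the two names apart.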