The next obvious solution is to have something that works fast for non-colliding station names, but then falls back to another (slow) implementation for colliding station names.
It's very cheap to detect station name collisions - just sample ~10,000 points in the file, and hope to find at least one of each station. If you find less stations than the full run, you haven't yet seen them all, keep hunting. If you find two that collide, fall back to the slow method. Checking 0.00001% of the dataset is cheap.
Cuckoo doesn't aid in detecting collisions, the algorithm is about what happens IF a collision is found. The whole reason we're hashing in the first place is to not have to linearly compare station names when indexing.
In other words: Cuckoo is a strategy to react to the case if two values map to the same hash. But how to know weather you have two different values, or two of the same, if they have an identical hash?
It's very cheap to detect station name collisions - just sample ~10,000 points in the file, and hope to find at least one of each station. If you find less stations than the full run, you haven't yet seen them all, keep hunting. If you find two that collide, fall back to the slow method. Checking 0.00001% of the dataset is cheap.