Having a quick look at your code, couple of thoughts:
- You shouldn't bother with parsing and validating UTF-8. Just pretend it's ASCII. Non ASCII characters are only going to show up in the station name anyway, and all you are doing with it is hashing it and copying it.
- You are first chopping the file into line chunks and then parsing the line. You can do it it one go, just look at each character byte by byte until you hit a semicolon, and compute a running hash byte by byte. You can also parse the number into an int (ignoring decimal point) using custom code and be faster than the generic float parser.
- If instead of reading the file using standard library, you mmap it, should also speed things up a bit.