s/JSON/string/

Twirrim · on Jan 4, 2024

I took a few shots at this in Rust (following on from https://www.reddit.com/r/rust/comments/18ws370/optimizing_a_...), and every single time the bottleneck came down to the string parsing; everything else was largely irrelevant.

You can do this whole thing in about 24 seconds, as long as you're smart about how you chunk up the text and leverage threading. https://github.com/coriolinus/1brc/blob/main/src/main.rs
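A minimal sketch of the chunk-and-thread idea, not taken from the linked repo. It assumes the whole input is already in memory as a `&[u8]`. The names `split_at_newlines`, `count_stations` and `aggregate` are hypothetical. Each cut point is pushed forward to the next newline so no line straddles two chunks. Scoped threads then process the chunks in parallel, and the per-thread maps are merged at the end. For brevity the per-chunk work only counts rows per station; real code would also track min/max/sum.

```rust
use std::collections::HashMap;
use std::thread;

/// Split `data` into roughly `n` pieces, moving each cut point forward to just
/// past the next b'\n' so that no line is split across two chunks.
fn split_at_newlines(data: &[u8], n: usize) -> Vec<&[u8]> {
    let n = n.max(1);
    let target = data.len() / n + 1;
    let mut chunks = Vec::with_capacity(n);
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Advance until the chunk ends right after a newline (or at end of data).
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Hypothetical per-chunk work: count rows per station name.
fn count_stations(chunk: &[u8]) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    for line in chunk.split(|&b| b == b'\n') {
        if let Some(pos) = line.iter().position(|&b| b == b';') {
            *counts.entry(line[..pos].to_vec()).or_insert(0) += 1;
        }
    }
    counts
}

/// Process chunks on `threads` scoped threads and merge the partial results.
fn aggregate(data: &[u8], threads: usize) -> HashMap<Vec<u8>, u64> {
    let chunks = split_at_newlines(data, threads);
    // Scoped threads can borrow `data` directly, with no copying or Arc.
    let partials: Vec<HashMap<Vec<u8>, u64>> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|&chunk| s.spawn(move || count_stations(chunk)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Merge the per-thread maps into a single result.
    let mut total = HashMap::new();
    for partial in partials {
        for (station, n) in partial {
            *total.entry(station).or_insert(0) += n;
        }
    }
    total
}
```

Because each thread owns its own map, there is no locking on the hot path; the only shared-state work is the cheap merge once all threads have finished.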

toth · on Jan 4, 2024

Interesting, I guess that makes sense.

After a quick look at your code, a couple of thoughts:

   - You shouldn't bother parsing and validating UTF-8. Just pretend it's ASCII. Non-ASCII characters only show up in the station names anyway, and all you do with those is hash and copy them.

   - You are first chopping the file into lines and then parsing each line. You can do it in one go: scan byte by byte until you hit a semicolon, computing a running hash as you go. You can also parse the number into an int (ignoring the decimal point) with custom code, which will be faster than the generic float parser.

   - Memory-mapping the file (mmap) instead of reading it through the standard library should also speed things up a bit.
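A minimal sketch of the single-pass parse from the second suggestion. Following the first suggestion, the input is treated as raw bytes with no UTF-8 validation. It assumes the 1BRC line format: station name, `;`, then a temperature with exactly one decimal digit, then `\n`. FNV-1a is used as the running hash purely as an example; any cheap byte-at-a-time hash would do. `parse_line` and the constants are hypothetical names, not from the linked code.

```rust
/// 64-bit FNV-1a constants (example hash choice, not required by the approach).
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Parse one record starting at `pos` in a single pass.
/// Returns (station name bytes, name hash, temperature in tenths, next position).
/// The name is never validated as UTF-8; it is only hashed and sliced.
fn parse_line(buf: &[u8], mut pos: usize) -> (&[u8], u64, i32, usize) {
    let start = pos;
    let mut hash = FNV_OFFSET;
    // Hash the station name byte by byte until the ';' separator.
    while buf[pos] != b';' {
        hash = (hash ^ buf[pos] as u64).wrapping_mul(FNV_PRIME);
        pos += 1;
    }
    let name = &buf[start..pos];
    pos += 1; // skip ';'

    // Parse "-?d?d.d" as an integer number of tenths, skipping the '.'.
    let negative = buf[pos] == b'-';
    if negative {
        pos += 1;
    }
    let mut tenths: i32 = 0;
    while pos < buf.len() && buf[pos] != b'\n' {
        if buf[pos] != b'.' {
            tenths = tenths * 10 + (buf[pos] - b'0') as i32;
        }
        pos += 1;
    }
    if negative {
        tenths = -tenths;
    }
    // Step past the '\n' (if present) so the caller continues at the next line.
    (name, hash, tenths, (pos + 1).min(buf.len()))
}
```

A caller would loop `while pos < buf.len()`, key its table on the returned hash (comparing `name` to resolve collisions), and only divide by 10 when printing the final results, so no float parsing happens on the hot path.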