A few years ago I wrote a naive Java string-to-floating point conversion routine...

chrchang523 · on June 14, 2019

The biggest win, in my experience, is simply recognizing when you don’t need all ~16 significant digits provided by the standard implementations. If you’re only printing e.g. six digits, and can afford to round in the wrong direction 0.00000001% of the time, you can skip a ton of work. And as you mention, it gets even easier when you don’t have to support scientific notation.
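To make the trade-off concrete, here is a minimal Java sketch of that kind of "slightly sloppy" formatter. The class and method names are hypothetical, and it assumes finite inputs with a magnitude below roughly 9e12 (so the scaled value fits in a `long`), exactly six decimal places, and no scientific notation.

```java
final class SloppyFormat {
    private static final long SCALE = 1_000_000L; // 10^6, i.e. six decimal places

    /**
     * Appends value with exactly six decimals.
     * Assumption: value is finite and |value| < ~9e12; nothing is checked here.
     */
    static void appendFixed6(StringBuilder sb, double value) {
        boolean negative = value < 0;
        if (negative) value = -value;
        // Scale and round to nearest. The multiplication itself can introduce a
        // tiny error, so an input extremely close to a tie may occasionally
        // round in the wrong direction. That is the accepted sloppiness.
        long scaled = (long) (value * SCALE + 0.5);
        if (negative && scaled != 0) sb.append('-');
        sb.append(scaled / SCALE).append('.');
        long frac = scaled % SCALE;
        // Zero-pad the fractional part to six digits.
        for (long p = SCALE / 10; p > 1 && frac < p; p /= 10) sb.append('0');
        sb.append(frac);
    }

    static String format(double value) {
        StringBuilder sb = new StringBuilder(24);
        appendFixed6(sb, value);
        return sb.toString();
    }
}
```

Appending into a caller-supplied `StringBuilder` is a deliberate choice in this sketch: in a serialization loop it avoids allocating one string per number.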

The standard library functions occupy an awkward middle ground between this type of much faster, slightly sloppy float-to-string converter, and direct storage of the binary representation; in most applications where float serialization performance is actually relevant, at least one of these two alternatives is better.
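For the other end of that spectrum, direct storage of the binary representation might look like the sketch below. The class name is hypothetical, and it assumes the default big-endian byte order of `ByteBuffer` on both the writing and the reading side.

```java
import java.nio.ByteBuffer;

final class BinaryDoubles {
    /** Writes each double as its raw 8-byte IEEE 754 representation. */
    static byte[] encode(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) buf.putDouble(v);
        return buf.array();
    }

    /** Reads the doubles back; assumes bytes.length is a multiple of 8. */
    static double[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[] out = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < out.length; i++) out[i] = buf.getDouble();
        return out;
    }
}
```

No decimal conversion happens at all here, so the round trip is exact, at the cost of a file that is no longer human-readable.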

Vindicis · on June 14, 2019

Agreed.

I deal with financial data which is in well-defined formats for the various asset classes. You can save a lot of time by just writing some parsing functions to deal with your exact needs and avoid needless work. And, hopefully, you can avoid the built-in conversion functions, as those will almost always be A LOT slower because of everything they're required to handle.
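As a rough illustration of such a purpose-built parser, here is a Java sketch. The class name is hypothetical, and the accepted format is an assumption rather than any particular financial format: an optional leading minus sign, at most 15 digits in total, at most one `.`, and no exponent, whitespace, or thousands separators.

```java
final class SimpleDecimalParser {
    // Powers of ten up to 1e15 are exactly representable as doubles.
    private static final double[] POW10 = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7,
        1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15
    };

    /** Parses e.g. "123.45" or "-0.5"; rejects anything outside the assumed format. */
    static double parse(CharSequence s) {
        int i = 0;
        int n = s.length();
        boolean negative = n > 0 && s.charAt(0) == '-';
        if (negative) i = 1;
        long mantissa = 0;   // all digits, with the decimal point ignored
        int digits = 0;      // total digit count
        int fracDigits = 0;  // digits after the decimal point
        boolean seenDot = false;
        for (; i < n; i++) {
            char c = s.charAt(i);
            if (c == '.' && !seenDot) { seenDot = true; continue; }
            if (c < '0' || c > '9') throw new NumberFormatException(s.toString());
            mantissa = mantissa * 10 + (c - '0');
            digits++;
            if (seenDot) fracDigits++;
        }
        if (digits == 0 || digits > 15) throw new NumberFormatException(s.toString());
        // With at most 15 digits the mantissa and the power of ten are both
        // exact doubles, so this single division is correctly rounded.
        double value = mantissa / POW10[fracDigits];
        return negative ? -value : value;
    }
}
```

The 15-digit cap is what keeps this simple: within that limit a single division gives the correctly rounded result, and longer inputs are rejected rather than handled approximately.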

I prefer to store that type of data in CSV files because there are times when you need to read through the data to find errors, do ad hoc analysis, etc. If you're loading those types of files very frequently, there are better formats, of course. But when the parsing time is peanuts compared to the runtime of your program, it's nice to be able to load the file into other programs when you need to, without having to convert it first.

tomxor · on June 14, 2019

> that only converted 'simple' numbers, i.e. no scientific notation, only one (fixed) decimal separator, etc. [...] Anyway this simple implementation was ~10 times faster

What about numbers with a large order of magnitude, e.g. 1e234? Is making very long strings still faster than switching to e notation?

roel_v · on June 14, 2019

I don't know - I knew that my input data wouldn't contain such cases, so I didn't have to handle or check for them. This was part of the 'etc' catchall :)

tomxor · on June 14, 2019

I guess an alternative is _always_ using e notation but not bothering to move the decimal point, i.e. effectively the same as the binary encoding (integer significand plus exponent) but in base 10. That would be both simple and have a small maximum string length.
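A sketch of what that output format could look like in Java follows. The class name is hypothetical, and this only illustrates the notation: it leans on `BigDecimal.valueOf`, which goes through `Double.toString`, so it is not a fast implementation, and it throws `NumberFormatException` for NaN and infinities.

```java
import java.math.BigDecimal;

final class IntegerMantissaNotation {
    /** Formats value as "<integer significand>e<decimal exponent>". */
    static String format(double value) {
        // stripTrailingZeros folds trailing zeros into the exponent,
        // so 1200.0 becomes 12e2 rather than 12000e-1.
        BigDecimal d = BigDecimal.valueOf(value).stripTrailingZeros();
        // The represented value is unscaledValue * 10^(-scale).
        return d.unscaledValue() + "e" + (-d.scale());
    }
}
```

With this, `1e234` prints as `1e234` and `0.15` prints as `15e-2`, so the string stays short regardless of magnitude.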

Users probably wouldn't like it though :P damn users.

buckminster · on June 14, 2019

I had the same experience the other way. I halved the runtime of an ETL program by writing a custom float to string function.