
A few years ago I wrote a naive Java string-to-floating-point conversion routine that only converted 'simple' numbers, i.e. no scientific notation, only one (fixed) decimal separator, etc. It was the first Java code I had written in 10 years and the approach was probably not very Java-esque (I imitated what I'd do in C++, using (IIRC) java.nio.ByteBuffer or something similar). Anyway, this simple implementation was ~10 times faster than the 'officially' suggested way of using the formatted string conversion routines, which was the best answer I got even after asking around (on SO and similar) about how to do high-performance string-to-float conversion.

This is not to rag on Java; there are probably similar issues in other programming languages. My point is that when you don't need the 'fancy' floating-point formatting stuff and you need high performance (in my case, I had to convert many billions of floating-points-as-strings, and the conversion took ~50% of my program's runtime, which was measured in days), it can pay off to write a custom version of something as seemingly mundane as converting a string to a number.
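For illustration, here is a minimal sketch of what such a 'simple numbers only' parser might look like. It is not the original code: the class name `SimpleDecimalParser`, the 18-digit limit and the error handling are all assumptions made for this example. It accepts an optional sign, digits and at most one '.', and nothing else.

```java
// Hypothetical sketch: parses only plain decimals such as "-123.456".
// No exponent, no whitespace, no NaN/Infinity, no grouping separators.
final class SimpleDecimalParser {
    // Exact powers of ten; every value up to 1e18 is exactly representable as a double.
    private static final double[] POW10 = new double[19];
    static {
        double p = 1.0;
        for (int k = 0; k < POW10.length; k++) {
            POW10[k] = p;
            p *= 10.0;
        }
    }

    private SimpleDecimalParser() {}

    static double parse(CharSequence s) {
        int len = s.length();
        int i = 0;
        boolean negative = false;
        if (len > 0 && (s.charAt(0) == '-' || s.charAt(0) == '+')) {
            negative = s.charAt(0) == '-';
            i = 1;
        }
        long mantissa = 0;       // all digits, with the decimal point ignored
        int digits = 0;          // total digits consumed (assumed limit: 18, so the long cannot overflow)
        int fractionDigits = 0;  // digits seen after the '.'
        boolean seenPoint = false;
        for (; i < len; i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                if (digits == 18) {
                    throw new NumberFormatException("too many digits: " + s);
                }
                mantissa = mantissa * 10 + (c - '0');
                digits++;
                if (seenPoint) {
                    fractionDigits++;
                }
            } else if (c == '.' && !seenPoint) {
                seenPoint = true;
            } else {
                throw new NumberFormatException("unexpected '" + c + "' in: " + s);
            }
        }
        if (digits == 0) {
            throw new NumberFormatException("no digits in: " + s);
        }
        // One division by an exact power of ten. The result is correctly rounded
        // as long as the mantissa is below 2^53; beyond that it may be off in the last bit.
        double value = mantissa / POW10[fractionDigits];
        return negative ? -value : value;
    }
}
```

Usage would be something like `double price = SimpleDecimalParser.parse("1234.56");`. Whether this beats the standard routines by any particular factor depends on the JVM and the data, so measure before relying on it.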




The biggest win, in my experience, is simply recognizing when you don’t need all ~16 significant digits provided by the standard implementations. If you’re only printing e.g. six digits, and can afford to round in the wrong direction 0.00000001% of the time, you can skip a ton of work. And as you mention, it gets even easier when you don’t have to support scientific notation.
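As a rough sketch of that trade-off (the class name `SloppyDoubleFormatter`, the fixed six fraction digits and the fallback range are assumptions for this example, not anyone's production code), a formatter that always prints six digits after the point and accepts the occasional wrong-direction rounding can be reduced to one multiply and some integer arithmetic:

```java
// Hypothetical sketch: fixed notation, exactly six fraction digits, sloppy rounding.
final class SloppyDoubleFormatter {
    private static final long SCALE = 1_000_000L; // 10^6, i.e. six fraction digits

    private SloppyDoubleFormatter() {}

    static void append(StringBuilder out, double value) {
        if (Double.isNaN(value) || Math.abs(value) >= 9.0e12) {
            // NaN, infinities and values whose scaled form would not fit in a long:
            // fall back to the standard conversion.
            out.append(value);
            return;
        }
        if (value < 0) {
            out.append('-');
            value = -value;
        }
        // Round half up on the scaled binary value. In rare cases this differs
        // from rounding the exact decimal expansion; that is the accepted sloppiness.
        long scaled = (long) (value * SCALE + 0.5);
        long whole = scaled / SCALE;
        long frac = scaled % SCALE;
        out.append(whole).append('.');
        // Zero-pad the fraction to six digits.
        for (long p = SCALE / 10; p > 1 && frac < p; p /= 10) {
            out.append('0');
        }
        out.append(frac);
    }

    static String format(double value) {
        StringBuilder sb = new StringBuilder(24);
        append(sb, value);
        return sb.toString();
    }
}
```

For example, `SloppyDoubleFormatter.format(1234.5)` gives `"1234.500000"`. Appending into a reused `StringBuilder` avoids allocating a new string per value, which matters when converting billions of numbers.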

The standard library functions occupy an awkward middle ground between this type of much faster, slightly sloppy float-to-string converter, and direct storage of the binary representation; in most applications where float serialization performance is actually relevant, at least one of these two alternatives is better.
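The second alternative, storing the binary representation directly, might look like the sketch below. The class name `BinaryDoubleCodec` is made up for this example; `java.nio.ByteBuffer` and its `putDouble`/`getDouble` methods are standard. Each double takes exactly 8 bytes and round-trips bit for bit, with no decimal conversion at all.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: serialize doubles as raw 8-byte IEEE 754 values.
final class BinaryDoubleCodec {
    private BinaryDoubleCodec() {}

    static byte[] encode(double[] values) {
        // ByteBuffer defaults to big-endian; reader and writer must agree on byte order.
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    static double[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[] values = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getDouble();
        }
        return values;
    }
}
```

The obvious cost is that the output is no longer human-readable, which is the trade-off discussed further down the thread.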


Agreed.

I deal with financial data, which comes in well-defined formats for the various asset classes. You can save a lot of time by writing a few parsing functions that handle your exact needs and skip needless work. Where you can, avoid the built-in conversion functions, as those will almost always be A LOT slower due to everything they're required to handle.

I prefer to store that type of data in CSV files because there are times when you need to read through the data to find errors, do ad hoc analysis, etc. If you're loading those types of files very frequently, there are better formats, of course. But when the parsing time is peanuts compared to the runtime of your program, it's nice to be able to load the file into other programs when you need to, without having to convert it first.


> that only converted 'simple' numbers, i.e. no scientific notation, only one (fixed) decimal separator, etc. [...] Anyway this simple implementation was ~10 times faster

What about numbers with a large order of magnitude, e.g. 1e234? Is writing out very long strings still faster than switching to e notation?


I don't know. I knew that my input data wouldn't contain such cases, so I didn't have to handle or check for them. This was part of the 'etc.' catch-all :)


I guess an alternative is _always_ using e notation but not bothering to move the decimal point, i.e. effectively the same as the binary encoding but in base 10. That would be both simple and have a small maximum string length.

Users probably wouldn't like it though :P damn users.
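To make the output format of that idea concrete, here is a small sketch that writes every value as an integer followed by a base-10 exponent. The class name `IntegerExponentFormat` is invented for this example, and it leans on `java.math.BigDecimal` purely to show what the strings would look like; it is not a fast implementation, which would need its own digit generation.

```java
import java.math.BigDecimal;

// Hypothetical sketch: always "<integer>e<exponent>", never a decimal point.
final class IntegerExponentFormat {
    private IntegerExponentFormat() {}

    static String encode(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return Double.toString(value); // no decimal expansion exists for these
        }
        // BigDecimal.valueOf uses the shortest decimal string that round-trips the double.
        BigDecimal d = BigDecimal.valueOf(value).stripTrailingZeros();
        // value == unscaledValue * 10^(-scale), so the exponent is simply -scale.
        return d.unscaledValue().toString() + "e" + (-d.scale());
    }
}
```

With this, 1.5 becomes `15e-1` and 1e234 becomes `1e234`, and `Double.parseDouble` reads both forms back. Note that the sign of negative zero is lost in this sketch.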


I had the same experience in the other direction: I halved the runtime of an ETL program by writing a custom float-to-string function.



