> I tend to agree; I take the view that most engineers are smart, and compilers/interpreters/virtual machines are even smarter so most targeted optimizations aren't going to result in very much gain.
This hasn’t been my experience. As an example, I find there’s an awful lot of performance left on the table in most programs because of the sloppy way programmers use memory.
Most programmers don’t think twice about heap allocation and pointer indirection, and most programs are full of both. But if you refactor that code to use bigger objects and fewer allocations, and use object pools, arenas and inline allocation (smallvec, SSO and friends) where appropriate, you can usually get a several-fold speedup in most “already optimized” programs. The gains come from fewer malloc calls (malloc is expensive) and fewer cache misses (because locality improves). As you say, cache misses are very expensive on modern hardware. And you often don’t need a full rewrite to fix this stuff, just some careful refactoring.
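To make the arena idea concrete, here is a minimal Rust sketch. The Arena, Node and NodeId names are hypothetical and not from any crate. Every node lives in one Vec, and links between nodes are u32 indices rather than separate Box allocations. With capacity reserved up front, building the structure costs one allocation instead of one per node, and the nodes sit next to each other in memory.

```rust
// Hypothetical index-based arena: all nodes live in one Vec, and "pointers"
// between nodes are u32 indices into it instead of individual Box allocations.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(u32);

struct Node {
    value: i64,
    next: Option<NodeId>, // link to another node in the same arena
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn with_capacity(n: usize) -> Self {
        Arena { nodes: Vec::with_capacity(n) }
    }

    /// Stores a node and returns its index. No per-node malloc while the
    /// reserved capacity lasts.
    fn alloc(&mut self, value: i64, next: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node { value, next });
        id
    }

    fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0 as usize]
    }
}

/// Builds the list 0, 1, ..., n-1 inside the arena and returns its head.
fn build_list(arena: &mut Arena, n: i64) -> Option<NodeId> {
    let mut head = None;
    for v in (0..n).rev() {
        head = Some(arena.alloc(v, head));
    }
    head
}

/// Walks the list by following indices instead of pointers.
fn sum_list(arena: &Arena, head: Option<NodeId>) -> i64 {
    let mut total = 0;
    let mut cur = head;
    while let Some(id) = cur {
        let node = arena.get(id);
        total += node.value;
        cur = node.next;
    }
    total
}

fn main() {
    let mut arena = Arena::with_capacity(1_000);
    let head = build_list(&mut arena, 1_000);
    println!("sum = {}", sum_list(&arena, head));
}
```

Dropping the arena frees every node at once. The trade-off is that individual nodes can’t be freed early; crates such as bumpalo and typed-arena offer more complete versions of the same pattern.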
I once saw a 20x speedup in a benchmark where we had been using a tree structure in which each leaf node stored only a single value. We replaced it with a structure that had arrays at the leaves and larger arrays in the internal nodes, and performance skyrocketed.
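The original tree isn’t shown, so the following is only an illustration: a deliberately simplified two-level layout in which each leaf holds an array of up to LEAF_CAP values (an arbitrary constant chosen here) and the root holds one large array of leaves. It only supports appending; a real replacement for a general tree, such as a B-tree, would also need splits and inserts in the middle. Rust’s standard BTreeMap applies the same idea by storing several keys per node.

```rust
// Simplified sketch of "arrays at the leaves": values are grouped into
// fixed-capacity leaf chunks, and the root holds one large array of leaves.
// LEAF_CAP is an illustrative choice, not a tuned value.
const LEAF_CAP: usize = 64;

struct ChunkedSeq {
    leaves: Vec<Vec<u64>>, // every leaf except possibly the last is full
    len: usize,
}

impl ChunkedSeq {
    fn new() -> Self {
        ChunkedSeq { leaves: Vec::new(), len: 0 }
    }

    fn push(&mut self, v: u64) {
        // Start a new leaf only when the last one is full: one allocation
        // per LEAF_CAP values instead of one per value.
        if self.leaves.last().map_or(true, |leaf| leaf.len() == LEAF_CAP) {
            self.leaves.push(Vec::with_capacity(LEAF_CAP));
        }
        self.leaves.last_mut().unwrap().push(v);
        self.len += 1;
    }

    fn get(&self, i: usize) -> Option<u64> {
        if i >= self.len {
            return None;
        }
        // Full leaves make the position arithmetic trivial.
        Some(self.leaves[i / LEAF_CAP][i % LEAF_CAP])
    }

    fn sum(&self) -> u64 {
        // Inner loops scan contiguous memory, which is cache-friendly.
        self.leaves.iter().map(|leaf| leaf.iter().sum::<u64>()).sum()
    }
}

fn main() {
    let mut seq = ChunkedSeq::new();
    for v in 0..1_000u64 {
        seq.push(v);
    }
    println!(
        "{} values in {} leaves, last = {:?}, sum = {}",
        seq.len,
        seq.leaves.len(),
        seq.get(999),
        seq.sum()
    );
}
```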
The compiler isn’t smart enough to suggest any of this. If you use Box<X> in Rust instead of X, it’ll happily give you a slow program, slower than JavaScript in many cases. And the default Vec and String types in Rust’s standard library heap-allocate even when the contents would fit in a pointer, since they have no small-size optimization.
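One way to see this is to count allocations. The sketch below installs a hypothetical counting allocator (CountingAlloc, a thin wrapper around the system allocator registered through the standard #[global_allocator] hook) and compares Vec<Box<Point>> with Vec<Point>, plus a short String. Point is an illustrative type, not from the original.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::hint::black_box;

// Illustrative counting allocator: forwards to the system allocator and
// counts allocations made on the current thread.
struct CountingAlloc;

thread_local! {
    static ALLOC_COUNT: Cell<usize> = const { Cell::new(0) };
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // try_with: skip counting if thread-local storage is unavailable.
        let _ = ALLOC_COUNT.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of heap allocations performed on this thread while running `f`.
fn allocs_during<R>(f: impl FnOnce() -> R) -> usize {
    let before = ALLOC_COUNT.with(|c| c.get());
    // black_box makes it harder for the optimizer to elide unused allocations.
    let result = black_box(f());
    let after = ALLOC_COUNT.with(|c| c.get());
    drop(result); // deallocations are not counted
    after - before
}

#[allow(dead_code)] // fields exist only to give Point a realistic size
struct Point {
    x: f64,
    y: f64,
}

/// One allocation for the Vec's buffer plus one per Box.
fn boxed_points(n: usize) -> Vec<Box<Point>> {
    (0..n).map(|i| Box::new(Point { x: i as f64, y: 0.0 })).collect()
}

/// A single allocation: the points are stored inline in the Vec's buffer.
fn inline_points(n: usize) -> Vec<Point> {
    (0..n).map(|i| Point { x: i as f64, y: 0.0 }).collect()
}

fn main() {
    println!("Vec<Box<Point>>:    {} allocations", allocs_during(|| boxed_points(1000)));
    println!("Vec<Point>:         {} allocations", allocs_during(|| inline_points(1000)));
    println!("String::from(\"hi\"): {} allocations", allocs_during(|| String::from("hi")));
    println!("String::new():      {} allocations", allocs_during(String::new));
}
```

The boxed version should need roughly one allocation per element while the inline version needs one for the whole buffer, and even the two-byte String allocates. An empty String::new() does not allocate until something is pushed.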
In JavaScript, Python and friends you can’t even implement many low-allocation data structures, because every list and object is inescapably a pointer to a heap object. This is why JS will never be as fast as C: you can’t write fast, nontrivial data structures. There’s a performance ceiling in languages like this, and if you need more performance than JS can give you, a rewrite might be the right call.
Another example: yesterday I had a bug where writing a 5 MB JSON file took about 1 second. It turned out I wasn’t using a buffered writer. Wrapping my File in BufWriter::new() cut the time from about 1 second to 0.01 seconds.
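The writer from that bug isn’t shown, so this is a minimal sketch with a hypothetical hand-rolled function (write_json_array) that emits a JSON array of numbers. Each write! is a tiny write; on a bare File each one can become its own system call, whereas BufWriter collects them in an in-memory buffer and hands them to the file in large chunks. A serializer such as serde_json’s to_writer also takes any io::Write, so the same wrapping applies there.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Writes `values` as a JSON array of numbers through a BufWriter, so the
/// many small writes below are batched before reaching `out`.
fn write_json_array<W: Write>(out: W, values: &[u64]) -> io::Result<()> {
    let mut w = BufWriter::new(out);
    w.write_all(b"[")?;
    for (i, v) in values.iter().enumerate() {
        if i > 0 {
            w.write_all(b",")?;
        }
        write!(w, "{}", v)?;
    }
    w.write_all(b"]")?;
    // Flush explicitly: BufWriter also flushes on drop, but ignores errors there.
    w.flush()
}

fn main() -> io::Result<()> {
    let values: Vec<u64> = (0..1_000_000).collect();
    let path = std::env::temp_dir().join("numbers.json");
    write_json_array(File::create(&path)?, &values)?;
    println!("wrote {}", path.display());
    Ok(())
}
```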
The compiler isn’t very smart. By all means, rewrite your software to be fast. But there are usually big performance wins to be had in almost any program if you take the time to look.