> There have been 213 commits in the curl git repo from 7.53.1 till today. There’s a chance one or more other commits than just the pure alloc changes have made a performance impact, even if I can’t think of any.
"I can't think of any" is not a very scientific way to measure optimizations. Actually, this simple fact casts a doubt on whether it's this malloc optimization that led to the speed up or any of the 200+ commits OP is working on top of.
Why not eliminate that doubt by applying the malloc optimizations to the previous official release?
I'm a bit skeptical about the speedup myself, since I would expect curl to be primarily IO bound rather than CPU bound (much less malloc bound, given how little memory it uses).
I'm not skeptical about it. The last time I did something similar to this optimization, I had a Python function that was doing some work over strings to compute a similarity metric.
For this function, a list of elements was kept inside each call: it began empty, and a few calculations were performed to populate it before the main loop of the function kicked in. In Python-land, this was obviously done by declaring an empty list `precalc_values = []` and appending to it. When we cythonized it, the dev who took it went in with a `malloc(sizeof(int) * elements)` "dynamic array of ints" and called it a day: a 70x speedup over plain Python.
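Roughly, and with hypothetical names (`similarity_large` and the ord-summing loop are stand-ins, not the real metric), that first port looked something like this:

```cython
# similarity.pyx -- hypothetical reconstruction of the malloc'd version
from libc.stdlib cimport malloc, free

def similarity_large(str a, str b):
    cdef Py_ssize_t n = len(a) + len(b)
    # heap-allocate the scratch buffer on every call
    cdef int* precalc_values = <int*> malloc(n * sizeof(int))
    if precalc_values == NULL:
        raise MemoryError()
    cdef Py_ssize_t i
    cdef long total = 0
    try:
        # stand-in for the real precalculation pass over both strings
        for i in range(len(a)):
            precalc_values[i] = ord(a[i])
        for i in range(len(b)):
            precalc_values[len(a) + i] = ord(b[i])
        for i in range(n):
            total += precalc_values[i]
        return total
    finally:
        free(precalc_values)
```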
A few days later I came in, saw that code, and asked "why not go with a simple fixed array?", to which I was told "because we don't know the size of the strings beforehand". I did a test run with a small fixed array plus a counter (to know up to which index the array held real values, rather than just zero-initialized fields) and got a 100x speedup.
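The fixed-array variant, again as a sketch (the 4096 cap and the names are my assumptions), trades the per-call malloc/free for a stack buffer and a fill counter:

```cython
DEF MAX_ELEMENTS = 4096  # hypothetical compile-time cap on total length

def similarity_small(str a, str b):
    # fixed-size stack array; `count` marks how many slots hold real values
    cdef int precalc_values[MAX_ELEMENTS]
    cdef Py_ssize_t count = 0
    cdef Py_ssize_t i
    cdef long total = 0
    for i in range(len(a)):
        precalc_values[count] = ord(a[i])
        count += 1
    for i in range(len(b)):
        precalc_values[count] = ord(b[i])
        count += 1
    # iterate only up to `count`, ignoring uninitialized slots
    for i in range(count):
        total += precalc_values[i]
    return total
```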
In the end we went with both functions plus a wrapper that checked the length of the strings involved and selected one or the other, because the array version would crash hard from out-of-bounds accesses if a large string happened to come by.
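The wrapper then amounts to a size check; a minimal sketch, assuming the two functions above live in the same module:

```cython
def similarity(str a, str b):
    # route small inputs to the stack-array version; anything that would
    # overrun the fixed buffer falls back to the malloc'd version
    if len(a) + len(b) <= MAX_ELEMENTS:
        return similarity_small(a, b)
    return similarity_large(a, b)
```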
"I can't think of any" is not a very scientific way to measure optimizations. Actually, this simple fact casts a doubt on whether it's this malloc optimization that led to the speed up or any of the 200+ commits OP is working on top of.
Why not eliminate that doubt by applying the malloc optimizations to the previous official release? I'm a bit skeptical about the speed up myself, since I would expect curl to be primarily IO bound and not CPU bound (much less malloc bound, given how little memory it uses).