Hacker News new | past | comments | ask | show | jobs | submit login

I’d be shocked if jemalloc or tcmalloc had issues with that workload.

Do you have a minimal reproducing example?




Sure, in plain C because why not. <https://gist.github.com/scottlamb/459a3ce6230be67bf4ceb1f1a8...>

There's one other element I didn't mention in my previous comment, which is a thread handoff. It may be significant because it trashes any thread-specific arena and/or because it introduces a little bit of variability over a single malloc at a time.

For whatever reason the absolute rate on my test machine is much higher than in my actual program (my actual program does other things with a more complex threading setup, has multiple video streams, etc.) but you can see the same effect of hitting the mmap, munmap, and page fault paths that really need not ever be exercised after program start.

In my actual (Rust-based) program, adding like 20 lines of code for the pooling was a totally satisfactory solution and took me less time than switching general-purpose allocator, so I didn't try others. (Also, my program supports aarch64 and iirc the vendored jemalloc in the tikv-jemallocator crate doesn't compile cleanly there.)




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: