> par_iter_mut() [1] from Rayon I wonder have they fixed the performance? https:...

pcwalton · on June 14, 2024

> I wonder have they fixed the performance? https://www.reddit.com/r/rust/comments/bto10h/update_a_scali...

The comments point out a whole bunch of problems with that data. A better example would be https://parallel-rust-cpp.github.io/introduction.html which shows them as quite comparable, depending on the compiler.

> In C++ I’d usually use Eigen, because these expression templates are saving memory allocations and bandwidth storing/reloading temporary matrices. Sometimes much faster than BLAS libraries with C API. I’m not sure Rust has an equivalent.

That equivalent would be ndarray.

> For some applications of graphs and trees it’s useful to have nodes polymorphic. An example is a visual tree in GUI: different nodes are instances of different classes. Array elements are of the same type.

And in that case you can use Box (or Rc/Arc).

> Even when the elements are 8 bytes so the indexing can be merged, you need to either spend a register for the base address, or load it from memory with another instruction.

I've never seen this be a performance problem in practice; the cost of doing a shift and add is incredibly low compared to the cost of actually fetching the memory.

> It’s relatively expensive to split or merge linked lists/trees/graphs stored that way. If the tree/graph is long lived, mutable, and changes a lot, eventually you might need to compact or even garbage collect these arrays.

Which is the same thing modern thread-caching mallocs also have to do, except that compacting and garbage collecting is actually possible with the arena approach (not that I think it's terribly important either way).

Const-me · on June 14, 2024

> That equivalent would be ndarray

It seems ndarray is conceptually similar to C libraries, it doesn’t have expression templates.

When you write r = a * b * c with ndarray, you allocate, store and then load a temporary array with a * b. When you write r = a * b * c with Eigen, depending on the types it sometimes skips the temporary, and instead computes the complete expression in one shot. For some use cases, the tactic causes substantial performance win.

> Box (or Rc/Arc)

An array of boxes will cause another pointer chase: first to load the pointer, another one to reach the payload.

xoranth · on June 14, 2024

> expression templates

That's one of the cases where you can't "mechanically" translate C++ to Rust. To obtain the same result, a good choice would be a proc macro.

Which is a pain to implement, but will also give you more flexibility.