I just meant to point out that SoA vs AoS or an even slower data structure is not necessarily relevant for all programs.
> The main perf problem with using linked lists in high level languages is not the fact that they cause cache-incoherent access, it's worse, it's the fact that they allocate and deallocate memory constantly which is much much slower.
This is not true though; at least the JVM most definitely doesn’t do that, and I’m sure other state-of-the-art GCs, like V8’s, don’t either.
First of all, allocation is basically free, much faster than malloc and friends: it’s only a pointer bump. And the cost of deallocation is amortized, or may never be paid at all. With plenty of memory available, GC doesn’t really happen, and with a generational GC, long-lived objects may stop being checked altogether after a while.
So the usual case of a linked list that gets appended to a few times is close to optimal; iteration has the same cost as in lower-level languages with pointer chasing (usually a bit more, because objects carry headers that take up quite some space, so fewer of them fit into cache). It even has an advantage: a modern compacting GC may move the objects close to each other.
> So the usual case of a linked list that gets appended to a few times is close to optimal; iteration has the same cost as in lower-level languages with pointer chasing (usually a bit more, because objects carry headers that take up quite some space, so fewer of them fit into cache). It even has an advantage: a modern compacting GC may move the objects close to each other.
I wonder why array-backed linked lists aren't more of a thing. I've used those before in C in hobby projects. You allocate an extendable array of nodes, and the nodes' next and/or prev pointers become indices into the array (although pointers would still work). Added nodes go on the end; deleted nodes stay where they are and just aren't referenced. If there is high churn, you do something to keep track of inactive slots to fill next. You're still jumping around an area of memory when iterating the list, but it's at least confined to one guaranteed contiguous block, and you're only allocating to extend the array rather than doing a lot of allocating and freeing of memory. You still have the memory overhead of the next/previous indices, though, so fewer nodes fit in cache than otherwise would.
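The scheme described above might look something like this in Java (the class and method names here are made up for illustration): index-based links into growable parallel arrays, with a free list tracking deleted slots for reuse.

```java
import java.util.Arrays;

// Sketch of an array-backed singly linked list: node links are indices
// into growable parallel arrays instead of pointers, and deleted nodes
// go on a free list so their slots can be reused.
public class IntList {
    private int[] value = new int[4];
    private int[] next  = new int[4];   // index of next node, -1 = end of list
    private int head = -1, free = -1, size = 0;

    private int alloc() {
        if (free != -1) {                // reuse a deleted slot first
            int i = free; free = next[i]; return i;
        }
        if (size == value.length) {      // grow the backing arrays
            value = Arrays.copyOf(value, size * 2);
            next  = Arrays.copyOf(next,  size * 2);
        }
        return size++;
    }

    public void push(int v) {            // prepend, O(1), no per-node allocation
        int i = alloc();
        value[i] = v; next[i] = head; head = i;
    }

    public boolean remove(int v) {       // unlink first match, slot goes on free list
        for (int i = head, prev = -1; i != -1; prev = i, i = next[i]) {
            if (value[i] == v) {
                if (prev == -1) head = next[i]; else next[prev] = next[i];
                next[i] = free; free = i;
                return true;
            }
        }
        return false;
    }

    public int sum() {                   // iteration stays inside one contiguous block
        int s = 0;
        for (int i = head; i != -1; i = next[i]) s += value[i];
        return s;
    }
}
```

Iteration still chases links, but everything lives in two flat arrays, and allocation only happens when the arrays grow.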
Agreed, especially for high-level languages and application-level linked list usage. Worth mentioning, I think, that the good linked list libs usually do this under the hood, and kernel- and allocator-level linked lists are always doing this. It's the people implementing simple linked lists themselves, and/or copy-pasting the kind of code you get by searching for simple code examples (like the link in my sibling comment here), who end up with very poorly performing linked lists.
A closely related concept to what you're describing is internal storage linked lists [1], meaning the class/node contains the link(s) and is specialized for one particular kind of data, which often makes it more compact, and easier to allocate a block of nodes in a single call.
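As a rough sketch of the internal-storage idea (names here are illustrative, not from any particular library): the link is a field of the payload class itself, so each element is a single allocation with no separate generic Node&lt;T&gt; wrapper.

```java
// Sketch of an "internal storage" (intrusive) linked list: the next link
// lives inside the data class itself, so one `new` produces both the
// payload and the node, and no generic wrapper object is needed.
public class Particle {
    double x, y;
    Particle next;                      // the link is a field of the data itself

    Particle(double x, double y) { this.x = x; this.y = y; }

    // Prepend p to the list headed by head, returning the new head.
    static Particle push(Particle head, Particle p) {
        p.next = head;
        return p;
    }

    static int count(Particle head) {
        int n = 0;
        for (Particle p = head; p != null; p = p.next) n++;
        return n;
    }
}
```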
Anyway, I'd guess a big reason it's harder to find array-backed lists when searching for linked list implementations is just that they're more advanced and may introduce memory management topics into articles and blog posts that the author doesn't want to cover. But I agree: once you've used array-backed lists, the common examples of linked lists you find online, where new is used twice per node (once for a generic node and once for the payload), seem ridiculously wasteful. This is the kind of thing I meant earlier - best practice is to avoid a naive bad linked list even in code that isn't a hot spot. Time doesn't have to be spent optimizing it, and it doesn't need to show up as a hotspot, to know it should be avoided.
> at least the JVM most definitely doesn’t do that, and I’m sure other state-of-the-art GCs neither, like V8.
Which implementation are we talking about here? Do you mean JVM internal linked lists, or user allocations of linked list nodes? I was talking about implementations such as (first Google result for "Java linked list implementation") https://www.geeksforgeeks.org/implementing-a-linked-list-in-...
> First of all, allocation is basically free
No, definitely not true. You're claiming that memory allocation in a tight loop is not a bottleneck. I completely disagree. Having written several high-performance web apps with thousands to millions of users, I can say the single most important performance "best practices" strategy in any tight-loop JavaScript is to avoid memory allocation in the hot spot. It's also true in Python, true in C++, true in every language I've ever used; it has been true for decades and is still true despite significant performance improvements in V8 and other engines. This is the whole reason there are high-performance libraries in JavaScript like https://glmatrix.net/ that specifically structure the API so that the user has complete control over allocations, precisely because batching and re-using allocations is faster than not.
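The API shape in question can be sketched like this (glMatrix itself is JavaScript; Vec3.add here is a hypothetical Java analogue of its out-parameter convention): the caller owns and reuses the output buffer, so a tight loop performs zero allocations.

```java
// Sketch of a glMatrix-style out-parameter API: the caller supplies the
// destination buffer, so hot loops can reuse one scratch buffer instead
// of allocating a fresh result object per call. Names are hypothetical.
public class Vec3 {
    // out = a + b; returns out so calls can be chained without allocating.
    static double[] add(double[] out, double[] a, double[] b) {
        out[0] = a[0] + b[0];
        out[1] = a[1] + b[1];
        out[2] = a[2] + b[2];
        return out;
    }
}
```

In a hot loop, one scratch array like `double[] acc = {0, 0, 0};` can accumulate every iteration's result with no allocation at all.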
What is the specific situation we’re talking about? Of course needless creation of non-single-use objects has a cost, but object reuse via pools may be even worse than creating non-escaping objects in a tight loop. The “problem” with high-level languages is the complexity of their runtimes - you can’t easily reason about what will get optimized and how. It can change from version to version, so one should profile.
I’m not too familiar with JS VM internals, but OpenJDK has quite good escape analysis.
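To make the escape-analysis point concrete, here is a sketch of the kind of code it targets: the temporary never leaves the method, so HotSpot's C2 compiler can scalar-replace it, and the `new` in the loop may end up costing nothing. Whether that actually happens depends on the JIT and JVM version, so as the comments above say, profile rather than assume.

```java
// Sketch of a non-escaping allocation: `new Point` never leaves
// sumOfDistances(), so OpenJDK's escape analysis can scalar-replace it
// (turning the fields into plain locals), potentially eliminating the
// allocation entirely. This is not guaranteed - it depends on the JIT.
public class Escape {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double sumOfDistances(double[] xs, double[] ys) {
        double total = 0;
        for (int i = 0; i < xs.length; i++) {
            Point p = new Point(xs[i], ys[i]);   // candidate for scalar replacement
            total += Math.sqrt(p.x * p.x + p.y * p.y);
        }
        return total;
    }
}
```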
> The “problem” with high-level languages is the complexity of their runtimes - you can’t easily reason about what will get optimized and how.
Yes, agreed. This is the point the author was making, and noting that we have room to actually improve the languages, and make some basic optimizations easier to think about and code explicitly.
> object reuse via pools may be even worse than creating non-escaping objects in a tight loop.
When does this happen in practice, and how likely is it? What does it imply about object lifetime if your objects don't escape, and why wouldn't you be re-using an object on the stack or in local scope instead? How often would replacing a non-escaping object creation with a reuse that escapes be a realistic alternative? I'm sure a contrived case can be set up, but this seems like an unlikely explanation to me as general advice. It's bad practice to hope your objects don't escape, and difficult to ensure. In general, if you want higher performance, avoid using new in hot spots. The difference is often an order of magnitude in a flat O(n) loop.
I’m not disagreeing with you, and as very general advice, less allocation is better, of course.
My point was more that sometimes very non-intuitive (from a low-level perspective) solutions may optimize better, because that’s what the compiler writers optimized for.
And object pools, at least in Java, are not necessarily good. The first go at it should be the plainest implementation one can imagine (on a micro level; architecturally, of course, it is important to think about optimization, like whether something will be a class or an array, etc.).
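To illustrate the trade-off being discussed (this is a minimal made-up pool, not a recommended pattern): the pooled objects are long-lived and escape, so the JIT cannot scalar-replace them, and they tend to get promoted to the old generation - which is exactly why this extra machinery can lose to plain `new` for short-lived objects.

```java
import java.util.ArrayDeque;

// A minimal object pool sketch. Pooling keeps objects alive and escaping,
// defeating escape analysis and pushing them toward the old generation,
// so for short-lived objects plain `new` is often faster - profile first.
public class Pool {
    static final class Buffer { final byte[] data = new byte[256]; }

    private final ArrayDeque<Buffer> free = new ArrayDeque<>();

    Buffer acquire() {
        Buffer b = free.poll();
        return b != null ? b : new Buffer();   // allocate only when the pool is empty
    }

    void release(Buffer b) { free.push(b); }
}
```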