using System.Collections.Generic;

struct Node<TKey, TValue>
{
    uint bitmap;  // one bit per occupied slot at this level
    KeyValuePair<TKey, TValue>[] entries;
    Node<TKey, TValue>[] children;
    // Node = Leaf of bitmap * entries | Internal of bitmap * children
    // We inline that sum into a struct where
    //   Leaf     -> entries  != null
    //   Internal -> children != null
}
Now consider the case where leaves are somewhat sparse. That means the children arrays will have few entries, so instead of allocating a bunch of one-element children arrays, which in turn point at a bunch of one-element value arrays, just inline the value arrays in the parent. Then propagate that same idea for internal nodes all the way back to the root.
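To make that concrete, here's roughly what lookup looks like over this layout, as a method you'd add to the struct above. This is a sketch from memory with assumed 5-bit hash chunks, not the actual Sasa code:

// Needs using System.Numerics for BitOperations.PopCount (.NET Core 3+).
// Sketch only: assumes 5-bit hash chunks per level and the usual
// popcount trick to compress the bitmap into a dense array index.
public bool ContainsKey(TKey key, int hash, int shift = 0)
{
    uint bit = 1u << ((hash >> shift) & 0x1F);            // slot at this level
    if ((bitmap & bit) == 0)
        return false;                                      // slot is empty
    int i = BitOperations.PopCount(bitmap & (bit - 1));    // dense index
    return entries != null
        ? EqualityComparer<TKey>.Default.Equals(entries[i].Key, key) // leaf
        : children[i].ContainsKey(key, hash, shift + 5);             // internal
}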
Also, because these are all structs, the JIT should specialize the code generated for each level so it should be pretty fast. IIRC, the speed difference between the original and the inlined representation was noticeable but not crazy, but the memory use difference was considerable. This was all years ago though, first implementation was 2013.
I belatedly realized this might require some explanation too. Sorry, too many years of experiments and microoptimizations.
This is an unboxed representation of a sum type. Rather than relying on the CLR's inheritance to model sums I lay it out explicitly at the cost of a little space. It may seem wasteful since it looks like I'm adding 8 unused bytes to each node, but the CLR object header is 16 bytes so you're actually saving 8 bytes per node by avoiding classes and using structs that have no header! You can really see the difference in the working set if you do a full GC.Collect and check the allocated bytes after constructing some large tries.
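For contrast, here's roughly the boxed encoding this replaces (a sketch of the conventional class-based sum, not code from either library):

// Conventional boxed sum: a heap object per node, each carrying its
// own 16-byte CLR header, plus a virtual call to discriminate the case.
abstract class BoxedNode<TKey, TValue>
{
    public uint bitmap;
    public abstract bool Contains(TKey key, int hash, int shift);
}
sealed class Leaf<TKey, TValue> : BoxedNode<TKey, TValue>
{
    public KeyValuePair<TKey, TValue>[] entries;
    public override bool Contains(TKey key, int hash, int shift)
        => /* scan entries for key */ false;
}
sealed class Internal<TKey, TValue> : BoxedNode<TKey, TValue>
{
    public BoxedNode<TKey, TValue>[] children;
    public override bool Contains(TKey key, int hash, int shift)
        => /* popcount index, recurse into children */ false;
}

Every Leaf/Internal instance pays its own header, which is exactly the overhead the struct layout sheds.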
Also, because I'm removing a heap object per node, this representation incurs one less indirection per node accessed, so I'm doing half as many loads as an equivalent class-based representation for every operation. This is a large constant factor difference, and probably partly explains why Sasa's ContainsKey is faster than yours.
Finally, no trie operations incur any virtual dispatch; it's replaced by simple if-checks, which are generally more predictable. This probably has less impact these days since branch predictors are pretty good now.
I think if you replaced your class-based representation that uses the superior CHAMP model with this unboxed representation your hash map would be the uncontested champion. You might even challenge the mutable dictionary for performance!
> You might even challenge the mutable dictionary for performance!
Don't tell me that, I'm an optimisation nut, I might actually do it! ;)
I woke up this morning with a clear understanding of what your optimisation is, so I guess I just needed to load up my brain with the details! It's definitely a super interesting approach and one that I'm surprised I've not come across before.
I'm definitely a bit wary of changing my current implementation, mostly because it's been battle hardened, but I'm sure I won't be able to resist having a go at some point. Thanks for the tip!
Do it! Do it! Do it! If only for a couple of tests to see how much it changes the results. ;-)
Unboxing sums is a nice optimization, but then you can't naively use switch pattern matching to deconstruct them.
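For example, against the class-based sketch above you could write:

// Pattern matching works on the boxed sum (hypothetical helper):
static class NodePatterns
{
    public static string Describe<TKey, TValue>(BoxedNode<TKey, TValue> node) =>
        node switch
        {
            Leaf<TKey, TValue> l     => $"leaf with {l.entries.Length} entries",
            Internal<TKey, TValue> n => $"internal with {n.children.Length} children",
            _                        => "empty",
        };
}
// The unboxed struct has no subtypes to match on, so its "pattern
// match" is the entries != null test from the lookup sketch.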
I have a bunch of other things in Sasa and other libraries you might find useful. I'm not actively working on most of them anymore except for bug fixes. I learned a lot but didn't end up using a lot of these features as much as I'd hoped.
For instance, being able to create open instance delegates in a way that automatically works around the CLR's restriction on such delegates to virtual methods. Some of the concurrency primitives are also interesting, as they implement an efficient atomic read/write protocol for arbitrarily sized types using only volatile reads/writes (i.e. avoiding torn reads), and a sort of LL/SC using only volatile read/write and a single interlocked inc/dec. Also, I added a kind system to CLR reflection to make working with it much easier [2].
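The torn-read protocol is in the same spirit as a seqlock. A minimal single-writer sketch of that general shape (illustrative only, not the library's actual code):

using System.Threading;

// Seqlock-style box: readers never observe a torn T, the single
// writer pays two interlocked increments per write. Sketch only.
sealed class VersionedBox<T> where T : struct
{
    int version;   // odd while a write is in flight
    T value;

    public void Write(T v)                  // single writer assumed
    {
        Interlocked.Increment(ref version); // odd: full fence, readers retry
        value = v;                          // may tear; readers will reject it
        Interlocked.Increment(ref version); // even again: publish
    }

    public T Read()
    {
        while (true)
        {
            int v1 = Volatile.Read(ref version);
            T result = value;
            Thread.MemoryBarrier();         // keep the copy before the re-check
            int v2 = Volatile.Read(ref version);
            if (v1 == v2 && (v1 & 1) == 0)
                return result;              // stable and not mid-write
        }
    }
}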
It seems we're thinking along the same lines for numeric types. I reproduced the Haskell numeric hierarchy [1], but I put that on hold because I was thinking a [Deriving] attribute would eliminate a lot of redundancy.
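The encoding is the usual interface-as-type-class trick with struct instances, roughly like this (hypothetical signatures, not the repo's exact API):

// Type-class-style numerics: Num<A> is a dictionary of operations,
// instances are empty structs so the JIT can specialize call sites,
// just like the trie nodes above.
public interface Num<A>
{
    A Add(A x, A y);
    A Mul(A x, A y);
    A FromInt(int i);
}

public struct NumInt : Num<int>
{
    public int Add(int x, int y) => x + y;
    public int Mul(int x, int y) => x * y;
    public int FromInt(int i) => i;
}

public static class NumOps
{
    // Generic code abstracts over the instance dictionary:
    public static A Square<A, N>(A x) where N : struct, Num<A>
        => default(N).Mul(x, x);
}
// e.g. NumOps.Square<int, NumInt>(5) == 25

A [Deriving] attribute would then generate the NumInt-style instance boilerplate.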
Just FYI, clicking Num<A> on the main GitHub markdown page doesn't jump to its anchor in the markup.
Lots more to see if you're interested! I've played with parser combinators but never liked how they turned out, and settled on a simpler approach that was pretty interesting.