
As an ex-3D engine dev, I respectfully disagree. But if you're happy to give some examples of why you can't constrain your data-types then I'm happy to admit I'm wrong.

I realise that sometimes there are aspects of a game that need to write to the metal, and compromises need to be made, but that isn't necessary for an entire game. I know people are using my language-ext library [1] with Unity projects too, so I assume there are others in games who don't agree (although I don't know exactly what kind of titles they're working on).

[1] https://github.com/louthy/language-ext




> [1] https://github.com/louthy/language-ext

Cool library. I've had a few of these patterns in my Sasa library for years, but you've taken it to the Haskell extreme! Probably further than most C# developers could stomach. ;-)

You might be interested in checking out the hash array mapped trie from Sasa [1]. It cleverly exploits the CLR's reified generics to unbox the trie at various levels, which ends up saving quite a bit of space and indirection, so it performs almost on par with the mutable dictionary.

I had an earlier version that used an outer struct to ensure it's never null [2], similar to how your collections seem to work, but switched to classes to make it more idiomatic in C#.

I recently started sketching out a Haskell-like generic "Deriving" source generator, in contrast to your domain-specific piecemeal approach, i.e. [Record], [Reader], etc. Did you ever try that approach?

[1] https://sourceforge.net/p/sasa/code/ci/default/tree/Sasa.Col...

[2] https://sourceforge.net/p/sasa/code/ci/57417faec5ed442224a0f...


> so it performs almost on par with the mutable dictionary.

I notice you have a comment that says "This trie is now the fastest immutable dictionary I'm aware of". Unfortunately I have to make you aware that my implementation is faster, sorry! ;)

Here's the add-items benchmark:

    BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1265/22H2/2022Update/SunValley2)
    AMD Ryzen Threadripper PRO 3995WX 64-Cores, 1 CPU, 128 logical and 64 physical cores
    .NET SDK=6.0.300
      [Host]     : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT AVX2
      DefaultJob : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT AVX2

    |                    Method |      N |           Mean |
    |-------------------------- |------- |---------------:|
    | SysColImmutableDictionary |    100 |      32.315 μs |
    |                  SasaTrie |    100 |       8.007 μs |
    |          SysColDictionary |    100 |       1.440 μs |
    |            LangExtHashMap |    100 |       7.512 μs |
    |-------------------------- |------- |---------------:|
    | SysColImmutableDictionary |   1000 |     524.368 μs |
    |                  SasaTrie |   1000 |     140.537 μs |
    |          SysColDictionary |   1000 |      17.395 μs |
    |            LangExtHashMap |   1000 |     125.663 μs |
    |-------------------------- |------- |---------------:|
    | SysColImmutableDictionary |  10000 |   7,598.532 μs |
    |                  SasaTrie |  10000 |   2,055.220 μs |
    |          SysColDictionary |  10000 |     301.683 μs |
    |            LangExtHashMap |  10000 |   1,842.514 μs |
    |-------------------------- |------- |---------------:|
    | SysColImmutableDictionary | 100000 | 129,811.705 μs |
    |                  SasaTrie | 100000 |  46,752.702 μs |
    |          SysColDictionary | 100000 |   6,103.999 μs |
    |            LangExtHashMap | 100000 |  37,869.597 μs |
* SysColImmutableDictionary = System.Collections.Immutable.ImmutableDictionary

* SysColDictionary = System.Collections.Generic.Dictionary
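
For anyone who wants to reproduce this kind of comparison, a BenchmarkDotNet harness for the add-items case looks roughly like the sketch below. It's illustrative rather than my actual benchmark code (I've left the Sasa case out rather than guess at its API, and the keys here are just sequential ints):

    using System.Collections.Generic;
    using System.Collections.Immutable;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using LanguageExt;

    public class AddItems
    {
        [Params(100, 1000, 10000, 100000)]
        public int N;

        [Benchmark]
        public object SysColImmutableDictionary()
        {
            var map = ImmutableDictionary<int, int>.Empty;
            for (var i = 0; i < N; i++) map = map.Add(i, i);
            return map;
        }

        [Benchmark]
        public object SysColDictionary()
        {
            var map = new Dictionary<int, int>();
            for (var i = 0; i < N; i++) map.Add(i, i);
            return map;
        }

        [Benchmark]
        public object LangExtHashMap()
        {
            var map = new HashMap<int, int>();
            for (var i = 0; i < N; i++) map = map.Add(i, i);
            return map;
        }
    }

    public static class Program
    {
        public static void Main() => BenchmarkRunner.Run<AddItems>();
    }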

My other benchmarks are still running, so I won't share them all, but we actually trade blows: your ContainsKey appears to be faster, while my collection iteration is twice as fast as yours. I suspect that if I removed my struct wrapper it would be a bit quicker, but that wouldn't be a fair real-world comparison.

We're both much faster than Microsoft's ImmutableCollections though!

I am certainly interested in what that technique is with the nested Node<Node<Node<Node<... and why it works. Do you have any more detail on that? You might also want to look into the data-structure I use: CHAMP [1]. I notice you use a HAMT; CHAMP is a more efficient version of the HAMT. It has better data-locality, which is much more efficient, especially when working with value-types.
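
To give a flavour of why CHAMP gets better data-locality: it splits the node bitmap in two, one for inline key/value payloads and one for child nodes, and packs both into a single array (payloads from the front, children from the back), so iteration mostly walks contiguous payloads. A rough sketch of the node shape, purely illustrative and not my actual implementation:

    // Illustrative CHAMP node (not language-ext's real code).
    // dataMap: hash-fragments whose key/value pair is stored inline.
    // nodeMap: hash-fragments that point at a child node.
    // content: [k0, v0, k1, v1, ...,  ..., childB, childA]
    sealed class ChampNode<K, V>
    {
        uint dataMap;
        uint nodeMap;
        object[] content;

        // Index of an inline payload: count the data bits below 'bit'.
        int PayloadIndex(uint bit) =>
            2 * System.Numerics.BitOperations.PopCount(dataMap & (bit - 1));

        // Children are stored reversed at the end of the array.
        int ChildIndex(uint bit) =>
            content.Length - 1 - System.Numerics.BitOperations.PopCount(nodeMap & (bit - 1));
    }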

With regards to the 'deriving' question: Yeah, I'm working on a more general solution now, as well as a way to do monad-transformers properly. I've also figured out a way to do infinite tail-recursion for my monadic types, which I'll be deploying in the next major update.

[1] https://michael.steindorfer.name/publications/phd-thesis-eff...

* EDIT *

Here's the full suite of benchmarks, but I've trimmed them down to just the 10,000 item tests and removed the error and deviations for clarity:

    +-----------------------+----------------+
    | Add 10000 items (value type)           |
    +-----------------------+----------------+
    |        SasaTrie       |   2,055.220 μs |
    |        LangExtHashMap |   1,842.514 μs |
    +-----------------------+----------------+
    | Add 10000 items (reference type)       |
    +-----------------------+----------------+
    |              SasaTrie |   2,988.663 μs |
    |        LangExtHashMap |   2,905.552 μs |
    +-----------------------+----------------+
    | ContainsKey (10000 value items map)    |
    +-----------------------+----------------+
    |              SasaTrie |   173,215.0 ns |
    |        LangExtHashMap |   235,577.2 ns |
    +-----------------------+----------------+
    | ContainsKey (10000 reference map)      |
    +-----------------------+----------------+
    |              SasaTrie |     518.789 μs |
    |        LangExtHashMap |     549.952 μs |
    +-----------------------+----------------+
    | Iterate (10000 value items map)        |
    +-----------------------+----------------+
    |              SasaTrie |   457,819.0 ns |
    |        LangExtHashMap |   227,632.7 ns |
    +-----------------------+----------------+
    | Iterate (10000 references map)         |
    +-----------------------+----------------+
    |              SasaTrie |   712,623.1 ns |
    |        LangExtHashMap |   519,250.7 ns |
    +-----------------------+----------------+
    | Random remove (10000 value items map)  |
    +-----------------------+----------------+
    |              SasaTrie | 2,276,772.4 ns |
    |        LangExtHashMap | 1,765,042.9 ns |
    +-----------------------+----------------+
    | Random remove (10000 references map)   |
    +-----------------------+----------------+
    |              SasaTrie |   2,946.775 μs |
    |        LangExtHashMap |   2,576.813 μs |
    +-----------------------+----------------+


Nice! I'm away for the weekend so will follow up with an explanation of the nested generic when I'm not on the phone, if the following is unclear: the basic idea is that you inline the next level of the Trie into the previous level, thus collapsing the depth and removing some indirections. This slightly bloats the internal tree nodes, but the leaves vastly outnumber the internal nodes, and since the leaves have been inlined one level up, it ends up compacting things somewhat. This is maybe clearer if you start with a struct-only design for the internal nodes and then perform the inlining as an optimization on that.

Also wondering whether you checked the Trie from the latest repo or the older struct version I also linked. I think the struct version was a little faster but I switched for the aforementioned idiomatic reasons.

I'll have to check out the CHAMP design for sure, I'm curious how it differs.

Edit: I also recall that I didn't optimize iteration at all and that it was fairly inefficient, but can't really confirm now. I recall optimizing merge carefully though so that should perform well.

Edit 2: forgot to mention that the Node<...> generic is nested 6 times because that's the max depth of the trie for 32-bit keys with 32-element nodes (each 32-element level consumes 5 bits of the hash).


> Also wondering whether you checked the Trie from the latest repo or the older struct version I also linked

I included Sasa.Collections 1.0.0-RC4 from NuGet, so I assume it's your latest version. I've added Sasa.Collections permanently to my benchmarks, seeing as yours is the first library to get close to the performance of the language-ext collections - I'd like to keep an eye on you, haha! Always good to have a bit of friendly competition :)

> forgot to mention that Node<...> generic is nested 6 times because that's the max depth of the trie for 32 bit keys with 32 element nodes.

Yep, I figured that one. Still, I look at it and go 'huh?'. I've been looking at code all day though, so I'll have a proper look tomorrow when I'm less tired.


Start with this representation:

    struct Node<TKey, TValue>
    {
      uint bitmap;
      KeyValuePair<TKey, TValue>[] entries;
      Node<TKey, TValue>[] children;
      // Node = Leaf of bitmap * entries | Internal of bitmap * children
      // We inline that sum into a struct where
      // Leaf -> entries != null
      // Internal -> children != null
    }
Now consider the case where leaves are somewhat sparse. That means the children arrays will have few entries, so instead of allocating a bunch of one-element arrays for children which themselves allocate a bunch of one-element arrays of values, just inline the value arrays in the parent. Now propagate that same idea for internal nodes all the way back to the root.
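
In code, the inlined version looks something like the sketch below (illustrative only, not Sasa's exact source): the node is parameterised over its child type, so each level's children are stored by value inside the parent's array, and the full trie type is exactly that Node<Node<...>> nesting.

    using System.Collections.Generic;

    // Illustrative only: TChild is either the next level's node struct or, at
    // the bottom, the key/value pair itself, so children live by value in the
    // parent's array instead of behind another heap object.
    struct Node<TKey, TValue, TChild>
    {
      uint bitmap;
      KeyValuePair<TKey, TValue>[] entries; // non-null => leaves inlined at this level
      TChild[] children;                    // non-null => next (inlined) level
    }

    // For 32-bit hashes the trie type then nests six levels deep, e.g.:
    // Node<K, V, Node<K, V, Node<K, V, Node<K, V, Node<K, V,
    //     Node<K, V, KeyValuePair<K, V>>>>>>>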

Also, because these are all structs, the JIT should specialize the code generated for each level, so it should be pretty fast. IIRC, the speed difference between the original and the inlined representation was noticeable but not crazy, but the memory use difference was considerable. This was all years ago though; the first implementation was in 2013.


I belatedly realized this might require some explanation too. Sorry, too many years of experiments and microoptimizations.

This is an unboxed representation of a sum type. Rather than relying on the CLR's inheritance to model sums I lay it out explicitly at the cost of a little space. It may seem wasteful since it looks like I'm adding 8 unused bytes to each node, but the CLR object header is 16 bytes so you're actually saving 8 bytes per node by avoiding classes and using structs that have no header! You can really see the difference in the working set if you do a full GC.Collect and check the allocated bytes after constructing some large tries.

Also, because I'm removing a heap object per node, this representation incurs one less indirection per node accessed, so I'm doing half as many loads as an equivalent class-based representation for every operation. This is a large constant factor difference, and probably partly explains why Sasa's ContainsKey is faster than yours.
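
For contrast, the class-based equivalent of the struct earlier in the thread would look roughly like this (hypothetical names): every node becomes its own heap object with a 16-byte header, and every child access is one more dereference than the unboxed layout.

    using System.Collections.Generic;

    // Hypothetical class-based version, for comparison only.
    abstract class BoxedNode<TKey, TValue> { }      // 16-byte object header per instance

    sealed class BoxedLeaf<TKey, TValue> : BoxedNode<TKey, TValue>
    {
      public uint bitmap;
      public KeyValuePair<TKey, TValue>[] entries;
    }

    sealed class BoxedInternal<TKey, TValue> : BoxedNode<TKey, TValue>
    {
      public uint bitmap;
      public BoxedNode<TKey, TValue>[] children;    // extra pointer hop to reach each child
    }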

Finally, no trie operations incur any virtual dispatch; it's all replaced by simple if-checks, which are generally more predictable. This probably has less impact these days since branch predictors are pretty good now.

I think if you replaced your class-based representation, which uses the superior CHAMP model, with this unboxed representation, your hash map would be the uncontested champion. You might even challenge the mutable dictionary for performance!


> You might even challenge the mutable dictionary for performance!

Don't tell me that, I'm an optimisation nut, I might actually do it! ;)

I woke up this morning with a clear understanding of what your optimisation is, so I guess I just needed to load up my brain with the details! It's definitely a super interesting approach and one that I'm surprised I've not come across before.

I'm definitely a bit wary of changing my current implementation, mostly because it's been battle hardened, but I'm sure I won't be able to resist having a go at some point. Thanks for the tip!


Do it! Do it! Do it! If only for a couple of tests to see how much it changes the results. ;-)

Unboxing sums is a nice optimization, but then you can't naively use switch pattern matching to deconstruct them.
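
To illustrate (hypothetical types, nothing from Sasa): a class-based sum can be deconstructed with a type switch, but the unboxed struct has no runtime case to match on, so you branch on the discriminating field yourself and the compiler can't check exhaustiveness for you.

    static class SumDemo
    {
      // Class-based sum: the runtime type carries the case, so 'switch' works.
      internal abstract class Node { }
      internal sealed class Leaf : Node { public int[] Values; }
      internal sealed class Branch : Node { public Node[] Children; }

      internal static int Width(Node n) => n switch
      {
        Leaf l   => l.Values.Length,
        Branch b => b.Children.Length,
        _        => 0
      };

      // Unboxed sum: no runtime case; dispatch is an if-check on which field
      // is non-null, which is cheap but not compiler-checked.
      internal struct NodeU
      {
        public int[] Values;      // non-null => leaf
        public NodeU[] Children;  // non-null => branch
      }

      internal static int Width(in NodeU n) =>
        n.Values != null ? n.Values.Length :
        n.Children != null ? n.Children.Length : 0;
    }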

I have a bunch of other things in Sasa and other libraries you might find useful. I'm not actively working on most of them anymore except for bug fixes. I learned a lot but didn't end up using a lot of these features as much as I'd hoped.

For instance, being able to create open instance delegates in a way that automatically works around the CLR's limits on binding such delegates to virtual methods. Some of the concurrency primitives are also interesting, as they implement an efficient atomic read/write protocol for arbitrarily sized types using only volatile reads/writes (i.e. avoiding torn reads), and a sort of LL/SC using only volatile read/write and a single interlocked inc/dec. Also, I added a kind system to CLR reflection to make working with it much easier [2].

It seems we're thinking along the same lines for numeric types. I reproduced the Haskell numeric hierarchy [1], but I put that on hold because I was thinking a [Deriving] attribute would eliminate a lot of redundancy.

Just FYI: clicking Num<A> on the main GitHub markdown page doesn't jump to its anchor in the markup.

Lots more to see if you're interested! I've played with parser combinators but never liked how they turned out, and settled on a simpler approach that was pretty interesting.

[1] https://github.com/naasking/HigherLogics.Algebra

[2] https://github.com/naasking/Dynamics.NET#kind-system


A 3D engine is very structured; it isn't the creative part of game development.

Let's say you want to test your bullets having HP so they can be shot down. So you add that in an hour, now bullets have HP, you test it, it doesn't work, so you remove it. Things like that need to be seamless, because you need to iterate on ideas and test whether things are fun. Big upfront design as you suggest, where you very carefully constrain everything, doesn't work well with that. It works well for 3D engines, sure, but 3D engines are much more constrained than games.


> A 3D engine is very structured; it isn't the creative part of game development.

I also did lots of the creative bit ;). I just assumed this was going to go down the performance-question route, sorry!

> Big upfront design as you suggest

No, I am not suggesting that at all - I'm saying the opposite of that. You're now the third person to think that's what I'm saying, so I assume I didn't explain myself very well. Rather than repeat myself, my other replies cover this [1] [2]:

[1] https://news.ycombinator.com/item?id=34847413

[2] https://news.ycombinator.com/item?id=34847326


Well, you say that works, but basically nobody who programs creatively writes code that way. People who like functional programming mostly write compilers or similar processing throughput systems; you see it a ton there, but you barely see it on the other side of the programming landscape, such as game programming.

So, you say it works for you, but basically every example of the things I talked about is done in a very different way than you describe, so unless you have programmed some very impressive games I would say that the data strongly supports another style of programming than you suggest here. For compilers or parsers or stuff like medical systems? Sure, go ahead, it seems to work for those domains, but it doesn't seem to be a panacea. If games were so much easier to write that way we would see tons of them, especially among the better games, but we don't.


> If games were so much easier to write that way we would see tons of them, especially among the better games, but we don't.

Functional programming only started going mainstream a few years ago. Give it time; it will make an impact. You can already see the influence of functional reactive programming in UI frameworks like React, where the screen state is a pure function of the program state. That's exactly what games have been doing for years, i.e. rendering every frame based on program state, only now we have a purely functional model of how this works.
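
To sketch that model (hypothetical types, not any particular engine): the simulation step and the frame you present are both pure functions of the game state, the same "view = f(state)" shape React applies to UI.

    // Sketch only: the state is an immutable value; Update and Render are pure.
    record GameState(float PlayerX, float PlayerY, float Time);

    static class Loop
    {
      static GameState Update(GameState s, float dx, float dy, float dt) =>
        s with { PlayerX = s.PlayerX + dx * dt,
                 PlayerY = s.PlayerY + dy * dt,
                 Time    = s.Time + dt };

      static string Render(GameState s) =>
        $"player at ({s.PlayerX:F1}, {s.PlayerY:F1}), t = {s.Time:F2}s";
    }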

Some of the "purely functional" aspects will be optimized/compiled away at various layers, but the programmer would still be able to work with the high-level model without attending to those details.


The biggest mistake you make is thinking you do something different to me or any other software engineer. You don't. You do the same thing and the same rules apply. You seem to think you're the only person doing anything creative. Well here's a newsflash: it's the same process in games as it is in all other forms of software development. The end result might look different, but the process is the same. The best devs are the creative ones, but creativity isn't limited to what appears on the screen: it comes from architecting complex solutions to difficult problems, it comes from elegance in code, and it comes from producing something that works for the end-user - whether that's a gamer, a doctor, or someone browsing on their phone.

Games is one of the easiest sectors to be in: the complexity level is low compared to other sectors, especially now all the difficult bits are done for you (off-the-shelf engines). I assume you're not working on titles that have 15 million lines of code in them, like I do every day.

That is possibly why there hasn't been a big push for FP in games yet, because they haven't yet hit the complexity limit of OO like others have. Your argument is "Well we've always done it like this, why change?". Probably for the same reason people don't write games in assembly languages any more: at some point the current tools aren't good enough. Games might not have hit that ceiling yet, but you'll almost certainly be affected by the artefacts of getting close to it.

I see you're even questioning my history in games. Below [1][2][3][4][5][6][7][8] are some of the titles I worked on (or that my code is in). But to be clear, we didn't use these techniques back then either, because it was too soon to be doing it (or we weren't enlightened then). You should consider being a little more open-minded about how you could, possibly, write software more effectively, especially when you have someone who's been on both sides of the fence advocating it. Just writing it off because you think people who do FP just write "compilers or similar processing throughput systems" is shooting yourself in the foot more than anything.

[1] Battalion Wars: https://en.wikipedia.org/wiki/Battalion_Wars

[2] Fire Blade: https://en.wikipedia.org/wiki/Fire_Blade_(video_game)

[3] Warhammer 40,000: Fire Warrior: https://en.wikipedia.org/wiki/Warhammer_40%2C000%3A_Fire_War...

[4] Call of Duty - Finest Hour: https://en.wikipedia.org/wiki/Call_of_Duty:_Finest_Hour

[5] The Regiment: https://en.wikipedia.org/wiki/The_Regiment_(video_game)

[6] Reign of Fire: https://en.wikipedia.org/wiki/Reign_of_Fire_(video_game)

[7] Attack of the Saucerman: https://en.wikipedia.org/wiki/Attack_of_the_Saucerman

[8] Lunatik [canned]: https://www.unseen64.net/2020/11/16/lunatik-pure-entertainme...


>>That is possibly why there hasn't been a big push for FP in games yet, because they haven't yet hit the complexity limit of OO like others have.

Are you saying the biggest and probably most complex software (OSes, browsers, etc.) out there is written in FP languages?

>> Games is one of the easiest sectors to be in: the complexity level is low compared to other sectors, especially now all the difficult bits are done for you (off-the-shelf engines).

As far as I know, those "off-the-shelf engines" are not written in FP languages. And I would say they are complex.


> Are you saying the biggest and probably most complex software (OSes, browsers, etc.) out there is written in FP languages?

You know I didn't say that.

The imperative coding approach is cognitively more difficult. So at scale (size of code-base) you either have more bugs and therefore more overhead/cost of development; or you look for coping strategies to minimise the bugs and the cost of development.

Declarative, pure functional programming reduces the cognitive load, because you get to compose things reliably. And because it's declarative and pure you don't have to look under the hood to see what is going on, you know from the surface what large bodies of code do.

A simple example, two function prototypes:

One in Java:

    int foo(String x);
One in Haskell:

    foo :: String -> Int
I can tell you much more about the Haskell one than I can the Java one. The Java one could launch nuclear missiles for all I know; the Haskell one can't, and therefore the returned Int has some relationship with the String argument, probably the length of the string. When you don't need to look inside, you are more productive. When you can reliably compose code, it is more robust and easier to build. When you declare your side-effects up-front, you can make better judgements about what the outcome of calling any function might be.

You are of course free to take whichever approach you like. The imperative one is, however, just going to get progressively more difficult as applications get more complex. I spent 27 years writing various combinations of assembly, procedural, and OO. I've spent the past 10 using the functional paradigm. If I ever had to go back to the imperative paradigm it would fill me with dread. I still need to use it occasionally for performance optimisations, but I would always wrap it with a declarative interface (where the contract always holds).

> As far as I know those "off the shelf engines" are not written in fp languages. And I would say they are complex.

When I talked about 'easy', what I meant was that games development has much less of the real world to deal with:

* You tend to know your user

* The user doesn't get to have a say

* You often know exactly what hardware you're going to be running on

* You don't have to integrate with 20 other systems by 3rd party providers

* To a certain extent you can put it in a box and move on (I know that's not strictly true, but you're unlikely to be maintaining a game for more than a decade)

* The feature-set of the game won't double every few years as the product competes with others

* Most games have a relatively similar management system

* There aren't massively complex compliance issues or laws that get in the way (yes there are some, but less than other industries)

* Although networking and game-state synchronisation is a hard problem, the scope of distributed-computing problems is small

A game engine really isn't that complex. Of course it's hard to make it as optimal as possible, but the problem isn't at the really difficult end of the spectrum. Probably the hardest thing in games now is writing the tooling. I spent a number of years writing core tech and tools for a three-studio games company, and that was just at the beginning of the tooling journey. I assume that's quite a sizeable coding challenge now; one I would definitely use FP for if I were still in the industry.



