> A large fraction of the flaws in software development are due to programmers n...

mightybyte · on Feb 18, 2023

It took me awhile to realize but this is one of the big things that algebraic data types (ADTs) help you to do...design your data types so they have exactly the number of valid states that your domain has. To use another common FP way of saying it...make invalid states unrepresentable.

toastal · on Feb 18, 2023

It's not just the ADTs but the exhaustive pattern matching that help make sure you've covered all of the cases in your ADT. You'll see JavaScripters use objects as a poor man's ADT where they'll use the key name as a the constructor or a { tag: key, ... }, but that has caveats (aside from feeling less first-class): 1) JavaScript can't do exhaustive pattern matching so you'll always need to handle null cases, 2) checks must all happen at runtime which means you'll end up having to throw/catch exceptions and nothing will hold your hand to say that you've missed a case. TypeScripters can handle 1 & 2 for safety, but the ergonomics are bad and verbose so you'll see a lot of folks skip it. Similar languages will have pattern matching, safety, but lack the lightweight/dense ergonomics. When you dive into a language in the ML family (Haskell, OCaml, Standard ML, et. al.), the ergonomics make you want to use them, and they are idiomatic to the ecosystem and you'll want to use them for just about everything--either the ones in the Preludes/stdlib like Maybe, Either, List, etc. or by building them yourself.

An example in the wild that demonstrates this `fp-ts`'s explanation of how to do ADTs in TypeScript where you can see the comparison in PureScript is a one-liner (both data and polymorphic data): https://github.com/gcanti/fp-ts/blob/master/docs/guides/pure...

whizzter · on Feb 19, 2023

That Typescript ADT example is heavily outdated (or explicitly obtuse)

class Bar { tag="bar" as const; constructor(public value:string){} } class Baz { tag="baz" as const; constructor(public value:boolean){} } type Foo = Bar|Baz;

toastal · on Feb 19, 2023

That is better, but it's still nowhere near `type Foo = Bar String | Baz Boolean`. It's also nowhere as clean to pattern match on.

whizzter · on Feb 20, 2023

That's true if pattern matching specifically is what you want, in practice we don't constructs ADT's for the sake of it and rolls with interfaces that also meshes well with data imported from other sources (ie JSON).

Also while pattern matching isn't an explicit thing in TS you get most benefits from compiler checked accesses that can recognize type-tests in your code and derive types and do property access checks.

ChrisMarshallNY · on Feb 18, 2023

A classic way to do this, is enumerations.

KMag · on Feb 18, 2023

The nice thing about ADTs is that you can couple the state variables with the enum, so that the representable state is better tied to the enum value. (Yes, you can kind of do this in C with a discriminated union of a struct holding an enum and a union, but the type system doesn't stop you form grabbing a union member that doesn't match up with the current enum value.)

ChrisMarshallNY · on Feb 18, 2023

I use Swift.

Swift enums are really nice[0].

[0] https://littlegreenviper.com/miscellany/swiftwater/enums-wit...

justinpombrio · on Feb 18, 2023

To clarify: Swift enums are ADTs, because they can hold data for each case. Same as Rust enums. This contrasts to C enums which are not ADTs because they cannot store additional data.

jb1991 · on Feb 18, 2023

I like Swift but the examples on that page are quite over-engineered. Reads like someone was seduced by their own cleverness. There are much simpler ways to achieve what is being done in that example.

ChrisMarshallNY · on Feb 18, 2023

If you read that page, you’d understand that it was supposed to be “over-engineered.” I assume that you did read it, and took the least charitable viewpoint. I didn’t post it for peer review. It reflects my enthusiasm, as I was learning the language, and is probably four years old.

I'm a fairly decent chap, and have nothing against you. Not sure why it's so important to you to begin our relationship with an attack. I suspect we have far more in common, than we do, differences.

jb1991 · on Feb 18, 2023

My apologies, I didn’t realize you were posting a link to your own site! I would have phrased my comment differently, although probably that should not have made a difference and I should’ve phrased it differently anyway.

ChrisMarshallNY · on Feb 18, 2023

And mine. I just get tired of the "Attack first" approach that everyone takes, these days.

I know that I sound like a demented old man, when I say that I miss the old days, when we all shared passions for the tech, and, even when we were in competition with each other, we tended to have respect for one another.

But those were also the days of the UseNet flamewars, so it wasn't all flowers and bunnies (and I was definitely a flame warrior).

the__alchemist · on Feb 18, 2023

ADTs is a concept I've had a tough time understanding; the approximate definition that made the most impact with is considering them as a specific type of enumerations: Ones where each variant holds a value. Is this correct?

thesz · on Feb 18, 2023

It is the other way around: enumerations are special cases of algebraic data types with no values attached.

In algebraic data types we have a sum (union) of a labeled cartesian products, and each product may have arbitrary number of values:

data MaybeInt = NoInt | AnInt Int

data MaybeItsPair = NoIntsPair | AnIntPair Int Int

The first constructor in both data types is a labeled (No...) product with empty number of value to apply cartesian product to.

The second constructor in both data types is a cartesian product: in first case an Int and in case a pair of Ints.

Of course you can have more involved data type:

data Expr = Void | Const Int | Var String | Bin BinOp Expr Expr | Un UnOp Expr

A label, two single value constructors and much more involved operations as well.

ghostpepper · on Feb 18, 2023

Can you explain the term 'cartesian product' in this context? I'm familiar with the cross product from undergrad linear algebra but I haven't applied algebra to types before, and I don't understand what two sets would produce a single Int as their product.

I haven't studied much theoretical computer science so I'd love to hear some good beginner resources on this stuff.

thesz · on Feb 18, 2023

Cartesian product: https://en.wikipedia.org/wiki/Cartesian_product

The number of variants of the AnIntPair construction is the number of distinct elements in the Int type, squared.

I used definition of algebraic data type in Haskell: http://wiki.haskell.org/Algebraic_data_type

justinpombrio · on Feb 18, 2023

"Cartesian product" is a fancy word for "pair". Or more generally "tuple".

ChrisMarshallNY · on Feb 18, 2023

That’s the way Swift enums work.

Here’s an interesting little exploration: https://littlegreenviper.com/miscellany/swiftwater/writing-a...

peterfirefly · on Feb 18, 2023

Hence why they are called enums in Rust.

jlouis · on Feb 18, 2023

Enumerations are not ADTs.

andrewprock · on Feb 18, 2023

The notion that your use cases have static well defines use cases is quaint. In practice, I've seen endless debates about whether state ahold be represented as a variable, parameter, type, or member, often with good arguments for all sides.

timcobb · on Feb 18, 2023

And state machines

eddsh1994 · on Feb 18, 2023

State machines in the type system to check they’re good at compile time! <3 Idris

HybridCurve · on Feb 18, 2023

I actually just started to refactor some of my code earlier this week to deal with this very problem.

It was related to deserialized data being transposed from one complex digraph to another. The reason I did not catch it as being problematic sooner was precisely because the state management was loosely distributed within the transposing functions' call hierarchy. The code which was trying to manage the states had emerged from fixes for a series of different problems related to edge cases, and once the new problems started relating to fixes for the others, I had that "Ahhh, well sh*t.." moment you get when finding you were missing something important.

> I think this probably the most important concept for programmers to keep at the front of their minds when working day to day.

The lesson I had learned was to be more mindful when synthesizing a solution and there might be more state interdependence than we expect. Centralizing state management has helped me solve it in this case.

louthy · on Feb 18, 2023

> adding new business logic to a process,

Adding logic is the smallest part of the problem: adding types, that aren't constrained to represent the smallest set of values needed for the problem at hand, that's the biggest problem. This 'looseness' doesn't show up as a compilation error and looks on the surface to be just fine. Then the real-world comes along and creates a value that should never exist and the code falls over.

Constraining the data by properly defining types is the (primary) way to reduce the possible states an application can get in. This can be via sum-types (discriminated unions), or even making sure you don't use `int`, `string`, etc. and other types that are just one step away from being dynamic.

When doing pure functional programming, along with properly constrained types, you tend to find the code writes itself. What I mean by this is that it becomes very much about composing functions via 'type lego'. Once the types fit, your code works [1]. This is a very profound feeling once you grok it, because it gives oneself a feeling of confidence in ones own code that doesn't happen in imperative land. I will sometimes go days or weeks without even running the code I'm writing because I just have confidence in it.

[1] It's still possible to write logic errors, but they're the only major class of bugs you get really.

charcircuit · on Feb 18, 2023

And how do you know you've constricted your data enough? You don't and you get bugs. You don't have a perfect specification you are building.

In the micro scale in terms of variables this is not the biggest problem that exists with programs. Enums don't somehow become garbage because it's technically possible to overwrite them with garage. The macro state of the entire system adds much more complexity and bugs. There is too many things that you have to keep in mind and systems are too big for one single person to understand it all. Pure functional still has this macro state. The state is stored in the program counter and by carrying along the current state with it.

If you break any programming task into fine enough steps it will feel like lego. It's nothing special. If I need to print out the numbers 1 through 10 there is a single canonical way my whole team could come up with independent from one another.

>I will sometimes go days or weeks without even running the code I'm writing because I just have confidence in it.

On a small scare this is possible but once you start working on a larger team or if you have to integrate your code with constantly changing systems you will want to be testing your code.

>[1] It's still possible to write logic errors, but they're the only major class of bugs you get really.

You can still make security bugs and performance bugs. Logic bugs is a giant category of bugs and are what most bugs belong to.

louthy · on Feb 18, 2023

> And how do you know you've constricted your data enough?

Types are composable. Sum-types and product-types allow the composition of smaller types into larger ones. They come in the form of discriminated-unions and records in FP languages. So when it comes to the question of how do I know when I've constricted my data enough, it's when I know all the components of all types have been constricted enough.

You don't have to design some massive schema for your entire application. You can absolutely do it as you go. Just create a type that represents the smallest set of values that are needed for whatever concept it reprsents. Then compose those.

> Logic bugs is a giant category of bugs and are what most bugs belong to.

I think logic bugs are relatively rare once you constrain the possible states your code can be in - and they usually come about because of some cyclomatic complexity issue (i.e. putting the if/else branches the wrong way around). Logic just happens to be the bit of your codebase that can't be captured in types (well, not with trivial types anyway). It's the bit of what we write that we interpret from the human world. That's our job, to translate the requirements into code, and the logic is that bit and therefore more prone to human error. I still find I have fewer of them when I have good constraining types that limit the scope for fuckups.

The current state of the world can absolutely be captured in types, then the logic is simply a fold over a sequence of events that take the old state of the world and transforms it into a new one. This is the best way of modeling time also, which is a major source of bugs in imperative code. Mutating data structures kills any effective modeling of time, so cheats like locks are needed. Even with locks there's no effective history and no understanding of now, which makes code difficult to reason about.

That is the idealised end goal of pure functional programming. The reality is most people will still use an `int` when they should use a constrained value (say [1..10] rather than [int.min..int.max]); and so their program can still get into a bad-state™. The more you constrain though, the fewer bugs you will have.

This is the lot of most devs, working out how much risk you're willing to take.

* Those using dynamic languages take the most risk. They do it because they believe they can write code quicker. Their trade off is lots of runtime errors, or runtime checks, or building lots of unit tests.

* Those using languages like Java or C# take fewer risks, as the compiler can help them out, but they don't tend to create types like `Month` that can only hold the constrained set of ints [1..12]. So they run the risk of accepting 13. They also have extensible type-hierarchies (inheritance) that means its impossible to know the full scope of any 'abstracted' interface reference passed around. They must cross their fingers and hope nobody screwed up the implementation. They're also not composable, which leads to additional logic problems as they're glued together manually.

* Those using languages like Haskell don't have the inheritance problem. Everything is sum-types, product-types, or exponential-types (functions) and is therefore composable. This creates an opportunity to create the idealised program (as I mention above). Logic bugs still exist and usually people don't constrain their types completely, but the situation is often much less risky and turns into more robust code.

> On a small scare this is possible but once you start working on a larger team or if you have to integrate your code with constantly changing systems you will want to be testing your code.

I run two development teams. But I also write a lot of research prototypes, this tends to be where I will just have some open ended project that I am playing around with - often programming with types to see what 'level ups' I can get from doing so. But even in a day-to-day project I could easily spend a few days writing code before I need to check it. For example, I recently wrote a whole language virtual-machine: AST, type-system, type-inference, type-checking, and evaluation without needing to run it - all the types lined-up and each subsystem was clear from a logic point-of-view. Currently, I'm building a C# monad-transformer system for language-ext [1] based around transducers, this is pure abstract type-level programming - I haven't run it yet. If I can run it in my head and it type-checks, that's usually good enough.

Of course at some point I do actually run the thing ;)

[1] https://github.com/louthy/language-ext

charcircuit · on Feb 18, 2023

>I know when I've constricted my data enough, it's when I know all the components of all types have been constricted enough.

Do you not see that this is a tautology?

>That's our job, to translate the requirements into code, and the logic is that bit and therefore more prone to human error.

The requirements given are typically ambiguous and it is our job to come up with the complete specification ourself. Our specification may have bugs ignoring the implementation. I am not denying that with enough diligence you can mostly avoid errors in implementing the spec, but I am saying finding the correct specification for what your code should do is too hard to do compared to implementing a wrong spec and then fixing it as bugs come up.

>Those using languages like Haskell don't have the inheritance problem. Everything is sum-types, product-types, or exponential-types (functions) and is therefore composable.

Inheritance is just a different way to create sum types.

louthy · on Feb 19, 2023

>I know when I've constricted my data enough, it's when I know all the components of all types have been constricted enough. > Do you not see that this is a tautology?

It might be if you hadn't butchered what I wrote. If I have a type called Month, and it can only accept values 1-12. Then I know I have constrained that type enough. If I then create types called Day and Year and constrain those I know they're constrained enough.

If I then compose Day, Month, Year into a type called Date and check the rules of the number of Days in a month so that an invalid date can't be instantiated then I have a more complex type leveraging the simpler ones. I could then put Date into a record type called Appointment, etc. etc. For each type I create I know what data I am trying to represent, so I constrain the type at that point. There's no tautology here, it's just composition of types. Making larger types from smaller ones and making sure they can't ever hold bad state.

> Inheritance is just a different way to create sum types.

Not really, they're open. Sum-types are closed. i.e. they represent a concrete set of states. The openness of an inheritance hierarchy is the problem. We nearly never need completely open type hierarchies, it is pretty rare that something needs to be extensible in that way outside of library development.

That doesn't mean inheritance is always bad, but the 'always inheritance' approach that is often championed in the OO world certainly is.

The goal is to write total functions [1] where every value in the domain can be mapped to a value in the co-domain. If that can't happen then your code is less predictable (throws exceptions or has undeclared side-effects). Having complete control of the states of any data-type is critical to making the approach easy to implement.

[1] https://www.statisticshowto.com/total-function/

charcircuit · on Feb 19, 2023

>into a record type called Appointment, etc. etc

This breaks down in the etc. What if an appointment is not valid on them weekend, or a holiday, or when someone's child has soccer practice, or when that appointment conflicts with another one. There are all sorts of restriction that can be added and yes they technically cover some potential bug that someone technically could introduce. It may be hard to predict all of these restrictions ahead of time.

>Not really, they're open

At compile time 99% of the time they are closed. Languages also add features to let you make it closed.

>If that can't happen then your code is less predictable (throws exceptions or has undeclared side-effects).

This is impossible in the real world where hardware issues are real. Exceptions and crashes are okay and is just a fact of life when working with large software syshems. We know these systems are not going to be perfect and embrace it rather than trying to perfectly handle every single thing perfectly every time. It's accepting that these bugs exist and that processes should be put in place in identifying them and monitoring them so they can be fixed.

Total functions also don't allow for infinite loops which is common place in real systems that mare expected to potentially run forever instead of only being able to serve 1000 requests before exiting.

louthy · on Feb 19, 2023

> This breaks down in the etc. What if an appointment is not valid on them weekend, or a holiday, or when someone's child has soccer practice, or when that appointment conflicts with another one.

Either the rules are an instrinsic part of an appointment or they're not. If you need something that represents exceptions to the state of a basic appointment then that's the job of the containing type (say a type called WeeklySchedule or something like that). Just like `Day` can be [1..31], but when used in a a `Date` then the constructor of `Date` wouldn't allow a `Day` of 31 and a `Month` of 2.

> At compile time 99% of the time they are closed. Languages also add features to let you make it closed.

You can't close an interface.

> This is impossible in the real world where hardware issues are real.

Exceptions in imperative-land tend to be used for all errors, not just exceptional ones, but expected ones. The problem is that no one knows what these side-effects are from the surface (i.e. a function prototype doesn't expose it).

Truly exceptional events should probably shut down the application (things like out-of-memory), or, as with Erlang, escalate to a supervision node that would restart the service.

Exceptions exist in functional languages too. However, expected errors, like 'file not found' and other application level 'could happen' errors should be caught at the point they are raised and represented in the type-system.

For example, a common pure pattern would be `Either l r = Left l | Right r` where the result is either a success (Right) or an alternative value (Left - usually used to mean 'failed').

When your function has this in its declaration, you are then aware of possible alternative paths your code could go down. This isn't known with exceptions.

For example the `parseInt` from C# and Haskell (below), one declares its side-effect upfront, then other doesn't. It may be semi-obvious that the C# one must throw an exception, but that isn't the only outcome - it could return `0`.

    int parseInt(string x);

    parseInt :: String -> Either Error Int

This confusion of what happens inside large blocks of code is exactly the issue I'm trying to highlight.

> Total functions also don't allow for infinite loops which is common place in real systems that mare expected to potentially run forever instead of only being able to serve 1000 requests before exiting.

In pure maths, sure, in reality yeah they do. Haskell has ⊥ [1] in its type-system for exactly this reason.

Anyway, you seem hellbent on shooting down this approach rather than engaging in inquisative discussion about something you don't know or understand. So, this will be my last reply in this thread. Feel free to have the last word as I'm not wasting more time explaining something that is literally about soundness in code and why it's valuable. If you don't get it now, you never will.

[1] https://wiki.haskell.org/Bottom

flohofwoe · on Feb 18, 2023

> ...you tend to find the code writes itself.

...in theory. The problem is usually that you had already spent a lot more time designing all your type constraints before "the code can write itself", then it would have been to just sit down and write some code to get the damn thing working first.

And at least IME, spending so much time upfront without any visible process results in too rigid "over-architected" systems which are hard to extend when the requirements ineviatably change next month (or even just an external system your code needs to talk to).

The best solution is usually to not give in to either extremes, but find a well balanced approach somewhere in the middle where you have some sane restrictions designed into the types to prevent the most obvious bugs, but not enough that the whole house needs to be rebuild if a little requirement detail changes.

> I will sometimes go days or weeks without even running the code I'm writing because I just have confidence in it.

...that's *exactly* what I mean, just that I don't see this as a good thing ;)

louthy · on Feb 18, 2023

You're mischaracterising what I am saying. You're talking about 'over architected' systems. That doesn't represent how I work or anything I am advocating. Types are composable, when you design a system or sub-system, make sure those types are constrained to represent the smallest set of values the sub-system needs. Then those types can be composed with other types to make larger systems. You start small and build out with composition, you don't need to architect some epic type-schema in one sitting.

It's really not that hard to do (well in FP languages anyway) and ends up improving throughput for any coding task due to the reduced need to constatly check if something does what you think it should do.

> The best solution is usually to not give in to either extremes, but find a well balanced approach somewhere in the middle where you have some sane restrictions designed into the types to prevent the most obvious bugs.

That isn't the best solution, it's a solution. As I mentioned in another comment [1], it's about risk, and how much you're willing to take on as a developer and as a team. I work in healthcare software, I want it to be right, because I don't want my fuck-ups to kill a patient. But even if I didn't, I'd still prefer to write code this way because it is sooo much easier to manage the mental model of the code-base, especially when the code-base is super-large.

> but not enough that the whole house needs to be rebuild if a little requirement detail changes

That isn't my experience at all. If the whole house needs a rebuild when some little requirement detail changes then you're doing it wrong, sorry. One thing that can happen when using sum-types instead of inheritance, is that adding a new case to the sum-type requires you to go and consider all pattern-matching usages of it. This can be painful, but it forces you to consider the impact of your change. I believe this to be a good thing and compilers usually help you here anyway. This trade-off is captured in the Expression Problem [2]

[1] https://news.ycombinator.com/item?id=34847326

[2] https://en.wikipedia.org/wiki/Expression_problem

Jensson · on Feb 18, 2023

Games and many similar domains are harder to write code for than CRUD. What you talk about works well for CRUD apps, but basically every set of development ideas works well for CRUD apps, it isn't hard to do.

louthy · on Feb 18, 2023

As an ex-3D engine dev, I respectfully disagree. But if you're happy to give some examples of why you can't constrain your data-types then I'm happy to admit I'm wrong.

I realise sometimes there's aspects of a game that needs to write to the metal, and compromises need to be made, but that isn't necessary for an entire game. I know people are using my language-ext library with Unity projects too, so I assume there are others in games that don't agree (although I don't know exactly what kind of titles they're working on).

[1] https://github.com/louthy/language-ext

naasking · on Feb 18, 2023

> [1] https://github.com/louthy/language-ext

Cool library. I've had a few of these patterns in my Sasa library for years, but you've taken it to the Haskell extreme! Probably further than most C# developers could stomach. ;-)

You might be interested in checking out the hash array mapped trie from Sasa [1]. It cleverly exploits the CLR's reified generics to unbox the trie at various levels which ends up saving quite a bit of space and indirections, so it performs almost on par with the mutable dictionary.

I had an earlier version that used an outer struct to ensure it's never null [2], similar to how your collections seem to work, but switched to classes to make it more idiomatic in C#.

I recently started sketching out a Haskell-like generic "Deriving" source generator, contrasted with your domain-specific piecemeal approach, ie. [Record], [Reader], etc. Did you ever try that approach?

[1] https://sourceforge.net/p/sasa/code/ci/default/tree/Sasa.Col...

[2] https://sourceforge.net/p/sasa/code/ci/57417faec5ed442224a0f...

louthy · on Feb 18, 2023

> so it performs almost on par with the mutable dictionary.

I notice you have a comment that says "This trie is now the fastest immutable dictionary I'm aware of". Unfortunately I have to make you aware that my implementation is faster, sorry! ;)

Here's the add-items benchmark:

    BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1265/22H2/2022Update/SunValley2)
    AMD Ryzen Threadripper PRO 3995WX 64-Cores, 1 CPU, 128 logical and 64 physical cores
    .NET SDK=6.0.300
      [Host]     : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT AVX2
      DefaultJob : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT AVX2

    |                    Method |      N |           Mean |
    |-------------------------- |------- |---------------:|
    | SysColImmutableDictionary |    100 |      32.315 μs |
    |                  SasaTrie |    100 |       8.007 μs |
    |          SysColDictionary |    100 |       1.440 μs |
    |            LangExtHashMap |    100 |       7.512 μs |
    |-------------------------- |------- |---------------:|
    | SysColImmutableDictionary |   1000 |     524.368 μs |
    |                  SasaTrie |   1000 |     140.537 μs |
    |          SysColDictionary |   1000 |      17.395 μs |
    |            LangExtHashMap |   1000 |     125.663 μs |
    |-------------------------- |------- |---------------:|
    | SysColImmutableDictionary |  10000 |   7,598.532 μs |
    |                  SasaTrie |  10000 |   2,055.220 μs |
    |          SysColDictionary |  10000 |     301.683 μs |
    |            LangExtHashMap |  10000 |   1,842.514 μs |
    |-------------------------- |------- |---------------:|
    | SysColImmutableDictionary | 100000 | 129,811.705 μs |
    |                  SasaTrie | 100000 |  46,752.702 μs |
    |          SysColDictionary | 100000 |   6,103.999 μs |
    |            LangExtHashMap | 100000 |  37,869.597 μs |

* SysColImmutableDictionary = System.Collections.Immutable.ImmutableDictionary

* SysColDictionary = System.Collections.Generic.Dictionary

My other benchmarks are still running, so I won't share them all, but we actually trade blows, your ContainsKey appears to be faster, my collection iteration is twice as fast as yours. I suspect if I removed my struct wrapper it would be a bit quicker, but wouldn't be a real-world fair comparison.

We're both much faster than Microsoft's ImmutableCollections though!

I am certainly interested in what that technique is with the nested Node<Node<Node<Node<... and why it works. Do you have any more detail on that? You might also want to look into the data-structure I use: CHAMP [1]. I notice you use HAMT, CHAMP is a more efficient version of HAMT. It has better data-locality, which is much more efficient especially when working with value-types.

With regards to the 'deriving' question: Yeah, I'm working on a more general solution now, as well as a way to do monad-transformers properly. I've also figured out a way to do infinite tail-recursion for my monadic types, which I'll be deploying in the next major update.

[1] https://michael.steindorfer.name/publications/phd-thesis-eff...

* EDIT *

Here's the full suite of benchmarks, but I've trimmed them down to just the 10,000 item tests and removed the error and deviations for clarity:

    +-----------------------+----------------+
    | Add 10000 items (value type)           |
    +-----------------------+----------------+
    |        SasaTrie       |   2,055.220 μs |
    |        LangExtHashMap |   1,842.514 μs |
    +-----------------------+----------------+
    | Add 10000 items (reference type)       |
    +-----------------------+----------------+
    |              SasaTrie |   2,988.663 μs |
    |        LangExtHashMap |   2,905.552 μs |
    +-----------------------+----------------+
    | ContainsKey (10000 value items map)    |
    +-----------------------+----------------+
    |              SasaTrie |   173,215.0 ns |
    |        LangExtHashMap |   235,577.2 ns |
    +-----------------------+----------------+
    | ContainsKey (10000 reference map)      |
    +-----------------------+----------------+
    |              SasaTrie |     518.789 μs |
    |        LangExtHashMap |     549.952 μs |
    +-----------------------+----------------+
    | Iterate (10000 value items map)        |
    +-----------------------+----------------+
    |              SasaTrie |   457,819.0 ns |
    |        LangExtHashMap |   227,632.7 ns |
    +-----------------------+----------------+
    | Iterate (10000 references map)         |
    +-----------------------+----------------+
    |              SasaTrie |   712,623.1 ns |
    |        LangExtHashMap |   519,250.7 ns |
    +-----------------------+----------------+
    | Random remove (10000 value items map)  |
    +-----------------------+----------------+
    |              SasaTrie | 2,276,772.4 ns |
    |        LangExtHashMap | 1,765,042.9 ns |
    +-----------------------+----------------+
    | Random remove (10000 references map    |
    +-----------------------+----------------+
    |              SasaTrie |   2,946.775 μs |
    |        LangExtHashMap |   2,576.813 μs |
    +-----------------------+----------------+

naasking · on Feb 19, 2023

Nice! I'm away for the weekend so will follow-up with an explanation of the nested generic when not on the phone if the following is unclear: the basic idea is that you inline the next level of the Trie into the previous level, thus collapsing the depth and removing some indirections. This slightly bloats the internal tree nodes but the leaves vastly outnumber the internal nodes, and since the leaves have been inlined one level up, it ends up compacting it somewhat. This is maybe clearer when starting with a struct-only design for the internal nodes and then performing the inlining as an optimization on that.

Also wondering whether you checked the Trie from the latest repo or the older struct version I also linked. I think the struct version was a little faster but I switched for the aforementioned idiomatic reasons.

I'll have to check out the CHAMP design for sure, I'm curious how it differs.

Edit: I also recall that I didn't optimize iteration at all and that it was fairly inefficient, but can't really confirm now. I recall optimizing merge carefully though so that should perform well.

Edit 2: forgot to mention that Node<...> generic is nested 6 times because that's the max depth of the trie for 32 bit keys with 32 element nodes.

louthy · on Feb 19, 2023

> Also wondering whether you checked the Trie from the latest repo or the older struct version I also linked

I included Sasa.Collections 1.0.0-RC4 from Nu-Get. So I assume it's your latest version. I've added Sasa.Collections permanently to my benchmarks seeing as you're the first library to get close to the performance of the language-ext collections - I'd like to keep an eye on you, haha! Always good to have a bit of friendly competition :)

> forgot to mention that Node<...> generic is nested 6 times because that's the max depth of the trie for 32 bit keys with 32 element nodes.

Yep, I figured that one. Still I look at it and go 'huh?'. I've been looking at code all day though, so I'll have a proper look tomorrow when less tired.

naasking · on Feb 19, 2023

Start with this representation:

    struct Node<TKey, TValue>
    {
      uint bitmap;
      KeyValuePair<TKey, TValue>[] entries;
      Node<TKey, TValue>[] children;
      // Node = Leaf of bitmap * children | Internal of bitmap * children
      // We inline that sum into a struct where
      // Leaf -> entries != null
      // Internal -> children != null
    }

Now consider the case where leaves are somewhat sparse. That means the children arrays will have few entries, so instead of allocating a bunch of one-element arrays for children which themselves allocate a bunch of one-element arrays of values, just inline the value arrays in the parent. Now propagate that same idea for internal nodes all the way back to the root.

Also, because these are all structs, the JIT should specialize the code generated for each level so it should be pretty fast. IIRC, the speed difference between the original and the inlined representation was noticeable but not crazy, but the memory use difference was considerable. This was all years ago though, first implementation was 2013.

naasking · on Feb 19, 2023

I belatedly realized this might require some explanation too. Sorry, too many years of experiments and microoptimizations.

This is an unboxed representation of a sum type. Rather than relying on the CLR's inheritance to model sums I lay it out explicitly at the cost of a little space. It may seem wasteful since it looks like I'm adding 8 unused bytes to each node, but the CLR object header is 16 bytes so you're actually saving 8 bytes per node by avoiding classes and using structs that have no header! You can really see the difference in the working set if you do a full GC.Collect and check the allocated bytes after constructing some large tries.

Also, because I'm removing a heap object per node, this representation incurs one less indirection per node accessed, so I'm doing half as many loads as an equivalent class-based representation for every operation. This is a large constant factor difference, and probably partly explains why Sasa's ContainsKey is faster than yours.

Finally, no trie operatioms incur any virtual dispatch, which are instead replaced by a simple if-checks which are generally more predictable. This probably has less impact these days since branch predictors are pretty good now.

I think if you replaced your class-based representation that uses the superior CHAMP model with this unboxed representation your hash map would be the uncontested champion. You might even challenge the mutable dictionary for performance!

louthy · on Feb 19, 2023

> You might even challenge the mutable dictionary for performance!

Don't tell me that, I'm an optimisation nut, I might actually do it! ;)

I woke up this morning with a clear understanding of what your optimisation is, so I guess I just needed to load up my brain with the details! It's definitely a super interesting approach and one that I'm surprised I've not come across before.

I'm definitely a bit wary of changing my current implementation, mostly because it's been battle hardened, but I'm sure I won't be able to resist having a go at some point. Thanks for the tip!

naasking · on Feb 19, 2023

Do it! Do it! Do it! If only for a couple of tests to see how much it changes the results. ;-)

Unboxing sums is a nice optimization but then you can't naively use switch-patern matching to deconstruct them.

I have a bunch of other things in Sasa and other libraries you might find useful. I'm not actively working on most of them anymore except for bug fixes. I learned a lot but didn't end up using a lot of these features as much as I'd hoped.

For instance, being able to create open instance delegates in a way that automatically works around the CLR limits against such delegates to virtual methods. Some of the concurrency primitives are also interesting, as they implement an efficient atomic read/write protocols for arbitrary sized types using only volatile reads/writes (ie. avoid torn reads), and a sort of LLSC using only volatile read/write and a single interlocked inc/dec. Also, I added a kind system to CLR reflection to make working with it much easier [2].

It seems we're thinking along the same lines for numeric types. I reproduced the Haskell numeric hierarchy [1], but I put that on hold because I was thinking a [Deriving] attribute would eliminate a lot of redundancy.

Just FYI, clicking Num<A> on the main GitHub markdown page doesn't jump to the link on the markup.

Lots more to see if you're interested! I've played with parser combinators but never liked how they turned out, and settled on a simpler approach that was pretty interesting.

[1] https://github.com/naasking/HigherLogics.Algebra

[2] https://github.com/naasking/Dynamics.NET#kind-system

Jensson · on Feb 18, 2023

3d engine is very structured, it isn't the creative part of game development.

Lets say you want to test your bullets having HP and can be shot down. So you add that in an hour, now bullets have HP, you test it, it doesn't work, so you remove it. Things like that needs to be seamless, because you need to iterate on ideas and test if things are fun. Big upfront design as you suggest where you very carefully constrain everything doesn't do well with that. It works well for 3d engines for example sure, but 3d engines are much more constrained than games.

louthy · on Feb 18, 2023

> 3d engine is very structured, it isn't the creative part of game development.

I also did lots of the creative bit ;). I just assumed this was going to go down a performance route question, sorry!

> Big upfront design as you suggest

No, I am not suggesting that at all - I'm saying the opposite of that. You're the third person to think that's what I am saying now, so I assume I didn't explain myself very well. Rather than repeat myself again, my other replies cover this [1] [2]:

[1] https://news.ycombinator.com/item?id=34847413

[2] https://news.ycombinator.com/item?id=34847326

Jensson · on Feb 18, 2023

Well, you say that works, but basically nobody who programs creatively writes code that way. People who like functional programming mostly writes compilers or similar processing throughput systems, you see it a ton there, but you barely see it on the other side of the programming landscape such as game programming.

So, you say it works for you, but basically every example of the things I talked about are done in a very different way than you describe, so unless you have programmed some very impressive games I would say that the data strongly supports another style of programming than you suggest here. For compilers or parsers or stuff like medical systems? Sure, go ahead, it seems to work for those domains, but it doesn't seem to be a panacea. If games were so much easier to write that way we would see tons of them, especially among the better games, but we don't.

naasking · on Feb 18, 2023

> If games were so much easier to write that way we would see tons of them, especially among the better games, but we don't.

Functional programming only started going mainstream a few years ago. Give it time, it will make an impact. You can already see the influence of functional reactive programming in UI frameworks like React, where the screen state is a pure function of the program state. That's exactly what games have been doing for years, ie. rendering every from based on program state, only now we have a purely functional model of how this works.

Some of the "purely functional" aspects will be optimized/compiled away at various layers, but the programmer would still be able to work with the high-level model without attending to those details.

louthy · on Feb 18, 2023

The biggest mistake you make is thinking you do something different to me or any other software engineer. You don't. You do the same thing and the same rules apply. You seem to think you're the only person doing anything creative. Well here's a newsflash: it's the same process in games as it is in all other forms of software development. The end result might look different, but the process is the same. The best devs are the creative ones, but creativity isn't limited to what appears on the screen, it comes from architecting complex solutions to difficult problems, it comes from elegance in code, and it comes from producing something that works for the end-user - whether that's a gamer, a doctor, or someone browsing on their phone.

Games is one of the easiest sectors to be in, the complexity level is low compared to other sectors, especially now all the difficult bits are done for you (off the shelf engines). I assume you're not working on titles that have 15 million lines of code in, like I do every day.

That is possibly why there hasn't been a big push for FP in games yet, because they haven't yet hit the complexity limit of OO like others have. Your argument is "Well we've always done it like this, why change?". Probably for the same reason people don't write games in assembly languages any more: at some point the current tools aren't good enough. Games might not have hit that ceiling yet, but you'll almost certainly be affected by the artefacts of getting close to it.

I see you're even questioning my history in games. Below [1][2][3][4][5][6][7][8] are some of the titles I worked on (or my code is in). But to be clear, we didn't use these techniques back then either, because it was too soon to be doing it (or we weren't enlightened then). You should consider being a little more open minded about how you could, possibly, write software more effectively. Especially when you have someone who's been on both sides of the fence advocating it. Just writing it off because you think people who do FP just write "compilers or similar processing throughput systems" is shooting yourself in the foot more than anything.

[1] Battalion Wars: https://en.wikipedia.org/wiki/Battalion_Wars

[2] Fire Blade: https://en.wikipedia.org/wiki/Fire_Blade_(video_game)

[3] Warmhammer 40,000 Fire Warrior: https://en.wikipedia.org/wiki/Warhammer_40%2C000%3A_Fire_War...

[4] Call of Duty - Finest Hour: https://en.wikipedia.org/wiki/Call_of_Duty:_Finest_Hour

[5] The Regiment: https://en.wikipedia.org/wiki/The_Regiment_(video_game)

[6] Reign of Fire: https://en.wikipedia.org/wiki/Reign_of_Fire_(video_game)

[7] Attack of the Saucerman: https://en.wikipedia.org/wiki/Attack_of_the_Saucerman

[8] Lunatik [canned]: https://www.unseen64.net/2020/11/16/lunatik-pure-entertainme...

nec4b · on Feb 18, 2023

>>That is possibly why there hasn't been a big push for FP in games yet, because they haven't yet hit the complexity limit of OO like others have.

Are you saying the biggest and probably most complex software (OS-es, browsers, etc.) out there is written in fp languages?

>> Games is one of the easiest sectors to be in, the complexity level is low compared to other sectors, especially now all the difficult bits are done for you (off the shelf engines).

As far as I know those "off the shelf engines" are not written in fp languages. And I would say they are complex.

louthy · on Feb 18, 2023

> Are you saying the biggest and probably most complex software (OS-es, browsers, etc.) out there is written in fp languages?

You know I didn't say that.

The imperative coding approach is cognitively more difficult. So at scale (size of code-base) you either have more bugs and therefore more overhead/cost of development; or you look for coping strategies to minimise the bugs and the cost of development.

Declarative, pure functional programming reduces the cognitive load, because you get to compose things reliably. And because it's declarative and pure you don't have to look under the hood to see what is going on, you know from the surface what large bodies of code do.

A simple example, two functions prototypes:

One in Java

    int foo(string x);

One in Haskell:

    foo :: String -> Int

I can tell you much more about the Haskell one than I can the Java one. The Java one could launch nuclear missiles for all I know, the Haskell one can't and therefore the returned Int has some relationship with the String argument, probably the length of the string. When you don't need to look inside you are more productive. When you can reliably compose code it is more robust and easier to build. When you declare your side-effects up-front, you can make better judgements about what the outcome of calling any function might be.

You are of course free to take whichever approach you like. The imperative one is however just going to get progressively more difficult as applications get more complex. I spent 27 years writing various combinations of assembly, procedural, and OO. I've spent the past 10 using the functional paradigm. If I ever had to go back to the imperative paradigm it would fill me with dread. I still need to use it occasionally for performance optimisations, but I would always wrap it with a declarative interface (where the contract always holds).

> As far as I know those "off the shelf engines" are not written in fp languages. And I would say they are complex.

When I talked about 'easy', what I meant was that games development has much less of the real world to deal with:

* You tend to know your user

* The user doesn't get to have a say

* You often know exactly what hardware you're going to be running on

* You don't have to integrate with 20 other systems by 3rd party providers

* To a certain extent you can put it in a box and move on (I know that's not strictly true, but you're unlikely to be maintaining a game for more than a decade)

* The feature-set of the game won't double every few years as the product competes with others

* Most games have a relatively similar management system

* There aren't massively complex compliance issues or laws that get in the way (yes there are some, but less than other industries)

* Although networking and game-state synchronisation is a hard problem, the scope of distributed computing problems are small

A game engine really isn't that complex. Of course it's hard to make it as optimal as possible, but the problem isn't at the real difficult end of the spectrum. Probably the hardest thing in games now is writing the tooling. I spent a number of years writing core tech and tools for a 3 studio games company and that was just at the beginning of the tooling journey. I assume that's quite a sizeable coding challenge now; one I would definitely us FP for if I was still in the industry.

jmrobles · on Feb 18, 2023

The problem is even bigger when you are reviewing PRs, the lack of context is the big problem,imo.

KMag · on Feb 18, 2023

I agree.

Good code design, regardless of paradigm used, encapsulates complexity in order to minimize the amount of context needed to understand any one bit of code. The better the encapsulation, the better we can scale our limited mental capacity to larger and more complex systems.

If the complexity is better encapsulated, it's also faster and easier for the reviewer to acquire the necessary context.

When I was first learning OOP, I wish someone had mentioned why classes are useful: they leverage our natural mental abstraction machinery for understanding the world. Walking down the road, we generally think of each car as a whole and ignore thinking about the thousands of parts that make it up. It would be overwhelming to simply cross a busy street if we were considering all of the millions of car components flowing down the streets or considering the various differences in cars all the time, rather than abstractly thinking of various differing collections of parts as just "car". All of this happens at a subconscious level, and we can consciously tap into some of this mental machinery to help us reason about code. Of course, we don't need OOP to leverage this mental machinery, but it's one technique for doing so.

simplotek · on Feb 18, 2023

> So many of the problems I see come from developers (myself included) adding new business logic to a process, or fixing a bug and not thinking through the side effects of the change

That's a great take, but I'd turn it upside down. Problems come from developers adding new features while not being mindful of the importance of helping out whoever is reading that code in the future to understand what are all the possible states.

Anyone can hack together code that's all over the place, but forcing people in the future to do shotgun surgery is as bad as adding the bugs ourselves.

dzdt · on Feb 18, 2023

One thing I've seen that helps out immensely with this in a large complex code base is extremely liberal use of ASSERTs.

Every time you have a function or clause that takes an object or data with many possible versions or states, but only will work correctly with particular assumptions, ASSERT that your assumptions are satisfied. Every time.

This helps in several ways. At the time of writing the code, it makes you think about possible states that could be represented and the generality of what your new code actually handles, and so encourages you to write general solutions and avoid unhandled or mishandled cases.

At the time of reading the code, it helps the reader see what cases the writer considered. It helps document the intended scope and use cases the code is designed for.

At the time of extending surrounding code, it helps to catch cases where the extension introduced state combinations that had not been considered before. Its nice to make an extension, run a test case, and get an assert that tells you exactly what other function needs an extension. Without the assert you would make the extension and run your test case and get wrong behavior or a crash and have a big debugging mystery or even worse: apparently working but subtly wrong code.

Others in the thread are talking about making invalid states unrepresentable, which is nice, but not always doable. In a big code base you don't have the luxury of redesigning things; usually you have to stick to more local changes. But you can always add ASSERTs -- that is a completely local way to mark invalid states as invalid.

lucidguppy · on Feb 18, 2023

Haskell case expressions are nice. It seems like you can define which input values a function can process - and you can get a log of if/else logic removed.

makeitdouble · on Feb 18, 2023

I am always wondering how much are we expected to understand the code we’re working on.

It is very easy to just say people should know everything, but in practice people don‘t review every single library, nor defend against every single possible case, and surely don’t keep everything in memory while working on any single problem.

I really can’t imagine someone “fully understanding” how the linux kernel works for instance, or even a full graphic card driver, or the software on a POS works as working knowledge.

So yeah, people cause bugs because they don’t know it all, but I’m not sure that’s saying much.

PS: real progress is probably made starting from assuming people don’t understand, and having frameworks that somewhat keep the system manageable under that premise.

boppo1 · on Feb 18, 2023

Total newb here, but according to Stroustrup, the whole point of programming is so you don't have to know everything. You're responsible for:

A. Understanding relevant computing fundamentals

B. Understanding the interfaces to code you use

C. Writing reliable implementations and documenting them well

D. Writing useful and appropriate interfaces for your implementations

In a perfect world, this creates a nice chain where every programmer only needs to look after these four points and then everything will work seamlessly.

Too bad we don't live there.

iExploder · on Feb 18, 2023

this is why I think tests are so vital.

good luck trying to fix a bug in a complex project without doing some sort of test to make sure the change does not break one of the thousands of requirements implements so far.

from my experience fixing a bug without validating the system has a very high chance to spawn multiple bugs found later on...

_448 · on Feb 18, 2023

I think the best way to solve this problem is by writing small digestible message passing functions (like Erlang) and each function written using design by contrast (like Eiffel). That way the situation can be bought under some control when the software scales.

taberiand · on Feb 18, 2023

See also the Joe Armstrong talk "The Mess We're In"

https://youtu.be/lKXe3HUG2l4