> A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in.
I think this probably the most important concept for programmers to keep at the front of their minds when working day to day. So many of the problems I see come from developers (myself included) adding new business logic to a process, or fixing a bug and not thinking through the side effects of the change
It took me awhile to realize but this is one of the big things that algebraic data types (ADTs) help you to do...design your data types so they have exactly the number of valid states that your domain has. To use another common FP way of saying it...make invalid states unrepresentable.
It's not just the ADTs but the exhaustive pattern matching that help make sure you've covered all of the cases in your ADT. You'll see JavaScripters use objects as a poor man's ADT where they'll use the key name as a the constructor or a { tag: key, ... }, but that has caveats (aside from feeling less first-class): 1) JavaScript can't do exhaustive pattern matching so you'll always need to handle null cases, 2) checks must all happen at runtime which means you'll end up having to throw/catch exceptions and nothing will hold your hand to say that you've missed a case. TypeScripters can handle 1 & 2 for safety, but the ergonomics are bad and verbose so you'll see a lot of folks skip it. Similar languages will have pattern matching, safety, but lack the lightweight/dense ergonomics. When you dive into a language in the ML family (Haskell, OCaml, Standard ML, et. al.), the ergonomics make you want to use them, and they are idiomatic to the ecosystem and you'll want to use them for just about everything--either the ones in the Preludes/stdlib like Maybe, Either, List, etc. or by building them yourself.
An example in the wild that demonstrates this `fp-ts`'s explanation of how to do ADTs in TypeScript where you can see the comparison in PureScript is a one-liner (both data and polymorphic data): https://github.com/gcanti/fp-ts/blob/master/docs/guides/pure...
That Typescript ADT example is heavily outdated (or explicitly obtuse)
class Bar { tag="bar" as const; constructor(public value:string){} }
class Baz { tag="baz" as const; constructor(public value:boolean){} }
type Foo = Bar|Baz;
That's true if pattern matching specifically is what you want, in practice we don't constructs ADT's for the sake of it and rolls with interfaces that also meshes well with data imported from other sources (ie JSON).
Also while pattern matching isn't an explicit thing in TS you get most benefits from compiler checked accesses that can recognize type-tests in your code and derive types and do property access checks.
The nice thing about ADTs is that you can couple the state variables with the enum, so that the representable state is better tied to the enum value. (Yes, you can kind of do this in C with a discriminated union of a struct holding an enum and a union, but the type system doesn't stop you form grabbing a union member that doesn't match up with the current enum value.)
To clarify: Swift enums are ADTs, because they can hold data for each case. Same as Rust enums. This contrasts to C enums which are not ADTs because they cannot store additional data.
I like Swift but the examples on that page are quite over-engineered. Reads like someone was seduced by their own cleverness. There are much simpler ways to achieve what is being done in that example.
If you read that page, you’d understand that it was supposed to be “over-engineered.” I assume that you did read it, and took the least charitable viewpoint. I didn’t post it for peer review. It reflects my enthusiasm, as I was learning the language, and is probably four years old.
I'm a fairly decent chap, and have nothing against you. Not sure why it's so important to you to begin our relationship with an attack. I suspect we have far more in common, than we do, differences.
My apologies, I didn’t realize you were posting a link to your own site! I would have phrased my comment differently, although probably that should not have made a difference and I should’ve phrased it differently anyway.
And mine. I just get tired of the "Attack first" approach that everyone takes, these days.
I know that I sound like a demented old man, when I say that I miss the old days, when we all shared passions for the tech, and, even when we were in competition with each other, we tended to have respect for one another.
But those were also the days of the UseNet flamewars, so it wasn't all flowers and bunnies (and I was definitely a flame warrior).
ADTs is a concept I've had a tough time understanding; the approximate definition that made the most impact with is considering them as a specific type of enumerations: Ones where each variant holds a value. Is this correct?
Can you explain the term 'cartesian product' in this context? I'm familiar with the cross product from undergrad linear algebra but I haven't applied algebra to types before, and I don't understand what two sets would produce a single Int as their product.
I haven't studied much theoretical computer science so I'd love to hear some good beginner resources on this stuff.
The notion that your use cases have static well defines use cases is quaint. In practice, I've seen endless debates about whether state ahold be represented as a variable, parameter, type, or member, often with good arguments for all sides.
I actually just started to refactor some of my code earlier this week to deal with this very problem.
It was related to deserialized data being transposed from one complex digraph to another. The reason I did not catch it as being problematic sooner was precisely because the state management was loosely distributed within the transposing functions' call hierarchy. The code which was trying to manage the states had emerged from fixes for a series of different problems related to edge cases, and once the new problems started relating to fixes for the others, I had that "Ahhh, well sh*t.." moment you get when finding you were missing something important.
> I think this probably the most important concept for programmers to keep at the front of their minds when working day to day.
The lesson I had learned was to be more mindful when synthesizing a solution and there might be more state interdependence than we expect. Centralizing state management has helped me solve it in this case.
Adding logic is the smallest part of the problem: adding types, that aren't constrained to represent the smallest set of values needed for the problem at hand, that's the biggest problem. This 'looseness' doesn't show up as a compilation error and looks on the surface to be just fine. Then the real-world comes along and creates a value that should never exist and the code falls over.
Constraining the data by properly defining types is the (primary) way to reduce the possible states an application can get in. This can be via sum-types (discriminated unions), or even making sure you don't use `int`, `string`, etc. and other types that are just one step away from being dynamic.
When doing pure functional programming, along with properly constrained types, you tend to find the code writes itself. What I mean by this is that it becomes very much about composing functions via 'type lego'. Once the types fit, your code works [1]. This is a very profound feeling once you grok it, because it gives oneself a feeling of confidence in ones own code that doesn't happen in imperative land. I will sometimes go days or weeks without even running the code I'm writing because I just have confidence in it.
[1] It's still possible to write logic errors, but they're the only major class of bugs you get really.
And how do you know you've constricted your data enough? You don't and you get bugs. You don't have a perfect specification you are building.
In the micro scale in terms of variables this is not the biggest problem that exists with programs. Enums don't somehow become garbage because it's technically possible to overwrite them with garage. The macro state of the entire system adds much more complexity and bugs. There is too many things that you have to keep in mind and systems are too big for one single person to understand it all. Pure functional still has this macro state. The state is stored in the program counter and by carrying along the current state with it.
If you break any programming task into fine enough steps it will feel like lego. It's nothing special. If I need to print out the numbers 1 through 10 there is a single canonical way my whole team could come up with independent from one another.
>I will sometimes go days or weeks without even running the code I'm writing because I just have confidence in it.
On a small scare this is possible but once you start working on a larger team or if you have to integrate your code with constantly changing systems you will want to be testing your code.
>[1] It's still possible to write logic errors, but they're the only major class of bugs you get really.
You can still make security bugs and performance bugs. Logic bugs is a giant category of bugs and are what most bugs belong to.
> And how do you know you've constricted your data enough?
Types are composable. Sum-types and product-types allow the composition of smaller types into larger ones. They come in the form of discriminated-unions and records in FP languages. So when it comes to the question of how do I know when I've constricted my data enough, it's when I know all the components of all types have been constricted enough.
You don't have to design some massive schema for your entire application. You can absolutely do it as you go. Just create a type that represents the smallest set of values that are needed for whatever concept it reprsents. Then compose those.
> Logic bugs is a giant category of bugs and are what most bugs belong to.
I think logic bugs are relatively rare once you constrain the possible states your code can be in - and they usually come about because of some cyclomatic complexity issue (i.e. putting the if/else branches the wrong way around). Logic just happens to be the bit of your codebase that can't be captured in types (well, not with trivial types anyway). It's the bit of what we write that we interpret from the human world. That's our job, to translate the requirements into code, and the logic is that bit and therefore more prone to human error. I still find I have fewer of them when I have good constraining types that limit the scope for fuckups.
The current state of the world can absolutely be captured in types, then the logic is simply a fold over a sequence of events that take the old state of the world and transforms it into a new one. This is the best way of modeling time also, which is a major source of bugs in imperative code. Mutating data structures kills any effective modeling of time, so cheats like locks are needed. Even with locks there's no effective history and no understanding of now, which makes code difficult to reason about.
That is the idealised end goal of pure functional programming. The reality is most people will still use an `int` when they should use a constrained value (say [1..10] rather than [int.min..int.max]); and so their program can still get into a bad-state™. The more you constrain though, the fewer bugs you will have.
This is the lot of most devs, working out how much risk you're willing to take.
* Those using dynamic languages take the most risk. They do it because they believe they can write code quicker. Their trade off is lots of runtime errors, or runtime checks, or building lots of unit tests.
* Those using languages like Java or C# take fewer risks, as the compiler can help them out, but they don't tend to create types like `Month` that can only hold the constrained set of ints [1..12]. So they run the risk of accepting 13. They also have extensible type-hierarchies (inheritance) that means its impossible to know the full scope of any 'abstracted' interface reference passed around. They must cross their fingers and hope nobody screwed up the implementation. They're also not composable, which leads to additional logic problems as they're glued together manually.
* Those using languages like Haskell don't have the inheritance problem. Everything is sum-types, product-types, or exponential-types (functions) and is therefore composable. This creates an opportunity to create the idealised program (as I mention above). Logic bugs still exist and usually people don't constrain their types completely, but the situation is often much less risky and turns into more robust code.
> On a small scare this is possible but once you start working on a larger team or if you have to integrate your code with constantly changing systems you will want to be testing your code.
I run two development teams. But I also write a lot of research prototypes, this tends to be where I will just have some open ended project that I am playing around with - often programming with types to see what 'level ups' I can get from doing so. But even in a day-to-day project I could easily spend a few days writing code before I need to check it. For example, I recently wrote a whole language virtual-machine: AST, type-system, type-inference, type-checking, and evaluation without needing to run it - all the types lined-up and each subsystem was clear from a logic point-of-view. Currently, I'm building a C# monad-transformer system for language-ext [1] based around transducers, this is pure abstract type-level programming - I haven't run it yet. If I can run it in my head and it type-checks, that's usually good enough.
Of course at some point I do actually run the thing ;)
>I know when I've constricted my data enough, it's when I know all the components of all types have been constricted enough.
Do you not see that this is a tautology?
>That's our job, to translate the requirements into code, and the logic is that bit and therefore more prone to human error.
The requirements given are typically ambiguous and it is our job to come up with the complete specification ourself. Our specification may have bugs ignoring the implementation. I am not denying that with enough diligence you can mostly avoid errors in implementing the spec, but I am saying finding the correct specification for what your code should do is too hard to do compared to implementing a wrong spec and then fixing it as bugs come up.
>Those using languages like Haskell don't have the inheritance problem. Everything is sum-types, product-types, or exponential-types (functions) and is therefore composable.
Inheritance is just a different way to create sum types.
>I know when I've constricted my data enough, it's when I know all the components of all types have been constricted enough.
> Do you not see that this is a tautology?
It might be if you hadn't butchered what I wrote. If I have a type called Month, and it can only accept values 1-12. Then I know I have constrained that type enough. If I then create types called Day and Year and constrain those I know they're constrained enough.
If I then compose Day, Month, Year into a type called Date and check the rules of the number of Days in a month so that an invalid date can't be instantiated then I have a more complex type leveraging the simpler ones. I could then put Date into a record type called Appointment, etc. etc. For each type I create I know what data I am trying to represent, so I constrain the type at that point. There's no tautology here, it's just composition of types. Making larger types from smaller ones and making sure they can't ever hold bad state.
> Inheritance is just a different way to create sum types.
Not really, they're open. Sum-types are closed. i.e. they represent a concrete set of states. The openness of an inheritance hierarchy is the problem. We nearly never need completely open type hierarchies, it is pretty rare that something needs to be extensible in that way outside of library development.
That doesn't mean inheritance is always bad, but the 'always inheritance' approach that is often championed in the OO world certainly is.
The goal is to write total functions [1] where every value in the domain can be mapped to a value in the co-domain. If that can't happen then your code is less predictable (throws exceptions or has undeclared side-effects). Having complete control of the states of any data-type is critical to making the approach easy to implement.
This breaks down in the etc. What if an appointment is not valid on them weekend, or a holiday, or when someone's child has soccer practice, or when that appointment conflicts with another one. There are all sorts of restriction that can be added and yes they technically cover some potential bug that someone technically could introduce. It may be hard to predict all of these restrictions ahead of time.
>Not really, they're open
At compile time 99% of the time they are closed. Languages also add features to let you make it closed.
>If that can't happen then your code is less predictable (throws exceptions or has undeclared side-effects).
This is impossible in the real world where hardware issues are real. Exceptions and crashes are okay and is just a fact of life when working with large software syshems. We know these systems are not going to be perfect and embrace it rather than trying to perfectly handle every single thing perfectly every time. It's accepting that these bugs exist and that processes should be put in place in identifying them and monitoring them so they can be fixed.
Total functions also don't allow for infinite loops which is common place in real systems that mare expected to potentially run forever instead of only being able to serve 1000 requests before exiting.
> This breaks down in the etc. What if an appointment is not valid on them weekend, or a holiday, or when someone's child has soccer practice, or when that appointment conflicts with another one.
Either the rules are an instrinsic part of an appointment or they're not. If you need something that represents exceptions to the state of a basic appointment then that's the job of the containing type (say a type called WeeklySchedule or something like that). Just like `Day` can be [1..31], but when used in a a `Date` then the constructor of `Date` wouldn't allow a `Day` of 31 and a `Month` of 2.
> At compile time 99% of the time they are closed. Languages also add features to let you make it closed.
You can't close an interface.
> This is impossible in the real world where hardware issues are real.
Exceptions in imperative-land tend to be used for all errors, not just exceptional ones, but expected ones. The problem is that no one knows what these side-effects are from the surface (i.e. a function prototype doesn't expose it).
Truly exceptional events should probably shut down the application (things like out-of-memory), or, as with Erlang, escalate to a supervision node that would restart the service.
Exceptions exist in functional languages too. However, expected errors, like 'file not found' and other application level 'could happen' errors should be caught at the point they are raised and represented in the type-system.
For example, a common pure pattern would be `Either l r = Left l | Right r` where the result is either a success (Right) or an alternative value (Left - usually used to mean 'failed').
When your function has this in its declaration, you are then aware of possible alternative paths your code could go down. This isn't known with exceptions.
For example the `parseInt` from C# and Haskell (below), one declares its side-effect upfront, then other doesn't. It may be semi-obvious that the C# one must throw an exception, but that isn't the only outcome - it could return `0`.
int parseInt(string x);
parseInt :: String -> Either Error Int
This confusion of what happens inside large blocks of code is exactly the issue I'm trying to highlight.
> Total functions also don't allow for infinite loops which is common place in real systems that mare expected to potentially run forever instead of only being able to serve 1000 requests before exiting.
In pure maths, sure, in reality yeah they do. Haskell has ⊥ [1] in its type-system for exactly this reason.
Anyway, you seem hellbent on shooting down this approach rather than engaging in inquisative discussion about something you don't know or understand. So, this will be my last reply in this thread. Feel free to have the last word as I'm not wasting more time explaining something that is literally about soundness in code and why it's valuable. If you don't get it now, you never will.
...in theory. The problem is usually that you had already spent a lot more time designing all your type constraints before "the code can write itself", then it would have been to just sit down and write some code to get the damn thing working first.
And at least IME, spending so much time upfront without any visible process results in too rigid "over-architected" systems which are hard to extend when the requirements ineviatably change next month (or even just an external system your code needs to talk to).
The best solution is usually to not give in to either extremes, but find a well balanced approach somewhere in the middle where you have some sane restrictions designed into the types to prevent the most obvious bugs, but not enough that the whole house needs to be rebuild if a little requirement detail changes.
> I will sometimes go days or weeks without even running the code I'm writing because I just have confidence in it.
...that's *exactly* what I mean, just that I don't see this as a good thing ;)
You're mischaracterising what I am saying. You're talking about 'over architected' systems. That doesn't represent how I work or anything I am advocating. Types are composable, when you design a system or sub-system, make sure those types are constrained to represent the smallest set of values the sub-system needs. Then those types can be composed with other types to make larger systems. You start small and build out with composition, you don't need to architect some epic type-schema in one sitting.
It's really not that hard to do (well in FP languages anyway) and ends up improving throughput for any coding task due to the reduced need to constatly check if something does what you think it should do.
> The best solution is usually to not give in to either extremes, but find a well balanced approach somewhere in the middle where you have some sane restrictions designed into the types to prevent the most obvious bugs.
That isn't the best solution, it's a solution. As I mentioned in another comment [1], it's about risk, and how much you're willing to take on as a developer and as a team. I work in healthcare software, I want it to be right, because I don't want my fuck-ups to kill a patient. But even if I didn't, I'd still prefer to write code this way because it is sooo much easier to manage the mental model of the code-base, especially when the code-base is super-large.
> but not enough that the whole house needs to be rebuild if a little requirement detail changes
That isn't my experience at all. If the whole house needs a rebuild when some little requirement detail changes then you're doing it wrong, sorry. One thing that can happen when using sum-types instead of inheritance, is that adding a new case to the sum-type requires you to go and consider all pattern-matching usages of it. This can be painful, but it forces you to consider the impact of your change. I believe this to be a good thing and compilers usually help you here anyway. This trade-off is captured in the Expression Problem [2]
Games and many similar domains are harder to write code for than CRUD. What you talk about works well for CRUD apps, but basically every set of development ideas works well for CRUD apps, it isn't hard to do.
As an ex-3D engine dev, I respectfully disagree. But if you're happy to give some examples of why you can't constrain your data-types then I'm happy to admit I'm wrong.
I realise sometimes there's aspects of a game that needs to write to the metal, and compromises need to be made, but that isn't necessary for an entire game. I know people are using my language-ext library with Unity projects too, so I assume there are others in games that don't agree (although I don't know exactly what kind of titles they're working on).
Cool library. I've had a few of these patterns in my Sasa library for years, but you've taken it to the Haskell extreme! Probably further than most C# developers could stomach. ;-)
You might be interested in checking out the hash array mapped trie from Sasa [1]. It cleverly exploits the CLR's reified generics to unbox the trie at various levels which ends up saving quite a bit of space and indirections, so it performs almost on par with the mutable dictionary.
I had an earlier version that used an outer struct to ensure it's never null [2], similar to how your collections seem to work, but switched to classes to make it more idiomatic in C#.
I recently started sketching out a Haskell-like generic "Deriving" source generator, contrasted with your domain-specific piecemeal approach, ie. [Record], [Reader], etc. Did you ever try that approach?
> so it performs almost on par with the mutable dictionary.
I notice you have a comment that says "This trie is now the fastest immutable dictionary I'm aware of". Unfortunately I have to make you aware that my implementation is faster, sorry! ;)
My other benchmarks are still running, so I won't share them all, but we actually trade blows, your ContainsKey appears to be faster, my collection iteration is twice as fast as yours. I suspect if I removed my struct wrapper it would be a bit quicker, but wouldn't be a real-world fair comparison.
We're both much faster than Microsoft's ImmutableCollections though!
I am certainly interested in what that technique is with the nested Node<Node<Node<Node<... and why it works. Do you have any more detail on that? You might also want to look into the data-structure I use: CHAMP [1]. I notice you use HAMT, CHAMP is a more efficient version of HAMT. It has better data-locality, which is much more efficient especially when working with value-types.
With regards to the 'deriving' question: Yeah, I'm working on a more general solution now, as well as a way to do monad-transformers properly. I've also figured out a way to do infinite tail-recursion for my monadic types, which I'll be deploying in the next major update.
Nice! I'm away for the weekend so will follow-up with an explanation of the nested generic when not on the phone if the following is unclear: the basic idea is that you inline the next level of the Trie into the previous level, thus collapsing the depth and removing some indirections. This slightly bloats the internal tree nodes but the leaves vastly outnumber the internal nodes, and since the leaves have been inlined one level up, it ends up compacting it somewhat. This is maybe clearer when starting with a struct-only design for the internal nodes and then performing the inlining as an optimization on that.
Also wondering whether you checked the Trie from the latest repo or the older struct version I also linked. I think the struct version was a little faster but I switched for the aforementioned idiomatic reasons.
I'll have to check out the CHAMP design for sure, I'm curious how it differs.
Edit: I also recall that I didn't optimize iteration at all and that it was fairly inefficient, but can't really confirm now. I recall optimizing merge carefully though so that should perform well.
Edit 2: forgot to mention that Node<...> generic is nested 6 times because that's the max depth of the trie for 32 bit keys with 32 element nodes.
> Also wondering whether you checked the Trie from the latest repo or the older struct version I also linked
I included Sasa.Collections 1.0.0-RC4 from Nu-Get. So I assume it's your latest version. I've added Sasa.Collections permanently to my benchmarks seeing as you're the first library to get close to the performance of the language-ext collections - I'd like to keep an eye on you, haha! Always good to have a bit of friendly competition :)
> forgot to mention that Node<...> generic is nested 6 times because that's the max depth of the trie for 32 bit keys with 32 element nodes.
Yep, I figured that one. Still I look at it and go 'huh?'. I've been looking at code all day though, so I'll have a proper look tomorrow when less tired.
struct Node<TKey, TValue>
{
uint bitmap;
KeyValuePair<TKey, TValue>[] entries;
Node<TKey, TValue>[] children;
// Node = Leaf of bitmap * children | Internal of bitmap * children
// We inline that sum into a struct where
// Leaf -> entries != null
// Internal -> children != null
}
Now consider the case where leaves are somewhat sparse. That means the children arrays will have few entries, so instead of allocating a bunch of one-element arrays for children which themselves allocate a bunch of one-element arrays of values, just inline the value arrays in the parent. Now propagate that same idea for internal nodes all the way back to the root.
Also, because these are all structs, the JIT should specialize the code generated for each level so it should be pretty fast. IIRC, the speed difference between the original and the inlined representation was noticeable but not crazy, but the memory use difference was considerable. This was all years ago though, first implementation was 2013.
I belatedly realized this might require some explanation too. Sorry, too many years of experiments and microoptimizations.
This is an unboxed representation of a sum type. Rather than relying on the CLR's inheritance to model sums I lay it out explicitly at the cost of a little space. It may seem wasteful since it looks like I'm adding 8 unused bytes to each node, but the CLR object header is 16 bytes so you're actually saving 8 bytes per node by avoiding classes and using structs that have no header! You can really see the difference in the working set if you do a full GC.Collect and check the allocated bytes after constructing some large tries.
Also, because I'm removing a heap object per node, this representation incurs one less indirection per node accessed, so I'm doing half as many loads as an equivalent class-based representation for every operation. This is a large constant factor difference, and probably partly explains why Sasa's ContainsKey is faster than yours.
Finally, no trie operatioms incur any virtual dispatch, which are instead replaced by a simple if-checks which are generally more predictable. This probably has less impact these days since branch predictors are pretty good now.
I think if you replaced your class-based representation that uses the superior CHAMP model with this unboxed representation your hash map would be the uncontested champion. You might even challenge the mutable dictionary for performance!
> You might even challenge the mutable dictionary for performance!
Don't tell me that, I'm an optimisation nut, I might actually do it! ;)
I woke up this morning with a clear understanding of what your optimisation is, so I guess I just needed to load up my brain with the details! It's definitely a super interesting approach and one that I'm surprised I've not come across before.
I'm definitely a bit wary of changing my current implementation, mostly because it's been battle hardened, but I'm sure I won't be able to resist having a go at some point. Thanks for the tip!
Do it! Do it! Do it! If only for a couple of tests to see how much it changes the results. ;-)
Unboxing sums is a nice optimization but then you can't naively use switch-patern matching to deconstruct them.
I have a bunch of other things in Sasa and other libraries you might find useful. I'm not actively working on most of them anymore except for bug fixes. I learned a lot but didn't end up using a lot of these features as much as I'd hoped.
For instance, being able to create open instance delegates in a way that automatically works around the CLR limits against such delegates to virtual methods. Some of the concurrency primitives are also interesting, as they implement an efficient atomic read/write protocols for arbitrary sized types using only volatile reads/writes (ie. avoid torn reads), and a sort of LLSC using only volatile read/write and a single interlocked inc/dec. Also, I added a kind system to CLR reflection to make working with it much easier [2].
It seems we're thinking along the same lines for numeric types. I reproduced the Haskell numeric hierarchy [1], but I put that on hold because I was thinking a [Deriving] attribute would eliminate a lot of redundancy.
Just FYI, clicking Num<A> on the main GitHub markdown page doesn't jump to the link on the markup.
Lots more to see if you're interested! I've played with parser combinators but never liked how they turned out, and settled on a simpler approach that was pretty interesting.
3d engine is very structured, it isn't the creative part of game development.
Lets say you want to test your bullets having HP and can be shot down. So you add that in an hour, now bullets have HP, you test it, it doesn't work, so you remove it. Things like that needs to be seamless, because you need to iterate on ideas and test if things are fun. Big upfront design as you suggest where you very carefully constrain everything doesn't do well with that. It works well for 3d engines for example sure, but 3d engines are much more constrained than games.
> 3d engine is very structured, it isn't the creative part of game development.
I also did lots of the creative bit ;). I just assumed this was going to go down a performance route question, sorry!
> Big upfront design as you suggest
No, I am not suggesting that at all - I'm saying the opposite of that. You're the third person to think that's what I am saying now, so I assume I didn't explain myself very well. Rather than repeat myself again, my other replies cover this [1] [2]:
Well, you say that works, but basically nobody who programs creatively writes code that way. People who like functional programming mostly writes compilers or similar processing throughput systems, you see it a ton there, but you barely see it on the other side of the programming landscape such as game programming.
So, you say it works for you, but basically every example of the things I talked about are done in a very different way than you describe, so unless you have programmed some very impressive games I would say that the data strongly supports another style of programming than you suggest here. For compilers or parsers or stuff like medical systems? Sure, go ahead, it seems to work for those domains, but it doesn't seem to be a panacea. If games were so much easier to write that way we would see tons of them, especially among the better games, but we don't.
> If games were so much easier to write that way we would see tons of them, especially among the better games, but we don't.
Functional programming only started going mainstream a few years ago. Give it time, it will make an impact. You can already see the influence of functional reactive programming in UI frameworks like React, where the screen state is a pure function of the program state. That's exactly what games have been doing for years, ie. rendering every from based on program state, only now we have a purely functional model of how this works.
Some of the "purely functional" aspects will be optimized/compiled away at various layers, but the programmer would still be able to work with the high-level model without attending to those details.
The biggest mistake you make is thinking you do something different to me or any other software engineer. You don't. You do the same thing and the same rules apply. You seem to think you're the only person doing anything creative. Well here's a newsflash: it's the same process in games as it is in all other forms of software development. The end result might look different, but the process is the same. The best devs are the creative ones, but creativity isn't limited to what appears on the screen, it comes from architecting complex solutions to difficult problems, it comes from elegance in code, and it comes from producing something that works for the end-user - whether that's a gamer, a doctor, or someone browsing on their phone.
Games is one of the easiest sectors to be in, the complexity level is low compared to other sectors, especially now all the difficult bits are done for you (off the shelf engines). I assume you're not working on titles that have 15 million lines of code in, like I do every day.
That is possibly why there hasn't been a big push for FP in games yet, because they haven't yet hit the complexity limit of OO like others have. Your argument is "Well we've always done it like this, why change?". Probably for the same reason people don't write games in assembly languages any more: at some point the current tools aren't good enough. Games might not have hit that ceiling yet, but you'll almost certainly be affected by the artefacts of getting close to it.
I see you're even questioning my history in games. Below [1][2][3][4][5][6][7][8] are some of the titles I worked on (or my code is in). But to be clear, we didn't use these techniques back then either, because it was too soon to be doing it (or we weren't enlightened then). You should consider being a little more open minded about how you could, possibly, write software more effectively. Especially when you have someone who's been on both sides of the fence advocating it. Just writing it off because you think people who do FP just write "compilers or similar processing throughput systems" is shooting yourself in the foot more than anything.
>>That is possibly why there hasn't been a big push for FP in games yet, because they haven't yet hit the complexity limit of OO like others have.
Are you saying the biggest and probably most complex software (OS-es, browsers, etc.) out there is written in fp languages?
>> Games is one of the easiest sectors to be in, the complexity level is low compared to other sectors, especially now all the difficult bits are done for you (off the shelf engines).
As far as I know those "off the shelf engines" are not written in fp languages. And I would say they are complex.
> Are you saying the biggest and probably most complex software (OS-es, browsers, etc.) out there is written in fp languages?
You know I didn't say that.
The imperative coding approach is cognitively more difficult. So at scale (size of code-base) you either have more bugs and therefore more overhead/cost of development; or you look for coping strategies to minimise the bugs and the cost of development.
Declarative, pure functional programming reduces the cognitive load, because you get to compose things reliably. And because it's declarative and pure you don't have to look under the hood to see what is going on, you know from the surface what large bodies of code do.
A simple example, two functions prototypes:
One in Java
int foo(string x);
One in Haskell:
foo :: String -> Int
I can tell you much more about the Haskell one than I can the Java one. The Java one could launch nuclear missiles for all I know, the Haskell one can't and therefore the returned Int has some relationship with the String argument, probably the length of the string. When you don't need to look inside you are more productive. When you can reliably compose code it is more robust and easier to build. When you declare your side-effects up-front, you can make better judgements about what the outcome of calling any function might be.
You are of course free to take whichever approach you like. The imperative one is however just going to get progressively more difficult as applications get more complex. I spent 27 years writing various combinations of assembly, procedural, and OO. I've spent the past 10 using the functional paradigm. If I ever had to go back to the imperative paradigm it would fill me with dread. I still need to use it occasionally for performance optimisations, but I would always wrap it with a declarative interface (where the contract always holds).
> As far as I know those "off the shelf engines" are not written in fp languages. And I would say they are complex.
When I talked about 'easy', what I meant was that games development has much less of the real world to deal with:
* You tend to know your user
* The user doesn't get to have a say
* You often know exactly what hardware you're going to be running on
* You don't have to integrate with 20 other systems by 3rd party providers
* To a certain extent you can put it in a box and move on (I know that's not strictly true, but you're unlikely to be maintaining a game for more than a decade)
* The feature-set of the game won't double every few years as the product competes with others
* Most games have a relatively similar management system
* There aren't massively complex compliance issues or laws that get in the way (yes there are some, but less than other industries)
* Although networking and game-state synchronisation is a hard problem, the scope of distributed computing problems are small
A game engine really isn't that complex. Of course it's hard to make it as optimal as possible, but the problem isn't at the real difficult end of the spectrum. Probably the hardest thing in games now is writing the tooling. I spent a number of years writing core tech and tools for a 3 studio games company and that was just at the beginning of the tooling journey. I assume that's quite a sizeable coding challenge now; one I would definitely us FP for if I was still in the industry.
Good code design, regardless of paradigm used, encapsulates complexity in order to minimize the amount of context needed to understand any one bit of code. The better the encapsulation, the better we can scale our limited mental capacity to larger and more complex systems.
If the complexity is better encapsulated, it's also faster and easier for the reviewer to acquire the necessary context.
When I was first learning OOP, I wish someone had mentioned why classes are useful: they leverage our natural mental abstraction machinery for understanding the world. Walking down the road, we generally think of each car as a whole and ignore thinking about the thousands of parts that make it up. It would be overwhelming to simply cross a busy street if we were considering all of the millions of car components flowing down the streets or considering the various differences in cars all the time, rather than abstractly thinking of various differing collections of parts as just "car". All of this happens at a subconscious level, and we can consciously tap into some of this mental machinery to help us reason about code. Of course, we don't need OOP to leverage this mental machinery, but it's one technique for doing so.
> So many of the problems I see come from developers (myself included) adding new business logic to a process, or fixing a bug and not thinking through the side effects of the change
That's a great take, but I'd turn it upside down. Problems come from developers adding new features while not being mindful of the importance of helping out whoever is reading that code in the future to understand what are all the possible states.
Anyone can hack together code that's all over the place, but forcing people in the future to do shotgun surgery is as bad as adding the bugs ourselves.
One thing I've seen that helps out immensely with this in a large complex code base is extremely liberal use of ASSERTs.
Every time you have a function or clause that takes an object or data with many possible versions or states, but only will work correctly with particular assumptions, ASSERT that your assumptions are satisfied. Every time.
This helps in several ways. At the time of writing the code, it makes you think about possible states that could be represented and the generality of what your new code actually handles, and so encourages you to write general solutions and avoid unhandled or mishandled cases.
At the time of reading the code, it helps the reader see what cases the writer considered. It helps document the intended scope and use cases the code is designed for.
At the time of extending surrounding code, it helps to catch cases where the extension introduced state combinations that had not been considered before. Its nice to make an extension, run a test case, and get an assert that tells you exactly what other function needs an extension. Without the assert you would make the extension and run your test case and get wrong behavior or a crash and have a big debugging mystery or even worse: apparently working but subtly wrong code.
Others in the thread are talking about making invalid states unrepresentable, which is nice, but not always doable. In a big code base you don't have the luxury of redesigning things; usually you have to stick to more local changes. But you can always add ASSERTs -- that is a completely local way to mark invalid states as invalid.
Haskell case expressions are nice. It seems like you can define which input values a function can process - and you can get a log of if/else logic removed.
I am always wondering how much are we expected to understand the code we’re working on.
It is very easy to just say people should know everything, but in practice people don‘t review every single library, nor defend against every single possible case, and surely don’t keep everything in memory while working on any single problem.
I really can’t imagine someone “fully understanding” how the linux kernel works for instance, or even a full graphic card driver, or the software on a POS works as working knowledge.
So yeah, people cause bugs because they don’t know it all, but I’m not sure that’s saying much.
PS: real progress is probably made starting from assuming people don’t understand, and having frameworks that somewhat keep the system manageable under that premise.
Total newb here, but according to Stroustrup, the whole point of programming is so you don't have to know everything. You're responsible for:
A. Understanding relevant computing fundamentals
B. Understanding the interfaces to code you use
C. Writing reliable implementations and documenting them well
D. Writing useful and appropriate interfaces for your implementations
In a perfect world, this creates a nice chain where every programmer only needs to look after these four points and then everything will work seamlessly.
good luck trying to fix a bug in a complex project without doing some sort of test to make sure the change does not break one of the thousands of requirements implements so far.
from my experience fixing a bug without validating the system has a very high chance to spawn multiple bugs found later on...
I think the best way to solve this problem is by writing small digestible message passing functions (like Erlang) and each function written using design by contrast (like Eiffel). That way the situation can be bought under some control when the software scales.
In terms of applying this to languages other than C++, as someone who’s relatively new (4 years) to Java I expected to see a lot more side effects in the Java I’ve worked with in that time. In fact, the opposite has been true, and much of the points discussed in this article seems to have been adhered to; even going back to some of the older codebases. That said, we mainly have stateless microservices running on k8s, so naturally that leads to a very different mental model than long-lived stateful applications.
The thing that really strikes me from the article is the tone in which it’s couched. The pragmatism on display makes it so much easier to read than it could be. Coming from another few years writing Scala and thinking I was a functional programmer (writing hobby projects in Haskell, Elm etc.), I’ve realised I prefer programming in Java in a team (or at least my current workplace’s flavour of Java, and we’re on Java 17); a realisation that still surprises me now. In reality, there’s only so much purity needed to ship even a relatively complex microservice, and Java does that very well with some functional patterns thrown at it.
i am by no means a java expert, but i would have thought that the fact that most objects in java are shipped between functions as references makes purity more difficult to effect and to diagnose than in c++ where everything is by default shipped as a value - of course c++ allows you to change this , java less so?
While you're not wrong RE: object references, Java does have the `final` keyword that you can apply to fields, local variables, method arguments, etc. This prevents the value from being re-assigned short of going out of your way to do reflection hacks and such. You can also apply it to classes themselves and prevent them from being inherited and modified that way.
The only real 'gotcha' there is that, while it prevents whatever it is applied to from being reassigned, if the value is a mutable object of some sort (e.g., an array, mutable collection, etc), it, of course, does not prevent you from modifying the contents of the array, etc. However, Guava provides nice immutable collections to mitigate that issue [1].
It's been a while since I've written modern Java, but I want to say JDK 14? also brought records, which I believe are basically the equivalent of having `final` and `public` on everything as well as overriding the equality and hashcode functions on everything by default so that comparisons are done based on the contents of the object rather than the reference.
And while I am sure many people are scared of the bytecode manipulation that it performs, Lombok [2] provides a lot of little utilities that make using immutable objects totally painless - e.g., applying `@Value` to the class generates all of the necessary boilerplate at compile time, or my favorite, `@With`, which generates methods for copying an immutable object with one of the fields changed (e.g., `val newUser = olduser.withName("Nayeon");`)
Lombok also offers Kotlin-style extension methods [3] which can be nice too for this kind of thing. Though, in my experience, it is the only feature in it that has bitten me at times.
At that point, why not just use Scala? Get all the immutability, implicits, etc without bolting these things on? I’m asking more rhetorically and there are organizational constraints from doing so, but it seems like if you’re looking for more FP, better solutions still exist on the JVM.
This is definitely true in theory, but in my opinion isn’t so much of a problem in practice. If you expose APIs on your objects that don’t expose mutable methods (I.e. like the normalised method on Vec3 in the article) then you limit exposure massively and in the places that remain exposed (ie standard library methods), it goes against the grain and tends to be avoided. Much as null safety is theoretically a huge issue in Java but, at least in my limited experience Null Pointer Exceptions are relatively uncommon in our microservices.
Obviously process isn’t a good replacement for a proper syntax to prove these don’t occur, but in terms of shipping maintainable, readable software, these pragmatic approaches applied to a mutable, multi-threaded language have largely proved sufficient.
Also, as mentioned in other comments in this threads, Java 17 Records go some way to mitigating mutability concerns too - although mutable APIs are still everywhere in common frameworks and standard libraries. That said, most mutability in a microservice can be scoped to either boot time or locally to a function though, with the latter being invisible to the caller, i.e. constructing a map with the mutable `put` calls, all within the scope of a single method.
(edit: sorry, reply does not really match your post, might have been better a level up? The bar connection to the "java vs c++" that I can come up with is that the pass by reference is not much of an issue when the general mindset is mutation-averse)
Java has come a long way since the "OOP is the bee's knees" days when people fully bought into OOP's out of sight, out of mind promise about mutable state. This started long before language features like records gave post-OOP the official blessing. Along the way have been a few odd intermediate stages like the bean craze that was all about exposing state and pretending that it's still OOP somehow, and the AOP adventures that explored ways to get along with all that mutability.
In my experience this is mostly solved using the “lombok” library for older versions of Java, and by using the records feature in newer versions of Java. Both of these allow you to create data classes with immutable fields.
Java 17 adds records, which are a more blunt tool than using const on method, you have to choose if a class is mutable or not, and not if a method is mutable or not.
For microservices, where you deals with data, it's quite effective because it works like typed JSON.
Now that Java has algebraic data types and pattern matching, it explicitly espouses a programming style long associated with FP, especially when programming in the small. See Brian Goetz's (Java's chief language architect) article: https://www.infoq.com/articles/data-oriented-programming-jav...
He talked about it a bit during his lex friedman interview. He said he spent some time there, IIRC, and found it somewhat intriguing, perhaps interesting, but needed to get shit done eventually and so stopped his investigation. Very pragmatic.
From what I can see in this article he mostly seemed to like the purity aspect of FP. It seems to me like a number of FP concepts aren’t really “FP things”, as Carmack points out you can use these patterns in imperative languages as well. Making your own DSL seems to also often be an FP thing that doesn’t need to be an FP thing.
I guess the other way is true as well. OCaml has OOP and support for imperative style (with mutation, early returns via exceptions, etc.)
In the same interview he also mentions using python a lot lately and making the point that for the vast majority of code, performance really doesn't matter and productivity is more important.
Reading this article, he's making a great argument for Rust. I assume the article predates rust by quite a bit. But basically Rust is not a pure functional language but still has a lot of elements in the language that add purity and rigidity. It allows you to do low level stuff when you need to but at the same time promotes safety and correctness.
>In the same interview he also mentions using python a lot lately and making the point that for the vast majority of code, performance really doesn't matter and productivity is more important.
Until it does matter. Then it *really* matters at which point you're normally looking at a bottom up rewrite and likely retooling and rehiring for new skills.
Ive been bitten by "oh it's just a prototype" a few too many times now.
I guess it might be my lack of experience, but why are rewrites bad ?
It's a prototype because when you start a project :
- you don't understand what the client wants
- the client themselves don't really understand what they need (and sometimes, not even what they want, which is magnified as soon as you have to deal with multiple people)
- you might not know how to achieve that
- you might have an idea how to achieve that, but not have practical experience in it
- reality (including funding limitations) restricts what solutions can be implemented
- the usual code entropy...
Given all of the above, why would you NOT do a bottom up rewrite at least once ?!
P.S.: And in the cases the performance critical bits can be isolated, Python can outsource those to performant languages like FORTRAN or C.
I am guessing the answer usually is : the funding doesn't come directly from the end user, and so the incentives are not aligned for a successful project - which also explains why software projects have such poor success rates (~10%) ?
One could always create a native python module in Rust/other and call it from Python. It might not work for videogames, for instance, but ML seems to do fine with this approach, and they require a lot of performance from the GPU.
So spend the time doing the hard bit, then introduce some complexity and a software turducken for the sake of calling it from a "simple" language. At which point I'd then ask the question are we really so scared of someone accusing us of overengineering that we're creating more complicated solutions to appear simple and pretending the complexity doesn't exist?
>Reading this article, he's making a great argument for Rust.
I think his points are related to an approach which generally reduces complexity and errors. Parts of that philosophy are embodied by Rust but are not exclusive to it.
I also don't know if Rust aligns well with the goal of reducing complexity. Programming in Rust gets really bogged down into thinking about programming language abstractions rather solving whatever problem you need to solve.
just because a pattern can be used alongside a different paradigm doesn't make it "not an FP thing."
Referential transparency and immutability are fundamental to FP, therefore they are meaningfully described as FP things regardless of who is using them or where.
>He said he spent some time there, IIRC, and found it somewhat intriguing, perhaps interesting, but needed to get shit done eventually and so […]
I think this resonates with many. My take is: don’t be dogmatic. Regardless of the language, one should write as beautiful, composable and pure as possible - but no more than that.
Where only performance matters, oop wins. When clarity is the only priority, Haskell wins. But in reality, we’re rarely dealing with either scenario. We’re somewhere in the noisy realm between, and ought to code accordingly. Fundamentalists rarely make policy with the nuance reality deserves.
Rust supports object oriented programming, and so does golang. In the case of the latter, it’s a fertile ground for all manner of bugs, performance issues, and weird behaviors thanks to the duck typing approach they chose.
> but needed to get shit done eventually and so stopped his investigation. Very pragmatic.
That's a strange take by him and on pragmatism. There's also almost nothing pragmatic about C++. Many functional languages are actually quite pragmatic.
> There's also almost nothing pragmatic about C++.
That's a heck of a weird take. C++'s main value proposition has always been pragmatism. In all possible and conceivable aspects of a programming language. The fact that is a (mostly) superset of C, support for OO, RAII, outstanding backwards compatibility, time-tested and future-proofed, "you pay for what you use", it targets everything and runs everywhere, it's the language of choice on most mainstream OSes and specially Windows, etc.
Why do you think people tolerate footguns and memory shenanigans? Why do you think that C++ is still one of the most popular languages ever devised and continued to hold the crown even after the C++98 cross through the desert?
C++ has been the optimal choice with regards to pragmatism at least for the past two decades. What does C++ offer other than pragmatic solutions?
> C++ is not necessarily popular for its merits. Inertia has a lot to do with it.
This personal assertion says nothing about C++ other than stating you have an irrational dislike for it. There are plenty of reasons that make C++ one of the most successful programming languages in mankind's history, if not the most successful, and inertia is not it.
The word game appears twice in all the articles, so I took this writing as a broad outlook. And C++ is not C.
Just because C++ ships games doesn't make it pragmatic. Aren't many game codebases considered to be absolute hell? And none of that says anything about the pragmatism of functional-first languages.
> The word game appears twice in all the articles, so I took this writing as a broad outlook. And C++ is not C.
C++ is indeed not C. C++ is a considerable improvement over C. You can stick with your C-style code and get the same result you'd get if you had used C, but you still have access to features that simplify a great deal your life and don't have a performance impact, like templates. With C++ you can have your C cake and eat the C++ cake too. The only drawback is compilation times, but any caching system takes care of that.
What language would you suggest game developers should have used throughout all these years? It needs to be performant and portable, and it needs to have existed for quite a long time.
C99 with code generators written in python or lua for the engine. Implementing templates in the C preprocessor goes pretty badly, but implementing them as a normal language that writes valid C to #include various places works just fine.
I've been coding like that since at least 2016[1]. The trick is to make adding code generators as easy as adding source code. Not a game engine in my case, though there's a CppCon talk from 2014 which makes a passing reference to using code generators with C++ instead of templates for one.
Assume source is src/foo.c and compiled to obj/foo.o, then introduce a third directory called gen. Change the makefile rule to copy .c from src to gen, and compile from gen to obj. Then add a makefile rule that says gen/foo.c can be built from src/foo.c.py by calling python on the source and redirecting stout.
That means a C source file can be turned into a python program that generates the same source by wrapping it in a string literal, calling print and renaming the source file. No build system change. Then change the python as you see fit to work around the limitations of C.
Works really well for a solo developer. Repo using this scheme is ~100 generated files (mostly lua, some python) vs ~600 C files. Plus ~20 C++ and one D, just getting started with that language.
Unity is partially written in C++, and since HPC# introduction they have been slowly migrating code from C++ into it.
Unity is one of the reasons why Microsoft started to take Midori lessons into regular C#.
Ultimately if we want to be pedantic, all game engines are written in C and C++, as that is the only way current OSes expose their graphics APIs, no alternative implementation can change that unless they also write an OS in the process.
> There's also almost nothing pragmatic about C++. Many functional languages are actually quite pragmatic.
a programming language alone doesn’t practicality make or break. factor in programming tools, libraries, community, and of course one’s prior experience.
John‘s pragmatic approach to this topic (any topic really) is very refreshing. Game dev is one of computing world‘s battle grounds when it comes to real-time performance and stability, and there’s just little place for dogma.
> The pure functional way to append something to a list is to return a completely new copy of the list with the new element at the end, leaving the original list unchanged. Actual functional languages are implemented in ways that make this not as disastrous as it sounds,
I'm curious : how do they manage to not make it (performance) disastrous ? Anyone could please forward me some functional-newbie-level examples ?
The short answer is that with a singly-linked list, you can easily add items to one end (conceptually, typically the front end.) Then if other code has references to the older parts of the list, they’re still valid.
A great explanation of this is in “Practical Common Lisp”:
But then there are cache locality issues, etc. Algorithmic runtime performance in practice rarely matches the basic big-O analysis taught in CS classes.
Case in point: for small cardinality data in practice it’s often (or at least occasionally) faster to use a linear time search algorithm on a fixed length array than to put it, say, in a tree, map, etc. that is big-O “better”.
When this article was published on his altdev blog it had a big impact on how I wrote code. I really embraced pure functions and tried to think in terms of side effect free code and inputs/outputs.
Got no real data for it but I felt the quality of the code I wrote was much better going forward.
It has helped me see classes as a group of functions with parameters in common. The constructor helps me reduce the amount of params passed to each function. If I avoid mutating the internal state across function calls, I'm just using them as a way to group functions together (as a module would).
Instantiating and then calling different methods helps me test a process at different points of the executing (testing and debugging). Inheritance helps me clean up stuff that's repeated in specific sub-problems. Mixins/modules help me in a similar way, but for stuff that's reusable in unrelated problems.
I'm writing this with Ruby in mind. Seeing how OOP constructs can help me manage code that's written in a functional style has given me a middle ground: I started as 'FP is superior; use everywhere; avoid OOP languages' and now see the OOP stuff just as tools that can make it easier for me to encapsulate and reuse logic. Obv this comes with footguns: taking outside params in methods, local variables changing behaviours, indirection from inheritance/extension - the list goes on and other languages have their own. There's problems with it but as the author says, these things aren't all-or-nothing - so you can nudge the codebase in the FP/'pure' direction.
"Objects are a poor man's closure.... Closures are a poor man's object."
As a 25+ years coder who's very accustomed to thinking in my own ways, I have a problem with functional-purism similar to the problem I had with globals and factory functions. The fact is that most programs have a lot of states and these states have to be represented somehow. Allowing the objects to contain functions that act on those states is a perfectly good analogy for most business logic or game logic you're modeling. Abstracting what you can abstract into atomic functions is great. But so many functions are more comprehensible when they take place bound to the scope of what it is they're acting upon... again, not to produce an answer, but to change/track/maintain state which is a critical part of what people experience when they interact with a program. I don't think that thinking in terms of objects or even global states should be considered a code smell when what you need to do is to get things done. If/when you need to make some functions synchronous across asynchronous systems, by atomizing their logic, then you will hit upon functional programming methods by default.
The argument isn’t that state is bad or doesn’t need to be managed. It’s more that spreading that state over a large number of areas leads to a lot of complexity and cognitive overhead in terms of expected behaviour at any given point of execution. Functional programming gives you ways to be much more explicit about the transformation being performed and the before/after states (except arguably when you start getting overly cute with zippers and lenses and such). To the extent that this approach hasn’t magically transformed programmer productivity or software quality, we know that both approaches are admissible.
Isn't encapsulating state - and functions that act on states - within the smallest-possible class ancestor a pretty reasonable way of handling that? I'm not suggesting spaghetti or`GOTO 10400` or `GlobalCatBehaviorFactory.StartMeowing(cat)` or something. I mean we have a paradigm, and it's basically OOP with a dash of functional programming at the functional level.
Maybe what I'm trying to say is that a lot of functional programming takes place within the OO model when it's efficient to do so; but functional paradigms just generally ignore state altogether, so absolute purity doesn't leave a route for daily coders writing regular programs to do their jobs.
I would say it's often the opposite, unless I'm misunderstanding you. Instead of being encapsulated "deeply" in a class ancestor, state in FP is in the "top" of the program, in the "shallow" part, as explicit as possible and as close to the entry point as possible, while the functional part is the thing that is in the middle. The term for this is "Functional core, imperative shell". In something like Haskell, side-effects are dealt with even "before" the entry point: they're handed by the runtime, which is the part that calls the entry point.
This also translates well to OOP in the form of things like hexagonal architecture (eg: business logic is pure, and state is modified by a database adapter), or to modern frontend programming (eg: the render tree is pure, visual representation state is stored/modified with by React), including flux architecture (eg: reducers are pure, state is stored/modified by Redux).
Um... ok. I think we're sort of both talking about keeping logic unbound from artifacts like display and from objects that contain state. Where I think OO makes sense is that once you admit that objects have properties it makes sense to take the top form of that object and put the functionality there - rather than abstracting it away into some completely other logical place.
Maybe this all comes down to what I find easier to follow in 200 code files and we're talking about roughly the same thing. If the general rule is to make logic as independent and no-need-to-retype-it as possible in the hierarchy, I'm onboard. And letting functions manage state themselves makes no sense; even in OO paradigms, functions should be pure and not rely on the state per se. (Put in plain terms, I make heavy use of getters and setters on instances; but complex functions are almost always static, and take instances of the class they're in to modify, rather than reproducing those functions and having them work on their own instance).
> I mean we have a paradigm, and it's basically OOP with a dash of functional programming at the functional level.
Well, but who really follows that idea? What usually happens is, that the state is stored in the object (of a class) and the result of subsequent method calls depends on previous calls and arguments, since it changed the state of the object. Maybe there are even calls to other objects in the system changing their state, and in turn changing return values of their methods.
Also: If we have actual functions (in a mathematical sense), then they depend only on their inputs / arguments. Why then have them in a class, and not in say for example, a module?
> But so many functions are more comprehensible when they take place bound to the scope of what it is they're acting upon
To me, the most understandable function (lowest context needed, lowest cognitive overhead) is a black box that takes type x as input and produces type y as output, without side-effects
> I don't think that thinking in terms of objects or even global states should be considered a code smell
Classes can mix state and behaviour. I prefer when state is encoded in types and behaviour in functions, instead of having implicit state changes in a class, because it is more difficult to track, and constraint.
Of course, in the real world, if a library or platforms uses OOP, so be it. I am not dogmatic. But ideally, I avoid OOP because of this implicit mixing of state and behaviour.
Yes. With guardrails. I mean, I do a lot of writing in JS/TS but I'd never write something that tacked a dynamic property onto some class and picked it up later. That's just gauche.
> a black box that takes type x as input and produces type y as output
I agree, but I'd rather keep that function in Class X. If it's specific and X is a final/protected class, I'd say
`protected doSomething()=>Y` where the argument is the implicit `this`.
I'd never allow some other thing to run this function, definitely not some other object floating out there; if it were going to be run that way it would have to be a `static DoSomething(x:X)=>Y` ...but I would still keep it in the file for X.
> To me, the most understandable function (lowest context needed, lowest cognitive overhead) is a black box that takes type x as input and produces type y as output, without side-effects
True, however we must be careful not to go completely bananas either; it's really hard for a reader to piece back together a full mental picture if the functionality is spread into dozens even hundreds of general-purpose 3-line functions.
Simple, `y = f(x)` types of functions are indeed the easiest to understand, but you might be no closer to understanding what the code is actually doing.
Agree. This is why starting with the data structures makes the code clearer.
Of course the functions can be grouped in modules. It is the same problem as having god objects, having lots of methods or many classes in OOP (it is orthogonal to OOP or functional programming).
Please read up on the definition of functional programming (or rather: pure functional programming, since you talk about "functional-purism").
Because, everything you are saying is either forbidden nor discouraged in (pure) functional programming. In fact, it is totally orthogonal.
The point of functional programming is solely to prevent the same code to behave differently just because it is used/called in different parts of the program. You can still use objects, you can still make them have methods that operate on only a limited part of the state and so on. No problem with that, I do it all them time in fully pure functional programs. In fact, I think this is best practice.
If I recall correctly, each line is either a feature to add, or a bug to fix. * means it was done on that day. + means it was done on a later day. Empty means either not done or not marked as done.
(Tasks may appear in multiple days, and there's no guarantee of tasks being retrospectively marked as complete.)
Why aren't more universities teaching functional programming first to form good mental models for the students and then show them when to use state and when to not.
When it comes to an Intro. class, they aren't teaching python, they are teaching how computers think, and how to make them think. Python stands the least in the way. Once you get deeper in the program though, I think you need to move away from Python ASAP. Although, if you have any courses that may not be squarely intended for CS majors (like maybe Math/Engineering/Stats/Bio. etc.), then you don't want the language to stand in the way of the concepts there either.
I think many CS degrees teach FP, I remember learning a variant of SML (mosml) for my first programming course. Now the same university is teaching the same course in F#.
Of course a lot of the data science/machine learning is being done in Python, but that is likely due much of the frameworks/tooling around is Python in general in that world.
At least on mobile, it’s not clear where the blogger’s introduction ends and Carmack’s reproduced post begins. 5 years late to mention it, though. Great article.
The psychological aspect of 'purity' in this concept is as interesting as anything.
I think that devs are naturally prone to be obsessive compulsive about such things.
We can see 'defeating the borrow checker' as a 1st order objective.
I think devs fee that 'defeating the borrow checker' gives us much more satisfactiong than 'fulfilling the customers needs'.
There's something a big 'chinse finger trap' about FP that consumes our attention on an impusleve level that is not necessarily consistent with what it actually means to be a good engineer.
This, along with our will to abstract and to want to use 'shiny new things' etc..
On the other hand, there are those who are too stubborn to use anything but a dull knife and a broken hammer. There are situations where you can, e.g., make an error state impossible by defining a couple of types, or using one or two lines of generics. But then, what should have been a simple change becomes an ordeal because the other person lacks the background on all of these ideas.
I think all CS curricula should have some sort of class that explores languages like Haskell and Rust, the problems they attempt to solve and how they go about solving them, if only for the intellectual experience. What language you end up using, and what ideas you end up applying, is then up to you on a case-by-case basis.
I’ve considered a lint rule that requires all functions to be written in Lambda syntax so that closures, side effects, and purity can be explicit. However, I haven’t explored possible effects on compile time, size, and performance. Another possibility is explicit scope closures, like [x,&y]{…} available for every defined scope.
best article i've seen on pragmatic purity - i would only dissent on c++ not providing facilities for pure functions (i don't here mean pure in the virtual=0 sense) - you can, and should, do quite a lot.
The language does not offer the features that you were referring to, and that the author of the article mentions. You can write pure functions, but the purity is not checked by the compiler, that is what has been meant here.
For example declaring function of the class as const pretty much turns it into pure function and compiler makes sure that it does not modify state of said class. So you are wrong from a practical standpoint. try to modify state of the class inside const function and it'll be compiler error.
That doesn’t prevent you from accessing global variables or reading member variables (which many const functions do). Marking a member function as const doesn’t in anyway guarantee purity, and is actually quite unrelated.
In fact, most const class functions are by definition not pure at all, because they will read internal state, otherwise they wouldn’t even need to be members of the class. So I would be willing to argue that nearly all const class functions are impure.
um, i seriously doubt that many const functions access global variables. if those in your code do, then i suggest your code has problems - the typical use of a const function is doing something like length() does on a std::string. how, or why. would such a function access a global variable?
I agree, but most would access internal state, which is equally impure. A pure function has no access to anything outside of its function parameters, and it changes nothing outside the scope of the function.
In the context of a pure function, a local member variable is no different than a global variable because it’s outside the scope of the actual function. If you can’t take the function away from the class and have it still work perfectly, then it’s not pure.
length() is not pure, but a function like length(string) could be.
the instance of the class is a parameter to the function. As well your definition of pure is not the accepted definition. The definition depends absolutely nothing on the internal implementation as long as some in invariants hold.
1. For the given set of inputs ( including the attached object ) the output is always the same if called again with the same inputs ( including the attached object in the same state ).
2. No side effects. This means that nothing can be observed about the function apart from it's return value.
If you wish to quibble as to whether the object is a parameter or not then observe that in c++23 it will be allowed an explicit this parameter to methods This is not to make the methods more pure but to enable other types of optimizations and reduce duplication of code over const and non const methods.
That's an odd interpretation of purity. As per the article:
> A pure function only looks at the parameters passed in to it, and all it does is return one or more computed values based on the parameters... It doesn’t maintain internal state... Ideally, it isn’t passed any extraneous data
If you send in an entire object state as an argument (or as an attached instance, as you argue), this is not a quintessence of functional purity and strays far from these concepts. Consider: your object can have all sorts of data in it that you are passing to every member function, and it is unlikely all that state needs to be used in every function -- as the article points out, this defeats the purpose as it's akin to passing in a big pointer to a bunch of state.
In a pure function, it accepts only the specific data it needs, nothing else or extraneous, and returns a value based on these specific inputs.
As I mentioned in another comment, you could say for a particular type that since it only has one data member, no harm done -- but in the general sense, most objects have more than one piece of data, and not every function needs every piece of data, so passing all that in is a sloppy approach to purity.
The object state is a parameter passed to the function. The number of arguments passed to a function does not affect it's purity. The feed forward recognition phase of an LLM is pure but has billions of parameters. The purity is defined by it's behavior in relation to again.
1. Does it always return the same result for the same input parameters
2. Does it modify anything other than return a value
These two properties define purity. Simple and not odd at all.
Not sure why you are arguing about this? The issue is about the idiomatic, expected style that surrounds pure functions and functional programming. The posted article goes into this in greater detail (and more elegantly than this thread) as to why the pattern you describe -- passing a big pointer of potentially large composite data to a function (implicitly or otherwise) -- is not in the spirit of FP purity. If you want to be pedantic, you are welcome and you are academically correct, but the tradition of FP in practice is not to do this (as also pointed out in the article). It's not a question how many arguments but rather if there is extraneous data that the function does not need, as would be common with the full object.
Not sure why you are arguing about large classes with extraneous data. If this is how you program your classes and it works for you then great but honestly you should try for small immutable class where possible with pure functions ( methods). It's really liberating.
Member functions have an invisible argument: the pointer to the C++ object. You can access this argument using the 'this' keyword. John Carmack describes pure functions like so:
> Pure functions have a lot of nice properties. Thread safety. A pure function with value parameters is completely thread safe. With reference or pointer parameters, even if they are const, you do need to be aware of the danger that another thread doing non-pure operations might mutate or free the data, but it is still one of the most powerful tools for writing safe multithreaded code.
Emphasis on 'value parameters'. The invisible 'this' argument is not passed by value, it is passed by reference. By Carmack's definition, C++ member functions are not pure.
If you read the article and see what pure functions are, you will understand. The length function has no parameters, by definition it must access data outside of the function if it’s going to do anything. This means it is not pure. Can you take that length function outside of the class and run it and have it do something? No, it is tightly coupled to the class that it is attached to.
A member function must have at least one parameter. Here it's "parameter0":
parameter0.membfun()
It doesn't really matter whether the parameter name is separated from the function name by a dot, or a bracket. In a different syntax, it might have been written as "membfun(parameter0)" without any practical difference.
This means that Foo::length() can be pure no problem, because Foo is its parameter, and, as such, in its scope.
I see your point, but you are
still passing the entire state of the entire object to your function if you want to follow that line of thought. This is also discussed in the article, and it’s not really what functional programming is about. What other state is in there that the function has access to? Maybe on a case-by-case basis you can say that particular object only has one piece of data, but in the general case this is definitely not what purity and functional programming is about.
Incidentally if this has whetted your appetite for functional programming, I encourage you to read a lot more about it and study languages that are truly functional. It could be quite liberating to see this approach to software development. It is not a panacea but it is a very useful perspective to acquire on software engineering, regardless of language, which is what this article discusses.
i find this slightly patronising - i have studied languages such as scheme and haskell. and i don't see how writing a function such as length(string) could be pure if the member function length() was not, as you would pass a copy of the string (ok, pure) and then call a member function (impure, according to you) on it to actually get the length.
No, you would not typically call a member function on anything in a pure language. Instead, you would pass a sequence of characters and the function would return the length, by counting them.
I didn’t mean to sound patronizing, I apologize for that. Your question suggested that you aren’t that familiar with functional programming, so I was just trying to steer you in the right direction. A function that takes no parameters is not a pure function in general, unless all it does is return a constant value, which wouldn’t make much sense. length() does not make sense in a functional programming style.
well, the length function for c++ strings does return a constant object - an integer value.
and passing a sequence of characters? hello, C strlen( const char * s); ! with all the problems that has. and if you don't want to pass some horrible pointer, you have to pass some object containing the characters which must have a length function itself. hello, std::string!
what you are ignoring here is how functional types either ignore functionality under the hood, or sacrifice it for performance.
which is why functional programming will never catch on (look at lisps), imho. programmers really like to know wtf their code is actually doing on a mutable machine.
> the length function for c++ strings does return a constant object -
No, I mean, a standalone function with no parameters can only be functional if it always returns the same value, which is effectively no different than declaring a constant. The function always returns the same value all time (which wouldn't be so useful).
I think you are trying to find C++ examples to match the broader implications of functional programming (which is what I'm talking about), which will be difficult.
As far as whether "functional programming will catch on", I don't think it will for C++ so much, but it certainly has been very successful for many other languages. I earn my living writing only functional code, but most of the time it is not in C++.
It's not pure if you can modify the state of a global, the file system, the environment, the standard output, etc., and this is half of the reason why purity is desired.
And it’s not just about modification, it’s also not pure if you simply read data that is not passed into the function via parameters. Because in a pure function, you don’t need to access anything, reading or writing, that is outside of the function itself.
certainly desired, but not always possible, seeing as we actually want to do state changes, for example output, unless with much hand-waving monads, which of course really do change state.
yes, i know. but there is nothing that stops you from writing pure functions, and quite a lot of features that encourage it, even if they are not checked by the compiler. i'd probably like a c++ compiler that warned me about lack of purity, but as the article says, sometimes you have to bite the bullet.
> there is nothing that stops you from writing pure functions
If they are not checked, they will be forgotten. When reading a function, if I do not have the guarantee that it is pure, it means the same as an old comment (not much).
If the only way to enforce it is convention and force of will it will not be enforced.
As an exaggerated illustration: we might add a "without-bugs" annotation to a function, to indicate it has no bugs. After all, nothing stops you from writing bug-free functions :)
GCC has a pure attribute for functions which is documented as being prohibitive of impure behavior. Though, I honestly cannot say how thoroughly it is enforced or if there are significant benefits outside of the obvious and some minor optimizations which the compiler might be able to perform.
I think this probably the most important concept for programmers to keep at the front of their minds when working day to day. So many of the problems I see come from developers (myself included) adding new business logic to a process, or fixing a bug and not thinking through the side effects of the change