I claim Rich Hickey is wrong about non-null arguments to functions (2020) (jonstodle.com)
77 points by Capricorn2481 on July 23, 2023 | 183 comments



The definition of breaking change here completely destroys any idea of abstraction or bug fixes. If calling a function requires knowing how the function is implemented, then the function should not exist in the first place: the whole point of an abstraction is that, if the use-site fulfills the precondition, it can assume that the postcondition holds after it used the abstraction.

This means that relaxing the precondition is, by definition, not a breaking change because the old inputs are a subset of the new ones. Similarly, strengthening the postcondition isn’t a breaking change because the new outputs are a subset of the old ones. (If the use-site assumes that it gets a specific member of the function’s codomain for specific input values, you have an abstraction leak and the use-site has a bug [or the function should be inlined because it’s a false abstraction].)
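A rough Rust sketch of what that looks like from the caller's side (function and names invented for illustration): v1 documents a precondition, v2 relaxes it, and every input that satisfied the old precondition behaves exactly as it did before.

    // v1: documented precondition: `values` must be non-empty.
    fn average_v1(values: &[f64]) -> f64 {
        assert!(!values.is_empty(), "precondition violated");
        values.iter().sum::<f64>() / values.len() as f64
    }

    // v2: precondition relaxed: an empty slice is now defined to yield 0.0.
    fn average_v2(values: &[f64]) -> f64 {
        if values.is_empty() {
            return 0.0;
        }
        values.iter().sum::<f64>() / values.len() as f64
    }

    fn main() {
        let xs = [1.0, 2.0, 3.0];
        assert_eq!(average_v1(&xs), average_v2(&xs)); // old inputs: identical behavior
        assert_eq!(average_v2(&[]), 0.0);             // new inputs: now defined
    }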


I really wish I had the ability to downvote articles. Agree with you 100%, and I feel like I lost braincells from reading this sentence:

> You are changing the behavior of the function. That is a breaking change.

So this guy is declaring someone else "wrong" because he unilaterally decides to redefine what a "breaking change" means? This is one of the dumbest "all Internet arguments are over semantics" examples I can think of.

As an aside, I've seen a number of recent instances (here's another, https://news.ycombinator.com/item?id=36826111) where a flat-out ridiculous blog post makes it to the front page, only for the top comments to all point out how dumb and ridiculous it is. To stay within the HN guidelines, I'm going to refrain from speculating why, and I certainly know that the HN audience is not some monolithic group, and that people have many different opinions. I'd just like to know if others have noticed this or have other specific examples.


For the record, I'm not the author, nor do I agree with him. But I posted this because there was no comment section and I was curious if anyone would either

1) Agree with the article (virtually no one has)

2) Point out that the breaking changes Rich Hickey pointed out might be worth it, or not a big deal

A few people have responded with sentiments of #2. Not because it's not annoying to change these things, but that it's a smaller price to pay for relatively good type systems (I'm thinking more F# than Haskell)

Unfortunately, the talk Rich Hickey gives explicitly says Clojure spec doesn't have a solution for a flexible type system either, and the solution they proposed is still in alpha 4 years later. So while I appreciate the insights he offered, I'm torn between the problem of refactoring with the extra work of a finicky type system, or the work of hoping my tests are good enough to refactor my Clojure code.

Types in Clojure are the number one requested feature in their surveys, year over year, and despite the reputation he has, Rich recognizes that they can really help with refactoring a code base. Good variable names aren't going to help you with this. Pushing Spec2 along would be a big win without having to implement a full-blown type system.


My issue is that the author doesn’t seem to know what a breaking change is. Spec itself is annoying to me for a couple reasons, but this isn’t some novel insight of Rich Hickey’s: the claim about relaxing preconditions and strengthening postconditions is in Bertrand Meyer’s _Object Oriented Software Construction_ and is a relatively basic claim in texts about what it means for software to be correct.


Extending a function to accept null, under dynamic typing, can break a program which depends on the documented behavior of that function having thrown an exception for that case.

  ;; Original strlen: documented to throw (via the failed assert) for non-strings.
  (defun strlen (s)
    (assert (stringp s))
    (len s))

  ;; The caller relies on that: ignerr turns the error case into nil.
  (defun map-len-test ()
    (mapcar (lambda (arg)
              (ignerr (strlen arg)))
            '("abc" nil "defg")))

  (pprinl (map-len-test))

  ;; strlen redefined to accept nil, returning 0 instead of throwing.
  (defun strlen (s)
    (if (null s)
      0
      (len s)))

  (pprinl (map-len-test))
Output:

  (3 nil 4)
  (3 0 4)
The caller had one idea of substituting a value in the null argument case. The function changed behavior, implementing a different idea.

Only in a statically typed language in which the call with nil is impossible, and which provides no dynamic access to the compiler, can we be 100% confident in saying that extending the function to support nil is a non-breaking change.

If the call is impossible, then the function has no behavior for that case; a program invoking that case doesn't exist in a translated, executable form and so there is nothing to break as far as the program is concerned.

It's possible for another program to break: a compilation test case which validates that code which calls that function with nil cannot be compiled. If the language supports dynamic access to the compiler, then an application can be written which can behave differently due to the change: an application which dynamically compiles a call to the function and now gets a valid result (compiled code) rather than an error.

In the dynamic world, it's a non-breaking change for applications that wisely don't rely on exceptions being thrown on bad inputs, and instead handle those themselves.

If you're changing the function, you cannot know that that is not the case; at best you can decide not to care.

Not caring is not the same as knowing.


> Extending a function to accept null, under dynamic typing, can break a program which depends on the documented behavior of that function having thrown an exception for that case.

This just means that the preconditions are “the value is not null” even if there are no types to capture that precondition. I was pretty careful not to use the word “type” in the comment you’re replying to.


The problem is that in fact there are no preconditions; the function can be called unconditionally using any value as an argument. It throws for some values, and any change in which ones is a visible change.

Programmers will make code dependent on anything that is visible in an API, even if the specification tells them not to, an ISO standard tells them not to and even if their mother tells them not to.

Here, the spec may even be saying: if you call this function with a non-string, it throws an exception.


> The problem is that in fact there are no preconditions; the function can be called unconditionally using any value as an argument.

There are no _checked_ preconditions. It’s only meaningful to talk about the correctness of a program relative to a specification of its behavior and a specification needs to specify preconditions and postconditions. So, if there are no preconditions, any behavior of the function being called is as correct as any other behavior.

The fact that some programming languages allow the callers of a function to pass values that don’t satisfy the spec is irrelevant to the point I’m making.


The function checks for a type with assert: (assert (stringp s)). But that check is a documented behavior which generates a predictable exception that the caller can detect, and so it can tell when that is suddenly missing.

No matter what consequence you give to the failed check, something can break when the check is taken away.

We could have a True(r) Scotsman's checked precondition like this

  (unless (stringp arg)
    (abort)) ;; bail the entire process image with abnormal status
If we check it like that, nobody will try to depend on the behavior within an application, in the way that my test program did.

It can still be a breaking change if the function is written that way, and then changed so the abort is taken away. Because the application can do this:

  (defun my-stupid-abort ()
    (strlen nil)) ;; this calls abort!
So now if the program relies on (my-stupid-abort) to abort, that will break; suddenly that function call returns zero and the process keeps going.

You really need:

  ;; ... ffi definitions here ...
  (unless (stringp arg)
    (acpi-power-off))
But again, a dumb program can depend on (strlen nil) to power off the machine, which could be part of some critical system that breaks as a result of it not powering off.

Only a compile time check can do it, by preventing the program from existing. A program that never compiled and therefore was never deployed to its installation will never break.


> So, if there are no preconditions, any behavior of the function being called is as correct as any other behavior.

Umm, no. If the program is supposed to unconditionally produce the output "Hello", but it instead produces "Goodbye", then it is incorrect. Unconditionally means not that there are no preconditions but that the precondition is T (true). If the precondition is constant truth, it means that there are no circumstances under which the program can be excused for not satisfying the postcondition (e.g. producing "Goodbye" rather than "Hello").

Correctness is almost irrelevant here, because the question is about breakage. It is in fact the change in specification which is behind the breakage. The function is correctly implementing its specification at all times; the correctness of my example strlen is never being called into question. What it means for it to be correct has been changed, and the function's definition followed suit.


A breaking change is one that breaks code that uses it correctly. This means that the code in question uses it according to its specification. Breaking code that uses your code incorrectly should happen early and often, because using code incorrectly is a bug in the using code, not the code being used.


That's just a blame game that is neither here nor there to the user who can't open his ten-year-old word processing document with the latest version.


It makes all the difference in the world for that problem. It’s how you build programs that can be maintained and evolved in a consistent fashion long-term.


I would agree the author was wrong, and I would hazard a guess that you're a lot more well read than most of us on this kind of stuff, because this talk is widely seen as insightful.

Is there a language that you feel solves this problem better than Spec could ever hope to?


My first thought was, "I wonder what the author considers a non-breaking change..."


Comments, updating dependencies, or documentation


Updating dependencies could way more easily be a breaking change than what we're discussing... And comments and documentation aren't code changes at all.


Very few people write code whose output purely depends on the choice of types involved. Somebody, somewhere, has to make the choice of placing max/min/wrapping_add/wrapping_sub/wrapping_mul/... in the (u32,u32)->u32 shaped hole in your program, and placing the wrong one is probably a bug. Any caller expecting an arbitrary (u32,u32)->u32 receiving your changes shouldn't break, but not all callers have correct behavior for truly arbitrary functional dependencies.
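A tiny illustrative sketch of that point in Rust (nothing beyond std assumed): all of these fill the same (u32,u32)->u32 hole, yet a caller can trivially observe which one was picked.

    fn main() {
        // Three functions with the identical shape (u32, u32) -> u32.
        let candidates: [(&str, fn(u32, u32) -> u32); 3] = [
            ("max", u32::max),
            ("wrapping_add", u32::wrapping_add),
            ("wrapping_mul", u32::wrapping_mul),
        ];
        for (name, f) in candidates {
            // Same type, observably different behavior.
            println!("{}(3, 4) = {}", name, f(3, 4));
        }
    }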

From the other side, people picked your function in particular not because it fills that type hole but because it does something in particular. If you meaningfully change the thing it does then that probably breaks the callers.

The author's argument is something along the lines of (1) if you're changing the type signature, that's probably because a new implementation requires it, and (2) if the new implementation is sufficiently different to warrant a change in the type signature then it's going to break somebody. I don't particularly like the example since in my experience somebody adding a nullable input will leave the old behavior alone and just wants to call the same function somewhere else, so it probably actually is a non-breaking change for all existing callers, but it's not hard to imagine stronger examples where the author would be right.


To be clear, nothing I said has much to do with types: preconditions and postconditions are a more general notion and types are one way to approximate them in such a way that they can be automatically checked. However, because of the halting problem, there will always be correct programs that cannot be typed.


"Breaking change" means it changes existing behavior. As in I used to get 'banana' when I passed 'b' and now I get 'bicycle'. If I still get 'banana', you didn't break anything.


No, a breaking change is if the interface contract changes in a non-backward compatible way. If the interface contract never promised you to get “banana”, and you still relied on that behavior, then yes your client code may break, but that’s on your side of the program, and your fault for relying on unwarranted assumptions, and not a breaking change in the library you called.

The purpose of interface contracts is exactly to precisely specify what can change and what can’t change without breaking clients, and conversely which assumptions clients can or can’t make. This is a precise give and take. The clients are expected to play by those rules just like the implementations of the interface are expected to maintain their promises — nothing more, nothing less.


This is an academic definition. In a system with dependencies, if you change behavior your dependents relied on, you broke them. If you didn't, you didn't.

In other words, if you break prod, "but we never promised to behave this way" is a poor excuse. When changing something, it's your responsibility to check for breakage, not just throw up your hands and say "not our fault" if it occurs.


E.g. a number of years ago now someone in Glibc development thought it would be fine to change the behavior of memcpy for overlapping objects.

Nobody does that, right? It's undefined behavior.

Downstream distros broke; it had to be backpedaled out.


It sounds like we agree.


Author isn't wrong though.

The most accurate and practical definition of a breaking change is: a change dependents can detect and complain about.

You replace an O(n) algo with an O(n^2) one? Breaking if someone complains. Vice versa as well.

Is fixing a bug a breaking change? Yes. Ask Microsoft about porting SimCity, which relied on a bug in MS-DOS.


You can make a breaking change without changing the type signature ("do_something now uppercases and reverses the input string before using it"), and you can change the type signature without making a breaking change ("do_something now accepts None and treats it equivalently to an empty string"). The author is conflating these two scenarios.
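A hedged Rust sketch of the two scenarios (using slice-widening for the second one, since in Rust literally adding Option to a parameter would itself force callers to change unless you use the impl Into<Option<_>> trick discussed further down the thread; all names invented):

    // Scenario 1: identical signature, changed behavior: breaking.
    // Suppose v1 returned the input untouched; v2 uppercases and reverses it.
    fn do_something_v2(s: &str) -> String {
        s.to_uppercase().chars().rev().collect()
    }

    // Scenario 2: changed signature, unchanged behavior for every existing
    // caller: typically non-breaking. Widening `&Vec<i32>` to `&[i32]`
    // still accepts everything old call sites passed (deref coercion).
    fn sum_v1(values: &Vec<i32>) -> i32 {
        values.iter().sum()
    }
    fn sum_v2(values: &[i32]) -> i32 {
        values.iter().sum()
    }

    fn main() {
        // A caller that relied on v1's output now sees "CBA" instead of "abc".
        println!("{}", do_something_v2("abc"));
        let v = vec![1, 2, 3];
        assert_eq!(sum_v1(&v), sum_v2(&v)); // the old call shape `&v` still compiles
    }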


Someone is also sitting on the sidelines, snickering about how primitive the types and function signatures must be for that to happen.


Do you know of a language that encodes whether a function has the capability to make a reversed uppercased copy of a string in the type of the function?


Dafny and Whiley are two examples with explicit verification support. Idris and other dependently typed languages should all be rich enough to express the required predicate but might not necessarily be able to accept a reasonable implementation as proof. Isabelle, Lean, Coq, and other theorem provers definitely can express the capability but aren't going to churn out much in the way of executable programs; they're more useful to guide an implementation in a more practical functional language but then the proof is separated from the implementation, and you could also use tools like TLA+.

https://dafny.org/

https://whiley.org/

https://www.idris-lang.org/

https://isabelle.in.tum.de/

https://leanprover.github.io/

https://coq.inria.fr/

http://lamport.azurewebsites.net/tla/tla.html


Maybe Idris or Blodwen. Idris can at least define a printf function where the string must contain the right template, i.e. printf("number %i text %s", 5, "hey") compiles, but printf("number %s text %s", 5, "hey") does not, so I'm not sure it's impossible to encode string reversal in the type signature.


You really don't need dependent types to do the printf thing, any language with GADTs or equivalent typing functionality can do it, see for instance OCaml's Printf module. Although you do need a little bit of magic from the compiler to get that exact syntax, which you can get rid of with dependent types.


That’s a definition for observational equivalence, not of breaking change.

There would be very little point in doing changes that are completely undetectable. Surely, there is Hyrum’s law and fixing even a bug can be a breaking change, but I believe it is not as black-and-white as in the case of compiler optimizations. For example, the reverse of your aforementioned algorithm from O(n^2) to O(n) is still detectable, but should not cause any harm in any reasonable program. Surely it can break a program full of race conditions that just so happened to work with the former implementation, but it should be similar to how compilers handle UB — if you are in the wrong, then I really can’t save you (at least that is my opinion).

Also, relevant xkcd: https://xkcd.com/1172/


A breaking change in practical terms means a change that breaks my environment.

That's why Microsoft was in a bind. You fixed an issue. Your change obeyed all relevant theoretical stuff, but to a consumer it's a breaking change.

But customer sees that going from Win X to Win X+1 broke their stuff and they will harass your support.


I agree with you wholeheartedly


I agree with Rich Hickey here and I think this is a flaw of sum types compared to union types. If Option<T> was a union type, T would be a subtype of Option<T> and those changes wouldn't be breaking. When writing Rust, I find myself repeatedly writing implementations of the From trait for enums, just because most of the time I actually want union types, not sum types. I think sum types should be built on top of union types by combining union types with a "lexically-scoped newtype", if that makes sense (i.e. they wouldn't be built in, they would be the result of two orthogonal features, while allowing for other combinations as well).

Edit: another nice consequence of this design would be None being its own type, which can be transparently converted to an Option<T> for any T, allowing for type inference to go in only one direction, while still avoiding the boilerplate of Option<T>::None. Unidirectional type inference would make for more intelligible compiler errors. Full Hindley-Milner can get confusing.
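For what it's worth, here is a minimal sketch of the From boilerplate being described (hypothetical enum; the impls could also be derived, e.g. with derive_more, as mentioned elsewhere in the thread):

    // A sum type plus hand-written From impls, so callers can pass the
    // payload types without wrapping them explicitly at every call site.
    enum Value {
        Int(i64),
        Text(String),
    }

    impl From<i64> for Value {
        fn from(n: i64) -> Self {
            Value::Int(n)
        }
    }

    impl From<String> for Value {
        fn from(s: String) -> Self {
            Value::Text(s)
        }
    }

    // With a union type, i64 would already be a subtype of `i64 | String`
    // and none of the impls above would be needed.
    fn store(v: impl Into<Value>) {
        match v.into() {
            Value::Int(n) => println!("int: {}", n),
            Value::Text(s) => println!("text: {}", s),
        }
    }

    fn main() {
        store(42_i64);
        store(String::from("hello"));
    }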


Union types are horrendous in practice because they're non-compositional. None|T is usually disjoint except when it isn't, and it's really easy to forget that case and fail to test it properly.

Having used Scala extensively, None as its own type is very much a mistake; it's not a type that you ever want and it only serves to get in the way.


Not sure I follow your argument, but I use union types extensively in TypeScript and I don't think they're horrendous at all.

E.g. it's very common to define a helper type like

    type Maybe<T> = null | undefined | T
and then with TypeScript's flow analysis I can easily determine when it's safe to call properties on a value, e.g.

    const foo: Maybe<Foo> = ...
    if (foo) {
        // Typescript now guarantees foo is not null or undefined
        // thus it's a Foo, so I can call any Foo methods on it
        foo.bar();
    }


> Not sure I follow your argument, but I use union types extensively in TypeScript and I don't think they're horrendous at all.

Have you ever used a language with a good implementation of sum types?

> and then with TypeScript's flow analysis I can easily determine when it's safe to call properties on a value

The problem isn't calling methods on null or undefined. The problem is when the value is null or undefined, but not for the reason you think (e.g. you think your code set it to null, but actually the generic code you called came back with null: T), and so you mix up the semantics.


While I get what you're referring to, the issue of Typescript getting this mixed up is not a real issue.

Why? If you write a function that takes T | null and returns T, then you're going to need to filter out your null.

Now, Typescript isn't safe, so you can have f: (T | null) -> T and give it a (null | null) and its return will indeed be null (at least at the moment, see [0]).

But your function's type guards will, at one point, filter out the null case. You might try doing this through some assertion, but if you do the runtime check, it will in fact blow up at runtime.

So Typescript will allow invalid code, but if you are actually doing type assertions correctly you'll get a runtime error at the "type narrowing" step. Worse than static verification of course, but better than carrying around a null that you think is not null.

Anyways, yeah, the conclusion is that in TS at least, T | null narrowing to T doesn't mean that T is not null. TS is at least smart enough to handle that.

[0]: https://www.typescriptlang.org/play?#code/GYVwdgxgLglg9mABGO...


Yes, it is an issue, but not for the reasons you think. For example, take these two APIs:

    // returns null if post doesn't exist
    async function loadPost(id: string): Promise<Post | null>;

    interface ResourceState<T> {
        // null if no data has been loaded yet
        data: T | null;
    }
    function createResource<T>(load: Promise<T>): Observable<ResourceState<T>>;
Both are largely designed to fit together (`createResource(loadPost(id))`), but if you do then you now have no idea whether `data` is null because the post couldn't be found, or because it's still being loaded. You would either have to use different nulls (null vs undefined) or have `ResourceState` box T somehow.

Sum types are safe from this because they can distinguish between `None` and `Some(None)`.
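A hedged Rust sketch of the same two APIs with nested Options instead of T | null (Post, load_post and ResourceState are stand-ins for the names above):

    struct Post;

    // None = the post does not exist.
    fn load_post(_id: &str) -> Option<Post> {
        None
    }

    // data: None = nothing loaded yet; Some(inner) = a load finished and
    // `inner` is whatever the loader produced.
    struct ResourceState<T> {
        data: Option<T>,
    }

    fn main() {
        let state: ResourceState<Option<Post>> = ResourceState {
            data: Some(load_post("missing")),
        };
        match state.data {
            None => println!("still loading"),
            Some(None) => println!("loaded: post not found"),
            Some(Some(_post)) => println!("loaded: got the post"),
        }
    }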


Right, I agree with that. I think in practice this is largely circumvented by ad-hocing tagged unions in these cases.

Higher level, I think that it's fairly rare to see Optionals outside of the _very_ basic cases, and downstream of that you're actually not flinging around nulls or undefined as data. Instead everyone reaches for a kind attribute. Especially in cases like you're talking about (where there's this notion of not finding something, but also this notion of something still loading).

Not to say the distinction doesn't matter, but TS feels well designed in the sense that all of its unsafety is in places that end up not coming up in many "normal" codebases


> But your function's type guards will, at one point, filter out the null case. You might try doing this through some assertion, but if you do the runtime check, it will in fact blow up at runtime.

The problem is the other side: if you filter out null then you accidentally end up filtering out part of the T case as well, when you only meant to filter out the case that was not T.

More generally, I submit that if you're doing your types properly you never want to collapse T|S into something different from a sum type. You have code that handles T which you expect to handle the cases that come from the code that yields T. You have code that handles S which you expect to handle the cases that come from the code that yields S. When a T turns out to be an S, or vice versa, that can only ever be a nasty surprise, or at best some code that works by accident.


Should T | null narrow to T | never inside a !== check ?

I'm not sure what this would imply.


On the other hand, explicitly writing coercions from T to Option<T> gets tiring real fast. I'm sure it should be possible to design a typing+coercion algorithm that automatically coerces `x` into `Some(x)` in 99% of the cases that are actually unambiguous (either due to the type of `x` being known not to be an option or due to parametricity), which would be the best of both worlds.


Not the parent, but I have used Standard ML, OCaml, Scala and Swift, and I like union types as well. They are strictly more powerful than sum types. If I want to create an Option type such that Option<Option<T>> = Option<T>, then that's possible. With sum types only, it is not. Yes, this gives you Option<T> = T in situations where T = Option<S>. If I want to have a sum type, then I can just create a distinguishing "kind" field, as you do in TypeScript.

The argument against union types would be, that they are an advanced feature compared to sum types, and sum types are simpler to use right. The argument against sum types is that they are moving the role of types from descriptive to prescriptive, and that is what leads to the problem Hickey is pointing out.


What happens in the following case?

    const foo: Maybe<boolean> = false;
It seems to me that your code wouldn't behave the way you'd expect.


If you want to cover this particularly case, strict-boolean-expressions in tslint should catch it.

That being said, I agree it's an unfortunate footgun and think tsc should yell at you for using non-boolean types in conditionals if strict is on.


That’s because of the "truthy-ish" nonsense mess inherited from JavaScript.

You could say the same with 0, "", NaN.

That being said, I wonder what was the logic behind the decision to implement that.


Coercing nullishness/undefined-ness to false arises from languages like C. It is an unfortunate footgun, but not what most people think of when they think of Javascript's coercion weirdnesses.

Typescript supports this because it is a strict superset of Javascript.

They certainly could have made TS strict mode complain about it though.


Would union types be manageable if you only use them with concrete types rather than generic or inferred ones, or is it too easy to accidentally refactor a None into one of the cases (possibly hidden behind a typedef)?


A feature you can't use with generic types is pretty useless, frankly. So "manageable" yes, but the cure is worse than the disease.


I wouldn't say non-generic union types are useless; in fact the vast majority of sum or union types I write in application code (not libraries) are a combination of two or more concrete types, rather than generic ones.


What does it mean for None|T to be disjoint?


"If it's None then it's not T". It's very natural to write code that handles the case where the value is None, handles the case where the value is T, but subtly malfunctions if T can be None (e.g. you might write a cache and use None to represent the value not being present in the cache - but then your cache silently fails to cache if the thing you were caching returns None).


How is it natural to write code like that?

If it's None, don't put it in the cache. If T is None, don't put it in the cache.

if (typeof input != None) putInCache(input)


> If it's None, don't put it in the cache. If T is None, don't put it in the cache.

> if (typeof input != None) putInCache(input)

Exactly, now you've just written exactly the bug I was talking about.


If T is int|None, then None|T can't distinguish between None and T(None), whereas None|Some(T) can distinguish between None and Some(T(None)).


If T is int|None, then None|T is None|int (and int|None), it's just a logical union of all possible values. You're complaining about issues with some weird Rust-enum-wrapper-like unions, which is exactly the problem proper union types resolve.


> IF T is int|None, then None|T is None|int (and int|None), it's just a logical union of all possible values.

Which is confusing and introduces subtle bugs. It makes it impossible to reason about any part of the code in isolation, because you can't understand the behaviour of None|T unless you know what T is.


I think it’s only confusing if you use None as a _magic value_ with a meaning dependent on context. Like if you use -1 as a magic value of an int, meaning not an actual negative one (which you could do arithmetic on) but “infinity” or “default value” or “use an alternative method of calculation instead”. I think it is known that magic int values always come back to bite you later.

Same with None. None means “absent value”. If you use it like that, no confusion at all. Absent value or int is absent value or int. However, if in some places None means “absent value”, in others “error value”, in others “infinity”, in others “empty collection”—due to programmer’s laziness instead of using proper types, then _surely_ it will be impossible to reason. But not due to union types, I think.


> I think it’s only confusing if you use None as a _magic value_ with a meaning dependent on context.

Which is essentially every use of None? Like, the whole point of an option type, an X | None, is that None is not an X, and means something different.

> I think it is known that magic int values always come back to bite you later.

But the reason a magic int is a problem is because it's also a valid value. If you use -1 as a magic value, you will get confused because you can't tell whether it was whatever magic meaning you meant or the actual value -1. The whole point of using int | None is to avoid that problem, because None is never a valid int. Unfortunately if you have inclusive unions then that breaks down - you use T | None because None is never a valid T, but then if you try to use that with T = int | None, whoops.

> Same with None. None means “absent value”. If you use it like that, no confusion at all. Absent value or int is absent value or int.

What does that mean? I don't think there's any universal notion of "absent" that applies to every function in a single program, much less every program.

What None means is context dependent, sure, but that's fine as long as that context is local; after all, what e.g. 3 means in your program is also context dependent (maybe it means "3 users" or "position 3 in the array" or "file not found"). If you have proper nesting options, then maybe you'll compose together three layers and in the end you have some value where None means "not cached" and Some(None) means "error computing" and Some(Some(None)) means "empty collection" - but that's absolutely fine, each layer knows how to handle its own option and knows what None means in that context. The problem only comes when you have union types, because then you can't compose your layers without them interfering with each other.


> Like, the whole point of an option type, an X | None, is that None is not an X, and means something different.

Absolutely not! X|None does not mean X is never None. It's just a logical union (∪) from school. In this case it means "all possible values of X and also None if it was not in X".

For example, a function may take something like "Indexable<T> | Iterable<T>", and it is fine if it is passed Vector<T> which is _both_ Indexable<T> and Iterable<T>, the sets are not disjoint.

I think this is the root of our mutual misunderstanding.

> I don't think there's any universal notion of "absent" that applies to every function in a single program, much less every program.

I agree it's sometimes hard to maintain the same semantics over a codebase, but that's because software architecture work is hard.

Number 5 should mean approximately the same over the entire codebase, and for sure programmers will find a way for it to mean different things in different parts of the code, but that's what makes it sloppy code that is difficult to maintain!

Similarly, if None means different things in different parts of a program, that's not a fault of mathematical logic or union types, that's just sloppy programming!


> Absolutely not! X|None does not mean X is never None. It's just a logical union (∪) from school. In this case it means "all possible values of X and also None if it was not in X".

Types are not sets, and thinking of them as sets will lead you astray.

> For example, a function may take something like "Indexable<T> | Iterable<T>", and it is fine if it is passed Vector<T> which is _both_ Indexable<T> and Iterable<T>, the sets are not disjoint.

Only because you don't care whether Vector<T> is processed as an Indexable<T> or an Iterable<T> - which is because you know that it implements both in a way that's consistent with each other, which is because you know there's a relationship between those two interfaces. But that kind of relationship ought to be expressed in the type system (in this case Indexable<T> should probably be a subtype of Iterable<T>), at which point you don't need to use a union at all.

The key use case for a union U | T is when the two types U and T are unrelated. And in that case, if you passed a type that happened to implement both U and T, you would very much care about whether it was processed as a U or as a T.

> I agree it's sometimes hard to maintain the same semantics over a codebase, but that's because software architecture work is hard.

> Number 5 should mean approximately the same over the entire codebase, and for sure programmers will find a way for it to mean different things in different parts of the code, but that's what makes it a sloppy code that is difficult to maintain!

> Similarly, if None means different things in different parts of a program, that's not a fault of mathematical logic or union types, that's just sloppy programming!

A codebase has to work up from the generic to the specific. Ultimately programming is the art of translating a business problem into a bunch of 1s and 0s; it would be absurd to demand that every 1 or 0 has the same semantics everywhere in your program. Just as at the very low levels you have code that interprets a bitpattern as a number or a character or an enumeration, at a slightly higher level you'll have code that interprets a collection as meaning exclude/exclude/transform or a value as meaning target/default/.... Particularly in library code, you don't necessarily know what the objective semantics of the values you're working on are. And all that's fine and normal - for most code, the internals of the value you're working on are and should be a black box - e.g. a sort function doesn't and shouldn't know or care whether the values it's sorting are numbers or strings, or whether one string is alphabetically before or after another - all it knows is that it has a collection and a way to compare elements of that collection. You should be able to use the same sort function to sort a collection forwards or in reverse, even though those are the exact opposite of each other.

It only becomes a problem if you mix up your layers - e.g. if you somehow pass a bitpattern that was meant to represent a number to a function that thinks it was meant to represent a string, or if your sort function confuses the magic value that was returned from the comparator with one of the values it was meant to be sorting. That's not (just) sloppy programming, it's poor language design, because you shouldn't even be able to make that kind of mistake.


It seems like the issue then is that int can be nullable. The solution is to make int an int. You cannot assign None to int.

I'm not an expert on programming languages, but afaik this is how TypeScript operates


That's not quite what the parent is trying to say. Assume that int is not None. Create a type called T that is int|None - a term of type T can be an int, or it can be None.

Now have a function whose argument is T|None. Substitute in the definition of T and you get (int|None)|None. If the function's argument is None, what does that mean? Was it given a T that happened to be None, or was it not given a T? Nobody knows.


> Was it given a T that happened to be None, or was it not given a T?

Hmm. It _feels_ like a function shouldn't ever need to differentiate between "what the caller thinks it has" (that is - being passed a "T-which-is-None" should have the same significance as being passed a "None"), but I'm not confident claiming that would ever be the case.


Consider an expensive computation that may have no answer, and so it returns (after some heavy processing) "int|None".

Also consider a cache lookup function that takes some key and returns a generic T|None for T the type of cache entries, with None signifying the key was not found.

Both are, on their own, pretty reasonable things to have.

If you are using union types and you put the results of your expensive computation into your cache, you will not be able to tell when you've done an expensive computation for some key and got a None result, or when you haven't got a result in the cache for that key.


You can have both union types and sum types (imo ideally), or in this case can easily choose to represent the result of the expensive computation in a more reasonable/descriptive way.


Perhaps the more reasonable/descriptive way to represent the potentially no-value result of the computation is with a sum type, not a union type?


> If you are using union types and you put the results of your expensive computation into your cache, you will not be able to tell when you've done an expensive computation for some key and got a None result, or when you haven't got a result in the cache for that key

This is certainly an issue, but that's an issue for the _overall system_ of a cache which stores T. My claim wasn't that "Union types that Union with None can never cause issues", but rather that, for a function whose argument is `(int|None)|None`, there shouldn't ever be any different behaviour _of that function_ between "passed a T (int|None), which was None", and "passed a None (not a T)". The function itself should still behave the same way.

You're right, of course, that _returning_ an (int|None), where "None" might mean "the answer is definitively known to be None" or might mean "the answer is unknown, and that is represented as None" can lead to unnecessary recomputation - but that's an issue of return types, not of parameter types.


One function's return value is another function's parameter. If the overall system has a bug, that's a problem, and I don't see any value in trying to quibble about whether it's a bug in the caller or the callee.


> I don't see any value in trying to quibble about whether it's a bug in the caller or the callee.

Right, neither do I - like I said, "that's an issue for the _overall system_ of a cache which stores T.". The bug is that a given type (None) has different meanings to different components of the system, and this bug only arises _because_ "One function's return value is [being passed, directly, without any interpretation, as ] another function's parameter". If the type signatures were changed so that interpretation was required - so that "the answer is None" could be distinguished from "I don't have an answer" - then the bug in the overall system goes away.


> this bug only arises _because_ "One function's return value is [being passed, directly, without any interpretation, as ] another function's parameter".

Which is the normal way of programming. If you can't safely compose functions without adding an extra layer of interpretation between them, programming becomes much harder.

> If the type signatures were changed so that interpretation was required - so that "the answer is None" could be distinguished from "I don't have an answer" - then the bug in the overall system goes away.

Which is something that using sum types rather than union types achieves by default. Wherever you want to localise the problem, union types add a big, easy class of ways to shoot yourself in the foot that just aren't there if you use sum types.


It is useful to differentiate this, though. Consider a map datatype with values of type T. The function that returns the value for a key returns None when the key is not in the map, but what if the key is in the map and the actual value is None? IIRC Lisp has a workaround, but it requires special code to handle maps that can have None as values.

As the other posters have said, it (automatically flattening the Maybe monad) is non-composable and should be considered bad language design, like it was a bad idea to automatically flatten lists in Perl.


Arguably, composing/depending on the composability of optional values is also a code smell. A Some(Some(None)) may occasionally be okay, but it is almost always better represented by an explicit type. Also, Clojure can better express these in-between states, as demonstrated by the very good talk (I swear Hickey always sells me on Clojure... I really should write something more serious in it)


That's why in Python that raises a KeyError, or if you explicitly use .get() you can specify anything as your default value.


Which is why you end up with the painful

    sentinel = object()
    value = dictionary.get(key, sentinel)
    if value is sentinel:
        ...  # key was genuinely absent
Which, sure, it works, but it's working around a problem that didn't need to exist in the first place. Why not just have Option work the way you expect, and be able to contain None the same way as it can contain any other value?
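For comparison, a small Rust sketch of how that reads when the map's lookup returns an Option that can itself hold a stored None (names invented):

    use std::collections::HashMap;

    fn main() {
        let mut cache: HashMap<&str, Option<i32>> = HashMap::new();
        cache.insert("computed-but-empty", None);

        // `get` distinguishes "no such key" from "key present, stored value is None".
        match cache.get("computed-but-empty") {
            None => println!("key not in cache"),
            Some(None) => println!("cached: the computation had no result"),
            Some(Some(n)) => println!("cached: {}", n),
        }
        assert!(cache.get("never-inserted").is_none());
    }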


Well no, just handle KeyError :D


If you do that you still have the same problem, although the conditions for triggering it are narrower - you can't tell the difference between "your" KeyError and a KeyError from the callback you were passed.


It's probably subjective, but I don't see the need to wrap your code in a try/except block as an improvement over an optional sum type, and I really don't see the use of a magic value to indicate no value as an improvement over anything.


If it's not a try it's an if... you still need to handle the case where the key wasn't present and figure something out.


I don't think you should write code where it makes a difference if the None came from a T that happened to be None or from a None.


Why would you ever use T | None otherwise? As far as I can see the only reason to ever use T | None is because you want to write code that does one thing if it's a None and another thing if it's a T. Having it do the other thing if that T happens to be None makes no more sense than having it do the other thing if that T happens to be 5; at best it might work by accident, more likely it will be a subtle bug.


Let's try this in python…

    >>> T = int|None
    >>> T|None
    int | None
Yep, quite simply this is solved by flattening the unions and removing duplicates :)


Yes, that is the problem: imagine you write a cache and use None to represent the value not being present in the cache - but then your cache silently fails to cache if the thing you were caching returns None


As I already wrote in a different comment, KeyError and None aren't the same.


That’s not “solved”, that’s highlighting what the problem is.


Wait so in Rust you can’t have an Option<Option<T>>? That’s really bad!


> Wait so in Rust you can’t have an Option<Option<T>>?

You can, Rust has sum types rather than union types.

> That’s really bad!

I agree, but some people seem to like union types for some reason.


You can, because Rust doesn’t support first-class unions, and uses ADTs. Something like TypeScript allows you to write “(T | null) | null” but that is not the same thing as a double-Option’d type.


Rust actually does have an untagged union type, but it generally requires some unsafe to use, and it doesn't seem to get much use beyond C/FFI.


but the untagged union is still an aggregate at the type level right? Your data is a union at runtime, but at the type level it's a distinct thing, yeah? TS in particular is offering bespoke anonymous unions, which is a different sort of beast.


Of course, Rust being nominally typed it is a distinct thing, while TS is structural. But that aspect should affect much more than just unions (that said, I personally know very little about TypeScript so take my opinions here with a grain of salt).


It was added specifically for c/ffi, so this is cool.


In Rust you can, in Python an Optional[Optional[int]] is the same as an Optional[int].


You can have this, I’ve used it in Rust before.


I use this crate to avoid the boilerplate https://jeltef.github.io/derive_more/derive_more/from.html


Whenever I design a lang, I just give option semantics to types of the form `x|none`


> which can be transparently converted to an Option<T> for any T, allowing for type inference to go in only one direction, while still avoiding the boilerplate of Option<T>::None

You don't need to qualify the None, the following works:

    fn foo<T>(x: T) -> Option<T> {
        None
    }


Yes. That's thanks to Hindley-Milner type inference, which makes the compiler infer which None you mean. But in longer code snippets, full Hindley-Milner type inference can cause hard-to-understand type errors. Here specifically the compiler infers the type of a value in your function body based on the type of the function it belongs to, which is the opposite direction from how a programmer typically thinks. It leads to spooky action at a distance.


> It leads to spooky actions at a distance.

I'll disagree on "spooky". If the type is ambiguous, the code will fail to compile. Contrapositively, that means that if the code compiles, the types aren't ambiguous. I find bidirectional type inference to be absolutely lovely (or at least Rust's implementation of it), and I wouldn't give it up.


As another poster said, None is in the prelude. It’s been explicitly imported, just… implicitly so. Other than that, Rust places strong emphasis on function signatures. They override everything else, such as inferred return types from the function body. This is to ease composition and reading (as far as I remember). That is to say, in Rust it is quite natural to refer back to the function definition, potentially to let it do inference heavy lifting, or work with the question mark operator. It doesn’t feel the wrong way.


> But in longer code snippets full Hindley-Milner type inference can cause hard-to-understand type errors.

I would love to see examples of this. We've gotten much better on this front over the years, but I'm sure there are plenty of cases yet to be addressed.

The "problem" with improving error messages is that the common and easy cases are addressed early, leaving only the uncommon and difficult to address left after a while, and people get used to the understandable errors which leaves them baffled when they encounter one that isn't.


It's actually not. There's simply a 'use Option::{None, Some}' built-in to make them easier to work with.


The GP is referencing cases where inference can't figure out what the T is in Option::<T>::None. This can happen in the body of a closure without an explicit return type, for example. To solve it you have to specify the type either earlier in a place that helps inference (in the example, add a return type) or in the expression, like None::<()>.
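A minimal sketch of that situation (nothing beyond std assumed):

    fn main() {
        // Without help, the compiler has nothing to infer T from:
        // let f = || None; // error: type annotations needed

        // Either annotate the closure's return type...
        let f = || -> Option<u32> { None };
        // ...or pin down the None itself with a turbofish.
        let g = || None::<u32>;

        assert_eq!(f(), None);
        assert_eq!(g(), None);
    }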


Yeah sorry I apparently can't read, it was pretty obvious. Thanks.


> Unidirectional type inference would make for more intelligible compiler errors. Full Hindley-Milner can get confusing.

Bidirectional type checking has good error messages and a very intuitive implementation.


Rather than just taking a statement from OP and stating the opposite, can you elaborate on why you disagree with them?

My personal experience has been very much in line with OP's: nonlocal type inference (in Rust specifically) frequently makes code hard to modify because a change can cause inference to fail in unexpected ways and places. Local inference doesn't have the same tendency.


They didn't state the opposite, bidirectional typing rules are a technique to describe a type system, distinct from both Hindley-Milner's and simple "unidirectional" checking.

I'm using it for my personal lang project and I'm quite satisfied as well, it translates easily into code and interacts well with subtyping.


> most of the time I actually want union types, not sum types.

Really? This surprises me to the point that I'd like to ask what you are coding.

I'm having a very difficult time picturing "You have A|B|C|D and you send in an A and extract a D" (union type without tag). The only way to do that is to guarantee memory layouts, which is something Rust explicitly does not do.

What am I missing?


You're thinking about C unions, not union types. Union types behave more like unions in set theory. Sum types behave more like disjoint unions. In particular, if you have a union type A|B, then A is a subtype of A|B. If you have a sum type A+B, then A is not a subtype of A+B, because for A=B, A+B has two instances of A, i.e. A+A != A, so for each element you need to be able to distinguish which instance this element is from. Whereas with union types A|A = A. It doesn't mean that union types are as willy-nilly as unions in C. It just means that you can assign variables of type A to variables of type A|B without any ceremony. If you want to extract an A from A|B, you need to write a runtime check (just like match in Rust with sum types). It does mean though that you can assign a value of type B to a variable of type A|B and then extract an A, if that value was in the intersection of A and B.
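A small sketch of the A+A != A point in Rust terms: the same payload type can appear under two different constructors, and the tag keeps them apart, whereas a union type would collapse i32 | i32 to plain i32 (names invented).

    enum Source {
        FromCache(i32),
        Recomputed(i32),
    }

    fn describe(s: Source) -> &'static str {
        match s {
            Source::FromCache(_) => "came from the cache",
            Source::Recomputed(_) => "was recomputed",
        }
    }

    fn main() {
        // Two values carrying the same i32 remain distinguishable.
        println!("7 {}", describe(Source::FromCache(7)));
        println!("7 {}", describe(Source::Recomputed(7)));
    }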


I understand both the mechanics and the implementation. What I was asking for was the use case.

Making assignment easier and compile time at the cost of making access costlier and runtime seems like an unusual tradeoff. I'm very interested in the case that makes that worthwhile.


Most of my enums are sums of types that don't have any intersection. You can think e.g. of a type Token that is a sum of Identifier, Number, etc. It's annoying to keep wrapping Identifier in Token, instead of just assigning. I solve that by writing generic functions that accept anything that implements the From trait, but it's still boilerplate all the same.

There is no shifting of cost from compile time to runtime, or slowing down access. Just like with sum types you're writing match clauses from time to time, with union types you do the same. If you need to inspect the discriminant, then you need to inspect the discriminant.


There is some runtime complexity here though.

With both enums and union types, if you want to differentiate the types at runtime, you need to include a discriminant in order to be able to do that. Enums include that discriminant explicitly (that's what the `Token::Identifier(...)`/`Token::Number(...)` part represents). The memory layout for, say, a base identifier can consist only of the necessary fields for that identifier, but when it gets wrapped it will get an extra field that identifies which variant of the enum it is.

With union types you have two options:

* You always include a type tag directly in the representation (so a base identifier always has an extra byte or so of information with it). This is typically used in dynamic languages where the types are known at runtime anyway. I believe it's also the case in Java and I suspect other VM-based languages are similar.

* You dynamically generate the enums at compile time (so a base identifier doesn't have a tag, but if it's used in a `Identifier | Number` union, then a tag for that specific union gets added in).

The problem with the first is that it adds unnecessary runtime cost, because now every instance of any type has extra information in it that probably isn't needed. The problem with the second is that the creation and unwrapping of enum values becomes implicit, and you'll probably need to juggle your discriminants around more. Say I have a function of type Identifier | Number -> Boolean | Identifier | Number, then should the discriminants for the Identifier and Number cases stay the same? Or do we need to add extra code to munge the input discriminants into output discriminants? In the first case, this adds a lot of complexity to the compiler (and may not in the general case be possible); in the second case we have runtime performance issues.

This is, I think, the point that the previous poster was getting at. If you have implicit discriminators, this will have an implicit runtime cost that may not always be easy (or possible) to avoid. That is the value of explicitly wrapping types in enums - the runtime cost can be minimised, and the developer is always aware of where that runtime cost appears.


I’ve used C++ variants (from a couple of different libraries) a fair amount, and they are sort of union types, and I generally dislike it. The problem is that logically identifying the contents based on type is confusing and refactors poorly.

Suppose I have an Identifier or a Number or a StringLiteral. To work intelligently, this needs Identifier and StringLiteral to be disjoint. This isn’t just a type theory issue; it’s a semantic issue. What is “foo”? Is it an Identifier or a StringLiteral? What happens when I add a Filename (for including/embedding files in the future)?

Logically, the fact that an Identifier is encoded by a particular type is a bit of an implementation detail, and a good type system will catch errors. It’s the fact that it’s an identifier — that is, it’s purpose — that matters for semantics and for understanding the code, and union types (and C++ variants, for the most part) are very bad at exposing this.


My issues with std::variant in C++ boil down to it being a runtime, not compile-time construct, and surprising interactions with implicit casts, especially in the case of primitive types. After that, unlike what I described about combining union types with lexically-scoped-newtypes, std::variant doesn't let you build sum types when you need them. You're stuck only with union types.

> What is “foo”? Is it an Identifier or a StringLiteral?

Normally, it's an identifier, since there are no quotes. If there is ambiguity about it, the lexical categories you're working with are ambiguous, which is a real issue. When designing a grammar, that should be avoided. Sadly, sometimes we're given a poor-quality grammar like in C, where some tokens can be interpreted both as headers and as string literals (but neither is a subset of the other) and can't do anything about it, but then you just need to decide which category you're going to assign to ambiguous cases (either one of two, or make a third one) and later you can disambiguate based on context. In any case, I don't consider it an issue of the tool, but an essential domain problem to be solved explicitly by the programmer.


>> What is “foo”? Is it an Identifier or a StringLiteral?

> Normally, it's an identifier, since there are no quotes.

Sorry, I was vague. I don't mean the sequence f o o in the input or the sequence " f o o ". I mean the string (String, str, whatever your language calls it) containing the three letters f o o, which is stored in a union type.

Sure, one can lexically-scoped-newtype it if the language supports it, but at that point, unless I'm missing something, it might as well be a sum type.


Transferring costs from compile time to runtime is pretty much the signature move of dynamic languages. If you already carry around type information and check it at runtime, why not just add union types? Your implicit conversion is zero cost at that point, after all.


If I were changing the signature of `do_something` in Rust and wanted to avoid breaking callers when changing the function from accepting `T` to `Option<T>` , I'd do the following:

    fn do_something(value: impl Into<Option<T>>) {
        let value = value.into();
        // remaining function body using `value` as an `Option<T>`
    }
Existing calls to `do_something` will continue working because `Option<T>` implements `From<T>` (and `Into` implementations are generated in the opposite direction whenever `From` is implemented). Passing in `None` will also work because every type `T` implements `From<T>`, and therefore `Into<T>` as well.

I'd even argue that in a lot of cases, this is worth doing even when creating new library functions (where there are no existing callers to worry about). In my experience, it's generally better to keep the API clean from a caller's perspective by taking on extra boilerplate internally in the implementation rather than shift the burden to the callers of the API in order to keep the library's implementation clean.
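For illustration, a self-contained version with a hypothetical concrete parameter type, showing that old call shapes keep compiling:

    fn do_something(value: impl Into<Option<u32>>) {
        match value.into() {
            Some(n) => println!("got {}", n),
            None => println!("got nothing"),
        }
    }

    fn main() {
        do_something(5u32); // the pre-existing call shape still works
        do_something(None); // newly possible without wrapping in Some
    }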


This is a breaking change too, as it can break type inference: https://play.rust-lang.org/?edition=2021&gist=8d528bcec2b92b...


Interestingly, even though it's a breaking change, breaking inference is explicitly defined to not be a semver-major change in Rust: https://predr.ag/blog/some-rust-breaking-changes-do-not-requ...


Often it's better to do the conversion and then immediately call a non-generic inner function. This prevents the rest of the logic from getting monomorphized. It’s a pattern you’ll notice in std.
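A minimal sketch of that pattern, reusing the hypothetical do_something from above: the generic shim only does the conversion, and the body that would otherwise be duplicated per caller type lives in a private non-generic function.

    fn do_something(value: impl Into<Option<u32>>) {
        // Only this thin shim gets monomorphized per argument type.
        do_something_inner(value.into());
    }

    fn do_something_inner(value: Option<u32>) {
        // The real logic is compiled once, not once per caller type.
        if let Some(n) = value {
            println!("got {}", n);
        }
    }

    fn main() {
        do_something(3u32);
        do_something(None);
    }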


I came away with a different interpretation of Rich Hickey’s talk. I think he was advocating against using slots, where a slot would be a struct field. A slot has to have a value and if a value is not available then null is put there and now null must be dealt with. Instead I think he was advocating for using dictionaries. You either have a key or you don’t, but you never use null. However it seems that now you would have a bunch of “key in dict” checks which are less efficient than “value is null” checks. Plus confusion about what exactly has been supplied in a dictionary. Although there may be cases where code can just sequence through what exists without worrying about what doesn’t exist.


I think the fact that it's impossible to know what could be in a dictionary without explicitly checking makes dicts an awful solution. A lot of times I'll encounter functions that take maps, but I'll have to reverse engineer the surrounding code (and if I'm particularly unlucky, a related service) to figure out what goes in there. I'll take explicit types over dicts any day.


The article is wrong. Rich Hickey is talking about the surface of the function, i.e. its interface. The implementation details do not matter in this case.


The article is claiming that the surface and the implementation are related.

If client code passed a null value before, an error would have occurred. Now it is handled with a default that the original code was not accounting for, and this might be bad.

Thus the change in interface _is_ a change in behavior.


The caller could NOT pass a null value before.

Old situation: called function says: "i would crash if you gave me a null value, so my interface says you cannot give me a null value"

New situation: called function says: "i no longer crash if given a null value, which you couldn't do before anyway, so you won't notice any difference"


I believe this is exactly what Hickey was arguing.


But they could pass null, though. Clojure doesn't stop you from passing null anywhere.

Also, thrown exceptions are values. Consider for example a function that searches for a file and returns null if it's not found, which you are composing with a function that opens a file but throws on null. You might very well write something like open(search(...)) and catch exceptions above that level. Now if I make open(..) able to accept null (maybe it opens a temp file?) then you need to add a null check on the return value of search to get the old behaviour. That is 100% a breaking change!
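
To make the scenario concrete, here is a rough Rust translation; the `search`/`open` helpers are hypothetical, and "throws on null" is modelled as a failure that has been replaced by the relaxed fallback:

    use std::path::PathBuf;

    fn search(_name: &str) -> Option<PathBuf> {
        None // returns None when the file is not found
    }

    fn open(path: Option<PathBuf>) -> PathBuf {
        // Old contract: fail on a missing path, so callers detected
        // "not found" by handling the failure above this level.
        // New, relaxed contract: silently fall back to a temp file.
        path.unwrap_or_else(|| PathBuf::from("/tmp/scratch"))
    }

    fn main() {
        // A caller composing the two and relying on the old failure to
        // signal "not found" now silently gets a temp file instead:
        let f = open(search("missing.txt"));
        println!("{}", f.display());
    }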


> Clojure doesn't stop you from passing null anywhere

As a C++ programmer (oh, the horror!) I'd say that's an issue with Clojure rather than the general principle. And also one of the reasons I like the pointer/reference distinction in C++ (the former allows null values, the latter does not).

But sure, that changes my interpretation somewhat. It doesn't really change program safety, however.


But you already had to have a null check on the return value of search if open wasn't previously able to take null, so the behaviour of your code hasn't changed.


Why? You may just have been catching the exception outside both functions. Lots of actual real-life programs do that, even if it's not the cleanest.


If the code passed in a null value before, it was violating the contract the function provided. A component isn't responsible for what happens when you operate it out of spec.

If I were a hardware designer and some changes to an IC I'm working on are going to make the next batch, say, work properly in hotter temperatures than it could before, and one of my coworkers comes and says, "Don't make that change, what if someone using that IC is deliberately operating it out of spec expecting it to fail and now the behaviour is going to change!" I'm going to start polishing up my resume, because I'm working for an organization that employs lunatics. Happily, that wouldn't happen in the hardware world; unhappily, I'm a software developer rather than a hardware designer.


Relevant XKCD, "keyboard heating":

https://xkcd.com/1172/


Which no client should/would have been using (given the error they would have received). Why would this new implementation be presumed incompatible, when any reasonable person would think this is an addition to the interface, and not a difference to the existing interface?


> no client should/would

No client could, even. He's talking about static types so it wouldn't have been possible even to express a caller not providing a value.


Almost certainly, the behaviour for all possible values that could have been passed to the function has not changed, merely some new ones have been added. If that's true, then Hickey is right, it shouldn't be considered a breaking change.

If at the same time as increasing the domain of the function, some of the return values changed for some possible values, then it is a potentially breaking change (although one not necessarily reflected in the API), and needs to be communicated. Whether it really is considered a 'breaking' change will depend on a fuzzy evaluation of whether it was legitimate for the client to make such assumptions about the return value, and will most likely depend on how the function was documented.

That case, where the function returns different values to what it used to, and which may break code that had expectations about those values, is a change that can happen any time and is not really related to the question of what to do when a function domain increases.


> You are changing the behavior of the function. That is a breaking change.

That makes almost everything breaking.


Well, yes. Any time you change a function’s behavior, you’re risking breaking the behavior of its callers. That’s just the nature of programming and doesn’t seem like a particularly controversial statement, honestly.


One of the big benefits of using functions is encapsulation: the idea that you (as the caller) do not need to understand how a function does what it does, you just need to know what it does (perhaps along with some performance guarantees so you know it’s not going to do e.g. an O(n^2) operation). Beyond that, I don’t want to care and should not generally have to care about what the function is doing. As long as I get out (including side effects) what I expect based on what I put in, the internal behavior of the function can change any which way and I would not consider that breaking, because my code—using that function—would still run just fine.


Yes, I agree. I probably should've been more clear that when I said "behavior," I meant "behavior observable by the caller."

Going back to the original article, it seems reasonable to me that the following qualifies as such a user-observable change:

    foo(null) # throws InvalidArgumentException

    updated_foo(null) # the same as foo(0)


Thank you for the clarification; that's a much more compelling argument, and I think I am inclined to agree. I believe, however, that that is not what Hickey is saying. He's not saying "previously you could pass null and it would throw an exception, and now you can pass null and it won't throw an exception". He's saying "Previously if you tried to pass null you'd get a compiler error, but then I figured out that the method as written can handle nulls, so I changed the type signature to allow null to be passed in without the compiler throwing an error about it. See the updated documentation to understand what it does if you pass in a null value."

This isn't a breaking change because up until that point any code that was written to use that method already wasn't passing null (because it couldn't, because it wouldn't compile). The method's behavior hasn't changed, just its type signature, and so for any of the arguments that the existing code might possibly pass to it, it will still handle all of those exactly the same as it would have before (because, again, the implementation did not change).

Therefore, it is not a breaking change.


Then why talk about “breaking” changes? Just say “changes.” Of course, the reason we use the phrase “breaking changes” is that we are making a distinction between certain types of changes.


Whichever line in the sand you draw as an API boundary, if the break doesn't propagate across it then the change isn't breaking in that context. Suppose (as a crude example to simply illustrate the point), a codebase exposes some CRUD operations. Internally it relies on sorting methods and comparator methods for that sorting. If you reverse the order that sort works and also reverse all the comparators then internally you've made a bunch of breaking changes, but externally you can describe the change as non-breaking (probably -- performance regressions can be tricky). The change itself was both breaking and non-breaking, and the choice of description depends on who you're describing it to.
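
A toy sketch of that idea (the names are made up): the internal comparator's contract flips, which is breaking from the inside, but the public API compensates, so external behaviour is unchanged.

    // Internal comparator: its contract flips, which breaks internal callers...
    fn cmp_desc(a: &i32, b: &i32) -> std::cmp::Ordering {
        b.cmp(a) // was `a.cmp(b)` before the change
    }

    // ...but the public API compensates, so external behaviour is unchanged.
    pub fn list_ids(mut ids: Vec<i32>) -> Vec<i32> {
        ids.sort_by(cmp_desc);
        ids.reverse(); // was a plain ascending sort before; net result is identical
        ids
    }

    fn main() {
        assert_eq!(list_ids(vec![3, 1, 2]), vec![1, 2, 3]);
    }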


> externally you can describe the change as non-breaking (probably -- performance regressions can be tricky)

Yes, in most programming languages/environments it is always possible to write your application in such a way that any change of a dependency will break your app. Heck, you could throw an exception if someLib.version != “1.0.2” and then complain that a patch to 1.0.3 is actually a breaking change. Or your app could test that some API method does not exist, then complain when that method gets added later. Or you could complain when performance improvements reveal race conditions in your app (or break your usage of your computer as a space heater).


You can improve performance without changing the function's interface. Or remove a bug which causes an unrecoverable error. Or fix a memory leak. Or improve the error message on an internal assertion. Or add new methods to an object's interface. These would not change the extant observable behavior for most problem domains (although you could probably reasonably argue that any changes in performance characteristics are potentially breaking changes in e.g. game development), and are thus not "breaking", but are nevertheless useful changes.


A breaking change occurs if the function (before the proposed change) can be called in a way that doesn't crash (stop the program, render it inoperable, blow up at compile time, or basically bring the show to a halt), and after the change, that call yields a different result or has a different effect or output.

We can divide those breaking changes into two:

1. The call obeys the specification.

2. The call, though apparently successful, circumvents the specification, relying on undocumented behavior.

The question is how much we care about 2, and there is no 100% answer.

There can be situations in which some kinds of type-2 breakages are such that we care about them more than type-1 breakages.

Suppose there is a certain correct, documented way of using the API, but almost nobody out there uses it that way. And suppose there is some undocumented way of using the API, which millions of installations use, thousands of times per second.

Suppose we need to implement something new, or even more importantly, fix a critical bug, and suppose that the work boils down to breaking either one or the other. It may be better to break the former to keep the latter working (and possibly elevate the latter to documented status).

In other words, in a perfect world we'd like to say that breakages of type 1 are non-negotiable, whereas 2 can be debated. But we can't even do that.


I've read the article and watched the video that it is a reply to.

I both agree with the author and disagree.

I disagree that Rich Hickey is wrong when it comes to whether those changes should be breaking or not. Those changes can be non-breaking, unlike what the author of the article claims.

But I agree with the author that the fact that they are breaking doesn't really matter.

Rich Hickey mentions this himself in his talk. He says that no one talks about the costs of using Option or Maybe. So then he lays out the costs.

And I was...not that impressed.

Sure, this change that should be non-breaking is breaking. And your downstream clients will have to change their code. That seems bad.

Until you realize that the only change they would be forced to make is to delete what is now dead code.

Sure, it's annoying; I don't deny that. But Rich Hickey claims that it increases code maintenance to have to make the change. I whole-heartedly disagree because any time you can safely delete code, you are making maintenance easier.

Plus, despite all of his complaints that people don't consider costs, he never really considers the costs of his proposal, the biggest of which is the complexity of the language.

I think the industry has rightly come to the conclusion that, all else being equal, and even sometimes when they are not, a more complex language is a worse language.

I think proper union types, the kind needed here, would add enormous complexity to the language for little benefit in rare cases. (Because how often do programmers relax constraints? I don't think they do that often.)

In my language, I'll keep Option as it is, and make such changes breaking, in order to avoid the complexity of union types.


> I think the industry has rightly come to the conclusion that, all else being equal, and even sometimes when they are not, a more complex language is a worse language.

That was a theory behind golang, and the observed result is that you're better off having a little bit more language constructs that may be misused, than having essential complexity in your problem space that you can't easily describe in your solution space, for lack of tools.


That's why I said "all else being equal." Go took it too far.


> And your downstream clients will have to change their code. That seems bad.

> Until you realize that the only change they would be forced to make is to delete what is now dead code.

> Sure, it's annoying; I don't deny that. But Rich Hickey claims that it increases code maintenance to have to make the change. I whole-heartedly disagree because any time you can safely delete code, you are making maintenance easier.

When every one of 100 libraries you depend on does that once a month, you are forced to perform hours of daily unproductive and annoying busywork unrelated to your goals, just because every lib author wants to make your maintenance easier!

Of course, at some point you get fed up and just freeze all your library versions and stop ever updating your dependencies—which is arguably the opposite of what everybody should strive for.

> I think proper union types, the kind needed here, would add enormous complexity to the language for little benefit in rare cases.

Yes, and there's another, simpler way which has proved to work well: dynamic typing.


> When every one of 100 libraries you depend on, does that once a month,

That's not the rebuttal you think it is. I mention this specifically in my post: how often do programmers actually relax constraints? I don't think they do it very often.

> Yes, and there's another, simpler way which proved to be working well: dynamic typing.

Funny thing: I implemented something close to Clojure's map stuff in C. It really is dynamic, with checks at runtime.

Having only dynamic typing in a language is as much a mistake as making it complex.


> I mention this specifically in my post: how often do programmers actually relax constraints? I don't think they do it very often.

I think every time a new parameter is added to a function, or some new "mode" of calculation gets supported, or a new field is added to a struct, etc. I personally do that all the time, but it's very hard for me to estimate how often other programmers usually do that.

> Having only dynamic typing in a language is as much a mistake as making it complex

I am interested in your thoughts why.


Those changes you mention are actually breaking changes, and Rich Hickey would think so too. (Maybe he wouldn't on the new mode of calculation, but I believe he would for the others.)

The reason: he specifically mentions that tightening the contract should be a breaking change, and adding a parameter is tightening the contract. Same with adding a new field.

I'm only concerned with changes that should not be breaking changes, but are because of implementation issues.

That isn't to say that static type systems are automatically good. If a static type system cannot express dynamic types, it is no good because sometimes, dynamic types are needed.

Speaking of dynamic typing...

Having only dynamic typing means that you have to wait until runtime to catch every mismatch. That's a bad deal when that mismatch may happen in production.

The very existence of TypeScript, the existence of Spec in Clojure, and the fact that type annotations have been added to Python should be enough evidence of this. These were all languages that prided themselves on their dynamism; if they reneged, there's probably a good reason.

I mentioned above that having static types with no way to have dynamic types is a mistake. I believe it's just as much of a mistake as having only dynamic typing. They are both needed.

But personally, I would make static typing the default.


> and adding a parameter is tightening the contract

If a function took 3 parameters, and I've added another optional 4th one, how is that a tightening? All old callers are still using 3 parameters, their contract has not been broken.

Of course, adding a required 4th parameter is an incompatible change, if you meant that. I wouldn't call it strictly tightening though, since it's incompatible both ways. But yeah, excuse me for being vague, I meant adding optional parameters.

Similarly, adding a field to a struct which the caller does not have to initialize or mention at all, is, in my opinion, also not a breaking change and does not tighten. Since all callers don't need to change their code or semantics.


I thought you meant an extra required parameter.

Adding an optional one should not be a breaking change, and I have an idea or two of how to do it without breaking callers, even in a static language.

Adding a field to a struct should still be a breaking change because that field could be used. If the caller does not initialize it, you might have a bug.

If you're talking about Clojure maps, and not structs, that's completely different. But Clojure maps are open. Structs are closed.

Adding something to a closed thing is a breaking change. Why? Because of what I said above: the new thing needs to exist. Rich Hickey calls this "place-oriented programming," and he hates it for that reason. He's not entirely wrong either because yes, it can easily cause breaking changes.

But when you add something to an open thing like Clojure maps, that should not be a breaking change, and having implemented something like Clojure maps in C, I can tell you that it is not breaking, even in C.
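
In Rust terms (my own illustration, not Clojure), the open/closed distinction looks roughly like this:

    use std::collections::HashMap;

    // Closed: adding a field here breaks every existing `Config { .. }` literal
    // and every exhaustive pattern match on the struct.
    #[allow(dead_code)]
    struct Config {
        host: String,
        port: u16,
        // adding `timeout: u64` would be a breaking change for constructors
    }

    fn main() {
        let _c = Config { host: "localhost".into(), port: 8080 };

        // Open: adding a new key to a map-shaped value breaks nobody, because
        // callers never enumerated the full set of keys in the first place.
        let mut m: HashMap<&str, String> = HashMap::new();
        m.insert("host", "localhost".into());
        m.insert("port", "8080".into());
        m.insert("timeout", "30".into()); // new key; existing readers are unaffected
    }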


It’s interesting that your definitions of complexity and simplicity are very different from Hickey’s.

In “Simple made easy” he defines complexity as entanglement and simplicity as the opposite of that.

A proper union Option = T|None is simpler than a sum type Option<T> = Option{T, None}, because the latter complects the types into a container. It’s even worse with nominal types because it complects both the structure and the name.

I think you’re thinking of the complexity of the language implementation (compiler), or of runtime complexity?

That’s a fair argument! It highlights a different preference.


I meant that, but I also meant the definition he uses.

Yes, it "complects" the types into a container, but then you write code to handle the container, and only the container, until you are ready to open it up.

With union types, you must write code to handle all types in the union always.

This complects the code written in the language. It is also an obvious entanglement and meets his own definition of complex.


I feel like you missed the main point of what you are saying.

Whilst existing callers who obey the contract will not be broken by this, callers who depend upon this code breaking for null values will no longer see it break.

So it is a breaking change for them, even though you have merely extended the behaviour.

That said, if we define the functionality of code as being valid only within a scope of inputs, with anything outside of that left undefined, we are safer in a sense, though you do still require feedback to tell you when you are out of range.

But the point I want to make is that the line of argument which treats changes in behaviour for values that were out of scope as breaking changes is not helpful to anyone who is trying to co-exist via contracts and understand what work is expected as a result of a change.


It's only considered "breaking" to change the externally observed behavior of at least one previously published use of a code interface. When widening the interface's range of inputs, you can avoid changing existing behavior. We do that all the time when we create a new optional parameter, new CLI tool switch or subcommand, new public function or type in a library, new API endpoint, new web domain, new network protocol.


Others already commented on why this complaint doesn't make any sense, but, hey, at least I was pointed to a Rich Hickey talk I've missed. It never occurred to me before that "Maybe" isn't actually a "maybe", but Hickey is right. Seriously, every single talk of his I've seen so far has something revealing in it. I should go search for all his presentations I've missed, I guess.


Hard to disagree with Rich Hickey re non-breaking changes.

See for yourself the "Fig. 3. Clojure codebase—Introduction and retention of code" chart at page 26 of https://download.clojure.org/papers/clojure-hopl-iv-final.pd...

Rich Hickey is right and has a track record to show.


The changed return convention is potentially a breaking change.

If there are some documented circumstances under which the function returns null, then you can write a program which produces those circumstances and expects the null value. That program will break if there is no more null value.

The only way it can be a non-breaking change is if there are no such circumstances; the function never actually returned null, but only threatened to do that in its documentation or the way it was declared, or both.

In fact, changing a function that returns a string to one that returns a string or nil can be a realistically useful non-breaking change.

It can be a non-breaking change if the null value is not produced for any client which adheres to the existing API documentation. So that is to say, the circumstances by which the function returns null are entirely new.

For instance suppose we have a function like this:

   number identity(number)
It's an identity function that only works for numbers. We cannot call identity(nil); that is an error. Since it returns number, it will never return nil.

This is a totally non-breaking change:

   any identity(any)
the function now just returns its argument, no matter what that is. It will now return nil, but only if invoked as identity(nil), which was previously erroneous.

The conditions are paramount. When we analyze the call, the conditions are easy to think about: is the situation that the program is passing nil, which was previously not allowed, or not? When we analyze the return, it's not so easy: we have to think about: is this returning nil under existing conditions under which it previously could not have done that? Or is it only returning nil under new conditions?

The conditions which pertain to the call are always those of the call. The conditions which pertain to the return are also those of the call, plus the semantics of the function!


>If there wasn't a change in behavior, why are you changing the contract?

Because you're restricting or relaxing the types, and you want the type system to know this.

This doesn't have to translate to a change in behavior.


He isn't wrong; the languages that make optionals not backwards compatible are. So Swift's `String?` is a perfect match for his insight, whereas Rust's `Option<String>` is a mess.
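
A minimal sketch of the Rust half of that claim, using a hypothetical `greet` function:

    fn greet(name: Option<String>) {
        let name = name.unwrap_or_else(|| "world".to_string());
        println!("hello, {name}");
    }

    fn main() {
        // If `greet` previously took a plain `String`, this call compiled;
        // after widening the parameter to `Option<String>`, it no longer does:
        // greet("Ada".to_string());
        greet(Some("Ada".to_string())); // callers must now wrap explicitly
        greet(None);
    }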


Having run into this recently when our cloud API changed a value we relied on to be optional, it is indeed a breaking change in Swift.


Isn't what Rich Hickey defends the same as the Liskov substitution principle? Or at least very similar.


Somewhat related. The subclass relationship isn't very rigorous: you're free to have completely different behaviours out of the same methods in a subclass. Liskov says don't do that: if B is a subclass of A, it should behave like an A.

When it comes to types, a subtype is one that can be transparently substituted for its super type. If S <: T, a term of type S can be used where a T was expected just fine. For union types, it's a given that T <: T|a for any type a, and a T will behave exactly like a T|a whose value is a T.

The Liskov principle states that a subclass should be a subtype.


I feel like they are talking past each other, or at least talking about different things.

Hickey seems to be talking about breaking the compiler. That is, he is saying that changing the return signature from null|int to int should not break the compilation. In effect, this would create a warning a la 'You are checking for null, but var x cannot be null'.

The author seems to be talking about breaking changes in the behaviour/operation sense.

Those are different types of breaking.


But there _is_ no "breaking" change in the behaviour, because none of the previously-possible behaviours _can_ break. For every way-of-calling the function that previously existed, those ways-of-calling will continue to operate as before.

Some new ways-of-calling will now be legal, but that is not a breakage, because no code that already existed could have been using those ways before, and so no code exists that can be broken.

Clients _may_ choose to write new code to take advantage of the new behaviour - but there is nothing that they _must_ do (unlike with a breaking change, where they _must_ take action). It's for the purposes of this categorization, for communicating about client's responsibilities, that the concept of breaking changes is valuable.


Hijacking this to respond to a reply you made 40 days ago to a comment I made 45 days ago: https://news.ycombinator.com/item?id=36335405

> But that doesn't seem to be what you were, or are, saying. If you issue is specifically with "an epicurean theme park of enlightenment and joy to span the stars", and not with the dramatized reaction to (against) it, then...I must say I still don't understand what the problem is. An experience which is _definitionally_ pleasant, fulfilling, and non-harmful (to self or others) is....well, it's capital-g Good, no?

Technological advance is fine on its own. The risk with "an epicurean theme park of enlightenment and joy to span the stars" is our own mental faculties, presuming no one is left behind. A variety of examples of this turning bad exist in fiction; a particularly pertinent one to me is Asimov's "The End of Eternity", and another is the post-immortality centuries of Niven's "Known Space", but there are many others. Psychologically, theme parks are meant to be visited for a break. If we spend our entire lives in them it does stuff to our capacities.

In everyday life animal species spend most of their problem solving effort on dealing with other members of their species (and some similar species). We are our own greatest competitors. In part, I think animals do this because members of each species are approximately on equal footing, so our individual problem solving is sufficient to deal with other individuals, and our group problem solving sufficient to deal with other groups of us. But nature itself, and the universe itself, is the biggest player. The biggest risk.

I fear that themepark life, as opposed to themepark visit, will either spark ennui and a bunch of interpersonal shenanigans, or will invite people to forget that we're just living in a fragile bubble. If the opportunities are not equal for every person, the situation is even worse. And I like to think that we have a responsibility to our co-inhabitants in the universe, the other species. A themepark life ultimately risks navel-gazing (at the individual or group level) and ignoring these responsibilities as well.

No thank you. I want a nice life, and a nice place to live it. But I don't want ennui. I don't want self-involved navel-gazing. I want people paying attention to the real issues in the universe. We spend enough effort already on dealing with other humans.

> What is it about the real world that is more important, inherently, than "the experiences that arise from living in it"? If a "better" (I'm hand-waving the complexity of comparison, because it's assumed as part of the discussion) set of experiences can arise - with perfect certainty, no trade-offs, no utilitarian cheats, just "everything is better for everyone" - then, as the other commenter said, it would be abhorrent to deny it 'because of some philosophical quibble about a difference between “simulation” vs “reality”'.

Sure. I agree. As long as we take pains to address the inevitable externalities before they impact others (including other species) this is fine. But make sure you know what's "better" for everyone before it is implemented. It's not obvious that even a god could manage this, much less humans (or even the humans/angels/demons of The Good Place).

"What is it about the real world that is more important, inherently" - It's everything, and we're just a part of it. "importance" is a value judgement, so is inextricably caught up in the individual so judging, but subtract out the value judgement and it's obvious that a human and a squirrel are more "important" than a single human. Now expand this to the "real world" entire.


Examples? What problems will be caused if we allow existing callers to keep working as before?


Language features like type promotion or automatic casting to default/zero values are likely to change how the new function is called, purely based on the change in type signature. Even if every old input has the same output in the new function, old code can compile to produce a new input and have broken behavior. That's ignoring any metaprogramming or more complicated type shenanigans.

Is there something special about Clojure making that change safe?


I don’t know about Clojure specifically, but Java and the JVM have bridge methods for such purposes. Meaning, a hidden method with the old type signature would still get generated whose implementation transparently forwards to the new method with the changed type signature, e.g. converting the T value to an Optional<T> value (or whatever).


Rich Hickey is correct.

I prefer sum types.

But this is a legitimate point for union types.


Rich is right, I believe. But why not just use overloading instead? That is what is done in many OO langs to avoid making breaking changes.


This careless approach with nulls kinda works for Clojure because of "nil punning" but is dangerous for other languages I guess


Counter-point: consider the case where the body of the function is only `return value`.


Isn't that an argument that the function implementation might break, not that callers might?

Assuming implicit conversion for union types, you could either widen argument or narrow return without changing that implementation, but not both. A different implementation may not be able to handle either modification, though.


No, he’s right. he’s speaking only of what is required and what is provided, all else being equal. The what and the why are different.

If the function started to provide different results for the same arguments, that’s different from what he’s talking about and would be a breaking change.

Worrying about the internals of the function is a violation of encapsulation. We care what we provide and what we receive back.


I think the poster means that adding null is a big, substantial change to the interface - if you have a function that used to not make sense with a null argument, and then it suddenly does make sense with a null, then the function has changed in a way observable to callers.


> the function has changed in a way observable to callers.

Observable to new callers who might want to pass None, but not to the existing callers who don't pass None as it is because it was not possible prior to the change.


I mean, the article's logic is effectively that you can break code without the type system helping or warning you. I don't think Hickey would disagree with this – it's arguably a point in his favour.



