This is, IMvHO, such old news that it feels... weird to still read about it in a year with the prefix of 20.
Every programmer who has ever single-handedly written a 100,000+ LOC software system will tell you the same thing: shift as much responsibility on the compiler as you can and have the compiler check the code you write to any extent technologically possible.
Getting rid of bugs by experiencing, diagnosing and fixing them takes at least ten times more effort than getting rid of bugs by not making them in the first place, through expressing the problem at hand with a strong type system.
When you also consider the never ending necessity to introduce change to an already written software system, thus the necessity to refactor code (in the sense of altering the previously assumed meaning of its idioms), the critical advantage of a strong type system becomes self-evident.
I was listening to the John Carmack episode of Lex Fridman from the summer, and he makes a comment about being frustrated that in the Valley there’s an almost religious opposition to IDEs, debuggers, and static analysis.
Some of those tools have only become more powerful over time and I’m perplexed as to the mindset that would make a person averse to automating the drudgiest parts of their job in a career that is almost entirely based around automating things.
When I graduated college in the '00s I thought Vim was the most amazing thing I'd ever learned. Then a colleague at my first job showed me what happened when you typed . after a variable name in Visual Studio. Code completion, inline documentation... my mind was blown, and I never looked back. When I meet a young chap extolling the benefits of Vim or Emacs or really anything that doesn't have stepped debugging and code completion... well, there are no bonus points for doing things the hard way.
Vim is not really an editor. It is a way to edit text; you can use vim in almost any environment. You are also totally wrong about vim or emacs not having code completion; that is just nonsense.
I use (a version of) Visual Studio, and the first thing I did was install a vim extension.
Also, vim itself already supports almost all of these features with some basic plugins for the completion logic. If I type "." in my vim I get the same thing I would in Visual Studio.
If you are saying "vim is bad because it lacks X IDE feature" you are missing the point.
Vim: I can do that too, I swear! Just configure some plugins, can’t tell you what they might be though. But I’m Turing complete, and I’m the best!
VScode: Of course I can do autocomplete bud. Here, search my package repo, I’ll tell you which plug-ins are the most popular and handle the entire download and install process for you.
Visual Studio: Of course I can autocomplete! Hold my beer while I bring your machine with 16 cores and 64GB of RAM to its knees for multiple minutes ;)
EDIT: I'd also hardly call VSCode "performant", either, at least compared to the multitudes of editors that don't pull in a full-on browser engine for basic text rendering... but yes, it is indeed "performant" relative to Visual Studio.
I used to have that issue 5 years ago, but since then it opens in under 5 seconds and loads the solutions I need in not much more. Currently using it on a cheapish recent Windows laptop, but even on an 8GB x220 it works fine; it just loads for 30 sec in the beginning and then it's smooth.
Granted I'm only opening .net solutions with under 100 projects each but anything above that is unnecessarily more difficult to navigate with just vim.
And in my experience VSCode is many times slower than Visual Studio on the same projects. None of them are as fast as barebones sublime or vim on an m1 mac but when I load the latter with plugins it's not a huge difference.
They’re different experiences for users with different priorities.
Your priority is evidently ease of configuration, VScode is definitely easier out of the box - there is no comparison.
My priority is hackability for streamlined editing and code navigation. Here neovim runs circles around vscode.
Neither is “right”, just different preferences. Note that I’m not here to tell you my editor is better, I personally enjoy using it more, and that’s enough for me.
"My priority is hackabilty for streamlined editing and code navigation. Here neovim run circles around vscode."
I hear this line a lot from hard-core vim users, but it's always talked about in general terms. I'd like to hear a specific use case and exact scenario where a common workflow is faster in neovim than in a more full-featured IDE like JetBrains or Visual Studio.
> Why do you even reply to a post that you didn't bother to read? VScode is a neovim frontend.
I did read your post Mr. snark.
> There literally is no comparison, because there is a category error.
Pure pedantry. VScode takes far less effort to get a high quality feature set when compared to vim. That’s the only point of debate that matters for most people.
I think OP wanted to make a categorical difference between vim-the-editor vs vim-motions. The latter of which are supported in all major IDEs as plugins/extensions.
For the people responding to you who are saying you can get all the same things in vim, they're right of course, but a lot of this modern functionality is now built on top of the Language Server Protocol[1], which is an open standard created by Microsoft for VS Code.
Kudos to the people who have ported this to Vim[2], but I suspect the support for LSP features will still be better in VS Code.
Code completion, search, cross referencing, and all sorts of other features in vim, emacs, and all kinds of other editors (including Visual Studio) predate LSP by decades.
LSP is cool though, an advancement certainly, but it is not a completely new thing.
If you think ctags or rtags are even remotely comparable to what an actual IDE brings you, you have absolutely never used more than 10% of what an IDE with semantic understanding of your code can do.
it brings up "Code completion, search, cross referencing, and all sorts of other feature" and again, those work much less well in ctags / rtags than with a proper IDE
Kind of a perpendicular discussion but it amazes me that with this language server thing hipsters turned what was a very mature and proven pattern (a plugin architecture) into a distributed software problem. My god talk about doing shit the hard way...
The plugin architectures that I've seen require plugins to be written in the same language as the IDE. For example, Eclipse plugins need to be written in Java.
Language servers run in a separate process to the editor, and they communicate over JSON-RPC. The nice thing about this is that the language server can be written in any programming language - which usually ends up being the language of the code being edited, rather than the language that the editor was written in.
This makes it a lot easier for language servers to be written and maintained by the people who maintain the compilers for those languages, e.g. gopls is written by the Go developers, clangd is part of LLVM.
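For a rough sense of what that looks like on the wire, here is a sketch of a completion request as an editor might send it to a language server (the method name is from the LSP spec; the URI and position are illustrative, and the JSON body is framed by a Content-Length header):

  // Sketch of a textDocument/completion request sent over JSON-RPC.
  const completionRequest = {
    jsonrpc: "2.0",
    id: 1,
    method: "textDocument/completion",
    params: {
      textDocument: { uri: "file:///home/me/project/main.go" },
      position: { line: 41, character: 7 }, // zero-based line and column
    },
  };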
LSPs turn an M*N problem into an (ideally) M+N problem. You can't do that with any existing single editor's plugin architecture, basically by definition.
Ctags is a better analogy, and LSP has fairly obvious advantages compared to it.
I remember using Visual Studio in college (back in 1999, 2000) and coding circles around folks who were using text editors.
These days, I use Neovim + LSP for a pretty decent approximation of an IDE-- it's quite good. Still not as good as Visual Studio + C#, but I'm on Linux now, and not writing C# anymore, and I definitely prefer an open-source, general purpose, light-weight, customizable editor.
> When I meet a young chap extolling the benefits of Vim or Emacs or really anything that doesn't have stepped debugging and code completion...well, there are no bonus points for doing things the hard way.
I can't speak for Vim, but Emacs has stepped debugging, code completion, etc.
Part of the reason I use Emacs is that I get those features (and a ton more) with the same lightweight interface across multiple languages and platforms. At the same time, it's usually very easy to make Emacs work with third-party tools. On Linux I can step through code using GDB through Emacs; at work I spend most of my time in Emacs but have it bring up the MSVC++ debugger when I need it. It's the best of everything.
Meanwhile, I get to listen to my coworkers celebrate new features in VSCode that I've used in Emacs for years...
There are no bonus points for learning a new IDE for every project, either.
Wait until you learn about these 2002 technologies called "extract method", "inline method", "encapsulate fields", "extract interface", "extract local", "inline local", "rename", plus about 200 code inspections...
When I was a kid learning to code, programming books recommended stridently against even using syntax highlighting. It's funny how helpful things get rejected by people who "did it the hard way."
> When I meet a young chap extolling the benefits of Vim or Emacs or really anything that doesn't have stepped debugging and code completion...well, there are no bonus points for doing things the hard way.
Why do you think they are doing things "the hard way"? Emacs can have all the things you have described. And it can do that for languages that are not part of the .NET Framework; they just have to have a language server implementation.
The problem with "IDEs" was placing all your eggs into a single basket. You are on Visual Studio and then you need to do some Java(or Scala, or whatever). And now you have to get another IDE. Some tried to be one IDE to rule them all (Eclipse, Netbeans), didn't work all that well.
You need a good editor - most IDEs don't have one. You need integration with your language of choice. You need compilers and linters and a bunch of other things. Better to glue components that do one thing and do it well, than having one IDE trying to do everything. And those components can feel just as integrated.
>Why do you think they are doing things "the hard way"? Emacs can have all the things you have described
In the first few years of my career I would try every few months to get these things actually installed and working in emacs. It was definitely the hard way.
>The problem with "IDEs" was placing all your eggs into a single basket
Eclipse is a uniquely terrible piece of software and I understand why it might polarize people against IDEs for life. The JetBrains products are pretty good and, importantly, modular and consistent enough across languages that I don't mind.
I believe they’re talking about a different “hard way”, in that they’re suggesting you can’t have those features in vim / Emacs etc. and have to do something else instead.
You’re not wrong, configuring these editors is 100% the hard way (that doesn’t stop me from doing it though).
IntelliJ and VSCode have both worked pretty well with anything I've thrown at them, I'd consider them both fairly successful as IDEs "to rule them all". Obviously they're both very different ends of the IDE spectrum, but they've both had intellisense, debugging, type revealing, and refactoring features for all the mainstream languages I've tried with them. VSCode particularly tends to integrate well with LSPs.
emacs has plenty of plug-ins that turn it into an IDE. For people who are already comfortable with it, these tools are great. I’m more productive in it than with “modern” tools. For newcomers the learning curve is going to be steeper. I’d recommend learning at least either vim or emacs (or perhaps other text-based editors with similar features, if they exist) though, as it does provide versatility for oneself. Those GUI IDEs aren’t available in every environment where one might want to code or debug.
> I’m more productive in it than with “modern” tools.
Are you objectively more productive, or do you perceive yourself to be more productive?
I was an Emacs diehard for nearly two decades. Then I got introduced to CLion. In retrospect, Emacs was an inferior tool.
While it's certainly true that Emacs can be configured and extended to do all sorts of interesting things, I do suspect many of its users are deluding themselves when it comes to the sheer breadth and depth of functionality offered by a modern IDE. It also doesn't help that its maintainers are living several decades in the past, and haven't kept up with what the "competition" are doing.
I'm certainly not implying that Emacs is a poor editor. Very few IDEs have text editing capabilities on par with that. But its integration with other tools doesn't hold a candle to today's IDEs.
Both Vim and Emacs have plugins for intelligent code completion and viewing inline documentation. I personally prefer to use VS Code or Jetbrains IDEs with Vim emulation, but I've seen setups for both Vim and Emacs that basically made them into full-fledged IDEs.
Full-fledged Plugin-based development environments. An IDE is an Integrated Development Environment; it comes with the features necessary for efficient development built-in.
> there are no bonus points for doing things the hard way
Perhaps not from the technical perspective, but there are social advantages if you are a member of a clique that does things the hard way and considers it a sign of competence. ("Any idiot can write a code that compiles using autocomplete and syntax highlighting, but it takes a true master to achieve the same result using a decades-old Linux equivalent of Notepad.") They don't seem to understand that the ancient masters did things the hard way not as a pointless exercise, but simply because the easy way was not available back then.
Everything is pretty much just language servers behind the scenes these days anyway. Neovim supports those natively, so those young chaps aren’t talking about your vim of yesteryear.
There's a pendulum swing back and forth between local compilation being supported or not... currently settling onto VSCode remote, with Jetbrains Gateway trying to catch up in usability.
I'm averse to debuggers, as I've more than once caught myself going down a rabbit hole of stepping through code instead of thinking.
IDEs have some use, and static analysis has proven to catch the same mistake over and over, but only because the authors of those tools have discovered that false positives can never be allowed: once there is a false positive, the tool is worthless.
> I'm averse to debuggers, as I've more than once caught myself going down a rabbit hole of stepping through code instead of thinking.
Yes. I was kind of forced to think and do printf debugging at the beginning of my career, after having used before that (as a hobbyist) quite good asm debuggers.
Maybe that's just me, but I also was, I believe, a bit over-reliant on the debugger - I would just compile, run, see what happened, and launch the debugger if something didn't work.
Nowadays I could sometimes use a debugger - but the system I work with is sort of soft real-time, so stopping at a breakpoint or even slight changes in timings can change the context - in some cases even printf debugging could make a bug vanish.
If nothing else, debugging without a debugger is a good exercise in thinking logically - and it can save time too.
I have a code base that is a mix of hard, soft, and static real time. I have a command line interface built in to it and a lot of debug logging that can be re-enabled.
I've also put some effort into making it tolerate being interrupted. And there is also the good old technique of inserting breakpoints while the code is running.
I find debuggers more valuable with dynamically typed languages, especially Python. It's handy to be able to drop a `breakpoint()` in the middle of a script when you have no idea what a function is actually returning. This happens more often than you might think.
Of course, when this happens the bug you are hunting is not your only problem. You really should clean up the code to make it clear what is being returned from the function.
Yes yes, indeed. The ability to look at what's happening step by step really hamstrings my imagination. Sorry, no. A debugger is a microscope! It will help you find problems faster and will seed your imagination by filling in what is really going on. It's an augmentation, like any tool. Or do you prefer to stare into space blindfolded?
> A debugger is a microscope! ... Or do you prefer to stare into space blindfolded?
Perhaps illustrating the original point, microscopes aren't used to stare into space. A debugger is a microscope but the most pernicious bugs don't benefit from such a thing.
You say “prefix 20”, but there was this weird trend in the early-to-mid 2000s where Ruby evangelists really believed that TDD was just as good as static types - even better, because you’re forced to test actual business logic! And they even managed to convince masses of programmers that this is true!
I agree with this, yet look at some of the extremely salty comments in this thread. People are upset that something might be useful and that they might benefit from learning it or changing their ways.
Makes sense. People have been burned. Probably there are lots of people who think about TypeScript environment setup and source maps when they think about typing, or who think about Python's "isinstance(foo, str)". Or who think that it's overly complicated arcane nonsense with weird terminology (looking at you, Haskell). Or that types specifically refer to borrow checker woes in Rust.
It's weird to me how, scanning the comments, they all seem to refer to systems with 100k-ish LoC and dozens of contributors.
A big chunk of my job is writing node microservices in AWS Lambda. I do everything I can to avoid shared library code, since past experience tells me there be lots of dragons (mainly in when and how to push or pull lib updates to components). I have a very tiny shared lib that I try to never touch and definitely never introduce breaking changes.
Unit tests are a breeze since I never have to cast objects or worry about generics, etc.
Typescript would slow me down so much and add absolutely no benefit. Maybe I'm misinterpreting though and no one is claiming Typescript would benefit here.
We also have some C# lambdas and I find writing unit tests for those so much more of a pain - since the shared libs have generics and I'm always casting things. But admittedly I don't know all the tricks.
> Unit tests are a breeze since I never have to cast objects or worry about generics
Generics reduce the number of things you must care about in your tests.
And you should almost never cast objects in any code. Most 100k LoC programs won't need it even once; your microservices should need it proportionally less.
That's the thing. The gains grow superlinearly with the amount of code. They make it just a bit easier to write some trivial few-hundred-LoC programs, and they make it possible at all to have a working 100k LoC system. But if you don't learn them, you won't know where the break-even point is for you.
"And you shouldn't cast objects in almost no code ever." - I have a question about tests. Imagine I want to test a function that operates on quite large application state but not all app state is necessary for that function.
Options:
- Define all app state as a snapshot. Problem: snapshot can become stale, so more infra might be necessary to make sure that snapshot is up to date;
- Pass only the necessary state and construct as necessary. Problem: hard to define whole state precisely and ensure that it conforms to runtime state of a healthy app;
- Pass a subset of necessary state for some execution branch and cast the type. Problem: casting may result in test failures during runtime and potentially other issues such as modify-run-fail debug loop;
- Mock return values of functions called within the function being tested and use any combination of "state passing options above".
In a lot of places I use such an approach with custom type helpers and transitive types, passing in only the necessary subset for smaller functions or mocking return values for bigger ones. What do you think? I know that the AppState can be defined as a union of possible states and, together with type guards, can address those issues better. I just wanted to hear your opinion on how you would address such problems. I hope I explained it well enough.
  export type Fn = (...params: any) => any;

  // Standard trick: distribute U over function parameter positions, then
  // infer the intersection of all union members.
  type UnionToIntersection<U> = (U extends any ? (k: U) => void : never) extends ((k: infer I) => void) ? I : never;

  // First parameter type of a function, or the intersection of the first
  // parameter types of an array of functions.
  export type FirstParamType<G> = G extends Fn[]
    ? UnionToIntersection<Parameters<G[number]>[0]>
    : G extends Fn
      ? Parameters<G>[0]
      : never;

  export interface AppState {
    first: {
      a: number;
      b: number[];
    };
    second: {
      c: string;
      d: string[];
    };
  }

  // Pick a single nested key, e.g. DeepPick<AppState, "first", "b"> = { first: { b: number[] } }.
  type DeepPick<A, B extends keyof A, C extends keyof A[B]> = { [BK in B]: Pick<A[B], C> };

  function calculateUsingFirstB(state: DeepPick<AppState, "first", "b">): number[] {
    return state.first.b; // some calculation
  }

  function calculateUsingSecondC(state: DeepPick<AppState, "second", "c">): string {
    return state.second.c; // another calculation
  }

  // Function which takes a complex state parameter and calculates the result based on
  // the results of the other functions; its parameter type is exactly the intersection
  // of what those functions need, plus first.a for the branch test.
  function calculateMore(state: FirstParamType<[typeof calculateUsingFirstB, typeof calculateUsingSecondC]> & DeepPick<AppState, "first", "a">): string | number[] {
    if (state.first.a > 10) {
      return calculateUsingFirstB(state);
    }
    return calculateUsingSecondC(state);
  }
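For example, a test or call site then only needs to construct the declared subset, with no casting (values made up):

  // Hypothetical fixture: just the slice of AppState that calculateMore declares.
  const partialState = {
    first: { a: 12, b: [1, 2, 3] },
    second: { c: "hello" },
  };

  // a = 12 > 10, so the "first.b" branch runs.
  console.log(calculateMore(partialState)); // [1, 2, 3]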
> since the shared libs have generics and I'm always casting things.
This indicates to me that you're trying to write code that isn't correct (not doesn't work, but rather only works because of implicit couplings between components) and/or doing exotic lisp-style metaprogramming.
In the latter case, yeah, C#'s type system isn't powerful enough. Others are (to an extent: arbitrary code execution at compile time is never going to be completely safe).
In the former case ... that should be difficult. Forcing you to be explicit is half the point of a type system.
Casting is often necessary for parsing inbound data from certain mysql libraries or CSV or JSON depending on how it's written. I would guess that might be what the parent is talking about. That said, if you don't cast or parseFloat or whatever in JS you're going to have a lot of trouble. And if you're doing that, why not do it in Typescript where you'll know that the data you're accessing has been safely cast based on its type.
I don't see how they're mutually exclusive. I use union types prior to typeguards to end up with ultimately checked, cast values. Say I have a boolean fetched from JSON as "1" or "0". By the time I expose it to the rest of the code as part of a Record, I want to change it to an actual boolean. At first I'm going to treat the inbound value as a union type, e.g. (string | number | null | undefined | Error). After I deal with the error cases, I'm going to cast it (or in TS, reinitialize it as a single type, boolean) so that any code looking at the imported value sees it as definitely only a boolean without needing to have lots of different pieces of code run their own checks on its type.
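A minimal sketch of that boundary conversion (RawFlag and toBoolean are illustrative names, not from any particular library):

  type RawFlag = string | number | null | undefined | Error;

  // Handle the error-ish cases once, at the import boundary, then expose a
  // plain boolean to the rest of the code.
  function toBoolean(raw: RawFlag): boolean {
    if (raw === null || raw === undefined || raw instanceof Error) {
      throw new Error(`unexpected flag value: ${String(raw)}`);
    }
    // raw is now narrowed to string | number; map the "1"/1 encoding to true.
    return raw === "1" || raw === 1;
  }

  const record: { isActive: boolean } = { isActive: toBoolean("1") };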
I've seen plenty of Lambda code developed in a "copy-and-paste" style, with little to no code sharing, similar to early CGI scripts from 25+ years ago. It makes maintainability incredibly difficult. The more shared code the better, in my opinion.
Yeah if it's really shared code. But if you're just trying to factor out small redundancies that are only implemented a few times and could diverge in the future - then no.
I'm a big fan of WET - write everything twice. Then maybe the 3rd or 4th time worry about creating some new abstraction to share code. It's so much easier to add an abstraction later when it becomes obviously needed - than to remove one when you find out your three things actually need different variations on the lib code.
My hypothesis is that it's such old news to you because that discovery and the spreading revelation happened firsthand for you. People learning programming today end up having to somewhat rewind the timeline and learn everything anew at 2x speed.
That's to say, having old conversations with new engineers is a really refreshing exercise and I'd encourage the world to continue doing it.
> shift as much responsibility on the compiler as you can and have the compiler check the code you write to any extent technologically possible.
People first starting out, or people who have never worked on large/new code bases with a diverse range of authors, don't seem to appreciate this concept.
Rust has a perfectly nice type system by modern standards, but it's nowhere close to showing you just how deep the rabbit hole goes when it comes to avoiding bugs at runtime by having stronger type systems.
For example suppose my Rust function takes a slice of clowns (named unimaginatively "clowns") and also a usize integer k. Can we write clowns[k] ? Rust says sure, it will emit a runtime bounds check to confirm that k is inside the bounds of the slice. If there are sixteen clowns, and we ask for k = 20, this Rust code will panic at runtime.
But we can do better, if we are willing to pay for it. Dependent Types. In a language with dependent types and enough inference our type inference system will conclude that k can be 20 here, thus clowns must be a slice of at least 21 clowns, but this slice has only sixteen clowns - type error during compilation, either k or clowns are wrong.
Now, for cases where bounds checking would be the reasonable thing to do, Dependent Types just result in you writing bounds checks, ie in this case checking k < 16, and so it's possible you will just end up doing more work to result in a program that still just says, at runtime, "Nope, not enough clowns" or whatever like in Rust. The type system will require you to write correct bounds checks, but the Rust bounds checks are auto-generated, so they're correct too.
But in cases where bounds checks were not the only sensible approach, or maybe you didn't even realise a bounds check would be emitted because you assumed it was statically correct - this can catch some bugs at compile time which would otherwise survive into a running program. "Shifting left" is, I believe, the usual phrase to describe this improvement.
If you thought the function is obviously correct, "Of course there are more than k clowns" but it isn't, the type error may cause you to take that extra moment to think about it. "Wait, why can there be fewer clowns than... oh, I didn't mean clowns here, this should say circus_performers. I'm not even using the right slice!".
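(As a much weaker illustration of the same "shift left" effect in a mainstream language, TypeScript's fixed-length tuple types reject a literal out-of-range index at compile time; this is only a loose sketch with made-up values, nowhere near real dependent types:)

  const clowns: [string, string, string] = ["Bozo", "Krusty", "Pennywise"];

  const second: string = clowns[1]; // fine, the index is statically in range

  // clowns[20]; // compile error: tuple of length 3 has no element at index 20

  // With a plain number index the guarantee disappears and you are back to
  // runtime checks, which is exactly where dependent types would go further.
  function pick(k: number): string | undefined {
    return clowns[k];
  }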
> Rust has a perfectly nice type system by modern standards
I disagree -- Rust's type system is pretty weak and very limiting compared to other modern languages like Typescript, Nim, Zig, or even C++. That's without even getting into dependently typed languages.
There are so many basic patterns which Rust's type system can't handle, especially when it comes to compile time types. For example, one recent thing I ran into was trying to downcast a dyn Trait into another more specific dyn Trait. It's just not possible, at least on stable. Instead you have to do ugly work like getters that return sub-traits wrapped in optionals. Ick, the visitor pattern was easier.
Rust's type system philosophically takes a very strict "closed system" approach too. Meaning one can only really write code that's valid for every known instance. This closes off an entirely valid and useful subset of programs to programmers using libraries or language features. I'm not talking about runtime dynamism, but compile time dynamism. C++ with concepts provides a much more powerful and adaptive type system.
Programming in Rust frustrates me because I can't do any of the compile time checks that I've become accustomed to doing in other languages. It makes doing them at runtime difficult as well.
> but it's nowhere close to showing you just how deep the rabbit hole goes when it comes to avoiding bugs at runtime by having stronger type systems.
Totally agree on that! Though that's missing how powerful compile time programming in general can be even without the dependent types. Personally I'm excited to see how C++ Concepts can evolve. Though I'm not sure how much that starts overlapping with Dependent Types.
> Personally I'm excited to see how C++ Concepts can evolve.
So, C++ 20 Concepts is basically what Bjarne Stroustrup proposed for a future version of his C++ language in the early 2000s. Several people proposed, and WG21 accepted, a far more capable feature set for Concepts; this is often referred to as C++ 0x Concepts, since it was accepted for C++ 0x, the standard that would eventually (after years of delays) become C++ 11.
Bjarne wrote a paper arguing that this more powerful feature set was unnecessary and perhaps unworkable, and WG21 wound up removing Concepts from C++ 11 entirely (and the people behind it mostly got the message and ceased working on C++ altogether). A decade later, something that's close to Bjarne's original proposal became a C++ 20 feature.
C++ 0x Concepts was similar to Rust's Trait system in many ways. Particularly notable features of C++ 0x Concepts you might recognise in Rust's Traits:
1. Third parties can implement a C++ 0x Concept for some type which was not originally conceived with this Concept in mind, they just write the implementation and it works.
2. C++ 0x Concepts must be explicitly implemented; they're not just a syntactic requirement that could be satisfied by happenstance in a type which is not in fact suitable.
3. As a result of 1 & 2, the C++ 0x Concepts have Semantics which in C++ 20 Concepts are confined to the idea of "modelling" a Concept or else IFNDR.
To be fair I've mostly used concepts in Nim, not C++ x0 concepts. However, your comparisons of C++ concepts to Rust traits seem to be lacking a lot of details or are plain inaccurate. They also miss the flexibility of C++ concepts.
> C++ 0x Concepts was similar to Rust's Trait system in many ways. Particularly notable features of C++ 0x Concepts you might recognise in Rust's traits:
Perhaps at the loosest level of comparison around only defining limitations on possible types. However, C++ x0 concepts enable much more powerful combinations of logic to specify if a template fulfills a concept. In Nim the concept can be any arbitrary boolean statement.
> 1. Third parties can implement a C++ 0x Concept for some type which was not originally conceived with this Concept in mind, they just write the implementation and it works.
That's true for C++ concepts, but not entirely true for Rust traits. You can only implement a Rust trait if you own the type or own the trait. If you use two third party libraries, you cannot implement a trait from one for a type from the other. At best you can wrap the type in a new struct, and reimplement the parent's traits.
> 2. C++ 0x Concepts must be explicitly implemented; they're not just a syntactic requirement that could be satisfied by happenstance in a type which is not in fact suitable.
Everything in Rust traits requires them to be encoded into existing traits. C++ concepts let you define rules for arbitrary combinations of types. So you can create functions that take two independent types and define a constraint on those types.
Unfortunately you don't seem to have understood much of what I wrote.
The most crucial thing to understand is that C++ 0x Concepts were a significantly more powerful feature than the C++ 20 Concepts you got. Even though I emphasised this, you seem to have muddled them together to produce what you're calling "C++ x0 concepts" in several places, which is not actually a thing.
> That's not possible AFAICT with Rust's traits.
It is of course possible to write a Rust trait for something as useless as "any numeric type regardless of what kind", that's what the Num crate's Num trait is. Because Rust's traits have semantics, useless traits reveal themselves - you can't do much with something whose only decisive property is that it's numeric in some way.
Num also defines some traits for numeric properties that are way more useful, like the additive and multiplicative identities (Zero and One). A type can be numeric without having zero (NonZeroU32 is a trivial Rust example from the standard library) so expressing that you mean specifically a type with additive identity is useful in a way that merely "numeric" largely is not.
The "HasPower" example is revealing though, lots of people's toy Concepts are like this. They just dictate a morsel of syntax. C++ 20 Concepts are indeed suitable for this, but so is nothing whatsoever, because of C++ template "magic".
Why C++ 20 Concepts at all then? Your C++ compiler's diagnostics with nothing whatsoever are terrible because Substitution Failure Is Not An Error. Bjarne's simple "Concepts" can hide this somewhat - the diagnostics you get for a Concept failure are more digestible.
> That's true for C++ concepts, but not entirely true for Rust traits.
No, it would be true for C++ 0x Concepts but those never existed beyond a draft document. It does work for Rust traits as you say but you can't do it for C++ 20 Concepts.
Once again you're confused: that document is about C++ 20 Concepts, which exist, but I was describing C++ 0x Concepts, which are much closer to the capabilities of Rust's Traits and were never implemented.
> Everything in Rust traits requires them to be encoded into existing traits.
What you've written is a tautology. So I can only guess what insight you thought you had here.
Maybe you're imagining it's not possible to do obvious stuff like say that a type T must implement both trait A and trait B (a conjunction, signified in C++ Concepts with &&)? I assure you things do that all the time in Rust. foo<T: A + B + C, S: C + D>(p1: T, p2: S) is a function which takes two parameters p1 and p2, the type of p1 must implement traits A, B and C, while the type of p2 must implement traits C and D. For such complicated trait bounds idiomatic Rust would use the where keyword, but it's not mandatory, just easier to read.
In Rust you can't write the disjunctive bounds C++ 20 Concepts can express as || because it's not yet clear (and might never become clear) how to do so in a sound way. C++ doesn't care, none of the rest of the language is sound anyway, so it's too late to worry.
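(For readers who live in TypeScript rather than Rust or C++, a rough structural sketch of the two kinds of bound, with A and B as stand-in interfaces: conjunction as an intersection constraint, disjunction as a union plus narrowing, which TypeScript permits without the soundness guarantees discussed here:)

  interface A { a(): void }
  interface B { b(): void }

  // Conjunctive bound: T must satisfy both A and B (roughly Rust's T: A + B).
  function both<T extends A & B>(x: T): void {
    x.a();
    x.b();
  }

  // Disjunctive bound: either shape is accepted; the function narrows inside.
  function either(x: A | B): void {
    if ("a" in x) {
      x.a();
    } else {
      x.b();
    }
  }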
> The most crucial thing to understand is that C++ 0x Concepts were a significantly more powerful feature than the C++ 20 Concepts you got.
This makes a bit more sense than what you originally wrote.
Though regardless of whether C++ 0x concepts were more powerful, C++ 20 concepts as they exist today are strictly more powerful than Rust traits. You admit that yourself when you say that Rust traits cannot model disjunctive bounds. They certainly can't do arbitrary boolean predicates or negation.
That means that Rust's trait system cannot implement compile time type checking rules that C++ 20 concepts can today. You cannot encode entire sets of logic in the type system that you can encode in C++ (or other) languages.
Features like disjunctive logic may not be able to be proved "sound" for things like borrow checking but that's not the point or intent. The point is that you can use arbitrary boolean predicates in inventive ways. Hence my original comment about "seeing where C++ concepts go".
> Even though I emphasised this, you seem to have muddled them together to produce what you're calling "C++ x0 concepts" in several places, which is not actually a thing.
True, my terminology got muddled. Trying to follow your terminology and (odd) backstory about C++ 0x concepts was confusing.
You are incorrect about how C++ 20 concepts work and that makes it more confusing.
While Rust traits force a "closed" system (to be imprecise) that's easier to prove soundness on upfront, that doesn't make the type system more powerful. It may make it more useful in some people's view. That's a pretty big distinction.
> No, it would be true for C++ 0x Concepts but those never existed beyond a draft document. It does work for Rust traits as you say but you can't do it for C++ 20 Concepts.
Err, no that's incorrect as you can easily check in any of the references I gave. C++ concepts as they exist allow you to call concepts if they fulfill the concept.
In Rust you must own either the trait or the type in order to implement said trait for that type. This is a widely known and deliberate limitation of the Rust trait system. It has some benefits, but is also leads to significant "trait bloat".
> It is of course possible to write a Rust trait for something as useless as "any numeric type regardless of what kind", that's what the Num crate's Num trait is. Because Rust's traits have semantics, useless traits reveal themselves - you can't do much with something whose only decisive property is that it's numeric in some way.
I don't really follow what you're trying to say here.
In contrast, I do find it very useful to define default algorithms for any numeric type that matches. It's a core part of C++ numerical libraries.
However, I get that this would be fairly pointless in Rust because you can't do much useful with it without things like generics specializations being stable.
> The "HasPower" example is revealing though, lots of people's toy Concepts are like this. They just dictate a morsel of syntax. C++ 20 Concepts are indeed suitable for this, but so is nothing whatsoever, because of C++ template "magic".
This makes no sense. A "morsel of syntax" or alluding to "C++ template magic" make no sense.
Granted, C++ templates are amazingly powerful, and amazingly difficult to debug.
C++ 20 concepts provide a useful and flexible way to describe compile time type restrictions, while not limiting C++ templates to the purely conjunctive subset of type logic able to be described in Rust's trait system.
> > Everything in Rust traits requires them to be encoded into existing traits.
> What you've written is a tautology. So I can only guess what insight you thought you had here.
It is; I was being lazy, but the point I'm reaching for is that using Rust's trait system requires the traits you want to target to already exist and to be implemented for the types in question. Moreover, creating some type-based logic rules using the Rust trait system requires that logic to effectively already be encoded into the traits (as some combination of conjunctive properties).
This is why you end up with incompatible HAL libraries for various STM32 models, among others. In my opinion it's a very limiting part of the ecosystem.
> This makes a bit more sense than what you originally wrote.
It re-states what I originally wrote. I'm glad you find it clearer now, although since this is a matter of history it was all there for you to read if you cared. I don't think I have time to say everything two or three times until you "get" it.
> Features like disjunctive logic may not be able to be proved "sound" for things like borrow checking but that's not the point or intent.
The soundness problem is, unfortunately, fundamental. It's not about the borrow checker. C++ doesn't care whether your program has any logical meaning at all, so long as it is syntactically OK in these cases - of course in such case its meaning is unknown, but the standard explicitly tells compilers not to worry about that, the important thing is that the gibberish compiled, a C++ programmer can congratulate themselves on another successful project.
Maybe I need to expand the abbreviation I used, IFNDR: Ill-Formed, No Diagnostic Required. This is what the standard says to wave away such problems, not only with concepts but throughout the language. "Ill-formed" means this isn't actually a C++ program and so the standard does not define what it means, but "No Diagnostic Required" means the compiler needn't give an error or warning, it just presses on anyway.
[ You might imagine surely they could give a diagnostic, but actually they can't because of Rice's theorem. For a sufficiently powerful programming language you have to pick: 1. Your compiler sometimes gets "stuck" forever trying to decide whether a program is valid. 2. Your compiler reports errors in some otherwise valid programs. 3. Your compiler reports no errors in some invalid programs. Rust chose (2) and C++ chose (3) ]
> C++ concepts as they exist allow you to call concepts if they fulfill the concept.
Once again you've got turned around. The question isn't whether you can call concepts but whether anybody else can implement the concepts, and you simply can't do that. C++ 0x Concepts had "concept maps" to fix this, in Rust obviously the traits are explicitly implemented, but C++ 20 Concepts doesn't have an equivalent.
> Granted, C++ templates are amazingly powerful, and amazingly difficult to debug.
They're copy-paste. A slight improvement on C pre-processor macros. I suppose it's in the name, "templates" like a mail merge system. It's childishly simple like the cups and balls trick. The resulting mess does indeed produce unintelligible error messages and is also unsound in both obvious and surprising ways.
> In contrast, I do find it very useful to define default algorithms for any numeric type that matches. It's a core part of C++ numerical libraries.
Useful here meaning only you get better error messages than from SFINAE?
> using Rust's trait system requires the traits you want to target to already exist and to be implemented for the types in question.
I think it obviously follows that you can't use things which don't exist.
I don't know about this. Even some of the people who use dependently typed proof assistants seem to doubt that they should be used much in the programming part (as opposed to the proving part). Also, some of the examples you give might be addressed well enough by Rust's const generics.
I don’t disagree that it makes things much more pleasant, but I started doing this around 2013, which, while old, was still a year beginning with 20, and consensus was trending the opposite way and people were bullish about stuff like Ruby. The pendulum has really swung in the other direction.
I noticed this too and I have a simple explanation.
Ruby and Python overtook Java and C++ in the early 2010s in _spite_ of their lack of a good typing system, not because of it. On the whole, they are much more productive languages.
Now we're seeing languages that have Ruby / Python productivity but also have much better ways of static typing such as Typescript and Swift. And the Ruby / Python community is more open to static types as well.
The problems of ~2010 Java and C++ were mistakenly pinned on static types and the framing of "static vs dynamic languages" was always a red herring. Java and C++ were just crappy languages (at least in 2010, not sure about modern incarnations).
It really is a shame that Swift is so confined to the iOS world because it's such a great example of how you can have a language that feels like a scripting language but with much more advanced type safety.
This seems compelling. I do think Java has done a lot to mitigate the tedium of writing it in the meantime but my acquaintance with it is pretty casual.
No, there really is no new insight OP or anyone else has.
At the end of the day, the only thing that matters is writing something that works and works well. Hackers go through too many moodswings to be worth paying attention to when they start telling you how you should code.
True! But computer scientists (who are sometimes also hackers, sometimes not) apply research methods to existing codebases and the practice of coding, and have repeatedly presented findings that indicate that, mood swings/fads/hype cycles aside, some techniques really do deliver better software quicker. The VPRI STEPS work is an interesting example of this.
That's not to say that every CS methodology paper should be taken as gospel; we have problems just like other disciplines, sometimes more. But it's a far cry from post-hoc rationalization and hacker mood swings.
Doesn't really follow that because you ended up with a successful product that means it was the best way you could have possibly done it. I don't feel like I'd learned everything I know today the first time I delivered a successful product, and I doubt I know everything I'll know in the future either.
We (the industry) are still so quick to disregard the benefits of strict typing.
"Back in the day..." I worked on a collection of vital (to the company) infrastructure apps written in Borland Turbo (object) Pascal. Strong static type checking was enforced by the language. Good type design and strict type checking meant that it was normal that when a program compiled, it was bug free!
Much as I enjoy the flexibility of python, I know that every refactor or significant change means that there are now execution paths that have not been exercised - the burden of comprehensive testing is enormous, far outweighing the convenience of dynamic typing.
While I also agree that static typing is the easiest, statically decidable way to significantly increase program correctness, I’m not sure we can do significantly better with it than we currently do. Most interesting properties are not expressible even with dependent types, and those are very hard to prove, making their advantages non-no-brainers.
What I’m trying to say is that we should be open to other concepts, for example contract-based programming (Clojure’s spec, for example), because they might have better properties.
Why does 100,000 lines of code of python tend to be safer and more manageable than 100,000 lines of C++, despite the fact that python has no type checker and C++ has a relatively advanced type checker?
Why do startups choose a python web stack over a C++ web stack?
I don't think it's "self-evident." I think there's something more nuanced going on here. Hear me out. I think type systems are GREAT. I think Python type hints and TypeScript are the way forward. HOWEVER, the paradox is real.
Think about it this way. If you have errors in your program, does it matter that much if those errors are caught during runtime or compile time? An error at compile time is caught sooner rather than later, but either way it's caught. YOU are protected regardless.
So basically compile time type checking just makes some of the errors get caught earlier which is a slight benefit but not a KEY differentiator. I mean we all run our code and test it anyways, regardless of whether the system is typed or not, so the programmer usually finds most of these errors anyways.
So what was it that makes python easier to use than C++?
Traceability and determinism: errors are easily reproduced, the language always displays the same symptoms for a given error, and in turn it delivers error messages that are clear and readable. These are really the key factors. C++, on top of non-deterministic segfaults, astonishingly even has compile time messages that can confuse users even further.
There is no "paradox". C++ is dangerous because of memory management and awful semantics (undefined behavior/etc), both of which are orthogonal to static typing.
It's a bit like saying that there's a paradox: everyone says that flying is safer than driving, but experimental test pilots die at a much higher rate than school bus drivers!
Paradoxes don't exist in reality. It's a figure of speech based on something that was perceived as a paradox. This much is obvious.
Much of the fervor around dynamically typed languages in the past was driven largely by the dichotomy between C++ and the dynamically typed languages.
Nowadays it's more obvious what the differentiator was. But the point I'm making is that type checking is NOT the key differentiator here.
> So basically compile time type checking just makes some of the errors get caught earlier which is a slight benefit but not a KEY differentiator.
Unfortunately, I have to completely disagree here, at least based on my experience. Shifting software error detection from runtime to compile time is absolutely paramount and, in the long run, worth any additional effort required to take advantage of a strong type system.
Firstly, writing unit tests that examine all the possible combinations and edge cases of software component input and state is... an art that requires enormous effort. (If you don't believe me, talk to the SQLite guys and gals, whose codebase is 5% product code and 95% unit test code.)
Secondly, writing automated UI tests that examine all the possible combinations and edge cases of UI event processing and UI state is... next to impossible. (If you don't believe me, talk to all the iOS XCUI guys and gals who had to invent entire dedicated Functional Reactive paradigms such as Combine and SwiftUI. ;) J/K)
Thirdly, I don't even want to get into the topic of writing tests for detecting advanced software problems such as memory corruption or multi-threaded race conditions. Almost nobody really seems to know how to write those truly effectively.
> So what was it that makes python easier to use than C++?
The Garbage Collector, which is side-stepping all the possible memory management problems possible with careless C++. However, a GC programming language probably cannot be the tool of choice for all the possible problem domains (e.g., resource-constrained environments such as embedded and serverless; high-performance environments such as operating systems, database internals, financial trading systems, etc.)
"financial trading systems" This is a myth. Many financial trading systems are written in C# and Java. Don't be distracted by the 1% of hedge funds with lousy funding that need nanosecond reactions to make money. If you have good funding, product diversity matters more than speed.
Otherwise, your post is excellent. Lots of good points. SQLite is something that EADS/ESA/NASA/JAXA would write for an aeroplane / jet fighter / satellite / rocket.
I'm sure C# and Java make excellent programming languages for many if not most financial applications, but I meant that in the context of high-volume Enterprise Application Integration (EAI). Basically financial message transformation, explosion, summarization, audit, etc. across multiple financial institutions. The volume of messages to be processed was quite considerable, so nobody even thought about taking the risk of switching from battle-tested C++ to anything else.
I am sure your use case was incredibly specific. For insane performance requirements plus enterprise software that is not greenfield, basically everything is C++.
No trolling. Have you ever seen the high-frequency Java stuff from Peter Lawrey's Higher Frequency Ltd.? It is insanely fast. Also, LMAX Disruptor (Java) data structure (ring buffer) is also legendary. I have seen it ported to C++. That said, you can beat all of this with C++, given enough time and resources!
Another thing you're not addressing here is that type checking basically solves none of the problems you describe. You claim it's extraordinarily hard to write tests for UI and for memory corruption. And that's your argument for type checkers? It's next to impossible to type check UI and memory corruption. So your argument has no point here.
SQLite is written in C. It has type checking. Yet people still write unit tests for it. Why? Because type checking is mostly practically inconsequential. Your points don't prove anything; they prove my point.
All the problems you talk about can be solved with more advanced proof-based checkers. These systems can literally proof-check your entire program to be fully in spec at compile time. It goes far beyond just types. Agda, Idris, Coq, and Microsoft's Lean have facilities to prove your programs to be fully correct 100% of the time. They exist. But they're not popular. And there's a reason for that.
You say it's paramount to move error detection to compile time. I say, this problem is ALREADY solved, but remains unused because these methods aren't PRACTICAL.
Incorrect. Have a look at the Swift OpenCombine library. Multiple Publishers of a particular type that emits a single boolean value (e.g., an "Agree to Terms" UI checkmark and an "Agree to Privacy Policy" UI checkmark) are combined at compile-time to be transformed into a single Publisher of a type that emits only a single boolean value (e.g., the enabled/disabled state of a "Submit" button). Effectively, it is not possible to even compile an app that incorrectly ignores one of the "Agree" checkmarks before enabling/disabling the "Submit" button.
> It's next to impossible to type check (...) memory corruption
Incorrect. Have a look at the Rust standard library. Sharing data across multiple threads requires passing a multi-threaded Mutex type; attempting to share data through a single-threaded Rc (reference-counted) type will not compile. Once the Mutex type is passed, each thread can only access the memory the Mutex type represents by acquiring another type, a MutexGuard, through locking. Effectively, it is not possible to even compile a program that incorrectly ignores multi-threading or incorrectly accesses memory in a race condition with other threads thus possibly corrupting that memory. Moreover, it is also not possible for a thread not to properly release a lock once the MutexGuard type goes out of scope.
> All the problems you talk about can be solved with more advanced proof based checkers.
Unlikely. Without feeding strong type information that describes your problem domain into a checker, the checker cannot reason about your code and figure out possible errors. A strong type system is a "language" for a programmer to communicate with his or her checker.
> You say it's paramount to move error detection to compile time. I say, this problem is ALREADY solved, but remains unused because these methods aren't PRACTICAL.
> [the C language] has type checking. Yet people still write unit tests for it. Why? Because type checking is mostly practically inconsequential.
Please do not hold it against me if I do not continue commenting here - you must be from a different, parallel Universe. (How is Elvis doin' on your end? ;) J/K)
>Incorrect. Have a look at the Swift OpenCombine library. Multiple Publishers of a particular type that emits a single boolean value
First off, types can't emit values. Types don't exist at run time; they're simply meta info for the compiler to run checks, thus they can't emit anything. Second, if you're talking about something that emits a value, then it involves logic that doesn't have to do with UI. A UI is not about logic; it is simply a presentation given to the user. All logic is handled by things that AREN'T UI based.
UI would be like HTML and CSS. Can you type check HTML and CSS to make sure the hackernews UI is correct? There is no definition of correctness in UI, thus it can't be type checked. The example you're talking about is actually type checking the logic UNDERNEATH the UI.
>Effectively, it is not possible to even compile a program that incorrectly ignores multi-threading or incorrectly accesses memory in a race condition with other threads thus possibly corrupting that memory. Moreover, it is also not possible for a thread not to properly release a lock once the MutexGuard type goes out of scope.
This is different. It's not type checking memory corruption. It's preventing certain race conditions by restricting your code such that you can't create a race condition. There's a subtle difference here. You can violate Rust's constraints in C++ yet still have correct code. Type checking memory corruption would involve code that actually HAS a memory corruption, and some checker proving it has a memory violation. My statement still stands: memory corruption cannot be type checked.
Think about it. A memory corruption is an error because we interpret it to be an error. Logically it's not an error. The code is doing what you told it to do. You can't check for an error that's a matter of interpretation.
At best you can only restrict your code such that ownership lives in a single thread and a single function, which prevents certain race conditions. Which is what Rust does. This has a cost, such that implementing doubly linked lists is hugely overcomplicated in Rust: https://news.ycombinator.com/item?id=16442743. Safety at the cost of highly restricting the expressiveness of the language is very different from type checking. Type checking literally finds type errors in your code; borrow checking does NOT find memory corruption... it prevents certain corruption from happening, that's about it.
>Unlikely. Without feeding strong type information that describes your problem domain into a checker, the checker cannot reason about your code and figure out possible errors. A strong type system is a "language" for a programmer to communicate with his or her checker.
No no, you're literally ignorant about this. There's a whole industry out there of automated proof checking of code via type theory and type systems, and there's technology that enables this. It's just not mainstream. It's more obscure than Haskell, but it's very real.
It's only unlikely to you because you're completely ignorant about type theory. You're unaware of how "complex" that "language" can get. Dependent types are one example of how that "type language" can actually "type check" your entire program to be not just type correct but logically correct. Lean, Idris, Coq, and Agda are literally technologies that enable proof checking at the type level. It's not unlikely at all; it's reality.
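For a concrete taste, here is a minimal sketch of my own, assuming Lean 4 and its core Nat lemmas (not something from this thread): the theorem statement is a type, and the compiler checks that the proof term inhabits it, just like any other type check.

-- The statement is a type; the term after := must inhabit it,
-- and the type checker verifies that it does.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b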
>Please do not hold it against me if I do not continue commenting here - you must be from a different, parallel Universe. (How is Elvis doin' on your end? ;) J/K)
It's quick sort implemented in a language called Idris. The implementation is long because it is not just quick sort: the programmer is utilizing the type system to PROVE that quick sort actually does what it's supposed to do (sort ordinal values).
I'd appreciate an apology if you had any gall. But you likely won't "continue commenting here". Wow just wow. I am holding it against you 100%. I didn't realize how stupid and rude people can actually be.
"Typestates are a technique for moving properties of state (the dynamic information a program is processing) into the type level (the static world that the compiler can check ahead-of-time)."
> you're completely ignorant about type theory. (...) This is just fucking rude. (...) I didn't realize how stupid and rude people can actually be.
Yes, of course, naturally, you must be right, how blind could I have been?
> I'd appreciate an apology if you had any gall.
Sure, sorry about my little previous joke[1], meant no factual offense. The very best of luck to you as a programmer and a wonderfully polite human being with a great sense of humor.
[1] "Topper: I thought I saw Elvis. Block: Let it go, Topper. The King is gone. Let's head for home." ("Hot Shots!", 1991)
>Sure, sorry about my little previous joke[1], meant no factual offense. The very best of luck to you as a programmer and a wonderfully polite human being with a great sense of humor.
Jokes are supposed to be funny. Not offensive. Your intent was offense under the guise of humor. Common tactic. Anyone serious doesn't take well to the other party being sarcastic or joking, you know this, yet you still play games. It's a typical strategy to win the crowd by making someone overly serious look like a fool. But there is no crowd here, nobody is laughing. Just me and you.
So your real intent is just to piss me off given that you know nobody is here to laugh at your stupid joke. Your just a vile human being. Go ahead crack more jokes. Be more sarcastic, it just shows off your character. We're done.
> Your intent was offense (...) you still play games (...) a typical strategy to win the crowd (...) But there is no crowd here (...) your real intent is just to piss me off (...) Your just a vile human being (...) it just shows off your character
I assure you that I am not joking when I say the following: you are beginning to act in a disturbing manner at this point, please consider speaking to a mental health professional.
Again, sorry to have caused you discomfort with my little joke and best of luck to you.
Bro. If someone was truly disturbed and you truly wanted to help them, you wouldn't walk up to them and tell them to speak to a mental health professional. Telling them that is even more offensive. We both know this.
You're not joking. You're just being an even bigger ass, but now instead of jokes, you're feigning concern. It's stupid.
There's subtle motivations behind everything. A genuine apology comes without insulting the other party. Clearly you didn't do that here, and clearly you and everyone else knows what a genuine apology should NOT look like: "go get help with your mental problems, I'm really sorry."
It shows just what kind of person you are. It's not me who's disturbing... it's you, the person behind a mask.
Also, clearly my words come from a place of anger and seriousness, not mental issues. Mental problems are a very grave issue; they're a far bigger problem, and the symptoms are far more extreme than what's happening here. But you know this. And you're trying to falsely re-frame the situation by disgustingly using mental issues as some kind of tool to discredit the other party. It's just vile.
I don't wish you the best of luck. I think someone like you doesn't deserve it.
Your argument makes no sense. I say the type checker is not the key differentiator; then you say that for python the key differentiator is the garbage collector.
So that makes your statement contradictory. You think type checkers are important but you think python works because of garbage collection.
Either way I'm not talking about the implementation of the language. I'm talking about the user interface. Why is one user interface better than the other?
I bet you that if C++ had sane error messages and was able to deliver the exact location of segfaults, nobody would be complaining about it as much. (There's an implementation cost to this, but I am not talking about that.)
Even an ugly-ass language like golang is loved simply because the user interface is straightforward. You don't get non-deterministic errors or unclear messages.
No contradiction, really, it's just that we are talking about two different programming goals: I emphasize the goal of producing well-behaved software (especially when it comes to large software systems), while you emphasize the goal of producing software in an easier (more productive) manner. For my goal, a strong type system is a key differentiator. For your goal, a garbage collector is a key differentiator. The discussion probably comes down to the question of whether garbage-collected, weakly-typed Python is as "bug-prone" as memory-managed, strongly-typed C++. I have no significant experience with Python, so I cannot answer authoritatively, but I suspect your assumption that "100,000 lines of code of python tend to be safer and more manageable then 100,000 lines of C++" might be wrong. In a large codebase, there will probably be many more dynamic-typing error opportunities (after all, the correct type has to be used for every operation, every function call, every calculation, every concatenation, etc.) than memory-management error opportunities (the correct alloc/dealloc/size has to be used for every pointer to a memory chunk; but only if C++ smart pointers are not used).
>but I suspect your assumption that "100,000 lines of code of python tend to be safer and more manageable then 100,000 lines of C++" might be wrong.
I can give you my anecdotal experience on this, aka "authoritative" in your words. I am a really, really, really good python engineer with over a decade of experience. For C++ I have 4 years of experience; I would say I'm just OK with it.
Python is indeed safer than C++. Basically, when you check for type errors at runtime, you actually hit all reasonable use cases pretty quickly. This is why unit testing works in reality even though you're only testing a fraction of the domain.
Sure, this isn't a static proof, but in practical terms static type checking is only minimally better than run-time type checking. You can only see this once you have extensive experience with both languages and you see how trivial type errors are. Practicality of technologies isn't a property you can mathematically derive; it's something you get a feel for once you've programmed enough in the relevant technologies. It helps you answer the question "How often do type errors occur uncaught by tests, and how hard are they to debug?" Not that much more often, and not hard at all to debug.
The things that actually make C++ less usable are the errors outside of type checking: the memory leaks, the segfaults, etc. The GC basically makes memory leaks nearly impossible, and python doesn't have segfaults, period. What python does is fail fast and hard once you access something outside of memory bounds. Basically it has extra runtime checks, which aren't zero cost, that make it much, much more safe.
All of this being said, I am talking about type-less python above... when I write python, I am in actuality a type Nazi. I extensively use all available python type hints, including building powerful compositional sum types to a far more creative extent than you can with C++. I am extremely familiar with types and with python's types. I have a very detailed viewpoint from both sides of the spectrum and both languages. That's why I feel I'm qualified to say this.
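To make "compositional sum types" concrete, here is a minimal sketch of my own (the Ok/Err/Result names are illustrative, not from the thread) using standard typing features that a checker like mypy can verify:

from dataclasses import dataclass
from typing import Union

@dataclass
class Ok:
    value: int

@dataclass
class Err:
    message: str

Result = Union[Ok, Err]  # a sum type: every Result is either an Ok or an Err

def describe(result: Result) -> str:
    # isinstance() narrows the union, so each branch is checked separately.
    if isinstance(result, Ok):
        return f"got {result.value}"
    return f"failed: {result.message}"

print(describe(Ok(42)))
print(describe(Err("boom")))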
>No contradiction, really, it's just that we are talking about two different programming goals: I emphasize the goal of producing well-behaved software (especially when it comes to large software systems), while you emphasize the goal of producing software in an easier (more productive) manner.
I'm actually partly saying both. Python is both easier and more well-behaved and more safe. The "well-behaved" aspect has a causal relationship to "easier". It makes sense if you think about it: python behaves as expected more so than C++.
Literally, I repeat: python (even without types) is categorically safer than C++. I have a total of 14 years of experience across both. I would say that's enough to form a realistic picture.
GC was one of the most important and relevant features (if not the most important) that allowed Java to penetrate, and eventually dominate the space where C++ used to be relevant in terms of middleware/business type applications. This detail matters a lot in this discussion. Then once that is taken as a given, you can compare different GC enabled languages based on other factors, such as type safety (or lack thereof in the case of python).
If it does matter to the conversation then it's evidence supporting my point. I'm saying type checking isn't a key differentiator between something like JS/ruby/python vs. C++. You're implying the GC is the key differentiator.
If you're saying that you CAN'T compare python to C++ because of the GC, then I disagree. GC only stops memory leaks, and that is not the most frequent error that happens with C++. Clearly, even if you subtract memory leak issues from C++, there's still a usability issue with C++.
GC is not just for memory leaks, but memory safety in general. It also enables several paradigms that are extremely difficult to get right without memory safety.
In order to have a proper comparison, you should control for variables that are irrelevant to the experiment. In this case, you want to look at the effect of typing, so you should control for GC. Which is why you should compare python to other GC'd static languages, but not to static non-GC'd languages.
>GC is not just for memory leaks, but memory safety in general.
No, this is not true. Memory safety and memory leaks are different concepts. You can trigger a memory leak without violating memory safety. In fact, a memory leak is not really an error recognized by an interpreter, a compiler, or a GC. It is a logic error: a memory leak is only a leak because you interpret it as a leak; otherwise the code is literally doing what you told it to do. I mean, think about it, the interpreter can't know whether you purposefully allocated 1 GB of memory or whether you accidentally allocated it.
Memory safety, on the other hand, is protection against violation of certain runtime protocols. The interpreter or runtime knows something went wrong and immediately crashes the program. It is a provable violation of rules, and it is not open to interpretation the way a memory leak is.
See python: https://docs.python.org/3/library/gc.html. You can literally disable the GC at runtime, and the only additional crash error that becomes more frequent is OOM. The GC literally just does reference counting and generational garbage collection... that's it.
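For concreteness, a minimal sketch of what toggling the collector at runtime looks like with the standard gc module (the surrounding claims about which errors become more frequent are the commenter's, not something this snippet demonstrates):

import gc

gc.disable()            # turn off the cyclic garbage collector
print(gc.isenabled())   # False; reference counting still frees most objects
gc.collect()            # a collection can still be triggered explicitly
gc.enable()             # turn it back on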
I can tell you what makes python MORE memory safe than C++: it's just additional runtime checks that are not zero cost.
x = [1,2]
print(x[2])
The above triggers an immediate exception (IndexError: list index out of range) that names the type of error and the exact line that triggered it. This error occurs regardless of whether or not you disabled the GC. It happens because every index access to a list is checked against the stored length; if the index is out of range, Python raises an exception. It's not zero cost, but it's safer.
For C++:
int x[] = {1, 2};
std::cout << x[2] << std::endl;  // index 2 is out of bounds: no runtime check, undefined behavior
This triggers nothing. It will run even though index 2 is beyond the bounds of the array. There is no runtime check, because adding one would make the array data structure not zero cost. This is what happens during buffer overflows; it's one of the things that makes C++ a huge security problem.
Let's look at the type issue.
from typing import List, Optional

def head(input_list: List[int]) -> Optional[int]:
    return input_list[0] if len(input_list) > 0 else None

x: int = head(2)  # type error: 2 is not a List[int]
--------------------
#include <optional>
#include <vector>

std::optional<int> head(const std::vector<int>& input_list) {
    if (!input_list.empty()) return input_list[0];
    return std::nullopt;
}

int main() {
    auto x = head(2);  // type error: 2 is not a std::vector<int>; caught at compile time
    return 0;
}
Both pieces of code are equivalent. The python version is type annotated for readability (not type checked), but both report a type error stemming from the wrong input type on the call to head. Both will tell you there's a type error; it's just that python's happens at runtime and C++'s happens at compile time. C++ has a slight edge in that the error is caught as a static check, but this is only a SLIGHT advantage. Hopefully this example lets you see what I'm talking about, as both examples have practically the same outcome: a type error. A minority of bugs are exclusively caught by type checking, because the runtime still catches a huge portion of the same bugs... and in general this is why C++ is still MUCH worse in terms of usability than python despite type checking.
I don't think anyone is arguing that C++ is more difficult to use than Python, and much less safe. The question is how does python stack up to Java or C#? As you can see in this thread and many other discussions on this forum and elsewhere, people with experience working on larger systems will tell you that it doesn't.
If you'd had jobs in both stacks, as I have, you'd see that the differences are trivial. Python can get just as complex as either C# or Java.
The other people you're copying your argument from likely only had jobs doing Java or C#, did some python scripts on the side, and came to their conclusions that way. I have extensive production experience in both, and I can assure you my conclusions are much more nuanced.
Python and Java stack up pretty similarly in my experience. There are no hard red flags that make either language a nightmare to use when compared to the other. People panic about runtime errors, but like I said, those errors happen anyway.
Python does, however, have a slight edge in that it promotes a more humane style of coding by not enforcing the OOP style. Java programmers, on the other hand, are herded into doing OOP, so you get all kinds of service objects with dependency injection and mutating state everywhere. So what happens is that in Java you tend to get more complex code, while python code can be more straightforward, as long as the programmer doesn't migrate their OOP design patterns over to python.
That's the difference between the two in my personal experience. You're most likely thinking about types. My experience is that those types are not that important, but either way, modern python with external type checkers actually has a type system that is more powerful than Java's or C#'s. So in modern times there is no argument: python wins.
But prior to that new python type system, my personal anecdotal experience is more relevant and accurate than other people's, given my background in Java and python and C++. Types aren't that important, period. They are certainly better than no types, but any practical contribution to safety is minimal.
> If you have errors in your program, does it matter that much if those errors are caught during runtime or compile time?
Of course it matters. If an error can be caught by the compiler, it will never get to production. Big win.
With typeless languages like python, the code will get to production unless you have 100% perfect test coverage (corollary: nobody has 100% perfect test coverage), and then at some unexpected moment it'll blow up there, causing an outage.
This happens with metronomic regularity at my current startup (python codebase), at least once a month. It is so frustrating that in this day and age we are still making such basic mistakes when superior technology exists and the benefits are well understood.
That's fine. A type checker won't catch everything. Runtime errors happen regardless. I find it unlikely that all the errors your code base is experiencing are the result of type errors.
Something like C++: you get a runtime error and have no idea where it lives or what caused it.
Your python code base delivers an error, but a patch should be trivial because python tells you what happened. Over time these errors should become much less frequent.
That's a strawman, nobody has claimed a statically typed language will catch all possible errors.
It will however catch an important category of common errors at compile-time, thus preventing them from reaching production and blowing up there. Other types of logic error of course exist, in all languages.
> Something like C++: you get a runtime error and have no idea where it lives or what caused it.
I don't know what this means? You seem to be suggesting that code in a statically typed language cannot be debugged? Clearly that's not true. Debugging is in fact usually easier because you can rule out the type errors that can't happen.
>I don't know what this means? You seem to be suggesting that code in a statically typed language cannot be debugged?
You don't know what it means probably because you don't have experience with C++. These types of errors are littered throughout C++. What you think I'm suggesting here was invented by your own imagination; I am suggesting no such thing.
You talk about strawmen? What you said can literally be viewed as deception at its finest. I in no way suggested what you accused me of suggesting. Accusatory language is offensive. Just attack the argument... don't use words like "strawman" to accuse people of being deliberately manipulative here. We both believe what we're saying; there's no need to accuse someone of an ulterior agenda when ZERO motive for one exists.
What I am suggesting here is that there is an EXAMPLE of a statically typed language that is FAR less safe and FAR harder to debug than a dynamically typed language (C++ and python). This EXAMPLE can function as evidence that static type checking is not a key differentiator for safety, ease of use, or ease of debugging.
>Debugging is in fact usually easier because you can rule out the type errors that can't happen.
You don't get it. Type errors that happen at runtime or compile time contain the same error message. You get the same information, therefore you rule out the same thing. Type checking is only doing extra checking in the sense that it checks code that doesn't execute, while the runtime checks code that does execute.
Python was designed with sane error messages and runtime checks that immediately fail the program and point you to where the error occurred. This is the key differentiator that allows it to beat out a language like C++, which has none of this. C++ does have static type checking, but that does little to make it better than python in terms of safety and ease of use.
> You don't know what it means probably because you don't have experience with C++.
I started developing in C++ in 1992, so I have a few years with it. I've never run into the problems you seem to be experiencing.
> Type errors that happen at runtime or compile time contain the same error message.
Yes. But for the runtime error to occur, you need to trigger it by passing the wrong object. Unless you have a test case for every possible wrong object in every possible call sequence (approximately nobody has such thorough test coverage) then you have untested combinations and some day someone will modify some seemingly unrelated code in a way that ends up calling some distant function with the wrong object and now you have a production outage to deal with.
If you had been catching these during compile time, like a static type system allows, that can never happen.
>Yes. But for the runtime error to occur, you need to trigger it by passing the wrong object. Unless you have a test case for every possible wrong object in every possible call sequence (approximately nobody has such thorough test coverage)
And I'm saying from a practical standpoint manual tests and unit tests PRACTICALLY cover most of what you need.
Think about it. Examine addOne(x: int) -> int. The domain of the addition function is huge, almost infinite. So from a probabilistic standpoint, why would you write unit tests with only one or two numbers? It seems to make no sense, as you're only testing two points out of an effectively infinite domain. But that reasoning is flawed, because it is in direct conflict with our behavior and intuition: unit tests are an industry standard because they work.
The explanation for why it works is statistical. Let's say I have a function f and a test:
assert f(6) == 5
The domain and the range are practically infinite, so for f(6) to randomly produce 5 would be a very low-probability event given the huge number of possibilities. This must mean f is not behaving randomly. A couple of unit tests verifying that f produces these low-probability outputs means the statistical sample you took carries high confidence. So statistically, unit tests are practically almost as good as static checking; they are quite close.
This is what I'm saying. Yes static checks catch more. But not that much more. Unit tests and manual tests cover the "practical" (keyword) majority of what you need to ensure correctness without going for an all out proof.
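A minimal sketch of the "unit tests as random sampling" idea (my own illustration, with a made-up add_one under test): instead of enumerating the whole domain, sample a handful of random inputs and check a property.

import random

def add_one(x: int) -> int:
    return x + 1

# Sample a few points from an effectively infinite domain and check a property.
# Passing on the sample gives statistical confidence, not a proof.
for _ in range(100):
    x = random.randint(-10**9, 10**9)
    assert add_one(x) == x + 1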
>If you had been catching these during compile time, like a static type system allows, that can never happen.
>I started developing in C++ in 1992, so I have a few years with it.
The other part of what I'm saying is that most errors that are non-trivial happen outside of a type system: segfaults, memory leaks, race conditions, etc. These errors happen outside of a type system, and C++ is notorious for hiding them. You should know about this if you did C++.
Python solves the problem of segfaults completely and reduces the prevalence of memory leaks with the GC.
So to give a rough anecdotal number, I'm saying a type system practically only catches roughly 10% of errors that would not otherwise have been caught by a dynamically typed system. That is why the type checker isn't the deal breaker, in my opinion.
I don't understand why you're talking about statistical sampling. Aside from random functions, functions are deterministic, unit testing isn't about random sampling. That's not the problem here.
Problem is you have a python function that takes, say, 5 arguments. The first one is supposed to be an object representing json data so that's how it is used in the implementation. You may have some unit tests passing a few of those json objects. Great.
Next month some code elsewhere changes and that function ends up getting called with a string containing json instead, so now it blows up in production, you have an outage until someone fixed it. Not great. You might think maybe you were so careful that you actually earlier had unit tests passing a string instead, so maybe it could've been caught before causing an outage. But unlikely.
Following month some code elsewhere ends up pulling a different json library which produces subtly incompatible json objects and one of those gets passed in, again blowing up in production. You definitely didn't have unit tests for this one because two months ago when the code was written you had never heard of this incompatible json library. Another outage, CEO is getting angry.
And this is one of the 5 arguments, same applies for all of them so there is exponential complexity in attempting to cover every scenario with unit tests. So you can't.
Had this been written in a statically typed language, none of this can ever happen. It's the wrong object, it won't compile, no outage, happy CEO.
This isn't a theoretical example, it's happening in our service very regularly. It was a huge mistake to use python for production code but it's too expensive to change now, at least for now.
> I don't understand why you're talking about statistical sampling. Aside from random functions, functions are deterministic, unit testing isn't about random sampling. That's not the problem here.
Completely and utterly incorrect. You are not understanding. Your preconceived notion that unit testing has nothing to do with random sampling is WRONG. Unit testing IS random sampling.
If you wanted 100% coverage of every possible input, you would need to test EVERY POSSIBILITY. You don't, because every possibility is too much. Instead you test a few possibilities, and how you select those few possibilities is "random." You sample a few random possibilities OUT OF a domain. Unit testing IS random sampling; they are one and the same. That random sample says something about the entire population of possible inputs.
>Next month some code elsewhere changes and that function ends up getting called with a string containing json instead, so now it blows up in production, you have an outage until someone fixed it. Not great. You might think maybe you were so careful that you actually earlier had unit tests passing a string instead, so maybe it could've been caught before causing an outage. But unlikely.
Rare. In theory what you write is true. In practice people are careful not to do this, and unit tests mostly prevent it. I can prove it to you: entire web stacks are written in python without types. That means most of those unit tests were successful. Random sampling statistically covers most of what you need.
If it blows up production, the fix for python happens in minutes. A segfault in C++? That won't happen in minutes. Even locating the offending line, let alone the fix, could take days.
>Following month some code elsewhere ends up pulling a different json library which produces subtly incompatible json objects and one of those gets passed in, again blowing up in production. You definitely didn't have unit tests for this one because two months ago when the code was written you had never heard of this incompatible json library. Another outage, CEO is getting angry.
Yeah, except that first off, in practice most people tend not to be so stupid as to do this, and additionally, unit tests will catch it. How do I know? Because companies like Yelp have run typeless python web stacks for years and years and it mostly works. C++ isn't used because it's mostly a bigger nightmare.
There are plenty of companies that have functioned very successfully for years using python without types. To say that those companies are all wrong is a mistake. Your company is likely doing something wrong... python functions just fine with or without types.
>And this is one of the 5 arguments, same applies for all of them so there is exponential complexity in attempting to cover every scenario with unit tests. So you can't.
I think you should think very carefully about what I said. You're not understanding it. Unit testing works. You know this. It's used in industry; there's a reason why WE use it. But your logic here implies something false.
You're implying that because of exponential complexity it's useless to write unit tests, since you are only covering a fraction of possible inputs (the domain). But that doesn't make sense, because we both know unit testing works to an extent.
What you're not getting is WHY it works. It works because it's a statistical sample of all possible inputs. It's like taking a statistical sample of the population of people: a small sample says something about the ENTIRE population, just like a small number of unit tests says something about the correctness of the entire population of possible inputs.
>This isn't a theoretical example, it's happening in our service very regularly. It was a huge mistake to use python for production code but it's too expensive to change now, at least for now.
The problem here is that there are practical examples of python in production that do work. Entire frameworks have been written in python, Django among them. You look at your company but blindly ignore the rest of the industry. Explain why this is so popular if it doesn't work: https://www.djangoproject.com/. It literally makes no sense.
Also, if you're so in love with types, you can actually use python with type annotations and an external type checker like mypy. These annotations can be added to your code base without changing its runtime behavior. Python types with an external checker are actually more powerful than C++ types; they will give you type safety equivalent to a static language (with greater flexibility than C++) if you choose to go this route. I believe both Yelp and Instagram decided to add type annotations and type checking to their code and CI pipelines to grab the additional ~10% of safety you get from types.
But do note, both of those companies handled production python JUST FINE before python type annotations. You'd do well to analyze why your company has so many problems while Yelp and Instagram supported a typeless python stack just fine.
I think it is simpler than that: C++ is an incredibly complex and verbose language. Most of web development is working with strings, and C++ kinda sucks there. There is also a compilation/build step, so overall productivity is lower. Python is "easier" all the way around (we'll ignore the dependency management/packaging debates.)
It depends on how you define "safer." Run-time errors with Python happen frequently in large programs due to poor type checking all the time. Often internal code is not well documented (or documented incorrectly) so you may get back a surprise under certain conditions. Unless you have very strict tooling, like mypy, very high test coverage, etc. there is less determinism with Python.
Also, this may come as a surprise, but many people do not run or test their code. I've seen Python code committed that was copy-pasta'd from elsewhere and has missing imports, for example. Generally this is in some unhappy path that handles an error condition, which was obviously never tested or run.
I know it happens "all the time", but these runtime errors surface fast and quick. You catch most of these issues while testing your program.
Statistically, more errors are caught by the python runtime than by an equivalent type-checked C++ program, simply because the python user interface fails hard and fast with a clear error message. C++, on the other hand, doesn't do this at all; the symptoms of the error are often not related to the cause. Python is safer than C++. And this dichotomy causes insight to emerge: why did python beat C++?
In this case the type checker is irrelevant. Python is better because of clear, deterministic errors and hard, fast failures. If this is exemplary of the dichotomy between C++ and python, and if type checkers are irrelevant in this dichotomy, it points to the possibility that type checking isn't truly what makes a language easier to use and safer.
The current paradigm is that Rust and Haskell are great because of type checking. This is an illusion. I initially thought this as well.
Imagine a type checker that worked like C++: non-deterministic errors and obscure error messages. Sure, your program can't compile, but you are suffering from much of the same problems; it's just that everything has moved to compile time.
It's not about type checking. It's all about traceability. This is the key.
>there is less determinism with Python
You don't understand the meaning of the word determinism. Python is almost 100 percent deterministic: the same program run anywhere with an error will produce the same error message at the same location, every time. That is determinism. Type checking and unit testing do not correlate with this at all.
I think it's better to catch errors sooner than later. This is where type checking helps. I've seen plenty of Python code that takes a poorly named argument (say "data").. is it a dict? list? something from a third party library like boto3? If it's a dict, what's in the dict? What if someone suddenly starts passing in 'None' values for the dict? Does the function still work? Almost nobody documents this stuff. Unless you read the code, you have no idea. "Determinism" of code is determined based on inputs. Type checking helps constrain those inputs.
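As a small illustration of that point (my own sketch; Event, process_event, and the field names are hypothetical): an annotation turns the guesswork about "data" into something both a reader and a checker can rely on.

from typing import Optional, TypedDict

class Event(TypedDict):
    user_id: str
    payload: Optional[dict]

def process_event(data: Event) -> str:
    # The annotation documents that "data" is a dict with exactly these keys,
    # and a checker like mypy flags callers that pass a list, some library object, or None instead.
    return data["user_id"]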
As for C++ "non-determinism": If you write buggy code that overwrites memory, then of course you're going to get segfaults. This isn't C++'s fault.
I've seen plenty of code in all languages (including Python) that appears to exhibit chaotic run time behavior. At a previous company, we had apps that Python would bloat to gigabytes in size and eventually OOM. Is this "non-determinism"? No, it's buggy code or dependencies.
>I think it's better to catch errors sooner than later. This is where type checking helps.
Agreed, it is better. But it's not that much better. That's why python is able to beat out C++ by leagues in terms of usability, ease of debugging, and safety. This is my entire point: type checking is not the deal breaker here. Type checking is just some extra seasoning on top of good fundamentals, but it is NOT fundamental in itself.
>As for C++ "non-determinism": If you write buggy code that overwrites memory, then of course you're going to get segfaults. This isn't C++'s fault.
This doesn't happen in python. You can't segfault in python. No language is at "fault" but in terms of safety python is safer.
This language of "which language is at fault" is the wrong angle. There is nothing at "fault" here. There is only what is and what isn't.
Also, my point was that when you write outside of memory bounds, anything could happen; you can even NOT get a segfault. That's part of what makes C++ so user-unfriendly.
>I've seen plenty of code in all languages (including Python) that appears to exhibit chaotic run time behavior. At a previous company, we had apps that Python would bloat to gigabytes in size and eventually OOM. Is this "non-determinism"? No, it's buggy code or dependencies.
This is literally one of the few things that are non-deterministic in python or dynamic languages: memory leaks. But these are very, very hard to trigger in python. Another thing you should realize is that this error has nothing to do with type checking; type checking is completely orthogonal to this kind of error.
>I think it's better to catch errors sooner than later. This is where type checking helps. I've seen plenty of Python code that takes a poorly named argument (say "data").. is it a dict? list? something from a third party library like boto3? If it's a dict, what's in the dict? What if someone suddenly starts passing in 'None' values for the dict? Does the function still work? Almost nobody documents this stuff. Unless you read the code, you have no idea. "Determinism" of code is determined based on inputs. Type checking helps constrain those inputs.
When you get a lot of experience, you realize that "sooner" rather than "later" is better, but not by much. Again the paradox rears its head here: python forwards ALL type errors to "later" while C++ makes all type errors happen sooner, and python is STILL FAR EASIER to program in. This is evidence that type checking does not improve things by too much. Other aspects of programming have FAR more weight on the safety and ease of use of the language. <-- That's my thesis.
Well, we do agree on something! I too much prefer programming in Python over C++. I honestly hope I never have to touch C++ code again. It's been about 5 years.
I try to add typing in Python where it makes sense (especially external interfaces), mostly as documentation, but am not overly zealous about them like some others I know. Mostly I look at them as better comments.
>I try to add typing in Python where it makes sense (especially external interfaces), mostly as documentation, but am not overly zealous about them like some others I know. Mostly I look at them as better comments.
See, you don't type everything, because it doesn't improve things from a practical standpoint. You view annotations as better comments rather than additional type safety. You leave holes in your program where certain random parts aren't type checked. If only half of C++ were type checked, one would think it'd be a nightmare to program in, given that we couldn't assume type correctness everywhere in the code, but this is not the case.
Your practical usage of types actually proves my point: you don't type everything, you have type holes everywhere, and things still function just fine.
I type everything for that extra 1% in safety. But I'm not biased; I know 1% isn't a practical number. I do it partly out of habit from my days programming in Haskell.
You shouldn't compare a Python web stack with a C++ web stack, as C++ and Python target very different use cases.
You can compare however with a Java or C# web stack, both of which offer a superior developer experience, as well as a superior production experience (monitoring, performance, package management, etc.).
And C++ is a worse language, in so many aspects, that you need all of that tooling just to tame it.
In contrast, other languages like python have had the luxury of seeing what C/C++ did wrong and improving on it.
Just having a `String` type, for example, is a massive boost.
So for them, the type system has already improved the experience!
---
So this is key: languages like python have a type system (and that includes the whole space from syntax to ergonomics, like `for i in x`, to semantics), and the impact of adding a "static type system checker analysis" on top is reduced thanks to that.
And if the benchmark for "static type system checker analysis" is what C++/C#/Java offered (at the start, at least), then the value is not much.
It is only when you go for ML-style type systems that the value of a static checker becomes much more profitable.
Hindley-Milner allows for flexibility in your types, and thus high abstraction and usability in code. The full abstraction of categories allows for beautiful and efficient use of logic and code, but it's not safety per se.
Simple type systems can also offer equivalent safety with less flexibility. What makes Haskell seem safer is more the functional part combined with type safety: functional programming eliminates out-of-order errors where imperative procedures are executed in the wrong order.
Well, there's an irony to your statement. The programmers who write embedded systems (I'm one of them) tend to use C++. C++ lacks memory safety and has segfaults; python doesn't. They literally use the most unsafe programming language ever, one that doesn't even alert you to errors at compile time or at runtime.
C++ is chosen for speed, not for safety. The amount of run-time and compile-time checking C++ skips is astronomical. The passengers may think it matters, but the programmers of those systems, by NOT using a language that does those compile-time or run-time checks, are saying it doesn't matter.
> Why does 100,000 lines of code of python tend to be safer and more manageable then 100,000 lines of C++ despite the fact that python has no type checker and C++ has a relatively advanced type checker?
Because C++ sucks, but static types are not to blame for that.
My point here is that static types didn't do much to improve C++. We should be focusing on what made C++ bad; the things that made C++ bad, and the fixes for those things, are what make python good.
I'm saying type checking is not one of those things.
Of course. I'm a python guru. I know python's type annotations inside and out. I'm a type nazi when it comes to writing python.
That's why I know exactly what I'm talking about. I can unbiasedly say that, from a practical standpoint, the type checker simply lets you run the "python" command less and the "mypy" command more.
Example:
def addOne(x: int) -> int:
    return x + 1

addOne(None)  # wrong argument type: None is not an int
The above... if you run the interpreter on it, you get a TypeError at runtime. Pretty convenient: you can't add one to None.
But if you want static type checking, you run mypy on it, and mypy flags the same mistake before the program ever runs. They are effectively the same thing: one error happens at runtime, the other before runtime. No practical difference. Your manual testing and unit testing should give you practically the amount of safety and coverage you need.
The keyword here is "practically." Yes, type checking covers more, but in practice not much more.
Sure, but the time delta is inconsequential. Why? Because you're going to run that program anyway. You're going to at the very least manually test it to see if it works, so the error will be caught. You spend delta T time to run the program; either you catch the error after delta T or at the beginning of delta T. Either way you spent delta T.
Additionally something like your example code looks like data science work as nobody loads huge databases into memory like that. Usually web developers will stream such data or preload it for efficiency. You'll never do this kind of thing in a server loop.
I admit it is slightly better to have type checking here. But my point still stands: I'm talking about practical examples where code usually executes instantly. You came up with a specialized example where the code blocks for what you imply to be hours. It has to be hours for that time delta to matter; minutes of extra execution time is hardly an argument for type checking.
Let's be real, you cherry-picked this example. It's not a practical example, unfortunately. Most code executes instantaneously from the human perspective; blocking code to the point where you can't practically run a test is very rare.
Data scientists, mind you, from the ones I've seen, typically don't use types in the little test scripts and model building that they do. They're the ones most likely to write that kind of code. It goes to show that type checking gives them relatively little improvement to their workflow.
One other possibility is that expensive_computation() lives in a worker processing jobs off a queue, a possible but not the most common use case. Again, your end-to-end or manual testing procedures will likely load a very small dataset, which in turn makes the computation fast. Typical engineering practices and common sense lead you to uncover the error WITHOUT type checking being involved.
To prove your point you need to give me a scenario where the programmer won't ever run his code. And this scenario has to be quite common for it to be a practical scenario as that's my thesis. Practicality is a keyword here: Types are not "practically" that much better.
I would not use C++ in your comparison. Try with C# or Java. Not even close. They will crush in developer productivity and maintenance over Python, Ruby, Perl, JavaScript.
First off, python now has types (you can place type annotations in the code and run an external type checker), and javascript people use TypeScript. In terms of type safety I would argue python and javascript are now EQUAL to C# and Java.
Developer productivity in these scripting languages is also even higher, simply because of how much faster the code-then-run/test loop is. Java and C# can have loong compile times. Python and TypeScript are typically much, much quicker. With the additional type safety, python and TypeScript are actually categorically higher in developer productivity than C# or Java.
But that's beside my point. Let's assume we aren't using modern conventions and that javascript and python are typeless. My point is that even if C# or Java crushes python and javascript on maintenance, it doesn't win because of type checking.
You wrote: <<Java and C# can have loong compile times.>> Yes, for initial build. After, it is only incremental. I have worked on three 1M+ line Java projects in my career. All of them could do initial compile with top spec desktop PC in less than 5 mins. Incremental builds were just a few seconds. If your incremental build in Java or C# isn't a few seconds, then your build is broken. Example: Apache Maven multi-module builds are notoriously slow. Most projects don't really need modules, but someone years ago thought it was a good idea. Removing modules can improve compile time by 5x. I have seen it with my own eyes.
><<Java and C# can have loong compile times.>> Yes, for initial build. After, it is only incremental.
I work with C++ currently. Even the incremental build is too slow. Also, eventually you have to clear the cache for various reasons: debugging, a new library, etc.
A 1M-line python codebase has zero compilation time; you hit the runtime section instantaneously.
Go was created with fast compilation times to get around this problem. I would say that in terms of compilation, Go is the closest to the python experience.
Basically, when things are fast enough, "5x compilation time" isn't even thought about, because things are too fast to matter anyway. Go hits this area as well as python does (given no compilation).
> I want that data type to have helpful methods such as .Domain() or .NonAliasValue() which would return gmail.com and foo@gmail.com respectively for an input of foo+bar@gmail.com.
No the hell you don't.
Please please please do not attempt to separate the alias from an email address I submit. It's there for a reason - specifically, to hold you accountable if I experience a sudden influx of spam, and generally to keep things categorized in a world where senders can be sending things from all sorts of domains. Knowing that this is something one would even remotely consider is grounds to never touch anything one has built with a ten-foot pole, and I am now very strongly inclined to look into the author and compulsively scrub any accounts of mine from anything said author might've touched.
I am not exaggerating. The thing before the @ is meant to be opaque. Deeming otherwise for the sake of something so blatantly user-hostile as removing aliases is plain evil, and I will not sugarcoat my condemnation of such practices.
If you're sufficiently sociopathic to have no regard for the morality argument here, then at the very least take heed of RFC 5322 (https://datatracker.ietf.org/doc/html/rfc5322) and recognize that trying to parse any meaning from an email address' local-part is blatantly ignorant of IETF specifications and almost certainly will create bugs. Just don't do it - if not for your users' sake, then for your own.
> "recognize that trying to parse any meaning from an email address' local-part is blatantly ignorant of IETF specifications and almost certainly will create bugs"
I am sorry but this makes no sense. You do realize that the only reason you are able to use aliases is because your email provider chooses to parse meaning out of the supposedly "opaque" text right? If your email provider is free to "break" the spec, so are people you give your id to.
And that is solely the business of myself and my email provider. It's my email address, and therefore I am within my rights to assign whatever internal meaning I so choose. It is absolutely not the business of someone sending an email whether or not that opaque text has further-parseable meaning, and pretending otherwise absolutely will cause bugs (say, when sending emails to mailservers which don't use that alias syntax).
EDIT:
> If your email provider is free to "break" the spec, so are people you give your id to.
Wrong. See above. The email provider is free to "break" the spec because it is the thing in control of that email address and can therefore process it as it sees fit. The people to whom I give an ID are not my email provider, and therefore do not have the same degree of control; consequently, attempting to parse meaning from that opaque string will cause bugs, and also is a dick move which will not be tolerated.
If you're defending this practice because you, too, are parsing the opaque components of email addresses which you do not control, then I will take note to look into your code contributions as well and avoid anything you've touched.
Do. Not. Parse. The. Local-part. For. Aliases. Full stop. It's my email address, not yours. Respect how I enter it, or else remove it from your system entirely. Anything different is asking for bugs and is blatantly disrespectful to users.
> If your email provider is free to "break" the spec, so are people you give your id to.
There is no reasoning behind this argument; it is purely a verbal construct memetically derived from some inapplicable equality ethic that might make sense in a completely unrelated situation.
The correct application of ethics is that an agency which is given abc+def@gmail.com, and infers from it that this gives them permission to send email to abc@gmail.com (or, worse, to sell that address to harvesters), is behaving unethically.
True enough, as far as it goes. But if you are concerned about subscribing to something twice, you may want to try to check delivery uniqueness. They might be your own addresses.
Of more interest to me, omitted from the presentation--as almost always--is anything about what is disliked about a malformed address. You see this when some web form says it doesn't like your address, but won't say why, leaving you to guess and try things until it is satisfied.
Another example is the password filter that idiotically demands "at least one capital letter, one digit, and one swear character" in your already several-word passphrase, and dislikes your choice of swear characters but won't say so.
> But if you are concerned about sending an e-mail to the same address twice, you need to check delivery uniqueness.
For one, you shouldn't be concerned about that, and for two, you can't tell delivery uniqueness anyway, since someone can have multiple completely different addresses going to the same inbox.
> But if you are concerned about subscribing to something twice
I'm concerned about some service collecting my email address and "accidentally" exposing it to spammers.
> Of more interest to me, omitted from the presentation--as almost always--is anything about what is disliked about a malformed address. You see this when some web form says it doesn't like your address, but won't say why, leaving you to guess and try things until it is satisfied.
That is indeed yet another reason why you should never ever try to parse meaning from email addresses you do not own.
And the extent of that parsing should be in accordance with the relevant RFCs (namely, 5322). Per that RFC, the local-part is an opaque string of permitted characters. Attempting to parse the local-part beyond that when you ain't the one who owns/controls that address is bug-prone at best and user-hostile at worst.
> Another example is the password filter that idiotically demands "at least one capital letter, one digit, and one swear character" in your already several-word passphrase, and dislikes your choice of swear characters but won't say so.
Articles like this bug me. You've given me a list of why types are awesome. Great. Now, tell me what the tradeoff is. Nothing is free in engineering. To get something, you have to give up something else. Even grug[0] understands this.
I don't think every article has to "teach the controversy." This is an article for programmers who don't know the upsides of types.
What's more, for a programmer who doesn't get the value of types, the major downsides are already apparent, at least at a basic level. Doesn't this make my code more verbose? Doesn't this get super confusing sometimes?
It's an article designed to help certain programmers learn a particular thing, not an article meant to satisfy more experienced programmers' desire to see all sides of an argument acknowledged.
> What's more, for a programmer who doesn't get the value of types, the major downsides are already apparent, at least at a basic level.
How could the downsides possibly be apparent if the upsides are so mysterious they need an article to spell them out?
> It's an article designed to help certain programmers learn a particular thing
Have they actually learned that particular thing if they don't know the tradeoffs they're making? I would argue they haven't. You need to know what you're getting and what you're giving up before you can decide whether something is worth using at all. There are too many articles hyping the upsides of technology X, but nobody asking what the downsides are.
I don't know what to tell you. The downsides are apparent. There is no logic theorem that says the upsides and downsides need to be equally as apparent.
The trade-off is obvious: you gain confidence about your program but you need to Learn More Stuff. Nobody's talking about the downsides of type systems except to the extent that they're worth talking about: see the comments here every time someone compares the type systems of Python and Rust.
> The trade-off is obvious: you gain confidence about your program but you need to Learn More Stuff
This is why engineering/software articles in general (this one included) need to bring up tradeoffs more often. No, "learning more stuff" is not a downside or a tradeoff; it's just a fact of learning anything.
That you introduce more coupling is a tradeoff. That the program (sometimes) gets harder to change is a tradeoff. That it becomes easier to write large, messy programs because programmers feel more safe in the future to refactor, is a tradeoff. Trying to fix each one of those tradeoffs also comes with its own tradeoffs, and so on.
These are "apparent" for me, when talking about languages using static types vs dynamic languages, but it is not apparent for everyone. So when bringing up these "obvious" upsides, also bring up the "obvious" downsides, as it seems quite a lot of people don't see it as "obvious" as we do.
> That you introduce more coupling is a tradeoff. That the program (sometimes) gets harder to change is a tradeoff.
You don't introduce more coupling, you document the coupling that already exists. If your program is hard to change with types, it would be hard to change without types - but easier to change incorrectly.
> That it becomes easier to write large, messy programs because programmers feel more safe in the future to refactor, is a tradeoff
Sure, I guess this is true in principle - but you could say the same about IDEs, or version control, or grep. The effect size is small.
> You don't introduce more coupling, you document the coupling that already exists.
This is true at the code level. But at the system-design level, this documentation is the extra coupling.
I feel like it's important to understand this. I agree with the original commenter; in engineering, nothing is truly free. In many cases, this extra coupling helps keep a system strong and stable, like extra nails holding planks of wood together. In other cases, you may find that part of a system's spec actually missed the mark and now needs to be ripped up and redone. That extra coupling might now work against you!
Again, that doesn't mean that it wasn't worth having it. It is just important to understand tradeoffs in engineering.
No, the coupling (i.e. assumptions about what type this thing can be) is implicitly there in a typeless language. If you pass in the wrong thing it'll blow up. But it'll happen in production.
The coupling is always there, it's just a matter of whether you make it explicit (thus allowing errors to be caught early) or you pretend it's not there and let things crash in production.
How can the downsides of having to wear a seatbelt possibly be apparent if the upsides are so mysterious they need an article to spell them out? People have a quick aversion to things all the time. Sometimes the actual benefits need to be carefully explained. ("You are statistically likely to be in a car crash. Wearing a seatbelt multiplies your chance of living through it.")
I think almost everyone understands the benefits of both types and seat belts. The fact that a seat belt keeps you restrained during a crash is pretty intuitively obvious. The idiots who don't wear seat belts either (1) believe they'll beat the statistics, and thus no statistical argument will convince them or (2) value their "freedom" a lot more than they value their own lives.
In any case, though, wearing a seat belt or not is a choice one can make independently of all other factors. The cars come with seat belts, it's the same car either way. You click or not, nothing else changes about the car.
With types, however, that's not how it works. The tradeoff is... you may have to change your entire programming language, change your IDE, change your frameworks, rewrite existing code, etc. It's not analogous to seat belts at all. Do programmers want the compiler to catch mistakes? Of course they do, in an ideal world. Why wouldn't they? But there are a lot of tradeoffs here that don't exist in the case of seat belts.
Like seat belts, you can choose to wear or not wear a helmet independent of all other factors. The motorcycle or bicycle is exactly the same regardless of whether you're wearing a helmet. Also, everyone understands the benefits of a helmet. The benefits don't make everyone wear a helmet, but everyone is clear about why there are helmets.
It does, wearing helmets means you need to carry them around and store them, and it can ruin your hairstyle. People who can balance the tradeoffs make mostly reasonable decisions ("I write large programs with the support of a type system and eschew it for small scripts", "I will probably be OK without a helmet just biking slowly between two buildings at work") but some people will never accept the upsides as being worth it.
> some people will never accept the upsides as being worth it.
How is this any different from the seat belt case? "(1) believe they'll beat the statistics, and thus no statistical argument will convince them or (2) value their "freedom" a lot more than they value their own lives"
Would you really expect an article "The helmet is a biker's best friend" to convince them?
Everyone knows that a helmet helps prevent head injuries. That doesn't need to be explained. Whether the tradeoff of messing up your hair or whatever is worth it is up to the individual to decide, but there's nothing complex about the decision that needs an academic discussion.
They aren't going to learn that in a day or a week or a month. Maybe a year if they're extremely bright or are in a perfect environment for figuring it out, but most people take years. This article is a few minutes out of that hypothetical best-case year. If it gives them food for thought for a week, it'll be time better spent than 99% of what they could read, certainly better than if they read a comprehensive article that went 98% over their heads and got them hung up on things that they weren't yet able to experience and understand.
Besides, they aren't going to print this article out and take it to a cave in the mountains to learn about type systems for a year. They're going to read other stuff along the way.
grug miss big brain benefit for types. Grug says the main benefit is auto completion; I think the real benefit is making code changes.
If I update a type, the compiler will tell me every single location where I need to make a corresponding code change. For grug: change type give red squiggle, make change code good
Also, grug makes a good point about the temptations of generics, but I think they're exaggerating the impact on the speed of development.
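For what it's worth, a minimal Python/mypy sketch of that "change type, follow the red squiggles" workflow (the function and call sites are made up): change a parameter's type, and every stale caller lights up.
from datetime import timedelta
# Before, `timeout` was a bare int of seconds. After changing it to timedelta,
# every call site still passing an int becomes a type error under mypy.
def fetch(url: str, timeout: timedelta) -> bytes:
    return b""  # body omitted; only the signature matters here
fetch("https://example.com", 30)                     # error: expected "timedelta"
fetch("https://example.com", timedelta(seconds=30))  # OK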
> big brain type system shaman often say type correctness main point type system, but grug note some big brain type system shaman not often ship code. grug suppose code never shipped is correct, in some sense, but not really what grug mean when say correct
I forgot about this, thanks for the morning laugh. No such thing as a free lunch.
Understanding existing code is a big benefit of static types as well.
I’m sure one could argue that member names should obviate the need for type annotations.
There’s also the distinct possibility that my preceding ~15 years of statically-typed software development have affected how I think about software development in some way. (wink)
But I am finding type annotations immensely useful while working on a huge application that is about 2 years into adding a gradual typing system, enough so that I usually take time whenever I enter a new code area to add annotations to everything, just to understand what’s going on.
My perception is that I invest time to build understanding of the types, and then document what I’ve learned in the form of these type annotations so that future maintainers then gain a quicker understanding without having to do the initial research.
At least my non-statistically-significantly-sized team agrees.
> Understanding existing code is a big benefit of static types as well.
At a syntax level perhaps, but not necessarily at a semantic level. This won't apply to everyone, but I've noticed that the more types are relied on, the less my co-workers really understand the code. They're relying on the compiler so much they don't slow down and think through the changes they're making. In one extreme case I saw a guess-change-compile workflow that relied entirely on the compiler doing the work for them.
Ever so slightly longer compile times. It's pretty close to a free lunch.
There are only tradeoffs when we are at the frontier of what's possible with a set of technologies, and so must trade off on something in order to move along that frontier[1]. Many languages aren't operating at that frontier, and adding static typing is free (in the marginal case, ignoring the substantial effort to implement the type system). If you start a greenfield Python project, and you start typing right away and incorporate MyPy into your CI and IDE - it's as close to free as you can get, and the benefit is substantial.
[1] Eg, like in this diagram, http://image1.slideserve.com/2488675/production-possibilitie... - we only need to trade off on guns & butter if we're along the frontier (the blue line), if we find ourselves somewhere in the middle we can just make more stuff until we reach the frontier.
People using dynamic languages deeply, especially library and framework authors, regularly write abstract/generic code for which a suitable type declaration would be mind-bendingly difficult in a very sophisticated type system and impossible in a weak one. You can argue that this is ill-advised! But static typing with normally-powered type systems leads to more voluminous and more purpose-specific code. Very powerful type systems are possible, but treated as academic and too difficult to use in the real world.
A way this often gets worked around is code generation. Anywhere you have or reach for codegen in a static language, you probably could have used a plain old function in a dynamic language.
I think this is a little disingenuous. It’s not that the type system makes highly abstract/generic code difficult, it’s more that the specific ways people are used to writing that type of code in dynamically typed languages doesn’t lend itself well to adding type annotations. But I think you’d be hard pressed to find many places where Haskell programmers, for example, haven’t found a different way to express whatever the Python code is achieving while also allowing for type annotations.
Thank you for the thoughtful response; this is something I had failed to consider (since, as you anticipated, I try to avoid creating deep or highly dynamic abstractions), and I do regularly throw in the towel on complex or deeply nested types (either by just omitting them or using a "close enough" type in Python, or by using trait objects in Rust - looking at you, `Map<Chain<RangeInclusive<...>>>`).
Not GP, but for instance, Python web frameworks are oriented around decorators, and Rust web frameworks are oriented around macros. The syntax looks pretty similar, but like GP says, in dynamic languages you're using a (higher order) function, and in static languages you're using code generation for the same purpose.
I am confused. "Code gen" is a fuzzy concept once you have a virtual machine because you can do it at runtime. This is how mocking works in Java and C#. What you describe already exists in Java and C# -- decorators for web frameworks. I would not describe Java nor C# as dynamic. I describe them as "stricter" (types) and Python as "weaker" (types). To quote Norman Ramsey: "Every time I see a question about "strong" or "weak" typing, I kill a kitten." :-)
> Nothing is free in engineering. To get something, you have to give up something
I think this is a dangerous position to take to extremes/as an axiom.
Not talking about type systems at all here. The assumption that, given two tools/techniques for accomplishing the same goal, there are always equivalent tradeoffs simply isn't true.
Some tools are better than others.
That statement usually provokes misinterpretation. It should not be taken to mean:
- That some tools are always better than others, in every context. There are situations in 2022 where COBOL is the best choice for new code, and other situations where rewrite-it-in-Rust is the best choice. Problems occur when "tradeoffs of tool A (even if we don't know what they are yet) make it equivalent to tool B" is a core tenet of decision making.
- That some tools always have been and/or always will be better than others. Context, expectations, tool capabilities, and available programmer talent pools all change massively over time.
- That one tool is better than all the others. Plenty of times there are multiple ways to deliver optimal-given-constraints outcomes, and it comes down to a matter of taste or "just pick something, anything, and let us get to work".
Chasing hype and cargo culting leads to poor outcomes; "we should build our two-core app on Kubernetes/write our 2TPS app in Rust" are often justified with "because it's the future" or "because the cool kids are doing it". That's a major bummer.
But the opposite extreme is just as bad: assuming that all choices are fundamentally a wash because "to get something, you have to give up something else" is just as methodologically irresponsible as following the hype cycle. Programming isn't alchemy. This kind of bad decisionmaking can lead to dependence on obsolete (unsupported/insecure) tools, difficulty hiring, and, at worst, a culture of "don't talk about Python to me; if you can't freehand it in C you just need to get gud" gatekeeping cruelty.
Everything has tradeoffs. That doesn't mean they're equivalent.
The issue is that people from one side will always downplay the tradeoffs (or pretend they don't exist, like the current author). This is exactly how hype and cargo culting happens.
Sure there are tradeoffs, but I disagree that it’s always so balanced. When people moved from assembly to high level languages presumably there were tradeoffs but in retrospect it’s a pretty clear cut choice. I’m not saying typed languages are as big of a shift as high level languages but it’s possible they are the unequivocal right choice.
Does that also hold for people doing data science in Jupyter notebooks? They're also programmers, arguably.
At this point in the trajectory of software engineering, it's fair to assume that most of the low-hanging fruits have been picked, and solutions that are unequivocally better would have to bring something fundamentally new to the table (which types are not at all). Most solutions will be picking a particular point on a trade-off isocurve.
Apart from that, it's always fair to ask someone who's strongly proposing something what the downsides are.
Why are you sure that all of the low hanging fruits have been picked? The origins of software development are pretty much still within living memory. We’re still extremely new to programming. I wouldn’t be surprised if the field looks completely different in 50 years.
As for types, I’m not saying they’re flawless. I’m pointing out that in the transition from assembly to high level languages there were flaws and criticisms that came out. But looking back fifty years, were these flaws and criticisms genuine tradeoffs that kept assembly as a reasonable option for most developers? No, they were not. Now we look back on programmers who insisted on writing assembly as oddities, as niche figures. I cannot say if this will be the case for types, but I won’t rule it out.
> Does that also hold for people doing data science in Jupyter notebooks? They're also programmers, arguably.
Yeah they’re definitely programmers, but anyone who’s had to maintain and deploy what a data scientist came up with in a notebook will question whether they should really be using types.
For a while in the codebase I was working on, we had a set of distinct types for different units. You know, a type for meters, another for centimetres, etc etc. We had types for radians, types for degrees.
We had conversion functions between them, and type inference when you performed certain operations.
The result was a disaster. Not an enormous disaster, but enough of a problem to rip the entire thing out and replace it with plain double-precision floating points, and sensible variable names, everywhere.
Why was this? It was a combination of there being nothing sensible to infer when you, say, multiply an angle and a distance (which happens when you're doing algebra), and everything in the whole codebase needing to be aware of these types.
The downside of all these brilliant ideas is dependency. If you define an "EmailAddress" type, as the article suggests, you've got to write the code for it somewhere. Now all your projects are dependent on this library, with all the pain and anguish that versioning and linking/including/whatever the library brings.
Before, you depended on nothing but the String type, which is very likely built into your language. When all your code needed to do was pull that email address out of some persistent store (say), and send to some other piece of code, your dependency list was just the Persistence library. But with your fancy EmailAddress type, your dependencies are now much worse.
Keep things as simple as they can be. An EmailAddress type is not useful.
I think there's a misconception that types are meant to tag things according to the programmer's mental taxonomy. They're not. Types should be based on significant distinctions in how your particular program treats and processes the data. For instance, you don't need an EmailAddress type because your program doesn't do anything special with the knowledge that this string is actually an email address. It just treats it like another string. It takes some judgment to determine this, but I consider that part of the learning curve in using types rather than an inherent tradeoff from the tool itself.
wtf? metres and centimetres are not different types! they are just different ways of writing same type: Length. radians and degrees are just ways of writing a dimensionless Angle quantity. you made the absolutely elementary mistake of conflating a physical quantity with the unit used to measure it, of course it was a disaster.
>nothing sensible to infer when you, say, multiply an angle and a distance
angles are dimensionless so they should just be a distinct type of float. there is literally no problem here.
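A minimal sketch of "a distinct type of float" in Python terms (the NewType names are invented): the product of an angle and a length decays to a plain float for the type checker, and wrapping the result back up into a unit is an explicit decision.
from typing import NewType
Radians = NewType("Radians", float)
Meters = NewType("Meters", float)
def arc_length(angle: Radians, radius: Meters) -> Meters:
    # the multiplication type-checks as plain float; re-wrapping is explicit
    return Meters(angle * radius)
def set_heading(angle: Radians) -> None:
    pass
set_heading(Radians(1.57))  # OK
set_heading(1.57)           # flagged by mypy: a bare float is not Radians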
I can see how it could cause problems if they weren’t using the type system correctly.
typedef float cm;
typedef float meter;  // typedef is only an alias: cm and meter are still the same type
cm a = 1;
meter b = 1;
if (a == b) {  // compiles and passes with no warning
// launch rockets
}
I’ve done something similar (not including rockets, don’t worry!) in Swift with its typealias feature. Thankfully there is a way to actually force compiler errors in such situations with something like https://github.com/pointfreeco/swift-tagged
You might not have seen this pattern before, but annotating values with units as types is a legitimate approach. There’s a whole chapter about it in the book Software Design for Flexibility by Gerald Sussman, the author of Structure and Interpretation of Computer Programs, which is linked on here pretty often. It has to be done in the right way, though, in a language that’s expressive enough to support it.
I would rather have dependency problems, which can be automated away with sufficient tooling, than work in a codebase written by someone who thought you could just use floats and strings for everything in an extremely overloaded fashion. Doubly so if they don’t believe in documenting all the separate use cases and just keep all that knowledge in their head.
The tradeoff is that I have to explicitly say (and know) what type I'm dealing with at every point in the program.
But I'm not sure that's much of a tradeoff. If I don't know what type this thing is, how do I know what operations I can safely do on it? How do I know that I can make it do what I'm trying to do? Or will it blow up at runtime when I do that?
I consider "I coded it, it's done, but it might blow up at runtime" to be highly unprofessional. "We covered that with unit tests" is theoretically OK, if you've got 100% test coverage. But you don't, and you never will.
Having to say everywhere what the type is gets tedious. Autocomplete (and "auto", for those languages that have it) help a bit here, but only a bit.
Most modern languages can do type inference and don't require explicit type annotations for variables. Heck, even C++ has auto! For declarations, I think it does make sense to ask for the annotation because it can also serve as documentation. Have you ever tried Scala, Rust, Haskell or TypeScript?
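In Python-with-mypy terms, that ends up looking something like this (a trivial made-up example): annotate the declaration, let the locals be inferred.
def total_with_vat(prices: list[float], vat_rate: float = 0.2) -> float:
    subtotal = 0.0            # inferred as float, no annotation needed
    for price in prices:      # `price` is inferred from the parameter's type
        subtotal += price
    return subtotal * (1 + vat_rate)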
The key thing the author identifies is the two languages problem. Static types are a second program about the program, and that's great when the second program is simple declarations that keep you from passing one struct when another is expected.
But a flexible language needs more than that, and generics end up either Turing Complete or bad in some other way.
I usually run into issues at the boundaries in the system.
Usually moving from primitives into complex types does not account for serialization and deserialization between db and the client. This can be very annoying to work with in something like C#.
Usually it ends up resulting in a lot more types and a lot more mapping between types.
However this has its own benefits, but is very boilerplate-y and is sluggish to work with when your domain changes.
I have to plug Ada's rich type system for explicitly encouraging this kind of design. With things like type predicates [1], you can do run-time enforcement or even prove at compile-time (to optimize away the runtime checks) that type constraints are met.
As an example of this, in a piece of code I'm working on there's a Base64_String type, where only RFC 4648 characters are permitted to be part of the string, the '=' padding character can only appear at the end of the string, and if the second-to-last padding byte is '=' then the last one must be as well. This is all enforced by the type system without having to call "validate()" or something every time its used.
Not just that, but it also often does so efficiently and doesn't incur a runtime penalty (for new type and static predicates) and will reuse previous function definitions as well.
These are the sorts of cases with function parameters, in various other languages I've dealt with, in which this would have helped:
- "dt": delta time of what? Seconds, milliseconds, microseconds, nanoseconds, ticks? Usually, I'd expect seconds if it was a float, though I've seen counter-examples, and I usually have to trace back the flow to know for use if it's a 64-bit (u)int.
- "ip_addr" and "port": What's the type of port? If you guessed "int", you'd be right in part of the system. If you guessed "string" you'd be right in a different part of the system.
- "path": Does it matter if this is a relative or absolute path? It often isn't apparent this matters and then you find out this path is passed to a different system in which it does matter.
I agree that it sounds really stupid up front, but it's done when you're just modeling the constraints of the problem. I've found that it saves a lot of time in debugging and silly mistakes later.
For types with invariants, you just add the `Invariant` aspect and then the type invariant gets checked automatically when passed as a parameter. Combined with built-in pre/post conditions, I've found that these sort of automatically inserted checks give me a lot of confidence, and allow significant embedding of conceptual and domain knowledge during development.
If you like this style of programming but use Python for your day-to-day, check out typeguard; it provides runtime assertions for parts of the Python type annotation system similar to "Invariant".
As with many tools, there are caveats. It's often surprisingly slow (so avoid using it on hot paths, or only turn it on during your testing/pre-production runs) and can't type-check everything (e.g. callables). But it's still pretty nice and requires minimal effort to use!
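A minimal sketch of what that looks like, assuming typeguard's @typechecked decorator (the exact exception class it raises differs between typeguard versions):
from typeguard import typechecked
@typechecked
def scale(value: float, factor: int) -> float:
    return value * factor
scale(2.0, 3)    # OK
scale(2.0, "3")  # rejected at call time with a type-check error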
Oh god, please just use primitive types. Don't make assumptions about things.
Everyone thinks they are so smart validating emails, phone numbers, zip codes and all until their great design goes live and they discover that users in the real world do not follow their assumptions.
I have seen that happen again and again. No, if your idea of validating an email is more complicated than "should have an @ symbol", I guarantee you there is a counterexample that will mess your pretty system up. Have fun scrambling to fix that ticket.
Oh, you think you know what an address and zip code should look like? No, you don't.
Please people, just use strings and call it a day. Why do you like to suffer?
I mean, sure, using the type system to protect you from mixing up units can be useful. Everything in moderation. Primitive types are good types.
That sounds like an argument against excess validation rather than against types (maybe because of blog-driven development showing minimal examples of the power of types?).
Picking on one of those points, suppose emails are just strings. What then prevents them from getting used as names, identifiers, and other unrelated data? Just your continued vigilance as a programmer and hoping that nobody ever carelessly names the field "address" so that mistakes can slip by over a series of devolution commits.
You might not know much about what an email is, but you know _something_ about how you expect them to behave and be used, and if you like the computer to automate your work it's not totally unreasonable to rebind the string type as an email type and require explicit conversions at the point of use to treat it as anything other than just an email. There's zero runtime cost, it's not much more code, and that class of bugs is greatly reduced except for at the boundaries of the system.
Maybe that effort isn't worth it, or maybe your domain changes fast enough you'd have a lot of churn, or maybe those bugs aren't too important, or whatever. Using types to help you reason about the things you do know and do care about can be a huge productivity boost though, so I wouldn't just write the whole technique off.
> That sounds like an argument against excess validation rather than against types (maybe because of blog-driven development showing minimal examples of the power of types?).
What? Would you rather have JSON as a string or as something like Map<string, JsonValue>?
> No, if your idea of validating an email is more complicated than "should have an @ symbol"
Then how is just using a string any better? Would you rather litter the entire codebase with validations that this is indeed a valid email or just do it once at the entry point?
I think excessive type system hacks are bad too but they are more often than not the problem of the language where its unable to express certain concepts naturally (see C++ template hacks).
> What? Would you rather have JSON as a string or as something like Map<string, JsonValue>?
This question can't be answered without additional context. What are the requirements?
But as a heads up, Map<string, JsonValue> couldn't be a representation of all valid JSON documents. Lists are valid JSON as well AFAIK. So I think this is making my point.
Based on the casing, JsonValue is an object, not a primitive, so there's no reason that interface wouldn't work. It would just be a different subtype depending on what type the value is, and lists/dicts would also implement the Map<string, JsonValue> interface.
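For what it's worth, in Python the usual way to cover every JSON document (objects, arrays, and bare scalars alike) is a recursive union rather than a single map type. A minimal sketch, assuming a checker that supports recursive type aliases (recent mypy and pyright do):
from typing import Union
JsonValue = Union[None, bool, int, float, str, list["JsonValue"], dict[str, "JsonValue"]]
doc_object: JsonValue = {"a": 1, "b": [True, None]}  # an object document
doc_list: JsonValue = [1, 2, 3]                      # a bare array is valid JSON too
doc_scalar: JsonValue = "just a string"              # and so is a bare scalar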
As much as I can just happily ignore JSON because I don't do webdev, I think there is a strong case to make that lists should be implemented using a native list, array, or vector class. But I won't argue because it's not necessary. The possibility for argument already proves my point.
My point is to show that "the type system" provides endless rabbit holes, and often is just a waste of time. There are a lot of situations where you want to represent a JSON document as a string. (Trivial example, as an embedding in an HTTP response object).
> The possibility for argument already proves my point.
Was just correcting something wrong:
> But as a heads up, Map<string, JsonValue> couldn't be representation of all valid JSON documents.
So,
> As much as I can just happily ignore JSON because I don't do webdev
I think this is why you're going wrong here. JSON is pretty much a solved problem, and Map<string, JsonValue> is pretty close to how it's used in Java and other languages that don't have native types that match its structure (like javascript and python do).
> (Trivial example, as an embedding in an HTTP response object)
This isn't json, it's what you get when you stringify/dumps/convert the json into a different datatype. When you need to be specific it's often called a "json string".
JSON is often interpreted as dictionary but is serialized as key-value pairs, for JSON itself this is not a big problem as everybody agrees not to produce JSON documents like {"a":1,"b":2,"a":3} so most people do not care about how their parser reads them.
In cases like URL queries or HTTP headers it is not such a clear cut. There it is common both to use duplicated keys and to use JSON-like dictionaries to read them.
Personally I never had bugs due to this: PHP does the "right" thing with duplicated keys and I never encountered it in node, but it bugs me that we use this kind of lossy[0] representation.
[0] in JS in particular, objects are not really adequate to be used as dictionaries; inheritance and predefined keys aside, there are also special magical attributes that behave in special ways: Object.getPrototypeOf(Object.assign({}, JSON.parse('{"__proto__":null}')))===null;
JSON is a data-interchange format, i.e. a specification how to serialize and deserialize a domain of values. So Map<string, JsonValue> "isn't JSON" either.
JSON isn't a solved problem, it's a solution to a problem (and often used as a non-solution to a non-problem).
> Would you rather litter the entire codebase with validations that this is indeed a valid email
Why would you? Just use it. For most of parts of the program it's not relevant to the computation whether the string is a "valid email", whatever that means. It's a string.
"Just use it" is what lead to the big SQL injection fallout and even today we pay the price as not even a year ago thousands of crucial service were vulnerable via log4j because of the "Just Use It" mantra.
I say, don't make assumptions unless you need them. Formatting an email address in an HTML document would work by wrapping it in the appropriate tag. (which implies html-quoting it correctly, but that is unrelated to email syntax).
Sending an email using an API would work by passing the email as a string to the API.
Looking up an email address from an address book using a pattern to match would be implemented with normal text search.
It doesn't matter if the email is "valid" or not. Don't overthink it.
You don't know what you're talking about. For example, HTML escaping (a.k.a quoting) rules don't care what you're escaping - an email, a street name. It's just text.
And that's the point of it. You quote precisely because the container syntax doesn't know the syntax of what you're embedding. If it knew, there would be no need of the escaping.
HTML escaping requires you to look for characters to escape for it to not interfere with HTML. This is quite literally validating symbols in the input. If they fail validation they need to be escaped. It's literally IMPOSSIBLE to do escaping without first validating every single symbol in the input.
So you're "validating" symbols now (strange choice of word). But certainly validating that the higher-level construct that we're escaping is an email address. Because the escaping procedure doesn't care.
Validating is just the process of ensuring an input is admissible in the way you want to use it. That can be symbols in a string, whether a string is an e-mail or even if a number is in a certain range. Escaping is just validation + fixup which can be used in some cases. Anyway the only way to validate an e-mail in practice is to use it and confirm.
Escaping (or quoting in general) is a simple translation from a literal representation of a string to a (lexical) syntax representation with the purpose of embedding the string in an external medium (e.g. source code written in that lexical syntax).
Escaping is a mechanical process that doesn't discriminate between "valid" and "invalid". It is completely ignorant to the higher-level meaning of the string that is translated (e.g. email address) but solely operates on the constituent characters.
That is in contrast to validation, which is a simple function that decides whether a given object is admissible or not (as you say yourself). "Admissible" here is with respect to a meaning that is higher-level than lexical syntax. It is semantic (is this a valid email), not syntactic.
(There are sometimes certain technical restrictions on which values can be represented in a lexical syntax, for example hard limits on string lengths. So there is a small extent to which "validation" can fill a purpose with relation to lexical syntax, too - but that's not what we're discussing).
To make it even more confusing, email addresses conform to an (albeit poorly specified) lexical syntax, too. And you can certainly attempt to validate whether a given string is a valid email address. However, HTML syntax doesn't care about that. Email address syntax is not part of the HTML syntax. HTML specifies how to escape _strings_, not email addresses.
And HTML syntax is right not caring about email syntax because it would be unnecessary complication in practice.
Just as the other examples I gave. E.g. looking up email addresses from an address book is not a task that in practice needs to be more specific than looking up a string from a list of strings.
> Anyway the only way to validate an e-mail in practice is to use it and confirm.
Which was my initial statement "Just use it" that you heavily disagreed with.
> Escaping is a mechanical process that doesn't discriminate between "valid" and "invalid". It is completely ignorant to the higher-level meaning of the string that is translated (e.g. email address) but solely operates on the constituent characters.
It does discriminate between "valid" and "invalid". This symbol is "valid" and we don't need to do anything. This symbol is "invalid" and we need to escape it. Validation occurs throughout the whole abstraction stack. Not only at the level of meaning of an entire string.
> Which was my initial statement "Just use it" that you heavily disagreed with.
In the case of e-mail I don't disagree with you. It is however balls to the wall insane to say "Just use it" in general. Which was my point. Notice how my reply specifically mentions vulnerabilities that were caused by the "just use it" mantra.
> This symbol is "valid" and we don't need to do anything. This symbol is "invalid" and we need to escape it.
It's quite a stretch to call symbols that need to be escaped “invalid”. And it's often possible to escape without discerning between “valid” and “invalid” characters. For example, in HTML you might just convert all characters into numeric entities.
> It is however balls to the wall insane to say "Just use it" in general.
My favorite "easy win" find from learning Haskell last year was just the `newtype` keyword, which basically just let you alias primitive types with zero runtime impact.
`newtype Email = Email String` and `newtype Username = Username String` are just `String`s with guard rails.
Python's type annotation system has this feature too:
from typing import NewType
UserId = NewType('UserId', str)
user_id: UserId
user_id = "abc" # Type check error
user_id = UserId("abc") # OK
# At runtime, a NewType is the identity function
s1: str = "abc"
s2: UserId = UserId(s1)
assert s1 is s2
The point is not about validation, it's about conveying semantic information through types. It's perfectly valid to have an email type that is just a wrapper around a string. The advantage is now you and all your functions unambiguously know that the type represents an email (whatever that means) and not a frobinator.
About 90% of the stuff I ever worked on does email things here or there, and I never had the desire for some hyper complex abstraction with 10 pages of documentation just to store an email. I also never experienced a bug that would have been fixed by this.
I mean, if the user is putting in some bullshit there, the basic validation of type="email" (or its equivalent if I am not working in HTML) is gonna tell him that, and after that he does or does not get his verification email. Boom, problem solved along with other ones.
Need just the domain? Well I'll use a damn 1 liner function with split() or something for this hyper specific use case, don't encumber me with the probably faulty abstraction of email some library designer cooked up 8 years ago just for this?!
I do use types when they are actually convenient, but these types-are-the-best-everywhere-all-the-time articles keep failing to convince me...
Just because there are cases where over-validation (email address being the primary example) can be a problem doesn't mean that _no_ type validation is useful. There are many, many places where validation-beyond-primitives is useful, or even just types that need actions taken on them before they can be used in place of another type. One simple one is "non-negative integers", not commonly provided as a primitive type but _very_ common to need to be able to enforce. Complex numbers are another. Normal strings vs HTML strings (need to be properly handled before they can be output to a front end). The list is endless, and custom types can provide a _lot_ of safety.
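As one illustration of the non-negative-integer case (a Python sketch with invented names): the invariant is checked once, at construction, and everything downstream can rely on it.
from dataclasses import dataclass
@dataclass(frozen=True)
class NonNegativeInt:
    value: int
    def __post_init__(self) -> None:
        if self.value < 0:
            raise ValueError(f"expected a non-negative value, got {self.value}")
def reserve_seats(count: NonNegativeInt) -> None:
    pass  # can assume count.value >= 0 without re-checking
reserve_seats(NonNegativeInt(3))  # OK
NonNegativeInt(-1)                # rejected at the boundary, not deep in the call stack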
There's a sharp distinction between validation and typing. I can cast a string into a domain-specific Email type without validating the string. I can also reject a string due to a validation rule without changing its type.
Opaque type aliases are great because they impose semantics. Even if I don't know how to validate an email, it's nice, at times, to distinguish a string which I suspect to be an email from all other sorts of strings.
Using specialized types can be useful, but if it provides methods they need to be correct for ALL situations.
Here's a subversion of GitHub's authentication (now fixed) where they assumed that "lowercasing domain name using English case rules is always fine and produces the same result" led to a vulnerability:
https://dev.to/jagracey/hacking-github-s-auth-with-unicode-s...
TIL that punycode breaks email.split('@') to get the domain:
Apparently John@Gıthub.com normalizes to xn--john@gthub-2ub.com but Gıthub.com normalizes to xn--gthub-n4a.com.
For now I will go back to forgetting that emails not covered by /[A-Za-z0-9.-+_]+@[A-Za-z0-9.-_](.[A-Za-z0-9.-_])*/ exist to preserve my sanity but this does confuse me.
People are getting lost about your example as they don't see the analogy; it sounds like validation of input. The point is still there if people are willing to look past the confusion: making a type hierarchy requires you to clearly know upfront what the different subsets of data you deal with actually look like, and this will change at different stages of the project. This is the reason people ditched c++ and java in 2000 and started hacking in python and js, at least at the beginning, although things are shifting.
The thing is, a lot of this is reversed here; the more sensible way for a design to evolve is to "generalize" from the bottom up, unless you are very sure from the start that certain type categories make sense for your problem. That's how things should always be: you can only generalize once you've worked through a problem and realized certain aspects are actually common in some way and can be connected somehow, and thus you can take a subset of the data you have in your program and organize it into a type. The thing is, I feel like that is even harder than thinking top-down (which I feel is where the pendulum is swinging now) because it takes time.
The reason top-downers still do their thing and feel their method is better is selective thinking (sorry, I know this is harsh), because they honestly discount the anger and fury people on the outside have dealing with their code, and they ignore the amount of refactoring they need to do when their designs break. In fact, they love the churn, really, or at least merely accept it as a "part of development."
To me, types are for data abstraction, like we use generics for function abstraction.
Abstraction in this case means that, for future changes, I just need to change one place (the Email type abstraction) instead of searching and replacing every usage of a primitive email string.
Ahem. Not all email addresses have an @ in them. If you're sending an email to another user on the same machine, just their username is enough (at least for some mail implementations).
Note that this makes your overall point stronger, not weaker.
Unless you're really going to use textareas with scrollbars for every string and store everything in unlimited length database fields, you probably do want to distinguish between single-line and multiline strings and have some limit on their lengths.
> No, if your idea of validating an email is more complicated than "should have an @ symbol", I guarantee you, there is counter example that will mess you pretty system up.
Requiring b2b users to use their @company.com email is common and some b2b customers actually expect to be able to configure that. Another simple case is stripping out any "+whatever" from gmail addresses and flagging that sort of thing for other domains so support can verify it's not a case of someone creating 72 trial accounts. Yes, rejecting users due to naive and incorrect validation is bad, but treating emails as entirely opaque strings isn't always an option either.
> Another simple case is stripping out any "+whatever" from gmail addresses and flagging that sort of thing for other domains so support can verify it's not a case of someone creating 72 trial accounts.
But for that you need a personal domain and the ability to configure something like that, whereas everyone in a corporation that runs their email through Google can use the +abc thing out of the box, and I've seen it used a ton. The goal with measures like that is usually not to catch every single one who tries to game the free tier but to ensure it's more inconvenient to do so than just buy it for most people, and typically filtering +abc emails will be just one of several different measures taken to that end.
> if your idea of validating an email is more complicated than "should have an @ symbol", I guarantee you, there is counter example that will mess you pretty system up.
unless you are writing an email server. Then you would need to do this properly.
In other words, validate the data that is in the domain of your application. If your app simply _sends_ email, then it's not your domain, and don't need to validate, as long as the receiving end of the email (aka, the email server) accepts it.
I'm learning Python after 35 years of working with statically typed languages (Pascal, C++, Java, a bit of Typescript lately) and by god this is hard. Not because there is anything in the language that I don't understand but the lack of any type info is killing me. I just can't build up a rhythm of coding. I feel like every five lines I have to sprinkle in print() statements to keep track of the data transformations as there is nothing useful that can be captured about it even with these weak ass "type hints". I know that even when I get this crap to work I'll hate going back to that code in a few months as I'll have forgotten what the hell it all did and will have to spike it with print() and df.shape() again to make sense of it. And naming discipline can only get you so far.
Maybe dynamic typing just isn't my thing and I need a new gig...
I've moved from Python to a static language and it's made programming enjoyable again. I'd forgotten that that was possible. It's much more productive as well. Using Python as a production application language is like playing operation. For me what I really detest is ambient, untyped (in the sense that they aren't declared in the function definition) exceptions. Exceptions can just happen on any line, and there's no way to know what exceptions a function will raise. So you have to dig into the source code of your dependencies and such, it's a tremendous waste of time and you still get unanticipated exceptions in production.
MyPy is awesome and will make you a more productive programmer and will make your application more robust. But I agree that Python types are "weak ass". It's not a particularly ergonomic type system to use, and it's more difficult to express complex types than is worth it for the sometimes questionable benefit.
3.10 does add unions using | which is nice, and I expect the type system will get better, but I share these frustrations.
> grug very like type systems make programming easier. for grug, type systems most value when grug hit dot on keyboard and list of things grug can do pop up magic. this 90% of value of type system or more to grug
> danger abstraction too high, big brain type system code become astral projection of platonic generic turing model of computation into code base. grug confused and agree some level very elegant but also very hard do anything like record number of club inventory for Grug Inc. task at hand
I once attended a meeting where a Professor from a University somewhere in Chicago gave a brilliant demonstration of using a similar type system for dealing with values in Electrical Engineering. It made quite sure you couldn't do things like add volts and amps.
[Edit] it also handled things like parallel resistances, etc.
It was in C++ if I recall correctly.
This is a great idea, that I've haven't had cause to use yet.
Unit of measures are a great example of what a type system can do, and something not enough languages support. F#[1] and Scala[2] are two that I know of that do support UOMs. Like you, I haven't had the need to use them in the domains I work in, but I imagine that they would be invaluable in certain contexts.
It's also something that some languages seriously screw up. Consider multiplying a time (which is typed in Go) with a numerical value... suppose what I want is for a user to input a number of time intervals to wait. So the user wants 5, and the interval is 2500 milliseconds. The way you get 12500 milliseconds out of that made me want to throw my computer out the window.
Go does not have operator overloading, and numeric operators must have identical types. So if you have `var x int = 5` and `var t time.Duration = 2500 * time.Millisecond`, you have to `time.Duration(x) * t` or `time.Duration(x * int(t))`.
It's slightly better than languages with no operator overloading nor newtypes at all (well, actually a lot better given other things you can use newtypes for) but without operator overloading using it just for units, with no other API machinery, is usually a bad idea.
1. Uh, ok. Might as well throw out the whole thread then?
2. A "time unit" is not a special type of value. You can construct arbitrary types of integers, and it is common to do so. `*` has no clue what a time is, just that it's "not an int" (for example).
The commenter you're replying to expressed it confusingly. The point is that in Go, 5 * time.Milliseconds(2500) is a type error, and instead you need to do time.Nanoseconds(5) * time.Milliseconds(2500).
`5 * time.Milliseconds(2500)` is not a type error, though `int(5) * time.Milliseconds(2500)` is.
(This is especially relevant because you really mean `5 * (2500 * time.Millisecond)` vs. `int(5) * (2500 * time.Millisecond)`, as there is no `time.Milliseconds` function.)
This approach to typing could have saved the Mars Climate Orbiter, i.e a pounds of force type vs. a newtons type.
> "A NASA review board found that the problem was in the software controlling the orbiter's thrusters. The software calculated the force the thrusters needed to exert in pounds of force. A separate piece of software took in the data assuming it was in the metric unit: newtons.... Propulsion engineers, like those at Lockheed Martin who built the craft, typically express force in pounds, but it was standard practice to convert to newtons for space missions. One pound of force is about 4.45 newtons. Engineers at NASA's Jet Propulsion Lab assumed the conversion had been made, and didn't check."
There could be issues with memory use, but it could also be implemented as an API, i.e. ensuring values exported from one software package to another were of the correct type, but then store them internally as simple types... Switching everything to a unified metric system would make more sense in the long run, however.
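Sketched in Python terms (types and names invented for illustration, not NASA's actual code), the idea is two incompatible force types plus one explicit conversion, so the assumption can't go unchecked:
from typing import NewType
Newtons = NewType("Newtons", float)
PoundsForce = NewType("PoundsForce", float)
LBF_TO_N = 4.448222  # one pound of force is about 4.45 newtons
def to_newtons(force: PoundsForce) -> Newtons:
    return Newtons(force * LBF_TO_N)
def command_thruster(force: Newtons) -> None:
    pass
command_thruster(to_newtons(PoundsForce(100.0)))  # OK: the conversion is explicit
command_thruster(PoundsForce(100.0))              # flagged: PoundsForce is not Newtons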
I think you've inadvertently stumbled on another great example, distinct types for TranslatedMessage, LocalizedNumber, etc. from ordinary string has been a cornerstone of localization enforcement on at least two large applications I've worked on.
I wouldn't say that UOM are uncontentious, things can get dicey around reference units and precision for instance, or the combinatorial explosion of composite units.
Funnily enough, there are some applications for converting mechanics problems to analogous electricity problems to leverage circuit simulation software such as PSPICE to help solve things like transcendental equations.
Yes, but the mapping doesn't change the relationship between the units of measure, which is the actual meaning as far as the type system is concerned. It's just a change of names.
Mind expanding on that? Because what that readme there shows is absolutely possible in C++, I have used a similar system for dealing with natural units.
I'm happy to admit I'm wrong on that. My understanding is the bit that makes (kg.m)/s^2 type equivalent to a N, equivalent to J/m is not implementable in the same generic way.
All unit systems decompose each unit to primitives, that can then be aliased for notational convenience. So, a result of J/m decomposes the same as a N.
Note that most such solutions break (or become much much much more complex) if you want anything more than simple arithmetic from them. For example, matrix multiplication with typed values (where each element of the matrix can have a different type/unit of measure) is extremely ugly code, and basically no such library supports it - even for matrices of fixed size (say, code that could multiply 3x3 matrices with 9 type parameters - which is not unrealistic in physical simulations).
This is a fundamental limitation of types - they tend to scale poorly to very complex non-uniform structures. That's not to say that they shouldn't be used when they do scale nicely, though!
matrix multiplication with typed values (where each element of the matrix can have a different type/unit of measure)
Is this really a common occurrence? In most situations I've come across, it's the matrix itself (rows/columns) that has a unit of measurement, not the individual columns. Tensors, rotation matrices, lighting maps: they all use the same units of measurement.
Matrix multiplication is often used for solving systems of linear equations, and you often have systems of linear equations involving different physical quantities (such as position, speed, time and mass if solving some classical mechanics equations, or pressure, volume, temperature, and time for thermodynamics etc).
And while it may be relatively common to start out and end up with matrices that have a single unit for each row (but different units on different rows), intermediate results will often end up with different combinations of units in each element.
The most successful languages are typed but weakly so. Just enough type system to avoid the biggest class of bugs, not enough to get in your way all the time. Golang strikes this balance very well. Too little typing, and your Python unit tests get too heavy to run after every commit. Too much, and you have to read a book on category theory before you can figure out how to grab that one field using Lenses in Haskell.
Edit: I should note this is coming from an outside observer as my most favorite languages are dynamically typed like Python and Lisp. But it should also be noted that I like writing in small code bases. Larger ones tend to need typing.
I think ascribing PL popularity to striking the right tradeoff in this respect is leaping a bit far. It's compatibility and familiarity with predecessors, marketing dollars, etc. They tend to have C++ style syntax for example which is a similar path-dependence-formed quirk of history.
I want to point out that in practice, Common Lisp is also to some extent statically typed. The compiler will issue warnings if there are forms that it can determine (at compile time) would cause type errors at runtime. It's common practice to not accept code unless these errors are eliminated (one can even set up your compile system to abort when they are found.)
What it will not do is reject the program unless it can confirm that every expression will not cause a type error.
Those are only advice to the compiler, but importantly `safety` is about run-time error checking. Pushing `speed` higher and dropping `safety` to 0 means that runtime type checks might be removed for certain things, but it's not required to. Like if you've similarly declared that a variable definitely holds a `fixnum`, it'll believe you whether that turns out to be the right thing in the end or not. But again, it's advice. The compiler could leave those runtime checks in place, too.
An aside: never declare something to be a fixnum in Common Lisp. Fixnum means different things in different CL implementations, so this is a good way to get unportable code. Instead, use explicit integer range types.
Python is strongly typed. The weakness (such as it is) in python is that in historic idioms the type is ignored in favour of the interface (duck typing).
2000 square dollars? If I'm choosing between ways to spend capital so as to improve the efficiency of a process, and that process currently produces five widgets per dollar, then the quantity I'm comparing to choose between my courses of action can be measured in widgets per square dollar.
Hiring a better engineer for more money may create an efficiency improvement of 1 widget per dollar, with an outlay of $1k extra for the better engineer, giving a total gain of 0.001 widgets per square dollar; hiring a worse engineer for much cheaper may represent an improvement of 0.1 widgets per dollar, at an outlay of $1, giving 0.1 widgets per square dollar.
Perhaps not the most intuitive unit, but it's not impossible. (Though since you can even measure it in dollar-sterling if you like, I suppose that doesn't make it a counterexample to "stop multiplying dollars by sterling".)
You have provided a real example, which I was looking for, of why one might need to express a square dollar; thanks.
I wonder if the people who want to argue "types save you from bugs" see your example as very unwelcome, since they'd want to use "squared dollars" as an example of something nonsensical that should be flagged as a type error. I hope those people can reflect rationally on the limits of type systems in the real world.
A type system is merely a tool to encode information to help better model things. If you want to prevent multiplying dollars together, types can help you do that. If you want to enable multiplying dollars together, types can help you do that, too.
# oops my scalar has a unit
x unit * x unit = x unit^2
The value isn’t catching this line of code since it’s potentially valid. It’s catching the line of code where you pass the result to a function that expects unit.
I agree, but the original article at dusted.codes hopes types will "prevent silly mistakes like multiplying $100 with £20".
I don't know what that author would think of multiplying $100 with $20, but my point is that this embrace of type systems is apparently not just about function interfaces; it also includes the operands of things like multiplication, and preventing that operation if the types are fishy.
For better or worse hopefully they think the two examples the same.
Using the example above, "multiplying $100 with £20" can be achieved just by making the better engineer a remote employee paid in pounds. It adds the exchange rate into the mix, so the math will change over time, but conceptually the math is the same as the "dollar * dollar" example.
There isn't a reason why you shouldn't write something like $100 * (£20 / £47) as "int dollars = 100 * 20 / 47;". Note that this expression associates to the left instead of the right, which can be the right thing to do if doing integer arithmetic. But it would not work with a strongly typed setup as in your example.
In my experience trying to prevent accidental mistakes is a waste of time and often makes our lives miserable. Catching the rare bug by doing complicated work in the type system when it would have been easy to find in normal code anyway is not worth it.
It was just the first example from the top of my head. The expression above calculates the right thing, cast to int. In general, prescribing which units we can multiply and which not, is extremely silly if you consider how we learn it in school. You can multiply anything and everything, simply take care of the units. There isn't an obvious reason why we couldn't have 2000 dollar-pounds as a transient value in a longer computation.
The real problem is that most type systems aren't fit to track the units automatically. Solution: Don't beat yourself up, track the units in your mind / in comments / in variable names instead of the type system. And just get it right. It's not that hard - if you mix something up that's usually the type of bug that is immediately noticed and fixed.
Programmers will have immediate answers for you -- stated confidently as if to imply there is a spec somewhere when in fact no spec exists and the programmer you're talking to is peddling their own bullshit as gold.
A dollar times a dollar is a dollar squared. You don't need a spec for that!
For example, if you have a random variable that's in dollars, its variance would have units of dollars squared. People consider the variance of dollar estimates all the time.
Hey author here! Sorry I didn't respond to any feedback yet. I've literally posted this before leaving my house and didn't think it would get many upvotes as it didn't get any votes the other day either.
Sorry that the general sentiment is "everything old gets new again". I didn't try to rehash some old news again. I basically blog about things that come up in my daily work life and this topic was something that I felt quite passionately about. From my own experience I felt that type systems, especially in modern languages, are not nearly as well utilised as they could be. Of course there is always a balance to strike, especially with over engineering and needless optimisations, but that is a topic for another blog post another day.
Don't let it get you down. HN sentiment often trends grumpy when someone makes a point that's been made before. That doesn't mean it's not important to restate, extend, elaborate on, modernize, and recontextualize ideas!
There are almost 8 billion people on this planet; most claims echo prior statements to some degree.
I found your article practical, short, and largely accurate; which is to say: I liked it. I think it could be improved with either an edit or followup which links to similarly-inclined articles, papers, or talks that discuss the topic, so folks can deepen their understanding of the role of type systems in day-to-day programming and PLT.
>A string value is not a great type to convey a user's email address or their country of origin.
So we have a type for "country of origin". And then some country that you have in the records splits up into 2 countries, what do you do then? Do you keep a list of all countries that ever existed and keep it up to date?
This approach works well in some cases, but not always
Is the alternative to sweep it under the rug? In a stringly-typed world, what happens - do you just hope for the best? In a typed world, the problem ("the real world has ceased to conform to the model") is at least made plain so that you can decide what to do with it, because the model is actually… modelled.
In a stringly-typed world, you still end up with a purpose-tuned model of what an email is, how it's used, and what error cases are -- these just aren't implemented as qualities on a type.
There's a dozen approaches for it, many from the functional programming paradigm. But an example of one approach would be that your consumer functions become responsible for interrogating the data they'll act on -- through assertions or other verification means.
Comparing to the real world: my metal foundry doesn't yell if it gets a non-metal Type of material. But, the logical process it follows (heating to 1000+ °C) takes care of all but a handful of corner cases when you give my foundry the wrong data type.
The choice to wrap all the logic into a "Type" and then get mad when the model logic exists elsewhere is just a choice. And a weird one people get VERY OPINIONATED about.
I guess I'm only opinionated about it because I'm 100% not smart enough to get it right unless something stops me getting it wrong. ("It" can be pretty much anything here.) It's why I'm such a terrible Python programmer. The foundry analogy is spot on - there's nothing to stop me throwing my grandma in, so at some point you can bet I accidentally will.
> I'm 100% not smart enough to get it right unless something stops me getting it wrong
IMO, type systems are harsher on modeling mistakes than something like Python is. Sure, you'll get it wrong the first time (sorry Grandma!). And in Python, you can mutate your system rapidly into a new state that can accommodate the old model's mistaken assumption. If your program starts getting complex enough that the mutation speed is dropping -- deconstruct it into smaller, manageable problems.
Humans are 100% not smart enough to build systems the way a lot of corporate shops keep trying to.
This problem has already been solved. Use something like https://en.m.wikipedia.org/wiki/ISO_3166-1_alpha-2 and get an authoritative list of countries / territories, including defunct ones like Yugoslavia and USSR. You still have to define the business logic of whether you want to keep a certain country record aligned with historical borders or with current borders, but that problem exists whether you have a primitive type to represent countries or not...
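As a minimal Kotlin sketch of that idea, using the JDK's built-in list of current ISO 3166-1 alpha-2 codes (defunct codes like YU or SU live in ISO 3166-3 and would need a supplementary table):

```
import java.util.Locale

// hypothetical CountryCode: can only be built from a code on the ISO list
class CountryCode private constructor(val alpha2: String) {
    companion object {
        private val known: Set<String> = Locale.getISOCountries().toSet()

        fun parse(raw: String): CountryCode? {
            val code = raw.trim().uppercase()
            return if (code in known) CountryCode(code) else null
        }
    }

    override fun toString() = alpha2
}

fun main() {
    println(CountryCode.parse("de"))    // DE
    println(CountryCode.parse("Foo"))   // null -- "Foo" is not a country
}
```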
> You can have your name change as well, or your calendar can change
What do you mean by that?
Having a type for "country of origin" would mean that the type gives you limits on what values it can hold (any country known to ever exist) so you can not say something like:
Country c = "Foo"
because Foo is not a country.
I can't imagine a type for a person's name checking anything but perhaps the string's length, certainly not a list of all possible names.
The calendar example I don't get. We already have "date" types in almost all languages, so that "works", although it can be used as an example of how hard it is to implement some types.
-----
So I say:
>> This approach works well in some cases, but not always
And you say:
> Does that mean we stringly type everything?
> I can't imagine a type for a person's name checking anything but perhaps the string's length, certainly not a list of all possible names.
As always, that is very domain-specific: there are lots of countries with naming laws, some of which do have lists of legal names.
Likely nothing: usually they’re laws which apply to parents / birth certificates.
I guess it’s possible that an immigrant trying to get naturalised would have to adopt a “legal” name for the country, but I’m not aware of any country where that’s a rule, aside from the Zairianisation movement of Mobutu.
Restricting the space of possible values is only one of the possible advantages of declaring a dedicated type for something. Even if it is not possible to restrict the space of possible values (e.g. with names, which can't realistically be restricted to any smaller subset than all possible strings), there are other advantages to having a dedicated type, such as preventing the user of the type from putting a value of that type into somewhere it doesn't belong, which is a very realistic scenario in stringly typed codebases, especially where there are similar but different sets of values, all stringly typed.
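To make that last point concrete, a tiny sketch (Kotlin, names made up): even with zero validation, two wrapper types stop you from passing one kind of string where the other belongs.

```
@JvmInline value class EmailAddress(val raw: String)
@JvmInline value class CountryName(val raw: String)

fun sendWelcomeMail(to: EmailAddress) { /* ... */ }

fun main() {
    val email = EmailAddress("alice@example.com")
    val country = CountryName("Iceland")

    sendWelcomeMail(email)
    // sendWelcomeMail(country)    // does not compile: type mismatch
    // sendWelcomeMail("whoops")   // neither does a bare string
}
```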
A string is a wrong model for an email address. But it's a pretty useful one.
A custom type sitting lonely in an isolated codebase IS ALSO A WRONG MODEL. Arguably, it might be a more useful one than a string. But that's debatable.
And on that debate, I'll argue a string is a better model because it is a model UNDERSTOOD by more PEOPLE than whatever MyEmailClassForThisProject you just came up with.
It's not. You have the same problem regardless of type you place there.
> A string is a wrong model for an email address. But it's a pretty useful one.
Depends on the use case. Perhaps it's overkill in this toy example.
I've had real-life use cases with untrusted user input where having the raw string as untrusted and some kind of verified type as trusted would eliminate a whole swath of errors.
Class as a concept is somewhat orthogonal to type, at least to people into programming language research, who consider "type" to implicitly mean compile-time type. Classes are taken to refer to support for virtual method dispatch, while types refer to compile-time expression type-checking. In languages like Java, C++ or C#, the type of an expression corresponds to the class of the value it will have at runtime. However, in Python for example, compile-time expressions always have the type "any", but can have various classes at runtime.
This difference between class and type can even be seen in some of those languages. For example, the type *X has no corresponding class in C++ for any X. In Java, the type int doesn't have a class, and the types ArrayList<Integer> and ArrayList<Object> have the same class.
Depending on the language yes, or probably, or definitely not. A class is one of several ways to represent a type. Some languages have structural type representations and a class/instance is equivalent to a “plain old ___ object”; some languages have types but no notion of classes at all.
Sure but the requirements given by the author are exactly represented by, for example, Java's class system. So why is the author dreaming about something that has existed for 20+ years?
>A string value is not a great type to convey a user's email address or their country of origin. These values deserve much richer and dedicated types
this is a classic case of not needing more types but needing proper names. Types as concretions, i.e. simply collections of data or functions are a terrible idea because they're static and don't accrete. Data in the real world always does. This becomes very obvious when you go down a paragraph and you see the conundrum:
>For example, let's have a second type called VerifiedEmailAddress. If you wish it can even inherit from an EmailAddress. I don't care, but ensure that there is only one place in the code which can yield a new instance of VerifiedEmailAddress
okay, and for the next email setup let's have a third type, and a fourth type, and a fifth type, and so on. The end result of this is a zoo of types that help nobody to understand anything. It reminds me of an older Rich Hickey talk. When you program a delivery truck you don't make a type for each different truck because of the contents of the truck, you just take your delivery out of the truck and you don't care about the rest.
No, there should only be the one EmailAddress type. If it's not valid, it's not an EmailAddress.
Does having an EmailAddress type guarantee you won't accidentally accept crap? No, but when you get it wrong, you edit the validation in one place in the system.
> You check that stuff when the data enters the system.
There can be N entrypoints where data enters the system (different controllers, CLI), so you must always remember to validate emails in N places, otherwise broken data could end up being passed to business logic. Data can also be constructed inside the system. It's nice to have one centralized place where email is validated. Placing it in the constructor of a special type and using only that type for email guarantees it's impossible for business logic to receive invalid emails in principle, no matter what you do, because when an exception is thrown from a constructor no object is created at all. No object = no invalid data to deal with. You know that when you see EmailAddress (or any other type where state is validated in the constructor) it's in a valid state, there's no ambiguity, and, in my opinion, it's also more readable than just some string, the intent is clearer.
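For illustration, a minimal Kotlin sketch of that constructor-validation idea (the check is deliberately simplistic, not a real email validator):

```
class EmailAddress(val value: String) {
    init {
        // throwing here means no EmailAddress object is ever created for bad input
        require("@" in value && "." in value.substringAfter("@")) {
            "not an email address: $value"
        }
    }
}

fun register(email: EmailAddress) {
    // business logic can rely on the invariant; no re-validation needed here
}
```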
If you can construct an EmailAddress, then you have a valid EmailAddress. That's the point.
If an EmailAddress can be a valid or invalid email address, then just leave it as a String (since that can also be a valid or invalid email address).
> You check that stuff when the data enters the system.
Yes
> If that place is the EmailAddress type, then you have built your system wrong.
No
If you validate & construct an EmailAddress from another external class, that means external classes are free to bypass validation and construct an invalid EmailAddress. Putting the validation/construction inside EmailAddress lets you force construction to go via validation.
My advice: Relax and don't argue. People who don't understand that constructing an EmailAddress type is also validating the raw email string (in this case) will never understand it. They'll remain convinced for a very long time, possibly the rest of their lives, that they know better. That passing a string around is fine as long as either you always validate it everywhere (yes, kill your performance, that's smart) or that they validated it once and they pinky swear to never change the value and to always call validate before passing it into the system.
Let them find subtle errors in their programs over time, it's job security for them. They don't want to move on to new and more interesting things they just want to keep fixing the same shit for the rest of their careers.
no. a type is a description of a set of values and its associated operations. Types impose global meaning on entities in your program. When something belongs to a certain type, receivers of arguments of that type lose control over how to interpret them. Thus types introduce coupling.
Names are just labels attached to an entity for the purpose of identification and readability, they don't impose meaning.
Sorry, I accidentally took the delivery out of the email. You made them both have a deliver_to(address) method, you spent most of your comment talking about emails, and the computer surely didn't stop my underslept human self from confusing an email address with a physical one.
then you should complain and check what's in your mail. The fact that a delivery method is generic isn't a problem, delivering things from A to B is a generic task. The recipient of the packet handles the content, the deliverer doesn't care what's in the box. deliver_to ought to be reusable, there shouldn't be 50 versions of it.
When we send json over the wire do we rewrite methods globally to make sure we're all in sync about the content? No, you as the message recipient make sure that what you got makes sense and how to deal with it.
You know those "how to draw Bugs Bunny" art guides they used to include in children's art books? Where they begin with a circle with some guidelines and then do a whole bunch of stuff and the end result is Bugs Bunny? But you have no idea how they went from Point A to Point B? That's the same thing with Type Theory.
PROFESSOR: Well, you see, there are different objects, like strings and numbers, that are shaped differently, so we put them into different categories. Those are types! See how simple this is?
<a whole bunch of stuff later>
PROFESSOR: Now the endofunctor of the covariant types are jointly distributed under the free monoid, provided that the pullback doesn't reverberate the time fibration of the dependent type space.
That's not possible, because I can't discern the final cause of all these theorems and definitions in "a whole bunch of stuff". It all comes off as a game of defining abstract objects just for their own sake.
I don't know if you're a theist, but you can pray for divine grace that this blockage go away. Other religions like Buddhism also teach the power of positive intentions in releasing spiritual blockage.
There is an important difference between types and objects: types are basic building blocks. An email is not a basic building block, nor is money. In the mature ecosystem of Java there are libraries that handle these problems; some are even in the standard library (timestamp with timezone, URL...).
It would be nice to have stuff for these in the standard libraries, but currencies are a moving target and need constant updates to handle the quirks of the real world. For the same reason it's extremely hard to create an "Address" class that handles every possible scenario.
In my book the string is a great way to store a country of origin because everything else depends on the usage of that information.
Yes but they should also be compositional, one should build larger types from smaller types. This is how we scale to solving difficult problems.
> but currencies are a moving target and need constant updates to handle the quirks of the real world.
This is a separate issue orthogonal to types. The problem of modelling the real world and what models might be more generally useful. No model is perfect, all have trade-offs.
> In my book the string is a great way to store a country of origin
As an implementation, using a string sounds pragmatic, but that should not be its type!
You should still create a currency type and parse into it. Then you won't accidentally pass an arbitrary string to a parameter that is supposed to be a currency.
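For example, a hedged Kotlin sketch along those lines, leaning on the JDK's java.util.Currency for the parsing (the priceIn function is just a made-up stand-in):

```
import java.util.Currency

// the only way to get a Currency here is to parse a valid ISO 4217 code
fun parseCurrency(code: String): Currency? =
    runCatching { Currency.getInstance(code.trim().uppercase()) }.getOrNull()

fun priceIn(currency: Currency) { /* ... */ }   // can't be handed a raw string by accident

fun main() {
    println(parseCurrency("usd"))    // USD
    println(parseCurrency("bogus"))  // null
    // priceIn("usd")                // does not compile
}
```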
To compare them this way is likely to cause confusion.
A type (of a term) represents the set of all possible values for a particular term. An "object" in OOP does not have any formal definition, but is typically a first-class module with mutable state. As such, objects can be represented by a term and therefore can have a type.
I am saying that an object has a type, rather than is a type or some augmentation of it.
An object is a term-level construction and therefore is not really comparable to a type. Types can be given to both state and behaviour. For example, a function type describes pure behaviour.
Note that statically-typed OOP languages have a name for the nominal types of objects: "classes". One could say that a class is a type representing both state and behaviour.
Why is it better to have two email types, VerifiedEmail and UnverifiedEmail vs. one Email type with an "isVerified" field?
One type is probably going to align with storage and transport better, and you probably mostly want to treat verified and unverified email addresses the same except for some very specific situations. (E.g., maybe only your EmailBlaster cares, where it's like a privilege: some can send to unverified emails and some can't.)
This seems like a bad fit for types to me.
(It's all code you write and data you design -- types are just one tool... you need to think about the best tool, not get fixated on one, no matter how nice it is.)
You can then use visibility controls to universally guarantee that recognize and validate must be called before send. No test can ensure this is true.
Under the presumption that send should only perform work on verified emails, the alternative is not being able to be confident that emails passed to send are pre-verified. This means that send must check this flag, and therefore have the ability to fail due to non-verification.
This isn't inherently a problem, but it can lead to a failure to separate concerns. If one part of your system is responsible for parsing and validation and a separate part responsible for interacting with the sending machinery, it's unfortunate if the latter part can fail due to a failure to verify the email. These systems have now implicitly shared responsibility.
You can try to guarantee that no email is passed from the first system to the second without being verified, but this can be challenging. It's a universal property. Tests can show the presence but not the absence of bugs.
But those types we showed at the beginning provide exactly that guarantee.
But then you can just have a "makeEmailVerified" that takes an UnverifiedEmail and converts it to a VerifiedEmail without verification. Then the "send" function still needs to check things.
Typically, the owner of the VerifiedEmail type restricts your ability to construct new values of that type by making the constructor private and then only "blessing" a small number of public constructors, each one performing that verification.
If any type could be converted to any other (compatible) type for any reason whatsoever then types would have no semantic value. But fortunately, we can control how types are constructed and used.
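In Kotlin-ish terms (a sketch, not anyone's actual code), that "blessing" could look like this: the constructor is private, and the one public path to a VerifiedEmail has to run the check.

```
class VerifiedEmail private constructor(val address: String) {
    companion object {
        // the only public way to obtain a VerifiedEmail: present the code that was
        // mailed to the address (a stand-in for whatever real verification you do)
        fun confirm(address: String, codeEntered: String, codeSent: String): VerifiedEmail? =
            if (codeEntered == codeSent) VerifiedEmail(address) else null
    }
}

fun sendNewsletter(to: VerifiedEmail) {
    // no flag to check: the type itself is the proof that verification happened
}
```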
I think you understood the use, but value the safety less than I do:
> maybe only your EmailBlaster cares, where it's like a privilege: some can send to unverified emails and some can't
A boolean flag is strictly inferior, because it is a runtime check. You can only ever be sure that you're processing verified emails at runtime, and there's no way to require that the code guards against that in all places. If they're different types, you can't even pass an unverified email to code that needs verified emails, so you eliminate the entire possibility at compile-time.
It's so that when you are 50 function calls deep you don't have to remember if you are handling a verified email or not. This is the same problem I have with "sum types" as implemented in languages that don't have algebraic data types. You either have a massive struct that contains mostly nullable values or something like a tagged union.
>verified and unverified email addresses the same except for some very specific situations.
This is when typeclasses are a useful concept (or interfaces). Instead of designing your function around a concrete type, you can codify that the caller needs to provide any type that satisfies some requirements. For example, if the function only cares that the input can be treated as a string, you can ask for something like `As<String>`. Then, the caller can provide literally anything as long as it implements `As<String>`.
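Kotlin doesn't have typeclasses, but an ordinary interface gets at the same idea (names below are made up, not from the article): the function asks only for "something I can read as a string", and both email types can opt in.

```
interface AsString {
    fun asString(): String
}

@JvmInline value class VerifiedEmail(val address: String) : AsString {
    override fun asString() = address
}

@JvmInline value class UnverifiedEmail(val address: String) : AsString {
    override fun asString() = address
}

// cares only about the capability, not about which concrete email type it gets
fun logRecipient(recipient: AsString) = println("sending to ${recipient.asString()}")
```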
> types are just one tool
Indeed, types are just a tool. But they are a much better tool. It's an electric shaver instead of a rusty axe. This is my biggest gripe with "simple and small languages". More often than not, you just end up writing more verbose and complicated code to compensate.
To illustrate, I wanted to implement `INCR KEY VALUE` from Redis.
```
item, exists := db.keys[key]
if !exists {
    return
}
value, ok := item.Value().(string)
if !ok {
    return
}
intValue, err := strconv.ParseInt(value, 10, 64)
if err != nil {
    return
}
intValue++ // <== this is literally the only meaningful work I am doing
```
Maybe it's just a bad example (this is what the main article is about, though)... Generally speaking, you need to be able to send emails to both verified and unverified emails. The difference is in what the email you are sending is about. That's why VerifiedEmail as a type doesn't make a lot of sense to me.
You'll need sendToUnverifiedEmail(email: UnverifiedEmail) and sendToVerifiedEmail(email: VerifiedEmail), and have code to get the right type to pass to the right function in the right circumstance...
You've got the same potential for getting this stuff wrong, whether you express it in a type or in imperative code or however you express it.
Replies are generally assuming the type part of the code is bug-free and the imperative part of the code is not, which just isn't reasonable.
Also, the static vs. runtime stuff is irrelevant to this example: the verified status of an email address is a runtime property that cannot be known at compile time (OK, I'm assuming you don't hard-code verified email addresses). I.e., due to a bug, a value of type VerifiedEmail could be created for an email address that is not really verified. Then, your static checks for VerifiedEmail don't help you at all.
Further, "verified" vs. "unverified" is really a business concern around when it's OK to send certain kinds of emails to the address. It has a fairly standard definition, but there are qualifiers for email addresses that are at least as important to a business that aren't. E.g, did the owner of the email address opt in to marketing emails? Or opt out? Or did not express a preference (yet)? Are you going to have types for those? You'd have to have 3 X 2 types to encode that... that's (verified, unverified) X (opted-in, opted-out, didn't specify) types. Then your business adds a new kind of email, securityAlerts, so now you've got (verified, unverified) X (opted-in, opted-out, didn't specify) X (gets-security-alerts, doesnt-get-security-alerts). Oh wait, some emails shouldn't go to banned people. So now you've got: (verified, unverified) X (opted-in, opted-out, didn't specify) X (gets-security-alerts, doesnt-get-security-alerts) X (banned, not-banned). And you need a "send" for each type, called at the right spot.
> You'll need sendToUnverifiedEmail(email: UnverifiedEmail) and sendToVerifiedEmail(email: VerifiedEmail), and have code to get the right type to pass to the right function the in the right circumstance...
Only if you're using a language with an insufficiently strong type system (e.g. Java, C#)
class SendTo t where
    sendTo :: t -> IO ()

newtype Email = Email String

instance SendTo Email where
    sendTo (Email address) = ...

-- the deriving clauses need GeneralizedNewtypeDeriving to reuse Email's instance
newtype VerifiedEmail = VerifiedEmail Email deriving (SendTo)
newtype FooBarBazEmail = FooBarBazEmail Email deriving (SendTo)
> I.e., due to a bug, a value of type VerifiedEmail could be created for an email address that is not really verified. Then, your static checks for VerifiedEmail don't help you at all.
Of course they do - they tell you that the bug is in the verification code, and not in any of the thousands of lines of business logic separating it from the place where the error was found.
I don't know Haskell, but in the typescript code the types aren't doing anything... sendToEmail sends to any kind of Email. And if the code wants to know if an Email is verified, it inspects the verified field.
> Of course they do - they tell you that the bug is in the verification code, and not in any of the thousands of lines of business logic separating it from the place where the error was found.
Whether you centralize the verification code (or otherwise have a separation of concerns for it) or not is independent of whether you use the type system to express when an email is verified or some other mechanism.
That's my point, you don't need a combinatorial explosion of behaviors for every possible most-specific-type, you can just reuse existing ones.
> And if the code wants to know if an Email is verified, it inspects the verified field.
That's exactly what you shouldn't do. Runtime "type" inspection is just bad overly distributed parsing. The point of typing is to move errors to compile-time.
This doesn't need to inspect anything, because the type signature guarantees that someone else has already handled that.
> Whether you centralize the verification code (or otherwise have a separation of concerns for it) or not is independent of whether you use the type system to express when an email is verified or some other mechanism.
Separating the verification code - which is table stakes, really - doesn't guarantee that you're not accidentally adding or losing "verification" elsewhere. Types can and should.
> The point of typing is to move errors to compile-time.
You've got to understand: Whether or not a particular email address is verified isn't something that's generally known at compile time... That means compile-time checks cannot actually guarantee that the email address is verified.
The compiler cannot know something that isn't known. There must be runtime code somewhere doing the check.
What people are really talking about with types here is that if you organize your code in a certain way, you can put the code that determines whether an email address is verified in one place and use types to help make sure other code doesn't accidentally ignore or change that guarantee.
That's good and fine. But... (1) you can do the very same thing without types; (2) Either mechanism is only as good as your code organization and controls that ensure there's a single place this is determined (and that that place is correct).
> Whether or not a particular email address is verified isn't something that's generally known at compile time
Yes, obviously. What is knowable at compile time is the stuff that comes after: given that the input to this function has property X, does the output have property Y? Arguments, not premises.
> But... (1) you can do the very same thing without types
Sure, and if someone makes an SMT-solver-oriented language (that isn't just using it to drive type inference) I'll be happy to try it out. But what most people who dislike types claim is that you can replace them with testing (true in principle, false in practice: no one actually remembers to test every last edge case every single time) or just programmer discipline (lol).
In fairness this takes an unusually strong type system to express, doesn't it? Typescript can do it, but I don't think e.g. Haskell98 can do it out of the box in an analogous way? (Of course, it's hard to prove a negative and I'm not super familiar with Haskell, but my evidence is that I'm pretty certain F# can't.)
> but I don't think e.g. Haskell98 can do it out of the box in an analogous way?
This is basically the runtime representation of `data Email = Verified { email :: String } | Unverified { email :: String }`, but promoting `Verified` and `Unverified` to the type level requires an extension:
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Email (verified :: Bool) where
    Verified   :: String -> Email 'True
    Unverified :: String -> Email 'False
Perhaps, or perhaps the poster is a beneficiary of the (useful IMO) "do you want to post this again? We felt that it's good but didn't get visibility this time around" moderator outreach tradition.
I'm working in a codebase that has, at times, 10+ different expressions within a single conditional in many places, and trying to pull out the context of why the conditional exists in the first place makes grug brain hurt. At the very least, you could take all of the expressions and assign them to a boolean with a variable name saying wtf it is you're conditioning on.
Eh... I agree that the minimization of LoC is almost certainly not the most important vector on which to optimize, but I'm not convinced the example linked here is an improvement. The author is obviously correct that their version is easier to debug and slightly easier to understand, but neither of these improvements, taken on its own, makes their version better overall.
In terms of ease-of-debugging, sure, splattering local variables and extra control statements may allow you to break/inspect a certain class of bug in a certain way. But it also creates a lot of noise and makes the code a lot more "dense". It's hard to see given an example in isolation, but when all of your code looks like this it can make it significantly more "tiring" to understand. "Easy to debug", while important, is also something that must be balanced against other factors.
And in terms of easy-to-understand, again, I agree that the author's example has a slight edge (give the first one a shot though... it's not so bad). But what does it mean for a `Contact` to both be "inactive" and also "a family or friend"? They have forgotten to capture the single most important condition! Similar to my first point, it can be hard to see the issue when given an example in isolation, but imagine looking for whatever condition or rule the author is enforcing in a sea of other blocks that look similar.
A simple comment over the original version would suffice for me:
> A string value is not a great type to convey a user's email address or their country of origin. These values deserve much richer and dedicated types. I want a data type called EmailAddress which cannot be null.
Sure, I'm on board: I also want an e-mail address type. Just not in your shit language in which something can be of type String, yet be a null reference.
Hmm, oh! C++ comes to mind. Of course, there is such a thing as null, but in:
void fun(string x) // std::string
{
}
x cannot be null. The reference semantics (like a copy of a string sharing the data with the original) is an implementation detail/optimization encapsulated inside what appears to be a value type.
The tools are there in C++ to create ideal types for your problem domain which just look like value types that have no null domain value (or any other value you don't want), and standard strings are like this.
Pointers to std::string are almost never required, though. A pointer to std::string is not something you have to use to write a string handling C++ program or module; and such a pointer p is itself not a string, *p is (if p is non-null and valid).
About the only time you would need a pointer to std::string is when calling some C API that takes a callback function with a void * context, and you'd like that context to be a std::string. Then you might take the address of a string object to pass to that API. (That pointer would likely never be null, but could go bad due to lifetime mismanagement.)
Most other uses of such a thing would be unidiomatic. Whereas, in some languages, string references that can be null are foisted on programmers as the standard, idiomatic string representation. That's a big difference.
But in Kotlin non-nullable types are the default. In C++ references are non-nullable. Both include nullable types too.
In that specific example, String can't be null in Kotlin and neither can string in C++, although you can make them nullable by using e.g. a pointer or an optional type.
Having everything be nullable like in Java is unarguably 100% a mistake.
Type systems cause programmers to write 300 classes no one references and many duplicates of each other.
Dynamic typing allows you to focus on what really matters, not trivial business logic OOP hierarchies that get inevitably ignored.
I feel like in app development, there’s something honest about dynamic typing. You’re focusing on the instance rather than the unnecessary model definition that, again, nobody uses and that gets redefined elsewhere anyway.
Tbh, I’ve never used a language like JS professionally. I’m sure a lot of code bases are copied and pasted messes with state dependencies and things that make it so you HAVE to focus on the type. It’s an app language, you’re just not gonna find elegance lol.
I’ve been a C# dev for a while now. I’ve just found how rarely, outside of libraries/frameworks/etc., object model code actually gets reused. And I appreciate the cut-to-the-chase aspect of that.
that's the issue: there is no other way to create new types aside from creating a new class in mainstream languages. Those two concepts are separate, and should be treated as such. Types are not Classes; the latter is just a lousy "embodiment" of the former
Classes are composite data types, what you are referring to are primitive data types.
A class isn’t necessarily some wrapper around primitive types, they can contain data structures, other type instances, etc. Obviously that eventually leads to an end object containing a primitive, you can’t just have fancy trees of nothingness lol.
> Obviously that eventually leads to an end object containing a primitive, you can’t just have fancy trees of nothingness lol.
You absolutely can have types with only one value, e.g. a class with no members.
Then you can have a tree type composed of those nothing types, where the information is in the tree structure, not in any primitive value stored anywhere.
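A quick Kotlin sketch of that (hypothetical types): the leaf object below carries no data at all, yet a tree built from it encodes information purely in its shape.

```
// Leaf is a type with exactly one value and no members
sealed interface Tree
object Leaf : Tree
data class Node(val left: Tree, val right: Tree) : Tree

// no primitive stored anywhere: the chain of three Nodes can stand for the number 3
val three: Tree = Node(Leaf, Node(Leaf, Node(Leaf, Leaf)))
```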
```
module Example : sig
  type mytype
  type othertype
  val id : mytype -> mytype
end = struct
  type mytype = {
    position : int * int
  }
  type othertype = int
  let id self = self
end
```
How many classes have I defined? Exactly zero. How many types have I defined? Definitely not zero. One of these types is secretly equivalent to a primitive type, while the other is not, and none of this is known to users of these types.
I am not sure why people keep trying to show "class = type". It is wrong; those are different concepts. I don't care what C++ says
The essay uses a class-oriented type system (they're clearly a .net developer), but the same ideas very much exist in non-OO type systems.
And the richest and most expressive type systems are arguably specifically non-OO. Nor are OO languages necessarily statically typed (Smalltalk, Self, Python, Ruby, Javascript, ...)
I feel compelled to note that C only has the lightest possible amount of type checking, if even that. For example, the following program compiles:
#include <stdio.h>
void foo(double* c) {
printf("%g", *c);
}
void bar() {
printf("bar");
}
int main() {
int x = 9;
int* y = &x;
foo(x); //warning
foo(y); //warning
bar(1.0); //not even a warning
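// bar() is defined without a prototype: pre-C23, an empty parameter list leaves arguments unchecked, hence no warning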
}
I think it is related to how you organize your data structures and business logic. You may have a richly typed model for a specific domain, but OOP would typically call for all the business logic that modifies a class's attributes to be part of that class - e.g., you do not externally set specific values on a class instance, but instead execute some action on the instance that may change those attributes. If you use the type system just to model data structures and keep the business logic that uses/changes them external, that is called an anemic model.
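Roughly, the two shapes look like this (a Kotlin sketch with a made-up Account domain):

```
// "rich" domain model: the rule lives with the data; callers can't poke balance directly
class Account(initialBalance: Long) {
    var balance: Long = initialBalance
        private set

    fun withdraw(amount: Long) {
        require(amount in 1..balance) { "invalid withdrawal" }
        balance -= amount
    }
}

// "anemic" model: plain data, with an external service trusted to enforce the rules
data class AccountData(var balance: Long)

class AccountService {
    fun withdraw(account: AccountData, amount: Long) {
        require(amount in 1..account.balance) { "invalid withdrawal" }
        account.balance -= amount
    }
}
```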
>A string value is not a great type to convey a user's email address or their country of origin.
I can argue whatever type you use in place of a string will similarly be "not a great type". This can be argued in perpetuity because no type/map actually matches the reality it's encapsulating.
Type systems require you to build a Pretty Good Theory of your problem space so that your types can overlap with reality/actual usage as much as possible.
The problem is at planning/design time, you'll have a Pretty Crappy Theory of your problem space and can only get a Pretty Good one after having wrestled with it for a while.
A dynamic, more forgiving language, allows you to build what you can today with the theory you have, and then change it in the future when your theory gets disproven.
Started a Ruby project using Sorbet and I'm refactoring with confidence now. It's such a huge help. The project is about 9000 lines of Ruby by now and I don't think I would have been able to get this far without static type checking
The second edition of the book Refactoring was written to use JavaScript instead of Java in part to dispel the myth that you can't confidently refactor without static types.
Haskell and related languages have distinct types; so does Nim. But even if there's no direct language support, you can always emulate them by having a compound type (record, struct, class, whatever you call it) with one field.
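For example, in Kotlin (a sketch; the same single-field trick works with a struct or record in most languages):

```
// one-field wrappers: structurally just a Double, but distinct types to the compiler
data class Celsius(val degrees: Double)
data class Fahrenheit(val degrees: Double)

fun setThermostat(target: Celsius) { /* ... */ }

fun main() {
    setThermostat(Celsius(21.0))
    // setThermostat(Fahrenheit(70.0))   // does not compile
}
```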
I’ve always grokked primitive types as mapping to different concepts used when storing values in memory.
A string is some bytes in a line with a terminator at the end.
An integer is a group of signed or unsigned bytes.
An enum value points at another value with a pointer.
Etc.
What I think this describes is some validation classes, which don’t need to be built into a language’s runtime. Primitive types have a real reason for existing when compiling these apps, a validation class doesn’t. It can just be a library, in which case this is a nice API for validation!
This doesn't make too much sense. For example, (on 64-bit Linux) long, unsigned long, long long, unsigned long long, double, void*, char*, int(*)(int) and int[] are all stored in exactly the same way: they are a 64-bit value somewhere in memory. You could argue that double is different since it's just packing a mantissa and exponent, and that signed is actually a sign bit + a 63-bit number, but that still leaves long*, int(*)(int) and long being the same thing: a 64-bit number. Not to mention that struct X { long x; } most likely has the same representation as well.
Instead, it's more normal to think of types (primitive or not) as descriptions of what can be done with a particular kind of value - from this point of view, it's obvious why long* is a different type than long (you can dereference it) or int(*)(int) (you can't call it). With this new definition, we can also see why we may want to distinguish EmailAddress from String - you can send an email to an EmailAddress, but you can't send an email to a String; conversely, you can sort the characters of a String, but you can't sort the characters of an EmailAddress.
I come from the C++ world, where it is natural that types are checked at compile time. And where you would use classes to implement the 'sophisticated types' the author suggested.
So in the ears of a C++ programmer the article says: design your classes well.
With "var x = 0;" as if that's somehow better. So instead of having clear blocks of types that you can visibly read, it's concealed. And the type could change each time you run the compiler.
It's amusing that the first language with great refactoring tools had dynamic types (Smalltalk). But I think it's an underappreciated note that static type languages often create a bias for earlier coupling, less system-independent modularity, and a lot of unnecessary data copying from structure to structure, creating in turn a stronger need for refactoring tools since what should be small changes end up becoming larger ones affecting more places. Even though they can't catch everything (thanks to reflection) I'm pretty happy that such tools exist when doing Java development. I've sometimes missed them in less tooling-mature dynamic langs, but also have found them less necessary. (Though I'm sure part of that is due to other feature-factors, like closures, that historically have been a long time coming (if ever) to the most popular static langs.)
Types got a bad rap because of C++. There was a strange dichotomy between languages like python/javascript and C++. If type systems were so good, why was it easier to program with javascript and python than with C++? People got confused and promoted dynamically typed languages as better.
What many people didn't realize was that C++ was hard DESPITE the type system, not because of it. This was soon rectified with TypeScript, which eventually caused a complete flip of opinion in the industry once javascript developers realized how much better it is.
The other side of this equation is why python was so easy to program in DESPITE not having a type checker (it has external type checkers now, but I'm talking about before that).
The answer is deterministic errors and easy traceability. If you have an error that happens either at runtime or at compile time you want to easily know what the error is, where it came from, and why it occurred. Python makes it VERY easy to do this. Not all type checkers make this easy (see C++).
In actuality type checking is sort of sugar on top of it all imo. Rust is great. But really the key factor to make programming more productive is traceability. Type checking, while good, is not the key factor here.
Think about it. Whether the error occurs at runtime or compile time is besides the point. Compile time adds a bit of additional safety, but really if an error exists, it will usually trigger at some point anyways.
The thing that is important is that when this error occurs, whether at compile time or runtime, you need as much information about it as possible. That is the key differentiator.
That is why typeless python and typed rust, despite being opposites, are relatively easy to write complex code for when compared to something like C++.
> Whether the error occurs at runtime or compile time is besides the point. Compile time adds a bit of additional safety, but really if an error exists, it will usually trigger at some point anyways.
Well, if the error is at compile time, there's no chance that code makes it to production and affects customers.
If the error is at runtime, you need to have tested that edge case and if you haven't, there could be customer impact.
I mean, once you see a few TypeErrors in Python code with no type annotations, or a few NullPointerExceptions in Java where there's no compile time null checking by default, I think it becomes very clear that catching things at compile time is much better...
I agree with you, but I'm saying things from a practicality standpoint.
Let me put it this way. If you have type checking, I'm saying that from my anecdotal experience you probably catch 10% more errors than you would normally catch before deployment. The reason is you're bound to run runtime tests anyway, and these tests cause you to correct all your little type bugs.
And these aren't even errors you wouldn't have caught. It's just about catching the errors earlier.
That's it. Catching 10% of errors after deployment rather than before... that is not a huge deal breaker. Type checking benefits are marginal in this sense. Yes, agreed, it's better, but it's not the deal breaker.
I'm trying to point out the deal breaker feature. The delta difference that causes python to be BETTER than C++ in terms of usability and safety. That type checking is a negligible factor in that delta is basically my thesis. This is subtle. A lot of people are going on tangents but that is my point.
This is offensive. To suggest my opinion has no connection to reality?
Read what I wrote carefully. I'm not talking about type errors. I'm talking about errors in GENERAL.
Your comment is carefully tailored to incite a flame war. You represent HN bias at its finest, reading something and giving a casual dismissal without really interpreting it.
But the topic here is type systems, and Rust's is more like C++'s than any other, and vice versa. C++ compilers used to have error messages that were hard to interpret, but competition between compilers has improved them.
Meanwhile, C++ itself has changed enabling better error messages because it is clearer what your code is trying to do.
You just can't stop, can you? There's really zero need to say this other than to be an ass. I don't hate C++. I chose to write C++ as my day-to-day job after quite some time doing python because it's a challenge. There is no hate. But I have no loyalty to the language either. That is my way. No loyalty and therefore no bias. C++ is definitively less safe than python. This is fact and that is why I program in it.
I am also talking about type systems. Not just C++. However I AM using C++ as an example. It is flawed despite a stronger type system than python's. The paradoxical dichotomy between the two languages is the quintessential example of my point. The type system is not essential to safety. It's an illusion. Types are simply sugar on top of it all because in essence you're just relying on error messages to resolve all the errors. Whether those errors happen at compile time or runtime isn't a big deal.
That is the point. I'm not waging a war against C++. I'm EXPLAINING to people like you who don't bother to read carefully or think carefully.
>But the topic here is type systems, and Rust's is more like C++'s than any other, and vice versa.
Categorically wrong. Rust's type system is derived from Hindley-Milner. See: https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_sy.... Haskell is the language famous for using this type system. In short, Rust's type system comes from functional programming and C++'s is derived from OOP origins.
>but competition between compilers has improved them.
I use C++ everyday for my work. It may have improved but it's still overall horrible.
(Yes, Rust 4ev3r! ;))