> You don't know what it means probably because you don't have experience with C++.
I started developing in C++ in 1992, so I have a few years with it. I've never run into the problems you seem to be experiencing.
> Type errors that happen at runtime or compile time contain the same error message.
Yes. But for the runtime error to occur, you need to trigger it by passing the wrong object. Unless you have a test case for every possible wrong object in every possible call sequence (approximately nobody has such thorough test coverage) then you have untested combinations and some day someone will modify some seemingly unrelated code in a way that ends up calling some distant function with the wrong object and now you have a production outage to deal with.
If you had been catching these during compile time, like a static type system allows, that can never happen.
>Yes. But for the runtime error to occur, you need to trigger it by passing the wrong object. Unless you have a test case for every possible wrong object in every possible call sequence (approximately nobody has such thorough test coverage)
And I'm saying from a practical standpoint manual tests and unit tests PRACTICALLY cover most of what you need.
Think about it. Examine addOne(x: int) -> int. The domain of the addition function is huge, almost infinite. From a probabilistic standpoint, why would you write unit tests with only one or two numbers? It seems to make no sense, as you're only testing two cases out of an effectively infinite domain. But that reasoning is flawed because it is in direct conflict with our behavior and intuition. Unit tests are an industry standard because they work.
The explanation for why it works is statistical. Let's say I have a function f:
assert f(6) == 5
The domain and the range are practically infinite, so for f(6) to produce 5 by chance is very unlikely given the huge number of possibilities. This must mean f is not random. A couple of unit tests confirming that f produces these low-probability, non-random results shows that the statistical sample you took carries high confidence. So statistically, unit tests are practically almost as good as static checking. They are quite close.
This is what I'm saying. Yes static checks catch more. But not that much more. Unit tests and manual tests cover the "practical" (keyword) majority of what you need to ensure correctness without going for an all out proof.
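To make this sampling picture concrete, here's a minimal sketch; add_one is a stand-in for the addOne example above, and the test values are arbitrary picks, not from either post:

```python
# Stand-in for the addOne example: takes an int, returns it plus one.
def add_one(x: int) -> int:
    return x + 1

# The input domain is effectively unbounded; a unit test samples just a
# few points from it and treats them as evidence about the whole domain.
def test_add_one():
    assert add_one(0) == 1
    assert add_one(6) == 7
    assert add_one(-100) == -99

test_add_one()
```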
>If you had been catching these during compile time, like a static type system allows, that can never happen.
>I started developing in C++ in 1992, so I have a few years with it.
The other part of what I'm saying is that most non-trivial errors happen outside of a type system: segfaults, memory leaks, race conditions, etc. C++ is notorious for hiding these kinds of errors. You should know this if you did C++.
Python solves the problem of segfaults completely and reduces the prevalence of memory leaks with the GC.
So to give a rough anecdotal number: I'm saying that in practice a type system only catches roughly 10% of errors that would not otherwise have been caught in a dynamically typed system. That is why the type checker isn't the deal breaker, in my opinion.
I don't understand why you're talking about statistical sampling. Aside from random functions, functions are deterministic; unit testing isn't about random sampling. That's not the problem here.
Problem is you have a python function that takes, say, 5 arguments. The first one is supposed to be an object representing json data so that's how it is used in the implementation. You may have some unit tests passing a few of those json objects. Great.
Next month some code elsewhere changes and that function ends up getting called with a string containing json instead, so now it blows up in production and you have an outage until someone fixes it. Not great. You might think you were so careful that you had earlier written unit tests passing a string instead, so maybe it could've been caught before causing an outage. But that's unlikely.
Following month some code elsewhere ends up pulling a different json library which produces subtly incompatible json objects and one of those gets passed in, again blowing up in production. You definitely didn't have unit tests for this one because two months ago when the code was written you had never heard of this incompatible json library. Another outage, CEO is getting angry.
And this is one of the 5 arguments, same applies for all of them so there is exponential complexity in attempting to cover every scenario with unit tests. So you can't.
Had this been written in a statically typed language, none of this can ever happen. It's the wrong object, it won't compile, no outage, happy CEO.
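A minimal sketch of the failure mode described above; process_payload and the payload shape are made up for illustration:

```python
import json

# Hypothetical function: expects an already-parsed JSON object (a dict).
def process_payload(payload):
    return payload["user"]["id"]

# Intended usage: the caller parses the JSON first.
parsed = json.loads('{"user": {"id": 42}}')
assert process_payload(parsed) == 42

# A later caller passes the raw JSON string instead of the parsed dict.
# Nothing flags this before it runs; it only blows up at runtime.
raw = '{"user": {"id": 42}}'
try:
    process_payload(raw)
except TypeError as exc:
    print("runtime failure:", exc)
```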
This isn't a theoretical example, it's happening in our service very regularly. It was a huge mistake to use python for production code but it's too expensive to change now, at least for now.
> I don't understand why you're talking about statistical sampling. Aside from random functions, functions are deterministic, unit testing isn't about random sampling. That's not the problem here.
Completely and utterly incorrect. You are not understanding. Your preconceived notion that unit testing has nothing to do with random sampling is WRONG. Unit testing IS random sampling.
If you want 100% coverage in your unit tests, you need to test EVERY POSSIBILITY. You don't, because every possibility is too much. Instead you test a few possibilities, and how you select those few is "random": you sample a few possibilities OUT OF a domain. Unit testing IS random sampling. They are one and the same. That random sample says something about the entire population of possible inputs.
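Taken literally, that view can even be written as drawing random inputs from the domain; add_one and the input range here are hypothetical stand-ins:

```python
import random

# Hypothetical function under test.
def add_one(x):
    return x + 1

# Literal random sampling: draw a few inputs from a huge domain and check
# the expected property on each; the sample stands in for the whole domain.
random.seed(0)  # fixed seed so the sample is reproducible
sample = [random.randint(-10**9, 10**9) for _ in range(5)]
for x in sample:
    assert add_one(x) == x + 1
```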
>Next month some code elsewhere changes and that function ends up getting called with a string containing json instead, so now it blows up in production, you have an outage until someone fixed it. Not great. You might think maybe you were so careful that you actually earlier had unit tests passing a string instead, so maybe it could've been caught before causing an outage. But unlikely.
Rare. In theory what you write is true. In practice people are careful not to do this, and unit tests mostly prevent it. I can prove it to you: entire web stacks are written in Python without types. That means most of those unit tests were successful. Random sampling statistically covers most of what you need.
If it blows up production, the fix in Python happens in minutes. A segfault in C++? That won't be fixed in minutes. Even locating the offending line, let alone fixing it, could take days.
>Following month some code elsewhere ends up pulling a different json library which produces subtly incompatible json objects and one of those gets passed in, again blowing up in production. You definitely didn't have unit tests for this one because two months ago when the code was written you had never heard of this incompatible json library. Another outage, CEO is getting angry.
Yeah, except first off, in practice most people tend not to be so stupid as to do this, and additionally unit tests will catch it. How do I know? Because companies like Yelp have run typeless Python web stacks for years and years and years, and it mostly works. C++ isn't used because it's mostly a bigger nightmare.
There are plenty of companies that have functioned very successfully for years and years using Python without types. To say that those companies are all wrong is a mistake. Your company is likely doing something wrong; Python functions just fine with or without types.
>And this is one of the 5 arguments, same applies for all of them so there is exponential complexity in attempting to cover every scenario with unit tests. So you can't.
I think you should think very carefully about what I said. You're not understanding it. Unit testing Works. You know this. It's used in industry, there's a reason why WE use it. But your logic here is implying something false.
You're implying that because of exponential complexity it's useless to write unit tests, since you're only covering a fraction of the possible inputs (aka the domain). But that doesn't make sense, because we both know unit testing works to an extent.
What you're not getting is WHY it works. It works because it's a statistical sample of all possible inputs. It's like taking a statistical sample of a population of people: a small sample of people says something about the ENTIRE population, just like how a small number of unit tests says something about the correctness of the entire population of possible inputs.
>This isn't a theoretical example, it's happening in our service very regularly. It was a huge mistake to use python for production code but it's too expensive to change now, at least for now.
The problem here is that there are practical examples of Python in production that do work. Entire frameworks have been written in Python; Django, for one. You look at your company but blindly ignore the rest of the industry. Explain why Django is so popular if Python doesn't work: https://www.djangoproject.com/ It makes no sense.
Also, if you're so in love with types, you can actually use Python with type annotations and an external type checker like mypy. These annotations can be added to your code base without changing its runtime behavior. Python types with an external checker are actually more powerful than C++ types. If you choose to go this route, it will give you type safety equivalent to a static language (with greater flexibility than C++). I believe both Yelp and Instagram decided to add type annotations and type checking to their code and CI pipelines to grab the additional 10% of safety you get from types.
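A minimal sketch of what that looks like; the function and annotations are illustrative, and mypy itself runs as a separate command over the file (e.g. `mypy file.py`):

```python
from typing import Any

# Annotated version of a hypothetical function: the parameter is declared
# to be a parsed JSON object (a dict), not a raw string. The annotations
# don't change runtime behavior; mypy checks them statically.
def process_payload(payload: dict[str, Any]) -> int:
    return payload["user"]["id"]

# Fine at runtime and under mypy:
process_payload({"user": {"id": 42}})

# Running mypy over the call below would report an incompatible argument
# type (str instead of dict) before the code ever runs:
# process_payload('{"user": {"id": 42}}')
```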
But do note, both of those companies handled production Python JUST FINE before Python type annotations existed. You'd do well to analyze why your company has so many problems when Yelp and Instagram supported a typeless Python stack just fine.