deltasevennine · on Nov 6, 2022

> I don't understand why you're talking about statistical sampling. Aside from random functions, functions are deterministic, unit testing isn't about random sampling. That's not the problem here.

Completely and utterly incorrect. You are not understanding. Your preconceived notion that unit testing has nothing to do with random sampling is WRONG. Unit Testing IS Random sampling.

If you want 100% coverage on your unit tests you need to test EVERY POSSIBILITY. You don't. Because every possibility is too much. Instead you test a few possibilities. How you select those few possibilities is "random." You sample a few random possibilities OUT OF a domain. Unit Testing IS random sampling. They are one and the same. That random sample says something about the entire population of possible inputs.
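A rough sketch of what I mean, taking "random" literally. The function `clamp`, the helper names, and the input ranges are all made up for illustration; the point is only that a few hundred sampled inputs stand in for a domain far too large to enumerate.

```python
import random


def clamp(value, low, high):
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


def sample_inputs(n, seed=0):
    """Draw n random (value, low, high) triples from a much larger domain."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    samples = []
    for _ in range(n):
        low = rng.randint(-1000, 1000)
        high = rng.randint(low, low + 1000)  # guarantees low <= high
        value = rng.randint(-5000, 5000)
        samples.append((value, low, high))
    return samples


def check_clamp(samples):
    """Return the sampled inputs for which clamp breaks its contract."""
    failures = []
    for value, low, high in samples:
        result = clamp(value, low, high)
        in_range = low <= result <= high
        # If the input was already inside the range, it must come back unchanged.
        unchanged = result == value if low <= value <= high else True
        if not (in_range and unchanged):
            failures.append((value, low, high))
    return failures


failures = check_clamp(sample_inputs(200))
```

An empty `failures` list doesn't prove `clamp` is correct for every input. It's evidence about the whole domain drawn from a sample, which is the same thing a handful of hand-picked test cases gives you.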

>Next month some code elsewhere changes and that function ends up getting called with a string containing json instead, so now it blows up in production, you have an outage until someone fixed it. Not great. You might think maybe you were so careful that you actually earlier had unit tests passing a string instead, so maybe it could've been caught before causing an outage. But unlikely.

Rare. In theory what you write is true. In practice people are careful not to do this, and unit tests mostly prevent it. I can prove it to you. Entire web stacks are written in python without types. That means most of those unit tests were successful. Random Sampling statistically covers most of what you need.

If it blows up production the fix for python happens in minutes. A seg fault in C++, well, that won't get fixed in minutes. Even locating the offending line, let alone fixing it, could take days.

>Following month some code elsewhere ends up pulling a different json library which produces subtly incompatible json objects and one of those gets passed in, again blowing up in production. You definitely didn't have unit tests for this one because two months ago when the code was written you had never heard of this incompatible json library. Another outage, CEO is getting angry.

Yeah, except first off, in practice most people tend to not be so stupid as to do this. Additionally, unit tests will catch this. How do I know? Because companies like yelp have had typeless python as webstacks for years and years and years and this mostly works. C++ isn't used because it's mostly a bigger nightmare.

Plenty of companies have functioned very successfully for years and years using python without types. To say that those companies are all wrong is a mistake. Your company is likely doing something wrong... python functions just fine with or without types.

>And this is one of the 5 arguments, same applies for all of them so there is exponential complexity in attempting to cover every scenario with unit tests. So you can't.

I think you should think very carefully about what I said. You're not understanding it. Unit testing Works. You know this. It's used in industry, there's a reason why WE use it. But your logic here is implying something false.

You're implying that because of exponential complexity it's useless to write unit tests, because you are only covering a fraction of possible inputs (aka the domain). But then this doesn't make sense because we both know unit testing works to an extent.

What you're not getting is WHY it works. It works because it's a statistical sample of all possible inputs. It's like taking a statistical sample of the population of people. A small sample of people says something about the ENTIRE population of people. Just like how a small number of unit tests says something about correctness across the entire population of possible inputs.

>This isn't a theoretical example, it's happening in our service very regularly. It was a huge mistake to use python for production code but it's too expensive to change now, at least for now.

The problem here is there are practical examples of python in production that do work. Entire frameworks have been written in python. Django. You look at your company but blindly ignore the rest of the industry. Explain why this is so popular if it doesn't work: https://www.djangoproject.com/ It literally makes no sense.

Also if you're so in love with types you can actually use python with type annotations and an external type checker like mypy. These types can be added to your code base without changing your code. Python types with an external checker are actually more powerful than C++ types. It will give you equivalent type safety (with greater flexibility than C++) to a static language if you choose to go this route. I believe both yelp and Instagram decided to add type annotations and type checking to their code and CI pipeline to grab the additional 10% of safety you get from types.
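Here's a minimal sketch of what that looks like for the JSON scenario from the parent comment. The function names `parse_payload` and `get_user_id` and the `user_id` field are hypothetical, invented just for this example. The annotations don't change what the code does at runtime; they only give a checker like mypy something to verify.

```python
import json
from typing import Any, Dict


def parse_payload(raw: str) -> Dict[str, Any]:
    """Hypothetical handler: takes a JSON string, returns the decoded object."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def get_user_id(payload: Dict[str, Any]) -> int:
    """Hypothetical accessor: expects an already-decoded dict, not a string."""
    return int(payload["user_id"])


# Correct usage: decode first, then read the field.
user_id = get_user_id(parse_payload('{"user_id": "42"}'))

# The mistake from the parent comment, passing the raw JSON string straight in:
#     get_user_id('{"user_id": "42"}')
# Unannotated, this only fails at runtime. With the annotations above, a type
# checker should flag the call because a str is passed where a dict is expected.
```

Assuming the code is saved as a file, running `mypy` on that file in CI is how the bad call would get caught before it reaches production.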

But do note, both of those companies handled production python JUST FINE before python type annotations. You'd do well to analyze why your company has so many problems and why yelp and instagram supported a typeless python stack just fine.