Type checking in Python (senko.net)
97 points by senko on May 18, 2014 | 77 comments



As a somewhat new python programmer (~2 years) coming from statically compiled languages like C#, C++ and Pascal this is something I have thought about a lot. My initial impulse was to wish I had static type checking. But I have come to the conclusion that I just hadn't fully appreciated the differences between interpreted and compiled languages.

The article says that the lack of type checking "lets through" a certain class of errors. However, let's be specific here: there is no build-time static analysis phase in the transformation of python source to executable code. So "lets through" means the same thing whether you have strong typing or not: the error is going to be found at runtime. What we're really talking about, then, is converting AttributeError ("YourObject has no attribute 'startswith'") into something more specific ("Hey, this is supposed to be a string!"). Honestly, that seems to me to be a pretty minor increase in diagnostic information.

So the bottom line for me is that I feel I didn't really understand duck typing. I used to joke that the term actually meant "no typing," and that is in some respects true. But what it really means is "the thing can do what the method expects it to be able to do." If the thing can't serve in the expected role, then the existing errors that result are sufficiently explanatory, imo.


Sure, but you have to wait until runtime to detect those errors. Which means your codebase can have a nasty bug that you might not even notice if you don't run a particular test case. Static typing detects the same thing, but statically at compile time.
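
To make that concrete, here is a minimal sketch of a bug that stays hidden until one particular branch runs. The function `describe` and its `verbose` flag are hypothetical, invented only for illustration:

```python
def describe(value, verbose=False):
    """Return a short label for value; the verbose branch hides a bug."""
    if verbose:
        # Bug: ints have no .startswith, but nothing flags this until
        # this branch actually executes.
        return "private" if value.startswith("_") else "public"
    return "value"


# The common path works, so the bug goes unnoticed.
ok = describe(42)

# Only a call that exercises the rarely used branch reveals the problem.
try:
    describe(42, verbose=True)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
```

Unless a test happens to call `describe` with `verbose=True` and a non-string, the AttributeError never shows up.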


Yes, but there is no compile time in python. Are we discussing type checking in python, or another language, or some ideal hybrid? I don't need to be convinced of the value of static type checking in compiled languages.


Sure there is. When do you think parse errors happen? All languages are compiled in one way or another.


Yes, but that's not a very useful observation. I think most developers understand that there is a process to transform source code into executable format. The distinction between interpreted and compiled languages, as commonly understood, is precisely about when that happens. There is a "compilation" step in python, but it happens when the code is executed, and therefore there is not a distinct "compile time" step during which a programmer has an opportunity to detect errors and prevent them from occurring at runtime. In any case, we're splitting hairs here, so in defense of my position I'll simply note that if your definition is sufficient, then there is no need to distinguish between interpreted and compiled languages at all.


That's simply not true. There is a compilation step in python that is completely distinct from the runtime (i.e. when the bytecode files are compiled). There's nothing special about python that prevents static compile time analysis from happening.


Oh, and to address your other point: the line between compiled and interpreted languages has been blurry since day one, and it's only getting blurrier. No modern, highly-used interpreted language gets by these days without compilation to bytecode. Rather than "compilation vs. interpretation", the battle should be "interpretation vs. native execution", which is more meaningful, but the line there is actually still pretty blurry (see JIT compilation, which straddles that line).


I understand your point. There is a step that compiles python source to bytecode, and it does take place before the bytecode is executed. I just think this is a somewhat pedantic observation. A large part of the value of the language lies in the immediacy of its "interpreted" nature (I'll put that in quotes from now on, thanks to you :)). You run the script and it either fails or executes. In the version of python I use on Ubuntu scripts aren't compiled to bytecode for the first time until they are imported, so other code in the project has likely already run before the compiler would have a chance to review the use of types in the imported module. There isn't a project-wide process of compiling all the files and making sure everything is used correctly before any code executes. Maybe it could be made to work that way, but wouldn't that alter the language and the way it is used? In the end I come back to: is AttributeError really not descriptive enough? You can have the last word here.


Okay. AttributeError is good enough, except when it's not. Python is designed in such a way that makes you actively seek out AttributeErrors, rather than having them diagnosed at compile-time. If that works for you, then that's great, but it doesn't always work. Whether it's in the main branch of CPython, or some sort of experimental fork, or some completely distinct static analysis tool, Pythonistas should support efforts to add (potentially optional) increased static checking to the Python interpreter, because it's a good thing.

Will it fit in with the current Python ecosystem? Will it have to change the way the language is used? Maybe, but that doesn't mean we shouldn't experiment with static analysis. We're hackers, after all, and this is something that probably deserves to be hacked on.


The py_compile module allows you to compile Python to bytecode files without executing the code. This isn't the usual way of doing things, but it works well to add separation between compile-time and run-time if desired.
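
A rough sketch of what that separation looks like in practice, assuming two throwaway scripts (the file names and their contents are made up for this example). `py_compile.compile` with `doraise=True` raises `PyCompileError` on a syntax error, but a type bug compiles without complaint because nothing is executed:

```python
import os
import py_compile
import tempfile

# Two hypothetical scripts: one with a syntax error, one with a type bug.
sources = {
    "bad_syntax.py": "def f(:\n    pass\n",
    "bad_types.py": "print(len(5))\n",
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in sources.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(text)
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(path, cfile=path + "c", doraise=True)
            results[name] = "compiled"
        except py_compile.PyCompileError:
            results[name] = "rejected"
```

So the compile step exists and can be run on its own, but as it stands it only rejects code that fails to parse.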


The issue is about when the error is discovered.

You can have "duck typing" in Haskell, for example, using type-classes.

Instead of "letting the errors through" to run-time, you detect them at compile-time.


Yes, but I am talking about python as it is now, not what it might be if it combined the features of other languages. I don't know anything beyond the most basic characteristics of Haskell, so I can't really comment further.


Obviously there are some differences, but type classes are pretty similar to the way python normally works - type classes specify methods that a type has to have. So in the case of 'print', the type specifies that print can be called on anything that implements the method that turns a value into a string. The neat thing, of course, is that if you try to call it on something that doesn't implement that method, you get a compile error (and with proper tooling, I actually see red squiggles in my editor if I make this mistake, which is about as good as it can get).


This module doesn't seem to make it possible to type-annotate map and filter, which are Python builtins.

I believe that dynamic typing results from frustration over generics, rather than over static typing in general. (Type stuttering is another problem, but modern languages solve it with local type inference.) I mean, annotating non-generic functions is always straightforward and doesn't introduce much complexity:

    # s should be str
    # returns nothing
    def print(s):
        ...

    # just becomes (using Py3 annotation syntax)
    def print(s: str) -> None:
        ...
However, it's much harder to write a type annotation for generic functions. Worse, remember that Python's map is variadic.

    # for any types t0, t1, t2, ... tn:
    # fn is a function that takes arguments of types t0, t1, ... t(n-1)
    # and returns a value of type tn;
    # seqs are iterables of types t0, t1, ... t(n-1);
    # returns a list of type tn.
    def map(fn, *seqs):
        ...

    # How to write this signature is non-obvious
Many mature static typing systems would allow you to express such types (usually called parameterized types). But any one of them would require more than those trivial notations in print.

That's when the dynamic typing people get annoyed and go "fxxk static typing systems, I can handle this in my mind". Which is about 65% (totally random estimation) the point of dynamic typing in my opinion.

Any type checker for dynamically typed languages that doesn't seriously try to solve the generics problem is not genuinely interesting.


Even that trivial `print` annotation breaks duck-typing; the print function can actually accept anything that responds to `__str__`.

Similarly, is there any reason the example `add` function in the OP shouldn't be able to add strings, or lists?

I've seen a lot of these quick-runtime-type-checking libraries, but they always have the same problem as manual type-checking: they make the constraints much stricter than they need to be and prevent entire classes of useful behavior.


In some languages, like Scala, you can use structural type annotations to handle this situation.


Python has the abc module, but it would be a bit wordier than "arg=list". Maybe if the builtins were treated as the corresponding abcs.


I think `abc` would be overkill for this. My understanding is that you might use that to add behavior to a bunch of classes that aren't related hierarchically. But for type safety, all you need to do is check for the single magic method.


You can use it for that, but you can also use it to check for rough interfaces. (The classes are rigged to fool isinstance() via dark magic.) It's fairly common that you want more than one magic method, but don't care too much what the type is as long as it's sorta like a `list`; thus, MutableSequence.
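
A small sketch of both behaviours using `collections.abc` (Python 3.3+). The class `Bag` is hypothetical. `Sized` does a structural check for the single magic method `__len__`, while a list-like ABC such as `MutableSequence` only recognises registered or inheriting classes:

```python
from collections.abc import MutableSequence, Sized


class Bag:
    """Hypothetical class that only defines __len__."""

    def __len__(self):
        return 0


# Sized looks for __len__ via __subclasshook__, so Bag passes without
# inheriting from it or registering with it.
bag_is_sized = isinstance(Bag(), Sized)

# list is registered as a MutableSequence; Bag is not, because
# MutableSequence performs no structural check.
list_is_mutable_seq = isinstance([], MutableSequence)
bag_is_mutable_seq = isinstance(Bag(), MutableSequence)
```

Here `bag_is_sized` and `list_is_mutable_seq` come out true and `bag_is_mutable_seq` false, which is roughly the "rough interface" check being described.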


There is nothing special about map and filter, they are not idiomatic Python because there are better ways to do the same things.

Python 3 annotations are already not used in any specific predetermined way, so if you create a system to do something with type annotations then you can implement List(str) or whatever it is you want an annotation to work like.
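
As a minimal sketch of that idea (not any particular library's API; the decorator name `check_annotations` and the function `repeat` are invented here), a decorator can read the annotations through `inspect.signature` and give them whatever meaning it likes, in this case a plain `isinstance` check:

```python
import functools
import inspect


def check_annotations(func):
    """Hypothetical decorator: isinstance-check arguments against annotations."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            # *args and **kwargs are skipped to keep the sketch small.
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            expected = param.annotation
            # Only plain classes are checked; missing or non-class
            # annotations are ignored.
            if expected is param.empty or not isinstance(expected, type):
                continue
            if not isinstance(value, expected):
                raise TypeError(
                    "%s should be %s, got %s"
                    % (name, expected.__name__, type(value).__name__)
                )
        return func(*args, **kwargs)

    return wrapper


@check_annotations
def repeat(text: str, times: int) -> str:
    return text * times
```

Without the decorator, `repeat(3, 2)` would quietly return 6; with it, the call raises a TypeError naming the offending parameter. Something like List(str) would need extra handling on top of this.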


> There is nothing special about map and filter, they are not idiomatic Python because there are better ways to do the same things.

True, list comprehension is often used instead. But a complete typing system should be able to type-check that too. The point is not changed.

Also, I only used map and filter as an example. Python programmers tend to write a lot of generic functions without actually knowing it.

    def f(g, *args):
        ...
        r = g(*args)
        ...
        return r
This function is actually quite generic, akin to map.

And we are not yet getting into the realm of "trait" or "concept" or "typeclass" or whatever.

    def sum(*nums):
        s = 0
        for n in nums:
            s += n
        return s
Remember, although s starts out as an int, s + n may yield some completely unrelated type if n implements __radd__. The function here involves a potentially infinite number of types, and their __add__ or __radd__ traits (reusing the name of the Python magic method). To fully express its type without unnecessary constraints is an interesting exercise. Since Python allows dynamic parameter list building (like sum(*nums)), dependent typing may be necessary for that purpose.
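
A quick sketch of the __radd__ effect. The function is renamed `total` to avoid shadowing the builtin, and the class `Label` is hypothetical:

```python
def total(*nums):
    # Same shape as the sum() above, renamed to avoid shadowing the builtin.
    s = 0
    for n in nums:
        s += n
    return s


class Label:
    """Hypothetical type whose __radd__ returns a str, not a number."""

    def __init__(self, text):
        self.text = text

    def __radd__(self, other):
        # Called when int.__add__ gives up on a Label operand.
        return "%s%s" % (other, self.text)


plain = total(1, 2, 3)
mixed = total(1, 2, Label("px"))
```

`plain` is the int 6, while `mixed` is the string "3px": the return type depends entirely on what the last argument chooses to do in __radd__.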


Are there languages that include the strong type checking, inference, and typeclass functionality of Haskell while also being a little more forgiving in terms of purity, so that writing ordinary programs is not quite so troublesome as in Haskell?


Rust has a decent amount of type inference, traits and strong guarantees.

OCaml/SML are both functional languages similar to Haskell, and allow IO pretty much anywhere. They don't have typeclasses (although ML people would say there's nothing you can't do with the ML module system[0]).

[0]http://lambda-the-ultimate.org/node/1558


Scala (although it has limited type inference to require people to specify at least the inputs to methods for readability).


> This module doesn't seem to make it possible to type-annotate map and filter, which are Python builtins.

In Haskell map is

    map :: (a->b) -> [a] -> [b]
How could we do something like that in a Python type extension? One way would be to just use the Haskell notation in quotes:

    @type("(a->b) -> [a] -> [b]")
    def myMap(fn, xs):  # "as" is a reserved word in Python
        #...etc...
An alternative would be to define a function Fun for describing functional types, and gen.{something} for a generic type, e.g.:

    @type(Fun(gen.a, gen.b), [gen.a], ret=[gen.b])
    def myMap ...etc...
But that notation doesn't (at least to me) look as clean as the Haskell.
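
For comparison, the standard-library `typing` module (added in Python 3.5, after this discussion) can express the single-sequence case with type variables. This is only a sketch of that case; the name `my_map` is made up, the variadic form of the builtin map is not covered, and the annotations are not enforced at runtime (a separate checker such as mypy reads them):

```python
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def my_map(fn: Callable[[A], B], xs: Iterable[A]) -> List[B]:
    """Single-sequence map, mirroring (a -> b) -> [a] -> [b]."""
    return [fn(x) for x in xs]


lengths = my_map(len, ["a", "bb", "ccc"])
```

It is wordier than the Haskell signature, but the structure is the same: one type variable for the input elements, one for the results.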


You have a strange definition of interesting. Even dynamically typed languages are used in a static way most of the time; this is why JavaScript engines can perform well on most code. A type system without generics can still find plenty of bugs.


I'd agree with his definition and your point simultaneously. Type systems with only 0-th order types are still useful for optimization and error capture, but higher order structures and higher order polymorphism are both incredibly useful and more powerful than 0-th order types.


It's not hard to imagine an extension to ML-style types that could cover variadicity:

    map : ('a... -> 'b) -> ('a list)... -> 'b list
Where <type>... denotes <type1> -> <type2> -> ... -> <typeN> spliced into the type signature, with each type variable 'a in <type> replaced with 'a1, 'a2, ... 'aN. You could cover tuples too:

    zip : ('a list)... -> 'a,,, list
Bit ugly, and writing a type checker for it could be fun, but it seems workable.


Take a look at Typed Racket for a variation on this.



Typed Racket is pretty decent.

However, behold the byzantine Scheme numerical tower and the effort to type it: http://www.ccs.neu.edu/home/stamourv/papers/numeric-tower.pd...


Fully typing variadicity in Python is only possible with dependent typing, since Python allows you to build the argument list dynamically. Generally speaking you can't even determine the number of arguments statically:

    args = []
    while some_condition():
        args.append('x')
    f(*args)
Typed Racket allows you to type some common cases, though.


"type stuttering" eludes my ability to Google. Could you provide a link or a quick explanation?


    uint8_t foo = (uint8_t)69;
In theory, there should be no need to specify the type twice. The ideal circumstance is something like

    uint8_t foo = 69;


Until Java 7 you had to write:

    List<String> list = new ArrayList<String>();
It feels silly to have to write the type parameter twice. But now you can say:

    List<String> list = new ArrayList<>();
It's usually not a huge deal, but occasionally the stutter can be more pronounced:

    Map<String, List<Map<Integer, Set<Float>>>> map = ...


I've been checking types in Python dynamically for years now. My library, Obiwan https://pypi.python.org/pypi/obiwan/1.0.2 uses type annotations, which I think are nicer than decorators.


As the article mentions - the big advantage of decorators is Python 2.x compatibility. (not that I want to start another discussion of THAT topic)

EDIT - looks like obiwan will allow decorators as well as annotations. I do prefer the syntax of typedecorator though. It seems less cluttered.


It's a matter of taste. I think the decorator approach looks more bolted-on and ad hoc. I prefer C style declarations over annotations, though, because they have the least amount of noise.


This actually looks pretty good.


There's one thing that doesn't sit well with me about adding type-checking to the CPython implementation of the Python language. Type-checking is generally used at compile-time to check that your program won't do nasty things, and then run-time checks are done only when you do something overly dynamic that the compiler couldn't check (i.e. casting an Object to some concrete class in Java). However, all type-checking implementations for CPython work purely at runtime! And type-checks are invoked each time your function is called! This seems not only wasteful, but also defeats the purpose of type-checking before running your program.

I would rather see a more generic pre/post-condition contract system, to be used in select top-level functions, that gives a lot more flexibility in expressing what is supposed to happen (i.e. I expect to be given a not-None value with a __str__ method that doesn't throw and will return an iterator yielding consecutive not-empty, not-None strings), including what assumptions you're making when you call a function that you got from a random object (i.e. here I'm calling a function that I got somehow as a parameter and I'm going to assume it can take a string of the format "ipv4 address:port" and returns an object that is a database connection), together with a mechanism for recovering from broken contracts - we are doing runtime checks, so there's no reason to restrict ourselves to Java-style type constraints, which don't even work for Python in general. Or, alternatively, a lighter system that can be checked at import time once before the program is started "for real". Preferably both of those.
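
A very rough sketch of what such a contract decorator could look like. The names `contract` and `split_address` are invented for this example, the conditions are ordinary callables, and the recovery mechanism mentioned above is left out:

```python
import functools


def contract(pre=None, post=None):
    """Hypothetical decorator: check a precondition on the arguments and a
    postcondition on the result, raising AssertionError when one fails."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError("precondition failed for " + func.__name__)
            result = func(*args, **kwargs)
            if post is not None and not post(result):
                raise AssertionError("postcondition failed for " + func.__name__)
            return result

        return wrapper

    return decorate


@contract(
    pre=lambda addr: isinstance(addr, str) and addr.count(":") == 1,
    post=lambda parts: parts[1].isdigit(),
)
def split_address(addr):
    """Split an 'ipv4 address:port' string into (host, port)."""
    host, port = addr.split(":")
    return host, port
```

The conditions can say things a Java-style type cannot, for example that the port part must be all digits, at the cost of running on every call.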


This is the single biggest thing I miss coming from modern Perl. Pythonistas seem very anti-type validation/checking, but with Moose classes the type stuff simplifies a lot, is declarative, self documenting, and vastly reduces certain classes of bugs.

The closest I've seen is the traits thing from Enthought, but there doesn't seem to be much buy-in from python users.


Former Enthought employee here, I always liked programming with Traits. Although nowadays I use the similar but much more lightweight atom[0], which was created by another former Enthoughter.

[0]: https://github.com/nucleic/atom


Awesome pointer, thanks!


There's also the mypy project which offers optional static typing in python, if you're interested in such stuff.


If you write APIs which only accept this one type, then you are explicitly disallowing people from using types that are functionally equivalent as far as the API is concerned. Why should you do that?


That's not how these trait/type systems work.


I recently wrote this post "statically type checking python" http://renesd.blogspot.de/2014/05/statically-checking-python...

Turns out with modern tools you _can_ statically type check python :)


Appreciate that it has a "logging only" mode.

I wish there was something like this, but parsing the docstrings with the same format as pycharm: http://www.jetbrains.com/pycharm/webhelp/type-hinting-in-pyc...


I'm currently entertaining the idea of it outputting docstrings (adding :param:, :rtype: and other applicable tags if they're not already there).


I wrote a similar python typechecker some time ago (https://github.com/cabalamat/ulib/blob/master/debugdec.md). My syntax is somewhat less verbose so this:

    @returns(int)
    @params(a=int, b=int)
    def add(a, b):
        return a + b
becomes:

    @typ(int, int, ret=int)
    def add(a, b):
        return a + b
I was inspired to do it like that by Haskell's very clean type syntax.

Senko's version has the advantage that you can compose types in it, e.g. {str:int} is a dictionary whose keys are strings and values are integers.


For Python 3, I've added optional type checking based on function annotations and decorators to the Ensure library: https://github.com/kislyuk/ensure#enforcing-function-annotat....

    @ensure_annotations
    def f(x: int, y: float) -> float:
        return x+y

    f(1, 2.3)
    >>> 3.3
    f(1, 2)
    >>> ensure.EnsureError: Argument y to <function f at 0x109b7c710> does not match annotation type <class 'float'>
I think it works better than other approaches mentioned here because it (1) doesn't repeat the contents of the function signature, (2) doesn't install any magic system-wide hooks (and the attendant performance issues), and (3) is completely optional.


Didn't know Guido thought of static type checking such a long time ago. I'm thinking now that one solution to two problems (this one, and slow python 3 adoption) would have been to include optional static typing (like in dart) in python 3.


Having seen many type-checking initiatives for Python come and go over the years, I've come to believe that type-checking is like security in that it is difficult to add after-the-fact and works best when designed-in from day one.

When function annotations were introduced, I ended up removing all their uses from the standard library because every early attempt to use it was too simplistic and failed to support any type-system use case except for extra documentation.


Too bad that this isn't type checking in the traditional sense though. That's just runtime tag checking.


Union types would be an incredibly useful extension of this.


Indeed! pysonar2 uses them with success.


This is neat, but why not just use Go!


I am an avid gopher, but this is not a useful response. Believe it or not, there are many places and reasons where Python works out better than Go, and making it a bit easier to use Python in those situations is valuable.


You're getting downvoted, but I think you have a point. Any attempt to enact type-checking in a language that doesn't require it is doomed to failure. You'll get all the safety of a dynamic language, with the flexibility of a typed language.


> You'll get all the safety of a dynamic language, with the flexibility of a typed language.

That sounds like a pretty good description of Go, so I'm wondering what his point is in the first place.


Go is neat, but why not just use Haskell? (etc.)


Monads

Or more specifically, how non-intuitive it is to use the IO stuff

I don't care what it uses, or what is it called, I care about being able to use it with what I know

So yeah, I'll go for Go instead of Haskell


The IO stuff has almost nothing to do with monads.

It is unfamiliar, but if you use static typing, at least reap the benefits of better error checking, no runtime null dereferences, etc.


Yeah, because why would we ever need the compiler to help us identify operations which make it harder to reason about our programs!


Haskell is ok, but you should check out the dynamic nature of Python


I think that the dynamic nature of python is why some folks prefer haskell over it.


yeah, that was my joke... no one got it. basically I'm talking about the never ending hunt for a perfect language


Well, not that many. So that settles it then.


That is why I said some.


You should check out the dynamic nature of Typeable.

"The module Data.Dynamic uses Typeable for an implementation of dynamics."

http://hackage.haskell.org/package/base-4.7.0.0/docs/Data-Ty...


The case of a large, already existing Python codebase comes to mind.


You make Python sound like FORTRAN. :)


err...It's a tradeoff.

I think when the first dynamic programming language was invented, the loose typing system must have been thought of as a big advantage.

Type checking is necessary in some cases, but I really enjoy dynamic typing a lot. So I think a bit of a tradeoff like this is acceptable. After all, we have to make tradeoffs everywhere when it comes to computer science, such as the time-space tradeoff of an algorithm.


I think people who enjoy dynamic typing have had bad experiences with bad static type systems earlier (C, C++, Java).

Once you use a good static type system (ML, OCaml, Haskell) you see that dynamic typing isn't responsible for the joy, but the ability to express rich ideas without repeating redundant type declarations.


I would like to know why in every post about Python on HN there's someone shouting "just use Go"!


It seems like quite a few people who are used to dynamic languages have found out that static typing and verbosity aren't strictly synonyms by using Go.


Static-typed languages are not very good for scripting and "gluing" systems.



