Looking in the implementation of the Python version of ABC[0], you'll see that `__subclasshook__` doesn't do any language magic, it just registers the method to be called in `__instancecheck__`. So if you just implement `__instancecheck__` directly, you get the same behavior, but without the caching around it.
    class OneWayMeta(type):
        seen_classes = set()

        def __instancecheck__(cls, instance):
            C = instance.__class__
            print(f"trying {C}")
            if C in cls.seen_classes:
                return False
            cls.seen_classes |= {C}
            return True

    class OneWay(metaclass=OneWayMeta):
        pass

    def f(x):
        match x:
            case OneWay():
                print(f"{x} is a new class")
            case _:
                print(f"we've seen {x}'s class before")

    if __name__ == "__main__":
        f("abc")
        f([1, 2, 3])
        f("efg")

When running:

    trying <class 'str'>
    abc is a new class
    trying <class 'list'>
    [1, 2, 3] is a new class
    trying <class 'str'>
    we've seen efg's class before
Am I missing a particular point the article is making, or did the author overlook this?
Can you help me understand what you mean here? The author manages to make something match `case NotIterable()` by...modifying the `NotIterable(ABC)` class. That's exactly what I would have expected. What do you mean by "one doesn't need to touch ClassA, or any descendants of it, at all."?
Not quite! The point of the article is that you don't need to change the class of `obj` to override this instance check. If you have a look, ClassA always uses the hook implementation in the examples. And this can be shortened to just using `__instancecheck__`.
I think the point in the first section is that the way ABC `__instancecheck__`/`__subclasshook__` interacts with pattern matching is surprising for anyone not familiar with ABCs. It allows you to check for a match with arbitrary functions, beyond simply checking whether an object is an instance of a given type. In the final section where he has issues with caching, I presume he hasn't read about `__instancecheck__`; your code would fix his issue.
Reading this I can't help feeling that Python puts the "simplicity" in all the places that don't matter. Simplicity is the reason given for not having useful language features like pattern matches as expressions, or lambdas with multiple expressions, but I've never seen these features[1] cause problems in other languages. And then we have this... Surely semantic simplicity is the simplicity that actually matters, but the way the simplicity argument is used in Python is often to enforce arbitrary syntactic and semantic complexity.
[1] It's almost ridiculous to call these "features", as they're just a consequence of the underlying language model. E.g. if you have expression blocks, you have lambdas with multiple expressions with no extra work.
Python has plenty of complexity. It tends to stick it in places where new programmers don't run into it. Python is one of the few languages I know that lets you just... poke at the internal machinery like this.
Have you ever messed around w/ Julia? It doesn't really have much "internal machinery" since it relies on a data-oriented/functional language structure. I find myself reading the Julia source code on a regular basis since it's very readable and succinct. I often find it more useful than the documentation itself.
It can take a bit to fully understand/appreciate Julia's multiple dispatch, but once you do, you pretty much understand the entirety of the machinery.
These are fair examples, but 'Python puts the "simplicity" in all the places that don't matter' is a bold statement that isn't well supported by them. I work primarily writing hit-and-run R&D code that has a shelf-life of about 6 months - Python is one of the best scripting languages for this, if not the best, precisely because it puts a lot of simplicity in places that DO matter.
Python is hands down the best ecosystem for "hacking something together to prove a point but won't be maintained".
It's (a) extremely easy to learn (takes about 2hrs for a Java/C# dev to be productive) and (b) has a very deep ecosystem and wrappers for pretty much any native library you want. Then (c) it works great under Windows/OSX/Linux as long as you're on an x86/x64 platform. The clincher is (d) it's the de-facto beginners language so all the newbies can at least read it and hack away.
The competition:
* PHP is similarly easy but very limited.
* Ruby is in my experience a slow and buggy mess with a community who are welcoming but suffer from a reality-distortion field (might be different now, my experiences were 15+ years ago).
* Java has accidental complexity getting started.
* C# is competitive, but too hard for newbies / the low-skilled, and it still has an irritating NIH syndrome (e.g. pushing people to MS's half-baked crypto APIs instead of first-class ports/wrappers of libsodium / BouncyCastle).
* Javascript/Typescript are probably the closest; they have better package management for the "hack it together" use-cases, but the language itself is poorly designed, what with all of the unintuitive "surprises".
My kids are just about old enough to learn coding and I'm going to start them with Python before moving on to C and ASM, then, if they want to develop anything serious, C# / Java / Rust / TypeScript.
I agree thoroughly with this - it's fantastic for building something quickly. If I need a quick script to do one-off data processing there's nothing better. My biggest problems with it IMO with respect to maintainable software are:
- The syntax required to build libraries feels like messing with the internals of the language. Defining various methods with reserved names that have 4 underscores each doesn't really feel like something you are supposed to do. The code becomes harder to read and messy IMO.
- Runtime type checking is great for iterating quickly, but bad for stable software.
- Encapsulation is only enforced through external tools, so if you aren't using those religiously you end up with problems with tightly coupled modules.
- Dependency management is not a good experience. Understanding the different rules about where python pulls modules from is hard. Venv makes things a bit better, but even then it's still a bit opaque. It means that I often spend more time on getting external dependencies aligned properly than writing any python when working on a python codebase locally.
I have to admit - I use C# for that nowadays, at least as long as I don't have to follow the standard coding guidelines (which are great for software with a mid/long life-cycle). Once you get over the learning curve and don't have to apply good engineering practices (i.e. write code comparable to Python/Go norms) then it's way more productive (time-to-working-solution) and dependency management is great. The best bit - a huge amount of effort is going into reducing boilerplate so it's getting better and better with each release.
If I'm working with less experienced developers or people for whom software engineering is a side issue (researchers/academics, security experts, data-scientists) then it's Python all the way.
Lua needs mentioning here, on a 1-5 scale where more is better, I rank it (a): 4, (b): 2, 3 if you're using LuaJIT, (c): 5, and (d): 5.
That last one is the surprising one: we're about to see a generation of programmers who learned Lua via Roblox when they were 8-13 years old. Roblox is singlehandedly in the process of making Lua the #1 beginners language, and if not the most popular language by number of developers, then at least the most undercounted.
I use Lua all of the time, since it's such an easy language to embed in other projects.
I use it in embedded systems (think 128MiB of memory -- not tiny, but not enormous either) and it's fantastic. I can make changes to logic on the device without cross compiling things and I can make changes quickly to test things out.
I'm in my 40s, and definitely not part of the Roblox generation. I just really like the simplicity of the language and how it's small enough to pick up in an afternoon. More complicated topics like coroutines and upvalues might take a little longer to fully grasp, along with ffi in LuaJIT if you're going that route.
Tcl is my go-to embedded language. I tried Lua a while back but butted heads with its "just use a table as a list" idea (which didn't work quite right, though that was a long time ago) and became frustrated with it.
libtcl8.6.so is 1.8MiB on my desktop, and liblua5.1.so.5.1.5 is 186KiB.
Maybe there are ways to shrink libtcl or cut pieces out, but that's quite a difference.
I've found that for most of my tasks the order of things is not terribly important. I suppose that if I really needed this I could add my own ordered list data type to Lua.
What about Julia? It's as readable as Python. Julia is arguably easier than Python in some programming aspects. Both languages can be complicated in more advanced scenarios, but both languages tout an easy start for quick scripts.
Every time I've looked into Julia (it's been a while, last time was around last year), I've hit one or more speedbumps or outright roadblocks in something which comes fairly naturally to python stdlib, or has a library ready to go. If I'm doing just mathy, data-sciency type work, it's usually pretty great. But domains like IO, http (servers or clients), IPC/RPC, database work, AWS, stuff like that always felt at best a bit unpolished.
That's not to say these are impossible in Julia, but there was enough friction to make me not really wanna use it, when python can do all that and more.
Most of my quick one time hack projects involve cleaning up text or gluing different text oriented command line programs together for things that are beyond my shell skills.
Indeed it is a bold statement, but if one can't make overly grand claims on the Internet then where? :-)
I'm interested to know to where you find the simplicity in Python. My guess:
- the ecosystem
- portions of Python that date back over a decade + perhaps some of the modern string handling and maybe data classes
My overall point is that the Python community relentlessly beats the drum on simplicity, but modern Python is not a simple language for any reasonable definition. I believe they have increased the complexity of the language while claiming that these complexity-increasing changes are in service of simplicity. I further believe that mountains of this complexity could be avoided with better language design and a better implementation.
Simpler than Python? Definitely Go. Probably Java and JavaScript. Maybe even C, although the whole "undefined behaviour" thing is a different kind of complexity.
I'd consider the complexity of Python comparable to that of C# and Swift - it's a similar "kitchen sink" language.
Java and C are neither simpler nor easier than Python where the rubber meets the road: making the computer do something you want it to do. Not even close. Java requires a fair amount of arcana just to get started (relative to python) and C is a simple language which pushes all the complexity onto the programmer. Java, like python, has some really deep rabbit holes when you dig into internals, and C has so much complexity in the necessary tooling.
I think Go is possibly simpler than Python. The syntax is smaller, no method overriding, and does not really have internals to the same depth as a VM language.
Not easier, simpler. There are many things that are very easy to do in python, but the language itself is far from simple. However you only need to know maybe 20% of python to start being productive.
That being said, I'm not convinced Java is actually simpler.
But if your code has a shelf life of 6 month, then your code is probably not read and changed as many times, as code, which goes into production settings and might be there for the next couple of years. So actually many things do not matter as much for such throwaway code.
Or performance! There's lots of low-hanging fruit in the Python interpreter that doesn't get improved to preserve the purity of the runtime, or whatever. (Well, at least this might see improvements now. But for a long time people would point at it and laugh.)
Or to enable the kind of extreme dynamism that is illustrated in TFA. How do you optimize code properly when even core relationships like "x is of type T" may be nondeterministic?
As someone who has followed Python since the 1.6 days and occasionally uses it for scripting: the language is simple only on the surface level. It provides the same magic capabilities as languages like C++, but apparently not many people find their way to the rune tablets.
> But surely Python clamps down on this chicanery, right?
>
> $ py10 abc.py
> 10 is not iterable
> string is iterable
> [1, 2, 3] is iterable
>
> Oh.
>
> Oh my.
I'm sure I'm being dense and missing the obvious but ... what is the author responding to here? What's wrong or bad?
In the context of this article, the result is not surprising, but in general it's probably not most people's expectation that you can define a class, make sure it doesn't subclass any ABCs, but then still have it "match" an ABC. (If you ask me, cases should only match when types are equal -- pattern matching is structural but (in Python) subtyping is anything but.)
While I wouldn't go as far as to say that this is "the point" of ABCs, it's certainly relatively important, with `__subclasshook__` being prominently placed near the top of the ABC documentation.
Control over destructuring isn't entirely new territory for PLs, Scala has Extractor Objects[0], as an example.
I think that it's a bit easy to say "it should just match the type!" when the reality is that even basic classes like list get overwritten in Python. Ultimately many language features have configurable features through dunder methods, and the fact that those get used by other language features is a feature, not a bug IMO.
As usual, don't use libraries that do weird stuff... and every once in a while you'll have the nice DSL that does something useful in this space and it will work well.
The thought experiment about a more restrictive version of this: how does Python tell that an object is a list? If it's through isinstance, then you're hooking into a bunch of tooling that have hooks that can be overwritten. If it's _not_ through isinstance, suddenly you have multiple ways to test if something is a list (which is a problem).
> Abstract base classes complement duck-typing by providing a way to define interfaces when other techniques like hasattr() would be clumsy or subtly wrong (for example with magic methods). ABCs introduce virtual subclasses, which are classes that don’t inherit from a class but are still recognized by isinstance() and issubclass().
You simply don't "subclass ABCs" ever (except when defining an ABC); if you do it's no longer a virtual subclass and you're no longer implementing the ABC. As a concrete example, when did you last "subclass" collections.abc.Iterable? You did not, you implemented __iter__.
Python also has structural typing, often called duck typing - if you have a runtime-checkable protocol, an object will also match isinstance even when there is no inheritance.
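For example, a minimal sketch of a runtime-checkable protocol (class names here are made up for illustration):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsQuack(Protocol):
    def quack(self) -> str: ...

class Duck:  # note: no inheritance from SupportsQuack
    def quack(self) -> str:
        return "quack"

class Rock:
    pass

print(isinstance(Duck(), SupportsQuack))  # True -- matched by shape alone
print(isinstance(Rock(), SupportsQuack))  # False
```

Note that `isinstance` on a runtime-checkable protocol only checks for the presence of the methods, not their signatures.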
> but in general it's probably not most people's expectation that you can define a class, make sure it doesn't subclass any ABCs, but then still have it "match" an ABC.
Abstract Base Classes were an attempt to formalize python's duck typing. Matching things that don't inherit from them is their whole purpose.
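A quick illustration (Countdown is a made-up class):

```python
from collections.abc import Iterable

class Countdown:
    """Never mentions Iterable anywhere in its bases."""
    def __iter__(self):
        return iter((3, 2, 1))

# collections.abc.Iterable's __subclasshook__ looks for __iter__,
# so the check succeeds without any inheritance relationship.
print(isinstance(Countdown(), Iterable))  # True
print(Countdown.__mro__)                  # only Countdown and object
```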
Totally agree. That behaviour is exactly what I would expect.
All in all, I really don't get the dramatic tone in this article. It turns out that in Python (as in most languages that give you access to the internals), if you mess with the internals the results are, well, messy. But literally nothing in this article surprised me at all.
I don't think the author is intending to say there is anything wrong in this particular example; he is, rather, anticipating some ways in which this might obfuscate code, either accidentally or deliberately. The rest of the article investigates some of these possibilities and demonstrates that you can, indeed, do so.
Perhaps it would have been a bit clearer, and less easy to dismiss as a fuss over nothing, if the author had left the 'not' out of the definition of NotIterable.__subclasshook__(), or defined an IsIterable class with the 'not' in place?
The only thing that can sometimes bite you here is that str is iterable, if you expect a list of str and you only get a str and suddenly you iterate over the chars.
I am not sure if it wouldn't have been better to make the conversion explicit here.
> The only thing that can sometimes bite you here is that str is iterable, if you expect a list of str and you only get a str and suddenly you iterate over the chars.
Python isn't even unique in this regard! You can iterate over a string whether you're working in JavaScript, C++, or Go. (And that's not even getting into cases like Haskell where String is merely syntactic sugar for [Char].)
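The pitfall and the usual guard, in a tiny made-up helper:

```python
def flatten(items, *, explode_str=False):
    """One-level flatten; treats str as atomic unless told otherwise."""
    out = []
    for item in items:
        is_str = isinstance(item, str)
        if hasattr(item, "__iter__") and (explode_str or not is_str):
            out.extend(item)
        else:
            out.append(item)
    return out

print(flatten([[1, 2], "ab"], explode_str=True))  # [1, 2, 'a', 'b'] -- oops
print(flatten([[1, 2], "ab"]))                    # [1, 2, 'ab']
```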
He is just demonstrating that `__subclasshook__` has control over what is counted as a match.
Which, as he explained in another article, allows the author of the abstract class to hijack calls to isinstance for instances of arbitrary classes.
The point though is that the tone of his article seems to suggest that this is some scary "gotcha" of the language, whereas some of us consider this to just be the expected behavior.
Well to me the "gotcha" wasn't that you could control the match from the abstract class, it was all the silly things that you could do. Which for me was the point of the article. I mean the first palindrome example was pretty cool no?
The real reason why you shouldn't use this is that Python is wrong about typing and classes. In OOP, classes are not types; interfaces are [0]. ABCs are a poor replica of this. Guido, having clearly avoided any relevant literature, states in PEP-3119 [1]:
> ABCs are not intrinsically incompatible with Interfaces, but there is considerable overlap. For now, I’ll leave it to proponents of Interfaces to explain why Interfaces are better.
Let me try: interfaces are better because the protocol of an object isn't tied to its implementation, but in a properly encapsulated world an interface represents the information available about a class as a type [2]. A subclass may just be reusing an implementation without adhering to the same protocol, or two interchangeable classes might have no inheritance relationship.
Python is in a lot of ways a nice language, and I've certainly enjoyed programming in it, but many points of its design seem intentionally unobservant of prior work and research in programming languages, though perhaps it's equally an indictment of that research that the most popular languages ignore it so much. Typescript handles this much better, though neither it or Java eschew using classes as types entirely.
> In OOP, classes are not types; interfaces are [0].
Depends on which OOP language we are talking about; Smalltalk definitely doesn't have interfaces, unless we are talking about later dialects like Pharo, which introduced traits into the language.
The paper you linked to makes its point exactly by moving beyond Simula and Smalltalk into their own view of OOP.
So like anything else on the OOP ecosystem, it is only yet another view about what OOP should be like.
Yes, but ABCs and also method resolution are particularly hackish in Python.
Generally, in Python one always has to understand the implementation and mentally execute the code, because everything is informally specified and nothing is declarative.
There's certainly a way to consider classes as types coherently, with added subtyping. In this view, interfaces are collections of types. That is, they correspond most closely to type classes.
> Guido, having clearly avoided any relevant literature […] unobservant of prior work
The same thing can be observed in his blog opinions on parsing. Sadly, this gives rise to a whole generation of programmers who believe (on account of perceiving him as an authority) that PEGs are actually good.
Agreed. Unintentional ambiguities are a problem of PEG parsers. Ironically, the old Python parser also failed to detect ambiguities.
There is a strong aversion in the Python space for unambiguous formalisms. A parser that resolves ambiguities by earliest match first seems to satisfy the dynamic mindset.
Pattern matching in Python is not perfect, but I still think this is a cool feature. The ergonomics don't match Rust's pattern matching, as explained in this recent discussion[0], but it's better than nothing IMO.
Hell yeah, another reason to get people to switch to Python >= 3.10. Unfortunately a lot of libraries are still gonna be behind for now: https://pyreadiness.org/3.10/ - of the top 360 most downloaded on PyPI, 212 have explicit Python 3.10 support. This plus the walrus operator, X | Y -style union types, and the speedup possibilities of 3.11 all look great, imo
41% of the top packages don't support the new version? And looking back, the numbers aren't even much better for previous versions, 3.8 is several versions old now and they've still got over 20% breakage.
That is actually pathetic and I'm not blaming the package authors here - Python needs to stop making big breaking changes that rototill the codebase continuously.
Breaking changes are pretty serious business in the Java world and there is an incredible amount of thought and research put into even something like modules let alone the JDK17 changes where reflection is being fundamentally changed. Python seems to have an absolutely carefree attitude to language breakage, and I guess why wouldn't they? It's always worked for them.
> is several versions old now and they've still got over 20% breakage.
I don't think that's illustrating breakage, just the lack of an explicit declaration that the package supports a newer version of python (which may be newer than the latest release of a given package).
Yeah... I always develop against the latest Python available and have probably run into 1 package in my career with a version incompatibility, and that was with a C API portion. 20-40% is not correct.
I assume you weren't working with Python during the 2 to 3 transition? There were at least six years, if not more, of major libraries not supporting Python 3, or of trying, as a library author, to write code that was 2/3 compatible, etc.
I agree that post Python 3 it is less common for a language update to break third-party libraries, but I also don’t think sweeping the 2 to 3 years under the rug is fair.
I didn't see the need to frame everything in the article as being so bad. I thought all those examples were awesome. I'll probably never do anything like them so as not to confuse people but I still think they were all interesting uses of the feature.
Honestly, I think people are too hard on languages (and especially Python) for having new features that challenge the status quo. And then there's also too much drama when it turns out that a scripting language is, in fact, a scripting language! So you can do weird things with stateful ABCs and such. I mean yeah, it's strange. But it probably also has some perfect use case in a very specific circumstance. At the end of the day, if you understand how a feature really works, you can do creative things with it. I'm glad we have it!
Yep. Honestly this seems fine - pattern matching is ultimately just a function call, you can immediately see where it is (and hopefully even click through to the code from where the pattern is defined), there's no "magic" action-at-a-distance.
Customizable pattern matching is in fact unusual, although not unheard of. Traditional pattern matching is very much not a function call, because it is compiled with assumption that it can't be customized and no arbitrary user defined code is executed. In particular, usual compilation scheme guarantees that subterms are not matched more than once, but that can't be guaranteed with customizable patterns.
Even if someone would like Python to pretend to be statically typed, __subclasshook__ is real (and not even type checking in the traditional sense) while type declarations and their checkers are only quasi-executable documentation.
Personally, I think this is a very good situation: extremely dynamic languages are a worthwhile tool, and the only issue with the pattern matching "exploits" in the article is that boolean operators and non-cached evaluation for subclass checks are not built-in.
Based on other comments here, it seems like some people don't know the purpose of ABCs and are assuming they're more like a class definition for classes that have to be directly inherited.
__subclasshook__ is exactly the sort of nonsense that makes Python code a complete mess and impossible to optimise. I'm going to take a wild guess that none of the "fast" Python variants (Cython, Micropython, RPython etc.) support it.
> I'm going to take a wild guess that none of the "fast" Python variants (Cython, Micropython, RPython etc.) support it.
Cython at least aims to be a superset of Python, so it will support it sooner or later. However I don't doubt that using it will make your cython code stop being "fast".
A lot of these dunder (double underscore) functions are useful for metaprogramming. Just kinda spitballing an idea but perhaps some code to generate a python object model based on the schema of a database would want to use this method. If your DB schema has some special way of defining subclass relationships (maybe a foreign key to another table) you might need to manually control when something is or isn't a subclass in python's object model based on the result of querying the DB schema.
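Spitballing further, a sketch of that idea (everything here is hypothetical: the SCHEMA_PARENTS dict stands in for an actual schema query, and none of this is a real ORM API):

```python
from abc import ABC

# Hypothetical stand-in for a DB schema query: imagine this mapping was
# built from foreign-key relationships at import time.
SCHEMA_PARENTS = {
    "OrderRow": {"BaseRow"},
    "UserRow": {"BaseRow"},
}

class BaseRow(ABC):
    @classmethod
    def __subclasshook__(cls, C):
        if cls is BaseRow:
            return "BaseRow" in SCHEMA_PARENTS.get(C.__name__, set())
        return NotImplemented

class OrderRow:      # no Python-level inheritance at all
    pass

class AuditLog:      # not in the schema mapping
    pass

print(issubclass(OrderRow, BaseRow))  # True, per the "schema"
print(issubclass(AuditLog, BaseRow))  # False
```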
In general metaprogramming is the kind of thing you probably don't and shouldn't reach for first, in fact it's usually more for libraries and tools vs. your production business logic. It can get difficult to reason about and pass the maintenance of code that heavily uses metaprogramming to other people unfamiliar with it.
The most dominant use (these days, at least) is to implement structural typing (a la Protocol), i.e. conforming to a "shape" without actually inheriting anything. So yeah, it's not particularly useful for day-to-day use, but still a hook needed to make certain nice things happen behind the scenes.
Where the first makes it harder to use `Iterable` incorrectly (i.e. supplying a non-type as parameter).
I can imagine that a Java or C# programmer would call the first version more "coherent" because it gives the interface `Iterable` a name explicitly.
Sort of like there's not reaaaaally a reason to use Extension Methods in C# (of course there are, but in a lot of simple scenarios there aren't) as opposed to static methods taking a Type as single parameter.
What follows the keyword “case” there is a pattern, not an expression. It’s not an imperative construct of code to be executed, but a declarative construct of an expected shape of an object, which may include name bindings.
Consider things like “case DistanceMetric(distance=d):” earlier in the article: this checks “is the value an instance of DistanceMetric, and if so, take its distance attribute and bind it to the name d”.
So in this case, what would it mean? If the value is an instance of Not, take and bind it to the DistanceMetric name (as is typical for a single positional subpattern), and… uh oh, more parentheses, what to do? There’s no obvious sensible meaning for it, so it’s a syntax error.
Because the right arm of the 'case' keyword is not actually a statement being executed, but its own syntax element to represent a pattern. It is not expecting two sets of brackets there.
> That made me wonder if ABCs could “hijack” a pattern match. Something like this:
I guess the word "hijack" is used loosely for rhetorical effect here because this seems to be working as intended, and it's not even remotely the most dangerous footgun in Python. The problem (if any) is with `isinstance`, and not the pattern matching. `isinstance` should probably explicitly work with ABCs (via flag or something) because I do agree it's a bit weird that it takes the ABC's `__subclasshook__` as gospel by default.
The same author would say something similar about C/C++ where by obtaining the address of a function and then writing a value to that address you can change the code of the program.
Yes, you are misusing the API. No, there ARE valid use-cases for it: for example a lot of testing/mocking facilities in Python are so easy to implement because of these features. No, the fact that you can't do it in statically typed languages does not imply Python must go the same way.
IDK, this is roughly how CLOS works and it's largely held to be a Good Thing: classes are simply arbitrary sets of predicates a value does or doesn't meet.
It seems like the main problem here is that Python does not enforce that the subclass hook method is pure and so allows you to create buggy implementations. The spec should mandate that the method is pure and causing side effects should be a runtime (or compile time) error.
This is why adding language features needs to be carefully thought out and explored to great lengths. Sadly C++ never got that memo. Here's hoping Python remains "simple."
I’m not sure I understand the “this is why” part. Python wants to let you override almost everything. It’s a feature and a powerful and occasionally helpful one. It’s also a feature you never ever have to know about or touch.
The same debate applies to all of those things, so this is just another instance of Python battling with a philosophical principle underlying the language, which naturally arises anytime a redundant (but good) feature like this is added to the language
Agreed - I mostly use Python for small scripts and it makes that use case very easy. I know it has a bunch more features for more niche stuff as well but my throwaway script to download images from a webpage doesn't need pattern matching.
It's like python's metaclasses. You rarely need them but sometimes they really are just the best solution to the problem. Those times, you're really glad they're available.
Indeed. Most Python code bases I have seen are maxing out all obscure features like those in the article. In practice, Python today is one of the most unreadable languages in existence.
`a += b` sometimes does the same thing as `a = a + b`:

    >>> a = b = (1, 2)
    >>> a += (3,)
    >>> print(a, b)
    (1, 2, 3) (1, 2)

    >>> a = b = (1, 2)
    >>> a = a + (3,)
    >>> print(a, b)
    (1, 2, 3) (1, 2)

Sometimes it does something different:

    >>> a = b = [1, 2]
    >>> a += [3]
    >>> print(a, b)
    [1, 2, 3] [1, 2, 3]

    >>> a = b = [1, 2]
    >>> a = a + [3]
    >>> print(a, b)
    [1, 2, 3] [1, 2]
IMO if that behaviour had been "carefully thought out" by the language designers, it should have been obvious that it's a bad idea.
Failing that, the implementation of that behaviour is convoluted - so the designers should have paid attention to the Zen of Python: "If the implementation is hard to explain, it's a bad idea".
Failing that, if the behaviour had been "explored to great lengths", the designers would have understood how it interacts with other language features - in particular, nested mutable and immutable objects.
Python's designers failed to do any of these things, so we've ended up with an operator with unpredictable behaviour and a long FAQ entry [0] about how it's possible for an operator to both succeed and fail at the same time:
    >>> a = ([1, 2], 4)
    >>> a[0] += [3]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    >>> a
    ([1, 2, 3], 4)
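The FAQ's explanation boils down to `a[0] += [3]` desugaring into a fetch, an in-place add, and a store: the list mutates before the tuple rejects the store. A rough hand-expansion of those steps:

```python
a = ([1, 2], 4)

# a[0] += [3] behaves roughly like:
tmp = a[0]
tmp = tmp.__iadd__([3])   # list.__iadd__ mutates in place and returns self
try:
    a[0] = tmp            # the store fails: tuples reject item assignment
except TypeError as e:
    print(e)

print(a)  # ([1, 2, 3], 4) -- the mutation already happened
```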
There's the obvious x = x + y, and the also-works x += y. There is more than one way to do the operation. The conceptually simpler way would be the former, as it does not require the programmer to know an extra operator.
[0]: https://github.com/python/cpython/blob/main/Lib/_py_abc.py