Learn by reading code: Python standard library design decisions explained

ainzzorl · on June 30, 2021

> "There are so many projects on GitHub – pick one you like and see how they did it." But most successful projects are quite large; where do you start from?

I'm working on a slightly similar project motivated by this problem: how do you learn from established open-source projects when the most interesting ones are too big, too complex, and too hard to get started with?

So I'm compiling a collection of interesting code examples from open-source projects and explaining/annotating them. I'm trying to pick examples that are:

- Taken from popular, established open-source projects.

- Somewhat self-contained. They can be understood with little knowledge of the surrounding context.

- Small-ish. They can be understood in one sitting.

- Non-trivial.

- Instructive. They solve somewhat general problems, similar to what some other coders on some other projects could be facing.

- Good code, at least in my opinion.

I'm planning to share it in a few weeks.

mttpgn · on June 30, 2021

> > where do you start from?

My answer for this has been to scroll all the way through the git history of a repo structurally similar to something I want to do and read each of the first month-or-so of commits.

chestervonwinch · on June 30, 2021

This sounds related: http://aosabook.org/en/index.html

genericlemon24 · on June 30, 2021

I cover The Architecture of Open Source Applications as well, here: https://death.andgravity.com/aosa :D

ainzzorl · on June 30, 2021

You blog about interesting things, subscribed :-D

genericlemon24 · on June 30, 2021

Thank you!

TheFreim · on June 30, 2021

This is very interesting to me. Do you have a blog or some other way of keeping track of this?

ainzzorl · on June 30, 2021

I started a repository for it: https://github.com/ainzzorl/goodcode. Please follow!

It's empty for now, but I included a few examples [1] of what I'm going to annotate and add there.

[1] https://github.com/ainzzorl/goodcode#examples

jimpudar · on June 30, 2021

A project you may want to look into adding is Tern [0]. I've had a good time reading through the code over the past couple of weeks, and have found it to be at least not "bad" code, and pretty easy to understand.

Specifically how they are untarring each container layer and creating a chroot jail to run commands inside is fairly self-contained and interesting.

[0] https://github.com/tern-tools/tern

andi999 · on June 30, 2021

Why smallish? Small only teaches you tactics and not strategy. (in the sense architecture is most important for largish projects)

cammikebrown · on June 30, 2021

Gotta start somewhere.

ainzzorl · on June 30, 2021

Yes. I was thinking of starting with smaller examples showing how they write code at <big project> without necessarily understanding the big picture, but bigger examples would be great too eventually. Something like http://aosabook.org/en/index.html already mentioned here.

moharoune · on June 30, 2021

That sounds great, would you please notify me whenever it's up? Here's my email: mohammedi.haroun@gmail.com

kwerk · on June 30, 2021

+1 can you share a way to follow along?

ainzzorl · on June 30, 2021

Replied in another branch: https://news.ycombinator.com/item?id=27687724

WillDaSilva · on June 30, 2021

> [pathlib] is a good object-oriented solution

I'd agree that pathlib is a good solution, and that it's an object-oriented solution, but I'm hesitant to call it a good object-oriented solution. It uses some object-oriented hackery to work as elegantly as it does. I approve of this decision, but it leads to some weirdness such as the inability to subclass `pathlib.Path`. Instead you need to subclass `type(pathlib.Path())`, as the method resolution order changes once it is instantiated:

    >>> from pathlib import Path
    >>> Path.__mro__
    (<class 'pathlib.Path'>, <class 'pathlib.PurePath'>, <class 'object'>)
    >>> type(Path()).__mro__
    (<class 'pathlib.PosixPath'>, <class 'pathlib.Path'>, <class 'pathlib.PurePosixPath'>, <class 'pathlib.PurePath'>, <class 'object'>)

In any case, weirdness like this can make for an even better learning opportunity!
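
A minimal sketch of the workaround described above. `ProjectPath` and `is_python_source` are hypothetical names invented for illustration; the only part taken from the comment is subclassing `type(Path())` instead of `Path` itself.

```python
from pathlib import Path

# type(Path()) is the concrete class for the current platform
# (PosixPath or WindowsPath), which is what the comment suggests subclassing.
ConcretePath = type(Path())


class ProjectPath(ConcretePath):
    """Hypothetical subclass that adds one convenience method."""

    def is_python_source(self):
        return self.suffix == ".py"


# Joining with "/" keeps the subclass, so the extra method stays available.
script = ProjectPath("src") / "main.py"
print(type(script).__name__, script.is_python_source())
```

The trade-off is that the base class is picked at import time for whatever platform the code runs on.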

hexane360 · on June 30, 2021

I was bitten by something related to this a couple weeks ago:

    >>> import pickle
    >>> with open("pickle.pkl", "rb") as f:
    ...     pickle.load(f)

    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/usr/lib/python3.9/pathlib.py", line 1074, in __new__
        raise NotImplementedError("cannot instantiate %r on your system"
    NotImplementedError: cannot instantiate 'WindowsPath' on your system
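
A hedged sketch of one way to sidestep this, assuming the pickle was written on Windows and contained a concrete `WindowsPath`. Pure path classes can be instantiated on any platform, so pickling a `PureWindowsPath` (or a plain string) round-trips elsewhere. The path below is hypothetical.

```python
import pickle
from pathlib import PureWindowsPath

# Hypothetical path, standing in for whatever the original pickle contained.
original = PureWindowsPath("C:/data/report.txt")

# Pure paths do no I/O and are not tied to the host OS,
# so this round trip does not depend on the platform it runs on.
payload = pickle.dumps(original)
restored = pickle.loads(payload)

print(restored.name)
```

Converting back to a concrete `Path` is then an explicit decision on the loading side rather than something unpickling attempts implicitly.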

tialaramex · on June 30, 2021

One interesting thing that wasn't practical twenty years ago, but is now relatively straightforward for any project that is Free Software anyway, is to link the source code directly from the documentation.

Take Rust's std::iter::repeat_with

https://doc.rust-lang.org/std/iter/fn.repeat_with.html

There's a declaration of this standard library function to make an iterator from a closure, then there's a working code sample which uses it and a further example showing a closer to real-world usage.

But at the top right is a [src] link: you can go in one click from the documentation of the function to the actual implementation that is used (except where Rust just says this is a compiler intrinsic, and then you need to burrow into the details of the actual compiler).

otabdeveloper4 · on June 30, 2021

Python's standard library was designed??

Big if true!

N.B. I've been programming in Python since it was version 1.3. Forgive me.

ttrruu · on June 30, 2021

I’ve been programming in Python since version 3.6 and I had the exact same thought.

scotuswroteus · on June 30, 2021

The difference in language, syntax, and flow between Automate the Boring Stuff and Python's official documentation is like the difference between reading the modern translation of Beowulf next to the original Old English version.

Why doesn't someone make a plain language annotation -- for beginners -- to translate the official documentation?

vesche · on June 30, 2021

Reading tons of Python code has really helped me as a Python developer. Several times a month I'll check out GitHub's trending Python repositories: https://github.com/trending/python?since=weekly By perusing this list I can dip into repositories that interest me, see the "beat on the street", skim interesting project code, keep up to date with the latest design decisions, etc. It's also fun to see that sometimes a trending repository has "bad" code. It's nice to be able to recognize good code from bad code, which is a skill I only developed by reading and writing tons of code. Also, it's nice to realize that fun projects don't have to be coded well to be cool.

elpakal · on June 30, 2021

I've learned so much by lurking on the Swift standard library team's proposals, but digging around the std lib is something I rarely do. Where I really learn the most is from other Swift projects developed by many std lib authors, like the Swift Argument Parser, Algorithms or Collections projects. I find these useful because they give me practical pro-tips and I get to see the std lib in practice without needing to know the nitty-gritty stuff. So that's somewhere in between a std lib and a "successful project" on GitHub I guess.

vajrabum · on June 30, 2021

I found that CPython's standard library has a lot of good stuff, and the inline documentation is generally very good. I learned things from csv, difflib, heapq, and re. I'd recommend taking at least a quick look at any Python standard library module before using it.

https://github.com/python/cpython/tree/main/Lib
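
For taking that quick look without leaving the interpreter, here is a small sketch using the standard `inspect` module. It assumes a regular CPython install where the `.py` sources are shipped alongside the library, and `heapq.nlargest` is just an arbitrary pick from the modules mentioned above.

```python
import heapq
import inspect

# Where the module's Python source lives on this installation.
source_file = inspect.getsourcefile(heapq)

# The source text of one function, inline comments and docstring included.
nlargest_source = inspect.getsource(heapq.nlargest)

print(source_file)
print(nlargest_source.splitlines()[0])
```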

lifeisstillgood · on June 30, 2021

I try and get my mentorees (is that a word?) to read and write up their "favourite" python modules - just for the exercise in reading and style.

I ought to do more of it but I think this ought to become a thing - a sort of nutshell guide to different modules.

I think this should be something between the manual and pymotw

BostonFern · on June 30, 2021

Maybe they're your proteges?

occamrazor · on June 30, 2021

The word is _mentee_.

SonOfLilit · on June 30, 2021

Mentees :)

omnicognate · on June 30, 2021

Mentee is the correct word in modern usage, but Mentoree would arguably be more "correct" etymologically in that Mentor is a proper noun that has been "verbed" [1], not "one who ments".

[1] https://en.m.wikipedia.org/wiki/Mentor_(Odyssey)

mattikl · on June 30, 2021

It's so easy to lose focus when reading code just for learning. Writing about that code is a good way to keep the focus and dig deeper.

fatsdomino001 · on June 30, 2021

The word you're looking for is manatees: large aquatic mammals, mostly herbivorous.

lifeisstillgood · on July 1, 2021

I have to get them all name badges now :-)

enricozb · on June 30, 2021

tangentially related: I'm learning iOS tweak development (on a jailbroken iPhone) using theos [0], and due to the fact that Apple's docs are pretty lacking, I've had to basically exclusively rely on GitHub's search for code examples using the methods/classes I'm interested in. Never done this kind of "learning by example" with respect to programming before, and it definitely has me yearning for better docs...

[0]: https://github.com/theos/theo

eachro · on June 30, 2021

The code in the statistics library is actually quite a bit more complicated than I had expected. Is there a reason why a function like `mean` isn't just: `sum(data) / len(data)`?
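
One visible difference, sketched below. `naive_mean` is a hypothetical name for the one-liner in the question, and this is only one observable consequence, not necessarily the authors' full rationale.

```python
import math
import statistics


def naive_mean(data):
    """The one-liner from the question."""
    return sum(data) / len(data)


big = [1e308, 1e308]

# The intermediate float sum overflows to infinity before the division.
naive_result = naive_mean(big)

# statistics.mean avoids the overflow and returns a finite value.
library_result = statistics.mean(big)

print(naive_result, library_result)
```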

mattkrause · on June 30, 2021

This is the complete code (from here: https://github.com/python/cpython/blob/3.9/Lib/statistics.py)

    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    assert count == n
    return _convert(total / n, T)

The _sum function is a little more involved, but not appreciably so.

In general though, a literal translation of a formula is not always great in terms of numerical stability/error. For example, you might have learned to calculate the variance as mean(X^2) - mean(X)^2, but this can lead to a huge loss of precision that more complicated approaches avoid.
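
A small sketch of that precision loss. `naive_pvariance` is a hypothetical helper that transcribes the textbook formula literally, and the data set is an illustrative choice: four values whose true population variance is 22.5, shifted by a large offset.

```python
import statistics


def naive_pvariance(data):
    """Literal transcription of mean(X^2) - mean(X)^2."""
    n = len(data)
    mean_of_squares = sum(x * x for x in data) / n
    square_of_mean = (sum(data) / n) ** 2
    return mean_of_squares - square_of_mean


# Deviations from the mean are -6, -3, 3, 6, so the population variance is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

naive = naive_pvariance(data)        # subtracts two nearly equal numbers near 1e18
stable = statistics.pvariance(data)

print(naive, stable)
```

Both intermediate terms are around 1e18, where adjacent floats are 128 apart, so the subtraction has no way to recover a result as small as 22.5.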

abecedarius · on June 30, 2021

A couple things here I don't understand:

> if iter(data) is data:

Wouldn't it be cheaper like `if type(data) is iter:`?

And why convert `data` to a list at all, to check for length 0, given that `_sum(data)` will return the count?
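
For what it's worth, `iter` is a built-in function rather than a type, so `type(data) is iter` would always be false. Below is a small sketch of what the `iter(data) is data` check distinguishes; `is_iterator` is a hypothetical helper, not something in `statistics`. It also hints at one plausible reason for the `list(...)` conversion, namely that iterators such as generators don't support `len()`, though that reading is an assumption rather than something the source confirms.

```python
def is_iterator(obj):
    """Iterators return themselves from iter(); containers return a new iterator."""
    return iter(obj) is obj


numbers = [1, 2, 3]
generator = (n for n in numbers)

list_is_iterator = is_iterator(numbers)          # iterable, but not an iterator
generator_is_iterator = is_iterator(generator)   # a generator is its own iterator
type_check = type(generator) is iter             # compares a type with a function

# Generators have no length, so len() fails until they are materialized.
try:
    len(generator)
    generator_has_len = True
except TypeError:
    generator_has_len = False

print(list_is_iterator, generator_is_iterator, type_check, generator_has_len)
```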

turndown · on June 30, 2021

Perhaps it's cheaper to just get the conversion and length check out of the way rather than have to execute _sum and find out from checking T.

abecedarius · on June 30, 2021

But the path where the cost of _sum is extra is the error path; plus that'll end up being 0 iterations anyway.

slaymaker1907 · on June 30, 2021

One trick that often improves numerical stability with computing the sum is to first sort the data.
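
A small illustration of that effect with ordinary floats. `running_sum` is a hypothetical helper doing plain left-to-right accumulation. Sorting helps here because the small values add up before they meet the large one; it is not a general guarantee. `math.fsum` from the standard library is included as an alternative that does not depend on input order.

```python
import math


def running_sum(values):
    """Plain left-to-right float accumulation, with no compensation."""
    total = 0.0
    for value in values:
        total += value
    return total


data = [1e16] + [1.0] * 10

unsorted_total = running_sum(data)          # each 1.0 is rounded away against 1e16
sorted_total = running_sum(sorted(data))    # the 1.0s reach 10.0 before meeting 1e16
fsum_total = math.fsum(data)                # order-independent, correctly rounded

print(unsorted_total, sorted_total, fsum_total)
```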

rincewind · on June 30, 2021

Sometimes I think anti-recommendations (e.g. asyncio) are far more useful. You don't learn a lot from many open source libraries or standard library modules if you try to imitate them. You must understand that this code is a compromise, then they stuck with it for compatibility reasons, and today you could have a much better API or simpler implementation if you started from scratch.

Spivak · on June 30, 2021

But that’s true for code that was written yesterday too. All code makes compromises and has to deal with constraints. Understanding what someone did given their lot is still valuable.

afarviral · on June 30, 2021

Sounds like a goldmine for learning. I collect trivial examples of code constantly but these tiny programs or demonstrations don't provide the same insights as snips from real projects, and all the facets of that project's direction.

VMtest · on June 30, 2021

I remember there's an old explanation of the Python implementation's inner workings on YouTube. I never watched it, and I can't find it now.