Learn by reading code: Python standard library design decisions explained (andgravity.com)
282 points by genericlemon24 on June 30, 2021 | 43 comments



> "There are so many projects on GitHub – pick one you like and see how they did it." But most successful projects are quite large; where do you start from?

I'm working on a slightly similar project motivated by this problem: how do you learn from established open-source projects when the most interesting ones are too big, too complex, and too hard to get started with?

So I'm compiling a collection of interesting code examples from open-source projects and explaining/annotating them. I'm trying to pick examples that are:

- Taken from popular, established open-source projects.

- Somewhat self-contained. They can be understood with little knowledge of the surrounding context.

- Small-ish. They can be understood in one sitting.

- Non-trivial.

- Instructive. They solve somewhat general problems, similar to what some other coders on some other projects could be facing.

- Good code, at least in my opinion.

I'm planning to share it in a few weeks.


> > where do you start from?

My answer for this has been to scroll all the way through the git history of a repo structurally similar to something I want to do and read each of the first month-or-so of commits.



I cover The Architecture of Open Source Applications as well, here: https://death.andgravity.com/aosa :D


You blog about interesting things, subscribed :-D


Thank you!


This is very interesting to me. Do you have a blog or some other way of keeping track of this?


I started a repository for it: https://github.com/ainzzorl/goodcode. Please follow!

It's empty for now, but I included a few examples [1] of what I'm going to annotate and add there.

[1] https://github.com/ainzzorl/goodcode#examples


A project you may want to look into adding is Tern [0]. I've had a good time reading through the code over the past couple of weeks, and have found it to be at least not "bad" code, and pretty easy to understand.

Specifically how they are untarring each container layer and creating a chroot jail to run commands inside is fairly self-contained and interesting.

[0] https://github.com/tern-tools/tern


Why smallish? Small only teaches you tactics and not strategy. (in the sense architecture is most important for largish projects)


Gotta start somewhere.


Yes. I was thinking of starting with smaller examples showing how they write code at <big project> without necessarily understanding the big picture, but bigger examples would be great too eventually. Something like http://aosabook.org/en/index.html, already mentioned here.


That sounds great, would you please notify me whenever it's up? Here's my email: mohammedi.haroun@gmail.com


+1 can you share a way to follow along?



> [pathlib] is a good object-oriented solution

I'd agree that pathlib is a good solution, and that it's an object-oriented solution, but I'm hesitant to call it a good object-oriented solution. It uses some object-oriented hackery to work as elegantly as it does. I approve of this decision, but it leads to some weirdness, such as the inability to subclass `pathlib.Path` directly. Instead you need to subclass `type(pathlib.Path())`, because instantiating `Path` returns a platform-specific subclass whose method resolution order differs from that of `Path` itself:

    >>> from pathlib import Path
    >>> Path.__mro__
    (<class 'pathlib.Path'>, <class 'pathlib.PurePath'>, <class 'object'>)
    >>> type(Path()).__mro__
    (<class 'pathlib.PosixPath'>, <class 'pathlib.Path'>, <class 'pathlib.PurePosixPath'>, <class 'pathlib.PurePath'>, <class 'object'>)

In any case, weirdness like this can make for an even better learning opportunity!
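
A minimal sketch of that workaround, assuming only the standard library. `ProjectPath` and `is_python_source` are hypothetical names invented for illustration, not anything pathlib provides:

```python
from pathlib import Path


# Subclass the concrete class that Path() actually returns on this platform
# (PosixPath or WindowsPath), rather than Path itself.
class ProjectPath(type(Path())):
    def is_python_source(self):
        """Hypothetical helper: True when the file name ends in .py."""
        return self.suffix == ".py"


# Joining with "/" keeps the subclass, so the helper stays available.
module = ProjectPath("pkg") / "module.py"
print(type(module).__name__, module.is_python_source())
```

The instance is still a regular `Path`, so existing code that accepts paths should keep working with it.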


I was bitten by something related to this a couple weeks ago:

    >>> with open("pickle.pkl", "rb") as f:
    ...     pickle.load(f)

    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/usr/lib/python3.9/pathlib.py", line 1074, in __new__
        raise NotImplementedError("cannot instantiate %r on your system"
    NotImplementedError: cannot instantiate 'WindowsPath' on your system
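
One way to sidestep this, sketched under the assumption that you control how the pickle is written: store a pure path (or a plain string) instead of a concrete `Path`. Pure path classes such as `PureWindowsPath` can be instantiated on any operating system, so they should unpickle anywhere. The path below is made up for illustration:

```python
import pickle
from pathlib import PureWindowsPath

# Hypothetical Windows-style path; a pure path never touches the filesystem.
original = PureWindowsPath(r"C:\data\model.bin")

# Round-trip through pickle. Pure paths are not tied to the host platform,
# so loading this on Linux or macOS does not hit the WindowsPath check.
restored = pickle.loads(pickle.dumps(original))

print(restored.name)
```

If the pickle comes from someone else, this does not help; the concrete `WindowsPath` is already baked into the file.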


One interesting thing that wasn't practical twenty years ago, but has become relatively straightforward for a project that is Free Software anyway, is to link the source code directly from the documentation.

Take Rust's std::iter::repeat_with

https://doc.rust-lang.org/std/iter/fn.repeat_with.html

There's a declaration of this standard library function to make an iterator from a closure, then there's a working code sample which uses it and a further example showing a closer to real-world usage.

But at the top right is a [src] link: you can go in one click from the documentation of the function to the actual implementation that is used (except where Rust just says this is a compiler intrinsic, and then you need to burrow into the details of the actual compiler).


Python's standard library was designed??

Big if true!

N.B. I've been programming in Python since it was version 1.3. Forgive me.


I’ve been programming in Python since version 3.6 and I had the exact same thought.


The difference in language, syntax, and flow between Automate the Boring Stuff and Python's official documentation is like the difference between reading the modern translation of Beowulf next to the original Old English version.

Why doesn't someone make a plain language annotation -- for beginners -- to translate the official documentation?


Reading tons of Python code has really helped me as a Python developer. Several times a month I'll check out GitHub's trending Python repositories: https://github.com/trending/python?since=weekly By perusing this list I can dip into repositories that interest me, see the "beat on the street", skim interesting project code, keep up to date with the latest design decisions, etc. It's also fun to see that sometimes a trending repository has "bad" code. It's nice to be able to tell good code from bad code, which is a skill I only developed by reading and writing tons of code. Also, it's nice to realize that fun projects don't have to be coded well to be cool.


I've learned so much by lurking on the Swift standard library team's proposals, but digging around the std lib is something I rarely do. Where I really learn the most is from other Swift projects developed by many std lib authors, like the Swift Argument Parser, Algorithms, or Collections projects. I find these useful because I get practical pro-tips and I get to see the std lib in practice without needing to know the nitty-gritty stuff. So that's somewhere in between a std lib and a "successful project" on GitHub, I guess.


I found that CPython's standard library has a lot of good stuff, and the inline documentation is generally very good. I learned things from csv, difflib, heapq, and re. I'd recommend taking at least a quick look at any Python standard library module before using it.

https://github.com/python/cpython/tree/main/Lib
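
The same source can usually be reached from a running interpreter too. A small sketch using the standard `inspect` module, assuming the installation ships the `.py` files (most do); `statistics` is just an arbitrary pure-Python module picked for the example:

```python
import inspect
import statistics

# Path of the file that defines the module on this machine.
source_file = inspect.getsourcefile(statistics)

# Source text of a single function, returned as one string.
mean_source = inspect.getsource(statistics.mean)

print(source_file)
print(mean_source.splitlines()[0])
```

This only works for modules written in Python; for C-implemented modules `inspect.getsource` raises an error instead.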


I try and get my mentorees (is that a word?) to read and write up their "favourite" python modules - just for the exercise in reading and style.

I ought to do more of it but I think this ought to become a thing - a sort of nutshell guide to different modules.

I think this should be something between the manual and pymotw


Maybe they're your proteges?


The word is _mentee_.


Mentees :)


Mentee is the correct word in modern usage, but Mentoree would arguably be more "correct" etymologically in that Mentor is a proper noun that has been "verbed" [1], not "one who ments".

[1] https://en.m.wikipedia.org/wiki/Mentor_(Odyssey)


It's so easy to lose focus when reading code just for learning. Writing about that code is a good way to keep the focus and dig deeper.


The word you're looking for is manatees: large aquatic mammals, mostly herbivorous.


I have to get them all name badges now :-)


tangentially related: I'm learning iOS tweak development (on a jailbroken iPhone) using theos [0], and due to the fact that Apple's docs are pretty lacking, I've had to basically exclusively rely on GitHub's search for code examples using the methods/classes I'm interested in. Never done this kind of "learning by example" with respect to programming before, and it definitely has me yearning for better docs...

[0]: https://github.com/theos/theo


The code in the statistics library is actually quite a bit more complicated than I had expected. Is there a reason why a function like `mean` isn't just `sum(data) / len(data)`?


This is the complete code (from here: https://github.com/python/cpython/blob/3.9/Lib/statistics.py)

    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    assert count == n
    return _convert(total / n, T)

The _sum function is a little more involved, but not appreciably so.
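
To make the difference from `sum(data) / len(data)` concrete, here is a small sketch with made-up inputs. It relies on my reading of the 3.9 source linked above, where `_sum` accumulates exactly before the division happens:

```python
import math
import statistics

# Two large but valid floats; their true mean is 1e308.
data = [1e308, 1e308]

# The intermediate sum exceeds the largest float, so the naive result is not finite.
naive = sum(data) / len(data)

# statistics.mean keeps the intermediate total exact, so the answer stays in range.
careful = statistics.mean(data)

# mean() also accepts one-shot iterators, which have no len().
from_iterator = statistics.mean(iter([1, 2, 3]))

print(math.isfinite(naive), careful, from_iterator)
```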

In general though, a literal translation of a formula is not always great in terms of numerical stability/error. For example, you might have learned to calculate the variance as mean(X^2) - mean(X)^2, but subtracting two large, nearly equal numbers this way can lead to a huge loss of precision that more complicated approaches avoid.
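
A sketch of that precision loss, using invented data with a small spread on top of a large offset. `naive_pvariance` is a hypothetical helper written for this comparison; `statistics.pvariance` serves as the reference:

```python
import statistics


def naive_pvariance(xs):
    """Population variance via the textbook shortcut mean(X^2) - mean(X)^2."""
    n = len(xs)
    mean = sum(xs) / n
    mean_of_squares = sum(x * x for x in xs) / n
    return mean_of_squares - mean * mean


# Offsets 4, 7, 13, 16 around 1e9: the true population variance is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

shortcut = naive_pvariance(data)
reference = statistics.pvariance(data)

print(shortcut, reference)
```

The squares are around 1e18, where neighbouring floats are 128 apart, so the shortcut cannot even represent a difference of 22.5, while the library function recovers it.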


A couple things here I don't understand:

> if iter(data) is data:

Wouldn't it be cheaper like `if type(data) is iter:`?

And why convert `data` to a list at all, to check for length 0, given that `_sum(data)` will return the count?
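
For context, a small sketch of what the `iter(data) is data` check distinguishes. The helper name `is_iterator` is made up here, and the comments reflect my understanding rather than anything stated in the statistics source:

```python
def is_iterator(data):
    """True for one-shot iterators, which return themselves from iter()."""
    return iter(data) is data


numbers = [1, 2, 3]
generator = (x for x in numbers)

# A list hands out a fresh iterator each time, so it is not its own iterator.
print(is_iterator(numbers))

# A generator is its own iterator.
print(is_iterator(generator))

# `iter` is a builtin function rather than a class, so no object has it as its type.
print(type(generator) is iter)

# Generators have no len(), so they must be materialized before n can be computed.
materialized = list(generator)
print(len(materialized))
```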


Perhaps it's cheaper to just get the conversion and length check out of the way rather than have to execute _sum and find out from checking T.


But the path where the cost of _sum is extra is the error path; plus that'll end up being 0 iterations anyway.


One trick that often improves numerical stability with computing the sum is to first sort the data.
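
A sketch of the effect with contrived data. An explicit loop is used so the result does not depend on how the builtin `sum` is implemented, and `math.fsum` provides a correctly rounded reference; `running_sum` is a name made up for this example:

```python
import math


def running_sum(xs):
    """Plain left-to-right float accumulation, with no compensation."""
    total = 0.0
    for x in xs:
        total += x
    return total


# One huge value followed by ten small ones.
data = [1e16] + [1.0] * 10

# Adding 1.0 to 1e16 rounds straight back to 1e16, so the small values vanish.
unsorted_total = running_sum(data)

# Sorted by magnitude, the small values accumulate to 10.0 before meeting 1e16.
sorted_total = running_sum(sorted(data, key=abs))

exact_total = math.fsum(data)
print(unsorted_total, sorted_total, exact_total)
```

Sorting helps in cases like this one but is not a general guarantee; `math.fsum` is the standard-library option when accuracy matters more than speed.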


Sometimes I think anti-recommendations (e.g. asyncio) are far more useful. You don't learn a lot from many open source libraries or standard library modules if you try to imitate them. You must understand that this code is a compromise, then they stuck with it for compatibility reasons, and today you could have a much better API or simpler implementation if you started from scratch.


But that’s true for code that was written yesterday too. All code makes compromises and has to deal with constraints. Understanding what someone did given their lot is still valuable.


Sounds like a goldmine for learning. I collect trivial examples of code constantly but these tiny programs or demonstrations don't provide the same insights as snips from real projects, and all the facets of that project's direction.


I remember there is an old explanation of the Python implementation's inner workings on YouTube. I never watched it, and I can't find it now.



