> "There are so many projects on GitHub – pick one you like and see how they did it." But most successful projects are quite large; where do you start from?
My answer for this has been to scroll all the way through the git history of a repo structurally similar to something I want to do and read each of the first month-or-so of commits.
A project you may want to look into adding is Tern [0]. I've had a good time reading through the code over the past couple of weeks, and have found it to be at least not "bad" code, and pretty easy to understand.
Specifically, how they untar each container layer and create a chroot jail to run commands inside is fairly self-contained and interesting.
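To give a feel for the "untar each layer" part, here is a rough sketch. It is not Tern's code: `make_layer` and `unpack_layers` are hypothetical helpers, the layers are plain tar files built on the spot, and whiteout files are ignored. The chroot step is left as a comment because `os.chroot` needs root.

```python
import io
import os
import tarfile
import tempfile


def make_layer(path, files):
    """Write a tar archive at `path` from a {name: text} mapping (demo helper)."""
    with tarfile.open(path, "w") as tar:
        for name, text in files.items():
            data = text.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def unpack_layers(layer_paths, rootfs):
    """Extract layers in order, so files in later layers overwrite earlier ones."""
    os.makedirs(rootfs, exist_ok=True)
    # Use the safer "data" extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for layer in layer_paths:
        with tarfile.open(layer) as tar:
            tar.extractall(rootfs, **kwargs)
    return rootfs


with tempfile.TemporaryDirectory() as workdir:
    base = os.path.join(workdir, "layer0.tar")
    top = os.path.join(workdir, "layer1.tar")
    make_layer(base, {"etc/motd": "base image\n", "etc/os-release": "demo\n"})
    make_layer(top, {"etc/motd": "patched\n"})

    rootfs = unpack_layers([base, top], os.path.join(workdir, "rootfs"))
    with open(os.path.join(rootfs, "etc", "motd")) as f:
        motd = f.read()
    unpacked = sorted(os.listdir(os.path.join(rootfs, "etc")))
    # With root privileges, os.chroot(rootfs) followed by os.chdir("/")
    # would confine later commands to the unpacked filesystem.

print(motd.strip(), unpacked)
```

The only design point worth noting is the ordering: extracting in layer order is what makes the top layer's `etc/motd` win.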
Yes. I was thinking of starting with smaller examples that show how they write code at <big project> without requiring you to understand the big picture, but bigger examples would be great too eventually. Something like http://aosabook.org/en/index.html, which was already mentioned here.
I'd agree that pathlib is a good solution, and that it's an object-oriented solution, but I'm hesitant to call it a good object-oriented solution. It uses some object-oriented hackery to work as elegantly as it does. I approve of this decision, but it leads to some weirdness, such as the inability to subclass `pathlib.Path` directly. Instead you need to subclass `type(pathlib.Path())`, because calling `pathlib.Path()` does not give you a plain `Path`: the constructor hands back an instance of a platform-specific subclass (`PosixPath` or `WindowsPath`).
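A minimal sketch of that workaround, with `ProjectPath` and `has_suffix` as made-up names for illustration. As far as I know, Python 3.12 and later also allow subclassing `pathlib.Path` directly, so this matters mostly on older versions.

```python
import pathlib

# The concrete class for this platform: PosixPath or WindowsPath.
ConcretePath = type(pathlib.Path())


class ProjectPath(ConcretePath):
    """Hypothetical subclass that adds one helper method."""

    def has_suffix(self, suffix):
        return self.suffix == suffix


p = ProjectPath("docs") / "index.rst"
print(type(p).__name__, p.has_suffix(".rst"))
```

Joining with `/` keeps the subclass, so `p` is still a `ProjectPath`.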
I was bitten by something related to this a couple weeks ago:
    >>> with open("pickle.pkl", "rb") as f:
    ...     pickle.load(f)
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/usr/lib/python3.9/pathlib.py", line 1074, in __new__
        raise NotImplementedError("cannot instantiate %r on your system"
    NotImplementedError: cannot instantiate 'WindowsPath' on your system
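The traceback suggests the pickle contained a `WindowsPath` and was loaded on a non-Windows machine, where that class cannot be instantiated. One possible way around it, assuming you control the code that writes the pickle, is to store a pure path (or a plain string) instead. The path below is made up for the example.

```python
import pickle
from pathlib import PureWindowsPath

# Pure paths do no I/O, so they can be created on any operating system.
original = PureWindowsPath(r"C:\data\model.pkl")  # hypothetical path

payload = pickle.dumps(original)
restored = pickle.loads(payload)

print(type(restored).__name__, restored.name)
# A plain string is the other portable option.
as_text = str(original)
```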
One interesting thing you couldn't practically have done twenty years ago, but which has become relatively straightforward for a project that is Free Software anyway, is to link the source code directly from the documentation.
There's a declaration of this standard library function to make an iterator from a closure, then there's a working code sample which uses it and a further example showing a closer to real-world usage.
But at the top right is a [src] link: in one click you can go from the documentation of the function to the actual implementation that is used (except where Rust just says this is a compiler intrinsic, and then you need to burrow into the details of the actual compiler).
The difference in language, syntax, and flow between Automate the Boring Stuff and Python's official documentation is like the difference between reading the modern translation of Beowulf next to the original Old English version.
Why doesn't someone make a plain language annotation -- for beginners -- to translate the official documentation?
Reading tons of Python code has really helped me as a Python developer. Several times a month I'll check out GitHub's trending Python repositories: https://github.com/trending/python?since=weekly By perusing this list I can dip into repositories that interest me, see the "beat on the street", skim interesting project code, keep up to date with the latest design decisions, etc. It's also fun to see that sometimes a trending repository has "bad" code. It's nice to be able to tell good code from bad code, which is a skill I only developed by reading and writing tons of code. Also, it's nice to realize that fun projects don't have to be coded well to be cool.
I've learned so much by lurking on the Swift standard library team's proposals, but digging around the std lib is something I rarely do. Where I really learn the most is from other Swift projects developed by many std lib authors, like the Swift Argument Parser, Algorithms or Collections projects. I find these useful because I get practical pro-tips and I get to see the std lib in practice without needing to know the nitty-gritty stuff. So that's somewhere in between a std lib and a "successful project" on GitHub I guess.
I've found that CPython's standard library has a lot of good stuff, and the inline documentation is generally very good. I learned things from csv, difflib, heapq, and re. I'd recommend taking at least a quick look at any Python standard library module before using it.
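If you want that quick look without leaving the interpreter, `inspect` can print the source of anything written in Python. A small sketch using `csv.DictReader` as the example; it won't work for functions implemented in C, and it assumes the standard library's `.py` files are installed.

```python
import csv
import inspect

# Full source text of a pure-Python class from the standard library.
source = inspect.getsource(csv.DictReader)
first_line = source.splitlines()[0]

print(first_line)
# Path of the module file, handy for opening it in an editor.
print(inspect.getsourcefile(csv))
```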
Mentee is the correct word in modern usage, but Mentoree would arguably be more "correct" etymologically in that Mentor is a proper noun that has been "verbed" [1], not "one who ments".
tangentially related: I'm learning iOS tweak development (on a jailbroken iPhone) using theos [0], and due to the fact that Apple's docs are pretty lacking, I've had to basically exclusively rely on GitHub's search for code examples using the methods/classes I'm interested in. Never done this kind of "learning by example" with respect to programming before, and it definitely has me yearning for better docs...
The code in the statistics library is actually quite a bit more complicated than I had expected. Is there a reason why a function like `mean` isn't just `sum(data) / len(data)`?
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    assert count == n
    return _convert(total / n, T)
The _sum function is a little more involved, but not appreciably so.
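To make the difference concrete, here is a small comparison. `naive_mean` is my own name for the one-liner from the question, and the inputs are chosen to hit two cases it handles badly: float overflow and iterators that have no `len()`.

```python
import math
import statistics


def naive_mean(data):
    """The one-liner from the question."""
    return sum(data) / len(data)


huge = [1e308, 1e308]
naive_huge = naive_mean(huge)          # the running float sum overflows
exact_huge = statistics.mean(huge)     # summed exactly, then converted back

try:
    naive_mean(x for x in (1, 2, 3))   # generators have no len()
    generator_ok = True
except TypeError:
    generator_ok = False
from_generator = statistics.mean(x for x in (1, 2, 3))

print(naive_huge, exact_huge, generator_ok, from_generator)
```

The `iter(data) is data` check in the excerpt above is what covers the generator case.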
In general though, a literal translation of a formula is not always great in terms of numerical stability/error. For example, you might have learned to calculate the variance as mean(X^2) - mean(X)^2, but this can lead to a huge loss of precision that more complicated approaches avoid.
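A quick sketch of that precision loss, using made-up data with a large offset so the squares are around 1e18. The true population variance of these four values is 22.5.

```python
import statistics

# Four values with a large common offset; the spread is tiny by comparison.
data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
n = len(data)

# Textbook shortcut: mean(X^2) - mean(X)^2, two nearly equal huge numbers.
naive = sum(x * x for x in data) / n - (sum(data) / n) ** 2

# Two-pass version: subtract the mean first, then square the small deviations.
mean = sum(data) / n
two_pass = sum((x - mean) ** 2 for x in data) / n

library = statistics.pvariance(data)

print(naive, two_pass, library)
```

The two-pass result and `statistics.pvariance` both land on 22.5, while the shortcut is off by more than the variance itself because each square has already been rounded at that magnitude.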
Sometimes I think anti-recommendations (e.g. asyncio) are far more useful. You don't learn a lot from many open source libraries or standard library modules if you just try to imitate them. You have to understand that the code was a compromise, that the authors then stuck with it for compatibility reasons, and that today you could have a much better API or a simpler implementation if you started from scratch.
But that’s true for code that was written yesterday too. All code makes compromises and has to deal with constraints. Understanding what someone did given their lot is still valuable.
Sounds like a goldmine for learning. I collect trivial examples of code constantly but these tiny programs or demonstrations don't provide the same insights as snips from real projects, and all the facets of that project's direction.
I'm working on a slightly similar project motivated by this problem: how do you learn from established open-source projects when most of the interesting ones are too big, too complex, and too hard to get started with?
So I'm compiling a collection of interesting code examples from open-source projects and explaining/annotating them. I'm trying to pick examples that are:
- Taken from popular, established open-source projects.
- Somewhat self-contained. They can be understood with little knowledge of the surrounding context.
- Small-ish. They can be understood in one sitting.
- Non-trivial.
- Instructive. They solve somewhat general problems, similar to what some other coders on some other projects could be facing.
- Good code, at least in my opinion.
I'm planning to share it in a few weeks.