Hacker News
Ask HN: How do you learn to read source code?
5 points by ywecur on May 2, 2016 | 7 comments
I'm a decent solo programmer and freelance web developer and I've created some simple back end programs together with a friend.

But when it comes to understanding even the simplest open source projects, I can't for the life of me work out what's going on. There are so many functions relating to each other and so many classes that I just get lost. I'd probably get lost in everything my friends have written as well if they weren't there to explain their architecture to me in detail.

So how do you learn this? There are plenty of books and resources on how to program, but I've yet to find any on how to read source code.




* Start at obvious entry points: main(...) for standalone apps, exported functions in libraries and top-level handlers for web applications.

* Use a debugger and breakpoints for stack traces to understand program flow (if not available, throw an exception in a function of interest or insert print statements for the same effect).

* Either use an IDE that allows you to jump to definitions or map the project with grep (e.g.: grep -H 'class ' * > classes.txt) to save time when manually going to the definition (or anything in between, but what's available depends on the language).

* Focus: Especially when you are not used to reading code, it is easy to get lost trying to understand every single statement at once. Decide what you want to understand in this reading session, and make a habit of taking the next step towards that goal as soon as you have a good idea of what you are looking at. Perfect understanding is not necessary - if it turns out you missed a crucial part, you can still go back and re-read.

* Practice: The more you read and write code, the more proficient you'll become, the more intuition you will develop and thus the faster you'll be when skimming over unknown code.
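
A rough sketch of the stack-trace trick from the second bullet, for when no debugger is at hand. The three functions are made up and only stand in for unfamiliar project code; `traceHere` and `lastTrace` are hypothetical helpers, not part of any library. Creating an `Error` without throwing it is enough to capture the call stack (the exact format of `.stack` depends on the JavaScript engine).

```javascript
// Hypothetical helper: records and prints the call stack at the point of the call.
let lastTrace = "";
function traceHere(label) {
  // An Error object captures the stack when it is created; no need to throw it.
  lastTrace = new Error(label).stack;
  console.log(lastTrace);
}

// Hypothetical call chain standing in for code you are trying to read.
function handleRequest(path) {
  return loadPage(path);
}

function loadPage(path) {
  return renderTemplate(path);
}

function renderTemplate(path) {
  // Function of interest: who called us, and through which path?
  traceHere("reached renderTemplate");
  return "<p>" + path + "</p>";
}

const html = handleRequest("/about");
```

The printed trace lists the callers from the innermost function outward, which is often enough to see how control reached the function of interest.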


Your comment touches a lot on how to get an idea of the overall architecture, and I thank you for it.

My main problem though is that I can't seem to see the purpose of functions, because more often than not they are written in a, to me, foreign way.

An example I have is this uncommented if statement:

`if (tab.url.toLowerCase().indexOf("https://facebook.com") > -1)`

Should it be clear to me immediately that this means "If the target website is open"? It took me a solid minute just to understand this statement.

Are there some common design patterns I should memorise?


>> * Practice: The more you read and write code, the more proficient you'll become, the more intuition you will develop and thus the faster you'll be when skimming over unknown code.

> if (tab.url.toLowerCase().indexOf("https://facebook.com") > -1)

> Should it be clear to me immediately that this means "If the target website is open"? It took me a solid minute just to understand this statement.

In theory you should figure out what the url property of tab is (e.g. by logging its value to the console). Then you see it is a string and check the JavaScript reference for toLowerCase() and indexOf().

When you've gained some experience you'll recognize those two functions immediately and know what they do and just assume that tab.url is a string (and you'll be mad at the original developer if it isn't).
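
One way to do that in practice is to pull the statement apart into named steps and log each one. This is only a sketch: the `tab` object here is a hypothetical stand-in for whatever the real code receives, and the variable names are mine.

```javascript
// Hypothetical stand-in for the tab object from the original snippet.
const tab = { url: "https://Facebook.com/messages" };

// Step 1: what is tab.url? Log its type and value.
const rawUrl = tab.url;
console.log(typeof rawUrl, rawUrl); // string https://Facebook.com/messages

// Step 2: toLowerCase() returns a lower-cased copy, so the check ignores case.
const normalizedUrl = rawUrl.toLowerCase();

// Step 3: indexOf() returns the position of the first match, or -1 if there is none.
const position = normalizedUrl.indexOf("https://facebook.com");

// Step 4: "> -1" therefore means "the text was found somewhere in the URL".
const isTargetSite = position > -1;
console.log(position, isTargetSite); // 0 true
```

Once the intermediate names are in place, the original one-liner reads as "does this tab's URL contain the Facebook address?".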


It's just vanilla JavaScript, not some "foreign way". Checking `"somestring".indexOf(something) != -1` is a common idiom in JavaScript because indexOf returns -1 when there is no match. Just look up the definition of the indexOf method on MDN: https://developer.mozilla.org/pl/docs/Web/JavaScript/Referen...

It may look odd, but what else could indexOf return? It couldn't return 0, because string indices in JS start at 0 (not 1), so 0 is a valid position for a match at the very start of the string. It couldn't return null either, because JS is weakly typed, so in some circumstances null would be indistinguishable from 0.

I'm glad that JS is not PHP (where some functions can return a number or a boolean or something else "just because") ;)
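
A small illustration of those return values, as a sketch with a made-up `sampleUrl`. The last lines show one of the circumstances where null behaves like 0.

```javascript
// Made-up example string.
const sampleUrl = "https://facebook.com/home";

const atStart = sampleUrl.indexOf("https");     // match at the very beginning
const inMiddle = sampleUrl.indexOf("facebook"); // match further in
const missing = sampleUrl.indexOf("twitter");   // no match

console.log(atStart, inMiddle, missing); // 0 8 -1

// Why 0 cannot mean "not found": it is a real position, and it is falsy,
// so a bare truthiness check gives the wrong answer for a match at the start.
const naiveCheck = Boolean(atStart); // false, although "https" is present
const idiomCheck = atStart !== -1;   // true
console.log(naiveCheck, idiomCheck); // false true

// Weak typing: in a relational comparison, null is converted to 0.
const nullLooksLikeZero = null >= 0;
console.log(nullLooksLikeZero); // true
```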


The simple fact of the matter is that reading code is hard, maybe even impossible in the general case. You can understand code with some amount of effort, but it often boils down to an exercise in reverse engineering.

One thing this means is that in any substantial codebase you are never going to understand all of it. You will typically only have time to learn a fraction of the system, so if you are going to proactively explore the codebase, you will need to prioritize. You probably (but not necessarily) want to get a handle on the top-level architecture before digging deep anywhere.

My final piece of advice is that I personally find it impossible to understand just about any non-trivial piece of code without running it, and running it multiple times (1). Perhaps even many, many times. You can run under a debugger (single stepping or breakpoints) and this seems to work for many people. I still rely on print statements sprinkled through the code myself, adding and removing them as I run the code in question over and over again as my current point of interest moves from place to place in the code. This might sound scary, but it's not that different from the way you normally debug code.

(1) It's entirely possible that the person who wrote the code in the first place also ran it many, many times (testing each small change) as they wrote it. So it's perhaps not unreasonable that you yourself may need to run it many, many times in order to understand it later.
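
For what it's worth, here is a minimal sketch of the print-statement approach that avoids editing every function body: wrap the functions you care about so each call is logged on entry and exit. Everything here (`traced`, `callLog`, `parsePrice`, `doublePrice`) is hypothetical and only illustrates the idea.

```javascript
// Collected log lines, so they can be printed or inspected after a run.
const callLog = [];

// Hypothetical helper: returns a wrapper that logs each call to fn and its result.
function traced(name, fn) {
  return function (...args) {
    callLog.push("enter " + name + "(" + args.map(String).join(", ") + ")");
    const result = fn.apply(this, args);
    callLog.push("leave " + name + " -> " + String(result));
    return result;
  };
}

// Made-up functions standing in for code you are trying to understand.
let parsePrice = function (text) {
  return Number(text.replace("$", ""));
};
let doublePrice = function (text) {
  return parsePrice(text) * 2;
};

// Wrap the functions of interest; unwrap again when your focus moves elsewhere.
parsePrice = traced("parsePrice", parsePrice);
doublePrice = traced("doublePrice", doublePrice);

const total = doublePrice("$10");
console.log(callLog.join("\n"));
```

Running this logs the nested enter/leave lines in call order, which gives a rough picture of who calls whom and with what values.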


Knowing which source file to start with is important. That depends quite a bit on the language, and the type of software.

Hard to make a short summary here, because there's no easy rule of thumb. For example, you could say, for C, "find the main() function first." That doesn't help, though, if the open source project is a library, like pcre.


It's not that hard, really, just use a call tracing tool. (Doing it without such a tool is very hard, of course, and is not very smart.)



