The Shapes of Code

classified · on Jan 28, 2020

> ... not so good code: if it was, we wouldn’t need a comment...

How much longer will this rumor persist that "good code" doesn't need commenting? Good comments explain things that aren't immediately obvious from the code.

Izkata · on Jan 28, 2020

You left out the most important part of this criticism:

> ...to explain each line of code.

Commenting every line is a hint to a more pervasive problem. This section has nothing to do with "comments are bad".

clarry · on Jan 28, 2020

Most code should be boring and immediately obvious. I don't run into much real-world code that needs a lot of explaining. Sometimes you do something weird or non-obvious for a good reason, but I see plenty of comments explaining things that only need explaining because of poor design. There are also poor comments that don't explain anything you couldn't see just by looking at the code.

nerdponx · on Jan 28, 2020

    def clean_start_year(x):
        # Account for the 7-year offset in the database
        return x + 7

    def process_record(record):
        record['start year'] = clean_start_year(record['start year'])

It's perfectly obvious what this code does, but without the comment it's totally unclear why it does what it does.

clarry · on Jan 28, 2020

    DATABASE_YEAR_OFFSET = 7

    def fix_start_year(year):
        return year + DATABASE_YEAR_OFFSET

    def process_record(record):
        record['start year'] = fix_start_year(record['start year'])

In this version, with one fewer magic constant (and a renamed function), the comment would look very redundant.

DougBTX had the same idea, at the same time. I don't think their version needs the comment either. EDIT: Dang, you removed a perfectly fine post :)

nerdponx · on Jan 28, 2020

I guess, but I think it's more that my example fails to illustrate the point well.

Informative comments cannot always be "factored out" like this. Do you at least agree that a ticket or bug #, or a link to an issue in an issue tracker, would be appropriate?

clarry · on Jan 28, 2020

I think a comment is appropriate if the problem cannot be factored out. But in my experience such things can be factored out most of the time, hence the argument that most code should be boring and immediately obvious.

Firadeoclus · on Jan 28, 2020

Factoring out a comment into code sometimes loses information. I would argue that database_year_offset is not as informative as the original comment. Free form text can say things better than identifier names.
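A minimal Python sketch of a middle ground between the two positions: keep the named constant, and attach the free-form rationale and an issue-tracker pointer to it. The reason given for the offset and the ticket ID `DATA-123` are hypothetical placeholders, not details from the original example.

```python
# Hypothetical rationale: a legacy import job stored every start year
# 7 years too early, so reads must add the offset back.
# See ticket DATA-123 (placeholder ID) before changing this value.
DATABASE_YEAR_OFFSET = 7


def fix_start_year(year: int) -> int:
    """Convert a start year as stored in the database to a calendar year."""
    return year + DATABASE_YEAR_OFFSET
```

The identifier carries the "what"; the comment keeps the "why" and the link to the issue tracker.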

nerdponx · on Jan 29, 2020

Most code, sure. But it's not an ideal to be striven for, or a metric to be used when evaluating code.

taneq · on Jan 28, 2020

Ah, the inline equivalent of Doxygen-for-the-sake-of-Doxygen documentation.

    int frob(const int x, void *context)
    Performs a frob on x in the given context.
    Parameters:
      x - int param
      context - a pointer to context
    Returns an id of the frob performed.

clarry · on Jan 28, 2020

I see a lot of this at work, sigh.

The better a variable or function is named (and the more narrowly its scope or purpose is limited to one thing), the less it needs comments. So there are a lot of self-explanatory variables and functions with Doxygen comments that state exactly what the name already says, adding nothing but noise. And then there are poorly named things where the same thing happens: the comment restates what the bad identifier states. Sigh. Genuinely useful comments are rather rare.
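To make the contrast concrete, here is a hypothetical Python function documented twice: once with a docstring that only restates the name and signature, and once with one that records what the name cannot, such as edge-case behavior and the reason for it. The functions and the config-file history are invented for illustration.

```python
def clamp_timeout_restated(seconds: float) -> float:
    """Clamp the timeout.

    Args:
        seconds: the seconds.

    Returns:
        The clamped timeout.
    """
    return max(seconds, 0.0)


def clamp_timeout(seconds: float) -> float:
    """Return a timeout that is safe to hand to the network layer.

    Negative values become 0.0 ("do not wait") instead of raising, because
    older config files used -1 to disable waiting. (Hypothetical history.)
    """
    return max(seconds, 0.0)
```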

nerdponx · on Jan 28, 2020

How is what I posted an example of this, in any way?

Use your imagination. The "cleaning" routine might not be a one-liner. Maybe it depends on multiple functions from another file/package/library.

AnanasAttack · on Jan 29, 2020

The comment doesn't explain why there is an offset in the first place. Who put it there, and why? I think that should be explained as well.

AstralStorm · on Jan 28, 2020

The code might be obvious, but if you don't include the reason it exists, it becomes a target for removal. And having too many of those leaves you with either garbage or hard decisions when refactoring.

Writing the rationale is the most important comment you must not skip.

There should also be links to design documentation (for non-programmers) so that it can be kept up to date.

swish_bob · on Jan 28, 2020

Why it exists ought to be explained by the existence of a test.

This isn't always possible, but it's far more possible than many developers seem to think.
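A rough sketch of swish_bob's suggestion in Python, reusing the start-year example: the test name carries the rationale, so removing the offset as an apparent no-op makes a test with a self-explanatory name fail. The function is re-declared so the snippet runs on its own; the reason encoded in the test name is hypothetical. The `test_` prefix is what pytest uses to collect test functions.

```python
DATABASE_YEAR_OFFSET = 7


def fix_start_year(year: int) -> int:
    return year + DATABASE_YEAR_OFFSET


# pytest collects functions whose names start with "test_".
def test_start_year_compensates_for_legacy_import_offset():
    # Hypothetical rationale, carried by the test name above.
    assert fix_start_year(2013) == 2020
```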

minaa-chan · on Jan 28, 2020

That just makes two things that need to be commented, doesn't it?

When I read code I care about what the code is supposed to do and why it's supposed to do that.

There is code that passes the current tests and is logically sound but no longer fits the business requirements. Knowing that it was implemented to solve a certain use case helps the reader / reviewer see whether the code and the test still fit or if they should be updated.

The best comments I've come across comment the intention and any context that isn't immediately obvious from the function.

swish_bob · on Jan 28, 2020

Your test should explain what it's testing, via its name or some other mechanism. Otherwise, how do you know what's gone wrong when it fails?

minaa-chan · on Jan 28, 2020

That other mechanism is comments.

SketchySeaBeast · on Jan 28, 2020

Does that explain why, though? It explains that it's there, and that someone obviously wanted it to be, but it still doesn't get at the why. I've found that most of the logic I'm left scratching my head over is due to some business rule that there's no better way to deal with. Even if a test exists for it, I still can't figure out what the test's purpose was, other than to verify that it's working.

thfuran · on Jan 28, 2020

Just yesterday I wrote some code that was pretty boring, pretty readily understandable, and also apparently a good target for an easy simplification. However, that obvious simplification would also non-obviously introduce a bug that might not be noticed immediately. So I left a comment to the effect of "yeah, these two lines look kinda dumb, but we need them because X".

Could I have instead massively restructured the feature to prevent the possibility of the bug the comment was explaining the workaround for? Probably. But that would have increased the number of lines of code to make the change by a factor of something like 20x-200x and would have required far more testing. It also probably would have resulted in something overall more complicated.
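thfuran doesn't say what X was, so here is an invented Python example of the same kind of comment: a line that looks like an easy simplification target but guards against a non-obvious bug.

```python
def remove_inactive(users: list) -> list:
    """Remove inactive users from the list in place and return it."""
    # Iterating over a copy looks redundant, but removing items from a list
    # while iterating over that same list skips the element after each
    # removal. Don't "simplify" this to `for user in users`.
    for user in list(users):
        if not user["active"]:
            users.remove(user)
    return users
```

With `for user in users`, an input whose first two users are both inactive would keep the second one.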

classified · on Jan 28, 2020

I'm not sure whether I should envy or pity you. Writing code that is that obvious and boring cannot be very fulfilling. How long until a machine does that for you (after all, automating boring and obvious stuff is what we programmers do)? OTOH, on days where I feel lazy I can only wish my world were that simple.

gobayesgo · on Jan 30, 2020

Any monkey can puke thousands of lines of code. Producing code so boring that fellow humans can understand it immediately is the real challenge in programming. It requires much more effort and care than blaming everyone else for not understanding a sophisticated pile of mess.

goto11 · on Jan 28, 2020

Maybe because some people have learnt that they should comment but haven't learnt how to comment properly. And as is often the case in this industry, the fact that some people do something badly leads other people to forbid the thing altogether.

BitwiseFool · on Jan 28, 2020

We typically use comments to describe our client's business process. Even though the code itself looks straightforward what the client wants certainly isn't.

artsyca · on Jan 28, 2020

A rule of thumb I follow is that code is twice as hard to read as it is to write, so comments can certainly help. On the other hand, comments that have to over-explain what the code is doing, or why it's doing it, also indicate that the code ought to be revised.

My process is to first code a proof of concept to get a feel for the problem domain; in agile parlance this could be considered a spike.

Then the primary and alternate flows can be coded against a basic test suite.

That is followed by a refactor and then a comments pass.

Depending on what comments arise, another refactor may be in order.

I typically try to follow the spirit of TRUE software.

blakehaswell · on Jan 29, 2020

> How much longer will this rumor persist that "good code" doesn't need commenting?

Indeed. I recently came across an article[0] which does a good job of debunking this idea, and describing different types of comments and when they can be valuable. Have a read if you're skeptical about comments.

[0] http://antirez.com/news/124

CraigJPerry · on Jan 28, 2020

Like what? It can’t be the “how” - the code should be able to express that clearly. It can’t be the “what”, again that should be apparent from some combination of code and configuration / metadata.

So that leaves the “why”. To my mind, code comments are the worst place to record why a particular implementation exists. The why often needs collaboration with non-coders.

artsyca · on Jan 28, 2020

It's definitely the why, for any unorthodox bits like shims, hacks, workarounds, and the like.

The worst is finding a snippet from Stack Overflow or GitHub with no reference to the issue that explains why it's there, or at least a TODO describing an improvement.

minaa-chan · on Jan 28, 2020

I disagree. The code is the best place to comment the why. Regardless of what is documented or written up in Jira or wherever, the code is what is true.

Commenting the purpose of the code and maybe a link to some documentation allows a future reader to see if the code still fulfills the requirements.

CraigJPerry · on Jan 28, 2020

The comments are very often not true; that's why there's so much pushback.

minaa-chan · on Jan 29, 2020

So they should be updated the same as any other documentation. They're there to help. They don't need to be 100% infallible to be useful.

luckycharms810 · on Jan 28, 2020

While the patterns themselves are interesting, the idea that all code must be rearranged into smaller functions to be "refactored" seems a little juvenile.

hinkley · on Jan 28, 2020

There’s an exercise for creative writing that my lit major friend told me about. You take everything you’ve written and cut it up into sentences. You just push the sentences around until something clicks and you figure out how to write your way out of the “stuckness”.

Refactoring has rediscovered this trick, among others.

0x445442 · on Jan 28, 2020

Another similarity to creative writing is this: you're not done when there's nothing left to add, you're done when there's nothing left to remove.

ipnon · on Jan 28, 2020

The code that is easiest to revisit is decomposed into separate functions that are each trivial to understand. They are used to compose similarly trivial functions, and so on. Ideally every point of your program from input to output can be understood clearly and immediately.

0x445442 · on Jan 28, 2020

It makes the tests a hell of a lot easier and more robust as well.
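A small Python sketch of the style ipnon describes, with invented names: each function is trivial on its own, the top-level one reads as a list of steps, and, as the reply notes, each step can be tested in isolation. The input format (exactly two comma-separated fields per line) is an assumption of this example.

```python
def parse_line(line: str) -> list[str]:
    return [field.strip() for field in line.split(",")]


def is_complete(fields: list[str]) -> bool:
    return all(fields)


def to_record(fields: list[str]) -> dict:
    name, year = fields  # assumes exactly two fields per line
    return {"name": name, "year": int(year)}


def load_records(lines: list[str]) -> list[dict]:
    """Parse "name, year" lines, skipping rows with empty fields."""
    parsed = (parse_line(line) for line in lines)
    return [to_record(fields) for fields in parsed if is_complete(fields)]
```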

rgoulter · on Jan 28, 2020

I can agree "code should be in small functions" by itself is a bad guideline. I think it's suggested due to other principles around making code easier to read / use.

It'd be better to ask if the code is more complex than it needs to be.

I liked the terms John Ousterhout uses. For some module of a program (e.g. a method, class, package, etc.), it has an interface and an implementation. The interface is "what you need to know to use the module", the implementation is "how it's done".

Complexity comes from dependencies (e.g. more interfaces you need to know about), and from obscurity (e.g. things you need to know about, which aren't obvious from the interfaces given).

In some cases, many small methods would be an increase in complexity overall. In some cases, one big method may be an increase in complexity.

hinkley · on Jan 28, 2020

In most cases one big method is an increase in complexity (and others will add to it over time). Large methods are usually a failure of imagination, like bad and rambling prose.

Code is a set of instructions for the computer, and for the next person who has to maintain it (“Programs are meant to be read by humans and only incidentally for computers to execute.”).

Cooking instructions are split into steps. Assembly instructions are split into steps. LEGO instructions are split into steps. When drawing instructions aren’t split into steps the Internet turns them into a cultural phenomenon (Step 2: draw the rest of the fucking owl).

Only programmers think they are immune to this ridicule, smash all the steps together, and then get salty when others complain, ignoring a series of luminaries who beg in every format of media available, and for decades, to do otherwise.

Seriously, organize your code, separate the steps. You’re killing us.

mason55 · on Jan 28, 2020

No one disagrees that code needs to work in steps. The conversation is around how much and the answer is going to depend on what you're doing.

Why have a function that prints a whole line? Isn't that just smashing together a bunch of small steps to print one character at a time?

CoolGuySteve · on Jan 28, 2020

It seems antithetical to the intent of "Goto considered harmful":

> My second remark is that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

While Dijkstra later talks about using procedures to index into the call stack, needlessly adding to the depth of the call stack for single-use functions seems excessive.

With this logic, I actually think "The paragraphs with headers" is the way to go since as TFA states: "You know that the algorithm operates in steps, and you know where the steps are located in code."

http://www.u.arizona.edu/~rubinson/copyright_violations/Go_T...

makapuf · on Jan 28, 2020

Functions can be inlined by the compiler if it thinks it's an issue. I think it's more a question of readability: sometimes, if a process involves a long series of steps, I like to be able to read them linearly rather than jumping back and forth to sub-steps. I can read just the headers if they're organized into cleanly separated paragraphs (with braces if needed to indicate separate scopes, and no variables leaking between paragraphs).
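For comparison, a hypothetical Python function written in the "paragraphs with headers" style: one linear function with a comment header per step. Python has no block scopes, so unlike the braces makapuf mentions, these paragraphs cannot stop variables from leaking between steps. The input format is invented for the example.

```python
def summarize_scores(raw: str) -> dict:
    """Summarize "name: score" pairs separated by semicolons.

    Assumes at least one score survives validation.
    """
    # Parse: turn "name: score" pairs into tuples.
    pairs = []
    for item in raw.split(";"):
        name, score = item.split(":")
        pairs.append((name.strip(), int(score)))

    # Validate: drop scores outside the 0-100 range.
    valid = [(name, score) for name, score in pairs if 0 <= score <= 100]

    # Aggregate: find the best entry and the mean score.
    best_name, best_score = max(valid, key=lambda pair: pair[1])
    mean = sum(score for _, score in valid) / len(valid)

    # Report
    return {"best": best_name, "best_score": best_score, "mean": mean}
```

Reading only the comment headers gives the outline; extracting each paragraph into its own function would trade that linear read for named, separately testable steps.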

taspeotis · on Jan 28, 2020

https://news.ycombinator.com/item?id=8374345

DavidVoid · on Jan 28, 2020

Since that blog seems to be down at the moment here's a cached version of the article/email.

https://webcache.googleusercontent.com/search?q=cache:http:/...

reuben_scratton · on Jan 28, 2020

Absolutely this.

More functions => More complexity => Bad.

Also: decreased legibility, less opportunity for compiler optimisation.

saltyfamiliar · on Jan 28, 2020

I'm glad I'm not the only one who pays attention to this. This is completely unsubstantiated and coming from a relatively inexperienced programmer, but it's been my experience that better code tends to produce nicer shapes. An obvious example not mentioned in the article arises when code is nested too deeply and too thinly, creating jarring and disproportionate peaks.

Thorrez · on Jan 28, 2020

Microsoft's docs have an interesting idea of what good nesting looks like:

https://docs.microsoft.com/en-us/previous-versions/windows/d...

Apparently 13 different indentation levels is good.

jpz · on Jan 28, 2020

I actually find that surprisingly readable, probably because of the indent size which clearly allows the eye to see the blocks - and the size of my very large monitor - but this code seems to indicate a religious dogma about avoiding early returns, which would otherwise straighten it up.

a_t48 · on Jan 28, 2020

Refactoring it effectively needs some sort of scope guard, too, to run `pfd->Release();` and the like. It's just as much dogma to write C++ like C, avoiding RAII.
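In Python, the closest analogue to the C++ scope guard a_t48 mentions is `contextlib.ExitStack`: its `callback()` method registers cleanup that runs however the `with` block exits, so early returns no longer need a matching release on every path. A minimal sketch; the `Resource` class and `open_dialog` function are made-up stand-ins for the COM-style objects in the linked sample, not any real API.

```python
from contextlib import ExitStack


class Resource:
    """Hypothetical stand-in for an object that must be released."""

    def __init__(self, name: str, log: list):
        self.name, self.log = name, log

    def release(self) -> None:
        self.log.append(f"release {self.name}")


def open_dialog(log: list, cancelled: bool) -> str:
    with ExitStack() as stack:
        dialog = Resource("dialog", log)
        stack.callback(dialog.release)  # runs on every exit path

        if cancelled:
            return "cancelled"  # early return instead of another nesting level

        options = Resource("options", log)
        stack.callback(options.release)
        return "ok"
```

Callbacks run in last-in, first-out order, mirroring the reverse order in which C++ destroys locals.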

reilly3000 · on Jan 28, 2020

DonHopkins · on Jan 28, 2020

Good code should be tightly wrapped around the problem, and reflect the shape of the problem itself.

But sometimes, the shape can be deceptive:

https://www.youtube.com/watch?v=CAUxhXIeSc8

mtts · on Jan 28, 2020

I review a lot of code for work and I agree with you that you can usually tell the quality of the code simply by looking at its outward appearance. Rough edges and bizarre shapes almost always indicate poor, unstructured thinking in the code itself.

Not saying I go by shape alone, but if it looks messy, that's a warning flag.

hinkley · on Jan 28, 2020

I used to push hard for my favorite indentation style, and at one point I figured out that a big reason I could find bugs so fast was that, given reasonable formatting, I could see code smells just from the shape.

If the bug wasn’t in the ugliest part of the code, then that code likely had two bugs in it.

mariojv · on Jan 28, 2020

I hadn’t thought about the shape of code much as a refactoring heuristic, but it’s neat to see a post about the concept of code shape.

I found examining code shape to be an effective method for assisting in algorithm recall in some college courses. Now working with larger codebases in industry, I find it really useful for navigating around large files or remembering where to go for particular snippets of code.

I wonder if this phenomenon has anything to do with different spatial tricks people use for memorization and recall (the “memory palace” technique). I’ve never thought of myself as having good spatial reasoning at all, but this kind of thing makes me question the boundary between “spatial” and symbolic / verbal thinking.

zdragnar · on Jan 28, 2020

My favorite name I've heard for a shape of code is the "pyramid of doom" caused by excessive nesting: think lots of ifs, or callback hell in something like Node.js.

It's easy to recognize, and the importance of recognizing it is right in the name.

quickthrower2 · on Jan 28, 2020

Luckily we have async await to save us!
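To illustrate the reply: with callbacks, each dependent step nests one level deeper; with `async`/`await`, the same steps read top to bottom. This is a Python `asyncio` analogue of the Node.js case; all the fetch functions are hypothetical stand-ins for I/O calls, not a real API.

```python
import asyncio


# Callback style: every dependent step adds a level of nesting.
def fetch_user_cb(user_id, callback):
    callback({"id": user_id, "name": "ada"})


def fetch_orders_cb(user, callback):
    callback([f"order-{user['id']}-1", f"order-{user['id']}-2"])


def report_cb(user_id, done):
    def on_user(user):
        def on_orders(orders):
            done(f"{user['name']}: {len(orders)} orders")

        fetch_orders_cb(user, on_orders)

    fetch_user_cb(user_id, on_user)


# async/await style: the same steps, flat.
async def fetch_user(user_id):
    await asyncio.sleep(0)  # pretend to do I/O
    return {"id": user_id, "name": "ada"}


async def fetch_orders(user):
    await asyncio.sleep(0)
    return [f"order-{user['id']}-1", f"order-{user['id']}-2"]


async def report(user_id):
    user = await fetch_user(user_id)
    orders = await fetch_orders(user)
    return f"{user['name']}: {len(orders)} orders"
```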

foxes · on Jan 28, 2020

For me I would think of the "shape of code" as an abstract measure of complexity [0]. More text might correlate with more complex logic. This is a measure in a language which manipulates state / control flow.

I'm not sure what happens in a functional language. Maybe you could just think about numbers of functions. In Haskell sometimes it helps to write things in a pointfree way and you can spot some more general function to replace some noise.

[0] https://en.wikipedia.org/wiki/Cyclomatic_complexity
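A rough Python sketch of the McCabe-style measure from [0]: start at 1 and add one per decision point found in the syntax tree. Which node types count as decision points is a simplification chosen here (comprehensions and `match` cases are ignored, for instance, and real tools differ in the details), so treat the numbers as approximate. It also sums over every function in the snippet rather than reporting per function.

```python
import ast

# Node types treated as decision points in this simplified count.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
                ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity of a snippet: 1 + decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
    return complexity
```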

Huggernaut · on Jan 28, 2020

Sandi Metz talked about something she does in her 2014 talk "All the Little Things" that she calls the "Squint Test". The idea is that you can tell a lot about code from only its shape and colors by squinting at it.

Someone made an Atom plugin: https://atom.io/packages/squint-test

hackinthebochs · on Jan 28, 2020

This is exactly why I hate the trend of reducing structural characters in new languages, or of making code almost all words (e.g. Python). There is much to understand about code from its shape, but blocks of words tend to have a much less obvious at-a-glance shape than similar code with structural characters thrown in.

lincpa · on Jan 28, 2020

Dataflow programming with a pure pipeline structure systematically simulates integrated circuit systems and large industrial production lines.

In the computer field, it realizes for the first time the unification of hardware engineering and software engineering at the level of the logical model.

It extends the unification of code and data at the Lisp language level to the unification of software and hardware at the system-engineering level.

It also brings the theory and methods of large-scale industrial production to software engineering, incorporating the IT industry into modern large-scale industrial production systems. This is an epoch-making, innovative theory and method.

This is the [Pure Function Pipeline Data Flow v3.0 with Warehouse/Workshop Model](https://github.com/linpengcheng/PurefunctionPipelineDataflow).

RickJWagner · on Jan 28, 2020

I'm a software maintenance engineer. I make my living reading other people's code and trying to internalize it so I can make repairs where they are needed.

This is an interesting article, and I intend to go listen to the podcast, too.

In the past I've made efforts to try to speed up the process of learning a codebase. I'd do things like copy the code into a text processor, then shrink the font so I could see which files (classes) were biggest, and look for repeating shapes like the author mentions. I'd also use code cleaners to point out troublesome classes and UML full-trip tools to try to get a good sequence diagram out of a piece of code.

It was all cumbersome, unfortunately. Most days I just use 'grep' to help me figure things out. I'm still looking for helpers, though. I'm hoping the podcast helps.

DarkCrusader2 · on Jan 28, 2020

This post very nicely summarizes what I now realize I have been doing subconsciously all along: writing in paragraphs with headings, turning short else branches into guards, refactoring "sharp saw teeth", etc. Good to know that others also use these heuristics.

0x445442 · on Jan 28, 2020

> ...understand legacy code, refactor long functions...

Wouldn't this invite consternation during code reviews because it doesn't have anything to do with the current feature? Or are you one of the lucky few who gets to exercise professional judgment when working?

l0b0 · on Jan 28, 2020

Handy patterns, which could probably fairly easily be implemented as a high-level linter.

crimsonalucard · on Jan 28, 2020

This is the surface shape of code.

There is an intrinsic and deeper geometry that textual programming hides.

gitgud · on Jan 28, 2020

High indentation has always been a good heuristic for high complexity.

More indentation equals more mental context required.

But although these shorter, less-indented functions are inherently easier to understand, they're not necessarily freer of bugs...
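gitgud's heuristic is easy to check mechanically. This is a minimal sketch that flags lines indented beyond a chosen depth; the four-space indentation unit and the default threshold of 3 are assumptions, not a standard, and tab-indented code is not handled.

```python
def deep_lines(source: str, max_depth: int = 3, indent_width: int = 4) -> list:
    """Return (line_number, depth) for lines nested deeper than max_depth."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped.strip():
            continue  # blank lines have no meaningful depth
        depth = (len(line) - len(stripped)) // indent_width
        if depth > max_depth:
            flagged.append((number, depth))
    return flagged
```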

crimsonalucard · on Jan 28, 2020

Functional code tends to extend horizontally a lot.

sinuhe69 · on Jan 28, 2020

Functional code would be totally different, I guess.

pmontra · on Jan 28, 2020

Probably. This is my experience with the case of the unbalanced if.

When it happens to me in Python or Ruby, I return immediately from the smaller branch and move the other branch out of the if, to the main level.

When it happens in Elixir... It doesn't happen because I don't use ifs there. I write two versions of the same function. One matches the condition for the then branch, the other the condition for the else one.