The Shapes of Code (fluentcpp.com)
144 points by jandeboevrie on Jan 28, 2020 | 68 comments



> ... not so good code: if it was, we wouldn’t need a comment...

How much longer will this rumor persist that "good code" doesn't need commenting? Good comments explain things that aren't immediately obvious from the code.


You left out the most important part of this criticism:

> ...to explain each line of code.

Commenting every line is a hint to a more pervasive problem. This section has nothing to do with "comments are bad".


Most code should be boring and immediately obvious. I don't run into much real-world code that needs a lot of explaining. Sometimes you do something weird or non-obvious for good reason, but I see plenty of comments explaining things that only need explaining because of poor design. There are also plenty of poor comments that don't explain anything that isn't obvious just by looking.


    def clean_start_year(x):
        # Account for the 7 year offset in the database
        return x + 7

    def process_record(record):
        record['start year'] = clean_start_year(record['start year'])
It's perfectly obvious what this code does, but without the comment it's totally unclear why it does what it does.


    database_year_offset = 7

    def fix_start_year(year):
        return year + database_year_offset

    def process_record(record):
        record['start year'] = fix_start_year(record['start year'])
In this version, with one fewer magic constant (and a renamed function), the comment would look very redundant.

DougBTX had the same idea at the same time. I don't think their version needs the comment either. EDIT: Dang, you removed a perfectly fine post :)


I guess, but I think it's more that my example fails to illustrate the point well.

Informative comments cannot always be "factored out" like this. Do you at least agree that a ticket or bug #, or a link to an issue in an issue tracker, would be appropriate?


I think a comment is appropriate if the problem cannot be factored out. But in my experience such things can be factored out most of the time, hence the argument that most code should be boring and immediately obvious.


Factoring out a comment into code sometimes loses information. I would argue that database_year_offset is not as informative as the original comment. Free form text can say things better than identifier names.


Most code, sure. But it's not an ideal to be striven for, or a metric to be used when evaluating code.


Ah, the inline equivalent of Doxygen-for-the-sake-of-Doxygen documentation.

    int frob(const int x, void *context)
    Performs a frob on x in the given context.
    Parameters:
      x - int param
      context - a pointer to context
    Returns an id of the frob performed.


I see a lot of this at work, sigh.

The better a variable or function is named (and the more narrowly its scope or purpose is limited to one thing), the fewer comments it needs. So there are a lot of self-explanatory variables or functions with Doxygen comments that state exactly what the name already says, adding nothing but noise. And then there are poorly named things where the same thing happens: the comment restates what the bad identifier states. Sigh. Genuinely useful comments are rather rare.


How is what I posted an example of this, in any way?

Use your imagination. The "cleaning" routine might not be a one-liner. Maybe it depends on multiple functions from another file/package/library.


The comment doesn't explain why there is an offset in the first place. Who put it there, and why? I think that should be explained as well.


The code might be obvious, but if you don't include the reason it exists, it becomes a target for removal. Having too many of those leads to either garbage or hard decisions when refactoring.

Writing the rationale is the most important comment you must not skip.

And there would be links to design documentation so that it can be kept up to date. (For non-programmers.)


Why it exists ought to be explained by the existence of a test.

This isn't always possible, but it's far more possible than many developers seem to think.
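A minimal sketch of "the test explains why", reusing the year-offset example from earlier in the thread. The thread never says where the offset comes from, so the rationale in the test's comment is an assumption; the point is that the test name and its comment carry the "why" next to an executable check.

```python
import unittest

database_year_offset = 7


def fix_start_year(year):
    return year + database_year_offset


def process_record(record):
    record['start year'] = fix_start_year(record['start year'])
    return record


class ProcessRecordTest(unittest.TestCase):
    def test_start_year_is_shifted_by_database_year_offset(self):
        # Assumed rationale (hypothetical): the source database stores
        # start years 7 years too early, so the offset is added back here.
        record = {'start year': 2013}
        self.assertEqual(process_record(record)['start year'], 2020)
```

Run it with `python -m unittest`. If someone "simplifies" the offset away, the failing test's name tells them what they just broke.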


That just makes two things that need to be commented doesn't it?

When I read code I care about what the code is supposed to do and why it's supposed to do that.

There is code that passes the current tests and is logically sound but no longer fits the business requirements. Knowing that it was implemented to solve a certain use case helps the reader / reviewer see whether the code and the test still fit or if they should be updated.

The best comments I've come across comment the intention and any context that isn't immediately obvious from the function.


Your test should explain what it's testing, via its name or some other mechanism. Otherwise, how do you know what's gone wrong when it fails?


That other mechanism is comments.


Does that explain why, though? It explains that it's there, and that someone obviously wanted it to be, but it still doesn't get at the why. I've found that most of the logic I'm left scratching my head at is due to business rules that there's no better way to deal with. Even if a test exists for it, I still can't figure out what the test's purpose was, other than to verify that it's working.


Just yesterday I wrote some code that was pretty boring, pretty readily understandable, and also pretty apparently a good target for an easy simplification. However that obvious simplification would also non-obviously introduce a bug that might not be noticed immediately. So I left a comment to the effect of "yeah these two lines look kinda dumb but we need it because X"

Could I have instead massively restructured the feature to prevent the possibility of the bug the comment was explaining the workaround for? Probably. But that would have increased the number of lines of code to make the change by a factor of something like 20x-200x and would have required far more testing. It also probably would have resulted in something overall more complicated.
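A hypothetical illustration of that kind of comment: the code is boring, the obvious simplification is subtly wrong, and the comment says why. The function name, `DEFAULT_DISCOUNT`, and the business rule are all invented for the example.

```python
DEFAULT_DISCOUNT = 0.1  # hypothetical default, applied when none is given


def apply_discount(price, discount=None):
    # These two lines look like they could be simplified to
    # `discount = discount or DEFAULT_DISCOUNT`, but a discount of 0 is a
    # deliberate, valid value ("no discount for this customer") and must
    # not silently fall back to the default.
    if discount is None:
        discount = DEFAULT_DISCOUNT
    return price * (1 - discount)
```

Without the comment, the `or` version is exactly the "easy simplification" a reviewer would suggest.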


I'm not sure whether I should envy or pity you. Writing code that is that obvious and boring cannot be very fulfilling. How long until a machine does that for you (after all, automating boring and obvious stuff is what we programmers do)? OTOH, on days where I feel lazy I can only wish my world were that simple.


Any monkey can puke thousands of lines of code. Producing code so boring that fellow humans can understand it immediately is the real challenge in programming. It requires much more effort and care than blaming everyone else for not understanding a sophisticated pile of mess.


Maybe it's because some people have learnt that they should comment but haven't learnt how to comment properly. And, as is often the case in this industry, the fact that some people do something badly leads other people to forbid it altogether.


We typically use comments to describe our client's business process. Even though the code itself looks straightforward what the client wants certainly isn't.


A rule of thumb I follow is that code is twice as hard to read as it is to write, so comments can of course help. On the other hand, comments that have to over-explain what the code is doing, or why it's doing it, also indicate that the code ought to be revised.

My process for writing code is to first write a proof of concept to get a feel for the problem domain; this could be considered a spike in agile parlance.

Then the primary and alternate flows can be coded against a basic test suite,

followed by a refactor and then a comments pass.

Depending on what comments arise, another refactor may be in order.

I typically try to follow the spirit of TRUE (Transparent, Reasonable, Usable, Exemplary) software.


> How much longer will this rumor persist that "good code" doesn't need commenting?

Indeed. I recently came across an article[0] which does a good job of debunking this idea, and describing different types of comments and when they can be valuable. Have a read if you're skeptical about comments.

[0] http://antirez.com/news/124


Like what? It can’t be the “how” - the code should be able to express that clearly. It can’t be the “what”, again that should be apparent from some combination of code and configuration / metadata.

So that leaves the “why”. To my mind, code comments are the worst place to record why a particular implementation exists. The why often needs collaboration with non-coders.


It's definitely the why for any unorthodox bits, such as shims, hacks, and workarounds.

The worst is finding a snippet from Stack Overflow or GitHub with no reference to the issue that explains why it's there, or at least a TODO: describing an improvement.


I disagree. The code is the best place to comment the why. Regardless of what is documented or written up in Jira or wherever, the code is what is true.

Commenting the purpose of the code and maybe a link to some documentation allows a future reader to see if the code still fulfills the requirements.


The comments are very often not true; that's why there's so much pushback.


So they should be updated the same as any other documentation. They're there to help. They don't need to be 100% infallible to be useful.


While the patterns themselves are interesting, the idea that all code must be rearranged into smaller functions to be “refactored” seems a little juvenile.


There’s an exercise for creative writing that my lit major friend told me about. You take everything you’ve written and cut it up into sentences. You just push the sentences around until something clicks and you figure out how to write your way out of the “stuckness”.

Refactoring has rediscovered this trick, among others.


Another similarity to creative writing is this: you're not done when there's nothing left to add, you're done when there's nothing left to remove.


The code that is easiest to revisit is decomposed into separate functions that are each trivial to understand. They are used to compose similarly trivial functions, and so on. Ideally every point of your program from input to output can be understood clearly and immediately.
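A minimal sketch of that style, using an invented task (lines of `name,score` text): each function is trivial on its own, and the top-level one reads as the list of steps.

```python
def parse_line(line):
    """Turn 'name, score' into a (name, score) tuple."""
    name, score = line.split(",")
    return name.strip(), int(score)


def is_passing(entry, threshold=50):
    """A score at or above the threshold passes (threshold is an assumption)."""
    return entry[1] >= threshold


def passing_names(lines):
    """Compose the trivial steps: skip blanks, parse, filter, keep names."""
    entries = (parse_line(line) for line in lines if line.strip())
    return [name for name, _ in filter(is_passing, entries)]
```

For example, `passing_names(["ann, 70", "bob, 40", "", "cy, 50"])` gives `["ann", "cy"]`, and each step can be tested in isolation.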


Makes the test a hell of a lot easier and more robust as well.


I can agree "code should be in small functions" by itself is a bad guideline. I think it's suggested due to other principles around making code easier to read / use.

It'd be better to ask if the code is more complex than it needs to be.

I liked the terms John Ousterhout uses. For some module of a program (e.g. a method, class, package, etc.), it has an interface and an implementation. The interface is "what you need to know to use the module", the implementation is "how it's done".

Complexity comes from dependencies (e.g. more interfaces you need to know about), and from obscurity (e.g. things you need to know about, which aren't obvious from the interfaces given).

In some cases, many small methods would be an increase in complexity overall. In some cases, one big method may be an increase in complexity.


In most cases one big method is an increase of complexity (and others will add to it over time). Large methods are usually a failure of imagination, like bad and rambling prose.

Code is a set of instructions for the computer, and for the next person who has to maintain it (“Programs must be written for people to read, and only incidentally for machines to execute,” as Abelson and Sussman put it).

Cooking instructions are split into steps. Assembly instructions are split into steps. LEGO instructions are split into steps. When drawing instructions aren’t split into steps the Internet turns them into a cultural phenomenon (Step 2: draw the rest of the fucking owl).

Only programmers think they are immune to this ridicule, smash all the steps together, and then get salty when others complain, ignoring a series of luminaries who beg in every format of media available, and for decades, to do otherwise.

Seriously, organize your code, separate the steps. You’re killing us.


No one disagrees that code needs to work in steps. The conversation is around how much and the answer is going to depend on what you're doing.

Why have a function that prints a whole line? Isn't that just smashing together a bunch of small steps to print one character at a time?


It seems antithetical to the intent of "Goto considered harmful":

> My second remark is that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

While Dijkstra later talks about using procedures to index into the call stack, needlessly adding to the depth of the call stack for single-use functions seems excessive.

With this logic, I actually think "The paragraphs with headers" is the way to go since as TFA states: "You know that the algorithm operates in steps, and you know where the steps are located in code."

http://www.u.arizona.edu/~rubinson/copyright_violations/Go_T...


Functions can be inlined by the compiler if it thinks that's an issue. I think it's more a question of readability: when a process involves a long series of steps, I sometimes like to be able to read them linearly rather than jumping back and forth between substeps. I can read just the headers if the code is organized into cleanly separated paragraphs (with braces if needed to indicate separate scopes, and no variables leaking between paragraphs).
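A sketch of the "paragraphs with headers" shape, on an invented order-summary task. Python has no block scope, so the "no leaking variables" rule can only be kept by convention here: each paragraph reads only the named result of the paragraph before it.

```python
def summarize_orders(orders):
    # --- 1. Keep only completed orders --------------------------------
    completed = [o for o in orders if o["status"] == "completed"]

    # --- 2. Total revenue per customer --------------------------------
    totals = {}
    for order in completed:
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0) + order["amount"]

    # --- 3. Rank customers by revenue, highest first ------------------
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    return ranked
```

Reading only the three header comments gives the algorithm; the bodies stay in order on the page, with no jumping to helper definitions.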



Since that blog seems to be down at the moment here's a cached version of the article/email.

https://webcache.googleusercontent.com/search?q=cache:http:/...


Absolutely this.

More functions => More complexity => Bad.

Also: decreased legibility, less opportunity for compiler optimisation.


I'm glad I'm not the only one who pays attention to this. This is completely unsubstantiated and coming from a relatively inexperienced programmer, but it's been my experience that better code tends to produce nicer shapes. An obvious example not mentioned in the article arises when code is nested too deeply and too thinly, creating jarring and disproportionate peaks.


Microsoft's docs have an interesting idea of what good nesting looks like:

https://docs.microsoft.com/en-us/previous-versions/windows/d...

Apparently 13 different indentation levels is good.


I actually find that surprisingly readable, probably because of the indent size which clearly allows the eye to see the blocks - and the size of my very large monitor - but this code seems to indicate a religious dogma about avoiding early returns, which would otherwise straighten it up.


Refactoring it effectively needs some sort of scope guard, too, to run `pfd->Release();` and the like. It's just as much dogma to write C++ like C, avoiding RAII.
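The linked sample is C++/COM, where RAII would be the tool. As a loose Python analogue of a scope guard, `contextlib.ExitStack` lets each acquisition register its own release, so early returns can replace the nesting. `Resource` and `acquire` are hypothetical stand-ins for objects that need an explicit `Release()`.

```python
from contextlib import ExitStack


class Resource:
    """Hypothetical stand-in for an object that needs an explicit release."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def release(self):
        self.log.append(f"release {self.name}")


def acquire(name, log, ok=True):
    # Returns None to simulate a failed acquisition.
    return Resource(name, log) if ok else None


def open_dialog(log, second_ok=True):
    with ExitStack() as cleanup:
        dialog = acquire("dialog", log)
        if dialog is None:
            return "no dialog"
        cleanup.callback(dialog.release)  # released however we leave

        options = acquire("options", log, ok=second_ok)
        if options is None:
            return "no options"  # early return: dialog is still released
        cleanup.callback(options.release)

        return "shown"  # callbacks run in reverse order on exit
```

The function stays at one indentation level no matter how many resources are acquired, which is what the deeply nested version was working around.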


wow.


Good code should be tightly wrapped around the problem, and reflect the shape of the problem itself.

But sometimes, the shape can be deceptive:

https://www.youtube.com/watch?v=CAUxhXIeSc8


I review a lot of code for work and I agree with you that you can usually tell the quality of the code simply by looking at its outward appearance. Rough edges and bizarre shapes almost always indicate poor, unstructured thinking in the code itself.

Not saying I go by shape alone, but if it looks messy, that's a warning flag.


I used to push hard for my favorite indentation style, and at one point I figured out that a big reason I could find bugs so fast was that, given reasonable formatting, I could see code smells just from the shape.

If the bug wasn’t in the ugliest part of the code, then that code likely had two bugs in it.


I hadn’t thought about the shape of code much as a refactoring heuristic, but it’s neat to see a post about the concept of code shape.

I found examining code shape to be an effective method for assisting in algorithm recall in some college courses. Now working with larger codebases in industry, I find it really useful for navigating around large files or remembering where to go for particular snippets of code.

I wonder if this phenomenon has anything to do with different spatial tricks people use for memorization and recall (the “memory palace” technique). I’ve never thought of myself as having good spatial reasoning at all, but this kind of thing makes me question the boundary between “spatial” and symbolic / verbal thinking.


My favorite name I've heard for a shape of code is the "pyramid of doom", caused by excessive nesting: think lots of ifs, or callback hell in something like Node.js.

It's easy to recognize, and the importance of recognizing it is right in the name.
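A small before/after of the nested-ifs pyramid, on an invented shipping check. The two functions behave identically; the second uses guard clauses so each failure exits early and the happy path ends up at the lowest indentation level.

```python
def ship_nested(order):
    # The pyramid: every check pushes the real work one level deeper.
    if order is not None:
        if order.get("paid"):
            if order.get("address"):
                if order.get("items"):
                    return "shipped"
                else:
                    return "empty order"
            else:
                return "missing address"
        else:
            return "unpaid"
    else:
        return "no order"


def ship_flat(order):
    # Guard clauses: handle each failure and leave immediately.
    if order is None:
        return "no order"
    if not order.get("paid"):
        return "unpaid"
    if not order.get("address"):
        return "missing address"
    if not order.get("items"):
        return "empty order"
    return "shipped"
```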


Luckily we have async await to save us!
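The same effect shown in Python rather than Node.js: with callbacks, each dependent step nests one level deeper, while `async`/`await` lets the dependency chain read top to bottom. The fetch functions are hypothetical, and `asyncio.sleep(0)` merely stands in for real I/O.

```python
import asyncio


# Callback style: each dependent step nests one level deeper.
def fetch_user_cb(user_id, callback):
    callback({"id": user_id, "name": f"user{user_id}"})


def fetch_orders_cb(user, callback):
    callback([f"{user['name']}-order{i}" for i in range(2)])


def report_cb(user_id, done):
    def on_user(user):
        def on_orders(orders):
            done(f"{user['name']}: {len(orders)} orders")
        fetch_orders_cb(user, on_orders)
    fetch_user_cb(user_id, on_user)


# async/await: the same chain, flat.
async def fetch_user(user_id):
    await asyncio.sleep(0)
    return {"id": user_id, "name": f"user{user_id}"}


async def fetch_orders(user):
    await asyncio.sleep(0)
    return [f"{user['name']}-order{i}" for i in range(2)]


async def report(user_id):
    user = await fetch_user(user_id)
    orders = await fetch_orders(user)
    return f"{user['name']}: {len(orders)} orders"
```

Usage: `asyncio.run(report(1))` returns the same string that `report_cb(1, print)` would print.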


I think of the "shape of code" as an abstract measure of complexity [0]. More text might correlate with more complex logic. That holds in a language that manipulates state and control flow.

I'm not sure what happens in a functional language. Maybe you could just think about numbers of functions. In Haskell sometimes it helps to write things in a pointfree way and you can spot some more general function to replace some noise.

[0] https://en.wikipedia.org/wiki/Cyclomatic_complexity
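A rough sketch of turning that metric into a number for Python source with the standard `ast` module. This is a simplified approximation of McCabe's measure (1 plus the number of decision points), not a full implementation: it counts the whole snippet rather than per function and ignores comprehension `if`s and `match` cases.

```python
import ast

# Node types that each add one independent path (simplified McCabe count).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate complexity: 1 + number of decision points in source."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two short-circuit branches.
            complexity += len(node.values) - 1
    return complexity
```

An `elif` is parsed as a nested `If`, so it is counted too. Dedicated tools such as radon or mccabe implement the metric more completely.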


Sandi Metz talked about a thing she does in her 2014 All the Little Things talk that she calls the "Squint Test". The idea being that you can tell a lot about the code from only its shape and colours by squinting at it.

Someone made an Atom plugin: https://atom.io/packages/squint-test


This is exactly why I hate the trend toward reducing structural characters in new languages, or making code almost all words (e.g. Python). There is much to understand about code from its shape, but blocks of words tend to have a much less obvious at-a-glance shape than similar code with structural characters thrown in.


Data-flow programming with a pure pipeline structure systematically simulates integrated circuit systems and large industrial production lines.

In the computer field, it achieves for the first time the unification of hardware engineering and software engineering at the level of the logical model.

It extends Lisp's language-level unification of code and data to a system-engineering-level unification of software and hardware.

It also brings the theory and methods of large-scale industrial production to software engineering, incorporating the IT industry into modern large-scale industrial production systems. This is an epoch-making, innovative theory and method.

This is the [Pure Function Pipeline Data Flow v3.0 with Warehouse/Workshop Model](https://github.com/linpengcheng/PurefunctionPipelineDataflow).


I'm a software maintenance engineer. I make my living reading other people's code and trying to internalize it so I can make repairs where they are needed.

This is an interesting article, and I intend to go listen to the podcast, too.

In the past I've made efforts to speed up the process of learning a codebase. I'd do things like copy the code into a text processor, then shrink the font so I could see which files (classes) were biggest, and look for repeating shapes like the ones the author mentions. I'd also use code cleaners to point out troublesome classes, and UML round-trip tools to try to get a good sequence diagram out of a piece of code.

It was all cumbersome, unfortunately. Most days I just use 'grep' to help me figure things out. I'm still looking for helpers, though. I'm hoping the podcast helps.


This post very nicely summarizes what I now realize I have been doing subconsciously all the time: writing in paragraphs with headings, turning short else branches into guards, refactoring "sharp saw teeth", etc. Good to know that others also use these heuristics.


> ...understand legacy code, refactor long functions...

Wouldn't this invite consternation during code reviews because it doesn't have anything to do with the current feature? Or are you one of the lucky few who gets to exercise professional judgment when working?


Handy patterns, which could probably fairly easily be implemented as a high-level linter.


This is the surface shape of code.

There is an intrinsic and deeper geometry that textual programming hides.


High indentation has always been a good heuristic for high complexity.

More indentation equals more mental context required.

But although these shorter, less-indented functions are inherently easier to understand, they're not necessarily freer of bugs...
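A minimal sketch of the "high-level linter" idea from upthread, built on the indentation heuristic: measure each line's nesting depth from its leading whitespace and flag lines past a threshold. The 4-space indent width and the `max_depth=3` threshold are arbitrary assumptions, and this reads only the surface shape, not the syntax.

```python
def indentation_depths(source, indent_width=4):
    """Return (line_number, depth) for every non-blank line."""
    depths = []
    for number, line in enumerate(source.splitlines(), start=1):
        expanded = line.expandtabs(indent_width)
        if not expanded.strip():
            continue  # blank lines carry no shape information
        leading = len(expanded) - len(expanded.lstrip(" "))
        depths.append((number, leading // indent_width))
    return depths


def flag_deep_lines(source, max_depth=3, indent_width=4):
    """Lines nested deeper than max_depth: a crude 'too much context' flag."""
    return [(number, depth)
            for number, depth in indentation_depths(source, indent_width)
            if depth > max_depth]
```

Plotting the depths per line gives a rough picture of the code's shape; a tall, thin spike is the "pyramid of doom" mentioned above.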


Functional code tends to extend horizontally a lot.


Functional code would be totally different, I guess.


Probably. This is my experience with the case of the unbalanced if.

When it happens to me in Python or Ruby, I return immediately from the smaller branch and move the other branch out of the if, to the main level.

When it happens in Elixir... It doesn't happen because I don't use ifs there. I write two versions of the same function. One matches the condition for the then branch, the other the condition for the else one.
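The Python/Ruby refactor described above, sketched on an invented pricing function (the field names and rates are hypothetical). Elixir's multi-clause functions have no direct Python equivalent, so only the early-return version is shown.

```python
def price_before(item):
    # Unbalanced if: a long "then" branch and a one-line "else".
    if item["in_stock"]:
        base = item["price"]
        if item.get("member"):
            base *= 0.9
        tax = base * 0.2
        return round(base + tax, 2)
    else:
        return None


def price_after(item):
    # Return immediately from the small branch; the main logic moves
    # out of the `if` to the function's top level.
    if not item["in_stock"]:
        return None
    base = item["price"]
    if item.get("member"):
        base *= 0.9
    tax = base * 0.2
    return round(base + tax, 2)
```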



