
Another quote of his:

"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)

Stated a different way:

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -Linus Torvalds




While I agree with both quotes separately, I don't think that they necessarily mean the same thing.

To me, the first (by Brooks) seems to be about grasping the domain model to understand what the system does (or can do) in general.

Whereas the second (by Torvalds) seems to be about how best to organize data in code for efficient processing: array, hash, tree, heap, etc., and their associated access time complexity. The efficiency of your solution depends on your choice of a data structure that fits the local problem.
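For instance, a toy sketch of that "fits the local problem" point (not from either quote, just an illustration of access complexity):

```python
import timeit

# Membership test: O(n) scan in a list vs. O(1) average lookup in a set.
items_list = list(range(100_000))
items_set = set(items_list)
needle = 99_999  # worst case for the list scan: last element

list_time = timeit.timeit(lambda: needle in items_list, number=100)
set_time = timeit.timeit(lambda: needle in items_set, number=100)
# The hash-based structure wins for lookups by orders of magnitude.
```

Same data, same question, wildly different cost depending on the structure chosen.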


What if the table includes a poorly labeled column entitled “fiat@“?


Underrated comment that would have been perfect if you said "labled"


The Linus quote is ripe for misinterpretation. Not worrying about the code can lead to an unreadable mess that one's future self or others will hate working with. So a really good programmer will probably rather go the Sussman way and realize that programs are firstly meant to be read by people and only lastly meant to be run by a computer (paraphrasing here).


> The Linus quote

I always attributed it to Rob Pike, but it turns out Pike's is the following:

> Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.

> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

https://users.ece.utexas.edu/~adnan/pike.html

Interestingly enough, the above link has this to say:

> Rule 5 was previously stated by Fred Brooks in The Mythical Man-Month.

Which I guess references GP's excerpt.

Also says this, which kinda loops back to Linus's way of saying it:

> Rule 5 is often shortened to "write stupid code that uses smart objects".


Designing good data structures is very important for code readability too. If you struggle to understand how a given data structure is used / what it contains, it will be hard to understand the rest of the code.


There's a lot of code out there that would improve massively with better data structures though.

I mean I've waded through tons of code where the original author abused strings to indicate relationships in a table - a column with semicolon-separated values referencing other tables / rows. And a ton of code to check references.

Mind you, that's more database design than data structures, but they're close enough for this example.
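To make the anti-pattern concrete, here's a hypothetical sketch (invented schema, sqlite in memory) of semicolon-packed references versus a proper join table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Anti-pattern: relationships encoded as a semicolon-separated string.
conn.execute("CREATE TABLE article_bad (id INTEGER PRIMARY KEY, tag_ids TEXT)")
conn.execute("INSERT INTO article_bad VALUES (1, '10;20;30')")

# Finding articles with tag 20 means string parsing in application code:
row = conn.execute("SELECT tag_ids FROM article_bad WHERE id = 1").fetchone()
has_tag = "20" in row[0].split(";")  # fragile, and the database can't index it

# Normalized alternative: a join table the database can index and query.
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE article_tag (article_id INTEGER, tag_id INTEGER)")
conn.execute("INSERT INTO article VALUES (1)")
conn.executemany("INSERT INTO article_tag VALUES (?, ?)",
                 [(1, 10), (1, 20), (1, 30)])

found = conn.execute(
    "SELECT article_id FROM article_tag WHERE tag_id = 20"
).fetchone()
```

With the join table, "which rows reference X" becomes a plain indexed query instead of a pile of reference-checking code.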


In reality though the tables are going to be like: Column named “price_usd_incl_tax” is neither in usd nor does it include all taxes.


Don't worry, the code is self-documenting. /s

Self-documenting code (which I take in practice to mean a "no comments" culture) is something I don't understand how it can work; I've never seen a good implementation of it. It _can_ be successful in describing the _what_, but it describes the _why_ poorly or not at all. Perhaps I'm in the wrong domain for that, though.


In practice it really does mean self documenting code.

Like variables called "daysSinceDocumentLastUpdated" instead of "days". The why comes from reading a sequence of such well-described symbols, laid out in an easy-to-follow way.

It doesn't do away with comments, but it reduces them to strange situations, which in turn provides refactoring targets.

Tbh, its major benefit is that comments get stale or don't get updated, because they aren't kept in line by test suites and compilers.

Most comments I come across in legacy code simply don't mean anything to me or any coworkers, and often cause more confusion. So they just get deleted anyway.


In most cases, even with verbose variable names, you still can't understand the why just by reading the code. And even if you could, why would one want to?

Most often I'm just skimming through, and actual descriptions are much better than having to read the code itself.

This whole notion of "documentation can get out of sync with the code, so it's better not to write it at all" is so nonsensical.

Why isn't the solution simply "let's update the docs when we update the code"? Is this so unfathomably hard to do?


> This whole notion of "documentation can get out of sync with the code, so it's better not to write it at all" is so nonsensical.

To me, this feels similar to finding the correct granularity of unit tests or tests in general. Too many tests coupled too tightly to the implementation are a real pain. You end up doing a change several times in such a situation - once to the actual code, and then 2-3 more times to tests looking at the code way too closely.

And comments start to feel similar. Comments can have a scope that's way too close to the code, rendering them very volatile and oftentimes neglected. You know, these kinds of comments that eventually escalate into "player.level += 3 // refund 1 player level after error". These are bad comments.

But on the other hand, some comments are covering more stable ground or rather more stable truths. For example, even if we're splitting up our ansible task files a lot, you still easily end up with several pages of tasks because it's just verbose. By now, I very much enjoy having a couple of three to five line boxes just stating "Service Installation", "Config Facts Generation", "Config Deployment", each showing that 3-5 following tasks are part of a section. And that's fairly stable, the config deployment isn't suddenly going to end up being something different.

Or, similarly, we tend to have headers to these task files explaining the idiosyncratic behaviors of a service ansible has to work around to get things to work. Again, these are pretty stable - the service has been weird for years, so without a major rework, it will most likely stay weird. These comments largely get extended over time as we learn more about the system, instead of growing out of date.


> Comments can have a scope that's way too close to the code, rendering them very volatile and oftentimes neglected.

I think this is a well put and nuanced insight. Thank you.

This is really what the dev community should be discussing; the "type" of comments and docs to add and the shape thereof. Not a poorly informed endless debate whether it should be there in the first place.


> To me, this feels similar to finding the correct granularity of unit tests or tests in general.

I recently had an interview with what struck me as a pretty bizarre question about testing.

The setup was that you, the interviewee, are given a toy project where a recent commit has broken unrelated functionality. The database has a "videos" table which includes a column for an affiliated "user email"; there's also a "users" table with an "email" column. There's an API where you can ask for an enhanced video record that includes all the user data from the user with the email address noted in the "videos" entry, as opposed to just the email.

This API broke with the recent commit, because the new functionality fetches video data from somewhere external and adds it to the database without checking whether the email address in the external data belongs to any existing user. And as it happens, it doesn't.

With the problem established, the interviewer pointed out that there was a unit test associated with the bad commit, and it was passing, which seemed like a problem. How could we ensure that this problem didn't reoccur in some later commit?

I said "we should normalize the database so that the video record contains a user ID rather than directly containing the user's email address."

"OK, that's one way. But how could we write a test to make sure this doesn't happen?"

---

I still find this weird. The problem is that the database is in an inconsistent state. That could be caused by anything. If we attempt to restore from backup (for whatever reason), and our botched restore puts the database in an inconsistent state, why would we want that to show up as a failing unit test in the frontend test suite? In that scenario, what did the frontend do wrong? How many different database inconsistencies do we want the frontend test suite to check for?
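To sketch what I meant by normalizing (hypothetical schema, sqlite standing in for the real database): with a foreign key, the database itself rejects the orphan reference, so no frontend test has to know about it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite requires this opt-in

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("""
    CREATE TABLE videos (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    )
""")

conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO videos (id, user_id) VALUES (1, 1)")  # fine

# Importing external video data with an unknown user now fails at the
# database layer, instead of silently breaking the join later.
try:
    conn.execute("INSERT INTO videos (id, user_id) VALUES (2, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The constraint turns "inconsistent state someone might notice" into "write that cannot happen".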


That makes no sense to me either. In my book, tests in a software project are largely responsible for checking that desired functionality exists, most often to stop later changes from breaking that functionality. For example, if you're in the process of moving the "user_email" from the video entity to an embedded user entity, a couple of useful tests could ensure that the email appears in the UI regardless of whether it's in `video.user_email` or in `video.user.email`.

Though, interestingly enough, I have built a test that could have caught similar problems back when we switched databases from mysql to postgresql. It would fire up a mysql based database with an integration test dump, extract and transform the data with an internal tool similar to pgloader, push it into a postgres in a container. After all of that, it would run the integration tests of our app against both databases and flag if the tests failed differently on both databases. And we have similar tests for our automated backup restores.

But that's quite far away from a unit test of a frontend application. At least I think so.


> With the problem established, the interviewer pointed out that there was a unit test associated with the bad commit, and it was passing, which seemed like a problem. How could we ensure that this problem didn't reoccur in some later commit?

It would seem that the unit test itself should be replaced with something else, or removed altogether, in addition to whatever structural changes you put in place. If you changed db constraints, I could see, maybe, a test that verifies the constraints work to prevent the previous data flow from being accepted at all - failing with an expected exception or similar. But that may not be what they were wanting to hear?


> This whole notion of "documentation can get out of sync with the code, so it's better not to write it at all" is so nonsensical.

I do believe that in a lot of cases, outdated, wrong, or plain erroneous documentation does more harm than no documentation. And while the correct solution is obviously "update the doc when we update the code", it has been empirically proven not to work across a range of projects.


What 'has' been proven then? No comments or docs? Long variable and method names?

I just had a semi-interview the other day, and was talking with someone about the docs and testing stuff I've done in the past. One of the biggest 'lessons' I picked up, after having adopted doc/testing as "part of the process" was... test/doc hygiene. It wasn't always that stuff was 'out of date', but even just realizing that "hey, we don't use XYZ anymore - let's remove it and the tests", or "let's spend some time revisiting the docs and tests and cull or consolidate stuff now that we know about the problem". Test optimization, or doc optimization, perhaps. It was always something I had to fight for time for, or... 'sneak' it in to commits. Someone reviewing would inevitably question a PR with "why are you changing all this unrelated stuff - the ticket says FOO, not FOO and BAR and BAZ".

Getting 'permission' to keep tests and docs current/relevant was, itself, somewhat of a challenge. It was exacerbated by people who themselves weren't writing tests or code, meaning more 'drift' was introduced between existing code/tests and reality. But blocking someone's PR because it had no tests or docs was "being negative", but blocking my PR because I included 'unnecessary doc changes' was somehow valid.


But arguments around "is this so hard?", or resolution stripping like "so don't write documents at all", are more about superiority signalling, aimed at individualistic benefit.

The fact is that, when you zoom out to org level, comments do quickly drift out of sync and value, and so engineering managers must encourage code writing that will maintain integrity over time, regardless of what people "should" be able to do.


The argument isn’t that it’s better to not write it at all, it’s that it’s not worth the effort when you could have done something else. Opportunity cost and all that.


People are lazy.


Lazy people work the hardest. It's an up front investment for a big payoff later when you can grok your code in scannable blocks instead of having to read a dozen lines and pause to contemplate what they mean, then juggle them in your memory with other blocks until you find the block you're looking for.

Comments allow for a high-level view of your code, and people who don't value that probably on average have a slower overall output.


What you write in your first para is so self evidently true, at least to me.

I simply cannot comprehend the mindset that views comments as unnecessary. Or worse, removes existing useful comments in some quest for "self-documenting" purity.

I've worked in some truly huge codebases (40m LOC, 20k commits a week, 4k devs) so I think I have a pretty good idea of what's easy vs hard in understanding unfamiliar code.


As the late Chesterton said, "Don't ever take a fence down until you know the reason why it was put up."

A lot of people think comments are descriptive rather than prescriptive. They think a comment is the equivalent of writing "Fence" on a plaque and nailing it to the fence. "It's a fence," they say, "You don't need a sign to know that."

Later, when the next property owner discovers the fence, they are stumped. What the hell was this put here for? A prescriptive comment might have said, "This was erected to keep out chupacabras," answering not what it is, but why.

You might know about the chupacabras, but if you don't pass it on then you clearly don't care about who has to inherit your property.


> Lazy people work the hardest.

What's amazingly funny is that many people think this is a positive, because they ascribe more value to working hard than to achieving results. I even thought your comment was going to go that way when I first read it.


Better: a "last_updated" method on instances of "Document", returning an "Age" instance with a "days" method: document.last_updated.days

Self-describing code does not need theRidiculouslyLongNamesPreferredByJavaCoders.
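Something like this, in Python (illustrative names only, not from any real codebase):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Age:
    days: int


@dataclass
class Document:
    updated_on: date

    @property
    def last_updated(self) -> Age:
        # Small objects with plain names let call sites read like prose.
        return Age(days=(date.today() - self.updated_on).days)


doc = Document(updated_on=date.today())
doc.last_updated.days  # reads as prose, no long camelCase name needed
```

The call site carries the meaning, so no single identifier has to.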


Yes, was just an illustrative example


The why should be clear from the domain that you're working within. A line of comment should count as something like 10 lines of code; if you're reading a comment, then you're treading into real complexity. If you're in a code base where that isn't true, then is the comment really necessary?

Fairly hot take from me, life is more ambiguous than that :-).


> The why should be clear from the domain that you're working within

Sometimes the 'why' is purely domain knowledge. Sometimes the 'why' is about narrowing down options available in the domain. Sometimes the 'why' is about a choice made for reasons that aren't specific to the domain. And sometimes the 'why' is about the code that wasn't written, so it can't possibly be in the code that was.


> sometimes the 'why' is about the code that wasn't written, so it can't possibly be in the code that was.

I have often had to write extensive comments related to this to prevent well meaning coders who are not expert in the domain from replacing the apparently bad or low performance code with an obvious but wrong 'improvement'.


In a perfect world, tests and assertions would protect from that, but yes, that's a good use of comments.


"Sometimes" doing all the lifting there.

Comments are supplemental. If you have just added some weird, non-obvious, bit of code because you needed to compromise, or work around some other quirk, go ahead and comment. No one is going to (sanely) object to that.


What you describe is how I tend to comment. At the opposite end of the spectrum we have Knuth's 'literate programming', exemplified in Tex, which has as its goal 'making programs more robust, more portable, more easily maintained, and arguably more fun' [0] by merging documentation with code. I'd bet if you counted documentation lines vs. code lines in Tex they'd be near 50/50, and I'd bet that if we asked Knuth whether the comment lines were supplemental he'd say no.

[0] https://www-cs-faculty.stanford.edu/~knuth/lp.html


Grokking that 'why' can take non-trivial mental effort by a non-author, even when well coded/documented. Worse, if the code is needlessly complex, or trying to be smart, or over-engineered, no amount of commenting will help. The non-author (maintainer) of the code is now burdened. And if (as so commonly happens) they dismiss the original code as 'non-performant' or 'not a best practice' or something else... we know how that plays out.


> The why should be clear from the domain that you're working within.

I hear this commonly from coders who haven't had the ambiguous pleasure of working with old, production critical codebases from generations of coders who have come and gone, with technical decisions buffeted around by ever-shifting organizational political and budgeting winds. Knowing the why's that leadership cares about is far more important to your career than the technical why's, which are along for the ride.

Once you go into production with tens of thousands of users and up, with SLA's driven by how many commas of money going up in smoke per minute...yeah, illusions of "pure" domain knowledge driving understanding of function dictating code form evaporate like a drop of distilled water on the Death Valley desert hardpan in the middle of summer.

I used to be like that as well years ago, but some kind greybeards who took me under their wings slapped that out of me.

Now my personal hobby code with an "unlimited" budget and I'm the sole producer and consumer? Yep, far closer to this Platonic ideal where comments are terse and sparse, and the code is tightly coupled to the domain.


Code is almost never self-documenting. That's why there are so many O'Reilly books out there.

A great example is AWK: It's a tool, and it comes with a book from the people who made the software. That's how I like my software.


To your point, we also have 'The C Programming Language', K&R, and 'The Unix Programming Environment', K&Pike.


Seems like the common denominator is the K here!


Or the Emacs book.


Unless you are in some trivial startup domain, real domains (TM) have almost fractal-level complexities if you dig deep enough, corner cases, sometimes illogical rules etc.

The "why" is still very much needed, since it can have 10 different and even conflicting reasons, and putting it in the code in the appropriate amount shows a certain type of professional maturity and also emotional intelligence/empathy towards the rest of the team.

I mean, somebody has to be extremely junior to never have experienced trying to grok somebody else's old non-trivial code in a situation where you need to deliver a fix/change on it now. And it's fairly trivial to write very complex code even in a few lines, which some smart inexperienced juniors (even older, but total-skill-wise still juniors) produce as some sort of intellectual game out of boredom.


And even more important than the 'why' can be the 'why not'? Ie explanations for implementation choices that haven't been taken for various reasons.


> I mean, somebody has to be extremely junior to never have experienced trying to grok somebody else's old non-trivial code

People are definitely capable of looking at someone else's code and saying "this crap is completely unreadable, we should rewrite it all", while at the same time believing that their own code is perfectly readable and self-documenting.


I’m not a “no comments” maximalist but someone has to be pretty junior to have never experienced a comment that is just completely incorrect.

It’s really hard to write a good comment that is only “why”. It’s really hard to keep comments up to date as code is moved and refactored. And an incorrect comment is much more damaging than no comment at all.

That’s the driving force behind “self documenting” code. My view is that a comment is sometimes necessary but it is almost always a sign that the code is weak.


> It’s really hard to keep comments up to date as code is moved and refactored.

Hard disagree with this.

If your comment is so volatile then that really sounds like there's something architecturally wrong with the code.

Most of the time these kinds of "comments" can be turned into either a test, or an extensive description that goes into version control.

Because commit messages are just that: a comment for a specific moment in time. There are lots of options to inline comments.


Which is why I find looking at `git blame` (or one's favorite IDE's/SCM's equivalent) output so very useful in case of undercommented code.


> It’s really hard to keep comments up to date as code is moved and refactored

I agree with this, but if the explanation for logic has good reason to be there, then keeping comments up-to-date with code changes is very important and it goes back to seniority and empathy I mentioned earlier - if you understand why its there in the first place, and you actually like rest of your team, you are doing too all of you a big favor with updates of comments.

Each of us has different threshold for when some text explanation should be provided, which is source of these discussions. But again back to empathy, not everybody is at your coding level, you can save a lot of new joiner's time (and maybe a production bug or two) if they can quickly understand some complex part.


Developers, especially new ones, do not understand or know all the history of the project.

I remember one time in CSS I had to do something weird like min-width: 0; it was needed to force the CSS engine to apply some other rule correctly, but this will puzzle you when you read it. And this kind of puzzling code needs comments. I prefer to just put the ticket ID there, and the ticket should contain the details of what the weird bug was, so if some clever dev wants to remove the weird code they can understand the background.

Sometimes I see code in our old project like "if webkit do X, else do Y", with no comment or bug link, so I have no idea if this code is still needed or not (browsers still differ in more complex stuff, like contenteditable).


I don't like such rules of thumb.

A better approach would be that a comment should tell you something that you cannot glean from the code and/or is non-obvious. Yes, I understand "non-obvious" can have a truck driven through it, but in general it should work.

You can read code and understand what it's doing mechanically, but you may not understand why the obvious approach wasn't taken or understand what it's trying to achieve in the larger context. Feel free to comment on those, but if the code is difficult to understand mechanically, the code is generally bad. Not always, everything has exceptions, but generally that's true.


Documentation without accurate and descriptive method/member names is much more harmful than the inverse. If an abstraction is sufficiently complex to warrant a lengthy description of why it exists, then it should have a design doc. In practice, most code within a repo is pretty simple in what it accomplishes and if it's confusing to a reader, then it is most likely because they don't understand the design of the larger component or system or simply because the implementation is poor. There are of course cases where comments are really useful or even necessary (e.g. if going against best practices for a good reason or introducing an optimization that is for all intents and purposes unreadable without explanation), but they are exceptions.


I like that term. When I hear it I can with 100% accuracy know the person touting it is a hack and their code is garbage.


The dream of self-documenting code requires solving two problems, only one of which programmers are typically good at.

1) Communicating with computers

2) Communicating with other humans

Self-documenting code is essentially writing prose. Granted, to someone with similar knowledge as you.

But most people suck at writing.


I have better hope that a good programmer can write readable code, than that they will write readable documentation. As you point out, people suck at writing.


I would remark here that The Mythical Man-Month did give a page or two to documentation. My copy seems to be out on loan, but as I recall the section included a figure showing the documentation for a sort function, perhaps 25 lines or so.


> My copy seems to be out on loan,

Drifting off-topic, but I wonder how close to the top of the list TMMM is for "on loan" duty cycle in the software world. My copy also seems to be persistently in someone else's hands.


If I remember correctly, Brooks's experience was with assembler, which might require some more documentation than modern Java or Python.


I think that the example was in PL/1.


At my company code is required to be self-documenting. My attitude is that if you can't determine the why then you likely are not familiar enough with the problem domain to be working with that code. It's fine not to be familiar with the domain and there are ways to address that, but reading source code is not one of them.


So you bar all junior developers from writing code until they've gone through tested coursework in your domain, or what?


Yes absolutely. All developers, junior and senior, go through a 4 month training program working on a completely independent project from scratch that teaches them everything they need to work in their domain. There are exceptions now and then, but for the most part it's pretty consistent.

When a developer wants to switch from one area to another, they go through an accelerated program (takes only about a month).


I've seen lots of documentation that I only understood after I understood the code.


In other words: poor documentation.


It used to mean that, but a programmer changed the meaning. They could:

(a) rename the column, be the guy who broke the system and spend all weekend trying to fix 6 systems he never knew existed, written in Access, Excel, Crystal Reports, VB6, Delphi and a CMD file on a contractors laptop.

(b) keep the column name, deliver the feature, go home.


I really hope you are joking 100%, because

(b) Go home and be happily oblivious that six other systems silently started to produce wrong results since the meaning of the column has changed. But, of course, that is someone else’s problem, some other day, when several months of reports and forecasts have to be recalled and updated.


We prefer option (c): add a new table/column with a similar-looking name. Then a few years later, start wondering why there are two almost identical entities, and why one of them behaves weirder than the other.


There's no point in keeping the same name so that a system can keep running, if the data meaning has changed.

Bad programmers chose (b). Good programmers choose (a). Better programmers refuse the change request.


If the data meaning changed, though, wouldn't those 6 systems break too?


It might not be math-related; it could be something as simple as a requests table being named to indicate that API X was used, and now it uses API Y, and there is some reporting on that somewhere that doesn't care which API was used.

Ideally the table would have been named more generically but in an earlier stage startup there will be mistakes in naming things no matter how hard you try to avoid that.

So the only thing that actually breaks here is that a small number of engineers that care about this might misinterpret what it means unless they learn the appropriate tribal knowledge. Ideally it gets fixed but if you look at all of the things that can be improved in an early stage startup, this kind of thing is pretty minimal in importance so it becomes tech debt.


Anyway, where were we? Oh yeah - RIP Fred Brooks.


That's why you should also have comments, and maintain them. Then if the name is hard to change, you can at least document the new semantics.


To be clear I am pointing out incentives to make a worse choice, rather than the better choice.


Which is exactly what Linus' quote is about.


In that case the code will probably also be difficult to read.

In my experience, studying the database schema and IO data structures is indeed the best way to begin understanding a complex system.


This is only in the reality where nobody is shown the table.

Showing things acts as a forcing function to fix the thing being shown


I can just barely count the times I have seen a production failure due to someone assuming a millicent value from a column with ambiguous naming was in centicents or vice-versa.


At least you can grep that column name in the source code to find out where other taxes are calculated. Of course an ORM can further complicate this.


This hits home. Not only in databases, but also in code.



