I like to use "big-endian" naming molds (love that term!) to define sets of names that, when you alphabetize them, place related variables next to each other (e.g. in a completion menu or browser).
For example, left_foo and right_foo are little-endian, since the least significant word comes first, so they'll be a long distance away from each other in an alphabetized list.
But foo_left and foo_right are big-endian, since foo is more significant than left or right. So they will appear one after the other in an alphabetized list.
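To make the sorting claim concrete, here is a minimal sketch. The name lists and the `sorted_distance` helper are made up for illustration; the helper just measures how far apart two names land after a plain alphabetical sort.

```python
# Hypothetical variable names, used only to illustrate the sort order.
little_endian = ["left_foo", "right_foo", "left_bar", "right_bar", "middle_baz"]
big_endian = ["foo_left", "foo_right", "bar_left", "bar_right", "baz_middle"]


def sorted_distance(names, a, b):
    """Return how many positions apart a and b are in the alphabetized list."""
    ordered = sorted(names)
    return abs(ordered.index(a) - ordered.index(b))


print(sorted(little_endian))
print(sorted(big_endian))
print(sorted_distance(little_endian, "left_foo", "right_foo"))
print(sorted_distance(big_endian, "foo_left", "foo_right"))
```

In the big-endian list the two foo names end up adjacent; in the little-endian list the other left_ and right_ names wedge themselves in between.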
Common suffix words are _x _y _z or _min _max, or _left _right _top _bottom, or even singletons like _enabled _loaded _error etc.
But when you combine multiple dimensions together in names, you need to think of which dimensions are more significant, based on how the variables are used, so use foo_x_min foo_x_max, if the positions are important, or foo_min_x foo_min_y, if the ranges are more important.
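A rough sketch of what "which dimension is more significant" means in practice: the word you put second decides which names cluster together. The `group_by_leading_words` helper is hypothetical, written only to show the clustering.

```python
def group_by_leading_words(names, depth):
    """Group alphabetized names by their first `depth` underscore-separated words."""
    groups = {}
    for name in sorted(names):
        prefix = "_".join(name.split("_")[:depth])
        groups.setdefault(prefix, []).append(name)
    return groups


# Positions are more significant: the x names stay together, then the y names.
position_first = ["foo_x_min", "foo_x_max", "foo_y_min", "foo_y_max"]
# Ranges are more significant: the max names stay together, then the min names.
range_first = ["foo_min_x", "foo_min_y", "foo_max_x", "foo_max_y"]

print(group_by_leading_words(position_first, 2))
print(group_by_leading_words(range_first, 2))
```

Neither grouping is right in general; it depends on which pairs you want to see side by side when you read or complete the names.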
Sometimes it's hard to decide or ambiguous, so just try to be predictable and the same as all the other code. Think of which variables should appear closest to each other in an alphabetical list.
And avoid middle-endian or random-endian (or sentence-grammar-order-endian) like the plague. A variable name should probably not be a grammatically correct sentence.
Another really annoying linguistic naming smell is "smurfing," where all of class Smurf's instance variables have smurf_ prefixes. Or where all the classes, methods, or instance variables have an "xyz_" prefix where "xyz" is the name of the project or library. Arrgh!!!
I really like this concept but I find it a bit frustrating that the name for the naming convention doesn’t follow its own convention. Shouldn’t it be called “endian-big”? ;)
Also, from an LTR standpoint, why is it big-endian when the left is not the end but the start? So it should be big-startian or, according to you, startian-big.
There's an interesting question that arises when you say "when you alphabetize them place related variables next to each other".
Let's say you have some non-trivial class that includes, among others, some 2d rectangular data: An x, y, width, and height. They're all related, but they don't naturally occur near each other without a little massaging:
coordX, coordY, sizeWidth, sizeHeight?
xMin, xMax, yMin, yMax?
coordXMin, coordXMax, coordYMin, coordYMax?
I generally agree with your sentiment, but there's a reason "naming things" is one of the hardest problems in computer science :)
I'm not quite able to verbalize exactly why, but when I see the set of { "coord", "x", "min" }, it sounds to me like the most intuitive way to put it would be "x_coord_min", but this seems to violate the rule that GP gave, since "coord" here seems to be "greater" than "x" given that "x" is an answer to "what kind of coordinate?". The best explanation I can come up with is that "x coordinate" feels like a coherent logical unit and that splitting it would make it more work for me to parse as a reader, and then "min" follows that because "min_x_coord" sounds like it would be something like the "minimum x coordinate" for a given window or something. I wish I could come up with some consistent universal rule for how to order these things, but I can't really come up with any other process to describe how to get what's the most intuitive other than "look at all of them and see what sounds right". I guess it's not unreasonable to say that ordering three "words" is fairly easy to brute force looking at all of them, and beyond that it's probably worth reconsidering the naming (and perhaps scoping) of the variables you need to disambiguate, but it's not nearly as satisfying as having some sort of objective rule.
"Smurfing" and "big-endian" are the same thing though!
IMO a big alphabetical list of everything in your project is not a useful or important thing. Use a language that has good support for hierarchical namespaces, and use them.
I think big-endian naming was useful for programming with editors that supported tab completion. At one point, the suggestions were only displayed alphabetically. Nowadays, editors use a more sophisticated algorithm (is there a name for it? Fuzzy search, perhaps?) that suggests words containing the sequence of characters already typed anywhere within it.
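For what it's worth, here is a small sketch of the difference. It assumes the "sophisticated algorithm" is plain subsequence matching, which is only one of several ways editors do fuzzy completion; real editors also rank the results, which this ignores. The function names and the sample names are made up.

```python
def prefix_matches(query, names):
    """Old-style completion: only names that start with what was typed."""
    return sorted(name for name in names if name.startswith(query))


def is_subsequence(query, name):
    """True if the characters of query appear in name in order, gaps allowed."""
    remaining = iter(name)
    return all(ch in remaining for ch in query)


def fuzzy_matches(query, names):
    """Fuzzy-style completion: any name containing the typed characters in order."""
    return sorted(name for name in names if is_subsequence(query, name))


names = ["left_foo", "right_foo", "foo_left", "foo_right", "bar_left"]
print(prefix_matches("foo", names))
print(fuzzy_matches("foo", names))
```

With prefix completion only the big-endian names show up when you type foo; with subsequence matching the little-endian ones are found too, which is why the ordering matters less than it used to for completion.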
Loved how short and to the point it was.
If you don’t have time to watch, the idea is it’s incredibly rare for two devs to come up with the same names for vars. To increase the odds of coming up with the same names for vars, you should agree on naming conventions (name molds) as a team. Sounds obvious, but great science is often confirmations or denials of the obvious.
Patterns in the way code looks in general are invaluable for parsing code quickly especially in areas that you're somewhat familiar with. You can discard/ignore big chunks of code very quickly and go straight to where you think the relevant part is if they look as you'd expect at a glance. If they don't, it's sort of like a cache miss. "What the hell why don't people autoformat their goddamn files before saving" and then read those bits of code just to make sure they're not hiding any surprises, before formatting them properly.
It's the difference between taking, say, 2 seconds to read a method, and 10 or more.
I can only assume people who don't treat code formatting as a rule read every.single.thing.line.by.line.every.time.
I find Rails' conventions are very good around this, for example datetime fields end with _at and dates end with _on. This way you end up with variable names like published_at or published_on depending on if you care about the time or not. It sounds so natural.
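As a sketch of why the suffix is useful, here is the same idea transplanted to Python (not Rails code; the `SUFFIX_TYPES` table and both helpers are invented for illustration). Once the suffix predicts the type, a mismatch can even be checked mechanically.

```python
from datetime import date, datetime

# Assumed mapping, following the _at / _on convention described above.
SUFFIX_TYPES = {"_at": datetime, "_on": date}


def expected_type(field_name):
    """Return the type the suffix promises, or None if the name makes no promise."""
    for suffix, kind in SUFFIX_TYPES.items():
        if field_name.endswith(suffix):
            return kind
    return None


def mismatched_fields(record):
    """List the fields whose value breaks the suffix convention."""
    bad = []
    for name, value in record.items():
        kind = expected_type(name)
        # datetime is a subclass of date, so compare exact types.
        if kind is not None and type(value) is not kind:
            bad.append(name)
    return bad


good = {"title": "Hello", "published_at": datetime(2020, 1, 1, 12, 0), "archived_on": date(2021, 6, 30)}
bad = {"published_on": datetime(2020, 1, 1, 12, 0)}
print(mismatched_fields(good))
print(mismatched_fields(bad))
```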
The idea of using ? to end a variable name for booleans is great too.
It's the opposite of cognitive load because you can glance at a name and know what it is without knowing more about it. If the implementer of a linguistic named function does bad things to break the expected behavior then you shouldn't blame the method -- that's a user error.
Personally I find consistent names more important for CLI tools, kubectl's CLI is good in this department for being consistent. You can predict how each command works by knowing the pattern. They went with a "verb noun" style. I don't think one is necessarily better than the other but being consistent does help for CLIs because you often need to recall what to run by memory, CTRL+r history or running the command incorrectly to get a help menu on what you can run. However a code editor gives you a lot more help with auto-complete or buffer-complete for function or variable names.
For naming things in programming, I'm not 100% convinced a hard pattern based standard makes sense because naming is very subtle, sometimes you want the emphasis on the "thing" or an emphasis on the "action" depending on the context -- basically which one is more important for that specific instance.
For her open question of "what would you name a variable for storing the maximum number of orders per month", that's an incomplete question. What's the context behind it? Is this variable defined as a constant somewhere? What other functions are in that module or class? How do you plan to use the variable? Will it be used in more than 1 spot? Is it part of a library that third party folks can use or limited to 1 code base? Will there be other similar variables, such as getting weekly or yearly orders?
My name is Don, so every time I see a column called "createdon" I think it's a boolean flag that you can set true to create me. I wish the db designer would use snake case instead of mashing all the words together. But then again, I keep my ssh key in a file called donkey.pem.
The "big-endian naming mould" suggests naming it orders_per_month_max, since orders is the object (most significant), per_month is a count of orders (secondary significance), and max is a constraint of the order per month count (least significant).
Then you can use other parallel names in the same big-endian pattern, like orders_per_month orders_per_year orders_per_year_max orders_per_second_min refunds_per_year_average etc, and they will all sort next to their closely related names, instead of the "inline max" or "prefix max" scrambling the alphabetical order.
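One way to picture the mold is as a tiny function that always assembles the parts in the same most-significant-first order. This is only a sketch; `mold_name` and its parameters are hypothetical, not something from the talk.

```python
def mold_name(subject, per=None, constraint=None):
    """Build a name as subject, then optional per-unit, then optional constraint."""
    parts = [subject]
    if per is not None:
        parts += ["per", per]
    if constraint is not None:
        parts.append(constraint)
    return "_".join(parts)


names = [
    mold_name("refunds", "year", "average"),
    mold_name("orders", "year", "max"),
    mold_name("orders", "month"),
    mold_name("orders", "year"),
    mold_name("orders", "month", "max"),
]

for name in sorted(names):
    print(name)
```

Because every name comes out of the same mold, each _max variant sorts directly after the count it constrains.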
These are the sorts of 'style guides' that we need. I started boycotting 'style' meetings at new companies ages ago because it always turned into a bunch of people using up all of their time, energy, and social capital arguing about where the curly brackets go and how whitespace should be handled. These are things a machine can do for you. We shouldn't be wasting our breath on them.
As far as 'consistent' names go, there are multiple dimensions of sameness. Using the same word for all instances of the same concept, not using the same word for other concepts, using consistent pluralization. Using same adjective/adverb/gerund form for related concepts. You are telegraphing sameness in these cases, and difference in others.
We have tried things similar to what you describe before, we just have dialed it in wrong. New-ish, good ideas often fall prey to bad execution. Hungarian notation, for instance, dictates that the variable name stays the same when the sense of the data changes, but is supposed to change when the implementation details shift. Which is exactly the opposite of what we want. If I fix a Y2K bug or a 2038 bug in due_on, I'm going to end up with a slightly different structure, but the deadline it represents is still 12 midnight. And if it's not, well, maybe we need a different convention for calendar day versus business day deadlines.
I agree that a machine can enforce conventions, but it can’t decide them for you. The primary point of conventions (IMO) is to aid the reading of code. If one style is more readable than another, that should be chosen.
If you think opening braces should go at the end of the line and I (incorrectly ;) ) think they should go on a line by themselves, our team style guide should probably pick exactly one of those. That’s a human choice after human debate, followed by machine enforcement.
Very interesting video. I'm convinced that this is a very under-explored area of software engineering and that proper naming is at least 50% of developer productivity. Often it doesn't matter how well-structured a code base is, if the function and variable names are nonsensical the code will still be very hard to read.
I once worked in a place in the 1990s that took it to such an extreme that every table name, column name, and variable name had to be approved by a naming standards committee before it could go into production. IIRC the committee met once a month, maybe twice? Which was not ideal for the developers but changes only went to production once a month during a "change window" anyway.
Naming conventions can help with code readability, but don't let the process become more important than the goals.
To use her example. I would have chosen ordersPerMonthMax. Which would probably sort alphabetically nicely with ordersPerDayMin and ordersPerYearAverage.
Now that I know "name-mold" would be a good query, I might find something better than the Spanish name-mold.
         | Wide Scope  | Narrow Scope
---------+-------------+-------------
Function | Short Name  | Long Name
---------+-------------+-------------
Variable | Long Name   | Short Name
---------+-------------+-------------
I can’t quite explain why this works
I'll take a shot...
The general principle uniting all 4 quadrants of the table is: "Use names just long enough to be clear, but not longer."
Here's an illuminating exception to the heuristics: The use of the very short global "DB" for database.
We are really trying to balance two competing goals:
1. Brevity -- Don't explain what I already know. You mention this in relation to a tight loop variable: "I bet you didn’t need me to explain dL stood for Drivers License. It might have even annoyed you if I had spelled it out."
2. Clarity -- Don't confuse me. Don't make me look something up to figure it out.
Maximize brevity while retaining clarity.
Clarity is related to frequency of use. This relates to your comment: "How come the jQuery constructor feels much more natural than the native version? document.querySelectorAll('#appContainer')". It is annoying because we use it all the time... we don't need or want a verbose description.
If the thing is used everywhere, and especially if it is a general convention, assume familiarity. Sure, someone might be confused by "DB" the first time they ever see it, but it will quickly become part of their lexicon and remain so through repeated exposure. However, the same cannot be said for "CGTAO" as a stand in for "cudaGetTextureAlignmentOffset". In that case, the long form is what I want.
We handle these principles effortlessly with our use of "he" vs "John" vs "John Smith" vs "the John Smith you went to high school with" but for some reason have trouble with them when writing code.
It's all about managing entropy. Less surprising means you can use fewer letters. More surprising means more letters. That shouldn't be too controversial.
The part where it gets tricky is when a concept is widely used, but is a complex concept. What if you have dozens of calls to cudaGetTextureAlignmentOffset in a function, and hundreds in a codebase? Heck, even CUDA is an acronym, Compute Unified Device Architecture.
There's a similar complication when you have several of these big names with slight differences. Made up example, say you also had cudaGetTextureAccessKey, cudaGetVectorAlignmentOffset, etc. I actually find these sometimes worse than the initialisms, as my eyes skip over these long names. The acronyms (CGTAO, CGTAK, CGVAO) have a higher ratio of different letters to total length. But then obviously the abbreviations are very opaque.
> It's all about managing entropy. Less surprising means you can use fewer letters.
I like this framing.
> The part where it gets tricky is when a concept is widely used, but is a complex concept. What if you have dozens of calls to cudaGetTextureAlignmentOffset in a function, and hundreds in a codebase?
You have to predict the knowledge of the developers (current and future) working on the system, and let that guide what you can assume. This is necessarily an art and you'll miss the mark sometimes. One approach is to always be overly verbose, but this is too simple: it destroys readability when practiced without restraint.
> There's a similar complication when you have several of these big names with slight differences. Made up example, say you also had cudaGetTextureAccessKey, cudaGetVectorAlignmentOffset, etc.
One technique here is to introduce a namespacing object/module/<whatever you call it in your language>. So something like "cuda.textureAccessKey", "cuda.vectorAlignmentOffset", etc. Sometimes repetition of a long name is the least of all evils, sometimes it's not.
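A minimal sketch of that in Python, using `types.SimpleNamespace` as the namespacing object. The three functions are placeholders with made-up return values, not real CUDA calls, and the snake_case spellings are just the Python rendering of the names above.

```python
from types import SimpleNamespace


# Placeholder implementations: these are NOT real CUDA API calls.
def _texture_alignment_offset():
    return 0


def _texture_access_key():
    return "texture-key"


def _vector_alignment_offset():
    return 0


# The shared "cuda" and "get" parts live in the namespace, not in every name.
cuda = SimpleNamespace(
    texture_alignment_offset=_texture_alignment_offset,
    texture_access_key=_texture_access_key,
    vector_alignment_offset=_vector_alignment_offset,
)

print(cuda.texture_access_key())
print(sorted(vars(cuda)))
```

The part of each name that differs now starts right after the dot, so the near-duplicates are a bit easier to tell apart at a glance.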
Agreed. As with all rules there are always exceptions. I say this applies to concepts that are ubiquitous across unrelated codebases. id, repo, min, max, and enum are some that come to mind. Otherwise, all business domain terms should always be spelled out in full in the same way people refer to them in speech (ie, their ubiquitous language). So the only time acronyms are ok here is if that is how people talk about the particular term in every day speech (like "sku" instead of "stock keeping unit").
Like most things, it's a double edged sword. I haven't worked with world-class developers so most of my experience is dealing with people who would benefit immensely from any linguistic practice.
If you think someone comes up with bad names, wait till they have to write a few sentences, or paragraphs.
Sometimes I create variable names like "runProcessAsync" (instead of "asynchronouslyRunProcess"), "setIsActive" (instead of "setActive"), and even use shorthand vs non-shorthand (e.g. "src" vs "source") in different contexts.
It abuses the English language but makes code much easier for me to read. Most of the time I don't even realize I'm doing it.
But does it make the code easier for others to read? The first 2 steps I think so, and I've seen them in other projects. The last one probably not, and I try to avoid it and use more descriptive names (like "srcPath" and "srcData") when I spot myself making it.
I strongly agree. There's only one correct way to spell a word, but many different possible abbreviations. The hard part is remembering just WHICH letters to leave out, not typing the letters.
I have a terrible habit of mixing camelCase with snake_case. I'll start out using snake_case because I find it slightly more readable, but then use some library that has camelCase methods, and before I know it, it's all a bit of a hodge-podge. (Or is that hodge_Podge?)
It seems to me that ideally, if a variable name is so predictable that you can name it by rules, that’d be an opportunity for the language to not require a name.
But in practice $0 and Haskell’s point-free styles can be annoying to read, so maybe what I want is the IDE to insert obvious names.
You still have to say what of the many obvious things you are using here.
Point-free syntax has a different kind of namelessness, where if you have a single thing, you don't have to name it. And the $0 is really a limitation of the language; nobody ever thought it was a good thing.
While there is evidence that Hungarian notation + camelCase is better for token usage (e.g. variables), underscores are better for the readability of other kinds of things, like unit test names or filenames. Because humans tend to take shortcuts, it becomes camelCase for everything, including other inappropriate attributes, out of laziness, which is aggravating. It's too bad that distinction has not been properly subjected to rigor yet.
You can use Hungarian notation with underscores as well so not sure what you're getting at here. Sounds like you're talking about camelCase vs snake_case.
s_CALLBACK_CUR_INSERT Lexical_Bindings_For_XmTextVerifyCallbackStruct XtPointer call_data XLTYPE_CALLBACKOBJ /* How long can this go on???? */ Set_Call_Data_For_XmTextVerifyCallbackStruct Wcb_Meta_Callbackproc XmAnyCallbackStruct doit newInsert XmCR_MODIFYING_TEXT_VALUE cdr car ep /* do nothing for most cases... */ Cvt_XmRXmString_to_LVAL GetValues_Union Resource_Instance WINTERP_MOTIF_111 XmStrings XtGetValues XtPointer_value cv_xmstring XmBulletinBoard XmNdialogTitle XmNnoMatchString XmNlabelString XmNtitleString XmRowColumn XtGetValues XmStringFree /* This is so totally ridiculous: there's NO WAY to tell Motif that any button can select a menu item. Only one button can have that honor. */ /* If this function looks like it does a lot more work than it needs to, you're right. Blame the Motif scrollbar for not being smart about updating its appearance. */ xm_update_scrollbar widget_instance Widget widget scrollbar_values pane_maximum widget_sliderSize new_sliderSize h_water l_water XtVaGetValues XmNheight XmNpaneMaximum XmNsliderSize widget_sliderSize XmNrefigureMode maximum minimum INT_MAX percent XmScrollBarSetValues ARMANDACTIVATE_KLUDGE DND_KLUDGE *dialog*button1.accelerators:#override Ctrl<KeyPress>m: ArmAndActivate() /* sets the parent window to 0 to fool Motif into not generating a grab */ USE_MOTIF xlw_unmunge_class_resize XlwMenuResize