On DRY and the cost of wrongful abstractions (thereignn.ghost.io)
223 points by nkurz on Sept 18, 2016 | 165 comments



I am firmly in the DRY camp, with a few very important caveats:

1- Do not generalize so you can factor out like code too early. The first time, just write it. The second time, write it again, even if it's only minimally different. The third time is a judgement call. By the fourth time you see the proper abstraction, something that might have been much different had you refactored early on.

Corollary to #1- Don't run ahead of yourself making all these placeholder functions, interfaces, and abstract classes with the expectation that you would have had to refactor this way later. Unless you've done versions of your project a dozen times (and even then, sometimes I don't do it), you are most likely going to be wrong.

2- Don't refactor out single lines unless there's a lot going on on that line or you have the experience to know (don't lie to yourself either) that this will be an important and growing line.

3- Sometimes performance can trump DRY. And sometimes you do know this ahead of time, especially if performance is correctness, as it is on some projects with very tight constraints.

Besides that, I freaking hate seeing similar code everywhere, blowing up my instruction cache and making changes difficult. Please stop it.


In my experience, there's a costly dimension to DRY that doesn't get enough love: dependencies.

DRY incurs a significant dependency cost. This has been painful for me, since I work on distributed systems a lot. When you have an ardent follower of "DRY" who refactors a common utility so that many subsystems of a distributed system have a code dependency on the same piece of code, then changing that piece of code can become very costly. When that cost gets too high (e.g. requires coordinated deployments and lots of cat herding with owners of other systems), then the DRY has just painted you into a corner where the cost of changing your code is higher than it's worth, and you end up with a rotting code base because the cost of change is too high.

Please, please factor that cost into your decision to wield DRY in distributed/service-oriented systems!

Code dependencies have diminishing returns in increasingly distributed systems...try to find a way to encode the common functionality as data instead of code and you'll be way better off.


I work on mostly distributed systems too and that's why I'm in the DRY camp (but only after the 2nd iteration kind of thing -- mostly). That's where the experience pays off.

I think DRY gets taken to extremes, and you get that one-line function that does nothing meaningful except rename append, or something equally trivial (rename append, then append a space).

I like small code bases. I like only having to make one bug fix. I like being able to rewrite large chunks of code (like entire protocols). And DRY lets that happen. But I also like readability, and I realize that can fight against DRY.

I'm a pragmatist at heart. And coding is really a craft. So all these maxims have their use, but it just takes experience to know when to use and break them. But overall I see DRY as one of the more vital ones.


In that case, can't you then just un-DRY (hydrate?) the code by branching it for each subsystem? Then each team can keep its own pace, and still cherry-pick improvements from each other, which would be difficult to do if they were simply pieces of code copied all around the codebase.


The opposite of DRY is WET (Write Everything Twice) ;)


Yes and no. Once a half dozen components take a common dependency, that dependency tends to grow large and complex. Where each component needed a simple string templating class (for example), the aggregate dependency ends up doing too much and is overengineered. It has multiple layers of abstraction to make the various string manipulation classes "look the same" and therefore justify DRY refactoring.

So yes, you can just "rehydrate" the dependency into each component, but you often end up needing a fair bit of work for each component to untangle the mess and create the minimal dependency that each component actually needs. If you just want to copy the dependency directly, then sure, that's typically pretty easy. But that's also typically a pretty terrible outcome as now you have tons of duplicated code and modifying the dependency implies duplicate work, and worse, duplicate pointless work as you must maintain the pieces your component doesn't even use.


Yup. DRY is often sold as "everyone gets to use it!" but in reality becomes "everyone HAS to use it".

I like to think of this as: DRY (Don't Repeat Yourself) needs to be weighed against DRY (Don't Refactor... Yet).

This can be mitigated somewhat by liberal and thoughtful use of SemVer or equivalent.

(There is also, of course, the somewhat tongue-in-cheek WET: Write Everything Twice.)


> When you have an ardent follower of "DRY" who refactors a common utility so that many subsystems of a distributed system have a code dependency on the same piece of code, then changing that piece of code can become very costly.

Is this a piece of code that actually requires co-ordinating all nodes so they're running the same version of the code? Otherwise I'm not sure I see the problem, since you can just run old versions of the shared code in parallel with new versions.


If the utility requires all the subsystems, then doesn't the dependency exist no matter how the code looks?

Of course you can refactor to reduce dependency scope at a code level (the essence of functional programming), but if in order to do X you need to access W, Y, and Z, then I don't see how code changes will affect this deeper truth.


Then something is wrong with your DRY. Otherwise it is not clear why that dependency is costly and why changing it is so painful. Sounds like the code was moved out but is still tightly coupled.


> Don't refactor out single lines unless there's a lot going on on that line or you have the experience to know (don't lie to yourself either) that this will be an important and growing line.

On the contrary, you might want to factor out single tokens, if they are likely to change together.

You can view concerns over magic numbers as a special case of this. If you have "10" in 10 places, you probably want it in a constant with a name. But perhaps you want it in several constants with different names - does the "10" here represent the same piece of knowledge as the "10" there?

There's no reason the same doesn't apply if we want to perform the same operation in 10 places. Those that represent the same piece of knowledge ("this is how we render a text box", "we have capacity for 4 widgets", "we compute the total price of a basket like this") should likely be consolidated. Those that represent different pieces of knowledge should probably not.
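
As a made-up sketch of that distinction (the names and values here are invented):

    MAX_WIDGETS = 10   # "we have capacity for 10 widgets" -- one piece of knowledge
    RETRY_LIMIT = 10   # "retry a failed request 10 times" -- a different piece of knowledge

    # Both happen to be 10 today, but folding them into one shared constant
    # would couple two unrelated decisions: raising the widget capacity
    # should not change how many times we retry.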


This is another good point. Often times you want to make sure that "magic numbers" are replaced by well named constants or configurable values, even if they're only used once. Most importantly to make sure to differentiate them. It becomes less obvious what those numbers do when they're just numbers, and it becomes much less obvious that two things using the same number in close code proximity just have the same values by coincidence.

This can be true for logic as well. If you have a very specific conditional, it can help to put it into a function so that its purpose is clear, especially if it's a piece of business logic that might change. Interestingly, this can be "anti-DRY" as well, because you might have the same logic duplicated in different places, but the fact that they are the same is a coincidence, and you want them to change depending on their own requirements.
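
A rough illustration of both halves of that (hypothetical rules, in Python):

    # Naming the condition states the business rule at the call site:
    def is_eligible_for_free_shipping(order):
        return order.total >= 50 and not order.contains_oversized_items

    # Elsewhere, a superficially identical check may exist for a different
    # reason; keeping it separate lets it change on its own schedule:
    def requires_fraud_review(order):
        return order.total >= 50 and not order.contains_oversized_items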


> Interestingly this can be "anti-DRY"

Which is funny, because as originally coined, it is exactly "DRY".

"Every piece of knowledge must have a single, unambiguous, authoritative representation within a system".

https://en.wikipedia.org/wiki/Don't_repeat_yourself


Your "Corollary to #1" is YAGNI. I agree with 1. My advice to my teams for #1 is identical.

The complaint about readability in TFA is inexcusable. If I do not have a system for visualizing (or otherwise comprehending) the complexity of my system, then there's not much hope. Dependency graphs and "Find references" in my IDE help. If the only tool in the toolbox is looking at the code that fits in a vim console, then I'm going to have problems writing large applications. I am not that kind of genius. I have met and worked with that kind of genius (i.e. holding all the code in their head) and it turned out that around 1998 codebases hit a size where even those geniuses hit a wall, with project-ending consequences.


I am interested in what tools you would suggest for said visualizing/comprehending.


I agree with all three of your caveats. But I want to emphasise and re-phrase #1:

Premature abstraction is as bad as premature optimisation.

In fact their effect on the code base is rather similar: complexity and obscurity in the name of some theory about a future benefit.


I agree. One aspect that catches me out sometimes is that often the first iteration of something is in such obvious need of abstraction that it's hard to let it go for the time being. The reason to do so, of course, is that you don't know what the proper abstraction is yet, but that doesn't make it pleasant.


Point 1 is how I do it. I think it balances the trade off between too-early abstraction and too-much duplication.

The only thing I'd add is that you should think about including some comments when you duplicate code so that people realise how the different bits of code relate to each other. If you've thought of a potential way to abstract the code, but didn't proceed with it for some reason, you can add a note describing how it could be done. This helps new people understand why they're seeing duplication and how they can factor the code if they choose to.


I disagree with your point 2 very strongly. A lot of people have a mental block when it comes to 1-line functions, and they really shouldn't, they can often be very helpful for a lot of reasons.

Generally, the most useful tool in refactoring is to use extract method relentlessly, even into 1-line functions. More often than not, a well named function with well named parameters is one of the absolute best and most reliable ways to make your code as close to "self-documenting" as possible. But there are additional advantages as well.

Spinning things off into function calls often makes refactoring easier, because it discourages bad habits like local variable reuse (using variables as scratch pads) and excess mutable state. It also encourages a more functional style, which has a lot of benefits especially in multi-threading but also in the ability to reason about and determine the correctness of code. It can also make it a lot easier to see when you need to spin off functionality into other classes/modules. For example, if you have a member field and you have a ton of little one line snippets everywhere that anything happens with that field, that usually means you need to extract it out into a class and have those one-line snippets be various methods on that class. When all that code is inline, this can be harder to see.

Additionally, there are lots of mental blocks that happen when you have inline "one liners". There's a bias to avoid "cluttering them up", even when it's necessary. For example, when you need to add error checking and error handling. When you've spun out the code into a method call, you tend not to have that block, you just add the error handling code because it's trivial, and going from a 1-line method to a 5-line method isn't a big deal. But bloating up another function by taking a sweet one-liner and replacing it with 5-lines of code can seem like clutter, so often times you just have an aversion to doing so. Additionally, sometimes those "sweet one-liners" can be more readable when they're broken up into multiple lines, and this is again something that is trivial and straightforward when it's spun out into its own method but will see resistance when the code is inline. This is true even if the code remains a single statement broken up into multiple lines to enhance readability.

A lot of the time these patterns aren't obvious. Because the end result of spinning off one-liners is often not one-line method definitions, it's short methods that are a couple lines with several lines of extra error detection/handling. Or it's separate utility classes to encapsulate handling of special member fields.

The idea of ensuring that you always have "meaty" functions is unhelpful, what's important is thinking about things from a perspective of abstraction, modularity, readability, and correctness.
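
A small sketch of the "one-liner that grows" point (hypothetical function, in Python):

    import json

    # Starts out as a one-line extraction whose name documents the intent:
    def load_user_settings(path):
        return json.load(open(path))

    # Later it grows error handling without cluttering any caller:
    def load_user_settings(path):
        try:
            with open(path) as f:
                return json.load(f)
        except (OSError, json.JSONDecodeError):
            return {}  # fall back to default settings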


If you have a one-line function that is used in one place (or even a few places), it doesn't mean it is easier to understand just because you made it look like English. More likely you just obfuscated it: update_string_with_value(string,val) is NOT easier to understand than string.append(val).

I really and truly hate code where people think they need to take the programming out of programming. This goes all the way back to when they learned it was a good idea to comment their code "next = ++i // increment index then assign".

> Spinning things off into function calls often makes refactoring easier.

You're breaking a rule in keeping things simple. Don't anticipate. You are refactoring ahead of the refactoring, most likely making the wrong decision in the process.

And you're also coding defensively against your fellow developers. This tends to be a problem in low- to average-skill workplaces. Instead, why not just follow checkins and show your junior developers what they are doing wrong? That is how I learned to program from some amazing developers. It wasn't from them coding defensively against me, but from the tap on the shoulder I got asking me why I did something on my last checkin.

And I'm not advocating "meaty" functions. They should be divided into logical pieces of functionality that aren't so small that every line qualifies, but not so large as to span more than a page or do more than a single thing. Before they even hit the page mark, other reasons will most likely dominate anyway, like the single responsibility principle or something else.

You and I might just come from different worlds.


>is NOT easier to understand than string.append(val).

But this depends entirely on the context of the code. I like to think of this in terms of "abstraction levels". Mixing abstraction levels harms readability and so if your code is doing some low level operation then string.append(val) should be obvious from the context and the name of the function it belongs in. If however the code is still at an abstract enough level then you want to wrap your implementation detail in a name that is appropriate for that level of abstraction.

As a rough example, in a function that's popping from some data source and pushing into a local structure as a part of some larger unit of work, wrapping your collection.append(val) would be harming readability. But if you're "enrolling" a "student" into a "class" you should be doing class.enroll(student) even if the implementation is just class.students.append(student_id)
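
Something like this, in a hypothetical Python version of that example:

    class SchoolClass:
        def __init__(self):
            self.student_ids = []

        def enroll(self, student):
            # Today this is just an append, but callers read (and keep
            # reading) at the level of "enroll a student in a class".
            self.student_ids.append(student.id)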


> As a rough example, in a function that's popping from some data source and pushing into a local structure as a part of some larger unit of work, wrapping your collection.append(val) would be harming readability. But if you're "enrolling" a "student" into a "class" you should be doing class.enroll(student) even if the implementation is just class.students.append(student_id)

After calling class.enrol(student), I would expect the student to be enrolled, not just added to a list of students.


> After calling class.enrol(student), I would expect the student to be enrolled, not just added to a list of students.

Why do they have to be different? That's really his point. In this situation, enrolling just requires adding their student id to the list of students in the class. But while doing 'class.students.append(student_id)' accomplishes that, it is not clear reading that line that it actually enrolls the student in the class, rather than just being one step of a larger process, or something else altogether.


If they are not different then you're halfway to going full Active Record (which I'd consider an anti-pattern). There is likely much more to enrolling a student than adding them to a class, and most of that logic should not go in the class.


You are masterfully missing the point.


"is NOT easier to undestand than string.append(val)."

Disagree 100%. But of course, as with everything else in our industry, this is super subjective. I'd much rather have a line of combinators which reads like English than a collection of function calls with wacky parameter lists. YMMV on this though, and I understand that.

I also want tiny functions and those functions to be moved far away. I want the "english" names as close to the Real Work as possible. If I really care what the activity is I can use the IDE to drill in. But again, I understand YMMV on this one.

People who argue this topic as if there's a ground truth are the true problems. Everyone has their own take.


> "People who argue this topic as if there's a ground truth are the true problems. Everyone has they're own take."

Exactly. It's like design patterns, algorithms, and data structures. There are lots of techniques useful in programming. The trick is having the experience and judgment to know when to use the right one.


I feel like you're arguing against something else. It should be obvious that I'm not saying "ALWAYS USE 1-LINE FUNCTIONS". My point is that a phobia against small functions is both extremely common and typically harmful to code quality, for all the reasons I described.

If it were easy enough to give someone a simple checklist of guidelines and have them produce high quality code, coding would be a lot easier, but it doesn't work that way. Coding well requires years of experience that produces the wisdom and perspective to know when and to use different techniques. It's what tells you "it's unnecessary to comment an i++ line", it's what tells you when and how to use one data structure over another, and so on.

And no, I'm not "refactoring ahead of the refactoring", I'm refactoring where it makes sense, even down to a single line. That's the point. To properly compose things, and to avoid the bias against small functions. Not to atomize a program into infinite function calls, of course.

Most of the code I've come across that is below the quality I'd like tends to have excessively large function definitions, precisely because there is such a widespread antipathy against splitting things up "for no reason" and against small functions. When it improves readability, when it improves the composition and modularity of the system, when it makes it easier to update, modify, improve logging and error handling, etc. those are all good reasons for extracting methods. The rare times I've seen code that required excessive jumping around to figure out what anything did, it wasn't because of too many small methods, it was because of overzealous application of bad design patterns (e.g. call a factory method to produce a configuration object, then instantiate some other factory and pass the configuration object to it to get a factory that produces what you want, that kind of nonsense). That's indicative of a failure of judgment.

My point is, don't be afraid of small functions when they make sense. There are too many people who are afraid of small functions because, like you, they have these strong biases against them based on prejudice. How do you know when a small function is a good idea? Judgment and experience, of course, same as always.


"Made it look like English" probably isn't an improvement.

"Made it so a future modification is made in one place instead of four (possibly missing some)" probably is.


> If you have a one-line function that is used in one place (or even a few places), it doesn't mean it is easier to understand just because you made it look like English. More likely you just obfuscated it: update_string_with_value(string,val) is NOT easier to understand than string.append(val).

We agree your sample has been obfuscated.

  /// <summary>
  /// Case Insensitive IRC Name
  /// </summary>
  public struct CIIrcName {
  	private string Data;
  
  	public static implicit operator CIIrcName( string s ) { return new CIIrcName() { Data=s }; }
  	public static implicit operator string( CIIrcName n ) { return n.Data; }
  
  	public string ToLower() { return (Data??"").ToLowerInvariant().Replace('{','[').Replace('}',']').Replace('|','\\'); } // Silly scandinavians :<
  
  	public static bool operator==( CIIrcName lhs, CIIrcName rhs ) { return lhs.ToLower() == rhs.ToLower(); }
  	public static bool operator!=( CIIrcName lhs, CIIrcName rhs ) { return lhs.ToLower() != rhs.ToLower(); }
  
  	public override bool Equals( object obj ) { return obj is CIIrcName && (CIIrcName)obj == this; }
  	public override int GetHashCode() { return ToLower().GetHashCode(); }
  	public override string ToString() { return Data; }
  }
Every single function of this is a one-liner. Are any of them obfuscated? (IRC servers sometimes talk about the same user or channel in multiple cases, this was a quick drop-in to prevent these from being mistaken as different users/channels by my IRC client. The "Silly scandinavians" comment references the fact that, yes, IRC thinks '[' is lowercase '{'. And yes, I've seen that matter in practice.)

Most of the changes I'd make here have nothing to do with method length:

1) Conversion (at least back to string) should probably be made explicit at this point (was originally implicit for ease-of-use when replacing string s)

2) ToLower could be made private (only used inside the type) and have the comment replaced with the above blurb (instead of being a comment to myself that only I'll properly decode.)

3) Rename CIIrcName to CaseInsensitiveIrcName?

4) Rename Data to OriginalCaseName?

> And you're also coding defensively against your fellow developers. This tends to be a problem in low- to average-skill workplaces.

My experience has been just the opposite. Low/average skill tends to mean fewer asserts, less static analysis, ignored warnings instead of warnings-as-errors, no unit tests, no thread safety annotations... forget my coworkers - I add these things to code defensively against future-me, and I've found it super effective on larger codebases.

> This goes all the way back to when they learned it was a good idea to comment their code "next = ++i // increment index then assign".

We agree this is bad too. A different perspective though - those are perfectly reasonable comments when you're new enough to programming that you don't remember what "++i" does! But it's something to grow out of and remove.


> "My experience has been just the opposite. Low/average skill tends to mean less asserts, less static analysis, ignored warnings instead of warnings-as-errors, no unit tests, no thread safety annotations... forget my coworkers - I add these things to code defensively against future-me, and I've found it super effective on larger codebases."

Bingo.

And this is another great reason to have more, smaller methods: it's far better for unit testing. When you have big, chunky function defs it can be a chore to figure out how to reach into the method and test one particular aspect of it. When your code is well composed into functions that are logically consistent, you can write unit tests that target each piece individually.


I can't really comment about your specific use case, but I would wonder why you are having to do these conversions and comparisons on the fly so much that you need a full class dedicated to them and you don't have the name normalized ahead of time. But I don't really know.

But like I've said in all my comments, there are times and places for all these maxims and rules to be broken. That's why programming is difficult, and especially why finding the right abstraction, so that it looks easy, is difficult. The best programmers I know always seem to find that right balance, and you look at their code and go "I could do that" when in reality you couldn't.


> but I would wonder why you are having to do these conversions and comparisons on the fly

I receive a weird mix of normalized and unnormalized input over the network from servers and software I don't control, and want to keep the unnormalized versions for display purposes.

I could normalize on construction - might save a tiny bit of performance? - but then I'd need to store and keep in sync two separate strings (one for display, one for comparison.) Not that I have any real mutation going on that would violate that invariant in practice.

> so much that you need a full class dedicated to them and you don't have the name normalized ahead of time.

The general pattern is: Parse the network message (containing mixed normalization), then immediately lookup (or create) the target channel/user - allowing no duplicates due to mixed normalization.

With this I can simply construct e.g. Dictionary<CIIrcName,...>s and not need to remember to normalize every time I want to query by a given key. Only about 5 members that reference this type but there's a good 20-30 key lookups/equality comparisons using those members, handling different network messages among other things. That list will probably expand.

I could've wrapped dictionary access instead, but this seemed simpler (as dictionaries aren't the only thing I'm using, and none of the existing code wrapped dictionary access.)


I'd normalize on instantiation. It is a performance win, smaller code, and much simpler. Channel names don't change, so you don't have to keep them in sync in any real sense, and if they do, the setter has a very trivial job of setting both the real and normalized name.


The nice thing with wrapping it in an abstraction, is that you can change or add optimizations without having to change the code EVERYWHERE.


I disagree with the use of lots of small 1 line functions, especially when doing algorithmic work.

As the article mentioned, the thing you spend most of your time doing is reading. Using lots of small functions means you have to bounce all over your codebase to figure out what's actually going on. In procedural languages a complex 'meaty' function often needs to be understood in one conceptual bite. Needing to bounce all over the codebase to figure out what those little functions are actually doing is hugely distracting while I'm trying to read.

I often start those big functions with a block comment explaining what I want the computer to do ("// The overall goal is X, which needs Y and Z to be true. There are 4 cases to consider: A, B, C and D through which these invariants need to be true..."). Then I write the code. The code itself will be delineated with comments calling back to the documentation ("// Case B - the wendigo. Note we're being careful here to make sure that x is always < y"). If I find myself duplicating blocks inside that function, I'll pull those shared blocks out. But single lines? When I'm reading the code later I'm going to want to track exactly what happens through the function at a mechanical level. I want to be able to see the invariants, and track exactly how each value flows through. And for that, the less scrolling and bouncing around my codebase I need to do the better.

Often separate cases are cleanly distinct enough that it makes sense to extract them into their own functions, but sometimes they're not. And that's ok too.

In a sense you want to split things up based on how you expect to read it. If I have lots of tiny functions that are only used once or twice, I'm probably going to forget what they actually do and need to read them to understand the code that calls them. In that case, the code should just be inlined.


>Needing to bounce all over the codebase to figure out what those little functions are actually doing is hugely distracting while I'm trying to read.

Ideally those functions should be named such that their jobs are fairly obvious.


Sure, but that totally depends on context. A function called `renderFooter()` is great, even if it's just 1 line. I know exactly what that's going to do.

But a call to `setImpulse(spaceship, 0, somevar)` might just set member variables ximpulse and yimpulse, or it might make a bunch of calls to the physics engine, or both, or something else entirely. If I'm trying to understand some code that calls setImpulse I'll probably end up looking it up just to be sure. So in that case, if it does just set the member variables I'd prefer to just write `spaceship.ximpulse = 0; spaceship.yimpulse = somevar;`.

Same thing, but less work to read.


It's the difference between reading code to get an intuition of what happens, and reading code to debug an issue.

For the first, well named single line functions are great. For the second, they require a lot of jumping around.


Completely agreed. Ultimately it's about readability, and replacing code with meaningful names is always a boon for readability. I don't understand people who have an aversion to adding more abstraction--every time you can replace any section of code (even one liners usually have multiple parts to them) with a meaningful name is an increase in readability, as your code-specification moves closer to the language-specification of the project.

Of course there are bad abstractions. A simple test is anything that doesn't immediately refine your code into something more closely resembling the language-spec. The goal should be code with just the right amount of abstractions such that they map to this language-spec with the least amount of friction possible.


If one wants to treat all expressions as functions, they're far far better off with a proper functional language built on lambda calculus and the power of full function composition primitives.

Most projects fail badly trying to twist OOP to get there. The code patterns are so predictable, its not even a surprise anymore.


That’s the pro-short functions side of the debate, but there are a few counter-arguments that you glossed over.

> More often than not, a well named function with well named parameters is one of the absolute best and most reliable ways to make your code as close to "self-documenting" as possible.

It might make a single function’s implementation nicely self-documenting, but what really matters is how readable and maintainable the overall code base is.

When we’re reading the code calling that one-liner function, in most languages you can’t see the names of the parameters, only the order in which they are provided. Sometimes the meaning will be obvious anyway, or it won’t matter, for example if you have a commutative function with two parameters. Sometimes the meaning could be made obvious but hasn’t been, for example the infamous boolean parameters where the call winds up being

    do_something_with(true, false, true);
because no-one defined more specific types/constants to use instead. And sometimes, there is no inherently “natural” order to the parameters, so replacing a one-liner that was unambiguous with a function call where the parameter order is unclear to the reader introduces uncertainty.
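
One common remedy, sketched with invented names in Python, is to give those flags specific types so the call site documents itself:

    from enum import Enum

    class Overwrite(Enum):
        NO = 0
        YES = 1

    class KeepBackup(Enum):
        NO = 0
        YES = 1

    def save_document(doc, overwrite, backup):  # hypothetical function
        ...

    # save_document(doc, True, False) tells the reader nothing;
    # save_document(doc, Overwrite.YES, KeepBackup.NO) explains itself.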

Another potential danger with using lots of very small functions is that while each function individually may be nice and clear, the number of relationships between those functions is much greater. This can cause a great deal of frustration to a reader who has to keep jumping to another part of the code to decipher each “clearly named” one-liner function. It can also make it harder to examine the behaviour in context when testing or debugging, for example because log functions don’t have enough information available to produce complete, self-contained messages, or because every time you hit a breakpoint in the debugger you have to walk 6 levels up the call stack to figure out what’s going on at the time.

Finally, I suspect the benefits of one-liners potentially expanding to include error handling are mostly illusory. How often can that one-liner really take any useful recovery action at such a low level, and how often will it just need to return some type of error value or raise some type of exception anyway, so that code further up the call stack with more information or resources available can deal with the problem effectively?

> The idea of ensuring that you always have "meaty" functions is unhelpful, what's important is thinking about things from a perspective of abstraction, modularity, readability, and correctness.

OK, but readability and correctness are also global properties, not just local ones, and the benefits of modularity are closely related to how much complexity is encapsulated and hidden away by the extra abstraction, but one-liner functions rarely increase the level of abstraction very much.


Your first point is known as the Rule Of Three:

https://en.wikipedia.org/wiki/Rule_of_three_(computer_progra...


Yeah! I always love seeing when my ideas align with one of those I trust, and Fowler often gets my nod of approval. A month or so ago I was caught agreeing with Gil Tene, so I'm on a roll.


I would add to NOT practice DRY in your test cases/code. More often than not this will get you in trouble.


I'm firmly in the anti-DRY camp since it causes everything to be slow, including coding speeds. You spend far too much time trying to figure out something new, when you could have just cut-and-pasted the answer from the previous problem.

Repeat yourself. It's good for you, and makes you code faster.

You can refactor out later if code becomes bloated.


And then one day you realize there are bugs duplicated all over the place and you pray that nobody hits them.

I'm not religious about DRY, but it's not a bad thing when kept in the toolbox with KISS and YAGNI.


> And then one day you realize there are bugs duplicated all over the place and you pray that nobody hits them.

If you can copy-paste repeated code, you can copy-paste debugging as well.

Duplicate bugs are a single bug.

Obviously not all DRY is bad.


And then I can explain to my boss why I'm working on stuff I'm not supposed to when I could have just wrapped it up once in a library and fixed it once on all those projects.

So now not only have I created a bug, duplicated it N times, but my boss thinks I'm an idiot and wasting time for not applying DRY and good engineering hygiene.


> Repeat yourself. It's good for you, and makes you code faster.

Maintainability goes out the window...


All too often the kind of "abstractions" we talk about in programming are the leaky, code-indirection kind. These abstractions tend to proliferate in a single code-base over generations of programmers. It leaves one wondering if these thousands of lines of code actually compute anything.

In my experience the software at an organization is often a reflection of its culture. Thick with leaky abstractions, interfaces, facades, and indirection? Years of frustrated mediocre programmers, each slightly confused by the previous generations' shenanigans, led by a clueless management team that care more about deadlines than deliverables. There are signs that people are trying to do better but the code takes years to show improvement as it resists change at every corner. A crystalline entity designed and driven by data, never doing more work than is required? A team that has had little churn and is led by passionate engineers often more concerned with deliverables -- at least at first, because it's often fairly easy to make changes when your code-base adheres to a well-defined algebra, is thoroughly tested, and even has a well-formed specification of its critical components.

DRY is as good an idiom as "measure twice, cut once." The difficulty is that software is often more complex to understand than a cabinet. However I can't slight it for being a bad idiom because of the wide variability in skill for such a difficult task as programming. We need something to teach new programmers that at least gets them thinking about the abstractions they wreak havoc with.


Your comment reminded me of Edmond Lau's book - The Effective Engineer [1] - where he talks about putting in a good amount of effort into the onboarding process for new engineers.

His premise - having a senior engineer spend an hour a day for the first month helping the new employee with explaining the existing abstractions being used, the underlying design of various systems, etc. - would still be only about 20 hours, which is still only 1% of the number of hours that employee will spend in their first year - about 2000 hours.

As a result, I believe that armed with that knowledge, the new employee is likely to be much more productive, failing which, at least cause less damage to the code base.

I would say that the first example you mention - leaky abstractions et. al. - are just as much (or maybe more) due to poor onboarding as they are due to the frustration of mediocre programmers. There is a lot to be said for good process, which software engineering as a discipline falls short of quite consistently.

[1] https://www.amazon.com/Effective-Engineer-Engineering-Dispro...


Agreed!

I would wager that 90% of good software (by a most liberal definition) is designed to be that way by the methods employed to construct it. It's the leaders and managers who set those processes and standards. If they are well versed in the state of the art and understand how to make it work for the business you can end up with crystalline entity software. Most of my job is presently "hacking the process" rather than the code itself so that the team can produce the best, desirable results.

I'll check out that book, thanks for the recommendation.


The idea of 'don't repeat yourself' is an ok guideline unless, like most proverbs/sayings/mottos/slogans, it gets used as an absolute rule.

feel free to ignore it, mock it and toss it aside if it leads to bad abstractions, highly convoluted structures or write-only code.

anyone that has had to do maintenance or adding features to a large OO codebase from the early 2000s will have seen vastly massive class hierarchies where following the path of execution is a roller coaster ride through 10 files for what could have been a 30 line function.

these days grep/vim/sublime/emacs/and the rest of the gang all have power search and replace editing functions. sometimes it's better for the whole world to just copy-paste-alter your code.


> an ok guideline unless, like most proverbs/sayings/mottos/slogans, it gets used as an absolute rule.

Amen!

> a roller coaster ride through 10 files for what could have been a 30 line function.

> sometimes it's better for the whole world to just copy-paste-alter your code.

Then what happens when you have almost a hundred copy/pasted slightly rewritten 15-30 line variations on the same theme? How do you refactor then? (Yes, I have seen this in production systems, and yes, it was very critical code!) As you say, it comes down to cost/benefit.

Basically, you just have to keep potential refactoring/rewrite costs down so that you are never trapped. Caveat: You can seldom predict the risks as well as you think you can. What you can depend on, is being observant to historical patterns in your codebase. It's hard to predict the business needs and the architecture in the future. On the other hand, it's often quite easy to see the historical trends in the codebase.

Really, the analogy of the physical file room is a great one. (Or if you're not familiar, a library, a tool chest, or any kind of physical inventory system works too.) You can have a file clerk that seems like "super-filer" because he never "wastes" time by putting files back, but this never works out in the long term. The same goes for a filing staff that never reorganizes the file room. Also, one can generally see the long term disaster developing long before it results in the dramatic disaster. You can see which shelves are getting filled up, and which drawers are getting overstuffed, usually weeks or months ahead of time.


> Then what happens when you have almost a hundred copy/pasted slightly rewritten 15-30 line variations on the same theme? How do you refactor then? (Yes, I have seen this in production systems, and yes, it was very critical code!) As you say, it comes down to cost/benefit.

The only thing that can happen - you understand the core function those 30 variations solve and introduce a full parametrised solution to the problem, then replace all places with calls to it. Generally there would be some way to tell where all these copy-pastes are, using something not much more complex than a regex. But even then, maybe just leaving it duplicated is better.


The concern I have with this approach is how easy it is to forget to alter the other variants when you alter one of them.

How do you know the code you're touching has other versions that are semantically the same and should also be altered?[1]

How do you avoid having to fix the same bug several times because you bug-fixed one place but not the others?

How do you avoid the technical debt that builds over time when instances of the pattern within the codebase are each at subtly different "versions" with similar, but not identical, semantics (even though identical would have worked fine?)

[1] Of course, DRY code has the inverse: How do you know if an existing function to do what you want to do already exists so you can avoid duplicating the extraction?

Where my opinion falls today is: A slightly more complex solution is often less risky and more maintainable than the straightforward duplication solution because at least the complex one looks complex to a would-be maintainer who will at least be aware of things up front, whereas duplicated code can have a bunch of hidden costs whenever it's touched that won't necessarily become apparent until later when the presence of that technical debt throws a monkey wrench into unrelated plans.


If your variants need to be similar, that alone is a great reason to abstract. It makes the abstraction more valuable.

Meanwhile, there could easily be transformative code that just computes some stuff you often need. In those cases altering one variant need not affect the others.


No, what will happen is that one of your colleagues fixes a bug in a couple of places, someone else fixes some other bugs somewhere else, and at the end the buggy duplicated code becomes a buggy mess in which no one has any idea of what the right behaviour should be. How, how can leaving duplicates and starting this bloody mess ever be better? I honestly can't see it.


Seems equivalent to the case where you have one generalized function with a bunch of obscure special cases coded into it. I feel like having 30 functions means you can easily trace which parts of the program are exercising which special cases.


> The only thing that can happen - you understand the core function those 30 variations solve and introduce a full parametrised solution to the problem, then replace all places with calls to it. Generally there would be some way to tell where all these copy-pastes are, using something not much more complex than a regex.

Knowing where these things were wasn't a problem. They were all on the class side of certain classes. (Yes, this was Smalltalk, but this entire subsystem didn't have a single instance variable in it!) I was on a team of 10, with some very smart guys. We all wanted to "understand the core function" in this subsystem but what it really was, was an object system, where objects were expressed as consecutive entries in a series of arrays. Every method resembled some kind of complex merge with multiple arrays and multiple incrementing indexes and varying side effects embedded in nested conditional logic. Only one developer understood the underlying object model, and she wasn't apt to share. Rather, it was the source of her job security. (Most days, she spent in the cafe on the 1st floor, reading a book, until she got notifications, then had to "consult.") If you pointed out the "unusual" nature of an entire Smalltalk subsystem without a single instance variable in it, she started talking to you about her PhD in Math.

No, you aren't such a genius, and my colleagues and I such dullards, that we only needed you to show up and point out a few simple truths.


If you change something that has that many parameters there's a very high chance you will introduce a bug. You better have a test for each variation. Or simply keep them separate so that they do not affect each other.


When you have to change something like that, it's time to write some tests. Regardless of if the code is in a single place or distributed.


> When you have to change something like that, it's time to write some tests.

So what do you do if that's not practical? See my other comment in this subtree where I give more background information. Sometimes the cost/benefit doesn't work out at that moment. (And believe me, we would've loved to refactor that whole thing!)


Cry, mostly.

This type of thing is going to be terrible, regardless of whether it's in one place, if you can't test it.


I've had a particularly bizarre experience a few times recently. At a previous company we had identified that we needed some bit of code to solve a critical business problem, we tackled it, solved it fully, on the second or third try, and put it into a library function that underpinned the whole app.

We did such a good job that we never touched it again. My next company or a place I interviewed could just not understand why I couldn't answer questions about something so fundamental to our success. Certainly I must be overplaying my involvement in the project.

The joke I make is that we spend all of our time looking at the trivial code in our apps. The more important it is the less time we spend touching it, which is just so backward from other industries and I don't know how we fix it.


> these days grep/vim/sublime/emacs/and the rest of the gang all have power search and replace editing functions. sometimes it's better for the whole world to just copy-paste-alter your code.

The compiler is also quite helpful. Need to change a class? Just change it and recompile. All the places it complains about are exactly all the places that need to be changed, and a targeted search+replace will get them all easily.


"Every piece of knowledge must have a single, unambiguous, authoritative representation within a system"

It's a much better rule as originally stated than as misapplied.

That's not to say that there won't be exceptions, even to the original rule.


I've come to the conclusion that there are at least 3 kinds of DRY, and application varies:

1) There's a higher level abstraction in your code. Abstract at your own risk, unless you have at least 3-4 instances, or your architecture absolutely requires it.

2) There's wrappers around libraries and you only use it a certain way. I'm OK with this here. Sure, I could copy-paste the same exact parameters each time I use it in my code, but I will just write my own fiddleFoo() and use that every time.

3) There's when things MUST BE THE SAME. You should abide by DRY even if you only have 2 instances. E.g., we have an SOA and we DRY our routing. Routing MUST match or else things break.
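
For case 3, a minimal sketch (hypothetical route, in Python) of what "DRY our routing" can look like: the route string lives in exactly one place, so the services that must agree on it cannot drift apart.

    # routes.py: the single authoritative definition of the route
    USER_LOOKUP = "/api/v1/users/{user_id}"

    # Every caller and every handler builds from the same constant:
    def user_lookup_url(base_url, user_id):
        return base_url + USER_LOOKUP.format(user_id=user_id)

    assert user_lookup_url("http://users.internal", 42) == \
        "http://users.internal/api/v1/users/42"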


I agree with this important article.

In the large open source project I work on, we have a prolific contributor who is systematically going through the entire code base, "refactoring" some of it in a way that impairs my ability to read, understand, and maintain the code (even code that I originally authored). Over time, we've added scores of new macros -- as a result, ordinary C knowledge is now insufficient to read and understand what the code is doing.

As the author suggests, the guiding principle should be to factor only when it results in a net reduction of complexity. In some cases, the cost of adding new abstractions outweighs the gains from Don't Repeat Yourself.

There is some value in having some blocks of code that can be read in isolation, even if there is some repetition between blocks.


The problem likely isn't DRY, the problem here is poor abstractions.


The problem here is that our so-called editors cannot inline calls for us.


I feel that would just hide other problems. As soon as you pull some code out into a function, it is inviting other uses of it. As soon as someone fixes the function for their use in a way that breaks it for yours...


... then your unit tests make it clear that they shouldn't do that.


Statistically, relying on tests to cover areas of the code that you are not actively working on is not a reliable methodology.

That is, yes, ideally you have a test catch something. Realistically, you don't have 100% test coverage.


Sure. That's why there are more quality-oriented practices than unit tests. Personally, I'm also a big fan of pair programming, collective code ownership, automated functional testing, good run-time monitoring, weekly retrospectives, and doing a five-whys for every bug. Even with that bugs happen, but hopefully not very many of them.

Still, when I'm worried about other programmers breaking something, I write unit tests. Library code should have good tests that document the purpose of the shared function. And my general experience is that when I break some shared library function anyhow, I'll see other tests going red. Unit tests are my first line of defense for this sort of problem.


> As soon as someone fixes the function for their use in a way that breaks it for yours...

... it becomes clear that you're actually dealing with two separate "pieces of knowledge".


Many editors let you peek at the source of a call in a pop-up window.


Even so, it sort of breaks the flow. It's the difference between reading a novel from beginning to end and a choose-your-own-adventure book.


In a large app, inlining all calls would be unreadable. Having sane, descriptive names for functions (and using functional programming, for that matter) solves the problem better than inlining.


We're talking about cases where the article's point applies. In such cases you can't 'just give it a name'; please read the article first. Every instrument has its uses. I'm saying that we're missing this instrument, not talking about what mess could be made by abusing it. A mess can be made in any language, in any editor/IDE.


Much damage is done by people with too much time on their hands.


What most programmers who know a particular programming language have in common is that they know the built-in and standard library functions. So it's often better to use those than to abstract them away under different names.


I am firmly in the DRY camp, although obviously there are limits to how far you should go to maintain the dry-purity. In my experience, the biggest issue with copy-pasting code is that someone else (or your future self) will edit the code in only one of the places, making the codebase inconsistent. If abstraction gets too costly, I have found it useful to simply add a comment in both places, "copied from x" and "pasted from y" respectively. Or, if there are constants involved, break those out as global constants.


The core of the problem is that people often confuse “parameterization” with “abstraction”:

Parameterization is simply adding adjustable knobs. For example, a function can take as argument a gigantic list of flags, where each particular combination of flags makes the function do something slightly different.

Abstraction is actually separating concerns. It's dividing a program in logically independent parts, whose correctness can be established individually, without worrying about the others. Abstraction means that software components only interact with each other through explicitly designated interfaces, often enforced by the programming language itself.

Parameterizing is easy. Abstracting is difficult. Unfortunately, parameterization without abstraction is a source of headaches in the long run.
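
A made-up contrast between the two, in Python:

    # Parameterization: one function, many knobs; every caller pays for
    # every mode, and the flag combinations multiply.
    def export(data, as_csv=False, as_json=False, compress=False, upload=False):
        ...

    # Abstraction: separate concerns behind small interfaces that can be
    # understood and tested independently, then composed by the caller.
    def to_csv(rows): ...
    def to_json(rows): ...
    def gzip_bytes(payload): ...
    def upload(payload, destination): ...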


There is a quote from Pólya that I love in the margin notes of Concrete Mathematics.

    There are two kinds of generalizations.  One is cheap
    and the other is valuable.  It is easy to generalize
    by diluting a little idea with a big terminology.  It
    is much more difficult to prepare a refined and
    condensed extract from several good ingredients.
I think it applies here, fairly well. Just finding a way to not repeat yourself is relatively easy to do. Finding a way to not repeat yourself, but keep the same level of information conveyance is actually quite difficult.


New Motto: Don't Repeat Yourself, Unless You Need To Repeat Yourself, In Which Case Repeat Yourself, But Be Careful How You Repeat Yourself ...

DRYUYNTRYIWCRYBBCHYRY


Dry, un-try, wicker baby cherry.


DRY is a problem because it's so catchy and seems so self-explanatory.

The original formulation of DRY spoke in terms of information. Any piece of information should exist in exactly one place in your code. As I've said before, modulo some caveats related to double-entry bookkeeping, this seems exactly right. An inveighing against anything that looks repetitive is not.

The goal is not compression, it's ease of understanding and ease of making correct modifications.

I've started using "Huffman coding" to refer to the practice of misapplying DRY by deduplicating superficial similarity rather than information content.


This is a cogent way to frame the acronym.

There are times when repeating yourself is just accidental. I might have the same middle name as a coworker. Do we actually "share" that middle name? That depends on who's asking, or what system you're trying to build. But probably not.


"It can be expensive to use the wrong abstractions, therefore you should copy and paste" is not sound reasoning. The _right_ abstractions can be incredibly (insanely, ridiculously) beneficial to a code base across all dimensions -- correctness, maintainability, performance.

Searching for the right abstractions is definitely worthwhile - even if it sometimes incurs extra rewrite/refactor churn. The better the abstractions used, the easier the refactoring, the faster you find the good abstractions, the easier it is to change the code as your understanding of the problem grows ...


He's saying duplication can sometimes beat inappropriate abstraction. You're saying appropriate abstraction beats both duplication and inappropriate abstraction. (Which I thought was obvious.)


Sometimes what you have is a coincidence, not an abstraction. In that case, you should duplicate the coincidentally-identical implementation.



> If you find yourself passing parameters and adding conditional paths through shared code, the abstraction is incorrect. It may have been right to begin with, but that day has passed. Once an abstraction is proved wrong the best strategy is to re-introduce duplication and let it show you what's right.

I really like this take on when to go full DRY or not. It's very easy to abstract out two things that look identical in their earliest implementation, but are actually different in intent and function--then when you get a new feature request on one of them they become obviously wildly different, and then to maintain the abstraction you have to throw a bunch of conditional blocks into it and it turns into a nightmare to maintain.

I heard a rough rule of thumb to stay away from abstracting anything until you see it repeated identically at least three times, not two. It's just a rough rule of thumb to try and make sure you're not prematurely abstracting something and building the wrong thing, but anecdotally it's been useful for me.


> I heard a rough rule of thumb to stay away from abstracting anything until you see it repeated identically at least three times, not two. It's just a rough rule of thumb to try and make sure you're not prematurely abstracting something and building the wrong thing, but anecdotally it's been useful for me.

I've not heard that before, but I like it. I have always advocated for waiting until there's at least two instances, with a similar reasoning: don't abstract until there's a 100% proven need to do so.

> It's very easy to abstract out two things that look identical in their earliest implementation, but are actually different in intent and function--then when you get a new feature request on one of them they become obviously wildly different

On top of waiting for the proven need, I think the other big thing you should strongly consider when building an abstraction is the future of that code.

If you need to make a change (regardless of what that change is), are you fairly certain every instance of it is going to change in the same way? If the answer is not an immediate and resounding 'yes', then you should really stop and consider whether your abstraction is actually a helpful step towards DRY, or merely clever "architectural astronautism" that is only going to be a pain to maintain later.


> I heard a rough rule of thumb to stay away from abstracting anything until you see it repeated identically at least three times, not two.

I used to work in a shop with that rule, and I've mentioned it here on HN. It's still pretty easy to track down and rewrite 3 occurrences. Heck, it's still pretty easy to track down and rewrite 7, though as you let the numbers increase, you run the risk of missing an occurrence, making a mistake when you do the refactor, or introducing a confounding "idiosyncrasy" in one of those occurrences.

In my experience, 2 occurrences is too little data to justify refactoring around, unless those are pretty hefty chunks of code. 2 occurrences of 2 consecutive lines is definitely too little.


I think the focus on counts is wrong. The first question is "are these the same for a reason." If there are two places in the code that need to behave the same or the code is broken, then you should probably pull it out. If there are 100 places in the code that happen to behave the same, but any subset of them could change independently tomorrow, then you might not want to pull it out.


Exactly - 3 occurrences is a pattern - 2 is bordering on zealotry.


Does this article add anything to Sandi Metz's original?


If we could track copy-pasted chunks, the manual abstraction step would be needless, since the relationship is already clear from the editing history. We could then transform between abstract and non-abstract source, or remove that distinction entirely by making it a view property, and modify one place and quickly update (or not) all the others. We'd also have to deprecate "modern" text sources and store code in a structured way.

That should all be obvious to anyone with 10+ years of programming experience, imo, but I keep seeing this "oh, DRY or not, KISS or not" jitter. We now have great tools for everything except programming. Someone needs to make a programming app that's better than the notepad-with-word-completion junk we have now.


Ever heard of a function? If it's as simple as copy paste, that's all you need. The trickier abstractions come when you need something to be almost the same in two places. Copy and paste tracking wouldn't work there.

Replacing functions with some kind of copy paste tracking editor does not sound like an improvement.


Coverity actually has a really good tracking ability where it will find code that was copy/pasted/edited but the edit looks wrong.

I can't remember the name, but it will identify what it thinks is the original, and show where the copy/paste is, along with the edit it thinks was missing. This was usually around some error handling logic in our code base. Think:

    if (!checkIsValid(foo)) {
      printf("Looks like foo is not valid.  foo == %s\n", foo);
    }
Wasn't uncommon for someone to copy that and forget to change all the places "foo" appeared.
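
Something like this hypothetical paste is what that kind of checker flags: the condition got updated, but the message didn't.

    if (!checkIsValid(bar)) {
      printf("Looks like foo is not valid.  foo == %s\n", foo);  /* missed edit: should be bar */
    }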

Long story short, advanced static analysis has come a long way.


This is already solved on a bigger scale -- RCS. You can develop, merge, blame, and diff slightly different filesets, but not line ranges. Look, we already have "just functions", and the DRY-vs-complexity problem is still unsolved.


Static code analysis and a good IDE make this point considerably less important. For example, duplicating code is not nearly as expensive in my Spring (Java) application as it is in that shiny Nodejs webapp. It's still important, but solid tooling combats it somewhat.


High levels of abstraction accumulate especially in big monolithic applications, and I agree it takes time to get a feature in. In particular, you have to get an idea of what code has already been written and how it plays with the existing components before you start doing anything: first, to not repeat yourself, and also to lessen the chances of doing something wrong. I think everything has its trade-offs and use cases. In large applications this approach could let you write fewer tests, as there is less repeated code. On the other hand, you can split your project into independent smaller parts and, at the cost of sacrificing DRY, get something easier to understand, but then you have to put more effort into testing and integration testing to see whether all the bits play together correctly and as expected.

I agree that the sense of what should be abstracted and what shouldn't comes with experience. It is also very important to get as much information as possible about the problem; this helps in choosing the right approach in the context of the above.


I tend to approach DRY more from the standpoint of data, and of the code that transforms that data into other data.

I'm less concerned about duplication in code that uses data than I am about the code that determines whether the data itself is correct.


Here's a research article I wrote on this topic; there's a section on the "costs of abstraction":

http://harmonia.cs.berkeley.edu/papers/toomim-linked-editing...


Interesting. Does it scale? (I skimmed the paper and believe I have understood the process, but the examples you have are limited to n=2 clones. What if n >> 2?)


DRY is rather language-dependent. In Java, where only a single result can be returned from a function, it is often not worth the effort to factor out common functionality if that requires adding an extra class just to pass a few in/out parameters.

In C++, on the other hand, the barrier to following DRY is rather low, as I can literally move code to a helper function and easily parametrize it for reuse in another place.
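
To make that side concrete, a hedged sketch (hypothetical helper) of how cheap the factored-out function can be when the language lets you hand back several values through out-parameters, with no wrapper class needed:

    #include <stdio.h>
    #include <stdbool.h>

    /* Hypothetical helper: returns two values via out-parameters. */
    static bool parse_range(const char *s, int *lo, int *hi) {
        return sscanf(s, "%d-%d", lo, hi) == 2;
    }

    int main(void) {
        int lo, hi;
        if (parse_range("3-7", &lo, &hi))
            printf("%d..%d\n", lo, hi);
        return 0;
    }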


Am I the only one who thinks this sort of advice would be immensely more useful with some concrete examples?


I was thinking the same thing; it's hard to judge this article without knowing exactly how much code duplication the author thinks is appropriate, and in what circumstances.


I think this article could benefit from a few good examples. Sure, one should never apply principles in a rote fashion; they're guides to judgment. But the way we develop good judgment is through practice. Without seeing how he'd apply this in specific situations, I don't feel like I've really learned anything.


My complaint with DRY - sometimes it's just significantly easier to learn a codebase or learn the problems that led to its structure/abstraction by just writing many of the underlying components yourself. Whether that code goes into the codebase is somewhat irrelevant from a learning perspective.

On top of that, when 95% of the code you work with is based on a "Minimum Viable Product" (the bane of my existence), assumptions will probably have been made incorrectly, or in a way that adds significant technical/organizational debt. By doing it yourself, the original assumptions can be put into a different light, which can help reduce all sorts of debt in the future.


I think we need to address this giant elephant in the room called object oriented programming. https://www.youtube.com/watch?v=IRTfhkiAqPw


Yes, you read that right. Duplicating code (a.k.a copy + paste) can be a good thing. Namely when the abstraction that would replace repetitive portions of your codebase is a pain to understand.

Yes, this 1000 times! DRY or any pattern/principle is not some sort of universal truth or declaration of ultimate goodness from your deity, or some deep law like the Heisenberg uncertainty principle! It's just a rule of thumb. As a practical technologist, you must always implement with respect to the contextual cost/benefit. Taking any principle as an automatic ultimate good is simply intellectual laziness!

I was in a company with a Java product and a Smalltalk product. The Smalltalk product was rather well factored. We had a policy of rewrites to keep things architecturally clean. We had the fantastic Smalltalk Refactoring Browser as our IDE, and we weren't afraid to use it. (To this day, you can refactor an order of magnitude or more faster with it than you can in other languages.) The Java product consisted of many independent parts with tons of duplicated code between them. Definitely not DRY. (There were also 20X more Java programmers than Smalltalk, for a comparable product.)

One thing I noticed is that bugs on the Java side were more prevalent, but they never affected the entire application. Whereas the far better factored Smalltalk product could occasionally have the rug pulled out from underneath the entire application by the introduction of a library bug. (Though it would be very rapidly fixed.)

Another thing at that shop: We had a policy of only empirically verified DRY. We never wrote code to DRY in anticipation: only in response to actual code duplication. Generally, actual practice and experience is better at producing exactly the needed architecture than the prognosticating cleverness of programmers.

EDIT: Shower thought! (Yes, I literally got out of the shower just now.) There is a lot of emphatic but somewhat thoughtless application of principles and rules of thumb because it's actually motivated by signalling. The priority is actually on the opportunity to signal and not on the underlying cost/benefit calculation. There is a nice analogy for this in music. Once a musical scene gets a certain degree of prominence and/or commercial success, people start prioritizing sending the signal that they are part of that musical scene. So you will find bands and musicians using those techniques and stylistic flourishes, not because of the underlying aesthetic principles in support of the particular piece of music, but as an opportunity to signal. This happens in OO and also in the startup scene. (And, OMG, but is signalling given a lot of energy in the Bay Area!)

So the insider move is this: Look for the underlying motivation behind the signal. Did this person jump on the opportunity to signal, or did they first seek out the information to make the determination of cost/benefit? How carefully did they do that? (Hopefully, we can get some of the hipsters/pretenders to at least go through the motion of doing the above. Even just going through those motions turns out to be beneficial.)


A lot of software developers are awful story tellers, and what they end up using for abstractions sounds like something George Orwell would cook up.

And when those developers discover TDD, then their code spends multiple pages saying nothing at all. Oof.

I think deep down most of us intuit that this is a problem. We will pursue an articulate but not particularly bright candidate and pass on someone who quickly arrives at the solution without showing any work.

But we let in people with good verbal and poor written skills and they create deserts, their code is so dry. No, I don't want you to explain to me again how great your code is. I just want to be able to follow it without using a debugger. Because maybe this bug isn't even in your code and I just want to make a determination and keep moving.


What's wrong with Orwell's style of writing? I found him to be simple and direct.


I think he meant Newspeak. But maybe what he was thinking of is actually "portmanteau".


No I meant newspeak. If you name everything in an app with noncommittal things and then use them in different spots to mean different things, you end up with code that has to be memorized to be understood.

As someone more eloquent said, they are so afraid of making a bad decision that they try not to make any at all. Everything is named neutral words like 'context' or 'options', the verbs are neutral too, and there are other code flows that use the same name with different data.

As a general rule, if you would describe a code flow out loud in plain English, and not use any of the nouns or verbs that appear in the code, someone has ruined that piece of code.
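
A hypothetical before/after of what that rule sounds like in code: the first signature has to be memorized, the second can be read out loud.

    /* Before: neutral nouns and verbs that mean something different at every call site. */
    int process(void *context, int options);

    /* After: the words you would actually use when describing the flow out loud. */
    int charge_invoice(const struct invoice *invoice, const char *payment_method);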


I would say that the article simply states the obvious (bad abstractions are bad), but for some reason that escapes me it tries to derive from there that copy and paste is sometimes good. And the quoted comments persist down the same illogical path. Saying that bad abstractions are a bad thing and then copy and paste must be a good thing in some cases is simply a non sequitur. There is no correlation whatsoever between the two statements. And if you want to know my opinion, blind copy and paste is always bad, especially in codebases over 1 MLOC. Those bugs that you have to fix multiple times in multiple places will kill you, and, believe me, you will really curse whoever thought that doing a copy and paste was a good idea simply because they couldn't see a better solution.

These kinds of trends on HN, which I'm now seeing more often, make me terribly sad; it's like promoting mediocrity instead of a well-thought-out solution just because of laziness. If you can't understand some abstraction, then try harder; if the abstraction is not good, then find a better solution. But PLEASE don't start copying and pasting, and above all don't start promoting this abomination.

Source: I wrote a lot of copy-and-paste code when I was too young to know better, and I had to read too much copy-and-paste code after I stopped doing it.


Saying that bad abstractions are a bad thing and then copy and paste must be a good thing in some cases is simply a non sequitur.

What happens when you are under deadline and you understand just enough to make a bad abstraction, but not enough to make a good abstraction? So here, you seem to have an underlying assumption of timeless omniscience. (This is common in people attracted to absolutes and abstractions.) By all means, if you have the time to make the right abstraction and DRY, then please do that. However, it's just as fallacious to suppose that any DRY abstraction is necessarily the right thing to do. Again, it's all about contextual cost/benefit. (Most of the time, it is the right thing to do, but this certainly isn't absolute.)

To be more specific, what if you have an abstraction that lets you do DRY, but it's obtuse, it looks like it's not quite the right answer, and it would involve the modification of a library that is used by many other parts of the system for which you don't have tests? Are you certain that in all cases like this, the cost/benefit of not having to do the search in your code next week is going to be worth it?


Your case is not in favour of duplication; it is in favour of temporary duplication because of a deadline. So I think you agree with me that duplication is always bad, because close to a deadline you can justify writing whatever abomination, not only duplication. But if you leave that mess untouched, then sooner or later it will bite you in the ass.


My take: I try to write it DRY first. If I fail to come up with a good intuitive abstraction after half an hour, I just write it simple and copy code.

The next day or later I come back to the code and sometimes I find a neat abstraction then, which just needed more context. Sometimes I can get rid of the whole section too...

It all... Depends.


My rule of thumb is to wait to generalize something until it is repeated 3 times. If I try before that, I can't really know what the problem is. But if it's something that's repeated across projects, sure, I tackle it right away, but then maybe it should be open source?


Half an hour? When you're first writing the code, you should know immediately if you're going to need to repeat it and what the abstraction should be. If you don't, then don't waste your time. Later, if it turns out that you actually do need to repeat the code, it should be clear at that point what the abstraction should be.

Let the actual need for duplication guide you in finding abstractions, so you don't waste your time and produce difficult-to-follow code for no reason.


Like I wrote, I avoid producing difficult-to-follow code.

I tend to start structuring parts of applications and algorithms on paper, and this half hour with paper and pencil helps a lot.


I like the last part: let the code pull DRY out of you, don't push DRY in.


>Yes, this 1000 times! DRY or any pattern/principle is not some sort of universal truth/declaration of ultimate goodness

I think "write the least code", from which DRY is derived from is universal.

It's just in competition with a few other axiomatic principles - typically the rule of least power and loose coupling, which can suffer when DRY is maximized at the cost of everything else.

>We had a policy of only empirically verified DRY. We never wrote code to DRY in anticipation

Can't agree enough with this one. The worst kind of re-factoring is that done in anticipation of future code.


Great point about signaling. You should write a blog post or something about it.

I find that many so called best practices propagate because of how engineers want to be perceived. You want to be seen as the smart guy who reads the Gang of Four book so you shove OO patterns where they don't belong. Or you let others blindly follow dogma because you don't want to be seen as the dummy who just doesn't get it.


> Yes, you read that right. Duplicating code (a.k.a copy + paste) can be a good thing.

No, it is never a good thing, because the code you write is the code that has to be tested and managed. If a routine you're going to copy-paste 15 times has a bug, you might not give a damn, but the guy after you who has to clean up the mess might prefer that routine to live in only 1 place. The rule is simple: if some substantial logic is used in 2 different places, it has to be refactored into its own component, just like a variable has to be declared if a piece of data is used in 2 different places; that's basic CS. If one has a hard time figuring out what is what, then it's a naming and documentation problem, not a "DRY code" problem.


If you're still willing to use the word "never" then post again in five to ten years and tell me how you feel about it then.


10 years ago I had pretty much the same ideas about copy and paste that I expressed in my previous post. Maybe at the time I was much more extreme, given the codebase mentioned above on which I was working. I still don't see any argument that explains why copy and paste should be better than finding a better solution, apart from laziness, obviously.


Within the same project, and when you don't want to handle the coupling costs that can come with generalization.


> Generally, actual practice and experience is better at producing exactly the needed architecture than the prognosticating cleverness of programmers.


My own experience with DRY is that it has diminishing value in a distributed systems ecosystem. There is definitely still value, but code coupling between components of a distributed system is very costly. I'm pretty fed up with coordinated deployments, big bang upgrades, etc., so I'm very conservative about applying DRY. Often I'd rather copy a bit of code to avoid the dependencies, which usually pays off better in the long run.


Easily solved: When writing an abstraction, document with a date the reason for the abstraction.

Seriously, a comment that goes like this:

"2009 sept 20 : Peter Pan: this abstraction is to hid the ugliness of using SQL Server 2003 which is still in use by Slow Corp."

... now in 2016 a maintainer knows the assumptions and the reasons for the abstractions and can make a much clearer decision about how the abstraction should be treated.


Any time you repeat some block of code twice, you should religiously turn it into a function which is called twice with the same arguments.

Any time you call a function twice with the same arguments, you should religiously write a two-iteration loop around just one call.

Any time you have more than one two-iteration loop in the same program, you should write a 'dotwice' macro and use that instead.

Any time you have a function call sitting in a loop just for the sake of being repeated, you should avoid the argument expressions being evaluated multiple times; you must religiously evaluate the arguments to temporary variables outside of the loop, and in the looped function call, refer only to the temporaries.

In order not to repeat this pattern itself you need a proper "calltwice" macro, and to be working in Lisp, ideally: (calltwice yourfunc expr ...). yourfunc and each expr are evaluated once, then the resulting values are applied to the function twice.

This is a DRY town, damn it!


For a more visual explanation of the "cost of abstractions" watch this presentation by Cheng Lou (Facebook). https://www.youtube.com/watch?v=mVVNJKv9esE


This is an excellent talk which introduces some very solid concepts and breaks down the problem in a very sophisticated way. I hope more people watch this.

In fact, I think this talk is superior to the linked article. You should consider submitting this talk directly to HN.



This article needs to be shown to all upcoming architects, senior developers, and design-minded engineers. Too often the balance between abstraction and specialization isn't weighed explicitly; instead, dogmatic rules kick in.


Sometimes I think programmers and theologians have a lot in common. Both have dogmas and articles of faith which are backed by internally consistent logic. Now to the comments to hear the Catholics and Protestants debate!


DRY etc. should just be guidelines, not rules. I hate when something is made overly complex just to avoid repetition, global variables, long functions, or large files. For example, complicated one-liners in an imperative language.


Can we just ban coding posts that have no code? They usually give you the illusion that you learned something but are a total waste of time. It's like reading a self-help post on following your biggest dreams or something. The hard part isn't these big beautiful ideas; it's how to actually put them into practice. How do you actually take something away and measure whether abstractions or duplications are more costly in real code?


This seems like an anti-intellectual argument. For example, it would "ban" many of Dijkstra's famous numbered papers. I'm guessing you haven't read them.

While we're at it, why don't "we" just "ban":

- mathematical proofs without numbers

- all criticism, as it's about art but contains no art

- all papers in the field of music theory, since they aren't music


It's definitely anti-intellectual. But the people who run this site themselves say they're interested in a high signal-to-noise ratio.

I don't think those are good comparisons. Better example would be a post about how to write rigorous math proofs and then not giving a rigorous proof as an example.


>> Better example would be a post about how to write rigorous math proofs and then not giving a rigorous proof as an example.

Oh, you mean like some of Dijkstra's numbered papers? https://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/E...

I think "examples might help" is valid feedback, but the author has sort of provided a reason why examples aren't included: in small projects the abstractions are usually worth it, was my read. So while the author could talk about examples, he probably couldn't include then in a piece of this size.

"Can we ban" is what has rubbed me wrong way. Who is "we". How are "we" "banning"? Does this just mean you don't agree with this post being highly upvoted? Or that you want to circumvent the opinions of those who upvoted with a "ban"?

It is interesting also to think about the irony of an argument that one should work in terms of concrete examples of the subject matter, in objection to a piece that argues one should not always abstract things.


Something that I see more and more of lately is module exaggeration: projects that have tens of thousands of small files that all depend on each other and are wired together using modules. There are 5-10 imports, then a few lines of code that bind them together, and finally an export... The worst part is that this is done by respectable software companies and even taught in guides and tutorials.


The problems with the DRY principle usually show up in OO programming and the whole "framework" concept, where individual libraries or pieces of code cannot easily be reused across different projects.

Another big issue in the framework case is inversion of control, which further restricts how reuse can happen and makes it very hard to reuse the code differently in an efficient way.


When I was younger, I thought that the right abstraction - and more specifically, the right language feature - would surely lead me to higher productivity. By invoking a certain keyword, my code would magically be better in some way.

However, language features are not the only abstractions you have available. In fact, they're typically the least interesting ones you could apply to a problem - any yokel can lift code out and slap parameters, annotations, reflection, or generic type signatures on it. But to reach interesting abstractions, you have to be patient enough to discover the shape and flow of the code over time, and then apply exactly the right data structures and algorithms to either reproduce the same shapes in an abridged form, or produce different shapes that achieve the goal more efficiently.

So when I position myself as anti-DRY, which I often do now, I'm saying, don't succumb to the thinking that just shovelling things around and slightly repackaging them will add up to what you wanted. An actually useful abstraction cuts so deep that it may amount to a different thing altogether. The language features are there to solve very straightforward situations that are known to reoccur time and again. Do not use them in clever ways.


Frankly, without encapsulations inside of encapsulations, the author would never have been able to make this blog post. Consider TCP segments wrapped inside IP packets wrapped inside Ethernet frames as the basis of the entirety of the internet.

I don't think this is as "extreme" as some perceive it.


Off topic, but since somebody will always complain about the bad layout or the difficulty of reading a blog post submitted to HN, I'd like to point out the excellent readability of this one.


The stark blacks kind of burned my eyes, but I definitely agree it was readable.


One can always lower the brightness of their screen, but they can't easily change the contrast in the font/bg.

Besides, research has shown that plain ole black on white is the best combo for readability.


Figure out what varies => encapsulate that. Could be behavior, structure, or creation. This idea that you can avoid DRY in a positive way simply means one has yet to recognize that first statement.
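
In C terms, "encapsulate the behavior that varies" can be as small as a function pointer at the point of variation (a hedged sketch with hypothetical names):

    #include <stdio.h>

    /* The varying behavior (the pricing policy) is encapsulated behind one type. */
    typedef double (*discount_fn)(double amount);

    static double member_discount(double amount)  { return amount * 0.90; }
    static double holiday_discount(double amount) { return amount * 0.80; }

    /* The stable code never changes; only the policy passed in varies. */
    static void print_total(double amount, discount_fn discount) {
        printf("total: %.2f\n", discount(amount));
    }

    int main(void) {
        print_total(100.0, member_discount);
        print_total(100.0, holiday_discount);
        return 0;
    }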


I think the title should be "wrong abstractions." Wrongful means unfair/unlawful/unjust but it doesn't ever mean incorrect.


Article in a nutshell: Goodness = DRY×KISS


Thanks for the article. I agree with its main points.


Another thing that is difficult to get right is choosing a suitable font...


TL;DR Think before you type.


I would have taken it more seriously if he had addressed security and unit tests. His perspective here is sophomore. He is right, but not because of what he wrote. DRY isn't just about abstractions; it's about the cost of writing new code. And testing and security verification are hidden costs of duplicated code.


> His perspective here is sophomore.

An HN comment usually gets surprisingly better if you take the name-calling bits out of it. I'd say that's the case here.


I understand that it was a bit blunt, but isn't referring to a perspective as 'sophomore' just another way of saying it was short sighted or that it doesn't take the broader ramifications into account?


Yes, but to my ear it also includes a putdown, along the lines of 'juvenile' or 'childish'. That's why I referred to it as name-calling.



