I’ve usually heard this phenomenon called “incidental duplication,” and it’s something I find myself teaching junior engineers about quite often.
There are a lot of situations where 3-5 lines of many methods follow basically the same pattern, and it can be aggravating to look at. “Don’t repeat yourself!” Right?
So you try to extract that boilerplate into a method, and it’s fine until the very next change. Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences in the details.
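In code, that decay often looks something like this minimal Go sketch (all names here are hypothetical):

    package main

    import (
        "fmt"
        "html"
        "io"
        "os"
        "strings"
    )

    // Day 1: an innocent helper extracted from three near-identical call sites.
    func renderRow(w io.Writer, label, value string) {
        fmt.Fprintf(w, "%s: %s\n", label, value)
    }

    // A few change requests later (renamed V2 here so both versions compile
    // side by side): each flag exists for exactly one caller, but every
    // reader now has to consider all of them at every call site.
    func renderRowV2(w io.Writer, label, value string,
        uppercase, skipEmpty, htmlEscape bool, indent int) {
        if skipEmpty && value == "" {
            return
        }
        if uppercase {
            label = strings.ToUpper(label)
        }
        if htmlEscape {
            value = html.EscapeString(value)
        }
        fmt.Fprintf(w, "%s%s: %s\n", strings.Repeat(" ", indent), label, value)
    }

    func main() {
        renderRow(os.Stdout, "name", "anna")
        renderRowV2(os.Stdout, "name", "<anna>", true, false, true, 2)
    }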
I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change. Then one of two things is likely to happen:
(1) you find that the code doesn’t look so repetitive anymore,
or, (2) you hit a bug where you needed to make the same change to the boilerplate in six places and you missed one.
In scenario 1, you can sigh and say “yeah it turned out to be incidental duplication, it’s not bothering me anymore.” In scenario 2, it’s probably time for a careful refactoring to pull out the bits that have proven to be identical (and, importantly, must be identical across all of the instances of the code).
My CTO often asks me to implement a feature to do X and make it “generic enough to handle future use cases”. My answer is always the same - either give me at least three use cases now or I am going to make it work with this one use case. If we have another client that needs the feature in the future then we will revisit it.
Of course, there are some features where we know in advance, based on the industry, how to genericize them.
From (I think) an old Joshua Bloch talk on API design, paraphrased:
* If you generalise based on one example, you will get a flexible API that can handle only that example.
* If you generalise based on two examples, you will get a flexible API that can switch between those two examples.
* If you generalise based on three examples, you have a chance of abstracting over the common essence.
Hmm. I wonder if he got that from Simon while he (JB) was at CMU. He (HS) once said to me, jokingly, I think, “One makes an observation, two makes a generalization, three makes a proof”.
The Rule of 3 is a great rule, except when it isn't.
I had a colleague some time ago who wrote a couple of data importers for FAA airspace boundaries. There were two data feeds we cared about, "class airspace" and "special use airspace". These airspace feeds have nearly identical formats, with altitudes, detailed boundary definitions, and such. There are a few minor differences between the two, for example different instructions for when a special use airspace may be active. But they are about 95% the same.
The developer wrote completely separate data definitions and code for the two. The data definitions mostly looked the same and used the same names for corresponding fields. And the code was also nearly the same between the two (in fact exactly the same for most of it).
It was clear that one importer was written first, and then the code and data structures were copied and pasted and updated in minor ways to create the second.
Because the data structures were unique for each (even if they looked very similar in the source code), this impacted all the downstream code that used this data. If you saw a field called FAA_AIRSPACE_MIN_ALTITUDE, you had to be sure not to confuse the class airspace vs. special use airspace versions, because each of these had a field of the same name, the compiler wouldn't catch you if you used the wrong one, and you might have the wrong offset into the data structure.
I asked the developer and they told me, "I have this philosophy that says when you have only two of something, it's better to just copy and paste the code, but of course when you get to the third one you want to start to think about refactoring and combining them."
Yep, the Rule of 3.
In this case there were only ever going to be two. And the two were nearly the same with only minor differences.
But because of blind obedience to the Rule of 3, there were many thousands of lines of code duplicated, both in the importer and in its downstream clients.
I still like the Rule of 3 as a general principle, and I have espoused it myself. But I think it is best applied in cases where the similarities are less clear, where it seems that there may be something that could be refactored and combined, but it's not yet clear what the similarities are.
I think it is a very bad rule in a case like this, where it should be obvious from the beginning that there are many more similarities than differences.
Every single rule or piece of advice in programming is good until it isn't. OOP is good until it isn't, functional programming is good until it isn't, premature optimization is the root of all evil until it is the root of all good.
For some reason, humans have this deep need to try and boil things down to bulleted lists, which in the domain of programming are just incredibly not useful.
This is because programming is an art. It has fundamental components (objects, functions, templates, iterators, etc) like in visual design (point, line, shape, value, etc).
I think engineers should read the old and the new C++ books and master that language to know the evolution of all these paradigms and how to use them. There’s so much wisdom in the “Effective C++” series, the Gang of Four, and “C++ Templates: The Complete Guide”, to name a few.
The problem is that in this “start-up culture” of banging things out and getting them working, the art is left behind. Just like many other arts.
I was part of an uncredited team interviewed by Meyers for "Effective Modern C++". Always thought it was somewhat ironic. At the firm (which declined acknowledgments because they shun media attention) we weren't even using a C++11 compiler on either Linux or Windows at the time. Yet I recognize at least two of the patterns as ones I was at least a partial contributor to.
Myself, I lost track of C++ standards changes with C++17, and I've not been using C++ for the last several years.
I still love the power and speed, but right now I'm doing more ETL work, and Python is a better and more productive language for that.
I think that you have a point, but I often find myself citing guidelines or rules when I am evaluating code decisions or questioning code design. Maybe it depends on your interpretation of the phrases; some sayings should be followed religiously, but others applied with discretion.
The rule of 3 usually is in reference to small scoped abstractions, not whole modules or subsystems. We're talking about extracting a short function, not significant and potentially thorny chunks of code.
But I guess no one explicitly spells this out, so I could see where someone could become confused.
Premature abstraction. The Rule of 3 helps, but I found a better principle for it:
Any abstraction MUST be designed to be as close as possible to language-primitive-like. Language primitives are reliable, predictable, and non-breaking. If they do break, they don't affect business logic written on top of them. If parameters are added, defaults are provided. They don't just abstract; they enable developers to express business logic more elegantly.
The challenge is to pick the part of the code abstractable to be primitive-like first and make it a top priority.
This is why language features like Rust async, Go channel-based communication, ES6, and C++ smart pointers were such a hype in their time and are still used today. It also applies to enabling tools such as React, tokio, wasm-bindgen, express, TypeScript, and jQuery (even the last, which is not really a thing anymore).
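One concrete way to honor "if parameters are added, defaults are provided" in Go is the functional-options pattern. A minimal sketch (the retry package and every name in it are hypothetical):

    package retry

    import "time"

    type config struct {
        attempts int
        delay    time.Duration
    }

    // Option mutates the config. New knobs can be added later without
    // breaking a single existing caller, because every knob has a default.
    type Option func(*config)

    func Attempts(n int) Option        { return func(c *config) { c.attempts = n } }
    func Delay(d time.Duration) Option { return func(c *config) { c.delay = d } }

    // Do calls fn until it succeeds or the attempt budget runs out.
    func Do(fn func() error, opts ...Option) error {
        cfg := config{attempts: 3, delay: 100 * time.Millisecond} // defaults
        for _, opt := range opts {
            opt(&cfg)
        }
        var err error
        for i := 0; i < cfg.attempts; i++ {
            if err = fn(); err == nil {
                return nil
            }
            time.Sleep(cfg.delay)
        }
        return err
    }

Existing business logic keeps calling retry.Do(ping); a later caller can write retry.Do(ping, retry.Attempts(5)) without anyone else changing. That's the "non-breaking, defaults provided" property in miniature.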
I find abstractable parts in code by thinking about potential names for it. If there is no good name for a potential common function, it's probably not a good idea to extract the section into a function. Maybe it can be extracted into two separate functions with good names?
> The rule of 3 usually is in reference to small scoped abstractions, not whole modules or subsystems. We're talking about extracting a short function, not significant and potentially thorny chunks of code.
I would say it’s for entire features sometimes. For instance, we are a B2B company. We have features on the roadmap or a feature might be suggested by a client. Either way, you hit the jackpot if you can get one client to pay for a feature that doesn’t exist in your product that you can then sell to other clients.
The problem is that you don’t know whether the feature is generally useful to the market. But you think it might be; in that case, you build the feature in a way that is useful to the paying client and try not to do obvious things that are client-specific. But until you have other clients, you don’t know.
That's been my experience doing embedded stuff. Customers will tell you they want a feature. You'll implement it then find out they don't need it enough to devote resources to utilize it on their end. So then it just lingers in the code base. And never gets real world testing. Lately I've been pulling unused features and consigning the code to the land of misfit toys. Just so I don't have to maintain them.
If that’s the case and they were willing to fully pay for the feature, at professional services + markup hours, it wasn’t a loss. In our case, we still got unofficial professional services revenue and recurring subscription revenue, and now we have a “referenceable client”.
If it’s behind a per client feature flag, it doesn’t cause any harm.
> I asked the developer and they told me, "I have this philosophy that says when you have only two of something, it's better to just copy and paste the code, but of course when you get to the third one you want to start to think about refactoring and combining them."
I would argue this is a good rule of thumb, but nothing should ever be a hard unbreakable rule. The usual refactoring rules should apply too: if 95% of the code is the same, you should probably go back and refactor even without a third use case because it sounds like the two use cases are really just one use case with slight variation.
If the compiler didn’t catch it, doesn’t that say it was modeled incorrectly?
Why not have an IAirSpace interface or an abstract AirSpace class with two specializations? If there were processes that could handle either it should take an AirSpace class, one that could only handle one or the other took the specialization.
If the steps were the same for handling both, have step1...step(n) defined in the concrete class and have a coordinating “service” that just calls the steps and takes in an IAirSpace.
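Roughly this shape, sketched in Go (all names hypothetical; struct embedding stands in for the abstract base class):

    package airspace

    import "fmt"

    // Airspace is the one shape downstream code ever sees.
    type Airspace interface {
        ParseBoundaries(raw []byte) error
        ParseAltitudes(raw []byte) error
        ParseActivation(raw []byte) error // the ~5% that differs
    }

    // base holds the ~95% that is identical: boundaries, altitudes, etc.
    type base struct{ /* shared fields */ }

    func (b *base) ParseBoundaries(raw []byte) error { return nil } // shared, written once
    func (b *base) ParseAltitudes(raw []byte) error  { return nil } // shared, written once

    type ClassAirspace struct{ base }

    func (a *ClassAirspace) ParseActivation(raw []byte) error { return nil } // class rules

    type SpecialUseAirspace struct{ base }

    func (a *SpecialUseAirspace) ParseActivation(raw []byte) error { return nil } // special-use rules

    // Import is the coordinating "service": it just calls the steps and
    // works for either kind, so nothing downstream is duplicated.
    func Import(a Airspace, raw []byte) error {
        for _, step := range []func([]byte) error{
            a.ParseBoundaries, a.ParseAltitudes, a.ParseActivation,
        } {
            if err := step(raw); err != nil {
                return fmt.Errorf("import: %w", err)
            }
        }
        return nil
    }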
To be honest I don't get it. What language was this in, that you couldn't duck type the data or make the data structures inherit from some shared interface that encapsulated the commonalities?
> The Rule of 3 is a great rule, except when it isn't.
"Rules are for the guidance of wise men and the obedience of fools."
It's unfortunate that you had to deal with a fool, but that's not an indictment of the particular rule that they picked to follow off a proverbial cliff.
Thanks for your reply. I just want to note that what appears to be a quote in your comment ("Rules are for the guidance of wise men and the obedience of fools") is not something I wrote at all. If you still have time to edit the comment, I might suggest changing that so it doesn't look like you quoted me.
> But because of blind obedience to the Rule of 3, there were many thousands of lines of code duplicated, both in the importer and in its downstream clients.
Well sure, blind obedience to anything at all can cause problems.
Are you sure merging code for different data feeds would be better, though? In such cases, what is identical and what is not should be cross-referenced in comments. But you don't know beforehand which approach would be better, unless you know the data feeds will stay the same as now.
The sad story here is that if you know the data feeds will stay pretty static, there's little to gain from making an advanced abstraction over them. Which is why you often find duplicated code that hasn't been touched for years: the original target was met with a naive approach, and the absence of new changes leads to a stale codebase.
If you have a 95% match on something nontrivial (and it likely won't diverge significantly), I'd go for merging even with 2 cases. At least merge most of the common parts.
Reading a couple of ifs and some not-quite-duplicate procedures seems much better than having a complete duplicate set spread across cross-referenced files.
Why are you reading a couple of ifs instead of having the two similar things represented by separate classes with shared functionality in a common class? Or even if you prefer composition to inheritance you could still make it work cleaner without a bunch of if statements.
No, you also need to ask not just whether they're similar, but whether they are the way they are for the same reason, and are thus likely to change together. It doesn't matter if there are 10, if they all might change independently in arbitrary ways.
"DRY" as coined in The Pragmatic Programmer spoke in terms of pieces of knowledge.
If you are combining code just because it looks similar, you're "Huffman coding."
Every time I have seen a feature that was written general to handle possible future uses, after a year of sitting unused there will certainly be modifications or surrounding code that doesn't support the planned extensibility. So it can never be used without a lot of work changing things anyway.
Future coding has led to some of the most overcomplicated systems I've worked with. It's one of the reasons (among many) I quit my last job. I was constantly told code that had no use cases was "important to have" because "we needed it".
So does that same idea apply to all of the many abstractions that geeks build just to stay vendor- or cloud-agnostic, just in case AWS/Azure go out of business one day?
On the other hand, I worked on a multi-million line codebase that was deeply joined to oracle’s db, with a team who all really wanted to move away from it but couldn’t because in the beginning (a decade earlier) the choice had been made to not put in “unnecessary” abstractions.
It’s not about the abstractions. In the case of Oracle or any other database, if you’re only using standard SQL and not taking advantage of any Oracle specific features, why are you spending six figures a year using it?
The same can be said about your cloud provider. If you’re just using it for a bunch of VMs and not taking advantage of any of the “proprietary features” what’s the purpose? You’re spending more money than just using a colo on resources and you’re not saving any money on reducing staff or moving faster.
You’re always locked into your infrastructure decisions once you are at any scale. In the case of AWS for instance (only because that’s what I’m familiar with), even if you just used it to host VMs, you still have your network infrastructure (subnets, security groups, NACLs), user permissions, your hybrid network setup (site-to-site and client-to-site VPNs), your data, etc.
In either case, it’s going to be a months long project triggering project management, migrations, regression tests, and still you have risks of regressions.
All of the abstractions and “repository patterns” are not going to make your transition effort seamless. Not to mention your company has spent over a decade building competencies in the peculiarities of Oracle that would be different than MySql.
After a decade, no one used a single stored procedure or trigger that would be Oracle specific? Dependencies on your infrastructure always creep in.
Yes. There's no point abstracting over a vendor API if you're not actually using an alternative implementation (even for testing). Otherwise, keep your code simple, and pull out an abstraction if and when you actually have a use case for doing so.
Vendor agnostic code doesn't anticipate AWS going out of business, just them raising prices significantly. It can be smart to be able to switch in a reasonable amount of time so that you can evaluate the market every few years. This way spending extra time to be vendor agnostic can also pay off. But there's no technical reason for that, it's a cost issue.
It’s often noted that in almost 15 years of existence, AWS has never increased prices on any service.
What are the chances that AWS will increase prices enough to make all of the cost in developer time and complexity in “abstracting your code” and the cost in project management, development, regression tests, risks, etc make it worthwhile to migrate?
The cost of one fully allocated developer+ qa+ project manager + the time taken by your network team + your auditors, etc and you’re already at $1 million.
Do you also make sure that you can migrate from all of the other dozen or so dependencies that any large company has - O365? Exchange? Your HR/Payroll/Time tracking system (Workday)? Windows? Sql Server? SalesForce? Your enterprise project management system? Your travel reimbursement system (Concur), your messaging system? Your IDP (Active Directory/Okta)?
DRYing existing code to handle exactly what is needed to support the current uses with no additional axes of variation isn't adding functionality (well, if it handles more than the existing values in the existing axes of variation, it's adding some but arguably de minimis functionality.)
Building code with support for currently-unused variation to support expected future similar-but-not-identical needs, that is, pre-emptively DRYing future code, is adding functionality.
On one project we ended up with a series of features that fell into groups of threes, and we kept trying to make the 2nd feature in the series generic, and time after time #3 was a rewrite and significant rework of #2. So any extra time spent on #2 was time wasted.
I think that what Burlesona is suggesting is more nuanced (and effective) than the rule of three. We can easily imagine situations where code used twice warrants a refactor, and situations where code used three times does not.
The rule of three suffers the same problem as the pre-emptive refactor - it is totally context insensitive. The spirit of the rule is good, but the arbitrary threshold is not.
Similarly, 99% of your comment is bang-on! My only gripe is the numeral 3. But pithy rules tend to become dogma - particularly with junior engineers - so it's best to explain your philosophy of knowing when to abstract in a more in-depth way.
> My only gripe is the numeral 3. But pithy rules tend to become dogma
Agree, 3 is a pretty arbitrary number. If you have a function that needs to work on an array that you know will certainly be of size 2, it takes minimal effort and will probably be worthwhile to make sure it works on lengths greater than 2.
But the bigger point is valid: you need examples to know what to expect, and if you make an abstraction early without valid examples, you'll almost certainly fail to consider something, and also likely consider a number of things that will never occur.
> make it “generic enough to handle future use cases”.
The answer to this is usually YAGNI.
That is, don’t plan for a future you might never have. Code in a way that won’t back you into a corner, but you don’t know what the future’s cases might be (or if there even will be any), so you can’t possibly design in a generic way to handle them. Often you just end up with over-engineered generic-ness that doesn’t actually handle the future cases when they crop up. Better to wait until they do come up to refactor or redesign.
Some people argue to design in a way that lets you rewrite and replace parts of the system easily instead.
It's not so much that you're preparing for 7 separate cases, as a thing that works in one place and almost, but not quite, fits in 6 others. You rarely exactly duplicate code, but often duplicate and modify.
By the time you hit 7, you do clearly Need It. But now you've got 7 cases to work from in writing the generalization. When the number is 2, it's often reasonable to say, "I don't know how these will evolve and I'll probably guess wrong".
Yes, I agree. That’s not what I was replying to, though. I noted in another comment that I consider merging worthwhile even in the absence of three use cases, certainly if what you have now is very similar.
If all seven are the same except for one or two cases, it doesn’t mean that you have to have a bunch of if statements, you either use inheritance or composition to create special cases and judiciously apply “pull members up” and “push members down” refactoring, interfaces, abstract classes, virtual methods, etc. These are all solved problems.
Yes I know about the whole “a square is not a rectangle problem”.
Oliver Steele describes "Instance First Development", which the language he designed, OpenLaszlo, supported through the "Instance Substitution Principle". I've written about it here before, and here are some links and excerpts.
In the right context, prototypes can enable Instance-First Development, which is a very powerful technique that allows you to quickly and iteratively develop working code, while delaying and avoiding abstraction until it's actually needed, when the abstraction requirements are better understood and informed from experience with working code.
That approach results in fewer unnecessary and more useful abstractions, because they follow the contours and requirements of the actual working code, instead of trying to predict and dictate and over-engineer it before it even works.
Instance-First Development works well for user interface programming, because so many buttons and widgets and control panels are one-off specialized objects, each with their own small snippets of special purpose code, methods, constraints, bindings and event handlers, so it's not necessary to make separate (and myriad) trivial classes for each one.
Oliver Steele describes Instance-First Development as supported by OpenLaszlo here:
[...] The mantle of constraint based programming (but not Instance First Development) has recently been taken up by the "Reactive Programming" craze (which is great, but would be better with a more homoiconic language that supported Instance First Development and the Instance Substitution Principle, which are different but complementary features with a lot of synergy). The term "Reactive Programming" describes a popular old idea: what spreadsheets had been doing for decades. [...]
Oliver Steele (one of the architects of OpenLaszlo, and a great Lisp programmer) describes how OpenLaszlo supports "instance first development" and "rethinking MVC":
[...] I've used OpenLaszlo a lot, and I will testify that the "instance first" technique that Oliver describes is great fun, works very well, and it's perfect for the kind of exploratory / productizing programming I like to do. (Like tacking against the wind, first exploring by creating instances, then refactoring into reusable building block classes, then exploring further with those...)
OpenLaszlo's declarative syntax, prototype based object system, xml data binding and constraints support that directly and make it easy.
OpenLaszlo's declarative syntax and compiler directly support instance first development (with a prototype based object system) and constraints (built on top of events and delegates -- the compiler parses the constraint expressions and automatically wires up dependences), in a way that is hard to express elegantly in less dynamic, reflective languages. (Of course it was straightforward for Garnet to do with Common Lisp macros!)
>The equivalence between the two programs above supports a development strategy I call instance-first development. In instance-first development, one implements functionality for a single instance, and then refactors the instance into a class that supports multiple instances.
>[...] In defining the semantics of LZX class definitions, I found the following principle useful:
>Instance substitution principle: An instance of a class can be replaced by the definition of the instance, without changing the program semantics.
In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.
That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).
The syntax for creating a bunch of objects is parallel to the syntax of declaring a class that creates the same objects.
So you can start by just creating a bunch of stuff in "instance space", then later on as you see the need, easily and incrementally convert only the parts of it you want to reuse and abstract into classes.
> In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.
> That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).
This gives me nightmares of the over-engineered XML programming that is infamous in the Java community. You lose all of the benefits of static type checking.
Generic enough code that can be adapted to future cases is code with clean architecture that follows standard OO principles.
On the other hand, trying to handle all the hypothetical cases because "that makes the code generic and future-proof" is usually a complete waste of time.
My view is to develop the simplest, well architected OO code that can handle the use cases at hand.
After witnessing the collateral damage of many software reuse projects (anyone remember "components"?), I came up with a different ruleset, useful for "compromising" with "software architects":
First, translate the data.
Second, divine a common format and share the data.
Third, create the libraries for this common format, to be reused amongst projects.
I have never reached #3 in my professional career. Sure, we wrote the libraries. But other teams and projects never adopted them before the whole effort became moot.
So I kept my projects intact and moving forward, while letting mgmt think they're doing something useful.
> The moral of this story? Don't get trapped by the sunk cost fallacy. If you find yourself passing parameters and adding conditional paths through shared code, the abstraction is incorrect. It may have been right to begin with, but that day has passed. Once an abstraction is proved wrong the best strategy is to re-introduce duplication and let it show you what's right. Although it occasionally makes sense to accumulate a few conditionals to gain insight into what's going on, you'll suffer less pain if you abandon the wrong abstraction sooner rather than later.
Sandi Metz's blog (and book) are an absolute gold mine. I'm a junior developer (<5 years), and the material gave me an entirely new outlook on design (object-oriented and otherwise)... and also showed me how un-maintainably most development is done nowadays :(
If there was a required reading list for professional developers, I would put her work on with zero hesitation, I feel it to be that important.
> You call devs Junior until they have over 5 years experience? Wow, that’s harsh
Less than 3-4 years is absolutely junior dev. Up until around 7 or 8 years is mid-level. Passing beyond that would be senior dev and the other titles then follow.
With the obvious note that it's not strictly time bound - someone could easily get stuck at mid level for much longer if they aren't progressing. It's pretty hard to still be only junior after 7+ years
I still very much consider myself junior but someone I know with less experience than me just recently got a job with "senior" in the title. I have seen senior job postings that say something like "3+ years of experience."
I absolutely do not consider myself senior and I don't think I will for at least seven more years.
One of the big reasons I realized I still _was_ a junior developer was because of Sandi Metz's content showing me how long of a journey I still have to go :-P
I don't personally think it's strictly about the time one's been programming, although in my experience that can be a good benchmark for e.g. how good one's abstraction, API design, etc. skills are.
> don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
This ^^^.
I'm in my mid-50s now and have worked as a software dev since my teens. I've learned over time that certain lumps of code need to be left alone for a while before jumping in and aggressively refactoring the perceived "duplication". I've been guilty of this kinda thing before: spot a wodge of code that looks like a duplication, only to find out later you hit some "special cases" as the project progresses, and suddenly, as the article points out, your "helper" method balloons into a ball of spaghetti.
As an apropos to the article, and touched upon therein, checking in a fairly major change to de-duplicate some code without consulting the original author/team is a wee bit rude. Ask your colleagues first why such code still exists before barging in and making these changes, they may already have some concerns as to why refactoring to a helper method or some abstraction isn't in their game plan yet. It's a common courtesy.
> As an apropos to the article, and touched upon therein, checking in a fairly major change to de-duplicate some code without consulting the original author/team is a wee bit rude.
This sounds like the outcome of bad culture. Ownership of the code should be shared to the point where it should never be considered rude to improve the code. Any part of the code.
> Ask your colleagues first why such code still exists before barging in and making these changes,
If I had to synchronise with others all of my improvements to existing code (which was frequently written in a hurry to meet a deadline, so with shortcuts taken intentionally, or with incomplete knowledge of future use cases) I would get at most half as much done.
> they may already have some concerns as to why refactoring to a helper method or some abstraction isn't in their game plan yet.
If there are alluring "improvements" that don't work for such subtle reasons, this should be documented in the code. If it's not, one has only oneself to blame when someone goes in and changes it.
Edit: I realise now that I'm talking about teams where everyone is reasonably senior. It could be different with junior members on the team, to which many changes might look like improvements when a senior engineer would at most see the change as equivalent. In that case I think you're right, but for a different reason: junior engineers should always check in with senior engineers about things in order to learn more of the craft!
For 90% of shops outside the cornucopia of SV with unlimited budgets and internal customers only, the client drives decision mercilessly and ruthlessly.
And for a good reason. The product doesn't exist because they're fun to develop. They exist because they solve a problem customers have. So ultimately, decisions should always be based on what customers (long-term) benefit most from.
One of the areas where I really like "incidental duplication" is in tests.
Tests can sometimes be very repetitive and identical, and it's tempting to want to refactor it in some clever way. That's almost never good.
On top of the reasons laid out in parent comment, tests also function as unofficial documentation. I like having everything explicit in there, it makes them easier to read and understand.
If you try to abstract away tests, you often just end up re-implementing the same abstractions used in the actual code, and you can end up not catching unfounded assumptions that your abstraction is making in both the tests and the code. There is a scope for having test helpers / utils to make tests easier to write, but you should be minimalist with these.
I agree, but some test helpers are reasonable. It's about balance and readability of the tests in question. Otherwise integration tests end up being a hundred lines of code.
I'm not sure I agree. I like to move all setup code to helper methods so that my tests are just a few lines
// Given <setup>
// When <thing happens>
// Expect <thing> to be <such>
This allows the reader to easily see which workflows are actually tested. If the reader is interested in implementations the utility code is one click away and usually only needs to be looked at once to be completely understood. The test bodies themselves however have many flavors for many work-flows so getting rid of the repetition is critical to highlight the specific nature of individual tests.
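For example, in Go (Order and its methods are made-up stand-ins so the sketch is self-contained):

    package order

    import "testing"

    type Order struct{ total, paid int }

    func (o *Order) Pay() error    { o.paid = o.total; return nil }
    func (o *Order) Refund() error { o.paid = 0; return nil }
    func (o *Order) Balance() int  { return o.paid }

    // newPaidOrder hides the noisy construction details, once.
    func newPaidOrder(t *testing.T, total int) *Order {
        t.Helper()
        o := &Order{total: total}
        if err := o.Pay(); err != nil {
            t.Fatalf("setup: %v", err)
        }
        return o
    }

    func TestRefundClearsBalance(t *testing.T) {
        // Given a paid order
        o := newPaidOrder(t, 100)
        // When it is refunded
        if err := o.Refund(); err != nil {
            t.Fatalf("refund: %v", err)
        }
        // Expect the balance to be zero
        if got := o.Balance(); got != 0 {
            t.Errorf("balance = %d, want 0", got)
        }
    }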
That’s why I make the test fail before writing the code. If the code is already written, then I break it in the minimal way to test the test, and then fix it.
IME things never fail in the way you expect them to... You can build a fortification where you think you are weak only to find the enemy is already in the castle.
A green test does not equal bug-free code. There may be a misimplementation / misunderstanding of the spec, or your code passes beautifully only under the right conditions, with just this data setup.
That's a test for your test, so why only run it once transiently instead of running every time? "Mutant" testing helps with this. It's basically fuzzing your test code to make sure that every line is meaningful.
I can see how that would be useful, but I also think it's a matter of priorities.
I'm basically saying I rarely have bugs in my tests because I verify them first. In fact I can't think of a single bug in my tests over the last 4 years (or even 10 years), but I can think of dozens of bugs in my code.
For example here are some pretty exhaustive tests I've written for shell, which have exposed dozens of bugs in bash and other shells (and my own shell Oil):
I would rather spend time using the tests to improve the code than improving the tests themselves. But I don't doubt that technique could be useful for some projects (likely very old and mature ones)
ITYM "I rarely have bugs in my tests that I'm aware of". The amount of tests I've seen that look like they are working properly, have been "verified" and are buggy is huge. Usually we find they were buggy because someone moves some other tests around or changes functionality that should have broken the tests, but didn't.
Please, don't ever assume that your tests are beyond reproach just because you verified them. Tests are software as well and are as prone to bugs as anything.
And how do you do this in practice? I am struggling to think of a good way to keep the production code that fails the test and the production code that doesn't fail the test together. I might have my test check out an old version of the production code, compile it and test against that. But that is hard to get right.
As a preventative measure, I write some tests for my tests. Also in TDD style of course. And on a very rare occasion, I have to write a test for those tests as well.
I do actually do that. I'll write some buggy code in order to learn how to test for it.
TDD for me is primarily a way to guide myself toward accomplishing a goal. So I sometimes write way more tests for myself than the business needs. I will then delete the scaffolding tests before I tag my PR for review.
Tests have fewer bugs if you write them before the system under test, and if they don't have mocks, and if you have enough of them that anything you get wrong the first time will get noticed by the results of the many similar tests.
> Tests have fewer bugs ... if they don't have mocks
100x this. I've repeatedly failed to convince my team members that mocks are unnecessary in most cases. I've reviewed code with mocks for classes like BigDecimal and built-in arrays. This is especially prevalent in Java teams/codebases.
> Tests rarely have bugs, I find, so generally dry isn’t critical
DRY is important for tests, but the areas where it is have probably (unless you're writing a testing framework, or using a language without one that covers your use case) already been covered by your testing framework.
I have 2 rules I use when determining whether to duplicate code or to refactor:
1. How many duplications are there? If the code is duplicated once, that's fine. If it's duplicated twice (so 3 instances of it), then it's time to consider refactoring, subject to the next rule.
2. Why is the code duplicated? If it's "incidental duplication", i.e. code that happens to look the same, don't refactor. Only refactor if there's actually a reason why the code is the same. Which is to say, attempt to predict the future: if a change is made to one instance of the code, do I expect it to be replicated to the other instances too?
It's also useful to look at not just the duplication but the code itself. In this case, it was code for geometry, which is not likely to change all too much.
Often the difference between harmless-but-ugly duplication and duplication that is actually harmful comes down to the semantics of the code, not just its appearance.
It wasn't code for geometry; it was code for manipulating geometric shapes via the UI. The article specifically mentions that custom behavior was eventually needed:
> we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
It is, which is why you want to be conservative. If the duplicated code is obviously supposed to be identical, then predicting the future should be trivial. If it's not obvious, then it's a question of "can I conceive of a reason why I'd want to update one and not the others?". And if the answer is unclear, wait a while and see if anything comes up.
This. I wait until the feature is mature, and has been used by actual people. There's no point trying to clean up code that hasn't finished evolving, or that I don't understand fully.
It also means I have a tidy stack of refactoring to do when I'm bored or need a quick motivational win :)
And in the other direction: when adding flags etc. to a method that was created to reduce duplication, consider whether splitting it into two "duplicate" copies, or copying it to the call site requiring the flag, isn't the better alternative.
Or, if possible, split up the helper into smaller units of functionality that can be combined as appropriate for different requirements. That could possibly work for the original article's problem, too.
Refactoring duplicated code is a trade off: on the one hand you create abstraction and centralizing code logic at the cost of, first, the overhead of learning that abstraction, and second, by increasing coupling between functions. I've personally found that the coupling is by far the most important factor to consider. If A depends on B, and C is found in both A and B, then you should factor out C. If A and B share C because they are adjacent, without being fundamentally intertwined, then duplicate C, but consider creating a library or module that makes it easy to talk about things that are similar to that duplicated code, C. (I tend to think about modules/libraries as little, independent DSLs)
Another thing you can do when a function becomes overencumbered is to split the remaining similar logic into smaller functions which are composed into specialized functions.
This has benefit in that when analyzing modules, you can spot differences in procedure at a glance instead of needing to dig through 100 lines of somewhat similar imperative code.
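As a sketch (hypothetical names): instead of one process(rows, trim, dedupe) helper that grows a flag per caller, expose the steps and compose them at the call site:

    package main

    import (
        "fmt"
        "strings"
    )

    func trimAll(in []string) []string {
        out := make([]string, 0, len(in))
        for _, s := range in {
            out = append(out, strings.TrimSpace(s))
        }
        return out
    }

    func dedupe(in []string) []string {
        seen := make(map[string]bool)
        out := make([]string, 0, len(in))
        for _, s := range in {
            if !seen[s] {
                seen[s] = true
                out = append(out, s)
            }
        }
        return out
    }

    func main() {
        rows := []string{" a ", "b", "a"}
        // The pipeline is visible at a glance; another module can use
        // trimAll alone without inheriting a dedupe flag it never wanted.
        fmt.Println(dedupe(trimAll(rows))) // [a b]
    }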
Right. There are two kinds of such refactoring / DRY-ing up the code:
1) Specialized helper sub-functions/classes to make the codeblocks DRY.
2) Functions/classes to make separate features DRY.
Problem arises when the design or understanding of the code doesn't reflect the realities of changes over time, forcing you to restart/revert, or making spaghetti code with optionals and whatnot to accommodate the rising complexity.
The agile approach would be to make the code that is easiest to change either way, and prevent being locked in to only one approach.
Go error handling is a good example of this. So far all the attempts to reduce the repetitive `if err != nil { ... }` through some abstraction failed. Look at https://github.com/golang/go/issues/32825
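For readers who don't write Go, this is the pattern in question; it recurs after nearly every fallible call (readConfig is a made-up example):

    import (
        "io"
        "os"
    )

    func readConfig(path string) ([]byte, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err
        }
        defer f.Close()

        data, err := io.ReadAll(f)
        if err != nil {
            return nil, err
        }
        return data, nil
    }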
In the language maybe but both Rust and Zig show that it's possible to have much less 'bloat' for error handling even without using exceptions.
I'd say that Go's designers still have work to do: Zig especially shows that you can be a 'simple' language and yet have both sane error handling and generics.
I thought all of this until I got used to Go's error handling.
There's a couple aspects to this:
1. After a while, the "if err != nil {" becomes a single statement in your mind, and you only notice it if it's different (like trapping things that should error with "if err == nil {"). In other words, it only feels verbose if you're not used to it. After a while, the regular rhythm of "statement, error check, statement, error check" becomes the routine pattern of your code and it looks weird if you don't check for errors (which is as it should be).
2. The point of Go's error handling is that it isn't magic. There's nothing special about error values; they are handled exactly the same way as every other variable in the system. The only thing the language defines about errors is that they have a method called Error that returns a string. That's it. This means that you can create complex error handlers if you need them, entirely within the standard language. This is extremely powerful (there's a small sketch below).
The Go team's examination of the language's error handling is interesting because it seems there's a conflict between newer Gophers who don't like the verbosity of it (but don't realise the power it brings) and the older Gophers who are used to the verbosity and appreciate the power. Almost exactly like TFA. The repetition looks ugly if you don't appreciate the reasons for it.
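To make point 2 concrete, a small sketch (NotFoundError and lookup are invented for illustration): an error is just a value implementing the one-method error interface, inspected with ordinary control flow.

    package main

    import (
        "errors"
        "fmt"
    )

    // NotFoundError is a plain struct that happens to implement error.
    type NotFoundError struct{ Key string }

    func (e *NotFoundError) Error() string {
        return fmt.Sprintf("key %q not found", e.Key)
    }

    func lookup(key string) (string, error) {
        return "", &NotFoundError{Key: key}
    }

    func main() {
        _, err := lookup("answer")
        var nf *NotFoundError
        if errors.As(err, &nf) { // no special machinery, just values
            fmt.Println("missing key:", nf.Key)
        }
    }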
"It isn't magic!" mantra is often heard in Go apologetics, but every time I see it, it occurs to me that Go's definition of "magic" is somewhat akin to a 15th century peasant seeing a lightbulb. Stuff like exceptions or error types isn't magic - they have been around for a long time, they're well understood, and they have significant advantages.
I kinda prefer this to the Rust apologetics, where every post about Go is replied to with three posts about how Rust does it so much better ;)
I've used lots of other languages, as have a lot of other Gophers. I'm not saying "Go's approach is good" out of some strange tribalism or a need to assert my preference. I'm saying this because, having spent over 35 years programming, I really appreciate the simplicity of Go and the lack of magic. I'm not apologising for Go's simplicity, I'm trying to explain why I like it.
Exceptions are pretty much the perfect example of (bad) magic actually; unannotated, dynamically dispatched nonlocal control flow that can execute code (ie, destructors) in completely unrelated contexts on the way past. At least 0x5F3759DF can be boxed into a function and commented in one place.
> Go's definition of "magic" is somewhat akin to a 15th century peasant seeing a lightbulb.
Except exceptions are rarely understood and used correctly by most programmers. They can simplify program structure, but at the expense of proper error handling and error mitigation strategies.
Golang is still in the sort of niche that builds databases, queues, container orchestration, etc., but it can be used for other things given enough care spent simplifying the solutions.
> Except exceptions are rarely understood and used correctly by most programmers
That's pretty condescending.
The mechanism for exceptions has been around for more than 20 years, it is well understood by most programmers.
The problem is that error handling is hard.
Exceptions are an adequately sophisticated solution to that hard problem. Go's approach only encourages ignoring errors (since the compiler never enforces that you handle them) and boilerplate.
Nope! Exceptions are, in fact, rarely understood and used correctly by most programmers. Including loopz. And me. And the authors of approximately every nontrivially-exception-using piece of code I've had to work with. And presumably also of the code loopz has had to work with.
The difference is that some of us have the good sense to rarely use exceptions at all.
Is it really? Are programmers so omniscient that they can trap all kinds of exceptions correctly from external code? It's a sophisticated method that dumps the problem on the user instead.
Golang also outputs stack traces and even supports panic() if one wants something similar to handling exceptions. The difference is that this is used for classes of errors that are ideally programmer error, and not for all kinds of business-logic states. I'm not saying the Go Way is perfect either, but it's at least a small step toward acknowledging the difference, rather than defaulting to dumping random programmer errors on unsuspecting users.
Error handling is easier when you improve the design. The problem is that this takes time, thinking, and effort.
Go didn't discover anything. Java's runtime and checked exceptions are already the direct consequence of errors being of two kinds: recoverable and non-recoverable.
Exceptions are basically error types with dynamic typing in an otherwise statically typed language. So if you can deal with dynamic typing in Python, you can handle exceptions in Java.
The main issue with any discussion of exceptions is the elephant in the room, Java. Java has the worst exception model, mixing weird typechecking rules with error handling that doesn't force you to recover from the exception.
I really like the exception model of Erlang, where recovery is only possible from another routine. It is, in my opinion, the best exception model.
Go code is nice because everything is fully explicit, but it's hard to read: trying to follow the control flow when half of the statements are error-recovery code is just very unpleasant. And I still don't know how to express that an error is the result of another error without forgetting the context.
This is kinda my point: it's hard to read until you get used to it. I can totally see why someone used to Python has a hard time with Go's error handling, because they're not used to the rhythm of it (logical statement, error check, logical statement, error check, logical statement, error check); they're more used to (exception handling setup, logical statement, logical statement, logical statement, exception handling completion). It's different, and therefore strange and weird. But after a while you get used to it, and expect to see an error check after each logical statement, and it sort of merges into one structure in your mind.
There are packages for wrapping errors, and I believe some form of error wrapping (using the fmt package for some reason) is being adopted. More than that is up to the coder to implement.
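The stdlib form is fmt.Errorf's %w verb (Go 1.13+), with errors.Is/errors.As to inspect the chain; it's also one answer to the "error caused by another error" question above:

    package main

    import (
        "errors"
        "fmt"
        "os"
    )

    func loadUser(path string) error {
        _, err := os.ReadFile(path)
        if err != nil {
            // %w chains the cause onto the new error.
            return fmt.Errorf("load user from %s: %w", path, err)
        }
        return nil
    }

    func main() {
        err := loadUser("/no/such/file")
        fmt.Println(err)                            // context first, cause appended
        fmt.Println(errors.Is(err, os.ErrNotExist)) // true: the cause survives wrapping
    }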
> Java has a worst model of exception mixing weird typechecking rules + error handling not forcing to recover the exception.
Unless you have a specific complaint about Java's model (which I'd love to read), I strongly suspect that your beef is with a few standard Java library functions misusing checked exceptions than a statement against exceptions in general.
The combination of runtime and checked exceptions offers the most complete solution to the difficult problem of handling errors, with the compiler guaranteeing that error paths are always handled.
The big problem with Java's model is that exceptions aren't part of the type system: they're that whole separate thing that is applied to methods, but it's not a part of that method's type in any meaningful sense. This means that it's impossible to write a higher-order function along the lines of "map" or "filter" that would properly propagate exceptions, because there's no way to capture the throws-clause of the predicate and surface it on the HOF itself.
Sounds nice, and of course it is possible to build solutions with exceptions that do recover all errors elegantly and cleanly. However, the correct judge on this would be your own users. Given enough care, the discussion becomes rather philosophical.
Though having to confront errors up through the call stack makes one review where handling would be most prudent, in real life the time pressures are just too strong, making such efforts largely unrewarded.
> The point of Go's error handling is that it isn't magic. There's nothing special about error values, and they are handled exactly the same way as every other variable in the system
The error maybe, but not the result of the call. The multiple-value return x, err is not a first-class value. It cannot be handled like any other variable.
This was demonstrated very clearly with proposal for try. try would have automatically returned with err when err != nil. But what if you wanted to change the error, say create an error message? Then try was completely useless. In Rust, where the result actually is just a regular value, you can transform the error however you like just like any other value and try is just as useful as before.
Sorry, I don't understand what you're saying. Are you saying that because there's a proposal in v2 for error values to not be 1st class values, therefore they're not in v1?
I think it’s that in Go multiple return values aren’t a first class value. It’s just two separate values. Whereas in Rust or Haskell they’d be a single, first-class Result<a> (or whatever) value.
> 1. After a while, the "if err != nil {" becomes a single statement in your mind, and you only notice it if it's different (like trapping things that should error with "if err == nil {"). In other words, it only feels verbose if you're not used to it.
The whole point of programming is to abstract away repetitive work. Yes, a human will spot that the pattern is the same, but this is both fallible and a waste of human effort. And even if you can see the difference, those extra characters are still filling up your lines and making it hard to keep functions on a single screen where you can comprehend them more easily.
> 2. The point of Go's error handling is that it isn't magic. There's nothing special about error values, and they are handled exactly the same way as every other variable in the system. The only thing the language defines about errors is that they have a method called Error that returns a string. That's it. This means that you can create complex error handlers if you need it, entirely within the standard language. This is extremely powerful.
There's nothing magic about something like https://fsharpforfunandprofit.com/rop/ either. Just plain old functions and values written in the language (that talk literally gives the definitions of all the types and functions it uses, written in plain old F# code). You need functions as values, but everyone agrees that's a good idea anyway, and you need proper sum types, but you need those anyway if you're ever going to be able to model a domain properly.
> The point of Go's error handling is that it isn't magic.
The problem is first, that sum types are also not magic. There is nothing special about the error type or value in a `Either<T, Err>`. Go's type system is just too crappy to make such things, or make good use of them even after you tried to shove them into an interface{}.
The second problem is that Go's error values, like every error handling system that pretends it doesn't need sum types, have picked up more magic (%w) or impacted the usability of other interfaces (context.Err, separate error channels) bit by bit.
The downside I see with go's error handling is that you can forget to check. With rust, if the function being called returns Result, you have to deal with the error (even if dealing with it just means propagating it out). Missing error handling is such a common source of bugs that go really turns me off here.
Specifically for this issue, linters also have many false positives. Some Go libraries trying to encourage a fluent style will accept an error for some logic and also return it, so you can `return x.HandleError(err)` - but if you don't want to return it, you obviously don't care that it returns what you just passed it. (I personally consider fluent methods a bad idiom in Go, but I also don't get to write all the Go code in the world or even in my project.)
There are also a lot of functions that return errors because Go's type system demands that if the interface returns two values `T, error`, every implementation must also - it won't auto-create a nil value for the second result. That's reasonable if you are committed to errors just being normal values. But such a restriction would not be necessary if the interface could be declared with a sum type - promotion of a `T` to an `Either<T, ...>` or `Just T` or so on would be fine for all types, not just error handling. Lots of infallible Writer implementations like bytes.Buffer and hash.Hasher suffer from this, and linters can't be aware of all such cases.
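For example, straight from the standard library: bytes.Buffer.Write is documented to always return a nil error, and hash.Hash writes never fail, yet both must carry io.Writer's error return, and every caller has to decide whether to check it.

    package main

    import (
        "bytes"
        "fmt"
        "hash/crc32"
    )

    func main() {
        var buf bytes.Buffer
        n, err := buf.Write([]byte("hello")) // err is always nil by contract
        fmt.Println(n, err)                  // 5 <nil>

        h := crc32.NewIEEE()
        n, err = h.Write([]byte("hello")) // hash.Hash: "It never returns an error."
        fmt.Println(n, err)               // 5 <nil>
    }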
Sure, but given a choice, I'd rather work with primitives that are correct-by-construction, not correct-if-I-use-an-extra-tool-and-actually-act-on-its-advice.
If you do use a linter and have it set up so linter issues are fatal to the build, then you run into the issue that if the linter throws false positives, you have to add exclusion rules (if the linter even supports that) or downgrade linter issues back to warnings, and lose the benefit entirely.
One aspect of go error handling that still really bothers me is how easy it is to accidentally drop an error on the floor.
If you strap all the linters you can find onto your build system, you can catch the most obvious cases. But I still frequently find mishandled errors that sneak past the linters in ways that can't be solved without restricting the way devs use the language.
By making error handling something you have to deal with every time you call a function, you massively increase the number of opportunities you have for screwing it up.
I would love something like Rust's ? operator for Go. You could choose not to use it when you need special handling; those cases would be rare and would stand out, so developers would treat them with care.
I don't really think Go error handling is a good example. A nil test paired with a return cannot be extracted to a function. A macro could perhaps centralize the logic, but otherwise it's no more amenable to deduplication than plain addition. Moreover, it's completely trivial, if verbose in the Go language.
The case where it becomes interesting is when there are four or five statements that are repeated. If it's two statements, especially if the statements are both control flow, one of which is tied to the frame of the executing function, and the other is a trivial nil test, that falls firmly on the "not refactorable" side of the line.
> Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about
In which case, you should split the helper function (extract the sub-part common to all cases, and express the differences where the helper function is called).
I think I would most of the time go with de-duplicating as early as possible, as long as the helper function only has few parameters and can be described in plain english rather easily.
To me the cost of having to refactor the helper function later in the process is less than dealing with duplicated code.
Duplicated code causes many issues. You mentioned introducing bugs, but it also makes the code harder to read. Every person who reads the code has to make the de-duplication effort mentally (check that the duplicated parts are indeed duplicated, figure out what they do, and on what parameters they differ...).
Premature optimization. Duplicated code is only evil when there's a bug you only fix in one place and forget about the duplicates; in almost every other case, it's easier to reason about and is more resilient in the face of local changes.
Abstraction is often like compression, and compressed data is easier to corrupt. Change the implementation of an abstraction, and you put all consumers of it at risk. It's not an absolute good.
Consider a block of 10 lines of code repeated 4 times, identical except for a couple of parameters. The person who reads the code has to 1. figure out what the code does, and 2. check whether the duplicated parts differ in some subtle way.
The alternative is to replace the duplicated parts with a function that has a meaningful name. This makes the code easier to read. It's not a premature optimization. It would make sense to keep it duplicated if you're in an explorative phase and not sure yet what the final design will be, but I wouldn't submit a patch where parts are obviously similar. I'm not sure it would pass review.
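For illustration, a minimal sketch of that alternative in Go - all names are hypothetical; the point is that four copy-pasted blocks differing only in two values collapse into one function with a meaningful name:

    package main

    import "fmt"

    type User struct {
        Name  string
        Email string
    }

    // notify stands in for the extracted block; subject and greeting
    // are the two parameters the copies differed in.
    func notify(u User, subject, greeting string) {
        fmt.Printf("To: %s\nSubject: %s\n%s, %s!\n\n",
            u.Email, subject, greeting, u.Name)
    }

    func main() {
        u := User{Name: "Ada", Email: "ada@example.com"}
        notify(u, "Welcome", "Hello")
        notify(u, "Password reset", "Hi")
    }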
It really depends. Test setup I'm inclined to leave alone; duplication is often easier to reason about when a test breaks. For code with control-flow changes in the middle, abstraction is honestly dubious. For mere value differences, maybe - but if the values are complex and not merely scalar, maybe not. The more edge cases an abstraction needs to cover - especially control-flow differences rather than mere value differences - the more duplication it takes to justify it.
I generally find it pretty easy to reason about code structured like:
    switch object.(type) {
    case Type1:
        // (bunch of code)
    case Type2:
        // (bunch of code)
    case Type3:
        // (bunch of code)
    // etc...
    }
Even if the function is long it's pretty easy to skip over the irrelevant parts.
Where you get in trouble is when you discover a bug (or a changed requirement) in something that's duplicated several times and you have to remember to hit all of the copies. The last one especially seems to be missed the most often.
Overall the tradeoff is generally worth it though, because you only need to care about one case at a time.
> Even if the function is long it's pretty easy to skip over the irrelevant parts.
> Overall the tradeoff is generally worth it though, because you only need to care about one case at a time.
Which part is irrelevant? As a programmer, I don't generally know which value `object` has, so if I need to understand the whole statement, I need to look at every case - which often means checking whether the cases are identical or subtly different.
Duplicate code like this is a well-known source of bugs, one of the cases most often highlighted by static analysis tools.
It kind of depends what you're doing, but a lot of the time you are in a situation like "there's a bug when you do X", and when you find code like this you scroll through the options until you get to the one labeled as function X.
Inside of there you may run into stuff that was set up earlier in the function, but it's pretty easy to backtrack to the top of the switch statement to see what was done before then.
There's nothing quite like the joy of being asked to fix something and discovering that it's a big old top down project without excessive abstraction/compartmentalization. Just skim through the code in order till you get to the part that's misbehaving and start investigating. No need to jump around a dozen factory template engines or custom generic template wrappers to find the 3 lines of business logic that have an off-by-one error or something.
I am such a huge fan of Big 'Ol Switch Statements over using polymorphism/types/generics/whatever. So easy to understand, and it's all there in a huge scrolling list of cases. If something needs to change, it's easy to change it, and you know what else is affected.
That works until you have 20 different Big Ol' Switch Statements, each switching on (mostly) the same cases, essentially implementing a set of related behaviors in 20 different places instead of grouping the 20 behaviors under the same umbrella.
Overall, I think there is an equilibrium between the number of cases in the switch and the number of different switches over mostly the same cases. The fewer cases you have, and the more times you handle the same cases, the more it will help to group these different behaviors in separate places: each case would correspond to a different class, and each switch statement to a different method on each class. The fewer and larger the classes, the more I think it helps to apply this transformation.
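A sketch of that transformation in Go, with hypothetical types: two switches over the same cases become one interface with two methods, and each case becomes its own type.

    package main

    import (
        "fmt"
        "math"
    )

    type Shape interface {
        Area() float64      // was: switch #1 over the cases
        Perimeter() float64 // was: switch #2 over the same cases
    }

    type Circle struct{ R float64 }

    func (c Circle) Area() float64      { return math.Pi * c.R * c.R }
    func (c Circle) Perimeter() float64 { return 2 * math.Pi * c.R }

    type Square struct{ S float64 }

    func (s Square) Area() float64      { return s.S * s.S }
    func (s Square) Perimeter() float64 { return 4 * s.S }

    func main() {
        for _, sh := range []Shape{Circle{R: 1}, Square{S: 2}} {
            fmt.Println(sh.Area(), sh.Perimeter())
        }
    }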
By the way, this has nothing to do with the discussion above. The alternative to the switch that GP commenter presented isn't polymorphism, it is simply extracting the common lines into a separate function.
> Duplicated code causes many issues. You mentioned introducing bugs, but it also makes the code harder to read
That is, I believe, contestable. Yes, it can end up easier to read, but there is a big tradeoff: when you remove code from its context, it is much harder for a reader to reason about it. Once something is extracted into a function you can't see, you have to mentally ask questions like "could this return null / None?", "how will it behave if the input is negative?", "does it throw exceptions?", "is it thread safe?", etc. All of this is directly observable when the code is inline, not so when it has been moved to a central place.
Ideally, these questions should be answered by the function name, type, and its documentation. And if not, one can always jump to the function definition and the above elements (missing if the piece of code is inlined) will give additional hints to what the code does.
Upvoted. We'd probably come up with different implementations - I strongly prefer composition and "interpreter" style code - but whenever I've seen casual mutations scattered across different dataflows, it was because of one-off changes made while ignoring the whole.
I think about it in terms of bugs. If your abstraction causes a bug, I have to go in and work out wtf your wonky abstraction is doing and also risk breaking other cases. If there are 6 duplications and there is a bug because one of them is missing a change applied to all of them, that takes 5 mins to fix and risks breaking nothing.
When you make an abstraction, think not only "Will this create bugs?" but also "If this abstraction does create bugs, will they be easy to identify and fix?"
Extracting a common function was the right move in your example.
The bad move was adding options to the common function instead of changing one caller to call a new, different method.
An even worse move was adding more and more options to the originally simple method. Just because something was rightfully extracted to a common place 10 months ago does not mean it has to remain in a common place. And the need for a split does not mean the original extraction was wrong.
As with everything, it often becomes a big grey area on what is acceptable and what is not.
Example: my (fictional) company sells a B2B platform which provides companies with an online marketplace or some other type of online application. Each installation requires different, though often similar, integrations with the customer back-ends - think Postgres vs MySQL, but some customers may also use TotallyEffingObscureDB, which doesn’t even speak SQL. Those backends are usually the source of truth for the entire stack, storing user data, etc, the local data is just a mirror.
So, given this scenario, how should we approach user registration processes? Is that a product component or custom (not DRY) code? What about all the other (similar, common, and basic) features of our platform that every customer wants but invariably needs to be implemented slightly differently?
I’m only posing these questions because I worked in a situation where something similar happened, and it wasn’t dealt with well, and I’m wondering what other HN coders have to say...
Ironically, we're handling this with the concept called "clean code".
We do have a core which implements the base logic. Everything in there is domain-driven, but only using DTOs and providing interfaces for input and output, using repositories and presenters.
When the data source changes, we only need to add the new repositories and set those within the context.
If we have to implement some specialized logic just for one client, we add this as a plug-in on top, so it is handled outside of the core logic.
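For readers who want something concrete, a minimal sketch of the shape being described - every name here is hypothetical, and real repositories would of course talk to Postgres, MySQL, or TotallyEffingObscureDB rather than a slice:

    package main

    import "fmt"

    type User struct{ Email string }

    // UserRepository is the seam: the core depends only on this
    // interface, so a new customer back-end only needs a new
    // implementation, not a change to the core.
    type UserRepository interface {
        Save(u User) error
    }

    // Register is core logic; it knows nothing about any back-end.
    func Register(repo UserRepository, email string) error {
        return repo.Save(User{Email: email})
    }

    // One repository per back-end, added without touching the core.
    type memoryRepo struct{ users []User }

    func (m *memoryRepo) Save(u User) error {
        m.users = append(m.users, u)
        return nil
    }

    func main() {
        repo := &memoryRepo{}
        if err := Register(repo, "a@example.com"); err != nil {
            fmt.Println("registration failed:", err)
        }
        fmt.Println("users:", len(repo.users)) // users: 1
    }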
Of course all of this is very abstract, and since I've worked mostly with simple MVC concepts until now, I'm still struggling to get my head around this approach, but so far it's working very well.
It may look overcomplicated at first, but behind it is a logic that is very strict while at the same time very flexible, which has so far handled all the "creative" ideas our clients have had, while still remaining maintainable.
It is a long process to get to this method of programming, but my initial scepticism has completely changed to "why haven't I worked like this before?".
That’s pretty much the approach I would suggest and think works best. Thanks for your reply.
Looking back, I think the place I worked had a lot of issues with how code was actually stored and maintained - different repositories for each client and each feature, for example, meant a lot of common code was simply copy/pasted when reused, and that obviously made sharing bug fixes much more difficult or even impossible. Real dirty stuff. The solution isn't all that complicated, but when you're on a services team with several hundred clients and hundreds of issues in your backlog, management focuses less on paving the way for the future and more on getting things done immediately. A few of us did implement a single code-base / configurable framework for one big feature, but it was hard to get buy-in and convince people to use it - even though it reduced the workload from days or weeks to hours. The concepts from that were eventually re-packaged and sold by a more creative manager as an "SDK", but I didn't have the privilege of working on that team.
Interesting. I'm curious how long your core system with the base logic took to reach that maturity, and how many people were involved? What kind of development model did you use?
My first job at a bigco had a very heavy read-the-code culture, and being able to read code and write readable code is one of the most valuable skills I learned there. The lack of this skill is one of the things I've found most frustrating about working with less talented or, now that I'm a bit later in my career, more junior engineers. There's a tendency to glom onto black-and-white, superficial rules without understanding them, instead of appreciating the principles underlying them and having a sense of how to balance trade-offs. This creates an unfortunate cycle: everyone writes unreadable code, so nobody practices reading code, so nobody internalizes what readable code looks like, and everyone continues to write bad code.
I tend to have a strong reaction to duplicated code, but DRY is risky if whatever you're pulling out isn't logically concise and coherent[1]. Some of the helper functions I've seen in code reviews (as well as the one in the OP) strike me as written by unsophisticated language AIs, catching textual similarities that aren't semantically linked and pulling them out.
The engineers I've mentored over the years, including ones starting with no eng experience, go on to write fantastic code (and no, this isn't just my opinion). But it's a very labor-intensive, hands-on process of thorough reviews and feedback until they internalize the underlying principles, and it can be tough to swing in very short-term-oriented company environments. Now that I'm running larger teams, I've been noodling over how to encapsulate this deeper understanding of what makes good code in a way that scales better. But the fact that Google et al haven't already done this makes me think it's not something you can teach with eg a bullet point list.
> I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
(Note: this isn't a case of what I describe in the earlier part of my comment, as I don't think this is a superficial, black-and-white rule)
I disagree pretty strongly here, especially since your #2 involves waiting until you hit a bug. IME, there are many cases in which a solid understanding of the code allows pulling repeated code out in a way that's principled enough that it's more likely to adapt well to future changes. Indeed, this is true of most changelists to some degree, or you'd end up with no functions or classes other than main().
[1] A good rule of thumb is whether the free function has a reasonably readable name that captures its logic, without having to abuse handleFoo() or processBar().
It's important for developers to understand that programming is at least as much about expressing yourself clearly with language, as it is about maths and compsci views of functions. Language that's more verbose, but also adds clarity, is a good thing.
If I had an instruction book on building a cabinet, it wouldn't help to re-list every screw, tool, and their sizes on every single step if I could put a parts list at the front. But it also wouldn't help to collapse every matching group of steps with one or two different parameters together.
Can't be specific without knowing the exact helper function, but I find sitting down with a cuppa (away from the computer) and planning an interface (if it's C++) that accepts, say, a policy class with a default and a callable object ("Functor") very helpful.
Sometimes it can't be done, but it's better to have a bug-free building block (macros don't count!) that can accept buggy user-defined tasks than lots of repetition.
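A rough Go analogue of that C++ policy-class-plus-functor idea (all names hypothetical): the building block is fixed and testable, and the possibly buggy user-defined task is injected as a function value.

    package main

    import "fmt"

    // RetryPolicy plays the role of the policy class.
    type RetryPolicy struct {
        Attempts int
    }

    // DefaultPolicy plays the role of the default policy.
    var DefaultPolicy = RetryPolicy{Attempts: 3}

    // Run is the bug-free building block; task is the user-defined,
    // possibly buggy callable.
    func Run(p RetryPolicy, task func() error) error {
        var err error
        for i := 0; i < p.Attempts; i++ {
            if err = task(); err == nil {
                return nil
            }
        }
        return err
    }

    func main() {
        calls := 0
        err := Run(DefaultPolicy, func() error {
            calls++
            if calls < 2 {
                return fmt.Errorf("transient failure")
            }
            return nil
        })
        fmt.Println(calls, err) // 2 <nil>
    }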
The best way I’ve had this explained is that refactoring != DRY. Some methods have a similar structure, but if the problem they’re solving isn’t fundamentally similar then it’s not truly refactoring and you’re setting yourself up for more work in the future.
> don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first.
I have a similar rule of thumb: when you do a thing the first time, just get it working. When you do a similar thing a second time, just get it working. When you do a thing a third time, pull it together as a cohesive abstraction.
This is a silly generalization, but the point is that you generally need to see a handful of examples of a thing before you can make any useful abstraction. What's more common is people creating abstractions during the first iteration, leading to features that are never used, and others that were never considered.
And the abstractions are not abstracting over the correct things, so they are actually a net negative: one has to reason about both the abstraction AND the problem domain.
I find it helpful to work with ADTs of values, lists and maps. No OO, just functions for selection and projection. The majority of programming is figuring out the nuances of the domain and getting something working. Code is actually an impediment to that.
I think that people often make these choices based on latent social and work-related contextual factors as much as there is any misunderstanding or blind adherence to the wrong set of guidelines. There are human factors as well that go into people defensively citing guidelines for their decisions after a criticism comes up in code review. It's somewhat likely that the choice of a premature deduplication wasn't deeply considered. The programmer just did it because they felt like it made sense, and when questioned about it, they claim whatever "rule" exists to justify it.
There's a strong pressure in work environments to get shit done, get it through review with as little fuss as possible, and move on to the next thing. That's work after all. Choosing to leave something in a state that's guaranteed to need attention later can be viewed as lazy. If you at least make an attempt to do something in a clean way, it will be perceived as "work" even if it turns out to be the non-optimal choice.
I find for myself, at least, that I'm far more willing to let things evolve in my personal projects and to discover the right abstractions over time than I am in my work projects. I don't know if that's true for everyone though. My personal projects are always about the learning process. Maybe for a lot of people that's not the case? The focus is more on the product than the process?
I write the first version of every side project in a language I don't know or don't know well, and rewrite the actual version I'm going to use in a language I do know well. So I'm really happy to let things evolve; the more I know the domain in one of my main languages, the more time I spend in the new language, because I want to distance myself from what I think I know, examine it fresh when I come back to it, and see if what I think I know is actually still (or was ever) valid.
Different people have different teams and motivations both at work and at home, so while I agree with you that taking a wait-and-see approach is often a really good idea, there are often human factors involved in these decisions where the same person will make very different choices depending on their environments and motivations.
> I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
... alright buddy, I’m going to need you to remove your spyware on my computer!
I just did this last week with a CBOR message interpreter in C. I did it the long messy way, saw a bunch of repetition, “cleaned” it, changed the way the parser is called, changed some of the validator that runs after it... then did that entire process two more times as requirements changed.
Loved the "incidental duplication" term. I will def start using it from now on.
But to add to this, I want to say that the moment you find yourself writing a "helper" function is the moment you must realize that you didn't understand the problem quite well enough.
As a rule of thumb, and something I always try to pass on to my peers at work: think in terms of your domain and the models in it.
- What models does this helper function act upon, or on whose behalf does it act?
- Does it make more sense to create a model and put that function in it?
- Should that function be in one specific model, or in more than one?
I am working on a large project and I also realized this. I worked really hard to not repeat myself. I tried to reuse as much as possible so as not to create extra code. Then I quickly realized the things were NOT EXACTLY the same: the methods to handle them were similar, but not the same. So I had to refactor a lot of other helper methods.
I agree with the approach you are describing. It would be easier to refactor later, as one would probably have a better understanding of the application in the wild.
I wonder if we could have IDE features that make #2 less likely. E.g. marking some lines of code as similar to lines of code elsewhere, so that if you change one place, the IDE reminds you about the others.
JetBrains has something like this in their tools. It pointed me in the right direction a couple of times when touching code that is similar to something elsewhere.
Relying on your IDE to maintain program semantics is terrible. The IDE should help you write good code that you commit, not be a crutch for writing bad code.
People in the thread are arguing that it's not necessarily bad code though. Also, the point of the IDE would be to help you just not forget to change it.
I am coming to the conclusion that code reuse is oversold as an ideal in universities. Often the desire to create reusable code means that we create overly complicated code with less duplication.
Another approach, with a statically compiled language and a good IDE, might be to reduce duplication whenever you see it and inline it back when you feel the abstraction is no longer useful.
From my experience, I would start with a more compact implementation first in hopes that it would save me time. This usually ends with me having to do more complicated operations in my head to plan the code. It's tempting. It's like trying to do math in your head to save time - often leads to a mistake. Now I believe there's no shame in trying out a solution first and rewriting it later when I get a feel what the problem is really about.
John Carmack once said something like "If you're not sure which version is better, code it both ways and compare".
The only downside of later optimization is that in a commercial environment they often don't let you clean up.
> The only downside of later optimization is that in a commercial environment they often don't let you clean up.
Exactly (speaking for custom industrial automation projects). The follow-up requirements for a project might come in 5 or 7 years, or never. You would be cleaning up an 80% dead project. In these environments, it's not uncommon to no longer have a dev environment, because the system is a singular testbench that costs upwards of 2 million and only exists in 2 factories, so all bugfixes must be extremely conservative and any refactoring is strictly forbidden because there's no way to locally test for regressions.
Reusing a function like this is not clean code. It violates several principles.
1. Functions should have as few parameters as possible and almost never have flag parameters. This is a basic thing and costs very little to follow. As soon as you want to add a flag to a function, you need to make a new function (see the sketch after this list).
2. Minimize coupling.
3. Single responsibility principle. A unit of code should have one reason to change.
Of course in order to follow principles 2 and 3 here you may well need to consider the business logic.
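To make rule 1 concrete, a tiny hypothetical sketch: rather than one render(d Document, asDraft bool), split the flagged behavior into two functions, each with a single reason to change.

    package main

    import "fmt"

    type Document struct{ Title string }

    // Each variant gets its own function; no flag parameter needed.
    func render(d Document) string {
        return d.Title
    }

    func renderDraft(d Document) string {
        return "[DRAFT] " + d.Title
    }

    func main() {
        d := Document{Title: "Q3 Report"}
        fmt.Println(render(d))      // Q3 Report
        fmt.Println(renderDraft(d)) // [DRAFT] Q3 Report
    }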
One big way I prevent this from happening is to treat classes as interfaces to data structures and keep everything that isn't about accessing the data elsewhere. Conversions to other data types go somewhere else. In fact I don't want my data types depending on any other data types at all.
When doing this any of this repetition or evolution can stay out of the data structures themselves so that they can be reused without irrelevant baggage.
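A small Go sketch of that separation (hypothetical names): the data type knows only its own fields, and the conversion lives in a separate function, so the type carries no baggage when reused.

    package main

    import (
        "fmt"
        "strconv"
    )

    // User is a pure data structure: no conversions, no dependencies
    // on other data types.
    type User struct {
        Name string
        Age  int
    }

    // The CSV conversion lives outside the type, so User stays
    // reusable without irrelevant baggage.
    func userToCSVRow(u User) []string {
        return []string{u.Name, strconv.Itoa(u.Age)}
    }

    func main() {
        fmt.Println(userToCSVRow(User{Name: "Ada", Age: 36}))
    }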
Have you not simply abandoned OOP at that point? A core point of OOP is that objects manage their own state, and provide an interface for accessing/mutating it.
If classes are only used as data structures, and everything is done through (presumably pure) utility methods, it sounds like you're writing procedural code in an OOP language.
That's not inherently a bad thing, but OOP provides benefits and you may be making a trade-off without thinking about it.
I try to worry much more about what works rather than labels.
If you stuff all your functionality and data transformations into class definitions, your class definitions will be full of dependencies. Now all the fundamental elements of your program depend on each other and can't be separated.
It's like trying to pile up concrete until it forms a cave instead of laying it down to make the floor and building on top of it.
> Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input. Even if it was an improvement (which I don’t believe anymore), this is a terrible way to go about it. A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
I totally disagree one million percent! If you are on my team and you want to rewrite code I wrote (cause it sux) then do it! Don't ask, just DO IT! DO IT NOW! Have a blast, tear it apart, rewrite all my shitty abstractions and see if you can do it better. If the result is better code, then awesome! I learn something and the software gets better. If the result is worse code, then awesome! I'll tell you why, you'll learn something and your improved understanding will allow you to write better code.
I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code. Having a false sense of ownership towards that code will just cause grief for that engineer and friction for his or her team.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code.
While this is _technically correct_, it isn't how humans work. Humans attach their worth to the things they do, even when they shouldn't. It's a difficult thing for most people to avoid, so if you write some code, commit it, and then later see a teammate completely rewriting it, it can come off as either:
1. You spent a lot of time and hard work on code that you kinda identify with (I mean, it's code you wrote - it's your "art"), and someone coming in and rewriting it without even asking you can make it feel like they think you're an idiot,
or maybe even worse
2. They didn't know the reason you wrote it that way (maybe it was in support of future changes?) and now they just screwed it all up.
Maybe you haven't run into either of those situations. If you haven't, great! I spent the first part of my career hitting the first one because of how hard it can be to disassociate with the work you produce and the second one I see happening occasionally. It's a breakdown in communication within a team.
That’s the behavior of a junior engineer though. It’s a sign of an inferiority complex where any change feels like some kind of judgement on your ability to write it “correctly” the first time.
You are not your code, the code you wrote is not representative of you. Nobody who rewrites your code thinks that either.
You’re on a team working to build something larger than yourself. You’re not a bunch of painters sitting with your own easels in a room hoping to tape all of your paintings together in the end.
The best engineering teams I’ve ever worked on were groups of people that had no issues at all modifying each other’s code because we all trusted each other to do the right thing. The worst were the ones where each module had one “owner” who handled all of the changes to their module and “nobody else’s”.
Same here. And my own code is rewritten frequently as business needs change or better factorings are discovered when we hit the rule of 3, as mentioned in this thread.
It's nothing personal; we all get paid and go home. There are a lot of bruised egos in this thread that would be better dealt with in therapy, I think.
The fix for this is to have a review system in place; you can then catch blatant code repetition before it goes in, and try to use your soft skills to guide the author toward a different solution such that they think it was their idea all along.
If you're the one who created it and who touches that part of the code more often than others, then there is a notion of ownership: you don't want to accept changes that will drive it into a dead end and slow you down. That's how code review emerges. It's the same as the pull-request system in open-source development.
I have a very different view to this. As a disclaimer: I am working in a smaller team where most code is non-trivial, may be it is different in very large teams on large code bases.
This isn't a question of ownership in the sense of private property. It is about how you interact, and about respect and professionalism. First of all, a larger change to code a teammate wrote has a ring of saying: it was badly written. So this is partly a matter of social interaction. If the code is badly written, it needs to be fixed, but one should be reasonably polite about it. Usually, the better way of doing this is having a quick chat with that person where you explain why and how you think the code should be changed, before you just do it - and also to pass on any insight you have about the code that the creator missed.
On the other side, there is the big elephant in the room: unless we are talking about rather simple code, the code perhaps was written the way it was for a reason. Often enough this isn't apparent at first glance. So a refactoring without checking with the author always bears the risk that you introduce a bug, or at least get in the way of planned changes which would be incompatible with the refactoring.
So there are many reasons to quickly check with the author before making significant changes. If the proposed changes are good, the author will be happy for you to make them; if not, then it's better to have talked about them first. And in any case, it is good to have communication about major changes.
> First of all, a larger change to code a teammate wrote has a ring of saying: it was badly written.
Not necessarily. In actively maintained projects, code changes all the time in some areas, and if you are doing proper commits, a piece of code very often does not magically appear in a perfect state but is improved in a series of changes.
If you can improve the code you wrote, others can too.
Code quality is not a binary thing. Moreover, if code quality actually has been improved, you should be happy with it.
Now how it is done is also important, but neither more nor less than any other human interaction.
But that is exactly my point: there are not only a technical but also social considerations. And yes, in the end a code quality improvement - if the refactoring actually is - should make one happy. But still it is the decent thing to check with the author first, if that is possible.
Several years ago, I wrote a big chunk of code to analyze engineering data from an engine test cell, and display a graph of the results. Someone else had done the hard part; I was merely coding up a gloriously-complex Excel spreadsheet in C++. I grabbed data from a MySQL database, and labeled the row data like: row[combustion_air_mass_flow] + row[fuel_mass_flow] * row[specific_gravity_of_diesel]. (Or whatever; it's been awhile. You get the idea.)
I had HUNDREDS of lines of calculations, and each variable was very clearly understood so that you could trace the whole process. The code was running and producing replicated results from the spreadsheet. We started trusting it to process new test runs, instead of copy-pasting into Excel.
The next morning, I came in to find the "other" programmer had stripped EVERY variable reference, and replaced them with the column numbers. There was absolutely NOTHING in the code that could help you understand that "combustion_air_mass_flow" was column, say, 54.
I turned to him and asked him what happened, and he said it was inconsistent with the rest of the code base. I was literally gobsmacked. I racked my brain for a response. In the awkward pauses, he admitted that my code was better, but he couldn't bring himself to use it, because that would mean that he would have to go back and recode every other place that worked like that, and there were many.
He was the guy responsible for most of the system; I was just writing this part because I'm an engineer who codes, and could understand the actual science going on. In the end, I re-replaced my equations with my previous code, and wrote another couple hundred lines defining column number to engineering variable, to "translate" his column numbers to something that made sense.
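A hedged reconstruction of that "translation" layer - column 54 is the one mentioned in the story; the other numbers and names are made up for illustration. Name the raw offsets once, and every formula reads as engineering rather than indexing:

    package main

    import "fmt"

    // Column indices in the test-cell export. Only
    // combustionAirMassFlow = 54 comes from the story above; the
    // other values are hypothetical.
    const (
        combustionAirMassFlow = 54
        fuelMassFlow          = 55
        specificGravityDiesel = 56
    )

    // airFuelTotal reads like the spreadsheet formula it came from,
    // instead of row[54] + row[55]*row[56].
    func airFuelTotal(row []float64) float64 {
        return row[combustionAirMassFlow] + row[fuelMassFlow]*row[specificGravityDiesel]
    }

    func main() {
        row := make([]float64, 60) // stand-in for a MySQL result row
        row[combustionAirMassFlow] = 100
        row[fuelMassFlow] = 5
        row[specificGravityDiesel] = 0.85
        fmt.Println(airFuelTotal(row)) // 104.25
    }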
So, no, I don't believe in "do it; don't ask."
And, if, by some chance, you see this, Chris, I still think that was the weirdest flex I've ever seen.
This reminds me of when a developer took over my codebase while I was on holiday. When I returned, I discovered that he had converted all tab indents to spaces across the entire project. He completely destroyed my ability to perform diffs against earlier commits, because his preference was evidently more important. Of course this was all justified with a link to Google's coding style guide.
The commonality in both your and the parent's situation is that you reacted defensively. Your response was "You destroyed my tabs!" when a proper response would have been to try and understand your coworkers motivation. Perhaps he had some really good arguments for preferring spaces over tabs? Tabs can be problematic in heterogeneous environments where developers with different tab settings and different editors work on the same codebase. Have you read the arguments in Google's coding style guideline?
Maybe your coworker was a fucknut that did it just to mess with your code. I wasn't there so I can't know. My point is that you shouldn't assume that.
Frankly, the justification doesn't matter here: A change of that scope should be discussed and agreed on with teammates before you even put up a patch, not committed while the primary developer is out of town.
If this happened to be Python, there is no such safe whitespace-insensitive comparison. Two pieces of Python code that have different semantics can look identical under a whitespace-excluding diff, so you must not habitually use such a thing as your go-to comparison method.
For instance if we edit:

    if condition:             if condition:
        blah                      blah
        blah            ->    blah
        blah                  blah

then nothing shows under "diff -b" or "diff -w". With a different kind of language there will be a non-white-space difference due to the changing position of a brace or other delimiting token.
Otherwise, exactly the remark I was thinking of making.
I need to disagree with you. I've been in a few positions where one engineer suddenly decided to rewrite parts of the code base without any input from other engineers. It's a huge blow to team morale, and it gave me a fear of writing code in that team.
Every time I wrote a piece of code, I wondered how long it would be there for. I understand that code evolves, but seeing your code rewritten after a week is no fun, and it's a huge blow to your confidence as well.
I understand that code needs to be rewritten, because of requirement changes, or different understanding of the original problem, but talk about it, bring it to the attention of the team as to why you feel we need to rewrite. Make sure that the team understands why, and sees the value in it.
Just rewriting code on your own is a big no for me, and to me, breaks the trust that we had.
> Every time I wrote a piece of code, I wondered how long it would be there for. I understand that code evolves, but seeing your code rewritten after a week is no fun, and it's a huge blow to your confidence as well.
Here's some tough love. If people frequently rewrite code that you write it is because they are stronger developers than you are. You, the junior developer, can either sulk about it and feel miserable or realize what an amazing opportunity you have to improve.
Swallow your pride and approach the person who rewrote your code: "Hey, I noticed that you rewrote the code I wrote last week. Mind telling me what was wrong with it so that I can learn from you?" If the person answers "No" or "I don't have time" that person is an asshole and you are in the wrong place. But if the person answers "Sure! Let's schedule a meeting in a conference room with a whiteboard this afternoon and I'll explain what was wrong with it!" then you are in a great spot!
> Just rewriting code on your own is a big no for me, and to me, breaks the trust that we had.
Not for me. I trust my teammates' judgement. They know when it's a good idea to rewrite my code and when it's not. No point wasting time holding meetings and bikeshedding over minutiae for work that can be done in a few hours. 9 times out of 10 they will make the right call. 1 time out of 10 they won't, and we revert - no damage done.
I think you’re missing the point as to why I might lose confidence. I’m fine with people being better than me, especially when they explain why they did certain things. But we work as a team, that means that before we do work, we agree on one thing. If suddenly an engineer decides a week later he wants to do something else, then that’s not cool. Even if he has a valid reason for it. Instead you talk to the team, and agree to change.
Again, there’s a reason why we work in teams, and not as individuals. I trust my teammates too, but rewriting code without any discussion breaks that trust for me.
You can discuss when the rewrite is done, that way it wastes less of your time: you can talk about non-hypothetical existing thing. If the discussion shows that the rewrite is bad then it's his problem: he has to fix or revert.
Plus even the initial rewrite is more time-consuming to do if he doesn't discuss it with the original author beforehand.
> Here's some tough love. If people frequently rewrite code that you write it is because they are stronger developers than you are. You, the junior developer, can either sulk about it and feel miserable or realize what an amazing opportunity you have to improve.
Why is this condescending tone and unfounded assumption acceptable here?
It's like, Well, lookie here, a stranger with a generic opinion! What an amazing opportunity for me to assume specifics and talk down to him! You're welcome!
I care more about the way it gets replaced. I constantly learn from my code being replaced and from replacing code myself. However, I doubt anyone likes code being pushed onto them, and that's how it feels when someone rewrites code without anyone's knowledge: they've already done the work, so they're expecting it to be merged in.
I know that programmers are humans, but I think paying too much attention to people's feelings is part of what is dragging the IT industry down. If you have self-esteem issues, go see a shrink.
It’s not about personal feelings or self esteem, it’s about trust within the team, and confidence that when you do something the team sticks to it. When a rewrite is required, the team talks about it and then acts.
> I totally disagree one million percent! If you are on my team and you want to rewrite code I wrote (cause it sux) then do it! Don't ask, just DO IT! DO IT NOW! Have a blast, tear it apart, rewrite all my shitty abstractions and see if you can do it better. If the result is better code, then awesome! I learn something and the software gets better. If the result is worse code, then awesome! I'll tell you why, you'll learn something and your improved understanding will allow you to write better code.
I see your point of view.
There is a need to detach from your code while still being passionate about programming.
The key is the mindset: programming so that people enjoy what you're programming and not just for the sake of programming.
Also, to note, the proposed refactor should objectively serve the same cause/requirement better than the current code does.
I've been a lead myself here and what danabramov nailed is that he realized this:
> My code traded the ability to change requirements for reduced duplication, and it was not a good trade.
I think the key to most of this situation is right there. It's also something the original author largely misses.
The key is this: want.
If you want to rewrite some code you find, then kindly leave that keyboard and go play with some toys. This, "wanting", is the actual origin of the author's anecdote. "I saw this and I thought it was bad and I felt the urge to rewrite it". Ok, but don't.
What is missing is knowledge on the circumstances of the code. You see some piece of code and you're only seeing that, the code itself. But why is it that way? Is this code supposed to be modified/extended/reduced/deleted in a near future? Frequently? Or simply, as a first question: Does this code need to be better?
And even if the answer is yes, then in what aspects does it need to be better? What does "better" mean for this piece of code? Did you want to just make it cooler looking because you didn't like it? Or have you actually detected that some of the needs this code was supposed to fulfil aren't being fulfilled? Of course, if you only now saw the code and do not know what those needs are then you're in no position to rewrite it, just because you want.
But then again, if there is indeed motive to rewrite it, by all means, do.
Following this idea will probably lead to having to ask, sure. But you won't be asking for permission, you'll ask because you need information before you can decide.
> What is missing is knowledge on the circumstances of the code. You see some piece of code and you're only seeing that, the code itself. But why is it that way? Is this code supposed to be modified/extended/reduced/deleted in a near future? Frequently? Or simply, as a first question: Does this code need to be better?
That's exactly why you should always be very careful refactoring old code that you don't fully understand. It might look like spaghetti code full of weird artifacts, when in reality it was once clean code that had to be edited dozens of times to handle edge cases. Not that it can't be refactored - just that it's likely not to look much cleaner if it has to provide the same functionality. Refactoring old code just because it looks ugly is often a waste of time and money.
This is fine in some environments, but not all. I worked on a code base once that had over 150 engineers. Teams were broken up mainly along feature and UX lines, but there were a few cases where some features were used in multiple user experiences. I worked on one of those features.
There was an engineer on one of the user experience teams that didn't understand this and decided to rewrite a portion of our code to make it more performant in their UX. In doing so, this engineer introduced bugs into our feature that adversely affected other user experiences within the application, but was not apparent in their own implementation. Had our team known about this, we would have very easily 1. been able to point out the bugs 2. helped the engineer with a better solution.
This wasn't a small effort, the engineer's team received a requirement from their PM, groomed the ticket, architected the plan, assigned the ticket to the engineer to work, the engineer wrote the code, then had at least two reviewers on their team approve the change before merging it. So we're talking multiple points of failure, I don't blame the engineer individually.
The fallout came a week after the code was deployed when the UX team flipped on the feature toggle. They checked their UX, it looked good, and continued on. Meanwhile other UX teams started seeing crashes coming from our feature. (the crash didn't manifest in testing because it was toggled off, which was a failure on the UX team)
This wrecked collaboration and trust across our codebase in multiple ways and led to increased overhead processes. We dealt with the initial fallout from UX teams not trusting our feature, which then evolved into UX teams not trusting each other, UX teams not trusting feature teams, feature teams not trusting anyone, and no one trusting the procedures in place.
Yes, there were multiple failures. But at the end of the day, this whole scenario could have been prevented with a one-line message: "Why did you implement this in this way?" No one was emotionally invested in the code that was changed - it was for all intents and purposes crap code - but the crap code worked, the new code didn't, and a simple courtesy check would have saved a ton of time, money, and trust.
> If you are on my team and you want to rewrite code I wrote (cause it sux) then do it!
Sure, rewriting is fine, but ideally it would come in the form of a PR rather than being checked in directly, so that if you had valid reasons for writing it the original way, it can be reverted before anyone else layers code on top of the bad rewrite.
Redoing work you just did is tantamount to criticism. I agree that everyone should welcome constructive criticism, but some tact is necessary in applying it.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code. Having a false sense of ownership towards that code will just cause grief for that engineer and friction for his or her team.
I think some places value a sense of ownership because that person then becomes responsible for maintaining that code and reviewing changes. There is someone accountable for it. Not that they have any special rights to that code.
Please don't consider refactoring or rewriting equivalent to redoing. Refactoring or rewriting means that I'm adding my name to the code, but I'm NOT removing yours. You still did the hard work of coming up with the original functionality, even if it is later changed. Refactoring emphatically does not mean that your original work had no value.
Like if someone were to change all my spelling and grammatical errors in my top comment. I wouldn't mind at all. Hell, someone could even rephrase the comment entirely and remove the weak points and emphasize the strong ones as long as the main message is the same. I'm not a native English speaker so I'd probably learn from the experience.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong
This. If a developer gets offended when there is an issue with their code or the way it got implemented, either they are not mature enough or there is a cultural issue in the team.
The point is, code "ownership" shouldn't be understood in terms of property - of course the code isn't the property of the programmer but of the company - but in terms of responsibility.
This might be different in very large teams, but usually, once you write a piece of code, you are the prime person responsible for maintaining it. And as long as this is the case, I would expect to be involved in any significant change to the code. Of course, I am fine with changes which also transfer the responsibility :)
What projects is everyone working on that they have the time to keep rewriting each other's code all the time? :)
If such teams did code review, maybe they'd get it right on the first commit and would only have to refactor when the requirements change or there's a clear benefit.
That sense of "ownership" of some small piece of the product? It helps build loyalty to the project, and to the company. Seeing my baby with that brand name on it helps me feel part of the team. I've left many a project that didn't offer me the loyalty of letting me fix my own mistakes (after some guidance).
Toe-stepping aside... how else are you going to develop this apparently inefficient coder without including them in the process of reforming this code? This isn't just about ego, this is about skill. This is about the code I'm gonna write in the future. And here's your opportunity, written in our codebase.
I'm 52, and many would consider my code a mess. I've been a professional coder -> solution architect all my life; I work for myself now, with my own apps. With my own code I clean things up when I can, but sometimes it isn't worth it. I used to write clean code and spend time doing it, but no more.
- Rewriting requires retest, introduces new bugs.
- If it ain't broke, don't fix it.
- Users don't care about clean code. They only care about the end product.
- Developers (less experienced?) obsess about this stuff. It is so expensive. Layers of coding management to deal with it all.
- If your code requires constant maintenance you are doing it wrong.
- Obeying the above rules means you can leave old software behind and generate new software if you want.
I am about to celebrate 10 years of my first software (an app) I wrote for myself. It is still in prod. It is fine, been earning $$ for years. I don't touch it unless I need to support new devices or support API updates. On the flip side, there isn't much code I wrote in my previous professional career that is still live - the reason is vendors only make a profit because they need to generate cashflow constantly, and they do it by breaking the above rules unnecessarily.
Do you work for a vendor? Think about the real reason you are breaking the above rules - the reason is that the extra time is billable. You are not breaking the above rules because it is the best way to write software. You are breaking the above rules because it generates the most profit for the company you work for.
When you work on your own apps you can be as messy as you like. In a team there is some incentive to make the code readable to other developers. At the lightweight end you have coding standards and linting, then design patterns, then finally big refactorings. It's all trade-offs. Big-team products can have islands of code that are like one-person projects, and those can be messy, for example.
Code with duplication isn't less readable. Quite the opposite: fighting your way through layers of abstractions and generalizations can make code harder to understand for someone new to it. Especially for code that will rarely be touched except for bug fixes, keeping it as simple as possible can often be worth it.
And in the end, users will also appreciate this. Any time spent refactoring is time not spent on new features or bug fixes. A user doesn't care if your code is cleaner (unless it was buggy before), they'd rather have new features or improved usability.
It's gotta be case-by-case, but upon shallow reflection I feel like one layer of abstraction is the sweet spot. I've worked on code where I have to "Go to definition" a dozen times to figure out what is going on, and I hate it. And I've seen code that's just a big wall of text hundreds or thousands of lines long, and I hate that too.
> Users don't care about clean code. They only care about the end product.
I'd disagree. Both code and compiled product are products, with different users. Your code's users care about the code. Your product's users care about the product.
The "whatever works" approach works for a solo developer of non-free software products, where there are no other users of the code. Or for one-off scripts that aren't meant to be used as code after they serve their purpose. Whenever there are other people who get to work (in any sense) on your code, things become more complicated.
Which doesn't mean code must be "clean" by some external standard. IMHO, the only thing that matters is that every user of the code can understand its logic and feel good (or, at least, not feel bad) working on it.
If you don't have automated tests, you're not going to have a good time - start writing them as early as you can. I hate working at shops or on jobs where there is zero unit testing. You have to have tests to safely refactor things. If you missed something, you add a test. If you get a false failure, figure out what was wrong with the test.
Retesting should be as simple as running the test scripts. If it's not automated, it's not tested correctly.
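A minimal sketch of that safety net in Go (the function under test is hypothetical); run it with `go test`:

    package mathx

    import "testing"

    // Double is the unit under test (a stand-in for real logic).
    func Double(n int) int { return 2 * n }

    // TestDouble is the safety net: refactor Double freely, then
    // re-run `go test` to confirm the behavior is unchanged.
    func TestDouble(t *testing.T) {
        if got := Double(21); got != 42 {
            t.Fatalf("Double(21) = %d; want 42", got)
        }
    }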
I'd rather have automated integration or system test and no unit test than the other way around. I find unit tests uncover the fewest problems of the three.
> - If your code requires constant maintenance you are doing it wrong.
Disagree. Your code should reflect the business you're in, and changes to the business are what creates opportunity and let you turn your skills into profit.
Clean code is absolutely a means to an end, and if you've got a codebase that's just sitting there fulfilling some static business purpose then yes, it makes sense to leave it ugly. (Similarly with code that you know is being end-of-lifed soon - if there is not going to be a need to make changes to the code in the future then it's fine to load it up with technical debt until it sinks). But that's the exception rather than the rule.
Even for static business purposes, the target environment changes over time. The OS gets upgraded, security bugs are discovered, dependencies become end-of-life'd, language runtimes are replaced, etc.
I find that every 1000 lines of code I write, my productivity goes down a bit more. Without refactoring, my code just gets really coupled and it gets harder to change. Your position is that the time cost of refactoring is less than that of speed up benefits?
If you duplicate and don't try to generalize wherever possible, that coupling won't hit you as hard. Changes to one module are much less likely to cause bad behavior in others, compared to code where as much logic as possible is generalized and shared between functions.
100% agree. The difference comes from whether you are a programmer or a businessman: a programmer strives for perfection, a businessman for functionality. I am trying to transition from the former to the latter, but it is really hard to "get it" and "let go" of some things.
> There isn't much code I wrote in my previous professional career that is still live
That means your impact on the world has been very limited. You haven't contributed to the basis of what other coders used. Not that this is illegitimate - but I believe we should strive further.
> the reason is vendors only make a profit because they need to generate cashflow constantly, and they do it by breaking the above rules unnecessarily.
1. "vendors" don't control all code. In fact, the most important bodies of code we all use were released for general use rather than sold commercially: Free Software libraries, drivers, kernels and applications.
2. Not all commercial companies want you to write this kind of throw-away code.
> You haven't contributed to the basis of what other coders used.
What fraction of us can claim they have, really? Most developers work at the application layer, the last one. The more foundational your stuff is, the less of it there is, because it is reused everywhere.
For similar reasons, very few people work on massively popular software. Most work on software that has only a couple of users, or even just one (typically one corporate user). Very few people are famous, because fame is fundamentally scarce.
Implying that every programmer worth their salt should have produced code other programmers use is just not realistic.
> What fraction of us can claim they have, really?
I think it's larger than you imagine. Just look at the code of GitHub, BitBucket, Sourceforge etc. Not to mention self-hosted commercially-developed FOSS.
> Most developers work at the application layer, the last one.
Well, that doesn't mean they have to work _only_ on that. Each developer uses a bunch of libraries, utilities and frameworks which are either FOSS or could use a FOSS alternative.
> For similar reasons, very few people work on massively popular software.
Well, yes, this is true, but you can work on somewhat-popular or even niche software which is still used by hundreds of thousands, or just thousands, of people. That's still very significant!
> Implying that every programmer worth their salt should have produced code other programmers use is just not realistic.
Well, drop the "have". I think programmers worth their salt should strive to produce code that other programmers use.
Moreover - I believe that it's the visible code, and the free code, that we should use as the model and the target of advice and improvement.
> Just look at the code of GitHub, BitBucket, Sourceforge etc
I think those are misleading on two accounts: first, it's the tip of the iceberg. Most co-workers I've spoken to don't contribute to any such open source projects. They tend to have other priorities, starting with the proprietary or custom software they are paid to write. Yes, there are many programmers writing open source code out there. I'm willing to guess however that much more time is spent on proprietary or custom software.
Second, there's a lot of unused garbage out there. Yes, there are a massive amount of code that could be used by others, but the only code you see is the tiny fraction that is actually used by many people. Such is the way of search engines, they show you a highly skewed sample, biased towards fame. (And rightly so: famous stuff everybody use tends to be what you are interested in to begin with.)
> Each developer uses a bunch of libraries, utilities and frameworks which are either FOSS or could use a FOSS alternative.
That's the thing: everybody uses a bunch of libraries and utilities all the time, but I believe far fewer people write those utilities. Or at the very least, much less time is spent writing them.
The more popular the library or utility, the more pronounced this is. Anybody can write something 10 other programmers end up using. Very few will be (un)lucky enough to write something that has enough users to be known across the planet.
Take me, for instance. I've looked up my GitHub ranking¹, which is mostly concentrated in a single project². The numbers suggest I'm in the top percentile. Not the most popular code out there, but I'm doing pretty damn fine.
Yet: have you even heard of those? Possibly, but I'd say probably not. The top percentile simply isn't enough.
> […] you can work on somewhat-popular or even niche software which is still used by hundreds of thousands, or just thousands, of people. That's still very significant!
Agreed. This is all a matter of degree, after all.
> I think programmers worth their salt should strive to produce code that other programmers use.
I'm not sure I agree with that goal. And I say that as someone who dreams of having a global impact. Sure, any professional worth their salt should strive to improve to at least some decent standard of excellence. (Some professions, like classical music, tend to require excellence merely to enter the field, which we could argue is maybe going a bit too far.)
Having other programmers use your stuff is a great way to get feedback and improve up to that standard, but I don't think it's the only way. It may currently be one of the easiest, though.
> I believe that it's the visible code, and the free code, is what we should use as the model and the target of advice and improvement.
If we're going down that path, I'd go as far as to say pretty much all code should be free to begin with. Following that thought to its conclusion though, I quickly concluded that universal free software is mostly incompatible with capitalism. I can write software in my free time, but I need a salary in the first place to even have such free time. Currently, that salary comes from a day job dedicated to proprietary software.
I mean, recall the OpenSSL debacle. That piece of software is used literally everywhere, and we had to have Heartbleed for insanely rich corporations around the world to even notice that trickling a few drops of money down to this project might be a good idea.
> Most co-workers I've spoken to don't contribute to any such open source projects.
You asked "what fraction can claim they have"; and you're saying "most" of your co-workers haven't. Fair enough, it's the same for me actually. But "Most" is just over 50%. A sizable fraction have.
> Second, there's a lot of unused garbage out there.
Yes, that's true (and also a lot of unused gems). But even if we only take code that's seen use by others, my previous argument stands. Of course, "used" can mean 10 people or a Million people.
> Take me... GitHub raking... I'm in the top percentile
Wow, I didn't even know about these rankings! Thanks!
Anyway, I looked at my rankings, but more than that - I looked at the overall number of contributors. There are supposedly [1] about 24 million software developers in the world today. Now, there are 300K C++, 300K C, 760K Java, 600K Python and 1M JavaScript contributors on GitHub, so it's pretty safe to assume that between 5% and 10% of developers contribute on GitHub alone. Actually, yeah: over 2 million active users in 2017 [2], which out of 24 million developers is roughly 8%. And again, there are other venues for FOSS contribution, like I mentioned. So even after dropping the unused stuff, we're still close to 10% of developers.
> I quickly concluded that universal free software is mostly incompatible with capitalism.
If you put it that way, I agree that that's the case. But - I didn't mean to say programmers should be inspired just by the fact that software can be free; I also mean that they should strive to write their software as though it were about to be released as FOSS.
> That means your impact on the world has been very limited. You haven't contributed to the basis of what other coders used. Not that this is illegitimate - but I believe we should strive further.
I don't think many coders can claim that much of their code from decades ago is still alive. And rightly so! The areas where code is used for that long (e.g. core banking systems) are not necessarily the most pleasant to work with.
> A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
I'm against the idea that people should be attached to "their" code (that is: the code they wrote). Now, I also understand that humans are humans, but the priority should be to make them evolve toward more detachment from their work, and toward accepting that what they design "can" (actually will) be imperfect, rather than avoiding upsetting them even when it would not be justified for them to be upset. Plus, one essential purpose of source control is to be able to make changes, and revert them if they were "wrong"; or for any other purpose. Maybe propose a patch instead of committing directly, but that's really a cultural matter about how projects are organized.
I don't want to ask for permission for an improvement. If there is a need for formal authorization from maintainers responsible for parts of the code, then set up just that. Otherwise, I'm certainly going to improve "your" code, in some rare cases without even telling you (doing it behind your back is not a goal in itself, obviously, and I also value collaboration -- but that should not be a problem). The question that remains is: is it really an improvement? If you are not sure, then maybe don't commit. If you are, do it (following your local rules), and if it ends up being a mistake, yes, it will be reverted, and so what?
You should not take issue with your work being reverted (for good reasons), just like other people should not take issue with "their" code being modified. Better to ask for forgiveness than permission.
> For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
> You should not take issue with your work being reverted (for good reasons), just like other people should not take issue with "their" code being modified. Better to ask for forgiveness than permission.
Changing one developer's _working code_ after they've invested a significant amount of time into it, without discussing it with the team first, is to basically heap on a number of unwritten requirements and also to decide that the schedule allows for addressing them. If removing repetitive code is a requirement, then the team needs to be informed that code will be reviewed for repetition, and introduced to various techniques for spotting and removing repetitive code. The schedule also needs to be adjusted to allow for this additional work.
And if the rewritten code is so superior, then perhaps there needs to be a discussion about whether there should even be a team, or whether the other people should be let go and all work assigned to the developer who can do the work “properly” the first time. After all, what’s the point in having one developer’s work constantly rewritten by another?
I had an experience this year where I was working with another developer. This guy would legit walk behind me rewriting everything I did. To put this into perspective, I've been doing this work for over 20 years this guy came out of a bootcamp last year.
But it wasn't even that this guy was rewriting things, it's that he was constantly breaking things as a result. Imagine every fucking day you come in and something's broken that was working last week and you're like "wut?!? that was working, what happened?!".
This was a contract with a guy I'd been doing work with for years. The other developer was from another country and was one of his first employees. The last straw for me was when he rewrote my small testing framework. I spent about a day trying to stand up a unit testing library in the framework they were using and finally decided it would be quicker to just write a simple testing mechanism + runner, which took me a few hours to do. I come in the next week, and this guy had taken that idea and just rewritten the entire thing.
I called the owner up and just straight up told him he's paid me to do nothing. Literally every single line of code I've written has been rewritten by this guy, only in a shitty, buggy manner. That was 2 months in, and at that point I had tried talking to the guy about it and I was done.
Fast forward to today and that system STILL isn't working. It would've taken me 4 weeks at MOST to build and stabilize. We're coming up close to a year now. I'm still good friends with the owner so I ran cloc over the codebase. It went from 3k LoC to 21k LoC. The owner finally admitted to me that he's going to be paying for employing that developer for a very long time.
So while I understand the sentiment about not 'owning' code and would have agreed with it before this experience, the trust issue with respect to rewriting is waaaaay bigger than that.
I told the owner that this developer should never have been let near a code base with any ability to make decisions on their own. They needed years of mentorship. I literally disagree with every single decision they make. It's mystifying. It's almost like the guy always asks me my opinion and then does the complete opposite, only that's not the case.
You sound really angry about it and you've hung onto it for quite some time. A good friend of mine used to tell me 'who cares?' when I would rant on about things just like you just did. It is honestly startling to hear that in response to a rant. If you answer 'i care!', then that is exactly the problem.
Let it go, it isn't worth it. At the end of the day, I'd say this is your problem to work on. You didn't step back, slow down, forget about the code, and mentor the other developer well enough when you were there. That is kind of exactly what the OP is saying in his post.
(Side note: I've been you many many times and it only hurt me in my career. I actively work against it now and things have been much better.)
I've only seen that once in my career. One developer rewriting existing code constantly, to such an extreme degree it's unfathomable.
He rewrote large amounts of scripts (often overnight). Other developers would wake up the next day or continue the project the next week, only to find a swath of issues reported by users and the project gone (rewritten and moved). He did that for a while, destroying the work of many people across a number of teams, making enemies and leaving a trail of projects ablaze. We didn't manage to get him fired (organizations rarely fire people), so over time other developers simply stopped working (what's the point when it will be undone the next day) or left.
At first, I thought this was the normal junior mistake. Reading code is quite hard, so one reflex of junior developers is to rewrite existing code to make it "prettier" or "better"; of course it's just a path to understanding the code, because they only understand what they rewrote after the fact. One aspect of learning real-world software development is suppressing the urge to constantly rewrite like that.
But that was not it for this developer; he went on and on, never learning. Trying to mentor him led nowhere; his stubbornness and deeply ingrained vision were stronger than the will of any lead or manager. Worst of all, he fundamentally disagreed on almost all objectives, required features and design decisions. He simply had a different vision and ideals deep down, and there was nothing you could do to reason with him. He was ultimately unmentorable and unmanageable.
I specifically state it was a contract and you're talking about a job.
This reminds me of Steve Yegge. He did a post on the difference between employed developers and entrepreneurs, and how they're completely different creatures.
You've allowed yourself to become a pet and you've given up the ability to care about your work for food and you think it's a moral failing in others who have not.
Your responses to this thread still make me feel like you're the problem. You're just trying to point the finger at someone else, instead of looking inwards.
My point is not about caring about the quality of the work at all. You totally missed it. It is about being upset about things and not dealing with that in the right manner.
If this developer is breaking the company, then it is your job (while you were there) to mentor the developer. It isn't about the code at all, it is about the culture of the company to tolerate the behavior that the developer is exhibiting.
What you describe is such a general, common problem in this industry that, as a more senior person, you should have been able to help define better practices around it.
Once you have the policy in place, it is much easier to go to the CEO and say... "hey, we've put these policies in place, everyone agreed to them, i've tried to mentor the developer and nothing is working. i don't think this person is beneficial for the business."
Until you do that, it is a he-said/she-said situation, and the likely non-technical CEO has no understanding of what good/bad code practices are.
I'm also a contractor, been one for ages now. I only take a 'real job' when I'm a founder. I've got perspective on both sides though.
And on the 8th day latchkey created programming. And he saw that it was good. And then the sun started shining, the rainbows came out, and everything was perfect.
In reality I walked away from the work, told the owner in no uncertain terms that he needs to get rid of the other developer and I refused to be responsible for the quality of the work. Fast forward to today, I'll be putting a new system in place later this week to replace the old because the owner finally called me back desperate after 6+ months of this system not working when the initial estimate was for 4 weeks.
See, the great thing about it is that I do good work. I insist on it. I've been doing good work for this guy on and off for 6+ years. I still do good work. This guy knows this. And I guarantee you, the next time I call him up and tell him one of the developers doing work for him isn't worth the time, he'll listen. Because in all the years I've done good work for him, I've only said this to him once, and time showed me to be 100% correct. Because, as it turns out, despite the fact that I don't shit sparkles, I do try to work with people until it's obvious they're not worth the effort.
From a certain perspective, by quitting this conversation you too „don’t care“ about what latchkey thinks about this topic, though „someone somewhere“ (most likely their employer) definitely cares whether they would tolerate the behavior your coworker displayed. You are doing what they propose, just at another perimeter of tolerance.
I don’t think that you „fundamentally disagree“, but that you have different understandings of when the time has come to quit caring. And given the limited amount of things we can care about during our lifetime, I find it certainly helpful to reevaluate from time to time what I decide to care about.
I agree that your coworker’s behavior is harmful to a team’s productivity, and I would seek a conversation to resolve the disagreement in a professional manner. But if there is no path to a resolution, I would escalate the problem, stop caring and get on with my life, because there is nothing more I could’ve done.
> Changing one developer's _working code_ after they’ve invested a significant amount of time
How much time they invested is irrelevant. There are lots of times where someone has been wrestling with something for so long that they just want to get it done and don't want to look at it anymore. Many times it's trivial for someone fresh to tidy it up.
> If removing repetitive code is a requirement then the team needs to be informed that code will be reviewed for repetition
This focus on "requirements" is only really trotted out when someone doesn't like someone changing their code and is looking for a defense. In reality, coding is super subjective, there are dozens of aesthetic judgement calls everyone has to make. Saying something "wasn't a requirement" basically means "fuck off I don't like you touching my code".
There's no way to codify concrete requirements for handling every possible way code can be improved. If devs are throwing requirements at each other, that's a culture problem, not a spec problem
That is a central point to gp's argument though. The amount of time does matter, because it affects the schedule and hence triggers management decisions.
> ... aesthetic judgement calls everyone has to make. Saying something "wasn't a requirement" basically means "fuck off I don't like you touching my code".
If you believe that then you really shouldn't be tidying up other people's code. Why are you making people unhappy over something that you think is an aesthetic judgement call? Let them be happy.
> There's no way to codify concrete requirements for handling every possible way code can be improved.
If the change can’t be codified, then it calls into question why the change needs to happen at all.
Assuming that there is some improvement that can be justified, then explain it to the team, and have the team start reviewing for it.
By not going through the process of explaining the improvement to the team, it signals that the team can’t be trusted to learn and improve code on their own. Again, if that’s the case, a conversation needs to happen about whether there should be a team at all.
If you don’t trust someone to change your code to make small improvements, it sounds like you have a trust problem, not the other way around.
Working on a software project as a large team is like growing a garden, not building a house. Just because someone planted a hedge doesn’t mean it can’t be improved by pruning it.
Trust would allow that the code is modified in the first place, but then a code review needs to happen anyway to ensure that at least one more person is aware of the change and as a sanity check that the change makes sense.
> If the change can’t be codified, then it calls into question why the change needs to happen at all.
No, there are lots of things that are difficult to codify but are very valuable. If you look at law and policy you'll find plenty of examples. This is akin to saying "If I can't formalize it mathematically, it doesn't exist" which is obviously false
Additionally, there's the category of things which are maybe in principle codifiable as requirements, but are tedious to specify. I would definitely put aesthetic judgements about code in this bucket. It's much simpler if you say something like "and the developers will use their best judgement on how to write clean maintainable code" and be done with it
> If removing repetitive code is a requirement then the team needs to be informed that code will be reviewed for repetition, and introduced to various techniques for spotting and removing repetitive code.
Things like this are never requirements, at least in the conventional sense of 'things the stakeholders ask for the software to do'. Nor should they be: it's our job as professionals to know how to do our work. I don't want a PM or sales rep worrying about the Liskov substitution principle (unless I'm working for code climate or something): it's a waste of their time.
Code cleanliness should make future changes easier. Of course, some 'cleaning' expeditions do the opposite, so there should still be code review etc.
In an environment where all code must be reviewed, when you make a change like this you send it to the original owner for review. Or at least someone on the same team, if they're not available.
"Don't be too attached to your code" means trying to review improvements to the code you just wrote objectively. Can you accept changes with good grace?
But you can't get to the point where you have a smooth-working team if you have philosophical differences about what improvement looks like! If one person's improvement is another person's regression, then those differences need to be worked out and some kind of synthesis agreed to. This can take a lot of time, but it's necessary work to get to that ideal.
On the other hand, a lot of work gets done by teams that are not smooth-running, philosophically aligned teams. It may even be a good thing to have a diversity of opinion and avoid group-think? You have to be able to accept some messiness though.
I think there’s a balance. There’s a difference between gradually improving something, or improving it at some point later — and literally rewriting 100% of code someone has just landed over the night.
I agree with you. To me, the action felt wrong when he described it. I wasn't sure why at first, but I think perhaps because the work was not in service to any actual task.
If he was in charge of writing the next piece of functionality for that code, it would be poor teamwork to completely refactor it without discussing the reasons for the design, but at least it would have been within his area of responsibility.
This is also the reason why I like having code review. The review would have been a good place to raise his concerns. They could have had their eventual conversation earlier, before feelings were hurt or wasted efforts were made. It would even help in the opposite case, where the author's code was legitimately poor and the reviewer had good suggestions.
It's not a panacea, but I find code review helpful. In my experience, review was the time when a lot of knowledge-sharing occurred. Some might feel that only senior developers should review, but as a junior I learned a lot about why the author made the choices they did. I learned how people expected the code to change, about language features or pitfalls, etc., and it helped me grow as a developer.
This actually happened to me, (and his rewrite didn’t even work!). Having witnessed this first-hand, I can say that the impact of the code change was absolutely dwarfed by the lost trust. Unsurprisingly, I would come to find out that this engineer had what I would call the opposite of soft skills, and the notion of a “this irks me so I rewrote your code” has become a giant red flag for me.
Did you lose trust because the code broke (a legit reason to lose trust), or because your code was changed without kissing your ring (a problem with you being too attached to your code emotionally)?
I mean, it's nice to talk about ideals, but it's also highly unrealistic that people are able to separate the code they write from what it says about them. Especially if you are part of a newly formed team, you must make sure that there is trust, respect, and clear communication, because this is the stage where most misconceptions occur. Upon building a successful foundation, you can test new team dynamics, such as changes in communication that trade off that initial safety for more efficiency.
If someone spends a week or two writing a patch and you come in and rewrite it in an evening, that, in and of itself, is telling me something: You think your teammate is a worse coder than you, given you were able to solve it with "cleaner" code. You assumed that your solution was better, without talking to the person who authored it to see if they did things that way for a reason.
This could have been solved with a "why did you do things this way", and maybe you would have learned a bit about the thought process behind it, or maybe you would have gotten "yeah this could probably be better, go ahead and clean it up".
In a lot of cases, I definitely get the latter, but I always ask first, because if they had a reason to do things a certain way, they probably don't want someone stomping on a feature they're actively working on.
> If someone spends a week or two writing a patch and you come in and rewrite it in an evening, that, in and of itself, is telling me something: You think your teammate is a worse coder than you, given you were able to solve it with "cleaner" code. You assumed that your solution was better, without talking to the person who authored it to see if they did things that way for a reason.
The time they spent developing the solution is a sunk cost. The only thing that should matter is whether the afternoon rewriting it is the most productive use of your time.
Also, rewriting something is far easier than writing it from scratch. Just because you can rewrite it quickly doesn't mean the original developer didn't save you a bunch of time by letting you see a workable first pass.
The same gut reaction that causes you to go "ugh, crap code, I'm going to rewrite" is probably similar for the other developer, but with your rewritten version.
People tend to have an aversion to code they didn't write in general, and actually reading and understanding existing code tends to be a rare skill.
> Also, rewriting something is far easier than writing it from scratch.
Given the amount of times I see companies rewrite their product with half the features missing and more bugs than before, I'm not sure I agree with that.
Yup, the upfront effort to come to an implementation is like 90% of the work. Refactoring existing code is a lot easier than juggling product, design and technical requirements into a functioning solution. All the moving parts are already there to look at.
> If someone spends a week or two writing a patch and you come in and rewrite it in an evening, that, in and of itself, is telling me...
Or, what it might be telling you is the second person wouldn't have been able to clean it up in an evening without the first person having already spent a week or two on it. The second person was building on the first. Just because it was (hypothetically) less lines of code doesn't mean it could have been written without starting with the more lines of code and refactoring; it often takes more time/steps to get to fewer lines of code.
But don't get me wrong, I'm in favor of colleagues discussing code, not just changing each other's code without discussing it.
This blows my mind. If I had to discuss with my teammates every time I changed their code, I would be way less productive. Everyone’s a little bit attached to the code they write, but iterative improvements and bumping up the quality of the codebase are worth more than the risk of bruising a fragile ego. It’s the team's code; we all benefit from improving it.
You can certainly go too far in either direction (talking or not talking), but to the extent talking matters when it does, it's not about "ego", it's about 1) "What did they know that I don't" and 2) Building shared understanding of the context and decisions and architecture.
As the OP says "A healthy engineering team is constantly building trust." Sometimes you already have enough trust and shared understanding to not need a discussion, other times a discussion is good.
I've humbled others and have been humbled before, because sometimes what looks like an ugly, unclean solution is the correct solution. There might be weird edge cases in the system that you catch, but that the person doing the rewriting doesn't see until they push the code out and it breaks something in an entirely different part of the codebase.
That's the biggest reason why you should always, always discuss those changes with your coworkers first instead of going cowboy on it.
Or have tests. If your push comes with tests and I reduce the code by half while still passing all of them, it means that 1) you didn't actually test everything you've done, 2) you didn't write the code as well as you should have, or 3) I'm an idiot and made the code less robust.
1) happens all the time. 2) happens some of the time. 3) happens as often as 2.
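To make that concrete, here's a minimal sketch (TypeScript, assuming a Jest-style test runner with global test/expect; all names are made up for illustration):

// Original, repetitive version:
function clampWidth(w: number): number {
  if (w < 0) return 0;
  if (w > 100) return 100;
  return w;
}

// A later rewrite at half the length must pass the exact same tests:
const clampWidthRewritten = (w: number): number => Math.min(100, Math.max(0, w));

// The tests are the contract:
test("clamps below zero", () => expect(clampWidthRewritten(-5)).toBe(0));
test("clamps above max", () => expect(clampWidthRewritten(150)).toBe(100));
test("passes in-range values through", () => expect(clampWidthRewritten(42)).toBe(42));

If the rewrite passes, either the tests were incomplete (case 1) or the rewrite is genuinely equivalent; if it fails, that's case 3 caught cheaply.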
This seems more like a post-hoc justification than any real rationale for change. There's nothing stopping your new code from introducing new testing requirements that weren't needed for the original code, and for 1) it could equally be a case of you blaming another developer when it's actually 3) that occurred.
This also assumes that everything is actually testable, sometimes you write unclean code because the underlying framework or module you're working with has an entirely separate issue that you have to work around. Which again, might not show up because your work around code sidesteps the underlying issues.
This all boils down to the fact that you should just talk to the developer and see what they had in mind rather than assuming that you're right and they're wrong by rewriting it yourself.
Well, perhaps you wrote the code quicker because your colleague already wrote all the tests, and it was easy for you to verify the behaviour of your complicated-looking but short code was correct.
I disagree that people should NOT be attached to their code.
Having a sense of ownership for what you write can lead to higher quality systems where people are willing to stand up and fight for what they believe is higher quality.
BUT this importantly depends on the ability to compromise, admit being wrong, and change based on new information from the people who have a sense of ownership over their parts of the codebase. Like you said.
People that get really attached to their code tend to be the people not smart enough for you to want them fighting for their opinions.
Good engineers promote good ideas and general approaches to code base structure. They don’t give a shit if people go in and make changes as long as they don’t compromise the whole architecture.
The point isn't to stroke your teammates' egos. It's to find out why the code is written that way, and to build consensus around a different way of writing it. Consensus building is a great way to improve the cohesiveness of the team. It lets teammates operate with a set of implicit, shared assumptions, which means they can spend more time on business problems and less time on these kinds of discussions.
Generally, lone wolves worsen the team dynamics, because they reduce team cohesion.
> the priority should be to make them evolve toward more detachment from their work and acceptance that what they design "can" (actually will) be imperfect
In an ideal world, sure, but I've worked with enough people varying from professional-but-proud[1] to emotional-maturity-of-a-child to understand that this _shouldn't_ be prioritized, on grounds of feasibility. You may as well say "we don't need reasonable workdays, we need to teach employees to jettison their biological need for sleep".
I don't think it's _impossible_ to get people to be fully detached from their code, but I think that it's only possible in fairly narrow situations (perhaps in a small, tight-knit, talented team culture). Trying to get it to work in a general context is usually much more effort (and lower chance of success) than meeting it halfway with tweaks to communication style and process.
[1] To be fair, I'm quite sure I've slipped into this mode on occasion myself
> I don't want to ask for permission for an improvement.
If the team works in branches and uses PRs this is a non-issue, because stakeholders will have a chance to react.
It would be an issue if I went to work one morning to find a system I wrote rewritten without my having a chance to look at the change proposal. If your team allows/encourages this kind of behavior then I don't want to work there.
Honestly, no attachment to my code demotivates me.
It makes me feel less responsible for the outcome, and helpless. It makes it much harder to prove I know what I am doing. Since you modify things however you please despite my not agreeing, I am not the one at fault for maintenance issues. It also means that keeping conventions requires constant negotiation, and frankly, if I enjoyed people trying to dominate me, or dominating them, I would become a manager.
And yes, it should not be about that, but it is about that with a lot of guys.
I also think it's fine to change the code someone wrote. Just because someone wrote it doesn't mean it's the right way to do it. I often find myself rewriting code; it's the natural process of code evolution. It just feels that it should be more readable, efficient etc.
Although, if the change is essential or requires more pairs of eyes, I'll just make a PR (MR) and let people review it.
I think a better process is to discuss with the other dev, talking through how you think it could be improved. They (or sometimes you!) can learn something that will lead to improved code in the future.
It's a bit like that analogy about giving a man a fish vs a fishing rod.
> I'm against the idea that people should be attached to "their" code (that is: the code they wrote).
Can people _reaaally_ detach themselves from the code they write? After all, programming is a somewhat creative profession. I can't imagine painters and writers detaching themselves from their work. I wonder how doctors and surgeons cope with this. Their work directly affects human lives and I can imagine a mistake weighing down heavily on them. However, I'd like to imagine that my doctor doesn't "detach" himself/herself from their work.
> I don't want to ask for permission for an improvement
If I check in working code that passed a code review, and someone else just decides to overwrite it with their version of aesthetically pleasing code (i.e not a bugfix) without so much as shooting an email in my direction, I would be outright offended. Any changes other than a critical bugfix/performance improvement can wait until you get a chance to speak with the original dev - this is common courtesy, not bureaucratic red tape.
Get over it. It is not your code (assuming you are working for a company). If some other developer wants to waste time making changes that don’t do anything, that’s between them and their manager.
Would you be “outright offended” if your code got changed after you left?
> I'm against the idea that people should be attached to "their" code
I agree with this, but it goes both ways. The codebase belongs to the team, not any individual developer.
Making changes for the sole purpose of making the team's code correspond to your personal preferences is the opposite of professional detachment.
The quality of a team is the quality of the communication.
If the architecture and design isn't well communicated, you get people going off in the wrong direction, or polluting a clean design. Also, people are most attached to their code based on their time investment. If you trash someone's code/changes, you trash their time, which could have been more efficiently decided with a conversation ahead of time. If you find yourself surprised by design changes, find design flaws in a code review, or have different definitions of "improvement", the communication in your company may be lacking.
This is about ownership. I agree there's no need to be personally attached to your own code. I have no problem with someone taking ownership of my work (understand: refactoring/abstracting the crap out of it for the sake of Engineering). But the day that person leaves the team, you have no choice but to deal with that person's mental model, as nice or convoluted as it was while refactoring. I've had this experience in every team I've worked with...
Of course, we all have had that experience! So you have a choice... realize it is going to happen and learn to deal with it in a productive manner, or just continue to get upset about it. I know which choice I'm working towards...
Are you suggesting that a dev acting like this without consulting the team should be considered normal/expected? I'm not sure how to address a move which is unproductive in its nature "in a productive manner", other than being proactive and preventing those rewrites from happening unless they're needed and discussed with the team. But in my personal experience, the team's decisions were always ignored by the coder in question.
edit: I should add for context that I have no problem with the rewrite itself, but I do have one with the team being handed over-abstracted code; I've found this happens mostly with more junior profiles, the same ones that don't stay around for long.
Since this (rewriting code) is a fairly common dev experience, I suggest that the team meet together and set a policy as a group. Similar to code format standards: why not talk about code maintenance standards?
I get you now, and this is definitely an approach I agree with. I guess I was ranting about my previous experiences... since the last departure I encourage the rest of the team to plan and discuss such changes.
> I don't want to ask for permission for an improvement. If there is a need for formal authorization from maintainers responsible for parts of the code, then set up just that. Otherwise, I'm certainly going to improve "your" code, in some rare cases without even telling you (doing it behind your back is not a goal in itself, obviously, and I also value collaboration -- but that should not be a problem). The question that remains is: is it really an improvement? If you are not sure, then maybe don't commit. If you are, do it (following your local rules), and if it ends up being a mistake, yes, it will be reverted, and so what?
The issue remains that "improvements" and "good code" are to a large extent subjective. And doing a major refactor of code someone else wrote just days before, on your own, without consulting the original author or the team, late at night, as described in the article (and doing it repeatedly), shows poor social skills at best and is actively dangerous at worst.
Here are some reasons that you should consult the team and or the original author before doing a large scale refactor of recent code:
* Start off with the principle of charity: assume that the person who wrote that code and the people who reviewed it aren't complete idiots, and that you aren't some kind of savant able to see something they missed while they worked on it for a much longer period of time.
* Since that person has been working on it for days they might have a perfectly valid reason for writing the code the way they did. One example is that they are working with the stakeholder and this is the first of a series of changes to deliver specific functionality and the code is the way it is because it facilitates those upcoming changes that might only be days away.
* If you have a habit of doing this and having your changes reverted ("ask forgiveness rather than permission", as you said), the team will begin to lose trust in you, and it is bad for team morale. Consulting people before you make a huge change to their recent code makes them feel part of the process, as opposed to it being something that is done around them with them as spectators. Any team where that happens with any degree of frequency is not going to be a happy team in the long run. And managers will prioritize a happy team with high morale over a "rockstar refactorer" who is impacting morale, regardless of whether their refactorings are error-free or not.
* It is fundamentally against the spirit of collaboration. When someone releases some new code, there is a degree of inherent collaboration in there: it might involve multiple developers, stakeholders, implied functionality that was not part of the original story, and at the very least the code reviewer. You unilaterally making a change without even taking 10 minutes of your time to discuss it with them is impolite at best, and wasting company resources at worst if your "improvement" ends up being reverted and all that work and time could have been saved had you bothered to have a 10-minute convo before starting.
I dislike the panning of "clean code" in the article because I really think that duplication can be clean as shown. Clean code has a quality of comprehension, not simply a lack of duplication.
The example given is pretty typical actually - where you can frame the problem in a way that your solution ends up without duplication yet the solution requires mental gymnastics to get your head around. We've all been there in the pursuit of the perfect code.
"Write code that immediately makes sense to someone with half your smarts" is a far better guideline than "don't repeat yourself".
I disagree strongly. Raw code, that anyone who knows the language can read... is readable!
The OP’s code, which added an invisible abstraction, is “clean” I guess. It’s less repetitive anyway. But it’s not readable. You need to go read the other file to be able to read this one.
It’s like a clean kitchen with everything behind cabinet doors. Yes, there’s less to look at. But you can tell what’s happening without opening all the doors.
> ... My code traded the ability to change requirements for reduced duplication, and it was not a good trade. For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
The article's starting example and revision clearly illustrate how the original code was changed. Duplication was removed - just like so many advocate for.
Unfortunately, the article doesn't state the changed requirement that broke the new design, leaving that to the reader's imagination. And I can imagine cases in which the revised code would be superior to the original.
It sounds like the problem arose from the need for specialized handles on different shapes. If so, I don't see how using various createXHandle functions (where "X" is an identifier associated with a specialized handle) would necessarily cause convolution. It could become quite convoluted if not done with care. For example, keep the createHandle method, but pass customization parameters to it. But that's really the naive way to go.
What we'd be faced with is nothing more than the kind of specialization requirement we usually see with Object Oriented programming. We can use composition over inheritance to great effect there. We can extend the author's refactored design and end up with something clean and maintainable.
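As a rough sketch of what that composition could look like (TypeScript; the article's actual helpers aren't shown here, so every name below is invented for illustration):

type Point = { x: number; y: number };
type Handle = { position: Point; onDrag: (delta: Point) => void };

// Generic building block, kept deliberately small:
function cornerHandle(position: Point, resize: (d: Point) => void): Handle {
  return { position, onDrag: resize };
}

// Specialization by composition: wrap the generic handle rather than
// threading customization flags through one shared createHandle():
function createTextBlockHandle(position: Point, resize: (d: Point) => void): Handle {
  const base = cornerHandle(position, resize);
  return {
    ...base,
    onDrag: (d) => {
      // TextBlock-specific behavior goes here (e.g. reflow the text)...
      base.onDrag(d); // ...then delegate to the shared behavior.
    },
  };
}

Each specialized createXHandle stays small, and a shape with truly different needs can skip the shared pieces entirely.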
If there's something I have learned about refactoring repetitive code into "cleaner", shorter code, it's that the refactored version looks better but is often harder to understand. When other people look at the "cleaner" version, they have to spend more time trying to understand it and mentally untangle the abstraction.
I like syntactically short code as long as it's clear. I also understand that sometimes shorter code has some small performance advantages that can add up in languages like JavaScript, where the size of the files can become a loading-speed bottleneck.
But to be honest it really bothers me when someone tries to make perfectly fine and readable code into something different just to satisfy some weird intellectual urge to make things more abstract.
Sometimes long code is not only easier to read and understand, but it also helps to create a better technical outline of decisions that otherwise will get lost in the reasoning of whoever is writing that code.
It can be harder to understand; but, even worse, it can also be harder to change.
Which is a bit ironic, because if you asked one of us to explain why we "DRY", we'd probably say something about it making the code easier to change, because a change only needs to happen in one place.
The problem is that whenever you make abstractions, you necessarily limit your axes of flexibility, abstractions always have certain sorts of uses in mind, and built-in implicit limits. Abstractions make certain explicit things easier to change, while making all sorts of implicit things you haven't even thought of yet -- but which might come up later -- harder to change.
Still, obviously of course sometimes abstractions and DRY are the right thing to do. The trick is knowing which is which, which you get better at with experience at software in general as well as the specific domain -- and I'm not sure it can be systematized or formalized, I think it's still a craft.
> But to be honest it really bothers me when someone tries to make perfectly fine and readable code into something different just to satisfy some weird intellectual urge to make things more abstract.
They obviously don't think it's "perfectly fine" or that their changes are some "weird intellectual urge" or they wouldn't do it.
It's a subject for debate, and to make your case to that person, you need to do better than vague claims that what you're doing is "fine" and their idea is "weird."
This is by no means a generalization, though. You’re right: most people are coming from a good place when they make suggestions that change your code. This is especially true for code reviews, where I believe everyone has the same goal.
I’m referring to some isolated cases especially those that involve receiving a codebase that was created by someone who is not working on it anymore. I think this is where that strange urge to refactor everything kicks in for some people.
That's something I see most people struggling with, including myself, in particular when learning a new language.
e.g. in Kotlin it can be tempting to use a ton of complex operators and rely on the intricacies of let vs run vs apply, even if the resulting code takes some thinking to understand.
At the beginning of my career, the code I was the proudest of was a gigantic piece of code doing some very complex stuff. It took a lot of effort to understand. Nowadays I am extremely happy when I take a complex problem and solve it only by writing very simple code that even the most junior dev we would hire can easily understand.
The downside I can see is that in some orgs more respect is given if you write black box code making you unfireable but that's a good sign you don't want to work there/need to make the culture evolve.
That often depends on how familiar you are with a code base, and your personal knowledge about for example linear algebra or whatever specific thing is going on somewhere.
Obviously there is no objective "right" in this matter, but always writing code that the greatest idiot can understand obviously isn't good either.
Another problem with the duplication approach is that if you need to make some change later on, you need to apply it in 12 places, and you're pretty much guaranteed to mess up one of them, since this is the most boring task ever and your brain switches to low-power autopilot after the third instance.
Either you need to change 12 places and test those 12 (or whatever) use cases, or make one change and still test those use cases. Only with the DRY function, as the article says, it’ll wind up accumulating options and configuration to accommodate all those use cases, becoming a big hairy abstraction that’s harder to understand for any of those use cases, whereas with the duplicated functions they would all evolve separately and in isolation and remain independently simpler.
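A sketch of the two trajectories (TypeScript; the flags, shapes and numbers here are all invented for illustration):

type Shape = { w: number; h: number };

// Trajectory 1: the shared helper accumulates a flag per use case.
function resize(s: Shape, opts: { minSize?: number; keepAspect?: boolean; snapToGrid?: boolean }): Shape {
  let { w, h } = s;
  if (opts.minSize !== undefined) { w = Math.max(w, opts.minSize); h = Math.max(h, opts.minSize); }
  if (opts.keepAspect) h = w * (s.h / s.w);
  if (opts.snapToGrid) { w = Math.round(w / 8) * 8; h = Math.round(h / 8) * 8; }
  return { w, h }; // every caller now has to reason about all three flags
}

// Trajectory 2: "duplicated" functions that each stay trivially simple.
function resizeRectangle(s: Shape): Shape {
  return { w: Math.max(s.w, 16), h: Math.max(s.h, 16) };
}
function resizeTextBlock(s: Shape): Shape {
  return { w: s.w, h: Math.max(s.h, 24) }; // text blocks only clamp height
}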
It's funny how one can feel ashamed for not using the esoteric language features.
Whenever someone new joins the firm and they know all the cool operators etc., I get so nervous around them! But I've caused myself so much stress trying to do something clever, having to go back and update the behaviour, only to curse myself for trying to be clever!
> Whenever someone new joins the firm and they know all the cool operators etc., I get so nervous around them! But I've caused myself so much stress trying to do something clever, having to go back and update the behaviour, only to curse myself for trying to be clever!
On the upside, this process is all about learning. It's easy to be clever, it's hard to be clever and successful. Making these mistakes and fixing them is how you turn knowledge into wisdom.
I've been writing a react application with my wife for the last couple of months (something we started at the YC hackathon in November, actually). It’s been a few years since I’ve written any react, so it’s been nice having her there to help me.
Maaaaaan...it’s like I’m learning how to program again. There is SO MUCH focus on shorthand and abstractions and stuff, all seemingly in the name of being “concise” that it produces code which to me is extremely difficult to read. It feels like I’m doing everything “wrong” when I read the documentation, and this would have been a lot harder without her there to pair program with.
Especially when I’m switching between golang on the backend (where I’m comfortable) and react on the front, it highlights how much they seem like totally opposite philosophies.
I’ll say this, which is probably mostly due to me growing up on python: code should read like a description to the computer of what you are trying to do, and even if somebody barely speaks the language you’re writing in, they should still be able to read what you’re doing. This doesn’t just help others, it helps you need less cognitive overhead when reading your own code back.
Saying the same thing with fewer characters rapidly passes the point of diminishing returns, and it seems like most of the recommendations in the JS world right now just want to keep pushing us further and further past that point.
What are some examples of React's focus on shorthand and abstraction? React is fairly small and doesn't really encourage much at all. The one 'battle' I often find myself in is "should this be a separate component?", but that's more of a people problem, and something that every language and framework will have.
Of course there might be some completely valid reason for this, but it’s baffling to me why you would want this type of shorthand, instead of just explicitly writing what you mean.
Of course in golang this could be something like:
var someVariable bool // is false (the zero value), though that's never stated
someFunction(someVariable)
But I would (personally) not write code like this if I could avoid it. It would be:
someVariable := false
someFunction(someVariable)
It’s a little bit longer, but imo takes slightly less mental overhead to read.
It's a shorthand, and it's similar to what HTML does.
If you don't like the optional shorthand, don't use it? I don't understand how this is something exclusive to React, or something it specifically encourages.
I'm with you on this one, I've seen people rewrite things like this, and to me this is just personal preferences, and it's something that the team need to agree on.
The author of this piece was not engaging in a DRY activity even if he thought he was. He (perhaps unwittingly) admits to it himself:
> My code traded the ability to change requirements for reduced duplication, and it was not a good trade.
The acronym DRY was coined in The Pragmatic Programmer, and the author of that book makes it clear that DRY is about not repeating requirements in code. You definitely don't want to deduplicate similar code if it involves multiple requirements because then you entangle yourself, which is what the author of this piece did.
> DRY is about not repeating requirements in code.
Exactly! A lot of developers, even experienced ones (and I would include the OP, even after his supposed lesson), seem not to get this: duplication is only bad when what is being duplicated MUST behave the same by definition (i.e. if you change one but not the other, the other becomes broken, because its definition is still the same as the changed one's). Otherwise, two similar or identical pieces of code are NOT duplicates in a way that violates DRY; they are just coincidental - they might become different later - so the "duplication" in this case must NOT be removed (though perhaps some smaller portion of the code is a duplication of definition, and that part should be extracted and de-duplicated). It really isn't that hard.
In my experience, it's often non-obvious when you have multiple "requirements" and when you don't. Except in retrospect when you realize you painted yourself into a corner.
Is your experience different? Do you always know when you have multiple requirements and should DRY, and when instead you have "incidental [or accidental] duplication", where de-duplication would not be advisable and thus would not be called "DRY"?
If so, I wonder what leads to our different experiences.
The Pragmatic Programmer is pretty zealous about DRY. For instance, the authors describe how they inserted sample code snippets into the book using specialized scripts developed for that purpose. A simple copy-paste wasn't good enough (see p. 100-101). Granted, they wanted to make sure the code snippets were tested and updated as needed, but repeating code anywhere seems to be a bad practice according to this definition.
Wasn't it the case that the requirements were the same at the time the code was written? TextBlock.resizeTopLeft(...) and Rectangle.resizeTopLeft(...) did the same thing for the same reason, not by coincidence. The issue was the possibility (and eventually the reality) of future divergence.
The issue wasn't divergence. The issue was that there was no requirement that TextBlocks should behave like rectangles.
It's very possible that DRY could apply even where divergence is expected. Suppose we had a requirement that "A user should not be able to resize a TextBlock if they are not allowed to read the text." At the outset, the read & resize permission logic might be identical & DRY is easy to apply. Nobody would be surprised if eventually resizing permissions became more restrictive. Even at that point, DRY applies. If you repeated the shared logic, and later you add new restrictions to the read permissions, it'd be easy to introduce a bug where you forget to restrict resizing permissions accordingly.
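In code, that example might look like this (TypeScript; the types and permission rules are made up to illustrate the point):

type User = { id: string; roles: string[] };
type TextBlock = { ownerId: string; text: string };

function canRead(user: User, block: TextBlock): boolean {
  return user.id === block.ownerId || user.roles.includes("reader");
}

// Resizing *requires* reading by definition, so the shared rule lives in
// one place: tighten canRead later and canResize tightens with it.
function canResize(user: User, block: TextBlock): boolean {
  return canRead(user, block) && user.roles.includes("editor");
}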
> The issue was that there was no requirement that TextBlocks should behave like rectangles.
Sure, though that’s not quite what the new code did.
Instead, it added a new abstraction (createBox), and then used that to incidentally assign the same behavior to TextBlocks and Rectangles. Your change could easily be accomplished by changing the last line to check the permissions and call createBox or something else, depending. I’d guess that createBox would also be useful for many other shapes, but if you wanted to do something totally different, like a star with handles at each point, you could just....not call createBox, and write a totally independent “let star()” function that does its own thing; it doesn’t seem like a base class constructor is forcing you to call createBox or anything.
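Concretely, I read the described structure as something like this (TypeScript-flavored sketch; createBox comes from the article, the star opt-out is hypothetical):

// Shared abstraction for box-like shapes:
function createBox(/* shared resize-handle wiring */): void { /* ... */ }

function rectangle(): void {
  return createBox(/* rectangle specifics */);
}

function textBlock(): void {
  // could check permissions here and call something other than createBox:
  return createBox(/* text-block specifics */);
}

// A shape that doesn't fit the box model simply opts out:
function star(): void {
  // handles at each point, custom drag logic -- no createBox involved
}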
So....this code doesn’t actually seem that bad to me. What am I missing?
> The issue wasn't divergence. The issue was that there was no requirement that TextBlocks should behave like rectangles.
But they had the same current requirements. So, ideally, the implementation of the common behavior should have been shared but shared in a way which didn't tightly couple the two consumers with each other, so that it would be simple for either to stop using the shared code.
If they had separate classes for TextBlock and Rectangle, then likely there are separate requirements involved. It may have been that at the time they needed separate classes the requirements for resizeTopLeft was identical, but they are still two separate requirements.
I suggest everyone take a minicourse or read a book on systems engineering - at least the part where they discuss requirements. The fact that two distinct entities have an identical requirement for something does not mean there is one requirement. It means there are two requirements which (for now) happen to be identical.
The key phrase is happen to be. You do not apply DRY to separate requirements even if they are the same/similar.
Look at it from a testing perspective[1], which is similar to how a systems engineer will look at it. If you would write a separate test for it, it is a distinct requirement. (Actually, the ordering should be the opposite: If it is a separate requirement, it should have a separate test).
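For example (TypeScript, Jest-style globals assumed; the shape stand-ins are minimal fakes, not the real classes):

// Minimal stand-ins for the two entities:
const textBlock = { resizeTopLeft: (x: number, y: number) => ({ origin: { x, y } }) };
const rectangle = { resizeTopLeft: (x: number, y: number) => ({ origin: { x, y } }) };

// Two requirements that happen (for now) to be identical get two tests:
test("TextBlock resizes from its top-left corner", () => {
  expect(textBlock.resizeTopLeft(10, 10).origin).toEqual({ x: 10, y: 10 });
});
test("Rectangle resizes from its top-left corner", () => {
  expect(rectangle.resizeTopLeft(10, 10).origin).toEqual({ x: 10, y: 10 });
});
// If one requirement diverges tomorrow, its test changes; the other test
// keeps guarding its own, still-identical requirement.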
> I suggest everyone take a minicourse or read a book on systems engineering - at least the part where they discuss requirements. The fact that two distinct entities have an identical requirement for something does not mean there is one requirement. It means there are two requirements which (for now) happen to be identical.
I'd love to do this -- do you have any recommendations for a book like this?
My rule is: if I find myself sorting things into two buckets, one of which I’ve given a name that obviously means “good” and the other “bad,” that’s a sign I’m deciding emotionally rather than rationally. It’s time to take a step back and think about the distinctions between the things in the buckets harder.
Sorting code into “clean” and “dirty” buckets is a good example of this. Both bucket names are completely subjective, with “clean” obviously meaning good and “dirty” bad. As the article indicates, dig in a little deeper and it’s not hard to find objective ways the “dirty” code would actually be preferable.
I agree that not every "smart approach" is worth it if you sacrifice legibility.
But I don't think you necessarily need to ask permission to refactor code. On the projects I had the past couple of years everyone understood that code was open to be changed by anyone.
In practice we'd often ask "why did you do X" instead of just rewriting it.
I trusted the people I worked with on those projects though and if they thought they had to rewrite part of my code for whatever reason, that was fine by me. And vice versa, I changed their code and that was fine.
Well in his case, did refactoring the code to clean it up add business value?
There has to be a very good reason for me to refactor existing/working code for instance for performance.
I won’t refactor code to reduce existing duplication but I will refactor code if I see there is some functionality that I need elsewhere so I won’t just copy and paste.
Well, in that case, why do it now instead of waiting until “the future”?
If it’s needed in the future, the cost is not more than it is now. If it isn’t needed in the future, then he’s wasted his time. If something similar is needed, but it needs to be refactored in a different way, they are going to have to refactor it anyway.
There is no special case “trap”. Business requirements necessitated changes.
> Well, in that case, why do it now instead of waiting until “the future”? If it’s needed in the future, the cost is not more than it is now.
They're familiar with the code and its gotchas when it was written. In "the future" most of that context will be gone (people forget stuff, and change companies/teams/roles).
Better to get rid of the gotchas before you get burned.
> If it isn’t needed in the future, then he’s wasted his time.
The new version was still more readable to newcomers, and resizing stuff is pretty core functionality for a drawing program.
> If something similar is needed, but it needs to be refactored in a different way, they are going to have to refactor it
A second pass of refactoring tends to be easier than a big ball of mud. And in the worst case, they can always inline it and start over from the same state.
> There is no special case “trap”. Business requirements necessitated changes.
When requirements change you can either start adding special cases or rethink how it fits into the bigger picture. That matters in every layer, but it's especially important in the UI layer.
Unless you want to end up with an unusable Apple-style hodgepodge. In that case, go wild, I guess.
It depends on context. Ideally you have an automatic code review system that would remove a lot of these discussions and create sense that everyone is involved in the code bases.
In this context however it is pretty special, because it was a single commit that he immediately refactored. In that case the more reasonable approach in my opinion is to do a post-commit code review where you talk to the dude about your thoughts on how one can improve the code. It was also a pretty significant change when it comes to coding philosophy, so if you think you know more it is also a potential opportunity to mentor your colleague (or learn something yourself).
If it had been a bug fix or an added feature, no problem. Or if some time had passed and most code was written in style X but this was not, and you then rewrote it, that would also be fine.
1) Changing someone else's code without discussing it with them. Seriously, wtf?! Even if the code is better, maybe there are reasons the code is the way it is, and it's not worth making the changes for the overall progress of the business.
2) Misjudging the requirements. If those modules needed changes, then they should've been left alone. This story could've easily gone the other way with the author competently abstracting away messy details for an overall positive impact. This happens quite frequently, but obviously doesn't make for a good article because it's expected.
And a story about not having a code review process prior to check-in. How is a team of developers ever going to work together if they all write code completely on their own with zero input from or approval by others? That's why style guides exist.
You are not saying goodbye to clean code: you are embracing it!
“Clean code” is a semantically overloaded term, so discussion about it is inevitably mired in confusion. Every brain that first hears the phrase “clean code” will construct some plausible placeholder definition, and the most readily available one by analogy is “less source code”.
You can easily try and apply principles such as SOLID and then go on to produce an unmaintainable web of nonsense, because they are meant to be facets of a unified organising principle and not a bag of tricks for local optimisation towards the state of “clean code”. I’ve experienced that firsthand.
“Clean code” is great marketing, but I almost wish Robert Martin had called his theories “Martin Code” so there was no doubt what is under discussion.
I can't help thinking of this excerpt from re-frame's docs:
> Now, you think and design abstractly for a living, and that repetition will feel uncomfortable. It will call to you like a Siren: "refaaaaactoooor meeeee". "Maaaake it DRYYYY". So here's my tip: tie yourself to the mast and sail on. That repetition is good. It is serving a purpose. Just sail on.
Thanks for the link. Best as I can tell from the ClojureScript, this is in reference to a refactoring that would cross layer boundaries. Like taking the methods
1) Always do pull requests. No change should make it into a master without someone else’s eyes. Unless it’s a “get out of trouble” surgical revert that really needs to go out in an emergency.
This would have given the author a chance to raise concerns about the heavily duplicated code. We do that all the time with comments like “this could be a bit DRY-er, how about moving this bit to a common place with this other bit in the codebase that also does that”.
2) when the author refactored the code, he should have deffo asked the OG to review it. Maybe his deduplication didn’t account for edge cases.
Reviews are great. Make reviews a part of the system. Small, incremental reviewed code, with decent coverage, being shipped multiple times a day by CI, reacting to customer needs is how the best engineering orgs are operating.
He is right about some parts, like over-accounting for the future. I’ve had that happen to me multiple times.
Same with doing large refactors of other people’s code. It’s not about code anymore, you gotta ensure they feel good about it too. Esp with junior engineers, it has to feel that it’s still their work.
Building a transform tool should be a pretty well understood problem. Just take a look at what photoshop's transform tool does and you will see all the possible extensions you might need to support in the future (rotate, skew, distort, perspective, warp, transform origins, a bajillion modifier hotkeys, etc).
Of course don't go ham and support all of them up front. But with those future use cases in mind, it's pretty hard to write yourself into a corner. I would say Dan's refactoring looks fine other than needing more customization for positioning and handle behavior. But it seems easy enough to just add them later.
> Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input.
If you want to modify a method that has 10 unique contributors, do you really need to talk to 10 different people to make a change? That does not sound very effective.
And, most importantly: when you code as a job, all your deliverables are the company's property. They are not yours. The company can do whatever they want with them; they don't need your opinion or approval. If they want to replace every piece of code that you ever submitted with an ASCII clown, then print the source code and make a piñata from it, they can do that too if they want.
Secondly: version control. If you want the old version of some code, you can retrieve it from version control, right? You can leave a comment such as: "for a verbose version of this method, see revision <sha1>".
> > Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input.
> Do you really need to talk to 10 different people to make a change?
No, you don't talk to all of the 10 contributors. But if you want to still work as a team you should talk to a few. Depending on the size of the change.
Some or even all of the previous contributors may no longer be at the company or are inaccessible. But a super quick discussion with 1-3 other team members should surface whether this is a good idea or not.
Obviously, if you practise pair programming, most smaller rewrites just need a consensus within the pair, and a more extensive change may be a good idea to get the approval of another pair, especially if some other team members were the original contributors.
There is no need to be precious about any existing code. Nor any need to be a bull-in-china-shop either.
> If you want to modify a method that has 10 unique contributors, do you really need to talk to 10 different people to make a change?
I think the more apt example in this case would be if somebody made a modification in that file and you rewrote just their modification immediately after they'd done it, without discussion. If the modification is necessary so soon, it means that the feature/implementation was not discussed properly before, and the process probably needs revisiting.
Of course if you are modifying part of the file, or doing some previously agreed refactoring effort then explicit author consent is not necessary.
> Secondly: version control. If you want the old version of some code
I think here we talk mostly about production code. The problem with relying on source control for "I will revert this to the old version when it becomes required" is that the code will evolve in subtle ways, and it will be hard to roll back and apply all the patches on top of it.
In my experience: don’t worry about duplication. It’s sometimes OK and sometimes a Symptom. Don’t fix the Symptom, fix the main cause:
Classes or functions that do more than one thing. The definition of one thing is one conceptual thing.
For example I’ve seen DTO objects with both wire concerns and UI concerns. Splitting that out made a lot of code simpler and incidentally deduped some stuff.
So attack poor abstractions, not duplication. The S.O.L.I.D. principles are a good start (even for non-OO code).
If you are unsure, I’d always err on keeping the code as is, you can always refactor it tomorrow when you understand more. Instead: make sure the unit test coverage is adequate.
This should rightfully be on the front-page of HN. I've gone through something almost the same as Dan Abramov here, and I feel it's difficult to impossible to explain this and you have to experience it yourself.
The tricky bit is where to draw the line, which I've learned only after pouring out thousands of lines of code, and I understand it is a bit different for everybody. The rule of three[1] was a very useful rule of thumb in the beginning, but there are many secondary variables that I unconsciously use to decide when to remove repetition. e.g.:
- Single files tend to be split less often, since refactoring them is easier. Example: the single-file routing in React being repetitive is okay, but if there were multiple files with routers and custom logic I'd consider a helper a lot stronger.
- Conceptually straightforward APIs tend to be split more often, since they're easy to package and reason about, as well as design. Examples: cookies, warnings, kv stores, etc.
- Early stage projects tend to be split less often, since few things are clear yet and being able to keep everything in your head is a lot more important.
It's also one of the lessons that you should learn on the way to senior. Taken to the extreme, someone insisting on removing this kind of duplication for its own sake is often a sign of a junior dev (as in, it's wrong, and they're not trying to understand the codebase/tradeoffs first).
The more time passes, the more I consider coding to have a strong aesthetic component. What makes it complex is that there are two kinds of "aesthetics":
- the code itself
- the abstractions that the code represents.
You can have very "clean looking" code (short functions, short files, no repetition, etc.) that is in fact modeling a problem in the most convoluted way. And the other way around: a bit of repetition, but the concepts behind are completely obvious.
And most of the time, nothing falls completely into one category, and the way you'll decide where to draw the line is almost a matter of taste.
I think this highlights an important nuance to the Don't Repeat Yourself maxim. It's not just about whether two bits of code are similar now. It's about whether, when they change in future, they are likely to change in the same ways. If not, then DRYing up the code now is only making more work for yourself down the line.
DRY is good but like any abstraction it has tradeoffs. It might give you a warm feeling to do the refactor, but you might end up with code that's harder to understand and harder to replace.
What is this implying? That instead of adding a property/method to a class you should instead create a child class that inherits from the original with the new properties? Sounds like a great way to have a very complicated class hierarchy IMO
Exactly that. Just because you want new behavior doesn't mean all the other consumers of that class also want it. The old behavior should still exist, and have a name. I'm not sure whether the idea was ever fully developed into patterns for replacing deep hierarchies with simplified classes that factor out some inheritance from otherwise-unused base classes.
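A minimal sketch of that idea (names invented for illustration):

  // The old behavior keeps its name; the new behavior gets a new one.
  class Box {
    constructor(w, h) { this.w = w; this.h = h; }
    resize(dx, dy) { this.w += dx; this.h += dy; }
  }

  // Only the consumers that want aspect locking opt into it:
  class AspectLockedBox extends Box {
    resize(dx, dy) {
      const scale = (this.w + dx) / this.w; // ignore dy, follow the width
      super.resize(dx, this.h * scale - this.h);
    }
  }

Consumers that never asked for aspect locking keep constructing Box and are untouched.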
There are at least seven giant and contentious subjects covered in this one short article:
- Code duplication (when it's appropriate)
- Code cosmetics/legibility
- Code scalability/flexibility
- Code review (protocols, how to perform them)
- Accepted widespread coding wisdom (abstractions, and shirking them, in this case)
- What defines a "junior" engineer (there are wildly different perspectives in this thread)
- Common courtesy (coworker interaction, etc)
Almost all of these are highly circumstantial to the type of project, codebase, language, culture, etc. I would like to see a productive conversation regarding any of these topics, but discussing so much at once in such circumstantial ways just leaves us talking past each other (the threads here are pretty clear evidence of that). I don't think this article has enough substance to cover any of these topics in any depth.
It's like if someone asked "What vehicle is the best vehicle?" Some people chime in with how much they like their brand of vehicle. Others say how trucks are better than cars. Others say that electric is the only choice. Others say cars made after 2010 are the best. The question is too broad, so the responses aren't effective.
Code is not clean: code is cleanable. What is clean today might not be clean tomorrow. (Robert C. Martin, author of the book "Clean Code")
Trying to evaluate code cleanliness in isolation is a waste of time; it strongly depends on the way your requirements evolve (but it's not subjective). You only need abstraction points at the frontiers between parts of your requirements that change at different times.
Duplication is better than a wrong abstraction (Sandi Metz, author of Practical Object Oriented Design in Ruby).
Trying to religiously factorize every structural similarity you can find, for the sake of the almighty DRY principle, without taking into account how business requirements evolve, is counterproductive.
Code is being read, reviewed, commented, linted, compiled, test-covered, read again, contemplated, diffed, documented in weird diagrams, mentioned in READMEs, revisioned. More LOC = more places for dust and rot to accumulate and more places for bugs to hide and stick. LOC is the environment under which bugs prosper.
So having less code is a legit, noble, and to some extent practical state to strive for.
Nah, no offense to the react fans but I'd rather listen to Fowler than to this guy.
Sure, there are very specific exceptions where repeated code is an asset instead of a liability (and many of them are, appropriately enough, when dealing with graphics), but they are that: very specific. For 99% of situations, we ought to follow the principles (obligatory IMO).
Do you have specific experience that suggests that hyper-generalizing code before you even know about multiple use cases is beneficial over carefully extracting shared logic only when needed?
If I had to pick one single practice junior engineers employ that ultimately bites everyone in the ass, it's blind adherence to generic code and DRY at all costs.
> Do you have specific experience that suggests that hyper-generalizing code before you even know about multiple use cases is beneficial over carefully extracting shared logic only when needed?
I've gone over my comment three times now and I'm yet to see where I even implied something like this. Could you tell me how you got this from my comment? I do want to see how I could send the wrong message so I can word myself better in the future.
Just in case, one of the principles I follow is "premature optimization is the root of all evil".
Wait. What? Goodbye clean code? Clean code obsession is a phase?
Your only mistake was that you didn't ask the committer to adjust their code and you didn't discuss it; you just went and changed their code and "pushed it to master" :) This is your mistake. Nothing to do with clean code. And the title is very misleading, untrue clickbait.
Another similar anti-pattern is acting in a way that implies the computer itself cares about such things. Other than efficiency concerns, the audience for code is other programmers, and those programmers over time. The computer doesn't care about your variable names, how DRY your code is, or even the elegance of your data structures except if the ones you have chosen cause it to run more or less slowly.
Another way to put it: if data is passed from one function to another in a way that would make any programmer wince, but the odds of it ever being re-visited by another programmer are near zero, and it affects nothing else, does it matter? No, unless you think the computer itself winces as well.
To me the #1 thing that fixes the social issues described in this post (rewriting code someone checked in just a few hours after they pushed it) is a code review process. The most valuable thing about always doing pull requests and code review is sharing information and having discussion about changes, not necessarily the immediate improvement of the code. A team will gain more knowledge about what everyone is doing by reviewing each other's pull requests than they do from a daily standup, by a pretty large margin in my opinion.
In this particular example, reviewing a PR would have meant a discussion about whether or not that abstraction was a good change before the code was ever merged.
One place I've found this to be especially true is when writing CSS. I went through a phase years ago of trying to abstract any repeated styles into 'clever' OOCSS patterns, but in the end it often led to a confusing mess, in part because of the nature of CSS. In the end I've found duplicated and verbose code to be so much easier to work with and maintain.
> I suggest to think deeply about what you mean when you say “clean” or “dirty”.
I'd go one step further: I suggest to think deeply. Period. I used to think I was a slacker for taking one or two hours to go for a stroll during a work day, thinking hard about my code, my problems, the architecture, the abstractions... and daydreaming as well.
I had to shut down my inner Jiminy boss who was telling me that an hour with no code written was an hour lost. And I am still occasionally bad at it.
In the last project I did, there was a strong deadline, so I went straight into coding and went down two dead ends before backtracking and changing the approach totally. It probably could have been saved by some hard thinking at the beginning.
Balancing thinking and doing is one of the Hard Problems™ of software development. I suspect successful methodologies work because they formalize the act of thinking about the code before implementing it. And that when methodologies fail the reason could be that they can easily be implemented such that people think more about the technicalities (assigning work, deadlines, estimation accuracy etc.) than the development (modularity, overlap with existing work, conflicts with concurrent work, security, third party dependencies etc.).
Someone once taught me a phrase I often use: "Lowest common denominator coding." What it means is that you should code to the most common abilities of the group of coders you are working with.
In practice what that means is that when you are writing a piece of code, you should do so in a way that the new hire fresh out of university, with less than a year's real-world experience, should be able to grasp what you are doing fairly quickly.
Abstractions, using the latest and greatest additions to the language you are using and obsessing over reducing the code line count can help the code run faster, but it also increases the learning curve for people looking at the code for the first time.
When I look at his example I still feel a compelling urge to create some kind of abstraction. It's just that the example abstraction he chose to make is somewhat esoteric and arguably over-solves the problem. The author isn't suggesting a different, perhaps simpler abstraction, but forgoes abstractions entirely.
I find myself wishing the question of whether to factor code out wasn't binary. Like if you could transclude parameterized code. You'd get the benefits of not having to keep a stack trace in your head, while still having a canonical version of an abstraction. Like a macro, but always expanded in-place.
It's tiring to work with developers who have a very narrow focus on "correctness" and e.g. code style. It can result in a lot of extra work, extra refactoring, extra cleanup, extra rewrites, etc.
It's how you end up with a toolchain, build environment and release process that's worthy of Netflix, but you're just 5 developers, and it takes up all your time to maintain.
Always focus on how it's benefiting the business. Is my time better spent on something my customers need or want, than fixing something that's already working?
This reminds me of Wayne Gretzky's phrase "Skate to where the puck is going".
A really good coder will find clever ways to implement the requirements in the most DRY fashion possible. And it will be perfect. Until the next requirement comes along. This is skating to where the puck is.
At some point you pick up the knack to know where to just leave things dumb and obvious as possible, because something inside you is shouting that there's going to be some new requirement soon that breaks the abstraction you're considering.
These are all judgement calls. Sometimes it's good to refactor to avoid duplication, sometimes it isn't. Only experience gives some hints at which way to go each time.
If unsure, I tend to at least break these cases down by writing low-level building blocks and expressing the duplicated code in terms of those blocks. Those blocks should be low-level enough that they can't have a say about how they should be used in all these cases.
You could say it's basically abstracting out the stuff that's so small there's no risk of overabstracting but that's not the point. Rather, it's like building a language for expressing the kind of problems that are solved by duplicate code all around.
I'm sure in the original example there would've been some things that are common for resizing all shapes. You would still have repetition under individual cases but you would replace raw math (I assume) with certain basic operations that you know are common.
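Something like this, say (the math here is invented, since we never saw the original):

  // Low-level blocks: they don't know or care which shape calls them.
  const dragEdge = (pos, size, delta) => [pos + delta, size - delta];

  // Each shape keeps its own "duplicated" case, expressed in the blocks:
  function resizeRectLeft(rect, dx) {
    const [x, w] = dragEdge(rect.x, rect.w, dx);
    return { ...rect, x, w };
  }
  function resizeOvalLeft(oval, dx) {
    const [x, w] = dragEdge(oval.x, oval.w, dx);
    return { ...oval, x, w }; // oval-specific behavior can still diverge here
  }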
Of course, this isn't a generic solution either. Just another step between raw duplication and finely abstracted model.
I don't use Ruby, and don't do OOP either, but my favourite talk is still Sandi Metz's "All The Little Things" and I think that's where she says "prefer duplication over the wrong abstraction." That's really changed me. I've since been seeing DRY and other misapplied dogma in a new light and have grown much over the years since.
Great quote. Link to the talk? When you say you don't use OOP, are you saying you use functional languages or you use imperative languages but don't do any of the OOP design crap?
It is interesting to see the parallels with my work with solid modeling in CAD. I'm constantly harping on the people I work with to think about what they are working on and structure it in a way that can be edited later (as well as stability). This often means a larger/less efficient feature tree, but it is actually maintainable. Just this week my insistence that we build the model to be flexible meant that we were able to fix something in 5 minutes instead of days.
Updating colleagues work without talking to them is something I often struggle with though. On one hand I absolutely hate it when others change my work without at least getting the history as to why I did it that way. Thus I try my very best to give others the courtesy of discussing it with them first. On the other hand, if something isn't being paid the proper attention, or they aren't making progress, sometimes it is a useful way to light a fire under certain people if asking politely hasn't worked.
> Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
Debatable. Detachment from one's code is as important as being passionate about delivering requirements and enabling people with your code. A better approach: anyone should objectively weigh the effect of a change, whether it is communicated or not. If it turns out better, the change should be allowed for the sake of delivery, improvement, and knowledge sharing.
> My code traded the ability to change requirements for reduced duplication, and it was not a good trade.
An excellent point from danabramov. Premature abstraction is an ugly creature waiting to ambush you a couple of months in the future. By that time you'll have forgotten why you did it in the first place and will have written a lot on top of that abstraction.
I've been through a phase where juniors and intermediates (a lot of them) were obsessed with abstracting duplicated code (not their fault; they were at the wrong place at the wrong time, being taught that abstraction is always good).
I meditated for a while on it and found a principle to avoid this issue:
Any abstraction MUST be designed to be as close as possible to language-primitive-like. Language primitives are reliable, predictable, and non-breaking. If they do break, they don't affect business logic written on top of them. If parameters are added, defaults are provided. They don't just abstract, they enable developers to express business logic more elegantly.
The challenge is to pick the parts of the code that can be abstracted in a primitive-like way first, and make that a top priority.
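Concretely, in JS terms (showToast is a made-up example, not from any real codebase):

  // v1 shipped as showToast(message). v2 adds a duration, but defaults it,
  // so every existing call site keeps working unchanged:
  function showToast(message, { durationMs = 3000 } = {}) {
    // ...render the toast and schedule dismissal after durationMs...
  }

  showToast('Saved');                      // old callers: untouched
  showToast('Saved', { durationMs: 500 }); // new capability: opt-in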
This is why language features like Rust async, Go channel-based communication, ES6, and C++ smart pointers were such a hype in their time and are still used today. It also applies to enabling tools such as React, tokio, wasm-bindgen, express, TypeScript, and jQuery (even the latter, which is not a thing anymore).
The problem is not abstractions or clean code, it is the lack of experience, lack of time to think and contradicting views of fellow developers in the team. I code for my own projects now, what a joy that is compared to working in a team for company x. Goodbye teams and good luck with your perfect codebases!
I never had this religion imposed on me in my career, except recently. Originally I designed hardware. There, if you change something it might cost $1m in retooling a factory or throwing away a defective wafer batch. Then, I worked for years in teams producing production code, where job #1 was to not break something. Pretty soon you learned that even an innocuous tidying up change can end up breaking functionality seriously. You also learn that if code is not pretty that probably means paying users are relying on it.
I think the culture of turd-polishing code so it looks pretty comes from academia and people who are control freaks. That, plus online deployment, where if you screw something up you can just push a fix in seconds. Except if your bad but pretty code leaked sensitive data or lost data...
The interesting part here would have been the actual code for each case, and the changes that it required later, i.e. exactly what the article does _not_ contain. Without it, we have to take the guy at his word, and it's hard to say we've learned anything.
For example, in any vector graphics program I've seen, an oval is defined by the surrounding box. The box defining an oval works exactly the same way as the box defining a rectangle, it makes sense for it to be that way, and it is also what users expect. It's hard to believe that they had a good reason to implement a different resizing behavior for ovals.
I think this is a great post; it covers its topic succinctly, and I agree with its conclusion.
On the topic of actually refactoring code, I think we should consider the code as a variable - that is to say, sometimes these variables just happen to equal each other, in which case they are two separate things. Sometimes two variables aren't just equal, but they're the same. That's when to factor out the code. Otherwise, the second these two variables are no longer equal, you end up in trouble. The secret is divining when something is, for all intents and purposes, the same as something else in this specific situation.
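A trivial illustration (constants invented):

  // Equal by coincidence: keep them separate.
  const MAX_UPLOAD_RETRIES = 3;
  const MAX_LOGIN_ATTEMPTS = 3; // will diverge the day the security policy changes

  // Actually the same thing: factor it out.
  const HANDLE_SIZE_PX = 8; // every resize handle is, by definition, one size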
After all, they help in explaining the intent behind the given piece of code.
With well written tests even the most "clever" implementations become at least reasonably understandable.
Anyway, here's what I do to deal with this problem:
1. Write code (and tests).
2. Switch to some other task.
3. Forget about that previous piece.
4. Get back to it and judge if it's still readable.
If something is readable then it's likely to be modifiable as well.
One classic example of things that I learned to never de-duplicate is the routing configuration. In this instance anything that isn't an explicit list of URLs is usually too "clever" to be readable.
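For example (a hypothetical route config; the helpers in the second version are invented):

  // Explicit list of URLs: repetitive, but instantly readable.
  const routes = [
    { path: '/users',          component: UserList },
    { path: '/users/:id',      component: UserDetail },
    { path: '/users/:id/edit', component: UserEdit },
  ];

  // Too "clever": the reader has to run this in their head just to learn the URLs.
  const routes2 = entities.flatMap((entity) =>
    actions.map((action) => ({
      path: buildPath(entity, action),
      component: resolveComponent(entity, action),
    }))
  );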
Agreed. Tests often have the benefit of defining better interfaces, so the implementation details often matter less and code duplication often works itself out on its own.
From years of schooling and artificially constructed engineering problems we've gotten used to, and some of us addicted to, the idea of end solutions fitting neatly into a clean-edged box. We've grown up on complex physics problems with clean integer solutions, homes with clean lines and polished finishes, and engineering that's constantly pushing for smaller and faster solutions. So, naturally, we want our codebases to reflect our experiences: concise, clever code that feels like a clean integer solution.

And while maybe it should be a common aspiration to write code that's 'clean', the reality of the natural world is that solutions to some of the most complex problems are not 'clean'. Furthermore, rarely is the true end goal of the code you're writing to be the 'final state' of logic. The code that we write is almost always for ever-evolving and expanding use-cases; we are creating scaffolding, rarely a 'finished' product. And most of us are writing code in ever-evolving languages, so even 'finished' code will ultimately be paved over with something more concise and efficient as the language evolves.

The point being: there's a happy medium (I haven't mastered it) of reconciling the realities of complex, evolving solutions with our internal desire for clean elegance in our codebases.
I find this problem to be overblown. I've met dozens of dirty coders who happily copy-paste code without a single thought for maintainability and their colleagues. I've yet to meet a single clean code zealot.
It's impossible to tell from this example (because no actual code is included) whether refactoring the code was a bad idea or whether it was this specific implementation that was wrong. Clearly the math should sit in a geometry/linear algebra library and nowhere near the presentation layers.
Clean Code sounds like a virtue, but it wasn't ever supposed to be a primary goal; it's more about being disciplined with the code you write. Architecture and Design are more fundamental.
Removing duplication is most often about Design. Very easy to introduce unintended coupling if the only motivation is to reduce duplication. Coupling, Cohesion, Composition, and Abstraction are generally more important concerns than clean code. Sometimes simply by trying to remove duplication you find a better design, but sometimes you don't, sometimes you tangle your design with couplings that don't really make sense. However, after your higher concerns are taken care of, you should keep the code clean.
As a sidenote, what the author did by going to extremes and then backing out of it is actually a really good exercise in learning some of this stuff. We are fond of our "rules" for producing good software, they often encapsulate key insights. But there are no real rules, there's just the vast interacting world of these insights and how they play off against each other and how we socially interact with others with often similar but slightly different insights to create software as groups. It can be done many ways and should be a constantly evolving thing.
As usual with code principles/guidelines, it's about knowing when to apply them, and how much to apply them. In the particular scenario discussed in the article, I think the better option would have been something in between.
I think most devs go through learning phases with code guidelines, code patterns and the like, when they first hear about them, actively trying to crowbar them into every line of code, even if it results in more complex code - because they treat these things as laws, rather than guidelines.
IME, good devs eventually grow out of this, and from experience are much better able to apply guidelines and patterns when it's "suitable" to do so.
This problem is sometimes exacerbated by management, who like to focus on metrics. For example, a project I recently worked on used SAST (SonarCloud), with all sorts of arbitrary restrictions enabled: less than 3% duplicated code, minimum 90% test coverage etc. Predictably, it helped make the code more complex than it needed to be, and led to mock-heavy unit tests that didn't seem to actually test anything - they existed only to satisfy SonarCloud (and ergo, management).
I would even take your learnings from your experience a step further and say that one should be proactive in talking to teammates and exploring ideas on how to improve the codebase if you think there is room for improvement. For instance, it may well be that inheritance is the wrong pattern for that given codebase, but there are indications that the current architecture isn't scalable.
With more investment, more research could have been put into looking at better-suited architectures such as the Entity Component System architecture, where Entities such as Rects and Ellipses are composed of units of functionality (position, resizability, etc.), which is a proven architecture for this sort of application. Then implementation would be a matter of getting team buy-in / weighing the refactor pros and cons with your team.
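A bare-bones sketch of the composition idea in plain JS (no particular ECS library assumed; all names illustrative):

  // Components are plain data; entities are just bags of components.
  const position  = (x, y) => ({ position: { x, y } });
  const size      = (w, h) => ({ size: { w, h } });
  const resizable = ()     => ({ resizable: true });

  const rect    = { ...position(0, 0),   ...size(100, 50), ...resizable() };
  const ellipse = { ...position(10, 10), ...size(40, 40),  ...resizable() };

  // A "system" operates on any entity carrying the right components:
  function resizeSystem(entities, dx, dy) {
    for (const e of entities) {
      if (e.resizable && e.size) {
        e.size.w += dx;
        e.size.h += dy;
      }
    }
  }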
I really get annoyed by the lack of good tooling to deal with abstraction. I wish there were a way to view abstracted code with function calls (to some depth) inlined and macros applied. It would solve a lot of the difficulties with heavily abstracted code while preserving the advantages.
>Secondly, nothing is free. My code traded the ability to change requirements for reduced duplication, and it was not a good trade. For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
The opposite is also often true though. If the requirements had changed in a uniform way across the different shapes, then it would have been much easier to apply the modifications to the abstracted code. The trick is to have an idea of what kinds of modifications are likely in the future, and when you're not sure yet, avoid premature abstraction but ensure things aren't so ad hoc you'll have trouble abstracting in the future.
If you really want to remove duplication, try the other way first, like this:
where there is
// 10 repetitive lines of math
write a few helper functions, like
// 3 repetitive lines of math calling my helper functions
I don't know what the math was in this case, but removing duplication should preferably be handled in this direction, to avoid introducing more abstraction and, in this case, to keep a full separation of concerns.
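To make that concrete with invented math (again, I don't know the original):

  // Before: every handle case repeats ~10 lines of inline arithmetic.
  // After: the arithmetic gets names; each case stays separate and short.
  const shifted  = (pos, delta) => pos + delta;
  const clampMin = (v, min = 1) => Math.max(v, min);

  function resizeTopLeft(shape, dx, dy) {
    return {
      ...shape,
      x: shifted(shape.x, dx),
      y: shifted(shape.y, dy),
      w: clampMin(shape.w - dx),
      h: clampMin(shape.h - dy),
    };
  }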
I think the author got the wrong message from this situation. Yes, the refactor is bad, but I think they misidentified WHY it is bad.
> Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input. Even if it was an improvement (which I don’t believe anymore), this is a terrible way to go about it. A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
Yes, but that's a social problem not a code problem. It doesn't mean your code was wrong, it means the process by which you integrated that code was wrong.
> Secondly, nothing is free. My code traded the ability to change requirements for reduced duplication, and it was not a good trade. For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
It's telling that there isn't a code example for these "special cases". My guess is that allowing them as configuration points in the de-duplicated calls would not have been that complicated.
I have heard this "problem" brought up before, and in my experience, it's not nearly the problem people claim it is. Yes, there is a balance to be struck: you don't want to remove duplication before you understand what's really duplicated, hence the rule of three (and I'm even fine going well beyond three). But I've yet to work on a codebase that wasn't well on the "too much duplication" side of that.
The REAL problem I see with the refactor is that the new data structures don't correspond to business objects, the way the user thinks about them.
The OP is using clean as a figure of speech. He attacks the idea of clean code, not as the hygiene of the writing, but as the removal of what he thought to be unnecessary repetitiveness, only to later conclude that the repetitive version was adequate after all.
Why is repetition the opposite of cleanness in the first place?
What he calls clean code (and dirty code) here is modelling that piece of software with or without repetition, not legibility (which would be a more reasonable opposite concept of "clean code").
He reflects on one thing right, tho: the idea that he jumped into abstractions too soon, hence he refactored away repetition while injecting a model that was inadequate for what the project needed in the first place. And worse, he did that without discussing the design in advance with any colleague first (that was the biggest mistake IMHO), losing time and effort for all involved parties.
So the real underlying issue was that code repetition blinded him to design priorities. Nothing to do with the ability to create abstractions and write clean code on top of those abstractions. If you write clean code on top of the wrong abstractions, cleanness (or its opposite) will be irrelevant.
But a huge problem is that he again (with this article itself) jumps too soon to the wrong conclusions: attacking the skill of creating abstractions and the skill of writing clean code. Readers will be induced to confuse the real thing with the figures of speech he is forcing onto that text.
The skill of imagining good abstractions for general concepts and writing well are things that go way beyond writing software code (all professions need these), hence attacking them makes no sense at all, and a successful attack on them would only promote some degree of general confusion (including in areas beyond those directly affecting one's professional career).
Back to the anecdote, what would have been good?
1. Prioritize where you want the flexibility and power of the design first and care about luxury details like "code repetition" later.
2. Discuss with colleagues in advance (especially those who will review your merge requests) to agree on what's to be done and what's going to change.
3. Actually implement the changes and open MR.
PS: in favor of the OP, he was generous in sharing his experience so others can learn from it.
> Why is repetition the opposite of cleanness in the first place?
It's an idiom after the book by Robert Martin, whose central theme is heavily refactoring code to remove repetition, and in fact even non-repetitive code. The first few chapters hammer the concept that functions should be extremely short and aggressively refactored to compose them into subfunctions. It essentially coined the concept of "clean" code in these terms.
Good clarification! Thanks! I didn't read that one. Although I generally like Robert Martin, I really don't like it when figures of speech go too far.
If pushed one bit too far, it becomes propaganda in a culture war instead of a healthy, intellectually honest discussion. What we know as flamewars is an example of that.
Nevertheless, I used to work with someone at a startup who was crazy about over-cleaning the code. Every time, there was something that had to be cleaned, and when requirements changed or a new feature was needed, we had to do all the fancy stuff again because the over-cleaned code wasn't flexible enough.
I don't agree that the issues OP later discovered have anything to do with `refactoring` itself; it's more an issue of premature optimization.
My 2 cents: once an interface is defined, it shall not be modified without good reason. Even if you do modify it, you can still find some way to make sure it stays compatible with the original system. And I don't believe you can't extract some common behaviors from that repeated code and use them within the interface. Refactoring doesn't mean you have to rewrite the whole project; it can be done piece by piece. Reducing a line of duplicated code can save you a lot of effort maintaining the project over its life cycle. A lot of times I have seen a code change made to fix some bug, while the other places that duplicated the same original code were forgotten.
The examples here pale in comparison to what I see. Rather than dealing with "ugly" code, I often come across incorrect code. For example:
  function do_add(a, b) {
    return a - b; // bug: subtracts instead of adding
  }
This is an obvious mistake. One would think that the solution would be to simply fix the do_add function, but that will end up breaking a ton of other stuff that relies on the broken functionality, and instead what you get is this:
  function do_add_real(a, b) {
    return a + b; // the "fixed" version, added alongside the broken one
  }
Yes, seeing this in production code is infuriating. But simply "fixing" the issue is generally not possible.
A sign of a seasoned developer is to see this type of horrible code, take a calm breath and focus on the issue at hand rather than scrambling to fix something that works but is ugly.
I worked with someone who considered code repetition a force of evil that needs to be fought at all costs. Some of his refactors were valid, others not so much, and they made everything more awkward to work with and more resistant to change. It wasn't a pleasant experience.
> dragging each handle in different directions affected the shape’s position and size in a different way
The cause of the duplication itself is probably wrong: why would you do it in different ways? Treating ovals and rectangles the same way is simpler for the users.
> My boss invited me for a one-on-one chat where they politely asked me to revert my change
No justification given here? Is it because the boss did not give any, or is something omitted?
> Once we learn how to create abstractions, it is tempting to get high on that ability.
People should start forbidding themselves to use that "abstraction" word for anything. That's just code factoring.
> A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together
It seems here that this was taken as a hostile overwrite. Nothing to do with "abstractions", "clean code" or factoring.
The problem here is not "building trust"; the problem is that they redid something that was already done, without first discussing the issue they saw in it.
This is a methodological, not social, mistake.
> How exactly do they affect the way the code is written and modified?
The usual FUD, the actual trap many people fall in. You cannot predict the future. You should not try to predict the future, that's not your job.
I understand some may be confused by the fact that the etymology of "programming" means "write before" or "write ahead of time", but the meaning is that we write down what something is going to do; we are not actually writing what will happen for things that are not under our direct control.
> Let clean code guide you. Then let it go.
No. What they call "clean code" is doing the right thing at a specific time. If the requirements change, then do the right thing again. That's called code maintenance.
1. If you broke some code, don't generalise it as an inherent problem of clean code. Just find out what exactly you did wrong.
2. If the only reason you refactored it was that you felt it wasn't clean, don't do it - that code doesn't look terrible to me anyway.
3. To do this kind of refactoring you need good test coverage. So, start there. This also makes sure that you understand the specs well enough to refactor.
4. Clean code is good. It's a skill to know what is good code and what is bad code, what is better code and what is worse code and when to give in on clean code and compromise. Practice it.
I think saying repetition makes code worse is like saying repetition makes prose worse. Both code and prose convey ideas. Repetition is a tool. You can overuse it, and you can underuse it. But it is a tool. Not a flaw.
Is this the same disease that leads to the love of Haskell by those particular zealots (some, not all, Haskell programmers) who write yet another monad tutorial to spread the word of the one truth without writing any useful programs? Seeing "ah, this is sequencing, so I can abstract that..." and so on. Haskell gets you to higher and higher levels of abstraction in your programming but doesn't seem to guarantee you'll get a useful program at the end of it.
(Yes, there are useful Haskell programs, obviously. Just fewer of them than monad tutorials, by at least an order of magnitude.)
There are many times "clean code" is a good goal to have. It depends on what you mean by "clean code" and what you mean by "deduplication". As with many things in software engineering, the best answer to whether you should strive for "clean code" is "it depends."
If you have a utility function that you are redefining in every file, literally copy/pasting it everywhere, then you might as well create a shared function and import it and use it everywhere instead. This is an example of a good goal.
>Obsessing over clean code is like reorganizing your clothes closet on a daily basis. If it makes you more productive to do so, do it. Too often, however, it's done compulsively and is counter-productive.
I think, for me, I am often on two sides of the situation.
1. I'm moving fast and writing code to get things done for the customer and the product.
2. I have enough time to slow down and write or refactor "testable" code.
Often 1 results in untestable code. Often 2 results in a refactor or most likely a rewrite to create testable code.
This is an important difference. Optimizing to write code just because you think it's clean (in the sense of shorter, reusable code) is often confusing to most. It also leads to pull requests that grow large and are hard or time-consuming for others to review.
I'm finding this kind of issue coming up a lot with the current abhorrence of polymorphism and inheritance.
I like using interfaces and protocols, but I also still very much use inheritance. It's a fundamental tool that was invented for a reason, and, sometimes, it is the best tool for the task.
I've probably weathered just about every "paradigm shift" that has happened in software development. At one time, using variables with names longer than four characters was considered bad programming.
Anyone remember GOTO?
Some older constructs (like the two above): good riddance. Others...not so much. Structured Programming, which was declared The Mark of Satan, at one time, is still very much the basis for all our work.
I love a lot of the new tools and techniques, but I still mix in a lot of the older stuff when I write software. To some, this is "unclean," because it doesn't tick some arbitrary "büzzwürd du jour."
Simple, solid code is always a great starting place.
The author talks about removing repetitiveness (DRY). I think that's excellent, but, in my experience, I need to be very, very careful when I do that, as the original author may have tweaked just one little line, in one of the clones, and my refactoring may break things; sometimes, not until it's been out to the customers for six months.
That's pretty much de rigueur for any refactoring; not just DRYdock. In my experience, having some robust unit tests and test harnesses in place is absolutely required (and development branches -yay new-fangled VCS!).
I tend to write code iteratively. I'll start with some naive, sloppy code that works; maybe not well, then refactor it in stages, testing the heck out of it; each time.
I used to work for a Japanese company. I had many differences with my Japanese peers, but they were the most disciplined programmers I've ever encountered. Whenever they would modify code, they would leave the old code in there, but commented out, and add some comments, explaining what their new code does.
Made for some pretty verbose source files, but it was immediately apparent what was done, and why (I think the practice began before most good VCSes were invented). It also gave you the original code to copy and paste, if necessary. Very old-fashioned, but it made their changes (and bugs, therein), easy to understand. It also helped because the code was often stepped on by many programmers.
I'm thinking that a lot of folks are relying on commit comments to explain changes; which is good, but adds extra time to figuring something out.
The thing about the handling of different shapes is that they have idiosyncrasies. Resizing one side of a square resizes all of its sides; resizing one side of a rectangle resizes two out of four sides; resizing one side of an arbitrary quadrilateral only resizes the one.
These are very different abstractions, with different behaviors. Even though all the shape functions technically use the same code, they're not conceptually doing the same thing. Linking these abstractions by way of a shared extracted function is inviting bugs.
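Sketching the difference (shape representations invented for illustration):

  // The same drag gesture is three conceptually different operations:
  function resizeRightSquare(sq, dx) {
    return { ...sq, side: sq.side + dx };   // all four sides change
  }
  function resizeRightRect(r, dx) {
    return { ...r, w: r.w + dx };           // two of four sides change
  }
  function resizeRightQuad(q, dx) {
    // assuming points[1] and points[2] form the right edge:
    const points = q.points.map((p, i) =>
      (i === 1 || i === 2) ? { x: p.x + dx, y: p.y } : p);
    return { ...q, points };                // only one edge moves
  }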
The author seems to only look at a narrow case that supports his argument. Imagine you start out with 2 shapes and 4 directions, but you eventually get additional requirements for 20 more shapes and 4 more directions. Now you have much more code, introducing room for errors, making it harder to change, and making it harder for new employees to understand.
I agree that he should have told his coworker ahead of time, but I also think his coworker should be open to understanding why his code may cause problems and be ok with it being changed.
One way I think about this. It seems the earlier version of the author's self was thinking about the state of the code as a sequence of atomic states. When you start thinking about your code more as a constantly evolving organism, the BEST version of a particular section of code isn't necessarily what is BEST at the moment. But because there are logical branches that your path can take as the software evolves, your code at this moment is best when it can more easily accommodate the best path to the future.
The only cost of duplication is paid when you're making changes to it. Let's say that the cost of typing, checking, and finding the code to change is "t". Duplication makes it "n*t". Abstraction, when there are no special cases, keeps it at "t".
So if "n" is somehow big, it'll be costly to make changes and abstraction makes sense.
However, do not abstract similar-looking code with different behavior early (as in OP's case); do abstract similar behavior early (e.g. file system or db access).
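For instance, a thin storage wrapper, sketched against a hypothetical db.query handle:

  // Abstract shared *behavior* early: every caller genuinely wants the
  // same semantics (read a value, fall back to a default).
  const store = {
    async get(key, fallback = null) {
      const rows = await db.query('SELECT value FROM kv WHERE key = $1', [key]);
      return rows.length ? JSON.parse(rows[0].value) : fallback;
    },
    async set(key, value) {
      await db.query(
        'INSERT INTO kv (key, value) VALUES ($1, $2) ' +
        'ON CONFLICT (key) DO UPDATE SET value = $2',
        [key, JSON.stringify(value)]
      );
    },
  };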
The article didn’t mention if the team that wrote the dirty code had done a code review. If they did, it doesn’t mention whether the author of the article looked at the interaction between the code author(s) and the reviewer. Perhaps this was discussed and the repetitive code accepted with good reason?
When code is merged, it should be too late for further discussion. The review is done, and flaws like this are debt that should be taken care of the next time someone needs to touch it.
Sorry for the off topic grammar question, but am I the only one who finds it confusing how people have started to use plural pronouns to refer to individual people?
> My boss invited me for a one-on-one chat where they politely asked me to revert my change. I was aghast. The old code was a mess, and mine was clean! I begrudgingly complied, but it took me years to see they were right.
The first they sounds like the boss and the colleague were in the same room, but it can't be, because he says one-on-one. Still, presumably both the boss and the colleague wanted the revert. But the second they is really ambiguous. Was the boss right, or the colleague?
Probably, I didn't even notice it in the OP, and if some other reader noticed anything, they were probably able to read on without much confusion anyway.
(Did you find that one right there above confusing? I suspect not?)
Indeed. Maybe my English skills are fading with time, but my impression was that it works there because it's unclear who you're talking about. "Some other reader" is singular, but multiple people could have noticed it, so it makes sense to use the plural there. It's kind of like how in American English they say "France _has_ won the WC", but in British English they say "France _have_ won the WC" (because a team has multiple people).
But saying "The boss fired me. They are awful." sounds no different than "The boss fired me. He are awful." or "They is awful." My auto parser is failing and I have to think it through.
Note of course you don't say "they is awful", you still conjugate they with "are", even when singular. For an unknown person, for a specific person, either way.
There's a balance, as is often [always?] the case with inversely related factors like code "cleanliness" (hard to talk about, since it's a little vague of a word) and developer "velocity": pristine code can go awry and become "fine china" code that's too verbose or restrictive to effectively change and maintain; likewise, spaghetti code (maybe the opposite?) needs no introduction.
One similar experience I had when I was starting out was when I created a PR and my colleague, instead of suggesting changes, made the changes himself, then approved and merged my PR. I told him directly that it'd have been nice if he'd suggested the changes instead of making them himself. He thought I was being protective of my code. I was never able to have a good working relationship with him after that and eventually left.
My experience is that good programmers prefer extra abstraction and bad programmers prefer extra repetition. Given that we need to work with bad programmers, sometimes it makes sense to make things easier for them. But repetition is wrong and a source of bugs. Each time you give in and keep a repetition that you know should not be there, you are crippling yourself and helping make us programmers a commodity.
Clean code is not about eliminating repeated code. Clean code is about readability. It's a not a rule. It's a programming philosophy.
People set rules but we should always see the reasoning behind the rules instead of just following them blindly.
Personally every time I refactor some piece of code, I think "Does this make code easier to read and understand? Will this save me and my colleagues time in the future?".
I would just like to take a second to compliment this website. It loaded instantly even in that weird broken WebView that Materialistic uses, the dark mode switcher was nice and obvious but not in my way and switching between pages is faster than basically any other action I can take on my device (local, native apps take longer to switch tabs than this website). Well done!
Duplication isn't necessarily 'messy'.
In the example, it's very easy to see what's going on, even if it's repetitive, since it's fairly uniform.
Adding an abstraction layer often adds complexity, and really does get messy/complex if you subsequently have to deal with special cases and exceptions.
I think this guy learned exactly the wrong lesson. The lesson he ought to have learned is to avoid premature optimization: he tried to refactor the code too soon after it was originally created. He should have waited a while, so that the additional requirements were fleshed out, before trying to create abstractions.
I know it's completely tangential to the point of the article, but is anyone else wondering why it took a week to implement scaling objects in a visual editor? The codebase? The coder? Documentation requirements? A combination of things?
I want to understand what causes projects to lose velocity, and try to solve those issues.
If required to test all those duplicate lines, the author would have made the right abstractions in the pursuit of laziness/efficiency.
Overstepping and not going through the proper code review process isn't a reason to ditch reasonable abstractions. There's definitely some conflating of topics here.
To me, clean code is a separate concern from coherent interfaces. Your change changed the interfaces and although I don't have deeper context, I personally prefer the simpler original interface. Have good interfaces first, then use clean code principles to handle implementations
I would also say it is a waste of effort to spend a late night rewriting existing code that actually worked. Why waste your time on over-polishing when s/he could have left work early and gotten a good night's sleep? I think that is the more important perspective.
> We were working on a graphics editor canvas, and they implemented the ability to resize shapes like rectangles and ovals by dragging small handles at their edges.
It's a dirty shame their development team had to build a graphics editor canvas from scratch.
It would be a shame if this article about specifically DRY created negative sentiment towards the Clean Code Book as a whole – which is full of great advice, not only related to duplication, and a worthwhile read.
There is one issue that I haven’t seen mentioned: testing.
The real problem with duplicated code in my opinion is that it either remains untested or bloats your test suite. And a bloated test suite can slow you down to a crawl.
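To make that concrete, here is a minimal sketch (hypothetical clampWidth/clampHeight helpers, invented for illustration, not from the article): with two duplicated copies of the same logic, every shared behavior has to be asserted against both copies, so the test table effectively runs twice.

    // Hypothetical duplicated helpers: each copy needs its own tests,
    // so the suite grows with every pasted copy.
    function clampWidth(w: number): number {
      return Math.max(1, Math.min(w, 4096));
    }
    function clampHeight(h: number): number {
      return Math.max(1, Math.min(h, 4096));
    }

    // The shared behavior has to be asserted once per copy.
    const cases: Array<[number, number]> = [
      [-5, 1],      // below minimum clamps up
      [100, 100],   // in range passes through
      [9999, 4096], // above maximum clamps down
    ];
    for (const [input, expected] of cases) {
      console.assert(clampWidth(input) === expected, `clampWidth(${input})`);
      console.assert(clampHeight(input) === expected, `clampHeight(${input})`);
    }

Extract the clamp once and the table runs against a single function; leave the duplication and either the suite doubles or one copy silently goes untested.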
The point of DRY is to avoid having to redo code review and security analysis, which can be quite lengthy and onerous. Fixing bugs in many places is another thing, but not the main thing.
In a nutshell this captures my argument against functional programming, design patterns, and this whole code fetish cargo cult.
I care about what the code does, not what it looks like. If two different things compile to the same hardware instructions, then it makes no difference to me. Repeat yourself if you want. Make a bunch of indirections via factories and generics if you want. Use a bunch of ugly branching if you want.
It's all so anachronistic. We have self-driving cars, VR, drones, neural nets, machines that can write code themselves, post-quantum cryptography--and you're lecturing me about the $(No One Cares) Pattern from the hottest bestseller of 1994?
There is never a golden rule, not even "the rule of 3". Sometimes experience, because you have seen the exact same pattern before, can trump the rule of 3, and you can abstract it on the first try.
Clean code, as regards duplication, should focus on duplicate code that needs to change at the same time in all X places it is duplicated.
If the code seems similar but the instances actually have NO correlation, then it's not duplication; it just happens to work the same way.
All this discussion makes an internal voice in my head scream: "They need Lisp macros!"
But the world moved on from macros into... this.
Edit: My reasoning is as follows: the comment that says:
// 10 repetitive lines of math
is exactly the line that would have been replaced by a macro in the original code, and everything would have worked the same, as macros don't modify the stack the way a function call does. Unless you want them to.
TL;DR
1) We should not obsess over clean code; we can't agree on what clean code is.
2) We should also not write dirty code; we can't agree on what dirty code exactly is, either (see all the counterexamples).
There are few rules of thumb that are always true with regard to code quality. Getting sucked into a job or argument where code cleanliness is the #1 metric, and by extension the "version" of clean code that your boss prescribes, is a hellish way to make a living.
We can usually identify and agree on the two extremes:
The very very very good code, and the very very very bad code... anything else, and most of us coders are in the middle, and we would be wiser to focus on making the code work than on chasing a Zen-State-Of-Compilation-And-Code-Quality. It is useless when code exists ONLY to transform data... I've never bought a product based on code quality.
Isn’t the only reason we care about duplicate code that we don’t want to miss an instance of it for enhancements or bug fixes down the line? Don’t we have technology that helps us reduce that concern?
Yeah, it's not about text/code duplication so much as behavior duplication. In theory two (hopefully three or more!) code paths share the exact same behavior, and if that behavior ever changes it will need to change EVERYWHERE, which is why you introduce a method to abstract it away.
Depending on the case, though, the indirection trade-off may not be worth the abstraction needed to DRY it up. Then you end up with code like what was popular in the bad old days of Ruby. Essentially, you can get carried away.
This may not come across well, because it is related to very large CAD models. Think rockets, cars, ships, airplanes: that kind of large.
In the assembly paradigm, each component has 6 degrees of freedom. On average, a good human can mentally process a handful of components. Beyond that, it gets very hard to balance all the degrees of freedom.
Solvers also do not scale well once things reach the hundreds, sometimes low thousand component level. Linear compute performance bound. Parallelization is hard and a focus of current development.
Very large assemblies present problems similar to the ones seen in large software.
The dominant way of building big models, assemblies is to compartmentalize sub systems, build to common interface points, and only constrain important component groups.
Resolving all the degrees of freedom requires making big investments in time and compute power that only pay off when things change.
Almost never makes sense.
The balance is making those investments where change is known or predicted with high confidence.
Otherwise, it is easier to defer this task and work with common assumptions, until such time as a more complete model is warranted and indicated.
It is always a hard balance to find because the allure of "handling the future changes" now is seen as a savings when the truth is almost always a loss.
Those efforts become debt in a few ways:
- Big investment in a change-ready model sees changes that lie outside the scope of the already-implemented model dynamics. Big refactoring needed; 2x if not more work required.
- Model actually defined incorrectly. Creates problems that would not otherwise exist. These are hard to find, due to massive compute and human time spent simulating and debugging.
- Model overly compartmentalized, due to compute limits and too many assembly model constraints. Makes reasonable changes difficult. Refactoring is needed.
- Inability to operate with the full model, or large fractions of it, due to inability to compute it reasonably.
Finally, where the large model is made across various CAD systems, designing in place, just putting things relative to common assembly mating points, actually is the most efficient way!
When a dynamic assembly model is needed for analysis, some other simulation-related task, or changes, making it right then, and even sometimes throwing it away when the work is done, can make the best sense, with only the final result kept for the future.
None of this is intuitive at first. Everyone will tend to over-model, until they hit one of the pain points, have to over-correct, and work through it all.
Code should be easy to debug and troubleshoot six months later by someone else.
Think about realistic bugs. Somebody had a minor typo or conceptual error and got a sign wrong in the math. That happens.
You're going to get a bug report a year later about "resizing the top left of a text box has the wrong gap space".
In the original code, you can look at the TextBlock area and resizeTopLeft to instantly narrow down the location of the bug, and compare it to its neighbors like resizeTopRight. In like 30 seconds: duh, you subtracted the gap from the X in resizeTopLeft just like in resizeTopRight, and obviously the symmetrical version should be adding the gap on the left, because you subtracted from the X coordinate on the right and they're symmetric. Build, test, now it moves two pixels in the correct direction, commit, close bug, done in like five (labor-cost-expensive) minutes, depending on your build and test system, LOL. You should also look into why the unit test system missed this, but perhaps your fully automated unit testing does not check pixel-perfect UI operations, so these things will just happen.
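As an illustration only (hypothetical resizeTopLeft/resizeTopRight handlers and GAP constant, not the article's actual code), this is the kind of symmetric sign bug where the neighboring handler gives the fix away:

    // Hypothetical resize handlers for a text block with gap spacing.
    interface Box { x: number; y: number; w: number; h: number }
    const GAP = 2;

    function resizeTopRight(b: Box, dx: number, dy: number): Box {
      // Dragging the right edge: the gap is subtracted along +X.
      return { x: b.x, y: b.y + dy, w: b.w + dx - GAP, h: b.h - dy };
    }

    function resizeTopLeft(b: Box, dx: number, dy: number): Box {
      // BUG: copied the "- GAP" from resizeTopRight; the mirrored
      // handle should add the gap, since X grows the other way.
      return { x: b.x + dx, y: b.y + dy, w: b.w - dx - GAP, h: b.h - dy };
    }

Side by side, the asymmetry is a 30-second diff; folded into one generic handler parameterized over direction, the same mistake hides inside a sign formula.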
Is there anything for Jenkins that implements pixel-perfect UI testing, and how would one spec all the millions of possible actions and combinations of actions? A startup opportunity for someone?
In the new code, possibly the rendering of everything is messed up, and you have to decode, store, and analyze the entire design of the entire system in your head simultaneously, and run numerous simulation examples in your mind, until you realize that technically the gap spacing should be -1 times the absolute value of the X coord, or whatever abstract and elaborate formula. Or maybe you added the absolute value of a negative number, which would only affect one left/right side's gap, or a similarly complicated bug. It's going to be a VERY expensive bug: an hour, maybe a day if it's really mysterious and the debugger just doesn't "get" obfuscated code.
Also, in the original code you can trivially compose one discrete example at a time and experiment with what changes when implementing the next example. It's going to be written and tested very quickly. You don't have to worry about symmetry-related bugs to make TopLeft work at the same time as TopRight and BottomLeft and all that.
In the new code, it's very impressive to other programmers, but your boss is going to notice that you have to pack the entire system into your head in a perfect and 100% correct manner before anything works at all, which seems inefficient.
The niftiest-looking abstraction possible isn't necessarily optimized for anything other than being the niftiest abstraction. It's statistically unlikely to also be hyperoptimized for X when important values of Y, like debuggability or speed of writing, are the actual real-world goals.
Basically, always simplicate and add lightness; in some situations, "clean code" is not simple and light. Sometimes clean is obfuscated, from some perspectives.
He made a tradeoff that later on turned out to be wrong. I don't think that is a mistake, after all, it wasn't foreseeable at the time.
It could just as well have happened that some other requirement would have made the original solution even more impractical.
I think the only mistake here might be overthinking things (can't comment on changing other people's code, depends on company culture).
Apart from "cleanliness", what about developer happiness? If a more elegant solution makes you happy, why not go for it, at least every once in a while?
I constantly see devs applying DRY in ways that conflict with the "single responsibility principle". It's like they just stop at DRY because it's a simple concept to understand and relatively easy to defend. I've seen people essentially labeled as heretics for suggesting that DRY (or any SOLID principle, for that matter) should not be blindly applied to every piece of code.
So the two cases against writing the most legible, succinct code given the specifications at the time of writing it are:
>Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input. Even if it was an improvement (which I don’t believe anymore), this is a terrible way to go about it. A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
There's no question about this. Nobody likes the self-proclaimed savant who works in isolation and makes sweeping changes to the codebase or other people's work without collaborating and gaining some consensus. If it's a change worth making, it should be a simple case to present to your (hopefully) equally intelligent team.
There is a difference, it has to be highlighted, between refactoring someone's code in order to extend it yourself and simply re-writing someone's implementation because it doesn't suit your requirements. The former is part of the job; the latter should at the very least be an opportunity to mentor the person whose code you want to re-write in why it was suboptimal and guide them on the changes you'd like to make, or even give them the chance to make the changes themselves. This is kind of what code reviews are supposed to do.
That does not negate the need to structure and optimize code to remove duplication whatsoever. It's not an argument against clean code standards, and it's weak that it amounts to 50% of his case here.
>Secondly, nothing is free. My code traded the ability to change requirements for reduced duplication, and it was not a good trade. For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
Times change, requirements change. It's part and parcel of our jobs in software development. Code that is optimal and most legible for the cases present at one point should also be written to be refactorable and extendable.
It is much easier to refactor and extend code that isn't riddled with duplication and mangled with hardcoded business logic. Abstract your code and write your implementations well, name things in a way that people can read it and write tests that describe what's expected from it.
Refactoring well isn't easy work. Refactoring a sprawling legacy codebase with a lot of duplication and legibility problems is significantly worse.
I'm not saying we need to be dogmatic here. If you're given the opportunity to develop new code you should be aiming to do the best job of it given what you know now, in a way that will be comprehensible to you, or whoever needs to touch that code next.
We all know that there are problems with premature optimization, caused by "best practices" evangelists who'd happily drive up time-to-market and operating costs/complexity exponentially in the name of keeping the codebase and applications/services architecture in line with whatever he or she has read lately from "thought leaders" in our industry. But writing the code for a given application in line with the above isn't one of them.
The code is not large enough to need maintenance at a fine-grained level.
There is a secondary rule to the DRY "rule of three": If I can blow it away and rewrite it so easily, there is nothing to reuse or refactor in it. The feature is done, and we are into code golf and speculation, neither of which are productive uses of time. In my experience, speculative refactors like the one the author made succeed perhaps 50% of the time, so no better than the initial strategy. It's the requirements themselves, and the application of techniques to avoid various classes of errors, that give code direction and structure, not the aesthetics at a moment in time (which is what the author took issue with).
If you spot multiple approaches on the first try, you can add a comment with a date outlining alternatives so that the conversation may be resumed later when the new requirements come in. But at all times you're always at the mercy of "discipline", and there's no preemptive measure that avoids that.
> There is a secondary rule to the DRY "rule of three": If I can blow it away and rewrite it so easily, there is nothing to reuse or refactor in it.
This rule seems not to be correct, though. For example, it would mean that one of the most common and widely accepted changes (as far as I know, and admittedly I know nothing), replacing an explicit for or while loop with some kind of iterator construct, for example a foreach or even a call to a map function, was a bad idea. By and large, most individual for loops are pretty easy to understand and rewrite if you look at them. The first problem is that naturally one hardly ever has to read just a single for loop, and the impact of small insults to abstraction and readability really adds up when repeated tens to thousands of times in a codebase. Second: that a piece of code is easy to read, blow away, and rewrite is very far from a guarantee of no bugs in either the old or the new version, and AFAIK the history of the vanilla for loop is a classic example of that. Again, the impact of this depends on the fact that the for loop can be repeated many times in a codebase.
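For instance, a minimal sketch (a hypothetical prices example, not from the thread): each explicit loop is trivial to read and rewrite on its own, but every copy re-implements index and accumulator bookkeeping that the iterator construct simply removes.

    const prices = [9.99, 4.5, 12.0];

    // Explicit loop: easy to understand in isolation, but the index
    // and accumulator bookkeeping is repeated at every such site.
    const withTax: number[] = [];
    for (let i = 0; i < prices.length; i++) {
      withTax.push(prices[i] * 1.2);
    }

    // Iterator construct: the same transform with no index to get wrong.
    const withTax2 = prices.map((p) => p * 1.2);

One such loop is harmless; five hundred of them are five hundred chances at an off-by-one, which is the "adds up across a codebase" point above.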
OTOH the example code in TFA was not repeated with (or without) small variations many times in the program. (I'm not talking about how often it was called or about repetition inside the example code here, ofc.) So your test probably does correctly show that changing or not changing this piece of code on its own is only a small-stakes decision, unlike having say 500 vanilla for loops in the program. But if you consistently let individual bodgy code segments pass then surely you're liable to end up with a large and diverse body of them in the codebase, and that's in some ways even worse than 500 for loops, which can at least all be found with a simple search.
The author also didn't sound like a particularly senior engineer at the time for many reasons. So the original code author and the "boss" may have been taking into consideration timelines and future work/requirements coming down the pipe.
A very valid reason could have been as simple as: "We are re-visiting this in a couple of sprints, after feedback, and will have a better idea of how it needs to change. The extra day spent on this wasn't worth delaying getting it into people's hands, and we don't know if it would be a waste." The author would have known this if he had started a conversation about it.
I heard something like: "The second system you design will be the most over engineered piece of shit ever"
I don't know who said it but it has been very true for me and my close friends who work in software development. I remember first starting software development and I started to read up on "how to do it right" in the Java/C# world back when XML was everywhere.
I had first started to expand my skills after university by building my own blog (who didn't at that time?) but thought I should rebuild it according to "best practices".
Hoooooly shit, that was a poorly architected and designed piece of software. The example in the blog post was of course not as poor an example as my creation, but I feel that many end up in this trap once they have some experience: they feel they need to do everything "right", but they don't have the experience to evaluate whether it is worth it.
However, I also think a good workplace has a healthy mix, because those youngsters will also push the old guard to learn new things and introduce new technology. You just need a balance between using 0.1-alpha libraries and things that were released 10 years ago.
> It is much easier to refactor and extend code that isn't riddled with duplication and mangled with hardcoded business logic. Abstract your code and write your implementations well, name things in a way that people can read it and write tests that describe what's expected from it.
Especially in a statically compiled language: "extract method", "extract class", "pull members up", etc. are automated, guaranteed-safe refactors (ignoring reflection).
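A quick sketch of why such a refactor is mechanically safe (hypothetical createUser/renameUser functions, invented for illustration):

    // Before: the same validation repeated inline at two call sites.
    function createUser(name: string): void {
      if (name.trim().length === 0) throw new Error("name required");
      // ... create the user ...
    }
    function renameUser(name: string): void {
      if (name.trim().length === 0) throw new Error("name required");
      // ... rename the user ...
    }

    // After "extract method" (suffixed names only so this sketch compiles):
    // the helper has an explicit signature, so the compiler re-checks
    // every call site when the refactor is applied.
    function requireName(name: string): void {
      if (name.trim().length === 0) throw new Error("name required");
    }
    function createUser2(name: string): void { requireName(name); /* ... */ }
    function renameUser2(name: string): void { requireName(name); /* ... */ }

The IDE does the textual transformation, and the type checker confirms every call site still compiles, which is what makes it safe to do in bulk.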
Of course you're going to have scalability problems if you don't write a behavioral specification. There are consequences to software architecture choices, and charging ahead without thinking (aka hacking) isn't how you write production code.
The author not only fails to explain why his refactored code is worse, he doesn't even show all of his code. There is simply not enough information to know whether the refactoring is better or worse. In terms of fewer lines of code and more code reuse, it is better. I suspect the author is pretty junior himself, to imply the old way is better than his refactoring without a clear explanation as to why. I guess putting your baseless opinion on a fancy blog or Medium makes you seem more legit than you really are.
I get his point, however. In the spirit of his argument, there is an exact answer as to why one way of designing a program is better than another, despite more/less code reuse or more/fewer lines of code. I'm not even going to get into legibility here, as that is just opinion. And I'm going to give a very concrete answer: no design principles, no design philosophy, none of that.
Let's say you have feature which we call "A" that can be created as a composition of several primitives. We call these primitives "a, b, c, d." Let's also say feature "A" can be constructed from a different set of primitives "b, f, g, a."
Note the overlap in primitives. The overall set of primitives are different but both ways of constructing feature "A" can share the same primitives if needed. The two sets in the example share primitives "a" and "b".
Now let's say we want the code to be flexible enough to construct feature "B" or feature "C" sometime in the future.
Let's say feature "B" needs primitives "a, d, b" to construct.
Let's say feature "C" needs primitives "b, f, a" to construct.
Which method of constructing feature "A" is better, knowing that you will need to construct feature "B" in the future? What if it was feature "C" instead?
Obviously, depending on how flexible you want your design to be, you can choose one way or the other to initially construct the program. It's all opinion, and anticipation of how you will need your program to evolve.
One strategy to remove opinion from your design is to try to incorporate the full union of primitives "a, b, c, d, f, g", or to find another, completely different set of primitives (perhaps "h, i, j") that can be used to construct features "A", "B", and "C".
So the concrete answer: "h, i, j" is the best set of primitives you can use to construct your program, but if "h, i, j" aren't available, then your choice of "a, d, b", "b, f, a", or the full union "a, b, c, d, f, g" hinges on whether you need to construct feature "B" or feature "C" in the future.
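As a toy sketch of this framing (the primitives are hypothetical functions named only to mirror the comment):

    // Hypothetical primitives: each one transforms a document slightly.
    type Doc = { steps: string[] };
    const mk = (name: string) => (doc: Doc): Doc =>
      ({ steps: [...doc.steps, name] });
    const a = mk("a"), b = mk("b"), c = mk("c"),
          d = mk("d"), f = mk("f"), g = mk("g");

    const compose = (...fns: Array<(doc: Doc) => Doc>) => (doc: Doc) =>
      fns.reduce((acc, fn) => fn(acc), doc);

    // Feature "A" constructed two different ways, sharing a and b.
    const featureA1 = compose(a, b, c, d);
    const featureA2 = compose(b, f, g, a);

    // Future feature "B" needs a, d, b; future feature "C" needs b, f, a.
    // featureA1's primitives already cover "B"; featureA2's cover "C".
    const featureB = compose(a, d, b);
    const featureC = compose(b, f, a);

Whichever construction of "A" you ship, the primitives it leaves lying around determine which future feature is cheap, which is the whole bet being described.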
The problem with this article is that he never got into the nature of program design/organization, just vague reasoning and hand-wavy examples.
A good post about someone learning that writing code is about more than just how pretty it looks. How you work with others is incredibly important in this industry.
Is that not common knowledge? I feel like this is a well-written post with a good point, but it's a familiar point. You could boil part of it down to "all's good in moderation": don't just keep your code clean, keep it clean and easy to read, etc.
It's a great post anyway, sorry if my tone was too critical. I enjoyed reading it, for what it's worth, just felt like expecting anything surprising or revelatory out of it wouldn't be the right approach.
Leaving common knowledge undocumented is, in itself, a misguided attempt at DRY.
Your common knowledge isn't necessarily everyone's common knowledge; what you perceive isn't necessarily what it is.
Every day there are many people just starting to code. There are cultural and regional differences. They might not have hit the exact spot in which this information appears.
Knowledge also disappears, fades, gets censored or destroyed, gets mixed up and remembered incorrectly. You might take something as a given and misremember it 20 years from now. You might read your own code 5 years from now and not know why something's there — something that was taken to be obvious.
It definitely is something that needs to be said, as it is not uncommon to find people doing the opposite: taking a rule-of-thumb and asserting that it is the one true way. The people who are most likely to make this mistake are smart and well-motivated, enthusiastic about the power of abstraction to simplify things, and have some experience but not a lot.
It wasn't worth deconstructing their "logic". Rather, my implicit message was 'if you find yourself disagreeing with someone likely better than you at X, step back and assess the epistemic foundations of _your_ belief.'