I’ve usually heard this phenomenon called “incidental duplication,” and it’s something I find myself teaching junior engineers about quite often.
There are a lot of situations where 3-5 lines of many methods follow basically the same pattern, and it can be aggravating to look at. “Don’t repeat yourself!” Right?
So you try to extract that boilerplate into a method, and it’s fine until the very next change. Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences in the details.
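In code, that decay often looks something like this minimal Go sketch (all names here are hypothetical):

    package main

    import (
        "fmt"
        "html"
        "io"
        "os"
        "strings"
    )

    // Day 1: an innocent helper extracted from three near-identical call sites.
    func renderRow(w io.Writer, label, value string) {
        fmt.Fprintf(w, "%s: %s\n", label, value)
    }

    // A few change requests later (renamed V2 here so both versions compile
    // side by side): each flag exists for exactly one caller, but every
    // reader now has to consider all of them at every call site.
    func renderRowV2(w io.Writer, label, value string,
        uppercase, skipEmpty, htmlEscape bool, indent int) {
        if skipEmpty && value == "" {
            return
        }
        if uppercase {
            label = strings.ToUpper(label)
        }
        if htmlEscape {
            value = html.EscapeString(value)
        }
        fmt.Fprintf(w, "%s%s: %s\n", strings.Repeat(" ", indent), label, value)
    }

    func main() {
        renderRow(os.Stdout, "name", "anna")
        renderRowV2(os.Stdout, "name", "<anna>", true, false, true, 2)
    }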
I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change. Then one of two things is likely to happen:
(1) you find that the code doesn’t look so repetitive anymore,
or, (2) you hit a bug where you needed to make the same change to the boilerplate in six places and you missed one.
In scenario 1, you can sigh and say “yeah it turned out to be incidental duplication, it’s not bothering me anymore.” In scenario 2, it’s probably time for a careful refactoring to pull out the bits that have proven to be identical (and, importantly, must be identical across all of the instances of the code).
My CTO often asks me to implement a feature to do X and make it “generic enough to handle future use cases”. My answer is always the same - either give me at least three use cases now or I am going to make it work with this one use case. If we have another client that needs the feature in the future then we will revisit it.
Of course, there are some features where we know in advance, based on the industry, how to genericize them.
From (I think) an old Joshua Bloch talk on API design, paraphrased:
* If you generalise based on one example, you will get a flexible API that can handle only that example.
* If you generalise based on two examples, you will get a flexible API that can switch between those two examples.
* If you generalise based on three examples, you have a chance of abstracting over the common essence.
Hmm. I wonder if he got that from Simon while he (JB) was at CMU. He (HS) once said to me, jokingly, I think, “One makes an observation, two makes a generalization, three makes a proof”.
The Rule of 3 is a great rule, except when it isn't.
I had a colleague some time ago who wrote a couple of data importers for FAA airspace boundaries. There were two data feeds we cared about, "class airspace" and "special use airspace". These airspace feeds have nearly identical formats, with altitudes, detailed boundary definitions, and such. There are a few minor differences between the two, for example different instructions for when a special use airspace may be active. But they are about 95% the same.
The developer wrote completely separate data definitions and code for the two. The data definitions mostly looked the same and used the same names for corresponding fields. And the code was also nearly the same between the two (in fact exactly the same for most of it).
It was clear that one importer was written first, and then the code and data structures were copied and pasted and updated in minor ways to create the second.
Because the data structures were unique for each (even if they looked very similar in the source code), this impacted all the downstream code that used this data. If you saw a field called FAA_AIRSPACE_MIN_ALTITUDE, you had to be sure not to confuse the class airspace vs. special use airspace versions, because each of these had a field of the same name, the compiler wouldn't catch you if you used the wrong one, and you might have the wrong offset into the data structure.
I asked the developer and they told me, "I have this philosophy that says when you have only two of something, it's better to just copy and paste the code, but of course when you get to the third one you want to start to think about refactoring and combining them."
Yep, the Rule of 3.
In this case there were only ever going to be two. And the two were nearly the same with only minor differences.
But because of blind obedience to the Rule of 3, there were many thousands of lines of code duplicated, both in the importer and in its downstream clients.
I still like the Rule of 3 as a general principle, and I have espoused it myself. But I think it is best applied in cases where the similarities are less clear, where it seems that there may be something that could be refactored and combined, but it's not yet clear what the similarities are.
I think it is a very bad rule in a case like this, where it should be obvious from the beginning that there are many more similarities than differences.
Every single rule or piece of advice in programming is good until it isn't. OOP is good until it isn't, functional programming is good until it isn't, premature optimization is the root of all evil until it is the root of all good.
For some reason, humans have this deep need to try and boil things down to bulleted lists, which in the domain of programming are just incredibly not useful.
This is because programming is an art. It has fundamental components (objects, functions, templates, iterators, etc) like in visual design (point, line, shape, value, etc).
I think engineers should read the old and the new C++ books and master that language to know the evolution of all these paradigms and how to use them. There’s so much wisdom in the “Effective C++” series, the Gang of Four, and “C++ Templates: The Complete Guide”, to name a few.
The problem is that in this “start-up culture” of banging things out and getting them working, the art is left behind. Just like many other arts.
I was part of an uncredited team interviewed by Meyers for "Effective Modern C++". Always thought it was somewhat ironic. At the firm (which declined acknowledgments because they shun media attention) we weren't even using a C++11 compiler on either Linux or Windows at the time. Yet I recognize at least two of the patterns as ones I was at least a partial contributor to.
Myself, I lost track of C++ standards changes with C++17, and I've not been using C++ for the last several years.
I still love the power and speed, but right now I'm doing more ETL work, and Python is a better and more productive language for that.
I think that you have a point, but I often find myself citing guidelines or rules when I am evaluating code decisions or questioning code design. Maybe it depends on your interpretation of the phrases; some sayings should be followed religiously, but others applied with discretion.
The rule of 3 usually is in reference to small scoped abstractions, not whole modules or subsystems. We're talking about extracting a short function, not significant and potentially thorny chunks of code.
But I guess no one explicitly spells this out, so I could see where someone could become confused.
Premature abstraction. The Rule of 3 helps, but I found a better principle for it:
Any abstraction MUST be designed to be as close as possible to language-primitive-like. Language primitives are reliable, predictable, and non-breaking. If they do break, they don't affect business logic written on top of them. If parameters are added, defaults are provided. They don't just abstract; they enable developers to express business logic more elegantly.
The challenge is to pick the part of the code abstractable to be primitive-like first and make it a top priority.
This is why language features like Rust async, Go channel-based communication, ES6, and C++ smart pointers were such a hype in their time and are still used today. It also applies to enabling tools such as React, tokio, wasm-bindgen, express, TypeScript, and jQuery (even the last, which is not really a thing anymore).
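One concrete way to honor "if parameters are added, defaults are provided" in Go is the functional-options pattern. A minimal sketch (the retry package and every name in it are hypothetical):

    package retry

    import "time"

    type config struct {
        attempts int
        delay    time.Duration
    }

    // Option mutates the config. New knobs can be added later without
    // breaking a single existing caller, because every knob has a default.
    type Option func(*config)

    func Attempts(n int) Option        { return func(c *config) { c.attempts = n } }
    func Delay(d time.Duration) Option { return func(c *config) { c.delay = d } }

    // Do calls fn until it succeeds or the attempt budget runs out.
    func Do(fn func() error, opts ...Option) error {
        cfg := config{attempts: 3, delay: 100 * time.Millisecond} // defaults
        for _, opt := range opts {
            opt(&cfg)
        }
        var err error
        for i := 0; i < cfg.attempts; i++ {
            if err = fn(); err == nil {
                return nil
            }
            time.Sleep(cfg.delay)
        }
        return err
    }

Existing business logic keeps calling retry.Do(ping); a later caller can write retry.Do(ping, retry.Attempts(5)) without anyone else changing. That's the "non-breaking, defaults provided" property in miniature.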
I find abstractable parts in code by thinking about potential names for it. If there is no good name for a potential common function, it's probably not a good idea to extract the section into a function. Maybe it can be extracted into two separate functions with good names?
> The rule of 3 usually is in reference to small scoped abstractions, not whole modules or subsystems. We're talking about extracting a short function, not significant and potentially thorny chunks of code.
I would say it’s for entire features sometimes. For instance, we are a B2B company. We have features on the roadmap or a feature might be suggested by a client. Either way, you hit the jackpot if you can get one client to pay for a feature that doesn’t exist in your product that you can then sell to other clients.
The problem is that you don’t know whether the feature is generally useful to the market. But you think it might be; in that case, you build the feature in a way that is useful to the paying client and try not to do obvious things that are client-specific. But until you have other clients, you don’t know.
That's been my experience doing embedded stuff. Customers will tell you they want a feature. You'll implement it then find out they don't need it enough to devote resources to utilize it on their end. So then it just lingers in the code base. And never gets real world testing. Lately I've been pulling unused features and consigning the code to the land of misfit toys. Just so I don't have to maintain them.
If that’s the case and they were willing to fully pay for the feature, at professional services + markup hours, it wasn’t a loss. In our case, we still got unofficial professional services revenue and recurring subscription revenue, and now we have a “referenceable client”.
If it’s behind a per client feature flag, it doesn’t cause any harm.
> I asked the developer and they told me, "I have this philosophy that says when you have only two of something, it's better to just copy and paste the code, but of course when you get to the third one you want to start to think about refactoring and combining them."
I would argue this is a good rule of thumb, but nothing should ever be a hard unbreakable rule. The usual refactoring rules should apply too: if 95% of the code is the same, you should probably go back and refactor even without a third use case because it sounds like the two use cases are really just one use case with slight variation.
If the compiler didn’t catch it, doesn’t that say it was modeled incorrectly?
Why not have an IAirSpace interface or an abstract AirSpace class with two specializations? If there were processes that could handle either it should take an AirSpace class, one that could only handle one or the other took the specialization.
If the steps were the same for handling both, have step1...step(n) defined in the concrete class and have a coordinating “service” that just calls the steps and takes in an IAirSpace.
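Roughly this shape, sketched in Go (all names hypothetical; struct embedding stands in for the abstract base class):

    package airspace

    import "fmt"

    // Airspace is the one shape downstream code ever sees.
    type Airspace interface {
        ParseBoundaries(raw []byte) error
        ParseAltitudes(raw []byte) error
        ParseActivation(raw []byte) error // the ~5% that differs
    }

    // base holds the ~95% that is identical: boundaries, altitudes, etc.
    type base struct{ /* shared fields */ }

    func (b *base) ParseBoundaries(raw []byte) error { return nil } // shared, written once
    func (b *base) ParseAltitudes(raw []byte) error  { return nil } // shared, written once

    type ClassAirspace struct{ base }

    func (a *ClassAirspace) ParseActivation(raw []byte) error { return nil } // class rules

    type SpecialUseAirspace struct{ base }

    func (a *SpecialUseAirspace) ParseActivation(raw []byte) error { return nil } // special-use rules

    // Import is the coordinating "service": it just calls the steps and
    // works for either kind, so nothing downstream is duplicated.
    func Import(a Airspace, raw []byte) error {
        for _, step := range []func([]byte) error{
            a.ParseBoundaries, a.ParseAltitudes, a.ParseActivation,
        } {
            if err := step(raw); err != nil {
                return fmt.Errorf("import: %w", err)
            }
        }
        return nil
    }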
To be honest I don't get it. What language was this in, that you couldn't duck type the data or make the data structures inherit from some shared interface that encapsulated the commonalities?
> The Rule of 3 is a great rule, except when it isn't.
"Rules are for the guidance of wise men and the obedience of fools."
It's unfortunate that you had to deal with a fool, but that's not an indictment of the particular rule that they picked to follow off a proverbial cliff.
Thanks for your reply. I just want to note that what appears to be a quote in your comment ("Rules are for the guidance of wise men and the obedience of fools") is not something I wrote at all. If you still have time to edit the comment, I might suggest changing that so it doesn't look like you quoted me.
> But because of blind obedience to the Rule of 3, there were many thousands of lines of code duplicated, both in the importer and in its downstream clients.
Well sure, blind obedience to anything at all can cause problems.
Are you sure merging code for different data feeds would be better, though? In such cases, what is identical and what is not should be cross-referenced in comments. But you don't know beforehand which approach would be better, unless you know the data feeds will stay the same as now.
The sad story here is that if you know the data feeds will stay pretty static, there's little to gain from making an advanced abstraction over them. Which is why you often find duplicated code that hasn't been touched for years: the original target was met with a naive approach, and the absence of new changes leads to a stale codebase.
If you have a 95% match on something nontrivial (and it likely won't diverge significantly), I'd go for merging even with 2 cases. At least merge most of the common parts.
Reading a couple of ifs and some not-quite-duplicate procedures seems much better than having a complete duplicate set spread across cross-referenced files.
Why are you reading a couple of ifs instead of having the two similar things represented by separate classes with shared functionality in a common class? Or even if you prefer composition to inheritance you could still make it work cleaner without a bunch of if statements.
No, you also need to ask not just whether they're similar, but whether they are the way they are for the same reason, and are thus likely to change together. It doesn't matter if there are 10, if they all might change independently in arbitrary ways.
"DRY" as coined in The Pragmatic Programmer spoke in terms of pieces of knowledge.
If you are combining code just because it looks similar, you're "Huffman coding."
Every time I have seen a feature that was written general to handle possible future uses, after a year of sitting unused there will certainly be modifications or surrounding code that doesn't support the planned extensibility. So it can never be used without a lot of work changing things anyway.
Future coding has led to some of the most overcomplicated systems I've worked with. It's one of the reasons (among many) I quit my last job. I was constantly told code that had no use cases was "important to have" because "we needed it".
So does that same idea apply to all of the many abstractions that geeks build just to stay vendor- or cloud-agnostic, just in case AWS/Azure go out of business one day?
On the other hand, I worked on a multi-million line codebase that was deeply joined to oracle’s db, with a team who all really wanted to move away from it but couldn’t because in the beginning (a decade earlier) the choice had been made to not put in “unnecessary” abstractions.
It’s not about the abstractions. In the case of Oracle or any other database, if you’re only using standard SQL and not taking advantage of any Oracle specific features, why are you spending six figures a year using it?
The same can be said about your cloud provider. If you’re just using it for a bunch of VMs and not taking advantage of any of the “proprietary features” what’s the purpose? You’re spending more money than just using a colo on resources and you’re not saving any money on reducing staff or moving faster.
You’re always locked into your infrastructure decisions once you are at any scale. In the case of AWS for instance (only because that’s what I’m familiar with), even if you just used it to host VMs, you still have your network infrastructure (subnets, security groups, NACLs), user permissions, your hybrid network setup (site-to-site and client-to-site VPNs), your data, etc.
In either case, it’s going to be a months long project triggering project management, migrations, regression tests, and still you have risks of regressions.
All of the abstractions and “repository patterns” are not going to make your transition effort seamless. Not to mention your company has spent over a decade building competencies in the peculiarities of Oracle that would be different than MySql.
After a decade, no one used a single stored procedure or trigger that would be Oracle specific? Dependencies on your infrastructure always creep in.
Yes. There's no point abstracting over a vendor API if you're not actually using an alternative implementation (even for testing). Otherwise, keep your code simple, and pull out an abstraction if and when you actually have a use case for doing so.
Vendor agnostic code doesn't anticipate AWS going out of business, just them raising prices significantly. It can be smart to be able to switch in a reasonable amount of time so that you can evaluate the market every few years. This way spending extra time to be vendor agnostic can also pay off. But there's no technical reason for that, it's a cost issue.
It’s often noted that in almost 15 years of existence, AWS has never increased prices on any service.
What are the chances that AWS will increase prices enough to make all of the cost in developer time and complexity in “abstracting your code” and the cost in project management, development, regression tests, risks, etc make it worthwhile to migrate?
The cost of one fully allocated developer+ qa+ project manager + the time taken by your network team + your auditors, etc and you’re already at $1 million.
Do you also make sure that you can migrate from all of the other dozen or so dependencies that any large company has - O365? Exchange? Your HR/Payroll/Time tracking system (Workday)? Windows? Sql Server? SalesForce? Your enterprise project management system? Your travel reimbursement system (Concur), your messaging system? Your IDP (Active Directory/Okta)?
DRYing existing code to handle exactly what is needed to support the current uses with no additional axes of variation isn't adding functionality (well, if it handles more than the existing values in the existing axes of variation, it's adding some but arguably de minimis functionality.)
Building code with support for currently-unused variation to support expected future similar-but-not-identical needs, that is, pre-emptively DRYing future code, is adding functionality.
On one project we ended up with a series of features that fell into groups of threes, and we kept trying to make the 2nd feature in the series generic, and time after time #3 was a rewrite and significant rework of #2. So any extra time spent on #2 was time wasted.
I think that what Burlesona is suggesting is more nuanced (and effective) than the rule of three. We can easily imagine situations where code used twice warrants a refactor, and situations where code used three times does not.
The rule of three suffers the same problem as the pre-emptive refactor - it is totally context insensitive. The spirit of the rule is good, but the arbitrary threshold is not.
Similarly, 99% of your comment is bang-on! My only gripe is the numeral 3. But pithy rules tend to become dogma - particularly with junior engineers - so it's best to explain your philosophy of knowing when to abstract in a more in-depth way.
> My only gripe is the numeral 3. But pithy rules tend to become dogma
Agree, 3 is a pretty arbitrary number. If you have a function that needs to work on an array that you know will certainly be of size 2, it takes minimal effort and will probably be worthwhile to make sure it works on lengths greater than 2.
But the bigger point is valid: you need examples to know what to expect, and if you make an abstraction early without valid examples, you'll almost certainly fail to consider something, and also likely consider a number of things that will never occur.
> make it “generic enough to handle future use cases”.
The answer to this is usually YAGNI.
That is, don’t plan for a future you might never have. Code in a way that won’t back you into a corner, but you don’t know what the future’s cases might be (or if there even will be any), so you can’t possibly design in a generic way to handle them. Often you just end up with over-engineered generic-ness that doesn’t actually handle the future cases when they crop up. Better to wait until they do come up to refactor or redesign.
Some people argue to design in a way that lets you rewrite and replace parts of the system easily instead.
It's not so much that you're preparing for 7 separate cases, as a thing that works in one place and almost, but not quite, fits in 6 others. You rarely exactly duplicate code, but often duplicate and modify.
By the time you hit 7, you do clearly Need It. But now you've got 7 cases to work from in writing the generalization. When the number is 2, it's often reasonable to say, "I don't know how these will evolve and I'll probably guess wrong".
Yes, I agree. That’s not what I was replying to, though. I noted in another comment that I consider merging worthwhile even in the absence of three use cases, certainly if what you have now is very similar.
If all seven are the same except for one or two cases, it doesn’t mean that you have to have a bunch of if statements, you either use inheritance or composition to create special cases and judiciously apply “pull members up” and “push members down” refactoring, interfaces, abstract classes, virtual methods, etc. These are all solved problems.
Yes I know about the whole “a square is not a rectangle problem”.
Oliver Steele describes "Instance First Development", which the language he designed, OpenLaszlo, supported through the "Instance Substitution Principle". I've written about it here before, and here are some links and excerpts.
In the right context, prototypes can enable Instance-First Development, which is a very powerful technique that allows you to quickly and iteratively develop working code, while delaying and avoiding abstraction until it's actually needed, when the abstraction requirements are better understood and informed from experience with working code.
That approach results in fewer unnecessary and more useful abstractions, because they follow the contours and requirements of the actual working code, instead of trying to predict and dictate and over-engineer it before it even works.
Instance-First Development works well for user interface programming, because so many buttons and widgets and control panels are one-off specialized objects, each with their own small snippets of special purpose code, methods, constraints, bindings and event handlers, so it's not necessary to make separate (and myriad) trivial classes for each one.
Oliver Steele describes Instance-First Development as supported by OpenLaszlo here:
[...] The mantle of constraint based programming (but not Instance First Development) has recently been taken up by the "Reactive Programming" craze (which is great, but would be better with a more homoiconic language that supported Instance First Development and the Instance Substitution Principle, which are different but complementary features with a lot of synergy). The term "Reactive Programming" describes a popular old idea: what spreadsheets had been doing for decades. [...]
Oliver Steele (one of the architects of OpenLaszlo, and a great Lisp programmer) describes how OpenLaszlo supports "instance first development" and "rethinking MVC":
[...] I've used OpenLaszlo a lot, and I will testify that the "instance first" technique that Oliver describes is great fun, works very well, and it's perfect for the kind of exploratory / productizing programming I like to do. (Like tacking against the wind, first exploring by creating instances, then refactoring into reusable building block classes, then exploring further with those...)
OpenLaszlo's declarative syntax, prototype based object system, xml data binding and constraints support that directly and make it easy.
OpenLaszlo's declarative syntax and compiler directly support instance first development (with a prototype based object system) and constraints (built on top of events and delegates -- the compiler parses the constraint expressions and automatically wires up dependences), in a way that is hard to express elegantly in less dynamic, reflective languages. (Of course it was straightforward for Garnet to do with Common Lisp macros!)
>The equivalence between the two programs above supports a development strategy I call instance-first development. In instance-first development, one implements functionality for a single instance, and then refactors the instance into a class that supports multiple instances.
>[...] In defining the semantics of LZX class definitions, I found the following principle useful:
>Instance substitution principle: An instance of a class can be replaced by the definition of the instance, without changing the program semantics.
In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.
That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).
The syntax for creating a bunch of objects is parallel to the syntax of declaring a class that creates the same objects.
So you can start by just creating a bunch of stuff in "instance space", then later on as you see the need, easily and incrementally convert only the parts of it you want to reuse and abstract into classes.
> In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.
> That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).
This gives me nightmares of the over-engineered XML programming that is infamous in the Java community. You lose all of the benefits of static type checking.
Generic enough code that can be adapted to future cases is code with clean architecture that follows standard OO principles.
On the other hand, trying to handle all the hypothetical cases because "that makes the code generic and future-proof" is usually a complete waste of time.
My view is to develop the simplest, well architected OO code that can handle the use cases at hand.
After witnessing the collateral damage of many software reuse projects (anyone remember "components"?), I came up with a different ruleset, useful for "compromising" with "software architects":
First, translate the data.
Second, divine a common format and share the data.
Third, create the libraries for this common format, to be reused amongst projects.
I have never reached #3 in my professional career. Sure, we wrote the libraries. But other teams and projects never adopted them before the whole effort became moot.
So I kept my projects intact and moving forward, while letting mgmt think they're doing something useful.
> The moral of this story? Don't get trapped by the sunk cost fallacy. If you find yourself passing parameters and adding conditional paths through shared code, the abstraction is incorrect. It may have been right to begin with, but that day has passed. Once an abstraction is proved wrong the best strategy is to re-introduce duplication and let it show you what's right. Although it occasionally makes sense to accumulate a few conditionals to gain insight into what's going on, you'll suffer less pain if you abandon the wrong abstraction sooner rather than later.
Sandi Metz's blog (and book) are an absolute gold mine. I'm a junior developer (<5 years), and the material gave me an entirely new outlook on design (object-oriented and otherwise)... and also showed me how un-maintainably most development is done nowadays :(
If there was a required reading list for professional developers, I would put her work on with zero hesitation, I feel it to be that important.
> You call devs Junior until they have over 5 years experience? Wow, that’s harsh
Less than 3-4 years is absolutely junior dev. Up until around 7 or 8 years is mid-level. Passing beyond that would be senior dev and the other titles then follow.
With the obvious note that it's not strictly time bound - someone could easily get stuck at mid level for much longer if they aren't progressing. It's pretty hard to still be only junior after 7+ years
I still very much consider myself junior but someone I know with less experience than me just recently got a job with "senior" in the title. I have seen senior job postings that say something like "3+ years of experience."
I absolutely do not consider myself senior and I don't think I will for at least seven more years.
One of the big reasons I realized I still _was_ a junior developer was because of Sandi Metz's content showing me how long of a journey I still have to go :-P
I don't personally think it's strictly about the time one's been programming, although in my experience that can be a good benchmark for e.g. how good one's abstraction, API design, etc. skills are.
> don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
This ^^^.
I'm in my mid-50s now and have worked as a software dev since my teens. I've learned over time that certain lumps of code need to be left alone for a while before jumping in and aggressively refactoring the perceived "duplication". I've been guilty of this kinda thing before: spot a wodge of code that looks like a duplication, only to find out later you hit some "special cases" as the project progresses, and suddenly, as the article points out, your "helper" method balloons into a ball of spaghetti.
As an apropos to the article, and touched upon therein, checking in a fairly major change to de-duplicate some code without consulting the original author/team is a wee bit rude. Ask your colleagues first why such code still exists before barging in and making these changes, they may already have some concerns as to why refactoring to a helper method or some abstraction isn't in their game plan yet. It's a common courtesy.
> As an apropos to the article, and touched upon therein, checking in a fairly major change to de-duplicate some code without consulting the original author/team is a wee bit rude.
This sounds like the outcome of bad culture. Ownership of the code should be shared to the point where it should never be considered rude to improve the code. Any part of the code.
> Ask your colleagues first why such code still exists before barging in and making these changes,
If I had to synchronise with others all of my improvements to existing code (which was frequently written in a hurry to meet a deadline, so with shortcuts taken intentionally, or with incomplete knowledge of future use cases) I would get at most half as much done.
> they may already have some concerns as to why refactoring to a helper method or some abstraction isn't in their game plan yet.
If there are alluring "improvements" that don't work for such subtle reasons, this should be documented in the code. If it's not, one has only oneself to blame when someone goes in and changes it.
Edit: I realise now that I'm talking about teams where everyone is reasonably senior. It could be different with junior members on the team, to which many changes might look like improvements when a senior engineer would at most see the change as equivalent. In that case I think you're right, but for a different reason: junior engineers should always check in with senior engineers about things in order to learn more of the craft!
For 90% of shops outside the cornucopia of SV with unlimited budgets and internal customers only, the client drives decision mercilessly and ruthlessly.
And for a good reason. The product doesn't exist because they're fun to develop. They exist because they solve a problem customers have. So ultimately, decisions should always be based on what customers (long-term) benefit most from.
One of the areas where I really like "incidental duplication" is in tests.
Tests can sometimes be very repetitive and identical, and it's tempting to want to refactor it in some clever way. That's almost never good.
On top of the reasons laid out in parent comment, tests also function as unofficial documentation. I like having everything explicit in there, it makes them easier to read and understand.
If you try to abstract away tests, you often just end up re-implementing the same abstractions used in the actual code, and you can end up not catching unfounded assumptions that your abstraction is making in both the tests and the code. There is a scope for having test helpers / utils to make tests easier to write, but you should be minimalist with these.
I agree, but some test helpers are reasonable. It's about balance and readability of the tests in question. Otherwise integration tests end up being a hundred lines of code.
I'm not sure I agree. I like to move all setup code to helper methods so that my tests are just a few lines
// Given <setup>
// When <thing happens>
// Expect <thing> to be <such>
This allows the reader to easily see which workflows are actually tested. If the reader is interested in implementations the utility code is one click away and usually only needs to be looked at once to be completely understood. The test bodies themselves however have many flavors for many work-flows so getting rid of the repetition is critical to highlight the specific nature of individual tests.
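For example, in Go (Order and its methods are made-up stand-ins so the sketch is self-contained):

    package order

    import "testing"

    type Order struct{ total, paid int }

    func (o *Order) Pay() error    { o.paid = o.total; return nil }
    func (o *Order) Refund() error { o.paid = 0; return nil }
    func (o *Order) Balance() int  { return o.paid }

    // newPaidOrder hides the noisy construction details, once.
    func newPaidOrder(t *testing.T, total int) *Order {
        t.Helper()
        o := &Order{total: total}
        if err := o.Pay(); err != nil {
            t.Fatalf("setup: %v", err)
        }
        return o
    }

    func TestRefundClearsBalance(t *testing.T) {
        // Given a paid order
        o := newPaidOrder(t, 100)
        // When it is refunded
        if err := o.Refund(); err != nil {
            t.Fatalf("refund: %v", err)
        }
        // Expect the balance to be zero
        if got := o.Balance(); got != 0 {
            t.Errorf("balance = %d, want 0", got)
        }
    }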
That’s why I make the test fail before writing the code. If the code is already written, then I break it in the minimal way to test the test, and then fix it.
IME things never fail in the way you expect them to... You can build a fortification where you think you are weak only to find the enemy is already in the castle.
A green test does not equal bug-free code. There may be a misimplementation / misunderstanding of the spec, or your code passes beautifully only under the right conditions, with just this data setup.
That's a test for your test, so why only run it once transiently instead of running every time? "Mutant" testing helps with this. It's basically fuzzing your test code to make sure that every line is meaningful.
I can see how that would be useful, but I also think it's a matter of priorities.
I'm basically saying I rarely have bugs in my tests because I verify them first. In fact I can't think of a single bug in my tests over the last 4 years (or even 10 years), but I can think of dozens of bugs in my code.
For example here are some pretty exhaustive tests I've written for shell, which have exposed dozens of bugs in bash and other shells (and my own shell Oil):
I would rather spend time using the tests to improve the code than improving the tests themselves. But I don't doubt that technique could be useful for some projects (likely very old and mature ones)
ITYM "I rarely have bugs in my tests that I'm aware of". The amount of tests I've seen that look like they are working properly, have been "verified" and are buggy is huge. Usually we find they were buggy because someone moves some other tests around or changes functionality that should have broken the tests, but didn't.
Please, don't ever assume that your tests are beyond reproach just because you verified them. Tests are software as well and are as prone to bugs as anything.
And how do you do this in practice? I am struggling to think of a good way to keep the production code that fails the test and the production code that doesn't fail the test together. I might have my test check out an old version of the production code, compile it and test against that. But that is hard to get right.
As a preventative measure, I write some tests for my tests. Also in TDD style of course. And on a very rare occasion, I have to write a test for those tests as well.
I do actually do that. I'll write some buggy code in order to learn how to test for it.
TDD for me is primarily a way to guide myself toward accomplishing a goal. So I sometimes write way more tests for myself than the business needs. I will then delete the scaffolding tests before I tag my PR for review.
Tests have fewer bugs if you write them before the system under test, and if they don't have mocks, and if you have enough of them that anything you get wrong the first time will get noticed by the results of the many similar tests.
> Tests have fewer bugs ... if they don't have mocks
100x this. I've repeatedly failed to convince my team members that mocks are unnecessary in most cases. I've reviewed code with mocks for classes like BigDecimal and built-in arrays. This is especially prevalent in Java teams/codebases.
> Tests rarely have bugs, I find, so generally dry isn’t critical
DRY is important for tests, but the areas where it is have probably (unless you're writing a testing framework, or using a language without one that covers your use case) already been covered by your testing framework.
I have 2 rules I use when determining whether to duplicate code or to refactor:
1. How many duplications are there? If the code is duplicated once, that's fine. If it's duplicated twice (so 3 instances of it), then it's time to consider refactoring, subject to the next rule.
2. Why is the code duplicated? If it's "incidental duplication", i.e. code that happens to look the same, don't refactor. Only refactor if there's actually a reason why the code is the same. Which is to say, attempt to predict the future: if a change is made to one instance of the code, do I expect it to be replicated to the other instances too?
It's also useful to look at not just the duplication but the code itself. In this case, it was code for geometry, which is not likely to change all too much.
Often the difference between harmless-but-ugly duplication and duplication that is actually harmful comes down to the semantics of the code, not just its appearance.
It wasn't code for geometry; it was code for manipulating geometric shapes via the UI. The article specifically mentions that custom behavior was eventually needed:
> we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
It is, which is why you want to be conservative. If the duplicated code is obviously supposed to be identical, then predicting the future should be trivial. If it's not obvious, then it's a question of "can I conceive of a reason why I'd want to update one and not the others?". And if the answer is unclear, wait a while and see if anything comes up.
This. I wait until the feature is mature, and has been used by actual people. There's no point trying to clean up code that hasn't finished evolving, or that I don't understand fully.
It also means I have a tidy stack of refactoring to do when I'm bored or need a quick motivational win :)
And in the other direction: when adding flags etc. to a method that was created to reduce duplication, consider whether splitting it into two "duplicate" copies, or copying it to the call site requiring the flag, isn't the better alternative.
Or, if possible, split up the helper into smaller units of functionality that can be combined as appropriate for different requirements. That could possibly work for the original article's problem, too.
Refactoring duplicated code is a trade off: on the one hand you create abstraction and centralizing code logic at the cost of, first, the overhead of learning that abstraction, and second, by increasing coupling between functions. I've personally found that the coupling is by far the most important factor to consider. If A depends on B, and C is found in both A and B, then you should factor out C. If A and B share C because they are adjacent, without being fundamentally intertwined, then duplicate C, but consider creating a library or module that makes it easy to talk about things that are similar to that duplicated code, C. (I tend to think about modules/libraries as little, independent DSLs)
Another thing you can do when a function becomes overencumbered is to split the remaining similar logic into smaller functions which are composed into specialized functions.
This has benefit in that when analyzing modules, you can spot differences in procedure at a glance instead of needing to dig through 100 lines of somewhat similar imperative code.
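As a sketch (hypothetical names): instead of one process(rows, trim, dedupe) helper that grows a flag per caller, expose the steps and compose them at the call site:

    package main

    import (
        "fmt"
        "strings"
    )

    func trimAll(in []string) []string {
        out := make([]string, 0, len(in))
        for _, s := range in {
            out = append(out, strings.TrimSpace(s))
        }
        return out
    }

    func dedupe(in []string) []string {
        seen := make(map[string]bool)
        out := make([]string, 0, len(in))
        for _, s := range in {
            if !seen[s] {
                seen[s] = true
                out = append(out, s)
            }
        }
        return out
    }

    func main() {
        rows := []string{" a ", "b", "a"}
        // The pipeline is visible at a glance; another module can use
        // trimAll alone without inheriting a dedupe flag it never wanted.
        fmt.Println(dedupe(trimAll(rows))) // [a b]
    }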
Right. There are two kinds of such refactoring / DRY-ing up the code:
1) Specialized helper sub-functions/classes to make the codeblocks DRY.
2) Functions/classes to make separate features DRY.
Problem arises when the design or understanding of the code doesn't reflect the realities of changes over time, forcing you to restart/revert, or making spaghetti code with optionals and whatnot to accommodate the rising complexity.
The agile approach would be to make the code that is easiest to change either way, and prevent being locked in to only one approach.
Go error handling is a good example of this. So far all the attempts to reduce the repetitive `if err != nil { ... }` through some abstraction failed. Look at https://github.com/golang/go/issues/32825
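For readers who don't write Go, this is the pattern in question; it recurs after nearly every fallible call (readConfig is a made-up example):

    import (
        "io"
        "os"
    )

    func readConfig(path string) ([]byte, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err
        }
        defer f.Close()

        data, err := io.ReadAll(f)
        if err != nil {
            return nil, err
        }
        return data, nil
    }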
In the language maybe but both Rust and Zig show that it's possible to have much less 'bloat' for error handling even without using exceptions.
I'd say that Go's designers still have work to do: Zig especially shows that you can be a 'simple' language and yet have both sane error handling and generics.
I thought all of this until I got used to Go's error handling.
There's a couple aspects to this:
1. After a while, the "if err != nil {" becomes a single statement in your mind, and you only notice it if it's different (like trapping things that should error with "if err == nil {"). In other words, it only feels verbose if you're not used to it. After a while, the regular rhythm of "statement, error check, statement, error check" becomes the routine pattern of your code and it looks weird if you don't check for errors (which is as it should be).
2. The point of Go's error handling is that it isn't magic. There's nothing special about error values; they are handled exactly the same way as every other variable in the system. The only thing the language defines about errors is that they have a method called Error that returns a string. That's it. This means that you can create complex error handlers if you need them, entirely within the standard language. This is extremely powerful (there's a small sketch below).
The Go team's examination of the language's error handling is interesting because it seems there's a conflict between newer Gophers who don't like the verbosity of it (but don't realise the power it brings) and the older Gophers who are used to the verbosity and appreciate the power. Almost exactly like TFA. The repetition looks ugly if you don't appreciate the reasons for it.
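To make point 2 concrete, a small sketch (NotFoundError and lookup are invented for illustration): an error is just a value implementing the one-method error interface, inspected with ordinary control flow.

    package main

    import (
        "errors"
        "fmt"
    )

    // NotFoundError is a plain struct that happens to implement error.
    type NotFoundError struct{ Key string }

    func (e *NotFoundError) Error() string {
        return fmt.Sprintf("key %q not found", e.Key)
    }

    func lookup(key string) (string, error) {
        return "", &NotFoundError{Key: key}
    }

    func main() {
        _, err := lookup("answer")
        var nf *NotFoundError
        if errors.As(err, &nf) { // no special machinery, just values
            fmt.Println("missing key:", nf.Key)
        }
    }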
"It isn't magic!" mantra is often heard in Go apologetics, but every time I see it, it occurs to me that Go's definition of "magic" is somewhat akin to a 15th century peasant seeing a lightbulb. Stuff like exceptions or error types isn't magic - they have been around for a long time, they're well understood, and they have significant advantages.
I kinda prefer this to the Rust apologetics, where every post about Go is replied to with three posts about how Rust does it so much better ;)
I've used lots of other languages, as have a lot of other Gophers. I'm not saying "Go's approach is good" out of some strange tribalism or a need to assert my preference. I'm saying this because, having spent over 35 years programming, I really appreciate the simplicity of Go and the lack of magic. I'm not apologising for Go's simplicity, I'm trying to explain why I like it.
Exceptions are pretty much the perfect example of (bad) magic actually; unannotated, dynamically dispatched nonlocal control flow that can execute code (ie, destructors) in completely unrelated contexts on the way past. At least 0x5F3759DF can be boxed into a function and commented in one place.
> Go's definition of "magic" is somewhat akin to a 15th century peasant seeing a lightbulb.
Except exceptions are rarely understood and used correctly by most programmers. They can simplify program structure, but at the expense of proper error handling and error mitigation strategies.
Golang is still in the sort of niche that builds databases, queues, container orchestration, etc., but it can be used for other things given enough care spent simplifying the solutions.
> Except exceptions are rarely understood and used correctly by most programmers
That's pretty condescending.
The mechanism for exceptions has been around for more than 20 years, it is well understood by most programmers.
The problem is that error handling is hard.
Exceptions are an adequately sophisticated solution to that hard problem. Go's approach only encourages ignoring errors (since the compiler never enforces that you handle them) and boilerplate.
Nope! Exceptions are, in fact, rarely understood and used correctly by most programmers. Including loopz. And me. And the authors of approximately every nontrivially-exception-using piece of code I've had to work with. And presumably also of the code loopz has had to work with.
The difference is that some of us have the good sense to rarely use exceptions at all.
Is it really? Are programmers so omniscient that they can trap all kinds of exceptions correctly from external code? It's a sophisticated method that dumps the problem on the user instead.
Golang also outputs stack traces and even supports panic() if one wants something similar to handling exceptions. The difference is that this is used for classes of errors that are ideally programmer error, and not for all kinds of business-logic states. I'm not saying the Go Way is perfect either, but it's at least a small step toward acknowledging the difference, rather than defaulting to dumping random programmer errors on unsuspecting users.
Error handling is easier when you improve the design. The problem is that this takes time, thinking, and effort.
Go didn't discover anything. Java's runtime and checked exceptions are already the direct consequence of errors being of two kinds: recoverable and non-recoverable.
Exceptions are basically error types with dynamic typing in an otherwise statically typed language. So if you can deal with dynamic typing in Python, you can handle exceptions in Java.
The main issue with any discussion of exceptions is the elephant in the room, Java. Java has the worst exception model, mixing weird typechecking rules with error handling that doesn't force you to recover from the exception.
I really like the exception model of Erlang, where recovery is only possible from another routine. It is, in my opinion, the best exception model.
Go code is nice because everything is fully explicit, but it's hard to read: trying to follow the control flow when half of the statements are error-recovery code is just very unpleasant. And I still don't know how to express that an error is the result of another error without forgetting the context.
This is kinda my point: it's hard to read until you get used to it. I can totally see why someone used to Python has a hard time with Go's error handling, because they're not used to the rhythm of it (logical statement, error check, logical statement, error check, logical statement, error check); they're more used to (exception handling setup, logical statement, logical statement, logical statement, exception handling completion). It's different, and therefore strange and weird. But after a while you get used to it, and expect to see an error check after each logical statement, and it sort of merges into one structure in your mind.
There are packages for wrapping errors, and I believe some form of error wrapping (using the fmt package for some reason) is being adopted. More than that is up to the coder to implement.
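The stdlib form is fmt.Errorf's %w verb (Go 1.13+), with errors.Is/errors.As to inspect the chain; it's also one answer to the "error caused by another error" question above:

    package main

    import (
        "errors"
        "fmt"
        "os"
    )

    func loadUser(path string) error {
        _, err := os.ReadFile(path)
        if err != nil {
            // %w chains the cause onto the new error.
            return fmt.Errorf("load user from %s: %w", path, err)
        }
        return nil
    }

    func main() {
        err := loadUser("/no/such/file")
        fmt.Println(err)                            // context first, cause appended
        fmt.Println(errors.Is(err, os.ErrNotExist)) // true: the cause survives wrapping
    }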
> Java has a worst model of exception mixing weird typechecking rules + error handling not forcing to recover the exception.
Unless you have a specific complaint about Java's model (which I'd love to read), I strongly suspect that your beef is with a few standard Java library functions misusing checked exceptions than a statement against exceptions in general.
The combination of runtime and checked exceptions offers the most complete solution to the difficult problem of handling errors, with the compiler guaranteeing that error paths are always handled.
The big problem with Java's model is that exceptions aren't part of the type system: they're that whole separate thing that is applied to methods, but it's not a part of that method's type in any meaningful sense. This means that it's impossible to write a higher-order function along the lines of "map" or "filter" that would properly propagate exceptions, because there's no way to capture the throws-clause of the predicate and surface it on the HOF itself.
Sounds nice, and of course it is possible to build solutions with exceptions that do recover all errors elegantly and cleanly. However, the correct judge on this would be your own users. Given enough care, the discussion becomes rather philosophical.
Though having to confront errors up through the call stack makes one review where handling would be most prudent, in real life the time pressures are just too strong, making such efforts largely unrewarded.
> The point of Go's error handling is that it isn't magic. There's nothing special about error values, and they are handled exactly the same way as every other variable in the system
The error maybe, but not the result of the call. The multiple-value return x, err is not a first-class value. It cannot be handled like any other variable.
This was demonstrated very clearly with proposal for try. try would have automatically returned with err when err != nil. But what if you wanted to change the error, say create an error message? Then try was completely useless. In Rust, where the result actually is just a regular value, you can transform the error however you like just like any other value and try is just as useful as before.
Sorry, I don't understand what you're saying. Are you saying that because there's a proposal in v2 for error values to not be 1st class values, therefore they're not in v1?
I think it’s that in Go multiple return values aren’t a first class value. It’s just two separate values. Whereas in Rust or Haskell they’d be a single, first-class Result<a> (or whatever) value.
> 1. After a while, the "if err != nil {" becomes a single statement in your mind, and you only notice it if it's different (like trapping things that should error with "if err == nil {"). In other words, it only feels verbose if you're not used to it.
The whole point of programming is to abstract away repetitive work. Yes, a human will spot that the pattern is the same, but this is both fallible and a waste of human effort. And even if you can see the difference, those extra characters are still filling up your lines and making it hard to keep functions on a single screen where you can comprehend them more easily.
> 2. The point of Go's error handling is that it isn't magic. There's nothing special about error values, and they are handled exactly the same way as every other variable in the system. The only thing the language defines about errors is that they have a method called Error that returns a string. That's it. This means that you can create complex error handlers if you need it, entirely within the standard language. This is extremely powerful.
There's nothing magic about something like https://fsharpforfunandprofit.com/rop/ either. Just plain old functions and values written in the language (that talk literally gives the definitions of all the types and functions it uses, written in plain old F# code). You need functions as values, but everyone agrees that's a good idea anyway, and you need proper sum types, but you need those anyway if you're ever going to be able to model a domain properly.
> The point of Go's error handling is that it isn't magic.
The problem is first, that sum types are also not magic. There is nothing special about the error type or value in a `Either<T, Err>`. Go's type system is just too crappy to make such things, or make good use of them even after you tried to shove them into an interface{}.
The second problem is that Go's error values, like every error handling system that pretends it doesn't need sum types, have picked up more magic (%w) or impacted the usability of other interfaces (context.Err, separate error channels) bit by bit.
The downside I see with go's error handling is that you can forget to check. With rust, if the function being called returns Result, you have to deal with the error (even if dealing with it just means propagating it out). Missing error handling is such a common source of bugs that go really turns me off here.
Specifically for this issue, linters also have many false positives. Some Go libraries trying to encourage a fluent style will accept an error for some logic and also return it, so you can `return x.HandleError(err)` - but if you don't want to return it, you obviously don't care that it returns what you just passed it. (I personally consider fluent methods a bad idiom in Go, but I also don't get to write all the Go code in the world or even in my project.)
There are also a lot of functions that return errors because Go's type system demands that if the interface returns two values `T, error`, every implementation must also - it won't auto-create a nil value for the second result. That's reasonable if you are committed to errors just being normal values. But such a restriction would not be necessary if the interface could be declared with a sum type - promotion of a `T` to an `Either<T, ...>` or `Just T` or so on would be fine for all types, not just error handling. Lots of infallible Writer implementations like bytes.Buffer and hash.Hasher suffer from this, and linters can't be aware of all such cases.
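For example, straight from the standard library: bytes.Buffer.Write is documented to always return a nil error, and hash.Hash writes never fail, yet both must carry io.Writer's error return, and every caller has to decide whether to check it.

    package main

    import (
        "bytes"
        "fmt"
        "hash/crc32"
    )

    func main() {
        var buf bytes.Buffer
        n, err := buf.Write([]byte("hello")) // err is always nil by contract
        fmt.Println(n, err)                  // 5 <nil>

        h := crc32.NewIEEE()
        n, err = h.Write([]byte("hello")) // hash.Hash: "It never returns an error."
        fmt.Println(n, err)               // 5 <nil>
    }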
Sure, but given a choice, I'd rather work with primitives that are correct-by-construction, not correct-if-I-use-an-extra-tool-and-actually-act-on-its-advice.
If you do use a linter and have it set up so linter issues are fatal to the build, then you run into the issue that if the linter throws false positives, you have to add exclusion rules (if the linter even supports that) or downgrade linter issues back to warnings, and lose the benefit entirely.
One aspect of go error handling that still really bothers me is how easy it is to accidentally drop an error on the floor.
If you strap all the linters you can find onto your build system, you can catch the most obvious cases. But I still frequently find mishandled errors that sneak past the linters in ways that can't be solved without restricting the way devs use the language.
By making error handling something you have to deal with every time you call a function, you massively increase the number of opportunities you have for screwing it up.
I would love something like Rust's ? operator for Go. You could choose not to use it when you need special handling; those cases would be rare and would stand out, so developers would treat them with care.
I don't really think Go error handling is a good example. A nil test paired with a return cannot be extracted to a function. A macro could perhaps centralize the logic, but otherwise it's no more amenable to deduplication than plain addition. Moreover, it's completely trivial, if verbose in the Go language.
The case where it becomes interesting is when there are four or five statements that are repeated. If it's two statements, especially if the statements are both control flow, one of which is tied to the frame of the executing function, and the other is a trivial nil test, that falls firmly on the "not refactorable" side of the line.
> Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about
In which case, you should split the helper function (extract the sub-part common to all cases, and express the differences where the helper function is called).
I think I would most of the time go with de-duplicating as early as possible, as long as the helper function only has few parameters and can be described in plain english rather easily.
To me the cost of having to refactor the helper function later in the process is less than dealing with duplicated code.
Duplicated code causes many issues. You mentioned introducing bugs, but it also makes the code harder to read. Every person who reads the code has to make the de-duplication effort mentally (check that the duplicated parts are indeed duplicated, figure out what they do, and on what parameters they differ...).
Premature optimization. Duplicated code is only evil when there's a bug you only fix in one place and forget about the duplicates; in almost every other case, it's easier to reason about and is more resilient in the face of local changes.
Abstraction is often like compression, and compressed data is easier to corrupt. Change the implementation of an abstraction, and you put all consumers of it at risk. It's not an absolute good.
Consider a block of 10 lines of code repeated 4 times, identical except for a couple of parameters. The person who reads the code has to 1. figure out what the code does, and 2. check whether the duplicated parts differ in some subtle way.
The alternative is to replace the duplicated parts with a function that has a meaningful name. This makes the code easier to read. It's not a premature optimization. It would make sense to keep it duplicated if you're in an explorative phase and not sure yet what the final design will be, but I wouldn't submit a patch where parts are obviously similar. I'm not sure it would pass review.
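For illustration, a minimal sketch of that alternative in Go - all names are hypothetical; the point is that four copy-pasted blocks differing only in two values collapse into one function with a meaningful name:

    package main

    import "fmt"

    type User struct {
        Name  string
        Email string
    }

    // notify stands in for the extracted block; subject and greeting
    // are the two parameters the copies differed in.
    func notify(u User, subject, greeting string) {
        fmt.Printf("To: %s\nSubject: %s\n%s, %s!\n\n",
            u.Email, subject, greeting, u.Name)
    }

    func main() {
        u := User{Name: "Ada", Email: "ada@example.com"}
        notify(u, "Welcome", "Hello")
        notify(u, "Password reset", "Hi")
    }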
It really depends. Test setup I'm inclined to leave alone; duplication is often easier to reason about when a test breaks. For code with control-flow changes in the middle, abstraction is honestly dubious. For mere value differences, maybe - but if the values are complex and not merely scalar, maybe not. The more edge cases an abstraction needs to cover - especially control-flow differences rather than mere value differences - the more duplication it takes to justify it.
I generally find it pretty easy to reason about code structured like:
    switch object.(type) {
    case Type1:
        // (bunch of code)
    case Type2:
        // (bunch of code)
    case Type3:
        // (bunch of code)
    // etc...
    }
Even if the function is long it's pretty easy to skip over the irrelevant parts.
Where you get in trouble is when you discover a bug (or a changed requirement) in something that's duplicated several times and you have to remember to hit all of the copies. The last one especially seems to be missed the most often.
Overall the tradeoff is generally worth it though, because you only need to care about one case at a time.
> Even if the function is long it's pretty easy to skip over the irrelevant parts.
> Overall the tradeoff is generally worth it though, because you only need to care about one case at a time.
Which part is irrelevant? As a programmer, I don't generally know which value `object` has, so if I need to understand the whole statement, I need to look at every case - which often means checking whether the cases are identical or subtly different.
Duplicate code like this is a well-known source of bugs, one of the cases most often highlighted by static analysis tools.
It kind of depends what you're doing, but a lot of the time you are in a situation like "there's a bug when you do X", and when you find code like this you scroll through the options until you get to the one labeled as function X.
Inside of there you may run into stuff that was set up earlier in the function, but it's pretty easy to backtrack to the top of the switch statement to see what was done before then.
There's nothing quite like the joy of being asked to fix something and discovering that it's a big old top down project without excessive abstraction/compartmentalization. Just skim through the code in order till you get to the part that's misbehaving and start investigating. No need to jump around a dozen factory template engines or custom generic template wrappers to find the 3 lines of business logic that have an off-by-one error or something.
I am such a huge fan of Big 'Ol Switch Statements over using polymorphism/types/generics/whatever. So easy to understand, and it's all there in a huge scrolling list of cases. If something needs to change, it's easy to change it, and you know what else is affected.
That works until you have 20 different Big Ol' Switch Statements, each switching on (mostly) the same cases, essentially implementing a set of related behaviors in 20 different places instead of grouping the 20 behaviors under the same umbrella.
Overall, I think there is an equilibrium between the number of cases in the switch and the number of different switches over mostly the same cases. The fewer cases you have, and the more times you handle the same cases, the more it will help to group these different behaviors in separate places: each case would correspond to a different class, and each switch statement to a different method on each class. The fewer and larger the classes, the more I think it helps to apply this transformation.
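A sketch of that transformation in Go, with hypothetical types: two switches over the same cases become one interface with two methods, and each case becomes its own type.

    package main

    import (
        "fmt"
        "math"
    )

    type Shape interface {
        Area() float64      // was: switch #1 over the cases
        Perimeter() float64 // was: switch #2 over the same cases
    }

    type Circle struct{ R float64 }

    func (c Circle) Area() float64      { return math.Pi * c.R * c.R }
    func (c Circle) Perimeter() float64 { return 2 * math.Pi * c.R }

    type Square struct{ S float64 }

    func (s Square) Area() float64      { return s.S * s.S }
    func (s Square) Perimeter() float64 { return 4 * s.S }

    func main() {
        for _, sh := range []Shape{Circle{R: 1}, Square{S: 2}} {
            fmt.Println(sh.Area(), sh.Perimeter())
        }
    }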
By the way, this has nothing to do with the discussion above. The alternative to the switch that GP commenter presented isn't polymorphism, it is simply extracting the common lines into a separate function.
> Duplicated code causes many issues. You mentioned introducing bugs, but it also makes the code harder to read
That is, I believe, contestable. Yes, it can end up easier to read, but there is a big tradeoff: when you remove code from its context, it is much harder for a reader to reason about it. Once something is extracted into a function you can't see, you have to mentally ask questions like "could this return null / None?", "how will it behave if the input is negative?", "does it throw exceptions?", "is it thread safe?", etc. All of this is directly observable when the code is inline, not so when it has been moved to a central place.
Ideally, these questions should be answered by the function name, type, and its documentation. And if not, one can always jump to the function definition and the above elements (missing if the piece of code is inlined) will give additional hints to what the code does.
Upvoted. We'd probably come up with different implementations - I strongly prefer composition and "interpreter" style code - but whenever I've seen casual mutations scattered across different dataflows, it was because of one-off changes made while ignoring the whole.
I think about it in terms of bugs. If your abstraction causes a bug, I have to go in and work out wtf your wonky abstraction is doing and also risk breaking other cases. If there are 6 duplications and there is a bug because one of them is missing a change applied to all of them, that takes 5 mins to fix and risks breaking nothing.
When you make an abstraction, think not only "Will this create bugs?" but also "If this abstraction does create bugs, will they be easy to identify and fix?"
Extracting a common function was the right move in your example.
The bad move was adding options to the common function instead of changing one caller to call a new, different method.
An even worse move was adding more and more options to the originally simple method. Just because something was rightfully extracted to a common place 10 months ago does not mean it has to remain in a common place. And the need for a split does not mean the original extraction was wrong.
As with everything, it often becomes a big grey area on what is acceptable and what is not.
Example: my (fictional) company sells a B2B platform which provides companies with an online marketplace or some other type of online application. Each installation requires different, though often similar, integrations with the customer back-ends - think Postgres vs MySQL, but some customers may also use TotallyEffingObscureDB, which doesn’t even speak SQL. Those backends are usually the source of truth for the entire stack, storing user data, etc, the local data is just a mirror.
So, given this scenario, how should we approach user registration processes? Is that a product component or custom (not DRY) code? What about all the other (similar, common, and basic) features of our platform that every customer wants but invariably needs to be implemented slightly differently?
I’m only posing these questions because I worked in a situation where something similar happened, and it wasn’t dealt with well, and I’m wondering what other HN coders have to say...
Ironically, we're handling this with the concept called "clean code".
We do have a core which implements the base logic. Everything in there is domain-driven, but only using DTOs and providing interfaces for input and output, using repositories and presenters.
When the data source changes, we only need to add the new repositories and set those within the context.
If we have to implement some specialized logic just for one client, we add this as a plug-in on top, so it is handled outside of the core logic.
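For readers who want something concrete, a minimal sketch of the shape being described - every name here is hypothetical, and real repositories would of course talk to Postgres, MySQL, or TotallyEffingObscureDB rather than a slice:

    package main

    import "fmt"

    type User struct{ Email string }

    // UserRepository is the seam: the core depends only on this
    // interface, so a new customer back-end only needs a new
    // implementation, not a change to the core.
    type UserRepository interface {
        Save(u User) error
    }

    // Register is core logic; it knows nothing about any back-end.
    func Register(repo UserRepository, email string) error {
        return repo.Save(User{Email: email})
    }

    // One repository per back-end, added without touching the core.
    type memoryRepo struct{ users []User }

    func (m *memoryRepo) Save(u User) error {
        m.users = append(m.users, u)
        return nil
    }

    func main() {
        repo := &memoryRepo{}
        if err := Register(repo, "a@example.com"); err != nil {
            fmt.Println("registration failed:", err)
        }
        fmt.Println("users:", len(repo.users)) // users: 1
    }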
Of course all of this is very abstract, and since I've worked mostly with simple MVC concepts until now, I'm still struggling to get my head around this approach, but so far it's working very well.
It may look overcomplicated at first, but behind it is a logic that is very strict while at the same time very flexible, which has so far handled all the "creative" ideas our clients have had, while still remaining maintainable.
It is a long process to get to this method of programming, but my initial scepticism has completely changed to "why haven't I worked like this before?".
That’s pretty much the approach I would suggest and think works best. Thanks for your reply.
Looking back, I think the place I worked had a lot of issues with how code was actually stored and maintained - different repositories for each client and each feature, for example, meant a lot of common code was simply copy/pasted when reused, and that obviously made sharing bug fixes much more difficult or even impossible. Real dirty stuff. The solution isn't all that complicated, but when you're on a services team with several hundred clients and hundreds of issues in your backlog, management focuses less on paving the way for the future and more on getting things done immediately. A few of us did implement a single code-base / configurable framework for one big feature, but it was hard to get buy-in and convince people to use it - even though it reduced the workload from days or weeks to hours. The concepts from that were eventually re-packaged and sold by a more creative manager as an "SDK", but I didn't have the privilege of working on that team.
Interesting. I'm curious how long your core system with the base logic took to reach that maturity, and how many people were involved? What kind of development model did you use?
My first job at a bigco had a very heavy read-the-code culture, and being able to read code and write readable code is one of the most valuable skills I learned there. The lack of this skill is one of the things I've found most frustrating about working with less talented or, now that I'm a bit later in my career, more junior engineers. There's a tendency to glom onto black-and-white, superficial rules without understanding them, instead of appreciating the principles underlying them and having a sense of how to balance trade-offs. This creates an unfortunate cycle: everyone writes unreadable code, so nobody practices reading code, so nobody internalizes what readable code looks like, and everyone continues to write bad code.
I tend to have a strong reaction to duplicated code, but DRY is risky if whatever you're pulling out isn't logically concise and coherent[1]. Some of the helper functions I've seen in code reviews (as well as the one in the OP) strike me as written by unsophisticated language AIs, catching textual similarities that aren't semantically linked and pulling them out.
The engineers I've mentored over the years, including ones starting with no eng experience, go on to write fantastic code (and no, this isn't just my opinion). But it's a very labor-intensive, hands-on process of thorough reviews and feedback until they internalize the underlying principles, and it can be tough to swing in very short-term-oriented company environments. Now that I'm running larger teams, I've been noodling over how to encapsulate this deeper understanding of what makes good code in a way that scales better. But the fact that Google et al haven't already done this makes me think it's not something you can teach with eg a bullet point list.
> I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
(Note: this isn't a case of what I describe in the earlier part of my comment, as I don't think this is a superficial, black-and-white rule)
I disagree pretty strongly here, especially since your #2 involves waiting until you hit a bug. IME, there are many cases in which a solid understanding of the code allows pulling repeated code out in a way that's principled enough that it's more likely to adapt well to future changes. Indeed, this is true of most changelists to some degree, or you'd end up with no functions or classes other than main().
[1] A good rule of thumb is whether the free function has a reasonably readable name that captures its logic, without having to abuse handleFoo() or processBar().
It's important for developers to understand that programming is at least as much about expressing yourself clearly with language, as it is about maths and compsci views of functions. Language that's more verbose, but also adds clarity, is a good thing.
If I had an instruction book on building a cabinet, it wouldn't help to re-list every screw, tool, and their sizes on every single step if I could put a parts list at the front. But it also wouldn't help to collapse every matching group of steps with one or two different parameters together.
Can't be specific without knowing the exact helper function, but I find sitting down with a cuppa (away from the computer) and planning an interface (if it's C++) that accepts, say, a policy class with a default and a callable object ("Functor") very helpful.
Sometimes it can't be done, but it's better to have a bug-free building block (macros don't count!) that can accept buggy user-defined tasks than lots of repetition.
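A rough Go analogue of that C++ policy-class-plus-functor idea (all names hypothetical): the building block is fixed and testable, and the possibly buggy user-defined task is injected as a function value.

    package main

    import "fmt"

    // RetryPolicy plays the role of the policy class.
    type RetryPolicy struct {
        Attempts int
    }

    // DefaultPolicy plays the role of the default policy.
    var DefaultPolicy = RetryPolicy{Attempts: 3}

    // Run is the bug-free building block; task is the user-defined,
    // possibly buggy callable.
    func Run(p RetryPolicy, task func() error) error {
        var err error
        for i := 0; i < p.Attempts; i++ {
            if err = task(); err == nil {
                return nil
            }
        }
        return err
    }

    func main() {
        calls := 0
        err := Run(DefaultPolicy, func() error {
            calls++
            if calls < 2 {
                return fmt.Errorf("transient failure")
            }
            return nil
        })
        fmt.Println(calls, err) // 2 <nil>
    }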
The best way I’ve had this explained is that refactoring != DRY. Some methods have a similar structure, but if the problem they’re solving isn’t fundamentally similar then it’s not truly refactoring and you’re setting yourself up for more work in the future.
> don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first.
I have a similar rule of thumb: when you do a thing the first time, just get it working. When you do a similar thing a second time, just get it working. When you do a thing a third time, pull it together as a cohesive abstraction.
This is a silly generalization, but the point is that you generally need to see a handful of examples of a thing before you can make any useful abstraction. What's more common is people creating abstractions during the first iteration, leading to features that are never used, and others that were never considered.
And the abstractions are not abstracting over the correct things, so they are actually a net negative: one has to reason about both the abstraction AND the problem domain.
I find it helpful to work with ADTs of values, lists and maps. No OO, just functions for selection and projection. The majority of programming is figuring out the nuances of the domain and getting something working. Code is actually an impediment to that.
I think that people often make these choices based on latent social and work-related contextual factors as much as there is any misunderstanding or blind adherence to the wrong set of guidelines. There are human factors as well that go into people defensively citing guidelines for their decisions after a criticism comes up in code review. It's somewhat likely that the choice of a premature deduplication wasn't deeply considered. The programmer just did it because they felt like it made sense, and when questioned about it, they claim whatever "rule" exists to justify it.
There's a strong pressure in work environments to get shit done, get it through review with as little fuss as possible, and move on to the next thing. That's work after all. Choosing to leave something in a state that's guaranteed to need attention later can be viewed as lazy. If you at least make an attempt to do something in a clean way, it will be perceived as "work" even if it turns out to be the non-optimal choice.
I find for myself, at least, that I'm far more willing to let things evolve in my personal projects and to discover the right abstractions over time than I am in my work projects. I don't know if that's true for everyone though. My personal projects are always about the learning process. Maybe for a lot of people that's not the case? The focus is more on the product than the process?
I write the first version of every side project in a language I don't know or don't know well, and rewrite the actual version I'm going to use in a language I do know well. So I'm really happy to let things evolve; the more I know the domain in one of my main languages, the more time I spend in the new language, because I want to distance myself from what I think I know, examine it fresh when I come back to it, and see if what I think I know is actually still (or was ever) valid.
Different people have different teams and motivations both at work and at home, so while I agree with you that taking a wait-and-see approach is often a really good idea, there are often human factors involved in these decisions where the same person will make very different choices depending on their environments and motivations.
> I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
... alright buddy, I’m going to need you to remove your spyware on my computer!
I just did this last week with a CBOR message interpreter in C. I did it the long messy way, saw a bunch of repetition, “cleaned” it, changed the way the parser is called, changed some of the validator that runs after it... then did that entire process two more times as requirements changed.
Loved the "incidental duplication" term. I will def start using it from now on.
But to add to this, I want to say that the moment you find yourself writing a "helper" function is the moment you must realize that you didn't understand the problem quite well enough.
As a rule of thumb, and something I always try to pass on to my peers at work: think in terms of your domain and the models in it.
- What models does this helper function act upon, or on whose behalf does it act?
- Does it make more sense to create a model and put that function in it?
- Should that function be in one specific model, or in more than one?
I am working on a large project and I also realized this. I worked really hard to not repeat myself. I tried to reuse as much as possible so as not to create extra code. Then I quickly realized the things were NOT EXACTLY the same: the methods to handle them were similar, but not the same. So I had to refactor a lot of other helper methods.
I agree with the approach you are describing. It would be easier to refactor later, as one would probably have a better understanding of the application in the wild.
I wonder if we could have IDE features that make #2 less likely. E.g. marking some lines of code as similar to lines of code elsewhere, so that if you change one place, the IDE reminds you about the others.
JetBrains has something like this in their tools. It pointed me in the right direction a couple of times when touching code that is similar to something elsewhere.
Relying on your IDE to maintain program semantics is terrible. The IDE should help you write good code that you commit, not be a crutch for writing bad code.
People in the thread are arguing that it's not necessarily bad code though. Also, the point of the IDE would be to help you just not forget to change it.
I am coming to the conclusion that code reuse is oversold as an ideal in universities. Often the desire to create reusable code means that we create overly complicated code with less duplication.
Another approach, with a statically compiled language and a good IDE, might be to reduce duplication whenever you see it and inline it back when you feel the abstraction is no longer useful.
From my experience, I would start with a more compact implementation first in hopes that it would save me time. This usually ends with me having to do more complicated operations in my head to plan the code. It's tempting. It's like trying to do math in your head to save time - often leads to a mistake. Now I believe there's no shame in trying out a solution first and rewriting it later when I get a feel what the problem is really about.
John Carmack once said something like "If you're not sure which version is better, code it both ways and compare".
The only downside of later optimization is that in a commercial environment they often don't let you clean up.
> The only downside of later optimization is that in a commercial environment they often don't let you clean up.
Exactly (speaking for custom industrial automation projects). The follow-up requirements for a project might come in 5 or 7 years, or never. You would be cleaning up an 80% dead project. In these environments, it's not uncommon to no longer have a dev environment, because the system is a singular testbench that costs upwards of 2 million and only exists in 2 factories, so all bugfixes must be extremely conservative and any refactoring is strictly forbidden because there's no way to locally test for regressions.
Reusing a function like this is not clean code. It violates several principles.
1. Functions should have as few parameters as possible and almost never have flag parameters. This is a basic thing and costs very little to follow. As soon as you want to add a flag to a function, you need to make a new function (see the sketch after this list).
2. Minimize coupling.
3. Single responsibility principle. A unit of code should have one reason to change.
Of course in order to follow principles 2 and 3 here you may well need to consider the business logic.
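To make rule 1 concrete, a tiny hypothetical sketch: rather than one render(d Document, asDraft bool), split the flagged behavior into two functions, each with a single reason to change.

    package main

    import "fmt"

    type Document struct{ Title string }

    // Each variant gets its own function; no flag parameter needed.
    func render(d Document) string {
        return d.Title
    }

    func renderDraft(d Document) string {
        return "[DRAFT] " + d.Title
    }

    func main() {
        d := Document{Title: "Q3 Report"}
        fmt.Println(render(d))      // Q3 Report
        fmt.Println(renderDraft(d)) // [DRAFT] Q3 Report
    }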
One big way I prevent this from happening is to treat classes as interfaces to data structures and keep everything that isn't about accessing the data elsewhere. Conversions to other data types go somewhere else. In fact I don't want my data types depending on any other data types at all.
When doing this any of this repetition or evolution can stay out of the data structures themselves so that they can be reused without irrelevant baggage.
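A small Go sketch of that separation (hypothetical names): the data type knows only its own fields, and the conversion lives in a separate function, so the type carries no baggage when reused.

    package main

    import (
        "fmt"
        "strconv"
    )

    // User is a pure data structure: no conversions, no dependencies
    // on other data types.
    type User struct {
        Name string
        Age  int
    }

    // The CSV conversion lives outside the type, so User stays
    // reusable without irrelevant baggage.
    func userToCSVRow(u User) []string {
        return []string{u.Name, strconv.Itoa(u.Age)}
    }

    func main() {
        fmt.Println(userToCSVRow(User{Name: "Ada", Age: 36}))
    }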
Have you not simply abandoned OOP at that point? A core point of OOP is that objects manage their own state, and provide an interface for accessing/mutating it.
If classes are only used as data structures, and everything is done through (presumably pure) utility methods, it sounds like you're writing procedural code in an OOP language.
That's not inherently a bad thing, but OOP provides benefits and you may be making a trade-off without thinking about it.
I try to worry much more about what works rather than labels.
If you stuff all your functionality and data transformations into class definitions, your class definitions will be full of dependencies. Now all the fundamental elements of your program depend on each other and can't be separated.
It's like trying to pile up concrete until it forms a cave instead of laying it down to make the floor and building on top of it.
> Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input. Even if it was an improvement (which I don’t believe anymore), this is a terrible way to go about it. A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
I totally disagree one million percent! If you are on my team and you want to rewrite code I wrote (cause it sux) then do it! Don't ask, just DO IT! DO IT NOW! Have a blast, tear it apart, rewrite all my shitty abstractions and see if you can do it better. If the result is better code, then awesome! I learn something and the software gets better. If the result is worse code, then awesome! I'll tell you why, you'll learn something and your improved understanding will allow you to write better code.
I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code. Having a false sense of ownership towards that code will just cause grief for that engineer and friction for his or her team.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code.
While this is _technically correct_, it isn't how humans work. Humans attach their worth to the things they do, even when they shouldn't. It's a difficult thing for most people to avoid, so if you write some code, commit it, and then later see a teammate completely rewriting it, it can come off as either:
1. You spent a lot of time and hard work on code that you kinda identify with (I mean, it's code you wrote - it's your "art"), and someone coming in and rewriting it without even asking you can make it feel like they think you're an idiot,
or maybe even worse
2. They didn't know the reason you wrote it that way (maybe it was in support of future changes?) and now they just screwed it all up.
Maybe you haven't run into either of those situations. If you haven't, great! I spent the first part of my career hitting the first one because of how hard it can be to disassociate with the work you produce and the second one I see happening occasionally. It's a breakdown in communication within a team.
That’s the behavior of a junior engineer though. It’s a sign of an inferiority complex where any change feels like some kind of judgement on your ability to write it “correctly” the first time.
You are not your code, the code you wrote is not representative of you. Nobody who rewrites your code thinks that either.
You’re on a team working to build something larger than yourself. You’re not a bunch of painters sitting with your own easels in a room hoping to tape all of your paintings together in the end.
The best engineering teams I’ve ever worked on were groups of people that had no issues at all modifying each other’s code because we all trusted each other to do the right thing. The worst were the ones where each module had one “owner” who handled all of the changes to their module and “nobody else’s”.
Same here. And my own code is rewritten frequently as business needs change or better factorings are discovered when we hit the rule of 3, as mentioned in this thread.
It's nothing personal; we all get paid and go home. There are a lot of bruised egos in this thread that would be better dealt with in therapy, I think.
The fix for this is to have a review system in place; you can then catch blatant code repetition before it goes in, and try to use your soft skills to guide the author toward a different solution such that they think it was their idea all along.
If you're the one who created it and who touches that part of the code more often than others, then there is a notion of ownership: you don't want to accept changes that will drive it into a dead end and slow you down. That's how code review emerges. It's the same as the pull-request system in open-source development.
I have a very different view to this. As a disclaimer: I am working in a smaller team where most code is non-trivial, may be it is different in very large teams on large code bases.
This isn't a question of ownership in the sense of private property. It is about how you interact, and about respect and professionalism. First of all, a larger change to code a teammate wrote has a ring of saying: it was badly written. So this is partly a matter of social interaction. If the code is badly written, it needs to be fixed, but one should be reasonably polite about it. Usually, the better way of doing this is having a quick chat with that person where you explain why and how you think the code should be changed, before you just do it - and also to pass on any insight you have about the code that the creator missed.
On the other side, there is the big elephant in the room: unless we are talking about rather simple code, the code perhaps was written the way it was for a reason. Often enough this isn't apparent at first glance. So a refactoring without checking with the author always bears the risk that you introduce a bug, or at least get in the way of planned changes which would be incompatible with the refactoring.
So there are many reasons to quickly check with the author before making significant changes. If the proposed changes are good, the author will be happy for you to make them; if not, then it's better to have talked about them first. And in any case, it is good to have communication about major changes.
> First of all, a larger change to code a teammate wrote has a ring of saying: it was badly written.
Not necessarily. In actively maintained projects, code changes all the time in some areas, and if you are doing proper commits, a piece of code very often does not magically appear in a perfect state but is improved in a series of changes.
If you can improve the code you wrote, others can too.
Code quality is not a binary thing. Moreover, if code quality actually has been improved, you should be happy with it.
Now how it is done is also important, but neither more nor less than any other human interaction.
But that is exactly my point: there are not only a technical but also social considerations. And yes, in the end a code quality improvement - if the refactoring actually is - should make one happy. But still it is the decent thing to check with the author first, if that is possible.
Several years ago, I wrote a big chunk of code to analyze engineering data from an engine test cell, and display a graph of the results. Someone else had done the hard part; I was merely coding up a gloriously-complex Excel spreadsheet in C++. I grabbed data from a MySQL database, and labeled the row data like: row[combustion_air_mass_flow] + row[fuel_mass_flow] * row[specific_gravity_of_diesel]. (Or whatever; it's been awhile. You get the idea.)
I had HUNDREDS of lines of calculations, and each variable was very clearly understood so that you could trace the whole process. The code was running and producing replicated results from the spreadsheet. We started trusting it to process new test runs, instead of copy-pasting into Excel.
The next morning, I came in to find the "other" programmer had stripped EVERY variable reference, and replaced them with the column numbers. There was absolutely NOTHING in the code that could help you understand that "combustion_air_mass_flow" was column, say, 54.
I turned to him and asked him what happened, and he said it was inconsistent with the rest of the code base. I was literally gobsmacked. I racked my brain for a response. In the awkward pauses, he admitted that my code was better, but he couldn't bring himself to use it, because that would mean that he would have to go back and recode every other place that worked like that, and there were many.
He was the guy responsible for most of the system; I was just writing this part because I'm an engineer who codes, and could understand the actual science going on. In the end, I re-replaced my equations with my previous code, and wrote another couple hundred lines defining column number to engineering variable, to "translate" his column numbers to something that made sense.
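A hedged reconstruction of that "translation" layer - column 54 is the one mentioned in the story; the other numbers and names are made up for illustration. Name the raw offsets once, and every formula reads as engineering rather than indexing:

    package main

    import "fmt"

    // Column indices in the test-cell export. Only
    // combustionAirMassFlow = 54 comes from the story above; the
    // other values are hypothetical.
    const (
        combustionAirMassFlow = 54
        fuelMassFlow          = 55
        specificGravityDiesel = 56
    )

    // airFuelTotal reads like the spreadsheet formula it came from,
    // instead of row[54] + row[55]*row[56].
    func airFuelTotal(row []float64) float64 {
        return row[combustionAirMassFlow] + row[fuelMassFlow]*row[specificGravityDiesel]
    }

    func main() {
        row := make([]float64, 60) // stand-in for a MySQL result row
        row[combustionAirMassFlow] = 100
        row[fuelMassFlow] = 5
        row[specificGravityDiesel] = 0.85
        fmt.Println(airFuelTotal(row)) // 104.25
    }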
So, no, I don't believe in "do it; don't ask."
And, if, by some chance, you see this, Chris, I still think that was the weirdest flex I've ever seen.
This reminds me of when a developer took over my codebase while I was on holiday. When I returned, I discovered that he had converted all tab indents to spaces across the entire project. He completely destroyed my ability to perform diffs against earlier commits, because his preference was evidently more important. Of course this was all justified with a link to Google's coding style guide.
The commonality in both your and the parent's situation is that you reacted defensively. Your response was "You destroyed my tabs!" when a proper response would have been to try and understand your coworkers motivation. Perhaps he had some really good arguments for preferring spaces over tabs? Tabs can be problematic in heterogeneous environments where developers with different tab settings and different editors work on the same codebase. Have you read the arguments in Google's coding style guideline?
Maybe your coworker was a fucknut that did it just to mess with your code. I wasn't there so I can't know. My point is that you shouldn't assume that.
Frankly, the justification doesn't matter here: A change of that scope should be discussed and agreed on with teammates before you even put up a patch, not committed while the primary developer is out of town.
If this happened to be Python, there is no such safe whitespace-insensitive comparison. Two pieces of Python code that have different semantics can look identical under a whitespace-excluding diff, so you must not habitually use such a thing as your go-to comparison method.
For instance if we edit:

    if condition:             if condition:
        blah                      blah
        blah            ->    blah
        blah                  blah

then nothing shows under "diff -b" or "diff -w". With a different kind of language there will be a non-white-space difference due to the changing position of a brace or other delimiting token.
Otherwise, exactly the remark I was thinking of making.
I need to disagree with you. I've been in a few positions where one engineer suddenly decided to rewrite parts of the code base without any input from other engineers. It's a huge blow to team morale, and it gave me a fear of writing code in that team.
Every time I wrote a piece of code, I wondered how long it would be there for. I understand that code evolves, but seeing your code rewritten after a week is no fun, and it's a huge blow to your confidence as well.
I understand that code needs to be rewritten, because of requirement changes, or different understanding of the original problem, but talk about it, bring it to the attention of the team as to why you feel we need to rewrite. Make sure that the team understands why, and sees the value in it.
Just rewriting code on your own is a big no for me, and to me, breaks the trust that we had.
> Every time I wrote a piece of code, I wondered how long it would be there for. I understand that code evolves, but seeing your code rewritten after a week is no fun, and it's a huge blow to your confidence as well.
Here's some tough love. If people frequently rewrite code that you write it is because they are stronger developers than you are. You, the junior developer, can either sulk about it and feel miserable or realize what an amazing opportunity you have to improve.
Swallow your pride and approach the person who rewrote your code: "Hey, I noticed that you rewrote the code I wrote last week. Mind telling me what was wrong with it so that I can learn from you?" If the person answers "No" or "I don't have time" that person is an asshole and you are in the wrong place. But if the person answers "Sure! Let's schedule a meeting in a conference room with a whiteboard this afternoon and I'll explain what was wrong with it!" then you are in a great spot!
> Just rewriting code on your own is a big no for me, and to me, breaks the trust that we had.
Not for me. I trust my teammates' judgement. They know when it's a good idea to rewrite my code and when it's not. No point wasting time holding meetings and bikeshedding over minutiae for work that can be done in a few hours. 9 times out of 10 they will make the right call. 1 time out of 10 they won't, and we revert - no damage done.
I think you’re missing the point as to why I might lose confidence. I’m fine with people being better than me, especially when they explain why they did certain things. But we work as a team, that means that before we do work, we agree on one thing. If suddenly an engineer decides a week later he wants to do something else, then that’s not cool. Even if he has a valid reason for it. Instead you talk to the team, and agree to change.
Again, there’s a reason why we work in teams, and not as individuals. I trust my teammates too, but rewriting code without any discussion breaks that trust for me.
You can discuss when the rewrite is done, that way it wastes less of your time: you can talk about non-hypothetical existing thing. If the discussion shows that the rewrite is bad then it's his problem: he has to fix or revert.
Plus even the initial rewrite is more time-consuming to do if he doesn't discuss it with the original author beforehand.
> Here's some tough love. If people frequently rewrite code that you write it is because they are stronger developers than you are. You, the junior developer, can either sulk about it and feel miserable or realize what an amazing opportunity you have to improve.
Why is this condescending tone and unfounded assumption acceptable here?
It's like, Well, lookie here, a stranger with a generic opinion! What an amazing opportunity for me to assume specifics and talk down to him! You're welcome!
I care more about the way it gets replaced. I constantly learn from my code being replaced and from replacing code myself. However, I doubt anyone likes code being pushed onto them, and that's how it feels when someone rewrites code without anyone's knowledge: they've already done the work, so they're expecting it to be merged in.
I know that programmers are humans, but I think paying too much attention to people's feelings is part of what is dragging the IT industry down. If you have self-esteem issues, go see a shrink.
It’s not about personal feelings or self esteem, it’s about trust within the team, and confidence that when you do something the team sticks to it. When a rewrite is required, the team talks about it and then acts.
> I totally disagree one million percent! If you are on my team and you want to rewrite code I wrote (cause it sux) then do it! Don't ask, just DO IT! DO IT NOW! Have a blast, tear it apart, rewrite all my shitty abstractions and see if you can do it better. If the result is better code, then awesome! I learn something and the software gets better. If the result is worse code, then awesome! I'll tell you why, you'll learn something and your improved understanding will allow you to write better code.
I see your point of view.
There is a need to detach from your code while still being passionate about programming.
The key is the mindset: programming so that people enjoy what you're programming and not just for the sake of programming.
Also, to note, the proposed refactor should objectively serve the same cause/requirement better than the current code does.
I've been a lead myself here and what danabramov nailed is that he realized this:
> My code traded the ability to change requirements for reduced duplication, and it was not a good trade.
I think the key to most of this situation is right there. It's also something the original author largely misses.
The key is this: want.
If you want to rewrite some code you find, then kindly leave that keyboard and go play with some toys. This, "wanting", is the actual origin of the author's anecdote. "I saw this and I thought it was bad and I felt the urge to rewrite it". Ok, but don't.
What is missing is knowledge on the circumstances of the code. You see some piece of code and you're only seeing that, the code itself. But why is it that way? Is this code supposed to be modified/extended/reduced/deleted in a near future? Frequently? Or simply, as a first question: Does this code need to be better?
And even if the answer is yes, then in what aspects does it need to be better? What does "better" mean for this piece of code? Did you want to just make it cooler looking because you didn't like it? Or have you actually detected that some of the needs this code was supposed to fulfil aren't being fulfilled? Of course, if you only now saw the code and do not know what those needs are then you're in no position to rewrite it, just because you want.
But then again, if there is indeed motive to rewrite it, by all means, do.
Following this idea will probably lead to having to ask, sure. But you won't be asking for permission, you'll ask because you need information before you can decide.
> What is missing is knowledge on the circumstances of the code. You see some piece of code and you're only seeing that, the code itself. But why is it that way? Is this code supposed to be modified/extended/reduced/deleted in a near future? Frequently? Or simply, as a first question: Does this code need to be better?
That's exactly why you should always be very careful refactoring old code that you don't fully understand. It might look like spaghetti code full of weird artifacts, when in reality it was once clean code that had to be edited dozens of times to handle edge cases. Not that it can't be refactored - just that it's likely not to look much cleaner if it has to provide the same functionality. Refactoring old code just because it looks ugly is often a waste of time and money.
This is fine in some environments, but not all. I worked on a code base once that had over 150 engineers. Teams were broken up mainly along feature and UX lines, but there were a few cases where some features were used in multiple user experiences. I worked on one of those features.
There was an engineer on one of the user experience teams that didn't understand this and decided to rewrite a portion of our code to make it more performant in their UX. In doing so, this engineer introduced bugs into our feature that adversely affected other user experiences within the application, but was not apparent in their own implementation. Had our team known about this, we would have very easily 1. been able to point out the bugs 2. helped the engineer with a better solution.
This wasn't a small effort, the engineer's team received a requirement from their PM, groomed the ticket, architected the plan, assigned the ticket to the engineer to work, the engineer wrote the code, then had at least two reviewers on their team approve the change before merging it. So we're talking multiple points of failure, I don't blame the engineer individually.
The fallout came a week after the code was deployed when the UX team flipped on the feature toggle. They checked their UX, it looked good, and continued on. Meanwhile other UX teams started seeing crashes coming from our feature. (the crash didn't manifest in testing because it was toggled off, which was a failure on the UX team)
This wrecked collaboration and trust across our codebase in multiple ways and led to increased overhead processes. We dealt with the initial fallout from UX teams not trusting our feature, which then evolved into UX teams not trusting each other, UX teams not trusting feature teams, feature teams not trusting anyone, and no one trusting the procedures in place.
Yes, there were multiple failures. But at the end of the day, this whole scenario could have been prevented with a one-line message: "Why did you implement this in this way?" No one was emotionally invested in the code that was changed - it was for all intents and purposes crap code - but the crap code worked, the new code didn't, and a simple courtesy check would have saved a ton of time, money, and trust.
> If you are on my team and you want to rewrite code I wrote (cause it sux) then do it!
Sure, rewriting is fine, but ideally it would come in the form of a PR rather than being checked in directly, so that if you had valid reasons for writing it the original way, it can be reverted before anyone else layers code on top of the bad rewrite.
Redoing work you just did is tantamount to criticism. I agree that everyone should welcome constructive criticism, but some tact is necessary in applying it.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code. Having a false sense of ownership towards that code will just cause grief for that engineer and friction for his or her team.
I think some places value a sense of ownership because that person then becomes responsible for maintaining that code and reviewing changes. There is someone accountable for it. Not that they have any special rights to that code.
Please don't consider refactoring or rewriting equivalent to redoing. Refactoring or rewriting means that I'm adding my name to the code, but I'm NOT removing yours. You still did the hard work of coming up with the original functionality, even if it is later changed. Refactoring emphatically does not mean that your original work had no value.
Like if someone were to change all my spelling and grammatical errors in my top comment. I wouldn't mind at all. Hell, someone could even rephrase the comment entirely and remove the weak points and emphasize the strong ones as long as the main message is the same. I'm not a native English speaker so I'd probably learn from the experience.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong
This. If a developer gets offended when there is an issue with their code or the way it got implemented, either they are not mature enough or there is a cultural issue in the team.
The point is, code "ownership" shouldn't be understood in terms of property - of course the code isn't the property of the programmer but of the company - but in terms of responsibility.
This might be different in very large teams, but usually, once you write a piece of code, you are the prime person responsible for maintaining it. And as long as this is the case, I would expect to be involved in any significant change to the code. Of course, I am fine with changes which also transfer the responsibility :)
What projects is everyone working on that they have the time to keep rewriting each other's code all the time? :)
If such teams did code review, maybe they'd get it right on the first commit and would only have to refactor when the requirements change or there's a clear benefit.
That sense of "ownership" of some small piece of the product? It helps build loyalty to the project, and to the company. Seeing my baby with that brand name on it helps me feel part of the team. I've left many a project that didn't offer me the loyalty of letting me fix my own mistakes (after some guidance).
Toe-stepping aside... how else are you going to develop this apparently inefficient coder without including them in the process of reforming this code? This isn't just about ego, this is about skill. This is about the code I'm gonna write in the future. And here's your opportunity, written in our codebase.
I'm 52, and many would consider my code a mess. I've been a professional coder -> solution architect all my life; I work for myself now, with my own apps. With my own code I clean things up when I can, but sometimes it isn't worth it. I used to write clean code and spend time doing it, but no more.
- Rewriting requires retest, introduces new bugs.
- If it ain't broke, don't fix it.
- Users don't care about clean code. They only care about the end product.
- Developers (less experienced?) obsess about this stuff. It is so expensive. Layers of coding management to deal with it all.
- If your code requires constant maintenance you are doing it wrong.
- Obeying the above rules means you can leave old software behind and generate new software if you want.
I am about to celebrate 10 years of my first software (an app) I wrote for myself. It is still in prod. It is fine, been earning $$ for years. I don't touch it unless I need to support new devices or support API updates. On the flip side, there isn't much code I wrote in my previous professional career that is still live - the reason is vendors only make a profit because they need to generate cashflow constantly, and they do it by breaking the above rules unnecessarily.
Do you work for a vendor? Think about the real reason you are breaking the above rules - the reason is that the extra time is billable. You are not breaking the above rules because it is the best way to write software. You are breaking the above rules because it generates the most profit for the company you work for.
When you work on your own apps you can be as messy as you like. In a team there is some incentive to make the code readable to other developers. At the lightweight end you have coding standards and linting, then design patterns, then finally big refactorings. It's all trade-offs. Big-team products can have islands of code that are like one-person projects, and those can be messy, for example.
Code with duplication isn't less readable. Quite the opposite: fighting your way through layers of abstractions and generalizations can make code harder to understand for someone new to it. Especially for code that will rarely be touched except for bug fixes, keeping it as simple as possible can often be worth it.
And in the end, users will also appreciate this. Any time spent refactoring is time not spent on new features or bug fixes. A user doesn't care if your code is cleaner (unless it was buggy before), they'd rather have new features or improved usability.
It's gotta be case-by-case, but upon shallow reflection I feel like one layer of abstraction is the sweet spot. I've worked on code where I have to "Go to definition" a dozen times to figure out what is going on, and I hate it. And I've seen code that's just a big wall of text hundreds or thousands of lines long, and I hate that too.
> Users don't care about clean code. They only care about the end product.
I'd disagree. Both code and compiled product are products, with different users. Your code's users care about the code. Your product's users care about the product.
The "whatever works" approach works for a solo developer of non-free software products, where there are no other users of the code. Or for one-off scripts that aren't meant to be used as code after they serve their purpose. Whenever there are other people who get to work (in any sense) on your code, things become more complicated.
Which doesn't mean code must be "clean" by some external standard. IMHO, the only thing that matters is that every user of the code can understand its logic and feel good (or, at least, not feel bad) working on it.
If you don't have automated tests, you're not going to have a good time - start writing them as early as you can. I hate working at shops or on jobs where there is zero unit testing. You have to have tests to safely refactor things. If you missed something, you add a test. If you get a false failure, figure out what was wrong with the test.
Retesting should be as simple as running the test scripts. If it's not automated, it's not tested correctly.
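A minimal sketch of that safety net in Go (the function under test is hypothetical); run it with `go test`:

    package mathx

    import "testing"

    // Double is the unit under test (a stand-in for real logic).
    func Double(n int) int { return 2 * n }

    // TestDouble is the safety net: refactor Double freely, then
    // re-run `go test` to confirm the behavior is unchanged.
    func TestDouble(t *testing.T) {
        if got := Double(21); got != 42 {
            t.Fatalf("Double(21) = %d; want 42", got)
        }
    }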
I'd rather have automated integration or system test and no unit test than the other way around. I find unit tests uncover the fewest problems of the three.
> - If your code requires constant maintenance you are doing it wrong.
Disagree. Your code should reflect the business you're in, and changes to the business are what creates opportunity and let you turn your skills into profit.
Clean code is absolutely a means to an end, and if you've got a codebase that's just sitting there fulfilling some static business purpose then yes, it makes sense to leave it ugly. (Similarly with code that you know is being end-of-lifed soon - if there is not going to be a need to make changes to the code in the future then it's fine to load it up with technical debt until it sinks). But that's the exception rather than the rule.
Even for static business purposes, the target environment changes over time. The OS gets upgraded, security bugs are discovered, dependencies become end-of-life'd, language runtimes are replaced, etc.
I find that every 1000 lines of code I write, my productivity goes down a bit more. Without refactoring, my code just gets really coupled and it gets harder to change. Your position is that the time cost of refactoring is less than that of speed up benefits?
If you duplicate and don't try to generalize wherever possible, that coupling won't hit you as hard. Changes to one module are much less likely to cause bad behavior in others, compared to code where as much logic as possible is generalized and shared between functions.
100% agree. The difference comes from whether you are a programmer or a businessman: a programmer strives for perfection, a businessman for functionality. I am trying to transition from the former to the latter, but it is really hard to "get it" and "let go" of some things.
> There isn't much code I wrote in my previous professional career that is still live
That means your impact on the world has been very limited. You haven't contributed to the basis of what other coders used. Not that this is illegitimate - but I believe we should strive further.
> the reason is vendors only make a profit because they need to generate cashflow constantly, and they do it by breaking the above rules unnecessarily.
1. "vendors" don't control all code. In fact, the most important bodies of code we all use were released for general use rather than sold commercially: Free Software libraries, drivers, kernels and applications.
2. Not all commercial companies want you to write this kind of throw-away code.
> You haven't contributed to the basis of what other coders used.
What fraction of us can claim they have, really? Most developers work at the application layer, the last one. The more foundational your stuff is, the less of it there is, because it is reused everywhere.
For similar reasons, very few people work on massively popular software. Most work on software that has only a couple of users, or even just one (typically one corporate user). Very few people are famous, because fame is fundamentally scarce.
Implying that every programmer worth their salt should have produced code other programmers use is just not realistic.
> What fraction of us can claim they have, really?
I think it's larger than you imagine. Just look at the code of GitHub, BitBucket, Sourceforge etc. Not to mention self-hosted commercially-developed FOSS.
> Most developers work at the application layer, the last one.
Well, that doesn't mean they have to work _only_ on that. Each developer uses a bunch of libraries, utilities and frameworks which are either FOSS or could use a FOSS alternative.
> For similar reasons, very few people work on massively popular software.
Well, yes, this is true, but you can work on somewhat-popular or even niche software which is still used by hundreds of thousands, or just thousands, of people. That's still very significant!
> Implying that every programmer worth their salt should have produced code other programmers use is just not realistic.
Well, drop the "have". I think programmers worth their salt should strive to produce code that other programmers use.
Moreover - I believe that it's the visible code, and the free code, that we should use as the model and the target of advice and improvement.
> Just look at the code of GitHub, BitBucket, Sourceforge etc
I think those are misleading on two accounts: first, it's the tip of the iceberg. Most co-workers I've spoken to don't contribute to any such open source projects. They tend to have other priorities, starting with the proprietary or custom software they are paid to write. Yes, there are many programmers writing open source code out there. I'm willing to guess however that much more time is spent on proprietary or custom software.
Second, there's a lot of unused garbage out there. Yes, there are a massive amount of code that could be used by others, but the only code you see is the tiny fraction that is actually used by many people. Such is the way of search engines, they show you a highly skewed sample, biased towards fame. (And rightly so: famous stuff everybody use tends to be what you are interested in to begin with.)
> Each developer uses a bunch of libraries, utilities and frameworks which are either FOSS or could use a FOSS alternative.
That's the thing: everybody uses a bunch of libraries and utilities all the time, but I believe far fewer people write those utilities. Or at the very least, much less time is spent writing them.
The more popular the library or utility, the more pronounced this is. Anybody can write something 10 other programmers end up using. Very few will be (un)lucky enough to write something that has enough users to be known across the planet.
Take me, for instance. I've looked up my GitHub ranking¹, which is mostly concentrated in a single project². The numbers suggest I'm in the top percentile. Not the most popular code out there, but I'm doing pretty damn fine.
Yet: have you even heard of those? Possibly, but I'd say probably not. The top percentile simply isn't enough.
> […] you can work on somewhat-popular or even niche software which is still used by hundreds of thousands, or just thousands, of people. That's still very significant!
Agreed. This is all a matter of degree, after all.
> I think programmers worth their salt should strive to produce code that other programmers use.
I'm not sure I agree with that goal. And I say that as someone who dreams of having a global impact. Sure, any professional worth their salt should strive to improve to at least some decent standard of excellence. (Some professions, like classical music, tend to require excellence merely to enter the field, which we could argue is maybe going a bit too far.)
Having other programmers use your stuff is a great way to get feedback and improve up to that standard, but I don't think it's the only way. It may currently be one of the easiest, though.
> I believe that it's the visible code, and the free code, is what we should use as the model and the target of advice and improvement.
If we're going down that path, I'd go as far as to say pretty much all code should be free to begin with. Following that thought to its conclusion though, I quickly concluded that universal free software is mostly incompatible with capitalism. I can write software in my free time, but I need a salary in the first place to even have such free time. Currently, that salary comes from a day job dedicated to proprietary software.
I mean, recall the OpenSSL debacle. That piece of software is used literally everywhere, and we had to have Heartbleed for insanely rich corporations around the world to even notice that trickling a few drops of money down to this project might be a good idea.
> Most co-workers I've spoken to don't contribute to any such open source projects.
You asked "what fraction can claim they have"; and you're saying "most" of your co-workers haven't. Fair enough, it's the same for me actually. But "Most" is just over 50%. A sizable fraction have.
> Second, there's a lot of unused garbage out there.
Yes, that's true (and also a lot of unused gems). But even if we only take code that's seen use by others, my previous argument stands. Of course, "used" can mean 10 people or a Million people.
> Take me... GitHub raking... I'm in the top percentile
Wow, I didn't even know about these rankings! Thanks!
Anyway, I looked at my rankings, but more than that - I looked at the overall number of contributors. There are supposedly [1] about 24 million software developers in the world today. Now, there are 300K C++, 300K C, 760K Java, 600K Python and 1M JavaScript contributors on GitHub, so it's pretty safe to assume that between 5% and 10% of developers contribute on GitHub alone. Actually, yeah: over 2 million active users in 2017 [2], which out of 24 million developers is roughly 8%. And again, there are other venues for FOSS contribution, like I mentioned. So even after dropping the unused stuff, we're still close to 10% of developers.
> I quickly concluded that universal free software is mostly incompatible with capitalism.
If you put it that way, I agree that that's the case. But - I didn't mean to say programmers should be inspired just by the fact that software can be free; I also mean that they should strive to write their software as though it were about to be released as FOSS.
> That means your impact on the world has been very limited. You haven't contributed to the basis of what other coders used. Not that this is illegitimate - but I believe we should strive further.
I don't think many coders can claim that much of their code from decades ago is still alive. And rightly so! The areas where code is used for that long (e.g. core banking systems) are not necessarily the most pleasant to work with.
> A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
I'm against the idea that people should be attached to "their" code (that is: the code they wrote). Now, I also understand that humans are humans, but the priority should be to make them evolve toward more detachment from their work, and toward accepting that what they design "can" (actually will) be imperfect, rather than avoiding upsetting them even when it would not be justified for them to be upset. Plus, one essential purpose of source control is to be able to make changes, and revert them if they were "wrong"; or for any other purpose. Maybe propose a patch instead of committing directly, but that's really a cultural matter about how projects are organized.
I don't want to ask for permission for an improvement. If there is a need for formal authorization from maintainers responsible for parts of the code, then set up just that. Otherwise, I'm certainly going to improve "your" code, in some rare cases without even telling you (doing it behind your back is not a goal in itself, obviously, and I also value collaboration -- but that should not be a problem). The question that remains is: is it really an improvement? If you are not sure, then maybe don't commit. If you are, do it (following your local rules), and if it ends up being a mistake, yes, it will be reverted, and so what?
You should not take issue with your work being reverted (for good reasons), just like other people should not take issue with "their" code being modified. Better to ask for forgiveness than permission.
> For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
> You should not take issue with your work being reverted (for good reasons), just like other people should not take issue with "their" code being modified. Better to ask for forgiveness than permission.
Changing one developer's _working code_ after they've invested a significant amount of time into it, without discussing it with the team first, is to basically heap on a number of unwritten requirements and also to decide that the schedule allows for addressing them. If removing repetitive code is a requirement, then the team needs to be informed that code will be reviewed for repetition, and introduced to various techniques for spotting and removing repetitive code. The schedule also needs to be adjusted to allow for this additional work.
And if the rewritten code is so superior, then perhaps there needs to be a discussion about whether there should even be a team, or whether the other people should be let go and all work assigned to the developer who can do the work “properly” the first time. After all, what’s the point in having one developer’s work constantly rewritten by another?
I had an experience this year where I was working with another developer. This guy would legit walk behind me rewriting everything I did. To put this into perspective, I've been doing this work for over 20 years this guy came out of a bootcamp last year.
But it wasn't even that this guy was rewriting things, it's that he was constantly breaking things as a result. Imagine every fucking day you come in and something's broken that was working last week and you're like "wut?!? that was working, what happened?!".
This was a contract with a guy I'd been doing work with for years. The other developer was from another country and was one of his first employees. The last straw for me was when he rewrote my small testing framework. I spent about a day trying to stand up a unit testing library in the framework they were using and finally decided it would be quicker to just write a simple testing mechanism + runner, which took me a few hours to do. I come in the next week, and this guy had taken that idea and just rewritten the entire thing.
I called the owner up and just straight up told him he's paid me to do nothing. Literally every single line of code I've written has been rewritten by this guy, only in a shitty, buggy manner. That was 2 months in, and at that point I had tried talking to the guy about it and I was done.
Fast forward to today and that system STILL isn't working. It would've taken me 4 weeks at MOST to build and stabilize. We're coming up close to a year now. I'm still good friends with the owner so I ran cloc over the codebase. It went from 3k LoC to 21k LoC. The owner finally admitted to me that he's going to be paying for employing that developer for a very long time.
So while I understand the sentiment about not 'owning' code and would have agreed with it before this experience, the trust issue with respect to rewriting is waaaaay bigger than that.
I told the owner that this developer should never have been let near a code base with any ability to make decisions on their own. They needed years of mentorship. I literally disagree with every single decision they make. It's mystifying. It's almost like the guy always asks me my opinion and then does the complete opposite, only that's not the case.
You sound really angry about it and you've hung onto it for quite some time. A good friend of mine used to tell me 'who cares?' when I would rant on about things just like you just did. It is honestly startling to hear that in response to a rant. If you answer 'i care!', then that is exactly the problem.
Let it go, it isn't worth it. At the end of the day, I'd say this is your problem to work on. You didn't step back, slow down, forget about the code, and mentor the other developer well enough when you were there. That is kind of exactly what the OP is saying in his post.
(Side note: I've been you many many times and it only hurt me in my career. I actively work against it now and things have been much better.)
I've only seen that once in my career. One developer rewriting existing code constantly, to such an extreme degree it's unfathomable.
He rewrote large amounts of scripts (often overnight). Other developers would wake up the next day or continue the project the next week, only to find a swath of issues reported by users and the project gone (rewritten and moved). He did that for a while, destroying the work of many people across a number of teams, making enemies and leaving a trail of projects ablaze. We didn't manage to get him fired (organizations rarely fire people), so over time other developers simply stopped working (what's the point when it will be undone the next day) or left.
At first, I thought this was the normal junior mistake. Reading code is quite hard, so one reflex of junior developers is to rewrite existing code to make it "prettier" or "better"; of course it's just a path to understanding the code, because they only understand what they rewrote after the fact. One aspect of learning real-world software development is suppressing the urge to constantly rewrite like that.
But that was not it for this developer; he went on and on, never learning. Trying to mentor him led nowhere; his stubbornness and deeply ingrained vision were stronger than the will of any lead or manager. Worst of all, he fundamentally disagreed on almost all objectives, required features and design decisions. He simply had a different vision and ideals deep down, and there was nothing you could do to reason with him. He was ultimately unmentorable and unmanageable.
I specifically state it was a contract and you're talking about a job.
This reminds me of Steve Yegge. He did a post on the difference between employed developers and entrepreneurs, and how they're completely different creatures.
You've allowed yourself to become a pet and you've given up the ability to care about your work for food and you think it's a moral failing in others who have not.
Your responses to this thread still make me feel like you're the problem. You're just trying to point the finger at someone else, instead of looking inwards.
My point is not about caring about the quality of the work at all. You totally missed it. It is about being upset about things and not dealing with that in the right manner.
If this developer is breaking the company, then it is your job (while you were there) to mentor the developer. It isn't about the code at all, it is about the culture of the company to tolerate the behavior that the developer is exhibiting.
What you describe is such a general, common problem in this industry that, as a more senior person, you should have been able to help define better practices around it.
Once you have the policy in place, it is much easier to go to the CEO and say... "hey, we've put these policies in place, everyone agreed to them, i've tried to mentor the developer and nothing is working. i don't think this person is beneficial for the business."
Until you do that, it is a he-said/she-said situation, and the likely non-technical CEO has no understanding of what good/bad code practices are.
I'm also a contractor, been one for ages now. I only take a 'real job' when I'm a founder. I've got perspective on both sides though.
And on the 8th day latchkey created programming. And he saw that it was good. And then the sun started shining, the rainbows came out, and everything was perfect.
In reality I walked away from the work, told the owner in no uncertain terms that he needs to get rid of the other developer and I refused to be responsible for the quality of the work. Fast forward to today, I'll be putting a new system in place later this week to replace the old because the owner finally called me back desperate after 6+ months of this system not working when the initial estimate was for 4 weeks.
See, the great thing about it is that I do good work. I insist on it. I've been doing good work for this guy on and off for 6+ years. I still do good work. This guy knows this. And I guarantee you, the next time I call him up and tell him one of the developers doing work for him isn't worth the time, he'll listen. Because in all the years I've done good work for him, I've only said this to him once, and time showed me to be 100% correct. Because, as it turns out, despite the fact that I don't shit sparkles, I do try to work with people until it's obvious they're not worth the effort.
From a certain perspective, by quitting this conversation you too „don’t care“ about what latchkey thinks about this topic, though „someone somewhere“ (most likely their employer) definitely cares whether they would tolerate the behavior your coworker displayed. You are doing what they propose, just at another perimeter of tolerance.
I don’t think that you „fundamentally disagree“, but that you have different understandings of when the time has come to quit caring. And given the limited amount of things we can care about during our lifetime, I find it certainly helpful to reevaluate from time to time what I decide to care about.
I agree that your coworker’s behavior is harmful to a team’s productivity, and I would seek a conversation to resolve the disagreement in a professional manner. But if there is no path to a resolution, I would escalate the problem, stop caring and get on with my life, because there is nothing more I could’ve done.
> Changing one developer's _working code_ after they’ve invested a significant amount of time
How much time they invested is irrelevant. There are lots of times where someone has been wrestling with something for so long that they just want to get it done and don't want to look at it anymore. Many times it's trivial for someone fresh to tidy it up.
> If removing repetitive code is a requirement then the team needs to be informed that code will be reviewed for repetition
This focus on "requirements" is only really trotted out when someone doesn't like someone changing their code and is looking for a defense. In reality, coding is super subjective, there are dozens of aesthetic judgement calls everyone has to make. Saying something "wasn't a requirement" basically means "fuck off I don't like you touching my code".
There's no way to codify concrete requirements for handling every possible way code can be improved. If devs are throwing requirements at each other, that's a culture problem, not a spec problem
That is a central point to gp's argument though. The amount of time does matter, because it affects the schedule and hence triggers management decisions.
> ... aesthetic judgement calls everyone has to make. Saying something "wasn't a requirement" basically means "fuck off I don't like you touching my code".
If you believe that then you really shouldn't be tidying up other people's code. Why are you making people unhappy over something that you think is an aesthetic judgement call? Let them be happy.
> There's no way to codify concrete requirements for handling every possible way code can be improved.
If the change can’t be codified, then it calls into question why the change needs to happen at all.
Assuming that there is some improvement that can be justified, then explain it to the team, and have the team start reviewing for it.
By not going through the process of explaining the improvement to the team, it signals that the team can’t be trusted to learn and improve code on their own. Again, if that’s the case, a conversation needs to happen about whether there should be a team at all.
If you don’t trust someone to change your code to make small improvements, it sounds like you have a trust problem, not the other way around.
Working on a software project as a large team is like growing a garden, not building a house. Just because someone planted a hedge doesn’t mean it can’t be improved by pruning it.
Trust would allow that the code is modified in the first place, but then a code review needs to happen anyway to ensure that at least one more person is aware of the change and as a sanity check that the change makes sense.
> If the change can’t be codified, then it calls into question why the change needs to happen at all.
No, there are lots of things that are difficult to codify but are very valuable. If you look at law and policy you'll find plenty of examples. This is akin to saying "If I can't formalize it mathematically, it doesn't exist" which is obviously false
Additionally, there's the category of things which are maybe in principle codifiable as requirements, but are tedious to specify. I would definitely put aesthetic judgements about code in this bucket. It's much simpler if you say something like "and the developers will use their best judgement on how to write clean maintainable code" and be done with it
> If removing repetitive code is a requirement then the team needs to be informed that code will be reviewed for repetition, and introduced to various techniques for spotting and removing repetitive code.
Things like this are never requirements, at least in the conventional sense of 'things the stakeholders ask for the software to do'. Nor should they be: it's our job as professionals to know how to do our work. I don't want a PM or sales rep worrying about the Liskov substitution principle (unless I'm working for code climate or something): it's a waste of their time.
Code cleanliness should make future changes easier. Of course, some 'cleaning' expeditions do the opposite, so there should still be code review etc.
In an environment where all code must be reviewed, when you make a change like this you send it to the original owner for review. Or at least someone on the same team, if they're not available.
"Don't be too attached to your code" means trying to review improvements to the code you just wrote objectively. Can you accept changes with good grace?
But you can't get to the point where you have a smooth-working team if you have philosophical differences about what improvement looks like! If one person's improvement is another person's regression, then those differences need to be worked out and some kind of synthesis agreed to. This can take a lot of time, but it's necessary work to get to that ideal.
On the other hand, a lot of work gets done by teams that are not smooth-running, philosophically aligned teams. It may even be a good thing to have a diversity of opinion and avoid group-think? You have to be able to accept some messiness though.
I think there’s a balance. There’s a difference between gradually improving something, or improving it at some point later — and literally rewriting 100% of code someone has just landed over the night.
I agree with you. To me, the action felt wrong when he described it. I wasn't sure why at first, but I think perhaps because the work was not in service to any actual task.
If he was in charge of writing the next piece of functionality for that code, it would be poor teamwork to completely refactor it without discussing the reasons for the design, but at least it would have been within his area of responsibility.
This is also the reason why I like having code review. The review would have been a good place to raise his concerns. They could have had their eventual conversation earlier, before feelings were hurt or wasted efforts were made. It would even help in the opposite case, where the author's code was legitimately poor and the reviewer had good suggestions.
It's not a panacea, but I find code review helpful. In my experience, review was the time when a lot of knowledge-sharing occurred. Some might feel that only senior developers should review, but as a junior I learned a lot about why the author made the choices they did. I learned how people expected the code to change, about language features or pitfalls, etc., and it helped me grow as a developer.
This actually happened to me, (and his rewrite didn’t even work!). Having witnessed this first-hand, I can say that the impact of the code change was absolutely dwarfed by the lost trust. Unsurprisingly, I would come to find out that this engineer had what I would call the opposite of soft skills, and the notion of a “this irks me so I rewrote your code” has become a giant red flag for me.
Did you lose trust because the code broke (a legit reason to lose trust), or because your code was changed without kissing your ring (a problem with you being too attached to your code emotionally)?
I mean, it's nice to talk about ideals, but it's also highly unrealistic that people are able to separate the code they write from what it says about them. Especially if you are part of a newly formed team, you must make sure that there is trust, respect, and clear communication, because this is the stage where most misconceptions occur. Upon building a successful foundation, you can test new team dynamics, such as changes in communication that trade off that initial safety for more efficiency.
If someone spends a week or two writing a patch and you come in and rewrite it in an evening, that, in and of itself, is telling me something: You think your teammate is a worse coder than you, given you were able to solve it with "cleaner" code. You assumed that your solution was better, without talking to the person who authored it to see if they did things that way for a reason.
This could have been solved with a "why did you do things this way", and maybe you would have learned a bit about the thought process behind it, or maybe you would have gotten "yeah this could probably be better, go ahead and clean it up".
In a lot of cases, I definitely get the latter, but I always ask first, because if they had a reason to do things a certain way, they probably don't want someone stomping on a feature they're actively working on.
> If someone spends a week or two writing a patch and you come in and rewrite it in an evening, that, in and of itself, is telling me something: You think your teammate is a worse coder than you, given you were able to solve it with "cleaner" code. You assumed that your solution was better, without talking to the person who authored it to see if they did things that way for a reason.
The time they spent developing the solution is a sunk cost. The only thing that should matter is whether the afternoon rewriting it is the most productive use of your time.
Also, rewriting something is far easier than writing it from scratch. Just because you can rewrite it quickly doesn't mean the original developer didn't save you a bunch of time by letting you see a workable first pass.
The same gut reaction that causes you to go "ugh, crap code, I'm going to rewrite" is probably similar for the other developer, but with your rewritten version.
People tend to have an aversion to code they didn't write in general, and actually reading and understanding existing code tends to be a rare skill.
> Also, rewriting something is far easier than writing it from scratch.
Given the amount of times I see companies rewrite their product with half the features missing and more bugs than before, I'm not sure I agree with that.
Yup, the upfront effort to come to an implementation is like 90% of the work. Refactoring existing code is a lot easier than juggling product, design and technical requirements into a functioning solution. All the moving parts are already there to look at.
> If someone spends a week or two writing a patch and you come in and rewrite it in an evening, that, in and of itself, is telling me...
Or, what it might be telling you is the second person wouldn't have been able to clean it up in an evening without the first person having already spent a week or two on it. The second person was building on the first. Just because it was (hypothetically) less lines of code doesn't mean it could have been written without starting with the more lines of code and refactoring; it often takes more time/steps to get to fewer lines of code.
But don't get me wrong, I'm in favor of colleagues discussing code, not just changing each other's code without discussing it.
This blows my mind. If I had to discuss with my teammates every time I changed their code, I would be way less productive. Everyone’s a little bit attached to the code they write, but iterative improvements and bumping up the quality of the codebase are worth more than the risk of bruising a fragile ego. It’s the team's code; we all benefit from improving it.
You can certainly go too far in either direction (talking or not talking), but to the extent talking matters when it does, it's not about "ego", it's about 1) "What did they know that I don't" and 2) Building shared understanding of the context and decisions and architecture.
As the OP says "A healthy engineering team is constantly building trust." Sometimes you already have enough trust and shared understanding to not need a discussion, other times a discussion is good.
I've humbled others and have been humbled before, because sometimes what looks like an ugly, unclean solution is the correct solution. There might be weird edge cases in the system that you catch, but that the person doing the rewriting doesn't see until they push the code out and it breaks something in an entirely different part of the codebase.
That's the biggest reason why you should always, always discuss those changes with your coworkers first instead of going cowboy on it.
Or have tests. If your push comes with tests and I reduce the code by half while still passing all of them, it means that 1) you didn't actually test everything you've done, 2) you didn't write the code as well as you should have, or 3) I'm an idiot and made the code less robust.
1) happens all the time. 2) happens some of the time. 3) happens as often as 2.
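To make that concrete, here's a minimal sketch (TypeScript, assuming a Jest-style test runner with global test/expect; all names are made up for illustration):

// Original, repetitive version:
function clampWidth(w: number): number {
  if (w < 0) return 0;
  if (w > 100) return 100;
  return w;
}

// A later rewrite at half the length must pass the exact same tests:
const clampWidthRewritten = (w: number): number => Math.min(100, Math.max(0, w));

// The tests are the contract:
test("clamps below zero", () => expect(clampWidthRewritten(-5)).toBe(0));
test("clamps above max", () => expect(clampWidthRewritten(150)).toBe(100));
test("passes in-range values through", () => expect(clampWidthRewritten(42)).toBe(42));

If the rewrite passes, either the tests were incomplete (case 1) or the rewrite is genuinely equivalent; if it fails, that's case 3 caught cheaply.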
This seems more like a post-hoc justification than any real rationale for change. There's nothing stopping your new code from introducing new testing requirements that weren't needed for the original code, and for 1) it could equally be a case of you blaming another developer when it's actually 3) that occurred.
This also assumes that everything is actually testable, sometimes you write unclean code because the underlying framework or module you're working with has an entirely separate issue that you have to work around. Which again, might not show up because your work around code sidesteps the underlying issues.
This all boils down to the fact that you should just talk to the developer and see what they had in mind rather than assuming that you're right and they're wrong by rewriting it yourself.
Well, perhaps you wrote the code quicker because your colleague already wrote all the tests, and it was easy for you to verify the behaviour of your complicated-looking but short code was correct.
I disagree that people should NOT be attached to their code.
Having a sense of ownership for what you write can lead to higher quality systems where people are willing to stand up and fight for what they believe is higher quality.
BUT this importantly depends on the ability to compromise, admit being wrong, and change based on new information from the people who have a sense of ownership over their parts of the codebase. Like you said.
People that get really attached to their code tend to be the people not smart enough for you to want them fighting for their opinions.
Good engineers promote good ideas and general approaches to code base structure. They don’t give a shit if people go in and make changes as long as they don’t compromise the whole architecture.
The point isn't to stroke your teammates' egos. It's to find out why the code is written that way, and to build consensus around a different way of writing it. Consensus building is a great way to improve the cohesiveness of the team. It lets teammates operate with a set of implicit, shared assumptions, which means they can spend more time on business problems and less time on these kinds of discussions.
Generally, lone wolves worsen the team dynamics, because they reduce team cohesion.
> the priority should be to make them evolve toward more detachment from their work and acceptance that what they design "can" (actually will) be imperfect
In an ideal world, sure, but I've worked with enough people varying from professional-but-proud[1] to emotional-maturity-of-a-child to understand that this _shouldn't_ be prioritized, on grounds of feasibility. You may as well say "we don't need reasonable workdays, we need to teach employees to jettison their biological need for sleep".
I don't think it's _impossible_ to get people to be fully detached from their code, but I think that it's only possible in fairly narrow situations (perhaps in a small, tight-knit, talented team culture). Trying to get it to work in a general context is usually much more effort (and lower chance of success) than meeting it halfway with tweaks to communication style and process.
[1] To be fair, I'm quite sure I've slipped into this mode on occasion myself
> I don't want to ask for permission for an improvement.
If the team works in branches and uses PRs this is a non-issue, because stakeholders will have a chance to react.
It would be an issue if I went to work one morning to find a system I wrote rewritten without my having a chance to look at the change proposal. If your team allows/encourages this kind of behavior then I don't want to work there.
Honestly, no attachment to my code demotivates me.
It makes me feel less responsible for the outcome, and helpless. It makes it much harder to prove I know what I am doing. Since you modify things however you please despite my not agreeing, I am not the one at fault for maintenance issues. It also means that keeping conventions requires constant negotiation, and frankly, if I enjoyed people trying to dominate me, or dominating them, I would become a manager.
And yes, it should not be about that, but it is about that with a lot of guys.
I also think it's fine to change the code someone wrote. Just because someone wrote it doesn't mean it's the right way to do it. I often find myself rewriting code; it's the natural process of code evolution. It just feels that it should be more readable, efficient etc.
Although, if the change is essential or requires more pairs of eyes, I'll just make a PR (MR) and let people review it.
I think a better process is to discuss with the other dev, talking through how you think it could be improved. They (or sometimes you!) can learn something that will lead to improved code in the future.
It's a bit like that analogy about giving a man a fish vs a fishing rod.
> I'm against the idea that people should be attached to "their" code (that is: the code they wrote).
Can people _reaaally_ detach themselves from the code they write? After all, programming is a somewhat creative profession. I can't imagine painters and writers detaching themselves from their work. I wonder how doctors and surgeons cope with this. Their work directly affects human lives and I can imagine a mistake weighing down heavily on them. However, I'd like to imagine that my doctor doesn't "detach" himself/herself from their work.
> I don't want to ask for permission for an improvement
If I check in working code that passed a code review, and someone else just decides to overwrite it with their version of aesthetically pleasing code (i.e not a bugfix) without so much as shooting an email in my direction, I would be outright offended. Any changes other than a critical bugfix/performance improvement can wait until you get a chance to speak with the original dev - this is common courtesy, not bureaucratic red tape.
Get over it. It is not your code (assuming you are working for a company). If some other developer wants to waste time making changes that don’t do anything, that’s between them and their manager.
Would you be “outright offended” if your code got changed after you left?
> I'm against the idea that people should be attached to "their" code
I agree with this, but it goes both ways. The codebase belongs to the team, not any individual developer.
Making changes for the sole purpose of making the team's code correspond to your personal preferences is the opposite of professional detachment.
The quality of a team is the quality of the communication.
If the architecture and design isn't well communicated, you get people going off in the wrong direction, or polluting a clean design. Also, people are most attached to their code based on their time investment. If you trash someone's code/changes, you trash their time, which could have been more efficiently decided with a conversation ahead of time. If you find yourself surprised by design changes, find design flaws in a code review, or have different definitions of "improvement", the communication in your company may be lacking.
This is about ownership. I agree there's no need to be personally attached to your own code. I have no problem with someone taking ownership of my work (understand: refactoring/abstracting the crap out of it for the sake of Engineering). But the day that person leaves the team, you have no choice but to deal with that person's mental model, as nice or convoluted as it was while refactoring. I've had this experience in every team I've worked with...
Of course, we all have had that experience! So you have a choice... realize it is going to happen and learn to deal with it in a productive manner, or just continue to get upset about it. I know which choice I'm working towards...
Are you suggesting that a dev acting like this without consulting the team should be considered normal/expected? I'm not sure how to address a move which is unproductive in its nature "in a productive manner", other than being proactive and preventing those rewrites from happening unless they're needed and discussed with the team. But in my personal experience, the team's decisions were always ignored by the coder in question.
edit: I should add for context that I have no problem with the rewrite itself, but I do have one with the team being handed over-abstracted code; I've found this happens mostly with more junior profiles, the same ones that don't stay around for long.
Since this (rewriting code) is a fairly common dev experience, I suggest that the team meet together and set a policy as a group. Similar to code format standards: why not talk about code maintenance standards?
I get you now, and this is definitely an approach I agree with. I guess I was ranting about my previous experiences... since the last departure I encourage the rest of the team to plan and discuss such changes.
> I don't want to ask for permission for an improvement. If there is a need for formal authorization from maintainers responsible for parts of the code, then set up just that. Otherwise, I'm certainly going to improve "your" code, in some rare cases without even telling you (doing it behind your back is not a goal in itself, obviously, and I also value collaboration -- but that should not be a problem). The question that remains is: is it really an improvement? If you are not sure, then maybe don't commit. If you are, do it (following your local rules), and if it ends up being a mistake, yes, it will be reverted, and so what?
The issue remains that "improvements" and "good code" are to a large extent subjective. And doing a major refactor of code someone else wrote just days before, on your own, without consulting the original author or the team, late at night, as described in the article (and doing it repeatedly), shows poor social skills at best and is actively dangerous at worst.
Here are some reasons that you should consult the team and or the original author before doing a large scale refactor of recent code:
* Start off with the principle of charity: assume that the person who wrote that code and the people who reviewed it aren't complete idiots, and that you aren't some kind of savant able to see something they missed while they worked on it for a much longer period of time.
* Since that person has been working on it for days they might have a perfectly valid reason for writing the code the way they did. One example is that they are working with the stakeholder and this is the first of a series of changes to deliver specific functionality and the code is the way it is because it facilitates those upcoming changes that might only be days away.
* If you have a habit of doing this and having your changes reverted ("ask forgiveness rather than permission", as you said), the team will begin to lose trust in you, and it is bad for team morale. Consulting people before you make a huge change to their recent code makes them feel part of the process, as opposed to it being something that is done around them with them as spectators. Any team where that happens with any degree of frequency is not going to be a happy team in the long run. And managers will prioritize a happy team with high morale over a "rockstar refactorer" who is impacting morale, regardless of whether their refactorings are error-free or not.
* It is fundamentally against the spirit of collaboration. When someone releases some new code, there is a degree of inherent collaboration in there: it might involve multiple developers, stakeholders, implied functionality that was not part of the original story, and at the very least the code reviewer. You unilaterally making a change without even taking 10 minutes of your time to discuss it with them is impolite at best, and wasting company resources at worst if your "improvement" ends up being reverted and all that work and time could have been saved had you bothered to have a 10-minute convo before starting.
I dislike the panning of "clean code" in the article because I really think that duplication can be clean as shown. Clean code has a quality of comprehension, not simply a lack of duplication.
The example given is pretty typical actually - where you can frame the problem in a way that your solution ends up without duplication yet the solution requires mental gymnastics to get your head around. We've all been there in the pursuit of the perfect code.
"Write code that immediately makes sense to someone with half your smarts" is a far better guideline than "don't repeat yourself".
I disagree strongly. Raw code, that anyone who knows the language can read... is readable!
The OP’s code, which added an invisible abstraction, is “clean” I guess. It’s less repetitive anyway. But it’s not readable. You need to go read the other file to be able to read this one.
It’s like a clean kitchen with everything behind cabinet doors. Yes, there’s less to look at. But you can tell what’s happening without opening all the doors.
> ... My code traded the ability to change requirements for reduced duplication, and it was not a good trade. For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
The article's starting example and revision clearly illustrate how the original code was changed. Duplication was removed - just like so many advocate for.
Unfortunately, the article doesn't state the changed requirement that broke the new design, leaving that to the reader's imagination. And I can imagine cases in which the revised code would be superior to the original.
It sounds like the problem arose from the need for specialized handles on different shapes. If so, I don't see how using various createXHandle functions (where "X" is an identifier associated with a specialized handle) would necessarily cause convolution. It could become quite convoluted if not done with care. For example, keep the createHandle method, but pass customization parameters to it. But that's really the naive way to go.
What we'd be faced with is nothing more than the kind of specialization requirement we usually see with Object Oriented programming. We can use composition over inheritance to great effect there. We can extend the author's refactored design and end up with something clean and maintainable.
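As a rough sketch of what that composition could look like (TypeScript; the article's actual helpers aren't shown here, so every name below is invented for illustration):

type Point = { x: number; y: number };
type Handle = { position: Point; onDrag: (delta: Point) => void };

// Generic building block, kept deliberately small:
function cornerHandle(position: Point, resize: (d: Point) => void): Handle {
  return { position, onDrag: resize };
}

// Specialization by composition: wrap the generic handle rather than
// threading customization flags through one shared createHandle():
function createTextBlockHandle(position: Point, resize: (d: Point) => void): Handle {
  const base = cornerHandle(position, resize);
  return {
    ...base,
    onDrag: (d) => {
      // TextBlock-specific behavior goes here (e.g. reflow the text)...
      base.onDrag(d); // ...then delegate to the shared behavior.
    },
  };
}

Each specialized createXHandle stays small, and a shape with truly different needs can skip the shared pieces entirely.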
If there's something I have learned about refactoring repetitive code into "cleaner", shorter code, it's that the refactored version looks better but is often harder to understand. When other people look at the "cleaner" version, they have to spend more time trying to understand it and mentally untangle the abstraction.
I like syntactically short code as long as it's clear. I also understand that sometimes shorter code has some small performance advantages that can add up in languages like JavaScript, where the size of the files can become a loading-speed bottleneck.
But to be honest it really bothers me when someone tries to make perfectly fine and readable code into something different just to satisfy some weird intellectual urge to make things more abstract.
Sometimes long code is not only easier to read and understand, but it also helps to create a better technical outline of decisions that otherwise will get lost in the reasoning of whoever is writing that code.
It can be harder to understand; but, even worse, it can also be harder to change.
Which is a bit ironic, because if you asked one of us to explain why we "DRY", we'd probably say something about it making the code easier to change, because a change only needs to happen in one place.
The problem is that whenever you make abstractions, you necessarily limit your axes of flexibility, abstractions always have certain sorts of uses in mind, and built-in implicit limits. Abstractions make certain explicit things easier to change, while making all sorts of implicit things you haven't even thought of yet -- but which might come up later -- harder to change.
Still, obviously of course sometimes abstractions and DRY are the right thing to do. The trick is knowing which is which, which you get better at with experience at software in general as well as the specific domain -- and I'm not sure it can be systematized or formalized, I think it's still a craft.
> But to be honest it really bothers me when someone tries to make perfectly fine and readable code into something different just to satisfy some weird intellectual urge to make things more abstract.
They obviously don't think it's "perfectly fine" or that their changes are some "weird intellectual urge" or they wouldn't do it.
It's a subject for debate, and to make your case to that person, you need to do better than vague claims that what you're doing is "fine" and their idea is "weird."
This is by no means a generalization, though. You’re right: most people are coming from a good place when they make suggestions that change your code. This is especially true for code reviews, where I believe everyone has the same goal.
I’m referring to some isolated cases especially those that involve receiving a codebase that was created by someone who is not working on it anymore. I think this is where that strange urge to refactor everything kicks in for some people.
That's something I see most people struggling with, including myself, in particular when learning a new language.
e.g. in Kotlin it can be tempting to use a ton of complex operators and rely on the intricacies of let vs run vs apply, even if the resulting code takes some thinking to understand.
At the beginning of my career, the code I was the proudest of was a gigantic piece of code doing some very complex stuff. It took a lot of effort to understand. Nowadays I am extremely happy when I take a complex problem and solve it only by writing very simple code that even the most junior dev we would hire can easily understand.
The downside I can see is that in some orgs more respect is given if you write black box code making you unfireable but that's a good sign you don't want to work there/need to make the culture evolve.
That often depends on how familiar you are with a code base, and your personal knowledge about for example linear algebra or whatever specific thing is going on somewhere.
Obviously there is no objective "right" in this matter, but always writing code that the greatest idiot can understand obviously isn't good either.
Another problem with the duplication approach is that if you need to make some change later on, you need to apply it in 12 places, and you're pretty much guaranteed to mess up one of them, since this is the most boring task ever and your brain switches to low-power autopilot after the third instance.
Either you need to change 12 places and test those 12 (or whatever) use cases, or make one change and still test those use cases. Only with the DRY function, as the article says, it’ll wind up accumulating options and configuration to accommodate all those use cases, becoming a big hairy abstraction that’s harder to understand for any of those use cases, whereas with the duplicated functions they would all evolve separately and in isolation and remain independently simpler.
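A sketch of the two trajectories (TypeScript; the flags, shapes and numbers here are all invented for illustration):

type Shape = { w: number; h: number };

// Trajectory 1: the shared helper accumulates a flag per use case.
function resize(s: Shape, opts: { minSize?: number; keepAspect?: boolean; snapToGrid?: boolean }): Shape {
  let { w, h } = s;
  if (opts.minSize !== undefined) { w = Math.max(w, opts.minSize); h = Math.max(h, opts.minSize); }
  if (opts.keepAspect) h = w * (s.h / s.w);
  if (opts.snapToGrid) { w = Math.round(w / 8) * 8; h = Math.round(h / 8) * 8; }
  return { w, h }; // every caller now has to reason about all three flags
}

// Trajectory 2: "duplicated" functions that each stay trivially simple.
function resizeRectangle(s: Shape): Shape {
  return { w: Math.max(s.w, 16), h: Math.max(s.h, 16) };
}
function resizeTextBlock(s: Shape): Shape {
  return { w: s.w, h: Math.max(s.h, 24) }; // text blocks only clamp height
}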
It's funny how one can feel ashamed for not using the esoteric language features.
Whenever someone new joins the firm and they know all the cool operators etc., I get so nervous around them! But I've caused myself so much stress trying to do something clever, having to go back and update the behaviour, only to curse myself for trying to be clever!
> Whenever someone new joins the firm and they know all the cool operators etc., I get so nervous around them! But I've caused myself so much stress trying to do something clever, having to go back and update the behaviour, only to curse myself for trying to be clever!
On the upside, this process is all about learning. It's easy to be clever, it's hard to be clever and successful. Making these mistakes and fixing them is how you turn knowledge into wisdom.
I've been writing a react application with my wife for the last couple of months (something we started at the YC hackathon in November, actually). It’s been a few years since I’ve written any react, so it’s been nice having her there to help me.
Maaaaaan...it’s like I’m learning how to program again. There is SO MUCH focus on shorthand and abstractions and stuff, all seemingly in the name of being “concise” that it produces code which to me is extremely difficult to read. It feels like I’m doing everything “wrong” when I read the documentation, and this would have been a lot harder without her there to pair program with.
Especially when I’m switching between golang on the backend (where I’m comfortable) and react on the front, it highlights how much they seem like totally opposite philosophies.
I’ll say this, which is probably mostly due to me growing up on python: code should read like a description to the computer of what you are trying to do, and even if somebody barely speaks the language you’re writing in, they should still be able to read what you’re doing. This doesn’t just help others, it helps you need less cognitive overhead when reading your own code back.
Saying the same thing with fewer characters rapidly passes the point of diminishing returns, and it seems like most of the recommendations in the JS world right now just want to keep pushing us further and further past that point.
What are some examples of React's focus on shorthand and abstraction? React is fairly small and doesn't really encourage much at all. The one 'battle' I often find myself in is "should this be a separate component?", but that's more of a people problem, and something that every language and framework will have.
Of course there might be some completely valid reason for this, but it’s baffling to me why you would want this type of shorthand, instead of just explicitly writing what you mean.
Of course in golang this could be something like:
var someVariable bool // is false (the zero value), though that's never stated
someFunction(someVariable)
But I would (personally) not write code like this if I could avoid it. It would be:
someVariable := false
someFunction(someVariable)
It’s a little bit longer, but imo takes slightly less mental overhead to read.
It's a shorthand, and it's similar to what HTML does.
If you don't like the optional shorthand, don't use it? I don't understand how this is something exclusive to React, or something it specifically encourages.
I'm with you on this one, I've seen people rewrite things like this, and to me this is just personal preferences, and it's something that the team need to agree on.
The author of this piece was not engaging in a DRY activity even if he thought he was. He (perhaps unwittingly) admits to it himself:
> My code traded the ability to change requirements for reduced duplication, and it was not a good trade.
The acronym DRY was coined in The Pragmatic Programmer, and the author of that book makes it clear that DRY is about not repeating requirements in code. You definitely don't want to deduplicate similar code if it involves multiple requirements because then you entangle yourself, which is what the author of this piece did.
> DRY is about not repeating requirements in code.
Exactly! A lot of developers, even experienced ones (and I would include the OP, even after his supposed lesson), seem not to get this: duplication is only bad when what is being duplicated MUST behave the same by definition (i.e. if you change one but not the other, the other becomes broken, because its definition is still the same as the changed one's). Otherwise, two similar or identical pieces of code are NOT duplicates in a way that violates DRY; they are just coincidental - they might become different later - so the "duplication" in this case must NOT be removed (though perhaps some smaller portion of the code is a duplication of definition, and that part should be extracted and de-duplicated). It really isn't that hard.
In my experience, it's often non-obvious when you have multiple "requirements" and when you don't. Except in retrospect when you realize you painted yourself into a corner.
Is your experience different? Do you always know when you have multiple requirements and should DRY, and when instead you have "incidental [or accidental] duplication", where de-duplication would not be advisable and thus would not be called "DRY"?
If so, I wonder what leads to our different experiences.
The Pragmatic Programmer is pretty zealous about DRY. For instance, the authors describe how they inserted sample code snippets into the book using specialized scripts developed for that purpose. A simple copy-paste wasn't good enough (see p. 100-101). Granted, they wanted to make sure the code snippets were tested and updated as needed, but repeating code anywhere seems to be a bad practice according to this definition.
Wasn't it the case that the requirements were the same at the time the code was written? TextBlock.resizeTopLeft(...) and Rectangle.resizeTopLeft(...) did the same thing for the same reason, not by coincidence. The issue was the possibility (and eventually the reality) of future divergence.
The issue wasn't divergence. The issue was that there was no requirement that TextBlocks should behave like rectangles.
It's very possible that DRY could apply even where divergence is expected. Suppose we had a requirement that "A user should not be able to resize a TextBlock if they are not allowed to read the text." At the outset, the read & resize permission logic might be identical & DRY is easy to apply. Nobody would be surprised if eventually resizing permissions became more restrictive. Even at that point, DRY applies. If you repeated the shared logic, and later you add new restrictions to the read permissions, it'd be easy to introduce a bug where you forget to restrict resizing permissions accordingly.
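In code, that example might look like this (TypeScript; the types and permission rules are made up to illustrate the point):

type User = { id: string; roles: string[] };
type TextBlock = { ownerId: string; text: string };

function canRead(user: User, block: TextBlock): boolean {
  return user.id === block.ownerId || user.roles.includes("reader");
}

// Resizing *requires* reading by definition, so the shared rule lives in
// one place: tighten canRead later and canResize tightens with it.
function canResize(user: User, block: TextBlock): boolean {
  return canRead(user, block) && user.roles.includes("editor");
}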
> The issue was that there was no requirement that TextBlocks should behave like rectangles.
Sure, though that’s not quite what the new code did.
Instead, it added a new abstraction (createBox), and then used that to incidentally assign the same behavior to TextBlocks and Rectangles. Your change could easily be accomplished by changing the last line to check the permissions and call createBox or something else, depending. I’d guess that createBox would also be useful for many other shapes, but if you wanted to do something totally different, like a star with handles at each point, you could just....not call createBox, and write a totally independent “let star()” function that does its own thing; it doesn’t seem like a base class constructor is forcing you to call createBox or anything.
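Concretely, I read the described structure as something like this (TypeScript-flavored sketch; createBox comes from the article, the star opt-out is hypothetical):

// Shared abstraction for box-like shapes:
function createBox(/* shared resize-handle wiring */): void { /* ... */ }

function rectangle(): void {
  return createBox(/* rectangle specifics */);
}

function textBlock(): void {
  // could check permissions here and call something other than createBox:
  return createBox(/* text-block specifics */);
}

// A shape that doesn't fit the box model simply opts out:
function star(): void {
  // handles at each point, custom drag logic -- no createBox involved
}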
So....this code doesn’t actually seem that bad to me. What am I missing?
> The issue wasn't divergence. The issue was that there was no requirement that TextBlocks should behave like rectangles.
But they had the same current requirements. So, ideally, the implementation of the common behavior should have been shared but shared in a way which didn't tightly couple the two consumers with each other, so that it would be simple for either to stop using the shared code.
If they had separate classes for TextBlock and Rectangle, then likely there are separate requirements involved. It may have been that at the time they needed separate classes the requirements for resizeTopLeft was identical, but they are still two separate requirements.
I suggest everyone take a minicourse or read a book on systems engineering - at least the part where they discuss requirements. The fact that two distinct entities have an identical requirement for something does not mean there is one requirement. It means there are two requirements which (for now) happen to be identical.
The key phrase is happen to be. You do not apply DRY to separate requirements even if they are the same/similar.
Look at it from a testing perspective[1], which is similar to how a systems engineer will look at it. If you would write a separate test for it, it is a distinct requirement. (Actually, the ordering should be the opposite: If it is a separate requirement, it should have a separate test).
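For example (TypeScript, Jest-style globals assumed; the shape stand-ins are minimal fakes, not the real classes):

// Minimal stand-ins for the two entities:
const textBlock = { resizeTopLeft: (x: number, y: number) => ({ origin: { x, y } }) };
const rectangle = { resizeTopLeft: (x: number, y: number) => ({ origin: { x, y } }) };

// Two requirements that happen (for now) to be identical get two tests:
test("TextBlock resizes from its top-left corner", () => {
  expect(textBlock.resizeTopLeft(10, 10).origin).toEqual({ x: 10, y: 10 });
});
test("Rectangle resizes from its top-left corner", () => {
  expect(rectangle.resizeTopLeft(10, 10).origin).toEqual({ x: 10, y: 10 });
});
// If one requirement diverges tomorrow, its test changes; the other test
// keeps guarding its own, still-identical requirement.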
> I suggest everyone take a minicourse or read a book on systems engineering - at least the part where they discuss requirements. The fact that two distinct entities have an identical requirement for something does not mean there is one requirement. It means there are two requirements which (for now) happen to be identical.
I'd love to do this -- do you have any recommendations for a book like this?
My rule is: if I find myself sorting things into two buckets, one of which I’ve given a name that obviously means “good” and the other “bad,” that’s a sign I’m deciding emotionally rather than rationally. It’s time to take a step back and think about the distinctions between the things in the buckets harder.
Sorting code into “clean” and “dirty” buckets is a good example of this. Both bucket names are completely subjective, with “clean” obviously meaning good and “dirty” bad. As the article indicates, dig in a little deeper and it’s not hard to find objective ways the “dirty” code would actually be preferable.
I agree that not every "smart approach" is worth it if you sacrifice legibility.
But I don't think you necessarily need to ask permission to refactor code. On the projects I had the past couple of years everyone understood that code was open to be changed by anyone.
In practice we'd often ask "why did you do X" instead of just rewriting it.
I trusted the people I worked with on those projects though and if they thought they had to rewrite part of my code for whatever reason, that was fine by me. And vice versa, I changed their code and that was fine.
Well in his case, did refactoring the code to clean it up add business value?
There has to be a very good reason for me to refactor existing/working code for instance for performance.
I won’t refactor code to reduce existing duplication but I will refactor code if I see there is some functionality that I need elsewhere so I won’t just copy and paste.
Well, in that case, why do it now instead of waiting until “the future”?
If it’s needed in the future, the cost is not more than it is now. If it isn’t needed in the future, then he’s wasted his time. If something similar is needed, but it needs to be refactored in a different way, they are going to have to refactor it anyway.
There is no special case “trap”. Business requirements necessitated changes.
> Well, in that case, why do it now instead of waiting until “the future”? If it’s needed in the future, the cost is not more than it is now.
They're familiar with the code and its gotchas when it was written. In "the future" most of that context will be gone (people forget stuff, and change companies/teams/roles).
Better to get rid of the gotchas before you get burned.
> If it isn’t needed in the future, then he’s wasted his time.
The new version was still more readable to newcomers, and resizing stuff is pretty core functionality for a drawing program.
> If something similar is needed, but it needs to be refactored in a different way, they are going to have to refactor it
A second pass of refactoring tends to be easier than a big ball of mud. And in the worst case, they can always inline it and start over from the same state.
> There is no special case “trap”. Business requirements necessitated changes.
When requirements change you can either start adding special cases or rethink how it fits into the bigger picture. That matters in every layer, but it's especially important in the UI layer.
Unless you want to end up with an unusable Apple-style hodgepodge. In that case, go wild, I guess.
It depends on context. Ideally you have an automatic code review system that would remove a lot of these discussions and create sense that everyone is involved in the code bases.
In this context however it is pretty special, because it was a single commit that he immediately refactored. In that case the more reasonable approach in my opinion is to do a post-commit code review where you talk to the dude about your thoughts on how one can improve the code. It was also a pretty significant change when it comes to coding philosophy, so if you think you know more it is also a potential opportunity to mentor your colleague (or learn something yourself).
If it had been a bug fix or an added feature, no problem. Or if some time had passed and most code was written in style X but this was not, and you then rewrote it, that would also be fine.
1) Changing someone else's code without discussing it with them. Seriously, wtf?! Even if the code is better, maybe there are reasons the code is the way it is, and it's not worth making the changes for the overall progress of the business.
2) Misjudging the requirements. If those modules needed changes, then they should've been left alone. This story could've easily gone the other way with the author competently abstracting away messy details for an overall positive impact. This happens quite frequently, but obviously doesn't make for a good article because it's expected.
And a story about not having a code review process prior to check-in. How is a team of developers ever going to work together if they all write code completely on their own with zero input from or approval by others? That's why style guides exist.
You are not saying goodbye to clean code: you are embracing it!
“Clean code” is a semantically overloaded term, so discussion about it is inevitably mired in confusion. Every brain that first hears the phrase “clean code” will construct some plausible placeholder definition, and the most readily available one by analogy is “less source code”.
You can easily try and apply principles such as SOLID and then go on to produce an unmaintainable web of nonsense, because they are meant to be facets of a unified organising principle and not a bag of tricks for local optimisation towards the state of “clean code”. I’ve experienced that firsthand.
“Clean code” is great marketing, but I almost wish Robert Martin had called his theories “Martin Code” so there was no doubt what is under discussion.
I can't help thinking of this excerpt from re-frame's docs:
> Now, you think and design abstractly for a living, and that repetition will feel uncomfortable. It will call to you like a Siren: "refaaaaactoooor meeeee". "Maaaake it DRYYYY". So here's my tip: tie yourself to the mast and sail on. That repetition is good. It is serving a purpose. Just sail on.
Thanks for the link. Best as I can tell from the ClojureScript, this is in reference to a refactoring that would cross layer boundaries. Like taking the methods
1) Always do pull requests. No change should make it into a master without someone else’s eyes. Unless it’s a “get out of trouble” surgical revert that really needs to go out in an emergency.
This would have given the author a chance to raise concerns about the heavily duplicated code. We do that all the time with comments like “this could be a bit DRY-er, how about moving this bit to a common place with this other bit in the codebase that also does that”.
2) when the author refactored the code, he should have deffo asked the OG to review it. Maybe his deduplication didn’t account for edge cases.
Reviews are great. Make reviews a part of the system. Small, incremental reviewed code, with decent coverage, being shipped multiple times a day by CI, reacting to customer needs is how the best engineering orgs are operating.
He is right about some parts, like over-accounting for the future. I’ve had that happen to me multiple times.
Same with doing large refactors of other people’s code. It’s not about code anymore, you gotta ensure they feel good about it too. Esp with junior engineers, it has to feel that it’s still their work.
Building a transform tool should be a pretty well understood problem. Just take a look at what photoshop's transform tool does and you will see all the possible extensions you might need to support in the future (rotate, skew, distort, perspective, warp, transform origins, a bajillion modifier hotkeys, etc).
Of course don't go ham and support all of them up front. But with those future use cases in mind, it's pretty hard to write yourself into a corner. I would say Dan's refactoring looks fine other than needing more customization for positioning and handle behavior. But it seems easy enough to just add them later.
> Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input.
If you want to modify a method that has 10 unique contributors, do you really need to talk to 10 different people to make a change? That does not sound very effective.
And, most importantly: when you code as a job, all your deliverables are the company's property. They are not yours. The company can do whatever they want with them; they don't need your opinion or approval. If they want to replace every piece of code that you ever submitted with an ASCII clown, then print the source code and make a piñata from it, they can do that too if they want.
Secondly: version control. If you want the old version of some code, you can retrieve it from version control, right? You can leave a comment such as: "for a verbose version of this method, see revision <sha1>".
> > Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input.
> Do you really need to talk to 10 different people to make a change?
No, you don't talk to all of the 10 contributors. But if you want to still work as a team you should talk to a few. Depending on the size of the change.
Some or even all of the previous contributors may no longer be at the company or are inaccessible. But a super quick discussion with 1-3 other team members should surface whether this is a good idea or not.
Obviously, if you practise pair programming, most smaller rewrites just need a consensus within the pair, and a more extensive change may be a good idea to get the approval of another pair, especially if some other team members were the original contributors.
There is no need to be precious about any existing code. Nor any need to be a bull-in-china-shop either.
> If you want to modify a method that has 10 unique contributors, do you really need to talk to 10 different people to make a change?
I think the more apt example in this case would be if somebody made a modification in that file and you rewrote just their modification immediately after they'd done it, without discussion. If the modification is necessary so soon, it means that the feature/implementation was not discussed properly before, and the process probably needs revisiting.
Of course if you are modifying part of the file, or doing some previously agreed refactoring effort then explicit author consent is not necessary.
> Secondly: version control. If you want the old version of some code
I think here we talk mostly about production code. The problem with relying on source control for "I will revert this to the old version when it becomes required" is that the code will evolve in subtle ways, and it will be hard to roll back and apply all the patches on top of it.
In my experience: don’t worry about duplication. It’s sometimes OK and sometimes a Symptom. Don’t fix the Symptom, fix the main cause:
Classes or functions that do more than one thing. The definition of one thing is one conceptual thing.
For example I’ve seen DTO objects with both wire concerns and UI concerns. Splitting that out made a lot of code simpler and incidentally deduped some stuff.
So attack poor abstractions, not duplication. The S.O.L.I.D. principles are a good start (even for non-OO code).
If you are unsure, I’d always err on keeping the code as is, you can always refactor it tomorrow when you understand more. Instead: make sure the unit test coverage is adequate.
This should rightfully be on the front-page of HN. I've gone through something almost the same as Dan Abramov here, and I feel it's difficult to impossible to explain this and you have to experience it yourself.
The tricky bit is where to draw the line, which I've learned only after pouring out thousands of lines of code, and I understand it is a bit different for everybody. The rule of three[1] was a very useful rule of thumb in the beginning, but there are many secondary variables that I unconsciously use to decide when to remove repetition. e.g.:
- Single files tend to be split less often, since refactoring them is easier. Example: the single-file routing in React being repetitive is okay, but if there were multiple files with routers and custom logic I'd consider a helper a lot stronger.
- Conceptually straightforward APIs tend to be split more often, since they're easy to package and reason about, as well as design. Examples: cookies, warnings, kv stores, etc.
- Early stage projects tend to be split less often, since few things are clear yet and being able to keep everything in your head is a lot more important.
It's also one of the lessons that you should learn on the way to senior. Taken to the extreme, someone insisting on removing this kind of duplication for its own sake is often a sign of a junior dev (as in, it's wrong, and they're not trying to understand the codebase/tradeoffs first).
The more time passes, the more I consider coding to have a strong aesthetic component. What makes it complex is that there are two kinds of "aesthetics":
- the code itself
- the abstractions that the code represents.
You can have very "clean looking" code (short functions, short files, no repetition, etc.) that is in fact modeling a problem in the most convoluted way. And the other way around: a bit of repetition, but the concepts behind are completely obvious.
And most of the time, nothing falls completely into one category, and the way you'll decide where to draw the line is almost a matter of taste.
I think this highlights an important nuance to the Don't Repeat Yourself maxim. It's not just about whether two bits of code are similar now. It's about whether, when they change in future, they are likely to change in the same ways. If not, then DRYing up the code now is only making more work for yourself down the line.
DRY is good but like any abstraction it has tradeoffs. It might give you a warm feeling to do the refactor, but you might end up with code that's harder to understand and harder to replace.
What is this implying? That instead of adding a property/method to a class you should instead create a child class that inherits from the original with the new properties? Sounds like a great way to have a very complicated class hierarchy IMO
Exactly that. Just because you want new behavior doesn't mean all the other consumers of that class also want it. The old behavior should still exist, and have a name. I'm not sure whether the idea was ever fully developed into patterns for replacing deep hierarchies with simplified classes that factor out some inheritance from otherwise-unused base classes.
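A minimal sketch of that idea (names invented for illustration):

  // The old behavior keeps its name; the new behavior gets a new one.
  class Box {
    constructor(w, h) { this.w = w; this.h = h; }
    resize(dx, dy) { this.w += dx; this.h += dy; }
  }

  // Only the consumers that want aspect locking opt into it:
  class AspectLockedBox extends Box {
    resize(dx, dy) {
      const scale = (this.w + dx) / this.w; // ignore dy, follow the width
      super.resize(dx, this.h * scale - this.h);
    }
  }

Consumers that never asked for aspect locking keep constructing Box and are untouched.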
There are at least seven giant and contentious subjects covered in this one short article:
- Code duplication (when it's appropriate)
- Code cosmetics/legibility
- Code scalability/flexibility
- Code review (protocols, how to perform them)
- Accepted widespread coding wisdom (abstractions, and shirking them, in this case)
- What defines a "junior" engineer (there are wildly different perspectives in this thread)
- Common courtesy (coworker interaction, etc)
Almost all of these are highly circumstantial to the type of project, codebase, language, culture, etc. I would like to see a productive conversation regarding any of these topics, but discussing so much at once in such circumstantial ways just leaves us talking past each other (the threads here are pretty clear evidence of that). I don't think this article has enough substance to cover any of these topics in any depth.
It's like if someone asked "What vehicle is the best vehicle?" Some people chime in with how much they like their brand of vehicle. Others say how trucks are better than cars. Others say that electric is the only choice. Others say cars made after 2010 are the best. The question is too broad, so the responses aren't effective.
Code is not clean: code is cleanable. What is clean today might not be clean tomorrow. (Robert C. Martin, author of the book "Clean Code")
Trying to evaluate code cleanliness in isolation is a waste of time; it strongly depends on the way your requirements evolve (but it's not subjective). You only need abstraction points at the frontiers between parts of your requirements that change at different times.
Duplication is better than a wrong abstraction (Sandi Metz, author of Practical Object Oriented Design in Ruby).
Trying to religiously factorize every structural similarity you can find, for the sake of the almighty DRY principle, without taking into account how business requirements evolve, is counterproductive.
Code is being read, reviewed, commented, linted, compiled, test-covered, read again, contemplated, diffed, documented in weird diagrams, mentioned in READMEs, revisioned. More LOC = more places for dust and rot to accumulate and more places for bugs to hide and stick. LOC is the environment under which bugs prosper.
So having less code is a legit, noble, and to some extent practical state to strive for.
Nah, no offense to the react fans but I'd rather listen to Fowler than to this guy.
Sure, there are very specific exceptions where repeated code is an asset instead of a liability (and many of them are, appropriately enough, when dealing with graphics), but they are that: very specific. For 99% of situations, we ought to follow the principles (obligatory IMO).
Do you have specific experience that suggests that hyper-generalizing code before you even know about multiple use cases is beneficial over carefully extracting shared logic only when needed?
If I had to pick one single practice junior engineers employ that ultimately bites everyone in the ass, it's blind adherence to generic code and DRY at all costs.
> Do you have specific experience that suggests that hyper-generalizing code before you even know about multiple use cases is beneficial over carefully extracting shared logic only when needed?
I've gone over my comment three times now and I'm yet to see where I even implied something like this. Could you tell me how you got this from my comment? I do want to see how I could send the wrong message so I can word myself better in the future.
Just in case, one of the principles I follow is "premature optimization is the root of all evil".
Wait. What? Goodbye clean code? Clean code obsession is a phase?
Your only mistake was that you didn't ask the committer to adjust their code and you didn't discuss it; you just went and changed their code and "pushed it to master" :) This is your mistake. Nothing to do with clean code. And the title is very misleading, untrue clickbait.
Another similar anti-pattern is acting in a way that implies the computer itself cares about such things. Other than efficiency concerns, the audience for code is other programmers, and those programmers over time. The computer doesn't care about your variable names, how DRY your code is, or even the elegance of your data structures except if the ones you have chosen cause it to run more or less slowly.
Another way to put it: if data is passed from one function to another in a way that would make any programmer wince, but the odds of it ever being re-visited by another programmer are near zero, and it affects nothing else, does it matter? No, unless you think the computer itself winces as well.
To me the #1 thing that fixes the social issues described in this post (rewriting code someone checked in just a few hours after they pushed it) is a code review process. The most valuable thing about always doing pull requests and code review is sharing information and having discussion about changes, not necessarily the immediate improvement of the code. A team will gain more knowledge about what everyone is doing by reviewing each other's pull requests than they do from a daily standup, by a pretty large margin in my opinion.
In this particular example, reviewing a PR would have meant a discussion about whether or not that abstraction was a good change before the code was ever merged.
One place I've found this to be especially true is when writing CSS. I went through a phase years ago of trying to abstract any repeated styles into 'clever' OOCSS patterns, but in the end it often led to a confusing mess, in part because of the nature of CSS. In the end I've found duplicated and verbose code to be so much easier to work with and maintain.
> I suggest to think deeply about what you mean when you say “clean” or “dirty”.
I'd go one step further: I suggest to think deeply. Period. I used to think I was a slacker for taking one or two hours to go for a stroll during a work day, thinking hard about my code, my problems, the architecture, the abstractions... and daydreaming as well.
I had to shut down my inner Jiminy boss who was telling me that an hour with no code written was an hour lost. And I am still occasionally bad at it.
In the last project I did, there was a strong deadline, so I went straight into coding and went down two dead ends before backtracking and changing the approach totally. It probably could have been saved by some hard thinking at the beginning.
Balancing thinking and doing is one of the Hard Problems™ of software development. I suspect successful methodologies work because they formalize the act of thinking about the code before implementing it. And that when methodologies fail the reason could be that they can easily be implemented such that people think more about the technicalities (assigning work, deadlines, estimation accuracy etc.) than the development (modularity, overlap with existing work, conflicts with concurrent work, security, third party dependencies etc.).
Someone once taught me a phrase I often use: "Lowest common denominator coding." What it means is that you should code to the most common abilities of the group of coders you are working with.
In practice what that means is that when you are writing a piece of code, you should do so in a way that the new hire fresh out of university, with less than a year's real-world experience, should be able to grasp what you are doing fairly quickly.
Abstractions, using the latest and greatest additions to the language you are using and obsessing over reducing the code line count can help the code run faster, but it also increases the learning curve for people looking at the code for the first time.
When I look at his example I still feel a compelling urge to create some kind of abstraction. It's just that the example abstraction he chose to make is somewhat esoteric and arguably over-solves the problem. The author isn't suggesting a different, perhaps simpler abstraction, but forgoes abstractions entirely.
I find myself wishing the question of whether to factor code out wasn't binary. Like if you could transclude parameterized code. You'd get the benefits of not having to keep a stack trace in your head, while still having a canonical version of an abstraction. Like a macro, but always expanded in-place.
It's tiring to work with developers who have a very narrow focus on "correctness" and e.g. code style. It can result in a lot of extra work, extra refactoring, extra cleanup, extra rewrites, etc.
It's how you end up with a toolchain, build environment and release process that's worthy of Netflix, but you're just 5 developers, and it takes up all your time to maintain.
Always focus on how it's benefiting the business. Is my time better spent on something my customers need or want, than fixing something that's already working?
This reminds me of Wayne Gretzky's phrase "Skate to where the puck is going".
A really good coder will find clever ways to implement the requirements in the most DRY fashion possible. And it will be perfect. Until the next requirement comes along. This is skating to where the puck is.
At some point you pick up the knack to know where to just leave things dumb and obvious as possible, because something inside you is shouting that there's going to be some new requirement soon that breaks the abstraction you're considering.
These are all judgement calls. Sometimes it's good to refactor to avoid duplication, sometimes it isn't. Only experience gives some hints at which way to go each time.
If unsure, I tend to at least break these cases down by writing low-level building blocks and expressing the duplicated code in terms of those blocks. Those blocks should be low-level enough that they can't have a say about how they should be used in all these cases.
You could say it's basically abstracting out the stuff that's so small there's no risk of overabstracting but that's not the point. Rather, it's like building a language for expressing the kind of problems that are solved by duplicate code all around.
I'm sure in the original example there would've been some things that are common for resizing all shapes. You would still have repetition under individual cases but you would replace raw math (I assume) with certain basic operations that you know are common.
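Something like this, say (the math here is invented, since we never saw the original):

  // Low-level blocks: they don't know or care which shape calls them.
  const dragEdge = (pos, size, delta) => [pos + delta, size - delta];

  // Each shape keeps its own "duplicated" case, expressed in the blocks:
  function resizeRectLeft(rect, dx) {
    const [x, w] = dragEdge(rect.x, rect.w, dx);
    return { ...rect, x, w };
  }
  function resizeOvalLeft(oval, dx) {
    const [x, w] = dragEdge(oval.x, oval.w, dx);
    return { ...oval, x, w }; // oval-specific behavior can still diverge here
  }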
Of course, this isn't a generic solution either. Just another step between raw duplication and finely abstracted model.
I don't use Ruby, and don't do OOP either, but my favourite talk is still Sandi Metz's "All The Little Things" and I think that's where she says "prefer duplication over the wrong abstraction." That's really changed me. I've since been seeing DRY and other misapplied dogma in a new light and have grown much over the years since.
Great quote. Link to the talk? When you say you don't use OOP, are you saying you use functional languages or you use imperative languages but don't do any of the OOP design crap?
It is interesting to see the parallels with my work with solid modeling in CAD. I'm constantly harping on the people I work with to think about what they are working on and structure it in a way that can be edited later (as well as stability). This often means a larger/less efficient feature tree, but it is actually maintainable. Just this week my insistence that we build the model to be flexible meant that we were able to fix something in 5 minutes instead of days.
Updating colleagues work without talking to them is something I often struggle with though. On one hand I absolutely hate it when others change my work without at least getting the history as to why I did it that way. Thus I try my very best to give others the courtesy of discussing it with them first. On the other hand, if something isn't being paid the proper attention, or they aren't making progress, sometimes it is a useful way to light a fire under certain people if asking politely hasn't worked.
> Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
Debatable. Detachment from one's code is as important as being passionate about delivering requirements and enabling people with your code. A better approach: anyone should objectively weigh the effect of a change, whether it is communicated or not. If it turns out better, the change should be allowed for the sake of delivery, improvement, and knowledge sharing.
> My code traded the ability to change requirements for reduced duplication, and it was not a good trade.
An excellent point from danabramov. Premature abstraction is an ugly creature waiting to ambush you a couple of months in the future. By that time you'll have forgotten why you did it in the first place and will have written a lot on top of that abstraction.
I've been through a phase where juniors and intermediates (a lot of them) were obsessed with abstracting duplicated code (not their fault; they were at the wrong place at the wrong time, being taught that abstraction is always good).
I meditated for a while on it and found a principle to avoid this issue:
Any abstraction MUST be designed to be as close as possible to language-primitive-like. Language primitives are reliable, predictable, and non-breaking. If they do break, they don't affect business logic written on top of them. If parameters are added, defaults are provided. They don't just abstract, they enable developers to express business logic more elegantly.
The challenge is to pick the parts of the code that can be abstracted in a primitive-like way first, and make that a top priority.
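Concretely, in JS terms (showToast is a made-up example, not from any real codebase):

  // v1 shipped as showToast(message). v2 adds a duration, but defaults it,
  // so every existing call site keeps working unchanged:
  function showToast(message, { durationMs = 3000 } = {}) {
    // ...render the toast and schedule dismissal after durationMs...
  }

  showToast('Saved');                      // old callers: untouched
  showToast('Saved', { durationMs: 500 }); // new capability: opt-in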
This is why language features like Rust async, Go channel-based communication, ES6, and C++ smart pointers were such a hype in their time and are still used today. It also applies to enabling tools such as React, tokio, wasm-bindgen, express, TypeScript, and jQuery (even the latter, which is not a thing anymore).
The problem is not abstractions or clean code, it is the lack of experience, lack of time to think and contradicting views of fellow developers in the team. I code for my own projects now, what a joy that is compared to working in a team for company x. Goodbye teams and good luck with your perfect codebases!
I never had this religion imposed on me in my career, except recently. Originally I designed hardware. There, if you change something it might cost $1m in retooling a factory or throwing away a defective wafer batch. Then, I worked for years in teams producing production code, where job #1 was to not break something. Pretty soon you learned that even an innocuous tidying up change can end up breaking functionality seriously. You also learn that if code is not pretty that probably means paying users are relying on it.
I think the culture of turd-polishing code so it looks pretty comes from academia and people who are control freaks. That, plus online deployment, where if you screw something up you can just push a fix in seconds. Except if your bad but pretty code leaked sensitive data or lost data...
The interesting part here would have been the actual code for each case, and the changes that it required later, i.e. exactly what the article does _not_ contain. Without it, we have to take the guy at his word, and it's hard to say we've learned anything.
For example, in any vector graphics program I've seen, an oval is defined by the surrounding box. The box defining an oval works exactly the same way as the box defining a rectangle, it makes sense for it to be that way, and it is also what users expect. It's hard to believe that they had a good reason to implement a different resizing behavior for ovals.
I think this is a great post; it covers its topic succinctly, and I agree with its conclusion.
On the topic of actually refactoring code, I think we should consider the code as a variable - that is to say, sometimes these variables just happen to equal each other, in which case they are two separate things. Sometimes two variables aren't just equal, but they're the same. That's when to factor out the code. Otherwise, the second these two variables are no longer equal, you end up in trouble. The secret is divining when something is, for all intents and purposes, the same as something else in this specific situation.
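A trivial illustration (constants invented):

  // Equal by coincidence: keep them separate.
  const MAX_UPLOAD_RETRIES = 3;
  const MAX_LOGIN_ATTEMPTS = 3; // will diverge the day the security policy changes

  // Actually the same thing: factor it out.
  const HANDLE_SIZE_PX = 8; // every resize handle is, by definition, one size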
After all, they help in explaining the intent behind the given piece of code.
With well written tests even the most "clever" implementations become at least reasonably understandable.
Anyway, here's what I do to deal with this problem:
1. Write code (and tests).
2. Switch to some other task.
3. Forget about that previous piece.
4. Get back to it and judge if it's still readable.
If something is readable then it's likely to be modifiable as well.
One classic example of things that I learned to never de-duplicate is the routing configuration. In this instance anything that isn't an explicit list of URLs is usually too "clever" to be readable.
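For example (a hypothetical route config; the helpers in the second version are invented):

  // Explicit list of URLs: repetitive, but instantly readable.
  const routes = [
    { path: '/users',          component: UserList },
    { path: '/users/:id',      component: UserDetail },
    { path: '/users/:id/edit', component: UserEdit },
  ];

  // Too "clever": the reader has to run this in their head just to learn the URLs.
  const routes2 = entities.flatMap((entity) =>
    actions.map((action) => ({
      path: buildPath(entity, action),
      component: resolveComponent(entity, action),
    }))
  );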
Agreed. Tests often have the benefit of defining better interfaces, so the implementation details often matter less and code duplication often works itself out on its own.
From years of schooling and artificially constructed engineering problems we've gotten used to, and some of us addicted to, the idea of end solutions fitting neatly into a clean-edged box. We've grown up on complex physics problems with clean integer solutions, homes with clean lines and polished finishes, and engineering that's constantly pushing for smaller and faster solutions. So, naturally, we want our codebases to reflect our experiences: concise, clever code that feels like a clean integer solution.

And while maybe it should be a common aspiration to write code that's 'clean', the reality of the natural world is that solutions to some of the most complex problems are not 'clean'. Furthermore, rarely is the true end goal of the code you're writing to be the 'final state' of logic. The code that we write is almost always for ever-evolving and expanding use-cases; we are creating scaffolding, rarely a 'finished' product. And most of us are writing code in ever-evolving languages, so even 'finished' code will ultimately be paved over with something more concise and efficient as the language evolves.

The point being: there's a happy medium (I haven't mastered it) of reconciling the realities of complex, evolving solutions with our internal desire for clean elegance in our codebases.
I find this problem to be overblown. I've met dozens of dirty coders who happily copy-paste code without a single thought for maintainability and their colleagues. I've yet to meet a single clean code zealot.
It's impossible to tell from this example (because no actual code is included) whether refactoring the code was a bad idea or whether it was this specific implementation that was wrong. Clearly the math should sit in a geometry/linear algebra library and nowhere near the presentation layers.
Clean Code sounds like a virtue, but it wasn't ever supposed to be a primary goal; it's more about being disciplined with the code you write. Architecture and Design are more fundamental.
Removing duplication is most often about Design. Very easy to introduce unintended coupling if the only motivation is to reduce duplication. Coupling, Cohesion, Composition, and Abstraction are generally more important concerns than clean code. Sometimes simply by trying to remove duplication you find a better design, but sometimes you don't, sometimes you tangle your design with couplings that don't really make sense. However, after your higher concerns are taken care of, you should keep the code clean.
As a sidenote, what the author did by going to extremes and then backing out of it is actually a really good exercise in learning some of this stuff. We are fond of our "rules" for producing good software, they often encapsulate key insights. But there are no real rules, there's just the vast interacting world of these insights and how they play off against each other and how we socially interact with others with often similar but slightly different insights to create software as groups. It can be done many ways and should be a constantly evolving thing.
As usual with code principles/guidelines, it's about knowing when to apply them, and how much to apply them. In the particular scenario discussed in the article, I think the better option would have been something in between.
I think most devs go through learning phases with code guidelines, code patterns and the like, when they first hear about them, actively trying to crowbar them into every line of code, even if it results in more complex code - because they treat these things as laws, rather than guidelines.
IME, good devs eventually grow out of this, and from experience are much better able to apply guidelines and patterns when it's "suitable" to do so.
This problem is sometimes exacerbated by management, who like to focus on metrics. For example, a project I recently worked on used SAST (SonarCloud), with all sorts of arbitrary restrictions enabled: less than 3% duplicated code, minimum 90% test coverage etc. Predictably, it helped make the code more complex than it needed to be, and led to mock-heavy unit tests that didn't seem to actually test anything - they existed only to satisfy SonarCloud (and ergo, management).
I would even take your learnings from your experience a step further and say that one should be proactive in talking to teammates and exploring ideas on how to improve the codebase if you think there is room for improvement. For instance, it may well be that inheritance is the wrong pattern for that given codebase, but there are indications that the current architecture isn't scalable.
With more investment, more research could have been put into looking at better-suited architectures such as the Entity Component System architecture, where Entities such as Rects and Ellipses are composed of units of functionality (position, resizability, etc.), which is a proven architecture for this sort of application. Then implementation would be a matter of getting team buy-in / weighing the refactor pros and cons with your team.
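A bare-bones sketch of the composition idea in plain JS (no particular ECS library assumed; all names illustrative):

  // Components are plain data; entities are just bags of components.
  const position  = (x, y) => ({ position: { x, y } });
  const size      = (w, h) => ({ size: { w, h } });
  const resizable = ()     => ({ resizable: true });

  const rect    = { ...position(0, 0),   ...size(100, 50), ...resizable() };
  const ellipse = { ...position(10, 10), ...size(40, 40),  ...resizable() };

  // A "system" operates on any entity carrying the right components:
  function resizeSystem(entities, dx, dy) {
    for (const e of entities) {
      if (e.resizable && e.size) {
        e.size.w += dx;
        e.size.h += dy;
      }
    }
  }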
I really get annoyed by the lack of good tooling to deal with abstraction. I wish there were a way to view abstracted code with function calls (to some depth) inlined and macros applied. It would solve a lot of the difficulties with heavily abstracted code while preserving the advantages.
>Secondly, nothing is free. My code traded the ability to change requirements for reduced duplication, and it was not a good trade. For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
The opposite is also often true though. If the requirements had changed in a uniform way across the different shapes, then it would have been much easier to apply the modifications to the abstracted code. The trick is to have an idea of what kinds of modifications are likely in the future, and when you're not sure yet, avoid premature abstraction but ensure things aren't so ad hoc you'll have trouble abstracting in the future.
If you really want to remove duplication, try the other way first, like this:
where there is
// 10 repetitive lines of math
write a few helper functions, like
// 3 repetitive lines of math calling my helper functions
I don't know what the math was in this case, but removing duplication should preferably be handled in this direction, to avoid introducing more abstraction and, in this case, to keep a full separation of concerns.
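To make that concrete with invented math (again, I don't know the original):

  // Before: every handle case repeats ~10 lines of inline arithmetic.
  // After: the arithmetic gets names; each case stays separate and short.
  const shifted  = (pos, delta) => pos + delta;
  const clampMin = (v, min = 1) => Math.max(v, min);

  function resizeTopLeft(shape, dx, dy) {
    return {
      ...shape,
      x: shifted(shape.x, dx),
      y: shifted(shape.y, dy),
      w: clampMin(shape.w - dx),
      h: clampMin(shape.h - dy),
    };
  }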
I think the author got the wrong message from this situation. Yes, the refactor is bad, but I think they misidentified WHY it is bad.
> Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input. Even if it was an improvement (which I don’t believe anymore), this is a terrible way to go about it. A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
Yes, but that's a social problem not a code problem. It doesn't mean your code was wrong, it means the process by which you integrated that code was wrong.
> Secondly, nothing is free. My code traded the ability to change requirements for reduced duplication, and it was not a good trade. For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
It's telling that there isn't a code example for these "special cases". My guess is that allowing them as configuration points in the de-duplicated calls would not have been that complicated.
I have heard this "problem" brought up before, and in my experience, it's not nearly the problem people claim it is. Yes, there is a balance to be struck: you don't want to remove duplication before you understand what's really duplicated, hence the rule of three (and I'm even fine going well beyond three). But I've yet to work on a codebase that wasn't well on the "too much duplication" side of that.
The REAL problem I see with the refactor is that the new data structures don't correspond to business objects, the way the user thinks about them.
The OP is using clean as a figure of speech. He attacks the idea of clean code, not as the hygiene of the writing, but as the removal of what he thought to be unnecessary repetitiveness, only to later conclude that the repetitive version was adequate after all.
Why is repetition the opposite of cleanness in the first place?
What he calls clean code (and dirty code) here is modelling that piece of software with or without repetition, not legibility (which would be a more reasonable opposite concept of "clean code").
He reflects on one thing right, tho: the idea that he jumped into abstractions too soon, hence he refactored away repetition while injecting a model that was inadequate for what the project needed in the first place. And worse, he did that without discussing the design in advance with any colleague first (that was the biggest mistake IMHO), losing time and effort for all involved parties.
So the real underlying issue was that code repetition blinded him to design priorities. Nothing to do with the ability to create abstractions and write clean code on top of those abstractions. If you write clean code on top of the wrong abstractions, cleanness (or its opposite) will be irrelevant.
But a huge problem is that he again (with this article itself) jumps too soon to the wrong conclusions: attacking the skill of creating abstractions and the skill of writing clean code. Readers will be induced to confuse the real thing with the figures of speech he is forcing onto that text.
The skill of imagining good abstractions for general concepts and writing well are things that go way beyond writing software code (all professions need these), hence attacking them makes no sense at all, and a successful attack on them would only promote some degree of general confusion (including in areas beyond those directly affecting one's professional career).
Back to the anecdote, what would have been good?
1. Prioritize where you want the flexibility and power of the design first and care about luxury details like "code repetition" later.
2. Discuss with colleagues in advance (especially those who will review your merge requests) to agree on what's to be done and what's going to change.
3. Actually implement the changes and open MR.
PS: in favor of the OP, he was generous in sharing his experience so others can learn from it.
> Why is repetition the opposite of cleanness in the first place?
It's an idiom after the book by Robert Martin, whose central theme is heavily refactoring code to remove repetition, and in fact even non-repetitive code. The first few chapters hammer the concept that functions should be extremely short and aggressively refactored to compose them into subfunctions. It essentially coined the concept of "clean" code in these terms.
Good clarification! Thanks! I didn't read that one. Although I generally like Robert Martin, I really don't like it when figures of speech go too far.
If pushed one bit too far, it becomes propaganda in a culture war instead of a healthy, intellectually honest discussion. What we know as flamewars is an example of that.
Nevertheless, I used to work with someone at a startup who was crazy about over-cleaning the code. Every time, there was something that had to be cleaned, and when requirements changed or a new feature was needed, we had to do all the fancy stuff again because the over-cleaned code wasn't flexible enough.
I don't agree that the issues OP later discovered have anything to do with `refactoring` itself; it's more an issue of premature optimization.
My 2 cents: once an interface is defined, it shall not be modified without good reason. Even if you do modify it, you can still find some way to make sure it stays compatible with the original system. And I don't believe you can't extract some common behaviors from that repeated code and use them within the interface. Refactoring doesn't mean you have to rewrite the whole project; it can be done piece by piece. Reducing a line of duplicated code can save you a lot of effort maintaining the project over its life cycle. A lot of times I have seen a code change made to fix some bug, while the other places that duplicated the same original code were forgotten.
The examples here pale in comparison to what I see. Rather than dealing with "ugly" code, I often come across incorrect code. For example:
  function do_add(a, b) {
    return a - b; // bug: subtracts instead of adding
  }
This is an obvious mistake. One would think that the solution would be to simply fix the do_add function, but that will end up breaking a ton of other stuff that relies on the broken functionality, and instead what you get is this:
  function do_add_real(a, b) {
    return a + b; // the "fixed" version, added alongside the broken one
  }
Yes, seeing this in production code is infuriating. But simply "fixing" the issue is generally not possible.
A sign of a seasoned developer is to see this type of horrible code, take a calm breath and focus on the issue at hand rather than scrambling to fix something that works but is ugly.
I worked with someone who considered code repetition a force of evil that needs to be fought at all costs. Some of his refactors were valid, others not so much, and they made everything more awkward to work with and more resistant to change. It wasn't a pleasant experience.
> dragging each handle in different directions affected the shape’s position and size in a different way
The cause of the duplication itself is probably wrong: why would you do it in different ways? Treating ovals and rectangles the same way is simpler for the users.
> My boss invited me for a one-on-one chat where they politely asked me to revert my change
No justification given here? Is it because the boss did not give any, or is something omitted?
> Once we learn how to create abstractions, it is tempting to get high on that ability.
People should start forbidding themselves to use that "abstraction" word for anything. That's just code factoring.
> A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together
It seems here that this was taken as a hostile overwrite. Nothing to do with "abstractions", "clean code" or factoring.
The problem here is not "building trust"; the problem is that they redid something that was already done, without first discussing the issue they saw in it.
This is a methodological, not social, mistake.
> How exactly do they affect the way the code is written and modified?
The usual FUD, the actual trap many people fall in. You cannot predict the future. You should not try to predict the future, that's not your job.
I understand some may be confused by the fact that the etymology of "programming" means "write before" or "write ahead of time", but the meaning is that we write down what something is going to do; we are not actually writing what will happen for things that are not under our direct control.
> Let clean code guide you. Then let it go.
No. What they call "clean code" is doing the right thing at a specific time. If the requirements change, then do the right thing again. That's called code maintenance.
1. If you broke some code, don't generalise it as an inherent problem of clean code. Just find out what exactly you did wrong.
2. If the only reason you refactored it was that you felt it wasn't clean, don't do it - that code doesn't look terrible to me anyway.
3. To do this kind of refactoring you need good test coverage. So, start there. This also makes sure that you understand the specs well enough to refactor.
4. Clean code is good. It's a skill to know what is good code and what is bad code, what is better code and what is worse code and when to give in on clean code and compromise. Practice it.
I think saying repetition makes code worse is like saying repetition makes prose worse. Both code and prose convey ideas. Repetition is a tool. You can overuse it, and you can underuse it. But it is a tool. Not a flaw.
Is this the same disease that leads to the love of Haskell by those particular zealots (some, not all, Haskell programmers) who write yet another monad tutorial to spread the word of the one truth without writing any useful programs? Seeing "ah, this is sequencing, so I can abstract that..." and so on. Haskell gets you to higher and higher levels of abstraction in your programming but doesn't seem to guarantee you'll get a useful program at the end of it.
(Yes, there are useful Haskell programs, obviously. Just fewer of them than monad tutorials, by at least an order of magnitude.)
There are many times "clean code" is a good goal to have. It depends on what you mean by "clean code" and what you mean by "deduplication". As with many things in software engineering, the best answer to whether you should strive for "clean code" is "it depends."
If you have a utility function that you are redefining in every file, literally copy/pasting it everywhere, then you might as well create a shared function and import it and use it everywhere instead. This is an example of a good goal.
>Obsessing over clean code is like reorganizing your clothes closet on a daily basis. If it makes you more productive to do so, do it. Too often, however, it's done compulsively and is counter-productive.
I think, for me, I am often on two sides of the situation.
1. I'm moving fast and writing code to get things done for the customer and the product.
2. I have enough time to slow down and write or refactor "testable" code.
Often 1 results in untestable code. Often 2 results in a refactor or most likely a rewrite to create testable code.
This is an important difference. Optimizing to write code just because you think it's clean (in the sense of shorter, reusable code) is often confusing to most. It also leads to pull requests that grow large and are hard or time-consuming for others to review.
I'm finding this kind of issue coming up a lot with the current abhorrence of polymorphism and inheritance.
I like using interfaces and protocols, but I also still very much use inheritance. It's a fundamental tool that was invented for a reason, and, sometimes, it is the best tool for the task.
I've probably weathered just about every "paradigm shift" that has happened in software development. At one time, using variables with names longer than four characters was considered bad programming.
Anyone remember GOTO?
Some older constructs (like the two above): good riddance. Others...not so much. Structured Programming, which was declared The Mark of Satan, at one time, is still very much the basis for all our work.
I love a lot of the new tools and techniques, but I still mix in a lot of the older stuff when I write software. To some, this is "unclean," because it doesn't tick some arbitrary "büzzwürd du jour."
Simple, solid code is always a great starting place.
The author talks about removing repetitiveness (DRY). I think that's excellent, but, in my experience, I need to be very, very careful when I do that, as the original author may have tweaked just one little line, in one of the clones, and my refactoring may break things; sometimes, not until it's been out to the customers for six months.
That's pretty much de rigueur for any refactoring; not just DRYdock. In my experience, having some robust unit tests and test harnesses in place is absolutely required (and development branches -yay new-fangled VCS!).
I tend to write code iteratively. I'll start with some naive, sloppy code that works; maybe not well, then refactor it in stages, testing the heck out of it; each time.
I used to work for a Japanese company. I had many differences with my Japanese peers, but they were the most disciplined programmers I've ever encountered. Whenever they would modify code, they would leave the old code in there, but commented out, and add some comments, explaining what their new code does.
Made for some pretty verbose source files, but it was immediately apparent what was done, and why (I think the practice began before most good VCSes were invented). It also gave you the original code to copy and paste, if necessary. Very old-fashioned, but it made their changes (and bugs, therein), easy to understand. It also helped because the code was often stepped on by many programmers.
I'm thinking that a lot of folks are relying on commit comments to explain changes; which is good, but adds extra time to figuring something out.
The thing about the handling of different shapes is that they have idiosyncrasies. Resizing one side of a square resizes all of its sides; resizing one side of a rectangle resizes two out of four sides; resizing one side of an arbitrary quadrilateral only resizes the one.
These are very different abstractions, with different behaviors. Even though all the shape functions technically use the same code, they're not conceptually doing the same thing. Linking these abstractions by way of a shared extracted function is inviting bugs.
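Sketching the difference (shape representations invented for illustration):

  // The same drag gesture is three conceptually different operations:
  function resizeRightSquare(sq, dx) {
    return { ...sq, side: sq.side + dx };   // all four sides change
  }
  function resizeRightRect(r, dx) {
    return { ...r, w: r.w + dx };           // two of four sides change
  }
  function resizeRightQuad(q, dx) {
    // assuming points[1] and points[2] form the right edge:
    const points = q.points.map((p, i) =>
      (i === 1 || i === 2) ? { x: p.x + dx, y: p.y } : p);
    return { ...q, points };                // only one edge moves
  }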
The author seems to only look at a narrow case that supports his argument. Imagine you start out with 2 shapes and 4 directions, but you eventually get additional requirements for 20 more shapes and 4 more directions. Now you have much more code, introducing room for errors, making it harder to change, and making it harder for new employees to understand.
I agree that he should have told his coworker ahead of time, but I also think his coworker should be open to understanding why his code may cause problems and be ok with it being changed.
One way I think about this. It seems the earlier version of the author's self was thinking about the state of the code as a sequence of atomic states. When you start thinking about your code more as a constantly evolving organism, the BEST version of a particular section of code isn't necessarily what is BEST at the moment. But because there are logical branches that your path can take as the software evolves, your code at this moment is best when it can more easily accommodate the best path to the future.
The only cost of duplication is paid when you're making changes to it. Let's say that the cost of typing, checking, and finding the code to change is "t". Duplication makes it "n*t". Abstraction, when there are no special cases, keeps it at "t".
So if "n" is somehow big, it'll be costly to make changes and abstraction makes sense.
However, do not abstract similar-looking code with different behavior early (as in OP's case); do abstract similar behavior early (e.g. file system or db access).
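For instance, a thin storage wrapper, sketched against a hypothetical db.query handle:

  // Abstract shared *behavior* early: every caller genuinely wants the
  // same semantics (read a value, fall back to a default).
  const store = {
    async get(key, fallback = null) {
      const rows = await db.query('SELECT value FROM kv WHERE key = $1', [key]);
      return rows.length ? JSON.parse(rows[0].value) : fallback;
    },
    async set(key, value) {
      await db.query(
        'INSERT INTO kv (key, value) VALUES ($1, $2) ' +
        'ON CONFLICT (key) DO UPDATE SET value = $2',
        [key, JSON.stringify(value)]
      );
    },
  };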
The article didn’t mention if the team that wrote the dirty code had done a code review. If they did, it doesn’t mention whether the author of the article looked at the interaction between the code author(s) and the reviewer. Perhaps this was discussed and the repetitive code accepted with good reason?
When code is merged, it should be too late for further discussion. The review is done, and flaws like this are debt that should be taken care of the next time someone needs to touch it.
Sorry for the off topic grammar question, but am I the only one who finds it confusing how people have started to use plural pronouns to refer to individual people?
> My boss invited me for a one-on-one chat where they politely asked me to revert my change. I was aghast. The old code was a mess, and mine was clean! I begrudgingly complied, but it took me years to see they were right.
The first they sounds like the boss and the colleague were in the same room, but it can't be, because he says one-on-one. Still, presumably both the boss and the colleague wanted the revert. But the second they is really ambiguous. Was the boss right, or the colleague?
Probably, I didn't even notice it in the OP, and if some other reader noticed anything, they were probably able to read on without much confusion anyway.
(Did you find that one right there above confusing? I suspect not?)
Indeed. Maybe my English skills are fading with time, but my impression was that it works there because it's unclear who you're talking about. "Some other reader" is singular, but multiple people could have noticed it, so it makes sense to use the plural there. It's kind of like how in American English they say "France _has_ won the WC", but in British English they say "France _have_ won the WC" (because a team has multiple people).
But saying "The boss fired me. They are awful." sounds no different than "The boss fired me. He are awful." or "They is awful." My auto parser is failing and I have to think it through.
Note of course you don't say "they is awful", you still conjugate they with "are", even when singular. For an unknown person, for a specific person, either way.
There's a balance, as is often [always?] the case with inversely related factors like code "cleanliness" (hard to talk about, since it's a little vague of a word) and developer "velocity": pristine code can go awry and become "fine china" code that's too verbose or restrictive to effectively change and maintain; likewise, spaghetti code (maybe the opposite?) needs no introduction.
One similar experience I had when I was starting out was when I created a PR and my colleague, instead of suggesting changes, made the changes himself, then approved and merged my PR. I told him directly that it'd have been nice if he'd suggested the changes instead of making them himself. He thought I was being protective of my code. I was never able to have a good working relationship with him after that and eventually left.
My experience is that good programmers prefer extra abstraction and bad programmers prefer extra repetition. Given that we need to work with bad programmers, sometimes it makes sense to make things easier for them. But repetition is wrong and a source of bugs. Each time you give in and keep a repetition that you know should not be there, you are crippling yourself and helping make us programmers a commodity.
Clean code is not about eliminating repeated code. Clean code is about readability. It's a not a rule. It's a programming philosophy.
People set rules but we should always see the reasoning behind the rules instead of just following them blindly.
Personally every time I refactor some piece of code, I think "Does this make code easier to read and understand? Will this save me and my colleagues time in the future?".
I would just like to take a second to compliment this website. It loaded instantly even in that weird broken WebView that Materialistic uses, the dark mode switcher was nice and obvious but not in my way and switching between pages is faster than basically any other action I can take on my device (local, native apps take longer to switch tabs than this website). Well done!
Duplication isn't necessarily 'messy'.
In the example, it's very easy to see what's going on, even if it's repetitive, since it's fairly uniform.
Adding an abstraction layer often adds complexity, and really does get messy/complex if you subsequently have to deal with special cases and exceptions.
I think this guy learned exactly the wrong lesson. The lesson he ought to have learned is to avoid premature optimization: he tried to refactor the code too soon after it was originally created. He should have waited a while, so that the additional requirements were fleshed out, before trying to create abstractions.
I know it's completely tangential to the point of the article, but is anyone else wondering why it took a week to implement scaling objects in a visual editor? The codebase? The coder? Documentation requirements? A combination of things?
I want to understand what causes projects to lose velocity, and try to solve those issues.
If required to test all those duplicate lines, the author would have made the right abstractions in the pursuit of laziness/efficiency.
Overstepping and not going through the proper code review process isn't a reason to ditch reasonable abstractions. There's definitely some conflating of topics here.
To me, clean code is a separate concern from coherent interfaces. Your change changed the interfaces and although I don't have deeper context, I personally prefer the simpler original interface. Have good interfaces first, then use clean code principles to handle implementations
I would also say it is a waste of effort to spend a late night rewriting existing code that actually worked. Why waste your time on over-polishing when s/he could have left work early and gotten a good night's sleep? I think that is the more important perspective.
> We were working on a graphics editor canvas, and they implemented the ability to resize shapes like rectangles and ovals by dragging small handles at their edges.
It's a dirty shame their development team had to build a graphics editor canvas from scratch.
It would be a shame if this article about specifically DRY created negative sentiment towards the Clean Code Book as a whole – which is full of great advice, not only related to duplication, and a worthwhile read.
There is one issue that I haven’t seen mentioned: testing.
The real problem with duplicated code in my opinion is that it either remains untested or bloats your test suite. And a bloated test suite can slow you down to a crawl.
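To make that concrete, here is a minimal sketch (hypothetical clampWidth/clampHeight helpers, invented for illustration, not from the article): with two duplicated copies of the same logic, every shared behavior has to be asserted against both copies, so the test table effectively runs twice.

    // Hypothetical duplicated helpers: each copy needs its own tests,
    // so the suite grows with every pasted copy.
    function clampWidth(w: number): number {
      return Math.max(1, Math.min(w, 4096));
    }
    function clampHeight(h: number): number {
      return Math.max(1, Math.min(h, 4096));
    }

    // The shared behavior has to be asserted once per copy.
    const cases: Array<[number, number]> = [
      [-5, 1],      // below minimum clamps up
      [100, 100],   // in range passes through
      [9999, 4096], // above maximum clamps down
    ];
    for (const [input, expected] of cases) {
      console.assert(clampWidth(input) === expected, `clampWidth(${input})`);
      console.assert(clampHeight(input) === expected, `clampHeight(${input})`);
    }

Extract the clamp once and the table runs against a single function; leave the duplication and either the suite doubles or one copy silently goes untested.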
The point of DRY is to avoid having to redo code review and security analysis, which can be quite lengthy and onerous. Fixing bugs in many places is another thing, but not the main thing.
In a nutshell this captures my argument against functional programming, design patterns, and this whole code fetish cargo cult.
I care about what the code does, not what it looks like. If two different things compile to the same hardware instructions, then it makes no difference to me. Repeat yourself if you want. Make a bunch of indirections via factories and generics if you want. Use a bunch of ugly branching if you want.
It's all so anachronistic. We have self-driving cars, VR, drones, neural nets, machines that can write code themselves, post-quantum cryptography--and you're lecturing me about the $(No One Cares) Pattern from the hottest bestseller of 1994?
There is never a golden rule, not even "the rule of 3". Sometimes experience, because you have seen the exact same pattern before, can trump the rule of 3, and you can abstract it on the first try.
Clean code, as regards duplication, should focus on duplicate code that needs to change at the same time in all X places it is duplicated.
If the code seems similar but the instances actually have NO correlation, then it's not duplication; it just happens to work the same way.
All this discussion makes an internal voice in my head scream: "They need Lisp macros!"
But the world moved on from macros into... this.
Edit: My reasoning is as follows: the comment that says:
// 10 repetitive lines of math
is exactly the line that would have been replaced by a macro in the original code, and everything would have worked the same, as macros don't modify the stack the way a function call does. Unless you want them to.
TL;DR
1) We should not obsess over clean code; we can't agree on what clean code is.
2) We should also not write dirty code; we can't agree on what dirty code exactly is, either (see all the counterexamples).
There are few rules of thumb that are always true with regard to code quality. Getting sucked into a job or argument where code cleanliness is the #1 metric, and by extension the "version" of clean code that your boss prescribes, is a hellish way to make a living.
We can usually identify and agree on the two extremes:
The very very very good code, and the very very very bad code... anything else, and most of us coders are in the middle, and we would be wiser to focus on making the code work than on chasing a Zen-State-Of-Compilation-And-Code-Quality. It is useless when code exists ONLY to transform data... I've never bought a product based on code quality.
Isn’t the only reason we care about duplicate code that we don’t want to miss an instance of it for enhancements or bug fixes down the line? Don’t we have technology that helps us reduce that concern?
Yeah, it's not about text/code duplication so much as behavior duplication. In theory two (hopefully three or more!) code paths share the exact same behavior, and if that behavior ever changes it will need to change EVERYWHERE, which is why you introduce a method to abstract it away.
Depending on the case, though, the indirection trade-off may not be worth the abstraction needed to DRY it up. Then you end up with code like what was popular in the bad old days of Ruby. Essentially, you can get carried away.
This may not come across well, because it is related to very large CAD models. Think rockets, cars, ships, airplanes: that kind of large.
In the assembly paradigm, each component has 6 degrees of freedom. On average, a good human can mentally process a handful of components. Beyond that, it gets very hard to balance all the degrees of freedom.
Solvers also do not scale well once things reach the hundreds, sometimes low thousand component level. Linear compute performance bound. Parallelization is hard and a focus of current development.
Very large assemblies present problems similar to the ones seen in large software.
The dominant way of building big models, assemblies is to compartmentalize sub systems, build to common interface points, and only constrain important component groups.
Resolving all the degrees of freedom requires making big investments in time and compute power that only pay off when things change.
Almost never makes sense.
The balance is making those investments where change is known or predicted with high confidence.
Otherwise, it is easier to defer this task and work with common assumptions, until such time as a more complete model is warranted and indicated.
It is always a hard balance to find because the allure of "handling the future changes" now is seen as a savings when the truth is almost always a loss.
Those efforts become debt in a few ways:
- Big investment in a change-ready model sees changes that lie outside the scope of the already-implemented model dynamics. Big refactoring needed; 2x if not more work required.
- Model actually defined incorrectly. Creates problems that would not otherwise exist. These are hard to find, due to massive compute and human time spent simulating and debugging.
- Model overly compartmentalized, due to compute limits and too many assembly model constraints. Makes reasonable changes difficult. Refactoring is needed.
- Inability to operate with the full model, or large fractions of it, due to inability to compute it reasonably.
Finally, where the large model is made across various CAD systems, designing in place, just putting things relative to common assembly mating points, actually is the most efficient way!
When a dynamic assembly model is needed for analysis, some other simulation-related task, or changes, making it right then, and even sometimes throwing it away when the work is done, can make the best sense, with only the final result kept for the future.
None of this is intuitive at first. Everyone will tend to over-model, until they hit one of the pain points, have to over-correct, and work through it all.
Code should be easy to debug and troubleshoot six months later by someone else.
Think about realistic bugs. Somebody had a minor typo or conceptual error and got a sign wrong in the math. That happens.
You're going to get a bug report a year later about "resizing the top left of a text box has the wrong gap space".
In the original code, you can look at the TextBlock area and resizeTopLeft to instantly narrow down the location of the bug, and compare it to its neighbors like resizeTopRight. In like 30 seconds: duh, you subtracted the gap from the X in resizeTopLeft just like in resizeTopRight, and obviously the symmetrical version should be adding the gap on the left, because you subtracted from the X coordinate on the right and they're symmetric. Build, test, now it moves two pixels in the correct direction, commit, close bug, done in like five (labor-cost-expensive) minutes, depending on your build and test system, LOL. You should also look into why the unit test system missed this, but perhaps your fully automated unit testing does not check pixel-perfect UI operations, so these things will just happen.
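As an illustration only (hypothetical resizeTopLeft/resizeTopRight handlers and GAP constant, not the article's actual code), this is the kind of symmetric sign bug where the neighboring handler gives the fix away:

    // Hypothetical resize handlers for a text block with gap spacing.
    interface Box { x: number; y: number; w: number; h: number }
    const GAP = 2;

    function resizeTopRight(b: Box, dx: number, dy: number): Box {
      // Dragging the right edge: the gap is subtracted along +X.
      return { x: b.x, y: b.y + dy, w: b.w + dx - GAP, h: b.h - dy };
    }

    function resizeTopLeft(b: Box, dx: number, dy: number): Box {
      // BUG: copied the "- GAP" from resizeTopRight; the mirrored
      // handle should add the gap, since X grows the other way.
      return { x: b.x + dx, y: b.y + dy, w: b.w - dx - GAP, h: b.h - dy };
    }

Side by side, the asymmetry is a 30-second diff; folded into one generic handler parameterized over direction, the same mistake hides inside a sign formula.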
Is there anything for Jenkins that implements pixel-perfect UI testing, and how would one spec all the millions of possible actions and combinations of actions? A startup opportunity for someone?
In the new code, possibly the rendering of everything is messed up, and you have to decode, store, and analyze the entire design of the entire system in your head simultaneously, and run numerous simulation examples in your mind, until you realize that technically the gap spacing should be -1 times the absolute value of the X coord, or whatever abstract and elaborate formula. Or maybe you added the absolute value of a negative number, which would only affect one left/right side's gap, or a similarly complicated bug. It's going to be a VERY expensive bug: an hour, maybe a day if it's really mysterious and the debugger just doesn't "get" obfuscated code.
Also, in the original code you can trivially compose one discrete example at a time and experiment with what changes when implementing the next example. It's going to be written and tested very quickly. You don't have to worry about symmetry-related bugs to make TopLeft work at the same time as TopRight and BottomLeft and all that.
In the new code, it's very impressive to other programmers, but your boss is going to notice that you have to pack the entire system into your head in a perfect and 100% correct manner before anything works at all, which seems inefficient.
The niftiest-looking abstraction possible isn't necessarily optimized for anything other than being the niftiest abstraction. It's statistically unlikely to also be hyperoptimized for X when important values of Y, like debuggability or speed of writing, are the actual real-world goals.
Basically, always simplicate and add lightness; in some situations, "clean code" is not simple and light. Sometimes clean is obfuscated, from some perspectives.
He made a tradeoff that later on turned out to be wrong. I don't think that is a mistake, after all, it wasn't foreseeable at the time.
It could just as well have happened that some other requirement would have made the original solution even more impractical.
I think the only mistake here might be overthinking things (can't comment on changing other people's code, depends on company culture).
Apart from "cleanliness", what about developer happiness? If a more elegant solution makes you happy, why not go for it, at least every once in a while?
I constantly see devs applying DRY in ways that conflict with the "single responsibility principle". It's like they just stop at DRY because it's a simple concept to understand and relatively easy to defend. I've seen people essentially labeled as heretics for suggesting that DRY (or any SOLID principle, for that matter) should not be blindly applied to every piece of code.
So the two cases against writing the most legible, succinct code given the specifications at the time of writing it are:
>Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input. Even if it was an improvement (which I don’t believe anymore), this is a terrible way to go about it. A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
There's no question about this. Nobody likes the self-proclaimed savant who works in isolation and makes sweeping changes to the codebase or other people's work without collaborating and gaining some consensus. If it's a change worth making, it should be a simple case to present to your (hopefully) equally intelligent team.
There is a difference, it has to be highlighted, between refactoring someone's code in order to extend it yourself and simply re-writing someone's implementation because it doesn't suit your requirements. The former is part of the job; the latter should at the very least be an opportunity to mentor the person whose code you want to re-write in why it was suboptimal and guide them on the changes you'd like to make, or even give them the chance to make the changes themselves. This is kind of what code reviews are supposed to do.
That does not negate the need to structure and optimize code to remove duplication whatsoever. It's not an argument against clean code standards, and it's weak that it amounts to 50% of his case here.
>Secondly, nothing is free. My code traded the ability to change requirements for reduced duplication, and it was not a good trade. For example, we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
Times change, requirements change. It's part and parcel of our jobs in software development. Code that is optimal and most legible for the cases present at one point should also be written to be refactorable and extendable.
It is much easier to refactor and extend code that isn't riddled with duplication and mangled with hardcoded business logic. Abstract your code and write your implementations well, name things in a way that people can read it and write tests that describe what's expected from it.
Refactoring well isn't easy work. Refactoring a sprawling legacy codebase with a lot of duplication and legibility problems is significantly worse.
I'm not saying we need to be dogmatic here. If you're given the opportunity to develop new code you should be aiming to do the best job of it given what you know now, in a way that will be comprehensible to you, or whoever needs to touch that code next.
We all know that there are problems with premature optimization, caused by "best practices" evangelists who'd happily drive up time-to-market and operating costs/complexity exponentially in the name of keeping the codebase and applications/services architecture in line with whatever he or she has read lately from "thought leaders" in our industry. But writing the code for a given application in line with the above isn't one of them.
The code is not large enough to need maintenance at a fine-grained level.
There is a secondary rule to the DRY "rule of three": If I can blow it away and rewrite it so easily, there is nothing to reuse or refactor in it. The feature is done, and we are into code golf and speculation, neither of which are productive uses of time. In my experience, speculative refactors like the one the author made succeed perhaps 50% of the time, so no better than the initial strategy. It's the requirements themselves, and the application of techniques to avoid various classes of errors, that give code direction and structure, not the aesthetics at a moment in time (which is what the author took issue with).
If you spot multiple approaches on the first try, you can add a comment with a date outlining alternatives so that the conversation may be resumed later when the new requirements come in. But at all times you're always at the mercy of "discipline", and there's no preemptive measure that avoids that.
> There is a secondary rule to the DRY "rule of three": If I can blow it away and rewrite it so easily, there is nothing to reuse or refactor in it.
This rule seems not to be correct, though. For example, it would mean that one of the most common and widely accepted changes (as far as I know, and admittedly I know nothing), replacing an explicit for or while loop with some kind of iterator construct, for example a foreach or even a call to a map function, was a bad idea. By and large, most individual for loops are pretty easy to understand and rewrite if you look at them. The first problem is that naturally one hardly ever has to read just a single for loop, and the impact of small insults to abstraction and readability really adds up when repeated tens to thousands of times in a codebase. Second: that a piece of code is easy to read, blow away, and rewrite is very far from a guarantee of no bugs in either the old or the new version, and AFAIK the history of the vanilla for loop is a classic example of that. Again, the impact of this depends on the fact that the for loop can be repeated many times in a codebase.
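For instance, a minimal sketch (a hypothetical prices example, not from the thread): each explicit loop is trivial to read and rewrite on its own, but every copy re-implements index and accumulator bookkeeping that the iterator construct simply removes.

    const prices = [9.99, 4.5, 12.0];

    // Explicit loop: easy to understand in isolation, but the index
    // and accumulator bookkeeping is repeated at every such site.
    const withTax: number[] = [];
    for (let i = 0; i < prices.length; i++) {
      withTax.push(prices[i] * 1.2);
    }

    // Iterator construct: the same transform with no index to get wrong.
    const withTax2 = prices.map((p) => p * 1.2);

One such loop is harmless; five hundred of them are five hundred chances at an off-by-one, which is the "adds up across a codebase" point above.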
OTOH the example code in TFA was not repeated with (or without) small variations many times in the program. (I'm not talking about how often it was called or about repetition inside the example code here, ofc.) So your test probably does correctly show that changing or not changing this piece of code on its own is only a small-stakes decision, unlike having say 500 vanilla for loops in the program. But if you consistently let individual bodgy code segments pass then surely you're liable to end up with a large and diverse body of them in the codebase, and that's in some ways even worse than 500 for loops, which can at least all be found with a simple search.
The author also didn't sound like a particularly senior engineer at the time for many reasons. So the original code author and the "boss" may have been taking into consideration timelines and future work/requirements coming down the pipe.
A very valid reason could have been as simple as: "We are re-visiting this in a couple of sprints, after feedback, and will have a better idea of how it needs to change. The extra day spent on this wasn't worth delaying getting it into people's hands, and we don't know if it would be a waste." The author would have known this if he had started a conversation about it.
I heard something like: "The second system you design will be the most over engineered piece of shit ever"
I don't know who said it but it has been very true for me and my close friends who work in software development. I remember first starting software development and I started to read up on "how to do it right" in the Java/C# world back when XML was everywhere.
I had first started to expand my skills after university by building my own blog (who didn't at that time?) but thought I should rebuild it according to "best practices".
Hoooooly shit, that was a poorly architected and designed piece of software. The example in the blog post was of course not as poor an example as my creation, but I feel that many end up in this trap once they have some experience: they feel they need to do everything "right", but they don't have the experience to evaluate whether it is worth it.
However, I also think a good workplace has a healthy mix, because those youngsters will also push the old guard to learn new things and introduce new technology. You just need a balance between using 0.1-alpha libraries and things that were released 10 years ago.
> It is much easier to refactor and extend code that isn't riddled with duplication and mangled with hardcoded business logic. Abstract your code and write your implementations well, name things in a way that people can read it and write tests that describe what's expected from it.
Especially in a statically compiled language: "extract method", "extract class", "pull members up", etc. are automated, guaranteed-safe refactors (ignoring reflection).
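A quick sketch of why such a refactor is mechanically safe (hypothetical createUser/renameUser functions, invented for illustration):

    // Before: the same validation repeated inline at two call sites.
    function createUser(name: string): void {
      if (name.trim().length === 0) throw new Error("name required");
      // ... create the user ...
    }
    function renameUser(name: string): void {
      if (name.trim().length === 0) throw new Error("name required");
      // ... rename the user ...
    }

    // After "extract method" (suffixed names only so this sketch compiles):
    // the helper has an explicit signature, so the compiler re-checks
    // every call site when the refactor is applied.
    function requireName(name: string): void {
      if (name.trim().length === 0) throw new Error("name required");
    }
    function createUser2(name: string): void { requireName(name); /* ... */ }
    function renameUser2(name: string): void { requireName(name); /* ... */ }

The IDE does the textual transformation, and the type checker confirms every call site still compiles, which is what makes it safe to do in bulk.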
Of course you're going to have scalability problems if you don't write a behavioral specification. There are consequences to software architecture choices, and charging ahead without thinking (aka hacking) isn't how you write production code.
The author not only fails to explain why his refactored code is worse, he doesn't even show all of his code. There is simply not enough information to know whether the refactoring is better or worse. In terms of fewer lines of code and more code reuse, it is better. I suspect the author is pretty junior himself, to imply the old way is better than his refactoring without a clear explanation as to why. I guess putting your baseless opinion on a fancy blog or Medium makes you seem more legit than you really are.
I get his point, however. In the spirit of his argument, there is an exact answer as to why one way of designing a program is better than another, despite more/less code reuse or more/fewer lines of code. I'm not even going to get into legibility here, as that is just opinion. And I'm going to give a very concrete answer: no design principles, no design philosophy, none of that.
Let's say you have feature which we call "A" that can be created as a composition of several primitives. We call these primitives "a, b, c, d." Let's also say feature "A" can be constructed from a different set of primitives "b, f, g, a."
Note the overlap in primitives. The overall set of primitives are different but both ways of constructing feature "A" can share the same primitives if needed. The two sets in the example share primitives "a" and "b".
Now let's say we want the code to be flexible enough to construct feature "B" or feature "C" sometime in the future.
Let's say feature "B" needs primitives "a, d, b" to construct.
Let's say feature "C" needs primitives "b, f, a" to construct.
Which method of constructing feature "A" is better, knowing that you will need to construct feature "B" in the future? What if it was feature "C" instead?
Obviously, depending on how flexible you want your design to be, you can choose one way or the other to initially construct the program. It's all opinion, and anticipation of how you will need your program to evolve.
One strategy to remove opinion from your design is to try to incorporate the full union of primitives "a, b, c, d, f, g", or to find another, completely different set of primitives (perhaps "h, i, j") that can be used to construct features "A", "B", and "C".
So the concrete answer: "h, i, j" is the best set of primitives you can use to construct your program, but if "h, i, j" aren't available, then your choice of "a, d, b", "b, f, a", or the full union "a, b, c, d, f, g" hinges on whether you need to construct feature "B" or feature "C" in the future.
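As a toy sketch of this framing (the primitives are hypothetical functions named only to mirror the comment):

    // Hypothetical primitives: each one transforms a document slightly.
    type Doc = { steps: string[] };
    const mk = (name: string) => (doc: Doc): Doc =>
      ({ steps: [...doc.steps, name] });
    const a = mk("a"), b = mk("b"), c = mk("c"),
          d = mk("d"), f = mk("f"), g = mk("g");

    const compose = (...fns: Array<(doc: Doc) => Doc>) => (doc: Doc) =>
      fns.reduce((acc, fn) => fn(acc), doc);

    // Feature "A" constructed two different ways, sharing a and b.
    const featureA1 = compose(a, b, c, d);
    const featureA2 = compose(b, f, g, a);

    // Future feature "B" needs a, d, b; future feature "C" needs b, f, a.
    // featureA1's primitives already cover "B"; featureA2's cover "C".
    const featureB = compose(a, d, b);
    const featureC = compose(b, f, a);

Whichever construction of "A" you ship, the primitives it leaves lying around determine which future feature is cheap, which is the whole bet being described.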
The problem with this article is that he never got into the nature of program design/organization, just vague reasoning and hand-wavy examples.
A good post about someone learning that writing code is about more than just how pretty it looks. How you work with others is incredibly important in this industry.
Is that not common knowledge? I feel like this is a well-written post with a good point, but it's a familiar point. You could boil part of it down to "all's good in moderation": don't just keep your code clean, keep it clean and easy to read, etc.
It's a great post anyway, sorry if my tone was too critical. I enjoyed reading it, for what it's worth, just felt like expecting anything surprising or revelatory out of it wouldn't be the right approach.
Leaving common knowledge undocumented is, in itself, a misguided attempt at DRY.
Your common knowledge isn't necessarily everyone's common knowledge; what you perceive isn't necessarily what it is.
Every day there are many people just starting to code. There are cultural and regional differences. They might not have hit the exact spot in which this information appears.
Knowledge also disappears, fades, gets censored or destroyed, gets mixed up and remembered incorrectly. You might take something as a given and misremember it 20 years from now. You might read your own code 5 years from now and not know why something's there — something that was taken to be obvious.
It definitely is something that needs to be said, as it is not uncommon to find people doing the opposite: taking a rule-of-thumb and asserting that it is the one true way. The people who are most likely to make this mistake are smart and well-motivated, enthusiastic about the power of abstraction to simplify things, and have some experience but not a lot.
It wasn't worth deconstructing their "logic". Rather, my implicit message was 'if you find yourself disagreeing with someone likely better than you at X, step back and assess the epistemic foundations of _your_ belief.'