Redundancy vs. dependencies: which is worse? (2008) (yosefk.com)
98 points by ColinCochrane on Nov 3, 2017 | 29 comments



My personal view is that redundancy on stupid things is okay. That is, if something is very simple, you should be okay with writing it over and over in multiple modules if you need to.

The experience that led me to this was a particular team I was on where a previous developer had said "aha! All of these pieces of code relate to the same domain model, so we should build a common package of domain model objects and have all the modules share those objects! That way, we don't have to write them all over and over again!"

These weren't just model objects for data passing, but actual business logic inside these models. Usually very simple stuff, but occasionally a particular module would need some special logic on those objects, so in it went.

The problem was that over time, the modules started drifting. Their domains diverged ever so slightly, and the owners of those modules became different teams. Now if you wanted to modify the domain object, you had to check that it wouldn't break code the other teams were developing.

It became a real mess. And all to avoid writing the same simple POJO classes more than once.
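To make the failure mode concrete, here is a minimal Python sketch with invented names (not from the project described above): a shared domain object that accretes one module's logic, versus each module keeping its own trivial data class.

    # Hypothetical sketch of the anti-pattern: a "shared" domain object that
    # slowly accretes logic only one of its consumers actually needs.
    from dataclasses import dataclass

    @dataclass
    class SharedOrder:
        order_id: str
        total_cents: int

        def apply_discount(self, percent: float) -> int:
            # Generic logic everyone agrees on.
            return int(self.total_cents * (1 - percent / 100))

        def loyalty_points(self) -> int:
            # Only the rewards module cares about this, but every team that
            # imports SharedOrder now has to review changes to it.
            return self.total_cents // 100

    # The "redundant" alternative: each module keeps its own trivial data
    # class, duplicating a few lines but owning its own evolution.
    @dataclass
    class BillingOrder:
        order_id: str
        total_cents: int

    @dataclass
    class RewardsOrder:
        order_id: str
        total_cents: int

        def loyalty_points(self) -> int:
            return self.total_cents // 100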


Something that helped me a lot was somebody pointing out that it's OK for code not to be 100% DRY - sometimes things are the same at a given point in time only by coincidence, not by some inherent logical connection. It's a mistake to refactor and remove these "redundant" parts because they are not truly redundant. This helps me chill out when deciding when it is useful to add an abstraction or a separate helper function or whatever.
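A small, hypothetical Python illustration of "the same only by coincidence" (the function names are made up):

    # Two validations that happen to be identical today, but only by
    # coincidence: the business rules behind them can diverge.
    def validate_shipping_address(address: dict) -> bool:
        return bool(address.get("street")) and bool(address.get("postal_code"))

    def validate_billing_address(address: dict) -> bool:
        # Identical right now, but billing may later require a country for
        # tax purposes while shipping does not. Merging them into one helper
        # would couple two rules that merely look alike.
        return bool(address.get("street")) and bool(address.get("postal_code"))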


Indeed. I don't know if it's my age or my domain, but this is something that comes up a bit in the design of my own projects and code, and that I try to get younger programmers to think about. It might look weird to those who have rote-learned the "don't repeat yourself" mantra, or the culture of "just import everything from modules". Sometimes it's responsible to repeat yourself if you have good reason to believe things will shift/change in the future.

A lot of young data scientists and analysts (and IT types in general) code in a way that solves the immediate problem.

But with a bit of time you start to realise that the initial brief is only ever part 1; an executive/customer will change their mind 15 times before the end of the project. What seems like the same problem now will not be the same problem in two weeks, let alone two years. Doubly so if you're interacting with entities or sources that aren't software engineers.

Over the long run, excessive modularisation creates what I'll call Frankenstein programs. You've been tasked with making a man, and what you end up with is a shambling golem made from rotting, pulsating, mutating parts all stitched and held together. If you're really unlucky, it will evolve further into the Akira/Tetsuo program, where you begin to lose control of the mutations until it self-destructs under its own complexity.

The interesting part is that the answer to this can also be partly found in nature: you modularise and specialise, but you also make strategic choices where you're deliberately redundant.

Too much redundancy is spaghetti code. Modularisation and structure save you there.

Not enough redundancy leaves you vulnerable to changes in your environment and mutation as the project ages and evolves.

As I've gotten older, I'm placing more and more value on the latter. Your mileage may vary...


> Too much redundancy is spaghetti code.

Well, there's uncooked spaghetti code and cooked spaghetti code. :)

What I mean is that redundancy can be uniform, obvious, and easy to encapsulate later (if need be). Alternatively, it can be unpredictable, baroque, and difficult to reason about.


I think I've come to a different conclusion reading your rundown of this. The key paragraph is this:

> The problem was that over time, the modules started drifting. Their domains diverged ever so slightly, and the owners of those modules became different teams. Now if you wanted to modify the domain object, you had to check that it wouldn't break code the other teams were developing.

Isn't the real issue here that the architecture didn't keep pace with reality? It sounds like the dev who made the package had the right idea. The real issue was subsequent devs introducing their domain-specific stuff into a common package, instead of extending it or composing it with their own domain-specific code.


The point of the article is that separating the common code into two different pieces is often better than "extending or composing" the shared code. Merging code together is fine if you're willing to separate it again, but lots of devs aren't.


Domain models can be shared as long as it's clear from the beginning that there is a "one true domain model" that the library is targeting, that everyone agrees on, and that has no reason to ever change. You see ADTs like this in language stdlibs and RDBMSes: they have domain objects for Datetimes, for IP addresses, for UUIDs, for URIs, etc.

Basically, if the semantics of your domain model are specified by an RFC, you can probably get away with turning them into a shared library dependency. Because, even if they weren't shared, everyone would just end up implementing exactly the same semantics anyway.

If someone's implementation was "off" from how the RFC did it, that implementation wouldn't just be different—it'd be wrong. There are no two ways of e.g. calculating the difference between two datetimes given a calendar. There's one correct way, and you can make a library that does things that way and be "done."
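For instance, a tiny Python sketch of spec-pinned semantics (the helper name is hypothetical; the behaviour is fixed by ISO 8601 and the standard library, not by any one application):

    # Two ISO-8601 timestamps have exactly one correct difference; a shared
    # helper like this can be "done" because the semantics are fixed by the
    # spec rather than by a particular application.
    from datetime import datetime, timezone

    def seconds_between(a: str, b: str) -> float:
        """Difference in seconds between two ISO-8601 timestamps."""
        t1 = datetime.fromisoformat(a).astimezone(timezone.utc)
        t2 = datetime.fromisoformat(b).astimezone(timezone.utc)
        return (t2 - t1).total_seconds()

    print(seconds_between("2017-11-03T12:00:00+00:00",
                          "2017-11-03T13:30:00+00:00"))  # 5400.0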

---

On the other hand, there is a good reason that Rails et al don't automatically create a migration that creates a User model for you. Every app actually has slightly different things it cares about related to the people using it, and it calls these things a "User", but these aren't semantically the same.

In a microservice architecture, service A's conception of a User won't necessarily have much to do with service B's conception of a User. Even if they both rely on service C to define some sort of "core User" for them (e.g. the IAM service on AWS), both services A and B will likely have other things they want to keep track of related to a User, that is core to how they model users. It might be neatly separated in the database, but it'll be hella inconvenient if it can't be brought together (each in its own way) in the respective domain models of A and B.
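A rough sketch of that shape, with invented names: each service keeps its own User model and shares only the identity issued by the core service.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class BillingUser:                      # service A's conception of a user
        iam_user_id: str                    # reference to the shared "core" identity
        payment_methods: List[str] = field(default_factory=list)
        vat_number: Optional[str] = None

    @dataclass
    class MessagingUser:                    # service B's conception of a user
        iam_user_id: str
        display_name: str = ""
        notification_topics: List[str] = field(default_factory=list)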


> On the other hand, there is a good reason that Rails et al don't automatically create a migration that creates a User model for you.

Once you ignore frameworks like Django, which provide you with a user model, then yes, you're correct.


My current approach is to be liberal with redundancy as I begin a prototype and then gradually replace it with dependencies as my code matures. This approach seems most realistic to me and I have used it to much success.

Ill-formed or too many dependencies seriously constrain development speed and, much worse, they suck the fun and flexibility out of development. "Build tools" that manage dependencies are usually opaque about what's gone wrong. Not fun. I've spent too many hours trying to find "which version of this library does this function need?"

Too much redundancy is a bug magnet and I cannot emphasize enough how much I loathe it. There really is no good answer to 'Why is this function duplicated with one extra parameter?'
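The smell in question, in hypothetical Python form, plus one way the duplicate never has to exist:

    # The same function duplicated because one caller needed one extra knob.
    def render_report(data):
        return "\n".join(str(row) for row in data)

    def render_report_with_header(data, header):
        return header + "\n" + "\n".join(str(row) for row in data)

    # One way out: a single function with an optional parameter, so the
    # near-duplicate is never written in the first place.
    def render_report_merged(data, header=None):
        body = "\n".join(str(row) for row in data)
        return body if header is None else header + "\n" + body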


One way to theorize about that experience is to note that the important argument for reuse is that changes only have to be made once to take effect everywhere. The classic change being the bugfix.

But the buried assumption there is that you actually want the change to happen everywhere. A hard thing to decide without hindsight.


I like this. Perhaps the heuristic for refactoring redundancy should be not how many times you've written the code, but how many times you've made the same fix in multiple places.


Default behaviors with custom overrides are the way I've always dealt with such "redundancy". In the end, tracing down the default behavior with its own default behaviors often causes me to pull them out as simple POJOs. I think there is no silver bullet, but that redundancy has a usability X factor that often goes unaccounted for in the abstract.
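A minimal Python sketch of defaults-with-overrides, with invented names:

    # The base class supplies sensible defaults; a module overrides only the
    # part it genuinely needs to change.
    class Exporter:
        def file_extension(self) -> str:
            return ".csv"                      # default behaviour

        def serialize(self, rows) -> str:
            return "\n".join(",".join(map(str, r)) for r in rows)

    class TsvExporter(Exporter):
        def serialize(self, rows) -> str:      # custom override
            return "\n".join("\t".join(map(str, r)) for r in rows)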


It would be nice to fork a class and use it in your project, customized to your needs. The forks need never be re-integrated upstream, but the fact that they are forked is documented, and the possibility of reuse and generalization is at least a bit more likely.


Indeed. But the political problem is that you can only make this argument retrospectively. When you say this is how it will turn out, you're advocating for redundancy, and everyone learned in Comp Sci 101 or Smashing Magazine or whatever that redundancy is bad. It takes longer to understand why dependencies are bad, and in fact worse, as TFA says.


This guy definitely codes, I was nodding nearly every paragraph.

Redundancy vs dependencies is a constant battle, even internally or inside your own head.

Many times it depends on the project, the team size, and whether libraries can realistically be maintained given the budget, timeline, and goals of the project and its lifetime.

For instance in gaming, you might have common libs for product purchasing, profile systems, networking, data, serialization, and platform libs to abstract calls. But systems like game objects, physics, gameplay and ui aren't as lib-able unless it is a series of the same type of game or system.

It is sometimes better to just get it out, prototype it and iterate to ship, then harvest and integrate the useful/obvious items into dependencies/libraries for the next one. If you always organize into dependencies first, you will end up doing lots of unnecessary optimization and sometimes hurt shipping speed.

The balance on this is a tough one to learn and changes as you grow more experienced.


Neither; they can both be good or bad. Part of being a professional is learning that there are no hard lines in software development, and being able to tell when something is good and when something is bad.

When I start developing I encourage redundancy. Premature abstraction is a very dangerous thing that can pin you into a tight corner.

Dependency is also fine: I keep things fairly closely coupled while they are still in development and malleable, and decouple them once they start to solidify.

It's my personal style of development when working alone. When working in a team, I follow whatever convention works best for the team.


I am often of the impression that it is OK to start with many dependencies to get an application into an experimental state quickly, but ultimately this is extremely immature. As the application becomes polished over time, and as there are refinements to increase performance, reduce complexity, and trim some fat, many of the dependencies will go away. The reason many dependencies peel off a maturing application is that as the programmers become more involved and aware of their knowledge domain, they have to ask themselves whether they can simply "do it better and faster on their own", which is likely.

Unfortunately many applications, or their programmers, never reach maturity. Also there is greater security and dependability in managing certain critical functionality more directly instead of relying upon support from a third party that may not exist with the same level of compatibility in the future.


> As the application becomes polished over time, and as there are refinements to increase performance, reduce complexity, and trim some fat, many of the dependencies will go away.

Do you find that many teams have the discipline to do this? The cynic in me says that software far more often gets slower, more complicated, fat and dependency ridden over time.

More constructively I could ask: what is it that gives a team or project the discipline to do this work of improvement? How could I encourage such a virtue in my own environment?


> Do you find that many teams have the discipline to do this?

It depends on the motivation. Ultimately it comes down to discipline.

As applications get popular over time, their code base starts to age and get filled with chaotic cobwebs as new features are added or requirements shift. If you don't really care, you will do the minimum required to complete your assigned task and await the next assigned task. This is what it is like when the people who do the actual work have no skin in the game. There is no sense of failure so long as you ship unbroken code, and so everything is very mediocre. If you completely suck as a developer then mediocre is your personal resounding success, and so what's just another layer of abstraction if it makes your job easier?

In startups, personal projects, or ambitious open source initiatives failure and pressure are constant reminders of reality. Everything is highly visible and it is always your fault until you ship something amazingly awesome. You need to ship quality code and make changes in the shortest time possible without dicking up the application. You don't want to get caught with your pants down updating a very simple bug only to discover a larger series of bugs in code you don't own that prevents you from shipping your little tiny code fix.

Also keep in mind that nobody (well... very rarely) is going to look into your code to see if you have a million dependencies, but they know when applications are slow, heavy pieces of crap. If an application constantly breaks because the dependencies don't play well together and your developers lack the confidence to solve elementary problems, your brittle house of cards pisses people off. I am a web guy, and many major companies ignore these problems by doubling down on their branding instead of training their people to write better code, which really pisses me off.

> what is it that gives a team or project the discipline to do this work of improvement?

Ownership of success/failure. When your job or equity are directly impacted (the more directly the better) the emotional connection to quality drastically changes. I apply this level of thinking to my big open source project even though I lose money on it, because I know people depend upon me and I own it. I can be more successful than I am now if I continue to improve what I own or I can be a disappointing failure.

This is harder than it sounds, because metrics of product quality and the confidence to attain such a level of quality differ by experience. Not everybody has years and years of experience. In that case you need somebody who can supply some technical leadership and not be afraid to hurt people's feelings when things get tough.


You depend on code that is more abstract than yours. You replicate code that is at the same abstraction level of yours. You shouldn't even be touching any code that is less abstract than yours.

There is something to be said about library quality and trivial pieces of code. But I feel that this is just a rant about some environment that lacks a good command-line parsing library.


On my last project since I was a one-man team, I erred way over on the side of dependencies. Thought I was being clever and organized for the first time, by encapsulating related things in related places and being a stickler about it. But in the quest to do that, I ended up with a large group of tightly-coupled trash cans!


In Go, "a little copying is better than a little dependency"

https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s



In the long run, dependencies are more expensive to work with, though programmers usually don't see it or admit it, since language package managers hide the initial portion of the cost (the smaller portion, but by far the more visible one).


The main problem with redundancy is that every time you refactor you need to look for all the occurrences, and sooner or later you'll forget to update the change in one of the places. So in my book redundancy is almost always bad. On the other hand, if it's only a few lines without extra dependencies, it's often more readable to just repeat them, especially (and it's quite common) when you would need to add extra logic and conditions just to make that little abstraction work.


Let's talk about risk of regression.

OK, say I forgot to update some code in one place because the code was duplicated. If it is an often-used place, it will be found quickly by QA or users. I kind of have more control over what can break.

On the other hand, suppose I made something that is abstract and used in many places in the code. I might not be able to tell what will break: maybe it will be 5 places, maybe none.

In the end you should be able to find similar code in all project files with the tooling. The same goes for references to abstract code, but abstractions are behind interfaces or subtypes, and in my opinion that is harder to find than doing Ctrl-F across all files. Especially with OOP code, until you run the program you might not know which method is going to be called.
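A hypothetical Python example of why that is: a text search finds every definition of compute() and the call site, but it cannot tell you which implementation actually runs.

    class Tax:
        def compute(self, amount: float) -> float:
            raise NotImplementedError

    class FlatTax(Tax):
        def compute(self, amount: float) -> float:
            return amount * 0.1

    class ProgressiveTax(Tax):
        def compute(self, amount: float) -> float:
            return amount * (0.1 if amount < 1000 else 0.2)

    def checkout(amount: float, tax: Tax) -> float:
        # Which compute() runs depends on the object passed in at runtime,
        # which a grep over the codebase cannot tell you.
        return amount + tax.compute(amount)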


For me it's a lot about the stress of refactoring: if I know the code is all in one place, I can make sure that I don't break the contract and be fairly relaxed about making my changes. Searching through files, making sure that you've found all the occurrences, reading each of those blocks of code and trying to understand them well enough to be sure you're not breaking something is a lot more work, but more importantly it makes me feel very uneasy about refactoring. It's much more stressful because there are more moving parts; you need to concentrate on a lot more things at once.


I always stop and think when I am writing almost identical code multiple times. The kind where everything is the same except for a few data types. Or the kind where every third line is slightly different but you can't switch them around into a single code block guarded by an if.

Trying to DRY this up would turn the code into a monstrosity (templates? even more interfaces? more ifs than actual logic?). I often find myself forced to let the repetition stand because it is the more obvious code. But it never feels right.
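A hypothetical Python version of that trade-off: the repeated functions are dumb but obvious, while the DRY version already needs a converter threaded through every call.

    # Nearly identical code differing only in the conversion applied.
    def parse_int_column(values):
        return [int(v.strip()) for v in values]

    def parse_float_column(values):
        return [float(v.strip()) for v in values]

    # The DRY version works, but every caller now passes a converter, and any
    # future divergence (e.g. special NaN handling for floats) brings the ifs
    # back anyway.
    def parse_column(values, convert):
        return [convert(v.strip()) for v in values]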


The command line example is a bit odd, since any POSIX environment has getopt (GNU has getopt_long too) and most languages out there have well-developed command line parsing in their standard libraries (e.g. Python's excellent argparse).
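For reference, a minimal argparse sketch (the argument names here are made up):

    import argparse

    parser = argparse.ArgumentParser(description="Process some files.")
    parser.add_argument("files", nargs="+", help="input files")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose output")
    parser.add_argument("-o", "--output", default="out.txt", help="output path")

    args = parser.parse_args(["a.txt", "b.txt", "-v"])
    print(args.files, args.verbose, args.output)  # ['a.txt', 'b.txt'] True out.txt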



