He missed at least one: functions should be "small" and broken up for "testability."
No, functions should perform one task and perform it well. Not every conditional test inside the function needs to be its own function and have its own unit test. Sometimes a function that does just one thing and one thing well has more than one step to do that thing, and those steps don't necessarily belong in their own functions. Testability and composability are important, but that has to be balanced against locality of reference and context. When I see a colleague write in a code review that pieces of a function should be factored out of a larger function "just in case they'll be reused" I step on it hard. This is related to "over generalization" but not exactly the same.
This resonates with my latest experience. I am working on a 3D game simulation and I have two player entities (Batter/Pitcher) who run on the ground. The running is done between two predefined control points/nodes in the scene graph, which in turn are converted into a path. Every frame update I get the current location and move to the next location.
// func buildPath(pathName): // builds a path between start and end control nodes fetched from the scene graph
// func updateFrame(timeElapsed): // gets the current location and moves to the next location using timeElapsed
Now both these functions are not too big, but not too small either. However, a reviewer mentioned I should split them, and I refused because they do one thing and do it succinctly.
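Roughly something like the sketch below (a minimal illustration only - the real code is closed-source iOS code, and the control-point input, entity fields and move_towards helper here are invented for the example):

    import math

    def move_towards(current, target, max_step):
        # Step from `current` toward `target` (3D points as tuples) by at most max_step.
        delta = [t - c for c, t in zip(current, target)]
        dist = math.sqrt(sum(d * d for d in delta))
        if dist <= max_step:
            return tuple(target)
        return tuple(c + d / dist * max_step for c, d in zip(current, delta))

    def build_path(control_points):
        # Builds a path: here simply the ordered positions of the control
        # nodes fetched (elsewhere) from the scene graph.
        return list(control_points)

    def update_frame(runner, path, time_elapsed):
        # Gets the current location and moves toward the next control point,
        # scaled by the time elapsed since the last frame.
        if runner["index"] >= len(path) - 1:
            return  # already at the end of the path
        target = path[runner["index"] + 1]
        runner["position"] = move_towards(runner["position"], target,
                                          runner["speed"] * time_elapsed)
        if runner["position"] == target:
            runner["index"] += 1

    # Example: a batter running between three control points.
    path = build_path([(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 0.0, 5.0)])
    batter = {"position": (0.0, 0.0, 0.0), "speed": 2.0, "index": 0}
    update_frame(batter, path, 1.0 / 60)

Each of the two functions has a handful of steps, but neither really decomposes further without hurting readability.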
This useless, dumb fascination with test practices, and the rigidity around them, causes grief for both the code and the coder!!!
> that has to be balanced against locality of reference and context.
I'm also against splitting stuff like crazy, even though I tend to write small functions. For me, the primary concern is "how easy is this function to understand?" Sometimes splitting code out into separate functions actually hinders readability - especially if it makes the reader jump around, or uses state passed implicitly between many small functions.
I find good functions to be structured like good explanations - the function body should give you a general and complete understanding of what it does, and low-level details get outsourced to helper functions that you can recursively review if needed.
I think part of the problem comes from the term "split". Some may assume you are referring to splitting a function into two or more functions that are called from the original call point, while others may assume what you are referring to here: the separation of concerns inside the function into small sub-functions (which can also sometimes make them reusable) to encapsulate each concern.
I think the former is of limited use, if any, while the latter is an invaluable tool for managing the complexity of a code base. That people may assume the former when the latter is meant is unfortunate.
For now it is an iOS closed source app. I will ask my team if we can get you added as private contributor, if you are OK with it.
My email is in my profile. If you have anything specific in mind, reach out with what you would like to contribute; otherwise just ping me and we can talk about potential contributions.
For me a function is meant to represent an "idea", a way to "transform" your givens into something you want.
If you look at it this way it is easy to see when a function needs to be sub-divided. When an idea relies on another idea, it should be broken out. Otherwise, in the case that the lines are part of the main "idea" then it should remain.
When grading homework for the course I TA for at my college it is evident that this idea is not understood. I have to admit, I don't actually understand it. It's just something I've taken to doing. I can't describe what I mean really and I don't think I've ever seen a good explanation.
I think it's obvious that you practice the same way I do, but I still see a difficulty in articulating these thoughts.
It goes without saying that we basically know that some things need to be moved out and some things don't, but it seems impossible to tell, or to describe to someone, when to do which.
Does anyone know of some place that describes this well?
Yes, yes! Code that forces you to locate and dig through 10+ smaller functions called from within another function is much harder to troubleshoot and work with. Especially when 95% of those are not re-used anywhere else.
I know it may not be the popular belief, but I would much rather work with a larger, well-written 50-line function that does what it is supposed to do than have to navigate around a bunch of smaller 5-line functions that are never re-used anywhere else.
If it's done right, though, you shouldn't have to navigate around all the smaller 5 line functions, at least not unless you're debugging. Each of the submethods should clearly describe what it does, and do that without any unexpected side-effects or implicit state changes. Effectively, what this does is raise the level of abstraction of the code, so that your 50-line function becomes a 10-line function, and you only need to read those 10 lines to understand what's going on.
Sometimes when you're debugging, you might need to dip into these subfunctions, but when you do, you should be able to think through them as mostly independent units that just need to do what their name says they will do. This ideally limits the problem of having to load everything that your code is doing into your head all at once in order to reason through it. When you're just reading through the code normally, you should be able to gloss over the details of how the submethods work, just as you probably wouldn't feel compelled to dig into the internals of how a library call like Collections.sort works.
The problem is that method decomposition is not always done well, and when there is unexpected behavior or implicit state changes, it can make bad code even harder to navigate. I'd also rather read a well-written 50 line function than a poorly decomposed cluster of 10-line functions. With proper care, though, method decomposition is super helpful.
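As a concrete (invented) sketch of "done right": the top-level function reads as the 10-line summary, and each helper is a detail you only open when debugging, with no implicit state shared between them. The names and dict shapes below are illustrative only.

    def process_order(order):
        # Top-level function: reads as a summary of the steps.
        validate(order)
        total = calculate_total(order)
        charge_customer(order["customer"], total)
        return confirmation_message(order, total)

    def validate(order):
        # Raises if the order is malformed; no other side effects.
        if not order["items"]:
            raise ValueError("order has no items")

    def calculate_total(order):
        # Pure function: price * quantity summed over the items.
        return sum(item["price"] * item["quantity"] for item in order["items"])

    def charge_customer(customer, total):
        # The one step with a side effect, isolated behind a descriptive name.
        customer["balance"] -= total

    def confirmation_message(order, total):
        # Pure formatting, again behind a name that says what it does.
        return f"Order {order['id']} confirmed, total {total:.2f}"

    order = {"id": 7, "items": [{"price": 2.5, "quantity": 4}],
             "customer": {"balance": 100.0}}
    print(process_order(order))  # "Order 7 confirmed, total 10.00"

On a normal read you stop at process_order; the helpers only matter once one of them is the suspect.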
Part of the problem is that the abstraction of "a sorted list" will never change. But the abstraction of "twiddle my foobar" can change on a weekly basis based upon business needs. Because of this the abstractions we write will never be as strong or enduring as those found in the standard library.
Certainly true, but that's not a reason to give up on creating those abstractions, like the post I was replying to was suggesting. It's a reason to be judicious about when it's appropriate to create these abstractions, and in particular about when they should be shared. An abstraction that's used in just one or two places is still really easy to change. A bad abstraction used in a dozen places is a pain in the ass to clean up. It's not an easy problem, and it ultimately takes some combination of experience and careful mentoring to learn to identify good and bad abstractions before they're built.
That's the theory. In practice it is difficult to get the right level of abstraction. This is why writing a good library often takes 3x as much effort as writing the equivalent application code.
This only means that you are not putting in enough effort when writing your application code, compared to the library.
So you are basically adding technical debt to your application.
Apart from a deadline that will make the project fail if it is missed I don't see any rational reason to add technical debt.
And also in that case you must plan some time to remove the technical debt that you just introduced, and better sooner than later.
So writing a library is not harder than writing a good application.
It is certainly harder than quickly writing a mess of hacks that becomes an application.
Since libraries have different use cases and characteristics than normal app code they can and should be treated differently.
A 10% cognitive cost for code that is used by thousands of developers is a very high price to pay for an initial 3x speed boost to one developer.
A 10% cognitive cost for code that needs to be understood 3-10x before it effectively expires (like most app code) is a great tradeoff for a 3x initial speed boost.
I'll get my app into my customers' hands in 3 months, wear a little "tech debt" on the dev side, and get crucial feedback. You can take 9 months to deliver a functionally equivalent app with slightly better internals.
Part of that can also be that the helper functions take in whole objects (think JS MVC-style model objects) when they're really doing a date calculation on one field of the object. It's not bad that the logic is encapsulated; it's that it's encapsulated at the wrong layer. I've run across well-written 50-line functions in Perl or C#, but I would hesitate to call a 50-line function in JS or other potentially more concise languages well written (discounting lines devoted to object/dict/hash definitions).
Coincidentally covered, but just not in terms of functions :)
5.1. Sandwich Layers
Let's take a concise, closely bound action and split it into 10 or 20 sandwiched layers, where none of the individual layers makes any sense without the whole. Because we want to apply the concept of “Testable code”, or “Single Responsibility Principle”, or something.
But you have expressed this way better than I did, thanks :)
Unit tests are a fantastic tool, but they have clear limitations.
Tests (unit, integration, system, acceptance or any other kind) should do two things:
- while developing, help check that the code is doing what it should (e.g. fixing the bug)
- after development, ensure that the code keeps doing it.
(ideally also ensure that it doesn't do anything it's not supposed to do, but that's much more difficult)
That's why the best tests should aim for weak spots and try to give good assurance that, if there are defects, the tests have a good chance of failing.
Those weak spots sometimes will be in single functions, but other times they won't.
Creating tests just for the sake of having them, without adding a better chance of them failing when a problem arises, is just wasted time. Or worse, coupling the tests with implementation details that are irrelevant (for example, a function that is divided in two or three just because it is big, to add readability) is just pointless.
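A toy example of what that coupling looks like (invented names, assuming a function split purely for readability): the test should pin down the public behaviour, not the accidental internal split.

    def _split_units(text):
        # Private helper, extracted only for readability.
        return text.strip().split(":")

    def _to_seconds(hours, minutes):
        # Private helper, extracted only for readability.
        return int(hours) * 3600 + int(minutes) * 60

    def parse_duration(text):
        # The behaviour that actually matters: "HH:MM" -> seconds.
        hours, minutes = _split_units(text)
        return _to_seconds(hours, minutes)

    def test_parse_duration():
        # Targets the weak spot (odd but valid input), not the structure.
        assert parse_duration("01:30") == 5400
        assert parse_duration(" 00:05 ") == 300

    # A separate unit test for _split_units or _to_seconds would break the
    # moment the helpers are merged back together, without catching any
    # extra defects.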
Well, I've never really used FP outside of hobby/toy programming so I couldn't say. I'd say it applies to any programming paradigm, including OOP or non-OOP procedural programming.
The problem that always ends up with this is what defines "one task". I've rarely come across people who didn't agree with the notion that things should only do a single task, but I often find disagreement on the granularity of what a task is.
FWIW I don't prefer small functions for reusability but for reasonability. It's a lot easier for me to discern what that block of 10-15 lines of code is doing if it has a descriptive name and a type signature.
Honestly it's one of those YMMV things, as you say there is no right answer. That's why I said "for me" :) People are different and drill into information differently as well.
Most of the arguments I've had over the years on this subject were cases where we were both right. To them X was more reasonable and to me Y was, and the two were mutually exclusive.
As I write this, it makes me wonder if this whole topic is a fool's errand. Are we doomed to forever be trying to solve an unsolvable problem?
Good point, the code locality is very important for maintainability down the road. When a single logical unit of functionality is unnecessarily split up across a file or even worse split into multiple classes it makes it much harder to understand.
I think a balance is possible. I tend to prefer to decompose functions for readability. A series of simple sequential function calls that describe what they do is more quickly readable than the same instructions listed as a series of compound operations.
These function definitions are also typically defined in call order in the same file.
If you need to know what the individual functions do you can scroll down the necessary few lines, but IMO as long as you name your functions honestly that isn't generally required to be able to understand, at a glance, both what the code is doing and where you're likely to need to make your change/fix.
This style does require a lot of trust that everyone is following the same strategy - as soon as you have to doubt the accuracy of a function's name in this approach, the benefit is lost. So it becomes important to keep them up to date - changing what a function does must also mean changing its name.
Thank you. I find layered stuff much easier to understand than massive clutter with "housekeeping" and "domain logic" stirred together, as well.
My IDE allows me to leave the "documentation" panel open attached to a side of the window. As the cursor/caret touches an identifier, the documentation for that function/method or constant/variable appears automagically. But of course, I type reasonably well and write Javadoc/JSDoc on my stuff, so there is something to see. Others seem to rely on gargantuan "self documenting" (bullshit!) identifier names for the IDE to auto-complete, rather than actual documentation. Pet peeve.
Sure, and it's great when a balance is struck. Too often, at least in my experience, the smaller chunks get shuffled out of the same file and into a nightmarish Rube Goldberg concoction in "utils" modules, packages and whatnot. I use primarily compiled C-like languages (C, C++ and Rust, specifically). I tend to put chunks of code that might be candidates for factoring out in their own scope block with a comment. Then, if the function really does become "too big" (for the very subjective measure that is), bits can be factored out on an as-needed basis.
The problem with splitting functions up a la OOP best practices is that it will ironically result in zero code reuse, because all the functions will depend on each other.
Back in the days when I worked on a largish "classic" ASP site (each page is a file full of mixed HTML/VBScript) the senior developer insisted that the best approach for developing anything new was to copy a page that did something similar and then change it to fit the requirements.
There was no code re-use.
Sounds insane by today's practices but in reality it worked well more often than not. Business functional changes almost always applied to just one or a small number of pages. You could change those pages with impunity and be pretty confident that you would not break anything in any of the hundreds of other pages in the site.
Rarely this approach caused more work when a change did affect dozens of pages. But on balance it made most changes much easier to implement and test.
Yep, that is one of the reasons PHP and ASP got such a bad name. Projects like those take a few greps and minutes to get into and get you to productively make changes without much risk. And, outside HN, it is still quite common because of it.
The trick is to strike the right balance between repeating code and testing it. I've seen codebases become unmaintainable piles of almost repeating code that was never tested beyond a developer opening the page and checking the behavior manually.
To prevent such messes is one key responsibility of a developer.
> Back in the days when I worked on a largish "classic" ASP site (each page is a file full of mixed HTML/VBScript) the senior developer insisted that the best approach for developing anything new was to copy a page that did something similar and then change it to fit the requirements.
You should do this in a way that doesn't introduce duplicate code but this method is a really good way of onboarding yourself on a new project: find a part of the project that almost does what you want and go from there.
Those who forget history are doomed to be dogmatic software developers.
A lot of stuff we take for granted are either accidents of history, or powerful counter-reactions to the accidents of history.
There is a practice, and it turns out to be bad. Mild discussion of the virtues and vices would, in a world composed of Asimovian robots, be sufficient to update the practice to something better.
But that's not how humans work! Typically an existing practice is only overturned by the loudest, most persuasive, most energetic voices. And they have to be. Humans don't come to the middle by being shown the middle. They come to the middle after being shown the other fringe.
So a generation changes its mind and moves closer to the new practice. Eventually, that is all the following generation has ever heard of. The original writing transforms its role from mind-shifting advocacy to the undecided to being Holy Writ. The historical context, and with it the chance to understand the middle way that had to be obscured to find the middle way, is lost.
My previous role was as an engineer teaching other engineers an XP-flavoured style of engineering. I often referred to our practices as "dogma", because we were dogmatic. But if we weren't, less learning would take place. Dogma is most instructional when someone later finds its limits.
When I was learning to coach weightlifters, I was told something that has always stuck with me: "As a coach, you will tell trainees a series of increasingly accurate lies". You can't start with nuance. In the beginning, it won't work.
You can start with nuance and openly acknowledge the lie. This doesn't inhibit learning. Every modern textbook on Newtonian physics tells students that it is effectively a "lie", in that it's only an approximation that works reasonably well in most real world cases. But you have to start there before learning general relativity and quantum physics.
You may have more experience than me, but at this point in my life, I find myself disagreeing with this. In fact, one of the biggest problems I had with education was the teaching of dogmas. On the other hand, when I did teaching, tutoring and lecturing, I always tried to make it clear and explicit that what I was telling them was a practical simplification, that it has limits here and there, but that within those particular constraints it's a good approximation. And the feedback I got was always that it made things much clearer to people - people felt it made sense, because it had context.
What I taught was a way of working. I didn't deviate from the practices, because the principles are easy to state but hard to truly grok.
Going back to what I said earlier, this is the difference between weightlifting drills for various parts of the movement, versus discussions of physiology, anatomy, anthropometry or physics.
Thanks for the clarification. So it's something like, first learn to do something in a decent way, and only then - when you're familiar with the subject matter - start thinking from first principles?
I think the Socratic method is a much better teaching device. Dogma is the opposite of teaching you to think for yourself. Dogma says "Follow these principles and you'll write good code, don't question it!". Principles need to be challenged and proof needs to be provided how and in what way they are really "best practices". The black and white thinking of dogma are the very reasons why many religious groups don't progress with the modern world and insist they know better than everyone else without having to explain why. Not trying to diss all religions, but I was raised in a very strict one and group-think and anecdotal evidence is used to justify poor decision making. I'd rather be a critical-thinking programmer than a principle-obeying one.
Maybe I need to use a different word, especially since I used it a little self-deprecatingly. I do answer questions (often at eyeball-desiccating length), but I also insist on the practices.
I sometimes referred to Pivotal Labs as a debating club that produces code as a by-product. Everything is up for debate. "Strong opinions, weakly held" was a frequent motto.
But that didn't mean we started from scratch. Almost all projects start with the core practices and stick with them fairly tenaciously and inflexibly (in the face of the circumstances we have seen before), in order to facilitate the immersion.
Yes. I need a different word. I am not conveying this well at all.
I think working on a large project in c89 made me better at writing in other languages.
When you strip all your useful tools and concepts away, you're forced to rethink how you can organize with just data types and functions. Surprisingly, you can do pretty well with just these.
It's the sort of thing that helps with recognizing when you're looking at FizzBuzz and when you actually need to use a generic factory.
When I started working with PHP over a decade ago I had some experience with Java, so I thought it was funny that in PHP everything is an 'array'. Now that I'm working with JS and Elixir and getting into functional programming, I'm realizing that, hey, maybe that is the right way to go. Putting everything into structures like maps helps blur the line between code and data.
I have quite some background in OOP and switched to functional programming a few years ago; I now work a lot with Clojure, where you actually have the "code is data" paradigm. And from my experience, it's way easier to model data with three simple structures: vectors (= arrays), maps and sets. There are other things, too, but most of the time these three types are sufficient. It requires some discipline in the beginning but it can give you a lot of benefits.
Just to reinforce this, the same is essentially true of all the languages I have touched. If the language has associative arrays (e.g. maps), all the better. But associativity is not difficult even in assembly language - so long as O(n) lookups are okay. Adding basic better-than-linear searches (O(log(n))) is also not horribly complex - the 'C' library bsearch() is a conceptual template here.
Having large arrays of const data to drive 'C' programs can - emphasis can - lead to much more manageable 'C' code.
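To make that concrete, a small sketch - in Python rather than C, so bisect stands in for the C library's bsearch(), and the table contents are made up: the constant, sorted table does the driving and the code only looks things up.

    import bisect

    # Sorted, constant tables drive the behaviour; the C equivalent would be
    # a const array of structs searched with bsearch().
    STATUS_CODES = [200, 301, 404, 500]
    STATUS_TEXT = ["OK", "Moved Permanently", "Not Found", "Internal Server Error"]

    def status_text(code):
        # O(log n) lookup over the sorted table, mirroring bsearch() over const data.
        i = bisect.bisect_left(STATUS_CODES, code)
        if i < len(STATUS_CODES) and STATUS_CODES[i] == code:
            return STATUS_TEXT[i]
        return "Unknown"

    print(status_text(404))  # Not Found
    print(status_text(418))  # Unknown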
You are going to be lectured at length by the (OOP) static types crowd. And they have a point about using explicit types, when it's easy.
That said, I wish the "strong types" crowd would take a look at all of the temporal coupling that their OOP designs are causing, with their update-at-will practices. Now that we have working garbage collector software plus adequate hardware support, why not make use of that?
Some are just lost if autocomplete in the IDE doesn't enter stuff for them, and the more verbosely the IDE spews, the better, cuz it looks like "work".
That may be fixable at compile time, I believe. In the worst case, you could write a patch. (Dijkstra would prefer a half-open indexing scheme anyway.)
And be incompatible with the universe of Lua programs.
The problem was that Lua reified 1-indexing into the language when they optimized to make arrays faster and then created length operators. At that point, 1-indexing got baked into the language.
I like this part: “Duplication is sometimes essential for the right abstraction. Because only when we see many parts of the system share “similar” code, a better shared abstraction emerges. The Quality of Abstraction is in the weakest link. Duplication exposes many use cases and makes boundaries clearer.”
I just did this yesterday... copy/pasted some code from one function to another function, tested that the new function works and moved on, and then when I went back to work more on it, I wrote more generic code that can be called by either function. Don't have to overthink things before writing the first function.
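Something like this (a made-up before/after; the field names are illustrative): the first pass is a straight copy with a small tweak, and only once both versions exist does the shared shape become obvious enough to extract.

    # Pass 1: copy/paste with a small tweak - good enough to ship and move on.
    def daily_report(orders):
        total = sum(order["amount"] for order in orders)
        return f"Daily total: {total:.2f}"

    def weekly_report(orders):
        total = sum(order["amount"] for order in orders)
        return f"Weekly total: {total:.2f}"

    # Pass 2 (later) replaces the pass-1 definitions: the duplication shows
    # exactly what varies, so the shared helper practically writes itself.
    def report(orders, label):
        total = sum(order["amount"] for order in orders)
        return f"{label} total: {total:.2f}"

    def daily_report(orders):
        return report(orders, "Daily")

    def weekly_report(orders):
        return report(orders, "Weekly")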
I tend to see this when people try to pre-emptively abstract. Not "we seem to be duplicating 3+ blocks of code here" but "we might want to do this again at some point so we need an abstraction to make it easier next time".
The former almost always works while the latter almost always fails.
The former gives a little dopamine hit and a boost to the ego because that's being "an architect" while the latter feels more like being a janitor. Ironic really.
Interestingly, the second approach is often recommended in functional languages, although in a slightly different way; compose complex/complicated logic of small pieces. You end up with a lot more functions but it allows for code reuse.
I am not quite sure, though, if we can already talk about an abstraction then.
There is no code reuse of small non-generic functions in practice, just duplication and thus ravioli code. And to make these small functions generic costs time.
What may make sense is to make something similar to a lemma in proofs. This is strictly for ease of debugging or readability.
I think "kill your ego" and be happy with the janitorial aspect of it.
De-duplication is "equivalent" to ( LZW style ) compression, and LZW compression is a solved problem. But local conditions may insist more on de-duplication.
The things in point #7 are called non-functional requirements. Those are often left implicit by non-technical people. Nobody will say: "I don't want to be hacked" or "I don't want this system to slow down and die"... This guy says "we don't need NFRs, focus on the functional requirements"... well, that's exactly the cause behind most engineers getting fired for technical reasons: causing one or more serious incidents by not caring about those requirements.
Point #2... seriously? Keep your code consistent, and if you identify opportunities for reuse between things that are related, then do it. Copying and pasting code, which is what this guy is advocating for, is not good.
> Copying and pasting code, which is what this guy is advocating for, is not good.
Well obviously this is why we have variables and functions, etc. And hopefully this helps enforce DRY ("don't repeat yourself.")
But reusing a component leads to coupled code, and that can also spell disaster. Sometimes it really is better just to copy stuff. Maybe this is a matter of taste; some people like to have the perfect design, and if that makes them a happy coder then I guess you should let them do that. I find that, more often than not, the "perfect design" is not worth sweating over. My style is tracer-bullet: get it working today, write some tests maybe, and then refactor it.
I like the term "tracer-bullet." You should get that written up as a blog, if it isn't already.
Reminds me of an anecdote about getting people onto the moon. When that was the goal, the very first thing NASA did was make sure they could hit the moon with a rocket. Then they iterated from there.
The term is actually from the book Pragmatic Programmer (a good book, though I'd say mostly for beginner-to-intermediate programmers, since a lot of software blogs are essentially just repeating the contents of this book ad nauseam).
I tend to do that too - get something working fast, and iterate over it. Focusing on perfect abstractions makes no sense if then you find out you're aiming in the wrong direction.
I don't think it is more copy/paste that it is necessarily advocating. Rather, make sure you get real value out of anything that you abstract away.
I think the specific scenario was alluded to when someone comes to read your Foo action, you want to avoid: "Found the entry point to Foo, looks like it first sets up something using Utils A. Now I need to get a rough idea of Utils A. Finally, back to Foo. Only, now it makes use of two library functions from B and C. Now, off to get a rough understanding of B and C. ..."
It is possible, of course, to quickly get through all dependencies. Or, to just ignore the understanding of A, B, and C to get whatever you needed into Foo. Doing this, though, typically muddies whatever purpose A, B, and C originally had so that maintaining them will be near impossible.
What I often find is that the copy-and-paste turns out to be copy-paste-and-modify-slightly -- which helps describe the perimeter for the future refactor and abstraction. In this light it's helpful to duplicate a few times, to better identify what repeats and what is unique.
Point #7 is a sales argument for corporate software, especially for business intelligence tools. It makes it easier to get the contract signed if you present such functionality to management. The users will hate it, because from now on they additionally have to do the business people's work as well ("Business people never used it"), but hey, who cares about them? Especially if you can sell additional trainings or consulting hours. Count yourself lucky if you don't know what I am talking about.
The most interesting takeaway for me is this: [...] The best quality of a Design today is how well it can be undesigned [...]
Aiming for design that's easy to refactor and/or replace bolsters application longevity at the expense of code longevity. I like that. It's like the ecosystem longevity is achieved at the expense of the longevity of individual organisms...
Now then, who has ideas, or best practices or even just anecdotes? I'm eager to hear those!
For my part, I follow a design philosophy that revolves around persistent data structures. My code is supposed to be just an accessory to the data. I don't think I'm explaining this well; it's just where I put all my focus. Another principle is to try to capture the user's intent rather than the outcome of the user's actions. This way I can redo the code that derived the outcome from the actions and fix some of the earlier erroneous behavior.
> My code is supposed to be just an accessory to the data.
Fred Brooks: "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious."
Of all the 'maintainable' design pattern use I've seen over the past decade, I have never seen one prevent a continuously maintained code base being rewritten periodically. All that work is thrown away just as a simple to-spec implementation would have been.
It seems that just as in literature, in software engineering too, the essence of writing is rewriting...
It's why I always coded in a custom 4GL I used to have that was essentially BASIC with macros, a RAD tool for quick iteration, and synthesis to fast C or C++. The idea being I kept throwing away stuff, plus I didn't want to do the extra work of making C reliable. The tool, combined with a Cleanroom-style of composition, let me rapidly produce or change stuff, with production code in a popular language as a side effect.
No longer have it but such tools definitely influenced my belief languages must make code easy to iterate, change, and throw away. LISP with types, Design-by-Contract, and incremental compilation is still the champ. Think, type, result in 0.2 seconds, test, rethink, retype, etc. Just keep flowing.
The mainstream stuff needs support for these things.
Point #8 sounds great: reuse OSS, don't reinvent the wheel. However, this only applies to software that is maintained, now and in the future. Relying on obscure code-bases without knowledge on its inner workings is going to bite you down the line.
Examples? Grunt, Bower, Hibernate, Apache Commons. How long did it take the Apache Commons project to properly add generics to its libraries? How is backward compatibility and developer availability holding back projects down the line?
Additionally, in order to use a library, you need to have a good knowledge of the problem it tries to tackle. By overly relying on open source software, you might blunt the competitive advantages of your business. It's an example of relying too heavily on abstraction.
I like to think of this as a dependency cost. Adding a library may "relieve" implementation cost, but adds a dependency cost (ie. finding+evaluating dependencies, integrating and maintaining the dependency, testing the dependency and so on).
This cost varies a lot, for example a library to left pad a string may be easily replaced, but if you choose a library to implement some on-disk format, then that can cement things in. If the library becomes unsupported one very well may end up maintaining it alone, for example.
In the more general case more issues are attached: choosing a deployment stack, a documentation or translation tool etc. can result in decisions that are essentially set in stone. These also tend to be composed from multiple tools, leaving more wiggle room. Deployment for example might be done with AWS and Ansible on the higher levels, but use in-house tooling for figuring out the details.
Similar arguments apply to code de-duplication and DRY: de-duplicating code isn't free, and not always the right thing to do.
This varies a lot. Python generally has good libraries. JavaScript (in my experience) generally has far more fragile libraries and it's a lot faster moving, so the dependency cost is more likely to bite you. Usually there are a few libraries doing the same thing, so go for the mature ones.
> if you choose a library to implement some on-disk format, then that can cement things in. If the library becomes unsupported one very well may end up maintaining it alone, for example.
But, presumably, you used that specific format because you needed it. So, it makes sense that you take it over if it gets abandoned.
If the problem is that you didn't need that specific format, then you should rip it out completely when (and only when) that library becomes unsupported.
Along those lines, it's also funny when somebody does an "armchair quarterback" review of an older codebase.
A company I used to work for was closing its local office, and moving the work we did closer to the home office. One of the lead/architect types asked my friend who was still there at the time "Why didn't you use something like Spring Batch for this?". He accurately responded "Because it didn't exist in 2005."
That said, I'm glad something like Spring Batch (at least for Java) is around now. I've had to implement a partial version of something like that 3 times in the last 20 years :-)
(Both a C version and a Java version for turning client file dumps into mass mailing print output; a Java version for doing ETL type jobs for in/out bound client integration feeds)
That said, I've seen under-engineered batch jobs that mix incremental input and output in a giant spaghetti loop, rather than taking the time to do a bit of partitioning (e.g. - ye ol IPO, Input-Processing-Output), so that you can do things like selectively run certain input units, or recover from bad data in individual input units.
Oh, luxury, I tell you - Grunt, Bower, Hibernate and Commons are the most obscure, unmaintained third-party "solutions" that you have to deal with? I'm over here dealing with vanity projects like "Dozer" and "XmlBeans".
Grunt is already considered an obscure codebase? It is barely four years old!?
Or, are you just saying that if you have a complicated problem you will probably need to know specifics of how these tools work? I can buy that. I do question how many people are truly doing something complicated where these tools wouldn't be sufficient.
Also, consider that the main contributor (cowboy) last committed in May 2015. This is just a cursory github-health-scan (tm), but it tells me two things:
- Don't expect any large changes in the grunt code-base in the coming years
- The code-base is probably not easily modified (either because of low code quality or because of the risk of breaking builds).
TeX is probably a great example where the codebase converged on 'perfection'. A fixed set of features, practically bug-free, superb quality.
I could mention some flaws of TeX, and observe how many are related to input, output and environment:
- archaic syntax. Even though it has an interesting design philosophy, it does not adhere to the kind of syntax (input) most programmers are used to (you know, if you have ever tried to write TeX)
- does not take advantage of modern system architectures. Needs to process a TeX file multiple times.
- does not allow animation (this was not a requirement, nor a possibility back in the 70's)
- not much interactivity (yes, URLs are possible, but it's mostly a hack). Web wasn't available back then.
- output format (DVI) limits possibilities
So, even though I like TeX and its attention to detail, it was already dated when I started using it in 2000. In our landscape, adaptability is an important trait of libraries.
This is a great article, thank you very much for writing it.
I pretty much agree with all points, except number 9 could probably be "Not challenging the Status Quo" instead of "Following the Status Quo".
Breaking the status quo just for the sake of it would be a mistake, while a healthy challenge of the status quo, with the open-mindedness to accept not changing anything, is probably a better direction.
> Areas of code that don’t see commits for a long time are smells. We are expected to keep every part of the system churning.
Or it could be an indicator of mature features that were well designed and implemented to minimize future headache: they just work and have very few stable dependencies, if any.
One of the (really sad) reasons I suspect is behind a lot of these practices - like pointless wrappers and pointless genericization - is that nobody will pay you to "understand something". You have to produce something, even if you don't quite understand what it is you're supposed to be producing. Sure, you can spend an hour reading a horribly written-by-committee "requirements document", but you had better be producing something that looks like a program by the end of the day. Since admitting that you don't quite yet "get" the DB layer is a fast-track to the unemployment line, some developers have learned to buy time by creating meaningless abstraction layers while they're trying to figure out the inner details of OAuth or Cassandra or whatever else was supposed to be "saving us time".
Those things are not necessarily all that terrible. I can see how overeager coders might do any of those things in anticipation of some as yet unseen requirement. And it's not always clear why a code base evolved the way it did.
Also let's not forget the opposite. I've worked in places where everyone just wrote their own spaghetti, no concept of version control, and every time there's a small change it takes ages for the only coder who wrote it to untangle and modify it. Basically a steaming pile of turd, used to invest real money in the real market. The worst part about it is when you call them out it's YOU who doesn't understand the requirements.
Or "You have Big Data". Paraphrasing Greg Young's idea, "If you can put it on a Micro SD card that you can buy on Amazon, you don't have big data". So if your data fits in 128GB ($37.77 on Amazon right now) you don't need big data solutions.
I would say that the definition of what is big largely depends on the problem you try to solve. If that problem is finding keywords in text files, then your definition sounds about right. For other problems even a couple of KB might be big. To me, big is when your dataset is too big to solve your problem in reasonable time on one machine.
Right. I'm not comfortable with this idea of big data being dependent on the actual file size of the data. There really are problems where doing stuff like reading files or using a relational database break down and you need something else that's more specialized in solving a problem even if the dataset is just a couple GBs. (So in my mind big data is a reference to specific tools or approaches like map-reduce, etc.)
My personal criteria are on the line "if it doesn't fit on a single box (up to a cubic meter)" or "if it's so much it's impractical to move around", then you can say you have big data.
If just maxing out the computer's RAM and CPU count solves your problem, then it's not big data.
...I don't get it? This sounds like it should be saying we try to "plan ahead" too much, but then the description seems to say we don't do enough.
2. Reusable Business Functionality
I think this is arguing against doing too much design work up front? From what I've seen, incrementally growing a system tends towards the opposite problem unless I'm aggressively looking for refactoring opportunities.
3. Everything is Generic
This isn't a case of too much vs enough, it's a case of correct vs incorrect. If you can guess how the business requirements are likely to change, making things generic in the right way will make that change much easier to do.
4. Shallow Wrappers
Yeah. Unless you have actual advance knowledge that you'll need to switch out a particular library, this should be done on-demand as a refactoring step before such monkeying around. Except for things where you need a seam to make testing easier.
5. Applying Quality like a Tool and 5.1. Sandwich Layers
...does anyone actually think this way?
6. Overzealous Adopter Syndrome
Maybe, but keep in mind these can also be used to clarify intent or to intentionally constrain future changes.
7. <X>–ity
The examples look like things where pursuit of whichever <X>-ity didn't actually work, rather than cases where it wasn't needed.
8. In House “Inventions”
These tend to be a result of either very old systems that date back to before an appropriate reusable version became available, or organically grown systems that had parts gradually come to resemble some existing reusable thing (that initially would have been overkill and more trouble to use than it was worth).
9. Following the Status Quo
Or in other words, "don't fix what ain't broken" isn't actually good? How is this "over-engineering"?
10. Bad Estimation
How does this fit the theme? I thought the standard way to improve estimates was to put more thought and detail into them, which means the problem here is actually under-engineering (well, that and general noobishness).
Edit to add:
> Important Note: Some points below like “Don’t abuse generics” are being misunderstood as “Don’t use generics at all”, “Don’t create unnecessary wrappers” as “Don’t create wrappers at all”, etc. I’m only discussing over-engineering and not advocating cowboy coding.
So... if you disagree you're wrong and misunderstanding the article? If it's that misunderstood, it's the article's fault for failing to communicate effectively.
> ...I don't get it? This sounds like it should be saying we try to "plan ahead" too much, but then the description seems to say we don't do enough.
My view may be biased by my experience, but I understand it as: no matter how beautiful a logical structure you invent that makes all the requirements fit perfectly, the business will quickly come up with a new case that doesn't really fit anything. Business requirements are unpredictable because they make no fucking sense - they're a combination of what managers think customers need, what marketing thinks it needs, what future visions your boss has that he didn't tell you about (or that he doesn't even understand himself yet), all with a sprinkle of people's moods and the subtle influence of the phases of the Moon.
See also the motto of the American Army - "If we don't know what we're doing, the enemy certainly can't anticipate our future actions!" ;).
This is only the case if the business is incompetent and insane, and also not communicating effectively with development.
Or maybe things can look that way, if the devs don't understand the business.
The business is doing something to make money. The requirements will somehow work to support that something, or will be something that your business contact thinks will support that something. If they don't make sense, that means your mental model of the business is different than your business contact's mental model. And unless your company is unbelievably dysfunctional, that's an opportunity to talk and reconcile those models.
My impression of the OP was that most of the time it's beating up a straw man. A more useful article would lay out heuristics or at least present anecdotes regarding sensible points on the spectrum from under- to over-engineering along each dimension.
This is a fairly well-written post with some good ideas and some generalizations that are going to get somebody in trouble if they follow all of them, e.g.
> TL;DR — Duplication is better than the wrong abstraction
Woah, horsey, hold on a moment!
While it's true that an abstraction can get you into trouble, that's not always true.
Over my many years I've heard a number of people say: "We have 100 copies of the same site, but slightly altered for each client, and we don't have time to go back and refactor them, we can't upgrade them, and we're three major versions behind. Want to come work for us? (Silence.)"
I've only heard one person say, "We refactored X and it bit us in the ass, because the developer didn't check when he/she was altering it and accidentally changed behavior for everything."
Partially because once people realize that they're in dependency hell, and no one can guess at how many thousands of places across 100 sites might be affected by a minor change in one spot, they don't even try. That minor change has become impossible (until the next full rewrite), or at least not feasible within a reasonable budget.
So instead changes must be made outside the generic shared code and you end up with 100 slightly different sites that can't be upgraded.
The trade-off isn't about the dangers of not refactoring vs. refactoring. The trade-off is about time plus the aspect this article hammers, which is lack of knowledge about the future - i.e. if you had made a multi-purpose multi-client framework to start with, you might still be building things and rewriting them to fit even more unforeseen situations, instead of having live code running for 100 clients.
* If you want to copy a site 100 times, go for it. But I know very few that ever thought that was a great idea, and each time it was a very specific case. Yes there are client-specific features and multi-tenant sites, but that's not what he said. He said "copy vs. abstraction" which is the opposite of refactoring.
As an aside, I'm really having trouble understanding how people in the HN community could be thinking I'm wrong on this.
I think I need to go to a forum that's more grown-up if this is how things are here now.
I'm not sure about it applying only to web-developers. It might be more focused on application-developers, where business requirements hit the UI. Developers working on services, libraries, frameworks, and platforms have different engineering needs where some of these guidelines don't apply (and others might).
> but I guess that is what is meant by "modern software"
That's what is meant by "modern software" if you just read on-line blogs. On-line tends to focus around on-line (also, it's hipstery and hot, so it gets lots of attention).
That said, the article is rather general and definitely applies to desktop-app programming too.
I completely disagree - I've been developing software since before the web, and I labored under (and against) all of these syndromes in the "desktop software" days, too.
His examples may have been skewed towards webdev, but I've seen all these things outside of that sphere (95% of my career has been systems and scientific programming).
This is great. TL;DR of the whole thing? Don't do (almost) anything unnecessary ahead of time.
I have a small bone to pick with 4) Shallow wrappers. My design process involves ignoring existing solutions for problems and then finding things to match my desired design. Often, this requires absolutely no wrappers, sometimes it requires shallow wrappers, sometimes more involved wrappers, sometimes implementing it all myself.
I do agree that you should not blindly wrap anything.
All in all, a great article. I think this should be required reading for aspiring designers/architects.