> I'm really pissed that technical debt is considered as "Hey the dev guys are complaining again".
That's because it's very opaque to anyone other than the engineers working on a project.
I've had limited success by making it more transparent. Signaling every time a feature would take longer because of a piece of technical debt the team wanted to fix got the fix prioritized before we implemented the 4th and 5th affected features.
Don't the bean counters at Ford Motor Company (for example) nark on the assembly line workers, industrial engineers, and QA/QC folks when they have work piling up, broken machines lying around, and uncleaned trash?
It's risk/reward to the people who want to decide how their money is spent, isn't it?
In your example, the worst-case scenario is that someone could die, and that tends to spur on investors to discover the probity within themselves to spend some money avoiding an expensive lawsuit.
But when the devs are complaining about the old code being terrible and making their lives hard, to management it never seems to hinder them that much. They keep banging out new features and fixing bugs, and nothing bad seems to happen. But the drip-drip-drip of bugs keeps increasing, and the new features take a little longer each time, and nobody dies at least, but the thing becomes a haunted moneypit that nobody wants to touch, and you're stuck with it now unless you rewrite it all at huge expense, etc., etc.
Maybe everyone should just treat a piece of software as they would a life. I bet we've all seen some codebases where if it were a friend, you probably would have staged an intervention by now. Your software baby needs absolute care from the get-go until the very end, or it will get sick and probably die, and most likely in a very prolonged and painful way.
The place I used to work at has been hiring (junior) people like crazy. Part of the reason they need so many is the crushing foundational technical debt at the core. When they hired someone capable of improving that, they were unable to merge the changes due to fear, and management couldn't see the business value of doing so. They've had a few nasty outages recently too. I believe the insides of the Atlassian kit are similarly riddled with technical debt.
An important difference is that in your Ford example, you can just throw new people at the problem, while in software it generally needs to be handled within the responsible team.
I’ve found it helps to measure “how long does it take to get a thing of x size done” - if you can measure the results of your improvement (like how long it takes to get a new design implemented) it’s an easier sell. Eventually, it becomes known throughout the company that things are going faster, regardless of the metric results. Of course, if you go around making high-risk changes for low reward, they’ll see the artifacts: more bugs and less system reliability.
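For what it's worth, here is a minimal sketch of that kind of measurement: the median number of days to finish a task of a given size, before and after a cleanup. The task records, field names, and dates are all made up for illustration; in practice they would come from whatever tracker the team uses.

```javascript
// Hypothetical completed tasks; "phase" marks whether the task was
// finished before or after the refactoring work landed.
const tasks = [
  { size: "M", phase: "before", started: "2023-01-02", finished: "2023-01-12" },
  { size: "M", phase: "before", started: "2023-01-16", finished: "2023-01-30" },
  { size: "M", phase: "before", started: "2023-02-01", finished: "2023-02-13" },
  { size: "M", phase: "after", started: "2023-04-03", finished: "2023-04-09" },
  { size: "M", phase: "after", started: "2023-04-10", finished: "2023-04-18" },
  { size: "M", phase: "after", started: "2023-04-20", finished: "2023-04-27" },
];

// Whole days between two ISO date strings (date-only strings parse as UTC).
function daysBetween(start, end) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return (new Date(end) - new Date(start)) / msPerDay;
}

// Median of a non-empty array of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median cycle time in days for tasks of one size in one phase.
function medianCycleTime(allTasks, size, phase) {
  const durations = allTasks
    .filter((t) => t.size === size && t.phase === phase)
    .map((t) => daysBetween(t.started, t.finished));
  return median(durations);
}

console.log("before:", medianCycleTime(tasks, "M", "before"), "days");
console.log("after:", medianCycleTime(tasks, "M", "after"), "days");
```

The median is used rather than the mean so that one task that got stuck for unrelated reasons doesn't swamp the comparison.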
I've had some success building technical debt cleanup into my estimates. While I'm working on a new feature or a bug, I'll tidy up in the area around it - the tidying up is just part of the work necessary to complete the task.
The really cool thing is that eventually you're able to deliver large, complex tasks in very brief times and then spring on the PM/management that you're able to do this _because_ you've been refactoring. That's made a believer out of at least one of my PMs.
Obviously this doesn't work in all circumstances - it's not always feasible to get the really systemic, contagious debt cleaned up as part of feature work, and if the PM catches on then it makes this tactic difficult to continue.
The bigger obstacle I've had, though, is other developers who haven't fully bought into a culture of continuous improvement. Fear of breakages causes refactor paralysis, which makes it easier to break things when working on them, which increases fear, and so forth. I'm not really sure of the best way to deal with that aside from adding a bunch of unit tests (which I still sometimes get pushback on).
In this case, it was a JavaScript front-end that previously had no unit tests. I also wasn't able to get NPM through the firewall, so I used Jasmine standalone and kept its files and copies of our third-party framework files in a "Frameworks" folder within a separate "Test" folder.
The pushback I received was that keeping the framework code in Source Control would result in it being caught in the JS build/minification script, as well as my spec files. The individual that pushed back was also concerned about JS exceptions since we were up against a release, which speaks to a need for training about how unit test files work. Ultimately I .gitignored the framework folder but wouldn't budge on leaving the test files in, since .gitignoring unit tests defeats the purpose. Then I learned that the build script wouldn't grab those files anyway. :)
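To make the "how unit test files work" point concrete, here is a rough sketch of what such a spec file can look like. The `slugify` function is a made-up stand-in for real front-end code, not anything from that project. `describe`, `it`, and `expect` are the globals Jasmine provides once its files are loaded by the standalone spec runner page; the `typeof` guard is only there so the snippet can also be pasted into plain Node without Jasmine.

```javascript
// Hypothetical function under test: turns a title into a URL slug.
function slugify(title) {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into "-"
    .replace(/^-+|-+$/g, ""); // drop leading and trailing dashes
}

// Spec: only registered when Jasmine's globals are present.
if (typeof describe === "function") {
  describe("slugify", function () {
    it("lowercases and hyphenates words", function () {
      expect(slugify("Hello World")).toBe("hello-world");
    });

    it("strips punctuation and surrounding whitespace", function () {
      expect(slugify("  Hello, World!  ")).toBe("hello-world");
    });

    it("returns an empty string for empty input", function () {
      expect(slugify("")).toBe("");
    });
  });
}
```

Nothing in the spec runs unless the test runner loads it, so a spec file sitting in the repository cannot throw exceptions in the shipped application.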
My boss at my last job had the mindset of "refactoring only makes it different, not better". I asked him if I could spend some time refactoring our build system. He said no. I eventually did it anyway a few months later, spotted a bug due to the changes, and all of a sudden, build times were cut in half, or by a factor of 10 in many instances.
Same story for a pretty nasty hunk of code we had for handling sparse arrays. Asked if I could refactor, got told no, did it anyway a while later, and all of a sudden a problem that had been considered borderline infeasible takes like 1 day of work.
Refactoring isn't always the right decision; a good boss/lead needs to carefully weigh the pros & cons.
There is always some risk that refactoring makes code not only different, but worse. Corner-cases are often there for a reason, and refactoring sometimes misses them, especially when there isn't complete unit test coverage. Since it's often easier to get the core logic right, this likely leads to issues that are discovered in production.
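One way to reduce that risk when coverage is thin is to pin down the old behaviour before touching it, by running the old and new versions side by side over a set of inputs that includes the odd cases. A minimal sketch follows; the shipping functions and their rules are invented purely for illustration, and the refactored version deliberately forgets one corner case.

```javascript
// Hypothetical legacy code with a corner case buried in it.
function legacyShipping(weightKg) {
  if (weightKg <= 0) return 0; // corner case: nothing to ship, no charge
  if (weightKg < 1) return 5;
  return 5 + Math.ceil(weightKg - 1) * 2;
}

// Hypothetical refactor: core logic is right, the corner case is gone.
function refactoredShipping(weightKg) {
  return 5 + Math.max(0, Math.ceil(weightKg - 1)) * 2;
}

// Run both versions over the same inputs and collect every disagreement.
function findMismatches(legacy, candidate, inputs) {
  const mismatches = [];
  for (const input of inputs) {
    const expected = legacy(input);
    const actual = candidate(input);
    if (expected !== actual) {
      mismatches.push({ input, expected, actual });
    }
  }
  return mismatches;
}

// Include boundary values, not just typical ones.
const inputs = [0, 0.5, 1, 2.5, 10];
const mismatches = findMismatches(legacyShipping, refactoredShipping, inputs);
console.log(mismatches);
```

The comparison only catches what the input list covers, so it is worth seeding it with zero, negative, boundary, and empty values rather than only the happy path.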
There is no "right" thing. There may be an optimal thing from the development perspective and an optimal thing from the business perspective. Since the two pieces cannot exist without each other, both parties have to communicate effectively and trust each other to find the optimal decision for the combined problem space which may be sub-optimal when considered separately.
> If boss can't trust the minions to do the right thing, someone's got the wrong job.
There are many people who have wrong jobs.
More importantly, there are many people who are good, but not perfect. They do some aspects of their work very well and other aspects less well. A good boss has some idea about that and is able to work with people who are not super great.
Last but not least, even very good people often disagree about many things, including whether refactoring is needed and what kind of refactoring to do. Even if the boss trusted everyone and listened to everyone, he would still be told plenty of contradictory opinions.
The only time my improvements have even been noticed was when the pointy-haired boss said "Well, you should have thought of that sooner. What am I paying you for?"
One of the things that I almost always insist on in a dev/PM feedback cycle is the concept of "chores." The Devs (usually via eng lead) get to schedule chores in the backlog, full stop. PM can have a convo with eng lead to say "hey, will this chore take a super long time? Can you possibly reschedule it?" but if it's work that is a pure refactor (no product implications) the PM doesn't get to block it, period.
Of course, this only works well on teams where your PM and eng lead don't have a fundamentally adversarial relationship. I like to think this is most teams, but it does take some getting used to in terms of eng lead and PM communicating priorities and needs, balancing product momentum against code quality.
That's how I do it. It took some time to build up the trust relationship, but most of the time, our stakeholders and I can keep a good balance of maintenance and features. And this balance doesn't have to be rigid. I want my maintenance tasks done, but it's fine to prioritize deliverables for a sprint or two - we'll have a sprint or two of maintenance then. And that might be fine, or even beneficial, because then you have a bigger block of time to do some bigger cleanup tasks.
On most of my projects we have enlightened PMs who make allowances for paying down tech debt. For example, on my most recent rotation (an RoR app front-end to manage cloud orchestration software), the PM and tech lead worked out an arrangement where, for the four weeks following "feature freeze", half the dev time was spent paying down tech debt and other chores (the other half was spent fixing bugs).
Only once did we have every Friday set aside to improve the codebase. Two months later it became every second Friday, though.