A Taxonomy of Tech Debt (2018) (riotgames.com)
294 points by jakey_bakey 3 months ago | 75 comments



Contagion is exactly why interfaces are one of the most important pieces of design and should be given significant thought. A beautiful interface with a suboptimal implementation can be easily cleaned up when time is allotted. The reverse is rarely true.


I don't disagree but I think commonly you are missing one of two things that are necessary for a proper design:

1) time to design it 2) knowledge of exactly what it needs to do today and in a year

Sometimes you're missing both.

In which case I think you can prevent contagion from being too terrible by enforcing smaller modules and single responsibility in a compositional way. That doesn't require as much knowledge of the future or time, but just requires you to avoid high-surface-area interfaces that end up with lots of behavioral variants controlled via parameters in a nesting-doll style. Instead, move your config/parsing/behavioral decisions to the edges of your logic instead of letting them seep into all your underlying models too.
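To make "decisions at the edges" concrete, here is a minimal Python sketch. Everything in it (`ResizeOptions`, `parse_options`, `scale_to_fit`, the JSON keys) is a hypothetical illustration, not something from the article: parsing, defaults, and validation happen once at the boundary, and the core function only ever sees plain values.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ResizeOptions:
    """Plain, already-validated values; the core never sees raw config."""
    width: int
    height: int


def parse_options(raw: str) -> ResizeOptions:
    """Edge: all parsing, defaults, and validation happen here, once."""
    data = json.loads(raw)
    width = int(data.get("width", 100))
    height = int(data.get("height", width))  # assumed default: square box
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return ResizeOptions(width, height)


def scale_to_fit(src_w: int, src_h: int, opts: ResizeOptions) -> tuple[int, int]:
    """Core: one responsibility, no flags, no config lookups."""
    ratio = min(opts.width / src_w, opts.height / src_h)
    return max(1, round(src_w * ratio)), max(1, round(src_h * ratio))
```

With this split, `scale_to_fit(400, 100, parse_options('{"width": 200}'))` gives `(200, 50)`, and a new config source only touches `parse_options`.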


> In which case I think you can prevent contagion from being too terrible by enforcing smaller modules and single responsibility in a compositional way.

I would classify that as thoughtful interface design.


> 1) time to design it

Do good interfaces take more time than bad interfaces to write? Does adding more time really make interfaces better? I find that engineering quality (of which interface design is one facet) is largely a function of talent and experience. Time doesn't usually play a factor. Writing good code takes the same amount of time as writing bad code, for the most part.


Agree, but I’ve found designing robust, future proof interfaces to be one of the hardest problems in developing software. Even intentionally setting out to avoid tech debt at all costs, it’s just hard to do correctly. It requires more than technical bravado and architectural vision. It really does get into the realm of predicting the future.


> 15. (Shea's Law) The ability to improve a design occurs primarily at the interfaces. This is also the prime location for screwing it up.

https://spacecraft.ssl.umd.edu/akins_laws.html


It's important to accept that you will screw it up. Repeatedly. Interfaces have to be designed before you can start using them, which means that you will never have less information about how a module will be used than you do when you design its interface.

The best defense against this that I've found is to ensure, as much as possible, that interfaces can be replaced. The single responsibility and interface segregation principles can help here. Using small, focused interfaces and letting modules implement more than one of them makes it easier to use the strangler pattern to replace interfaces that no longer work well with new and improved ones.
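A rough Python sketch of the idea, with all names (`Reader`, `Writer`, `DictStore`, `MigratingReader`, `greet`) made up for illustration: two small interfaces, one legacy module that satisfies both, and a strangler-style facade that replaces only the read side while falling back to the old code.

```python
from typing import Protocol


class Reader(Protocol):
    def read(self, key: str) -> str: ...


class Writer(Protocol):
    def write(self, key: str, value: str) -> None: ...


class DictStore:
    """Legacy module: happens to satisfy both small interfaces."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def read(self, key: str) -> str:
        return self._data[key]

    def write(self, key: str, value: str) -> None:
        self._data[key] = value


class MigratingReader:
    """Strangler-style facade: serves migrated keys from the new data and
    falls back to the legacy reader for everything else."""

    def __init__(self, migrated: dict[str, str], legacy: Reader) -> None:
        self._migrated = migrated
        self._legacy = legacy

    def read(self, key: str) -> str:
        if key in self._migrated:
            return self._migrated[key]
        return self._legacy.read(key)


def greet(reader: Reader, key: str) -> str:
    """Caller depends on Reader alone, so it can be moved over on its own."""
    return "hello " + reader.read(key)
```

Because `greet` only asks for a `Reader`, it works unchanged with either `DictStore` or `MigratingReader`, and the write side can be replaced separately later.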

Also avoid temporal coupling as much as is feasible. Unnecessary statefulness is the easiest way to make this sort of thing harder than it needs to be.
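For anyone unsure what temporal coupling looks like in practice, a small hypothetical Python example (both classes are invented for this sketch): the first version only works if methods are called in the right order, the second cannot be constructed in a half-ready state.

```python
from dataclasses import dataclass


class CoupledReport:
    """Temporally coupled: render() only works after load() was called."""

    def __init__(self) -> None:
        self._rows = None

    def load(self, rows: list[int]) -> None:
        self._rows = rows

    def render(self) -> str:
        # Raises TypeError if load() was skipped, because _rows is still None.
        return f"total={sum(self._rows)}"


@dataclass(frozen=True)
class Report:
    """No hidden ordering: an instance cannot exist without its rows."""
    rows: tuple[int, ...]

    def render(self) -> str:
        return f"total={sum(self.rows)}"
```

The second shape is also easier to swap out later, since callers never depend on a call sequence.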


Mr Akin's gotchu, fam:

> 2. To design a spacecraft right takes an infinite amount of effort. This is why it's a good idea to design them to operate when some things are wrong.

> 3. Design is an iterative process. The necessary number of iterations is one more than the number you have currently done. This is true at any point in time.

> 4. Your best design efforts will inevitably wind up being useless in the final design. Learn to live with the disappointment.

Also 9, 10, 11, 12, 13, 14 and a bunch of the others apply too.


A good middle ground is modularize everything into stateless funcs where possible so it can be reassembled in different configurations without much stress.

An excellent interface will eventually be deformed beyond recognition chasing the architectural dragon; a well-crafted library will outlive the project.
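As a sketch of "stateless funcs reassembled in different configurations" (the helper `compose` and the three step functions are hypothetical, chosen only to keep the example tiny):

```python
from functools import reduce
from typing import Callable

Step = Callable[[str], str]


def compose(*steps: Step) -> Step:
    """Chain stateless steps left to right into a single function."""
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)


def strip(text: str) -> str:
    return text.strip()


def lower(text: str) -> str:
    return text.lower()


def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


# Same building blocks, two different configurations.
normalize_title = compose(strip, collapse_spaces)
normalize_key = compose(strip, lower, collapse_spaces)
```

None of the steps holds state, so adding a third pipeline does not require touching the first two.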


Look at how mathematicians build minimal yet complete definitions for inspiration. An algebraic system can be created with a set of operations such as multiplication and addition, and existing concepts can be mapped to this system, such as money, but the underlying algebraic system will never change. It is complete.

Much of the system can be complete like this with forethought. The pieces that cannot can be factored out to the edges.
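A small Python sketch of the money example, under my own assumptions (amounts stored as integer cents, a single currency per value, the names `Money` and `zero` invented here): the type exposes only addition and scaling, and the usual algebraic laws hold regardless of which features are built on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Amount in minor units (cents) of a single currency."""
    cents: int
    currency: str

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("cannot add different currencies")
        return Money(self.cents + other.cents, self.currency)

    def __mul__(self, factor: int) -> "Money":
        return Money(self.cents * factor, self.currency)


def zero(currency: str) -> Money:
    """Identity element for addition within one currency."""
    return Money(0, currency)
```

Anything that does not fit these laws, such as currency conversion or rounding policy, is the kind of piece that gets factored out to the edges.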


You’re not wrong in a theoretical sense, but building useful interfaces that your average dev can grok enough to build on top of requires higher level abstractions, approximations, and “reasonable defaults”. My experience is that only a small number of devs actually well understand the codebases they work in (and care enough to be thoughtful in interfacing with it).

The majority of devs generally are happy to tack on their features and PRs to whatever random scaffolding they can, without regard or awareness for how their individual component fits into the larger system, or how it may be extended. And to be honest it’s not necessarily a bad thing, because they do need to get work done, and merging PRs shouldn’t be reserved for the enlightened.

I guess I’m just pessimistic. The reason we don’t see perfect software is because we are not capable of producing it. At a certain point it all becomes spaghetti. If you work with software that isn’t spaghetti, it’s only because the people who care about it not becoming spaghetti haven’t left yet. This is good, but eventually they will leave, standards will decline, and you will become one with the pasta.


You're not too pessimistic yet. Projects that devolved into spaghetti paid their engineers roughly the same as they will pay new ones. Looking at the incentives, it's hard to take on the burden of undoing technical debt if your salary isn't going to change much. Businesses take advantage of passions to fix things like technical debt because they know they don't have to pay too much extra for it.


Keep fighting the good fight. It’s more satisfying, even if entropy inevitably wins ;)


> forethought

Forethought is only possible if people tell you the requirements precisely correctly upfront. Real systems design is you get 90% built and someone drops a hard requirement that's also a layering violation on you.


That is incredibly naive.

Math arises from first principles, human behavior does not.


Consider how often SQL, a hash implementation, a data compression algorithm, or a standard library changes. Not often, because they are complete systems. If you don’t like them, you don’t change them- you switch to another system. But they can support an infinite variety of use cases. Hopefully that clears it up for you.


One must understand what a good underlying implementation looks like to expose a good interface. It's easy to implicitly bake stupid implementations into an interface in a way that cannot be fixed by just changing the implementation. The example that comes to mind is sorting and paging behavior. Junior devs, and many seniors that should know better by now, OFTEN start with requests that use some variant of limit/offset parameters for paging, which leads to terrible performance issues and anomalous behavior. How paging works efficiently and what sorting options can be supported with good performance is inherently coupled to the shape of your data and your choice of datastore. People that haven't been through this exercise at a lower layer have little chance of shaping the higher level interface appropriately unless they put work into the implementation up front.
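To illustrate the paging point, a self-contained Python/SQLite sketch. The table and function names are made up, and keyset (cursor) paging is just one commonly used alternative, not necessarily what the comment above has in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 11)],
)


def page_by_offset(offset: int, limit: int) -> list[tuple[int, str]]:
    """Offset paging: the position is a row count, so it shifts when rows
    are inserted or deleted, and large offsets still have to be skipped."""
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def page_after(last_id: int, limit: int) -> list[tuple[int, str]]:
    """Keyset paging: the position is the last key seen, so the query can
    seek via the primary key and is stable under deletes."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()
```

If a client has read ids 1 to 3 and row 2 is then deleted, `page_by_offset(3, 3)` returns ids 5, 6, 7 and silently skips 4, while `page_after(3, 3)` returns 4, 5, 6. The interface difference is the point: one exposes `offset`, the other exposes an opaque "last seen" cursor, and you cannot switch between them without changing callers.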


Another example is synchronous vs asynchronous I/O.
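A tiny Python sketch of why this is an interface decision and not an implementation detail (all four functions are hypothetical, and `asyncio.sleep(0)` merely stands in for real non-blocking I/O):

```python
import asyncio


def load_sync(key: str) -> str:
    """Synchronous signature: callers get the value directly."""
    return f"value-for-{key}"


async def load_async(key: str) -> str:
    """Asynchronous signature: callers get an awaitable instead."""
    await asyncio.sleep(0)  # stand-in for real non-blocking I/O
    return f"value-for-{key}"


def render_sync(key: str) -> str:
    return "<" + load_sync(key) + ">"


async def render_async(key: str) -> str:
    # The caller has to become async too; the choice leaks upward.
    return "<" + await load_async(key) + ">"
```

Switching `load_sync` to `load_async` later forces a signature change on `render_sync` and on everything above it, which is exactly the kind of thing a nicer implementation cannot hide.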


Which is why I like languages that make interfaces very explicit, like OCaml or Ada. Most of the time, I don't want to see the implementation, just a properly documented interface. If people can't describe in simple terms the behavior of an interface, something is wrong.


History seems somewhat full of counterexamples, though? QWERTY is rather famous for not being an optimal physical interface. Steering wheels would probably be up there?

In computers, you have x86 being the poster child of ostensibly suboptimal interfaces.


QWERTY proves GP's point: it's a suboptimal interface and basically impossible to get rid of now, even though we've moved the underlying implementation from typewriters to computer keyboards to touchscreens.


Actually, I wonder if QWERTY is actually better for touchscreens than alternative layouts. In a better layout like Dvorak, the most commonly-used keys are grouped close together, mostly on the home row. This is great for typing because you don't have to move your hands and fingers as much and can reduce wrist strain. But QWERTY does the opposite, moving all the most-used keys to the non-home rows so you have to constantly move between the top and bottom rows. On a computer keyboard, this gives you RSI, but on a small touchscreen, this means the "keys" you're tapping on are generally farther from each other, so perhaps it makes it easier since you rarely tap on two keys that are adjacent.


Amusingly, I switched my phone to colemak. Mainly as I am just happy with the layout. Though, I confess I hate inputting anything on my phone, as I am not a good phone typist. I can almost make the swipe thing work, but I learned how to touch type on a keyboard and it feels very very weird to try and use a phone's keyboard.


Ah, if the point is just that you can't get rid of bad interfaces, I suppose that works. I was taking it more as a systemic problem caused by bad interfaces. Which is to say, I'd be hesitant to cede that this has caused any actual problems.

Would be like complaining that AC is being superseded by DC and how this is proof of an early choice locking us into a bad choice. But it ignores all of the progress made in the interim. And the odd reality that enough effort can migrate anything. It just takes a lot of effort. And we are often quite willing to throw effort at things.


But contagion deforms interfaces. It's that moment in discussions where everyone moves away from how it ought to be, toward how we must implement it because of the previously existing modules. That is when you learn about it.


> A beautiful interface with a suboptimal implementation can be easily cleaned up when time is allotted.

That won't happen. Why toy around with your ticket database like that? Just close it to WONTFIX.


I gotta say, it's pretty amazing to me that this was written by an engineering manager. None of the EMs I've worked with would be capable of discussing our codebase at this level of technical detail. Even the ones that used to be engineers.

Although to be fair, we don't have any EMs who were promoted from within. We have a bad habit of hiring managers from outside, as nobody internally really wants to stop doing engineering (myself included).


It seems to be missing the most common type of debt I've seen:

Founder's debt.

This was debt that was created by the founders to get the fast, good value tech out the door. Low hanging fruit that ends up being the foundation of the whole shebang.

The founding documents of many countries fall into this category lol (but not USA! USA! USA!)

MacGyver debt and foundational debt come closest, but neither quite outlines this phenomenon.


Great article, from a technical perspective! I would say it’s more a “nomenclature” than a “taxonomy” because it’s neither exhaustive nor discrete (by design), but I might be mistaken there. I loved the physical examples for each especially, really thought provoking.

As always, I have a philosophical nit to pick: the “three axes” introduced at the top are just “Return” and “Investment” from good ol’ RoI, with a subcategory added for a particular type of forward-looking/conditional Return. I’m guessing this decision has worked in practice and I don’t expect video game development practices to be absolutely scientifically sound, but some extra philosophical certainty never hurts!


Discussed at the time:

A Taxonomy of Technical Debt - https://news.ycombinator.com/item?id=16810092 - April 2018 (113 comments)

also this bit:

A Taxonomy of Tech Debt (2018) - https://news.ycombinator.com/item?id=39782923 - March 2024 (1 comment)


> I define tech debt as code or data that future developers will pay a cost for.

One of the best descriptions I’ve encountered.

As in all debt, there’s a “threshold” that should be applied, at the time the debt is incurred, which balances the immediate needs, against the future costs. I feel that most people (not just developers) amplify the immediate, and deprecate the future costs.

For myself, I have an almost pathological aversion to debt, of any kind. I will spend an extra day, factoring out stuff that might be useful, in the future. I seem to be right, about 50% of the time. That said, every time I do something like that, I reinforce habit, which accelerates my basic workflow.


I have worked in 3 "startups" now, only coming in after they had started making enough revenue to pay normal-ish salaries. The thing I have seen the most is that several of the founders have a blurry concept of which ideas they merely had, what was actually built, and which parts of what was implemented actually work.


I've used Contagion to describe tech debt ever since I first read this article. Does a great job.


I'm not sure I'd even call "local debt" technical debt in ordinary circumstances - realistically there's always going to be mess somewhere, and encapsulating it away where it can't hurt anyone is normal. If it probably never needs to change unless requirements change (in which case any other implementation would also need to) it's fine.

Perhaps if 24 minion instances constitute an actual problem (rather than just inelegance), their example for it is actually foundational debt, related to having a "minion" be the simplest primitive that would do the job when maybe something lighter could have existed.


The article goes into it a tiny bit, but the cost is the mental cost of when you do need to work on it, understanding it, and I would add keeping the tooling the same.

Encouraging devs to have their changes include all modules, even those that are old and mature and don't need to be touched, is a good way of ensuring this doesn't build up to where it becomes a problem.


One important aspect is when you knowingly take on tech debt in return for some short-term benefit. Then this benefit becomes another axis to weigh against.


Just like real debt. Want a new building now to get work done, not in 15 years when you have the capital? Take out debt, baby!

It's a tool, but a powerful and dangerous tool, and if you don't acknowledge you're using it and respect it, it'll hurt you. Or it'll hurt someone who accepts the grenade from you. Just like real debt.


And just like real debt, some tech debt has higher compounding effects than others. (Consider this fix cost and impact in the author's framework.)


Yep. Missing a deadline because your debt kicked off a death spiral is Jimmy the Facestabber coming looking for his vig at 25% a week and breaking your knees. Gently pushing back some nice-to-have features next release while you deal with the debt is "only booked a 4 star hotel because the mortgage was paid first that month".


I've heard this called "tactical debt" instead of "technical debt".


Typically speed.


Great article. The "contagion" factor is a useful concept that I hadn't seen before. Needs a [2018] tag.


My experience at big corporate is that (edit: unmanageable) tech debt is caused by an undisciplined and unorganized scrum team.

When you have a proper backlog of tickets, including tech debt tickets, the team will eventually fix the tech debt when there are not enough feature tickets to exhaust capacity.


> the team will eventually fix the tech debt when there are not enough feature tickets to exhaust capacity

I have yet to visit this mysterious universe you describe.


> I have yet to visit this mysterious universe you describe.

The trick is to have 1 backlog. Tech debt and features live on the same list and it is up to the PM to prioritize. Engineering’s job is to argue cost.

Good PMs will prioritize relevant tech debt or pull it in with feature work in the same area. They understand the tradeoff of go slow to go fast. They also understand when tech debt will never become relevant (because the feature is getting nixed, or hasn’t shown desired impact yet, or because the cost of interest is waaaay lower than the cost of paying it off in many cases).

This only works when engineers have the discipline to look stinky awful code in the eye and say “not today” and stay within agreed timeboxes. You blow this estimate once or twice, get the PM in hot water with leadership, and you’ve lost the trust.


All of the teams I've been on have used one list. I've never seen a PM prioritize the technical work. I still think it's a good idea for it to be one list, but it's not sufficient.

For teams that don't have a good PM, you also need a tech champion. Failing that, engineers need to inflate estimates and do tech work under other stories. Then everything becomes less predictable and teams never develop trust.


All PMs I have seen so far were just passing on management’s desire for more features quickly. The only approach I have seen work is if engineering adds refactoring as part of the normal work that needs to be done without asking for permission.


That's the practical advice to engineers who are stuck in a dysfunctional organization where they can't really effect change, which is probably 90%+ of all organizations.


> For teams that don't have a good PM, you also need a tech champion

Yes. And to add some nuance, you need a [trusted] engineer who can say “This will take 3 weeks because of tech debt items A, B, C. We can fix those in 1 week and then take 1 week to implement this. How would you like to proceed?”

Any decent PM will take the 2 week option that also cleans up the codebase.

But if fixing the tech debt would take 3 weeks and then another 2 weeks to build the feature, then any decent PM will take the option that doesn’t fix tech debt unless there’s a bunch more stuff coming in this area in which case taking 3 weeks to fix stuff is totally worth it.

Their job is to make those tradeoffs. Our job is to highlight the tradeoffs they’re making so they can make informed decisions.


> “This will take 3 weeks because of tech debt items A, B, C. We can fix those in 1 week and then take 1 week to implement this. How would you like to proceed?”

I've experienced something like this, but only on a project that mostly had the original team that built it (including me) still working on it. We were able to keep things in check, and in the above case would just do it that way without really asking.

On many other projects I've been involved in, there's years of tech debt that has accumulated: the typical retrospectively incorrect design decision, followed by layers and layers of band-aids, each time making the real fix more complicated and a bigger scope.

These things undoubtedly increase the cost of everything else, but it's really hard to articulate. The fixes take weeks, the break-even won't come until months later, the long-term team members are a mix of skeptical and defensive of their work (eg: don't want to do the real fix). In some cases, there's a war story "we heard that about x, but that caused so many bugs we had to revert and abandon it, why is this going to be different?"

Any tips for anyone working in this environment?


Those are easy calls for which everyone's incentives are aligned.

The problems come from the calls where personal incentives are not aligned. A typical example - the team builds a feature hidden by a feature toggle which is, after a period of A/B testing, enabled globally on the product.

The existence of the feature toggle raises the complexity of the code - let's say it's used in 10 different places, each of those double the amount of possible code paths. Removing it may be a question of a couple of hours of work and is very clearly work paying for itself in the long term, but PM will not schedule this work, because there's no immediate upside for them personally and the cost of keeping the toggle in code is a long term one, spread over the whole organization.

In other words, PM is more likely to get a bonus by slashing work on such tech debt items (and thus them personally delivering the features faster) rather than punished for keeping the toggles/complexity behind.


I agree with you completely that you need trust.

> Our job is to highlight the tradeoffs they’re making so they can make informed decisions.

This is an oft-stated thing that I oft-disagree with. It states that engineers ought to be subordinate to PMs, which shouldn't always be the case.

If you have shit engineers and great PMs, the best outcome is likely to shift decision making to PMs. If you have great engineers and shit PMs, decision making should shift towards engineers.

If they are both equivalently shit or great, it should be a balance. I believe this is the most likely scenario. I believe that balance is thrown out the window if engineers "highlight the tradeoffs" while the actual decision making lies with the PMs.

How to actually achieve balance is extremely idiosyncratic to the team and organization. It's hard to get people to have adult, non-confrontational discussions about this sort of thing, however. Too many people will treat it as a negotiation.


> This is an oft-stated thing that I oft-disagree with. It states that engineers ought to be subordinate to PMs, which shouldn't always be the case.

I think of it more as a partnership.

If I’m in charge of getting groceries and you’re in charge of budgets, we need to have an informed discussion on what exactly is our budget and what food we need so we don’t starve. Sure I could blow the whole budget on steak and I might even love eating nothing but steak for 3 days, but eventually some carbs would be nice. Likewise neither of us will be happy if I go max stingy and buy nothing but bags of rice for the week.

The reason I think PMs should make the final call is not that engineers are subordinate, it’s that PMs are accountable. (RACI – responsible, accountable, consulted, informed). The person whose ass is on the line makes the call.

Usually when I ask engineers if they want to be accountable for making the call (and its outcome), things get real quiet real fast :)


If PMs are accountable, then I'm with you. Decision making should lie with those accountable.

From what I've seen, accountability doesn't mean much. Could be the places I've worked. Poor PMs get promoted despite running projects into the ground, good engineers get held back despite pushing through adverse project plans, vice versa.


I remember working in a team where the backlog was controlled by the PM and he created a separate backlog that developers got to use - unsurprisingly, pretty much nothing ever got moved out of the separate backlog.


> For teams that don't have a good PM, you also need a tech champion.

That's part of the role of a Technical Program Manager. The Eng Manager, Product Manager, and TPM should form a holy trinity of mutual support, filling in for each other's gaps. When that happens, you get much better odds of having a high performing team.

Source: I've been both an engineering manager and a TPM. Never the PM, though.


Perhaps that can work, but I'm skeptical whenever the solution is "another manager".


> it is up to the PM to prioritize. Engineering’s job is to argue cost.

That’s a lot of words to say “more features lol” which is basically what every PM I’ve worked with has only wanted.


That just lets the lazy devs scapegoat the PM for “not letting them” work on the tech debt.

Most people don’t want to work on it. That’s why there is so much. Generating it is like eating candy. It’s unhealthy but you just want to have something sweet right now and the bowl is in reach…


Hmmm, not sure I buy the argument.

Most co-workers I’ve had would have _loved_ to fix the shortcuts and hacks that were done to meet deadlines, but were never given the time. “Refactor while you do new features” works sometimes, but doesn’t work on anything larger scale - e.g. if your overall architecture is collapsing under its own weight, it’s hard to “sneak in” the sort of major work you need to do to fix it.


Oh there’s always a few of those, but then there’s the corner cutters who make more of the mess than everyone else, and few impostors who fade into the bushes when there’s a gap in the schedule.


Well, sometimes they want fewer bugs!


A good PM will understand that to get to C, we need to build and support A + B before we can build C, and plan for this. Like, if we built B to be a terrible barely working mess, they understand that this will make C basically worthless. But in my experience, this ability is surprisingly rare.


You need a smart PM who works closely with the CTO to craft the narrative to sales that the next critical feature milestone is gated behind fixing said tech debt...


I and one, maybe two other coworkers will fix some of the tech debt while everyone else tries to avoid making eye contact, and we fantasize about a world where voodoo dolls actually work.


Me too. I have never seen this world.


Agile is perfectly optimised for creating tech debt. Corporate software is almost always impossible to change once released, so it’s obvious that frequent iterative deliverables that you can only code around or on top of propagate technical debt.


Waterfall has time to bury the debt and let the grass grow over the crime scene before people come asking questions.

This is dev culture not agile culture.


“Not enough feature tickets to exhaust capacity” - I don’t think I’ve ever seen this happen :) PMs and sales always manage to book all available capacity.


I’ve done it once or twice. One particular time there was a lot of hand wringing about how there was nothing to work on. I about saw red. Tech debt and bugs. That’s what you work on.

That incident really changed my perspective on people who talk about how tech debt is bad. Some of them will roll up their sleeves, but some just want to look high minded without putting in the effort.


If tech debt depended on some kind of methodology, it would not pop up with XP/Kanban/waterfall as well.

Tech debt can even pop up in unorganized, slow-moving open source software.


> My experience at big corporate is that (edit: unmanageable) tech debt is caused by undisciplined and unorganized scrum team.

Yeah, this is 100% correct. I comically left Riot after ~6 months for this exact reason. Obviously it's a large company with many different flavors of teams, and it sounds like this team maybe has gotten it together, but by and large most haven't.

While I was there I was working on some of their core games tooling and felt uneasy about my day-to-day. My team's tech debt was quite literally owning them. Constantly missing sprint scopes, spending countless hours arguing and debating about trivial stuff, it was all a mess. They ended up laying off a number of people from that team in a pretty shifty manner so maybe things have gotten better since then.


What was the rough team composition?


~20 engineers, ~3 people managers. From what I recall the team had high attrition and shuffled through a number of people managers. When I joined 1 manager was new hire, 1 manager was new-ish hire, 1 manager was fairly seasoned at Riot and had a "good reputation". Was still a total mess.


> not enough feature tickets to exhaust capacity

This puts you at grave risk of redundancies.



