Not all bugs are worth fixing and that's okay (bugsnag.com)
87 points by kinbiko on July 25, 2018 | 61 comments



I have a funny feeling that the nontechnical people on my current project would be nodding their heads along to the article, but the truth is that our applications have bugs that 100% of our customers are running into; they simply aren't immediately noticeable to a layperson. That doesn't mean they're not important. The business relies on complying with the rules of third-party organizations and the software is blatantly violating those rules right now. The clients are aware but still choose to prioritize new features over fixing the bugs that are creating these compliance issues. If we're caught out by those third parties before the bugs are fixed, there's a good chance it could sink the whole company. But our users don't "see" these bugs so they're not considered a priority over new products, new features, or anything else marketing might want.

I looked up Pinedo's background and she's not a developer; she's a social media manager. This is kind of what I figured because her perspective on development seemed really out of whack to me. There are many kinds of bugs that can't be measured with a simple stability calculation, and IME there are definitely error states that are worse than death (crashes). Plus 97% of teams are definitely not following agile principles. Every dev team I've ever been on said it was agile, and most of them just meant that there was a kanban board and something that vaguely approximated a sprint.


One of the toughest things I struggled with while transitioning from a larval junior developer to a senior tech lead to a project manager was the fact that (at least in the context of a for-profit business) not all bugs need to be fixed, even the ones you personally think are really really bad. The goal is to make money, not necessarily by producing the most perfect software. The quality bar needs to be high, but there is always a point beyond which the returns for fixing a bug are outweighed by the costs of fixing it: the direct engineering cost, the opportunity cost of not working on a feature, the cost of missing a deadline and not releasing in time for Christmas, the cost associated with the extra risk you’re taking by making a late change, etc. Good places judge all these costs, and the best ones have formal processes for judging lots of bugs at scale and constantly re-evaluate whether a bug should be fixed at this point in the project or not. Sometimes the clear right answer is “no.”


That's fair, but one has to remember that these points have to be balanced against each other, and that, like you said, "the quality bar needs to be high".

And I'm more for prioritizing not introducing new bugs over fixing all the old ones. Which is challenging on, what should we call it?, "legacy" software. So that priority can and must be reversed temporarily when that "legacy" weighs too heavily.

So it's all very context dependent, and not having anybody (or too few people) working on making things better when it's needed is not going to deliver any kind of velocity in the long term (and in those cases the short-term velocity is probably already way too low). Too bad for the mythical time to market...

So you have to be able to say no to bugfixes, but you certainly also have to be able to say no to the eternal rush of new half-baked features when needed. A perpetual short-term obsession with the "opportunity cost of not working on a feature" can yield quite paradoxical results if you're trying to build those features on some kind of zombie legacy code (code that is only ever edited with disgust and great difficulty, but never seriously refactored).

Not only is this balance hard to achieve, but your role as a senior tech lead or project manager is certainly to consider the cleanup needs carefully, and to be an advocate for them when needed, including by pushing back against feature-creep pressure. Because if you aren't, most of the time nobody else will be. As a tech lead, this means, among other things, that a black-box approach to parts of the maintained software is out of the question (of course you can delegate, but even then it's imperative to stay in the equation for that purpose, only with fewer details). Paradoxically, even if the quality is poor and the organization notices and tracks loads of bugs, most people will be happy the moment the bugs are triaged, assigned, and eventually "fixed" with more horrible garbage (that is, the impression that something is being done), rather than doing the right thing, which is to organize a deeper cleanup of the software.

I've got the impression that it is rare to find projects where this balance is achieved correctly, but maybe that's just bad luck. In lots of cases, the famous projects (I'm thinking at the level of Linux, Firefox, Python, etc., not just your random niche software) are actually not that bad, and their less balanced competitors have a far shorter lifespan...


This is my experience as well. Product progress is prioritized over code quality, and the ultimate cost to velocity is real but often unacknowledged. Bugs go unfixed because they are very hard to fix, and this high cost is taken for granted.

It’s definitely on the technical leadership to stand up for code cleanliness and push back against product, and on the higher-ups to recognize the importance of this dynamic.


    One of the toughest things I struggled with while transitioning from a 
    larval junior developer to a senior tech lead to a project manager was the 
    fact that (at least in the context of a for-profit business) not all bugs 
    need to be fixed, even the ones you personally think are really really 
    bad. The goal is to make money, not necessarily by producing the most 
    perfect software.
On the other hand, you have to remember that impact = risk x loss. And developers and managers are notoriously bad at evaluating the risk posed by a bug. It does no good for software to make $10 million a year for 5 years and then take a $100 million loss in the sixth, because of a catastrophic bug that no one prioritized fixing because, "It's been 5 years and no one's run into this bug yet."
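
To make that concrete, a rough sketch in Python (the revenue and loss figures are the hypothetical ones above; the yearly probability is my made-up assumption, and estimating it is exactly the part we're bad at):

    annual_revenue = 10_000_000         # the software makes $10M/year
    catastrophic_loss = 100_000_000     # the one bad bug, if it ever fires
    p_per_year = 0.15                   # assumed yearly chance of it firing

    expected_annual_loss = p_per_year * catastrophic_loss
    print(f"expected annual loss: ${expected_annual_loss:,.0f}")
    # Even at a 15% yearly chance, the expected loss ($15,000,000) exceeds
    # the revenue ($10,000,000). "No one's run into it yet" is not a risk estimate.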


> developers and managers are notoriously bad at

So, basically, everybody is bad at it.


Totally understand that a project can't be perfect, and I could be wrong, but I feel that if there are a lot of bugs, the project scope is too big and/or the requirements are not well understood.

There is also the matter of really picky customers; however, if a customer can articulate what you are doing wrong, I don't think that it's a bad thing if they are picky.

Quality should be in step with expectations; selling services that can't be achieved within a specific time slot is worse than trying to fix all of the bugs.


> [...] I feel that if there are a lot of bugs, the project scope is too big and/or the requirements are not well understood.

Indeed. And I'm under the impression that the project's take on fixing issues (not just bugs!) is a central part of that, which goes against OP's argument somewhat:

Scope/requirements are usually not out of whack because nobody thought about them when the project started (that's a whole other world of bad project management). The fact is that we're not good at all at judging how the scope and the requirements will change in the future. When (not if!) that eventually happens, you cannot just go adding random changes and edge cases forever, or you'll end up with a horrible Rube Goldberg machine no one ever has a chance of understanding. You'll have to consistently monitor the sanity and accuracy of your model and either choose to limit the scope of the thing you're building or to correct the model for those edge cases you haven't considered, lest you monkeypatch yourself into a corner.

Now, those bugs that cost so much to fix are usually those where your model breaks down. And that, in turn, is where it's necessary to regularly step back, find out which part of the model doesn't fit reality anymore, and figure out how you can fix it. Then you can make one of the two decisions above. I've seen my fair share of code in projects where that wasn't done properly, and sometimes only ignorance on the management side can explain the lack of panic regarding that code.

> Quality should be in step with expectations; selling services that can't be achieved within a specific time slot is worse than trying to fix all of the bugs.

Well said.


I agree, but I think you still need to be deliberate about not fixing bugs. Sure, don't fix it, but create a ticket for it and use it to track how common it actually is.

You should also probably be fixing the bugs where, as you say, "if we're caught out by those third parties before the bugs are fixed, there's a good chance it could sink the whole company."


Heads up - I co-wrote this article with Kristine, and it's based on the central thesis behind our company's existence - I'm a software developer by trade with 15 years of experience. Having said that, I think dismissing the content based on the author comes across as an ad-hominem attack - after working with her for 4 years I count Kristine as an expert on this matter.

The overall point here isn't to use this as an excuse to not fix bugs; it's more that you should consider applying the same logic that devops/SRE teams use for "uptime and availability" (five nines, etc.) to software stability, to help you move faster as a company.
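
To make that concrete, here's a minimal sketch of borrowing the "nines" framing for stability, treating crash-free sessions like uptime (my own illustration with assumed traffic numbers, not Bugsnag's actual formula):

    def crash_budget(sessions: int, nines: int) -> int:
        """Allowed crashing sessions for a crash-free target like 99.9% (three nines)."""
        return round(sessions * 10 ** -nines)

    monthly_sessions = 2_000_000        # assumed traffic
    for nines in (2, 3, 4):
        budget = crash_budget(monthly_sessions, nines)
        print(f"{nines} nines -> up to {budget:,} crashing sessions/month")
    # 2 nines -> 20,000; 3 nines -> 2,000; 4 nines -> 200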


It's not an ad-hominem. It's an inference about the predictive power of the article as a function of the commenter's prior belief about the expertise of non-developers.

They may or may not be right, but you can't just dismiss things you don't like by classifying them as a "fallacy."


The opening Dijkstra quote is out of context and deeply misleading. See https://www.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD303... for his views on the subject.


I just want to note -- I didn't dismiss the article based on the author's background; I was baffled after reading the article because it made no sense to me when I was reading it from the perspective of someone I initially assumed was a developer. I'm not really sure how someone can develop expertise in software development without having participated in the process, but if you cowrote the article maybe you can explain? Do you agree 97% of developers are following agile principles and that this stability formula is a good way to measure the overall bugginess of an application?


Agile is not about the details. If you are producing shippable versions of your product every 2 months or less, then you're probably agile relative to what existed when the term was invented.


Maybe if you mean little-a agile, as in the dictionary definition "moving quickly". Big-A Agile, as in the set of software development principles, has a bit more to it than that.

http://agilemanifesto.org/principles.html


Nothing on that list is a specific requirement for a specific methodology. “The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.”

That does not say you can’t have a distributed team. It simply says face to face communication is best.


>I have a funny feeling that the nontechnical people on my current project would be nodding their heads along to the article, but the truth is that our applications have bugs that 100% of our customers are running into; they simply aren't immediately noticeable to a layperson. That doesn't mean they're not important.

If they're not noticeable, and the customers keep buying, despite 100% of them having them, how are they "important"?

>The business relies on complying with the rules of third-party organizations and the software is blatantly violating those rules right now.

That's a business decision. It could even be illegal (though still not that big of a deal, depending on circumstances), but it's not a bug.


The bugs will be very noticeable to stakeholders who are not end users, if they decide to look. The users don't buy the software because they want to; they buy it because they are required to. If groups we're supposed to be accountable to see the shenanigans we're up to and say, sorry no dice, you have to try again with some other company that made their software properly, _then_ the users will be upset because they'll be out mucho dinero for nothing. The end users don't know what rules we're supposed to be following on our end, nor should they have to.

Where I come from, if software doesn't meet requirements and instead produces incorrect results, it's considered a bug. To use another example from a different company I've worked for, I once found out that the financial reports generated by a particular piece of software were all completely wrong due to errors in the way calculations were done. I pointed this out to the higher-ups and they agreed that there was a bug and that all the existing financial reports were in significant error. They refused to let me fix it, not because we didn't have time (there was plenty, and I was otherwise free to work on pretty much whatever was in the backlog), but because fixing the bug would let the customers know that there was a bug in the first place. Some of this data would end up getting passed on to shareholders and the government. Is this not a bug? In the current case, the reason we're not complying with the rules was because of a bad architectural decision that wasn't properly cleared with anyone before it was implemented.

I guess I'm just not on board with the "just following orders" school of software development when the negligence involved rises to the level of illegality and scams. I'm happy to write a lot of software that I personally think is wonky or strange, but for me it stops when we start ripping people off and breaking the law.


>They refused to let me fix it, not because we didn't have time (there was plenty, and I was otherwise free to work on pretty much whatever was in the backlog), but because fixing the bug would let the customers know that there was a bug in the first place. Some of this data would end up getting passed on to shareholders and the government. Is this not a bug?

No, it's an opportunity to ask for a large sum of money, to leave the company and not talk about it.


It's not exactly related to your post but remember that Agile != Sprints


I know, but that doesn't stop managers from claiming that any process on some kind of n-week cycle is "agile". Maybe I've had bad luck, but it feels like a lot of companies pick up some agile-for-managers book and implement whichever process is there, minus the parts of the process that make them uncomfortable, and the results are often not great.


That is like Open Source != FOSS, a lost cause.


No, it’s not. That kind of attitude is pervasive and wrong.

Loads of teams are nowhere near Scrum and work basically Kanban. That’s reasonably agile.


Something about this rubbed me the wrong way and I realized it was because this pretends that ignoring much of the long tail of userbase is not only okay but beneficial to the majority. If, say, Quip ignored bugs in IE6 that's likely fine because my parents using their CRT iMac aren't going to be using Quip, but imagine if a crucial app like Gmail ignored older browsers; suddenly all the disadvantaged people that can't afford new laptops lose access to their email.

If it's a bug that 10 users are hitting because they were migrated from an earlier version incorrectly, sure it might be okay not to fix, but if 10 users are hitting it because they're legally blind and using an extraordinarily large font to use your product, it's crappy to say they don't deserve a fix. You have to understand what part of your userbase is hitting a bug and then decide from there.


> crucial app like Gmail ignored older browsers;

Google started telling me some months ago that my browser is unsupported and random stuff has stopped working every few weeks since. I'm running circa 2015 Safari.


Something to perhaps consider: if both groups of 10 people are experiencing a bug, especially one not caused by their own doing, why is one group more "deserving" of a fix than the other?


A bad migration can be worked around by a clean install, but blindness can't, so there's one. Legal reasons are another (e.g. the Americans with Disabilities Act).

OT: does anybody know of a site with similar interesting content and discussion to HN, but with a fraction of sociopaths closer to that of the general world population?


Thanks for the input!

I hope the sociopath comment wasn't triggered by my question. I tend to question widespread assumptions, perhaps more often than I ought to. But I find that, more often than not, people don't have a good reason for the positions they hold. I abhor groupthink which, sadly, dominates our culture today.


The sociopath comment was indeed triggered by your question, though it wasn't fair that it got directed at you rather than any of the hundreds of other comments that make me feel the same way, and I'm sorry for that, but I suspect that any of those commenters would have given a similar defense.

It matters very much which widespread assumptions one tends to question, and which ones subconsciously get a free pass. I find that HNers on the whole are more likely than most people to question assumptions like "people should be kind to each other", while questioning "companies are legally required to maximize shareholder value" less, when in fact it's the latter that's false; and while the former isn't a statement of fact, it's a very healthy axiom for humanity.

Shared values are not "groupthink": they're what allow us to have society at all, and while we should be allowed to discuss them, dismissing them carelessly is anti-social.


Not exactly the same thing but this is a similar idea of using data and algorithms to ignore the disadvantaged.

https://www.nytimes.com/2018/05/04/books/review/automating-i...


> There’s no such thing as a bug free application

This is a stretch. It seems like many people think of code as a living thing that just does what it wants, and we programmers have to beat it into submission.

The truth is, there can be bug free applications.

The problem, I think, is the complete opposite of the article's point. Programmers need time to write good software. Without stopping to fix the things we run into, technical debt does what it's known for: it increases exponentially and kills time that could be spent writing features.

So maybe software is like a living being in a way, that it needs to be cared for gently.


I remember the time when people around me started using expressions like "My PC is not feeling good today."

It's about time, and it's about having managers who were programmers and not just managers, and so on. Bug-free is possible for sure.


Good article. It reminds me of Sandi Metz's treatment of well-designed code as a business proposition: the goal is to save money, because a good design makes changes cheaper.

It makes sense to consider the cost of having a given bug vs the cost of fixing it. Of course, such estimates will almost always be hand-wavey.

I would also say that when in doubt, fix it. A bug is, by definition, the software not doing what it's expected to do; I think it's better to make fewer promises and keep them. A user who encounters a bug loses trust in the software, and there's a tipping point where they abandon it. You might not know where that is.

You also might not realize what a bad day it could give someone, even if they're only one person. E.g., if you're an email platform and you have a bug that drops one email in a million, that might seem OK. But if missing that email gets someone evicted...


    It makes sense to consider the cost of having a given bug vs the cost of 
    fixing it. Of course, such estimates will almost always be hand-wavey.
I agree with that approach in theory, but in practice it turns out that it's a lot easier to estimate the cost of fixing the bug than the cost of having the bug. As a result, in any ambiguous case our bias will be toward keeping the bug, since the cost of having the bug is the impact of the bug multiplied by the probability of someone hitting it, and it's always easy to lowball those probabilities. "Oh, no one will notice that," or "Yeah, but that's a really obscure case." And then you find out that all it takes is one obscure case for your trading application to lose hundreds of millions of dollars a day. Or for hackers to breach your systems and make off with millions of credit card numbers. Or for malware to turn your IoT devices into a botnet.
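
A toy illustration of how that plays out (all numbers assumed):

    fix_cost = 20_000                  # say, two engineer-weeks: easy to estimate
    incident_cost = 5_000_000          # the outage/breach if the bug ever fires

    for p in (0.001, 0.01, 0.05):      # "that's a really obscure case"... is it?
        expected_cost_of_keeping = p * incident_cost
        call = "fix it" if expected_cost_of_keeping > fix_cost else "ship it"
        print(f"p={p}: expected cost ${expected_cost_of_keeping:,.0f} -> {call}")
    # A 10x error in a small probability (0.001 vs 0.01) silently flips the
    # decision from "ship it" ($5,000) to "fix it" ($50,000).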


"97% of respondents said they practice agile in their organization"

Yeah, they all SAY they practice agile, but a lot of them practice waterfall with agile naming conventions.


This number sounded suspicious to me, so I downloaded the report. What it actually says is that 97% of organizations practice some level of agile, but the percentage of teams using agile is much lower. The most common response (46%) was "Less than 1/2 of our teams are agile", for example.

Saying that "97% of organizations practice agile development methods" makes it sound like it's overwhelmingly dominant, but it's not even possible to tell from these responses if a plurality of the teams who responded to a "State of Agile survey" use it, or what the most common development methodology is.

This is exactly the sort of contextless figure that you'll find torn apart in "How to Lie with Statistics".

> "JavaScript is more popular than ever, and over 69% of developers use JavaScript

Does anyone really believe that StackOverflow surveys gather a representative sample of all developers? This same survey found that 1 in 6 developers target the Raspberry Pi (more than iOS), and 7.5% use assembly language (more than Go, Objective-C, or VB.NET). It's an interesting survey but take it with a grain of salt.


100%. They think they are doing agile as long as they have Sprints.

The startup I'm working at right now told me "Agile" is overkill for us. But ironically, it's more iterative than any company I worked at before that did sprints.


Every competent programmer that I know did a form of agile (just the sane parts) long before Agile was a religion.


> "Agile" is overkill for us.

What does this mean/did they mean by this?


Yeah, being agile is good. I meant incorporating Agile frameworks like Scrum.


Or they practice not having any process and say it's agile because they constantly context switch...


The problem is that many organizations don't know how to tell important bugs from non-important ones, don't have proper processes for bugs reported by customers, and generally ignore customers.

Here's a couple of popular apps that I have used and the reason I have quit or will quit them: Hulu with the TV package constantly tells me I'm streaming to more than 2 TVs and won't play anything, even though I'm not streaming to any. Youtube TV constantly plays the wrong thing when I click something to play. These are egregious bugs that I'm sure I'm not the only one experiencing, because they happen on multiple, different mobile devices, tablets, the web apps, etc.

The strategy outlined in the article may be viable when customers are not paying anything, but when companies charge an arm and a leg for software (> $40/month), users expect it to not have major bugs like these that prevent its basic functioning. With the type of support offered by such companies, users will quickly move on to competitors that hopefully have their basics worked out. I wouldn't use an email program that can't send or receive email, and these bugs are on par with that.


>at a certain point, it’s too expensive to keep fixing bugs because of the high-opportunity cost of building new features.

While I may agree with this in the abstract, in practice most folks don't really know whether they're at that point. It also doesn't consider cumulative effects over time.

Bugs don't just affect application stability or user experience. A system that does not behave as designed/documented/expected is a system that will be more difficult to reason about and more difficult to safely change. This incidental complexity directly increases the cost of building new features in ways difficult to measure. Further, new features implemented by hacking around unfixed flaws will themselves be more difficult to reason about and more difficult to change, exacerbating the problem.

The larger the system grows over time, the more people working on it over time, the faster this incidental complexity problem grows over time. At a certain point, it's too expensive to not fix the bugs because of the increasingly high cost of building new features. At that point, folks start clamouring for a rewrite, and the cycle begins anew.


If the only choice is between a rewrite and gradually not fixing the mess, then I'll take the rewrite anytime and let the cycle continue.

The problem is: is your rewrite really going to be a full rewrite, or some kind of hybrid monster (at the architectural level, of course; there is no problem in reusing little independent pieces, if any exist)? Because you can easily fall into all the traps of both sides if the technical side is not mastered well enough by the project management...


The most important thing I take away from this is that even if you don't fix the bugs, you should be aware of them and tracking them.

Maybe it is a bug that only affects 1 out of every 10,000 customers. But if you get enough of those it can start to add up. Keeping track of them allows you to go to management with the data to support spending a sprint on bugfixes and code maintenance.
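
To put rough numbers on that (bug counts assumed, and assuming the bugs hit customers independently):

    p_single = 1 / 10_000              # each bug affects 1 in 10,000 customers
    for n_bugs in (1, 50, 500):
        p_any = 1 - (1 - p_single) ** n_bugs
        print(f"{n_bugs:>3} such bugs -> {p_any:.2%} of customers hit at least one")
    # 1 -> 0.01%, 50 -> 0.50%, 500 -> 4.88%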


This doesn't mention that crashes aren't the only kind of bugs. In fact, crashes are the bugs I fear least because I know about them immediately. It's the bugs that happily do the wrong thing that worry me. Like sending email to the wrong person, for example. I review email-sending code 3 times more closely than other code.


The author of this article seems to think that occurrence frequency is the sole metric of whether a bug should be fixed.

Which of these two is more important?

  - 0.1% of my users lose their data irrecoverably
  - 30% of my users get an error page and have to refresh
The article never waded into this at all, which is disappointing. I don't feel like I learned anything.
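
For illustration, even a crude severity weighting answers the question (the weights below are made up; the point is that any plausible ones override raw frequency):

    SEVERITY = {                             # made-up weights for one occurrence
        "irrecoverable data loss": 1000,
        "error page, refresh fixes it": 1,
    }
    FREQUENCY = {                            # the two cases above
        "irrecoverable data loss": 0.001,        # 0.1% of users
        "error page, refresh fixes it": 0.30,    # 30% of users
    }
    for bug in SEVERITY:
        print(f"{bug}: priority {SEVERITY[bug] * FREQUENCY[bug]:g}")
    # data loss scores 1 vs 0.3 for the error page: the rarer bug wins
    # whenever data loss is weighted at least ~300x worse than a refresh.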


This article is strictly true - there are bugs that are not worth fixing. But the process of figuring out which ones are and aren't isn't as simple as crashes/sessions.

There are some categories of bugs that must always be fixed, regardless of how infrequently users run into them - security, privacy, accessibility, data loss.

There are also cases where many low-impact bugs all share a common root cause - the value of fixing any one bug is low, but the sum of fixing all current and preventing all future occurrences is high value. Enforced static analysis tools (like Error Prone for Java) and libraries/frameworks with safety checks (autoescaping template languages, polyfills, etc.) are a great way to address these long-tail bugs. I generally write a new compiler error after encountering the same bug class three times.
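
For a (hypothetical) flavor of what "write a new check after the third occurrence" looks like, here's a toy AST lint in Python that flags mutable default arguments - a stand-in for the kind of rule Error Prone enforces for Java, not its actual API:

    import ast
    import sys

    class MutableDefaultChecker(ast.NodeVisitor):
        """Flags `def f(x=[])`-style defaults, a classic recurring bug class."""

        def __init__(self):
            self.findings = []

        def visit_FunctionDef(self, node):
            # kw_defaults may contain None for kw-only args without defaults
            for default in node.args.defaults + node.args.kw_defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    self.findings.append((node.name, default.lineno))
            self.generic_visit(node)

    checker = MutableDefaultChecker()
    checker.visit(ast.parse(open(sys.argv[1]).read()))
    for name, lineno in checker.findings:
        print(f"{sys.argv[1]}:{lineno}: mutable default argument in '{name}'")

Wire something like that into CI so the bug class can't recur, rather than fixing each instance by hand.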


Both parts of the title are correct:

* Not all are worth fixing, that is, the (financial) upside of their being fixed is too low compared to the effort required.

* And it's okay, that is, it's something we have to accept and live with, though it's not really nice or satisfying.


Thanks Kristine for the link to my article...

Myth of Five Nines - why high availability is overrated https://www.iheavy.com/2012/04/01/the-myth-of-five-nines-why...


Dijkstra raised the distinction between pleasantness and correctness. Since most programs are unspecified there is no logical grounds for declaring their behavior incorrect. Unpleasant behavior abounds however, but if your users are willing or forced to accept that unpleasantness then it can be rational to let it stand.


I don't see any mention of security or legal concerns. Not all bugs can be identified by exceptions, exception frequency is not an indication of their impact, and in some cases stability is less important than correctness. Using bugs per session as the (only?) metric is a horrible way of doing product management.


Bugs shouldn't happen. Actually, in some critical systems, bugs can't happen.

Bugs aren't magic; they happen for a reason. It could be a broken dependency (unsupported versions, a fatal bug in the dependency, deprecation, configuration, etc.), a resource limitation (out of memory, a security breach, etc.), or poor design which leads to poor implementation (logical errors, bad data abstractions).

Abstractly waving your hands and saying "we can't fix all bugs" doesn't feel right. Identify the underlying cause of the bugs and address that.

One solution is to reduce dependencies, increase resource allocation, and rely on a less rigid design. As a business grows, dependencies will increase, resource allocation will increase and the design will become more complex.


Maybe, but "Words with Friends" doesn't need to be written in Ada with a team of 50 engineers blowing through $20M working on proof systems for verifying the behavior of placing a tile on a board. There's maybe a tiny difference between aircraft control surface stability software and casual video games, as just one tiny example.

So, obviously, as engineers we have to say that it's context dependent - bugs have priorities. And sometimes bugs exist that you can't reproduce in a lab, have only occurred once in history, and you can't even be sure it wasn't some hardware glitch (because, well, hardware is buggy too)... So, the sane and reasonable thing to do is to let those go and spend our time somewhere where we're likely to make considerable and reasonable progress.


> Actually, in some critical systems, bugs can't happen.

https://blog.acolyer.org/2017/05/29/an-empirical-study-on-th... says that they found 16 bugs in three formally-verified systems (including two bugs that didn't get caught because of bugs in the verifier). So, I'm pretty sure that bugs can happen. (Unless you mean that in certain critical systems, bugs can't be allowed to happen, in which case I agree.)

More, I'm pretty sure that most bugs don't happen for the reasons you list. I suspect that the majority of bugs are just poor implementation.

It's possible, however, for the opposite extreme to happen: The program wasn't buggy, but circumstances changed, and now it is. I'm thinking specifically of crypto code, which can be perfect... until a new attack is devised. Then the software is buggy, because it can't stop an attack that didn't exist when it was written.

When we do get software that has "no" bugs in critical systems, it's because of extreme care at every step: specification, design, implementation, review, and testing. Obsessive testing, and testing, and testing, and testing.


From my personal experience, the vast majority of bugs happen due to an astounding failure on the part of developers to consider even the most basic edge conditions. Also, terrible contract documentation...


What if you need features in addition to bugfixes, and you have finite resources?


I dare you to write 100 lines of useful code without a bug in it.


Are you even trying?

A random search tells me that "The mean DD for the studied sample of projects is 7.47 post release defects per thousand lines of code (KLoC), the median is 4.3 with a standard deviation of 7.99." ( https://ieeexplore.ieee.org/document/6462687/ )

So clearly if you are careful and use state of the art practices, this is very doable.

Not only is this doable, but various individuals and teams in history have been able to reach far lower defect densities. Hey, for all practical purposes, TeX is bug-free, for example.

If you are not able to write 100 lines of useful code without a bug in it (not infallibly, but at least often enough), maybe you should simply study and practice to get that ability.
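
A back-of-envelope check on those numbers, assuming defects land independently (a Poisson model, which is of course a simplification):

    import math

    for defects_per_kloc in (7.47, 4.3):    # mean and median from the cited study
        expected_defects = defects_per_kloc * 100 / 1000
        p_clean = math.exp(-expected_defects)
        print(f"DD={defects_per_kloc}: P(100 lines defect-free) ~ {p_clean:.0%}")
    # mean (7.47) -> ~47%, median (4.3) -> ~65%: even average practice clears
    # the "100 clean lines" bar about half the time, so careful practice surely can.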


Those measurements inherently make no sense as you can't know unknown unknowns. Sure, for all intents and purposes if you never encounter a particular defect in a billion years of usage then a bug may as well not exist, but that doesn't mean it doesn't.


Those measurements inherently make more sense than hand-waving; and although mathematically I agree with you, the world is not mathematically pure.

Regardless, I stand by my claim that it's ridiculous to imply it would be exceptional to be able to write 100 lines of bug-free useful code. I'm not stating that it is easy, nor that most chunks of 100 lines are written like that - just that it is not only possible but accessible. Depending on the field it might be more or less difficult, but in general I suspect there are tons of 100-line chunks that have been developed correctly on the first try, and those metrics tend to weigh, non-formally I concede (but if you dig enough, what is even formal enough?), more in favor of my viewpoint than in favor of the difficulty level being astonishingly high.



