Hacker News
Why software projects take longer than you think: a statistical model (2019) (erikbern.com)
277 points by max_ on March 8, 2021 | 131 comments



> A reasonable model for the “blowup factor” (actual time divided by estimated time) would be something like a log-normal distribution.

Interestingly, we did extensive time tracking on a multi-year in-house software project and collected data comparing the estimated completion time of tickets with their actual time.

The software department was under a lot of pressure to improve their forecasting, and were somewhat despairing that their estimates were off by a factor of about 1.6 on average, and sometimes a factor of 10 or more. This persisted in the face of all attempts to improve calibration. Managers were worrying that developers had no idea how long a task would take and estimation was futile.

When we plotted the data, in all cases, the actual time was very accurately fit by a lognormal whose scale parameter was precisely the predicted completion time. That is, whether the tickets were predicted to take 1 hour, 3 hours, 13 hours, or whatever, the histogram of their actual completion times followed the exact same shape but with a corresponding scale change on the x axis.
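For concreteness, the check can be sketched in a few lines of Python. This is illustrative only; the ticket numbers, variable names, and use of scipy here are stand-ins, not our actual analysis. The idea is just to divide each ticket's actual time by its estimate and fit a lognormal to the resulting blowup factors.

    import numpy as np
    from scipy import stats

    # Hypothetical data: (estimated_hours, actual_hours) per ticket.
    tickets = [(1, 0.8), (1, 2.5), (3, 3.1), (3, 7.0), (13, 11.0), (13, 40.0)]

    # Blowup factor = actual / estimated. If estimates capture the scale of the
    # task, these ratios should collapse onto a single distribution regardless
    # of ticket size.
    ratios = np.array([actual / est for est, actual in tickets])

    # Fit a lognormal with the location pinned at zero.
    # scipy parametrises lognorm as s = sigma and scale = exp(mu).
    s, loc, scale = stats.lognorm.fit(ratios, floc=0)
    print(f"sigma = {s:.2f}, median blowup = {scale:.2f}, "
          f"mean blowup = {scale * np.exp(s**2 / 2):.2f}")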

This told me that the developers actually have a really good understanding of the class of problem they're dealing with when they start a task. But sometimes tasks have multiplicative factors that make them take longer than you expect. Sometimes the bug turns out to be two bugs, and so on. Based on this analysis, I urged them not to consider it a prediction failure when a ticket takes 10 times longer than expected; that's just a property of the lognormal distribution, and that estimate likely did a good job of reflecting all available information at the time they made it.

Instead of changing the estimates, I suggested that we pick a safety factor for external facing commitments that reflects this distribution. Padding the estimate by 1.6 factor gives the mean, but if you want to make a commitment you can take to a customer, you can just extend the lognormal up to 95% confidence or 99% confidence or however trustworthy your promised commitment needs to be. Of course, a 99% confidence interval on a lognormal is a pretty big factor. But if that's better than running late, it is what it is.
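To make the safety-factor idea concrete, here's a rough sketch assuming the median blowup is about 1 (estimates are good "typical case" guesses) and the mean blowup is about 1.6, as described above. Those two numbers pin down the lognormal, and its quantiles give the padding factor for whatever confidence level the commitment needs.

    import math
    from scipy import stats

    # For a lognormal, mean / median = exp(sigma**2 / 2), hence:
    sigma = math.sqrt(2 * math.log(1.6))

    blowup = stats.lognorm(s=sigma, scale=1.0)  # scale = exp(mu) = median = 1

    for conf in (0.50, 0.80, 0.95, 0.99):
        print(f"{conf:.0%} confidence -> pad the estimate by {blowup.ppf(conf):.1f}x")
    # Roughly 1.0x, 2.3x, 4.9x and 9.5x respectively.

Under these assumptions the 95% and 99% levels come out to roughly 5x and 10x the raw estimate, which matches the factors mentioned further down the thread.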

Another interesting thing is that you'd expect, by the central limit theorem, that sufficiently large tasks would eventually become normally distributed rather than lognormal, because they're composed of a large number of subtasks. But it turns out that lognormals are a pretty pathological case; sums of n lognormals can continue to look nearly lognormal until n becomes really, really large.
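A quick simulation illustrates the slow convergence. This is only a sketch; sigma = 1 is an arbitrary choice of subtask spread, and a fatter-tailed choice converges even more slowly.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 1.0  # spread of each subtask's duration, an illustrative choice

    for n in (1, 10, 100, 1000):
        # Each simulated "project" is the sum of n lognormal subtask durations.
        totals = rng.lognormal(mean=0.0, sigma=sigma, size=(10_000, n)).sum(axis=1)
        skew = np.mean(((totals - totals.mean()) / totals.std()) ** 3)
        print(f"n = {n:5d}  skewness = {skew:5.2f}")
    # The skewness (zero for a normal distribution) shrinks only slowly as n grows.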


I find this interesting. I've been told that law schools teach that when estimating how long something will take (not how many hours of effort, but calendar time), you should double your estimate, then increase the units by one. So what you think is a 2-hour task will actually take 4 days to complete; 4 weeks will take 8 months.

I wonder how much of that is professional wisdom that embodies this lognormal distribution. Law isn't quite the same as dev, since there can be a lot of lost calendar time just due to certified mail, or court dates, or whatever. But I've found that when I apply this formula to my estimates, I'm a lot less anxious when I start to bang my head against unforeseen setbacks.
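In code form, the rule I was told is just this (the unit ladder is my own guess at the intended ordering):

    # "Double the number, bump the unit": 2 hours -> 4 days, 4 weeks -> 8 months.
    UNITS = ["minutes", "hours", "days", "weeks", "months", "years"]

    def lawyer_estimate(value, unit):
        bumped = UNITS[min(UNITS.index(unit) + 1, len(UNITS) - 1)]
        return 2 * value, bumped

    print(lawyer_estimate(2, "hours"))  # (4, 'days')
    print(lawyer_estimate(4, "weeks"))  # (8, 'months')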

Of course, knowing which value to emphasize to a manager depends on why they want the estimate. Sometimes they really want the effort estimate, and sometimes they want the calendar estimate, and I'll sometimes give both if the context warrants it.


This points to another issue I've observed: managers/operators ask for an effort estimate and then map it directly to a calendar estimate.

That, and thinking that two 4h efforts can be completed in one 8h workday. MFers interrupt work so frequently, how do they forget it happens? Forgot you had me in that BS 2h sales meeting "for show"!?


Yeah, but who do you think is in the best position to know what your calendar will look like?

You.

Always give calendar estimates. That is, after all, what matters to coworkers, to the business, to partners, to the customer, and to users.


That depends on whether you're in a position to set priorities. I can tell you that something will take a week, but I can't tell you that it will be done next week unless I'm permitted to work exclusively on it this week.


Nobody is ever able to work exclusively on a thing for stretches of a week.

I'd suggest starting to keep track of how much of your time you actually get to spend on the thing you're working on. If it's between 30 % and 90 %, include that factor in your estimation procedure.


I think we're talking past each other. I'm not referring to a percentage of time being consumed by meetings and such; I'm referring to how frequently I'm asked to multitask because several urgent projects are going on at once. New priorities are added unpredictably, and sometimes I'm asked to clear all other work off my plate to fight some fire.

Of course that's a sign of trouble, but it's natural for the stage this company is at. (Note: different company than the one that did the lognormal study I mentioned a few posts ago, and a different role for myself, much more directly exposed to customer issues).

In an environment like that, calendar estimates are complete nonsense because the priority list itself is volatile on the timescale of the estimate. But work estimates can still be stable; I can still accurately report that I have eight hours of work left on issue A, and when I provide that estimate along with work estimates on issues B and C, which are also assigned as "top" priorities, management can use that to decide whether I'll finish issue A this week or not.

If you're sufficiently informed to know what your priorities will be on the timescale of an estimate, you can make reliable calendar estimates.


Maybe I fail to see the point of even bothering to estimate something that is so unimportant it can be (or even appears likely to be?) down-prio'd in favour of more important things so quickly.

Could you explain in what stage to switch between things so quickly few things are ever finished, and why it makes sense to estimate those things at all, in that context?


Early stage: you need the effort budget to do a cost-benefit analysis against the risk before you can build a calendar estimate.


The rule of thumb that I was taught for software delivery is to double it for every layer of management you have to deal with.

So if it's just you to a customer, 1 week becomes 2.

If it's you, reporting to your manager, to a customer 1 week becomes 4.

Although I generally don't apply multipliers this liberally, I do keep in mind how many stakeholders there are that can add uncertainty and friction to a project.


I've noticed that increasing the number of almost anything usually increases complexity. Working with another team takes more time than just keeping the work in one team. Working with two takes longer. Same with things like the number of repos, languages, or services.


That may be an expression of Brooks's Law; increasing the number of people involved increases the average error count.


I bet that analysis went down like a lead balloon. Any other response than "we were a naughty department, but we have fixed the problem and are really sorry about it" would have fallen on deaf ears at all the places I have worked. Log-normal? Most people outside of the tech dept don't know the difference between mean, median, and mode.


Fortunately, this was a tech-internal meeting; management was sympathetic and pretty tech savvy and understood the takeaway, if not all the math. They had been legitimately worried that the department was not capable of holding to commitments even when they freely selected the date to commit to. And the developers themselves, to be fair, had no idea how good their estimates actually had been the whole time, because it had been viewed through the lens that a ticket that took 8 hours should have been estimated as 8 hours.

The presentation to the sales department was very different, and can be summarized as "we'll try to be clearer from now on to distinguish estimated delivery dates from committed delivery dates, and don't be surprised if they're months apart."


I'm not sure if I would really consider that a legitimate worry.

Software developers' expertise is in developing software, not in analyzing statistics to predict the future.

Developer estimates are one of the best data points you have to predict how long something will take, but if you think that you can just take them at face value, you are not doing your job as a manager. You need to compare them to past estimates and how those estimates panned out during development.


Exactly, hence this endeavour, where we discovered the secret map between the software developers' predictions and reality. Management worried they were completely uncorrelated, but it turned out to just be the lognormal distribution.


> Software developers' expertise is in developing software, not in analyzing statistics to predict the future.

Which is a shame and causes so much wasted work. I wish we all got better at this.


Yep. I would claim analyzing statistics to predict the future is an integral component of being a good software engineer.


I can independently attest to all of your comment from my own internal research. Well summarised.

I also think fitting the exact distribution is unimportant and uninteresting. The important thing is that you have roughly the right distribution and that it has the fat-tailed property of only slowly converging to normal.

The reason the exact distribution doesn't matter beyond that is, I think, that finding the exact 95 % upper bound is probably impossible to do with any statistical significance, because of its rarity and because the contributing factors change over time. Getting into the right ballpark, though, matters a great deal.

Verifying that you get into the right ballpark (only 1 in 20 blow their committed date, they do so independently in time, etc -- the usual value-at-risk stuff) is fortunately also trivial, no matter the underlying distribution and its parametrisation at the time.
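A sketch of what that verification might look like, with made-up numbers; the binomial test is just one reasonable way to do the check:

    import numpy as np
    from scipy import stats

    # Hypothetical backtest: for each delivered ticket, did the actual time stay
    # within the commitment we made at 95% confidence?
    actual_hours    = np.array([3.0, 9.5, 1.2, 40.0, 7.7, 15.0, 2.2, 6.0])
    committed_hours = np.array([5.0, 8.0, 6.0, 30.0, 25.0, 20.0, 4.0, 12.0])

    breaches = actual_hours > committed_hours
    print(f"breach rate: {breaches.mean():.0%} (target: 5%)")

    # With few observations, compare the breach count against a binomial test
    # rather than eyeballing the percentage.
    result = stats.binomtest(int(breaches.sum()), len(breaches), p=0.05,
                             alternative="greater")
    print(f"p-value that we breach more often than 5%: {result.pvalue:.2f}")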


Thanks, and I'm glad to hear you've found the same thing! So including the article, that makes at least three of us, it seems.

I agree that it might not be terribly important practically, but I think it's interesting because it might hint something about the underlying nature of task planning. Sort of like how if you were measuring radioactive decay times and found a Poisson distribution, you might learn something about the underlying nature of radioactivity.

What it is exactly, I'm still not sure, but I do think there's something there to poke at.


One thing that makes software special and might contain a kernel of an explanation is that software is scale-free: big software is built from many small software, which in turn are built from even smaller software. This makes it different from e.g. houses, which are not built from many miniature houses.

I speculate this self-similar nature drives a lot of the other odd properties we observe in software development, but I haven't come up with a specific model yet.


> Another interesting thing is that you'd expect, by the central limit theorem, that sufficiently large tasks would eventually become normally distributed rather than lognormal, because they're composed of a large number of subtasks. But it turns out that lognormals are a pretty pathological case; sums of n lognormals can continue to look nearly lognormal until n becomes really, really large.

I'd say that the CLT wouldn't apply here, since the subtasks aren't independent? E.g. a discovery while doing subtask C could mean a change to already-completed subtasks A and B.


> ...if you want to make a commitment you can take to a customer, you can just extend the lognormal up to...

What you're proposing is effectively a fancy "padding factor"; it's like on Star Trek when Scotty says they can only go at Warp-8, but really, Warp-11 is possible.

Sometimes stakeholders do behave like toddlers and get red in the face and stomp around when a "promise" is broken because "we aren't there yet." (ok, maybe not literally, but whatever the adult version of a toddler flip-out is).

But most of the time people can understand that unexpected problems come up. It's perfectly fine to make log-normal plots and think about statistics, but everyone will be much happier if you explain what's going on to the stake-holders instead of just giving "black-box" estimates using confidence intervals.


The idea here is just to be precise in what we're communicating, when providing a forecast, because different departments have different risk tolerances on that forecast.

What the lognormal says is "it might even be done by late May, most likely by June, but almost certainly by December".

The dev department needs to allocate people, so the most useful number for them is the mean completion time: June is the important thing, although they want to be aware of potential delays in case they need to reshuffle.

The sales department is playing a very different game. Upset, impatient customers cost a lot, both in terms of sales and reputation, and can be very unforgiving. Sales much prefers to communicate a hugely pessimistic version of when something will be ready, that they have confidence in, and possibly be early, rather than a 50-50 estimate.

This is decision theory in action, where each actor combines the same probability distribution with different payoff matrices and so arrives at different conclusions suitable to their role.

To enable this, we clarified the language around "estimates", "commitments", and "promises", as shorthand for different intervals of the same underlying distribution.


I disagree somewhat. There's a world of difference between saying "here's when we think it'll be done, but sometimes shit happens and there'll be a delay" and saying "we're 95 % confident it will be done by this date" -- especially when you have a solid track record behind the second.

The first one tells you effectively nothing. The second one you can build strategy on.


> "here's when we think it'll be done, but sometimes shit happens and there'll be a delay"

I agree that a statement like that would not be useful. What I am saying is that it's better to make a good effort at estimation but then keep the stake-holders informed about what is going on, about the challenges and the mid-project delays.

Any "95%" confidence calculation is kinda dubious. It would only work if you have meticulous and accurate records, have a long-enough history of delivering the same kind of product with the same team, the same customer-base, and the same tooling, oh yeah, and everyone has the psychological safety factor to be honest. If you work in a place where that exists, more power to you, but I think that's rare.


I think I can work out a 95 % estimation I am willing to stake money on even without the things you mention. It does take good management/culture, but if I don't even have that, I'm headed for the dumpster anyway and should go home and rethink the whole thing first.

> What I am saying is that it's better to make a good effort at estimation

But what does that estimation mean? If we don't know after the fact whether the estimation was good or bad, then it's useless because it's an answer to an unknown question, and every stakeholder will interpret it differently. In order to know after the fact, we must know ahead of time whether we're aiming for 50 %, 90 %, or something else.


> I think I can work out a 95 % estimation I am willing to stake money on even without the things you mention.

I doubt it. It's dicey to do estimations even when everything is well understood, but if you lack detailed and accurate records of past work for very similar projects... the best you can do is pad like crazy, cross your fingers, meet the deadline and say your confidence interval calculation was "accurate."


If the lognormal distribution is correct, the 95% confidence interval is a bit over 5 times the developer's gut-feeling estimate. The 99% confidence interval is 10 times the developer's gut-feel estimate. We found a lot of data to back this up, but your mileage may vary.


This is a principled approach.

And of course it does depend on the nature of the contractual relationship with your client.

However, even if we charitably assume that the developers are accurately estimating the median, that means that out of 100 projects, 50 of them will be delivered late. It is likely that you'd want to do better than that, and to do so in a principled manner, you're going to need to adjust the estimates that come out of the estimation process.


My understanding is that the "multiplicative factor" can typically be mostly attributed to what I call "quality of organization". Not to be confused with efficiency, the quality of organization is a measure of how smoothly things go: for example, how likely it is that you are going to be distracted from your task, how much unpredictable synchronous coordination you need to complete your task, etc. An organization that can plan more of its work will have higher quality, and an organization where more of its work is of an unplanned nature will have lower quality.

There are some possible exceptions. For example, some organizations may have a lot of unplanned work, but that unplanned work is segregated so it doesn't affect people working on planned work. For example, you may have operations and development teams where operations absorbs most of the unplanned stuff and only rarely bothers development.

Now, just because work can be accurately planned doesn't mean the organization is efficient. As an example, the military tends to be relatively good at planning, while being very inefficient (ignoring the latest failed high-profile project...). On the other hand, startups tend to be more efficient while also being less predictable environments.

I like to think that at some scale of the project some level of trust in projections is necessary to be able to even organize the project. So some drop in efficiency is acceptable in order to be able to plan work.


In this case, the time tracking was done in a very fine-grained way; you'd log your working time against a given ticket or other thing. So if you got interrupted to go to a meeting or something, that time got logged to the meeting, not the ticket.

Of course interruptions do incur a cost to context switch. But that probably won't explain how one bug fix that you thought would take an hour actually did, and another that you also thought would take an hour took five, especially if you did them both on the same day.


> Interestingly, we did extensive time tracking on a multi-year in-house software project and collected data comparing the estimated completion time of tickets with their actual time.

> The software department was under a lot of pressure to improve their forecasting, and were somewhat despairing that their estimates were off by a factor of about 1.6 on average

Hmm, a 1.6x average is actually extra interesting to me: if time tracking is left on while people are in meetings etc., this could translate to a perfect estimate with about a third of their time spent on non-casework (meetings, co-worker interruptions, etc.). That sounds about accurate to me, depending on the day...


This company handled time tracking quite differently than most, because everyone was paid by the hour, even devs. The same time tracking system used to track time spent on tickets was used by the accounting department to track hours. If you went to a meeting you'd update your timesheet with the meeting and its billing code. People were quite religious about maintaining an accurate timesheet as part of the culture. I think it's about the highest quality dataset you could hope for.


This is pretty interesting, do you doing you are able to defend those estimates using the same methodology?


>do you doing you are

I have no idea


In my experience from the past decade, it is often the case that the difficult thing isn't estimating how much time the proposed project takes to build (this is the difficult part for some devs though), but how much other stuff comes up. Stuff includes everything not related to your project. E.g. your last project that is now on prod, where bugs were reported and need to be fixed. Another team needs your expertise on something and next thing you know you're in a 2-hour meeting with them, and also reviewing their 1000-line PR. Maybe a real production fire is happening and you have to help, and it takes you the next 2 full days.

Because of this, my general approach when estimating a project in my head is how much time I know it would take me if I get to work full time on it with nothing else on my plate. Then I can better estimate how much time realistically it would take after factoring in X number of hours per day on distractions. In many cases I also communicate with the stakeholders (PMs, biz ops, leads) directly "This is how long it would take me if I dedicate full work days on it" and let them take it at face value too. (It's a sentence that usually ends with "So realistically probably [2x of what I just said]")


Yes, we've recently started to track how much time developers are working on things that were estimated vs how much time they're spending on other things (standups, retros, refinement, one to ones, company meetings, training, customer meetings, builds, support and maintenance....)

A surprisingly large portion of the 'estimates being wrong' is actually down to the fact that people generally estimate that they're going to be working 100% without interruptions or other tasks. Sure, there are times when tasks are harder than expected, but this is a smaller factor than I had originally anticipated.


I'm not at all experienced on the subject, but, so far, the mindset I liked most is the one presented in "Software Estimation: Demystifying the Black Art": one must be aware of the difference between an estimate and a plan.

Estimates are statistical by nature; they are made of "educated guesses" and whatever historical and empirical data is available. Estimates must be unbiased. They are not negotiable and they are not commitments.

Plans are not statistical. They are oriented by estimates, but they must present concrete dates and, when they are not met, the plan must be revised and renegotiated. Plans are commitments.

I think this separation brings useful implications and tools to argue sensibly with stakeholders. I've always hated having to commit to hard deadlines based on estimates. While I was aware of the importance of providing concrete deadlines, it felt like I was taking the entire risk of the estimate on myself, which is unfair, to say the least.

By making the separation between estimates and plans, I'm better equipped to discuss the matter. I can show the estimate and, when pressed to commit to a tighter deadline, I have some leverage to trade the desired deadline for a decrease in the project requirements, for instance.


Depending on the complexity of the plans and the dependencies between activities, you absolutely can go over a planned date, so long as it's the early start. In critical path method scheduling, you will typically show "early start" and "late finish" dates for an activity and maintain float to the early or late start of the next (driven by other activities in the DAG and the availability of resources assigned to them). I (and others) like to explicitly state contingency factors and/or risks as successor activities that consume this float. They can also be used for Monte Carlo analysis, where those activities are randomly increased in duration (usually on some statistical curve, as discussed in the article).

What we are really doing is simulating many many permutations of the effects of known risk likelihoods and consequences across the graph. Otherwise in a large program of works we will tell Project Managers or Contractors that they can manage their own float which gives them flexibility to execute.

Another method that simplifies this is the PERT[1] method, where you take optimistic, pessimistic, and normal durations to produce an expected duration.

[1]https://en.wikipedia.org/wiki/Program_evaluation_and_review_...
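For reference, the PERT three-point calculation itself is tiny; a minimal sketch:

    def pert_expected(optimistic, most_likely, pessimistic):
        """Classic PERT expected duration: a weighted mean of the three points."""
        return (optimistic + 4 * most_likely + pessimistic) / 6

    def pert_stddev(optimistic, pessimistic):
        """Conventional PERT spread: one sixth of the optimistic-pessimistic range."""
        return (pessimistic - optimistic) / 6

    # e.g. a task guessed at 2 days if everything goes smoothly, 4 days normally,
    # and 12 days if it goes badly:
    print(pert_expected(2, 4, 12), "+/-", round(pert_stddev(2, 12), 2))  # 5.0 +/- 1.67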


Estimates are for projects where you know what you're doing. You can estimate home construction because you've built a home before.

I can estimate the happy path. I can't estimate edge and corner cases until I get closer to the edge or corner. It's not even there until I get close to it.

I strongly agree with these statements from the article:

>"Tasks with the most uncertainty (rather the biggest size) can often dominate the mean time it takes to complete all tasks.

The mean time to complete a task we know nothing about is actually infinite."


Totally. Software estimates are like looking at a dungeon map where you can see the beginning and the end, but the rest of the page is blank until you've actually entered that part of the map.


What I never get about estimates is why people don't just reduce the granularity until they have some amount of accuracy they can work with.

95% of my teams have estimated using the 'story point' crap, or rarely in days (one in hours for a hot minute, that was a fucking joke I'll tell you what).

In every single one of those teams eventually some stakeholder chucks a hissy fit because some feature they're emotionally invested in was 3 days worth of points and it ended up taking a month.

If they'd measured in weeks, that blowout would have been lost in the noise of other things that took 2 weeks instead of 1, or 3 instead of 2. If they'd measured in months, it wouldn't have been a blowout at all.

Which sounds like putting a band-aid over a gaping wound, but the thing is, what does your business really need to respond to on the time scale of days? The point isn't to improve the estimates (everyone should know by now that that's a fool's errand), it's to improve the business response to what the development team is doing.

If you're trying to decide whether to launch in 8 months or 12, or you need to time a project to some external event that's 6 months away, and your plan is to have a big "how are we tracking" meeting once a month, or once a fortnight, then why do you need to measure in anything more granular than a month, or a fortnight?

Set goals to the cadence of those meetings, try to make them as objective as possible, then hands off and go find something else to do. IME 95% of what teams (especially in enterprise) call estimates is just something for a PO or PM to jack themselves off to for 30 hours a week on projects where their input is only really needed for around 10.


> "What I never get about estimates is why people don't just reduce the granularity until they have some amount of accuracy they can work with."

I think you've kind of answered yourself here. Software development is fractal-like. The closer you get, the more complicated. If you want to estimate smaller chunks, you have to do a lot more work. At some point it's better to just start developing than to zoom in further, because at that point you're basically developing anyway.


> At some point it's better to just start developing than to zoom in further, because at that point you're basically developing anyway.

Unfortunately this is what management / PMO actually want.

To somehow magically do the development in your head during a meeting so that you can accurately estimate some loosely defined task.


In my experience, people don't estimate granularly because they want to be fast. They see this activity as a chore, an overhead, and want to get it over with quickly. And planning sessions do take a lot of time even if we minimize the time it takes to come up with estimates.

Sometimes, inaccurate estimates come back to bite us. So I regularly make the point that spending time on this chore is a necessary evil.


I think of this often, and generally call it the https://en.m.wikipedia.org/wiki/Coastline_paradox -- in addition to perceived absorption for stakeholders, I think the estimators too find more details the closer they look.


Sometimes you even fall through the floor and find a whole 'nother world down there.


Sometimes you find out that it's a Spirit Temple project and you won't be able to complete it for another seven years.


I've always seen it like trying to measure the length of a coastline or any other fractal curve. The more you zoom in any particular task or part of the curve keeps revealing details previously unseen.


Fog of War in Starcraft


You're saying the right thing but drawing the wrong conclusion from it, I think. If you can estimate the happy path and generate some corner cases until your 95 % estimate is right 19 out of 20 times -- that's really fucking valuable to your co-workers, your customers, and your end users.

You're not asked to be prescient -- just estimate an interval that is actually statistically useful.

The 5 % worst case events that blow up and take months when you said days, those need to be handled separately, differently.


Sure, I think different people use estimates for different reasons.


Sometimes it comes down to poorly understood user requirements, necessitating design work in the middle of a sprint, causing a cascade of other issues like new data models, different table relationships, updates to UI screens, and on and on. Agile might pride itself on being able to handle such things, but... it's not. Nothing is. Poor planning, no matter the methodology, results in slow, cumbersome, error-prone execution and tons of rework, 100% of the time. This was not meant to be a rant against Agile, but in BA (Before Agile) we spent a great deal of time on requirements gathering and architecture research, so that by the time we were ready to code, we could focus on the how and not the what. And damnit, in BA we had requirements documents that actually described what needed to be done. For some reason, probably a misunderstood principle of Agile, requirements are considered completely unnecessary or at best a waste of time. The prevailing attitude seems to be: just build the darn software, we'll figure out the requirements later. This is like trying to build a plane while on the runway with passengers trying to board and the control tower screaming at you to take off.


I don't know where you worked, but I have worked in a place that did a design document, dropped it off on my desk, and left me to work.

Them: "Why didn't you include the functionality to twiddle the foo?" Me: "What? I don't know what you are talking about." Them: "I swear it was in the design document, it's one of the most important features!" But it was not.

Every design document has flaws, and even if they are as accurate as possible, many projects had to be updated after the fact as even the designers themselves did not predict some of the functionality needed. I've found that when developing new things you need to have steady communication with the people who need to use it, to understand what they need and why.

For recreating old things (CRUD apps and whatnot) I'm sure this is not nearly as important, but I have always wondered why recreating old software with modern tools has usually resulted in everyone just using the old software. I do know that in one case where I've done this, I was told not to include certain features because no one uses them... only to find our users not updating, because everyone needed those features. Sometimes even the vendors don't know what the customers want.


> And damnit, in BA we had requirements documents that actually described what needed to be done.

That’s some damn fine nostalgia juice you’re drinking there. Can I have some of that?

Me, I had the time and expense of requirements documents combined with the pleasure of them not actually describing what needed to be done.


All things considered, agile probably still produces results faster overall. It's like running the requirements gathering and software dev concurrently rather than serially.

Faster = cheaper. So the logic of economics dictates that is exactly what will happen, even if it is more stressful :/


At my software agency, we were actually inspired by this post when it was on HN a couple of years ago to build, in a hackathon, a simple machine learning model around it.

Then, recently, we actually built our own internal tool to add "uncertainty points" to estimates for each client project we do.

And the ML model predicts better and better how uncertainty uniquely affects each project we're using it in.

We've been using it on one project and it's actually working surprisingly well.

We're also in the process of rolling it out to most of our other projects, or at least the ones using Pivotal Tracker, which is the only tool we're supporting right now.

If you happen to be using Pivotal Tracker and are interested in testing it out yourself and giving us any feedback you might have, please shoot me a message at the email in my profile.


And management would like to add something like “optimism points” so they can cancel that out and have better metrics to report to stakeholders.


This is interesting. Have you benchmarked it against e.g. a linear regression model built on your own subjective estimations?


If you ever port this to jira, I’d give it a shot


Hofstadter's law: it always takes longer than expected, even after accounting for Hofstadter's law.


The 80-20 rule: the last 20 percent of the work takes 80 percent of the time.


I've heard it as, "the first 90% of the task takes the first 90% of the time, and the last 10% of the task takes the other 90% of the time." :)


Amen


The critical thing many people miss about this rule is that it applies recursively. So of the 20 % that takes 80 % of the time, in turn 20 % takes 80 % of that time.

Go further and you get a 50-1 rule: 1 % of the work takes 50 % of the time. (Or, equivalently, 50 % of the time gets you 99 % of the way there.)

This is a strong argument for clever scope reduction.

Edit: To be clear, this is the Pareto distribution with α=1.16 -- and I don't think this applies to software implementation. The tail isn't quite that fat, in the data I've had access to. But the general idea still applies, except with less aggressive percentages!
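The arithmetic behind the 50-1 claim, as a quick sketch:

    # Apply the 80/20 rule to itself: at each step, 20% of the remaining work is
    # assumed to take 80% of the remaining time.
    work, time = 1.0, 1.0
    for step in range(1, 4):
        work *= 0.20
        time *= 0.80
        print(f"after {step} application(s): {work:.1%} of the work "
              f"takes {time:.1%} of the time")
    # Three applications give 0.8% of the work taking 51.2% of the time,
    # i.e. roughly the "1% takes 50%" rule above.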


My corollary: Never attempt to account for Hofstadter's Law!


The hardest part of this is communicating it to a client. “What do you mean it will probably take 5 weeks, but could take between 2 and 10 weeks? That’s a pretty big span. The other programmer I asked promised it would be done in 5, period. Are you sure you know what you’re doing?”


I've made a comment some minutes ago about the difference between estimates and plans. Making the client aware of the difference may be useful.

However, it's also a fact that most clients ask to be deceived. I've seen that scene over and over: Company A presents a fairly detailed 3-month plan, while Company B's salesperson says that the project is easy and they will deliver it in 1 month. Company B wins the project most of the time, only to (possibly) deliver it in 6 months or so.

Some clients even train their employees to avoid such traps, making them informed of estimation techniques and encouraging them to use plain good sense, but those are exceptions.

Software projects would be way less painful if clients, and not only software development companies, worried more about achieving higher levels of maturity.


We used to try and present estimates as a range ("between 5 and 10 weeks"), but we've settled on providing a single number rather than a full span.

For our time and materials estimates, we try to provide an ~85% confidence estimate, while for fixed bid projects we try to aim for a ~95% confidence estimate (and there's usually a pretty big gap between those two numbers, for the reasons pointed out in the article).

So, in this scenario, we'd probably say to the client that it'll be done in about 8 weeks (making clear that's an estimate, and not a promise). One number, rather than a range, seems a lot easier to explain to a client and if you end up low (say it takes 3 weeks), then you have a pleasantly surprised client.


I think there are a number of things that contribute to problematic estimation, and a number of development tactics that make the damage exponentially worse.

In my experience, the higher up you go, the more managers end up using Waterfall. It may be agile in the trenches, but it's Victoria Falls in the boardroom.

At least, projects that have multiple, converging workflows tend to end up looking a bit "waterfallish." I have not had the luxury of working on a standalone, single-line project (until now, as I'm working on my own).

Working on my own, I have developed a number of techniques that start from the assumption that any estimate made, that looks forward more than just a few iterations, is a dumpster fire. I've learned to work around evolving specifications, project progress variances, and the need to reduce tech debt.

This reminds me of some of the work Watts Humphrey did, back in the day, and also a lot of McConnell's work. I like McConnell. He has a very realistic approach. Humphrey's work was too "eggheadish" for me, but people that used his process raved about it.


I tend to be skeptical of statistical analyses like this.

At the end of the day, it's usually someone who is paying or funding a task, and someone who will perform the task. The person performing the task wants to please the person funding it (in order to secure that funding). This commonly involves minimizing difficulty and giving optimistic estimates.

People who give straight answers (something like: this is really hard, it will take a lot of time, there are lots of unknowns, and might not be achievable given the budget) won't get the award. Those who overdo the bullshit and give transparently falsely optimistic estimates will not receive it either. So those who fall in the middle do end up getting the award.

TL;DR: model this effect game-theoretically, where the "blowup factor" is derived from a Nash equilibrium between counterparties playing optimal strategies seeking to maximize their payoff.

Finally, take three companies: Company A delivers a project in 50% of the time, and 50% under budget. Company B fails to deliver the project, after consuming 100% of the budget. Company C completes 90% of the project with 100% of the funding, and only needs 20% more money to finish it. The customer will most likely top them off, because they are so close, and it's a sunk cost.

Who makes the most money? Company C. Who is the biggest loser? Company A (despite being the best). Anyway, for my next trick, I'll explain why this same effect causes agile to make everything take longer, cost more, and make everyone more miserable.


How does the ABC example extend to repeated games?


Company (or team, or individual) A "the best" is perceived to not work hard. They are perceived to be underutilized. They miss out on so much potential revenue they don't make it to the next round.

Company B is obviously incompetent, and doesn't make it to the next round.

Meanwhile, company C, barely competent, is perceived to make the best use of resources and remain fully utilized. They are perceived to work really hard. They get not just the initially agreed-upon budget but a top-off in addition. They have the most profit to compete in the next round.

If company (or individual) A makes it to the next iteration, they know how to manage expectations, stretch the budget, and make a theater of working hard... thus turning into company C.

After enough iterations everyone kind of ends up like company C: just good enough to complete the task, with the level of effort some double-digit percentage over the original estimate, but not incompetent enough to flop altogether. It is to this I attribute why software is usually behind schedule and over budget to a reliably predictable degree.


Did you swap B and C here? C was the “completed with a bit of extra money” one in your original where B never finished.


Thank you! Fixed in an edit


Ah right, same dynamics as departments using up all allocated funds by the end of a fiscal year.


Would love to hear your take on agile. You should write a "blog" post about it... Company A would eventually get a good reputation and win in the long run.


In an idealized world consisting of perfectly rational econ agents company A would win out. However we live in a world of people full of cognitive biases and driven by fear, greed, sentiment, and a whole slew of not perfectly rational drives.


I've seen individuals and small groups that would fit in category A lose out to Cs. People that quietly get on with their work without problems whilst a bunch of drama queens get all the attention for heroically fixing problems they created in the first place.


This is exactly right.

A team that crams agile sprints with tons of "tickets", makes features that aren't fully thought out (necessitating more "tickets" to fix them), and has a steady influx of general bugs cropping up is perceived to work super hard. People like to see noses to the grindstone and people stressed out.

The team that works carefully and deliberately, does things well the first time, and seems to be moving at a relaxed pace is perceived to be lazy and inefficient. People are offended by this. This hit me hard the first time I was a project lead -- the customer was delighted, the product was steady and reliable, but my staff were super uncomfortable and insecure. Because the project was successful and they did their job well, like professionals, they realized that they weren't super busy. And when they aren't super busy it means they may be on the chopping block - it was terribly depressing. No amount of reassurance that, well, this is what life is like when you're good at what you do really settled anyone.

So, consciously or not, we fill our time with "work theater". We work really hard to write new code that then requires more work to get right and to fix, and with a backlog of work, it looks like we're working really hard. Being always a little behind, we then make the case for more funding, more staff, more resources, etc etc etc... And the cycle continues.


I'd love to hear your standpoint on agile.


Agile plays this scenario out every two weeks, instead of once per project.


Because we take on stuff we don't know enough about.

That doesn't mean it's new or hard, it's often just stuff we didn't do before.

Or we didn't document what we did and months later start from zero.


In automotive mechanics this gets phrased as:

"Every 20 minute job is one broken bolt away from being a 3 day ordeal."


I'm not sure this is unique to software projects. It seems like any major project is subject to similar issues. Although for software projects I think the perception is that things are more malleable, so adding a requirement/feature etc. is seen as less problematic than, say, deciding a new construction project should have a few extra rooms added to it.


Past discussion with a lot of comments: https://news.ycombinator.com/item?id=19671673


The Weber–Fechner law states that "the subjective sensation is proportional to the logarithm of the stimulus intensity".

So maybe the "time estimates" actually are estimates for expected stimulus intensity.

Think about it as 'guessing the level of dread you feel when looking at the project /after/ you finish it'.

According to Fechner this should be related to the actual difficulty by a logarithmic law (at least for humans).

These estimates of sensation are then just normally distributed, as one would expect.
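A tiny simulation of the idea (purely speculative; the 0.7 noise level is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(1)

    # Suppose what a developer really estimates is perceived difficulty, which
    # (per Weber-Fechner) is proportional to the log of the true effort, and that
    # the perception error is normally distributed.
    true_effort_hours = 8.0
    perception_noise = rng.normal(0.0, 0.7, size=100_000)
    implied_hours = np.exp(np.log(true_effort_hours) + perception_noise)

    # Mapping the perception back to hours yields a lognormal spread of outcomes
    # around the nominal estimate: symmetric in log space, skewed in hours.
    print(f"median {np.median(implied_hours):.1f}h, mean {implied_hours.mean():.1f}h, "
          f"95th percentile {np.percentile(implied_hours, 95):.1f}h")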


Manfred Schroeder's "Fractals, chaos, power laws" (1991) had a similar statistical distribution (Levy? or something else?) which reminded me very much of software. Basically, the longer something has been observed to remain uncompleted, the further away the expected date of completion becomes. If you have to slip, slip big.


Fits with my rule of thumb for estimation.

Come up with your most pessimistic estimate that feels right, then double it.


Based on personal experience, I use the factor π. Maybe the appropriate factor varies with surrounding conditions that are hard to account for. Also, use the coarsest granularity you can get away with.


An entire management layer could be removed from the average software company or consultancy if only it were possible to maintain multiple estimates for one project. Expecting one number to inform resource scheduling, account management and new business is expensive.


A story from a rainy summer in Berlin:

One day my manager came to my office and said: 'Look at this task. How long will it take?'

I thought 'two days' and I told him 'ten days'. He said 'Ok, do it!' and went away.

I was happy that I had the chance to do it very well. To take the time to write really well structured code without pressure. And I did it.

At day 5 my work was finished. I was proud of it.

At day 6 I took the time to re-read everything and to do intense testing. I found a minor bug and fixed it.

At day 7 I practised for my next martial arts exam during the work time.

At day 8 I went to my manager and told him 'It is finished and it is good'. He was very pleased to hear that I finished early! Before the deadline!

My colleagues were happy with the well working code.

I call this a win-win-win situation. Oh, what great memories. The day when I told my manager that the task would take 10 days.


These comments read like they're from a different galaxy (to me). Here it is very clear why. Management asks: how long does it take? Engineer replies: it will take X man-months. Management: that's too long, let's estimate shorter.


Project Manager: I promised the customer we'd deliver Vagueness v1 by next Wednesday. Can I have a build?

Engineering Manager: ok, Engineer, we need Vagueness, how long will it take?

Engineer: I can whip up a prototype in about a week but it's only a prototype and will lack any functionality. I'm also busy trying to get the fire put out on the fourth floor and my kids' daycare is closed because of the riots.

Engineering Manager: OK, 3 weeks plus another 3 weeks for QA and bug fixing. Also, have those reviews done and fill out your daily TPS reports. Oh, and can you just slip in those other two projects we talked about informally last week?

Project Manager: OK, I'll promise the client they'll have it by Tuesday. Where do I tell them to download from?


I clearly remember estimating something with a manager at a previous job, saying it will take 2 weeks. His response was, that's too long, let's try 2 days and see what happens.


Have had the same experience :-) It took two weeks of course.


so what happened?


I estimate they will answer in 2 days.


The reason is way simpler: since no project can be completed in negative time, the probability distribution of completion time will be skewed to the right (with edge cases at positive infinity).


I always treat it as a thumbsuck because who really knows. I could tell you two to three hours to solve it and then it takes me five minutes. I could say a week and only end up using a day.


I've always found business expectations are to complete in less time than my estimates. My estimates are often a little under as well.


There's a simple method for software estimates that has worked well for me in practice. Imagine how long it should take if all goes smoothly, then multiply by pi.


What does "worked well" mean? What percentile do you practically end up at when you go back and verify after completion?


By "worked well" I mean that x3 is the "about right" factor. As software engineers we tend to be optimisitic and so when I think "it'll take about a week", that's assuming no unforseen complications. I'm afraid I haven't conducted a detailed a statisical analysis, however. The pi method is in truth just a joke between myself and a former project manager. There's nothing magic about pi exactly. I imagine each person ought really to come up with their own mental multiplier based on their own tendency to underestimate. But 3 isn't a bad starting point for iteration.


I guess I'm asking what "about right" means. Does it mean you finish faster than that number 50 % of the time? 90 % of the time? No statistics necessary, just your gut feel.


Ah, good question. First instinct says 'success' should be statistically 50% each way. Right in the middle of the bell curve.

But on second thought, being behind estimate half the time would be pretty sucky. So I'd like to bring the peak of the curve onto the early side of the line.

I guess ultimately it depends on the question being asked of the estimator & the implied confidence interval.

What do you think?


99% of the time there is no incentive to make accurate predictions.


You never had a customer rip your head off because they took your estimate as a promise and based their own deadlines on it?


Exactly. So when a customer asks for an estimate you negotiate for the biggest buffer you can get away with, there is no incentive to make an accurate prediction.


Now I get your point, and agree. Plus, coming out below that estimate will make for a pleasant experience.


It depends. In most typical cases, if it is billed by the hour (including regular employment), it is not worth it to deliver early. When failure to meet deadlines is expected, the optimal time to deliver is after the original estimate. The exact numbers can vary a lot.


So lognormal implies the total time is the product (not sum) of lots of little random tasks. Why would this be?


Thanks for asking the question. I also wondered. I can't think of any obvious reason such a model would apply to software development.


Software project estimation is fractal.


And quite possibly NP-complete.


More like NP-hard.

I don't think that something like a SAT solver will help you much with your estimations.

But you can probably phrase an NP-complete problem in terms of software project estimation. The traveling salesman problem looks like a good candidate here.

Edit: come to think of it, it is like the halting problem: your project may never finish


> Edit: come to think of it, it is like the halting problem, your project may never finish

Uncertainty in the solution can only be reduced to the extent you have actually solved the problem. Of course, the problem may not be solvable, or alternatively, the cost can be too high.


It's because of all the bullshit admin work MBA execs give us


What about video games?


Over the course of my life I've designed and built many products/projects. Not a single one that had deadlines imposed by the customer was late. It might deviate on the non-critical feature set, but the main course has always been served properly.


And?

I suspect customers that asked you to, say, build a new inventory tracking tool that interfaces with their existing enterprise infrastructure (SAP, Salesforce, etc) didn't ask you to do it in 2 weeks.

So what was asked was probably -reasonable-. If it wasn't, you probably would have dropped the customer/left the company out of frustration.

If it was reasonable, for a sufficiently fully featured solution that you COULD drop features and still have an MVP, then I would fully expect you to do so.

That's PM Triangle 101; you can give on features while keeping cost and time fixed.

That's just how we handle the fact that estimates are wrong, but we have to keep a date (we drop features, or bring on more people if the work is sufficiently parallelizable). This article is about the estimates being wrong.


>"That's just how we handle the fact that estimates are wrong"

Estimates are not wrong. We agree upfront what is guaranteed to be delivered by the deadline. The rest is optional. I might do it if I am significantly ahead of schedule, or should they decide to continue with my services after the initial delivery. Sometimes they order the initial product but then prefer to maintain it themselves.


I know, I'm supposed to be modest but it is the truth. And I did not mean just me personally. Some things I did on my own and for some I had to involve team.

I think one of the reasons is that while I am reasonably good at architecture and programming in general, I do not really get hung up on processes / languages / tools / technologies etc. What makes me tick is seeing the product working and serving people. That's the only thing I'm really interested in.


> I know, I'm supposed to be modest but it is the truth.

I don't think you were downvoted because you were immodest (although I'm sure it didn't help) but because your comment doesn't contribute much to the discussion.

Do you have any comments on the content of the article? Do you have any estimation techniques to share? What were the deadlines based on?


Here you go. From the article:

>"People estimate the median completion time well, but not the mean."

I do not estimate either. I estimate a guaranteed delivery date and what functionality will be delivered by that date.

My "special" estimation technique is probably that I do not always do things in prescribed way. I am not afraid to look outside and find ways that would bring solutions faster (sometimes by much) than the standard recipe.

Here is an example of the unconventional way I used to save someone's butt in 2 weeks when the other company (a certified consultant shop charging $300 per hour per developer) could not do it in 2.5 months: https://news.ycombinator.com/item?id=21055881


What kind of things do you make, how well do you know the technology/domain, what are the customers/users like, and how long out are the deadlines set? Why do you think you're so successful when so many other people have problems with estimating schedules?


I made various things. Two examples to show the range - Enterprise grade business process middleware for Telcos. Software and microcontroller firmware for law enforcement. And then everything in between.

I have never really thoroughly analyzed why I am successful with the estimates and delivery. I think I'd be way more interested in it if I was late. Since I am not, I do not really care. I just am, and this is enough for me.

I am eternally grateful to my university (nothing to do with comp science, btw) and the teachers who taught me not to blindly memorize rules but how to create them. I guess this helped me become who I am.


There are still people who develop software in traditional projects?

As software can easily be changed according to changing business needs, it is never really done. So it's unnecessary to put it in a project.

While you do a factory building project and decide a point at which you are done, you wouldn't do a car manufacturing project.

Software development is more similar to building cars than to building factories. So instead of projects it is better to think of assembly lines (sprints, kanban boards, etc) and throughput (e.g. burndown). The question is how many business changes come in every week and how many can you solve every week.

Measuring and viewing things the right way then also shows how successful you actually are as a development team.

PS: If your company isn't working like this yet, you might consider getting some external help quickly. This has been the standard for quite some time now, but admittedly it's hard to find out what the standard is if you rarely get new people added to your team.


Don't get confused between Projects and Operations. What you are describing is actually akin to operational projects (albeit at the risk of bespoke designing and building each car).

Projects in the traditional sense produce new assets with new value with new benefits meant to be operationalized. You need to segregate associated costs so that you can depreciate the value of the asset once it's operational. Usually the cost is considered an investment and funded by separate finance.

Operational Projects support, extend, enhance existing business assets. This is the assembly line mentality and is more aligned with continuous improvement for which there is usually a fixed budget for the year and is completed during the normal course of business.

Yes there is some grey area but both are necessary.


I see, so the financial side between projects and operations is also different.

When would you choose to do software as a project, coming from that perspective? Maybe the budget for operational things is more limited, and so you choose to do something as a project, although you know later on you also want to still change it?



