How to replace estimations and guesses with a Monte Carlo simulation (lucasfcosta.com)
399 points by lucasfcosta on Oct 6, 2021 | 155 comments



Many commenters on this story believe the main purpose of estimation is to assert when something will be done. Then they say "this is impossible" and call the whole exercise a waste of time, or management abuse.

I gotta tell you that unless you're working with absolute bozos, nobody is looking at estimates and saying "oh duh, I am betting the farm this will complete on that date."

Benefits of estimation that are more important than the date itself:

Enabling trade-off analysis: If you tell me the thing will be done in two months, I bucket the thing in my head as something that takes "a few months" as opposed to "weeks" or "years." Often that is enough to drive a build/pass decision on the feature.

Dependency visualization: If I don't ask you to estimate, you might just start coding. Estimation may force you to think about critical paths and dependencies, then we can plan for them to improve your chances of success.

Troubleshooting: why did we miss an estimate? A task was harder than we thought, that's ok. We missed a dependency, ok that's a prompt to think about those better next time. Developer keeps getting distracted, maybe we need to change something on the team. Estimation is just "calling a shot" which then enables us to (maybe) learn from missing it.

All those things are valuable even if the actual estimated date is missed, even by a lot.


I gotta tell you that unless you're working with absolute bozos, nobody is looking at estimates and saying "oh duh, I am betting the farm this will complete on that date."

Happens all the time, and not just by bozos. Sometimes getting paid is entirely dependent on delivering when you said you would. I used to work at an animation studio, and one year we worked on a series of Christmas-themed TV spots. When we said we could deliver on time, we were very much betting the farm that our estimates were correct. Nobody is paying to air those in February.


Similar story, in a larger firm. There are internal incentives where teams are compared against each other and a lot of it hinges on delivering things when a team said it would.

This leads to all sorts of strategies to make that happen: from cutting corners, to underpromising, to padding estimates. In the end, the sum of it all is that nobody is able to paint an accurate picture of the capabilities of teams overall because everything is so heavily skewed by internal incentives and competition.


That's not an estimate, it's a deadline. Different concepts.


Surely a deadline is just an estimate you bet the farm on :)


No. You would make your own estimation. If the result is November 1, +- 2 days, you can bet the farm. If the result is December 22, +- 14 days, you should pass.


Not really. You have to estimate within the deadline. Estimates are often required due to such dependencies.


But they are bozos if they give you a task they want done for Christmas, you tell them that it will likely not complete until February, and they say, "well, just work harder/extra hours/hire more people", which is pretty common.

The way to avoid bozo-land is for the person asking to come back with "well, what can you confidently complete in time for Christmas?", and plan for that.


While the article is rather low quality in and of itself, it still manages to point out:

> Before claiming that the simulation above is reliable, I must expose a few things you’ll need to pay attention to when using a Monte Carlo Simulation to make forecasts:

> (A) The quality of the inputs upon which your simulations depend.

> (B) The consistency and predictability of your team.

> (C) The need to re-forecast.

> (D) The size of your work packages.

Among those, I am not exactly sure what the author is referring to with (C). However, problems arise most often when none of A, B, and D is controlled by the person being asked to estimate.

Now, if you can control A, B, and D, YOU DO NOT NEED TO RUN MONTE CARLO. You'll be able to split the work and guess within 10 - 20% just by using rules of thumb.

If you cannot control A, B, and D, running Monte Carlo will not help you.

I say this as a person who's been doing Monte Carlo since the late 80s.


We're stuck in the eternal loop: "Tech Y will save us from ourselves and having to think."


> I gotta tell you that unless you're working with absolute bozos, nobody is looking at estimates and saying "oh duh, I am betting the farm this will complete on that date."

Apparently I've worked with a lot of bozos over the last 25 years... (apologies to everyone I've worked with).


Lion tamer reporting in. It’s all circuses out there.


> I gotta tell you that unless you're working with absolute bozos, nobody is looking at estimates and saying "oh duh, I am betting the farm this will complete on that date."

I gotta tell you, you've never worked with a business stakeholder, the government (see financial penalties for missing deadlines) or with an absolute deadline (ie Christmas) if you think that's true.

> Many commenters on this story believe the main purpose of estimation is to assert when something will be done

We know our estimates can never be guarantees, but that is irrelevant to stakeholders whose career advancement depends on project X being done by day Y. Guess whose opinion carries more weight in literally every profit-making enterprise.


You and a few other responses hit on a related but distinct problem: an unmovable deadline.

In this case, the question is no longer "how long will X take" but "what will it take for X to complete by Y?" That can require aggressively cutting scope, reassigning resources, de-risking in every possible way. That's quite different from asking a developer/team "how long this will take" assuming constant resources and given scope.


There is a reason project management calls scope-cost-time the iron triangle. It's extremely difficult to bind all three sides tightly unless the estimates are very good, and/or there is significant added buffer to the project (and that really means the cost side is what's stretching to give you extra certainty on the hard scope and time deadlines).

More commonly, at most two of the three are tightly constrained, and the third side is more free. One can optimize in some directions, but none of the sides are fully independent of each other. I think the best project outcomes I've seen are ones where only one side is strongly fixed, and the team is in control of the other two aspects.


You get all of the benefits that you’re looking for just by planning what needs to be done, estimations or not.

You break down the tasks, the dependencies, the risks, complications, places where there are options that need more research or comparison…and you’ll see the volume involved and have an idea of the size in the terms you mean. All without numerically estimating a single task in the plan.


> All without numerically estimating a single task in the plan.

However, often you have multiple paths to achieve a goal, and you have to choose which one you're going to take. Either you estimate each path in days/weeks/months or you'll have to compare all the alternatives (a > b?, b > c?), at which point having numbers is probably worth it.


I would also argue that enabling capacity planning and drawing a boundary around the scope are valuable by-products.

Perfection may be impossible, but Brooks' Law being a harsh mistress, and scope creep being a very real thing, having a serious think about how big you intend the project to be before you start can make the difference between finishing it in months, and having it turn into a years-long death march project.


> I gotta tell you that unless you're working with absolute bozos, nobody is looking at estimates and saying "oh duh, I am betting the farm this will complete on that date."

Meanwhile, anybody working anywhere close to the retail space fell off their chair laughing. Thanksgiving won't move no matter how much you insult your co-workers.

(This is not limited to retail. Many industries have fixed delivery dates and bet on those. Thanksgiving is just the most obvious example of an immovable object)


There's a difference between "it's done now, before Thanksgiving, let's bet the farm on it" and "it might be ready two days before Thanksgiving, let's bet the farm on that"...

Edit: that said, given the GP's quote about working with bozos, there's always that anecdote from the introduction of the 2nd edition of Peopleware, about the deadline being moved because the project was on track to meet the original deadline... (March meeting: "we can deliver as planned in June" - project leader called back: "we've moved the deadline to April" - paraphrased).


The deadline is still immovable. Yes, you've planned with a buffer, but you're still betting the farm that ultimately, it will come out in time to capture that business.

And Black Friday season is large enough (30% of annual retail) that you might well try for a farm-sized bet. Been there. Done that. Had the conversations around risk.

It is both "it must be operational by this day, or huge losses" and "we would rather not ship your effort than ship it between Black Friday and Christmas".


If you supply anything that retail needs on Thanksgiving (including physical media containing digital assets), you have a very strict timeline as well. There is sometimes a possibility to pay extra to expedite a later part of the production process (ship via air instead of ship?) but that's still a big hit and often isn't even available.


>>I gotta tell you that unless you're working with absolute bozos, nobody is looking at estimates and saying "oh duh, I am betting the farm this will complete on that date."

I hate to break it to you, but it happens a lot (and yes, management structures can often work to filter absolute bozos upwards). They make all kinds of representations to customers, investors, etc. about dates that have nothing approaching this level of estimation, more of a wishful-thinking goal. And when they fail, investments, customers, jobs, and even companies are lost — losses that could often have been avoided if better plans and contingency plans had been made.

Sadly, just because it is blindingly obvious to you and me, does not mean it is so obvious to someone in a mgt chair.


Eh, I think there are two ways this can go.

If you're really working with people who are 'so stupid' as to take estimates literally and make plans assuming guaranteed success in those times, then you probably have other leadership/management/risk problems in your company and are doomed anyway.

There are plenty of times where there's a little bit of managerial art to communicating estimates to customers and prospects. For example, if my devs say "October, the API will be available", I may share that with the client if I understand that in reality it will take the client a long time to actually start using that API in production. So I may say "Probably early November..." with the mutual understanding that that's when they'll begin to integrate, but the thing may not actually be prod-ready from either of our sides till January. It's complicated!


You are making the assumption that development estimates have anything to do with when management, sales, or customers decide that they need something.

Even without mgt/sales over-promising stuff to customers based on wishful thinking and only wanting to say "Yes" to a stupid customer demand without even bothering to ask development/engineering, I've seen situations where the date is the date, period.

I've been in plenty of situations where it doesn't matter - it's an event, scheduled years in advance, national television, and the show goes on at Monday 16:00 PDT, period. Scheduling estimates have nothing to do with it. Be ready and be there, or everyone gets nothing. The only question is: "Can the team do it?".


I don't understand why commenters are so hung up on the "unless you're working with absolute bozos" part. That sentence is irrelevant compared to the rest of the comment. The three benefits listed are a great articulation of how estimates should be used. Couldn't have said it any better.


Explicitly asking for a critical path sounds better than hoping somebody will consider it while doing something else.


this is an incredibly obvious yet valuable comment but all the replies seem to be working hard to miss the point.


While a different application than shown here, my project management class taught me one of the biggest benefits of Monte Carlo simulation - estimating uncertainty.

Traditionally, net present value calculations are done with single point estimates. For example, analyzing a rental property we want to buy, we'd estimate the vacancy rate, interest rate, property appreciation, maintenance expenses, and do all of that on a cash flow time-adjusted basis.

There's variability in every aspect of this, though, and understanding how much can easily swing your decision.

Traditional methodology is to pull the trigger on the action if the net present value is greater than 0. We put in a variety of estimated factors, ran it through Crystal Ball (simulation software), and found a 75% chance the NPV would be negative. It suddenly isn't the safe investment it looked like originally.
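As a rough illustration of what such a simulation looks like without Crystal Ball, here is a minimal Python sketch; every input figure below is invented for illustration:

  import numpy as np

  rng = np.random.default_rng(0)
  n = 100_000

  # Hypothetical rental-property inputs; all figures are made up.
  rent = rng.normal(24_000, 2_000, n)                 # annual rent
  vacancy = rng.uniform(0.05, 0.15, n)                # fraction of the year vacant
  maintenance = rng.lognormal(np.log(3_000), 0.5, n)  # annual upkeep, right-skewed
  discount = 0.07                                     # discount rate
  price = 130_000                                     # purchase price
  years = 10

  # Discounted cash flows, vectorised over all simulated scenarios at once.
  cash_flow = rent * (1 - vacancy) - maintenance
  npv = -price + sum(cash_flow / (1 + discount) ** t for t in range(1, years + 1))

  print("P(NPV < 0):", (npv < 0).mean())

Instead of a single "the number", you get the probability that the investment loses money.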

---

In practice, it's tough to get people in business to understand uncertainty. They gravitate towards "the number", which is almost surely wrong all the time.


The real danger is not understanding that this "uncertainty" estimate is a function of your assumptions. How you model the distribution of your inputs is huge, and often not stated clearly.

GIGO


  > The real danger is not understanding that this
  > "uncertainty" estimate is a function of your assumptions.
The bigger danger, in my experience, is treating estimates as deadlines.

Unfortunately, that seems to be the norm in almost every place that I'm familiar with.


“Don’t worry, this is _purely_ an estimate and would never be used to set the deadline. So just give us something to share with the ELT…”


"Just take a guess - we have to have a number in this box by the end of the meeting". I was literally just told that yesterday. No one else who would look at that number will know it was a totally random guess pulled out of thin air to satisfy ceremonial box-filling. They will treat it as an estimate, and make plans accordingly.


And if you make up a number that is too obviously small or exceeds some unspoken upper bound, you'll be asked to re-estimate anyway. Sometimes the best way to respond to that is to finesse the discussion into coming up with a number that, while it will have no relationship to the actual effort, at least reflects what the stakeholders hope and desire it to be. At that point the team, if they are smart, will examine scope and re-plan to come up with some level of effort that they are confident can be done in the time hoped for.

In other words, get the people who want a number to tell you what number they want, then use your best efforts to scope the effort to one you can be pretty confident will fit.


GIGO, the first thing I learnt as I entered the industry 20 years ago. This was from a 60-year-old engineer who told me that experience is only a nice name for "all the @#$% I made I will try not to make again".

A very nice thing about Monte Carlo simulation is that at the end your distribution of results is all within the feasible range. If you do error propagation using uncertainty on your parameters, you can get nonsense results.

For example, suppose you have a pendulum: simulate it with error propagation and you get a nonzero probability of the energy in the system increasing over time.

It is easy to spot this in such a constrained example, but with more complex models this is sometimes pretty hard to control.
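A toy illustration of that failure mode, using the pendulum period T = 2π·sqrt(L/g) with a deliberately exaggerated uncertainty on the length:

  import numpy as np

  rng = np.random.default_rng(0)
  g = 9.81
  L_mean, L_sd = 1.0, 0.8  # metres; absurdly large uncertainty, on purpose

  # Linear error propagation treats the period as Gaussian:
  # dT/T = (1/2) * dL/L for T = 2*pi*sqrt(L/g).
  T_mean = 2 * np.pi * np.sqrt(L_mean / g)
  T_sd = T_mean * 0.5 * (L_sd / L_mean)
  gauss = rng.normal(T_mean, T_sd, 100_000)
  print("error propagation, P(T <= 0):", (gauss <= 0).mean())  # nonzero: nonsense

  # Monte Carlo samples only feasible lengths and computes the period directly.
  L = np.abs(rng.normal(L_mean, L_sd, 100_000))  # crudely truncated to L > 0
  mc = 2 * np.pi * np.sqrt(L / g)
  print("Monte Carlo, P(T <= 0):", (mc <= 0).mean())  # always zero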


> A very nice thing about Monte Carlo simulation is that at the end your distribution of results is all within the feasible range.

I guess Monte Carlo helps provide conservative estimates behind a façade of rigour, but the truth of the matter is that in the end it's still GIGO.

Any empirical distribution only reflects the empirical measures that were used to generate it. If you bundle everything from the time it took employee A to walk the dog while allocated to project Foo to the time it took employee Z to fix a nasty heisenbug while allocated to project Bar, and Foo and Bar used totally different tech stacks and team members and even approaches to project planning, that distribution is meaningless in estimating, say, how much time it will take employee G to implement a React widget.


I’m smiling a bit while reading your comment. You are completely correct that that estimate is virtually meaningless. And yet… that doesn’t mean it’s completely useless!

I’ve been using a somewhat meaningless Monte Carlo-inspired approach to project planning for a while in my consulting company. The vast majority of the projects I do have only a small amount in common with previous projects, so when I’m estimating I’m informed by past estimate vs actual numbers, but not really relying on much other than intuition and gut feelings from past work.

My basic approach is to set 2- or 3-sigma upper/lower bounds for each task, modelled (incorrectly) as a Normal distribution in hours, and then sum the random variables to come up with a final distribution (whose relative variance is quite a bit smaller than what you see on the original bag of tasks). From there, if I am making a quote, I’ll quote at +3-sigma x hourly rate as a high-end effort estimate. If it seems like a very meeting-heavy client, I’ll either add meetings in as development tasks or just pad it out by a percentage or fixed hours/week.
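A minimal sketch of that roll-up, with invented task numbers and the same (admittedly imperfect) Normal model:

  import math

  # Hypothetical tasks: (low, high) effort bounds in hours, read as mu -/+ 3 sigma.
  tasks = [(4, 16), (8, 40), (2, 10), (16, 60)]

  mu = sum((lo + hi) / 2 for lo, hi in tasks)
  # Variances add because the tasks are assumed independent.
  sigma = math.sqrt(sum(((hi - lo) / 6) ** 2 for lo, hi in tasks))

  print(f"expected effort: {mu:.0f}h, sd {sigma:.1f}h")
  print(f"quote at mu + 3 sigma: {mu + 3 * sigma:.0f}h")

The sd of the sum grows only with the square root of the number of tasks, which is the relative shrinkage mentioned above.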

This technique has worked amazingly well for me, and it’s been quite rare that I blow the estimate, and in the… one time I can think of, we missed by very little and there were tasks that we hadn’t thought of when we did the estimate.

To your point though (with walking the dog), there’s a really key thing that this process doesn’t capture, somewhat on purpose: the resulting estimate is in effort-hours, not delivery date. While modelling per-task effort hours as Normal is suspect, modelling task delivery dates as Normal is completely irrecoverably wrong: delivery dates, in my experience, only ever slide in one direction. People get sick and the project slips a week; people don’t ever get super healthy and effectively knock out 80 or 120 hours worth of tasks in a week.

I honestly haven’t found a good way to estimate delivery dates very well. At one point I did put together some regression on my “actual billed hours per week” based on my billing, but ran into the same problem. “Oh, my dog died and my mother-in-law got sick during that project.”

Someone who’s better at statistics might have a better way to model that as a high-skew distribution, but when I tried doing that myself I ended up with a distribution that didn’t feel like it did a good job of capturing the non-negligible long tail of things that slow down calendar estimates without burning hours.


This sounds like a really useful approach, but I'm bad at math and statistics, and English is also not my native language.

Could you maybe provide an example with figures?

Thanks


The traditional distribution for costs and dates is Beta-PERT. Douglas W. Hubbard uses log-normal distribution. I am a pessimist, so I like Beta-PERT with a fat, fat tail.
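For anyone who wants to try it, a sketch of sampling a Beta-PERT using its standard parametrisation (λ = 4 is the classic choice; lowering λ fattens the tails):

  import numpy as np

  rng = np.random.default_rng(42)

  def pert(a, m, b, size, lam=4.0):
      """Sample a Beta-PERT on [a, b] with mode m (lower lam = fatter tails)."""
      alpha = 1 + lam * (m - a) / (b - a)
      beta = 1 + lam * (b - m) / (b - a)
      return a + (b - a) * rng.beta(alpha, beta, size)

  # Hypothetical task: optimistic 2 days, most likely 5, pessimistic 30.
  samples = pert(2, 5, 30, size=100_000)
  print("p50/p80/p95:", np.percentile(samples, [50, 80, 95]))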


> If you do error propagation using uncertainty on your parameters, you can get nonsense results.

Only if you do it wrong, generally. Events which are impossible under some hypothesis should have zero probability under a model for it and not be sampled.


> experience is only a nice name for "all the @#$% I made I will try not to make again"

One of my favorite quotes is one that was made famous by Will Rogers and Rita Mae Brown:

"Good judgment comes from experience. Experience comes from bad judgment."

I have seen this "attributed to Nasrudin," which means no one really knows who came up with it first.


Estimating uncertainty is preferred, but if you have people who insist on a single number, then teach them that it should be the median instead of the mean. Means tend to overvalue "moonshots" where there's a 99% chance you lose money, but if the payoff in that 1% where you win is large enough, it can still result in a positive mean. And a mean can often be an outcome that isn't actually possible, but falls between several options, while a median is always an outcome that can actually happen, and there's ~50% chance you'll get a better one.
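A tiny illustration of that moonshot effect, with invented payoffs:

  import numpy as np

  rng = np.random.default_rng(0)
  # Hypothetical moonshot: 99% chance of losing $1M, 1% chance of winning $200M.
  outcomes = rng.choice([-1.0, 200.0], size=1_000_000, p=[0.99, 0.01])
  print("mean:  ", outcomes.mean())      # ~ +1.0: looks attractive
  print("median:", np.median(outcomes))  # -1.0: what usually happens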


Heh, when people ask for a single number estimate, I give them mu + 3-sigma. If you’re curious, I’ve got a sibling comment that describes a bit more.


It's not just business people. I have yet to find a task tracking system (Jira etc.) that lets you assign a range of points to a task.

People try and use nonsense like Fibonacci numbers to imply uncertainty, but then just add up all the numbers to get a number with no uncertainty measure.


Story points are a curse on the software development industry. However much people say "they're indicative, and don't map to hours", someone, somewhere, will map them to hours.

The most accurate project plan I was ever involved in had only 3 values that could be assigned to a piece of work during the early estimation phase: hours, days, and weeks. Each of those was then turned into a range of possible hours they could represent, with that range expanding as you got into larger units. You could then slice and dice the numbers however you choose to get anywhere between the most pessimistic timeline to the most optimistic timeline. Probably unsurprisingly the project was delivered somewhere between the two.


That seems reasonable, but I would say you don't have to resort to such crudeness. In my experience I know the difference between a task that will definitely take weeks and a task that might take a day or might take weeks. Just let me write that down!

Even if you don't have any idea about the uncertainty we already have a crude way of measuring it - planning poker! Just record everyone's guesses instead of throwing away the uncertainty information. There's a huge difference between everyone guessing 5, and some people guessing 1 and others guessing 20.

I agree about points being stupid though - there's simply no way to avoid it being converted to/from time, because that's the actual unit of work.


Sorry, reading back I definitely come across as disagreeing - I would kill someone for task tracking software which supports giving the degree of uncertainty on an estimate.


> Monte Carlo simulation - estimating uncertainty.

By definition, "uncertainty" is the thing that does not have a PDF ... You can quantify risk based on assumptions you make about underlying probability distributions from which you are drawing. Far too often, even in Monte Carlo, people decide to work with the easy distributions instead of the most appropriate distributions.

The "easy" distributions tend to admit actual mathematical solutions which means Monte Carlo is helpful if you can't do the math but not strictly necessary.

Monte Carlo shines when you cannot get a nice solution, and there is your opportunity not to be constrained by the need to do so: draw from appropriate distributions.

Also, know the difference between LLN and CLT and when to appeal to which to justify your methods.


I would think that you could have "uncertainty" even if you do have a PDF. Maybe there is a formal definition of uncertainty that I am not aware of. The PDF can describe the uncertainty for every outcome.


Risk is what you can put a probability on (i.e., quantify). Uncertainty is what you can't. Uncertainty encompasses unknown unknowns. See Knight[1] and Keynes[2].

Good rule of thumb even if it sounds overly simplified.

[1]: https://www.econlib.org/library/Knight/knRUP.html

[2]: https://www.jstor.org/stable/4538116


But ... but ... everyone knows that Nature is linear and Gaussian. Shhhh.


A related field: https://en.wikipedia.org/wiki/Real_options_valuation

> This simple example shows how the net present value may lead the firm to take unnecessary risk, which could be prevented by real options valuation.

Even with a Monte Carlo on NPV, you may still make a riskier decision than needed, as decisions are rarely just "do or don't" but can be sequenced pending more information.


One of the concepts that many people don’t get is differentiation between risk and uncertainty. These same people will then attempt to “model risk” and end up with (ironically) an “uncertain estimate”

https://en.m.wikipedia.org/wiki/Knightian_uncertainty


They define risk as having statistical noise or a varying parameter in the estimate, while uncertainty is lacking information about the target. Doesn't the presence of noise in an estimate imply that there's information you aren't accounting for? I don't see the importance of the distinction.


I’m a big fan of the Donald Reinertsen approach: measure queue length.

Simply track the time to complete each task in the team queue on average, then multiply that by the number of tasks remaining in the queue.

Each team will habitually slice things into sizes they feel are appropriate. Rather than investing time to try and fail at accurately estimating each one, simply update your average every time a task is complete.

The bonus with this approach is that the sheer number of tasks in the queue will give you a leading indicator, rather than trailing indicators like velocity or cycle time.
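In its crudest form that's only a few lines (all numbers invented; for a team working several tasks in parallel, divide by the number in flight, per Little's law):

  # Crude queue-length forecast, per the approach above.
  completed_days = [3, 5, 2, 8, 4, 6, 3]  # calendar days per finished task
  remaining = 24                          # tasks still in the queue

  avg = sum(completed_days) / len(completed_days)
  print(f"forecast: roughly {remaining * avg:.0f} calendar days to drain the queue")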


Strongly seconding this. For anyone still hesitant, I further recommend the following experiments:

----

Sample a few activities your team has completed. Check how long the smallest 90% of activities take on average, and compare that to the average of the biggest 10%. Or the median compared to the maximum, or whatever. You'll probably find the difference is about an order of magnitude or less. In the grand scheme of things, every activity is the same size. You can estimate it as exp(mean(log(size))) and be within an order of magnitude almost every time.

Once your team has accepted that something is "an" activity and not a set of activities, don't bother estimating. For all practical intents, size is effectively constant at that point. What matters is flow, not size.
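A sketch of that first experiment on a made-up sample of activity sizes:

  import numpy as np

  # Hypothetical completed-activity sizes, in calendar days.
  sizes = np.array([1, 2, 2, 3, 3, 4, 4, 5, 6, 8, 9, 12, 15, 30])

  cutoff = np.percentile(sizes, 90)
  print("mean of smallest 90%:", sizes[sizes <= cutoff].mean())
  print("mean of biggest 10%: ", sizes[sizes > cutoff].mean())
  print("exp(mean(log(size))):", np.exp(np.log(sizes).mean()))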

----

For the above sample, also study how much time passed between the "go" decision on the task and when it was actually released to customers. In a stable team, this number will be eerily close to the theoretical value based on Little's law, referenced in the parent comment.

Oh, and you shouldn't focus on man-hours. Work with calendar days. Not only does that simplify mental arithmetic for everyone, it's also the only thing that matters in the end. Your customer couldn't care less that you finished their functionality in "only" 6 man-hours if it took you six weeks to get it through your internal processes.

----

Fun follow-up to the size experiment: now ask someone intimately familiar with your customers to estimate the dollar value of each activity. You might find that while all activities are practically the same size, they'll have very different dollar values. That's what you ought to be estimating.


This. I've found that, whatever the project is, the velocity of a team in terms of what they define as a "task" is pretty constant, surprisingly so even. In the end, just counting the outstanding tasks proved to be a good estimator of where we'd end up at the deadline.


This is great for a simple Monte Carlo simulation!

Choose a finished task's time randomly, once per remaining task in the queue, and add these up to get a single estimate. Do this 1000 times or so and get an estimated distribution of completion times for the current queue.

This type of thing is covered extensively in Evidence Based Scheduling[0], and is one of the reasons I still think FogBugz's power is misunderstood.

[0]: https://www.joelonsoftware.com/2007/10/26/evidence-based-sch...
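A sketch of that resampling loop (the bootstrapping mentioned elsewhere in the thread), with an invented task history:

  import numpy as np

  rng = np.random.default_rng(7)

  history = np.array([3, 5, 2, 8, 4, 6, 3, 12, 5, 4])  # days per finished task
  remaining_tasks = 15

  # Each trial: draw one historical duration per remaining task and sum them.
  totals = np.array([
      rng.choice(history, size=remaining_tasks, replace=True).sum()
      for _ in range(10_000)
  ])

  print("p50:", np.percentile(totals, 50), "days")
  print("p80:", np.percentile(totals, 80), "days")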


Nice, I just recently left a company where this approach would have been tremendously useful and fairly easy to build - we had all the required data already, but were just looking at averages of past performance against budgets and using that as a multiplier on the schedule rather than going through a distribution.

That being said, I also despise tracking time!

…I suppose you could move that technique to story points (or whatever unit of measurement), though you would lose a ton of precision.


What do you do when your future tasks are unknown or ambiguous?

For example, at my day job my task is to implement banking. The day to day tasks change... day to day. There aren't a "number of tasks remaining in the queue," since whatever I'm doing is what I'm doing.

One could say this is poor planning. But due to the nature of Big Banks, each task is usually blocking the next one -- in other words, it's not possible to discover or plan what you need to do next, until you've finished currently.

An example of this is when we realized we didn't need to run our banking API using $BigBank's test environment. Their test environment was ... uh ... well, let's just say, when we realized that we could simply switch on "production mode" and bypass their test environment altogether, we collectively facepalmed while rejoicing.

It wouldn't be possible to add "switch to the production environment" into the queue several days ago, because we didn't discover that we could do that until yesterday during our biweekly sync call.

I'm sympathetic to your writeup, and I like your recommended approach. But I just wanted to point out a realistic case of it failing. But in fairness, I think every estimation approach would fail us, so don't feel singled out. :)

Perhaps your approach will work in most cases though, and I'm merely stuck in a twilight zone special case.


> The day to day tasks change... day to day. There aren't a "number of tasks remaining in the queue," since whatever I'm doing is what I'm doing.

What you are describing is not ambiguity, it's total variability. If your future is 100% random, it is, by definition, impossible to predict. Such a state would also mean a total absence of direction/vision. Predicting dates is not only impossible but not a question you can ask, since you don't know what's next.

What I'm going to challenge is that you're effectively in such a case, because I don't think it's true.

> One could say this is poor planning. [...] in other words, it's not possible to discover or plan what you need to do next, until you've finished currently. [...] because we didn't discover that we could do that until yesterday during our biweekly sync call.

The example you're giving *is* poor planning. You're going into execution without validating base assumptions. That you discover the specifics of a dependency that late into the game means you're going into it without a plan. I'm not judging; in your case maybe no-one is asking for any sort of accountability, and just executing is the recourse with the lowest overhead. But the fact that you can't estimate isn't due to the environment, it's due to the fact that you don't have a plan. Some of the companies I worked with are fine with that, most are not.


It's like this for day-day operations when leadership is absent and no CSI ever gets prioritized over new development. You can say the org is dysfunctional, but there's less leverage workers can use to change such situations. Especially when efficiency measures get rewarded with layoffs.


When new scope is found, you update the plan of the project and provide new estimates.

I don't expect my team to know everything about a project on day 1. I just want them to know enough to start and provide a "good enough" estimate.

I do expect my team's estimate to get more accurate the further they get into a project.


What happens when you complete your work before you know what you need to do next?

If this never happens, then you have some invisible queue, as you do have things to do next.

As far as your example, that's a great example of a task that seemed like it would take long, and ended up being very very short. Can you describe why this would be bad to add into your task system?

    - Add Task: Run banking API in $BigBank test environment
    - Start work time clock.
    - Find out we don't need to do it, and switch to prod mode
    - switch to prod mode
    - Close task, and time clock
This is now data for your estimates of future tasks, as this will probably happen randomly from time to time in the future.


Switching to prod mode takes 5 to 7 business days, because we have to order certs from DigiCert and then upload them to $BigBank, whose team requires 5 to 7 business days to activate said certs.

We expected to turn on prod once testing was finished. But we ended up discovering that prod was the only correct test environment, because their test environment is rand() and fork()ed to the point that it doesn't even slightly resemble the prod environment. Hence, "prod am become test, destroyer of estimates."

So for 5 to 7 business days, we'll be building out our APIs by "assuming a spherical cow," i.e. assuming that all the test environment brokenness is actually working correctly (mocking their broken responses with non-broken responses.) Then in 5 to 7 business days, hopefully we'll discover that our spherical-cow representation is actually closer to the physical cow of the real production environment. Or it'll be a spherical cow and I'll be reshaping it into a normal cow.

By the way, if you've never had the pleasure of working with a $BigBank like Scottrade, Thomson Reuters, or $BigBank, let's just say it's ... revealing.


Maybe I'm missing your point. It seems you're attempting to answer the wrong question: is this task accurate, given all the changes that have happened? This is irrelevant for large scale estimation.

The question for scheduling prediction is: what distribution of time will it take to mark any task in this queue as FIXED/INVALID/WONTFIX/OBSOLETE/etc? The queue can have any amount of vagueness you want in it.

Regardless of the embedded work, regardless of whether or not it changes, becomes invalid, doesn't exist, etc - these are all probability weights for any given task/project.


This is interesting. A lot of machine learning works exactly like this [1], in the sense that you're biasing your prediction of each ticket time towards the average, except when there is strong evidence that ticket is special.

(You'd arrive at exactly this method if you found that there were no easy to find characteristics of tickets that could distinguish them as extra-long or extra-short)

[1] https://en.wikipedia.org/wiki/Ridge_regression


That is essentially what a Burndown Chart does in a visual way, isn’t it?

https://en.m.wikipedia.org/wiki/Burn_down_chart


Probably this works for a large team of tens or hundreds of developers.

On a team of 4 or 5 people, where people go sick, take vacations, leave the company, new members join... each of these events has a big impact on those metrics. Which again, becomes a lot of work and effort wasted.

But yes, probably this is a viable option on larger teams.


New problem: can you estimate the number of queueable tasks in this supertask?


I love small simulations like this, but I find it tiresome to write all the scaffolding to summarize the results. So I wrote this Python module:

https://github.com/boppreh/carlo

  pip install carlo
  carlo "d(20)+randint(2, 5)+int(random()*20)"
This will continually generate samples and show them in a self-updating histogram.

This is harder than it looks, because there are too many samples to store them all, and we don't know the range of values at the start. But I'm quite happy with the results.


That is pretty neat! I have a dice stats python script I use specifically for DnD. It has command line flags and basically the same syntax as the discord dice bot. Very useful when you are considering strategy, even though my group plays fast and loose with the DnD rules.


I'd love to hear a post-mortem blog post on a project that was run in this way in real life & what the pros/cons at different stages were.

The author first talks about the waste of time that is upfront analysis/planning or adding a buffer for contingency, and how Monte Carlo simulation fixes that, but then the caveats seem to be based on an established team of a fixed size doing similar-style work.

For me, this stage of a project has never faced big hiccups in estimation & our teams are normally quite good at knowing "what's a 2 vs a 5 point story" — if there _is_ a big discussion in a planning meeting at that stage, it's normally because one person has overlooked a key requirement (or the other is assuming a requirement that doesn't exist) so it's a good opportunity to solve for that.

To use the analogy, we're probably quite good in established teams at estimating how long it'll take to write 10 blog posts. The trouble comes when someone says "The latest blog posts aren't generating enough interest. Can you estimate how long it'd take for you to write 10 poems instead? We want a mix of iambic pentameter and maybe some trochaic tetrameter (TBC). We've gotten budget approval to increase the size of the team, so we'll have 5 new people (off-shore) joining next week so we can go faster. Also, some of the old blog posts seem to be getting lots of visitors in South America, so we may need to translate these to Portuguese/Spanish too. When can you give me an estimate for all that?"


The book "How to Measure Anything" by Douglas W. Hubbard has a chapter on using Monte Carlo simulations to do project planning and gather estimates. It has successfully been used to plan large, complex projects like building nuclear power plants, or projects that NASA or the Navy or similar have done.

The approach is slightly different from what the article above describes. Instead, it has each engineer go through calibration exercises until they can fairly accurately produce 5th-to-95th percentile confidence interval estimates. Then each engineer provides such an interval for each item that needs to be worked on. Those confidence intervals are kept separate. You can then run a Monte Carlo simulation where each piece of work is randomly assigned to one of the engineers who provided estimates for that particular item, and a number is randomly drawn from that engineer's confidence interval to determine how long the item takes to complete in that particular simulation run.

I was on a small team that used the above technique. We were in a large company trying to launch a new product, and given manufacturing lead time and the seasonality of market demand, it was very important that we could provide the business with a good estimate of when we could have the software portion of the MVP completed. The business provided us with what they thought the MVP features were; we further added in engineering tasks that weren't business-facing but needed to be completed. We confidence-interval estimated those, and then also confidence-interval estimated our personal vacation days, sick days, as well as a bucket of "unidentified work". The 80th percentile of the Monte Carlo simulation put us out a little more than a year. Our actual delivery was off by only a week from the Monte Carlo p80; I don't remember in which direction, but it wasn't consequential to the business.
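A sketch of that simulation with invented engineers and estimates, fitting a log-normal to each 90% interval (a distribution Hubbard himself favours for durations):

  import numpy as np

  rng = np.random.default_rng(1)
  Z90 = 1.645  # z-score of the 95th percentile

  def draw(lo, hi, size):
      """Draw from a log-normal fitted to a (5th, 95th) percentile interval."""
      mu = (np.log(lo) + np.log(hi)) / 2
      sigma = (np.log(hi) - np.log(lo)) / (2 * Z90)
      return rng.lognormal(mu, sigma, size)

  # Hypothetical work items: each engineer's (5th, 95th) estimate in days.
  items = [
      {"alice": (2, 10), "bob": (3, 8)},
      {"alice": (5, 30)},
      {"bob": (1, 4), "carol": (2, 12)},
  ]

  n = 10_000
  total = np.zeros(n)
  for item in items:
      cis = list(item.values())
      picks = rng.integers(len(cis), size=n)              # who gets the item
      draws = np.stack([draw(lo, hi, n) for lo, hi in cis])
      total += draws[picks, np.arange(n)]                 # their drawn duration

  print("p80 schedule:", np.percentile(total, 80), "days")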


Not to hijack the conversation, but I thought some people interested in replacing estimations could find this useful.

In order to replace estimation, we are trying Basecamp's Shape Up language and techniques [1].

In a nutshell the thinking is reversed, instead of asking:

- "how long would it take to implement X?"

you ask:

- "how much appetite (in weeks and people involved) do I have for this feature ?"

Then you work with a fixed deadline and variable scope.

Instead of user stories, they call them Bets, and they give themselves 2 weeks to shape bets and 6 to implement them (these two cycles can happen in parallel). Shaping is about leaving the design at the mockup/sketch level and identifying, with the help of business and technical people, all the possible risks, so the bet becomes a safer bet.

We are still in the first cycle, with still a lot of questions and our own little customisations, but everyone's response to the way of working seems to be positive. Curious if anyone else has tried it and succeeded or failed.

- [1] https://basecamp.com/shapeup


We work this way as well and it is wonderful.

We've been doing Shape Up since July 2020 and have yet to fail to ship what we shape for each 6 week cycle, and while there's sometimes a bit of a crunch in the last few days before a release, everyone is generally working at a pretty relaxed pace.

My favorite side-effect of working this way as a leader is that by batching 6 weeks worth of planning effort, the team (and myself) can just be generally left alone to get into it without having to decide on the next priority every single week. Combined with async check-ins (that the team provide themselves every couple of days via Basecamp) instead of daily/weekly stand-ups, everyone generally has an empty calendar at all times.


Flipping the question within a restricted variable is my favorite hack.

Overlords - “How much money do you need for this project?”

Me - “How much money can we spend?” / “How much time should we spend?”

Friend- “Look at how much money that CEO earns!”

Me - “What will you give up in your lifestyle to do what the CEO does?”

The only challenge, though, in these situations is that most people aren’t open enough to “think in bets”. They either avoid the question or don’t even attempt to come to an answer (despite them not spending a single dime!).


Shape Up is really powerful and cuts straight to the meat of things, which is getting things done, as opposed to the traditional way of valuing planning and processes over actual results. We’ve been using Shape Up for a little over a year at the Sailscasts Company and love every bit of it. I personally think it is a lean, mean way of getting things done, with tangible and measurable results!


Shape Up is a powerful framework; it takes a while to get right, but it’s well worth doing. We are about a year in, still improving at it. Our main focus now is doing things more by the book and getting rid of the customizations we added in the early days - so try not to make too many! :)

Email is in my profile if you want to chat more or trade notes on it.


We've been using Shape Up at Zinc Learning Labs for two years now and it has done wonders for our pace and sanity. I wrote about our experience with Shape Up recently, specifically about moving away from estimation - https://world.hey.com/karthikc/wrestling-with-the-unknown-be...


The issue I've always had with Monte Carlo simulation in this context is that it takes more knowledge, expertise, time, and care to accurately perform a Monte Carlo simulation than to prepare an accurate estimate. It's like saying you can avoid building (yet another) barely functional go-kart by instead building a four-wheel drive. Monte Carlo simulation sometimes cloaks the fact that the assumptions behind the simulation largely determine the output and are just as prone to error as (if not more so than) the usual assumptions.

If the burden of accurately estimating an average duration for a blog post is too much for your planner, what are the odds they're going to accurately develop a probability distribution for the duration of a blog post?

In simple estimates there's no need to use a stochastic approach at all. For instance, the example in the post - if you know a blog post takes between 1-10 days (uniformly distributed) and that you need to get one out every ~6 days to get 60 out in a year, you already know the probability is ~60%. If you know there's a skew to the higher end and guess a distribution (as in the example), again you can directly work out that the odds of success are ~35%.

There is value when the estimate is not corrupted by other priorities (rare), when the expertise to accurately develop distributions for activities exists (also rare, particularly for work that isn't easy to sample and re-forecast), and when the plan is complex enough that it's hard to directly predict the impact of your statistical assumptions.


How is this different to just summing estimated value and variance?

I agree with the author that the agile world's zero-effort estimates are pointless, though.

"... engineers determine it by licking the tip of their finger and putting it up in the air. Unfortunately, the only way to win at the estimation game is not to play it."


I think it's understandable why programmers dread doing estimates so much. But I believe there are multiple factors contributing to the current overall mess and the guilt should be shared among several areas -- and that includes us.

Estimates are often, if not always, mistaken as commitments or deadlines. But we don't do a great job on explaining that estimates are probabilistic by nature, there are risks involved, and leaving those risks entirely to developers may be comforting at the beginning, but harmful to the entire Organization in the long term.

Granted, that would be a huge cultural change, which surely is not going to happen overnight. Besides, political conditions are seldom in our favor. But I advocate that we should do whatever is possible to keep the dialogue going.


> Estimates are often, if not always, mistaken as commitments or deadlines

This. This is the problem with "estimates."

My solution - and this will sound bad - is to over-estimate.


I don't judge, and I resort to this solution as well, more often than I'd like to. But it doesn't only sound bad... it is indeed bad, for a number of reasons.

For instance, work usually expands to fill all the time available. Then, if you estimate a one-month task as four months, you will likely not gain three months to spend on other tasks. You'll probably spend the whole four months on a one-month task instead.

Another issue: if you are competing for a project, over-estimates will translate into higher than necessary costs, and you may end up overpricing the project and missing the opportunity.

However, as I said, I don't judge, and I also resort to that more often than not. In many organizations, there aren't many other ways to deal with the matter. And that's why I say that the entire organization loses: managers will have a good night's sleep after putting all the burden on your shoulders, but the losses affect every single stakeholder.

It's about time for them to start behaving a bit more responsibly.


Fair points, but I do not triple my estimations. It's more like an added buffer for unforeseen risk. And I am careful to balance my team's well-being with the demands of the business.


Yes, this is everyone's solution, all the way up the organizational hierarchy.

Software Engineer: it will take me 2 weeks. Engineering Manager: it will take us 4 weeks. Product Manager: it will take them 8 weeks. VP: it will be done by the end of next quarter.


And with all the red tape and shifting priorities, they all get it wrong unless they scope down.


> But we don't do a great job on explaining that estimates are probabilistic by nature, there are risks involved

Ironically, I've recently been catching flak for "not being confident enough" when I give confidence intervals around my estimates indicating uncertainty.

In some sense, I literally can't win. I then realized it was more about my lack of doing this: https://www.csun.edu/~to18470/articles/Managing_your_boss.pd...


When I am leading teams, I also ask for a certainty score along with the estimate, then track them together with daily updates. Like all human processes it is not infallible but it allows more reliable planning. I always wish I had a tool that would include this in the estimate process. Hansoft got close but nobody uses it!


Sounds like double down on all the fun!


A more “back of the envelope” approach to handling uncertainty for this sort of thing is the “three point estimate” approach, where you give a best, worst, and most likely estimate for each sub-task. These points are implicitly used to parametrise a distribution, and then you analytically find the overall uncertainty, rather than through simulation.
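For reference, a sketch of the classic PERT flavour of that roll-up, with invented numbers and independence assumed so the variances add:

  import math

  # Hypothetical (best, most likely, worst) estimates per task, in days.
  tasks = [(1, 2, 5), (2, 4, 12), (3, 5, 8)]

  # Classic PERT moments: mean = (a + 4m + b) / 6, sd = (b - a) / 6.
  mu = sum((a + 4 * m + b) / 6 for a, m, b in tasks)
  sigma = math.sqrt(sum(((b - a) / 6) ** 2 for a, m, b in tasks))

  print(f"overall: {mu:.1f} +/- {sigma:.1f} days (roughly Normal, by the CLT)")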


We used a different three-point system. 1 point means we know exactly how to accomplish this work. 2 means there is uncertainty, but we are confident it will be completed. 3 means there are unknowns that require more digging. A 3-pointer is a good candidate for breaking the story/task up into smaller chunks.


I really dislike posts in this vein, because they use statements like

"This post will teach you how to replace estimations and guesses with a Monte Carlo simulation."

as though the output of a Monte Carlo simulation were not an estimate, but some higher truth. The process of simulating many different outcomes with a touch of randomness gives the layperson the impression that we are really _doing_ something, when in fact we could have obtained the exact result (no confidence intervals needed) of Monte Carlo by quadrature (although admittedly that could be tedious).

In the end, the method outlined in the link is an estimate, like any other, and is not necessarily better or worse than any other estimate. It erects a straw person and then misuses mathematical terminology throughout.


I sort of agree with you, although for different reasons. The problem with the MC approach, IMO, is not really that it is also an estimate; estimates are fine. However, doing a Monte Carlo simulation is only reasonable if we have a good model. So now we have moved from "we can't estimate" (intuitively, I guess) to "how do we find a good model?", which is not trivial, and the time spent finding and verifying the model might take more time than what you are actually trying to do (just a (gu-)estimate on my part).


A Monte Carlo simulation is infinitely better than the conventional "to estimate, take your first estimate and multiply it by 10". And even better, your manager can do that without your help, and thus learn how much you tend to over/under-estimate projects without trying to force you to change your estimates to fit what he thinks they should be.

It also ensures that the final estimate becomes on average correct, which is much better than most projects, which tend to underestimate the effort it will take. Although underestimating projects is often a feature managers want, since it makes it easier to sell to customers, i.e. "My engineers estimated we can remake Twitter in a weekend, I think we should focus on that!".

Anyway, the point is that if your manager complains about your estimations being on average off then he is just incompetent and should have applied a Monte Carlo simulation on it instead of asking you to perform an easily automatable task.


A model is not required if you use data from your team's past performance with a tool like this: https://marketplace.atlassian.com/apps/1216661/actionableagi...


Which is called bootstrapping.


Author here. Thanks a lot for your feedback.

Indeed, "estimations" might not be the best word as you're still, in some way, estimating with the Monte Carlo approach. I thought of this differentiation because I consider the MC approach to yield a forecast, while an "estimation" is the typical term used in agile settings for "guessing how long/much work it will take.". I'll think of ways of making this distinction clearer.

As for the misused mathematical terminology, I guess you're referring to "confidence intervals"? If so, I understand it also has a clear definition within statistics, but I used it in the broader sense. I'd be grateful if you could point out which terms you'd recommend replacing and what you think I could replace them with.

Thanks a lot.


> Indeed, "estimations" might not be the best word as you're still, in some way, estimating with the Monte Carlo approach.

Not in some way -- literally. The samples are averaged, and as the number of samples goes to infinity, the average converges to some true value. In the case of one of the histograms you plot, you are averaging an indicator function.

> I thought of this differentiation because I consider the MC approach to yield a forecast, while an "estimation" is the typical term used in agile settings for "guessing how long/much work it will take.". I'll think of ways of making this distinction clearer.

It seems to me the distinction you care for is to provide not a single value ("estimate"), but a distribution of values. That's not specific to Monte Carlo.

> As for the misused mathematical terminology I guess you're referring to "confidence intervals"

That's certainly one of them. I even misunderstood what you were doing and said you don't need confidence intervals, but in the way you're using them those again can be obtained by quadrature.

My main gripe remains the language, and the presentation of the method. You could most likely obtain your results by hand, i.e. with pen and paper, and you would still get a distribution at the end of the day. The advantage of running Monte Carlo is that it simplifies this process. You could illustrate this by getting the distribution of the sum of two independent uniform random variables in two different ways: by doing a convolution, and by MC. Wow, MC was so much easier, anyone can do it, and it can handle arbitrarily complicated distributions (in principle).
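To make that suggestion concrete, here are both routes for the sum of two fair dice (the discrete analogue):

  import numpy as np

  # Exact: convolve the two PMFs.
  pmf = np.full(6, 1 / 6)
  exact = np.convolve(pmf, pmf)        # P(sum = 2), ..., P(sum = 12)
  print("exact P(sum=7):", exact[5])   # index 5 -> sum of 7 -> 1/6

  # Monte Carlo: same answer, no convolution required.
  rng = np.random.default_rng(0)
  rolls = rng.integers(1, 7, size=(2, 1_000_000)).sum(axis=0)
  print("MC    P(sum=7):", (rolls == 7).mean())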


Understood. Thanks for taking the time to provide more detail.

I'll take some time to digest and think through changes, and either edit, or write a clarifying appendix.


Monte Carlo simulations are often just a relatively easy way of doing numerical integrations. But it sounds fancy.


I use Monte Carlo simulations because I am not that great at doing complex statistics.

It sounds fancier to say modeled using a Monte Carlo simulation, but in reality it is much easier than working with joint and conditional probabilities.


> It erects a straw person

Just wanted to appreciatively note your use of inclusive, non-gendered language here.


A little bit of knowledge about stats and the appropriate language helps make this exercise easier. The R language is good at this kind of simulation. It comes with a whole bunch of random number generators out of the box. Here is how to estimate the frequency of 7s from sums of two d6 dice rolls in R in two lines:

n <- 1e06

table(sample(1:6, n, replace=TRUE) + sample(1:6, n, replace=TRUE))[6]/n

To break this down:

`n <- 1e06` assigns 1 million to n.

We want two six-sided dice, so we draw n whole numbers uniformly from 1 to 6:

`sample(1:6, n, replace=TRUE)`

(Rounding a continuous uniform, as in `round(runif(n,1,6))`, is subtly wrong: 1 and 6 would come up only half as often as the other faces.)

This will make a vector of 1 million integers representing random "rolls" of the die. In R it is really easy to pair-wise add two vectors of the same length, so to get two dice we do:

`sample(1:6, n, replace=TRUE) + sample(1:6, n, replace=TRUE)`

Now we have a million rolls of summed pairs of dice. We want the frequency count of 7s. We can get this by getting the frequency count of every sum with `table()`:

`table(sample(1:6, n, replace=TRUE) + sample(1:6, n, replace=TRUE))`

Which will return a print out of frequencies for every number that appears in the vector.

We can select the 7s by pulling the 6th index (the sums run from 2 to 12, so 7 is the 6th value) and then dividing by the `n` dice rolls to get the frequency:

`table(sample(1:6, n, replace=TRUE) + sample(1:6, n, replace=TRUE))[6]/n`

We can do the exact same one-liner in Python using numpy and the Counter() class from collections:

`Counter(np.random.randint(1, 7, size=10**6) + np.random.randint(1, 7, size=10**6))[7] / 10**6`

This will return the frequency of 7s in Python 3.


> let die = Uniform::from(1..7);

I think this is a good example of the bad consequences of zero-based indexing. First, I hope you agree that a line of code describing a six-sided die should never feature a literal 7!

But zero-based indexing causes us to embrace exclusive ranges: if we want 6 numbers, we write 0..6 and that’s fine.

If you want 6 numbers starting at 1, then you must use an inclusive range operator, since you don’t want to bring the irrelevant number 7 into the picture. Now Rust has that: ..= and the author should use it here. But e.g. Python does not. Personally, in Python, I would still refuse to write 7 and write range(1, 6+1) instead, but most people would write range(1, 7).

Which brings me to my point: I’m not trying to argue against zero-based indexing, but I believe that

(a) programming language designers (e.g. Python) don’t recognize sufficiently that it has downsides as well as upsides, and languages need to provide things like inclusive ranges, even if they don’t actually provide 1-based indexing, since 1-based indexing scenarios do naturally occur when programming. It seems that Julia is an example of a language that has very much recognized that both sides have merit.

(b) In the standard case of a language that basically adopts zero-based indexing (most modern languages), 1-based thinking should not be presented as somehow vulgar or ignorant. A good test of whether your language is too hardline on zero-based indexing is: would you ever see users of your language using the literal 7 when describing a six-sided die? If you see that happening, the language is being too hardline on zero-based indexing; either you don't provide inclusive ranges or they're not promoted enough in the documentation.


It's fascinating how poor this area currently is. We are able to capture so many tiny, horrifying details about engagement and optimise the shit out of ad clicking, but we don't do basic things like using calendars or task dependencies for internal delivery estimation.

Looking at what we're already doing on the outside, improvements on the inside seem trivial. A good estimator should know that an estimate spanning the holiday season will be impacted. It should know that a serial dependency will stretch wider in time. It should know that a set of unevenly distributed tasks (many for small team A, few for larger team B) will flow more slowly than a distribution matching the weighted max flow of teams A and B. It should know that multiple dependencies on team X will have lower flow because they're maxed out.

It seems to me that a Monte Carlo projection onto a timeline, from inputs we already have mapped onto a graph, is really not rocket science, and it should have been a solved problem by now. But it's not. Weird.
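
As a tiny illustration of the serial-dependency point, a Python sketch with made-up task durations; serial chains stretch the P90 far more than parallel work does:

  import random

  # Three uncertain tasks, each triangular(low=2, high=10, mode=4) days -- made-up numbers.
  def task():
      return random.triangular(2, 10, 4)

  n = 50_000
  serial = sorted(task() + task() + task() for _ in range(n))
  parallel = sorted(max(task(), task(), task()) for _ in range(n))
  print("serial P90:  ", round(serial[int(n * 0.9)], 1))
  print("parallel P90:", round(parallel[int(n * 0.9)], 1))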


I recommend the book "Software Estimation: Demystifying the Black Art" (2006) by Steve McConnell. Nice collection of excerpts: https://leventov.medium.com/excerpts-from-software-estimatio...

There is also a short chapter on estimation tools that use e.g. Monte Carlo simulations. One tip that is relevant here, more or less presented in the article too, and probably obvious :) "TIP #65: Don't treat the output of a software estimation tool as divine revelation. Sanity-check estimation tool outputs just as you would other estimates."


Some other books I can recommend to read alongside McConnell's are Waltzing With Bears by DeMarco and Lister (of Peopleware fame) and How to Measure Anything by Douglas W. Hubbard.


If you use JIRA you can do it with a tool like this one, using past performance as input so you don't need to model anything: https://marketplace.atlassian.com/apps/1216661/actionableagi...


I think FogBugz used this method to estimate how long open tickets would take to get closed. Or at least I remember Joel talking about it. I wonder if there are any other issue-tracking apps out there making use of this technique.


Yeah, FogBugz had a very similar feature 15 years ago.

It was a hybrid between the approach proposed in the article, and the "always multiply by N" technique.

Basically, FogBugz always asked you for your own best estimate. It tracked how much longer the actual time to completion was compared to your best estimate, and learned a simplified statistical distribution of your personal multipliers (e.g. in 80% of cases your estimate was 20% shorter than actual time to completion, in the remaining 20% of cases it was 10% longer).

FogBugz used the learned distribution to perform Monte Carlo simulations, and presented you with estimates of actual time to completion (e.g. "in 95% of simulations the project didn't take longer than 30 work days"). The great thing about this method is that it automatically accounts for chronic optimists, who always think it'll be ready to ship soon and underestimate difficulty. The learned multipliers tell the algorithm that the optimist's estimates tend to be underestimates, so that it can present you with "debiased" estimates.

E.g. if something always takes at least twice as long as Developer Anderson says it will, then FogBugz would tell you: "Anderson estimated 10 days, but based on past data I think it'll take at least 20, and it took less than 15 only in 5% of the simulations".

It had a very simple algorithm at its core, but the developers were knowledgeable, and had tons of data about project management. This allowed them to add many empirically verified "embellishments" (e.g. if the algorithm didn't have enough data about your specific multipliers, it assumed that your multipliers were about the average of your coworkers'), which made the algorithm surprisingly accurate in the end. Certainly better than any estimates most people could produce by hand.
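
For readers curious what that looks like mechanically, here's a minimal sketch of the multiplier-resampling idea (all data hypothetical; the real FogBugz algorithm had the extra embellishments described above):

  import random

  # Historical (actual / estimated) multipliers for one developer -- hypothetical data.
  multipliers = [0.9, 1.0, 1.1, 1.2, 1.3, 1.5, 2.0]

  # Best-guess estimates (days) for the remaining tickets.
  estimates = [3, 5, 2, 8]

  def simulate_once():
      # Scale each estimate by a multiplier resampled from past performance.
      return sum(e * random.choice(multipliers) for e in estimates)

  runs = sorted(simulate_once() for _ in range(10_000))
  print("P50:", round(runs[5_000], 1))
  print("P95:", round(runs[9_500], 1))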


I never tried FogBugz but I heard good things about it, similar to what you've written. It's surprising it didn't catch on more if it's that good. I mean, I suppose it could still catch on; the product still exists as far as I know.


I’ve written about why FogBugz failed before, but it basically came down to Atlassian having vastly better marketing, being much more willing to support every edge case instead of being opinionated, and having a better pricing model.

Unrelated, the new owners of Kiln and FogBugz are asshats, I moved my stuff off ages ago, and I’d encourage anyone else still on them to do the same.


Worth checking out Guesstimate[0] for a neat tool to help make these kinds of estimations.

[0] - https://www.getguesstimate.com/


Great tool. I've used Crystal Ball in the past, and this is so much easier to use.


The issue with focusing on realistic estimates is that it omits the psychology of deadlines, incentives, ownership, competitiveness and determination.

Going back to your university days: who hasn't left things to the last minute and been amazingly productive in the last couple of days before the coursework or exam was due?

The issue with aggressive deadlines is burnout, creative oppression, flawed risk management, etc.

As always it's a balance. The grass is always greener, but chances are it's neither.


For those who like to do things a little more visually I recommend the open-sourced Argo[0] for microsoft excel.

There's a lot you can do with a monte carlo model. Tornado plots are something I highly recommend for understanding risk profiles around any complex task

[0] https://github.com/boozallen/argo/releases


This looks great. I wish there were something like this for Google Sheets, or that worked outside Windows. Guesstimate is close to this.


I wish we had public Monte Carlo simulations for the economy.

It's hard watching people argue about things with almost no evidence. The cynic in me is starting to think that that's all politics is. But imagine if people did have evidence, or at least projections within an error bound like the roughly 3% margin of error for polls, and could make decisions based on likely outcomes.

For example, how much would taxes on the US middle class decrease if taxes on the wealthy were raised? How much lower would they be after 5 years, 10 years, if that excess capital was reinvested in, say, an automation moonshot? I've framed a scenario which is obvious to me after running the simulations in my mind over 40 years of watching the national debt increase, but maybe I'm wrong (unlikely).

This site's "Sample Size Calculator" tab computes the number of samples needed to reach a certain confidence level:

https://www.pollfish.com/margin-of-error-calculator/

More info:

https://en.wikipedia.org/wiki/Margin_of_error

For a population size of 350 million, to get to a 95% confidence level with a 3% margin of error, requires polling 1068 people. Getting to 99% only requires 1849 people.
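
A quick check of those numbers with the standard worst-case sample-size formula (function name is mine):

  import math

  # Worst-case (p = 0.5) sample size for a given z-score and margin of error;
  # the finite-population correction is negligible against 350 million.
  def sample_size(z, moe, p=0.5):
      return z * z * p * (1 - p) / (moe * moe)

  print(sample_size(1.96, 0.03))  # ~1067.1, round up to 1068 people (95% confidence)
  print(sample_size(2.58, 0.03))  # ~1849 (99% confidence)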

Now why can't we write something like the program from the article, maybe with some Bayesian inference, maybe some machine learning, with big sample sizes and high confidence levels? This may be the plot of some dystopian novel, but surely more evidence would help inform a populace that's being spoon-fed ignorance by co-opted media?


There are, roughly speaking. You can look at option pricing to see market predictions on various outcomes, at least ones that can be boiled down to "what will the price of SPY/TLT/EURUSD be in 2022?" If you have a better simulation, take your output and trade against the market where it's wrong.


Shameless plug, but we wrote a free tool into which you can pipe Jira data and run Monte Carlo simulations to help generate forecasts.

If you don't want to go through the process of logging in via Jira, there is a demo where you can manually input the data needed.

See: https://agilytics.leanloop.co.uk


I will definitely be using this! Can I use single project analytics instead of the whole jira? (Different people work on different projects)


Yep it will allow you to select a project from your JIRA instance and only count throughput from that project in the calculations.


I think that one of the factors making it so difficult to produce accurate estimates on software projects is their usually high levels of variability. There are often so many big differences between projects and it's hard to consistently draw lessons from past experiences.

However, it seems that when such variability is low, it gets easier to make predictions.

I've seen an ERP company which used to provide good estimates for most of their customization projects. However, they used a home-built, suited-to-the-task programming language, the team was experienced and relatively stable, and the projects' scopes were usually narrow.

Those factors are not easily reproducible, for sure. But I believe they indicate that taking measures to reduce variability and standardize processes may pay dividends in the long run -- projects may take longer than they would take under agile approaches, but the improved predictability may be desirable.


A Monte Carlo simulation can hide a lot of assumptions about the actual system being modelled and the distribution of random variables (both on a univariate basis and as a dependency).

Its flexibility and power in estimating (otherwise perhaps intractable) metrics comes at the expense of transparency, potential simulation noise and difficulty in estimating accurate "what if" scenarios. The risk is that one might get out simply what one assumes.

Here is a list of questions you can ask to minimize the associated risks and steer your development:

* Do I have a good, self contained, description of the system that I want to simulate? Is it even possible to define it in practical terms?

* Do I have historical data that can pin down its stochastic behavior? Can I estimate a statistical model reliably?

* Does the uncertainty around model estimation justify retaining the full model or could I possibly simplify it and obtain semi-analytic results?

* Can I validate my estimates out-of-sample?


I used this method for a while on one of my teams. It was useful and helped provide more accurate estimates.

Some tips I have for people interested:

* Get about two months of velocity data in place first.

* Run the simulation on a regular basis; update your estimates every week/sprint.

* It helps to have individual work units "sized" as consistently as you can (e.g. each ticket is 2-3 days, a consistent pointing system, etc.). It's ok if anomalies happen from time to time.

* Dates will get more accurate the further you get into a project.

Some things I've learned from doing this exercise:

* When work units are sized to be 2-3 days of effort, engineers complete about 4 units/wk on average.

* Don't shoot for the perfect estimate at the start of a project. Plan as much scope as you know and enough to get started.

* As new scope is found, update your project plan and estimates.


If you already have ~2 months worth of data, why not just use that to plot a distribution curve? Seems to me that the monte carlo simulation that the article describes is really just used to "guess" the likely distribution curve. Seems unnecessary if you already have ~2 months worth of data.


Speaking purely to Monte Carlo simulations: one key issue is knowing the underlying distributions of your random variables. Without this, you may be led further astray than before your simulation.

A much more robust approach - though not always appropriate - is a worst-case vs middle vs best-case scenario analysis.


You don't need a Monte Carlo simulation if you have a known distribution. For sums of independent variables, you can just add the means and variances and solve the equation, rather than approximating it with a simulation. Monte Carlo is more useful when drawing from an unknown distribution.
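
For example, a quick sketch with assumed normal task durations; the analytic sum and the simulation agree, the simulation is just noisier:

  import random

  # Two independent task durations, modeled as normals -- assumed parameters.
  mu1, sd1 = 10, 2
  mu2, sd2 = 15, 3

  # Analytic: for independent variables, means and variances simply add.
  print("analytic: ", mu1 + mu2, round((sd1**2 + sd2**2) ** 0.5, 2))

  # Monte Carlo arrives at the same numbers, with sampling noise.
  xs = [random.gauss(mu1, sd1) + random.gauss(mu2, sd2) for _ in range(100_000)]
  m = sum(xs) / len(xs)
  v = sum((x - m) ** 2 for x in xs) / len(xs)
  print("simulated:", round(m, 2), round(v ** 0.5, 2))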


My understanding is that while the normal distribution is friendly to simple manipulation, a lot of distributions aren't. Not being a statistician I couldn't tell you why or which.


Monte Carlo simulations are invaluable when it comes to creating tissue line odds. For one of my horse racing projects, I can come up with a range of how an individual horse is likely to perform-- which isn't much use unless compared against the others in the race with a frequency component (e.g., this horse will likely be the best only 10% of the time-- but the pools are offering 15-1 odds, so it is +EV). On modest hardware, I can run thousands of race simulations and somewhat effectively determine the likelihood of a given horse winning.
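
A toy version of that frequency component (made-up horses and parameters, not the real model):

  import random

  # Each horse's race time is a draw from its own assumed distribution;
  # win probability = how often it posts the fastest time across simulations.
  horses = {"A": (70, 4), "B": (72, 5), "C": (73, 6)}  # mean seconds, stddev

  n = 100_000
  wins = {name: 0 for name in horses}
  for _ in range(n):
      times = {name: random.gauss(mu, sd) for name, (mu, sd) in horses.items()}
      wins[min(times, key=times.get)] += 1

  for name, w in wins.items():
      p = w / n
      print(name, round(p, 3), "fair odds ~", round(1 / p - 1, 1), ": 1")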


I feel like the author just re-invented the XP concept of Velocity[1], but gave it a mathematical form.

1. https://martinfowler.com/bliki/XpVelocity.html "You should usually determine velocity by measuring how much got done in past periods, following the principle of YesterdaysWeather. A typical approach is to average the velocity the past three time periods to determine velocity for future time periods."


For anybody who wants to try it out in a spreadsheet environment and/or is looking for a useful abandoned project to duplicate, I recommend Risk Engine for Mac. Unfortunately it hasn’t been maintained for a long time and has been abandoned, but I keep using it.

https://www.macupdate.com/app/mac/30431/risk-engine


A nice exposition! I think Omniplan does this [1], and it certainly helped me in the past get a better idea on variability of long-term project planning. Of course, it's only as good as the assumptions that go into it!

[1] https://www.omnigroup.com/omniplan/features/#Monte-Carlo-Sim...


I tried this at a past job. I fed Jira history into a Monte Carlo simulation that would give a set of probabilities for various completion timeframes for any new user story. Despite months of trying to sell the idea, backed by sound mathematics instead of guesses, I never got past the point where stakeholders would listen politely, then immediately go back to demanding to know "when will it be done".


I wrote a Monte Carlo area calculator once. Just draw an arbitrary shape and hit solve. You can increase the number of samples used.

https://victorribeiro.com/monteCarlo/

The drawing area is 500x500 pixels, so if the shape covers the whole area, the result should be 250,000 pixels.
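
The same idea in a few lines of Python, for a disc instead of a hand-drawn shape (numbers chosen to match a 500x500 canvas):

  import random

  # Estimate the area of a disc of radius 200 centred on a 500x500 canvas:
  # fraction of random points that land inside, times the canvas area.
  side, cx, cy, r = 500, 250, 250, 200
  n = 1_000_000
  hits = sum(
      1 for _ in range(n)
      if (random.uniform(0, side) - cx) ** 2 + (random.uniform(0, side) - cy) ** 2 <= r * r
  )
  print(hits / n * side * side)  # ~125,664, i.e. pi * 200^2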


After many decades we still discuss how work estimates should be done and wonder why they are always wrong. The most brilliant software engineers and project managers have developed countless different methods, and nothing works.

To me this says the nature of the work is such that we simply cannot estimate it. We are just pretending that it can be done.


Absolutely; we still treat knowledge-based work as if it were factory work with known inputs and outputs (someone can chime in on MBAs etc...)


Are you suggesting that MBA programmes are a factory with known inputs and outputs? Probably a fair critique


It depends on the type of work. Generally, and counterintuitively, the more individual random variables the work contains, the easier it is to estimate. This is because random variations cancel each other out in large numbers.

With programming, one random variable can add days, weeks or even months of work.
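
A quick sketch of that cancellation effect, with assumed exponential task durations: the relative spread of the total shrinks roughly as 1/sqrt(N):

  import random

  def rel_spread(n_tasks, runs=20_000):
      # std/mean of total effort when n_tasks independent tasks are summed.
      totals = [sum(random.expovariate(1.0) for _ in range(n_tasks)) for _ in range(runs)]
      mean = sum(totals) / runs
      std = (sum((t - mean) ** 2 for t in totals) / runs) ** 0.5
      return std / mean

  for n in (1, 10, 100):
      print(n, round(rel_spread(n), 3))  # ~1.0, ~0.32, ~0.10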


Now you need to guess the variance and correlation between events, and if you get it badly wrong it ends up like CDOs in ‘08


Let the observed data do the work and remove the guessing by using pairs/triples etc. of past events in your simulation.


If you don't want to script your own, Vistimo Quotes lets you do monte-carlo simulation with vague, uncertain estimates; and lets you play with variables like the number of people working, vacation time, etc.

https://quotes.vistimo.com


I've actually been working on turning this idea into a product for the past year or two. Looks like I'm not the only one to have thought of this!


I worked on something similar a while ago. I was defeated by UX more than the mathematics.



The code used Rust, interesting; I have come to expect Python, and just recently used the open source version of Matlab called Octave.


Author here. I thought no one was going to make comments about Rust, but I just loved using it for this post.

Thanks, Peter :)


Rust is cool and all but I feel like the code examples would have been a bit more didactic in e.g. Julia since there's less distracting complexity in those types of languages. Here's a somewhat equivalent translation of the first snippet:

  rolls = 1_000_000
  n_sides = 6
  sevens = 0
  
  for _ in 1:rolls
      if rand(1:n_sides) + rand(1:n_sides) == 7
          sevens += 1
      end
  end
  println(sevens/rolls * 100)


I have been looking for an excuse to use Rust, thank you for the interesting post on all counts...


Has anyone used Excel’s Monte Carlo Simulation? I believe they’ve had it since Excel 1999.


It would be good to build a company corpus using completed .mpp files.


Whilst I'm happy that probabilistic methods of estimation get air time, I feel these sorts of posts give the lay person unearned confidence that they are suddenly far safer than they are. In most cases I find people don't know what should be planned, or how it should be defined, measured, tracked or assured; combined with a lack of appreciation for risk and how it should equally be defined, measured, tracked and controlled, this is the core problem, not the idea of estimating itself.

### Skip to TLDR ###

Usually it's of little consequence and the stakes are low, but it's this general lack of skill across such a large population of people doing it poorly that gives estimation such a bad reputation in the first place. The idea of estimation is not incorrect; it's that people aren't taught how to do it properly in the first place, and industry continues to propagate the myth that it's simpler than it is, often because it's mandated by those who don't understand it themselves.

Having played in many types of industries, I have a thesis that much of what we do in planning and estimating has had real benefit historically in some areas - those with clear, repeatable, measurable observations, certain types of construction and fabrication - but not in others, because success was tied to the method alone without accounting for the nature of what is being measured.

Monte Carlo has very real and valuable uses in critical path method scheduling, but, as with most things, only when used appropriately. Weather is something that fits well, because it is cyclic and we have a lot of data: we can, ahead of time and to some degree, have a likely view of where and when things like hurricanes/cyclones will occur in a given period.

On the Australian North-West Shelf, where I used to plan O&G maintenance dive campaigns, knowing how long it takes to get to an area, get a diver to depth, perform a task, return safely and leave the area is the easy part. "Everything going to plan"*, we should be able to do these x inspections/interventions in y days.

But weather is where you can come unstuck. Sea state matters; there is always some chance that you will have to interrupt or postpone on a given day because of conditions. We know there are roughly x such days per cyclone season, and roughly how long we might be unable to work while still paying full rates for crew, equipment etc. An at-risk period for weather goes on to the end of the schedule as contingency and is drawn down as needed. But in some scenarios, where in the network of activities these multi-day events fall can have varying impacts: inserting a three-day stand-down early on could have a greater impact on the end date than a three-day stand-down right near the end, if at the earlier time you are doing things that must be done in a continuous sequence which, if interrupted, must be repeated from the beginning.

Luckily there are tools we can use to not just model activities in a schedule network with their own distribution curves, but also inject events, again with their own probabilities, randomly throughout the network during the run.

This gives you much greater insight than a 'P90' estimate on your critical path: it gives you the sensitivity of the activities in your network.
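
A bare-bones sketch of that event-injection idea (all numbers hypothetical): walk the schedule day by day, randomly inject stand-downs, and restart any continuous-sequence task they interrupt:

  import random

  # Task durations in days; True marks tasks that must run uninterrupted
  # and restart from scratch if a stand-down hits them. Hypothetical data.
  tasks = [(3, False), (5, True), (2, False), (4, True)]
  p_standdown = 0.08      # assumed chance of a weather stand-down on any day
  standdown_days = 3

  def simulate_campaign():
      total = 0
      for duration, continuous in tasks:
          remaining = duration
          while remaining > 0:
              total += 1
              remaining -= 1
              if random.random() < p_standdown:
                  total += standdown_days
                  if continuous:
                      remaining = duration  # the interrupted sequence restarts
      return total

  runs = sorted(simulate_campaign() for _ in range(10_000))
  print("P50:", runs[5_000], "P90:", runs[9_000])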

### TLDR ###

This all assumes, of course, that what you have modelled (planned) reflects reality to some degree. A bad schedule with bad bounds will give you a bad result regardless. It doesn't matter how many times you simulate it.

*it rarely does


This is a very naive approach for a number of reasons:

1). You're only sampling a tiny fraction of the space of possibilities. The example using two dice, with 36 outcomes, makes it seem like you're in a very well-behaved world. But the total number of possible samples is n_possible_values ^ n_dice, so with three dice you have 216 possible outcomes, with 4 dice 1,296, and with 10 dice 60,466,176. The growth is exponential in the number of variables, with predictable results.

2). The distribution of delivery times is not normal; experience shows it's much closer to a power law, with a small number of tasks exploding beyond any reasonable expectations (see the sketch below).

3). No critical path. A task can't be completed before its critical path is completed. Since you allow subtasks to vary, the critical path needs to be recalculated for each run. Combined with 1) and 2), this basically means you have no idea whether you've got a good representation of the sample space of critical paths.

If you run a simulation that samples a tiny fraction of your probability space you have no idea what monsters lurk in the background when the problem space is one prone to explosions.

In short: beware of tap dancing in minefields when blindfolded.
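
On point 2), a quick illustration of how much the tail changes once durations are heavy-tailed (both toy models below have a mean of ~3 days; parameters are illustrative only):

  import random

  n = 100_000
  normal = sorted(max(0.1, random.gauss(3, 1)) for _ in range(n))
  pareto = sorted(random.paretovariate(1.5) for _ in range(n))  # mean = 1.5 / 0.5 = 3

  for name, xs in (("normal", normal), ("pareto", pareto)):
      print(name, "P50:", round(xs[n // 2], 1), "P99:", round(xs[int(n * 0.99)], 1))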



