I first heard about this in the context of software project estimation.
The author (whose name I've forgotten) made the point that nobody has permission to think about disasters when estimating project completion time.
So why not explicitly ask them to? "So now we have our estimate, what would cause us to miss it?"
And in the course of discussing giant monster attacks, you give people permission to talk about specific scenarios that are actually very likely to derail the schedule.
It was an interesting observation, versus the "just multiply the everything-goes-right estimate by X" method.
If everyone else is being positive, no one wants to be the lone (realistic) downer.
Spot on. They used to call me 'Dr. No' because that is exactly what I would do, but it saved us tons of money so it was tolerated. We also actually delivered on time and within the budget but I'm sure it cost us contracts to more optimistic competitors.
This. Most engineers delude themselves and plan for the happy path.
I was famous too for "being negative." Once, in the planning of a complex project, right after I had pointed out a major potential problem, a rather dull engineer asked, "Why are you so negative?"
I've seen a lot of engineers be negative in a destructive way. They tear down ideas, but fail to offer solutions. Usually this is about their ego rather than a desire to help the team.
I agree that's a dark pattern and something I hope I have never been (I realize you weren't accusing me of that).
I try to ask questions like, "What happens when X goes down?" or "What happens when network latency goes up?"
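Those questions translate pretty directly into small tests, too. Here's a minimal sketch in Python of checking the "X goes down / latency goes up" case; the host address and thresholds are made up for illustration, and the point is only that the caller should fail fast instead of hanging:

    # Sketch: turn "what happens when X goes down?" into a check that the
    # caller times out gracefully. 192.0.2.1 is a reserved test address that
    # should never respond; the 2s/3s numbers are arbitrary.
    import socket
    import time

    def call_dependency(host, port, timeout=2.0):
        """Try to reach a dependency, with a hard timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    start = time.monotonic()
    ok = call_dependency("192.0.2.1", 80)
    elapsed = time.monotonic() - start
    assert not ok and elapsed < 3.0, "caller should fail fast, not hang"
    print(f"dependency unreachable: handled in {elapsed:.1f}s")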
I quit a job over this one. I had designed an API for a consumer IP/IR control type thing. You say, "I want to tell that thing over there to turn on." The API does the thing, and then exits. The API does retain state-- but only in as much as it wakes up when it gets a packet from a device, parses it, etc, and makes any internal state changes.
Well, management decided they wanted to demo it running continuously for days at CES. Now, if you've never done a demo at CES-- it is the worst environment possible. Networks go up and down, and there is so much radio traffic that WiFi, BT, anything wireless is unreliable.
I told them I would need to harden the API: it wasn't designed for that scenario, and most of our tests didn't last more than a few seconds. Keep in mind that at this time this was a skunk-works sort of thing that had not yet been productized. Also keep in mind that there was an aggressive, aggressive development schedule.
They predictably lost their minds. I know, it's crazy, right? Test for the exact scenario you plan to show to customers? They forbade me from doing any sort of test like that and charged ahead with the demo. A month later a manager dressed me down in front of my entire team. About the bug that they had forbidden me to fix.
I walked out and never went back.
EDIT: That turned out to be more of a story about ruthless management and constructive dismissal.
My approach was to allocate 2 weeks for even the most basic things because unknowns always creep in and a tighter schedule tends to make those things creep in even more. Also, many of the requests had no particular urgency to begin with.
On the flip side, I'm sure you retained more old business. And employees: trudging through pre-doomed projects was a major root cause of talent and motivation drain at most B2B places I've worked.
I do exactly the same thing - it's all part of the project and system analysis and it helps you neatly sidestep potential pitfalls.
It reminds me of the whole "hero developer" or "hero team" myth; teams who do not do proper analysis, build a turd, but then work insane hours to stop the turd from falling to pieces.
The people who do that come out looking like heroes, when they could have totally avoided the drama in the first place.
It can depend on accountability mechanisms. I would like to see more contracts that give bonuses to companies that come in under budget and ahead of schedule, and penalize the overly optimistic ones that never seem to hit their targets. This is becoming more common in some domains.
My contract manufacturing company does this. Our standard contract has a 5% bonus for being less than 10% late on the delivery date (and penalties start around 25% late). After suppliers see the contract they will often revise their originally quoted schedule.
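Roughly, the clause works out to something like this (the 5% bonus and the 10%/25% lateness thresholds are from the terms described above; the penalty ramp past 25% is just an illustrative placeholder, not our actual contract language):

    # Sketch of a lateness-based payment adjustment. The bonus threshold
    # mirrors the terms above; the penalty schedule is a made-up example.
    def payment_adjustment(contract_value, quoted_days, actual_days):
        lateness = (actual_days - quoted_days) / quoted_days
        if lateness < 0.10:
            return 0.05 * contract_value   # less than 10% late: 5% bonus
        if lateness < 0.25:
            return 0.0                     # grace band: no adjustment
        # Beyond 25% late: hypothetical 1% penalty per extra 5% of lateness.
        return -0.01 * contract_value * ((lateness - 0.25) // 0.05 + 1)

    print(payment_adjustment(100_000, quoted_days=100, actual_days=105))  # 5000.0
    print(payment_adjustment(100_000, quoted_days=100, actual_days=140))  # -4000.0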
The predominant domain for this type of contract is infrastructure construction. I haven't personally seen it used in software development outside of control systems, but I can't immediately think of reasons why it couldn't be extended to other domains as well.
I just deployed something worldwide that took about six weeks to develop and our initial guess was about two weeks, so this is fresh in my mind. I was able to lay out a list of work items at the beginning that stayed like 80% the same from the beginning to the end, but didn't account for the possibility that each one of these could be push-button, or could lead to a blocking problem that needed a day or two of research to resolve. Based on that, I'm thinking that the way to approach the next one of these is to lay out the roadmap, but assume that something like half of the trivial steps are going to turn into complex problems that stop the train for a day or two, or lead to a change in the design.
The best framing I've heard for this problem is: minimum time for each component is bounded, maximum time is unbounded. I.e. there is effectively no answer to the question "If this component becomes a problem, what is the maximum amount of time it could take to solve it with the resources available?"
Ergo, in the worst case, any given single component's ballooning time can dominate the overall project schedule.
Which turns estimation into a game of "How certain am I that no component's time will explode?" To which the answer in any sufficiently complex system is "Not very."
I'm pushing my work to move to something more like a converging uncertainty plot, as milestones are achieved and we can definitely say "This specific component did not explode."
Our PMs aren't used to hearing an idealized minimum schedule time + an ever decreasing uncertainty percentage based on project progress, but it feels like less of a lie than lines in the sand based on guesses.
(Note: This is for legacy integration development, which probably has more uncertainties than other dev)
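To make the "how certain am I that nothing explodes?" question concrete, here's a toy Monte Carlo sketch (all of the probabilities and distributions are made up, not measured): each component has a fixed minimum, and with some probability it blows up into a heavy-tailed delay. Re-running it as components actually complete, dropping them from the list, is what produces the converging uncertainty band.

    # Toy model: bounded minimum per component, unbounded heavy-tailed maximum.
    import random

    def simulate_schedule(min_days, p_explode=0.3, n_runs=10_000):
        """Simulate total project duration for a list of per-component minimums."""
        totals = []
        for _ in range(n_runs):
            total = 0.0
            for m in min_days:
                total += m
                if random.random() < p_explode:
                    # A blow-up adds a lognormal (heavy-tailed) delay in days.
                    total += random.lognormvariate(1.0, 1.0)
            totals.append(total)
        return sorted(totals)

    tasks = [1, 1, 2, 3, 5]               # idealized minimum days per component
    runs = simulate_schedule(tasks)
    p50, p90 = runs[len(runs) // 2], runs[int(len(runs) * 0.9)]
    print(f"minimum possible: {sum(tasks)} days, median: {p50:.1f}, p90: {p90:.1f}")
    print(f"chance nothing explodes: {(1 - 0.3) ** len(tasks):.0%}")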
One interesting thing to also account for is the correlation between activities with regard to cost or schedule impacts. Meaning the uncertainty analysis should also account for systemic effects, where the ballooning of one item's schedule also causes another item's schedule to increase by some correlated amount.
I do remember doing a risk assessment as part of a proposal for a client. They pushed back hard on a number of points and got the sales person to remove them.
The funny thing is the risks that were removed all occurred during the project. We should have stood our ground.
> So why not explicitly ask them to? "So now we have our estimate, what would cause us to miss it?"
I didn't use to take this seriously. Then one project I was on was delayed by several weeks when members of the team got jury duty, got injured, and had other medical emergencies, all within a rather short period.
We still got it done, but I got a lesson that month in planning.
> nobody has permission to think about disasters when estimating project completion time
I've been a part of a few large-scale system implementations, and we documented every significant risk to the project at the beginning, and as we went along if new ones presented themselves. This seemed to be a standard part of the PM's playbook, and I used it to save critical organizational capabilities when things collapsed:
--------------------------
I raised a risk on a must-not-fail deadline; it did fail, and it was the straw that broke the camel's back and brought the project down. (An Oracle product.) The PMs didn't appreciate the risk. There's still probably an Oracle VP out there who really doesn't like me for using the established project protocols, which allowed me to escalate the issue to the president of the organization where I worked and far above the VP at Oracle. On my own initiative I kept the legacy system-- supposedly partly decommissioned-- updated just in case, so we could fail over softly. This was unexpected: when failure became obvious, I reminded the VP that I had raised the original risk and had then prepared for it. I think their plan had been to box us in so we would have to meet their increased $ demands.
That failure was part of another missed deadline, one tied to a major milestone 25% payment on the contract, and since it was a fixed-price contract they should have absorbed the costs.
Instead, when the failure progressed to my area, we all showed up to work one day and everything was gone. No contractors, nothing was left, like a cleaning crew had cleared it out. A week of negotiations failed when they wouldn't budge from their demands for extra $$millions to keep working on the project, plus their missed 25% milestone payment.
Ultimately I lost a year's worth of work when things entered lawsuit land and the two sides sued each other for breach of contract. (We won-- or settled, at least. The fixed-price contract was clear, and the missed deadlines were partly due to demos of fake functionality that didn't actually exist, or to components they knew would be EOL'ed after signing the contract and before implementation would begin. The case for outright fraud was pretty strong: we had videos of the demo.)
In case you're wondering, I don't like Oracle. Though with a different product we were still forced to use Oracle DBs, and those I actually don't mind; they were a huge step up from the ~2000-era MS SQL Server and the much older DEC VMS file-based database that wasn't SQL-compliant or relational.
I've struggled with end-of-life support from almost every database vendor. They create something that is very hard to change, upgrade, or rip out, and then require you to upgrade it every few years. Lots of hidden costs for the buyer. They're generally in a better negotiating position than understaffed IT departments.
End-of-life support was a huge issue in the case I outlined above. I kept things ticking over on my side, but there were no more updates from the original vendor, except for bespoke work we had to pay them for to complete annual updates for federal compliance issues. Actually, we pooled together with a few organizations in the same boat to do that until we rebooted the failed project.
To give the legacy vendor credit, though: it was a legacy product 5 years past its "final" EOL, and they kept honoring the maintenance agreement and providing ToS updates for a long time. In terms of the database itself, it probably helped that it wasn't their own: it was native to OpenVMS and hadn't substantially changed in at least a decade. Ultimately that made data migration a bit easier, since industry tools for migrating from VMS systems had reached maximum maturity by the time we got around to it.
I still have a soft spot for that old system, though: it lacked any sort of modern functionality newer than about 1995, and the underlying application has its roots in the '60s. But it was fast, and I had low-level access to do some complex things much more easily than in the upgraded system (installed about 6 years ago). You won't get much faster than a well-tuned decades-old system running on modern hardware, at least not unless you need something that can handle medium-to-big data.
I once worked on a team that did something along these lines and referred to it as a “pre-mortem.” We basically held a meeting where we imagined the project in its concluded state and discussed what “went wrong.”
PI Planning in SAFe explicitly calls for this. Risks to the plan are called out in front of everyone and each is discussed to see if it can be mitigated (and who will own that mitigation).
If anything happens due to one of those foreseen issues, everybody knew about it in advance and already discussed what, if anything, could have been done to prevent it as well as what action was actually taken.
I love the SAFe / PI Planning approach because it makes sure that everybody is on the same page and totally removes any blame game from deliverables. Far, far fewer surprises.
The tension I've seen at most places where this goes off the rails is due to mis-assigning responsibility.
PMs are responsible for keeping projects on schedule. Engineers are responsible for completing work.
Consequently, PMs are incentivized to compress schedules, and engineers are pressured to do the same.
The end result is that "the people who do the work plan the work" goes out the window, because risks aren't fundamentally understood (on a technical nuance level) by PMs, so naturally their desire for schedule wins out whenever there's a conflict.
(That said, I've worked at shops that hew closer to how it should be, and it works great. Current job just happens to be a dumpster fire of bad Agile.)
Yea, I can totally see that happening. Hopefully in a room full of people, somebody will have the gumption to vote low confidence if there's concern about this happening.
Instead of doing more "post-mortems" after projects fail, try doing "pre-mortems" before projects begin.
Imagine it's 6 months from now, and the project we're about to begin is half complete and twice over budget. What factors will the post-mortem identify as causes for the failure?
Learn the lessons before the project begins, not after it fails.
I think this is an excellent approach, and I try to do it myself. But YMMV for getting teammates to do it in a meeting - even with a generally supportive manager (not me, I'm just the scrum master), there is just a psychological resistance. I don't think they have a list in their heads and are simply afraid to share it; I'd guess it requires some effortful imagination and would be unpleasant, so things just get stuck. I'd love ideas for follow-up prompts that might help with this.
It's not only that. There seems to be a 'political' advantage to overconfidence even given the effect it ought to have on your track record. (This is not advice.)
As a for instance, I was the operations lead a couple years ago for a large, customer-facing financial product rollout. The timeline was insanely aggressive, approximately 10 months ahead of my prediction and predicated upon several payment and banking vendors nailing their legacy integrations out of the gate with no delay (perhaps a first in the history of the world). Several of these vendors weren't committed, nor was a spec agreed upon, before the timeline was set. When I raised these concerns, everyone acknowledged them, but no mitigations or deadline revisions were made, as that would countermand previously set expectations with the executive team.
The project continued on for another 18 months past the deadline, with a "launch" set every three months. Inevitably, a week beforehand, something would derail it that was "unexpected" but had been known months in advance by the project team (e.g. the mainframe the payment vendor uses has a fixed column length; it will take two months to build around this to support xyz).
In the end it got rolled out. Everyone forgot about the delays and the project team was praised as the implementation did better than expected. The same technique is employed once again.
While I don't like it, I now see the estimates and re-estimates as a method for an executive team to manage a project portfolio and prioritization. It's not a good way to do it but it's easy to express priority by simply ranking deadlines.
It's much easier to avoid this in high-trust environments (typically smaller organizations).
You know, at first glance I would say this isn't true, since SWOT analysis covers this pretty well. Then again, I didn't learn SWOT in any programming class; it was in business classes.
I might be misunderstanding, but I thought it was SOP to have a section/activity in your estimate where you mention risks and opportunities and their 'chance' of happening?