You can't stop the business, or why rewrites fail (swizec.com)
77 points by kiyanwang on July 19, 2023 | hide | past | favorite | 74 comments



I proposed a re-write for a tool that we needed to use in a different context recently. We built an MVP relatively quickly, but ended up throwing out the rewrite. We had two issues-

1. When we did the rewrite, we ran into a lot of the same constraints we ran into on the original application, and we didn't come up with significantly better solutions.

2. We had to continue using and improving the original tool to serve its original use case, and the dual maintenance was terrible.

But there was a huge positive as well. Without worrying about breaking the original use case, we could hack on the new tool all we wanted without regard for stability or reverse compatibility. We got a thorough understanding of how to serve the new use case and why the original tool was designed the way it was

We ended up taking the best stuff from the new tool and bringing it to the original tool, and made the original one better. I am not sure that it was the best use of time, but I'm also pretty sure that we would have never been able to modify the tool as well as we did without the benefit of the rewrite


This is a good example of why it is useful to have more than one person or team work on the same problem in parallel. Yes you duplicate effort but you often come up with better solutions especially if the teams regularly meet to compare notes.


As long as their "career performance/review" doesn't depend on making their solution/big project work. Otherwise, you're setting either team up for failure and creating a culture averse to sharing, collaboration, and positive but passionate debate.


How detached does your management need to be in order to f this up? You’d obviously have to coordinate this in a large corp.


You'd be surprised how common this is. You gotta remember the managers are people playing the career game too. And if it's one of the large corps with a lot of turnover, they have to move fast.

Speak with people who work in large corps and you might hear similar stories. The problem is that people online naturally can't go into too much detail if they want to avoid doxxing themselves.


#2 is the biggest problem I've faced. It's nigh impossible to truly revamp what you have while also having to sell/serve/use/change the existing product. You either have to have leadership that is willing to take the hit and stop new features/bugfixes from coming in while you're redoing the product, or you have to pad timelines for new features/bugfixes, but that's risky too because you can be seen as someone moving too slowly.


> But there was a huge positive as well. Without worrying about breaking the original use case, we could hack on the new tool all we wanted without regard for stability or reverse compatibility. We got a thorough understanding of how to serve the new use case and why the original tool was designed the way it was

Maybe good for your understanding, but this is pure waste.


How is more thoroughly understanding an important service pure waste? I could agree with inefficient, but definitely not pure waste.


What did thoroughly understanding it produce? It consumed a shit ton of resources - what was the product?


What is software development if not thoroughly understanding the problem? Implementation is rarely the hard part


This is no more wasteful than discarding a prototype, the only possible problem here is putting too much effort into throwaway code.


Enhancing understanding is not waste.


Speaking of shipping code, I always envision my systems as if I'm designing Ships of Theseus. Design it in a way that things are nice and modular and you can take any one piece and replace it, especially if you're on the high seas. Most large ships these days have their own machine shops. They're constantly Theseusifying themselves without having to go to port and shut down operations.

The hard part, of course, being that you need competent engineers who know just how modular is the right amount and how to achieve that. This can fail horribly when you go leftpad on the modularity. But I believe it's the only way to build something that lasts.


This is a great metaphor! Every line of code is a liability. And every line of code will eventually be replaced. So prioritize code that you can rip out and replace!


I’ve used the Ship of Theseus metaphor before but never considered using it as a verb: “Theseusifying”. Love it. I could even abuse the word more by twisting it into other forms: “This refactor is one aspect of the larger Theseusification process we’ll be undertaking this quarter…”


There's a huge difference between systems that are business-process heavy and simple, straightforward tasks that need a special touch to get right. An accounting system that's got hundreds of thousands of interacting rules and conditions will be maintained and reworked in a very different way from an email server that sends out millions of mails a day. If you wave a hand and treat "systems" as a singular thing, the general advice simply falls apart.


In that example, the requirements of the accounting system are also very poorly defined since they've evolved over time without thorough documentation. An email system on the other hand has fairly rigid (but not perfect) requirements in the form of RFCs and other standards.

I think the same applies to rewrites--systems with very well-known/understood requirements will be easier to rewrite than systems with soft requirements that have evolved a lot over time.


"Systems with very well known & understood requirements are easier to rewrite" is a good way of pointing out that 99% of applications are hard to rewrite :-)


Just commenting to say this is true in almost 100% of cases and if someone with decision-making power is suggesting it in your organization, run.

The only time you should remotely take it seriously as an option is if the person proposing it also happens to be the most experienced with the existing system by a wide margin. Almost no sane subject matter expert would ever suggest it. In the case they do, they might be right. But being right in this case would almost always indicate that the business itself should be totally transformed.


Rewriting a whole system from scratch is almost always a bad idea. Rewriting a system part by part is much more achievable with each part going into production incrementally. In the end you might have a 100% rewrite, but doing it incrementally keeps the system sane.

Doing it this way also often ends in failure, though, because it takes longer and the rewrite often stops midway (business requirements change, a key person leaves, the organisation is restructured), and now you have two competing architectures in your product.


Martin Fowler suggests an alternative: the Strangler Fig Application[1].

[1] https://martinfowler.com/bliki/StranglerFigApplication.html


I did something similar to that at one of my jobs. We had a Java web app that we wanted to port to Python. I created a new Python web app and ran it alongside the Java app. Then I started reimplementing the Java code in Python, one endpoint at a time. Every time I had a new Python endpoint working, I went into the corresponding Java endpoint, and I replaced its code with a call to the Python app. The Java app, in effect, was becoming a proxy to the Python app.

The plan was that once all the code had been ported over, the Java app could just go away. I skipped over some of the details here, but this was a nice way to port the app over without feeling in a rush, and without having that big stressful "throw the giant switch" moment. Ultimately, the company decided to scrap the original app altogether and rewrite from scratch because a lot of the original app was irrelevant to the new direction we were taking... but I got about 20% of the original app ported over in a week or so.
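
A minimal sketch of that endpoint-by-endpoint handover, for anyone who wants to picture it. It is collapsed into a single Python process for brevity, whereas the setup described above had the Java endpoint forward to a separate Python app. All handler names and paths here are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def legacy_get_user(request: dict) -> dict:
    # Stand-in for an endpoint that still lives in the old app.
    return {"source": "legacy", "user": request["id"]}


def legacy_get_orders(request: dict) -> dict:
    # Stand-in for an endpoint that has not been ported yet.
    return {"source": "legacy", "orders": []}


def new_get_user(request: dict) -> dict:
    # Stand-in for the reimplemented endpoint in the new app.
    return {"source": "new", "user": request["id"]}


# Every endpoint the old app serves.
LEGACY: Dict[str, Handler] = {
    "/user": legacy_get_user,
    "/orders": legacy_get_orders,
}

# Grows one entry at a time as endpoints are ported.
MIGRATED: Dict[str, Handler] = {
    "/user": new_get_user,
}


def handle(path: str, request: dict) -> dict:
    """Prefer the ported endpoint; fall back to the old one otherwise."""
    handler = MIGRATED.get(path) or LEGACY[path]
    return handler(request)
```

Once `MIGRATED` covers every key in `LEGACY`, the old handlers have no callers left and can be deleted.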


I like this sort of process. I've done something very similar converting scientific code I prototyped in python into c++. I would re-write one function or class at a time in c++, write a binding to it with pybind11, and replace the original python with a call to the binding. There was some extra work with the bindings, but that's such straightforward boilerplate it was worth it.


We tried that at the company I worked for; the project was abandoned 1 year later. Improvements and better practices were incorporated into the old codebase, which is still scaling today.


This makes me wonder: if the system was still capable of supporting improvements, better practices, and increased scale, why was it targeted for a rewrite in the first place?


I worked somewhere where this is all we did. Changing things was discouraged (versioned api endpoints) unless it was to fix a bug. I actually loved it. Each time you copy-pasted an endpoint to make a significant change, you didn’t have to. You could just rewrite it from scratch, taking everything you knew from the old versions.
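
A small sketch of what versioned endpoints can look like, assuming a simple in-memory registry keyed by name and version. The `endpoint` decorator and the `price` handlers are hypothetical and not taken from any particular framework.

```python
ENDPOINTS = {}


def endpoint(name, version):
    """Register a handler under (name, version) without touching older ones."""
    def register(func):
        ENDPOINTS[(name, version)] = func
        return func
    return register


@endpoint("price", 1)
def price_v1(item):
    # Frozen: only bug fixes land here.
    return {"price": item["cents"] / 100}


@endpoint("price", 2)
def price_v2(item):
    # Written from scratch with what was learned from v1.
    return {
        "amount_cents": item["cents"],
        "currency": item.get("currency", "USD"),
    }


def call(name, version, payload):
    return ENDPOINTS[(name, version)](payload)
```

Existing clients keep calling version 1 and see no change, so each new version is a tiny rewrite with no cutover moment.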


One of my colleagues recently described[0] how we applied this at Deliveroo and it worked nicely for incrementally decomposing

[0]: https://youtube.com/watch?v=LP0sTNruvaM


Rewrites are risky, but sometimes necessary (EOL software with security risks, increasing costs for implementing new features within the existing codebase, code rot etc.). I think a lot of developers also lean in that direction because new code is easier to write than the old one is to read, understand and maintain, but attempting to create something that encapsulates and carries the old business logic over 1:1 can be very hard in practice!

There will be problems regardless of whether you're trying to do a gradual rewrite and carry over API paths one by one, or do a big bang rewrite, where you replace the whole thing. I actually recently watched a conference talk called "Top 5 techniques for building the worst microservice system ever", which also touches upon the issues with rewrites: https://youtu.be/88_LUw1Wwe4?t=775

But as long as you're okay with the loss of some functionality or changes to some of it (hopefully for the better), then it should be a bit more doable.


Did this once. Took 5 years (old system was still available, we swapped out various parts over time, etc, etc).

Rewrite/relaunch did increase revenue a lot, but not enough to justify the 5 year investment of R&D. Pain.


Good writing on the problems, I think it is just missing the "what is the right approach then?" Approach that I've seen working is the divide and conquer: It's similar to the new system next to the old (greenfield) but incrementally (avoiding the monolith substitute). Get a manageable slice of the business and develop a substitute system for that slice, old system continues to live and evolve (as needed) while you evolve the new system incrementally (getting bigger slices from the business/process).


Great comment, I have been looking at doing something like this in my own case. You target a single customer / stakeholder, and suddenly stopping the world is less damaging because you're only stopping / slowing development for a smaller segment of your business. That also means you can time it to align with that segment's needs, trying to find a good stretch of downtime where you can get to work. (This would be impossible for the whole business). If you pick the right customer, you get a good blend of reduced scope (you don't have to solve EVERY problem) without being too narrow (you can't just work on the fun parts - you have to fully solve the problem for that stakeholder, forcing you to think through all of the boring stuff that will need to be figured out).
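
One way to picture the "one customer at a time" slice is a routing check in front of both systems. This is only a sketch under assumed names: the customer IDs and both invoice functions are hypothetical.

```python
# Customers whose traffic has been moved to the new system so far.
MIGRATED_CUSTOMERS = {"acme"}


def old_invoice(customer, amount):
    # Stand-in for the existing system, which keeps evolving as needed.
    return f"OLD:{customer}:{amount}"


def new_invoice(customer, amount):
    # Stand-in for the new system; it must fully serve its slice.
    return f"NEW:{customer}:{amount:.2f}"


def create_invoice(customer, amount):
    """Send migrated customers to the new system, everyone else to the old."""
    if customer in MIGRATED_CUSTOMERS:
        return new_invoice(customer, amount)
    return old_invoice(customer, amount)
```

Growing the slice is then a matter of adding a customer to the set, and rolling back is removing one.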



Strangler _Fig_ Application

"Recently I thought of a small tweak that might help things a little. If I rename the post to “Strangler Fig Application”, and use the term “Strangler Fig” as much as possible, then hopefully that would reduce the violent connotation by reinforcing the metaphorical link that is the whole point of the name. Because it's a small change, maybe it will spread enough to be worthwhile, and it's not much effort, so seems worth a try." - M. Fowler https://martinfowler.com/bliki/StranglerFigApplication.html


In addition to the strangler pattern, look at the tools from Michael Feathers's "Working Effectively With Legacy Code," along with refactoring. If you can get all the pieces under test, you can have the confidence to replace and refactor larger pieces of the system without replacing the entire system.
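
A rough sketch of getting a piece "under test" before replacing it: record what the existing code actually does over a set of inputs, then require the replacement to reproduce it. The shipping-fee functions are invented stand-ins, and the helper names are hypothetical.

```python
import itertools


def legacy_shipping_fee(weight_kg, express):
    # Stand-in for opaque legacy code whose rules nobody fully remembers.
    fee = 5 + 2 * weight_kg
    if express:
        fee = fee * 1.5
    if weight_kg > 20:
        fee += 10
    return fee


def characterize(func, cases):
    """Record current behavior as the expected result for each case."""
    return {case: func(*case) for case in cases}


# Inputs chosen around the boundaries visible in the old code.
CASES = list(itertools.product([0, 1, 20, 21, 50], [False, True]))
GOLDEN = characterize(legacy_shipping_fee, CASES)


def new_shipping_fee(weight_kg, express):
    # Candidate replacement, restructured but meant to behave the same.
    base = 5 + 2 * weight_kg
    multiplier = 1.5 if express else 1.0
    surcharge = 10 if weight_kg > 20 else 0
    return base * multiplier + surcharge


def mismatches(func, golden):
    """Return the cases where func disagrees with recorded behavior."""
    return [case for case, expected in golden.items() if func(*case) != expected]
```

An empty list from `mismatches(new_shipping_fee, GOLDEN)` is the signal that the piece can be swapped; anything else is a behavior change to investigate first.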


In systems software I've often seen/done this on a micro scale, where you build the new system inside the old system, think feature flagging but something so fundamental that it's almost the whole thing.
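
A bare-bones sketch of the flag approach, assuming an environment variable as the switch. The flag name `USE_NEW_TOKENIZER` and both tokenizer functions are hypothetical.

```python
import os


def old_tokenize(text):
    # Existing behavior: splits on single spaces, keeps empty tokens.
    return text.split(" ")


def new_tokenize(text):
    # New implementation living inside the old codebase.
    return text.split()


def flag_enabled(name, env=None):
    """Read a flag from the given mapping, defaulting to the process env."""
    env = os.environ if env is None else env
    return env.get(name, "0") == "1"


def tokenize(text, env=None):
    """Single entry point; the flag decides which implementation runs."""
    if flag_enabled("USE_NEW_TOKENIZER", env):
        return new_tokenize(text)
    return old_tokenize(text)
```

Passing `env` explicitly makes both paths easy to exercise in tests, and the old path stays the default until the flag is flipped.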


I find it odd when tech people talk about “the business”.

The “business” is everyone working together.

Sales, ops, tech are all the business.

I appreciate this word has a specific meaning in tech circles but its use distances you from “the business” you’re part of.


Right, but sort of the point of this article is that launching a long rewrite that delivers no value doesn't help the business aka it's NOT everybody working together. It's the tech team going off working by themselves.



That is the organization.


I've successfully done a few "rewrites" but truthfully, the "rewritten code" was put into production, replacing the old code, within a month. Rewrites don't work if you have to wait to use the code.

Basically, if you plan to rewrite something, plan on getting it released to production on the same schedule that you would release everything else (maybe a month longer? but not much). If you can't do that, you cannot do the rewrite.


I can tell a story about a company that missed out on opportunity _because_ they didn’t rewrite early enough.

A friend of mine worked for a business that had a product that was supposed to be built to a specification provided by a regulatory body, but fundamentally hadn’t been.

The product team was bogged down firefighting regulator requests, chronically hacking in modifications (on modifications) to give the appearance that it was built to spec, but it fundamentally wasn’t - stemming from core design choices.

There was almost zero product velocity and changes regularly failed. After years, it burnt out one team and got handed to a fresh team, dropping most of the gained experience on the floor.

The opportunity cost of those fundamental design decisions had been extremely high - years spent covering up mistakes, rather than addressing them at the root - because it was quicker/easier/preferable. The result was that every patch to get them over the latest regulatory hurdle had hardened the software which, by the end, was so brittle that enhancing the product was impractical.

The business survived. The product sucked. The team hated the experience. It’s a sad, yet familiar story.

What’s the alternative? Refactor, and refactor often. You could always be refactoring (rewriting) toward deeper insight. As you gain experience in a domain, you could duct-tape new requirements on (for a higher cost later), or you could use your insights to better design your software (for a higher cost now). It depends on whether you would like your software to remain responsive to change. A primary quality of software is its changeability. I recommend keeping your software responsive to change.


> I recommend keeping your software responsive to change.

This requires a note: this is not achieved with a technical practice (e.g. something like dependency injection) but by keeping the code and models sufficiently self-contained, with the key steps/blocks of the business logic you're coding well defined and isolated.


Agreed. Thank you for the contribution.

Thinking as a reader, I’d also like to add to your note: microservices (another technical practice) don't automatically give you this either.

The microservices (optionally) come after discovering and choosing a useful model for your solution. Put time into this bit, not the technical practices.


I led two medium to large (18 months, 5-10 engineers) successful rewrites. It's not that we didn't like the old systems, they were simply unmaintainable in the opinion of every engineer, including the ones that built them. The rewrite was gradual and the systems lived side by side (sometimes handling subsets of the scenarios as features got implemented).

We had the tribal knowledge in-house to successfully rewrite them in a better, leaner stack. The end result was a much faster and easier-to-maintain system. Features that took weeks to build could be built in 1 or 2 days.

Rewrites are hard, but they can be done if you have a good team that know what they are doing. You should take a really hard look before moving ahead with a rewrite, but you shouldn't fear it when it's necessary.

Maybe there is an analogy somewhere with a root canal; they are no walk in the park but when you have to do it, you have to do it - and a good professional will be able to do a root canal safely.


They don't all fail.

In the spring of 1998, Netscape launched Open Source with the Mozilla project. Six months later, under pressure from Web standards advocates and others, Netscape and Mozilla tossed the Netscape Communicator 5.x beta code base and started over, a re-write based on a nascent engine that would come to be called Gecko.

That re-write did indeed take much longer than expected as Netscape wasn't able to release Netscape 6 for about two full years which was an awful long time to be without a web browser release.

And Netscape 6 wasn't terribly successful except that a few of us took that code in 2002 and turned it into Firefox which shipped in 2004 and quickly grew to hundreds of millions of users and a half a billion dollar a year business that's maintained at those levels of revenue for more than a decade.

So, re-writes can work, just not necessarily the way one might expect them to work.


Rewriting Netscape was one of the all-time greatest mistakes in the history of the web. Probably the decision most responsible for Netscape losing the browser wars. https://en.wikipedia.org/wiki/Browser_wars#/media/File:Layou...


I don't see how that's true. Firefox was gaining a lot of ground over IE, and it didn't fall to where it is now due to "being there late". It fell because Chrome had an unfair competitive advantage over it, and Chrome is now doing things similar to what IE did then: pretending it's the only browser and pushing its "Chrome only" ways.


I remember when there hadn’t been an update to Netscape in a long time (years, maybe?). People were taking advantage of vulnerabilities left and right. You would use literally any other browser if given the chance. They “lost” the browser wars because they stopped updating the browser that got them there in the first place. Firefox didn’t come along for quite a while…


In typical HN fashion of late. This doesn't address my points.

I also am downvoted. It's Reddit 0.5 and has been for quite a while, which is quite funny, when you think about reddit's current path.


> This doesn't address my points.

You didn't make any points and what you said didn't make any sense. Firefox and Chrome didn't come out anywhere near the same time. Firefox came out almost half a decade before Chrome. Chrome won because it was a breath of fresh air after dealing with Netscape, then Firefox. The dev-tooling was amazing (even in the early versions, IIRC) while firefox required installing a separate extension (which your customers almost never had installed).

IE 'won' because Netscape wasn't updated. IE was the de facto browser that was still being updated.

So a lot of Chrome's success was us devs telling customers "go install Chrome and tell me what the error is in the console," instead of "go install this extension and navigate all these screens/tabs ... no, not that one." Then there was the heavy advertisement on Google.com (but that was a little later IIRC).


Chrome won because it was the default and installed by default and invested in heavy advertising. It was also in your phone. People didn't choose it because devs told them to. People chose it on their own.

People had Gmail and it synced shit too. People used Google and got pushed to use it.

It's absolutely disingenuous to pretend Firefox lost the upper hand it had because of bad decisions from Netscape ages ago, or because Chrome was objectively better.

> Then there was the heavy advertisement on Google.com (but that was a little later IIRC).

You recall wrong. The stats showed Firefox on the way to overtake IE until chrome showed up strong and unavoidable, prompting people and being the default.

Your experience with the dev tools is fine, but most power users who wanted customization preferred Firefox and recommended it to their non-tech friends, and installing an add-on is not harder than installing Chrome. So it's basically anecdote against anecdote.

IE won because it was the default.

> Chrome won because it was a breath of fresh air after dealing with Netscape, then Firefox

A breath of uncustomizable fresh air, right?

You drew the wrong conclusions from the market movement and wanted to chalk it up to developer decisions.


Where on earth was Chrome installed by default? Android didn’t even ship Chrome by default up until the Pixel came out. There was absolutely nowhere ever that Chrome was a default browser in the early 2010s.


Notebooks AND chromebooks.

Clearly you haven't paid attention. But even if you're fixated on the default install aspect, the Google pestering you aspect should give you a clue about the power of their reach. You know, Google, the default search engine for mostly everyone?


I never saw it installed by default back then. Maybe it’s region specific?

Chromebooks didn’t come out until 2011, they aren’t really a part of this discussion.


> Initial release of Firefox: November 9, 2004.

That's a lot later than I remembered! I thought I was using it earlier but I guess my memory is wrong.


Joel Spolsky wrote about this 23 years ago (fuck I'm old)

> It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time. First of all, you probably don’t even have the same programming team that worked on version one, so you don’t actually have “more experience”. You’re just going to make most of the old mistakes again, and introduce some new problems that weren’t in the original version.

https://www.joelonsoftware.com/2000/04/06/things-you-should-...


I'm failing to see how the rewrite you mentioned delivered any value _to Netscape_.


Netscape didn't really have data to deal with or, more importantly, state. If you mess that up, you are done. The article's example of a billing system is a lot more difficult to manage.


Is it the case that "rewrites fail" only applies when you have an active available system with state though? I don't know that I've ever seen this disclaimer before.

I'm very curious if this is the biggest reason behind why people advocate for Joel Spolsky-style "never re-write it" approach. Because I would very much like to do a start from scratch on my own codebase, and it doesn't have any of those availability limitations. But I still haven't been able to make space to do it without stopping the world because we have customers who are currently using the code who expect new features and support.

Are we saying that the balance only tips towards "don't rewrite it" when your code runs an always available service though? If your code is merely a product that works on a release schedule, is it now ok to splinter off a Project Phoenix team to make the new hotness? Very curious what people's experience is.


It's just a lot more difficult when you have state and a world to preserve. You need to build the migration from the old to the new from early on. Hard to say if it is the largest factor to consider. It's very rare to see a company do a rewrite for any reason other than being forced to.


it was called the phoenix browser.


The lightweight fork was called "Firebird", and when they learned that the name was taken by, I believe, an RDBMS, they switched to Firefox.


It was called Phoenix. It was renamed to firebird later.


...because the goal was for it to rise from the ashes of the failed Netscape rewrite.


I'm reminded of this Goomics: https://goomics.net/50/


There is also "write one to throw away" -- the problem is that it's often not thrown out soon enough.


Two weeks? That is an insignificant problem. Whether the author succeeded or failed, it's a meaningless data point

What is generally referred to here is any business system that has (my number) at least a man-year of work invested, and will probably take three man months at minimum to rewrite.

These systems are everywhere. A midsize company can have hundreds. They can be excel spreadsheets and access databases.

System rewrites, despite theoretically having more stable requirements, probably fail even more often than the usual software project, because they will be underinvested and the new/additional requirements will take 2x the effort, as usual.


>If the old code has 104 weeks worth of features, and the maintenance team works on fixes and additions for 1 day per week ... you'll need 130 weeks to catch up.

I don't agree with this as a general rule. All the communication, specification and validation of the features is already done in the old system and can be reused, the actual implementation didn't take 104 weeks in the old system and should take even less in the new.
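
For reference, the 130 figure in the quote appears to follow from a simple catch-up calculation. The sketch below assumes a 5-day work week and a rewrite team that reproduces one week of old features per week by default; the `speed` parameter is a hypothetical knob added here to model the objection that reimplementation goes faster than the original work did.

```python
from fractions import Fraction


def weeks_to_catch_up(backlog_weeks, maintenance_days_per_week,
                      workdays_per_week=5, speed=1):
    """Weeks T until the rewrite matches a still-growing old system.

    The old system starts `backlog_weeks` ahead and grows by
    maintenance_days_per_week / workdays_per_week weeks of features
    per calendar week. The rewrite delivers `speed` weeks of features
    per calendar week, so speed * T = backlog + growth * T.
    """
    growth = Fraction(maintenance_days_per_week, workdays_per_week)
    net_rate = Fraction(speed) - growth
    if net_rate <= 0:
        raise ValueError("the rewrite never catches up at this rate")
    return backlog_weeks / net_rate


print(weeks_to_catch_up(104, 1))           # the quoted scenario
print(weeks_to_catch_up(104, 1, speed=2))  # rewrite twice as fast
```

With the quoted numbers this gives 104 / (1 - 1/5) = 130 weeks. If the reimplementation really does run at twice the original pace, the same formula gives 104 / (2 - 1/5), roughly 58 weeks, which is the gap between the two positions here.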


So strangler fig is the way to go in such cases, I'd guess.


Man I felt the pain of the 2 years one.

But 2 week estimate to rewrite a billing system? After 6 weeks "window to try our new business model was gone"?


Also see second system syndrome/effect:

https://en.wikipedia.org/wiki/Second-system_effect

2.0 is the hardest version to write.


I also used to subscribe to this advice.

But it turns out my career has seen a bunch of rewrite projects, and they were all spectacularly successful.

So as with all advice: YMMV. ¯\_(ツ)_/¯


Rewrites, like all things, rely on the competence and experience of the people doing the rewrite. Our C#/.NET rewrite worked out pretty damn well.


This is almost always true. But this is also how a new company inevitably springs up and eats your lunch. Everyone knows Photoshop is bloated and slow, and Adobe has all the money and industry knowledge in the world, so why didn't they come up with Figma? Maybe it's a bad example, since Adobe ended up buying Figma, or maybe it further proves the point: don't bother rewriting, buy your competitor instead. If you can buy your competitor, then there is no innovator's dilemma.



