Ask HN: How to be productive with big existing code base
284 points by maheshs on Feb 27, 2019 | 180 comments
I have just started working with a client who has an existing Node.js codebase that they built over the last 3 years.

Are there any guiding principles that are beneficial when working with an existing code base?




My #1 rule for existing codebases: Just because you wouldn't have done it the way they did doesn't mean they did it wrong.

I think it's developer nature to look at a huge pile of code that someone else wrote and immediately think: "This is a pile of crap. I can do better, so the first thing to do is rewrite all of this, my way (which just so happens to be _The Right Way_)."

Figure out what you're trying to do, and what is keeping you from doing it. Take an iterative approach to get things done. Realize that after 3 years, they have hopefully fixed a lot of bugs and got to a solution that is somewhat mature and better than you can do in a week.


There's a term for this, Chesterton's Fence: https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence

> let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, "I don't see the use of this; let us clear it away." To which the more intelligent type of reformer will do well to answer: "If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it."

That said, if you know how to refactor safely even without tests, you can make improvements. Since the questioner is working in JavaScript, Martin Fowler came out with a second edition of Refactoring with JS code that might be useful.

I've never read the first edition (in Java) but Michael Feathers' Working Effectively With Legacy Code is a frequent recommendation of mine. Its main goal is to get your mess under test, ideally tests that help you understand the system as well as improve its quality. It's organized as a set of questions one finds oneself asking in these kinds of code bases and some ways to resolve them: https://www.oreilly.com/library/view/working-effectively-wit...
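As a rough illustration of the "get your mess under test" idea, here is a minimal characterization-test sketch in TypeScript (the usual typed dialect for Node projects). `legacyFormatTotal`, `formatTotal` and `checkPinned` are hypothetical names, not taken from Feathers' book. The idea is to record what the existing code does today, quirks included, and only accept a refactor that reproduces every recorded case.

```typescript
// Hypothetical legacy function we inherited and do not fully understand yet.
function legacyFormatTotal(cents: number, currency: string): string {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  const rest = abs % 100;
  const symbol = currency === "EUR" ? "€" : "$"; // quirk: unknown currencies fall back to "$"
  return sign + symbol + Math.floor(abs / 100) + "." + (rest < 10 ? "0" : "") + rest;
}

// Step 1: pin current behavior. These expectations come from running the
// legacy code, not from a spec, so surprising results are kept on purpose.
const pinned: Array<[number, string, string]> = [
  [12345, "USD", "$123.45"],
  [5, "EUR", "€0.05"],
  [-250, "GBP", "-$2.50"], // odd, but it is what production does today
];

// Step 2: a refactored version we would like to swap in.
function formatTotal(cents: number, currency: string): string {
  const symbol = currency === "EUR" ? "€" : "$";
  const abs = Math.abs(cents);
  const fraction = abs % 100;
  const padded = fraction < 10 ? "0" + fraction : String(fraction);
  return (cents < 0 ? "-" : "") + symbol + Math.floor(abs / 100) + "." + padded;
}

// Step 3: report every pinned case the candidate implementation no longer matches.
function checkPinned(fn: (cents: number, currency: string) => string): string[] {
  const failures: string[] = [];
  for (const [cents, currency, expected] of pinned) {
    const actual = fn(cents, currency);
    if (actual !== expected) {
      failures.push(`${cents} ${currency}: expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}
```

In a real project the pinned table would live in whatever test runner the repo already uses; the point is that the expectations describe current behavior, which is what a refactor must preserve.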


The counterpoint to Chesterton's Fence is the psychology experiment with the monkeys, the stairs and the banana.

This is long but very pertinent to the "that's just how we do things around here" attitude:

> This human behavior of not challenging assumptions reminds me of an experiment psychologists performed years ago. They started with a cage containing five monkeys. Inside the cage, they hung a banana on a string with a set of stairs placed under it. Before long, a monkey went to the stairs and started to climb towards the banana. As soon as he started up the stairs, the psychologists sprayed all of the other monkeys with ice cold water.

> After a while, another monkey made an attempt to obtain the banana. As soon as his foot touched the stairs, all of the other monkeys were sprayed with ice cold water. It wasn't long before all of the other monkeys would physically prevent any monkey from climbing the stairs.

> Now, the psychologists shut off the cold water, removed one monkey from the cage and replaced it with a new one. The new monkey saw the banana and started to climb the stairs. To his surprise and horror, all of the other monkeys attacked him. After another attempt and attack, he discovered that if he tried to climb the stairs, he would be assaulted.

> Next they removed another of the original five monkeys and replaced it with a new one. The newcomer went to the stairs and was attacked. The previous newcomer took part in the punishment with enthusiasm! Likewise, they replaced a third original monkey with a new one, then a fourth, then the fifth. Every time the newest monkey tried to climb the stairs, he was attacked. The monkeys had no idea why they were not permitted to climb the stairs or why they were beating any monkey that tried.

> After replacing all the original monkeys, none of the remaining monkeys had ever been sprayed with cold water. Nevertheless, no monkey ever again approached the stairs to try for the banana. Why not? Because as far as they know that's the way it's always been around here.


I don't see how that's a counterpoint. Chesterton doesn't say "don't change it, that's how it is and shall be." He's saying, "Question why it's present before you attempt to change it."

Looking around, there were no obvious livestock or other reasons for the fence. But perhaps they're only there six months out of the year. Ok, make the fence a gate, and not a fence, so it can be opened more easily. Maybe it's no longer used for livestock, ok remove it altogether. But if it is used for livestock (but they were just over the hill that day), and the fence is removed, an error has been made (it existed for a reason) and the livestock can now wander beyond the lands they're supposed to be on.


Usually there is nobody to ask, though, and if you're refactoring a mess then the code won't give you a clear answer. The only way to find out if there was a reason the fence is there is to remove it and see what goes wrong, which is what you were going to do anyway.


Actually, the right place to ask is the unit tests or integration tests. Though there seems to be an inverse relationship between the code that needs refactoring and the prevalence of proper tests.


... which is a form of questioning why it's present. "Attempting to change it" in this situation is more like "submitting the pull request[1]" than "editing the code". The PR tends to only come together after one or more rounds of "git stash, try again".

[1]: or merging your branch, if you're solo.


Short of doing that, talking to the stakeholders could help? I feel like if you understand the actual requirements, you can safely make those refactoring decisions.


> The counterpoint to Chesterton's Fence is the psychology experiment with the monkeys, the stairs and the banana.

Except that experiment never happened. It appeared in some book without any source (a.k.a. pulled out of the author's ass). Interestingly there was a similar experiment, but with an opposite result; the old monkey re-learned the forbidden behavior from the new one.

http://www.throwcase.com/2014/12/21/that-five-monkeys-and-a-...


A lot of these parables are purely made up.


Except that in most cases, this is the proper behavior. We share experience with each other, and we can learn from it even if we haven't personally experienced it, or even if the person telling us about it didn't experience it personally.

The anecdote says they don't approach the stair because that's the way it has always been, but the monkey might say, "we don't approach the stair because the last monkey in here told us that we would be sprayed with water if we did"

Learning from people who are no longer here is really important. Just because something hasn't happened in your lifetime doesn't mean it won't happen. There are so many stories from Japan after the Tsunami, about places that learned from their ancestors to stay safe from something they themselves had never experienced.

https://www.forbes.com/sites/davidbressan/2018/03/11/how-cen...

http://www.nbcnews.com/id/43018489/ns/world_news-asia_pacifi...


I heard a tale that had the same lesson. Not sure where I heard it, but it went something like this:

>There was a Buddhist monastery up in the mountains. Every day at noon the head monk would call everyone together in the main courtyard to meditate.

>One day a cat that lived on the monastery grounds started to come to the courtyard during these meditation sessions. It would screech at, and scratch the monks while they tried to concentrate.

> After a few weeks the head monk got tired of this. He ordered that every day before they meditate, the cat be caught. They tied up the cat away from the main courtyard. After meditation the cat was to be released.

> This went on for a few years and everything was fine. Then the head monk passed away. A new monk was appointed.

> A few years after that the cat passed away. The monks retrieved a new cat from a nearby village and they started tying up this new cat, as is tradition.


The monkeys are passing intelligent learned behaviour - the monks are not.

In the monkeys case - it makes sense they 'follow the herd'. The learned behaviour of their ancestors is saying something: "you're gonna get sprayed with cold water if you do this"

I'd imagine a lot of things get passed on this way, maybe some right some wrong, but I don't think this really helps the 'go your own way' case so much. That we generally follow crowds millions of years into our evolution might indicate that there is indeed wisdom in crowds. Though perhaps it's helpful to realize that they are not always right.


That was such an awesome story that I immediately tried to find the source. Unfortunately, all I found was an article and a stack exchange answer arguing that it is made-up by the authors of the book it first appeared in. [0] [1]

Nevertheless, I can come up with a few examples of similar behavior in large organizations surrounding processes, workflows and general wisdom.

[0] http://www.throwcase.com/2014/12/21/that-five-monkeys-and-a-...

[1] https://skeptics.stackexchange.com/a/6859


It’s an apocryphal story that became popular with “new age” and personal development crowds to encourage you to not be like the masses. It is endearing and illustrative of a useful truism, but sadly made up to be that way.

https://en.m.wikipedia.org/wiki/Hundredth_monkey_effect


I think it's always important to lead with an attitude of service and compassion for the labor of others before you and yourself now. Get upside down on that and it makes it really hard to be impactful.

It's easy to get mad at code when you first show up; just apply Chesterton's Fence and move forward. Personally, with legacy systems (and what isn't, really?) when I show up to a new project I do my best to grok the internal knowledge and test it, with official tests but also by building out an environment around it so I can test conclusions. I document what people tell me, what docs and scripts they link me to, and what my results are. Almost without fail, the institutional knowledge is cargo cult.

People rarely break out of the mold of what works for the thing they need to work on. This isn't a problem that they need to fix, but legacy systems always need someone to take the time to create more clarity. So as a new person I focus on building up the dev and deploy environments from zero, over and over. It's kind of an ops process. For me it's about finding multiple/duplicate configuration points and processes, improving VM or container environments, profiling, making sure that any external services being leveraged can be bootstrapped to local dev or at least faked, and confirming test harnesses if they exist.
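One common way to make an external service "bootstrapped to local dev or at least faked" is to put it behind a small interface and inject an in-memory fake. A minimal TypeScript sketch, assuming a hypothetical email dependency; `EmailSender`, `FakeEmailSender` and `registerUser` are illustrative names, not anything from the thread:

```typescript
// Hypothetical interface for an external dependency (email provider, payment API, ...).
interface EmailSender {
  send(to: string, subject: string, body: string): Promise<void>;
}

// In-memory fake for local dev and tests: records messages instead of calling out.
class FakeEmailSender implements EmailSender {
  readonly sent: Array<{ to: string; subject: string; body: string }> = [];

  async send(to: string, subject: string, body: string): Promise<void> {
    this.sent.push({ to, subject, body });
  }
}

// Application code depends only on the interface, so a real client and the
// fake are interchangeable; which one to construct is decided at startup.
async function registerUser(email: string, mailer: EmailSender): Promise<string> {
  if (email.indexOf("@") < 0) throw new Error("invalid email");
  await mailer.send(email, "Welcome", "Thanks for signing up.");
  return email.toLowerCase();
}
```

Once each external dependency has a fake like this, running the whole stack on a laptop can stop depending on shared cloud accounts and undocumented scripts.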

With a "big existing code base" I have found that almost universally, the full stack can't be bootstrapped on the devs computers. They rely on tricks to run parts and cloud services and undocumented scripts. It's never because it's impossible and as a new person, I am uniquely qualified to solve that. Once I have a smooth rebuild process I move inward and refactor for clarity and tests, being careful to not disrupt other people's working strategy.

So in my experience, the first 4 weeks are asking questions and documenting the lies while repeatedly bootstrapping. The second 4 weeks are environment cleanup. After that it's learning the system from the outside in by making the entire environment bootstrappable with a single command. I keep a daily journal of my experience and try to build out docs/readmes to consolidate and correct the practices that were relayed to me in the first 4 weeks, as well as the new cleanup that I'm working on.

Being a new person on a large legacy code base is powerful. You don't have the entrenched survival problems other people have. Clean up and document to explore, and try to be an exponential productivity add while you do it. One of my favorite tricks is "adding" features by removing things that people did not remember they had put into production. At my current job, my first three months came to -7000 and +500 lines in our master repo. My -7000 was more impactful than some people who pushed 100k+/- in commits. Also, I find it's useful to wildly speculate about how things work and be willing to be VERY wrong early and often. People are more likely to tell you about how they think something works when they are correcting you than when you ask directly for help. No clue why that's a thing, but it is.

After you get done cleaning things up you can make your own horrible lore and mistakes and start the process all over!


> People are more likely to tell you about how they think something works when they are correcting you than when you ask directly for help. No clue why that's a thing, but it is.

While some of that might just be people's enjoyment at telling other people they're wrong, I think it's actually often something else. It's just easier to _respond_ to something than it is to start with a blank slate.

The question "How does this work?" is a blank slate. Even when asking it for yourself about a gigantic new codebase, you form hypotheses and then 'test' them in some way, as you say, "testing conclusions". When you give someone a theory, you've just given _them_ a 'conclusion' they can 'test' against their own domain knowledge. :)


If someone asked me to review a commit that contained +/- 100k LOC, I think I'd reject it on principle.


Not one commit, total sum over 3 months. You would be right to refuse that in most circumstances.


I managed to finish the chapter with the example refactoring in Fowler's book before putting it down as parody.

He's part of a design school that likes to pulverise code into tiny functions with the goal of having code that consists only of assignments and function calls, reading like instructions in English to the computer.

Whoever was indoctrinated by Robert Martin, Martin Fowler & co should read Ousterhout's design book for a fresh perspective on this.

The value of a book like "Refactoring" lies in naming the refactoring (e.g. extract function), but please don't listen to the design advice.
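To make the "extract function" refactoring concrete, here is a small before/after sketch in TypeScript with hypothetical names and an assumed flat 20% tax rate. Behavior is unchanged; each step just gets a name. How far to keep extracting is exactly the design question being argued in this subthread.

```typescript
interface LineItem {
  name: string;
  price: number; // in cents
  qty: number;
}

// Before: one function computes the subtotal, the tax and the output string inline.
function summaryBefore(items: LineItem[]): string {
  let total = 0;
  for (const item of items) {
    total += item.price * item.qty;
  }
  const tax = Math.round(total * 0.2);
  return `items: ${items.length}, total: ${total + tax}`;
}

// After "Extract Function": the same behavior, with each step given a name.
function subtotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.price * item.qty, 0);
}

function taxOn(amount: number): number {
  return Math.round(amount * 0.2); // hypothetical flat 20% rate, rounded to whole cents
}

function summaryAfter(items: LineItem[]): string {
  const total = subtotal(items);
  return `items: ${items.length}, total: ${total + taxOn(total)}`;
}
```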


Thanks for the pointer to Ousterhout, I don't think I've heard of him before but a lot of the Tcl people have put out good stuff.

Uncle Bob and friends definitely form their own "school". I think they call the "functions and assignment" style "Newspaper style", since for them it's kind of like reading a newspaper. Following it you can actually get a long way towards the dream of "self documenting code". I'm happy when people who believe in some specific school of thought get it to work for them, and even if I disagree on things at least there's consistency.

It's still never as clear as a literate program, though. And while Uncle Bob has been upgrading by learning Clojure, a lot of that school's work has assumed OO-in-the-style-of-Java5/C++98 environments. And to be fair that's a lot of the mass of legacy code we unhappily get saddled with. I think their work is less applicable and there are other schools to listen to when you have different languages or programming methods (FP, declarative, OO with a MOP, etc.).

I read the Clean Code book finally not too long ago, and while I've never argued with a book so much, there is value in it, and some programmers would do much better than their current flailing to follow its approach exactly. But these guys aren't my personal programming heroes, I'd rather be like a Norvig than a Jeffries (even if the latter is a perfectly acceptable engineer). But I can usually find value in all their works; philosophically I'm aligned with Bruce Lee, I'll take what I think is valuable and discard the rest.


Re: "pulverise code into tiny functions with the goal of having code that consists only of assignments and function calls"

You don't refactor until it's "so small it becomes functional". You remove/refactor code with side effects where possible, so that part of the code becomes functional. It does not matter whether a function is a lot of code or a little, as long as it does not impact the world outside it.

A small function that adds two numbers and, as a side effect, assigns some global state is to be avoided, however small the function.

"Functional programming" is not just about writing lots of functions.
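A minimal TypeScript sketch of that distinction, with hypothetical names: `addAndRemember` is tiny but mutates module-level state, while `summarize` is longer yet pure, because nothing outside it is touched.

```typescript
// Module-level mutable state, standing in for "some global state".
let lastResult = 0;

// Tiny but impure: calling it changes state outside the function.
function addAndRemember(a: number, b: number): number {
  lastResult = a + b;
  return lastResult;
}

// Pure: the result depends only on the arguments, and nothing else changes.
function add(a: number, b: number): number {
  return a + b;
}

// Longer, but still pure: local variables only, inputs are not mutated.
function summarize(values: readonly number[]): { count: number; sum: number; max: number } {
  let sum = 0;
  let max = -Infinity; // stays -Infinity for an empty input
  for (const v of values) {
    sum += v;
    if (v > max) max = v;
  }
  return { count: values.length, sum, max };
}
```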


I apply Chesterton's Fence to pretty much everything, from code to design to business goals to politics. Back when I worked for a large tech company, I'd get deeply frustrated with their slow decision-making framework. I'd joined the company through their acquisition of our company and it felt like banging my head against a wall all day.

It was only a year after I quit that it began to make sense. Now I try to assume that people had good reasons for their implementations and I try to figure that out first.


Ok, but seriously I am genuinely curious, what is the reason for the fence in the picture? I see fences like this when I am out in the mountains camping or hiking. Is it to prevent some type of animal from roaming down the roads?


Every fence might be different! Clicking on the picture gives a description of that particular one's history. When I see fences/gates on forest service roads or access roads I presume one reason for them is to limit access. Roads require maintenance, having fewer vehicles driving over one means less maintenance. During winter season a fence could also be there to limit access for the purpose of reducing the need for search & rescue or towing services...


One common thing is for cattle to be allowed to graze on public land, and it keeps them from wandering out of bounds...


With statically typed languages, you can do guaranteed safe, automated refactors. I’ll do those mercilessly without tests. I would be very wary of refactoring code in a dynamically typed language.


I've broken code in Java because I inlined a private method as part of a refactor. I found out later that some other code was accessing it through reflection. Fortunately that other code was an old test, instead of production code, so when it started failing on the test automation servers I found out about it and could update the test to not do that. Then I did what I do frequently anyway, regardless of dynamic or static typing, which is to use ag[0] and increase my confidence there weren't any other surprise references. Of course there's always the possibility of some sadist splitting the method name into two strings and concatenating them later, ag's no guarantee either.

If you've ever done a partial build system you can also break downstream dependencies you didn't know about from even compiling, and not know about it until you try to integrate. (I've done this too, fortunately the integration happens as a gate to checking into the main code branch.) If you at least keep the source of everything locally, even if you don't build it, ag can help again.

My only point here is that static typing isn't enough if you're prone to fear-driven development; you have no guarantees, just things that increase confidence. Static type proofs are but one way to increase confidence. I'm happy Fowler decided to use JS for his second edition, since this particular line of FUD when it comes to there being some impedance mismatch between dynamic languages and refactoring is unmerited. The first auto refactor tools were made for Smalltalk, a dynamic language, after all.

To me the two keys to safely (at high confidence) doing any refactor are to first know what you're doing (and what a tool is doing if you're using one, I'm all for pushing for better tools) and second to do it in small bits with a tight feedback loop. As part of the loop after you make a change you use whatever methods (compiling for static type proofs, running tests, manually testing (REPLs help this a lot), sometimes just pure reason) to become sufficiently confident that you didn't break anything. Sometimes you still break things, as part of a refactor or just as part of regular development -- every bug filed is one that got past all of your reason, your compiler, and your tests, but don't let that fear drive you.

[0] https://github.com/ggreer/the_silver_searcher There are other tools too. Regardless of language, writing "grepable" code promotes a lot of nice qualities, not just easing refactoring.
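The string-concatenation case is easy to reproduce in TypeScript too, even with static types. In this sketch (hypothetical `ReportService`), a text search or rename tool finds the direct call but not the dynamic one, so renaming or inlining `buildSummary` would break only the dynamic call, and only at runtime:

```typescript
// Hypothetical class with a method a refactor might rename or inline.
class ReportService {
  buildSummary(): string {
    return "summary";
  }
}

// Direct call: searching for "buildSummary" finds this reference.
function direct(svc: ReportService): string {
  return svc.buildSummary();
}

// Dynamic lookup: the name is assembled at runtime, so neither grep nor a
// rename refactoring sees "buildSummary" here.
function dynamic(svc: ReportService, verb: string): string {
  const name = verb + "Summary";
  const method = (svc as unknown as Record<string, () => string>)[name];
  // If buildSummary is renamed, `method` is undefined and this call throws a TypeError.
  return method.call(svc);
}
```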


Even if someone were crazy enough to use reflection to get around accessibility for testing (which you should never do), once I saw that, I would be judicious about using “Find all references for a class” - something else you can’t do reliably with dynamic languages.

As far as partial build systems, with at least C# that would still be caught when you do an integration build. But that still begs the question, why wouldn’t you use a proper internal package repo and proper versioning?


This is quite true.

But it must be noted that the code may actually be bad. Or it may be bad due to a thousand valid reasons (time pressure, business changes, etc)

My personal approach is this:

* New code must be "good" (whatever that definition is)

* As you iterate through old code, clean it up.

For example, our own codebase is a several-years-old Node.js project, mostly written in callback style with `async.auto`, and it's really ugly and hard to maintain compared to async/await.

All our new code uses promises and async/await. All the old code uses callbacks. We have to use `promisify` a lot.
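For readers who haven't done this kind of bridging: Node ships `util.promisify` for functions that take an error-first callback. The TypeScript sketch below hand-rolls a one-argument version instead, only to stay self-contained, and wraps a hypothetical callback-style `loadUserLegacy` so new async/await code can call it while the legacy function stays untouched. `async.auto` itself is not shown.

```typescript
// Node-style error-first callback, as used throughout older codebases.
type Callback<T> = (err: Error | null, result?: T) => void;

// Hypothetical legacy function written in callback style.
function loadUserLegacy(id: number, cb: Callback<{ id: number; name: string }>): void {
  if (id <= 0) {
    cb(new Error(`no user ${id}`));
    return;
  }
  cb(null, { id, name: `user${id}` });
}

// Minimal stand-in for util.promisify, limited to one-argument functions.
function promisify1<A, T>(fn: (arg: A, cb: Callback<T>) => void): (arg: A) => Promise<T> {
  return (arg: A) =>
    new Promise<T>((resolve, reject) => {
      fn(arg, (err, result) => {
        if (err) reject(err);
        else resolve(result as T);
      });
    });
}

const loadUser = promisify1(loadUserLegacy);

// New code uses async/await; errors from the callback become rejections.
async function greeting(id: number): Promise<string> {
  try {
    const user = await loadUser(id);
    return `hello, ${user.name}`;
  } catch (e) {
    return `error: ${(e as Error).message}`;
  }
}
```

In a real Node codebase, `promisify(loadUserLegacy)` from the `util` module would replace `promisify1`; wrapping at the boundary like this lets the conversion proceed one module at a time.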

But without any hiccups, we are slowly moving towards a better codebase.

What helps "immensely" is tests. We have 2 different layers of test and I don't think anything would've been possible without them (although none of them are at 100% coverage. Nowhere near)


I go back and forth if the code I'm adding should be what I consider good or if it's more important that it match the flow and feel of what's already there. A code base written in twelve different ways is usually even harder to read and understand than one that's "not good".


Well definitely don’t write bad code intentionally!


Certainly, but the definition of “good” changes over time. At one point, callbacks were good, and now they’re not. What happens when the definition of “good” changes a second time before the code base has been fully converted from callbacks? Well, you shouldn’t write bad code, so now the code has three code styles in it, and, more importantly, two styles of conversions: From callbacks to current-good style and from previous-good style to current good style. This is harder to keep in your head, so conversion now goes more slowly.


I share the same perspective. The value of a code base following one single coding style/convention/architecture is huge, even if there's a better coding approach right around the corner.


Yes. Trying to refactor everything at one go is a disaster waiting to happen. It is always best to do it incrementally.


I don't think that's the right mindset to approach it with. My experience usually is that it is a pile of crap, and I can do better, but I would probably make the same pile of crap if I were under the same conditions as the original coders (time constraints and scope creep, usually).

So change it if you can, but respect those who were there before


I'd also add here that hindsight is 20/20. You don't really know what you're creating until it's created. Do it again a second time when you know exactly what you're getting at the end, alongside the challenges you'll face in the process of such, and you'd be able to do it faster, cleaner, and just overall better.


Intuitively, I often feel the same.

Joel Spolsky argues against that: https://www.joelonsoftware.com/2000/04/06/things-you-should-...

The Mythical Man-Month as well: https://en.wikipedia.org/wiki/Second-system_effect


It's worth mentioning that what both of these are talking about are rewrites that start a new project as an all-or-nothing bet. Iterative rewrites are not subject to these caveats.


I agree with both you and the parent that sometimes rewriting the code is best, and you can usually make a superior solution by utilizing lessons learned from the first version, either from rewriting or iterative improvement.

But I think the mindset to approach a codebase with is to be open to it, rather than dismissive of it. Code is such a stylized thing, and everyone has their own style and loves their own style most of all. It's hard and takes time to understand someone else's code (a vastly underrated skill IMHO). The temptation to find some reason to start over when it isn't exactly necessary is usually quite high.

Once you understand what the current code is doing, then you really have to apply your good skills on what parts to save, and what to change, and how that relates to the schedule and cost.

Also in a corporate setting, there can be political / organizational friction to throwing out code or starting over (and it rarely makes you friends with whoever wrote the first version).


Exactly!

The first version of anything almost certainly ALWAYS sucks. Usually the only way that doesn't happen is if the team developing the solution has prior knowledge of some kind...having built a similar solution before.

I used to always hate consulting for this reason. The company I was working for was always doing greenfield development for clients. We never, ever got to do v2, because the projects always transitioned to clients' teams. I can't remember a single project I was working on that I was happy with.


I'd say that you can make a better design, sure. But the implementation will almost always be worse at least for some time. An old codebase carries tons of bug fixes and workarounds, they may not be elegant, but they make it more stable.

A new reimplementation, albeit cleaner, will almost always be buggier.


I think the book « The Pragmatic Programmer » advises doing refactorings twice: First refactor to see what it’s hitting, then roll back. Once you know what you’re hitting, format it and clean it up, commit, then perform the big refactor, and you have much better chances of it going smoothly.


My #1 rule for existing codebases:

Get it into a state that will support current business goals and roadmap while protecting current business value and users.

That definitely can and often should include refactoring. But. There are many caveats and your rule is absolutely one we should heed. Though I would hate it for the capable, talented, motivated individuals that are part of every informal (or formal) clean up crew to heed it at the wrong times. The times when (a) that's not the way anyone should do it (b) it's actively adding negative value along some dimension and (c) it's preventing the business from reaching the next level.

I think it's a skill in its own right to identify solutions that do and don't need replacing. I think a good rule of thumb is to ask "will this enable some new capability that we haven't been able to do before and that would deliver real value along some dimension to more than just me?". If you don't have a solid yes to that you shouldn't do it. Often I find if you can get an order of magnitude better along one or multiple (preferably multiple) dimensions then it's usually worth the effort.


Dead code, inconsistent naming, CPOLD versioning, god objects, giant functions with christmas-tree like block nesting, SRP violations everywhere... A lot of code written by professional developers is like this and this is objectively bad.

I've been lucky enough to work on several occasions with code that wasn't like this and it was a joy to maintain. Cherry on the cake, that type of code usually comes with some automated tests.
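A minimal TypeScript sketch of one of those smells, "christmas-tree" nesting, next to the same logic written with guard clauses (early returns). The names are hypothetical; the `!(order.items > 0)` check is deliberate so both versions agree even on odd inputs such as `NaN`.

```typescript
interface Order {
  paid: boolean;
  address?: string;
  items: number;
}

// "Christmas tree" shape: every condition adds another level of nesting.
function labelNested(order: Order): string {
  if (order.paid) {
    if (order.address) {
      if (order.items > 0) {
        return `ship ${order.items} item(s) to ${order.address}`;
      } else {
        return "nothing to ship";
      }
    } else {
      return "missing address";
    }
  } else {
    return "awaiting payment";
  }
}

// Same behavior with guard clauses: reject each exceptional case early.
function labelFlat(order: Order): string {
  if (!order.paid) return "awaiting payment";
  if (!order.address) return "missing address";
  if (!(order.items > 0)) return "nothing to ship"; // mirrors the nested `> 0` test exactly
  return `ship ${order.items} item(s) to ${order.address}`;
}
```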


Don't forget the customary "Utils" class which always contains lots of hidden gems like reimplementing 75% of the string manipulation functions provided by the standard library - some being unique implementations, others being a single call to the underlying standard library :)
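A hypothetical `StringUtils` in that style, sketched in TypeScript. Note the trap: the hand-rolled `padLeft` only matches the built-in `String.prototype.padStart` for single-character fills, so swapping one for the other is itself a small Chesterton's Fence.

```typescript
// Hypothetical utility module of the kind described above.
const StringUtils = {
  // Hand-rolled left padding that duplicates String.prototype.padStart,
  // except it prepends the whole fill string and can overshoot `width`.
  padLeft(s: string, width: number, fill: string = " "): string {
    if (fill.length === 0) return s; // avoid an infinite loop; padStart also returns s here
    let out = s;
    while (out.length < width) out = fill + out;
    return out;
  },

  // A thin wrapper that adds nothing over calling the built-in directly.
  toUpper(s: string): string {
    return s.toUpperCase();
  },
};

// Equivalent built-in calls (padStart needs ES2017 or later):
const padded = "7".padStart(3, "0"); // "007", same as StringUtils.padLeft("7", 3, "0")
const diverges = "7".padStart(4, "ab"); // "aba7", while StringUtils.padLeft("7", 4, "ab") gives "abab7"
console.log(padded, diverges);
```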


I think it's beneficial to work within the idioms of an existing codebase to maintain predictability and reduce complexity overall. If you can improve the code by removing complexity, that's a win. Otherwise idiomatic code will stand the test of time in a complex codebase even if it isn't how you would have written it. One bad way of doing something is better than two completely different ways of doing the same thing.


"When you see something that's old and it's been there for a long time and it's working, don't laugh at it."

(From a not-really-related yet quite interesting talk I watched a while back: https://www.youtube.com/watch?v=TQB0KK2rxcw#t=31m0s --- also featured on HN at https://news.ycombinator.com/item?id=18430512)


It's developer nature to look at any code and think that -- especially your own! Or if you prefer the abstraction, perhaps to consider that "myself 3 months ago" is a different person than "myself today".

Every time I've gazed upon a new legacy codebase and said "This looks like junk", the system architect responded "Yes -- that's why you're here!"


I mean, it's pretty normal for a long-existing codebase to have had a lot of poor (in hindsight) design / architecture decisions in it, and to feel that if you got to write it with the benefit of hindsight you'd do a better job. It's only hubris to think you can do it in a week, rather than requiring as much time as the original (but with a better resulting codebase).


I'm doing exactly this now ... the problem is multiple iterations of trying to set an architecture and a pattern to where things go. The architecture presented, if it was to be implemented that way would have been perfectly fine.

Now there are 3 iterations .. with 3 different patterns of what would be a state container, UI business flows split across 3 layers (data access/api, business/decision , view models) and view models (NOT VIEWS) sharing state.

What also doesn't help is that the tooling is close to nonexistent. I got so used to webpack / lib(framework) dev tools / hot reloading / all-in-one dependency management / intellisense.


If your predecessors were indeed total hacks at programming, you can get all of the above. But it's a dubious pleasure. My head sustained a lot of scratching and smacking and my face was over-palmed.


It's not that they were total hacks. It's that they fell into the 1 standard deviation of the normal distribution that made them an average developer. Average devs produce the status quo, which is "software that works but that everyone complains about having to maintain". In my experience people with solid framework design and system architecture skillsets invariably fall into the top tail end of the distribution. They aren't like other developers. And they're rare. They can and often do get dubious pleasure out of making old code new again.


Sometimes a very dubious pleasure. I work at a company where another product team decided to rewrite the entire product from Angular 1 to Angular 2 because it was so much better. I was shocked and dismayed, but didn't say much because I didn't feel like starting a fight I would lose, and it would suck me into a project that wasn't my primary focus.


This.

I see myself in your words.


Sometimes the initial design truly was a mistake.


Yes, and sometimes the people are fully aware of that as well. But even a shittily designed piece of code that has been running in production for years will have ironed out most of its bugs. There might be other designs that would be more maintainable, but refactoring to reach them might be an investment that the company is not willing to make.


"This is a pile of crap. I can do better,"

I'll take this one step further and say: if you think this, you're unqualified for the position. You are an amateur.


Very many developers have this as their first thought whenever they see somebody else's code for the first time. Likely it's just a response to being confused about how or why something works. It's easier to think that somebody else is bad than to think that you've got something to learn.

What matters is how you handle that feeling. If you run with it and continue to operate as though the code is terrible, you're probably not going to get very far as a developer. If instead you take a step back, work to understand the code, and think about what led the original author to make certain decisions, you'll do quite well.


I'll take this one step further and say if you truly think trash fires are acceptable and there exists no developer who can replace a trash fire with something warm and inviting, then _you're_ the amateur.

I get what you mean though. I've seen the dreadful results when some people tried to replace a WinForms app with a WPF app, and the people who did the WPF version totally half-baked their underlying framework. It was very amateurish. In contrast, the existing framework built on WinForms was 90% baked. It simply had a lifecycle the developer had to understand (very difficult, nobody got it right) and was essentially only missing a final inversion of control layer to tie it all together. I added the missing layer, and both the time to implement a new screen and the number of bugs plummeted.

I thought it was crazy that, in 10 years of that code existing and hundreds of people having worked on it, _I_ was the one to spot that and go "hmm, you missed a spot" and have such a dramatic impact with a single week of work.

It still fucks with me to this day that _nobody_ was either capable or willing. That just blows my mind. It stuck out like a sore thumb to me. It seems to be the pattern that's emerging in my career, though. I've come to understand it's a rare thing. So, fundamentally I agree with your advice as it applies to most people. I just know first hand there are some people out there who understand exactly how to breathe new life into old systems and how to prevent new systems from decaying.


I know this opinion is extremely controversial, but sometimes — usually as a result of inexperience — people do put out bad work.

Sometimes it's cheaper for a business to rewrite something (the size of that something is highly context sensitive), than it is to work around a poor approach to a problem.

I know we love to believe in the industry that everyone is a genius and any self-doubt is imposter syndrome, but as Camille Fournier put it: "This is Hallmark card pablum".[0]

[0]: https://twitter.com/skamille/status/1004735128726376448


I dunno, I've seen quite a few piles of crap in my day as a SWE. The bar is not that high to do better.


I have to agree with the comment you're replying to with a caveat. In my (limited) experience balls of mud are created because the developers either didn't understand the problem or its domain well. So if you come into a "pile of crap" and have the same (or less) experience with the problem/domain and (almost definitely) less time, then I really don't think you can do better. Maybe you can do it badly, differently, and think it's better though.

If you have more experience in the problem domain (either business or technical), then you might actually be able to rewrite it much better. But then there are quite a few other problems: Do we have time to rewrite it? Will it integrate well? Are you actually solving the right problem? Is it so different people won't know how to use or maintain it? etc.

I think even if you're rewriting code, doing it in the framework that's already available, and making as much use of it as is reasonable, is the better option.

I don't think it really matters what the solution is, within reason, as long as everyone agrees to follow it.


The point is "doing better" is pointless unless it serves a specific business value. Beauty is not a sufficient reason to refactor a code base in production, because refactoring always carries risk, and that risk should always be offset by some tangible expected reward.


Definitely. "Is it good enough for the business?". If yes, then leave it alone. Try to add new features in a safe way with automated tests.

I once worked a contract where it was expressly forbidden by the dev manager to refactor any code unless it was demonstrably necessary to support a new feature. Not much fun for devs at times, but I respect the reasons behind the decision (esp. as it was a derivatives trading platform, and it can get pretty expensive pretty quickly when things go wrong).


If you always think this you’re probably an amateur, but if you've seen good code that wasn’t yours, then you might have a sense for things. Of course, that requires that you read a lot of code.


I have no idea who downvoted you. I agree completely.

In a business setting all decisions should come from a business analysis, not from an inner desire for beauty.

Beauty is important, but save the search for it for outside business hours.


Sometimes it's true, but you definitely should wait quite a while before you say it. First you need to understand what the current code does and also learn some history of how the code has developed.


I have a similar problem to yours, except it's Java and more like 15 years old...

What helped? Using a debugger and stepping through the code was useful. It's more or less a REST API here (built on top of the system; before that it was SOAP, etc.), and I just picked some heavily used endpoints and stepped through them all the way...

Another huge boost in understanding was using flamegraphs (not sure what's hip for Node.js, maybe this? https://github.com/davidmarkclements/0x)

This was really an eye opener because that app also used a huge external Java ECM and there was lots of AOP magic; reading the flamegraphs and looking at the source was a big boost in understanding.

It's also a really useful tool to get visibility for performance problems that are not directly visible in the code.

If there are tests, reading them might also be worthwhile.

And take your time... it took me a few months to get a basic understanding of how it works (I'm more of a sysadmin, not really a dev there), so don't expect to grasp everything in one week.

Ask your colleagues, e.g. where to find documentation, or when you don't understand something while reading the source.


This is good advice.

I'd add that producing something as part of your process of understanding the code base would be helpful to your colleagues.

If there is hard-to-understand code, figure it out and capture it in some documentation. If some code doesn't look robust, add tests. If you find bugs, log them and write tests. That way, your colleagues also understand the process you're taking during your "grok the code" period, see progress and can jump in to help. Having a top-level understanding of the system and its business functions is important so you can evaluate the code you read relative to them.

Adding type annotations can be useful, so try Facebook's Flow, which gives useful results for starters without your having to write any annotations yourself.


+1 on this. To be productive in a new code base, you need to get to know your way around it. I find stepping through in the debugger to be the best and fastest way to learn my way around.

If you don't have a debugging setup in place, it's well worth taking the time to set it up.

You can't really just start a debugger at the beginning and step all the way through: for a big code base it will take hours to step through, e.g., one entire REST API call, most of it wading through unimportant framework and support code. Some strategies for finding a juicy place to stick a breakpoint:

- pick up a small bug fix task and try to home in on important areas from there

- ask another dev where some of the "main" parts of the code are; if they've been there for any time at all, they will know where to point you

- look for files with heavy commit activity over time. Don't limit your search to recent commits: often, core code becomes stable and less frequently changed, but still gives you the best picture of how the whole application works

- use performance profiling / flame charts to figure out where the most CPU time is being spent. As a bonus, this functionality is often included in your debugger setup
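For the "heavy commit activity" tip, a minimal TypeScript sketch: it assumes you have captured the text printed by `git log --name-only --format=` (only the paths each commit touched, with blank lines between commits) and counts how often each path appears. The function names `countChanges` and `topChanged` are made up for illustration, and actually running `git` is left out to keep the sketch dependency-free.

```typescript
// Count how many commits touched each file, given the raw text printed by
// `git log --name-only --format=` (one path per line, blank lines between commits).
function countChanges(gitLogOutput: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const rawLine of gitLogOutput.split("\n")) {
    const filePath = rawLine.trim(); // trim also drops a trailing "\r"
    if (filePath === "") continue; // separator between commits
    counts.set(filePath, (counts.get(filePath) ?? 0) + 1);
  }
  return counts;
}

// The `limit` most frequently changed files, most-changed first;
// ties are broken alphabetically so the output is stable.
function topChanged(counts: Map<string, number>, limit: number): Array<[string, number]> {
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Example input: three commits' worth of paths.
const sample = [
  "src/orders/service.js",
  "src/orders/routes.js",
  "",
  "src/orders/service.js",
  "",
  "src/users/model.js",
  "src/orders/service.js",
].join("\n");

console.log(topChanged(countChanges(sample), 2));
```

The files at the top of that list are reasonable first candidates for a breakpoint, since the team keeps coming back to them.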

Once you find some "main" areas of the code, take some time to step through individual lines, and step out to better understand the call stack that led there. This will get you up to speed way faster than trying to read through code, documentation, and even unit tests IMO.


I've been using flamebearer to get flame graphs from node. It was silly how easy it was to use. Highly recommend it.

https://github.com/mapbox/flamebearer


It depends. Number one, find out if the codebase is bad or just big. This will take a few months, so I try to keep my mouth shut for a while.

If it's really that bad, build a world in a teacup. Try to make one small new area of code that's nice and slowly work existing code into it whenever you get the excuse. It's very unlikely they'll allow you to rewrite or even make substantial changes to existing code. If it was allowed, somebody would have done it.

It's also unlikely you'll ever have the codebase migrated completely. In my case, this meant migrating part of the app to a new web framework while keeping the ORM layer relatively the same. Focus on the worst parts. Kinda bad stuff can wait. Expect to write some glue between the worlds on your own time.
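A minimal sketch of that "glue between the worlds", assuming old and new code can be given a common handler signature: a thin routing layer sends migrated paths to the new module and everything else to the legacy code, so the teacup grows one route at a time. All names (`legacyHandle`, `newHandlers`, `route`, the `/invoices` path) are hypothetical; in a real Node app this would more likely live in your web framework's router.

```typescript
// A request/response shape both worlds can agree on (deliberately minimal).
type AppRequest = { path: string; body?: unknown };
type AppResponse = { status: number; body: unknown };
type Handler = (req: AppRequest) => AppResponse;

// Stand-in for the existing code path (the "old world").
const legacyHandle: Handler = (req: AppRequest) => ({ status: 200, body: `legacy:${req.path}` });

// The "teacup": routes already migrated into the new, cleaner module.
// A Map (not a plain object) avoids accidental hits on keys like "constructor".
const newHandlers = new Map<string, Handler>([
  ["/invoices", (req: AppRequest) => ({ status: 200, body: `new:${req.path}` })],
]);

// Glue: migrated paths go to the new code, everything else falls through to
// the old code. Moving another route over is a one-line change to `newHandlers`.
function route(req: AppRequest): AppResponse {
  const handler = newHandlers.get(req.path) ?? legacyHandle;
  return handler(req);
}

console.log(route({ path: "/invoices" }).body, route({ path: "/orders" }).body);
```

Keeping the fallback explicit means nothing breaks for routes nobody has touched yet.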

In your case, IMO Node is a bad platform for large code bases. My approach would be to introduce TypeScript into a small corner of the app and grow it over time. Even in the existing code, TypeScript will type-check the JS and make work/refactoring easier.

Once you have TypeScript up and going, pull in some add-ons to make Node work with async code. The biggest downsides of Node are dynamic typing and callback hell. TypeScript + async + heavy linting so this doesn't happen again should put you on a good path, unless there are more demons lurking in their stack.
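One way to start with TypeScript in "a small corner" is to put a checked boundary between new typed code and untyped legacy code: treat whatever the legacy module returns as `unknown` and narrow it with a runtime type guard. This is a sketch under assumptions: `legacyFindUser` stands in for a plain JS module that TypeScript sees as `any`, and `User` is a hypothetical shape. (For type-checking existing .js files directly, TypeScript's `allowJs` and `checkJs` compiler options are the usual route.)

```typescript
// What the new, typed corner of the app expects.
interface User {
  id: number;
  email: string;
}

// Stand-in for an untyped legacy module; in a real codebase this would be
// plain JS imported without type information, so TypeScript sees `any`.
function legacyFindUser(id: number): any {
  return id === 1 ? { id: 1, email: "a@example.com" } : { id, mail: "typo" };
}

// Runtime check that narrows `unknown` to `User`; after it passes,
// the compiler treats the value as a `User`.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { id?: unknown; email?: unknown };
  return typeof v.id === "number" && typeof v.email === "string";
}

// New code only ever sees a `User`, or an explicit error at the boundary.
function findUser(id: number): User {
  const raw: unknown = legacyFindUser(id);
  if (!isUser(raw)) {
    throw new Error(`legacy data for user ${id} has unexpected shape`);
  }
  return raw;
}

console.log(findUser(1).email);
```

Bad legacy data then fails loudly at one known place instead of surfacing as `undefined` deep inside the new code.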


> If it was allowed, somebody would have done it.

People are often scared, don’t care or don’t know how to. I have found it best to include a batch of refactoring each time I touched something during development. (Because adding features or fixing bugs requires understanding the part of the code, so you may as well improve it since you already have it in your head.) This makes everything take more time, but that has to be expected when dealing with a lot of technical debt.

A huge +1 for TypeScript. I usually work in a well-typed language (Swift), but have been recently working in JavaScript and it was amazing how much TypeScript improves the situation. I can’t imagine working on a serious code base without types. (Some people can.)


> Once you have typescript up and going, pull in some add-ons to make Node work with async code.

Maybe I am confused about the intent of your comment, but NodeJS works with async code all by itself as long as you use a recent version, such as v10+.

Callbacks are indeed hell, but you can avoid them by using Promises, Events, and Async/Await all without having to "pull in some add-ons". Furthermore if you use an event queue like Kafka or RabbitMQ and avoid shared global state it's possible to scale NodeJS horizontally quite nicely.

My comment here doesn't seek to denigrate TypeScript, nor defend NodeJS's limitations for building large systems with large teams, but to highlight that asynchronous code execution is quite a good fit for NodeJS when managed well. It is true that using observer patterns like RxJS can improve that even further, but they aren't required to achieve sanity.
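To illustrate that callbacks can be avoided without add-ons: wrap a callback-style function in a Promise once, then consume it with async/await. Node's built-in `util.promisify` does this for standard error-first callbacks; the hand-written `readConfig` wrapper below only shows the mechanics, and `readConfigCb` is a hypothetical legacy function.

```typescript
// Node-style callback: error first, result second.
type NodeCallback<T> = (err: Error | null, result?: T) => void;

// Hypothetical legacy API written in callback style.
function readConfigCb(name: string, cb: NodeCallback<string>): void {
  // Defer the callback so it behaves asynchronously, like real I/O would.
  Promise.resolve().then(() => {
    if (name === "") cb(new Error("empty config name"));
    else cb(null, `config:${name}`);
  });
}

// Wrap it once; callers never see the callback again.
function readConfig(name: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    readConfigCb(name, (err, result) => {
      if (err) reject(err);
      else resolve(result as string);
    });
  });
}

// Sequential steps read top to bottom instead of nesting callbacks.
async function loadAll(names: string[]): Promise<string[]> {
  const results: string[] = [];
  for (const name of names) {
    results.push(await readConfig(name));
  }
  return results;
}

loadAll(["db", "cache"]).then((configs) => console.log(configs));
```

Errors passed to the callback become rejections, so a plain `try`/`catch` around `await` handles them.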


You haven't said much about your role here, but let's assume that programming figures prominently. Your immediate problem is that you will be given tasks that involve making changes to a codebase that you don't understand very well. There are many different ways to understand a codebase, and to be effective, you will need to learn some of all of them.

Firstly, there is the purpose of the system, which means getting to know what its users want from it and how they use it to achieve that. I put that first, because everything else follows from it, but that does not mean that you have to know all there is to know about that aspect before tackling anything else.

You will need to learn about your environment's process: whatever is used for task assignment, scheduling and tracking; version control; building; testing and verification, inspections, test setup and execution; integration and deployment. Of these, you will need to know how to get the source code and test any changes you make, before you can do any programming, and a significant landmark in getting to know the system is when, given nothing but a backup of the source and configuration files, you could resurrect it.

The more you know about the architecture, the better - it is the first step in understanding how the system meets (or fails to meet) the users' needs. The architecture often imposes requirements and constraints on how you approach completing a given task.

Understanding how everything works at the code level would be a desirable goal, but not one that can be achieved quickly (if at all), so you will have to be guided by what you need to understand in order to do your assigned tasks.

It is also useful to know who knows what among the people you will be working with.


I recommend the book Working Effectively with Legacy Code by Michael Feathers. I've been reading it recently, and I've enjoyed the lessons and advice so far.


Second that. But, much more concisely, I would suggest the same answer as to the question "how do you eat an elephant?"...


Based on my years of experience working with a codebase of millions of LOC developed over decades:

Be aware of the abstraction fallacy: developers are often guided by the insane notion that they can get rid of accidental complexity by wrapping it away behind an abstraction layer. You can't. It's better to accept the complexities of the existing system, and prefer explicit, procedural copy-and-pasting to inventing your own abstraction layer on top.

The problem with an after-the-fact abstraction layer is that if the original team members are not available, you are likely not in possession of the whole theory of the software. Hence you are unlikely to choose the right abstractions at the beginning.

The correct way - if possible - to simplify existing code is to refactor the code itself.

The specific anti-pattern you will reach by following the false abstraction strategy is the lasagna architecture: https://herbertograca.com/2017/08/03/layered-architecture/

Two literary works that helped me enormously to grok working with legacy code and programming in particular:

* Peter Naur's paper "Programming as theory building" - this was an amazing eye opener to me. It specifically highlights several problems that may arise when working with legacy code when the original developers have left the building.

* Michael Feathers: Working Effectively With Legacy Code - not to be read necessarily as a "how to" recipe book, but rather as a collection of philosophies and techniques to utilize when faced with a huge in-production codebase. It can be read as a recipe book if the examples match your situation, but that's not the point.


A large codebase under active development presents a moving target; even if you knew how something worked last week, that code might have changed twice since then. Detailed knowledge in the solution domain gets outdated fast.

To address this issue, I work with something I call behavioral code analysis. In a behavioral code analysis, you prioritize the code based on its relative importance and the likelihood that you will have to work with it and, hence, need to understand that part. Behavioral code analysis is based on data from how the organization works with the code, and I use version-control data (e.g. Git) as the primary data source. More specifically, I look to identify hotspots. A hotspot is complicated code that the organization has to work with often. So it's a combination of static properties of the code (complexity, dependencies, abstraction levels, etc.) and, more importantly, a temporal dimension like change frequency (how often do you need to modify the code?) and evolutionary trends.
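A rough sketch of the hotspot idea, not CodeScene's actual algorithm: take a static property (lines of code as a crude complexity proxy) and change frequency from version control, normalize each against the largest value in the codebase, and rank files by the product. The `FileStats` shape, the scoring formula, and the sample numbers are assumptions for illustration.

```typescript
// Per-file inputs: how often it changed (e.g. commits in the last year)
// and a complexity proxy (lines of code here; any static metric would do).
interface FileStats {
  path: string;
  changes: number;
  complexity: number;
}

// Score = normalized change frequency * normalized complexity, in [0, 1].
// Files that are both complex and frequently modified float to the top.
function rankHotspots(files: FileStats[]): Array<{ path: string; score: number }> {
  // reduce instead of Math.max(...spread) so huge file lists can't overflow the stack
  const maxChanges = files.reduce((m, f) => Math.max(m, f.changes), 1);
  const maxComplexity = files.reduce((m, f) => Math.max(m, f.complexity), 1);
  return files
    .map((f) => ({
      path: f.path,
      score: (f.changes / maxChanges) * (f.complexity / maxComplexity),
    }))
    .sort((a, b) => b.score - a.score);
}

const stats: FileStats[] = [
  { path: "src/billing/invoice.js", changes: 40, complexity: 1200 },
  { path: "src/util/strings.js", changes: 45, complexity: 80 },
  { path: "src/legacy/report.js", changes: 2, complexity: 3000 },
];

console.log(rankHotspots(stats).map((h) => h.path));
```

In the sample, the large but rarely touched `report.js` and the busy but trivial `strings.js` both rank below `invoice.js`; neither metric alone would have singled it out.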

I have found that identifying and visualizing hotspots speeds up my on-boarding time significantly, as I can focus my learning on the parts of the code that are likely to be central to the solution. In addition, a hotspot visualization provides a mental map that makes it easier to fit the codebase into your head.

There are a set of public examples and showcases based on the CodeScene tool here: https://codescene.io/showcase

I have an article that explains hotspots and behavioral code analysis in more depth here: https://empear.com/blog/prioritize-technical-debt/

I also have a book, Software Design X-Rays: Fix Technical Debt with Behavioral Code Analysis, that goes into more details and use cases that you might find useful for working with large codebases: https://pragprog.com/book/atevol/software-design-x-rays


Nice! I'm working on internal tooling for us that does a lot of the same things - gonna buy the book, thanks for that, weird I've never heard about it. For now I'm measuring: churn, complexity, linting, test coverage, test quality and am going to add a dependency graph. It seems to me that churn, complexity and dependencies are the biggest indicators of a hotspot.

Got any tips for possible problems I'll encounter along the way?


Cool, thanks! While the measures are simple in theory, there are some practical challenges: git repositories tend to be messy. So part of the practical challenge is to clean the input data (e.g. filter out auto-generated content and checked-in third-party libraries).

Another challenge is that version-control data is quite file centric, while many actionable insights are on a higher architectural level. In CodeScene we solve this by aggregating files into logical components that can then be presented and scored.
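Both practical points can be sketched together: drop paths that don't represent hand-maintained code before counting, then roll file-level change counts up into logical components. The exclusion patterns and the prefix-to-component mapping below are hypothetical; in practice both would be configured per repository.

```typescript
// Paths that inflate change counts without being code anyone maintains by hand.
const EXCLUDE: RegExp[] = [/^node_modules\//, /^vendor\//, /\.min\.js$/, /(^|\/)generated\//];

// Hypothetical mapping from path prefixes to architectural components.
// Order matters: the first matching prefix wins, so put specific ones first.
const COMPONENTS: Array<[string, string]> = [
  ["src/billing/", "Billing"],
  ["src/users/", "Users"],
  ["src/", "Core"], // fallback for the rest of src/
];

function componentOf(path: string): string {
  for (const [prefix, component] of COMPONENTS) {
    if (path.startsWith(prefix)) return component;
  }
  return "Other";
}

// Sum per-file change counts into per-component totals, skipping excluded paths.
function changesByComponent(fileChanges: Record<string, number>): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const path of Object.keys(fileChanges)) {
    if (EXCLUDE.some((re) => re.test(path))) continue;
    const component = componentOf(path);
    totals[component] = (totals[component] ?? 0) + (fileChanges[path] ?? 0);
  }
  return totals;
}

console.log(changesByComponent({ "src/billing/tax.js": 5, "vendor/lib.js": 50 }));
```

Scoring at the component level also gives architects a view that matches how they talk about the system, rather than one file at a time.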


I'm already onto data cleanup, tbh I'm focusing on Android repos for now. But great idea, now that I think of it as an architect I'd mostly like the option to group a few classes/packages or let's call them "modules" together and then visually see which pieces are too dependent on outside sources and which are the hotspots/connections/dependencies inside that group.

There goes my weekend...


What is nice about this approach is that it gives you an instant map of the codebase (where the important parts are, etc.) that you really just can't get any other way.


The things I do when starting on a new codebase:

1) Ask if there is onboarding documentation, or someone who can give you a high-level overview of the codebase. Typically, finding a person with a lot of context on the code is the fastest and most thorough way to understand the responsibilities and layout of a codebase. Ask if they can draw an ER diagram; it's extremely valuable documentation for any additional developers.

2) Read all the documentation possible, especially design documentation. This should hopefully give you some clues as to both function (what) and purpose (why). The discussion around this will also introduce you to the major players in the architecture of the codebase.

Note this does not necessarily mean formalized design docs; it could just be searching for any READMEs or relevant wiki pages. You're just gathering threads at this point, and documentation tends to be a lot more compact and easily digestible than foreign code.

3) Look at the models - there will be compact representations of data at some point. This gives good insight into the shared language of the code and can give a lot of clues about how things are done. They also tend to be a lot more human-readable than other pieces of code, so this is a plus.

4) Find and skim the largest files. Typically these perform the majority of the work, have the most responsibility, and introduce the most bugs. Knowing roughly where the major players are and what they do makes it a lot easier to read any individual file.

5) Run the application, find some small behavior (a single, simple endpoint) and debug it, stepping through the application code so you can see how a particular request flows through the system. This can show you how a lot of different concerns within the code are tied together and also ensures that you're set up for both running and debugging the codebase.

At this point you should have a fairly solid understanding of at least the most critical points of the codebase, and also be set up to run and debug it. You should also have at least one or two points of contact to ask questions. This gives you a good framework for figuring out how to modify the codebase moving forward.


Some of my rules for a big legacy code base:

- don't plan or do a full rewrite - it'll almost never work

- learn and use tools to automate the build system and quality assurance (jenkins, sonarqube, docker, git, etc.)

- take the time to improve your skills and the skills of your team (coding dojos, experiments)

- write automated tests (unit, integration, acceptance) for existing code wherever possible

- write at least unit tests and integration tests for new code

- do refactoring and first focus on cross cutting concerns (APIs, translations, caching, logging, database, etc.)

- migrate things to well-tested, isolated APIs (e.g. use REST / GraphQL APIs with new endpoints in the frontend and try not to use untested code for these APIs)

- don't be too backwards compatible (move fast and break things)
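For the "write automated tests for existing code" item, a common starting point is a characterization test: record what the code does today, including outputs that look wrong, and assert on exactly that, so a refactoring can't change behavior unnoticed. A minimal framework-free sketch; `legacyFormatPrice` and its quirks are hypothetical.

```typescript
// Hypothetical legacy function we don't fully understand yet.
function legacyFormatPrice(cents: number): string {
  if (cents < 0) return "0.00"; // surprising, but it's what production does today
  return (Math.round(cents) / 100).toFixed(2);
}

// Inputs paired with the outputs observed from the current code. These record
// behavior, not intent: a "wrong" looking value stays until someone confirms it.
const observed: Array<[number, string]> = [
  [0, "0.00"],
  [1999, "19.99"],
  [5, "0.05"],
  [-250, "0.00"],
  [10.6, "0.11"],
];

// Run every recorded case and report all mismatches instead of stopping at the first.
function characterize(fn: (n: number) => string, cases: Array<[number, string]>): string[] {
  const failures: string[] = [];
  for (const [input, expected] of cases) {
    const actual = fn(input);
    if (actual !== expected) {
      failures.push(`input ${input}: expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}

console.log(characterize(legacyFormatPrice, observed));
```

Once these pass, any refactoring that changes the observed output shows up as a named failure, and you can decide deliberately whether that change is a fix or a regression.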

Hope it helps ;)


These text boxes are not great...even on my 4k monitor I have to click the scroller at the bottom to see 1/2 of your longest bullet points; must be even worse on mobile. Better to just write it in plain text instead of a box.


Thx for the hint... edit done :-)


A lot of people are giving advice on changing/testing/refactoring etc. I'm not sure if that's what you are asking, as opposed to how to learn a new code base.

My technique for learning code bases, other than asking other devs questions like:

"What architectural patterns are you using?" "How do you deal with testing?" "How is it deployed?" "how do you deal with data access?" "How do you deal with data migration" "how do you deal with scaling / concurrency / security / authentication" etc... broad stroke things.

If there are no other devs...look at the code and look around at some of these broadstroke things.

One thing you may find is they are using an architectural pattern you aren't familiar with, so be on the lookout for things that look odd but look very deliberate and duckduckgo to see if you can find any information around names in the code. Like if you saw FooActor BarActor, google for Actor / software etc.

Another thing to keep in mind when you find odd stuff is that quite a lot of devs copy paste stuff from the internet, and implement partial ideas they have learnt about, so duckduckgo for snippets you suspect.

Then the main thing I like to do is take a use case from the software and follow it right through every layer, either through static inspection or using a debugger. I then follow up on things that initially seem confusing. I then start sketching out a bit of an architectural diagram (throwaway).

For Embedded systems code I tend to start from boot and draw out a bit of a flow diagram of what's happening.


Do they have good test coverage? That's key. If they don't, start with that.


One thing to consider: large code bases are large. To get any kind of useful test coverage it may take you a very long time. While you are doing that, you won't be visibly making any progress as seen from other parts of the company. I often describe it as a kind of horizon -- you can do what you want until you get to a time horizon. After that, the rest of the company feels like they have lost sight of you. This causes them to panic and they will usually alter your priorities dramatically. My rule of thumb is that your time horizon is about 2 weeks. If you don't make visible progress in that time frame, your project takes on more and more risk as you sail away.

Instead, try to find as much low hanging fruit as you can find. Fix bugs. Add small features. Address nagging complaints that have been around for a long time. At the same time, start filling out your test framework (and also beef up your build process). Keep pairing refactoring and janitorial tasks with "work that pays the bills".

One last thing. Try to identify areas where your users/paying customers/stakeholders think, "This shouldn't be hard" and cross-reference that with areas that are hard due to code complexity. Make sure to pick some of that work so that you can make the tests in that area robust. Once that's done, start refactoring so that you can do those requests very quickly. Once you do that, don't forget to advertise your success! "Remember how much of a pain X used to be? Now we can do it quickly because we worked hard on fixing the problems". This will give you much-needed political capital that you can spend on improving other, trickier areas of the code.


This guy got it!

Intertwining refactoring work with quick wins that fix visible issues is key. Do not blow all your load in the first timeframe by just crunching away on all quickly fixable issues - even if that would probably cast you in a light as bright as the sun, resist the temptation, and just fix as much visible stuff as you need to justify your position and to please the Excel number-juggler crowd. You should project an image of being productive, but not overly productive - while you are in reality actually being overly productive, it's just that you put the remaining time into refactoring, increasing test coverage, improving build/dev infrastructure, all that stuff that people don't grasp the value of.

If you keep that up for some time, you should get the system to a point at which your "hidden" investments start to deliver actual, visible value. Maybe build time goes down, people can iterate faster, maybe your deliveries get better in quality because your tests catch more bugs earlier...whatever, at some point in time you can come out of the dark and start talking about some of the improvements you already did, even actively advertise their value. This is just as crucial as the earlier "shadow phase" - if you want constant improvements and refactorings to be done all the time in a sustainable way (especially long after you've left the project) you need to change the attitude towards such "janitorial tasks", you need to showcase their value. Ideally, you (and others in the project) will be able to actually get time allotted for some refactorings and you won't have to do those in the dark.


Developing good test coverage after the fact can be very expensive, but of course you can prioritize the test development by bang-for-the-buck, and also do test development on-demand.

One of my consulting tasks involved replacing a PostgreSQL interface layer in a large and complex legacy bespoke system, which had a custom object-to-relational mapping layer atop it (with versioning and other features), and a meta layer atop that, with a decade of technical development atop that, and the original architect was no longer available. This was not something anyone wanted to disturb, but it had to be done.

There were also some complications: many SCM branches of the code were being maintained in parallel and couldn't be updated frequently, and there was only a very small team, so we couldn't practically do a one-time refactoring of the entire code base.

I ended up creating a fairly exhaustive regression test suite for the lower-level database layer, and then using that new test suite (as well as the understanding gained while developing it), to create a drop-in shim over the new PG interface.

This turned out to be a great success, thanks to judicious investment in a retroactive regression test suite. And the system, despite having a lot of old code and complexity, and being bespoke rather than leveraging the latest off-the-shelf framework, was actually the first on AWS to receive a particular certification.
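The technique described here, pinning down the old layer's behavior with a regression suite and then validating a drop-in replacement against that same suite, can be sketched as one test function that accepts any implementation of a shared interface. `KeyValueStore` and both implementations are hypothetical stand-ins for a database layer, not the system from this comment.

```typescript
// The contract the rest of the system relies on.
interface KeyValueStore {
  put(key: string, value: string): void;
  get(key: string): string | undefined;
  keys(): string[];
}

// Old implementation (stand-in for the legacy interface layer).
class LegacyStore implements KeyValueStore {
  private rows: Array<[string, string]> = [];
  put(key: string, value: string): void {
    const row = this.rows.find((r) => r[0] === key);
    if (row) row[1] = value;
    else this.rows.push([key, value]);
  }
  get(key: string): string | undefined {
    const row = this.rows.find((r) => r[0] === key);
    return row ? row[1] : undefined;
  }
  keys(): string[] {
    return this.rows.map((r) => r[0]).sort();
  }
}

// New implementation behind the same interface: the "drop-in shim".
class ShimStore implements KeyValueStore {
  private map = new Map<string, string>();
  put(key: string, value: string): void { this.map.set(key, value); }
  get(key: string): string | undefined { return this.map.get(key); }
  keys(): string[] { return Array.from(this.map.keys()).sort(); }
}

// One regression suite, written against the old layer first, reused unchanged
// to check the replacement. Returns the names of any failed checks.
function regressionSuite(make: () => KeyValueStore): string[] {
  const failures: string[] = [];
  const check = (ok: boolean, name: string) => { if (!ok) failures.push(name); };

  const s = make();
  check(s.get("missing") === undefined, "missing key is undefined");
  s.put("b", "1");
  s.put("a", "2");
  check(s.get("a") === "2", "get returns stored value");
  s.put("a", "3");
  check(s.get("a") === "3", "put overwrites");
  check(s.keys().join(",") === "a,b", "keys are sorted and unique");
  return failures;
}

console.log(regressionSuite(() => new LegacyStore()), regressionSuite(() => new ShimStore()));
```

The suite is only as good as its coverage of the old behavior, which is why it pays to write it against the legacy layer first and only then point it at the replacement.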


I've discovered that this advice doesn't work for me because I really prefer to see the software being used in a real world scenario than in tests. This is also why I hate writing tests for something I didn't develop.

So instead, I usually try to set up the dev stack and walk through multiple use cases of the product, usually narrowed down to the area I would be working on. I make sure my stack is set up so that I can iterate really, really fast (I have a few quirks that I tend to use, like weird aliases etc.), and all this while also working on small/minor bug fixes. I do this for at least 3-4 weeks, and then I get a feel for the code and where things are. After that, I personally feel confident enough to take on bigger projects on that codebase.

Has worked out well for 3 large codebases at 3 different companies so far.


God no, I've seen two code bases where this happened and each time the tests were rubbish because they didn't understand the domain.

Plus the clients both rubbished those developers because they didn't get anything done, so that's a surefire way to get a bad rep as a freelancer.


I totally agree with this, and writing a test suite also happens to be a great exercise in learning a new legacy code base.

I’d also suggest following this up with a solid monitoring system so you can be confident any exceptions / performance regressions are caught.

Confidence in newly deployed code and reduced time to find faults are among the most important aspects of development productivity.


...and if they do, take advantage of it - you can learn a lot from what the tests do.


I’d say the most important thing is to learn the domain and the business you are working with. Never assume the code is doing things the right way for the business. Get to know your client really well and try to understand what they need the software to do. Keep them in the loop as much as possible.


This should be the course of action before diving into the technical aspects. Look for any up-to-date business documentation. If there isn't any, ask the system's users for use cases and replicate them in a dev system. Try to document the business flow and emulate any hidden behaviors. Finally, open the codebase, match the business processes to the code, and comment everything you can.

Then you can begin fixing or refactoring with use cases / test cases in hand. If there isn't time for that, try to shift the responsibility to the one giving you the task (PM).


This is super specific to each project, but here are things that worked for me in previous projects.

Two assumptions: You plan to work on this longer-term (not a 1-month project stint) and there are things worth improving (e.g. a barely used legacy app might not be worth your time)

#1 Get the team on board

if there are multiple people you need their buy-in and support for whatever approaches you want to do

#2 Plan for "health by a thousand small improvements"

it will be an iterative approach and you will refactor as you go.

#3 Don't assume different = bad

people might have done things differently; consider using their approaches. you might do it differently, but it's better to keep consistency within the codebase. in codebase management, consistency trumps cleverness

#4 Create space

Consider introducing a fix-it friday where everyone can work on little improvements

#5 Create non-blame culture

Stuff will break if people take risks to improve things. Avoid shifting the blame onto them. If bug trackers ping individual people, consider pinging the whole team instead.

#6 Consider automation

introduce linters, autoformatting, codemods, danger.js, code complexity analysis, etc.

#7 Introduce tests

This one is the most annoying. But worth doing: whenever you improve a feature a bit try adding a test - often in legacy apps there are no good tests. A lot of people recommend writing a test suite for the whole app before you do anything. If you are lucky enough to do this try it. I always found the iterative approach more realistic as you can also do feature work while refactoring.

When writing tests, focus on integration (vertical/functional/etc.) tests rather than unit tests (unless the "unit" contains critical or complex logic). Your goal is to know "that you broke something"; you can get by without always knowing "what you broke".

#8 Acknowledge tech debt

not everything needs refactoring. If it's not critical and nobody needs to touch it consider acknowledging it as tech debt. Add larger notes above the problematic areas and explain why you aren't refactoring it, explain things worth knowing to understand the code better, etc. Whenever you leave comments remember that comments should explain "why" not "what" the code does.

hope that helps! good luck.


Disclaimer: I don't know anything about managing a team, this advice is more for a solo developer or a developer joining a team.

#2 is what I use. After one or two hundred tiny commits (even if some just fix grammar in comments) you'll feel more at home. And because they're small, you can get there within a few weeks.

This requires some way of telling that things still work. Sometimes you can tell because the transformation is an identity given the language definition, but for other cases I'd encourage #7. Having someone more senior (with respect to the codebase) review changes helps (this is what I mostly used), but I understand you don't have that option.


> "code that doesn't get touched dies" - so you want to "touch up" code as often as possible and get into a habit of small improvements.

I've seen many cases where this is far from true. Tightly and well-written back end code in a well-designed system can run for years - even decades - hardly being touched.

User-facing UI code less so of course.


I agree. That's one of the most ridiculous coding tips I've ever heard.


I disagree here, but I removed it to keep the main message simple.


Your #3 seems to contradict nearly all the others. Most of the rest seem to be about assuming things need to change (different == bad).


The others are largely about how to successfully manage the changes you determine are necessary, as I see it, not about assuming things need to change.

If nothing needs to change, it's easy, you just try to look busy and collect your paycheck until you find a position with actual work (because eventually people will notice you aren't needed.)


Yes - basically saying: just because previous workers did things differently than you would doesn't mean it's wrong

I updated the comment to explain this (hopefully) better.


First, get your tooling set up, especially a code search tool with go-to-definition and find references. A good code search tool will make you much faster and better at understanding code, finding correct usages, debugging problems, etc.

It also makes it easy to get a URL to any line/region in a code file to paste into email/Slack to ask/answer questions. (Of course, GitHub has URLs, too, but you probably aren't browsing code on GitHub already because it lacks code navigation/intelligence features, so getting the GitHub URL would add an extra clunky step.)

Here is a study of Google's internal code search tool with some example use cases and interesting stats: https://research.google.com/pubs/archive/43835.pdf. Most(?) engineers at companies with large codebases use code search frequently if they've ever tried a good code search tool (i.e., it's hard to give it up once you've used it).

(Disclaimer: I work on a tool that does this, but I'm omitting the name/URL because the advice is general.)


Lots of good comments here, but I didn't see a code search tool recommended while skimming.

I use this one: https://github.com/ggreer/the_silver_searcher

But the important thing is to be comfortable popping open the console and using it. Makes it so much easier to "research" a particular part of code quickly.


I just linked ag too, then decided to skim the comments again since it's grown to >100 and saw yours... Someone did mention OpenGrok, my company supports it but I find it less useful than ag because it's on the web and (due to company policy) gated behind another layer of authentication despite my browser being SSO'd... For monstrous code bases it's also prudent to ag on an SSD or at least make sure there's enough RAM to have most files in cache.

Another simple CLI tool I like to use is tree. (https://linux.die.net/man/1/tree) Seeing the full project layout, with everything expanded, is occasionally extremely useful.


Reading Refactoring by Martin Fowler helped me a lot. The examples are in Java, but many of the concepts apply across all languages. However, I would say the examples make the most sense for statically typed languages. I wonder if anyone knows of a book that covers the concepts in Refactoring but with examples in a dynamically typed language like JavaScript?


I haven't read it, but the second edition of that book used Javascript for the examples.


The second edition, which was just recently published, has all examples in JavaScript.


Read a lot of code!

Spend time reading code in the codebase even if it doesn’t seem to make sense, even if you don’t think you’ll need to know that part of the code. Keep reading until it starts to make sense.

My trick has always been to just read. I even avoid tools that automatically navigate the code because I like to just read it until I know where things are.


Apart from what others have mentioned, it may help to run some stats on the Git (assuming it's Git) repos/code. The most frequently changed files are likely the important ones. Modules/classes with the most test coverage are likely important too. Also look out for God classes (http://wiki.c2.com/?GodClass).
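One way to get the "most changed files" statistic, assuming the repo is Git: `git log --pretty=format: --name-only` prints the files touched by each commit, with blank lines between commits. The sketch below counts them; the parsing is kept in a pure function so it can be tried on sample text. Note that renamed files are counted under each name separately.

```javascript
const { execFileSync } = require("child_process");

// Count how many commits touched each file, given the output of
// `git log --pretty=format: --name-only`. Returns [file, count] pairs,
// most frequently changed first.
function countChanges(logText) {
  const counts = new Map();
  for (const line of logText.split("\n")) {
    const file = line.trim();
    if (file === "") continue; // blank lines separate commits
    counts.set(file, (counts.get(file) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Run against the repository in the current directory (requires git on PATH).
function mostChangedFiles(limit = 20) {
  const log = execFileSync("git", ["log", "--pretty=format:", "--name-only"], {
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024, // raise for very large histories
  });
  return countChanges(log).slice(0, limit);
}

module.exports = { countChanges, mostChangedFiles };
```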

There is no substitute for talking to people, though.


I can recommend trying out Empear’s CodeScene tool - it takes source control analysis to a whole new level and highlights many issues.

For example it will help you figure out relationships like when I add another widget here I should remember to add new switch/case clauses there and there and there.
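CodeScene's own analysis is more sophisticated, but the basic idea behind this (change coupling: files that tend to change in the same commits) can be approximated by hand. A sketch, assuming you already have the list of files touched by each commit (for example parsed from `git log --name-only`); the `minShared` and `maxFiles` defaults are arbitrary.

```javascript
// Count how often each pair of files changed in the same commit.
// `commits` is an array of arrays of file paths, one array per commit.
// Very large commits (mass reformatting, vendored code) are skipped,
// since they would make every file look coupled to every other file.
function changeCoupling(commits, { minShared = 2, maxFiles = 50 } = {}) {
  const pairs = new Map();
  for (const files of commits) {
    // Deduplicate and sort so each pair gets one stable key.
    const unique = [...new Set(files)].sort();
    if (unique.length > maxFiles) continue;
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]} <-> ${unique[j]}`;
        pairs.set(key, (pairs.get(key) || 0) + 1);
      }
    }
  }
  return [...pairs.entries()]
    .filter(([, count]) => count >= minShared)
    .sort((a, b) => b[1] - a[1]);
}

module.exports = { changeCoupling };
```

For example, `changeCoupling([["widgets.js", "render.js"], ["widgets.js", "render.js", "api.js"]])` returns `[["render.js <-> widgets.js", 2]]`: a hint that touching one file usually means touching the other.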

Another nice approach is using static analysis tools to look at metrics like cyclomatic complexity, coupling and the dependency structure matrix to find the most important and troublesome parts of the code base.


Very simple input, but they have served me very well :-)

1: Make sure you have development and test environment available (including data transfer from prod -> test)

2: Source control and Easy deployment (to all environments)

3: Map the code into importance. (not all code is equal).

4: If possible spend time with the main user / product owner (especially in peak periods). You're blessed to have users, who understand your system.

And remember: code that has been live for 3 years has earned a lot of experience.

PS: Typescript +1


1. Assess how much of the code is actually understood. Is there any record of the design decisions, the edge cases, the debugging process, the paths that weren't taken? Who knows the most about the codebase and how it got to be how it is?

2. What's the current specification? Don't look at the stack, look at input and output cases. How well does the code meet the spec? Where is it failing?

3. Before you change anything, you need to know what the change process is. You probably already know this, but if you don't, find out whether there are any demarcations of responsibility, even if they're only informal and unstated areas of interest.

4. When you have all that, you can start working on the code with some knowledge of the context you - and the code - are operating in.

5. If code works, don't rewrite or refactor for style without a very very good reason. And don't do it unless you can change all the "bad" code at once. Otherwise you'll end up with a mess of incompatible idioms that make future changes hard to read.

6. Write your own docs as you go. Best case is other people will benefit from reading them, worst case is you'll remind yourself what you were doing six months from now - because you'll have forgotten by then.

If you're a junior you may not have access to all of the above, so the fall-back is to find out what you specifically are supposed to do, and where you're supposed to do it.

If that's vague or unspecified, I'd suggest studying the code to make your own model of it and then running possible actions past other team members and the client before you make the first few changes - to establish a working pattern.


Get a copy of "Working effectively with Legacy Code".

The definition of Legacy Code for the author is "untested code". The book is mainly a list of situations a code base can be in and how to add tests.

Once your application is end-to-end tested you can play with the code with better peace of mind. Just adding those tests require you to get all the specifications the app has to fulfill so it is a good way to learn them.


I just happened to write a blog post about this a while ago:

http://obdurodon.silvrback.com/navigating-a-large-codebase

Short version: document everything as you learn, master your tools, look at message and data formats before code, follow some of the important code paths, change something and see how the system responds.


Been there multiple times.

Before making such a judgement, make sure you know the community guidelines for the language being used, its libraries, etc. You'll need to get familiar with the language ecosystem so that you're not the one doing things your own way. You cannot live inside your head.

You should never aim for a full refactor; instead, refactor one module at a time. The codebase in question might not use modules, but you can start introducing them one commit at a time.

The rule I follow is: I work on a ticket and only clean/modify the files that are relevant to that ticket. This way you get more familiar with the codebase as you go, and you might notice that some parts are actually well written.

Do not feel tempted to wipe out the old code: it contains a good chunk of specification that got lost in past conversations and is probably crucial to the business logic and/or fixes bugs. Another funny side effect: code might look buggy but actually work according to the spec :D


Never try to refactor everything. Sometimes logic may look like it was implemented incorrectly, but from my experience it may have been written like this on purpose. Business logic can be really twisted.

Also remember the scout rule: "Leave things BETTER than you found them." It will help you slowly, yet steadily, improve the codebase.


> Never try to refactor everything

That's certainly true when you have a working product that just needs incremental work.

When you inherit something that doesn't work, then you really do need to refactor "everything." Most likely, problems arise from incorrect low level assumptions or design decisions.

What do I mean by quoting "everything?" This is the kind of refactoring that feels like you're refactoring everything, but in reality, you're still keeping higher-level assumptions and design decisions.

In this case, if you don't refactor away the mistakes that lead to something that doesn't work, then you'll never have something that works.

(BTW, I know this from personal experience, I had to refactor a shipping product shortly after starting the job because the shipping product did not work. Now I'm the lead architect on the product.)


Step 1: investigate. find out where the program is breaking and/or delivering unacceptable performance. This means asking stakeholders "is anything broken? What is wrong with the app? what do you want to change?", reviewing exceptions, and crash logs. Keep in mind that often times stakeholders don't know what they want - determine the business objective and then work from that. Determine where the broken code is, and write narrative comments about what you think it does. If there is documentation, read it. There often isn't any documentation, and often times the comments are useless on a good day. Resign yourself to playing a combination garbage man/forensic psychologist for the next few months.

Step 2: triage. determine which broken parts you can get away with leaving alone for the short term, and which parts need immediate attention.

Step 3: Fix the most critical broken parts incrementally. If there are no tests (there are never tests...), write a test for each block of code you modify. Avoid wholesale redesigns if possible. make sure that you write tests. Try to avoid getting mad about the previous person's style - focus on getting things working to a borderline acceptable level, writing comments to explain your decisions so you or someone else has a frame of reference. The goal of doing this is to buy yourself time to clean the entire thing up.

Step 4: Once the app is working at a baseline acceptable level, examine the codebase and determine which areas (if any) require redesigns, and determine the cost/benefit of each redesign, based on what the stakeholders need, want and expect. If any redesign is necessary, negotiate with stakeholders to buy time for it - your bargaining chip should be an additional feature or two that the previous guy shat the bed on. Basic criteria for a redesign: is the current design impossible to understand? Does the current design impose unacceptable costs in terms of performance or development time? if yes to either, a redesign is probably worthwhile.
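For the investigate and triage steps, grouping logged exceptions by message gives a rough ranking of what breaks most often. This sketch assumes a hypothetical log format with one JSON object per line containing `level` and `message` fields; adapt the parsing to whatever your logs actually look like.

```javascript
// Group error log entries by a normalized message and rank by frequency.
// Assumed format: one JSON object per line with `level` and `message` fields.
function rankErrors(logText) {
  const groups = new Map();
  for (const line of logText.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (err) {
      continue; // skip lines that are not JSON
    }
    if (entry.level !== "error") continue;
    // Replace digit runs so "user 42 not found" and "user 7 not found" group together.
    const key = String(entry.message).replace(/\d+/g, "<n>");
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return [...groups.entries()].sort((a, b) => b[1] - a[1]);
}

module.exports = { rankErrors };
```

Frequency is only one input to triage; a rare error in the payment path may still matter more than a common one in a report nobody reads.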


You'll come back to the same pieces of code over and over, forgetting most of the details and context each time. If it takes you a while to figure out, write it down. If you had to use the debugger to find out what's in a map, leave a comment with an example of what the keys and values look like and where they're populated from.

Once you have added comments, it lets you hover over a function to remind yourself "This does X to Y when the deposit is a check", so you never have to read the internals of that function again when you're not tracing a check.
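In JavaScript, a JSDoc comment is what editors such as VS Code show on hover. A sketch of the kind of note described above; the function, the example map contents, and the `loadRates()` job are all hypothetical, and the rates are illustrative values only.

```javascript
/**
 * Converts a deposit amount into USD.
 *
 * Only used for check deposits; card deposits take a different path.
 *
 * @param {number} amount - Amount in the deposit's currency.
 * @param {string} currency - ISO 4217 code, e.g. "EUR".
 * @param {Map<string, number>} rates - Currency code to USD rate.
 *   Example contents (illustrative): Map { "EUR" => 1.13, "GBP" => 1.31 }.
 *   Populated by the (hypothetical) nightly loadRates() job.
 * @returns {number} Amount in USD, rounded to cents.
 */
function toUsd(amount, currency, rates) {
  if (currency === "USD") return amount;
  const rate = rates.get(currency);
  if (rate === undefined) {
    throw new Error(`No rate for ${currency}`);
  }
  return Math.round(amount * rate * 100) / 100;
}

module.exports = { toUsd };
```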

When you have to go 12 levels deep in the call stack to find the source of parameter Y, make a note in a side wiki so you can recover that detective work the next time.

Your knowledge of the code base grows like compound interest when you only have to figure out what each piece of code does once and can skip over it after that.


If you use an IDE, learn and exploit its code-exploration features. Use them all the time.

Do global searches whenever you aren't sure how things work. If they aren't fast enough for you, get an SSD. Use them to look for all kinds of things to convince yourself your change is safe.
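Dedicated tools (ag, ripgrep, your IDE's search) are much faster, but as a self-contained sketch of what a global search does: walk the tree, skip `node_modules` and `.git`, and report every line in `.js` files that matches a pattern.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively search `.js` files under `root` for lines matching `pattern`.
// Returns [{ file, line, text }] with 1-based line numbers.
function searchCode(root, pattern, skipDirs = ["node_modules", ".git"]) {
  // Drop the g/y flags: they make RegExp.test() stateful across calls.
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
  const results = [];
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!skipDirs.includes(entry.name)) walk(full);
      } else if (entry.isFile() && entry.name.endsWith(".js")) {
        const lines = fs.readFileSync(full, "utf8").split("\n");
        lines.forEach((text, i) => {
          if (re.test(text)) results.push({ file: full, line: i + 1, text: text.trim() });
        });
      }
    }
  };
  walk(root);
  return results;
}

module.exports = { searchCode };
```

For example, `searchCode("src", /fetchUser\(/)` lists every call site of a (hypothetical) `fetchUser` function before you change its signature.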

Add comments as you figure things out. For example, "this algorithm seems to match the one at /src/core/foobar.js:123".

You can also add "DEBT" or "REFACTOR" to your IDE's list of TODO tags, and use it in comments where you see something you think might need cleaning up. If you mark these places now and change some of them later, you'll avoid doing damage; you'll get a chance to learn more before changing things.
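IDEs collect these tags for you, but the same markers are easy to list from a script, e.g. for a periodic debt report. A sketch, assuming a hypothetical comment convention like `// DEBT(PROJ-123): reason`, where the ticket id in parentheses is optional.

```javascript
// Find DEBT/REFACTOR/TODO markers in source text.
// Assumed convention: `// TAG(TICKET-ID): note`, where "(TICKET-ID)" is optional.
const MARKER = /\/\/\s*(DEBT|REFACTOR|TODO)(?:\(([A-Z]+-\d+)\))?:?\s*(.*)$/;

function findMarkers(source, file = "<input>") {
  const found = [];
  source.split("\n").forEach((text, i) => {
    const m = MARKER.exec(text);
    if (m) {
      found.push({ file, line: i + 1, tag: m[1], ticket: m[2] || null, note: m[3].trim() });
    }
  });
  return found;
}

module.exports = { findMarkers };
```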

Put ticket numbers (Jira, whatever) in your comments too.

Think "dig safe": you're spray-painting warnings near significant opportunities to break things.


It's not exactly a guiding principle, just more of an analysis & learning technique: Get ahold of a big piece of drafting paper, or a whiteboard, or one of those big-ass pads of paper they're always putting on an easel for silly brainstorming sessions. Something physical, big, writable, and not a computer. Set it up or pin it up or tape it up, semi-permanently, and use it daily to diagram and map out each new thing you learn about the system and its interrelationships. I'm assuming here that you have the space to set something like that up. If you don't, you're in a 3rd world programming situation and you have my sympathies, but you can always do something graphical within the computer, too, especially if you have a big screen or multiple screens. I just have always found it quicker to do it in a physical medium.


In my experience, nothing works unless it sits directly beside the code or is generated from it: schema generation and/or a javadoc-like mechanism. Nobody will read or want to maintain separate structures, especially not arbitrary tomes like wikis.


Keep the goals of the code in mind: providing its users value. It's a tool, even if you think code should be art. The code's purpose is not to look pretty or to get the best marks in static analysis. It's this code that earned the company the money they are now spending on you. Clean code pays off in the long run, but might make you go bust in the short term. At the beginning of a project, the code is usually a prototype and it needs to prove itself. Once it has generated some value, you can consider either spending time on quality improvements to pay down the tech debt, or rewriting it with everything you have learnt from the previous version.


Run the code through the Softagram analyzer and start digging into the structures using the visual browsing capabilities in the Softagram Desktop app. There is a free trial at softagram.com where you can do this easily if your code is in a Git repo on a cloud service such as Bitbucket or GitHub... Shameless self-promotion, this is, however, as I work for the company. But I have personally been using that approach for more than 10 years: static dependency analysis coupled with excellent visual dependency browsing. I think some expensive version of Visual Studio has similar features available.


Document everything as you explore it. I'm an advocate for literate programming but accept it's not going to be accepted by most organizations. So I use it as a personal tool.

Tools: emacs, org mode, org babel.

Create a parallel directory structure, hypothetical project:

  ./src
  ./project/src/main.js
  ./project/src/some-file.js
Create a new directory structure with one org file per source file and one index org file:

  ./project-org/src/main.org
  ./project-org/src/some-file.org
  ./project-org/index.org
(You can organize it differently, this has worked for me.)

index.org will be a simple tree view of the folder hierarchy:

  * Project Name
  Description
  ** Source Files
  *** src
  **** [file:src/main.org]
  **** [file:src/some-file.org]
You may add in some notes about the general purpose of each of those files.

In main.org, copy the entire code into the file like this:

  * main.js
  #+BEGIN_SRC js
  // all the code from main.js
  #+END_SRC
  * [[file:../index.org][Project Root]]
Start splitting the contents of main.js into separate snippets. I don't know JavaScript very well, so here's a quick C example:

  * main.c
  #+BEGIN_SRC c :noweb yes :tangle yes
    <<includes>>
    <<structs>>
    <<functions>>
  #+END_SRC
  ** includes
  #+NAME: includes
  #+BEGIN_SRC c
    ,#include <stdio.h> // [0]
  #+END_SRC
  ** structs
  #+NAME: structs
  #+BEGIN_SRC c
    // structs
  #+END_SRC
  ** functions
  #+NAME: functions
  #+BEGIN_SRC c :noweb yes
    <<some_func>>
    <<main>>
  #+END_SRC
  *** main
  #+NAME: main
  #+BEGIN_SRC c
    int main (...) {...}
  #+END_SRC
  *** some_func
  This function will initialize a block of memory
  to be used as a shared buffer between two processes.
  #+NAME: some_func
  #+BEGIN_SRC c
    void some_func (...) {...}
  #+END_SRC
As you go through this you can make cross-references to other files and functions/structures. Eventually you'll find a smallest reasonable unit. A long, but clear, function doesn't need to be dissected. But a short, complex one, may end up with each line broken out and analyzed.

I don't just import a massive code base and do this in one go. Instead I import parts of it and break down all the related files to a particular topic ("How does X happen?"). I trace it from start to end, and then repeat with the next question. Good, modular code makes this much, much easier. The more tightly coupled, the harder it is to understand no matter the method.

[0] The comma is inserted by org babel to distinguish it from its own #-prefixed content.


I like this and have considered this approach using a git branch for annotations (although specific to using git, not familiar with other version control software). Have you done the git branch (or equivalent) approach?


Yes, I have made a new repo or a branch, but I typically keep it to myself and generate reports for others (if used at work).

EDIT: I was on mobile earlier, so extending my thoughts.

I typically make a new branch or repository but keep it on my own machine. I've gotten zero interest from colleagues in collaborating on this sort of thing, but they usually like the output. Org mode (my tool of choice, but not the only one) creates decent HTML output (you may want to play around with your own CSS or color schemes for the code blocks). So what I've done when we on-boarded a new project was to start doing this for certain critical sections that were under-documented. I then generated HTML output as a sort of white paper, and a PowerPoint deck that walked through the structure and control flow (would be best if I used flowcharts, but usually this is just text).

If we had good development machines at work, I'd definitely do the above with PlantUML or something similar for text-based diagrams. Org will produce and embed images in the HTML output. This would make producing documentation much easier; I disliked trying to embed flowcharts created in Visio (the tool available at work) into the HTML. I had to generate them, export to an image, link the image in org, and then keep it up to date manually. For a few charts it's not bad, but if you make a lot, it's tedious to switch between tools and correctly export the images.

=====

For non-work stuff, I try to use literate programming from the start, but it's always solo projects so there's no "selling" this method. If I were collaborating with others, I'd have to reconsider the method. Leo has (from what I've read) an effective literate->code->literate story (that is, edit the code and the changes show back up in the literate format). Org mode can do that, but I haven't explored it. I'd really want that if I was to pursue literate programming in a collaborative environment (so that those uninterested in my method could still contribute).


Amazing, thank you for following up!

[I had constructed a reply to your comment while it was being edited so when I posted the comment was much longer and had provided more than enough detail! Revised this comment accordingly]


I came to believe that ‘bird view’ summary documentation (index.org here, readme.md elsewhere) should be created for each more-or-less isolated module in the codebase. It should describe why the module exists and how it is used, i.e. its external contract/API, including the expected ranges of argument values.

This makes it much easier to learn proper use of a module when adding new calls to it. And of course, several months down the road you'll feel like you're seeing the module for the first time, so an overview should serve you well as a reminder.


I agree. You could do what I've described to produce such documentation if it hasn't been constructed already. Which is (as a professional maintenance programmer) the situation I'm normally in (poorly documented code design, even if we have a "complete" system specification).

And even if such documentation exists, it's often useful to recreate it yourself in developing an understanding of a complex code base (or at least sections of it).


It's not really the same: documentation that's tied to code structure tends to describe what code does instead of why it exists and how it works on a larger scale. That's why I prefer (additionally) having plain-human-language descriptions separated from the code―it forces the perspective of an external user, at least a little.

This is a gripe of mine especially with inline comments that are too often as useful as this:

    // increments the counter
    i += 1
At the same time, the ‘self-documenting code’ crowd forget that code can't really describe the rationale for its existence and e.g. the expected sequence of calls to its public functions, so plain-language descriptions are still necessary even if the syntax of the chosen language approaches English.


I agree completely, but as a maintenance coder I often don't inherit good documentation. Typically, by the time it hit my shop the code "documentation" was doxygen or similar auto-generated documentation. It showed the program structure but not why it happened. When I do this I don't just tear the code apart. I explain the rationale (as I understand it):

  * can_send: () -> bool
  =can_send= will signal =true= if the conditions are correct
  for transmitting a message over the radio. Otherwise, it'll
  transmit false. Here are the conditions that it checks:
  - Condition :: description
  - Condition :: description
  If any of these are true, then we can transmit.
  #+BEGIN_SRC C
    // body of can_send
  #+END_SRC
With perhaps more levels to my org tree structure if appropriate. Perhaps one of those conditions is particularly complex, I'd give it its own explanation.

If I have a system spec, which in my field I usually do, I'll try to relate it back to the specific requirements or specification elements that this code is implementing.

  - Condition :: description, which maps to Requirement SRD-1010.
  * Message Y
  // description of the message format
  // code for packing it or the class struct or whatever
So the first pass is more "what does this do", second pass is "why does it do it". Again, it's because of where I'm coming from, always late to the party. If I were doing a project from scratch, I'd try to keep the "why's" present more than the "what's".


As someone who would rather read comments than code, I like the idea of "literate programming". But I was expecting you to be taking notes about each function, not documenting the file structure.

What's the goal of breaking the file down like this? You don't try to maintain this when you change the code, right? So it's just a one-time familiarization with the code files? But why do I need to note down "This section of the file has structs?" I can see that by scrolling or with an IDE.

Doesn't reading through one entire source file make about as much sense as reading the first paragraph of every column in the newspaper? Don't you want to read up and down the call stack of something that does something interesting, instead of a bunch of code that may never need maintenance as long as you work there?


I threw that post together in about 5 minutes at work and never came back to it.

I do describe the what and why more, but I start with the code structure because that's what I've been handed and need to understand. I also work in embedded systems where, generally, the code call tree is acyclic and sticking with this format works well (each file is often its own module with clearly defined, if not clearly documented, interfaces for the outside). If I weren't dealing with these systems I'd need to reconsider the structure.

It's not just "this section has structs". I'd start with that:

  ** Structs
  #+NAME: structs
  #+BEGIN_SRC c
  // all the struct defs
  #+END_SRC
then:

  ** Structs
  #+NAME: structs
  #+BEGIN_SRC c :noweb yes
    <<msg123>>
    // rest
  #+END_SRC
  *** msg123
  This struct holds messages of type 123. Here's the spec for it [some link].
  Here's a list of each component and their acceptable values:
  - type :: stored in the lower 5-bits of the first 16-bit word, should be 123
  - timetag :: stored in the second 16-bit word, represents time since midnight
    in seconds.
  #+NAME: msg123
  #+BEGIN_SRC c
    // code
  #+END_SRC
If the structs are straightforward (think a standard quick-and-dirty llnode definition), I won't bother breaking it down because it's clear (for me). If I'm communicating it to a new developer, maybe I write more.

While I don't actually develop from this code (except for personal projects), I do a sanity check. I attempt to tangle the code (generate the source output from the org files) and run a git diff. If it shows any non-whitespace differences, then I accidentally altered a line I didn't mean to. Which means my documentation will be wrong.


Took me 1 month to be productive.

Went into a large Rails codebase as a backend developer in the middle of a v1 to v2 rewrite, and it had custom folders with their own conventions on where to put stuff.

All that with no documentation except for the setup.

What I did was just ask how the app works + specific workflows in front-end side of things and just connect the dots on the backend part.

Most of my troubles were about where to put modules, because of their existing conventions.

Tips: 1. Ask a lot 2. Read the code 3. A debugger helps 4. Make sure you add tests for areas you touch.


When improving existing code, only do gradual changes. Don't do rewrites, don't replace existing code with better solutions in one swoop.

This way, you won't end up in a situation where the new solution doesn't work in some cases and you've already thrown all the code it replaces under the bus. You'll always have a mostly working app.

OTOH, if you introduce an alternative solution, finish migrating to it before beginning improvements in other places in the codebase.
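One common way to keep the app working during such a migration (close to what is called branch by abstraction) is to route calls through a single switch point, so the old code stays in place until the new code has proven itself. A sketch with hypothetical implementations and a hypothetical environment variable: in "shadow" mode both versions run, mismatches are logged, and callers still get the old result.

```javascript
// Route calls through one switch point while migrating from oldImpl to newImpl.
// getMode() returns "old", "shadow" (run both, log mismatches, return old) or "new".
function migrating(name, oldImpl, newImpl, getMode, log = console.warn) {
  return (...args) => {
    const mode = getMode();
    if (mode === "new") return newImpl(...args);
    const result = oldImpl(...args);
    if (mode === "shadow") {
      try {
        const candidate = newImpl(...args);
        // Strict equality only: fine for primitives, objects need a deep compare.
        if (candidate !== result) {
          log(`${name}: mismatch for ${JSON.stringify(args)}: old=${result} new=${candidate}`);
        }
      } catch (err) {
        log(`${name}: new implementation threw: ${err.message}`);
      }
    }
    return result;
  };
}

// Hypothetical usage: the mode comes from an environment variable (assumed name).
const formatName = migrating(
  "formatName",
  (first, last) => last + ", " + first, // existing behavior
  (first, last) => `${last}, ${first}`, // rewritten version
  () => process.env.FORMAT_NAME_IMPL || "old"
);

module.exports = { migrating, formatName };
```

Shadow mode calls both implementations, so it only suits side-effect-free code; once the new version has run without mismatches, switch to "new" and delete the old one before starting the next migration.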


On a practical level, getting really familiar with grep/similar for searching the codebase and using a debugger (I like the one built into VS Code for working with Node) will help you when trying to work out how it all fits together.

Depending on time and other constraints, upgrading the codebase to typescript would be a great way to both familiarise yourself with it and to make working with it more productive, but obviously you’d need client buy in.
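A low-friction first step, before converting any files, is TypeScript's checking of plain JavaScript: a `// @ts-check` comment at the top of a `.js` file makes VS Code (or `tsc --allowJs --checkJs --noEmit`) type-check it using JSDoc annotations, while the file still runs unchanged under Node. The `User` shape below is a hypothetical example.

```javascript
// @ts-check
// Plain JavaScript, type-checked by TypeScript via JSDoc annotations.

/**
 * @typedef {Object} User
 * @property {number} id
 * @property {string} email
 * @property {boolean} [isAdmin] - optional flag
 */

/**
 * @param {User[]} users
 * @returns {string[]} emails of admin users
 */
function adminEmails(users) {
  // A typo such as `u.isadmin` here would be reported by the type checker
  // instead of silently filtering out every user at runtime.
  return users.filter((u) => u.isAdmin === true).map((u) => u.email);
}

module.exports = { adminEmails };
```

This lets you add types file by file, which fits the incremental approach several comments here recommend.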


1. Build the code. Don't do anything else until you can build the code.

So simple, yet so many places get it wrong. Lazy devs check in code that is broken. Project structures depend on you having things on your machine in a particular place. Circular references. You name it, I've cursed it.

If it's a project that you should be able to check out and build locally, then you should be able to check the code out in any directory and build it. Period.


Are you able to talk to the original author or the most recent maintainer? If so, I'd spend a couple hours reading the code on my own to get a basic familiarity, then sit down with the original author and ask them to give you an overview. They'll probably even still remember which parts they consider bad or hacky, and knowing about that stuff could save you a lot of trouble.


It's a three-year-old code base? That's good. It means that you can probably talk to some of the authors, to figure out what they were trying to do. A big 20-year-old code base is worse, because many of the original authors are gone, and there's been a lot more time for it to be patched by people who didn't understand what the original author was up to.


Has anybody tried software intelligence tools? Seeing the architectural components and control flow of the code seems like a quick way to document the codebase and figure out which “clusters” to study. https://en.m.wikipedia.org/wiki/Software_intelligence


Git blame is one of the best tools for understanding the history and archeology of a large legacy code base.

There is nothing worse, when you're trying to solve some problem, than discovering a giant reformatting commit, typically instituted by a youngish developer.

It obviously doesn't remove the history, it just makes it so much harder to actually find out why the code was written as is.


What is your problem with the code base actually?


> What is your problem with the code base actually?

All OP has told us about the codebase is that it's big, it's in Node, and it's 3 years old. It doesn't sound like they think there's any problem with it.


I'd suggest the book [Working Effectively with Legacy Code](https://www.amazon.com/Working-Effectively-Legacy-Michael-Fe...).


Before modifying an existing line of code, understand what its purpose is/was. Even if it doesn't appear to have one, or make any sense, someone created it for a reason.

Also IDE/search tools for determining where a function is used are great for removing stale, unused cruft.


If you have the time, make sure that you have adequate coverage from a suite of automated tests. This goes doubly for a dynamic language like JavaScript. It will free you up to make changes and will make the whole process of introducing changes less nerve-wracking.


You beat me to the punch. Yes, if I'm hired to work on a large, buggy, and antiquated code base, the first thing I'd do is build tests. Many of them, both unit tests and big functional Selenium-style tests. I would definitely try to set up some coverage analysis, but I wouldn't be religious about it. The focus should be on covering popular features, not lines of code.

With those in place, you can start chopping away left and right. Even if you make big changes, when you see all of those tests pass green (or blue if you're on Jenkins), it gives you a level of comfort that no careful reading of the code can supply.


If you inherit a big code base, my suggestion is to always spend a lot of time going through the tests before jumping into the code. Learn about the business logic. What things have a lot of coverage? What broke a lot or caused regressions? You will learn so much.


Unit tests / automated tests are critical.

If you're lucky, you will have a suite of unit tests and automated tests with high code coverage. Rely heavily on these tests as you refactor and debug.

If you don't have working tests, consider writing them before making any major change.


That's nothing. Try a 10 year old PHP codebase with remnants of a failed GWT integration.


Oh dear. I wrote a GWT app at one point (and recently replaced the client with a new React+Redux implementation).

How would you even go about integrating GWT and PHP in the first place?


Fear large changes to existing code. If you come into a large existing code base with a mindset of a smaller one you will cause 3 bugs for every one you fix.

Instead, always try to make the smallest possible change that achieves your end goal (bug fix, feature, etc.).



focus on outcomes. have an objective like better monitoring, better perf, or better code coverage, and use that as your guiding principle. make sure any change benefits the customer or your team. avoid refactoring just to modernize: every change is a potential regression, and what is in style now will be legacy in due time.

once you decide on the objective, try to divide components up into interfaces so you can test and refactor a component with reduced side effects.
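In a Node codebase, one way to carve out such an interface is to pass a component's collaborators in as parameters instead of `require`-ing them deep inside it, so a test can substitute in-memory fakes. A minimal sketch; `createSignupService`, `userStore` and `mailer` are hypothetical names:

```javascript
// Hypothetical component: its collaborators (a user store and a mailer) are
// injected, so tests can replace them without a database or SMTP server.
function createSignupService({ userStore, mailer }) {
  return {
    async signUp(email) {
      if (await userStore.exists(email)) {
        return { ok: false, reason: "already-registered" };
      }
      await userStore.insert(email);
      await mailer.send(email, "Welcome!");
      return { ok: true };
    },
  };
}

// In-memory fakes standing in for the real implementations during tests.
function fakeUserStore(initial = []) {
  const emails = new Set(initial);
  return {
    async exists(email) { return emails.has(email); },
    async insert(email) { emails.add(email); },
  };
}

function fakeMailer() {
  const sent = [];
  return { sent, async send(to, body) { sent.push({ to, body }); } };
}

module.exports = { createSignupService, fakeUserStore, fakeMailer };
```

Production code passes the real store and mailer; the component itself does not change, which is what limits the side effects of refactoring it.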

set a goal for each objective, like improving coverage 5% a month, or no net decrease.

long story short, be methodical and focus on outcomes


The only thing that gets on my nerves is the lack of comments and documentation, and the fallacy that code can be remotely as easy to understand as natural-language text. The rest I can endure and tolerate.


Get familiar with your debugger. You may be surprised by how deep in a call stack a bug can live. It will also make you intimate with the different ways your codebase is orchestrated together.
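When stepping through isn't practical (a flaky path, or a log from a server), the same call chain can be captured in code. A sketch, assuming Node/V8, whose `Error.prototype.stack` format is engine-specific rather than standardized; `handler` and `route` are hypothetical:

```javascript
// Return the current call stack as an array of frame strings, skipping the
// "Error" header line and this helper's own frame (V8 stack format).
function callStack() {
  const lines = new Error().stack.split("\n").slice(2);
  return lines.map((line) => line.trim());
}

function handler() {
  // `debugger;` pauses here when a debugger is attached (e.g. `node --inspect-brk`);
  // without one it is a no-op.
  debugger;
  return callStack();
}

function route() {
  return handler();
}
```

`console.trace()` prints the same information directly; to step through interactively, start the app with `node --inspect-brk` and attach VS Code or Chrome DevTools.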


Debuggers are your friend.


... singular?


If you need to maintain and extend the project I'd consider converting the JavaScript code to TypeScript with strict compilation option enabled.


I have been there a couple of times now. The first thing I do is look into how to refactor it, if I see something that needs it.

Luckily, I had one of the best mentors, who taught me how to refactor any piece of code without breaking it.

It really works if the code base has test cases. If it doesn't, I think we have to take one step back, do one small piece of functionality at a time, and move on from there.

The bad thing about big code repositories is mostly that they are still using some old version of a framework, which is uncomfortable for a newbie to get their head around.


Just squash every commit from the repo into a single commit with message "Legacy code" and force-push to master.

-David Winterbottom


OpenGrok supports JS and helps to navigate the codebase easily, in turn helping you to reason about it. Good luck!


Rule #1: use the Boy Scout approach: clean up where others left a mess.

Sure, many here would argue that "never touch a running system" is a golden rule, but what if the running system is running the wrong way?

If everyone held themselves to that rule, there wouldn't be what we call innovation.

Clear example: why develop Windows 7, 8, 10 / Mac 10.10, 10.11, 10.12... if their predecessors were working fine?


my #1 rule for working with existing code bases: get a debugger working. nothing explains the processing workflow better than tracing the execution.

also, having any kind of schemas, protocol definitions and tests can help you grasp what's happening in the system.


You can't, really. Especially in the JavaScript world, where velocity is everything. You'd spend at least ~1 year if it is really large, and hit all kinds of hacks in the form of callbacks within callbacks within callbacks that are tuned in a way that (mostly) works. I'd recommend you run away, seriously.
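Callback nesting can at least be unwound gradually. Node's `util.promisify` wraps a function that follows the error-first callback convention into one that returns a Promise, so callers can move to `async`/`await` one at a time without touching the legacy function. `readConfigLegacy` is a hypothetical stand-in:

```javascript
const util = require("util");

// Hypothetical legacy API using Node's error-first callback convention.
function readConfigLegacy(name, callback) {
  setImmediate(() => {
    if (!name) return callback(new Error("name required"));
    callback(null, { name, retries: 3 });
  });
}

// Promise-returning wrapper; the legacy function stays untouched, so existing
// callback-style callers keep working while new code uses async/await.
const readConfig = util.promisify(readConfigLegacy);

async function loadRetries(name) {
  const config = await readConfig(name); // replaces one level of callback nesting
  return config.retries;
}

module.exports = { readConfig, loadRetries };
```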


Learn to use grep well.


- put a watch on the app dir filesystem

- put a watch on the network
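For the filesystem half, Node's built-in `fs.watch` can log which files in a directory change while you click through a feature. A sketch with caveats: it reports writes, creations and renames, not reads; `filename` can be missing on some platforms; and recursive watching is platform-dependent (on Linux, only in newer Node releases), so this watches a single directory.

```javascript
const fs = require("fs");

// Log every change event in one directory (non-recursive, for portability).
// fs.watch reports writes/renames, not reads, and `filename` may be null.
function watchDir(dir, log = console.log) {
  const watcher = fs.watch(dir, (eventType, filename) => {
    log(`${new Date().toISOString()} ${eventType} ${filename ?? "(unknown)"}`);
  });
  return watcher; // call watcher.close() to stop watching
}

module.exports = { watchDir };
```

Usage, with a hypothetical directory: `const w = watchDir("./uploads");` then exercise the app and read the log.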


Same exact boat. Case sensitive search.


I think this was downvoted because JS is case-sensitive anyway, so the advice seemed obvious. But they are talking about large code bases modified over years. In my experience, especially with TypeScript, case-sensitive search is better for figuring out what you need in this case.
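Because JavaScript identifiers are case-sensitive (`userId` and `userID` are different variables), a case-sensitive, whole-word search avoids most false hits; from a shell, `grep -rnw userId .` does this. A minimal Node sketch of the same matching rule for a single file's text (the function name is hypothetical):

```javascript
// Escape regex metacharacters so identifiers like `$http` are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-sensitive, whole-identifier search; returns 1-based line numbers.
// `\b` is avoided because it treats `$` as a non-word character; instead the
// match must not touch other identifier characters ([A-Za-z0-9_$]).
function findIdentifier(source, name) {
  const re = new RegExp(`(?<![\\w$])${escapeRegExp(name)}(?![\\w$])`);
  const hits = [];
  source.split("\n").forEach((line, i) => {
    if (re.test(line)) hits.push(i + 1);
  });
  return hits;
}
```

Run over every file in the repo (skipping `node_modules`), this finds real references to `userId` without matching `userID` or `userIdList`.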


Devil's advocate:

I've done so many rewrites that you could say it's most of my job.

My first real project was moving a FoxPro DOS app to Visual FoxPro. I probably didn't read any (old) code at all. The next was moving towards an N-tier design with SQL Server (that was circa 2000). Then, at another company, moving a Fox desktop app to ASP.NET 1.

And that's only counting my first years.

----

How could I tackle this WITHOUT READING THE OLD CODE?

TALKING TO THE OLD DEVELOPERS!

If it's possible for you, let them talk about how the old app works. Even better, let them (or anyone in marketing, support, etc.) explain the problems this app solves, and the new ones it has failed to solve.

I have the luxury of mostly working with apps with an RDBMS in the back, and rarely with fancy (and NASTY) architectures like microservices. Understanding an RDBMS is orders of magnitude easier than understanding the codebase:

https://quotes.yourdictionary.com/author/fred-brooks/31361

    Show me your flowcharts and conceal your tables,
    and I shall continue to be mystified. 

    Show me your tables, and I won’t usually need your flowcharts; 
    they’ll be obvious.
-- Fred Brooks

So, before getting deep into the code: MAKE A DATABASE, and PICTURE THE FLOW OF DATA.

Sometimes it's even possible to cut a massive amount of (legacy) code that came out of iterative development (under pressure and without planning) and resulted in a terrible "flow of data". Fix the flow, fix the schemas, and suddenly the code is short and easy!

Also:

Complicated software is the product of complicated business requirements (i.e. the company). When something is a pile of mud, DON'T FIX THE MUD.

Fix the business requirements until they get easier to handle. Most of the time, this also leads to a massive reduction in messy code.

Also:

Say you have applied the sensible advice given elsewhere: instead of rewriting, you wrote tests and all that.

IF YOU STILL FEEL IT IS TERRIBLE AND YOU KNOW IN YOUR GUT YOU WILL BE STUCK HERE FOR ALL ETERNITY

cut that code without mercy. Don't push on once you have proved it's a dead end. I made a mistake like this with a rewrite of an iOS app made by a consulting firm and lost 6 months(!!!!) trying to be reasonable.

Did this cost me the contract? You bet it did. However, in the last 2 weeks I removed roughly 60-70%(?) of the code and rewrote it to follow Apple's guidelines more closely. I still lost the contract, but the next team? They finished it in a month.


Grok.

Make a small change.

Loop.

-

When you inevitably come across a trade-off, choose the option that is easiest to change later.

Everything else is just noise.

Large changes in legacy systems often suffer from the second-system effect, among other problems.

