
I recently came across this article:

https://en.wikipedia.org/wiki/Overfitting

Although it describes the issue as it pertains to statistics + machine learning, this is also exactly what ends up happening with a large codebase that has no clear requirements or test cases, just people making incremental, piecemeal changes over time. You end up with an application that has been trained (overfitted) on historical data and use cases, but that breaks easily on slightly new variations, inputs that differ from anything the system has ever handled in some trivial way that a better-designed, cleaner, more abstract system would deal with easily.

Given how much poor coding practices resemble machine learning (albeit in slow motion), it's hard to hold too much hope about what happens when you automate the process.
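For anyone unfamiliar with the statistical term, here is a minimal numpy sketch of what overfitting looks like in its original setting (the numbers and the polynomial degree are arbitrary): a model with as many parameters as data points reproduces the training set almost perfectly, then breaks on slightly new inputs, much like the codebase described above.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 10)
    y_train = 2 * x_train + rng.normal(scale=0.1, size=10)  # underlying rule: y ~ 2x plus noise

    overfit = np.polyfit(x_train, y_train, deg=9)  # one coefficient per data point: memorizes the noise
    simple = np.polyfit(x_train, y_train, deg=1)   # matches the true structure

    x_new = np.linspace(0, 1.2, 5)                 # slightly new variations, just past the training range
    y_new = 2 * x_new

    print("mean error on new data, degree 9:", np.abs(np.polyval(overfit, x_new) - y_new).mean())
    print("mean error on new data, degree 1:", np.abs(np.polyval(simple, x_new) - y_new).mean())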




I really like this extension of the concept of overfitting to codebases in general.

I especially noticed this in libraries/packages that were "community owned" inside a company. Instead of one team owning the package and being the authority on the long-term roadmap, feature requests, deprecations, documentation, bug fixes, etc., the package was collectively owned by the community at large, where "community" was very broadly defined as any team that for whatever reason had an interest in using, maintaining, or adding onto the package.

Naturally, the result was exactly the scenario you described. Each team hacked on its own bit of functionality for its specific purpose, while doing its best not to affect or break the increasingly precarious tightrope of backwards compatibility. There was no long-term architectural vision, so there was a definite need for refactoring--and yet no team had the incentive to invest the time needed to do it. The documentation was woefully incomplete as well, and few people understood how the entire thing worked, since each team only interacted with its own small fraction of the code.


Two principles I live by (much to the annoyance of my bosses):

1. Don't fear the refactor.

2. If you don't want to rebuild your entire application from scratch, don't worry, a competitor will do it for you.

There's nothing wrong with creating something in increments. It's the fear of revisiting something that destroys a code base.


Your bosses might be right.

Technical debt, much like regular debt, can also be used as leverage to quickly gain a competitive advantage. While your competitors are busy refactoring and rebuilding perfect applications that hardly create any new customer value, the scrappy startup that writes piles of spaghetti code might be building exactly what customers want.

Code quality != business value.


While this is clearly true (and is exactly what was being described when "technical debt" was coined), the unfortunate reality is that we often take on huge amounts of technical debt in order to fund the equivalent of pizza parties. Having eaten all the pizzas, we then have to pay back the debt and frequently the company can't afford it.

This is one of the reasons why you must not fear the refactor. Sometimes you need to get that code out the door because the business requires it. Then you need to pay back the debt -- by refactoring that mess every time you touch it in the future.

There is no such thing as "technical inflation" to magically wipe away our debt. It's important to have good lines of communication so that the business doesn't get used to squeezing development in order to eat pizza (because, why not? It's free!)


Piles of spaghetti code will give your customers what they want today, but rob them of the features they want tomorrow and make every future feature orders of magnitude more expensive to develop than it should be. That's the interest you pay on tech debt.

Much like regular debt, if you don't repay it, you go out of business and end up penniless.


> Code quality != business value

I don't think that's a given: in some circumstances code quality absolutely is business value. It might be better to say code quality can be, but isn't always, business value. As ever, context is the deciding factor.


Well, I would say technical debt is similar to the classic kind of debt: it may give you a short-term advantage (liquidity), but in the long term there's interest on it. If not paid off, it grows exponentially.

So yeah, technical debt can be used as a tool, but it doesn't come for free.


I really don't think so. Apart from exceptional cases (if you're selling your code to another dev, maybe), code quality is never value to the user. That's not to say that good code quality is useless, of course, but the usefulness of code quality does not lie in business value.


Technical debt already has a similar business concept in "expensive" money: funds that you raise from VCs on bad terms, while in distress, because there's no other way to do what needs to be done fast enough. Programmers paid with expensive money who try to argue that they need more time to write high-quality code for the sake of "the future" will seldom win that argument.


I think this is actually a better analogy than debt (notice that equity and debt are on the same side of the ledger); just as future valuation of the company is uncertain, so may also be the business value of the quick hack. I.e. in the same way that a certain VC investment may or may not be wise, the quick hack has the same uncertainty attached.

Add to this that the business people may have a bad grasp of the true cost of the hack, and the developers little insight into its business value, and you get the current situation.


And when you clearly see that the whole product was a dead end, you can default on your technical debt. Saving you untold man hours.


> Technical debt, much like regular debt, can also be used as leverage to quickly gain a competitive advantage.

Unlike regular debt, technical debt is extremely hard to quantify.

You can't balance a business strategy if you can't estimate how much you're going to pay.


What tends to happen in reality is that after the code gets to be 7-8 years old or more, and it has always been piecemeal spaghetti code, each change becomes exponentially more difficult to make.

There is a story Robert C. Martin tells about a company that made a really good C debugger back in the day. Then C++ came out and the company promised to make a version for it. Well, months came and went, and eventually they went out of business. Because the first version of the debugger was awful code, changes were really hard to make, and so they couldn't adapt to the changing market.


Business is mostly a maths problem, and most programmers don't really understand why they go to work.


Amen. Good enough working code gets your foot in the door. You pay later, but at least there is a later.


The danger with this view is that some people understand it as "you never have to pay your debt back". But if the project lives long enough to be successful, you end up painting yourself into a corner where you cannot change a single thing without breaking something.


> 1. Don't fear the refactor.

Like most things in life, there is a balance. I have argued against large refactors many times. Often wanting to do a refactor is just a thinly disguised excuse to use some new technology (I'm as guilty of this as anyone else). Anytime a refactor comes up my goal is to figure out why:

1) What will the refactor fix?

2) What will the refactor potentially break? Are there tests around critical functionality?

3) Does the group proposing the refactor really understand the ins and outs of the application? When new people come into a system they often want to change it to fit their mental model of the problem, and miss subtleties of why the system is a certain way.

That being said, I evaluate small refactors anytime I have to touch a piece of code.


I am more inclined to your sentiment. Now, there is no excuse for badly formatted code and being a lazy slob, and I never use the word refactor in the sense it is used here.

I often _redesign_ old code to meet new requirements and to support new features, but I would not call it refactoring.

I always strive to leave the code better than when I found it. But I would not name it refactoring.



"It's the fear of revisiting something that destroy's a code base." So true.


Yep, when fear creeps in around modifying a part of an application, it is time to have a very serious conversation about fixing that. It is one of the few cases where I find the refactor vs. creating-customer-value argument more clear cut -- if that fear is left to linger, it is likely to spread to other parts of the code and turn into a human problem pretty fast.

Fixing might be a presentation, tests, documentation, refactoring, rewrite, deprecation, whatever. Just don't let it languish and the fear grow.


I wholeheartedly agree! Companies that delay tackling technical debt still ship features, but development gets slower and more error-prone as time passes. Because they keep shipping, they can fail to see how much faster they'd be shipping 6 months down the line if they tackled the debt that adds weeks to each feature being developed.

I've expanded on these thoughts before on my blog about technical debt inflation if anyone is interested https://scalabilitysolved.com/technical-debt-inflation/


Indeed. I used to be scared of database changes in case something went wrong. Now I realise the worst thing to do is to hack code on top of a poor database design to make up for it. That usually ends up far worse.


And one can extend that to businesses as well. How many established companies have been laid low by someone with a new process built on a more modern foundation?


It would be interesting if someone actually had data on this?

I suspect this is something "software engineering" researchers might study.


WhatsApp is a prime example in the tech-vs-tech space: ride-shares vs taxi services, automated freight loading, FedEx vs UPS in terms of automating their package sites. An old factory with 1000 workers not being able to compete on a cost basis with a new automated one is the story of the last 50 years, I feel.


Agree with the other commenters, very interesting insight.

It maybe doesn't fit the metaphor quite as well, but as an operations person, I've frequently run into the "underfitting" problem. For example, we run Chef to manage our physical and virtual infrastructure. There are a ton of community-authored Chef cookbooks available. Which at first blush, sounds great. But often, they have grown over time to become these awful hydras that try to be all things to all people. PR after PR has added support for the specific use case of every organization that wants to run the cookbook in their own special way. The "Getting Started" section of the README eventually becomes a dumping ground of 900 attributes you need to set correctly, and yet somehow it still doesn't quite perform how you'd like.

In many cases, we've tried to use community cookbooks and even merge our own customizations back upstream. Only to eventually give up and write our own version that's 50 lines of Chef DSL/Ruby instead of 5,000 but does exactly what we need, the way we need, and no more. It's very possible to make a system too generic and configurable, to the point where it loses all meaning.


Found the exact same thing regarding the community cookbooks. We do use some though, it depends on the complexity and how well they work. I've either written some from scratch, taking pointers from the community ones or forked them to make them simpler and better suit our needs. Pull requests have been made where it makes sense.

Glad to hear we're not the only ones who found the community ones not perfect for every need.


> There are a ton of community-authored Chef cookbooks available. Which at first blush, sounds great.

Welcome to software development! Not as easy as it looks, is it :)

EDIT: you may find these articles helpful (or at the very least food for thought):

- https://blog.codinghorror.com/dependency-avoidance/

- https://www.joelonsoftware.com/2001/10/14/in-defense-of-not-...


The problem with the analogy is that for a learning algorithm, there are clear definitions of the model complexity as it relates directly to the outcome being optimized. YAGNI applied to a model is a penalty term for parameters or various methods of regularization.
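As a concrete aside, ridge regression is the textbook version of such a penalty term: the usual least-squares loss plus alpha times the squared size of the coefficients. A rough sketch, assuming scikit-learn is available and with an arbitrary alpha:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 10))                 # few samples, many candidate features
    y = X[:, 0] + rng.normal(scale=0.1, size=20)  # only the first feature actually matters

    plain = LinearRegression().fit(X, y)          # no penalty: free to use every parameter
    ridge = Ridge(alpha=1.0).fit(X, y)            # penalized: most coefficients get shrunk toward zero

    print("unpenalized coefficients:", np.round(plain.coef_, 2))
    print("ridge coefficients:      ", np.round(ridge.coef_, 2))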

But when the “goal” of the system is just “arbitrary short term desires of management” you can easily point out the problems, but there is no agreement on what constraints you can use to trade-off against it.

Especially for extensibility, where you can get carried away easily with making a system extensible for future changes, many of which turn out to be wasted effort because you did not end up needing that flexibility anyway, and everything changed after Q2 earnings were announced, etc.

In those cases, it can actually be more effective engineering to “overfit” to just what the management wants right now, and just accept that you have to pay the pain of hacking extensibility in on a case by case basis. This definitely reduces wasted effort from a YAGNI point of view.

The closest thing I could think of to the same idea of “regularizing” software complexity would be Netflix’s ChaosMonkey [0], which is basically like Dropout [1] but for deployed service networks instead of neural networks.

Extending this idea to actual software would be quite cool. Something like the QuickCheck library for Haskell, but which somehow randomly samples extensibility needs and penalizes some notion of how hard the code would be to extend to that case. Not even sure how it would work...

[0]: < https://github.com/Netflix/chaosmonkey >

[1]: < https://en.m.wikipedia.org/wiki/Dropout_(neural_networks) >


Overfitting is a quantifiable problem. If you're not doing robust data segregation and CV you're not even engaging in elementary ML practices.


Only if the training data you got is representative of all future use cases. Good luck with that.


You can segment the validation to be data after a certain date, and train on data before that date. You get an accurate sense of how well the model will perform in the real world, as long as you make sure the data never borrows from the future.
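Something like this, as a rough pandas/scikit-learn sketch (the synthetic data and the cutoff date are made up):

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "date": pd.date_range("2014-01-01", periods=500, freq="D"),
        "x": rng.normal(size=500),
    })
    df["label"] = (df["x"] + rng.normal(scale=0.5, size=500) > 0).astype(int)

    # Train strictly on the past, validate strictly on the future,
    # so the model never borrows information it couldn't have had at the time.
    cutoff = pd.Timestamp("2015-01-01")
    train = df[df["date"] < cutoff]
    future = df[df["date"] >= cutoff]

    model = LogisticRegression().fit(train[["x"]], train["label"])
    print("accuracy on future data:", model.score(future[["x"]], future["label"]))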


That only ensures your model is accurate assuming real-world parameters remain the same, which, again, is prone to overfitting.

To use a real world example, financial models on mortgage backed securities were the root cause of the financial crisis, because they were based on decades of mortgages that were fundamentally different than the ones they were actually trying to model. Even if someone was constructing a model by training on data from say, 1957-1996, and validating using 1997-2006, they would have failed to accurately predict the collapse because the underlying factors that caused the recession (the housing bubble, prevalence of adjustable rate mortgages, lack of verification in applications) were essentially unseen in the decades of data prior to that.

Validation protects against overfitting only to a certain degree, and only to the extent that the underlying data generating phenomena don't ever change, which, in the real world, is generally a terrible assumption.


I'd probably put fraud ahead of models as the root cause. The entire purpose of those securities was to obscure the weakness of their fundamentals.


That's not hard and fast, though. While no model is perfect, robust models can "handle" outliers. Worst case, you know when it happens and train with more a priori.


Worst case? More like best case.

It's not about outliers. Let's say you're at a startup and you fit some model to your first 30 customers. It works great for your next 10 customers, but fails dramatically for your first enterprise client. Why? Because the enterprise client was fundamentally different from your previous 40 customers. If you fit your model on a population in which the relationship looks one way, then try to apply your model to a population with a different relationship, it will fail.

Machine learning and statistics are both applications of the same principles of probability and information theory. They work (for the most part) by modeling the world, capturing the relationships between random variables. A random variable can be any natural process that we can't express in precise terms, so we express it in probabilistic terms.

This is the same principle underlying the premise that "past results do not guarantee future success." The relationships between random variables in the world that affect success in anything -- stock market performance, legal outcomes, etc. -- might not be the same tomorrow as they are today.

And that's not even a matter of overfitting. That's just your ever-present real-world threat of having all your modeling work invalidated by forces outside your control. Overfitting happens when you, the data scientist, fit your model to random noise in the training data. An overfitted model will have bad generalization performance on held-out samples, even from the same population. It's not always easy or possible to detect overfitting, especially with small training sets.


What's the problem with that, though? Startups are usually advised to service one market, not several. If your first 40 customers were prosumers but then you have a prospective enterprise client, the logical response is say no to the enterprise client and go after another 60 (or 60,000) prosumers.

Or at least understand that you're entering a new market and budget appropriately for development. Usually, if you're switching from prosumer to enterprise, you are very, very lucky if the sum total of changes you need to make is training a new machine learning model. To start with, you usually need to get used to sales cycles that take 6-18 months, hire a dedicated sales guy to manage the relationship, and handle custom development requests.


There's no problem with it, but some very intelligent people don't seem to realize that you can't just "use machine learning" and predict whatever you want. It's gotten better over the last few years, now that it's less new and magical than it used to be, but I still see it happen now and then.


Hopefully your analysts (which in this case includes your lawyers, accountants and statisticians) will tell you that the new client is different to the others and your models may not hold up and may need revision.

Hopefully you also listen to them.


Close. Extrapolation is possible using structural theories rather than only reduced form models.


Only if your structural theory is not-wrong enough.

Even if you KNOW that your model is not-wrong in the right direction and within acceptable orders of magnitude, how do you fit the parameters for that structural model? You need some kind of data, even if you're just using anecdata to pick magic constants.


All models are wrong, some are useful.

Fortunately models like these are often testable across many contexts, amenable to metastudies, available for calibration, etc.


That's my whole point. You just asserted that you can extrapolate outside a training set with a structural model. I am asserting that those "many contexts" and "metastudies" amount to a bigger, more representative training set.


What do you mean by CV? I'm not familiar with those terms. Thank you.


As sibling points out, cross validation, which is the front-line approach to avoiding overfitting for supervised classification problems.


It means cross-validation. It's essentially a way of simulating how well your model will do when it encounters real-world data.

When building a model, you divide your data into two parts, the training set and the testing set. The training set is usually larger (~80% of your original data set, although this can vary), and is used to fit your model. Then, you use the remaining data you set aside for the testing set by using your model to generate predictions for that data, and comparing it to the actual values for that data.

You can then compare the accuracy of the model on the training and testing sets to get an idea of whether your model generalizes well to the real world. If, for example, you find that your model has an accuracy of 95% on the training data but 60% on your testing data, that means your model is overly tuned to features of the training data that may not actually be helpful for prediction in the real world.
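A minimal version of that check with scikit-learn (sketch only; the bundled dataset and the decision tree are just convenient stand-ins):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # An unpruned decision tree can memorize the training set, which is
    # exactly the train/test accuracy gap described above.
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("train accuracy:", model.score(X_train, y_train))  # typically ~1.0
    print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower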


Never seen the acronym (not really in the space) but I assume cross validation.


Camouflaged Vacuity


I assumed Code Versioning so that if you have robust data segmentation you have less uncertainty about the impact of change. However, I'm a tourist here and hope OP comes back to share.


Cross-validation: testing model fit on non-training data


I assumed Computer Vision.


Fantastic insight, really top-notch.

Just some random thoughts in no particular order - curious what you make of them:

- On the subject of incremental piecemeal changes over time with no requirements: don't you all find that in your workflows (when you're doing something for yourself), it is hard to step back and "architect" something? It is easier to just let it evolve.

- Likewise it takes real work and thought to organize something as simple as a spice rack. (I just keep opened packages of spices in the cupboard.) The knowledge that company is coming is one of the few pushes. But it kind of feels like it's being done for show.

- It's hard to add architecture when you know there's no team that is coding against it as an API. It's just you. It feels like that extra power is kind of wasteful.

- The other thing is that it may be the case that you know there is some deeper level of architecture. In the case of my spices, for example, most of the opened spice packets I mentioned are actually mixes. (Such as grilled chicken spice mix.)

- If I had to architect my own spice rack, I should start by learning which spices I'm actually using more of. And since what I'm doing works, I don't actually care. Plus, it would be a step down: the first time I mixed my own spices, I would probably end up with a worse dish than pouring some out of a premixed packet.

- The first time you architect a "proper" framework rather than let your machine learning algorithm "overfit", the result is probably demonstrably worse.

- That is a lot of pressure toward not architecting, and just continuing to (over)fit.


This is where good logging helps.

The lifehack is to throw all your spices in a box and only pull them out when you need them, then leave them on the rack. Then throw away any spice you haven't used in n months and add it to a blacklist. The ones you use frequently should be prominently displayed and treated with extra care, and possibly set up for autorenewal from the grocery.

Only introduce new spices when there's a recipe, and buy just the amount you need.

So too with code. Log your code paths, prune little used features, optimize the hell out of the most frequently used ones, introduce features sparingly and with purpose...
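Even a crude call counter gets you the "which spices do I actually use" data. A toy sketch (the feature functions are made-up examples):

    import functools
    from collections import Counter

    usage = Counter()

    def track(fn):
        """Count how often each 'feature' code path actually gets used."""
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            usage[fn.__name__] += 1
            return fn(*args, **kwargs)
        return wrapper

    @track
    def export_csv():  # hypothetical frequently-used feature
        pass

    @track
    def export_xml():  # hypothetical rarely-used feature: a pruning candidate
        pass

    for _ in range(50):
        export_csv()
    export_xml()

    print(usage.most_common())  # [('export_csv', 50), ('export_xml', 1)]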

I like this spice metaphor, thanks for it.


well-constructed != over-architected


Epicycles within epicycles eventually get replaced with a clean redesign (https://wikipedia.org/wiki/Paradigm_shift)

The tricky bit is mostly that you need a new theory of the data to have a better abstraction.

Models generated by DL lack even a paradigm or theory or abstraction.


This is a brilliant insight.

One of the problems I’ve seen in research into technical debt is the lack of a good definition. This insight could form the basis of one.


"Given how much poor coding practices resemble machine learning (albeit in slow motion), it's hard to hold too much hope about what happens when you automate the process."

Your whole argument seems to be based on your personal experiences. Perhaps it is also thus vulnerable to some sort of overfitting :)


Hopefully code reviewers with institutional knowledge can advise on where to apply pruning and prevent code overfitting.

Pruning is also the common ML practice to prevent statistical overfitting.
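Decision-tree pruning is the textbook example; a quick sketch with scikit-learn (the ccp_alpha value is arbitrary):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    # Cost-complexity pruning trims branches that only explain noise in the
    # training data: the ML sense of "pruning" mentioned above.
    full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

    print("tree size, unpruned vs pruned:", full.tree_.node_count, pruned.tree_.node_count)
    print("test accuracy, unpruned vs pruned:", full.score(X_te, y_te), pruned.score(X_te, y_te))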



