> One of the main challenges when dealing with technical debt has been the lack of a way to measure it. To help overcome that problem, CISQ/OMG led the development of an Automated Technical Debt (ATD) measurement standard, which is currently being updated with a new version expected in 2023.
I'm highly skeptical about the tech debt measurement algorithm this article purports to be developing.
Google researchers recently published a paper on their attempts to measure technical debt.
They tested 117 metrics that were proposed as potential indicators.
Regressions were used to test each metric to see whether it could predict engineers' perceptions of technical debt.
No single metric, nor any combination of metrics, was found to be a valid indicator.
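For readers unfamiliar with the approach, a minimal sketch of that kind of analysis might look like the following. The metric names, the synthetic data, and the simple least-squares fit are all illustrative assumptions; this is not the paper's actual code or dataset.

```python
# Illustrative sketch only: regress each candidate metric against
# survey-reported technical-debt scores and check explanatory power.
# The metric names and data below are made up for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n_engineers = 200

# Hypothetical per-engineer survey score (1 = no debt felt, 5 = severe debt).
reported_debt = rng.integers(1, 6, size=n_engineers).astype(float)

# Hypothetical candidate metrics measured on the code each engineer works in.
candidate_metrics = {
    "file_churn": rng.poisson(30, n_engineers).astype(float),
    "cyclomatic_complexity": rng.normal(12, 4, n_engineers),
    "todo_comment_count": rng.poisson(5, n_engineers).astype(float),
}

for name, metric in candidate_metrics.items():
    # Ordinary least-squares fit of reported debt on a single metric.
    slope, intercept = np.polyfit(metric, reported_debt, deg=1)
    predictions = slope * metric + intercept
    ss_res = np.sum((reported_debt - predictions) ** 2)
    ss_tot = np.sum((reported_debt - reported_debt.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot
    print(f"{name}: R^2 = {r_squared:.3f}")
```

With random data like this, every metric's R² is near zero; the study's point was that even with real data and 117 candidate metrics, none rose to the level of a valid indicator.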
"I'm a bit pessimistic that much of this research is being driven by large orgs in collaboration with researchers who aren't sufficiently independent."
I'm one of the co-authors of the study. I think your sentiment is valid but what you describe is true for most fields of research: conflicts of interest can be a problem.
I can attest to the fact that the researchers behind this study have extensive backgrounds in academic research and hold themselves to high standards. If nothing else, not doing so risks putting individual reputations on the line.
Thank you for the response! Though I think that by stripping away the rest of the context of my criticism you've avoided its main point, which isn't conflict of interest but whom your research serves to benefit. For example, it would be a shame if cancer research were carried out only on rich or educated participants. That doesn't mean you shouldn't work with the population you have access to, but perhaps you should give some thought to how you can broaden participation.
> Developer experience encompasses how developers feel about, think about, and value their work. [9] In prior research, we identified more than 25 sociotechnical factors that affect DevEx. For example, interruptions, unrealistic deadlines, and friction in development tools negatively affect DevEx, while having clear tasks, well-organized code, and pain-free releases improve it.
"You could look at literally any objective measure to proxy actual productivity and be better off than this"
It's fairly well-established in research (and in practice) that there is no objective measure of developer productivity. Metrics like lines of code, number of pull requests, and velocity points are incredibly poor proxies.
Lines of code is a much better proxy than reported productivity.
It stops being a good proxy if you use it to reward or punish developers, compare different languages, or different types of software. But if you don't do those, it's quite good.
Velocity points are worse, because the current culture implies they will be used to reward or punish developers. But they're probably still better than reported productivity.
In fact, reported productivity is probably one of the worst metrics around. People can't even keep track of how long they spend programming, much less how much they accomplish.
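For what it's worth, the raw material for a lines-of-code proxy is cheap to pull. Here is a rough sketch of tallying added and removed lines per author from git history; it assumes it runs inside a local repository, the 90-day window is an arbitrary choice, and the numbers should be treated as the crude proxy discussed above.

```python
# Rough sketch: tally lines added/removed per author over the last 90 days.
# Assumes it runs inside a git working copy.
import subprocess
from collections import defaultdict

log = subprocess.run(
    ["git", "log", "--since=90 days ago", "--numstat",
     "--pretty=format:AUTHOR:%ae"],
    capture_output=True, text=True, check=True,
).stdout

totals = defaultdict(lambda: {"added": 0, "removed": 0})
author = None
for line in log.splitlines():
    if line.startswith("AUTHOR:"):
        author = line[len("AUTHOR:"):]
    elif line.strip() and author:
        added, removed, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" for line counts
            totals[author]["added"] += int(added)
            totals[author]["removed"] += int(removed)

for author, counts in sorted(totals.items()):
    print(f"{author}: +{counts['added']} / -{counts['removed']}")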
>>>> It's fairly well-established in research (and in practice) that there is no objective measure of developer productivity.
Oh, there is.
Money.
How much do you spend on eng to earn a buck (eng value, infrastructure)? What's the impact of your next project on the bottom line? How much can you save by making a change?
So many bullshit features from engineers, and product people that have FUCK ALL impact on the bottom line. But hey they made you happy, or looked good on your resume to get you the next job...
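If you wanted to put numbers behind that framing, the arithmetic is trivial; every figure below is invented purely to illustrate the ratios being described.

```python
# Toy illustration of the "engineering spend per dollar earned" framing above.
# All numbers are invented for the example.
annual_eng_cost = 4_200_000      # salaries, infrastructure, tooling
annual_revenue = 18_000_000

eng_cost_per_revenue_dollar = annual_eng_cost / annual_revenue
print(f"${eng_cost_per_revenue_dollar:.2f} of eng spend per $1 of revenue")

# A proposed project, framed the same way: cost vs. projected bottom-line impact.
project_cost = 350_000
projected_added_revenue = 900_000
print(f"projected return: {projected_added_revenue / project_cost:.1f}x the project cost")
```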
If you read the introduction of the paper, you'll see that the aim of this paper is to give managers and developers concrete data to use to help get buy-in on investing in developer experience from business leaders.
I'm one of the co-authors of the study. Your critique is valid, though by research standards our sample is sufficient for this type of study. That said, we are planning to replicate this study on a larger scale in the future!
Are there any plans to figure out objective ways to measure productivity and what distinguishes “good devex” from “bad devex”?
I've worked at a lot of big tech companies that do surveys about internal tooling, and every year it's rated as a weak spot; across years and companies this seemed like a consistent trend.
And yet everyone had teams dedicated to improving various aspects of devex, so it's unclear whether these teams are just improving the wrong things, or whether productivity really is improving and something else is at play (e.g. the amount of code debt grows faster than the devex improvements, or people are asked to go faster than the devex improvements can keep up, or devex is being improved but not enough people feel it because you're optimizing smaller subsets of the engineering org).
That's another thing to be mindful of with large-scale and small-scale surveys: the latter might be sampling the specific teams adopting a tool, whereas the former might find there's no way to make everyone happy and it all turns into a wash.
"Are there any plans to figure out objective ways to measure productivity"
You can't measure developer productivity objectively, assuming you're referring to metrics like lines of code, number of pull requests, or velocity points which are infamous. There's broad agreement on this both within the research community as well as practitioners at leading tech companies.
> You can't measure developer productivity objectively, assuming you're referring to metrics like lines of code, number of pull requests, or velocity points which are infamous. There's broad agreement on this both within the research community as well as practitioners at leading tech companies.
No, those are metrics you're suggesting. There are better ones, as someone else mentioned (time to get code from PR to production, some way of measuring the quality of work getting to production, etc.). Yes, the obvious metrics are poor, and better metrics are difficult to measure and quantify. And obviously no single metric is going to capture something as multidimensional as code development.
Also, the link you reference doesn’t support your argument.
> Across the board, all companies shared that they use both qualitative and quantitative measures
Throughout, it discusses that people do use quantitative metrics to help guide their analysis, and none of them try to use the obviously naive ones mentioned above.
This isn't intended as a critique, but in an engineering profession anything that isn't quantifiable is open to interpretation and argument, which weakens forward progress: we end up restricted to whatever becomes adopted as industry standard, which is in many ways more a popularity contest of fads than concrete technical improvement.
Asking people whether the developer experience is good or bad is not going to be the most efficient approach: it's ultimately asking for a mood. When teams are asked what they're spending more time on than they think they should, you can at least see where the heavier pain points are. It doesn't help if your developer experience budget is zero, but it can at least help organize the useful alternatives.
In most places I've worked at, a survey asking for specific pain points gets great results, because the worst time sinks stick out like a sore thumb, especially if you have people who have worked in high-quality organizations.
Those places did detailed surveys, but the results were uninteresting; it's always something abstract like long compile times, long CI times, etc. Then the next year the company presents all the concrete ways it made things faster, and the same themes repeat.
There are three problems. The first is that people who can accurately point out the problem are outweighed by a bunch of people who are just unhappy and generate a response for the sake of participation / being prompted. This means that you can address the pain point you think you’ve identified only for sentiment to remain unchanged so you try to tackle the next point and the cycle repeats.
The second is that something like "slow compile times" may have hundreds of different causes, so if you improve compile times in aggregate by 20% you haven't solved anyone's specific day-to-day pain point. The compile people run constantly might drop from 10s to 8s, while the expensive compile that runs infrequently (both because it's needed less and because it's so slow) takes 1-4 hours; that one might see a reduction into the 48min-3.2 hour range, which is substantial but not enough to be felt, or it may be unaffected because the improvements aren't measuring that slow build and it isn't a focus (e.g. maybe it has a bad dependency chain that pulls in way too much). The causes of that slowness can be hard to tease out correctly, and engineers are incentivized to make the "biggest bang for the buck" changes that sound impressive and quantifiable ("20% reduction in compile times across the company" vs. "I made this team happier and it'll maybe show up in an end-of-year survey if I'm lucky").
The third is that the rate at which certain things get worse (long CI and compile times) keeps up with, and usually outpaces, the rate at which things get better (e.g. 1000 developers adding to compile and test times can't be offset by a team of 10 engineers spending their time on speeding things up).
Having been on the teams that improve developer experience, the problem is that one of my hands gives while the other takes away. I can address every complaint a developer has about the company platform, but at the same time requirements change: as a company grows, it starts caring more about security and firewalling between different data and services, which makes developing harder and more annoying.
For your first question about good devex, there are definitely some objective ways to measure it (a rough sketch of computing the first one follows this list):
* The time it takes for completed code to be deployed to production
* Count of manual interventions it takes to get the code deployed
* Count of other people who have to get involved for a given deploy
* How long it takes a new employee to set up their dev environment, and the count of manual steps involved
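A minimal sketch of the first of those (lead time from merge to production deploy) might look like this; the timestamp pairs below are made-up placeholders, and in practice they'd come from your VCS and deploy tooling.

```python
# Minimal sketch: lead time from "code complete" (merge) to production deploy.
# The (merge_time, deploy_time) pairs are assumed for illustration.
from datetime import datetime
from statistics import median

changes = [
    ("2024-03-01T10:15", "2024-03-01T16:40"),
    ("2024-03-02T09:05", "2024-03-04T11:20"),
    ("2024-03-03T14:30", "2024-03-03T15:10"),
]

lead_times_hours = [
    (datetime.fromisoformat(deployed) - datetime.fromisoformat(merged))
    .total_seconds() / 3600
    for merged, deployed in changes
]

print(f"median lead time: {median(lead_times_hours):.1f} h")
print(f"worst lead time:  {max(lead_times_hours):.1f} h")
```

The same shape works for the other bullets: count the manual steps or people per deploy, or time a new hire's environment setup, and track the distribution over time rather than a single average.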
Having done internal developer & analyst tooling work (and used DX), this type of survey is great for internal prioritization when you have dedicated capacity for improvements.
I'd be curious to see more about organizational outcomes, as this is the piece of DevOps/DevEx data that I feel is weakest and requires the most faith. DORA did some research here, but it's still not always enough to convince leadership.
For the uninitiated among us: can you share more context on the research standards and the reasoning behind them? I'm interested and would like this to influence some decisions I'm facing, but I'd like to understand the confidence here :).
Accelerate - this book has become an excuse for managers to spend outrageous money implementing metrics like lead time and deployment frequency to measure teams, whereas the book actually advises something very different.
Logz.io comes to mind: https://docs.logz.io/user-guide/accounts/account-region.html. I’m sure I’ve seen configuration for other services, but haven’t cared as much since usage was sufficiently low and latency-insensitive that the region didn’t matter.