There's a lot more that can be said about how things change as the scale of a software system grows.
Every order of magnitude increase requires a new level of discipline. At 10^3 lines, you can do whatever you want -- all your function and variable names can be one or two letters, you don't need comments (or indeed any documentation), your functions don't need well-defined contracts because they only need to work in a few cases, etc. etc. At 10^4 lines, if you're smart you can still get away with making a mess, but it starts to become helpful to name things carefully, to add a few comments, to clear away dead code, to fuss a little over style and readability. At 10^5 lines, those things are not just helpful but necessary, and new things start to matter. It helps to think about your module boundaries and contracts more carefully. You need to minimize the preconditions of your contracts as much as practical -- meaning, make your functions handle all the corner cases you can -- because you can no longer mentally track all the restrictions and special cases. By 10^6 lines, architecture has become more important than coding. Clean interfaces are essential. Minimizing coupling is a major concern. It's easier to work on 10 10^5-line programs than one 10^6-line program, so the goal is to make the system behave, as much as possible, like a weakly interacting collection of subsystems.
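To make the precondition point concrete, here's a minimal Java sketch (names and numbers are hypothetical): the first version is fine at 10^3 lines, where you can remember its one call site; past 10^5 lines the second is what keeps the system mentally trackable.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    class Totals {
        // Narrow contract: caller must pass a non-null, pre-sorted list with at least
        // three elements. Fine at 10^3 lines, where you can remember the one call site.
        static int sumTopThreeUnsafe(List<Integer> sortedScores) {
            return sortedScores.get(0) + sortedScores.get(1) + sortedScores.get(2);
        }

        // Wide contract: handles null, empty, short, and unsorted input. Past 10^5 lines
        // nobody can track every caller's assumptions, so the function absorbs the cases.
        static int sumTopThree(List<Integer> scores) {
            if (scores == null || scores.isEmpty()) {
                return 0;
            }
            List<Integer> copy = new ArrayList<>(scores);
            copy.sort(Collections.reverseOrder());
            int sum = 0;
            for (int i = 0; i < Math.min(3, copy.size()); i++) {
                sum += copy.get(i);
            }
            return sum;
        }
    }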
There's probably a book that explains all this much better than I can here, but perhaps this conveys the general idea.
Very good point — that's why good abstractions are such a win. With functions, I don't need to think about all the code needed to read the body of an HTTP request; I just read it.
That's also why I think that macros are so valuable: a single macro can abstract away an extremely complex and tricky piece of code into a single line, or two.
With the right abstractions, we can turn 10^6-line projects back into 10^3-line projects.
This alludes to another issue: you can pack a lot into one line, but that doesn't necessarily mean it'll be a tractable line of code if something goes wrong.
True enough, and it applies to functions as well as macros. A function named getThingList which also sets the user's password and sends a rude email to the Premier of Elbonia is just as bad as an unhygienic macro!
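A hedged sketch of that offender in Java (all names and types are made up), just to make the smell visible: the signature promises a query, the body does something else entirely.

    import java.util.List;

    // Every type and name here is hypothetical, purely to illustrate the smell.
    class ThingService {
        private final Mailer mailer;
        private final ThingRepository repository;

        ThingService(Mailer mailer, ThingRepository repository) {
            this.mailer = mailer;
            this.repository = repository;
        }

        // The name promises a read-only query; the body mutates state and does I/O.
        // A caller reading only the signature has no way to know.
        List<Thing> getThingList(User user) {
            user.setPassword("hunter2");                            // surprise mutation
            mailer.send("premier@elbonia.example", "A rude email"); // surprise I/O
            return repository.findThingsFor(user);
        }
    }

    interface Mailer { void send(String to, String body); }
    interface ThingRepository { List<Thing> findThingsFor(User user); }
    class Thing {}
    class User { void setPassword(String password) { /* ... */ } }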
I was going to reply with a link to Manny Lehman's "FEAST" publication page, which (tragically - and I am not being cavalier) seems to have disappeared following his passing. I highly recommend chasing links starting from his Wikipedia page: https://en.wikipedia.org/wiki/Manny_Lehman_(computer_scienti...
It is challenging to summarize in a brief reply, and I would not do it justice in any event - characterizing the relationships between the evolutionary pressures on software, the limitations of the people and organizations who produce it in handling the attendant complexity of any such system, the feedback loops which drive the process, etc., is not a one-paragraph post.
My crotchety old geezer, "get off my lawn" take is that minimizing LOC count is vastly underappreciated (FEAST underscores how increasing LOC count decreases the ability to evolve/change/modify software), and that DSLs are therefore far more promising than "index cards" and "stories" and "methodologies" for decreasing LOC count and thus increasing agility.
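Purely as a hypothetical illustration of the LOC argument, here is a tiny internal DSL (a made-up builder-style API) that collapses what would otherwise be a page of hand-rolled if/else into a few declarative lines:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Predicate;

    // A made-up validation DSL; the point is only the LOC ratio, not the specific API.
    class Rules<T> {
        private final List<Predicate<T>> checks = new ArrayList<>();

        static <T> Rules<T> of(Class<T> type) { return new Rules<>(); }

        Rules<T> require(Predicate<T> check) { checks.add(check); return this; }

        boolean validate(T value) { return checks.stream().allMatch(c -> c.test(value)); }
    }

    class Demo {
        public static void main(String[] args) {
            // One declarative "sentence" instead of scattered if/else blocks:
            Rules<String> username = Rules.of(String.class)
                    .require(s -> s != null)
                    .require(s -> s.length() >= 3)
                    .require(s -> s.chars().allMatch(Character::isLetterOrDigit));

            System.out.println(username.validate("alice42")); // true
            System.out.println(username.validate("a!"));      // false
        }
    }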
Anytime someone mentions LOC count, I have to reply with this great line from Dijkstra:
[I]f we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
The way you describe what becomes relevant at each order of magnitude is eye-opening to me. It's exactly what we have been doing, but it was never explained so eloquently why we succeed because of it. Did you get this from somewhere, or is it your own wording?
This is some combination of something I read once somewhere (Fred Brooks?? possibly) with my own experience over the years. (I didn't go past 10^6 lines because I've never worked on anything larger.)
Anecdote time: once, when I was younger, I was a programmer. I had to work on a Java program (it was a piece of shit and a mess). Refactoring was never going to happen (it costs money), so I switched from programming to becoming a sysadmin.
Valuable lesson I learned: code is unlikely to ever be rewritten or refactored. It costs money.
But at least you get the chance to upgrade hardware and software once in a while, because it goes out of support or breaks down.
I bet this Java program is still a mess and runs somewhere.
Thanks to this code I realized programming isn't something I want to do 24/7 :)
> There's probably a book that explains all this much better than I can here, but perhaps this conveys the general idea.
Why should I read a book when you have done a great job in one paragraph? Seriously, everyone who has built large systems should understand it immediately. This is a big part of the software engineering craft.
It's not "software" that has "diseconomies of scale"; it's the design process.
Then, because software has close to zero marginal cost, the design cost is the only one we end up paying.
Take the same milk the author is talking about, and imagine designing the system that gets the milk from the cow to 2 people every morning.
Not a big deal: get a cow, a farmer, and a truck, mix them all together, and you're pretty much done.
Now imagine serving a supermarket; you start to need something more than a cow, a farmer, and a truck.
Now imagine serving a whole city. What about a whole nation? What about the whole world?
Simply designing how to serve milk to a small supermarket is already a problem, but since the marginal cost of the milk is not approaching zero, the cost of the design process will always be less than the cost of the milk (otherwise it wouldn't make financial sense) - hence the whole idea of "economies of scale."
To conclude, I believe that the root causes of "diseconomies of scale" don't lie in the "software" part but in the "design" part.
If you don't understand the problem you can't really find a solution...
Other than that, it does matter because in the future you may see the same problem in other industries... A very advanced 3D printer may bring this whole class of problems to the manufacturing industry, for example.
If we learn how to manage the complexity of the design phase we will be able to apply the same concepts to other fields.
The capitalist wins because he can sell the same milk cheaper: each farmer doesn't need his own truck or to bother with going around selling stuff to people. The capitalist will have a fleet of trucks, a sales team, etc., optimized for cost efficiency.
Cheaper food means, for example, that people who couldn't afford to go on vacation or to buy a computer for their kids now can.
So designing the logistics chain is worth it; it provides value for customers.
Same as making software.
The design process/writing software, on the other hand, is hard to do at a big scale from scratch. I'm not aware of any big logistics network that started big.
Economy of scale has nothing to do with the price of milk in different-sized containers, nor does it have anything to do with software complexity.
Economy of scale is about it being cheaper for a large farmer to produce a liter of milk than for a small farmer, because overhead costs don't increase linearly.
Likewise, software has economies of scale because, on a per-user basis, it's far more expensive to support a 1st, or 5th, or 100th user than it is to support a 100,000th.
The author is fine to note that software gets more complex and costly to maintain as it gets bigger, but that has nothing to do with economies of scale -- the author is completely confusing concepts here. Economies of scale are about the marginal cost per user, not the marginal cost per bugfix.
Maybe he shouldn't have used the term "economies of scale", instead using "economies of scope", which he does use. Do you agree with his point about diseconomies of scope?
I appreciate the thought process here, but to my reading the understanding of economies/diseconomies of scale is quite wrong.
(a) The author correctly points out some diseconomies of scale in software, i.e. things that cost more when you do more of them.
(b) The author fails to identify that economies of scale typically far outweigh the aforementioned diseconomies. The main error seems to be basing the argument on this statement --
"This happens because each time you add to software software work the marginal cost per unit increases"
-- without considering the idea of dividing the expense by the number of users, which can increase dramatically for projects reaching a critical mass. To take a trivial example, most 10-employee businesses can't afford to pay for a 10,000-line software application. A company with 100 employees may need a 30,000-line application due to the increased complexity of their environment, but they can afford it because the project now involves 300 lines of code per employee rather than 1,000.
In short, this author accounts for both the numerator and denominator in the milk analogy up front, but then effectively ignores the denominator in the discussion of software costs.
Of course this is why most programmers work for large organizations, at least relative to the average employee. It's also why a handful of large software projects make most of the money in the software business. I'm not happy about this, btw, but it is the case.
To my reading the author's use of "economies of scope" and "economies of specialization" are even further off base. For example, the trend over the last 50 years or so has been towards increasing specialization (which again, benefits larger teams, although the app economy may have provided a brief counterpoint, the same forces are at work there).
Many apps do have diseconomies of scope — they have too many features, and are therefore hard to use. And the company spent more resources to build a more featureful app, like hiring more people, but this extra investment has had a negative return.
Overspecialisation also causes problems, like engineers who don't understand user experience, don't empathise with the user, or don't even stop to ask whether the flow they're building is needlessly complex. Or designers who propose ideas that are not technically feasible given the platform.
If you can increase scale without increasing scope, like WhatsApp supporting 900 million users with 50 or so engineers, great. If you're increasing scope in order to increase scale, you can't assume that the former leads to the latter.
> Suppose your developers write one bug a year which will slip through test and crash the users' machine. Suppose you know this, so in an effort to catch the bug you do more testing. In order to keep costs low on testing you need to test more software, so you do a bigger release with more changes - economies of scale thinking. That actually makes the testing harder but… Suppose you do one release a year. That release blue screens the machine. The user now sees every release you do crashes his machine. 100% of your releases screw up. If instead you release weekly, one release a year still crashes the machine but the user sees 51 releases a year which don’t. Less than 2% of your releases screw up.
This example makes no sense. The user in either case still gets 1 crash per year. They're actually worse off in the many-releases case because in the annual update scenario, they can at least block off a few days to cope with the upgrade scenario (in the way that sysadmins schedule upgrades and downtime for the least costly times of years), but in the weekly release, they could be screwed over at any of 52 times a year at random, and knowing Murphy, it'll be at the worst time. '% of releases which crash' is irrelevant to anything.
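For what it's worth, a tiny sketch of the two metrics being argued about, using the numbers from the quoted example:

    class ReleaseMath {
        public static void main(String[] args) {
            int crashesPerYear = 1;   // same defect rate in both strategies, per the quote

            int annualReleases = 1;
            int weeklyReleases = 52;

            // The article's metric: share of releases that crash.
            System.out.printf("annual: %.1f%% of releases crash%n",
                    100.0 * crashesPerYear / annualReleases);   // 100.0%
            System.out.printf("weekly: %.1f%% of releases crash%n",
                    100.0 * crashesPerYear / weeklyReleases);   // 1.9%

            // The user's metric: crashes experienced per year -- identical either way.
            System.out.println("crashes per year, either way: " + crashesPerYear);
        }
    }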
I agree, it makes no sense, not only from a planning perspective but also from an implementation perspective. Specifically...
"In order to keep costs low on testing you need to test more software, so you do a bigger release with more changes - economies of scale thinking."
If you wanted to keep testing costs low, you wouldn't do bigger releases, you'd create automated tests. You may spend more effort up front on building the tests, but as long as you make the test components modular and target the tests at the right level of your application the 'cost' of testing will decrease over time.
Yeah, sure, everything can be tested in an automated way. I can name a few projects that tried it. They were released with a hilarious amount of bugs.
Also writing good automated tests requires a great test developer. The thing is, anyone with such credentials would be a great developer and as such, not a tester.
Even if you go fully test-driven, which makes it much cheaper, the cost of a test-a-lot development model is surprisingly high for any application of useful size.
Just imagine trying to write even something as simple as MS Paint with good test coverage.
Writing good automated tests doesn't necessarily rely on a great developer, it all depends on how you approach testing. For example, let's say you want to use Selenium to write web UI tests. One common approach is to have developers create a page object model, which testers can then use to write readable and robust tests. Creating a page object model is a simple task, and working with that page object model is a simple task, as the model guidelines define an easy to implement and easy to follow control structure (essentially all page object methods must return another page object, which means you can chain them together).
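As a rough Java/Selenium sketch of that idea (page names and element IDs are invented), where every method returns a page object so tests chain:

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;

    // Hypothetical pages for a login flow; the IDs are made up for illustration.
    class LoginPage {
        private final WebDriver driver;
        LoginPage(WebDriver driver) { this.driver = driver; }

        LoginPage typeUsername(String name) {
            driver.findElement(By.id("username")).sendKeys(name);
            return this;                  // returning a page object keeps the chain going
        }

        LoginPage typePassword(String password) {
            driver.findElement(By.id("password")).sendKeys(password);
            return this;
        }

        HomePage submit() {
            driver.findElement(By.id("login")).click();
            return new HomePage(driver);  // navigation returns the next page object
        }
    }

    class HomePage {
        private final WebDriver driver;
        HomePage(WebDriver driver) { this.driver = driver; }

        String greeting() {
            return driver.findElement(By.id("greeting")).getText();
        }
    }

A tester's script then reads as one chain: new LoginPage(driver).typeUsername("alice").typePassword("secret").submit().greeting().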
Oh, and MS Paint would be easy to create a good test suite for. For what it's worth, I'm a software tester by trade, so perhaps it's straightforward for someone who creates tests for a living to know how to approach it; someone inexperienced wouldn't necessarily know how to.
Do you have any suggestions on where to learn more about testing? I've gone through numerous tutorials and read through The Art of Unit Testing but I still feel like I'm writing tests just to write tests. It's not really clicking for me
Sure. First of all, even though you're clearly a developer (as you mentioned writing unit tests), I'd recommend starting at a higher level of abstraction with BDD tools, basically anything that implements the Gherkin language (I can't tell you which BDD tool to use as I don't know what language you're coding in, if you can tell me I can give you a more specific recommendation). The idea behind this is that you're writing your code to meet a specification that your client can also work with, so you can be sure the specification you're coding against is what the client wants. As an added bonus, the Gherkin scripts form a type of living documentation, providing clear information about what the application does whilst also remaining up to date (so long as all tests must pass before a new version is sent out).
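Just to make the shape concrete (shown in Java with cucumber-java purely as an illustration; every mainstream language has an equivalent, and the feature text and app here are made up):

    // Gherkin scenario the client can read (lives in a .feature file):
    //
    //   Scenario: Registered user logs in
    //     Given a registered user "alice"
    //     When she logs in with the correct password
    //     Then she sees her dashboard
    //
    // Step definitions binding that text to code:
    import io.cucumber.java.en.Given;
    import io.cucumber.java.en.Then;
    import io.cucumber.java.en.When;
    import static org.junit.Assert.assertTrue;

    public class LoginSteps {
        private final TestApp app = new TestApp();   // hypothetical test harness
        private boolean onDashboard;

        @Given("a registered user {string}")
        public void a_registered_user(String name) {
            app.register(name, "correct-password");
        }

        @When("she logs in with the correct password")
        public void she_logs_in() {
            onDashboard = app.login("alice", "correct-password");
        }

        @Then("she sees her dashboard")
        public void she_sees_her_dashboard() {
            assertTrue(onDashboard);
        }
    }

    // Minimal stand-in for the application under test.
    class TestApp {
        private final java.util.Map<String, String> users = new java.util.HashMap<>();
        void register(String name, String password) { users.put(name, password); }
        boolean login(String name, String password) { return password.equals(users.get(name)); }
    }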
Second recommendation is to look into how to avoid test-induced damage, which is where code gets bloated and more complicated in order to make it more testable. One major source of problems in this area is the need to create mock objects. I'd recommend this video from Mark Seemann as a good starting point in this area, as it looks at how you can create unit tests without mocks:
As you're also using JS I'm guessing you're creating web apps, so I can recommend Selenium if you want to automate front end tests. You can code Selenium tests with C# too, and you can also abstract away the details of the Selenium implementation to use SpecFlow to write the tests. If you have access to Pluralsight (if you have an MSDN licence check to see if it's bundled with your MSDN licence, I think I got a 45 course Pluralsight trial with my MSDN Enterprise licence), there's some good courses on Selenium and SpecFlow, including one that takes you through combining the two.
You'd think it would be easy, but UI tests automated by things like Selenium are hard-coded sequences. As far as I remember, they do not run randomised collections of features with randomised data input.
Plus they do not verify application state really, just that it does not crash. What if it just looks and acts funny?
I've used Selenium. Perhaps you missed that I said I'm a software tester. This sort of stuff is my bread and butter. In terms of values, there are well established guidelines you follow when picking those values, for example boundary cases.
As for "looks and acts funny", that's why you still have exploratory testing. Automated testing can drastically cut down on overall testing time, but there's still the need to perform exploratory testing to look for quirky issues.
Developers should be doing a lot of things. In my experience the trouble is that most developers are too optimistic and tend to neglect negative test cases. So most teams still need specialist testers to ensure acceptable quality.
Is this not a case for developer education? If developers are neglecting corner cases that's a real issue that needs to be addressed directly rather than having it be the job of someone else to clean up after them.
Fish or cut bait? Obviously I train developers on my teams to be better testers but that takes time, meanwhile we have to actually get working releases out. The testers work directly with the developers, they aren't cleaning up after them.
(And please no one tell me that I should only hire developers who are already also expert testers. I have to operate in the real world.)
Yeah, it's too hard to hire good programmers and too expensive as well. If you have the capability to teach programmers, then that's a real competitive advantage.
It's not hard to hire good programmers. There are plenty of good programmers available at the market clearing rate. However there are never any perfect programmers available anywhere at any price. Everyone has some flaws and knowledge gaps. So managers have to keep their expectations realistic.
Careful to qualify that. Sure developers should write unit tests for their code when possible. But they should not be the ones coming up with the acceptance tests (product owner) and doing all the testing of their work (QA).
I've seen that argument several times "Why can't the developers test their work?" For the same reason professional writers have proofreaders, editors, translators, etc. Because it would be amateur not to have more than one pair of eyes and one brain looking at something. No matter how good those eyes/brain are, they'll miss things that would be obvious to someone else.
There are only a few types of fonts available - TrueType, PostScript, bitmap, maybe others? At any rate, you test a couple of examples of each type. Paint is a bitmap graphic generator, right? Apply letters in a test font to a bitmap, and compare the resulting bitmap to an expected value.
We don't need to test every possible font there is. We only need to test enough fonts to cover the likely failure cases and known bugs.
Speaking of bugs, suppose we find a bug with a specific font, or a specific size of font, or something like that. We write a test that exercises the bug, then fix the bug. Because it's a one-off test, it's not a tremendous burden to write. Moreover, that bug will never come back undetected again.
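A hedged sketch of what such a one-off regression test could look like (the bug, the measurement function, and the numbers are all hypothetical; in a real project the function would live in the application, not in the test file):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class TextWidthRegressionTest {
        // Hypothetical stand-in for the real text-measurement code.
        static int textWidthInPixels(String text, int glyphWidth) {
            if (text == null) return 0;
            return text.length() * glyphWidth;
        }

        // One-off test written when a (hypothetical) "trailing spaces not counted in
        // text width" bug was reported: it exercised the bug, failed until the fix
        // landed, and now keeps that bug from ever coming back unnoticed.
        @Test
        public void trailingSpacesAreCounted() {
            assertEquals(4 * 6, textWidthInPixels("Ag  ", 6));
        }
    }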
Validation of new fonts is handled by Windows rather than Paint, I don't see why a new font that meets the criteria set out for Windows would cause any massive test burden for Paint.
You target the functionality that resides in the application, and its integration with other software. However, you don't need to test Windows functionality; that's for Windows developers to do. 'Total test coverage' is about testing the application under test, not every component of the OS it builds upon.
If you find bugs in 3rd party components whilst testing, of course you try and do something about it. The point is, at some point you have to trust that the components you're working with are relatively bug-free otherwise every application will end up testing all the features of the OS it sits upon. Do you really expect developers to run a full test suite for Windows every time they develop a painting application? Note that I'm talking about actively looking for bugs, not the bugs that are apparent from some happy path testing.
Edit: Some spacing, and grammar, and some sentence structure.
+1. Putting too much faith in test coverage is a sign of too much faith in humans.
Some of my time goes to mobile/Android development these days, and here are some facts:
1. On Android, there is occasionally this one vendor whose devices crash randomly on some piece of code, because they modified the Android source code when they shouldn't have. And it's just one model among thousands of others. And it's a different model for different pieces of code.
2. The Android API is not exactly what one would assume. It returns nulls in cases where it should return empty lists, and vice versa, and only occasionally. So you don't really know when an API returns null vs. an empty list, and some APIs (like getting the running processes) should never return an empty list, except that they do in some bizarre cases. So your code has to handle and test both the empty list and null, which adds to your test scenarios (see the sketch after this list). Aside from null/empty list, there are many other examples.
3. When you release, all your tests pass, at least for what they test. Then there is this one client that uses things a bit differently than the others, and you fail 10% of the time for that client because of how they use an API. Especially on Android, how they use an API does matter. A simple example is using an Application Context vs. an Activity Context vs. a custom Context implementation. You'd assume they should all work fine, because Context is an abstraction and the Liskov substitution principle should apply, but it doesn't, because it's a leaky abstraction. Thus, some things work with some contexts while others work with other contexts, and you don't really know which.
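A minimal sketch of the null-vs-empty-list point above (the wrapper and the test are hypothetical; the real call would be into the Android API):

    import static org.junit.Assert.assertTrue;
    import java.util.Collections;
    import java.util.List;
    import org.junit.Test;

    public class ProcessListHandlingTest {
        // Hypothetical wrapper around an API call that is documented to return a list
        // but in practice may return null OR an empty list depending on device/vendor.
        static List<String> runningProcessNames(List<String> rawApiResult) {
            return rawApiResult == null ? Collections.emptyList() : rawApiResult;
        }

        // Both shapes of "nothing" have to be covered -- exactly the kind of extra
        // scenario the comment above is describing.
        @Test
        public void nullFromApiIsTreatedAsEmpty() {
            assertTrue(runningProcessNames(null).isEmpty());
        }

        @Test
        public void emptyListFromApiStaysEmpty() {
            assertTrue(runningProcessNames(Collections.<String>emptyList()).isEmpty());
        }
    }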
I came to the conclusion that for most cases, testing is a must to verify the correctness of known scenarios and known regressions. However, for automated upgrades/updates, testing alone will not work. Not at all. For automated updates/upgrades, it should be a controlled release: it makes more sense to do the release A/B-test style, if possible, where a small population (say 1% of users) gets the control/previous release, and another disjoint population of the same size gets the experiment/new release. Then you compare your baseline metrics. Could be # of HTTP connections, number of user messages sent, number of crashes, etc.
Testing can only cover what humans can think of. Unfortunately, we don't know what we don't know. That's the main problem.
When the bug that slips through the testing is discovered, the first thing to do is write a test that exercises the bug and makes it repeatable. This is sometimes difficult in the case of intermittent or highly condition-dependent bugs, but simply learning how to repeat the behavior is an important learning experience.
The bug should not be fixed until there's an automatic test that induces the bug. That way, you can test to know that your fix actually works.
That's a great ideal. But often the code has to be substantially rearchitected to allow for automated testing like this (especially for those difficult bugs, as you mention), and when the fix only requires changing a couple lines of code, it's not worth the risk.
Although it would be great if we went through the exercise of "how would we write an automated test for this?" at a high level, even if it doesn't get implemented. That might get people thinking about how to architect their code so that it's more testable from the get-go.
If the code is written with a proper testing suite from the start, it's not hard. If you're dealing with a legacy code base that doesn't have a testing harness (I've done this a lot), then the first thing you should be doing before making changes is writing tests for them!
Every time I do the "I don't need a test for this, it's just an obvious little bug" thing, and I implement a test-free fix, it fails! Every. Single. Time. Maybe it's because I'm stupid, but I like to think I'm above average. So if above-average programmers have a high risk of releasing "bug fixes" that don't actually fix correctly (or cause other problems) because they didn't write good tests, that really turns the "not worth the risk" response on its head.
It's not that it's not worth the risk to write tests. It's not worth the risk to not write tests!
That's the part I meant when I said "known regressions". If there is a bug, you write the test, and it doesn't happen again... until the next refactor, because most often you'll find that the tests were written at a lower level: instead of a functional test (one that tests the e2e functionality), they were written to test individual pieces of the SUT, such as individual methods in a class. This happens quite a lot. Devs will prefer unit tests over more complex e2e tests, which take a lot more effort to write (because they are slow, flaky, more complex). Any friction in this process will make the dev prefer the easier way to test (which does not mean future-proof).
Even if you do automated testing you should still do exploratory testing, it's not an either/or situation. Automated testing allows you to cover the bulk of your system functionality, exploratory testing allows you to pick up on the unexpected quirks that crop up. Automated testing frees up the time of testers to do exploratory testing as they don't have to run through all tests manually.
You mean they will block the one big release for a year?
Or are you implying there will be a working hotfix soon after the release?
What about security updates then?
See, in a weekly model you can probably safely skip a cycle. Not so with a big release. If the release has some critical functionality broken, you might be waiting a long time for it to get fixed.
This is a complete misinterpretation of the concept. The only economy of scale that matters in software is that (in theory, barring certain scaling considerations which are in any case becoming more abstracted out thanks to IaaS) the same program can be run to support 10s of millions of users. Each additional unit costs very little to produce (serve it up via a CDN, some amount of network calls, processing and data storage over the lifetime of a user).
But the issue is not that this happens because it is software development. The reason is that software is work whose cost increases linearly with manpower. There are no silver bullets that can magically reduce the number of people needed.
Also, increasing the number of people makes things even worse as communication problems increase, but I think we all agree on that.
But the main problem with the article is that milk is always the same, whether in a glass or in a tank car. Software is completely different: more functionality requires more lines of code, which requires more people... and cost.
Fair point but that's neither what the title says nor what he wants you to believe. This is clickbait. And technically what I described is an economy of scale wrt software development the same way the assembly line corresponds to car production. Software dev benefits from fast, cheap distribution of a product that can be duplicated easily because otherwise a developer would need to rewrite the software each time and deliver it by hand to users.
Arguing against there being economies of scale in development is a straw man; no one has thought this for a long time. The Mythical Man-Month was published decades ago.
Exactly. Scaling a housing business means building more houses, not adding more stories to the one house you're building. And since the marginal cost of software is so low, it naturally tends to a winner-take-all market, which is an interesting counterpoint to the "get small" advice in the OP.
The argument that software development has diseconomies of scale is well known, e.g. it was incorporated long ago into the COCOMO estimation model. It would be interesting to reverse the question: where does software development really have some economies of scale? In what situations does scaling up the team or the chunk of software under consideration make sense?
- system testing?
- roll-out?
- bulk purchasing of licenses for development?
- architecture in the sense leveraging consistent frameworks, naming etc. across a team?
- infrastructure?
- development time too long? (according to COCOMO there is an optimal time, and beyond it effort increases, although slowly)
When I think about economy of scale and software development, all examples I come up with have the property that any gains are diminishing quickly. For example, a large system has a better build system, which catches a higher number of bugs, therefore the developers are more productive until the now larger project eats the added productivity due to higher complexity.
On the other hand, what in traditional industries would be called production, that is, producing the specific web page for one request, has a rather absurd economy of scale. With a single server and a static site you can serve a sizable fraction of the world population, with negligible marginal cost for serving an additional user. Actually, Metcalfe's law suggests that the effective marginal cost of serving an additional user is negative, and hence we get behemoths like Google and Facebook instead of competition among a few different corporations.
Building up an understanding of the domain. The bigger the solution, the more you can charge for it, the smaller the relative cost of understanding the problem space.
Small, frequent releases are a great idea for web applications and (to a lesser extent) mobile and desktop consumer apps where updates are automatically pushed through some sort of app store. But this totally doesn't work for software licensed to enterprise customers for on-premises use. Every upgrade you release means your customers now have to incur an expense to test your update with the rest of their environment, retrain their users, and roll out the update. Release too often and your customers will be unhappy. So the tolerance for defects goes way down.
There are lots of software models where you can't rerelease at all. The amount of formalism in your software development model is highly correlated with your release costs.
A few days ago I would have agreed whole heartedly. Now, after reading the article about the formation of AWS I’m not so sure.[1] It seems that Amazon (and others such as Google) have achieved economies of scale in development with internal services and well organized APIs. Certainly much of this is being exposed externally for profit, but it does seem to indicate that well designed software has economies of scale for the development of future software. Certainly one monolith application with a team of a thousand starting from scratch is a bad idea, but once internal APIs and services exist, these definitely seem to aid in the rapid development of future software products for a company, which seems like economies of scale to me.
Bloated bureaucracies and bad processes can negatively impact any company in any industry, not just software. Some of the article’s logic doesn’t seem to differentiate software development from anything else, such as “working in the large increases risk”. So while a large monolith application is risky, building a hundred million widgets is risky too. Better to iterate and start with a prototype and expand, but the same goes for other industries too: better to prototype and market test your widget before mass production. Seems to me like the article is talking more about lean and agile and process in general than about economies of scale.
What the article says is true for now, but it doesn't mean it will always be true.
Making and transporting large milk bottles is very efficient today (because it's all automated and there are well-tested processes in place), but it wasn't necessarily always like this. When people were still figuring out how to make glass bottles by hand (through glassblowing), bigger bottles were probably more challenging to make (more prone to flaws and breakage during transportation) than small bottles. So they probably just figured out what the optimal size was and sold only that one size.
With software, it's the same thing: we don't currently have good tooling to make building scalable software easy. It's getting there, but it's not quite there yet. Once Docker, Swarm, Mesos and Kubernetes become more established, we are likely to see the software industry behave more like an economy of scale.
Once that happens, I think big corporations will see increased competition from small startups. Even people with basic programming knowledge will be able to create powerful, highly scalable, enterprise-quality apps which scale to millions of users out of the box.
That's the '90s calling back with the 4th-generation languages that were supposed to drive developers out of work. Before that it was COBOL, a language supposedly simple enough not to require programmers.
Automation happens a lot in the software development world, but instead of depriving the developer of work, the new capabilities just pile onto the developer's shoulders. For example, today, with AWS/Docker/TDD/DDD/... I basically do the work that would have taken a team of 5 people only 15 years ago.
The thing is, there is always going to be somebody who sits at the boundary between the fuzzy world of requirements and the rigorous technical world of implementation, and those people are going to be developers (of course they may not be programming in Java but in something else - though something rigorous enough that the activity is still called programming).
Unless AI takes over, but it probably means that work as we know it has changed completely.
You seem to be confusing deploying software at scale with building software at scale. In fact, there are two different types of scale involved here. I can build a 1-page application and deploy it out to a billion people.
The article is talking about large software, not small software deployed to scale.
As for tools to help build large software, we have them in spades and they will continue to improve. But some things still don't seem to scale. Tools help with managing 500 developers but not enough to really make 500 developers as effective as 50 developers on a smaller project.
More than two scales in fact, I can think of three: scale of firm, scale of distribution, scale of product.
In most industries scale of firm goes hand-in-hand with scale of distribution. Software breaks the paradigm, so we have to be careful to say which scale we're talking about.
Obviously, with distribution, software has insane economies of scale, since we can copy-paste our products nearly for free. That's why we can have small firms with a large distribution, unlike most industries.
With scale of firm, we face some of the same diseconomies as other industries. Communication and coordination problems grow superlinearly with firm size.
Effort and resources needed also grow superlinearly with product scale. That's also true of other engineering disciplines though. Making a tower twice as high is more than twice as hard. Part of it is the complexity inherent in the product, and part of it is that a more complex design needs a bigger team, so you run into the firm diseconomies of scale mentioned above.
> Once Docker, Swarm, Mesos and Kubernetes become more established, then we are likely to see the software industry behave more like an economy of scale.
> Even people with basic programming knowledge will be able to create powerful, highly scalable enterprise-quality apps which scale to millions of users out of the box.
I must disagree. The real problem with scalability is that any system that scales enough must become distributed, and distributed systems are obnoxiously difficult to reason about, and as such remain difficult to program and to verify.
Talk to me about Docker and Swarm and the like hosting technology platforms and frameworks that make it trivially straightforward to program distributed systems reliably, and really hard to program them wrong, and we might have the utopia you speak of.
Powerful abstractions make the promise that you'll only have to learn the abstraction to write powerful software, so "even people with only basic knowledge will be able to do X using abstraction Y".
The promise is almost always false. All abstractions are leaky, and if you do serious development with them inevitably bugs will bubble up from below and you'll have to dive into the messy internals.
For example, ZeroMQ makes distributed messaging relatively painless. Someone with very little knowledge of the network stack can write simple programs with it easily. But for any serious enterprise application with high reliability requirements you'll eventually run into problems that require deep knowledge of the network stack.
> Talk to me about Docker and Swarm and the like hosting technology platforms and frameworks that make it trivially straightforward to program distributed systems reliably, and really hard to program them wrong, and we might have the utopia you speak of.
Your argument definitely applies to Backend-as-a-Service kinds of software and I agree 100%, but the nice thing about the Docker/Kubernetes combo is that it gives you an abstraction but it does so in a way that doesn't prevent you from tinkering with the internals.
The only downside I can think of is that tinkering with those internals can become trickier in some cases (because now you have to understand how the container and orchestration layer works). But if you pick the right abstraction as the base for your project, then you may never have to think about the container and orchestration layer.
Maybe I was unclear, what I meant is that usually you need to tinker with the internals at some point. Which is fine, but it does mean you need more than basic knowledge to use the tool productively. (And if the software is proprietary and poorly documented, you're SOL).
The lie is that this tool is so easy: you just have to read this 30-minute tutorial and you'll be able to write powerful software, and you don't even need to learn its internal mechanics.
I haven't used Kubernetes; it's possible it's so good that you don't need to learn the messy details. I'm just sceptical of that claim in general.
Your last line pretty much describes exactly what I think the next phase will be in the container/orchestration movement. The problem with 'Docker and friends' at the moment is that they are disjointed general-purpose pieces (highly decoupled from any sort of business logic) - To make anything meaningful with them, you have to do a lot of assembling (and configuration).
I was a Docker skeptic before I stumbled across Rancher http://rancher.com/. In Rancher, you have the concept of a 'Catalog' and in this catalog, you have some services like Redis which you can deploy at scale through a simple UI with only a few clicks.
I think that this concept can be taken further; that we can deploy entire stacks/boilerplates at scale using a few clicks (or running a few commands). The hard part is designing/customizing those stacks/boilerplates to run and scale automatically on a specific orchestration infrastructure. It's 100% possible, I'm in the process of making some boilerplates for my project http://socketcluster.io/ as in my case, but you do have to have deep understanding of both the specific software stack/frameworks that you're dealing with and the orchestration software that you're setting it up for (and that's quite time-consuming).
But once the boilerplate is setup and you expose the right APIs/hooks to an outside developer, it should be foolproof for them - All the complexity of scalability is in the base boilerplate.
This presupposes that Docker and friends are the answer. I have become increasingly sceptical of that. :(
This also ignores much of the physical advantage that existing powerhouses have. Amazon, as an example, will be difficult to compete with, not because AWS the software is so amazing, but because the data centers that house AWS are critical. Google and others have similar advantages.
I agree wholly with your first paragraph, but disagree entirely with your last one. Look at every industry with economies of scale today. Do any of them make it easy for newcomers to compete? At best economies of scale are entirely uncorrelated with number of players. I suspect it's worse than that: economies of scale allow large players to crowd out small ones. That will happen in software as well once we figure out how to systematize our learning (http://www.ribbonfarm.com/2012/10/15/economies-of-scale-econ...)
By this guy's definition, housing construction has diseconomies of scale. You wouldn't believe how expensive a thousand story tall condominium complex would be!
Article conflates distribution with production. Production would be a bigger cow.
Deep learning benefits from scale: both in computation power and in a larger corpus.
From the example: is the final product the cow, the milk, or the nutrition it provides? Is a software product the lines of code, the app to download, or the service it provides?
It seems like the article's milk analogy is applied to the wrong thing: another pint of the same exact product.
Maybe it's more apt to make the analogy with a pint of a different type of milk product, a type of milk with brand new features. I'm talking about real-world products like these which have appeared in the last 10-20 years:
* Almond Milk
* Cashew Milk
* Coconut-based Milk products (not traditional coconut milk)
* Soy-based Milk products (not traditional soy milk)
* Lactose Free Milk
* Grass Fed Cow Milk
* Organic Cow Milk
* Omega 3 Enriched Milk
etc.
We would not expect the marginal cost of these to be less than another pint of conventional regular cow's milk, and indeed, it isn't.
That said, I suspect that the article is basically going in the right direction. I mean, it seems like additional features on a complex software project really do cost much much more, on a percentage basis, than a new kind of milk beverage does.
Interesting take on production. Multiple kinds of milk that use the same distribution medium of cartons and grocery stores. These products are decoupled from each other, so the development cost is comparatively linear.
I agree with OP's premise that productivity diminishes with project complexity - a decades old problem addressed by Fred Brooks' Mythical Man-Month. But, a lot of the complexity is now wrapped by reusable components. It is now possible to write a component that is shared by 1000's of projects. Somewhat akin to a milk carton that can hold many kinds of milk.
What he describes (a product growing in complexity) isn't a typical economies-of-scale type of product. I'd also argue that the general observation is true for all complex products. Even a car has relatively high R&D costs and only reaps the benefits of economies of scale (lower per-unit production cost than the competition) once it is ready to ship. Unit 1 is going to be very resource intensive; the following units are what contribute to the scale (and like he said, for software that scale is amazing).
However, maintenance costs in software are usually really high, especially since there is a tendency to gradually "improve" the same code base instead of ripping it out and building a new one every n years (buying a new car).
We're transitioning from a monolith PHP app to a Node microservices architecture. The approach with microservices seems to take these scaling issue into account, allowing for narrow focus on each service during initial development and ongoing maintenance.
Any production system has the same issues, they're usually called overhead or in some cases agency costs.
This is captured well in the marginal productivity of labor: the returns on adding more labor decline as you add more labor.
Talking about actual economies of scale, software has massive economies of scale. The marginal cost of serving one additional copy of software (in the old "software as a product" way) is close to zero.
When developing product lines, you can enable tremendous leverage over time through good architecture. The problem is scaling with people, i.e. this is not something that can be accelerated by involving more people.
Software does have economies of scale. How much did a copy of Windows 7 or Office 2013 cost? Only about $100? That's because the more we produce/supply/consume the cheaper it gets, just like milk.
The notion that there's an optimal amount of human capital needed for a project is nothing new. We've all heard of "Too many cooks spoil the broth."