Circa 2005 I was working at two places where I was responsible for 80 and 300 web sites respectively, built with a wide range of technologies. On my own account I had about 30 domain names.
I had scripts that would automatically generate the Apache configuration to deploy a new site in less than 30 seconds.
At that time I found that most web sites have just a few things to configure: often a database connection, the path to where the files live, and maybe a cryptographic secret. If you are systematic about where you put your files and how you do your configuration, running servers with a lot of sites is about as easy as falling off a log, and so is running development, test, staging, prod and any other environments you need.
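The generator was essentially template fill-in. A minimal sketch of the idea in Python (not the original script; the paths, template, and setting names are hypothetical):

    #!/usr/bin/env python3
    # Rough sketch of a vhost generator; paths and setting names are hypothetical.
    import sys
    from pathlib import Path

    VHOST_TEMPLATE = """<VirtualHost *:80>
        ServerName {domain}
        DocumentRoot /srv/www/{domain}/htdocs
        # the few things most sites actually need configured
        SetEnv DB_DSN     "{db_dsn}"
        SetEnv APP_SECRET "{secret}"
    </VirtualHost>
    """

    def deploy(domain, db_dsn, secret):
        Path(f"/srv/www/{domain}/htdocs").mkdir(parents=True, exist_ok=True)
        conf = VHOST_TEMPLATE.format(domain=domain, db_dsn=db_dsn, secret=secret)
        Path(f"/etc/apache2/sites-available/{domain}.conf").write_text(conf)
        # then: a2ensite <domain> && systemctl reload apache2

    if __name__ == "__main__":
        deploy(*sys.argv[1:4])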
I have a Python system now with gunicorn servers and celery workers that exists in three instances on my PC. Because I am disciplined and everything is documented, I could bring it up on another machine manually pretty quickly, probably more quickly than I could download 3 GB worth of Docker images over my ADSL connection. With a script it would be no contest.
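The script is maybe a page of code. A sketch (the instance names, ports, and settings modules are made up, not the real project):

    # Sketch of bringing up several gunicorn + celery instances; names,
    # ports, and settings modules are made up.
    import os
    import subprocess

    INSTANCES = {
        "dev":     {"port": 8001, "settings": "app.settings.dev"},
        "staging": {"port": 8002, "settings": "app.settings.staging"},
        "prod":    {"port": 8003, "settings": "app.settings.prod"},
    }

    def bring_up(name, cfg):
        env = {**os.environ, "APP_SETTINGS": cfg["settings"]}
        subprocess.Popen(["gunicorn", "app.wsgi:application",
                          "-b", f"127.0.0.1:{cfg['port']}"], env=env)
        subprocess.Popen(["celery", "-A", "app", "worker",
                          "-n", f"{name}@%h", "--loglevel=info"], env=env)

    for name, cfg in INSTANCES.items():
        bring_up(name, cfg)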
There was also a time when I was building AMIs and even selling them on the AMZN marketplace. The formula was: write a Java program that writes a shell script that an EC2 instance runs on boot; when the script is done, it sends a message through SQS telling the Java program to shut down and image the new machine.
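The control loop is simple. Here is a sketch of the controller side in Python with boto3 (the original was a Java program; the queue URL and message format are made up for illustration):

    # Sketch of the "wait for the boot script, then image the machine" loop.
    # The original was Java; queue URL and message format are made up.
    import boto3

    sqs = boto3.client("sqs")
    ec2 = boto3.client("ec2")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ami-build-done"

    def wait_and_image(instance_id, ami_name):
        while True:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)
            for msg in resp.get("Messages", []):
                if msg["Body"].strip() == instance_id:   # boot script says it is done
                    sqs.delete_message(QueueUrl=QUEUE_URL,
                                       ReceiptHandle=msg["ReceiptHandle"])
                    ec2.stop_instances(InstanceIds=[instance_id])
                    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
                    image = ec2.create_image(InstanceId=instance_id, Name=ami_name)
                    return image["ImageId"]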
If Docker is anything, it is a system that turns 1 MB worth of I/O into 1 GB of I/O. I found Docker was slowing me down even when I was using a gigabit connection; it was basically impossible to do anything with it (like boot up an image) on a 2 MB/s ADSL connection; and with my current pair of 20 MB/s connections it is still horrifyingly slow.
I like that the OP is concerned about I/O speed and brings it up, and I think the situation could be improved with a better caching system (e.g. Docker might even work on slow ADSL if it properly recovered from failed downloads).
However, I think Docker has a conflict between “dev” (where I’d say your build is slow if you ever perceive yourself to be waiting) and “ops” (where a 20-minute build is “internet time”).
I think ops is often happy with Docker, and some devs really seem to like it, but for some of us it is a way to turn a 20-second task into a 20-minute task.
And I'm guessing that with this system you had a standard version of Python, Apache, and everything else. I imagine that if you wanted to update to the latest version of Python, it involved a long process of making sure those 80 or 300 websites didn't break because of some random undocumented breaking change.
As for Docker image size, it really just depends on dev discipline, for better or for worse. The nginx image, for example, adds about 1 MB of data on top of whatever you did with your website.
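For instance, a static site layered on the official nginx base is a two-line Dockerfile (the directory name here is illustrative):

    # Illustrative only: static site layered on the official nginx image.
    FROM nginx:alpine
    COPY ./public /usr/share/nginx/html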
You hit a few important notes that are worth keeping in mind, but I think you hand-wave away some valuable impacts.
By virtue of shipping around an entire system's worth of libraries as a deployment artifact, you are indeed drastically increasing the payload size. It's easy to question whether payload efficiency is worth worrying about now that >100 and even >1000 Mbit internet connections are available to the home, but that is certainly not the case everywhere. That said, assuming smart squashing of image deltas and basing off of a sane upstream image, much of that pain is felt only once.
You bring up that you built a system that helped you quickly and efficiently configure systems, and that discipline and good systems design can bring many of the same benefits that containerized workloads do. No argument! What the Docker ecosystem provided, however, was a standard implemented in practice that became ubiquitous. It became less necessary to build one's own system, because the container image vendor could define that, using a collection of environment variables or config files placed in a standardized location.
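For example, the official postgres image is configured almost entirely through documented environment variables and a well-known data directory, so a deployment doesn't need its own configuration system (the values here are placeholders):

    # Placeholders only; POSTGRES_PASSWORD and the data path are the image's
    # documented configuration points.
    docker run -d --name db \
      -e POSTGRES_PASSWORD=change-me \
      -v pgdata:/var/lib/postgresql/data \
      postgres:16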
You built up a great environment, and one that works well for you. The containerization convention replicates much of what you developed, with the benefit that it grabbed majority mindshare, so now many more folks are building with things like standardization of config, storage, data, and environment in mind. It's certainly not the only way to do things, and, much as you described, it's not great in your case. But if something solves a significant number of cases well, then it's doing something right. For a not-inconsequential number of people, trading bandwidth and storage for operational knowledge and complexity is a more than equitable trade.