I've just started using docker (I had to reimage my linode to take advantage of the recent upgrade). I've got nginx and postfix containers running. If anyone can offer some thoughts on the following points I'd be grateful.
1) I built two Dockerfiles on my laptop (one for nginx, one for my postfix setup), tested them locally, then scp'd the Dockerfiles over to the server, built the images and ran them. I didn't really want to pollute the public registry with my stuff. Is this reasonable? For bigger stuff, should I use a private registry? Should I be deploying images instead of Dockerfiles?
2) The nginx setup I deployed exports the static html as a VOLUME, which the run command binds to a dir in my home dir, which I simply rsync when I want to update (i.e. the deployed site is outside the container). Should I have the content inside the container really?
3) I'm still using the 'default' site in nginx (currently sufficient). It would be kind of nice to have a Dockerfile in each site I want to deploy to the same host, but only one container can get port 80. I sort of want to have a 'foo.com' repo and a 'bar.org' repo and ship them both to the server as docker containers. I don't really see how to make that work.
What I think I want is:
- a repo has a Dockerfile and represents a service
- I can push these things around (git clone, scp a tgz, whatever) and have the containers "just run"
Not sure how to make that fit with "someone has to own port 80"
1) Testing a Dockerfile locally and then scp'ing it to the server doesn't guarantee that it will build successfully on the server. However, if you successfully build an image and push it to a registry, it will definitely work. Based on this, you can decide which approach works for your setup. On a production setup, I would say you should use tested images instead of hoping that the Dockerfile will build correctly.
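To make the two options concrete, here's a rough sketch (the registry host and image names are placeholders, not anything from this thread):

    # Dockerfile workflow: the build happens (and can fail) on the server
    scp Dockerfile me@server:nginx/
    ssh me@server 'cd nginx && docker build -t my-nginx . && docker run -d -p 80:80 my-nginx'

    # image workflow: build and test once, then ship the exact artifact you tested
    docker build -t registry.example.com/my-nginx .
    docker push registry.example.com/my-nginx
    ssh me@server 'docker pull registry.example.com/my-nginx && docker run -d -p 80:80 registry.example.com/my-nginx'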
3) As far as I understand, your problem is that both containers would be running their own nginx and would both want port 80. If that's what you mean, you could just EXPOSE port 80 from within each container, and it will automatically be mapped to a random host port like 43152. Both containers would be mapped to different random ports (for example 43152 and 43153). You could then install Hipache and route different domain names/sites to different containers, essentially putting a Hipache proxy in front of your Docker containers.
EDIT: There is also a project called Shipyard, which is Docker management... what I described above is called "Applications" inside Shipyard.
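A quick sketch of the port-mapping approach described above (image and container names are made up):

    # each image EXPOSEs 80; -P publishes it to a random host port
    docker run -d -P --name foo-com foo-com-image
    docker run -d -P --name bar-org bar-org-image

    # see which host ports were assigned
    docker port foo-com 80   # e.g. 0.0.0.0:49153
    docker port bar-org 80   # e.g. 0.0.0.0:49154

Hipache (or any reverse proxy) then routes foo.com and bar.org to those two ports.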
1) Deploying images vs. Dockerfiles is always a tradeoff. If you want to be absolutely sure that your code will run in production, then testing locally and pushing the image to a private registry such as Quay.io (disclaimer: I'm a cofounder of Quay.io) is a better approach. If, on the other hand, you want reproducibility of your execution environment, a Dockerfile can be better. As an example: Quay.io itself is built from source using a Dockerfile, which includes as one of its steps a RUN command that executes our full suite of tests. Users can tie Dockerfiles into Quay.io's Github integration [1] so that on every code push the Dockerfile is built (which includes running the tests) and the resulting image is pushed automatically to their repository. If set up correctly, this allows every single code change to be tested and a production-ready image to be created, all without an external CI or build machine.
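As a sketch of that pattern, not Quay.io's actual Dockerfile (the setup and test scripts here are invented):

    cat > Dockerfile <<'EOF'
    FROM ubuntu:12.04
    ADD . /app
    WORKDIR /app
    RUN /app/setup.sh
    # the build fails here if the tests fail, so any image that builds is a tested image
    RUN /app/run_tests.sh
    EOF
    docker build -t myorg/myapp .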
2) Again, this is a tradeoff. If the content is sufficiently large or changes frequently, it might be better to rsync the data from an external source every time the image is built (or even when the container first starts, if the data is EXTREMELY large). On the flip side, having the data inside the image means that you do not need to worry about networking and security issues around your data source. One point to note: the cache that Docker uses when building Dockerfiles is sensitive to file changes in the build context; if your files are changing a lot, make sure they are either on an external volume OR added late in the Dockerfile, to prevent your earlier layers from being rebuilt all the time.
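A sketch of that ordering point (the base image and paths are just examples):

    cat > Dockerfile <<'EOF'
    FROM ubuntu:12.04
    # stable steps first: these layers stay cached between builds
    RUN apt-get update && apt-get install -y nginx
    ADD nginx.conf /etc/nginx/nginx.conf
    # fast-changing content last, so only this layer is rebuilt
    ADD site/ /usr/share/nginx/html/
    EOF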
3) HAProxy might be a solution for this; you could route requests to different ports based on the requested hostname. We are also working on an (WARNING: experimental) project called gantry(d) [2] that makes handling container-based components a whole lot easier.
Not an expert, but one solution is to have an nginx configuration file for each website in `/etc/nginx/sites-enabled`, which matches requests based on the `server_name` directive.
Then you can reverse-proxy requests to the right location/port from your Nginx webserver.
This could be provisioned pretty easily, but not from within a Dockerfile. I use a post-receive hook on a remote Git repository, for one of my own websites.
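For illustration, a server block along these lines would do it (the domain and upstream port are made up; the port would be whatever Docker published for that site's container):

    cat > /etc/nginx/sites-enabled/foo.com <<'EOF'
    server {
        listen 80;
        server_name foo.com;
        location / {
            proxy_set_header Host $host;
            proxy_pass http://127.0.0.1:49153;
        }
    }
    EOF
    nginx -s reload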
I currently have an ansible script which can set up a web-service on any Debian/Ubuntu box, and can be invoked
1) over SSH, or
2) by Vagrant when provisioning a VM.
Docker, on the other hand, provisions its containers from a rather simplistic Dockerfile, which is just a list of commands. The current solution for provisioning a container through Ansible is rather messy[1], and shows that Docker's configuration doesn't have the same separation of responsibilities that Vagrant's does.
Luckily, this lets me use Docker as another provider through the Vagrant API. Woooo!
I paused integrating Ansible with Docker for the moment until I'm happy with my Ansible repo and Docker is stable.
I still think Ansible inside Docker is feasible, by
* using it to generate the initial base image
* receiving some sort of signal inside the container to update its playbook and re-run it.
So when you want to update the container, you're not tearing it down but instead telling the container to perform some sort of "soft reset".
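Purely as a sketch of how that soft reset might look (the playbook path, container name and entrypoint are all invented): trap a signal in the container's entrypoint and re-apply the playbook when it arrives.

    cat > entrypoint.sh <<'EOF'
    #!/bin/sh
    # re-apply the playbook locally on SIGHUP instead of rebuilding the image
    apply() { ansible-playbook -i localhost, -c local /opt/site.yml; }
    trap apply HUP
    apply
    while true; do sleep 3600 & wait $!; done
    EOF

    # later, from the host:
    docker kill -s HUP my-container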
Using Ansible inside of Dockerfiles means that you do a full rebuild of your image for every minor change, and when shipping images, you ship the full image every time instead of a small delta.
What do you gain by using ansible inside of your Dockerfile? I find ansible pretty useful to set up a bunch of Docker images on a server, but I haven't found it very useful to actually build the images.
I've not used Docker before, so this is all postulation: Dockerfiles don't look great for complex software setup processes, being just a list of commands to run on the machine.
We already have provisioning tools which attempt to solve this problem, so I'd much rather write one definition which can be applied everywhere, whether to virtual machines, Linux containers, or hosts running on physical machines.
Maybe I'm just trying to use Docker incorrectly? My particular use-case is setting up software (CKAN[1]) in an Ubuntu environment, but the server I have access to is Arch Linux, and the software is I/O-heavy, so I imagine that a container would probably be better than a VM.
A provisioning script is essentially just a list of idempotent commands to run on the machine. But given the way that Docker works, idempotence is not required -- if you change a command, Docker starts again from the last cached state and runs the new command.
A provisioning script might be slightly "higher-level" than a Dockerfile or shell script, but I find the difference minimal: the number of lines of code required is similar. Many provisioning tools provide libraries of pre-built recipes you can use; Docker provides a repository of pre-built images.
Dockerfiles are designed to give you all the primitives you need to compose arbitrarily complex build processes, and no more.
A Dockerfile is not a replacement for your favorite build script: it's a reliable foundation for defining, unambiguously, in which context to run your script. The Dockerfile's defining feature is that it has no implicit dependency: it only needs a working Docker install. Unlike your favorite build script, which may require "python" (but which version exactly?), ssh (but which build exactly?), gcc (but...), openssl (but...) and so on.
Of course I'm biased, but I consider it a significant regression in your usage of Docker to build images from a Vagrantfile instead of a Dockerfile.
I'm curious why you feel Docker doesn't offer separation of responsibilities? In my experience the opposite is true, and it's a recurring theme in why people switch from Vagrant to Docker.
This is really cool. It'd be great to be able to piece together development environments from dockerfiles even quicker than you can now.
One of the great things about docker is that once you've played about with it for an hour or so you've already picked up most of it. It's not like Chef or Puppet, or even configuring environments with VirtualBox and a full VM; it's really simple. I wonder how much faster this will make things.
I've been looking for a way to integrate docker into my existing workflow. This integration takes nothing away from Docker and just makes Vagrant that much more flexible and valuable to teams already using it and newcomers. Can't wait to run this through its paces.
Curious, how is the default "proxy" vm on macs sized?
This is a really great step forward - thanks Mitchell!
I've recently spent a couple of weeks doing a deep dive into Docker, so I'll share some insights from what I've learned.
First, it's important to understand that Docker is an advanced optimization. Yes, it's extremely cool, but it is not a replacement for learning basic systems first. That might change someday, but currently, in order to use Docker in a production environment, you need to be a pro system administrator.
A common misconception I see is this: "I can learn Docker and then I can run my own systems without having to learn the other stuff!" Again, that may be the case sometime in the future, but it will be months or years until that's a reality.
So what do you need to know before using Docker in production? Well, basic systems stuff. How to manage linux. How to manage networking, logs, monitoring, deployment, backups, security, etc.
If you truly want to bypass learning the basics, then use Heroku or another similar service that handles much of that for you. Docker is not the answer.
If you already have a good grasp on systems administration, then your current systems should have:
- secured least-privilege access (key based logins, firewalls, fail2ban, etc)
- restorable secure off-site database backups
- automated system setup (using Ansible, Puppet, etc)
- automated deploys
- automated provisioning
- monitoring of all critical services
- and more (I'm writing this on the fly...)
If you have critical holes in your infrastructure, you have no business looking at Docker (or any other new hot cool tools). It'd be like parking a Ferrari on the edge of an unstable cliff.
Docker is amazing - but it needs a firm foundation to be on.
Whenever I make this point, there are always a few engineers that are very very sad and their lips quiver and their eyes fill with tears because I'm talking about taking away their toys. This advice isn't for them, if you're an engineer that just wants to play with things, then please go ahead.
However, if you are running a business with mission-critical systems, then please please please get your own systems in order before you start trying to park Ferraris on them.
So, if you have your systems in order, then how should you approach Docker? Well, first decide if the added complexity is worth the benefits of Docker. You are adding another layer to your systems and that adds complexity. Sure, Docker takes care of some of the complexity by packaging some of it beautifully away, but you still have to manage it and there's a cost to that.
You can accomplish many of the benefits of Docker without the added complexity by using standardized systems, ansible, version pinning, packaged deploys, etc. Those can be simpler and might be a better option for your business.
If the benefits of Docker outweigh the costs and make more sense than the simpler, cheaper alternatives, then embrace it! (Remember, I'm talking about Docker in production - for development environments, it's a simpler scenario.)
So, now that you've chosen Docker, what's the simplest way to use it in production?
Well, first, it's important to understand that it is far simpler to manage Docker if you view it as a role-based virtual machine rather than as a set of deployable single-purpose processes. For example, build an 'app' container that is very similar to an 'app' VM you would create, along with the init, cron, ssh, etc. processes within it. Don't try to capture every process in its own container with a separate container for ssh, cron, app, web server, etc.
There are great theoretical arguments for having a process per container, but in practice, it's a bit of a nightmare to actually manage. Perhaps at extremely large scales that approach makes more sense, but for most systems, you'll want role-based containers (app, db, redis, etc).
You probably already have your servers set up by role, so this should be a pretty straight-forward transition. Particularly since you already have each system scripted in Ansible (or similar) right?
To run Docker in a safe robust way for a typical multi-host production environment requires very careful management of many variables:
- secured private image repo (index)
- orchestrating container deploys with zero downtime
- orchestrating container deploy roll-backs
- networking between containers on multiple hosts
- managing container logs
- managing container data (db, etc)
- creating images that properly handle init, logs, etc
- much much more...
This is not impossible; it can all be done, and several large companies are already using Docker in production, but it's definitely non-trivial. This will change as the ecosystem around Docker matures (Flynn, Docker container hosting, etc), but currently, if you're going to attempt using Docker seriously in production, you need to be pretty skilled at systems management and orchestration.
There's a misconception that using Docker in production is nearly as simple as the trivial examples shown for sample development environments. In real life, it's pretty complex to get right. For a sense of what I mean, see these articles, which get the closest to production reality of anything I've found so far but still miss many critical elements you'd need:
To recap, if you want to use Docker in production:
1. Learn systems administration
2. Ensure your current production systems are solid
3. Determine whether Docker's benefits justify the cost
4. Use role-based containers
Shameless plug: I'll be covering how to build and audit your own systems in more depth over the next couple months (as well as more Docker stuff in the future) on my blog. If you'd like to be notified of updates, sign up on my mailing list: https://devopsu.com/newsletters/devopsu.html
I forgot to mention, the Phusion guys (who Mitchell mentions in the post and who create the excellent Passenger web server) have created some great assets for Vagrant and Docker:
And regarding role-based containers, Phusion's Hongli Lai says:
"Wait, I thought Docker is about running a single process in a container?
Absolutely not true. Docker runs fine with multiple processes in a container. In fact, there is no technical reason why you should limit yourself to one process - it only makes things harder for you and breaks all kinds of essential system functionality, e.g. syslog.
Baseimage-docker encourages multiple processes through the use of runit."
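For the curious, a baseimage-docker image looks roughly like this (the service name and script are illustrative):

    cat > Dockerfile <<'EOF'
    FROM phusion/baseimage
    # runit supervises anything dropped into /etc/service/<name>/run
    ADD myapp.sh /etc/service/myapp/run
    RUN chmod +x /etc/service/myapp/run
    # my_init boots syslog, cron and all runit services
    CMD ["/sbin/my_init"]
    EOF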
> Docker runs fine with multiple processes in a container.
I think that, while most people realize this, it's important to highlight this fact again. Generally, the examples you see are "look, we can run this would-be-backgrounded-elsewhere daemon in the foreground in its own container!", when IMO this sets the bar a bit low for how people initially approach Docker. Containers have simplified my life significantly, and (again, IMO) their real power becomes evident when you treat them as logical units vs individual components.
That's a great post, thanks for that. I'm currently using basically Amazon * for everything and was looking into docker a little bit yesterday. For a two man team I think I will continue to stick with Amazon since my strength is not in system ops.
Just wanted to say that I wrote the 2nd article you mentioned, and I fall more into the category of "engineer that just wants to play with things" rather than "run a mission-critical business", so take my article with a grain of salt. Thanks for the good summary!
As someone just now learning how docker works, I absolutely agree.
Personally, I think there are some new-ish interesting immutability ideas that can be explored with regards to static files. It's not clear to me whether static assets (or even static sites) belong inside the container. I would be really interested in experienced folks' opinions on the immutability of the image. Where do you draw the line on what goes inside it and what's mounted in?
This is my general rule of thumb, but bear in mind it's only my approach. I'm not even going to claim that it's good. Just that it works for me...
Is it source or configuration? Then it's external to the container, either mounted at container runtime (typically always true for source code) or injected into the image at build time through the Dockerfile. Application source is almost always mounted; things like nginx/apache configuration files are nearly always injected during the build process.
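Concretely, the split looks something like this (the image name and paths are mine, just for illustration):

    # configuration injected at build time via the Dockerfile...
    cat > Dockerfile <<'EOF'
    FROM ubuntu:12.04
    RUN apt-get update && apt-get install -y nginx
    ADD nginx.conf /etc/nginx/nginx.conf
    EOF
    docker build -t myapp-web .

    # ...source mounted in at container runtime
    docker run -d -v /home/me/myapp/src:/var/www myapp-web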
I prefer this approach because my source is decoupled from the image, and if another developer wants a dev setup free from Docker, he/she can do so.
I prefer this source-controlled approach for config files because it allows me to keep the Dockerfile, the configuration files for various services, and a readme all beside one another in source control. This allows other developers to get a basic idea of the container's configuration (if they so desire), and also to modify that configuration if they want to tweak/alter a setup.
I see the configuration file approach being potentially a bad idea, but with the small group I currently work with, we're all fairly comfortable making configuration changes as necessary and communicate those changes to others effectively. I don't know how well that approach would hold up at scale.
I would do the same thing with static sites and files, I think. Why? The static site isn't part of the system; it's the product that runs on the system. Therefore, at least in my opinion, it should be as decoupled from that system as possible.
But, like I said, this is just my philosophy. I'm sure someone else will have an equally valid but totally opposite approach.
You don't have to choose between what's inside and what's mounted in. You can mark persistent directories (directories that should live longer than any given instance) as volumes with "docker run -v". Docker will arrange for them to live separately from the rest of the container filesystem. Then you can share volumes between containers with "docker run --volumes-from".
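For example (image and container names are placeholders):

    # data under /var/lib/postgresql outlives any particular container
    docker run -d -v /var/lib/postgresql --name db my-postgres-image

    # another container mounts the same volume
    docker run --rm --volumes-from db my-backup-image backup.sh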
> If you truly want to bypass learning the basics, then use Heroku or another similar service that handles much of that for you. Docker is not the answer.
IMO Docker is a no-ops thing in the same way Heroku is. Not because Docker is especially good at taking care of all this, but because Heroku isn't especially good either. Some things are taken care of by Heroku for free that aren't taken care of by Docker, and vice versa. One example: when you set up a Heroku app, the log history it keeps is very short. This could be a disaster. At least when Docker is set up (with most of the setups you find on GitHub) it keeps a reasonable amount of logs. The developer may not know how to get at them, but can learn that on the fly.
The other no-ops solutions often suck in some ways because of the difference between the needs of the companies that sell them and the needs of the developer. So even though Docker might have some problems, I'm not convinced that a naive Docker setup is worse than a naive PaaS setup.
Either way, the developer who doesn't get the basics right (not necessarily your whole list) is likely to be embarrassed at some point.
Vagrant was a huge step forward for managing vm environments, but I'm afraid its integration with Docker is forced and misguided.
For instance, the idea of an SSH provisioner does not jibe with Docker. The better approach is to run the container with a shared volume, and run another bash container to access that shared volume. If you are just starting to look at Docker, I would recommend using Vagrant to provision the base image and leaving the heavy lifting to Docker itself.
What you described is actually the default way the Docker provider works, which aligns with best practice for how Docker is used: you can launch a set of single-process containers that don't support SSH, but have volumes mounted, links, etc. Then you use `vagrant docker-run` to launch one-off containers with bash, or run scripts, etc. The first two videos and examples show this.
To be clear: see the first example Vagrantfile that is in the blog post. Then read down further and see `docker-run`. You can use that to launch another container to get a bash prompt. This is _exactly_ the workflow you describe.
We built it exactly for this. :)
We also support SSH-based containers, but you can see that it is explicitly opt-in (you must set "has_ssh" to "true"), also showing that it really isn't the normal way things are done.
Vagrant is about flexibility, and we surface both use-cases.
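Roughly, that workflow looks like this (container names are made up; the long-running containers come from the Vagrantfile):

    # bring up the containers defined in the Vagrantfile
    vagrant up --provider=docker

    # launch a one-off container for a shell or a script
    vagrant docker-run web -- bash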
Yeah, part of me was thinking that too. Use container volumes. Use Dockerfiles. And don't all containers support lxc-attach (as opposed to SSH)?
I didn't want to be negative -- it'd be great to be able to have an environment set up in 10 seconds with vagrantfiles -- so until then I'm trying to see the positives.
This looks promising, though the message "Warning: When using a remote Docker host, forwarded ports will NOT be immediately available on your machine" was a bit disappointing.
Given that it transparently wraps boot2docker and handles proxying SSH connections, I had hoped that it would also transparently manage the port forwarding to the host VM, as that's a much more common use case with Docker than SSH access.
I have wanted this since Docker was announced last year. In my eyes the biggest gain of Docker for development over VMs is boot time. Now I can turn all my Vagrant VBoxes to Docker containers, and work much faster. Thanks to all the maintainers for the hard work.
I did know about and had tried vagrant-lxc, and even the third-party vagrant-docker implementations. None of them worked well or provided the ease of use and features that Vagrant and Docker do.
Can someone explain to me - are containers based on a base OS, or are they capable of running on any OS? I see the 'run anywhere' taglines and it just doesn't make sense to me.
I imagine that docker containers have to be provisioned in some way, and if you're provisioning with `apt-get` then it's not going to work when deploying to a redhat OS.
Essentially, I understand Docker containers to be lightweight virtual machines rather than applications that can be deployed to anything running the docker service. Am I on the right track?
As long as the userland is supported by your kernel, you can run it within a container on that host. You build your custom containers off of a base container that has the initial userland in it already. This is normally the first line in a Dockerfile:
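For example (any base image/tag you have available will do):

    FROM ubuntu:12.04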
Containers contain processes, and Docker base images allow you to use yum/dpkg/apt in various containers. It doesn't matter what host OS you use, as long as you run a supported Linux kernel.
Although I've heard of Fig, I just took a closer look for the first time. From the docs, the Docker provider is a super-set of it. Meaning: Vagrant + Docker can do everything Fig can do and then some. I may have missed some detail, but at the very least all of the major pieces are there.
This is the best piece of news to start my Wednesday morning that I could've asked for. I've been slowly converting the entire team in the office over to Vagrant, and am using it for everything. I recently started playing with Docker and wanted to explore deployment of our web apps using it, and now I'll be able to slot it in to my existing workflow! Vagrant is amazing :)
Bit late to this game, but in theory would this let me do the following?
1. Set up a VM using CentOS to mimic my deployment environment
2. Distribute that to several people, including some running Windows and OSX, and have it automatically set up, with all parties reliably in exactly the same environment.
This particular post was a feature announcement, but Vagrant will indeed do that. For a summary, see the homepage under "Vagrant will change how you work," -- http://www.vagrantup.com/
I've personally used it for creating and iterating quickly on Puppet scripts. I've seen it recommended for devops with Chef also. See also: http://www.packer.io/ (for making your own gold master vm images; written in Go by the same guys)
Does Vagrant do any setup to allow you to run Docker within the dev container, say if you're working on a project that you wish to ship as a Docker container image?
You can "docker push" any binary image to index.docker.io.
However, if you don't use a Dockerfile, you won't be able to use the Trusted Build feature, and other users won't be able to verify which source the image was built from. So your image will remain a second-class citizen.
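For reference, pushing an arbitrary image looks something like this (names are placeholders):

    # turn a running container into an image under your namespace, then push it
    docker commit my-container myuser/myimage
    docker push myuser/myimage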
Is anyone else really offended by the name of their product? I mean why didn't they name it Gypsy or Hindu, maybe Eskimo or Shemale? So many groups out there just waiting to be further denigrated, trivialized and then commoditised. Fuck these guys.
It's nice, but it still doesn't fix the issue for people with mid-range machines who develop in a VM full time and want to be able to run a VM within that VM.
For example, with fairly old but still reasonable hardware, you cannot run VirtualBox inside an existing VirtualBox instance.
If you have a Windows box and develop full time in a Linux VM, you cannot run Vagrant inside that Linux VM, because unless you have a modern CPU it lacks the instruction set extensions required for nested virtualization.
Now, using Docker instead of a VM would work, but Docker only supports 64-bit operating systems, so anyone stuck with a 32-bit host OS still can't use Vagrant and has to resort to raw Linux containers without Docker, which is really cumbersome.
If you have a great dev box with a ton of ram then using docker or a VM is irrelevant. You can just set the whole thing up in a ram drive and things are nearly instant, assuming you're using virtualization for short lived tests on some provisioned server that matches your prod env.
With an older machine and a 32-bit OS (i.e. 2 GB of RAM) you can't do much more than run two 512 MB VMs side by side, or a single ~1 GB VM on its own, so it's a real letdown to see they decided to use Docker instead of plain LXC, which does work with a 32-bit OS.