Docker not ready for primetime (goodstuff.im)
490 points by erlend_sh on Aug 28, 2016 | 281 comments



I have run Docker in production at past employers, and am getting ready to do so again at my current employer. However, I don't run it as a native install on bare metal. I prefer to let a cloud provider such as Google deal with the infrastructure issues, and use a more mature orchestration platform (Kubernetes). The author's complaints are valid, and the Docker team needs to do a better job on these issues. Personally I am going to be taking a close look at rkt and other technologies as they come along. Docker blew this technology open by making it more approachable, but there is no reason to think they are going to own it. It's more like databases than operating systems.


I'm trying to get started with rkt right now, but its (understandable) lack of maturity is a bit daunting; I think some usability issues need to be handled or offloaded to other tools. And acbuild severely needs caching built in.

Disclaimer: I'm new to containers, so if it sounds like I'm doing something wrong, let me know; it certainly feels like I'm missing something right now.


You'll see maturity in this space after the Open Container Initiative standardizes the container image format. Then, developers can focus on UX improvements rather than worrying about what the build tool should even be producing.


Reliance on initiatives and standards bodies to provide guidance for maturity, at least in recent years, is a fool's errand. They usually end up rubber-stamping what's been the norm and working back from there.


That's true, OCF is more or less the same as the Docker container format.


If you are looking at rkt and have the time, take a look at Kurma (open-source too). Kurma was built using the same specification rkt was built from. Here is the getting started guide: http://kurma.io/documentation/kurmad-quick-start/ What I like about Kurma is that I can simply run Docker images from the hub: no need to download, convert, etc.

Disclaimer: I am an Apcera employee and Kurma is an open-source project sponsored by Apcera


We use it in production.

It generally works if:

* you don't use it to store data

* don't use 'ambassador', 'buddy', or 'data' container patterns.

* use tooling available to quickly and easily nuke and rebuild docker hosts on a daily or more frequent basis.

* use tooling available to 'orchestrate' what gets run where - if you're manually running containers you're doing it wrong.

* wrap docker pulls with 'flock' so they don't deadlock

* don't use swarm - use mesos, kube, or fleet (simpler, smaller clusters)


woah, wait.....

It "geneerally works if" you " rebuild docker hosts on a daily or more frequent basis."

Perhaps I'm misunderstanding, but needing to rebuild my prod env several times a day seems pretty "not ready for prime time" to me.

That's like when we'd say that Rails ran great in production in 2005, as long as you had a cron task to bounce fastCGI processes every hour or so.

So, can you elaborate on why rebuilding the containers is good advice?


The security exec at Pivotal, where I work, has been talking about "repaving" servers as a security tactic (along with rotating keys and repairing vulnerabilities).[0]

The theory runs that attackers need time to accrue and compound their incomplete positions into a successful compromise.

But if you keep patching continuously, attackers have fewer vulnerabilities to work with. If you keep rotating keys frequently, the keys they do capture become useless in short order. And if you rebuild the servers frequently, any system they've taken control of simply vanishes and they have to start from scratch.

I'm not completely sold on the difference between repair and repave, myself. And I expect that sophisticated attackers will begin to rely more on identifying local holes and quickly encoding those in automated tools so that they can re-establish their positions after a repaving happens.

But it raises the cost for casual attackers, which is still worthy.

[0] https://medium.com/built-to-adapt/the-three-r-s-of-enterpris...


Having everything patched as soon as patches are available (or within, say, 6 hours of availability, for "routine" patches, with better responsiveness for critical patches) is a win.

The rest: not so much.

Rebuilding continuously for security is not something I would recommend.


> Rebuilding continuously for security is not something I would recommend.

So that I understand, could you elaborate?

Particularly, do you mean "not recommend" as in "recommend against" or "not worth the bother"?


It's not worth the bother. Apart from keeping patches up to date --- which is a good idea --- it's probably not really buying you anything.

It's not crazy to periodically rotate keys, but attackers don't acquire keys by, you know, stumbling over them on the street or picking them up when you've accidentally left them on the bar. They get them because you have a vulnerability --- usually in your own code or configuration. Rebuilding will regenerate those kinds of vulnerabilities. Attackers will reinfect in seconds.


A lot of companies do lose their keys that way: www roots, gists, hardcoded in products, GitHub history, etc.

The win to rotating them is not so much because you'll be regularly evicting attackers you didn't know had your keys, but because when you do have a fire, you won't be finding out for the first time that you can't actually rotate them.

It also forces you to design things much more reliably, which helps continuity in non-security scenarios.

After redeploying and realizing that Todd has to ssh in and hand edit that one hostname and fix a symlink that was supposed to be temporary so the new version of A can talk to B, that's going to get rolled in pretty quickly. Large operations not doing this tend to quickly end up in the "nobody is allowed to touch this pile of technical debt because we don't know how to re-create it anymore" problem.


It seems like it's good to be able to rebuild everything at a moment's notice after patching against a major exploit, though. You should have a fast way to rebuild secrets and servers after the next heartbleed-scale vulnerability.


Being able to rebuild critical infrastructure from source, and know that you'll be able to reliably deploy it, is a _huge_ win for security.

After a bunch of harrowing experiences with clients, I'm pretty close to believing "using packages for critical infrastructure is a bad idea".


> Being able to rebuild critical infrastructure from source, and know that you'll be able to reliably deploy it, is a _huge_ win for security.

In that case, you might be interested in bosh: http://bosh.io/docs/problems.html (the tool that enables the workflow jacques_chester was describing). It embraces the idea of reliably building from source for the exact reasons you've mentioned.


I'm confused now, earlier you recommended patches over rebuilding continuously from source, but this seems like the opposite?


What does "packages" mean here? Sorry.


My guess is that "packages" is shorthand for "binary packages", as opposed to being able to redeploy from source.


Nod.


I'm guessing they meant to write "patches".


That was my hunch too. Thanks. I'll ask more about whether I missed something on the other side of the argument.


Depends. Usually you have to be able to re-build your prod infra within minutes, or at most hours; otherwise you are doing devops wrong. The whole point of automation is reproducible infrastructure that you can stand up quickly. With a stateless approach you can just do this. Why would you do that? Imagine an outage in one of the 3 datacenters you are running your infra in within the same region. You need to move 1/3 of the capacity to the remaining 2 datacenters. This is not too different from re-building it.


Aaah, the classic "you're doing <it> wrong" argument. I can come up with dozens of different environments where it is simply not feasible to rebuild an environment within two hours.

- Any infrastructure with lots of data. Data just takes time to move; backups take time to restore.

- You're on bare metal because running node on VMs isn't fast enough.

- You're in a secure environment, where the plain old bureaucracy will get in the way of a full rebuild.

- Anytime you have to change DNS. That's going to take days to get everything failed over.

- Clients (or vendors) whitelist IPs, and you have to work through with them to fix the IPs.

- Amazon gives you the dreaded "we don't have capacity to start your requested instance; give us a few hours to spin up more capacity"

> Imagine an outage in one of the 3 datacenters you are running your infra in the same region. You need to move 1/3 of the capacity to the remaining 2 datacenters.

Oh, this is very different. If your provider loses a datacenter, and your existing infrastructure can't handle it, you're already SOL - the APIs for spinning up instances and networking are going to be DDoSed to death by all of the various users.

Basic HA dictates that you provision enough spare capacity that a DC (AZ) can go down and you can still serve all of your customers.


I mostly disagree with your points, with the exception of the last one.

I used to work in the team that runs Amazon.com. All of the systems serving the site can be re-built within hours, and nothing can serve the site that cannot be rebuilt within a very tight SLA. However, I understand that not all companies have this requirement. This capability is only relevant when site downtime hurts the company too much to be allowed.

Responding to your points:

- Lots of data -> use S3 with de-normalized data, or something similar

- Running a VM has 3% overhead in 2016; scalability is much more important than single-node performance

- High security environments are usually payment processing systems, downtime there can be a bit more tolerated, delaying transactions is ok

- Amazon uses DNS for everything, even for datacenter moves. It is usually done within 5 minutes

- This is a networking challenge, using something like EIP (where the public facing IP can be attached to different nodes) makes this a non-issue

- Amazon has an SLA, they extremely rarely have a full region outage, so you can juggle capacity around

Losing a dc out of 3 requires work not because you can't handle the load, but because you want to restore the same properties (the same extra capacity, for example) as before. Spinning up instances should not DDoS anything; it should put a constant load on the supporting infrastructure.

The last point I agree with.


First, two important assumptions I'm making when I say this (and I feel they are reasonable assumptions): I'm not just talking about bringing a production environment back up in the same or an adjacent AZ; I'm talking about true DR, where you're moving regions. I'm also not limiting my discussion to AWS' infrastructure - not with Google, Rackspace, Cloudflare and others in the space as well.

> Lots of data -> use S3 with de-normalized data, or something similar

S3's use case does not match up with many different computing models (hadoop clusters, database tables, state overflowing memory), and moving data within S3 between regions is painful. Also, not all cloud providers have S3.

> Running a VM has 3% overhead in 2016, scalability is much more important than a single node performance

Not when you have a requirement to respond to _all_ requests in under 50ms (such as with an ad broker).

> High security environments are usually payment processing systems

Or HIPAA, or government.

> delaying transactions is ok

Not really. When I worked for Amazon, they were still valuing one second of downtime at around $13k in lost sales. I can't imagine this has gone down.

> Amazon uses DNS for everything, even for datacenter moves. It is usually done within 5 minutes

Amazon also implements their own DNS servers, with some dynamic lookup logic; they are an outlier. Fighting against TTL across the world is a real problem for DR type scenarios.

> EIP (where the public facing IP can be attached to different nodes) makes this a non-issue

EIPs are not only AWS-specific, but they cannot traverse regions, and they rely on AWS' API being up, which has not historically always been the case.

> they extremely rarely have a full region outage, so you can juggle capacity around

Not always. Sometimes you can, but not always. Some good examples from the past: anytime EBS had issues in us-east-1, the AWS API would be unavailable. When an AZ in us-east-1 went down, the API was overwhelmed and unresponsive for hours afterwards.

> Spinning up instances should not DDOS anything, it is with constant load on the supporting infrastructure.

See above. There's nothing constant about the load when there is an AWS outage; everyone is scrambling to use the APIs to get their sites back up. There's even advice to not depend on ASGs for DR, for the very same reason.

AWS is constantly getting better about this, but they are not the only VPS provider, nor are they themselves immune to outages and downtime which requires DR plans.


"Any infrastructure with lots of data. Data just takes time to move; backups take time to restore."

Exactly. Don't put data in Docker. Files go in an object store, databases need to go somewhere else.


> Any infrastructure with lots of data.

OP's first point is 'don't put data in docker'. Docker is not for your data. But more to the point, if you're rebuilding your data store a couple of times every day, a couple of hours downtime isn't going to be feasible.

> You're on bare metal because running node on VMs isn't fast enough

In such a situation, you should be able to image bare metal in less than 2 hours. dd a base image, run a config manager over it, and you should be done. Small shops that rarely bring up new infra wouldn't need this, but anyone running 'bare metal at scale' should.

> bureaucracy

Isn't part of the infra rebuild per se.

> Anytime you have to change DNS. That's going to take days

Depends on your DNS TTLs, but this is config, not infra. Even if it is infra, 48-hour DNS entries aren't a best practice anymore (and if you're on AWS, most things default to a 5-minute TTL).

> Clients (or vendors) whitelist IPs, and you have to work through with them to fix the IPs

I'd file this under 'bureaucracy' - it's part of your config, not part of your prod infra (which the GP was talking about).

> Amazon gives you the dreaded...

Well, yes, but this is on the same order as "what if there's a power outage at the datacentre". Every single deploy plan out there has an unknown-length outage if the 'upstream' dependencies aren't working. "What if there's a hostage event at our NOC?" blah blah.

The point is that with upstream working as normal, you should be able to cover the common SPOFs and get your prod components up in a relatively short time.


> OP's first point is 'don't put data in docker'. Docker is not for your data.

I agree, but I (and the GP, from my reading) was not speaking about only Docker infrastructure.

> Isn't part of the infra rebuild per se.

I can see your point, and perhaps these points don't belong in a discussion purely about rebuilding instances. That said, I have a very hard time focusing just on the time it takes to rebuild capacity when discussing a DC going down; there are just too many other considerations that someone in Operations must weigh.

When I have my operations hat on, I consider a DC going down to be a disaster. Even if the company has followed my advice and the customers do not notice anything, we're now at a point where any other single failure will take the site down. It's imperative to get everything that went down with that DC back up, and that's going to take more than an hour or two.


I know you probably aren't trying to address all cases, but just because you can't re-build your prod infrastructure in minutes or hours doesn't mean you aren't doing devops right.

Many larger companies can't do this; my company has 70+ datacenters with tens of thousands of servers. We can't re-build our prod infra in minutes or hours. We are still doing devops right :D

Like I said, I know you aren't talking about my situation when you made your statement... I just get frustrated when people act like there are hard and fast rules for everyone.


Well, I am not talking about a 1M-node outage. That you cannot fix with anything. I am talking about at most a datacenter-wide outage, which actually happens pretty often. Amazon has game days, Netflix has chaos monkeys for the same reason. Make sure that you can rebuild parts of your infra pretty quickly.


As long as your VMs are prebaked and your cloud provider supports ASG-esque primitives (i.e. "I need X instances running at a time, instantiate with such and such metadata"), anyone can rebuild their prod infra quickly. You don't need Docker or containers to do that.
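For concreteness, here is a minimal sketch of that kind of primitive using the AWS CLI; the group and launch-configuration names, the AMI ID, the instance type, and the subnets are all placeholder assumptions, not anything from this thread:

    # Bake an AMI with the app on it, then ask the ASG to keep N copies alive.
    # Every identifier below is a placeholder.
    aws autoscaling create-launch-configuration \
        --launch-configuration-name app-lc-v42 \
        --image-id ami-12345678 \
        --instance-type m4.large

    aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name app-asg \
        --launch-configuration-name app-lc-v42 \
        --min-size 3 --max-size 3 --desired-capacity 3 \
        --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"

The ASG then replaces any instance that dies, which is the "rebuild quickly" property being described here, no containers required.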


Some of us work for those cloud providers :)


I have no doubt ;) it's why I always strive for accuracy here!


Oh, of course. Our datacenters often go offline (both planned and unplanned), and we are always ready to handle that. We are pretty much constantly re-provisioning servers... with so many physical machines, hard drive and other hardware failures are a daily occurrence.


I'm going to have to call bullshit. Where do you work, or what company do you own?

I work at MindGeek; depending on the time of year, it would be fair to say we rank within the top 25 bandwidth users in the world. We are not even close to that number of servers, and we deal with some of the largest traffic in the world. What company is running in 70+ datacenters!? The world's largest VPN provider? A security company providing all the data to the NSA?

Maybe it is just my broad assumptions, but I would hope that the big 10 that come to mind, such as Google, Amazon, Microsoft, etc., would be able to rebuild their production regions in hours.


Not the OP, but the first thing that comes to mind as to why you'd need a lot of datacenters is CDNs like Cloudflare or Akamai. For stuff like that, you need lots of servers, lots of storage, and low latency. You'd also need a good number of configurations, because things like video streaming would require different server settings than, say, protecting a site that's being DDoSed.


Ding ding ding :D

I don't work for one of those two, but I do work for a very large CDN.


> I work at MindGeek, depending on the time of the year it would be fair to say we rank within the top 25 bandwidth users in the world.

"I'd have thought I've had heard of them..."

One Wiki search later... yup. I've heard of them.


Ha! I went to their website... Hmm... never heard of them. Went to Wikipedia... Oh, now I know who they are! Yeah. Lots of bandwidth.


>What company is running in 70+ datacenters!?

People running edge networks, who therefore need servers local to everywhere in the world to keep latencies down. Maybe it's not as much of an issue for MindGeek (the parent company of a lot of video streaming sites). I would guess you guys need a lot of throughput, but latency isn't so much of a problem. Or you simply don't need to serve some parts of the world where it might be illegal to distribute some types of content.

FWIW, Cloudflare has 86 data centers: https://www.cloudflare.com/network-map/


Or they are customers of CDNs.


[Note: MindGeek = the pornhub network]

Short version: That's a video streaming website, which is rather simple, yet bandwidth-intensive.

Outsourcing the caching and video delivery means MindGeek can get by with few servers and a few locations.

Nonetheless, the CDN you're outsourcing to does need a lot of servers at many edge locations.

Actually, if we think in terms of "top bandwidth users in the world", it's possible that your company is far from being on the list. It's likely dominated by content delivery, ISP, and other providers, most of which are unknown to the public.


Would you say Netflix & YouTube are rather simple? Handling 100+ million users is never rather simple...


YouTube had the challenge of being one of the first streaming services, and it's operating at an unprecedented scale. I am actually wondering how many orders of magnitude more traffic YouTube has than Pornhub.

I am in distributed systems and try to work exclusively on hard problems. So when I say "simple", that is biased toward the high end of the spectrum.

If you go to pornhub.com and look at "popular keywords", you'll only find thousands or tens of thousands of videos. In a way, there is not that much content on Pornhub.

All major websites have challenges. Pornhub is a single-purpose website, and a lot of the challenge is in video delivery, which can be outsourced to a CDN nowadays.

"Simple" is maybe too strong a word. I am trying to convey the idea that it has limited scope and [some of] the problems it's facing are understood by now and have [decent] solutions.

That's not to say it's easy ;)


I work for a large CDN.


I think the point of the comment you're replying to is not really what you're addressing. Having the ability to rebuild quickly and frequently is great, and something you should aim for, but actually being forced to do it regularly whether you need it or not is pretty bad.

Edit: didn't notice someone else had already said this, opened this tab like 15 minutes ago


I see your point. The force that makes you do it should not come from Docker, I agree. :) We are just lucky that we can do it when it happens.


"have to be able" is very different from "have to"

Sure, you want to be able to deploy quickly. But if there's no reason to, then don't.

And I would be very scared if Docker images had a 1 day uptime max


Can != should


We see a couple of different bugs that are best solved by simply rebuilding the container host. To Docker's credit, these tend to decrease with newer versions.

We also see them mostly in non-prod environments where we have greater container/image churn. We use AWS autoscaling and Fleet, so containers just get moved to other hosts when we terminate them. We have actually thought about scheduling a Logan's Run-type job that kills older hosts automatically - it's in the backlog.


Bugs that can be solved by a rebuild are not restricted to Docker. We had an interesting week when the build was red all the time for various reasons, and then prod started failing. Usually we deploy once a day, and not deploying for a week caused several small memory leaks to turn into big ones.


> So, can you elaborate on why rebuilding the containers is good advice?

While I sincerely hope I'm wrong, I assume it's because you reset the clock on the probability something goes very wrong.


The "have you tried turning it off and on again" of DevOps. It makes a surprising amount of sense though, as long as your service is truly stateless, the restart can be easily orchestrated, and it results in no difference in operational costs.


If it's stateless, then why does rebuilding it change anything about the frequency of bugs popping up?


Ooh, I can answer this one: ask people if their container root is writable, and be amused at the blank stares you get back.

I am currently fighting an ongoing battle at work to point out that the plans for our Mesos cluster have not factored in that the first outage we have will be when someone fills up the 100GB OS SSD, because no one's given any thought to where the ephemeral container data goes.


I am a layman to devops. By "ephemeral container data" do you mean temporary files created by the service, temporary files created by the OS / other applications, or something else?


if both the code and the infra it's running on is stateless, then yeah.


So we're replacing somewhat not fully stable VMs running on somewhat not fully stable virtualization infrastructure with theoretically stable containers running on violently unstable container infrastructure?


Pretty much, yes.


Mutability is the root of all computing evils


And the source of all computing value.


The host isn't the container itself. They want to re-provision the host likely not because of something wrong with the application, but because Docker is in some state which is non-recoverable, or at least not recoverable by automatic means.


Because then you know you can always rebuild automatically, and that's being tested constantly while developers work, a bit like how Netflix randomly crashes things all the time to ensure it can always automatically recover from every dependency.

It also naturally rewards optimizing around time-to-redeploy, probably a lot of benefits there.


" use tooling available to quickly and easily nuke and rebuild docker hosts on a daily or more frequent basis."

That's like "my car works fine as long as you spray WD-40 into the engine block every 48 hours..."


Of course, since we're developers, we will automate this task with a 3D-printed WD-40 injector plus filling level indication for the dashboard.


Where can I read more about the container patterns you mentioned?


These are some good curated books: https://msdn.microsoft.com/en-us/library/dn568099.aspx http://shop.oreilly.com/product/0636920023777.do

These are not necessarily container centric.


http://static.googleusercontent.com/media/research.google.co... is a collection of more advanced container patterns you might be interested in.


google :)


That's a lot of ifs that could go away if Docker put some more emphasis on quality.


You shouldn't need to wrap pulls with flock anymore; the pull code got completely rewritten and should be fine now.

Also, hosts shouldn't need to be rebuilt THAT often. Of course, your infrastructure should automatically nuke and replace failing hosts (and there are various ways to establish what's "failing"), and like any good infrastructure team you should be keeping your hosts up to date with security patches, etc., which is best done by replacing them with new versions (as then you can properly test and apply a CI lifecycle), but docker itself is stable enough these days that you don't need to be that aggressive with nuking.

Agree (especially with the comments about orchestrating with kube/mesos) with all the other points, though.


Can you elaborate on:

"you don't use it to store data" - What is wrong with Docker volumes specifically? What issues did you run into?

"wrap docker pulls with 'flock' so they don't deadlock" - I have never had a problem with docker pull, can you elaborate on this?


I've been running Docker in production for a year and both of these issues were very real.

I got bit hard by this issue:

https://github.com/docker/docker/issues/20079

when CoreOS stable channel updated Docker. All my data volume containers broke during the migration and migration could not be reverted.

As for the second issue: when two systemd services that pulled containers with the same base images started simultaneously, they would deadlock pretty reliably. Had to flock every pull as a result. This might be fixed by now though.
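For anyone wondering what that workaround looks like in practice, here's a minimal sketch of serializing pulls with flock(1); the lock file path and the image name are placeholders of my own, not anything from the thread:

    # Serialize concurrent pulls on one host: flock holds an advisory lock on
    # the (arbitrary) lock file for the duration of the docker pull, so two
    # units starting at the same time queue up instead of racing the daemon.
    flock /var/lock/docker-pull.lock docker pull registry.example.com/myapp:latest

    # In a systemd unit, the same idea as an ExecStartPre line (hypothetical):
    # ExecStartPre=/usr/bin/flock /var/lock/docker-pull.lock /usr/bin/docker pull registry.example.com/myapp:latest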


Yikes that issue has been open since February?!


I'm echoing /u/bogomipz' request for elaboration. I rely heavily on docker-managed data volumes for data. The advice in these threads of comments is alarming. Should I conclude that hackernews devs don't understand docker-managed volumes, or is there some glaringly obvious downside that I'm failing to see?


I'm curious about the "buddy" and "data" container patterns -- I didn't see those in the documents you linked to. Why do you mention these as being best avoided?


May I ask: what dragons did you find that make you want to rebuild docker hosts daily? Do you do this just to properly clean up unused containers/images and such?


I spent close to 12 hours yesterday trying to get a fairly simple node app to run on my mac. Turned out I had to wipe out docker completely and reinstall. Keep in mind this is their stable version that's no longer in beta. I've just run into too many documented bugs for me to consider it stable. I wouldn't even say it should be out of beta.

The issues here are the real telling story: https://github.com/docker/for-mac/issues

I love Docker; it's amazing when it works. It's just really not there yet. I get that their focus is on making money right now, but they need to nail their core product first. I honestly don't care about whatever cloud platform they're building if their core app doesn't even work reliably.


I'm a fan of Docker in general but I'm amazed by some of the poor choices they've made with Docker for Mac.

- Stable? That's honestly laughable. It is nowhere near stable. As an example, I was trying to upload images to a third-party image registry, but the upload speed was ridiculously slow. It took me forever to figure out, but it turned out I needed to completely reinstall Docker for Mac.

- They had a command-line tool called pinata for managing daemon settings in Docker for Mac. They chose to get rid of it. Not only did we lose a way to declaratively define and set configuration, but the preferences window has nowhere near all of the daemon settings that are available.

- The CPU usage is still crazy. I regularly get 100% CPU usage on 3 of my 4 CPU cores while starting up just a few containers. Even after the containers have started, it will idle at 100% on 1 of 4 cores.

- It needs to be reinstalled regularly if you are using it on a daily basis. Otherwise it will get slower and slower over time. See my first complaint.

- The GUI (kitematic) will randomly disconnect from the daemon forcing me to restart the GUI repeatedly.

- They really need some sort of garbage collector with adjustable settings. With the default settings the app will just keep building and building images and eventually fill up, crash, slow down, etc. How is that acceptable? What other apps do that?

Like I said, I like Docker in general. I think they are tackling some very hard problems and definitely experiencing some growing pains from such crazy growth. However, at some point they need to take a step back, focus on the core of what they offer, and make it as simple and rock-solid as possible. As another example, they still haven't added a way to compress and/or flatten Docker images. No wonder Docker for Mac slows down after regular use when it's building 1GB+ images for simple things.
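In the meantime, the usual stand-in for that missing garbage collector is a periodic cleanup job along these lines; a sketch using 1.12-era CLI flags, run from cron or by hand (each command just errors harmlessly if there is nothing to remove):

    # Remove stopped containers, then dangling (untagged) image layers.
    docker rm $(docker ps -a -q -f status=exited)
    docker rmi $(docker images -q -f dangling=true)

    # Orphaned volumes pile up too (the volume subcommand exists since 1.9).
    docker volume rm $(docker volume ls -q -f dangling=true)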


Not 100% sure, but I think there's a memory leak in hyperkit. Eventually the memory usage will grow to fill up the allocated space, then Docker will crash. It might be something else causing it, but that's just what I've observed.

There's also the Docker.qcow2 file ballooning in size. The only way to fix it is to do a "factory reset" or run a couple of commands to clear out old images.


Not sure where the leak is, but I can confirm there is definitely one there. For me it happens whenever I restart Docker for Mac: 500+ MB of usable RAM gone. Combined with the fact that we don't reboot our Macs very often and that Docker for Mac needs frequent restarts because of hogging CPU otherwise, that's a bit of a problem.


Yes, agreed 100%.

Also, the CoW disk volume they use basically grows unbounded, even if you purge layers and old containers.


Heads up -- I think you messed up your copy/paste.


That, or the issues here are the real telling story.


I'd like to point out that Docker or no Docker, Node is one of the worst languages to host applications for: custom SELinux rules to get around it executing memory on the stack, memory- and CPU-hungry applications, npm is an absolute nightmare, and you have people that don't understand programming writing 'code' they think is production-worthy.


This is because Docker wasn't written for Apple's sad excuse for BSD; it was written for Linux.

It's cute that they're trying to port it other places, but if you're trying to run with stability, don't use a Mac, use a Linux box.

Why they did this, I have no idea, but it's a horrible idea.

If nothing else use VirtualBox, install Linux, and use docker that way.


Only, Docker has the same issues on Linux -- in fact, most of the people complaining in this thread use it on Linux in the first place.


One of Docker's biggest problems is that internally they have fomented a culture of "users are stupid" which is immediately apparent if you interact with their developers on GitHub.


Docker founder here.

It makes me sad that you believe that. We're not perfect but we try very hard to keep our users happy.

Are there specific issues that you could point out, so that I can get a sense of what you saw that you didn't like?

Keep in mind that anyone can participate in github issues, not just Docker employees, and although we have a pretty strict code of conduct, being dismissive about another participant's use case is not grounds for moderation.

EDIT: sorry if this came across as dismissive, that wasn't the intention. We regularly get called out for comments made by non-employees, it's a common problem on the github repo.


Look dude, this is hard. Most of the people here are rooting for you, even the ones complaining. I'm on your side.

But this isn't about you, or your feels, or what you think is going on.

All that's happening is people are trying to communicate with you and you're not listening. You're doing your best, I don't doubt that, but you've got to step back, regroup, and come at the problem from another angle.

Don't get defensive, don't make excuses. "When you make a mistake, take off your pants and roll around in it." ;-) Give people the benefit of the doubt that they mostly know what they're complaining about (even if they don't.)

It's a pain-in-the-ass, but it's the only way to deal with this kind of systemic (mis-?)perception in your community.


I have said one form of this or another to Solomon regarding Docker the company and his own personal defensiveness on two or three separate occasions on this account alone[0]. So while I laud your effort, I don't think much is going to change on this front. Honestly, that's too bad and I'm not being snarky or shitty here; I genuinely wish Docker and Solomon would work on this stuff because it'd go a long way. I'm discouraged to even bring stuff up any more.

I don't even think it's intentional, really, but I have never once seen a response by Solomon to criticism of Docker that did not dismiss the content and messenger in some way. These things are hard to get right and I'm not perfect, so I don't even really know what advice to offer and I'm far from qualified. I am very frequently dismissive of criticism as well and have had to put a lot of work into actively accepting it, so I can at least understand how hard it is.

I outright told him his initial reaction to Rocket, to use an example, directly caused me to plan for a future without Docker. Companies are defined by their executives, and a lot of Docker's behaviors become clearer when you consider some of the context around Solomon's personal style.

[0]: the most productive example being https://news.ycombinator.com/item?id=8789181


I feel a little badly about the exchange now, having more of the context available.

I told my sister last night, jokingly, "I think I made a co-founder of Docker cry on HN today." (I also explained what Docker and Hacker News are a little. She likes me so she didn't call me a nerd to my face.)

Honestly, I was just amused at the reply evidencing (or seeming to) the very attitude the OP was complaining of, it was incredible.


It is your right to ignore nonpaying users, but do so at your own peril. Red Hat is a perfect example of this. They started losing their lead in the Linux market as soon as they quit directly supporting the desktop version of Red Hat Linux. Fedora was a poor replacement, and it opened the door for Ubuntu. What Red Hat didn't realize was that even though these nonpaying users seemed like a drain on resources, they were the next generation of hackers who would have recommended, or even insisted upon, Red Hat Enterprise at their future jobs and would have brought with them the expertise to roll it out.

Docker may be on top now but if you don't cater to the needs of your nonpaying users, it will be short lived. This includes taking seriously feature requests which don't necessarily serve enterprise.

You can find me on GitHub.


Why don't you just ignore people like this? There's a whole thread about technical issues upthread, and you respond to some random guy's emotional problems?

If there is a problem with the way the Docker open source project is run, it is with the amount of small but serious bugs that affect subsets of your users. This is probably due to you guys overextending.

Responding that you are saddened by this guy's issue with his perception of the 'culture' in your team is just craziness. No one will be helped by this. The only thing that will help is showing that you guys are running a tight ship. Maybe hire an extra person or two to maintain your GH issue garden, and proceed with making Docker the successful business we all believe it can be.


FWIW, I read this comment and didn't find it dismissive at all. shykes indicated this isn't the experience the Docker community is striving for and asked for specific instances so that he could gain more context and potentially address the problem.

I have seen many instances of individuals with no affiliation to an open-source project being negative or rude and therefore giving a negative impression of the community or parent company. As a project owner or maintainer this is a really difficult thing to see happening, and can be difficult to address and prevent. So I personally think shykes makes a valid point. I wouldn't take this as dismissive. Give him the benefit of the doubt.

I also think we should cut maintainers a bit of slack. Imagine you are managing an open source project with over 10,000 logged issues and even more in the community forums. It is draining to spend all day dealing with complaints and issues, and it can be a thankless job. Maintainers often try to put their best foot forward, but it isn't easy; they are human, and they make mistakes and say things they regret. I'm not saying it is justified, just try to put yourself in their shoes.

I've personally logged a few issues with the Docker team and have also found the interactions to be respectful. There are times when I've asked for features and they have certainly played devil's advocate, but that is to be expected with any project that is trying to prioritize its work and constantly fighting feature creep.


Do you realize you've just provided an example?

The third paragraph particularly is spectacular, as you manage to be dismissive about being dismissive.

Incroyable.


Sorry, that wasn't the intention. I tried to clarify with an edit in the original post.

It looks like now is not a good time for me to participate in this discussion. Instead I'll make a list of things we can improve so that the next HN post about Docker is a more positive one.


naaaaaaw! Don't squander this golden opportunity to connect with your peeps! You can turn this around, but you've got to "walk the talk" here.

You should just be like, "Hey folks, we're sorry as hell we let you down. I'm here now and I'm listening. What has made you sad, and what can we improve?"

People love that sh--stuff.

That's just my advice. I don't even use Docker. I just like to see warm-fuzzy success, eh?


I'm not the original commenter, but the "--insecure-registry should be on docker pull" issue is one that impacted me personally. It was maddening reading all the comments from frustrated users who were running into the same problems I was: https://github.com/docker/docker/issues/8887


I think this is one of many instances where Github's issue system doesn't enable good communication between developers and end users.

- You shouldn't have to read through pages of posts to find potential workarounds.

- You shouldn't have to count comments to guess if there are a lot of people affected by an issue.

- You shouldn't have to read comments, then wonder which posts were an official developer response or a comment from another end user.

- There should be some way to communicate easily that "this issue has our attention, but dealing with it is de-prioritized because ..."

In this instance, you can see it's a security issue and it will be tough to convince Docker to change it, but at this point it's two years later and people are still griping about it, so maybe it's time to put some attention on it. There are a few suggestions for seemingly reasonable updates, but nobody is championing any of them, presumably because there's no indication other than comment activity that Docker will consider any updates on this issue whatsoever.

I'm not sure myself what the right thing to do from Docker's perspective would be, but this is clearly one of those issues that has worn through the attention span of the development team. Someone either needs to stand up and say "we're not changing anything here" or "we're looking for a solution" - no bug report should be open that long.


"Like @ewindisch already said, we do not want to encourage this client-side behavior. The pain induced by requiring the flag as a daemon flag, is so that people actually set up TLS on their registry. Otherwise there's no incentive. Thanks for your understanding."

This was a reply by @tiborvass in https://github.com/docker/docker/issues/8887

Can you possibly get any more dismissive?


[flagged]


Dude, you literally had the founder of Docker asking you where the opinion came from that devs don't care about users. He or she just pointed out that oftentimes non-employees talk on GitHub issues, and to just remember Docker can only be responsible for employees' conduct, not that of others. You've read into this that

1) you have been called a mouth breathing moron

2) it is confirmed that docker is run by assholes

3) the founder does not care about users (by asking for examples of poor conduct by Docker employees)

4) docker is run by a lousy leader who cannot demonstrate care, again by asking where people get a sense of docker not caring.

Internet comments lack a lot of tone so its easy to misread, but when there's this kind of fanatic backlash when there's a direct connection to the founder of a pretty big service, the luster of any of these sites dims considerably. You have given nothing actionable to the person who could have done something positive for docker, and just made it that more difficult for the organization to take legitimate criticisms at face value.


Well, that's an incredibly cynical and negative way to read shykes' words. FWIW, I thought his post was fine.

A more constructive response than yours would be to post examples of where Docker has failed in community engagement, rather than just re-stating his post with the most negative spin possible.


Hi,

Can you please add Facebook connectivity to Docker?

Thanks.


They more or less ignored the file sync issue of Docker for Mac, until there was a push asking the Docker devs for more transparency. There were other issues with the release, and where people did not push for it, issues would languish with no one from the dev team saying anything. It sometimes feels like a culture of coverups and shame. (Let's hide this under here and maybe no one will notice).

If people are feeling the "users are stupid" attitude, combined with what looks like a coverup culture, I find Docker's blog posts less and less credible.

For example, there was a recent blog post about an independent security review comparing Docker to other similar technologies, including rkt. The conclusion of the post was that Docker is secure by default. (I'll leave it to the reader's own opinion whether that is true or not; this is not the issue I am pointing out.)

What is so weird is that, nearly two years after the fact, the tone in that blog post was still as if CoreOS had betrayed them. And while I get there are hurt feelings involved, this is not a high school popularity contest. When combined with arrogance and coverup culture, I can see Docker moving in a direction that drives them further and further from relevancy.

Based on what I'm hearing here about Docker Swarm, Swarm is basically great advertising for K8S or Mesos. No one there is pretending that multi-node orchestration is an easy problem. If the divergence from the community continues, I can see lots of people getting fed up and going over to rkt, or something else that works just as well.


> They more or less ignored the file sync issue of Docker for Mac

Which particular problem? The crappy performance/CPU usage one(s), or something else? Having used various different approaches (docker in vmware linux/virtualbox), the d4m (osxfs?) one seemed to be the least broken for general dev stuff.

Hopefully not more issues I need to keep a lookout for.


Crappy performance related to osxfs -- latency issues for writes to host volumes. It's getting closer to the latency of Docker for Linux. (I hear, though, that workarounds like using unison and fswatch work all right.)

There is also the bit about host networking. Since Docker for Mac runs a transparent Linux VM, the host networking goes there instead. That thread ended up in a big, roaring silence.


I've also had such an experience. Same with Realm.io. The developers get very dismissive with just about anybody who comes in saying they have a use case which doesn't fit the scope of what already exists. Meanwhile features/fixes sit on the backlog for years.

I'm not saying this will necessarily stop me from using a project, but it definitely does not create any loyalty.

Compare this to say Rails, where I've had really positive experiences with the maintainers.


This is a pretty dangerous path to failure. If you look at the most successful companies, it is obvious that they respect the user base using their service. Software is no exception to that; you have to make it user-friendly, even if your user base is mostly software and systems engineers.


I don't think it's so cut-and-dry. I mean, does Oracle respect its users and is it user-friendly? What about Epic, the dominant software system in the healthcare industry? Or Microsoft SharePoint?


"If you look at the most successful companies it is obvious that they respect the user base using their service" - I think a better way of putting it would be that "successful companies respect their customers". In the case of software tools like Docker, the customers generally are the users. In the case of Epic, an electronic health record, the buyers are senior management, the users are the clinical staff such as the nurses, doctors and techs. So Epic does give the senior management what they want (buzzword compliance, a safe choice that it's hard to get fired for), but doesn't give clinical staff what they want..because they are not the ones making the buying decision.


Most, not all. I think you are still better off listening to your customers.


Won't validate whether this happens with Docker, but I've seen it other places. I won't name names, but I bought a commercial add-on to an open source project very similar to Docker, and when I had a support issue, I was pretty much told "tough luck, not my problem" by the project founder.


Was the "project founder" related to the company that sold you the commercial add-on? If not, why should he support it, unless of course there is some commercial relationship between the open source project and the company selling the add-on?


Yes, his company is the one that made the add-on; it was their first commercial product atop the open source project. Today they're a nice fat funded company, but then, they had little revenue, which really put me off.


I am using Kubernetes instead of Docker Swarm to orchestrate Docker images, and none of the points mentioned in this article apply. My cluster is small - I have <100 machines at peak - but so far it feels ready for prime time.

There are parts of docker that are relatively stable, have many other companies involved, and have been around for a while. There are also "got VC money, gotta monetise" parts that damage the reputation of stable parts.


> My cluster is small - I have <100 machines at peak

Let me ask: WTF are people doing such that <100 machines is a "small cluster"? I ran a Top 100 (as measured by Quantcast) website with 23 machines, and that included kit and caboodle -- Dev, Staging and Production environments. And quite a few of those were just for HA purposes, not because we needed that much. Stack Exchange also runs on about two dozen servers. Yes, yes, Google and Facebook run datacenters, but there's a power-law kind of distribution here, and the iron needs fall very, very fast as you move from, say, the Top 1 website to the Top 30.


The number of machines you need to run a service is not really a linear function of your traffic. If you have a mostly static website that can be heavily cached/cdn'd, you can easily scale to thousands of requests a second with a small server footprint. I expect that's true of many of the top 100 sites as measured by visitors (like Quantcast does).

But if you need to store a lot of data, or need to look up data with very low latency, or do CPU-intensive work for every request, you will end up with a lot more servers. (The other thing to consider is that SaaS companies can easily deal with more traffic than even the largest web sites, because they tend to aggregate traffic from many websites; Quantcast, for example, where I used to work, got hundreds of thousands of requests per second to its measurement endpoint.)


Note: the site I mentioned did hit the database quite a few times for each page. It was a nice challenge.


Not everyone can afford to scale vertically. Those 24 servers of Stack Exchange's together cost more than the average 100-machine 'small' cluster. At DigitalOcean the performance sweet spot probably lies at the $20/mo machine, so that's $2,000/mo for a 100-machine cluster. You think Stack Exchange pays less than that for its hardware?

Also, some sites are simply larger than StackExchange, and you never heard of them. There's a huge spectrum between StackExchange and Google.


There's some truth in that; the site I mentioned was, back in 2010, running DB servers with 144GB of RAM -- unlike SE we rented servers, and the colo numbers didn't look good.


Not all computing use cases are running a website.


I have a processing pipeline that does crawling, indexing and some analytics on the crawled data - I am building a vertical search engine catering to some specific kinds of companies.

Usually my cluster is very small, unless the pipeline is in progress. At this very moment I am not doing any processing, so I only have 3 machines up.


Heard this elsewhere. Why do people use Docker swarm with Kubernetes around? Does it require less setup or something?


I see two reasons for this:

- If you don't use Google Container Engine (hosted Kubernetes, also known as GKE), Kubernetes has a reputation for being hard to set up. I am running on GKE, so I can't comment too much.

- It's hard to see how all of the orchestration/docker/etc. things play together when entering the area. I expect many people hear "docker" and want to just try "docker", not being aware that there exist alternatives for some parts, that some parts are more reliable than others, etc. E.g. the article we are discussing seems to be doing this.


Kubernetes definitely runs best on GKE, but projects like kops (https://github.com/kubernetes/kops) are making setups on AWS etc. a lot easier.

You still have to deal with the other drawbacks of those platforms (slow networks, disks etc.) but that's not really a k8s issue.


> kubernetes have a reputation of being hard to set up

Yeah, but there's no shortage of things wrapping Kuber (e.g. OpenShift).


Kubernetes seems very unwieldy and hard to wrap my head around. Swarm seems straightforward and easy to figure out how to configure.

Let's give it a try and see if I can find a good tutorial for kubernetes to do what I want. (I haven't tried this experiment in a couple months, since before nomad and swarm got my interest.)

-----

Ok, I'm back. I went to kubernetes.io. Their "give it a try" has me creating a Google account and getting set up on Google Container Engine. Due to standard-issue Google account hassles, I quickly got mired in quicksand having nothing to do with Kubernetes.

I have no interest in google container engine. Let me set it up using vagrant, or my own VPSes, or as a demo locally, whatever, I'll set up virtual boxes.

They lost me there.


Check out minikube: github.com/kubernetes/minikube

It's a local setup to let you give Kubernetes a try. We haven't made it the default in the "give it a try" dialog yet, but we're considering it.

Disclosure: I'm an engineer at Google and I work on Minikube.
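If it helps, the local flow is roughly the following; a sketch that assumes minikube and kubectl are already installed, with the echoserver image standing in as a placeholder app:

    # Start a single-node cluster in a local VM (VirtualBox by default).
    minikube start

    # Run a throwaway app and expose it outside the cluster.
    kubectl run hello-minikube --image=gcr.io/google_containers/echoserver:1.4 --port=8080
    kubectl expose deployment hello-minikube --type=NodePort

    # Print the service URL / open it in a browser.
    minikube service hello-minikube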


I found the Minikube getting started to be a good experience about a month ago. If you did make Minikube the default for "give it a try" that would instantly gain k8s more credibility in my eyes, as it calls attention to the excellent tooling around k8s, diminishes the perception of GKE as the only first-class k8s environment (ahem, lock-in!), and promotes the notion of an economical and fast k8s development environment.


It isn't directly linked from the front page, but they have a getting started section [0] that covers non-GKE options. I run k8s on AWS using the Kops tool.

Kubernetes definitely requires more learning up front than Swarm, but that's mostly because it has different and more powerful primitives and more features built in.

[0]: http://kubernetes.io/docs/getting-started-guides/


What are standard Google account hassles? The k8s getting started guide is one of the single best cloud setup guides I've ever read. You've deployed your app, and can easily leverage the concepts listed there to deploy more complex apps.

K8s might seem more unwieldy than swarm, but from that feature set you can expect things to work the way they are explained.

Swarm on the other hand has made my entire team question whether 1.12 is even worth upgrading to.


Completely, 100% disagree that the getting started guide is easy. Let's go through this by the numbers. First off, specifically comparing the guides, the Docker 1.12 stuff works ANYWHERE you have root access. When I go here, I'm totally confused:

http://kubernetes.io/docs/getting-started-guides/

Ok, let's see: not only do I immediately get diverted to another page, but I feel like not every OS+cloud combination is represented. I guess the CLOSEST thing to working is Ubuntu+AWS. Click.

http://kubernetes.io/docs/getting-started-guides/juju/

YAY! Juju, a new technology I need to learn. Hey, guess what: this only works with Ubuntu. Closes browser. I spent weeks trying to map this out in my head. I can't understand why Kub doesn't just "install" like Docker does.

Ok, back to the deployment guide. Let's see there's a GIANT TABLE OF LINKS based on cloud+OS+whatever. So I think it's a massive understatement to say that Kub is more unwieldy than Swarm 1.12.


The thing is, the docker stuff might be simple, but it doesn't actually work. Or at least the networking portion doesn't work as advertised.

http://kubernetes.io/docs/hellonode/ is the guide that I was talking about. From your post it was not clear that you weren't using GCE.


The docs are more of an encyclopedia: a lot of facts, not a lot of editorializing. Unless you enjoy learning from encyclopedias, I would highly recommend talking to other members of the community on the kubernetes slack (kubernetes-novice, kubernetes-user or for AWS sig-aws). That way you can quickly find what has worked well for people and what hasn't, and hopefully save yourself a ton of time.

I would love for our docs to be better - good people are working on it, though documentation is always hard. In the meantime, the community is a wonderful resource!


You can have a swarm with only a handful of commands now, with TLS (for the Docker API). Eventually, stable docker-host-to-docker-host container networking will have encryption out of the box too. The number of steps for a production Docker setup keeps getting shorter and shorter; Kubernetes seems to be focused on this aspect now, so it might catch up. This is all, of course, ignoring the IaaS solutions, and obviously, for the reasons stated in the article, Swarm doesn't see those benefits with 1.12, because major features which are supposedly stable do not work yet.
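For reference, the "handful of commands" in 1.12 is roughly this; a sketch in which the addresses, the join token, and the nginx service are placeholders:

    # On the first manager node (address is a placeholder):
    docker swarm init --advertise-addr 10.0.0.1

    # On each worker, run the join command that init printed (token elided here):
    docker swarm join --token <worker-token> 10.0.0.1:2377

    # Back on a manager, schedule a replicated service:
    docker service create --name web --replicas 3 -p 80:80 nginx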


Yeah setup time is great. However, it seems that the number of commands required for stability with docker swarm -> ∞


I used it because I didn't understand what I needed and wanted to stay in the Docker ecosystem to try to limit tooling issues. Kubernetes sounded like more than I needed given what Swarm offered. Turned out alright because I was able to learn a lot, but Docker tools aren't as great as they make them sound (nothing is production ready and devs are too constrained to improve anything). I've moved up to kube and feel like I would have saved a month if I'd gone with it in the first place.


I've used K8S before, and those things are smooth once installation happens. We have a project which is looking at using Docker Swarm on a two-node setup. The idea is that this will give us a baseline, and since it is looking more and more like we'll move to GKE, it sets us up for that later move. I thought Docker Swarm would be mature enough to handle two nodes for a short time (maybe half a year) until we make the leap to GKE. But based on the reports coming in from the wild, that's looking less and less likely.

We hired a contractor to do this part. My team is so resource-constrained that I don't have time for this, so we farmed it out. But now I'm thinking the risk of project failure is much higher than I thought, made worse by the fact that the contractor is also showing signs of having poor communication skills. (I would rather be updated on things going wrong than have someone try to be the hero or cowboy and figure it all out.)


we are exactly in your boat. However, we are signing up for kubernetes since we want to keep the GKE option open.

The only reason why we didn't go ahead is broken logging in k8s - https://github.com/kubernetes/kubernetes/issues/24677


I went through a few iterations for logging, but now I've settled on the built-in GKE logging. Stdout logs from my containers are picked up by Kubernetes and forwarded to Stackdriver. Since it's just stdout, I don't create too much lock-in. I use the Stackdriver dashboard for investigating recent logs and the BigQuery exporter for complex analysis. My stdout logs are JSON, so I can export extra metadata without relying on regexps for analysis - I use https://pypi.python.org/pypi/python-json-logger.


Just as a hint in case you're not aware of this: If you log errors in the format[1] expected by Stackdriver Error Reporting, your errors will automatically be picked up and grouped in that service as well.

1: https://cloud.google.com/error-reporting/docs/formatting-err...


Since we are not on GKE already, I want to be able to use k8s to forward to my host machine's journald, and that's broken. I think Google is doing a lot of hand-holding to get it working with Stackdriver.

This is the blocker for me. I can't switch to GKE yet because I use AWS PostgreSQL. But I want to use k8s :(


The K8S project I was involved in last year used AWS PostgreSQL too. At that time, figuring out how to have persistent data was too much. Further, the AWS EBS driver for K8S storage wasn't there, and PetSets had not come out. With PetSets out, I think figuring out how to run a datastore on K8S or GKE will be easier. (I had mentioned to my CTO that I didn't know how to do a persistent store on GKE; he pointed out the gcloud datastore; I told him that was not what I meant ;-)


This is interesting. Were you on GKE or AWS? GKE is obvious, but if you were hosting on AWS, what was your logging setup?

If you were not using AWS EBS, what were you doing?


The owner of the company ran out of money before I could add logging, but my plan was to get it out to something like papertrail.

I was on AWS. I sidestepped the issue by using AWS RDS (postgresql).

I had tried to get the nascent EBS stuff working, but when I realized that I'd have to get a script to check if an EBS volume was formatted with a filesystem before mounting it in K8S, I stopped. This might have been improved by now.


I probably wrote that support (or at least maintain it), and you shouldn't ever have needed to add a script to format your disk: you declare the filesystem you want and it comes up formatted. If you didn't open an issue before please do so and I'll make double-sure it is now fixed (or point me to the issue if you already opened it!)

On the logging front, kube-up comes up with automatic logging via fluentd to an ElasticSearch cluster hosted in k8s itself. You can relatively easily replace that ES cluster with an AWS ES cluster (using a proxy to do the AWS authentication), or you can reconfigure fluentd to send to AWS ES. Or you can pretty easily set up something yourself using DaemonSets if you'd rather use something like Splunk, but I don't know if anyone has shared a config for this!

A big shortcoming of the current fluentd/ES setup is that it also predates PetSets, and so it still doesn't use persistent storage in kube-up. I'm trying to fix this in time for 1.4 though!

If you don't know about it, the sig-aws channel on the kubernetes slack is where the AWS folk tend to hang out and work through these snafus together - come join us :-)


@justinsb - based on the bug link I posted above, what do you think is the direction that k8s logging is going to take?

From what you wrote, it seems that lots of people consider logging in k8s to be a solved issue. I'm wondering, then, why there is a detailed spec for all the journald stuff, etc.

From my perspective, it would be amazing if k8s could manage and aggregate logs on the host machine. It's also a way of reducing complexity to get started: people with 1-2 node setups start with local logs before tackling the complexity of fluentd, etc.

Is that the reason for this bug?


I'm not particularly familiar with that github issue. A lot of people in k8s are building some amazing things, but that doesn't mean that the base functionality isn't there today.

If you want logs to go into ElasticSearch, k8s does that today - you just write to stdout / stderr and it works. I don't love the way multi-line logs are not combined (the stack trace problem), but it works fine, and that's more an ElasticSearch/fluentd issue really. You'll likely want to replace the default ES configuration with either one backed by a PersistentVolume or an AWS ES cluster.
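
For example (a minimal sketch; kubectl is assumed to be pointed at your cluster, and "my-app-pod" is a made-up pod name):

  # the app just writes to stdout/stderr; no log files to manage in the container
  kubectl logs my-app-pod        # recent stdout/stderr from the pod
  kubectl logs -f my-app-pod     # follow the stream, like tail -f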

Could it be more efficient and more flexible? Very much so! Maybe in the future you'll be able to log to journald, or more probably be able to log to local files. I can't see a world in which you _won't_ be able to log to stdout/stderr. Maybe those streams are redirected to a local file in the "logs" area, but it should still just work.

If anything I'd say this issue has suffered from being too general, though some very specific plans are coming out of it. If writing to stdout/stderr and having it go to ElasticSearch via fluentd doesn't meet your requirements today, then you should open a more specific issue I think - it'll likely help the "big picture" issue along!


So far I'm feeling pretty good about the decision to skip the first generation containerization infrastructure.

At the outset it had the look of something that wasn't an advance over standard issue virtualization, in that it just shuffled the complexity around a bit. It doesn't do enough to abstract away the ops complexity of setting up environments.

I'm still of the mind, a few years later, that the time to move on from whatever virtualization approach you're currently using for infrastructure and development (cloud instances, virtualbox, etc), is when the second generation of serverless/aws-lambda-like platforms arrive. The first generation is a nice adjunct to virtual servers for wrapping small tools, but it is too limited and clunky in its surrounding ops-essential infrastructure to build real, entire applications easily.

So the real leap I see ahead is the move from cloud servers to a server-free abstraction in which your codebase, from your perspective, is deployed to run as a matter of functions and compute time and you see nothing of what is under that layer, and need to do no meaningful ops management at all.


This sounds like I wrote it.


Two days ago there was a fairly long discussion about a similar argument ("The sad state of Docker")

https://news.ycombinator.com/item?id=12364123 (217 comments)


The title should say: Docker Swarm is not ready for primetime. We have used Docker in production for more than two years and there have been very few issues overall.


This article should really be about Docker Swarm not being ready for production. It's much newer technology than Docker and is predictably brittle.

The only points made against Docker proper are rather laughable. You shouldn't be remotely administering Docker clusters from the CLI (use a proper cluster tool like Kubernetes), and copying entire credentials files from machine to machine is extremely unlikely/esoteric.

Docker, with Kubernetes or ECS, is totally suitable for production at this point. Lots and lots of companies are successfully running production workloads using it.


> You shouldn't be remotely administering Docker clusters from the CLI (use a proper cluster tool like Kubernetes)

Docker Swarm is advertised as a stable, production-ready cluster management solution. Then, if you actually try to use it, it is very much NOT. Kubernetes is great, but it feels like adding another layer to the system, and that is not always a good thing (especially if you are on AWS and have to work with _their_ infrastructure management too).


I would say that my experience with Docker has been fantastic. I run over 10 Ubuntu Trusty instances on EC2 as 8G instances, mounted with NFS4 to EFS. This makes it super simple to manage data across multiple hosts. From that you can run as many containers as you like, and either mount them to the EFS folder, or just spawn them with data-containers, then export backups regularly with something like duplicity.
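
For reference, the mount itself is roughly this (a hedged sketch; the filesystem ID, region, paths and image name are placeholders, and the exact options come from the EFS docs):

  # mount the EFS filesystem on the host over NFSv4, then bind it into containers
  sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
  docker run -d -v /mnt/efs/myapp:/data myorg/myapp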

I use Rancher with it, and it's dead simple using rancher-compose/docker-compose.

For a quick run-down see: https://github.com/forktheweb/amazon-docker-devops More advanced run-down of where I'm going with my setup: https://labs.stackfork.com:2003/dockistry-devexp/exp-stacker...


Just want to add that I also use it on Windows with barely any issues (Win 10 x64). I'm not sure how stable it is on Mac OS X, but Kitematic is pretty sweet.

The only problems I've had with Docker containers are those where processes get stuck inside the container and the --restart=always flag is set. When this happens, if you can't force the container to stop, the defunct container will restart after a reboot anyway and cause you the same issue...
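
One hedged workaround for that case (assuming a Docker recent enough to have `docker update`; `<container>` is whatever ID or name is stuck):

  # clear the restart policy so a reboot doesn't resurrect the stuck container
  docker update --restart=no <container>
  # then try to remove it by force
  docker rm -f <container>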

My solution to this has been to just create a clean AMI image with ubuntu/rancher/docker and then nuke the old host when it gives me problems. This is made even easier if you use EFS, because it's literally already 100% set up once you launch a replacement instance.

Also, you can do automatic memory- and CPU-limiting of your nodes using rancher-compose, plus health checks that re-route traffic with L7 & HAProxy: http://docs.rancher.com/rancher/v1.1/zh/cattle/health-checks...

The only thing even comparable to that in my mind would be Consul health checks with auto-discovery: https://www.consul.io/docs/guides/index.html


Most of the complaints I've seen recently about using Docker are about the immaturity of Docker Swarm. Can this be mitigated by using Docker with Kubernetes / Mesos / Yarn?

If it's truly a problem with the containerization format / stability with the core product, I'm not sure what a good alternative would be. I see a lot of praise for rkt but the ecosystem and tooling around it are so much smaller than that for Docker.


Using it with Kubernetes definitely helps. One reason is that if you have enough surplus nodes, then Docker misbehaving on one of them shouldn't screw up anything; Kubernetes is really good at shuffling things around, and you can "cordon off" problem nodes to prevent them from being used for scheduling.
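
The cordoning itself is just a couple of commands (a quick sketch; "bad-node" is a placeholder node name):

  kubectl cordon bad-node      # mark the node unschedulable for new pods
  kubectl drain bad-node       # evict the pods already on it so they reschedule elsewhere
  kubectl uncordon bad-node    # put it back once Docker is behaving again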


> Each version of the CLI is incompatible with the last version of the CLI.

Run the previous version of the cli in a container on your local machine. https://hub.docker.com/_/docker/

  $ docker run -it --rm docker:1.9 version
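
If you also point the containerised client at the engine you're targeting, it can manage that host too (a sketch; the host name and port are made up, and a TLS-secured daemon would additionally need its certs mounted in):

  # run the 1.9 client against an older remote engine, leaving the local install alone
  $ docker run -it --rm -e DOCKER_HOST=tcp://some-old-host:2375 docker:1.9 ps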


>Run the previous version of the cli in a container on your local machine.

I'd rather get out of IT and go into farming if this is considered a valid recourse.


May I join you?


This is both very clever and horrifying. It obviously works for development, but doing a rolling update of your fleet is stressful enough without the additional wrinkle of progressively updating which tool you need to inspect or control that fleet.


Or, you know, just set `DOCKER_API_VERSION` to the version of the engine you're interacting with.


If it has the capability to communicate cross-version, it should just do that, instead of displaying an error message that /doesn't/ point you to `DOCKER_API_VERSION`.


Not doing so is a perfectly defensible decision. Not informing the user of the environment variable, however, is indeed a fairly awful mistake.


Are we the only two people who know this??


I certainly hope not. It's literally the first environment variable listed in the Docker CLI reference doc: https://docs.docker.com/engine/reference/commandline/cli/


You literally just proved his point.


I don't think he was arguing against it.


It's so hard to tell. I hope they were joking.


I'm not sure that I did. Let's expand out my point to see if I'm on the same page as the replies here.

1. TFA complains that versions of servers require corresponding client tools to be installed.

2. TFA says he loves docker in dev, but not in prod.

3. I state (omitting many steps): install the latest docker in your dev environment (or wherever you're connecting to prod), spin up a small docker image to run the appropriate version cli. This is cheap, easy, easy to understand why it works, consistent with points 1 and 2.

What am I missing?


I have several past clients running docker in production just fine.

At my current job we run nearly all our services in docker.

I've replied to this type of comment on here at least a dozen times: it has nothing to do with Docker, it is a lack of understanding of how it all works.

Understand the history, understand the underlying concepts and this is no more complex than an advanced chroot.

Now, on the tooling side, I personally stay away from any plug-ins and tools created by the Docker team. They do Docker best; let other tools manage Docker's externalities.

I've used weave since it came out, and it's perfect for network management and service discovery.

I prefer to use mesos to manage container deploys.

There is an entirely usable workflow with docker but I like to let the specialists specialize, and so I just use docker (.10.1 even), because all the extra stuff is just making it bloated.

I'm testing newer versions on a case by case basis, but nothing new has come out that makes me want to upgrade yet.

And I'll probably keep using docker as long as it stays separate from all the cruft being added to the ecosystem.


The Docker team is doing more than anyone to move container technology forward, but orchestration is a much harder problem to solve than wrapping OS APIs. I wish they would stick to the core, and let others like the Kubernetes team handle the orchestration pieces. Swarm is hard to take seriously right now. I'm not sure bundling it into the core was the best way to handle it.


I disagree. Kube, Mesos, etc. are all great, I'm sure. I had Kube documentation open in the background for about a week or two while doing other things, trying to get a handle on installation, deployment, and how it would fit my use cases... I kept wanting to dip in, but kept getting pounded by tons of heavy dependencies I would have had to learn. After Swarm Mode was released I had a cluster deploying things locally within 2 hours of reading the documentation. From there I was able to map all of our use cases on top (rolling updates, load balancing, etc.). I totally get the "hard to take seriously" perspective, given some of the early glitching, but Kube, Mesos, etc. needed this egg on their faces to realize how easy installation and getting things up and running should be. If nothing else, that alone will make those other products better.


I think we should stop falling for Docker's marketing push to manage containers at large scale. They want to get a grip on the corporate market and catch up with Kubernetes, at the cost of their containerization quality. Unfortunately, with their current strategy they are losing credibility as a containerization tool AND as the container orchestration tool they are trying to become. Using it solely as a containerization tool, with k8s/ECS for the heavy lifting, is the sensible way to go as of today, IMO.


Here are some alternatives to Docker:

One can use Ubuntu LXD, which provides Linux containers built on top of LXC but with ZFS as the storage backend. LXD can also run Docker containers. http://www.ubuntu.com/cloud/lxd

One can also use Linux containers via Kubernetes by Google. http://kubernetes.io/


I think the underlying issue here is that no two people agree on what "Docker" is. Is it the CLI? Is it Docker Machine? Is it Docker Swarm?

The container part of Docker works well. And they've ridden that hype wave to try to run a lot of other pieces of your infrastructure by writing another app and calling it Docker Something. Now everybody means a different subset when they say "Docker".


I would say that my experience with Docker has been fantastic. I run over 10 Ubuntu Trusty instances on EC2 as 8G instances, mounted with NFS4 to EFS. This makes it super simple to manage data across multiple hosts. From that you can run as many containers as you like, and either mount them to the EFS folder, or just spawn them with data-containers, then export backups regularly with something like duplicity.

I use Rancher with it, and it's dead simple using rancher-compose/docker-compose.

For a quick run-down see: https://github.com/forktheweb/amazon-docker-devops


> Each version of the CLI is incompatible with the last version of the CLI.

I'm pretty sure that as long as the CLI version is >= the server version, you can set the DOCKER_API_VERSION env var to the server's API version and everything works.

I haven't used this extensively, so maybe there are edge cases or some minimal supported version of backwards compatibility?


I wonder how much suffering would be alleviated in most mid-level IT organizations if they just used Joyent/SmartDataCenter, and I say this as a FreeBSD developer with no affiliation.


A lot.

zones in SmartOS provide full-blown UNIX servers running at the speed of the bare metal, but in complete isolation (no need for hacks like "runC", containers, or any other such nonsense).

Packaging one's software into OS packages provides for repeatability: after the packages are installed in a zone, a ZFS image with a little bit of metadata can be created, and imported into a local or remote image repository with imgadm(1M).

https://wiki.smartos.org/display/DOC/Managing+Images
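
The workflow is roughly the following (a loose sketch from memory - the exact imgadm arguments are in the wiki linked above, and the UUIDs and image name here are placeholders):

  # on the build host: turn a prepared, stopped zone into an image
  imgadm create <zone-uuid> name=myapp version=1.0.0
  # on any other host: pull the image from the configured source and verify it
  imgadm import <image-uuid>
  imgadm list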

That's it. It really is that simple.


Docker Swarm is definitely not production ready. Try to run any service that requires communication among nodes and you will agree. It works fine for web servers, but that is about it.

DC/OS is emerging as the go-to way to deploy docker containers at scale in complex service combinations. It 'just works' with one simple config per service.


Network partitions really do happen! They are often short, but if you can't recover from them, then you shouldn't call yourself a distributed system.

I am shocked at how fragile etcd is in this way. I was hoping docker swarm was better, but I'm not surprised (alas) to find out that it has the same problem.

I'm about ready to build my own solution, because I know a way to do it that will be really robust in the face of partitions (and it doesn't use Raft; you probably should not be using Raft, and I've seen lots of complaints about ZooKeeper too). I've done this before in other contexts, so I know how to make it work - but so have others, so why are people who don't know how to make it work reinventing the wheel all the time?


I'd love to hear more about your solution. Are you saying that you've created an algorithm distinct from paxos/raft/zab that's more robust?


Check out weave, it's great for service discovery. Zookeeper is much better than etcd imo.


amazon efs + duplicity p.o.t. snapshots -> s3 + docker = win


If you're truly running Docker in production, you're probably not affected by either of the issues taken with Docker here. No one would dare interact with a production cluster via the basic Docker CLI; instead you should be interacting with the orchestration technology, like ECS, Mesos or Kubernetes. We are running ECS, and we only interact with the Docker CLI to query specific containers or shut down specific containers that ECS has weirdly forgotten about.

It definitely sounds like Swarm is not ready but I wouldn't say this is representative of running Docker in production: instead you should be running one of the many battle tested cluster tools like ECS, Mesos or Kubernetes (or GCE).


Agreed. Wouldn't use Docker CLI for production purposes.

We use Cloud66 (disclaimer - not associated with them) to help with the deployment issues if any arise.

Also we don't store DB in containers.


Just read through most of the thread; there seems to be a very large disconnect between people who are happy with Docker and those who are not. Personally, I've been extremely happy with it. We have one product in production using pre-1.12 swarm (will be upgrading in the next couple months) and most of our dev -> uat environments are now fully Docker. It's been stable. On my personal projects I used Docker 1.12 and yes, after a few days things kerploded, but after upgrading to 1.12.1 things have been incredibly stable. For Node.js apps I have been able to use Docker service replicas instead of Node.js clustering, and have been very happy with the results.


it seems that none of the container frameworks are generally ready.

Take for example k8s - I just started exploring it as something we could move to. https://github.com/kubernetes/kubernetes/issues/24677 - logging of application logs is an unsolved problem.

And most of the proposals talk about creating yet another logger...rather than patching journald or whatever exists out there.

For those of you who are running k8s in production - how are you doing logging? Does everyone roll their own?


Logging is definitely not an unsolved problem with K8s. It's trivial to set up Fluentd to snarf all container logs into some destination, such as Graylog, ElasticSearch or a simple centralized file system location.

The Github issue talks a lot about "out of the box" setup via kube-up. If you're not using kube-up (and I wouldn't recommend using it for production setups), the problem is rather simple.

The logging story could be better, but it's not "unsolved".


Thanks for that clarification. We are trying to get our feet wet by building a small 2 node setup.

We don't want to stream our logs or setup fluent, etc. I just want to make sure my logs are captured and periodically rotated. Now, the newer docker allows me to use journald as logger (which means all logs are sent to journald on the HOST machine)... But I can't seem to figure out how to do this in k8s.
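
With plain Docker, the journald part is just the logging driver (a sketch; the image and container name are placeholders - the open question is exactly how to get k8s to set this up for you):

  # send this container's stdout/stderr to the host's journald
  docker run -d --log-driver=journald --name myapp my-image
  # read it back on the host
  journalctl CONTAINER_NAME=myapp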

Also, as an aside: for production deployment on AWS, what would you suggest? I was thinking that kube-up is what I should use.


Yes, journald would definitely be superior, conceptually, to just writing to plain files. I can't help you there, though, as I've never tried setting it up.

(As an aside, I don't understand why more Unix software doesn't consistently use syslog() — we already have a good, standard, abstract logging interface which can be implemented however you want on the host. The syslog protocol is awful, but the syslog() call should be reliable.)
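
(The same interface is even reachable from the shell - a trivial sketch, with a made-up tag:)

  logger -t myapp -p user.info "something happened"   # lands wherever the host's syslog/journald puts it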

As for kube-up: It's a nice way to bootstrap, but it's an opaque setup that doesn't give you full control over your boxes, or of upgrades. I believe it's idempotent, but I wouldn't trust it to my production environment. Personally, I set up K8s from scratch on AWS via Salt + Debian/Ubuntu packages from Kismatic, and I recommend this approach. I have a Github repo that I'm planning to make semi-generic and public. Email me if you're interested.


Check out kube-aws, a tool from CoreOS that will get you a proper CloudFormation/VPC set up in just a few minutes: https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws....

If you're interested to see more about what it does behind the scenes, it is very similar to the full manual guide for setting up Kubernetes: https://coreos.com/kubernetes/docs/latest/getting-started.ht...


What's the k8s limitation here? You can do it however you like - e.g. make a fluentd DaemonSet, deploy it on both nodes, and have them slurp logs up to S3. There are tons of ways to handle the logs, and k8s can either help or stay out of the way.

https://github.com/fabric8io/fluent-plugin-kubernetes_metada...


> it seems that none of the container frameworks are generally ready.

None of the container frameworks on Linux are ready, but that doesn't mean that's the case in general.


Logging of application logs is, generally, an unsolved problem everywhere, but I feel like k8s does better than most: fluentd is the default, and then logs are collected by whatever end system you like best that's appropriate for your needs (e.g. GKE -> fluentd -> Cloud Logging, and I've heard of K8S -> fluentd -> ELK).


Since most people use k8s in the context of GKE, there seems to be some magic that Google has put in.

But if you read the bug, this doesn't work quite so well in the wild. In fact, an entire spec is being written on it, and right now the debate is whether to forward to journald and have it proxy further, or to reimplement a new logging format.


Well, there are definitely systemd people pushing for journald, that's for sure. The first comment talks about how it doesn't fit, though. I wonder what in particular about the solution doesn't work for those people.


We've tried to use Docker a couple of times, but it was always much more trouble than it was worth. There was always some edge case that caused it to not work as expected on one developer machine or another.

After about 2 years of giving it another shot on and off, we just gave up. And it's not like we were doing something crazy, just a typical documented "run this rails app" type thing. I would definitely not use this in production for anything based on my experience.


Were you using Linux developer machines? Docker, Inc. provide ready-made installers for Windows and Mac - the Docker ToolBox makes setting up on those OSes very easy these days. I haven't tried the new products that replace ToolBox yet.

I can imagine it being harder on some Linux distributions because your host system will have to have all of the kernel bits etc. Ubuntu 14.04 and above or current versions of Fedora should be good - the distributions explicitly support Docker, and Docker, Inc. provide packages so that you can run either the distribution packages or the Docker, Inc. ones.


We are all on Macs.


I have the same experience. I am trying to set up a new node where I accidentally installed 1.10, and the CLI does not work. When I look on the internet for how to do X with Docker, the only articles available are for previous versions. I mean, seriously, command-line development is not supposed to be this hard: pick a set of switches and stick to them unless something major forces you to change. If you shuffle CLI switches around between minor releases, nobody is going to be happy.


Would love to hear thoughts from people who use it in production.


We're using Docker pretty extensively at Segment with over 1,200 containers running in production with ECS (total count, not uniqueness).

We haven't touched Swarm and are staying a couple versions behind. With that said, we're pretty happy with the setup right now.


Airtime is also using ECS to run docker containers in production. We ran into a few issues early on due to early adopter status, but the AWS team was very quick to add many features we needed.

I have no complaints about ECS now, and in fact I'd say that ECS solves a lot of the problems that the author complains about here. I'd say that Swarm is not production ready, but Docker itself as an engine for running a container is fine. You just need to build your own orchestration layer, or use something like ECS.

More details on Airtime's docker setup here: https://techblog.airtime.com/microservice-continuous-integra...


Maybe things have changed since Apr 2015, but I found ECS pretty limited, and K8S solved those limitations. One big issue was container-to-container addressing. It was not available last year when I put an ECS cluster into production. Has it been added since?


Direct container to container communication isn't the recommended pattern when using ECS. Instead you let ECS associate with either a legacy Elastic Load Balancer, or one of the new Application Load Balancers, which can direct requests to different classes of containers based on the request path, allowing you to take a set of containers of different types and meld their http resources together into one API on one domain.

If one container needs to talk to another container of a certain type it just uses the load balancer, and the load balancer takes care of directing the request to the other container which may be on the same machine, or another machine, or maybe even in a different cluster.

Obviously this forces you into a fully stateless design, because the load balancer round-robins all the requests across random containers, but it's very failure-resistant and gives you high resiliency.


That doesn't necessarily give you resiliency or failure-proofing. It's recommended that way because there is no other way with ECS. The reason is that, unlike K8S, ECS does not inherently have service discovery. K8S solved it well with its notion of services and has a built-in load balancer. The design is very solid and doesn't require you to pay for a set of ELB instances every time you want to do that. That is why I asked if ECS has added this. It sounds like it has not.


Yeah, ECS is pretty much just a dumb layer on top of Docker that handles two things: placing your containers onto instances (and restarting them or replacing them if needed), and adjusting the settings on an ELB or ALB so that the load balancer is pointed at an instance running the container.

The ECS approach is a pretty minimal way to glue together a cluster of instances running Docker and ELB, while Kubernetes has a lot more baked into the core implementation.


I've also used ECS for a production system, and have found it to be an excellent runtime for docker containers.


I used it in production with CoreOS and now Kubernetes. I almost never touch the Docker CLI and haven't tried Swarm. I think a lot of developers mistakenly think they can get to a production environment in a few Docker commands which is definitely not possible today.


From an ops side, that's worrying.

Yes, in an absolutely ideal world no one should ever touch a production environment, and ideally every time you find yourself having to touch a production system you should create at the very least a backlog item to figure out how to avoid doing it in the future. But the reality is that some kinds of troubleshooting are hard to do without having access. Generally the kinds of situations where you need that access are the ones where you want it the quickest and easiest, i.e. the world is burning somehow.


OP wasn't talking about access, but about the effort needed to set up ("get to") a production environment. Of course Kubernetes/Docker gives you full access to your production environment.


I've posted about this before, but my work (non IT Department at a State Govt agency) has been using Docker in production for about a year. We are small by HN standards (~12 apps on 5 hosts that are a mix of APIs, front ends, and several more complex GIS apps). Moving to Docker was a massive improvement for us in terms of deployment and development. Deploying with Docker is ridiculously easy (no more taking a node out of HAProxy for an update, rollbacks are painless, and we love that what you deploy is what you get - there is no question over what the container is actually doing since you define it in the Dockerfile). I can't remember the last time we had a bad deployment or even cared about making a change to production. It's that reliable. It's also been a massive improvement for QA and development because of the ability to standardize the container using the Dockerfile.

We've been using CentOS for the production hosts and CentOS/OSX for developing on. There were a couple of minor issues when we started but the benefits outweighed the negatives then. It's been completely painless since Docker came out of beta. We have some containers that are deployed daily and others that run for months without being touched.

I've never quite understood the negative attitude towards Docker. Perhaps we are too small to see many of the issues that people often complain about, but moving to Docker was a great decision in our office.


Are you installing docker out of the CentOS repositories, or from Docker Inc's repository? The CentOS version is based on the work Red Hat does to backport patches and create more production ready versions of Docker, so that might be why you are having a better experience.


We are indeed using the CentOS repositories. The current version on our production servers is 1.10.3.


Unless you're using it to compartmentalize CI jobs on Travis or Jenkins, don't use it in production (do use it in development for standing up local dev environments quickly). We do, and it's a wasted layer of abstraction.

I predict the Docker fad will end in 2-3 years, fingers crossed.


> I predict the Docker fad will end in 2-3 years, fingers crossed.

I work for what is, ostensibly, a competitor and I don't agree.

Containers are here to stay. Docker will pretty much have a brand lock on that for a long time.


> do use it in development for standing up local dev environments quickly

You can do this, for sure, but, tbh, I'd still rather use Vagrant. I'm already provisioning things like my datastores using Chef (and not deploying those inside of Docker in production); I might as well repurpose the same code to build a dev environment.

I do agree regarding using containers (I use rkt rather than Docker for a few reasons) for CI being a really good jumping-off point, though.


Vagrant is pretty heavy, but if you're already using Chef, makes sense. If you're deploying using Docker, then mirroring your environment in dev makes more sense.

I've used both Vagrant and Docker for dev, and find Docker to be super fast and light. (though I've definitely run into some headaches with Docker Compose that I didn't see in using just Docker)


I think it's probably not worth worrying about "fast" when "slow" is "ten minutes once a week, maybe, if you're really destructive" and similarly worrying about "light" when you have 16GB of RAM and a quad-core processor; the heaviest dev environment/IDE I can find (IntelliJ and a couple hundred Chrome tabs) doesn't use eight of it. It comes off as super-premature optimization to me, and those optimizations that you would otherwise do in Docker (which are just building base images, whatever, Packer does this trivially) exist in a virtualized environment too.

Running datastores in Docker is bonkers in the first place, which still leaves a significant place for an actual CM tool--so unless you are a particular strain of masochist, when you are replicating your local environment, you'll probably need it anyway.


> ten minutes once a week, maybe, if you're really destructive

If you do a lot of active machine development, where you're iterating on that machine setup, it's an extremely slow process. Perhaps it would make sense to iterate in Docker, and then convert that process to Vagrant once you finish.

> similarly worrying about "light" when you have 16GB of RAM and a quad-core processor

It's a concern when you're launching many machines at once to develop against. Docker tends to only eat what it needs. Also, while I think everyone should have a 16GB machine, there's quite a few developers out there running 8 or 4GB machines.


> I predict the Docker fad will end in 2-3 years, fingers crossed.

Containers are definitely here to stay. Google, Heroku, and others have been using them for years.

Obviously Docker is just a subset, and it might fade, but I think it's pretty well established as a defacto standard.


Containers have been around for more than a decade as lxc. I'm saying Docker itself. It's hard to make money when you're running on an open standard that anyone can build tooling around.


I think it's a pretty excellent layer of abstraction, as it was in the form of jails/zones. Even if we stuck to single-node computing I think it'd be a powerful thing, but viewing a whole cluster of computers as a single compute resource is incredible.

I think docker itself may fade but I view its popularity as jails finally catching on rather than a new bit of hype that's going to die down soon.


maybe not end, but scale down to realistic usage scenarios.


I use it in production by way of kubernetes. It's been great so far.


We use it via GKE in prod. Zero complaints so far.


Docker not being ready for primetime and Swarm not being ready for primetime are two different things no? As for the cli compatibility issues, don't most people use an orchestration tool like ansible/chef/puppet etc to manage their fleet? I'm not sure I think the title of the post is accurate.


> don't most people use an orchestration tool like ansible/chef/puppet etc to manage their fleet?

I don't. All my configuration management is done with OS packages. And once you start making OS packages to do mass-scale configuration, the whole premise of Docker becomes pointless. If I have OS packages, and I can provision them using KickStart, or JumpStart, or imgadm(1M) + vmadm(1M), without using or needing any code, then what exactly do I need Docker for?


We have had Docker in production for about a year for our ENTIRE INFRASTRUCTURE, handling about 500 million requests per month across 20 services, including JVMs and distributed systems. So far it has helped us save so much time and money that I wouldn't consider going back for a minute...


How is that relevant? The author didn't complain about the performance - the complaint was about API stability. Which I agree with. Docker is really terrible at supporting their own APIs.


Fair enough! There is also a performance penalty, and annoyingly complicated setups in certain cloud environments.


Can anyone using Swarm seriously in production post any account of their experiences here? Thanks.


I doubt you'll get any replies. We tried using swarm in production, but could not ever get a reliable, working component. Failures of the worst kind would happen - networking would work 1/10 times meaning that a build could've conceivably gotten through CI.


"Just imagine if someone emailed you a PDF or an Excel file and you didn't have the exact verion of the PDF reader or Excel that would open that file. Your head would blow clean off."

Obviously this person has never spent a good deal of time dealing with AutoCAD...


Or more to the point early versions of the MS Office suite, where major updates could and did create backwards-incompatible saves. Users had to be careful that files intended for broader distribution were saved in a sufficiently old/compatible format version.

Which is to say: this is a sign that the involved technologies are still rapidly maturing.


or excel!


We've been running in production across thousands of containers for long over a year now and it's been fantastic, not only a life saver of a deployment method but it's allowed for reliable, repeatable application builds.


Creating and maintaining a service cluster is hard. I don't think you should just take it back to the store if your magic wand has a hiccup.


When I worked for my ex-employer, we used Docker-based Elastic Beanstalk to serve millions of requests per second across three services.


You can set the CLI via environment variables to use newer clients with older machines:

  export DOCKER_API_VERSION=1.23

You can export and import machines with this handy node js tool: https://www.npmjs.com/package/machine-share


I agree in general, and find Docker one of those technologies that does not (yet) live up to the hype.


Docker's far from ready for primetime. I'm sure everyone likes taking down their site to upgrade Docker to the latest release.

Their mindset is very tool driven; if there's a problem let me just write a new tool to do that.

Ease of use or KISS isn't a part of their philosophy.


For the versions issue, check out dvm: https://github.com/getcarina/dvm

While versions are an issue, it's at least a reasonable way to work around it.


Aye, there are a lot of important bugs that are just sitting there.


I watched another explanation of what Docker really is; it just seems to be this awesome thing that solves the very hard problem of inter-compatibility between systems. I always tend to question how and why a developer had to use Docker instead of making choices that would avoid it.

It's not surprising Docker can't always work, but it's nice to see that programmers are winning. I guess future OS designers and developers will try to encourage more inter-compatibility if possible. That really touches a big nerve.


Is lxc any better? Are any of the issues in OP solved in lxc vs Docker?


Do they speak English, or is the caller expected to know French?



yes, thanks


Make no mistake: Docker Inc has a lot of excellent engineers.

But there's also a landrush going on. Everyone has worked out that owning building blocks isn't where the money is. The money is in the platform. Businesses don't want to pay people to assemble a snowflake. They want a turnkey.

CoreOS, Docker and Red Hat are in the mix. So too my employers, Pivotal, who are (disclosure) the majority donors of engineering for Cloud Foundry. IBM is also betting on Cloud Foundry with BlueMix, GE with Predix, HPE with Helion and SAP with HANA Cloud Platform.

You're probably sick of me turning up in threads like these, resembling one of the beardier desert prophets, constantly muttering "Cloud Foundry, Cloud Foundry".

It's because we've already built the platform. I feel like Mugatu pointing this out over and over. We're there! No need to wait!

A distributed, in-place upgradeable, 12-factor oriented, containerising, log-draining, routing platform. The intellectual lovechild of Google and Heroku. Basically, it's like someone installed all the cool things (Docker, Kubernetes, some sort of distributed log system, a router, a service injection framework) for you. Done. Dusted. You can just focus on the apps and services you're building, because that's usually what end users actually care about.

And we know it works really well. We know of stable 10k app instance scale production instances that are running right now. That's real app instances, by the way: fully loaded, fully staged, fully logging, fully routed, fully service injected, fully distributed across execution cells. Real, critical-path business workloads. Front-page-of-the-WSJ-if-it-dies workloads.

Our next stretch goal is to benchmark 250k real app instances. If you need more than 250,000 copies of your apps and services running, then you probably have more engineers than we do. Though I guess you could probably stretch to running two copies of Cloud Foundry, if you really had to.

OK, a big downside: BOSH is the deployment and upgrade system. It's not very approachable, in the way that an Abrams main battle tank is less approachable than a Honda Civic (and for a similar reason). We're working on that.

The other downside: it's not sexy front-page-of-HN tech. We didn't use Docker, it didn't exist. Or Kubernetes, it didn't exist. We didn't use Terraform or CloudFormation ... they didn't exist.

Docker will get all this stuff right. I mean that sincerely. They've got too many smart engineers to fail to hammer it down. More to the point, Docker have an unassailable brand position. Not to be discounted. Microsoft regularly pulled off this kind of thing for decades and made mad, crazy cash money the whole way along.


@jacques_chester - after your last comment, I did check CF out. It is orders of magnitude harder than k8s for example.

To deploy CF on AWS, I need 20 instances; k8s can happen with a master and a slave. I know that you probably do much, much more... but it would be nice to gradually scale out the components.

Second... well, BOSH. I see that there are 13K commits to the BOSH repo, so it's probably set in stone anyway... but it would be nice to have something more standard (like Chef, Salt or Ansible; Ansible would be the closest thing to BOSH). These are extremely flexible tools... but far more accessible.

Third - the big one: Docker. I know that you have the Garden repository to run Docker... but I don't think it's production-ready.

Cloud Foundry is an incredible piece of technology. But the problem is not that it's unsexy... it's that it's inaccessible.


@sandGordon - Hey, I work for Pivotal and am the anchor of the Garden team.

Is there anything in particular that has you concerned about Garden? I see jacques mentioned we're moving (very nearly have moved!) everything over to a runC backend, which is certainly a big step in the right direction, but I'd love to get a feel for what else you feel we need to get right before you would consider it "production ready".

Feel free to reach me here, in the #garden channel on the Cloud Foundry slack, or any other medium that works for you. Thanks.


Hi @wlamartin - thanks for replying. It would be great if you could follow the (rather long) parallel thread on this topic; it should give you a fairly good idea of how I personally look at this.

I wouldn't generalize that this is the problem a small startup would always see... but I think it is fairly close.

I would say this comment (and its reply) captures the gist very accurately - https://news.ycombinator.com/item?id=12378554


I did read it, and believe I understood, I just wasn't sure whether to apply your criticisms of CF as a whole to Garden individually, or whether there were additional distinct technical issues that you had concerns about.

The good news for a small startup/lone developer/tinkerer is that we have some work coming up in our backlog to make that much easier, including a single downloadable binary with sensible pre-configured defaults, so people who don't want to wrestle with BOSH don't have to. (https://www.pivotaltracker.com/story/show/119174859)

While we grew to meet the containerization needs of CF first and foremost, I'd definitely love to see more projects like Concourse (https://concourse.ci) make use of Garden. Seeing first hand the dance they do to consume it, we definitely need to up our game if this is to be an attainable goal.

Thanks for your feedback!


Thanks for your frankness.

> I know that you probably do much much more.... but it will be nice to gradually scale out the components.

Agreed. There are intermediate options.

To try a full Cloud Foundry that fits into a laptop, you can install PCFDev[0].

As an intermediate position, you can install BOSH-lite in the cloud. You lose a major advantage of Cloud Foundry, though, which is that it's a distributed system which BOSH can keep alive.

Similarly, the default installation for Cloud Foundry uses 20 VMs. You can, if you wish, consolidate these to fewer. It's a matter of copying and pasting some YAML. But not recommended in the interests of robustness.

> but it would be nice to have something more standard (like Chef, Salt or Ansible. Ansible would be the closest thing to BOSH). These are extremely flexible tools... but far more accessible.

I have not used Ansible or Salt. I did spend time with cfengine, Puppet and Chef a few years back.

BOSH's concept of operations is a complete inversion of those tools. cfengine, Puppet or Chef take a system as it is and transform it, through declared goals, into the system you want it to be.

This actually means they have to deal with a great deal of incidental complexity: different distros, different versions of distros, different packaging tools and so on and so forth.

The BOSH paradigm is: wait, why are you trying to automate the construction of pets? You want cattle. So it starts with a fixed OS base image and installs packages in a fixed, non-negotiable way. At the low level, very inflexible. But it allows BOSH to focus on the business of installing, upgrading, repairing and reconfiguring distributed systems as single systems. The single distributed system is where BOSH starts, not where it was extended to. It's an important distinction.

> Docker. I know that you have the Garden repository to run Docker.... but I dont think its production ready.

Strictly, Garden can run Docker images. It's an API with driver backends. The original Garden-Linux backend is being replaced by one which uses runC instead. No point replicating engineering effort if we can share it with Docker.

> But the problem is not that it's unsexy... it's that it's inaccessible.

You're right. We're in the classic position of other heavy, robust tech. Hard to tinker with. I've been jawboning anyone inside Pivotal about this for years. We're making headway -- PCFDev is a start, bosh-bootloader is under active development and BOSH itself is going to get major love.

But compared to docker swarm or k8s, it's harder to set up an experimental instance.

[0] https://docs.pivotal.io/pcf-dev/


You consistently respond to this style of CF critique by enumerating alternatives and workarounds to the individual points. But you're missing the forest for the trees. CF is by its nature too complex! No amount of justification of the complexity will fix that; no number of afterthought scripts will make up for it. The future of this space is scaling 1-dev, 1-node systems up to production, not in asking devs to wrap their minds and laptops around intrinsically huge platforms like CF or OpenStack.

I'm not saying CF is dead, because I'm sure you've got lots of paying customers, and lots of interest in larger enterprises. But I think it's pretty obvious the way the winds are blowing, and I think the smart money is on Kubernetes.


By design, developers are not meant to have to care about how Cloud Foundry works. It's meant to be a magic carpet. You send your code and pow, it works.

You mention afterthought scripts -- how many blog posts do you read about people who've rolled their own 1/3rd of a platform on top of Docker + various other components? It's fun, it's easy to start, and pretty quickly you're married to a snowflake for which you carry all the engineering load.

It's not that I don't see the forest for the trees -- I do, internally I am constantly saying that developer mindshare predicts the 5-10 year performance of a technology and we suck at it.

But, at some level, we're talking about different forests.


> By design, developers are not meant to have to care about how Cloud Foundry works.

Yes, that's exactly the problem. At small scale -- which is where the vast majority of the market is, to be clear -- there is no difference between "the person who is responsible for the platform" and "the person who deploys code onto the platform". Platforms need to optimize for making both of these tasks easy. Docker understood this from day 1 and are backfilling technical competence. Kubernetes figured this out a bit late but are now rushing to make the adminstrata easier. CF and OS seem stuck in maintaining the separation, which keeps them out of the running.

> how many blog posts do you read about people who've rolled their own 1/3rd of a platform on top of Docker + various other components? It's fun, it's easy to start, and pretty quickly you're married to a snowflake for which you carry all the engineering load.

And doing so is still easier than bearing the cognitive burden of learning CF!


I agree that being swallowed up from the bottom is our main risk. Which is why, above, I said I talk about this a lot internally.

> And doing so is still easier than bearing the cognitive burden of learning CF!

Right, the incremental cost seems small. A bit here, a bit there. And actually, it's a lot of fun to roll your own. You know it, you can push it in any direction and so on.

The flipside, and this is where Pivotal and other CF distributors are making money, is that it turns into a nightmare pretty quickly. A lot of big companies are perfectly happy to pay someone else to build their platform.

Before I transferred to Cloud R&D, I was in Pivotal Labs. I got to see several hand-rolled platforms. Some of them were terrible. Some were ingenious. And all of them were millstones. That experience is pretty much what turned me into the annoying pest I am now.

We're at an interesting moment. Right now, writing your own platform seems reasonable -- even attractive -- to lots of people. In maybe 5-10 years, it won't be a thing that many people do any more. These days, for most projects, most engineers don't consider first writing an OS, or a database, or a web server, or a programming language, or an ORM, or a graphics library and on and on. Those are all well-served. Once upon a time it was a competitive advantage to roll your own; these days it just doesn't enter anyone's thinking.

There will be between 1 and 3 platforms that everyone has settled on, with a handful of outliers that people tinker with. My expectation is that Kubernetes will slowly morph from an orchestrator and be progressively extended into a full platform. Docker are clearly doing likewise. I think Cloud Foundry will be one of the three, simply because it's the first and most mature full platform already available for big companies to deploy.

Doesn't mean we can't have some of our lunch eaten.


> The BOSH paradigm is: wait, why are you trying to automate the construction of pets? You want cattle. So it starts with a fixed OS base image and installs packages in a fixed, non-negotiable way. At the low level, very inflexible. But it allows BOSH to focus on the business of installing, upgrading, repairing and reconfiguring distributed systems as single systems. The single distributed system is where BOSH starts, not where it was extended to. It's an important distinction.

This is a problem compared to how we are reasoning about systems in the Docker world.

You either use an orchestration tool - which doesn't care what the base system is, and only cares whether it is discoverable/alive/dead/etc. - or you use a construction tool. The construction tool could be a Dockerfile, which you massage into shape for the particular OS/config you have, or you use Ansible/Salt to "transform" the system. For example, k8s uses Salt for orchestration and Docker for construction.

BOSH is a merger of both... and is therefore extremely, extremely opinionated and incestuous. If I may overstep the limits of my intelligence, I would say it is a product of the time before Docker was invented.

It is also very hard to work with as a consequence. Just wanted to let you know that.


> If I may overstep the limits of my intelligence - I would say, it is a product of times when Docker was not invented yet.

Well, it was, so that makes sense.

I actually said... a year ago now, I think?... that tools like Puppet and Chef would morph from "discipline my pets" into straight-up image builders. They're so much nicer than Dockerfiles -- the latter is severely constrained by the nature of stackable filesystems.

> It is also very hard to work with as a consequence. Just wanted to let you know that.

Out of curiosity, where do you see the seams to cut it open?


> tools like Puppet and Chef would morph from "discipline my pets" into straight up image builders

Maybe, but if you're running something like Kubernetes, which I genuinely think is the future, "images" stop having any sort of relevance to this problem space.

If you've ever written a dockerfile, as I'm sure you have, you'll know that the recipes for individual applications' setup become so simple that problems that Puppet, Chef etc. exist to solve just melt away. Most of my dockerfiles are just a RUN line or two. A tool like Puppet, much of which is architected around the issue of mutable state, would just get in the way when your build is immutable.

As for the host, containers don't care what kind of host environment they're being scheduled in, and so the recipes for building Kubernetes hosts also become a lot simpler.


I think Docker or runC is pretty much mandatory. Not because it is great tech or anything, but because of https://hub.docker.com/explore/

So BOSH needs to work with my Dockerfile. That's the first step.

Second, build a minimal cluster (1 node?) for me with my Dockerfile. All the cluster has to do is restart my application if it dies and help me deploy. And let me scale out gradually from there.
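
On a single node, the "restart it if it dies" half is already close to a one-liner with Docker's restart policy; a sketch with a made-up image name:

    # keeps the container running across crashes and daemon restarts;
    # a "deploy" is then: pull the new tag, stop the old container, run the new one
    docker run -d --restart=always --name myapp -p 8080:8080 example/myapp:latest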

Keep BOSH if you want ;)


FWIW, I totally bailed on writing a provisioning system once you clued me into BOSH. It's the right answer, as near as I can tell.

I don't currently use it, because I am in a single-cloud environment and CloudFormation/Chef make more sense (plus, as you note, it's not currently super friendly), but I fully expect, should the progress that I've been seeing continue, I'll be using it in the next couple years.


One of the things I like about K8S is that it serves as the building block for creating a PaaS. I built a small tool on top of K8S for a production run. Deis is doing something similar (and far more ambitious), though they are also creating packages of abstractions called Helm. K8S is flexible and extensible because it doesn't try to be a Heroku, more of a framework for building your own Heroku.

Is there value to be added in someone providing the higher-level pieces? I bet there is. The flip side is: why did Kubernetes gain so much traction so fast, in a way that Cloud Foundry hasn't?

And what if Deis and Helm overtake Cloud Foundry? (They dropped what they had in order to build everything back up from K8S.) Maybe there is something the CF folks are not seeing here.


> K8S is flexible and extensible because it doesn't try to be a Heroku, more of a framework for building your own Heroku.

Agreed. Cloud Foundry is not a "competitor" to Kubernetes, in that sense. Diego is the closer analogue to Kubernetes. The Cloud Foundry internals have become progressively more finely abstracted and teased apart to make it possible to swap components, so you never know.

That said, for those who want Docker + Kubernetes bundled into a turnkey PaaS, Red Hat went all-in on those two techs for OpenShift 3.


CF is a complete PaaS, isn't it? So it's not directly comparable to either Docker or Kubernetes.

From my perspective, CF seems far too large, monolithic (as in all-or-nothing) and opinionated, and honestly a bit heavy-handed and antiquated (the last two partly due to BOSH).

What's attractive about K8s in particular is the elegant, layered, modular, unopinionated design. K8s itself isn't simple, but it's quite small and lightweight, and its core principles are easy to understand; more importantly, it offers primitives without imposing a particular framework on anything (for example, it doesn't care about how the underlying host is configured or provisioned), which makes it a superb foundation for a higher-level PaaS, once someone designs that (maybe Deis).
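
Concretely, the primitives are small and composable - a Deployment keeps N replicas of a container running, a Service gives them a stable address - and you can get a feel for both in two commands (made-up image name):

    # creates a Deployment that keeps 3 replicas running
    kubectl run myapp --image=example/myapp:1.0 --replicas=3
    # creates a Service load-balancing across those replicas
    kubectl expose deployment myapp --port=80 --target-port=8080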


We've already released a PaaS built on top of Kubernetes at Red Hat, with the upstream named OpenShift Origin and the supported version named OpenShift Container Platform (https://www.openshift.com/container-platform/).

Our developers contribute a ton of work to the Kubernetes project, and pretty much everything that works on Kubernetes will also work on OpenShift. We do ship some more restrictive security settings by default (like not allowing containers to run as root), but those can easily be relaxed if you need to be more permissive.
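
For example, if an image genuinely needs to run as root, the usual knob (assuming the oc CLI and a made-up project name) is to grant the anyuid SCC to that project's service account:

    # relax the default "restricted" SCC for one project's default service account
    oc adm policy add-scc-to-user anyuid -z default -n myproject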


Docker is the new MongoDB. Let's just use FreeBSD jails + PostgreSQL.


+1 I got sick of Docker and switched to FreeBSD. Far better experience.


Jails and Tredly.com or SmartOS.

Anything else is busted by design. Enjoy your poop sandwich.


Docker: containers for hipsters who don't know how to deploy a simple Linux virtual server.


> Each version of the CLI is incompatible with the last version of the CLI.

Sure, but I don't think this is a show stopper. You can and should only carefully upgrade between versions of Docker (and other mission-critical software). The process is functionally identical to the process you'd use to perform a zero-downtime migration between versions of the Linux kernel -- bring up a new server with the version of Docker you want to use, start your services on that new server, stop them on the old server, shut down the old server, done.
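
Roughly, with made-up hostnames and image name:

    # new-host is already running the Docker version you're upgrading to
    ssh new-host 'docker run -d --name myapp -p 80:8080 example/myapp:1.4'
    # repoint the load balancer / DNS at new-host and let traffic drain, then:
    ssh old-host 'docker stop myapp && docker rm myapp'
    # finally decommission old-host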

I don't mean to suggest this is trivial, only to suggest that it is no more complicated than tasks that you/we are already expected to perform.


>Sure, but I don't think this is a show stopper. You can and should only carefully upgrade between versions of Docker (and other mission-critical software).

Whether I upgrade carefully or not, the Docker CLI should work with all versions of Docker.


I don't get why this is a sticking point. There's no reason to update the Docker CLI independent of the Docker server, and it would be silly to run two versions of Docker on the same host, so why would it matter?


I think you are missing the point. It is not about upgrading the production cluster; it's about erasing muscle memory and re-learning the docker CLI over and over. How often do you have to re-learn grep, awk, ls, ps, or other command-line tools? How mad would you be if those were developed the same way the Docker CLI is? I would definitely not be happy.



