This looks great: simple yet powerful. I'm working a lot with Kubernetes, and you don't actually need to run an overlay network on AWS (or GCE). On AWS, there's some VPC magic that surprised me when I first saw it! But I believe that's beside the point; it's not about ECS vs Kubernetes, it is about what we can build on top.
In particular, I think the idea of embedding a Procfile in a Docker image is really clever; it neatly solves the problem of how to distribute the metadata about how to run an image.
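For anyone who hasn't looked inside one: a Procfile is just "name: command" lines, which conveniently parse as a YAML map. A minimal Go sketch of reading that metadata (purely illustrative - not Empire's actual code, and the Procfile contents here are made up):

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

func main() {
	// A Procfile is a set of "name: command" lines, e.g. what you
	// might bake into an image at /Procfile and read back at deploy time.
	procfile := []byte("web: ./bin/server -p 8080\nworker: ./bin/worker\n")

	// Each "<process>: <command>" line happens to parse as a YAML map.
	processes := map[string]string{}
	if err := yaml.Unmarshal(procfile, &processes); err != nil {
		panic(err)
	}

	for name, command := range processes {
		fmt.Printf("process %q runs %q\n", name, command)
	}
}
```

Because the mapping travels with the image, any scheduler that can read it knows how to run every process type without a side channel for metadata.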
Exactly! One of our goals was also to make the scheduling backend pluggable, so we're hoping the community will implement a Kubernetes backend in the future. There are a lot of shared concepts between the two, but we ultimately chose ECS for its ease of operation and its integration with existing AWS services like ELB.
I'm interested in hearing more about how you use this in terms of the development lifecycle. Does a container image get created for every release of your app? I've always wondered what the right approach to this is.
This is how I currently use Docker:
1) A custom base image with all the things my company needs, like supervisord, libpq, etc.
2) Custom per-service base images built off of the base: for example, one with Java for our Clojure services, or one with Python for our research services.
3) A release consists of pulling the latest version of the base image (e.g. acme-python) and then injecting the latest project code into it (a rough sketch of this step is below).
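To make step 3 concrete, here's roughly what I mean; a hedged Go sketch of a release script (the acme-python name is just the example above; everything else - file names, the CLI contract - is hypothetical, not my real tooling):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Hypothetical CLI contract: pass the git sha to tag the release with.
	sha := os.Args[1]

	// Release = latest base image + current project code (step 3 above).
	dockerfile := "FROM acme-python:latest\nCOPY . /app\nWORKDIR /app\n"
	if err := os.WriteFile("Dockerfile.release", []byte(dockerfile), 0644); err != nil {
		panic(err)
	}

	// Pull the latest base, then build and tag the release image.
	for _, args := range [][]string{
		{"pull", "acme-python:latest"},
		{"build", "-f", "Dockerfile.release", "-t", fmt.Sprintf("acme/acme-inc:%s", sha), "."},
	} {
		cmd := exec.Command("docker", args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			panic(err)
		}
	}
}
```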
My concern here essentially boils down to the image repo. GitHub needs to add container storage, because while I admire Docker Hub's efforts, I don't trust it.
We have a setup that has been working out well for us:
1. We build Docker images on every commit in CI and tag them with the git commit sha and branch (we don't actually use the branch tag anywhere, but we still tag it). This is essentially our "build" phase in the 12factor build/release/run. Every git commit has an associated Docker image.
2. Our tooling for deploying is heavily based around the GitHub Deployments API. We have a project called Tugboat (https://github.com/remind101/tugboat) that receives deployment requests and fulfills them using the "/deploys" API of Empire. Tugboat simply deploys a Docker image matching the GitHub repo, tagged with the git commit sha that is being requested for deployment (e.g. "remind101/acme-inc:<git sha>"); a rough sketch of that request is below.
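A minimal sketch of that hand-off in Go (the endpoint URL and JSON body shape here are assumptions based on the description, not Empire's documented API):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// The sha comes from the GitHub deployment request; the value is made up.
	sha := "8a2f6b0"
	image := fmt.Sprintf("remind101/acme-inc:%s", sha)

	// Ask Empire to deploy that exact image (URL and body shape are assumptions).
	payload, err := json.Marshal(map[string]string{"image": image})
	if err != nil {
		panic(err)
	}

	resp, err := http.Post("http://empire.internal/deploys", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Println("deploy status:", resp.Status)
}
```

The nice property is that "deploy" is just "point Empire at an immutable image that already exists"; the build phase never runs at deploy time.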
We originally started maintaining our own base images based on Alpine, but it ended up not being worth the effort. Now we just use the official base images for each language we use (mostly Go, Ruby, and Node here). We only run a single process inside each container. We treat our Docker images much like portable Go binaries.
Really cool stuff. Seems like you found a good way to hand off most of the hard stuff to AWS and only do a few key things yourselves to make the experience better. As such I think Empire has the potential to be a viable option for many companies, which is something I rarely say about a PaaS project :)
Thanks Blake! I think somebody mentioned that we were standing on the shoulders of giants. I think most of your contributions around this domain qualify for that :)
This is neat. You might want to check out KONG (https://github.com/Mashape/kong) instead of putting a plain nginx in front of the containers/microservices. It is built on top of nginx too, but it provides all the extra functionality like rate-limiting and authentication via plugins.
KONG definitely looks interesting, and I'd love to know more about it. However, there's not a lot written about it yet.
For example: I've gone searching through the blog posts, github readme, and KONG documentation, but I still have no idea _why_ it needs Cassandra. What does it store in there?
One of the main graphics on the KONG docs shows a Caching plugin (http://getkong.org/assets/images/homepage/diagram-right.png), but the list of available plugins doesn't include such an entry. Is that because caching is built in? Is the cache state stored in Cassandra? Or is the plugin yet to be built?
All the data that Kong stores (including rate-limiting data, consumers, etc.) is saved in Cassandra.
nginx has a simple in-memory cache, but it can only be shared across workers on the same instance, so in order to scale Kong horizontally by adding more servers there must be a third-party datastore (in this case Cassandra) that stores and serves the data to the cluster.
Kong supports a simple caching mechanism that's basically the one that nginx supports. We are planning to add a more complex Caching plugin that will store data into Cassandra as well, and will make the cached items available across the cluster.
Yep! We're definitely looking into Kong in the future. For now nginx + some static configuration works amazingly well for us. Very maintainable and minimizes external dependencies.
There's plenty that's interesting about this work. The best part to me was their answer to "why not feature X?": they said they prefer to build on the most mature and stable technologies, and named a few. Too many teams end up losing competitiveness by wasting precious hours debugging the latest and greatest thing that isn't quite reliable yet. Their choice is wiser and might get the attention of more risk-conscious users.
Thank you, this looks awesome! As someone who still hasn't embraced docker due to all the orchestration / discovery madness I really appreciate such an elegant solution. I love and run everything on AWS so building on top of ECS is just another selling point.
I don't think so. It's their hardware, infrastructure, and engineers hosting it. They control those things. You get the rest. Sounds like an AWS-hosted solution with some advertised advantages over other solutions. Definitely not self-hosted.
Note: I think the only 3rd party thing I'd call self-hosted is colocation where I delivered the server, they plugged it in, and the most they do is reboot it for me.
From the point of view of software, I generally consider something self-host{ed,able} if I can run it on a machine I choose, without enforced network/environment requirements.
It's a fair viewpoint. I guess my critical point is control: control over the hardware, its software, legal rights to it, and so on. If they're in control, it's theirs. How can it be "self"-hosted if outsiders control or own it?
My highest priority for businesses making these choices is usually slightly more pragmatic, and focuses on avoiding provider lock-in: something you can install and run on your own local hardware, you can also (in most cases) install and run on co-located hardware, on rented hardware, on traditional rented virtual hardware (i.e. a VPS), or on "flexible" rented virtual hardware (i.e. AWS, Azure, etc.).
That's a very pragmatic philosophy. I'm especially impressed by your unusual focus on vendor neutrality, as a lack of it costs many companies millions in the long run (see IBM & COBOL). Since I mostly avoid clouds, I'm not up to date on that end. I'd like to try your style of things as an experiment in the future, though.
Do you have a resource or resources for what components, strategies, or platforms are best for the deployment you describe? Something useful for production apps, reliable, and easy to move from dedicated hardware all the way to AWS (or back if necessary). I'm sure there's other readers on my end of things that might be interested as well.
Thanks. It doesn't always work out - clients/managers often seem to have an "all the cool kids are using it" and/or "but it's the cloud, everyone uses the cloud now" mentality - but I try.
I should also emphasise that I'm mostly talking about infrastructure level "lock in" here - e.g. the artificial lock-in AWS creates for their Load Balancer and/or Elastic IP service by giving VMs new IPs on reboot, etc.
I definitely prefer Open Source solutions, but I'm one step more pragmatic in that space too - if a piece of locally-installable but proprietary software does the job and works with open standards (e.g. if you want to use self-hosted atmail) I'm less worried/vocal about that than if you say you want to use Gmail or whatever, but I'd still try to suggest a more open option.
In terms of resources: no, sorry, I don't have any single resource to point to, beyond a basic rule/test:
Can I demonstrate the full stack being implemented, using one or more laptops (e.g. using VMs), on a plane or cruise ship? You could equally say "can I test the full stack while the WAN is disconnected", but that doesn't sound as fun!
I'm actually building my new business around this basic idea - giving smaller companies a better option to keep more control of their tech without the need for a full-time sysadmin (which is often financially impossible even if they wanted one). I genuinely believe the vast majority of things most businesses want/need to achieve can be done with existing Open Source software; it's just usually not particularly easy to set up the various pieces and make them work together.
That makes sense. It has been done before, to a degree. For inspiration, look at Net Integration's Nitix appliance [1]: a UNIX system that was easier to configure, self-managing, largely auto-configured, partly self-healing, with automatic backups, HA support, most common applications, and a UI to integrate their configuration. It was selling well despite being priced above most SOHO servers. As often happens with good tech, a big firm (IBM) gobbled it up and rolled the tech into its own stack (Lotus).
A stack like you describe, with the good traits of Nitix-like solutions, could be great for businesses not wanting much IT overhead. It might spread like wildfire, so long as you don't sell out or balk over patent suits.
As I explained in a sibling thread, I would consider "self-host{ed,able}" to mean that you can run it on an arbitrary machine (virtual and/or physical) regardless of provider (e.g. this is AWS-specific, so I can't run it on physical hardware I own/rent, or even on a competing provider of virtual machines).
We actually have a relay (https://github.com/remind101/empire/tree/master/relay) service that can be run alongside Empire that acts as a proxy to interactive Docker sessions. It's a bit of an experiment right now and something we'd like to solve better in the future, but it allows you to run containers with `emp run <command> -a <app>`.
Interesting! We're in a similar situation to where you were, with a biggish app on Heroku that we're keen to move over to EC2 to join the rest of our infrastructure. Definitely keen to see how Empire develops.
Congrats, not everyone can create a simple elegant platform and write about it in such an accessible manner. I suppose you're standing on the shoulders of giants, but still.
This is the level of engineering/communication I always shoot for, and which (somewhat disappointingly) is rare where I've worked.
Personally I use Dokku (https://github.com/progrium/dokku). I would be happy to see one standard "Heroku-like" PaaS, since I feel too many people are trying to tackle the same problem.
Dokku is definitely an awesome project (pretty much anything from Jeff Lindsay is good)! The primary problem is that Dokku is meant for just one service, and we have quite a few.
We'd love to see one standard too. Personally, I think it's good to have a lot of competing solutions right now (ECS vs Kubernetes, Docker vs Rocket, etc) and we'll see things settle in the next couple of years as containerization becomes more common.
That's still all on one machine; Dokku can't automatically schedule & distribute containers/processes across a cluster of hosts like Empire & others can.
If you're doing anything non-trivial, you're going to outgrow one machine pretty soon :)
Cloud Foundry requires a ton of (compute) overhead to get set up. It's very much intended for large projects. Hell, you need a separate VM just to install it.
For anyone looking at a Dokku alternative, Cloud Foundry isn't one.
Actually it's quite possible to run a full Cloud Foundry environment in a single VM, as seen with Stackato (http://www.stackato.com/), which can then be scaled out as necessary.
> Hell, you need a separate VM just to install it.
I'm not sure why that's a problem. If you want something that's actually Heroku-like in terms of uptime and whatnot, you need something that can manage the health of the cluster. Dokku's cool, but it doesn't make sense for anything you actually need to depend on. If it doesn't make sense to pay the overhead of running your own PaaS, just use Heroku instead.
Heroku isn't cheap. And to run a small PaaS-like alternative, there are many, many solutions that are better, faster, and more secure than Cloud Foundry.
Hell, did you even read the requirements of Cloud Foundry? It takes a hell of a lot of everything just to manage a few boxes.
Also, CF comes from Pivotal; listen to the talks they've given, where they say to start out small. Starting out small with CF is just impossible.
And OpenShift is great, especially Origin M5, which is coming soon (tm).
"are way better and faster and more secure than Cloud Foundry"
Like how? Better, okay, that's in the eye of the beholder. But more secure? Faster? Shenanigans.
If you want to just get started with Cloud Foundry, you install Lattice; it's one VM. In an HA setup (not the target, but possible), it's 5 VMs.
Cloud Foundry is about 20 VMs, and is what happens when you say "and then I want..." about five times: multi-tenancy, role-based access control, auto-recovering health management, service brokerage, multiple DNS domains, app staging, etc. - things people would otherwise just build on their own...
Lattice is a more minimalist approach, for those who desire the production hardened core of Cloud Foundry but prefer to bring their own PaaS components or simply experiment.
That project is not really maintained anymore, and it is nothing more than Dokku with a bunch of plugins preinstalled. I would suggest just using Dokku and installing the plugins you need. Dokku's API and tooling have moved forward a lot since dokku-alt was forked, and the documentation for Dokku does not really apply to dokku-alt.
Instead of nginx, we've had a pretty good experience using vulcand (https://github.com/mailgun/vulcand) as the front-end router for our micro-services.
First off, nginx is awesome and you can do all the things we did with vulcand with nginx, so it was just a question of friction.
The reason we went with vulcand is that it natively supports what we wanted to do, i.e. route to micro-services based on dynamic, etcd-driven configuration. To do the same thing in nginx (at the time), we would have had to use either confd or custom Lua.
We actually looked at vulcand in an older version of Empire. When we decided to use a routing layer in this version of Empire, rather than just letting Empire/ELB expose each service (mostly because it's a lot easier for us to later shut off public access to each service), we threw together nginx because it was so simple.
I think at this point, every time we move a service, we add about 5 lines to an nginx config, re-deploy the router in Empire, and the service is exposed.
The internal "service discovery" makes this a lot easier, since we just have to tell nginx to route to http://<app_name> - no domain, no port, nothing more than the app name, thanks to the DNS/resolv.conf search path & ELB stuff.
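To show what that buys you, here's what the consuming side of that can look like; a tiny Go sketch where "acme-inc" is a hypothetical app name and "/health" a hypothetical endpoint - DNS fills in everything else:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// "acme-inc" is a bare, hypothetical app name. The container's
	// resolv.conf search path plus internal DNS expand it to the app's
	// internal ELB, so no host, port, or domain is configured here.
	resp, err := http.Get("http://acme-inc/health")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s\n", resp.Status, body)
}
```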
Sorry if that's not the case. I've also played briefly with Flynn and Deis, and I haven't found anything so complicated that it would need a whole rewrite and a change of the entire approach. Moreover, with Deis I can easily change providers (DO, AWS, Azure, etc.), while with Empire I'm bound to ECS. At least that was my first impression; I have to read more.
Our approach was to re-use as much existing technology as possible, which is not the case for most others. That's the "complication" I'm referring to here. Empire grew from the need for a production grade platform that was going to be stable.
Empire doesn't actually lock you into ECS. The scheduling backend is pluggable and could support Kubernetes/Swarm in the future.
So, at this point, Empire itself doesn't deal with things like autoscaling. That said, the demo CloudFormation template (and the bootstrap script that kicks it off easily for you) makes use of Auto Scaling groups for the instances that containers are run on.
So, in theory, you could autoscale just like you always would: monitor stats for your hosts, and if a bunch of them start to run low on resources, kick off an autoscaling event.
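For example, a watcher could bump the group's desired capacity itself. A rough sketch using aws-sdk-go (the ASG name and the decision to scale are hypothetical, and this isn't something Empire does for you today):

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	svc := autoscaling.New(session.Must(session.NewSession()))

	// Pretend monitoring decided the cluster is low on free resources;
	// kick off a scaling event by raising the group's desired capacity.
	_, err := svc.SetDesiredCapacity(&autoscaling.SetDesiredCapacityInput{
		AutoScalingGroupName: aws.String("empire-minions"), // hypothetical ASG name
		DesiredCapacity:      aws.Int64(6),
		HonorCooldown:        aws.Bool(true),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("requested scale-up of empire-minions")
}
```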
That said, there's been quite a bit of talk about integrating Empire with Autoscaling, so that when, say, ECS couldn't find any instances with resources free for a task, Empire could kick off the autoscaling events for you. Could be pretty awesome :)
AWS is silly expensive. Why didn't you build this on top of DigitalOcean? DigitalOcean is so awesome right now. They don't even charge for bandwidth overages.
DigitalOcean is silly expensive. Why don't you look at Atlantic.net? They're so awesome and charge way less than overpriced DigitalOcean. Why not run it on your laptop, which you've already paid for? That would be even cheaper than overpriced Atlantic.net!
My old laptop offers much better price/performance, especially when I use free wifi in coffee shops for bandwidth. The laptop is already paid for so I only pay for the electricity (on days when I don't plug it into the wall at a friend's place to save even more). Those prices are just too expensive. There's always a place willing to do a job cheaper...
DO's "most popular plan" is $10/mo [1]
AWS's t2.micro in us-east is ~$10.30/mo with a standard 8GB disk [2]
Both VMs are single-core with 1GB RAM. DO gives you a 30GB SSD, but AWS has a freely adjustable disk size. Upscaling from 8 to 30GB is another $2 - but how many single-core, low-RAM instances use double-digit GB?
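For what it's worth, the t2.micro figure checks out if you assume the then-current on-demand rate of ~$0.013/hr: 0.013 × 730 hours ≈ $9.49/mo, plus 8GB of gp2 EBS at ~$0.10/GB-month ≈ $0.80, for ~$10.29/mo. The same ~$0.10/GB-month rate is where the "another $2" comes from: 22 extra GB × $0.10 ≈ $2.20.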
In the middle, DO has 8 cores and 16GB for $160/mo; AWS has 4 cores and 16GB for $185/mo + storage.
At the top end of the DO offerings, DO's 20-core, 64GB machine is $640/mo, and AWS's 16-core, 64GB machine is $725/mo + storage (not much). The difference in pricing is not that crazy, and you get a crapload of extra free features on AWS.
Those AWS prices are the "On-Demand" pricing. If you're willing to lock in for a year, reduce them by about a third. The argument that DO is "OMG cheaper" than AWS is no longer valid.
You seem to make the implicit assumption that a DO $10 VPS is equal in performance to an AWS t2.micro instance, which is not the case. For example, check out:
DO's 1GB instance at $10/month has a UnixBench score of 1041 [1]; to beat that with AWS you have to spend $374/month.
Also, with the t2.micro you get an EBS disk, whose I/O you pay for in addition to the instance cost. You also have to pay for bandwidth out of the chosen AWS region. This is not the case on DO.
AWS's complicated pricing makes comparisons like yours very difficult and error-prone: I would suggest going with AWS only if you need particular features (like ELB, SQS, VPC, etc.) that DO doesn't offer.
You seem to be ignoring my two other data points. Similarly, bandwidth costs are small, unless you're really pumping out a lot of data. I don't even notice our EBS transfer costs on our bill.
And how reliable a measure is UnixBench, when the top-performing server "CC2 Large" (by a factor of 25% over second place!) is a 2-core, 8GB RAM offering? It easily beats out all the two-dozen core, high-ram offerings below it.
Hell, the names of the AWS instances in that list aren't even correct. What's a "high-cpu medium"? They mean a "c1.medium", judging from the stats page, which is now two generations obsolete - you have to know about them and go out of your way to provision one. The one name they do list, "m3.medium", is incorrectly labelled a "high i/o" VM; AWS doesn't have a "high i/o" VM, and the m3.medium is not considered by them to be network-, RAM-, or storage-optimised, so I'm not sure where that's coming from. And if you do need disk I/O with AWS, you can pay for provisioned IOPS (not very expensive), which needs to be accounted for in these comparisons. It's just getting my goat at the moment, because my comment was trying to argue against FUD, but that reference list can't even get well-known and advertised names correct.
AWS billing is complex, absolutely, but there is also a ton of flexibility, and it makes sense once you pass the learning curve. And micros do get throttled, but they also get a certain number of "throttle credits" that help them survive bursts. And yes, I agree that you should choose the right tool for the right job - one HNer really uses that huge amount of free bandwidth you get with the small DO servers with a media streaming service (I forget the handle). But that still doesn't change the fact that AWS is no longer "OMG expensive!" over DO.
Unixbench is a pretty awful benchmark unless the only thing you care about is disk IO (unlikely for a typical web app). The site you linked to doesn't even have data for AWS t2 instances types, so not really a useful data point for your comparison. The site only has data for instance types listed as "previous generation instances".
"But they'd be wrong! Truth is, I thought it mattered. I thought that [the marginal hosting costs between cloud service providers] mattered. But does it bollocks. Not compared to how [developer productivity and costs] matter."[0]
Yes, and going with AWS kills productivity because you have to learn AWS-specific APIs, you're locked into their system, and then you have to re-work your stuff to be hosted elsewhere when you outgrow AWS.
More than one startup has been killed purely by AWS hosting costs in the past 5 years.
> going with AWS kills productivity because you have to learn AWS-specific APIs
Products I use:
* Redshift. It's Postgres's API, and I didn't have to learn how to manage petabyte-scale clusters
* EC2. It's Ubuntu. Or CentOS. Or whatever - you choose! Except with no messing about with my own virtualization or hardware agreements, or going to the datacenter.
* RDS. It's whichever database you want it to be! Only it scales! And backs up! For free!
* ElastiCache. It's Redis!
etc. etc. etc.
Learning those dang AWS-specific APIs, eh? Who'd do it?
> has been killed purely by AWS hosting costs
Then they planned badly, because AWS prices consistently go down over time. If your "business" goes under because it becomes popular, then your margin per user is negative, and you're running a charity for the benefit of your users, not a business. Blaming the demise of a company whose business model is giving out free ice cream on the cost of ice cream is missing the point a little.
DO is great for a lot of things, but it's not AWS. You can't allocate extra disk to a droplet, for example. AWS is a _much_ more complete offering than DO.
Agreed... AWS and Azure both offer a lot of services beyond just VPS hosting. Hosted database services and extended blob/S3 storage are pretty valuable in and of themselves.
DO/Linode don't offer the equivalent, which means maintaining your own... which is fine, but if you're relatively small, or a single person, time you dedicate to operations tasks is time you aren't developing features and/or fixing bugs. One's business is paramount... technology is just a tool to serve it.