The way I've been happiest using EC2 Auto Scaling is to have a single cron job continuously calculate how many instances I should be running and set the desired capacity explicitly with the Auto Scaling API[1]. This may seem to defeat the purpose of Auto Scaling, but it's actually much more convenient than spinning EC2 instances up and down with the EC2 API. You get to precisely control how you scale and won't be at the mercy of the Auto Scaling heuristics.
So we had this cause a spectacular outage a few years ago.
We were doing exactly this - but we had a flaw: we didn't handle the case where the AWS API was actually down.
So we were constantly monitoring how many running instances we had, but when the API went down, just as we were ramping up for our peak traffic, the system thought that none were running, so it just kept launching instances.
The flood of new instances pummeled the control plane, with thousands of instances all trying to come online and pull down the data they needed to become operational, which then killed our DBs, pipeline, etc...
We had to reboot our entire production environment at peak service time...
That's not the right way to do it. You shouldn't monitor how many instances you're running. You just need to determine how many instances you should be running based on your scaling driver (CPU, number of users, database connections, etc.). Then you call the Auto Scaling SetDesiredCapacity API with that number, and it is idempotent[1]. If the AWS API is down, your fleet size just won't change.
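A minimal sketch of that pattern, assuming boto3; the ASG name and the user-count scaling driver are hypothetical placeholders:

    import boto3

    autoscaling = boto3.client("autoscaling")

    def compute_desired_capacity(active_users: int, users_per_instance: int = 200) -> int:
        # Hypothetical scaling driver: size the fleet from the current user count.
        return max(1, -(-active_users // users_per_instance))  # ceiling division

    def reconcile(asg_name: str, active_users: int) -> None:
        desired = compute_desired_capacity(active_users)
        try:
            # Declarative and idempotent: states the target size instead of
            # issuing individual launch/terminate calls.
            autoscaling.set_desired_capacity(
                AutoScalingGroupName=asg_name,
                DesiredCapacity=desired,
                HonorCooldown=False,
            )
        except Exception:
            # If the AWS API is unreachable, do nothing: the fleet size simply
            # won't change until the next successful run.
            pass

Because the call declares a target rather than a delta, running it every minute from cron converges the fleet without needing to know how many instances currently exist.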
While the poster was aware of it, he did not provide a solution, whereas DVassallo provided a valuable step-by-step on how to do it properly, which may help others in the future.
Think long and hard why you felt it necessary to make your comment and what value it actually provided.
> Think long and hard why you felt it necessary to make your comment and what value it actually provided.
Yeah, that's what I was telling the other poster in a nicer way. Same goes to you. You can provide advice without repeating criticism for no reason when the person specifically said they did something wrong.
And save me the lecturing on "long and hard." My comment has 15 upvotes, so it's pretty unlikely I'm the one off the mark in this conversation.
> You can provide advice without repeating criticism for no reason when the person specifically said they did something wrong.
That statement is a joke; here it is reworded:
> A person should be invulnerable to criticism as long as they make a humbling remark.
Doesn't sound so great now, does it?
Also, I'm not surprised you got 15 upvotes. This place has ceased to be a hacker forum for many years now. Too many eternal politically correct Septembers.
A strawman argument + when your view is not popular, the environment must be the problem. Classic undefeatable argument. I'm surprised you have problems getting along here.
> when your view is not popular, the environment must be the problem
You were the one using upvotes to validate your argument when it's a fact that posting a political opinion in either a left- or right-leaning forum will net highly different responses. Of course the environment plays a part.
Won't even bother dissecting the first shot. Your arguments have been weak at best till now; this final one was the final straw, man.
If this is still a pitfall for users of AWS ~5 years in.. then it's not a fault of my communication...
You know what I think it's a fault of:
Lack of a canonical DevOps "university" stemming from SV startups.
DevOps at this point should not just be a degree -- it should be a freaking field of study and classes offered by YC.... Look at their pedigree of companies at scale. We should all make an effort to give back in this regard...
DevOps as a field of serious study is pretty pathetic. I wouldn't trust a devops 'grad' to do anything or know anything.
But with the resurgence in certs and boot camps and other snake oil making $$, why not?
Yeah, like I said - this was a few years ago, and the system wasn't designed to be able to scale using ASGs at the time (Fleet didn't yet exist, and a bunch of other reasons). Scaling was based on users and the load complexity of the data we were handling; this wasn't a web service.
I don't understand--how were you launching instances if the API(s) was/were down? Your system was unable to determine that there were instances running, but it was able to send RunInstances requests to EC2?
Another thing to keep in mind is that AWS local capacity can run incredibly close to the wire at times. You might be surprised if you knew how much capacity for your instance type was actually available under the hood. I’ve personally seen insufficient capacity errors.
What was your resolution to this issue? Did you fix your service to account for the API being down, or did you switch to an entirely different approach?
I can't recall the exact implementation details, but we then logged the number of running instances to a file, read the last known quantity of instances and the delta since launch, and made the system not get overly aggressive if it couldn't read the current set.
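Something in the spirit of that guard, as a rough sketch; boto3 is assumed, and the state-file path and launch cap are hypothetical:

    import json
    import boto3

    ec2 = boto3.client("ec2")
    STATE_FILE = "/var/lib/scaler/last_known.json"  # hypothetical location
    MAX_DELTA = 10  # never add more than this many instances per cycle

    def running_instance_count() -> int:
        # Raises if the EC2 API is unavailable, instead of silently reporting 0.
        count = 0
        paginator = ec2.get_paginator("describe_instances")
        for page in paginator.paginate(
            Filters=[{"Name": "instance-state-name", "Values": ["pending", "running"]}]
        ):
            for reservation in page["Reservations"]:
                count += len(reservation["Instances"])
        return count

    def safe_target(desired: int) -> int:
        try:
            current = running_instance_count()
            with open(STATE_FILE, "w") as f:
                json.dump({"count": current}, f)
        except Exception:
            # API unreadable: fall back to the last recorded count and refuse
            # to scale aggressively against stale data.
            with open(STATE_FILE) as f:
                current = json.load(f)["count"]
        return min(desired, current + MAX_DELTA)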
We also added smart loading across AZs due to spot instances getting whacked when our fleet was outbid and AWS took them back.
As well as other monitoring methods to be sure we weren't caught with a smart system doing dumb things.
AWS has limits on the number of resources you can have in a VPC. You can request an increase via a process outside the API. This mechanism exists exactly for this kind of thing (and for malicious API calls should you get hacked). Maybe someone at your company was thinking too big? Normally these are around 10/50 for each EC2 type.
You are aware that if you have a close enough relationship with AWS you can request and set your own limits?
Limits are malleable based on your use case. Speak with your rep.
You might not even know how limits came to be... I am.
---
There was a time when git suffered a flaw, and a junior dev also slipped up and checked in secrets.... thousands of instances across the globe were launched... for bitcoin mining... $700,000 in a few hours...
I don’t think it’s crazy to believe you are capable of predicting the unique demand curves of your business better than a heuristic designed to be good enough for the median Amazon customer.
We do this for all of our Kube clusters here. We have a nifty use of this in our CI cluster, where Job resources are scheduled by a service that monitors our build queue. As builds are initiated by git pushes, single-use Jobs (pods) are created, and when there isn't enough free capacity to schedule them, the build cluster scales up. On the weekends when everyone is gone, it scales back to near-nothing. This is a huuuuge money-saver for us because we use beefy c5 instances for our builds. It also saves lots of time because devs are no longer waiting 45+ minutes for their build to start.
There are many limitations with AWS auto scaling that you need to "read between the lines" to discover.
For example, we have daemons reading messages from SQS. If you try to use auto scaling based on SQS metrics, you realize pretty quickly that CloudWatch is updated every 5 minutes. For most messages, that's simply too late.
In a lot of cases, you are better off updating CloudWatch yourself on your own interval using Lambda functions (for example) and letting the rest follow the path of AWS-managed auto scaling.
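For illustration, a rough sketch of such a publisher, assuming boto3; the queue URL, metric namespace, and metric name are hypothetical:

    import boto3

    sqs = boto3.client("sqs")
    cloudwatch = boto3.client("cloudwatch")

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"  # hypothetical

    def handler(event, context):
        # Scheduled (e.g. every minute) so scaling alarms see fresher data than
        # the default 5-minute SQS metrics in CloudWatch.
        attrs = sqs.get_queue_attributes(
            QueueUrl=QUEUE_URL,
            AttributeNames=["ApproximateNumberOfMessages"],
        )["Attributes"]
        cloudwatch.put_metric_data(
            Namespace="Custom/Workers",  # hypothetical namespace
            MetricData=[{
                "MetricName": "QueueBacklog",
                "Value": int(attrs["ApproximateNumberOfMessages"]),
                "Unit": "Count",
            }],
        )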
There is also a cascading auto scaling chain that you need to follow. If we take ECS for example, you need auto scaling for the running containers (Tasks) AND, after that, auto scaling for the EC2 resources. Both of these have different scaling speeds: containers scale almost instantly while instances scale much more slowly. Even if you bake your own image, there is still a significant delay.
The effectiveness of dynamic scaling significantly depends on what metrics you use to scale. My recommendation for that sort of system is to auto-scale based on the percentage of capacity in use.
For example, imagine that each machine has 20 available threads for processing messages received from SQS. Then I'd track a metric which is the percent of threads that are in use. If I'm trying to meet a message processing SLA, then my goal is to begin auto-scaling before that in-use percentage reaches 100%, e.g., we might scale up when the average thread utilization breaches 80%. (Or if you process messages with unlimited concurrent threads, then you could use CPU utilization instead.)
The benefit of this approach is that you can begin auto-scaling your system before it saturates and messages start to be delayed. Messages will only be delayed once the in-use percent reaches 100% -- as long as there are threads available (i.e., in-use < 100%), messages will be processed immediately.
If you were to auto-scale on SQS metrics like queue length, the length stays approximately zero until the system starts falling behind, and then it's too late. If you scale on queue size, you can't preemptively scale when load is increasing. By monitoring and scaling on thread capacity, you can track your effective utilization as it climbs from 50% to 80% to 100%, and you can begin scaling before it reaches 100%, before messages start to back up.
The other benefit of this approach is that it works equally well at many different scales; a threshold like 80% thread utilization works just as well with a single host, as with a fleet of 100 hosts. By comparison, thresholds on metrics like queue length need to be adjusted as the scale and throughput of the system changes.
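A minimal sketch of publishing that kind of utilization metric, assuming boto3; the namespace, metric name, and thread count are hypothetical:

    import boto3

    cloudwatch = boto3.client("cloudwatch")
    MAX_THREADS = 20  # threads available on this host for processing SQS messages

    def publish_thread_utilization(busy_threads: int) -> None:
        # Published periodically from each worker; the scaling policy then targets
        # a fleet-wide average, e.g. scale out when the average breaches 80%.
        utilization = 100.0 * busy_threads / MAX_THREADS
        cloudwatch.put_metric_data(
            Namespace="Custom/Workers",  # hypothetical namespace
            MetricData=[{
                "MetricName": "ThreadUtilization",
                "Value": utilization,
                "Unit": "Percent",
            }],
        )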
From a bird's eye view, you also need to figure out what costs you more.
For example (and I know nothing about the use case of the OP, I can only estimate), you might be able to buffer requests into a queue and let the system scale up more slowly.
You might have auto scaling that needs to be close to real time and auto scaling that can happen on a span of minutes.
Every auto scaling setup also needs to keep storage scaling in mind; often you are limited by DB write capacity or similar constraints.
AWS employee here. If you are able to achieve consistently greater than 50% utilization of your EC2 instances, or have a high percentage of spot or reserved instances, then ECS on EC2 is still cheaper than Fargate. If your workload is very large, requiring many instances, this may make the economics of ECS on EC2 more attractive than using Fargate. (Almost never the case for small workloads, though.)
Additionally, a major use case for ECS is machine learning workloads powered by GPUs, and Fargate does not yet have this support. With ECS you can run p2 or p3 instances and orchestrate machine learning containers across them, with GPU reservation and GPU pinning.
I'm not totally up to speed on ECS vs EKS economics but it seems like EKS with p2/p3 would be a sweet solution for this. Even better if you have a mixed workload and you want to easily target GPU-enabled instances by adding a taint to the podspec.
ECS GPU scheduling is production ready, and the initial getting-started workflow is streamlined quite a bit because we provide a maintained GPU-optimized AMI for ECS that already has your NVIDIA kernel drivers and Docker GPU runtime. ECS supports GPU pinning for maximum performance, as well as mixed CPU and GPU workloads in the same cluster: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/...
Apart from pricing and the potential to overcommit resources on EC2-ECS, there are a couple of other differences.
One is your options for doing forensics on Fargate. AWS manages the underlying host, so you give up the option of doing host-level investigations. It's not necessarily worse, as you can fill this gap in other ways.
Logging is currently only via CloudWatch logs so if you want to get logs into something like Splunk you’ll have to run something that can pick up these logs. You’ll have that issue to solve if you want logs from some other AWS services like Lambda to go to the same place. The bigger issue for us is that you can’t add additional metadata to log events without building that into your application or getting tricky with log group names. On EC2 we’ve been using fluentd to add additional context to each log event like the instance it came from, the AZ, etc. Support for additional log drivers on Fargate is on the public roadmap[1][2] so there will hopefully be some more options soon.
At least one is that you get to use the leftover CPU and memory for your other containers when you use an EC2 instance. With some workloads this lets you overcommit those resources if you know all your containers won't max out simultaneously.
Edit: another one is that you can run ECS on spot fleet and save some money.
Fargate is orthogonal to ECS and the two can be used together. The difference is that instead of spinning up VMs as hosts, configuring them for ECS, and worrying about running just the right number of them, you select Fargate, which does all of that behind the scenes (kind of like Lambda), but the capacity provided by Fargate is a bit more expensive.
Auto-scaling depends on startup time. If your startup time for a new instance/container is 5 seconds, then you need to predict what your traffic will be in 5 seconds. If your startup time is 10 minutes, then you need to predict your traffic in 10 minutes.
The choice of metric is important, but it needs to be a metric that predicts future traffic if you want to autoscale user facing services. CPU load is not that metric.
The best way to do autoscaling is to build a system that is unique to your business to predict your traffic, and then use AWS's autoscaling as your backup for when you get your prediction wrong.
Having used ECS quite a bit, I do not recommend anyone build a new stack based on it. Kubernetes solves everything ECS solves, but usually better and without several of the issues mentioned here. Last time I checked, AWS was still lagging behind Azure and GCP on Kubernetes, but I have a strong feeling they're prioritizing improving EKS over ECS.
If you're already invested in ECS it's a different story, of course.
AWS employee here. I don't think there is anything fundamental that makes Kubernetes avoid autoscaling issues. Just like with ECS, if you don't set up the right horizontal pod autoscaling settings in Kubernetes, you can easily end up with under- or over-scheduling of your Kubernetes pods. Ultimately, no matter whether you use ECS or EKS, you will need to do some fine-tuning and testing to make sure your autoscaling setup matches your real-world traffic patterns.
AWS is committed to improving both ECS and EKS. You can see our public roadmap with many in progress improvements for both container orchestration platforms here: https://github.com/aws/containers-roadmap/projects/1
Feel free to add your own additional suggestions, or vote on existing items to help us prioritize what we should work on faster!
We use a ton of ECS, for running batch processing for our data pipeline, 4ish small internal webapps/microservices, and our Jenkins testing compute.
Some of the problems we're seeing: task placement and waiting is too hard (we had to write our own jittered waiter to avoid overloading the ECS API endpoints when asking whether our tasks are ready to place). Scaling the underlying EC2 instances is slow. The task definition => family => container definition hierarchy is not great. Log discovery is a bitch.
Are these all solved under K8S? I've no experience with kubernetes, but if so, might need to rethink where we run our containers. ECS was just so easy, and then so hard.
Disagree. We've been running on ECS for years and it's a very economical and reliable way to run containers on AWS. The service itself is free, the agent has become very reliable over time, and the integrations with AWS services like ELB and CloudMap are seamless.
It's significantly more complex to host Kubernetes infrastructure, and EKS is significantly more expensive.
ECS would be my first choice, with EKS a second choice if my needs dictated it (perhaps a hybrid or multicloud scenario).
Kubernetes is way more complicated if you just need to run one or two services using Docker, Fargate is brand new so it has a lot of things to prove...
AWS employee here. Sorry to hear that you feel ECS is half baked. Feel free to reach out directly using the details in my profile info if you have any feedback you'd like me to pass on to the team.
To clear up the confusion on the relationship between Fargate and ECS, think of Fargate as the hosting layer: it runs your container for you on demand and bills you for the amount of CPU and GB your container reserved per second. On the other hand ECS is the management layer. It provides the API that you use to orchestrate launching X containers, spreading them across availability zones, and hooking them up to other resources automatically (like load balancers, service discovery, etc).
Currently you can use ECS without using Fargate, by providing your own pool of EC2 instances to host the containers on. However, you cannot use Fargate without ECS, as the hosting layer doesn't know how to run your full application stack without being instructed to by the ECS management layer.
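An illustrative sketch of launching a container through ECS on Fargate capacity, assuming boto3; the cluster, task definition, subnet, and security group values are hypothetical:

    import boto3

    ecs = boto3.client("ecs")

    # ECS (the management layer) receives the request; Fargate (the hosting
    # layer) supplies the compute, so no EC2 container instances are needed.
    response = ecs.run_task(
        cluster="my-cluster",                 # hypothetical
        taskDefinition="my-service:1",        # hypothetical
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],     # hypothetical
                "securityGroups": ["sg-0123456789abcdef0"],  # hypothetical
                "assignPublicIp": "ENABLED",
            }
        },
    )
    print(response["tasks"][0]["lastStatus"])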
From my perspective Fargate offers the functionality I would have expected from ECS in the first place. What ECS provides out of the box requires too much janitoring and ultimately isn't terribly different in effort from running your own k8s or Mesos infra on EC2 instances you provisioned yourself. You still basically need an orchestration layer over ECS.
Which is why, I assume, Fargate is now listed as an integral feature of ECS on the product page.
Yeah, to be clear, the Fargate container hosting was always the vision for ECS, from the very first internal proposal to build this system. But it's necessary to first build something that keeps track of container state at scale, and that is ECS. We built ECS so that it can keep track of container state both in Fargate and in containers running on your own self-managed EC2 hosts. This gives you the most flexibility if you have really specific needs for your container hosts that Fargate can't cover for you.
Agreed. ECS has several limitations that you don't really discover until you are fully into the weeds. Don't use it if you are just starting with AWS unless your use case is a straightforward website stack. Do not use it for complex microservice architectures.
I'm surprised I didn't see application performance monitoring mentioned here. A lot of applications are complex and in those cases adding containers is only effective until you reach the next constraint.
Having two resources (such as DB and app) scale in concert can be exceedingly difficult.
(Author here) Absolutely! It's amazing how complex things get on the configuration side as you try to get smarter about autoscaling. The big downside of trying to be too clever here is that you wind up with a wildly brittle autoscaling setup that falls over as soon as the underlying assumptions around the relationship between your metrics and your scaling needs change. As in many things engineering, we've found that it's best to keep the configuration as simple as possible and use a solid foundation of reporting / alerting to give you an early heads up that you need to revisit and update your autoscaling strategy.
> Having two resources (such as DB and app) scale in concert can be exceedingly difficult.
This means your resources are too tightly coupled. If they are so tightly coupled that they need to scale together, then they are not really two resources, and you should look into either restructuring them into two actual resources or binding them more closely into a single resource.
As far as runtime, applications and DBs are already decoupled. You have N application instances mapped to M database instances. Applications can usually scale pretty much with load. Databases vary wildly in how they scale and it depends on the DB type.
You could hack around this by creating two auto-scaling groups with different instance types and then have them follow the same metric such that the small group goes to 0 and the larger one spins up. Not a great solution but better than nothing.
Jelastic can run Docker in two separate ways:
1) running the image as a plain VZ container (advantage: many extra features become available; disadvantage: compatibility with native Docker is not 100%)
2) running via the native Docker engine, but instead of a VM (like everyone else uses), an elastic VZ container is used as the host machine for running the embedded containers. Compatibility with classic Docker is 100%, plus vertical scaling as an extra feature because of the VZ layer on top.
Basically, both ways of running Docker images allow flexible, managed resource allocation, so vertical scaling is indeed available for both of them.
BTW, Kubernetes in Jelastic uses the same approach, which makes Jelastic the only K8s platform on the market with a "pay as you use" billing model, where you pay only for consumed resources and not for the limits, as everyone else does.
Hi, Jelastic founder here. Thank you for mentioning our product. Vertical scaling is not marketing :), it's reality.
Jelastic public cloud providers offer automatic vertical scaling with a pay-as-you-use billing model (please don't confuse it with pay-as-you-go). Beyond this, our team helps related technologies become more elastic, for example Java: https://jelastic.com/blog/elastic-jvm-vertical-scaling/
Regarding the docker support, there are two flavors inside Jelastic:
1) Native Docker Engine - you can create a dedicated container engine for your project in the same way as you do on any IaaS today, for example “How to run Docker Swarm” https://jelastic.com/blog/docker-swarm-auto-clustering-and-s.... An advantage here is the vertical scaling feature: in Jelastic, unused resources are not billed, while at any other cloud provider you have to pay for the VM's resource limits.
2) Enhanced System Containers based on a Dockerfile - there is no need to provision a dedicated Docker engine or Swarm. This solution provides even better density, elasticity, multi-tenancy and security, and a more advanced integration with the UI and PaaS feature set compared to #1. It supports multiple processes inside a single container; you can get SSH access and use all the standard tools for app deployment, write to the local filesystem, use multicast and so on. It supports traditional or legacy apps, while images can be prepared in the same familiar Dockerfile format. Unfortunately it's not fully compatible with the native Docker Engine due to specific limitations/requirements of the Docker technology itself.
Thank you for pointing out this issue. In the upcoming release we will clarify the difference between the two and provide more tips on which one is better to use in various cases.
I've also been deploying services on ECS for close to a year now and would like to address some inaccuracies the author seems to have made:
1) In 'Surprise 1' the author offers examples where CPU utilization (or the target) is between 80% and 95% without mentioning the reserved CPU/memory (aka size) of those tasks (under the assumption that he's using the Fargate launch type). The 'size' of a task also influences the average CPU target utilization. For instance, if a task reserves 4 vCPUs, then a spike from 80% to 95% is handled differently than when a task reserves 1 or 2 vCPUs. The same goes for memory. In an example setup I'd use tasks sized at 1-2 vCPUs with a service-wide target average CPU utilization of 70%, along with a StepScaling policy which adds 10% more tasks if the service average CPU utilization falls between 70-80%, 20% if between 80-90%, and 25% if above 90% (a rough sketch of this setup follows below). My strategy has been smaller-sized tasks, a lower service average CPU utilization (compared to 80%-90%), and shorter evaluation periods/datapoints for the scale-out CW alarms (the minimum being 60 seconds IIRC). The short evaluation periods/low number of datapoints on the CW alarms allowed me to handle spikes reasonably fast.
2) In 'Surprise 3' the author claims that Terraform's aws_appautoscaling_policy 'is rather light on documentation'. Having used Terraform for several years, I find that inaccurate, mostly because of the several examples available in the documentation https://www.terraform.io/docs/providers/aws/r/appautoscaling... ; in addition, a GitHub exact search for "aws_appautoscaling_policy" language:HCL will reveal many, many more examples from open-source repos (some with permissive licenses too). I created a custom ecs-service TF module which creates for each service (optionally) an ALB along with listeners and the attached ACM-issued TLS certs and TGs, the scale-in/out CW alarms with configurable thresholds/policies, SGs, Route53 records, etc., allowing one to configure and launch an ECS service quickly and reliably.
Regarding scale-in, I typically also set that at intervals of 5-15 minutes to avoid an erratic scale-in/scale-out 'zig-zag', even at the cost of briefly over-provisioning.
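A rough sketch of that kind of step-scaling registration, assuming boto3; the cluster/service names, capacities, and thresholds are hypothetical, and the step intervals are relative to a CloudWatch alarm threshold of 70% average CPU (so 0-10 covers 70-80%, 10-20 covers 80-90%, and 20+ covers above 90%):

    import boto3

    aas = boto3.client("application-autoscaling")
    resource_id = "service/my-cluster/my-service"  # hypothetical

    # Make the ECS service's DesiredCount scalable.
    aas.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId=resource_id,
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=2,
        MaxCapacity=50,
    )

    # Scale out by a percentage of current capacity, in steps keyed off how far
    # the metric is above the alarm threshold.
    aas.put_scaling_policy(
        PolicyName="cpu-step-scale-out",
        ServiceNamespace="ecs",
        ResourceId=resource_id,
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="StepScaling",
        StepScalingPolicyConfiguration={
            "AdjustmentType": "PercentChangeInCapacity",
            "Cooldown": 60,
            "MetricAggregationType": "Average",
            "StepAdjustments": [
                {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 10, "ScalingAdjustment": 10},
                {"MetricIntervalLowerBound": 10, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 20},
                {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 25},
            ],
        },
    )

The returned policy ARN would then be attached as the alarm action on the scale-out CloudWatch alarm.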
Yup. Your DB usually has to be over-provisioned for peak WRITE capacity.
Read capacity is easy to scale to infinity with caches. But if a DB can only write 1000 updates per second, nothing will change that.
In many cases it's OK to not process EVERYTHING right away. Process the important stuff RIGHT AWAY. Slowly process the unimportant stuff in your spare time.
The biggest challenges I’ve had with auto scaling have been slow scaling time and default metrics not being a good proxy for scaling needs. One thing I was mildly curious about: if you’re going to build your own metrics and scaler, what would be some of the downsides of having it scale down by just putting instances in the Stopped state, then scale up by starting them? In my experience starting takes seconds while launching new instances takes minutes.
Having to deploy updates to stopped instances would be complicated and you’d have to pay EBS costs for stopped instances, but I’m curious if there are other issues. Launching an instance from an AMI, even after the instance comes up the disk tends to be very slow for some time as if it’s lazily loading the filesystem over the network.
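For context, a sketch of the stop/start approach being asked about, assuming boto3; the instance IDs are hypothetical pre-built instances:

    import boto3

    ec2 = boto3.client("ec2")
    WARM_POOL = ["i-0123456789abcdef0", "i-0123456789abcdef1"]  # hypothetical

    def scale_in(instance_ids):
        # Stopped instances stop accruing compute charges but keep their EBS
        # volumes (and EBS costs), so no AMI boot or lazy filesystem load later.
        ec2.stop_instances(InstanceIds=instance_ids)

    def scale_out(instance_ids):
        # Starting a stopped instance typically takes seconds, versus minutes
        # for launching a fresh instance from an AMI.
        ec2.start_instances(InstanceIds=instance_ids)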
AWS needs to glue EC2 and ECS scheduling together. Today the schedulers are separate, so basically the feet don't know what the arms are doing. That leaves fixing this scaling up to the client, meaning duplicated effort solving the same problem for each AWS customer.
Feel free to drop a thumbs up on the roadmap item to show your support and boost its priority on the roadmap, or leave a comment to let us know more about your needs.
Your last sentence is something I have been thinking about for a very long time.
I have been in DevOps since before it had a name, and I see many companies solving the same problems.
Like this auto-scaling post. That's not the first company to deal with it (nor the last). Providing a set of tools can be very beneficial but so hard to dial down.
I have a very big itch around solving this problem.
Quick question for the AWS employee solving inquiries: I used ECS in 2017, and back then there was this weird issue where sometimes tasks would switch to new versions in like, a minute (if that), but sometimes, like 2/10, it would take like 10-12 minutes just for it to start killing old Task containers. Back then there wasn't any timeout option or anything to force the killing. Do you know if now there is? The project was killed for different reasons, but I really liked everything else on ECS. Thanks!
EDIT: I meant killing containers, not the Tasks themselves. Sorry.
An incredible amount of software and infrastructure is written precisely for analytics data gathering workloads.
I'm pretty confident AWS's product for this use case would be Lambda and the new on-demand DynamoDB.
Is there actually a use case in analytics that requires a server that accepts connections from multiple clients, and then has to have <60ms latency including state over the wire and executing sophisticated business logic, between those clients, for time periods longer than 5 seconds? I.e. something that resembles a video game?
Because if there isn't, if your goal is to scale, why have containers at all?
I'm under the impression that Lambda gets expensive if you have many requests. E.g. the story on ipify that showed thousands of USD on Lambda vs hundreds on Heroku:
“Today the service runs for between $150/mo and $200/mo depending on burst traffic and other factors. If I factor that into my calculator (assuming a $200/mo spend), ipify is able to service each request for total cost of $0.000000007. That’s astoundingly low.
If you compare that to the expected cost of running the same service on a serverless or functions-as-a-service provider, ipify would cost thousands of dollars per month.”
Batching incoming requests, for one. Kinesis only allows 5 write requests per second per shard, for example. As well, Lambdas have limits on concurrent executions and are very slow (10s) to start if they need VPC connectivity (in that case the default concurrent Lambda limit is 350 due to ENIs).
Hmm... I don't see anything in the docs implying that - Kinesis API docs say it's possible to ingest 1000 records or 1MB per shard per second. There's a 5/s limit on reads however but those deal with batches of records anyway.
We have one service running that ingests data into a Kinesis stream published as an API GW endpoint. Preprocessing is done in Lambda in batches of 100 records, and the processed records get pushed to Firehose streams for batched loads into a Redshift cluster for analytics. So far we've been very happy with the solution: very little custom code, almost no ops required, and it performs and scales well.
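An illustrative sketch of the Lambda step in that kind of pipeline, assuming boto3 and a Kinesis trigger; the delivery stream name and the "preprocessing" are hypothetical:

    import base64
    import json
    import boto3

    firehose = boto3.client("firehose")
    DELIVERY_STREAM = "analytics-to-redshift"  # hypothetical

    def handler(event, context):
        records = []
        for record in event["Records"]:
            # Kinesis delivers record payloads base64-encoded.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            payload["processed"] = True  # stand-in for real preprocessing
            records.append({"Data": (json.dumps(payload) + "\n").encode("utf-8")})
        if records:
            # Firehose buffers and batch-loads these into Redshift.
            firehose.put_record_batch(
                DeliveryStreamName=DELIVERY_STREAM,
                Records=records,
            )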
You can, but it’s not visible to the hypervisor (it’s an OS concept) so you have to publish that metric from an agent on the machine. Then you can use it for autoscaling.
The number of task switches may be very high for a number of reasons, including a very large number of threads which each do a very small chunk of work and yield.
I got bit by this! Even worse is that one of the servers crumpled because we didn't scale up fast enough - so AWS killed it because of the health metric. Which then took out the remaining two because they were then far, far over capacity. I got the pager duty alert and found a total cluster and just manually set it to scale up way bigger. Now for all big events we manually bump minimum server counts for that period :\
>So, if you’re targeting 95% CPU utilization in a web service, the maximum amount that the service scales out after each cooldown period is 11%: 100 / 90 = 1.1
How many errors are in that sentence? The mysterious 95% -> 90 conversion. 100/90 is actually 1.111 (repeating, of course), not 1.1. And if it did equal 1.1, that would be 10%, not 11%.
I think most people want to write this kind of blog after slogging through the AWS learning curve. Then they figure out how to use it and the urge goes away.
[1] https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-man...