We got around this with a bit of a hack: use a CloudWatch Events rule to trigger a dummy invocation of your function every five minutes. This keeps the container "hot" and reduces the start time (and is negligible cost-wise). It won't fix the cold starts you hit when the function scales up, but it does reduce latency for 99% of our API requests.
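For anyone curious, the handler side is just a short-circuit on the warm-up ping; the schedule itself is a CloudWatch Events / EventBridge rule with `rate(5 minutes)` targeting the function. Rough sketch (payload shape and names are ours, not anything standard):

```typescript
// Illustrative warm-up short-circuit in the Lambda handler.
// The scheduled rule invokes the function with a payload like { "warmup": true }.
export const handler = async (event: { warmup?: boolean }) => {
  if (event.warmup) {
    // No real work: returning early keeps the container warm
    // without touching downstream services.
    return { statusCode: 200, body: 'warm' };
  }

  // ... normal request handling goes here ...
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```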
We're already doing this, but unfortunately it doesn't work well when you're expecting lots of parallel calls at various times.
The obvious solution would be to just merge the microservices into the same Lambda, but at that point we'd rather switch to EKS or something similar and actually be able to use a microservice architecture to its full extent.
To give a little more context, we have a bunch of microservices exposing GraphQL endpoints/schemas. We then have a Gateway which stitches these together and exposes a public GraphQL schema. Because of the (by design) flexibility of GraphQL, we can easily end up making multiple parallel calls to several microservices when the schema gets transformed in the Gateway.
This works really well and gives a lot of flexibility in designing our APIs, especially in using microservices to their full extent. It also works really well when the Lambdas are already warm, but as soon as one cold start happens amongst them all, we suddenly go from responses in milliseconds to responses in seconds, which I don't think is acceptable.
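For those unfamiliar with the setup, it's roughly schema stitching with graphql-tools; a heavily simplified sketch, with made-up endpoints and APIs that vary a bit by graphql-tools version:

```typescript
// Simplified gateway sketch. Each subschema points at a separate Lambda-backed
// GraphQL endpoint, so one public query spanning types from two services fans
// out into parallel calls -- and the slowest (coldest) Lambda sets the latency.
import { stitchSchemas } from '@graphql-tools/stitch';
import { schemaFromExecutor } from '@graphql-tools/wrap';
import { buildHTTPExecutor } from '@graphql-tools/executor-http';

async function buildGatewaySchema() {
  // Hypothetical internal endpoints for two of the microservices.
  const usersExecutor = buildHTTPExecutor({ endpoint: 'https://users.internal/graphql' });
  const ordersExecutor = buildHTTPExecutor({ endpoint: 'https://orders.internal/graphql' });

  return stitchSchemas({
    subschemas: [
      { schema: await schemaFromExecutor(usersExecutor), executor: usersExecutor },
      { schema: await schemaFromExecutor(ordersExecutor), executor: ordersExecutor },
    ],
  });
}
```

The point being: the fan-out is parallel, so a single cold container among the services is enough to drag the whole public response into the seconds range.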
We've been shaving things off here and there, but we are more or less at the mercy of cold starts. So our current plan is to migrate to an EKS setup; we just need to get a fully automated deployment story going to replace our current CI/CD setup, which heavily uses the Serverless Framework.
It's still insanely cheap. You could have millions of executions per month and only pay $0.50. But if you needed to, it could scale up to billions of invocations nearly instantly, something a standard server would have trouble doing as easily as Lambda does.
Then again, you are doing the hacks you describe precisely because it is not scaling up nearly instantly. The cold start delays are not only an issue when scaling from zero to one; they hit you whenever you scale capacity up.
This. I keep hearing about hacks where people ping a function to keep a single instance warm. But that doesn't cover periodic changes in capacity needs, nor spikes. I would think that to avoid cold starts altogether you'd need a pinger that sent exactly the difference between peak load and current load. I would love to hear if anyone is keeping Lambdas warm at more than n=1 capacity.
>if anyone is keeping Lambdas warm at more than n=1 capacity
There are various ways to do it, but I feel it's a very suboptimal solution, and it still won't guarantee that no cold starts happen.
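For completeness, the approach I've seen (it's roughly what serverless-plugin-warmup does) is to fire N concurrent synchronous invocations with a warm-up payload, and have the handler sleep briefly on warm-up pings so the invocations can't all be served by the same container. A sketch, where the function name and concurrency target are made up:

```typescript
// Illustrative warmer that tries to keep N containers warm, not just one.
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});
const FUNCTION_NAME = 'my-graphql-service'; // hypothetical
const WARM_CONCURRENCY = 5;                 // how many containers we want warm

export const warmer = async () => {
  const payload = new TextEncoder().encode(JSON.stringify({ warmup: true }));
  await Promise.all(
    Array.from({ length: WARM_CONCURRENCY }, () =>
      lambda.send(
        new InvokeCommand({
          FunctionName: FUNCTION_NAME,
          // Synchronous so all N invocations are in flight at once, forcing
          // Lambda to spread them across distinct containers (the handler
          // should sleep ~100ms on warm-up pings for the same reason).
          InvocationType: 'RequestResponse',
          Payload: payload,
        })
      )
    )
  );
};
```

Even then, the moment real traffic exceeds whatever N you picked, you're back to cold starts, which is why I consider it a band-aid rather than a fix.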
I've personally come to the conclusion that Lambda is very nice for anything that isn't latency-sensitive. We are still using it to great effect for e.g. processing incoming IoT data samples, whose volume can vary quite a lot; that all happens in the backend, and nobody will care if it's delayed by 1-2 seconds.