Data API for Amazon Aurora Serverless

coderecipe · on May 31, 2019

With this, a Lambda function no longer needs to run inside a VPC to reach RDS, which means cold start times should drop from seconds to milliseconds. I made a ready-to-use recipe (source code, deployment script, and demo included) here: https://coderecipe.ai/architectures/77374273. Hopefully this helps others onboard to the new API easily.
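A minimal sketch of what such a Lambda handler might look like in Python. It assumes boto3's `rds-data` client and its `execute_statement` call, which takes `resourceArn`, `secretArn`, `database`, `sql` and optional named `parameters`, and returns rows under `records`. The ARNs, the database name and the `users` table are hypothetical placeholders. The client can be injected so the handler can be exercised without AWS.

```python
from typing import Any, Dict, List, Optional

# Hypothetical placeholders: substitute your own cluster ARN, secret ARN and database.
CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-serverless-cluster"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret"
DATABASE = "mydb"


def field_value(field: Dict[str, Any]) -> Any:
    """Unwrap one Data API field, e.g. {"stringValue": "x"} -> "x"."""
    if field.get("isNull"):
        return None
    # A non-null field carries a single typed key (stringValue, longValue, ...).
    return next(iter(field.values()))


def handler(event: Dict[str, Any], context: Any, client: Optional[Any] = None) -> List[List[Any]]:
    """Run one query over HTTPS: no VPC and no persistent MySQL connection involved."""
    if client is None:
        import boto3  # deferred so the module can be imported without boto3
        client = boto3.client("rds-data")
    response = client.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN,
        database=DATABASE,
        sql="SELECT id, name FROM users WHERE id = :id",  # hypothetical table
        parameters=[{"name": "id", "value": {"longValue": int(event["id"])}}],
    )
    # "records" is a list of rows; each row is a list of typed field dicts.
    return [[field_value(f) for f in row] for row in response.get("records", [])]
```

Lambda invokes `handler(event, context)`, so in production the default boto3 client is used; whether this actually brings cold starts down to milliseconds is the claim above, not something the sketch demonstrates.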

scarface74 · on May 31, 2019

This only works for Aurora Serverless, not regular Aurora or any other managed databases.

etaioinshrdlu · on May 31, 2019

I told my AWS account manager today that this is what I wanted to see on Aurora Serverless:

- MySQL 5.7 compatibility

- acting as replication master or slave

- faster upscaling, more like 5s instead of 30s

- publicly accessible over the internet (the rest of RDS has this)

- Aurora Parallel Query built in

- Aurora Multi-Master built in

Basically, I asked for one product that merges all their interesting features. That sounds like a nice one-size-fits-all database. I would very much like to use it in production, and it would require very little maintenance.

hn_throwaway_99 · on May 31, 2019

I wonder what effect this may have on AWS Lambdas connecting to a DB for synchronous calls (e.g. through API Gateway). The biggest issue with Lambdas, IMO, is cold start time. If your Lambda is in a VPC, the cold start time is around 8-10 seconds, and if you have decent security practices, your database will be in a VPC. I know AWS said they would be working on improving Lambda VPC cold start times, but I'd like to know whether using Aurora Serverless with these kinds of "connectionless connections" would also get rid of the need to be in a VPC. I've used Aurora (and really, really liked it), but I haven't used Aurora Serverless.

ftcHn · on May 31, 2019

Would it "get rid of the need to be in a VPC"? I think yes.

It looks like enabling the Data API exposes that endpoint to the entire internet, secured like all the other AWS services with HTTPS, IAM, etc.
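Access is then governed by IAM rather than by network placement. Below is a sketch of a least-privilege policy for the calling function, built as a Python dict so it can be serialized with `json.dumps`. The `rds-data:*` and `secretsmanager:GetSecretValue` action names are real IAM actions (the caller needs the latter because the Data API reads the credentials from the secret on its behalf). The ARNs are hypothetical placeholders, and scoping `Resource` to a single cluster is an assumption you may want to check against your own setup.

```python
import json

# Hypothetical ARNs; replace with your own cluster and Secrets Manager secret.
CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-serverless-cluster"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret"


def data_api_policy(cluster_arn: str, secret_arn: str) -> dict:
    """Build an IAM policy that lets a caller use the Data API against one cluster."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Permission to send SQL to the cluster through the Data API endpoint.
                "Effect": "Allow",
                "Action": [
                    "rds-data:ExecuteStatement",
                    "rds-data:BatchExecuteStatement",
                    "rds-data:BeginTransaction",
                    "rds-data:CommitTransaction",
                    "rds-data:RollbackTransaction",
                ],
                "Resource": cluster_arn,
            },
            {
                # The database credentials are fetched from this secret with the caller's permissions.
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            },
        ],
    }


print(json.dumps(data_api_policy(CLUSTER_ARN, SECRET_ARN), indent=2))
```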

coderecipe · on May 31, 2019

Feel free to take a look at my code sample using the new Data API on Aurora Serverless, demo and source code included: https://coderecipe.ai/architectures/77374273. It removes the need for a VPC, and it works like a charm.

davmar · on May 31, 2019

Came here wondering the same thing. Those cold start times aren't acceptable for user-facing apps, and I had to switch a serverless project from RDS in a VPC to DynamoDB. Even worse, the cold start applies to each Lambda instance, so under concurrent usage every new instance that spins up incurs its own cold start.
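To see why concurrency multiplies cold starts, here is a toy simulation. It assumes, as described in this thread, that each Lambda container handles one request at a time, and it further assumes idle containers stay warm forever (real Lambda reclaims them eventually). The function name and the request tuples are illustrative only.

```python
from typing import List, Tuple


def count_cold_starts(requests: List[Tuple[float, float]]) -> int:
    """Count containers spawned for (start, end) requests, one request per container.

    A request reuses any container that finished its previous request by the time
    this one arrives (a warm start); otherwise a new container is spawned (a cold
    start). Idle containers are assumed never to be reclaimed.
    """
    busy_until: List[float] = []  # one entry per container: when it becomes free
    cold_starts = 0
    for start, end in sorted(requests):
        for i, free_at in enumerate(busy_until):
            if free_at <= start:
                busy_until[i] = end  # a warm container picks up the request
                break
        else:
            busy_until.append(end)  # no idle container available: cold start
            cold_starts += 1
    return cold_starts
```

With `[(0, 10), (1, 11), (2, 12), (20, 30)]` it returns 3: the first three requests overlap and each needs its own container, while the fourth reuses one that has gone idle. The same three requests spaced out sequentially would cost a single cold start.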

Having said that, I'm actually pretty happy with DynamoDB... so far.

cavisne · on May 31, 2019

Another cool thing about this is that it avoids the connection pool issue with Lambda (where concurrent requests can't reuse connections).

Aurora is already pretty good at handling a lot of connections but this is even better.

keysmasher · on May 31, 2019

Not really. That was already solved without this API, since Aurora Serverless manages connections and scaling automatically (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...).

But the real problem was that connection time made it unusable for any client-facing application. I tried it after it was released (not the preview). I really doubt this API will respond any faster.

djhworld · on May 31, 2019

You can create a connection pool in a static context that lives throughout the lifetime of the JVM.

Admittedly, if Lambda scales out to multiple JVMs as the request rate increases, you'll have multiple pools. And if your request rate is low, you won't get much benefit.
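The same idea in Python (the language used for the other sketches here, not the commenter's Java): a module-level connection created lazily on the first invocation and reused by later invocations in the same warm container. `connect` is a hypothetical stand-in for a real driver's connect call.

```python
from typing import Any, Callable, Optional

# Module scope: lives as long as the container, so it survives warm invocations.
_connection: Optional[Any] = None


def get_connection(connect: Callable[[], Any]) -> Any:
    """Return this container's cached connection, opening it only on first use."""
    global _connection
    if _connection is None:
        _connection = connect()  # paid once per container, during the cold start
    return _connection
```

A handler would call `get_connection(...)` at the top of every invocation. Because each container holds its own `_connection`, N concurrent containers still mean N connections, which is exactly the multiple-pools caveat above.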

cavisne · on June 1, 2019

Lambda containers serve one request at a time, so the number of JVMs tends to scale out a lot quicker than you would expect. This is more of a broader problem with Java on Lambda: the classic Java approach of creating a bunch of singletons on startup and accessing them from multiple threads doesn't work. You just get a really slow cold start and some nearly empty connection pools.

tienshiao · on May 31, 2019

The beta version seemed like it had pretty poor performance: https://www.jeremydaly.com/aurora-serverless-data-api-a-firs...

Does anyone have performance feedback now that it is no longer beta?

reilly3000 · on May 31, 2019

I'm definitely excited about this, especially after paying $36/month for a NAT that I barely used for a long, long time, and spending too many hours configuring it for my Lambdas.

That said, I don't know how Jeremy Daly got away with making that post, per AWS preview terms. They are pretty explicit about not posting benchmarks on their preview products, and that makes sense as the API is not stable at all.

Still, I'm glad to see the data and hope that the performance has improved. I wasn't accepted into the preview, and I've now started work to move most of our infrastructure to GCP. Notably, GCP doesn't require any fancy footwork to have a Cloud Function talk to a Cloud SQL instance: https://cloud.google.com/functions/docs/sql#overview

keysmasher · on May 31, 2019

Wait, if I read that doc correctly, it seems to suggest that connections will be closed when the function goes cold. So locked-up connections, where a function dies without disconnecting, aren't a problem on Google Cloud Functions?

Think of a traffic spike: 100 functions each open one connection. Then, during a lull, 80 of them go cold. Your max connections is 100, so if those 80 didn't disconnect and are waiting to time out, you're stuck: any new functions coming online won't be able to get a connection.

The only workaround in AWS was to set up an external connection pool, which starts to eat into the serverless savings.
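The arithmetic of that scenario as a tiny model: connections abandoned by cold functions that never disconnected still count against `max_connections` until the server's idle timeout reclaims them. The function and its parameter names are illustrative, not an AWS or MySQL API.

```python
def free_connections(max_connections: int, warm: int, zombie: int) -> int:
    """Connection slots left for newly started functions.

    warm:   connections held by functions that are still running
    zombie: connections abandoned by functions that went cold without
            disconnecting; they count until the server times them out
    """
    return max(0, max_connections - warm - zombie)


# The spike above: 100 functions connect, then 80 go cold without disconnecting.
print(free_connections(max_connections=100, warm=20, zombie=80))  # 0: new functions are stuck
# Had those 80 closed their connections cleanly, 80 slots would be free.
print(free_connections(max_connections=100, warm=20, zombie=0))  # 80
```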
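An external pool like that boils down to a fixed number of long-lived database connections that many short-lived callers borrow and return, so the database never sees more than `size` connections however many functions spin up. Below is a minimal single-process sketch using the standard library's `queue.Queue`. In practice the pool would live in a separate long-running service that Lambdas reach over the network, and `connect` is a hypothetical stand-in for a real driver's connect call.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator


class ConnectionPool:
    """Fixed-size pool: at most `size` backend connections, shared by all callers."""

    def __init__(self, connect: Callable[[], Any], size: int) -> None:
        self._idle: "queue.Queue[Any]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # open every backend connection up front

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[Any]:
        # Block until a connection is free; raises queue.Empty after `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand it back, even if the caller raised
```

Callers that arrive while every connection is checked out wait (or time out) instead of opening new connections, which is what keeps the database under its connection limit.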

mattnguyen · on May 31, 2019

Jeremy has updated the post in response to the announcement.

- Lots of improvements & better documentation

- Smaller response size, though it could still be cut down a lot more

- Sub-100ms query performance

> I’m really impressed by the updates that have been made. I do want to reiterate that this isn’t an easy problem to solve, so I think the strides they’ve made are quite good. I’m not sure how connection management works under the hood, so I’ll likely need to experiment with that a bit to measure concurrent connection performance.

blaisio · on May 31, 2019

... Don't you have to establish an HTTPS connection to use this API? Is that really easier than using the existing MySQL protocol? Or is the MySQL protocol really so horrible that HTTPS is faster?

Establishing new connections will never be as fast as reusing existing ones. It seems wasteful to ignore this.

smt88 · on May 31, 2019

This appears to be targeted at Lambda functions, which can't reuse existing connections between executions.

Also, establishing an HTTP connection is much faster than establishing a typical database connection, in my experience. I don't know why that is.

tybit · on May 31, 2019

Unlike a properly configured RDS cluster, this is available outside of your VPC on the open internet.

That’s the main selling point to me, though the HTTPS proxy's pooling of MySQL connections is nice too.

joemag · on May 31, 2019

Unless I’m misunderstanding the question, many HTTP clients pool HTTP(S) connections, and most do so by default. So the cost of establishing a connection gets amortized over a large number of API calls.
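A back-of-the-envelope model of that amortization. It assumes a one-time connection setup cost (TCP plus TLS handshake) and a fixed per-request cost; the millisecond figures in the example are made up for illustration, not measurements of the Data API.

```python
def mean_latency_ms(setup_ms: float, per_call_ms: float, calls: int, pooled: bool) -> float:
    """Average latency per call with or without connection reuse.

    pooled=True:  one connection is set up once and reused for every call.
    pooled=False: every call pays the setup cost again.
    """
    if calls <= 0:
        raise ValueError("calls must be positive")
    setups = 1 if pooled else calls
    return (setups * setup_ms + calls * per_call_ms) / calls


# Hypothetical numbers: 50 ms handshake, 20 ms per request, 100 requests.
print(mean_latency_ms(50, 20, 100, pooled=False))  # 70.0
print(mean_latency_ms(50, 20, 100, pooled=True))   # 20.5
```

For what it's worth, boto3 clients keep such a pool internally (its size is set with `botocore.config.Config(max_pool_connections=...)`), so a client created once per Lambda container should be able to reuse HTTPS connections across warm invocations.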