Hacker News new | past | comments | ask | show | jobs | submit login
Quicker serverless Postgres connections (neon.tech)
164 points by nikita on March 28, 2023 | hide | past | favorite | 98 comments



> A key feature of these environments is that state is not generally persisted from one request to the next. That means we can’t use standard client-side database connection pooling.

So we introduced so many optimizations just because we can't persist state. I can't help but think this is being penny wise pound foolish; the problems this article is solving wouldn't have been problems in the first place if you choose a boring architecture.


Serverless environments are a matter of fact. They exist, they are super convenient, and people want to use Postgres in them. This is a bridge for such people. Yes, it requires some sophisticated engineering, but it also enables to have less boilerplate code for a lot of people.


I think if this particular logic is hidden away behind a server, it doesn't really matter - people want convenience, and for someone else to hide the complexity.

Serverless postgres-as-a-service isn't a stretch at all.

This thread does remind me of a previous thread, I believe it was something like optimizing a basic string operation (concatenation?) by going down to assembly and SIMD. Few people have a need to know the details of that particular implementation, and mere mortals cannot comprehend the wizardry going on there either, but that's just fine because if it gets incorporated into standard libraries or the underlying implementations that said standard libraries use, everybody benefits without ever having to understand the inner workings.


Sure, you want the easiest thing possible that fulfills the common use cases. But right now serverless isn't there quite yet. Having persistent Postgres connections is a really basic requirement that the serverless platforms should've implemented already.

Heroku and similar things are the sweet spot right now for me. I don't have to deal with much, and I get exactly the amount of flexibility that I need. Sure it'd be a little nicer not to deal with Express on my own in a NodeJS environment, but right now it's easier to deal with than it is to avoid. Some day I'll happily go serverless.


Unless it breaks one day and no one knows why


Someone knows why, but that someone is not me. And if it had to be me, I'd go figure it out given some time. Point is, I don't need SIMD knowledge hot-loaded in my head.


Also known as layers of abstractions. The entirety of (software) technology is built on that concept.


Things are getting more serverlesss too. With Deno Deploy, Cloudflare Workers, etc I see myself eventually wanting to run most of my application on platforms like this, and a serverless & distributed database makes that much easier and I expect it’ll probably perform better in many cases too.


Deno Deploy doesn’t yet support npm (it’s coming), which I found out the hard way last week.

Overall tho I agree with your statement, and I now build most of my apps using serverless and databases like Neon.


100%


I've been doing "serverless" on app engine for more than a decade and it handles postgres connections fine without any heroics.

There's always instances with state somewhere; it's just a question of how smart your serverless infrastructure is about pooling the instances.


The “serverless” being referred to in the article is run using V8 isolates, whereas app engine runs full Node.js processes. Although they have limitations like not supporting raw TCP, isolates can start within a couple milliseconds and are much more CPU efficient than standalone processes.


Bound to be controversial hot take: serverless is the boring architecture you were looking for in the first place


Almost. It'd be boring if it'd handle these Postgres connections efficiently for me instead of making me rely on workarounds.


(As cool as Neon is. If you're already serverless for other reasons, they're providing a nice solution for Postgres.)


I think serverless is fine as long as you don't lock yourself in to proprietary, vendor-specific APIs. For example, with AWS Lambda and Python, you could use Mangum (see https://mangum.io/ ) so you're deploying an ASGI app, based on say FastAPI or Starlette. For databases, stick with Postgres.


As if boring architectures don't have their own problems.


Such as? Besides of course not being able to find people that want to work on it.


IDK, I'm assuming "boring" here means stateful monolith?

So you have:

a) No enforceable isolation, reliance entirely on good behaviors and module systems

b) No independent scaling

c) No isolation of security components so instead of a minting service and an edge service you have one service that does both

d) Single language lock-in

I could go on and on an don


Boring architectures generally don’t scale well


I usually find that "interesting" architectures scale even worse, at least for the 99% of projects and organizations that'd never actually outgrow the "boring" solution. All the pain of the big fancy solution, but without the actual need that makes it truly better than the alternative.

... of course, the best-paying jobs are at the places that really do need that scale, so of course we all want an excuse to play with those tools, even when it's not the right business choice. And it's not as if software folks are the only ones with a principal-agent problem, in business. Oh well.


I'm with you. I understand the critique, but experience has shown me "boring" is more maintainable, and at some point even "clever" tools stop being worth the effort. Besides, eventually scale pushes tech to roll their own, or rewrite/augment tools.

I dislike using boring as a measure of sufficiency. It makes me wonder if those drawn to the clever solutions carry a "I know better than you" perspective, the kind of dogmatic optimism that tanked SVB.


Define "boring" and "scale", because in my experience you can vertically scale hardware to an almost unbelievable number of concurrently logged-in users.

With most services, until you go over a few million users concurrent users, you really don't need to horizontally scale.


Being boring has nothing to do with scale. Some of the biggest applications run on incredibly simple architectures, and often are able to scale so well because of that (boring) simplicity.


Cassandra is probably in the realm 'boring and scalable'


Definitely not. I'd much rather scale Postgres manually through layers of sharding and proxies than handle Cassandra.

There is ScyllaDB for a much better reimplementation of Cassandra/Dynamo, but wide-column databases are still best for niche scenarios, especially as RDBMS are rapidly evolving into natively distributed architectures.


What are the biggest apps on the simplest architectures and what are the architectures?


Monolithic apps running on big servers talking to databases running on big servers. Very little orchestration overhead or complexity with serverless and the rest.

Everything from StackOverflow to ad serving systems to high-frequency trading exchanges are run this way.


Stack overflow is unique in that it’s most read only. Link to high frequency trading “boring” architecture?


What is your definition of non-boring architecture? I'm curious as to what you consider the exciting architectures necessary for scaling to make sure we're talking about the same thing.

Quora, Pinterest, Twitter, etc are all just big app instances talking to DB instances, with separate systems for background processing, queues, and caching. Are you suggesting that they would scale better with serverless functions instead?

Here's a list of architectures: http://highscalability.com/all-time-favorites/


I do not consider any of your examples really that simple or boring.

Boring to me is a load balancer, app server and cache. Once you start talking about background jobs and queues I don’t see how that’s any simpler than the alternative.

You could say Google, running Borg, is basically a database with a bunch of load balancers and background job processors. Obviously the complexity is in the scale and implementation.


Ironic that this page is down? (also not https? heh)


Stack Overflow is probably one of the more public examples. That is mostly a C# monolith with a relational database behind it.


Boring architectures already exist. Why reimplement what's already done?


Pretty insightful observation on authentication. Using a password method that requires the client to compute a brute force resistant hash that takes ~100ms of cpu time is a non-starter for most serverless platforms.

I'm curious if Cloudflare, or other serverless platforms offer the ability to generate short lived JWTs assigned to each worker. Combined with some configuration (pub key exchange, claim setup, etc) a platform like Neon could use these JWTs to establish identity. Sorta SSO for workers without the CPU overhead.

Seems like a safer approach than basic auth with fixed pws.


In Cloudflare Workers we like to represent things like this with "bindings". A binding is like an environment variable, except instead of containing a string, it contains a live object with methods that access the remote resource. Authentication is baked into this object, so the Worker code itself never has to see an API key at all. For example, if you have a Worker configured to use a KV namespace, you can write code like:

    let value = await env.MY_KV.get(keyName);
No setup needed, at least in code. You create the binding either through the configuration UI or API.

So far we've mostly used this technique to connect Workers to other Cloudflare-provided services like Workers KV, but I'm super-interested in the idea of third-party bindings. Hopefully, you'd be able to configure them through an OAuth-like flow, where the Cloudflare dashboard redirects you to the third-party service, that service prompts you for permission, then redirects back to Cloudflare, and you never have to copy/paste a single secret.

Of course, under the hood this would all be backed up by strong authentication, but it's high time we stop making application developers waste time thinking about this stuff.

(I'm the tech lead for Workers.)


Heck yes - this is the way. Make this available and you will unlock an entirely new ecosystem of CF worker plugins. A++


Could we use mTLS to do this with third parties?

https://developers.cloudflare.com/workers/runtime-apis/mtls/


My new hero, please update the documentation on the workers, bindings, and examples.

I can't cite examples while I'm being ushered by a distracted Uber driver but this explanation goes beyond the documents.

Also, please extend Cloud Workers to be written as NanoVMs.


This is basically how we do authentication in the Splitgraph DDN [0], which is kind of like a multi-tenant serverless Postgres.

We implement the Postgres frontend with a forked version of PgBouncer, and we changed the authentication method such that when the user authenticates, we issue them a JWT which we store as a session variable. That session variable has the same security properties as a cookie in a web browser (the user can change/manipulate it, but if it's signed by us we can trust its claims).

That's the simple explanation that skips over the multi-tenant part. I don't want to derail from the thread - Neon is very cool, and we are actually experimenting with it right now, for storing the Seafowl [1] catalog [2] when deploying to "scale to zero" services like Google Cloud Run or AWS Lambda, which don't have persistent storage.

[0] https://www.splitgraph.com/connect/query

[1] https://seafowl.io

[2] https://seafowl.io/docs/learning/architecture


Postgres also supports auth via SSL client certs, so that's perhaps another option. I don't know how well that maps to apis in Cloudflare and the server side of Neon.


This is a good idea... how much compute does SSL client cert auth require on the client side? I would suppose it has to produce a signature of some kind, perhaps signing a challenge from the server?


Client signs previous handshake messages with private key to prove it has possession of private key for client certificate.

It should be a lot faster than a purposefully slow kdf AFAIK.


Since serverless is basically just the PHP model with more languages (so: also basically just CGI) I'm waiting for them to balance clients onto as few machines as possible in each DC and add a (revolutionary! State-of-the-art!) connection-pooling agent to each one.

I reckon the current hype-cycle will be just about at its peak, when that happens.

(FWIW I like serverless—but then, I liked CGI, so of course I do)


We are working with the Cloudflare team and see if they can build it. This is a great idea.


Iirc lambda can use iam policies to connect to the database, and you can get a policy from the lambda conf of for more complex setup lambda can pick up authenticated user policies from an api gw


That is some serious optimization. Nice!

The article ends with the statement that you are pretty much done here for now. Would optimizing your TLS termination not maybe offer some more ways to speed this up? Or is that also already fully optimized?

I did not realize before that your approach with Websockets actually meant that there was no application/client side pooling of connections. What made you choose this approach over an HTTP API (as for example PlanetScale did) anyway?


> The article ends with the statement that you are pretty much done here for now. Would optimizing your TLS termination not maybe offer some more ways to speed this up? Or is that also already fully optimized?

No, we don't do early termination yet, but it makes sense to try it out too. Here we mostly concentrated on how far we can get in terms of reducing number of round-trips.

> I did not realize before that your approach with Websockets actually meant that there was no application/client side pooling of connections. What made you choose this approach over an HTTP API (as for example PlanetScale did) anyway?

To keep compatibility with current code using postgres.js.


> To keep compatibility with current code using postgres.js.

That makes a lot of sense - not needing an additional driver/client package is indeed a good point. Any plans to add a HTTP based API though anyway?


Potentially. We will follow what our users will ask us to do.


This isn't just useful for serverless. Any process-per-request based architecture can also benefit (PHP, Ruby, etc).


A simple optimization in these cases is to keep the connection open (see https://docs.djangoproject.com/en/4.1/ref/databases/#persist...)


PHP isn't generally process per request these days (for precisely this reason).

Each request is "fresh", but extensions can persist state between requests, including connections to databases etc.


The end of the post mentions HTTP/3 and QUIC. Planetscale have blogged about this [0].

0. https://planetscale.com/blog/faster-mysql-with-http3


I'm CEO of Neon. Happy to answer any questions you might have.


Neon looks like everything I've ever wanted out of a serverless database. I send a vendor pg compatible queries, vendor takes care of everything, charges me for usage - and connections work with 3rd party serverless runtimes.

I have a number of workloads I'd migrate to you over the next few weeks if you're ready for it. What is the state of Neon?

Is it ready for small-volume but business critical loads?

I'm struggling to find support guarantees and SLAs on your website, do you have them yet?

What is your off-site backup story? Can I export my backups to something like R2, S3, or B2?

Many of my customers are running in Cloudflare Workers. Their volume is low, but it's business critical (DB downtime means business downtime - business will fail if we lose all the data).


> I have a number of workloads I'd migrate to you over the next few weeks if you're ready for it. What is the state of Neon? Please do we will support it! We have 22K hosted databases on the platform and are quite stable.

> Is it ready for small-volume but business critical loads? Depends on business criticality. But generally yes. We are debating internally when we are going to announce it.

> I'm struggling to find support guarantees and SLAs on your website, do you have them yet? Good feedback. It's in the works.

> What is your off-site backup story? Can I export my backups to something like R2, S3, or B2? Since our storage is integrated with s3 and supports branching we can treat branches as backups. > Many of my customers are running in Cloudflare Workers. Their volume is low, but it's business critical (DB downtime means business downtime - business will fail if we lose all the data). We haven't lost data yet. There are a lot of redundancy in the system. Safekeepers, S3, branches (backups).


How do you compare against planetscale (I understand that they are mysql vs you postgres but I am looking for serverless db right now so business/product/support comparison is important)


Planetscale[1] is MySQL that is automatically sharded and proxied with a system called Vitess to scale horizontally. Because it's real MySQL, most functionality still works without the operations overhead.

Neon[2] is Postgres that has a surgically enhanced data layer that reads from a distributed set of nodes and object storage, but is still a single Postgres instance that's started and stopped on-demand and not automatically sharded. Because it's real Postgres, all functionality works including extensions without dealing with issues from sharded datasets, but also without the horizontally scalability and instant start (so far). However because the data layer is improved, the actual compute node is much more efficient in both startup and processing so it works well in many scenarios.

CockroachDB[3] is a proprietary distributed database built to Postgres wire/data protocol with natively separated storage and compute. Their serverless plan also starts and stops instances as traffic comes in but with fast startup because of their specific architecture. Because it's not real Postgres though, there's a good bit of missing functionality.

There's also TiDB[4] from PingCap which is similar to a MySQL version of CockroachDB, and Yugabyte[5] which is somewhat like Neon but with both distributed compute and data layers also using real Postgres components - however neither has a serverless offering.

1. https://planetscale.com/ 2. https://neon.tech/ 3. https://www.cockroachlabs.com/ 4. https://www.pingcap.com/tidb/ 5. https://www.yugabyte.com/


Thank you. This is helpful.


I'm glad to have an answer to the question I asked you 7 months ago: https://news.ycombinator.com/item?id=32472443 (What's Neon's point of view about transient state in nodes? Is there a world where serverless client connections are stateless, or is the set up overhead not expected to be worth the cost?).

This is a good analysis of network time and optimization. I'd love to see a followup exploring the impact on the server side; Considering how Postgres was historically not very efficient at setting up new connections but has been steadily getting better with each major release. It would also be great to see what portion of the request is taken up by the setup versus the query itself, to have some idea of cost.


Is the websocket connection opened directly from the client to the Postgres server? Or is the other end of the websocket tunnel terminated some sort of reverse proxy?

In that case, do you manage the possibility of multiple clients opening a websocket connection to the same to “the same” server (which may actually be a different instance of the same proxy)


Any plans to introduce a way to connect to a Neon database without making it publicly accessible via the internet?


Yes. What's your use case? Security or egress costs (those are 0 if you are in the same region).


Security


We are looking at private link for this. Will this work for you?


I haven't used PrivateLink before, but from my brief look, it looks like it would work.


how would they do this? allow it to be added to a wireguard VPN or something?


Or within AWS world you can do VPC peering.


How about private link?


Why do web services not discuss the metrics that were common for cloud servers? Like average response time and distribution, throughput, error rates, etc... and what can you tell us about neon?


Due to latency, Neon is only for AWS/GCP/Azure datacenters or can we use it from smaller providers, such as DO/Linode/etc?


You can use in anywhere. We will keep working to drive GLOBAL latency down. As in connect to Neon from anywhere.


Is your geo-distributed read replicas going to lean on WAL replication?


Yes, we are working on this at the moment. Since we control storage we will be implementing this at the storage level.

From there we are thinking to expose <project_id>.read.neon.tech and <project_id>.write.neon.tech and for read queries route traffic to the local replica.

It's not set in stone yet, but that's the gist of it.


Excellent. That work’s wonderfully for us too ;-)


I have a non-tech business approaching limited release. I'm very interested in two things:

- Data safety and privacy. As I'm UK based - can my data be restricted to the UK only (no magic round the world trips inside your backend)? do you comply with the GDPR?

- Can I set up postgres (WAL) replication to a non-neon database?


With a persistent OS-level network tunnel (ex. wireguard) that exists when the app starts, you wouldn't need HTTP, WebSockets, TLS, etc. The app could open a single TCP connection to a sql connection pool and send its request, and it would be tunneled securely from point to point, with no additional round trips or protocols.

The idealist in me would rather this kind of thing become a standardized extension of address + transport protocols. But every person I talk to would rather stack 12 protocols on top of each other than work on improving the existing status quo. It feels like a truism of human societies; everyone wants improvement, but nobody wants fundamental reform.


It's worth noting that Wireguard doesn't use any persistent connection (although you can configure clients to send Keepalive packets). It sends UDP packets only as long as you're opening connections to remote endpoints over its network interface, and lazily sends handshakes only when it determines it hasn't sent one in a while. In the normal setup, that network interface is "persistent," but only in the sense it has an entry in the routing table, etc.; by default there is no daemon sending any packets over it (unlike say, OpenVPN for example), because the nature of UDP means there is no connection to maintain.


This article is mainly focusing on serverless architectures where there can be no persistent connection for pooling.


They most definitely run some network hardware/software in front though - you could definitely spin up workers behind a vpn? They are assigned a network interface, after all?

Ed: oh, i see - the usecase is going from a serveless app to pg? I suppose any vm-like serveless node could be made to speak witeguard, too...


I didn't know that neon supports using S3 as cold storage. That's seriously amazing. How does that work with indexes and queries? It's there a single index which will always grow and stored on disk or is the index split?


The storage works at the page level. Pageservers provide low latency reads and safekeepers low latency writes. Then we move cold data to S3. Think about safekeepers + S3 being the system of record for most recent + historical data. And pageservers is a scratch space for updating data and service reads. Pages are also on compute nodes.

The problem that S3 doesn't love small files. So we organize pages in LSM trees on the pageservers and offload layers to S3.


> The problem that S3 doesn't love small files.

I hadn't heard it before and am not finding references googling.

I'm curious what problems you have with small files on S3; or if someone wants to feel free to point me to a link to discussion of this apparently known fact!


Small as in 8kb page. A sizable Postgres database would have millions of them.

Lots of stuff on the web. Quick googling found: https://www.upsolver.com/blog/small-file-problem-s3


Do you guys do compaction on the levels that live in S3?


Compaction is performed only by pageserver and only for L0 layers. But L0 layers can be offloaded to S3 and loaded back. So we actually perform compaction of S3 layers (but we download them to the pageserver).


So how do you do development with this?

Is the recommendation to have a per-dev branch off the main DB?


Main idea was one branch per pull request to better test migrations on a fresh data. But one branch per dev or branch per dev per pr will also work.


Nice - I mean it seems awesome to me especially the copy on write billing logic for this sort of dev style


neon.tech compute units looks costly as compared to mongodb atlas and cloud postgres


To spare someone a click, yes there is still a postgres server.

Can't stop being mad at the misuse of "serverless". Even worse than using crypto for cryptocurrencies. Words have meaning...


It is a terrible term but in this case it's referring to the client application, not the database instance.


At least there are cryptographic functions backing cryptocurrency. It's not _as_ inaccurate as "serverless".


And what's your definition of serverless?


From what I can tell, you still need to think of it in terms of hardware, as the prices are based on instance sizes.

A true serverless solution, is for me something that takes care of this automatically by autoscaling / sharding. I should just have to point my database connection to an URL and it would not require any thoughts on my side. If I suddenly get a peek during Black Friday it should just handle it transparently.


Something from peer to peer world for example (or postgres embedded in the same process perhaps)


The title made me think there's, uh, no server. So Postgres running locally on a client. I'm not being obtuse or something, it's a reasonable guess.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: