This seems like a VERY shallow analysis of the benefits.
Forking an industrial-grade tool means the entire lifespan of the entire product becomes your responsibility to your client. Tracking the major upgrade changes might be a pain in the arse but they're nothing compared to tracking every security and data-loss fix that bubbles around the Postgres community.
It's not just developer time that's the cost here. They had to compile the whole Postgres+Citus database for every platform they support, test it, and distribute packages, all in a timely manner. Think of all the CPU cycles and bandwidth they're saving by only having to compile an extension against public headers.
Functioning as an extension means Postgres and its distributors (eg Ubuntu) are the people responsible for keeping Postgres alive and secure. Citus only have to support their thing.
Why aren't they talking about how much this move is saving them in day-to-day? There's no shame in being efficient.
Being an extension also makes you entirely dependent on somebody else's platform, and makes it possible that your work will simply be subsumed by the platform if they think it's important enough. It's a very weak business position to be in, and you need incredibly forward-looking planning and brand buy-in to make sure you succeed like this.
You're already dependent. Upstream can turn around tomorrow and provide everything your fork/extension does for free. They can alter their entire codebase to cause you weeks of work to keep up. It can be a hard slog being downstream, no doubt about it. That's why downstreams tend to "get involved" upstream. Sponsorship, sit on technical advisory boards, etc.
But what you're saying —which wasn't immediately obvious, and correct me if I'm wrong— is your users are using your database product, not Postgres, so you can hold them back as long as you like when they're using a forked product. They won't be carried away by an automatic update and it's much harder for them to jump ship.
And while there is some truth to that, it comes with a karmic cost. People picked you because you were based on their favourite, industry-tested database. If you slip behind in features, or (more importantly) can't backport security fixes instantly, you're dead.
I think it was more that forks just get ignored. Customers want to run PostgreSQL, not something almost but not quite like PostgreSQL maintained by a company that may or may not employ PostgreSQL core contributors. And we certainly don't want to talk to your sales team about why it is a better fit for our use case.
The stuff Citus has been landing in PostgreSQL is fantastic.
They call out a number of forks that do quite nicely. I work for Pivotal, which sponsors Greenplum. Companies pay handsomely for the capabilities it brings to the table.
But they are right that rebasing is a nightmare. My understanding (possibly wrong) is that the broad selection of APIs that make an extension-only approach possible did not appear in PostgreSQL until more recent versions -- anyone who forked earlier (such as Greenplum) has to first catch up and then migrate.
I do know that the Greenplum team has decided to keep rebasing until they are working against mainline. It is, as you might imagine, a slow process: rebasing millions of lines of code a release at a time is not the easiest task on earth. But maintaining a fork will, in the long run, be harder.
Even for smaller changes, it is well known that you had better contribute back to the Free Software project, because having your patch included and maintained upstream is a lot less trouble than maintaining your fork and updating your patches every few months.
One exception might be one-off changes, but we all know that nothing is more permanent than the temporary.
The only real exception is when your patch is superseded by another patch (or a better solution). Then you maintain your private patch only until the next version is finalized.
It took a lot of work to not fork, and only recently became practical. Features like logical replication, DDL triggers, foreign data wrappers are all useful for this sort of thing and are all new. Companies like Citus and 2nd Quadrant first needed to get the infrastructure in place, so kudos to them and the PostgreSQL core team.
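To make that concrete, here's a rough sketch of the kind of hooks an extension can now build on (the table and function names here are made up, but the features are real):

    -- logical replication (PostgreSQL 10+): publish a table's changes
    CREATE PUBLICATION events_pub FOR TABLE events;

    -- DDL (event) trigger: react whenever the schema changes
    CREATE FUNCTION log_ddl() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
      RAISE NOTICE 'DDL executed: %', tg_tag;
    END;
    $$;
    CREATE EVENT TRIGGER track_ddl
      ON ddl_command_end
      EXECUTE PROCEDURE log_ddl();

    -- foreign data wrapper: query another Postgres server in place
    CREATE EXTENSION postgres_fdw;
    CREATE SERVER shard_1 FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'shard1.internal', dbname 'app');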
No. PostgreSQL is a single-node database. It supports replication and failover to other nodes, and foreign-data-wrapper extensions let it query other data sources, but it has no native support for working as a distributed database across several nodes.
Citus is an extension that takes several database nodes and makes them appear as a single logical database server (at the table level, by automatically sharding tables on a chosen column).
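Roughly like this (the table and column are hypothetical, but create_distributed_table is the actual Citus entry point):

    -- load the extension into a plain Postgres database
    CREATE EXTENSION citus;

    -- an ordinary table...
    CREATE TABLE events (
      device_id   bigint,
      recorded_at timestamptz,
      payload     jsonb
    );

    -- ...becomes a sharded table spread across the worker nodes,
    -- partitioned on device_id; clients still see one logical database
    SELECT create_distributed_table('events', 'device_id');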
Since we are talking about postgres, please let me go a little OT
Is there any interest in an "RDS as a service"?
So basically setting up and running a completely fault-aware Postgres cluster on any infrastructure, either public or private?
Oh dear sweet lords in the many heavens yes. I have literally turned down devops transition and general system administration work before because they refused to use RDS or equivalent service like Compose.io due to internal corporate policy. Life is too short for me to ever babysit another database server through the painful process of a carefully orchestrated rolling capacity upgrade by way of deliberate use of replicas and failover. (Apparently this has gotten better in newer Postgres versions, but it's not been my job to keep on top of the implications of these things for several years now)
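For the curious, part of what's improved, as far as I can tell: replica lag is visible from plain SQL, and as of PostgreSQL 12 you can even promote a standby without touching pg_ctl. A minimal sketch, assuming a standard streaming-replication pair:

    -- on the primary: watch how far the replica lags before cutting over
    SELECT client_addr, state, replay_lag
    FROM pg_stat_replication;

    -- on the replica: promote it to primary (PostgreSQL 12+; older
    -- versions needed pg_ctl promote or a trigger file)
    SELECT pg_promote(wait => true);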
What seems to be missing in the market is the ability to pay only for the setup and maintenance operations, while using my own existing cloud account and resources instead of running multi-tenant or in some other company's cloud account.
Basically as if I hired a contractor to install, monitor and upgrade, but automated. Existing services charge too much since they resell VMs and storage, while also being less flexible with access and performance.
There's also the rise of Kubernetes (with operators, helm charts and persistent storage) that takes away much of the complexity. By version 2.0, it should be possible to easily turn any legacy single-node system into a fault-tolerant service.
We would be interested, along with other datastores like ElasticSearch, Redis, etc.
AWS, Azure and Google all have various resource organization systems now so that the environment can be isolated but still within our overall account. We run on Google and it would be nice to have a separate project for managed databases while taking advantage of our existing billing arrangements and private network.
MySQL and MemSQL are very different; are you looking for a data warehouse specifically?
They are closed-source and require enterprise licensing based on a RAM quota, so it's not simple to do automated cloud provisioning. They do have their own MemSQL cloud offering, so you might inquire about that. Also, MemSQL Ops is probably the easiest and most reliable operations software for any database; it takes just a few clicks to install and upgrade your cluster.