This seems like a VERY shallow analysis of the benefits.
Forking an industrial-grade tool means the entire lifespan of the entire product becomes your responsibility to your client. Tracking the major upgrade changes might be a pain in the arse but they're nothing compared to tracking every security and data-loss fix that bubbles around the Postgres community.
It's not just developer time that's the cost here. They had to compile the whole Postgres+Citus database for every platform they support, test it, and distribute packages, all in a timely manner. Think of all the CPU cycles and bandwidth they're saving by only having to compile an extension against public headers.
Functioning as an extension means Postgres and its distributors (eg Ubuntu) are the people responsible for keeping Postgres alive and secure. Citus only have to support their thing.
Why aren't they talking about how much this move is saving them in day-to-day? There's no shame in being efficient.
Being an extension also makes you entirely dependent on somebody else's platform, and makes it possible that your work will simply be subsumed by the platform if they think it's important enough. It's a very weak business position to be in, and you need incredibly forward-looking planning and brand buy-in to make sure you succeed like this.
You're already dependent. Upstream can turn around tomorrow and provide everything your fork/extension does for free. They can alter their entire codebase to cause you weeks of work to keep up. It can be a hard slog being downstream, no doubt about it. That's why downstreams tend to "get involved" upstream. Sponsorship, sit on technical advisory boards, etc.
But what you're saying —which wasn't immediately obvious, and correct me if I'm wrong— is your users are using your database product, not Postgres, so you can hold them back as long as you like when they're using a forked product. They won't be carried away by an automatic update and it's much harder for them to jump ship.
And while there is some truth to that, it comes with a karmic cost. People picked you because you were based on their favourite, industry-tested database. If you slip behind in features, or (more importantly) can't backport security fixes instantly, you're dead.
I think it was more that forks just get ignored. Customers want to run PostgreSQL, not something almost but not quite like PostgreSQL maintained by a company that may or may not employ PostgreSQL core contributors. And we certainly don't want to talk to your sales team about why it is a better fit for our use case.
The stuff Citus has been landing in PostgreSQL is fantastic.
They call out a number of forks that do quite nicely. I work for Pivotal, which sponsors Greenplum. Companies pay handsomely for the capabilities it brings to the table.
But they are right that rebasing is a nightmare. My understanding (possibly wrong) is that the broad selection of APIs that make an extension-only approach possible did not appear in PostgreSQL until more recent versions -- anyone who forked earlier (such as Greenplum) has to first catch up and then migrate.
I do know that the Greenplum team has decided to keep rebasing until they are working against mainline. It is, as you might imagine, a slow process: rebasing millions of lines of code a release at a time is not the easiest task on earth. But maintaining a fork will, in the long run, be harder.
Even for smaller changes, it is well known that you had better contribute back to the Free Software project, because having your patch included and maintained upstream is a lot less trouble than maintaining your fork and updating your patches every few months.
One exception might be one-off changes, but we all know that nothing is more permanent than the temporary.
The only real exception is when your patch is superseded by another patch (or a better solution). Then you maintain your private patch only until the next version is finalized.
It took a lot of work to not fork, and only recently became practical. Features like logical replication, DDL triggers, foreign data wrappers are all useful for this sort of thing and are all new. Companies like Citus and 2nd Quadrant first needed to get the infrastructure in place, so kudos to them and the PostgreSQL core team.
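To make that concrete, here's a rough sketch of the kind of hooks an extension can now build on (the table and function names here are made up, but the features are real):

    -- logical replication (PostgreSQL 10+): publish a table's changes
    CREATE PUBLICATION events_pub FOR TABLE events;

    -- DDL (event) trigger: react whenever the schema changes
    CREATE FUNCTION log_ddl() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
      RAISE NOTICE 'DDL executed: %', tg_tag;
    END;
    $$;
    CREATE EVENT TRIGGER track_ddl
      ON ddl_command_end
      EXECUTE PROCEDURE log_ddl();

    -- foreign data wrapper: query another Postgres server in place
    CREATE EXTENSION postgres_fdw;
    CREATE SERVER shard_1 FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'shard1.internal', dbname 'app');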
No. PostgreSQL is a single-node database. It supports replication and failover to other nodes, and foreign-data-wrapper extensions let it query other data sources, but it has no native support for working as a distributed database across several nodes.
Citus is an extension that takes several database nodes and makes them appear as a single logical database server (at the table level, by automatically sharding tables on a chosen column).
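Roughly like this (the table and column are hypothetical, but create_distributed_table is the actual Citus entry point):

    -- load the extension into a plain Postgres database
    CREATE EXTENSION citus;

    -- an ordinary table...
    CREATE TABLE events (
      device_id   bigint,
      recorded_at timestamptz,
      payload     jsonb
    );

    -- ...becomes a sharded table spread across the worker nodes,
    -- partitioned on device_id; clients still see one logical database
    SELECT create_distributed_table('events', 'device_id');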
Since we are talking about postgres, please let me go a little OT
Is there any interest in an "RDS as a service"?
So basically setting up and running a completely fault-aware Postgres cluster on any infrastructure, either public or private?
Oh dear sweet lords in the many heavens yes. I have literally turned down devops transition and general system administration work before because they refused to use RDS or equivalent service like Compose.io due to internal corporate policy. Life is too short for me to ever babysit another database server through the painful process of a carefully orchestrated rolling capacity upgrade by way of deliberate use of replicas and failover. (Apparently this has gotten better in newer Postgres versions, but it's not been my job to keep on top of the implications of these things for several years now)
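For the curious, part of what's improved, as far as I can tell: replica lag is visible from plain SQL, and as of PostgreSQL 12 you can even promote a standby without touching pg_ctl. A minimal sketch, assuming a standard streaming-replication pair:

    -- on the primary: watch how far the replica lags before cutting over
    SELECT client_addr, state, replay_lag
    FROM pg_stat_replication;

    -- on the replica: promote it to primary (PostgreSQL 12+; older
    -- versions needed pg_ctl promote or a trigger file)
    SELECT pg_promote(wait => true);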
What seems to be missing in the market is the ability to pay only for the setup and maintenance operations, while using my own existing cloud account and resources instead of running multi-tenant or in some other company's cloud account.
Basically as if I hired a contractor to install, monitor and upgrade, but automated. Existing services charge too much since they resell VMs and storage, while also being less flexible with access and performance.
There's also the rise of Kubernetes (with operators, helm charts and persistent storage) that takes away much of the complexity. By version 2.0, it should be possible to easily turn any legacy single-node system into a fault-tolerant service.
We would be interested, along with other datastores like ElasticSearch, Redis, etc.
AWS, Azure and Google all have various resource organization systems now so that the environment can be isolated but still within our overall account. We run on Google and it would be nice to have a separate project for managed databases while taking advantage of our existing billing arrangements and private network.
MySQL and MemSQL are very different; are you looking for a data warehouse specifically?
They are closed-source and require enterprise licensing based on a RAM quota, so it's not simple to do automated cloud provisioning. They do have their own MemSQL cloud offering, so you might inquire about that. Also, MemSQL Ops is probably the easiest and most reliable operations software for any database; it takes just a few clicks to install and upgrade your cluster.