Microsoft acquires Citus Data (YC S11) (microsoft.com)
707 points by whatok on Jan 24, 2019 | 187 comments



In the latest world of Postgres:

- we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never make it back upstream (who knows if it's just hardware or software or what behind the scenes here)

- we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood

- lastly, with this news, looks like Microsoft is now doubling down on the same strategy to build out infrastructure and _possibly_ closed source "forked" wins on top of the beautiful open source world that is Postgres

Please, please, please let's be sure to upstream! I love the cloud but when I go to "snapshot" and "restore" my PG DB I want a little transparency into how y'all are doing this. Same with DocumentDB; I'd love an article on how they are using JSONB indices at this supposed scale! Not trying to throw shade; just raising my eyebrows a little.


Craig here from Citus. We're actually a bit different than past forks. Many years ago Citus itself was a fork, but about 3 years ago we became a pure extension[1]. This means we hook into lower level extension APIs[2] that exist within Postgres and are able to stay current with the latest Postgres versions.

[1]. https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes...

[2]. https://www.citusdata.com/blog/2017/10/25/what-it-means-to-b...


Congrats on the acquisition. I love that the complete extension is open source and will stay available: "And we will continue to actively participate in the Postgres community, working on the Citus open source extension as well as the other open source Postgres extensions you love.".

As we continue to grow GitLab, Citus is the leading option to scale our database out. I'm glad that this option will still be there tomorrow.


As a very happy Citus customer, the extension being open source is very important. And at the same time, I hope I never have to manage my own clusters again and who better to manage it than the team that built it.


Holy wow! Thanks for the response!

Yep, I love the fact that y'all went the extension route much like https://www.timescale.com/ and others.


I think Citus was the first PG fork to "unfork"...

(yes yes, I'm biased, I worked my ass off making that happen)


user here: can confirm.


If the creators of Postgres wanted all improvements to be upstreamed, they wouldn’t have released under a permissive license. The ability to use Postgres commercially without exposing your entire codebase to copyleft risk is one of the reasons it’s used commercially in the first place.


This is a big assumption. There are many reasons to release something under a permissive license – not everyone who releases BSD-like code is actively choosing to deprioritize upstreaming. Rather, they are choosing a less restrictive license, which has advantages beyond not being copyleft.

Moreover, using copyleft software doesn't automatically force you to release code. There are specific interactions that trigger the sharing clause in, for example, the GPL, such as distribution, linking, and so on. There remain many, many uses that allow commercialization without running afoul of the copyleft nature of the GPL.

I am commenting because I have seen this sentiment repeated ad nauseam on here and, maybe that's not what you meant, but I felt the need to clarify. Moreover, if the code is not AGPL, most online uses do not run afoul of it, because the code products (say, executables) are not themselves being distributed. The AGPL was formulated to close this loophole, but GPL code is free of it.


And this is a benefit to prevent lock-in. Amazon’s OLAP database, Redshift, is protocol compliant with Postgres. Even if you won’t get the performance benefits of Redshift if you move to a standard Postgres, at least you don’t have to change your code.

Now you can move to Azure without having to change your code.


That doesn’t really prevent lock-in. You may not be locked in now, but that compatibility can end whenever Amazon wants it to end.


If that compatibility ends, it might as well be a new product. Every client that connects to Redshift uses a standard Postgres driver.


100% agree. I'm just wary of all these "mini optimizations" that all these cloud providers are about to start doing differently.


They don’t have a choice. Their infrastructure is different. For instance, Aurora integrates with IAM for authentication and has extensions to load and save to S3. Aurora writes to six different disks across three availability zones and read replicas are synchronous because they read from one of the disks that Aurora is already writing to for high availability.

You can’t get those types of features in a vendor neutral way.


The IAM stuff requires using a few hooks, at most adding minor modifications that could be upstreamed. Storage is obviously harder, but I and others are working on making that pluggable in core. Amazon's contributions to core, apart from a few smaller things like command-line tools: zero.

Sorry, not buying it.


While they're not offering commits, I was under the impression that Amazon had contributed some funding. But I just went off searching for that, and can't find any evidence of it either.

Anybody know anything about them contributing money rather than code?


They've provided AWS credits, yes.


https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

or

"It couldn't be that hard. I could make a Twitter clone in a week by hiring some people from UpWork"

Did I mention point in time recovery or the architecture behind Serverless Aurora?


Yea, right... I don't know anything about how hard this stuff is, I'm just a postgres developer & committer, currently working on making postgres' storage pluggable.


When I publish code using a permissive license I want people to contribute back under the same license. But I don't want to force them to.


The good news is that we'll have another reliable, growing, potentially profitable, PostgreSQL company up and running in no time.


Amazon Aurora doesn't have much to do with Postgres and is a custom storage subsystem used by many different database engines. Aurora Postgres is actually using Postgres code on top to handle queries, and eventually PG itself will get pluggable storage engines.

It's similar with Redshift although it's a much older codebase from the v8 branch with more customizations. The changes are very specific to their infrastructure and wouldn't help anyone else since it's not designed as an on-prem deployable product.

There's also no confirmation that DocumentDB runs on Postgres, and it's most likely a custom interface layer they wrote themselves. If you just want MongoDB on Postgres, then there are already open source projects that do it.


Redshift isn't even developed by Amazon — it's a commercial product called ParAccel, which they license (and modify, presumably).

Another commercial MPP database based on Postgres 8.x, GreenplumDB, was open-sourced a few years back. The changes are so extensive that there's little hope of catching up with the current Postgres codebase. Given the focus on OLAP and analytics over OLTP, there might not even be a strong motivation to catch up, either.


Worthwhile to note that Greenplum is being moved forward. While it initially seemed, from the outside, fairly slow-going, it seems that later versions were done more quickly. Apparently largely because of fixing some technical debt. They're catching up to 9.4 in their development branch, I believe. For years they were on 8.2...


It is an explicit goal of the Greenplum team to merge up to the PostgreSQL mainline. It is a heroic effort to apply tens of thousands of patches and consolidate them with heavily forked and modified subsystems, all while maintaining reliability and performance.

The biggest hurdle was that after 8.2, the on-disk storage format changed. The traditional way you upgrade PostgreSQL is to dump the data and re-import it.

This is basically a non-starter with MPP, for the simple reason that there is just too much data. Given the available engineering bandwidth, Greenplum for a long time didn't try to cross that bridge. When the decision was made to merge up to mainline, coming up with a safe format migration was the major obstacle.

Disclosure: I work for Pivotal, the main contributor to Greenplum, though in a different group.
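To put rough numbers on "too much data," here's a back-of-envelope sketch of a one-pass dump at a sustained throughput. The figures (100 TB, 500 MB/s) are illustrative assumptions, not Greenplum specifics:

```python
# Back-of-envelope: hours to stream a warehouse's data once at an
# assumed sustained throughput. All numbers are illustrative only.

def migration_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours to move data_tb terabytes once at throughput_mb_s MB/s."""
    total_mb = data_tb * 1024 * 1024  # TB -> MB
    return total_mb / throughput_mb_s / 3600

# A hypothetical 100 TB cluster at 500 MB/s sustained: one pass takes
# roughly 58 hours, and dump + re-import needs two passes (before even
# counting index rebuilds).
one_pass = migration_hours(100, 500)
print(round(one_pass, 1))      # one pass
print(round(2 * one_pass, 1))  # dump + re-import
```

Even under generous assumptions, the maintenance window runs to days, which is why an in-place format migration was the obstacle worth solving.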


Very interesting, thanks. I can't begin to imagine how you'd catch up a decade-old codebase patch by patch. During Greenplum's development, did the team try to avoid modifying core parts of Postgres so that keeping the codebase in sync would be easier? For example, I imagine that you wouldn't need to touch most of the storage engine (page/row management, indices etc.), but you'd have to modify the planner quite a bit.


I think that basically the disk format change was too far a bridge to cross. You can't tell customers that they should quadruple their cluster so that they can dump their data and reimport it to perform the upgrade.

The planner is basically entirely different. It was later extracted into a standalone module to build HAWQ[0]. Since then there has been work to build a new shared query planner called GPORCA[1].

[0] https://hawq.apache.org/

[1] http://engineering.pivotal.io/post/gporca-open-source/


The original codebase came from ParAccel which itself was later acquired by Actian, but Redshift is definitely owned and developed by Amazon.

But yes, the changes are not viable to upstream without so many modifications that fundamentally change the database. Pluggable storage in mainline would be a good first step.


Looks like Amazon actually bought the source code and forked it. This Quora thread has some replies by people from Actian: https://www.quora.com/Amazon-redshift-uses-Actians-ParaAccel....


I'm the guy who wrote the reply that pops up first when you read that Quora thread.

Here's the short story (and I know all of this because the guy who invented the core engine for ParAccel's MPP columnar tech that is the foundation for Redshift is one of our early advisors).

- ParAccel developed the tech for a columnar storage database. I believe it was called "matrix"

- Amazon AWS bought the source code from ParAccel, limited for use as a cloud service, i.e. they couldn't create another on-premise version that would compete with ParAccel

- ParAccel then sold to Actian, and a few years ago Actian shelved the product as clearly the on-premise world had lost to cloud warehouses.

The reason AWS bought the source code was time-to-market. It would have taken too long to build a product from scratch, and customers were asking for a cloud warehouse. Back then, ParAccel had by far the best and fastest MPP / columnar tech, plus it was very attractive since it was based on Postgres.

So Actian and Amazon AWS essentially had the same tech, just different distribution models. One is on-premise (Actian), the other one a managed cloud service (AWS). We all know who won.

There's a very interesting paper by the Amazon RDS team (where Redshift rolls up). It's not only about "faster, better, cheaper" - it really is about simplicity, and that's what Redshift delivered on.

https://event.cwi.nl/lsde/papers/p1917-gupta.pdf

Spin up a cluster in less than 5 minutes and get results within 15 min. Keep in mind, this was all in late 2012, so what appears "normal" today was pure magic back then.

But ever since the "fork", i.e. when AWS purchased a snapshot in time of the code base, the products have obviously diverged. There's some 8 years of development now in Amazon Redshift.


In 2012 neither a column store nor spinning up a cluster was magic or state of the art.

Redshift delivered on sales and marketing.

Amazon made their fortune on the backs of open source contributors many times, and this is just another one of those times.



Yes, we already covered that. :-)


Redshift isn’t just a custom storage tier atop an older version of Postgres. It has an entirely different execution engine that compiles commands to an executable and farms them out to multiple data nodes.


Yes, I was trying to keep it simple but that detail only further supports the fact that these changes have no value to upstream.


That's a pretty strong claim. There's bound to be one Apache project (even a future one) that could utilize that code/knowhow.


What does Postgres have to do with Apache? The basics for building OLAP systems are already well known and Apache has several projects (Druid, Calcite, Arrow, Parquet, ORC, Drill) that are related to it.

It's one thing to take a database and fork it towards a specific narrow focus and runtime, it's entirely different to try and put those changes back and make the original database more capable in a general environment.


I replied to a few others further down the thread that had similar thoughts as these.


CitusData made tons of improvements to upstream postgresql, though. Can’t say that about Amazon.


OP is referring to the habit of cloud providers to invest in open source platforms to build cloud services but not contribute back to the community.


Despite what people say, Stallman was in a sense a visionary with the GPL; we're seeing that today more than ever.


Stallman is totally ok with the "GPL Loophole" that allows service providers to not give back their changes since they aren't re-distributing the software. If postgres and all citusdata stuff was GPL, this wouldn't change really anything.

Now the Affero GPL prevents this, but Stallman has always been crystal clear he sees the "service provider loophole" as an ok thing and not evil.


RMS is totally NOT okay with this, and has written extensively on the issue; he sees SaaS as similar to proprietary software[0]

[0]https://www.gnu.org/philosophy/who-does-that-server-really-s...


Ah then perhaps I misunderstood this interview[0].

""" Q: All right. Now, I've heard described what is called an Application Service Provider - an "ASP loophole"...

Richard Stallman: Well, I think that term is misleading. I don't think that there is a loophole in GPL version 2 concerning running modified versions on a server. However, there are people who would like to release programs that are free and that require server operators to make their modifications available. So that's what the Affero GPL is designed to do. And, so we're arranging for compatibility between GPL version 3 and the Affero GPL. So we're going to do the job that those developers want, but I don't think it's right to talk about it in terms of a loophole.

Q: Very well.

[7:50]

Richard Stallman: The main job of the GPL is to make sure that every user has freedom, and there's no loophole in that relating to ASPs in GPL version 2. """

[0] http://www.groklaw.net/articlebasic.php?story=20070403114157...


The article keepper posted was written in 2010, while the interview you linked to happened in 2007. Like any of us, RMS's beliefs and opinions evolve over time.


Love or hate Richard Stallman, he is unbelievably resolute in his points of view. They've rarely changed, even though GNU effectively "lost" and open source is generally seen as more business friendly. You've got to give the guy credit where it is due: he's been preaching almost exactly the same thing since before I had ever used a computer.


I think you may be over analyzing this. I think his point is simply that the GPL has no loophole in the sense of something intentionally left for SaaS providers to work around; it was simply designed at a time, and primarily for desktop software, where this was not a common concern, but it was found to be a problem, hence the Affero GPL. He specifically says v3 is more compatible with the Affero GPL.


If these platforms didn’t want cloud providers using their products they shouldn’t have released their products as open source. They can’t have their cake and eat it too.


It's not about that, though. It's amazing that Postgres is taking off like this!

I just want to be cautious that the more we use services like Aurora, the more we're relying on our cloud providers to maintain stability with the core Postgres API/internals while they do some fanciness under the hood to optimize their hardware (if that makes sense).


But at least they open sourced their fork, designed for data warehousing, before this happened:

https://www.citusdata.com/product/community


Kudos to Azure for opening so much of what they do. Lots of Kubernetes work, including AKS-engine, which runs their k8s implementation. Machine learning toolkit. Media services (face ID etc.) as a container. The whole Azure shebang runs on Service Fabric, which they've also open sourced.

It's a differentiator for some of their workloads: you don't have to hand your business over to a black box.


Aurora databases and DocumentDB share the same underlying reliable single-writer, many-reader block device for storage. That is all the magic. Not sure where you got the idea that DocumentDB has Postgres underneath it.


See this thread: https://news.ycombinator.com/item?id=18869755

The HN community did a little bit of reverse engineering.


I think they are wrong and Amazon is just sharing code with their Postgres layer.


That was more guessing than reverse engineering, no?


I get what you're saying, but BSD licenses are specifically designed to facilitate things not being sent upstream. I don't understand why people moan about companies complying with the license agreement.


Your argument is legal and my argument is moral =P


So, castrating independent economic activity and forcing people to be subservient based on other activities is 'moral'?

this argument cuts both ways.


your moral assertion isn’t an argument, because it contradicts the license chosen by the relevant people


This is what happens in a world devoid of the GPL, or where a large majority doesn't sponsor the work of upstream.


MongoDB was already under the AGPL; Amazon just replicated the API on top of their own storage engine (or an existing permissive-licensed storage engine? Who knows?).

If we're at the point where Amazon can just re-implement whatever project they want, more or less from scratch, I'm not sure there's any license that can save us. :(


Save us from... companies writing software? Amazon's new project is a completely different database that happens to share an interface. At what point do we acknowledge that it's their own work?


That's not the question.

The question is of game theory. MongoDB Inc. invested a lot into developing MongoDB; they figured out the right semantics for a lot of things, trade-offs, UX/DX (user and dev experience), and so on. (Recently Mongo4 added transactions, which is a very very very handy thing in any DB.) But MongoDB calculated that they would recoup their investment because they are in the best position to provide support and even to operate Mongo, all while keeping Mongo sufficiently open source (you can run it yourself on as big a cluster as you please, you can modify it for yourself and don't have to tell anyone - unless you're selling it as a service, pretty standard AGPL).

Now AWS took the API and some of that knowhow, and invested into creating something that's not open source at all. You can't learn anything from it, you are not vendor locked in, because the API is standardized, but other than that it takes away a revenue stream from MongoDB Inc. (Sure, competition is good. DocumentDB-Mongo is probably cheaper than MongoDB Inc.'s Atlas.)

But the question is, will this result in less/slower/lower-quality development of MongoDB itself?

Usually big MongoDB clusters at enterprise companies are not likely to upgrade and evolve, they usually get replaced wholesale, but they would have provided the revenue for MongoDB Inc to continue R&D, and to allow them to provide that next gen replacement. Now ... it'll be likely AWS something-something. Which will be probably closed source (like DocumentDB) and at best it'll have an open API (like DocDB Mongo).

Is it fair? Is it Good for the People? Who knows, these are hard questions, but a lot of people feel that AWS doing this somehow robs the greater open source community, and it cements APIs, concentrates even more economic power, and so on.


Well said. Unfortunately even in software, it seems that might makes right.


Save independent software companies from having their lunch eaten by the behemoths after doing the legwork of proving product/market fit - no different from any industry dominated by a few large players. Amazon is not in the wrong for building a competing product, but it's good for the market if the scrappy underdogs have a few edges in their favor.


Well, reimplementing an API is essentially the Oracle/Google Java case, right?


Amazon has explained in their reinvent videos that Aurora is the storage layer of Postgres rewritten to be tightly coupled to their AWS infrastructure. So it is just regular Postgres (they upgrade to latest on a slightly slower cadence). And there’s no benefit to getting the Aurora layer upstream, no one else could use it anyway.

Citus is an extension, not a fork.

So neither of these projects are doing Postgres a dis-service. Both are actually pretty heavily aligned with the continued success and maintenance of mainline open source Postgres.


> Amazon has explained in their reinvent videos that Aurora is the storage layer of Postgres rewritten to be tightly coupled to their AWS infrastructure. So it is just regular Postgres (they upgrade to latest on a slightly slower cadence). And there’s no benefit to getting the Aurora layer upstream, no one else could use it anyway.

I don't think this is an accurate analysis. For one, they had to make a lot of independent improvements to not have performance regress horribly after their changes, and a lot of those could be upstreamed. Similarly, they could help with the effort to make table storage pluggable, but they've not, instead opting to just patch out things.

> Citus is an extension, not a fork.

Used to be a fork though.

> Both are actually pretty heavily aligned with the continued success and maintenance of mainline open source Postgres.

How is Amazon meaningfully involved in the maintenance of open source postgres?


This is the future and it's not just big companies doing it.

Virtually all of the companies that were built on open source products in the past few years have stopped centering their focus on being the best place to run said open source program, instead holding back performance and feature improvements as proprietary rather than pushing them upstream.


> we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never make it back upstream (who knows if it's just hardware or software or what behind the scenes here)

The performance benefits of Aurora over Postgres are mostly because Amazon rewrote the storage engine to run on top of their infrastructure.


All I'm saying is that it looks like Azure and Microsoft are about to do the same.


What good would it do for AWS to send its changes upstream? No one else could use them without the rest of AWS's infrastructure.


We're at a point in time where, for f(x) = y, we're starting to stop caring about the internals of "f" as long as the determinism is equivalent, and that scares me.

OpenJDK, for example, and the API/ABI (whatever you want to call it) copyright case, and now MongoDB with DocumentDB, etc.
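As a toy sketch of that point (the key-value "API" and class names here are invented for illustration), two implementations with completely different internals can be observationally equivalent, so callers never know which one they're talking to:

```python
import random

# Two implementations of the same key-value interface: callers observe
# only put/get behavior, so either can sit behind the "API".

class DictStore:
    def __init__(self):
        self._d = {}
    def put(self, k, v):
        self._d[k] = v
    def get(self, k, default=None):
        return self._d.get(k, default)

class ListStore:
    """Different internals (association list), same observable contract."""
    def __init__(self):
        self._items = []
    def put(self, k, v):
        # Replace any existing binding for k, then append the new one.
        self._items = [(a, b) for a, b in self._items if a != k] + [(k, v)]
    def get(self, k, default=None):
        for a, b in self._items:
            if a == k:
                return b
        return default

# Equivalence check: a random workload produces identical observations.
random.seed(0)
a, b = DictStore(), ListStore()
for _ in range(200):
    k, v = random.randrange(10), random.randrange(100)
    a.put(k, v)
    b.put(k, v)
    probe = random.randrange(10)
    assert a.get(probe) == b.get(probe)
```

If all you test is f(x) = y, the internals are invisible, which is exactly the property a reimplemented API like DocumentDB's Mongo interface relies on.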


We’ve been at that point since the first PC compatibles came out in the mid 80s with clean room reverse engineered BIOS firmware.

We’ve been living with abstractions over the underlying infrastructure for over 45 years.


Do you know how your CPU works? Or all the networking hardware in the middle of you loading this page? There are thousands of layers when it comes to computing; it's impossible to be transparent about them all.

And quite frankly, that end functionality is what customers are paying for, so that they don’t have to care about all the technical details and operational overhead. It's not like open-source Postgres is being halted by this. The Citus extension itself is open-source too.


> - we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood

To clarify, Amazon DocumentDB uses the Aurora storage engine, which is the same proprietary storage engine that is used by Aurora MySQL and Aurora PostgreSQL, and gives you multi-facility durability by writing 6 copies of your data across 3 facilities, with a 4 of 6 quorum before writes are acknowledged back to the client.

So, it's a bit inaccurate to say that DocumentDB has anything to do with Postgres.
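As a sketch, the standard quorum conditions behind a 4-of-6 write quorum can be checked directly. (The read quorum of 3 is an assumption drawn from Aurora's published design, not stated in the comment above.)

```python
# Quorum arithmetic for a 6-replica, 3-facility layout:
# V = 6 total copies, write quorum Vw = 4, assumed read quorum Vr = 3.
# Standard conditions: Vr + Vw > V (every read quorum overlaps the
# latest write quorum) and 2*Vw > V (two conflicting writes can't both
# reach quorum).

V, V_WRITE, V_READ = 6, 4, 3

def quorum_ok(v: int, vw: int, vr: int) -> bool:
    return vw + vr > v and 2 * vw > v

assert quorum_ok(V, V_WRITE, V_READ)

# Losing an entire facility (2 copies) still leaves a write quorum:
assert V - 2 >= V_WRITE
# Losing a facility plus one more copy still leaves a read quorum:
assert V - 3 >= V_READ
```

This is why the 4-of-6 choice tolerates the loss of a whole facility for writes, and a facility plus one extra replica for reads.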


There’s evidence to suggest that DocumentDB is actually running Aurora Postgres under the hood. https://news.ycombinator.com/item?id=18870397


I would argue Microsoft's strategy actually makes them more wedded and committed to ensuring the vitality of open source PostgreSQL than anything AWS is doing.


The big news here: Citus Data donated 1% of their equity to non-profit PostgreSQL organizations[1] so this acquisition is a win for the community even in the darkest scenario of Citus Data disappearing into a canyon on the Microsoft campus.

Given Microsoft's change in operation over recent years there's also hope that they can continue their contributions into the future.

It's fascinating to see Microsoft leave behind the "embrace, extend, extinguish" narrative only to have Amazon adopt it, causing massive rifts and action within the database community[2][3]. I am genuinely concerned about the future of open source software in this continued scenario.

An article with what I considered an outrageous headline ("Is Amazon 'strip mining' open source?"[4]) has only rung more true over time. Amazon is one of the largest companies on earth, selling products that they receive for free but never improve[5], attacking the primary open source provider, and then shifting toward their comparable proprietary closed offerings.

Hopefully new ways to "give back", such as equity contribution, can be one of the many paths forward needed to keep open source software healthy. Given how much innovation is unlocked by this, it'd be a crime to go back to the past era.

[1]: https://www.citusdata.com/newsroom/press/citus-data-donates-...

[2]: https://www.cnbc.com/2018/11/30/aws-is-competing-with-its-cu...

[3]: https://techcrunch.com/2019/01/09/aws-gives-open-source-the-...

[4]: https://www.cbronline.com/analysis/aws-managed-kafka

[5]: From [2], "Jay Kreps, a creator of Kafka and co-founder and CEO of Confluent ... said Amazon has not contributed a single line of code to the Apache Kafka open-source software and is not reselling Confluent’s cloud tool."


Any clue what the base for that 1% is going to be? Didn’t see any mention of the total acquisition amount anywhere.



In case folks are interested here are the details from our founders on the Citus blog - https://www.citusdata.com/blog/2019/01/24/microsoft-acquires...


Well this is great news for the guys at Citus - they created something great as a Postgres add-on and a big chunk of it was open sourced.

They made a decent cloud business model out of it (no idea how successful but everyone I asked was happy with it).

I just hope Microsoft allow the tech to evolve as open source!


"I just hope Microsoft allow the tech to evolve as open source!"

Current Microsoft sure will. They're good with open source stuff.


What about future Microsoft? :)


Yes, agreed. Long may this continue.


Citus is already used by Microsoft itself internally, a recent example being the VeniceDB project to analyze Windows telemetry: https://www.youtube.com/watch?v=AeMaBwd90SI

Considering the competitive database landscape, this is a compelling offering to add to any cloud portfolio. Congrats to the Citus team.


I still can't get over the fact that Microsoft is using Postgres internally, if you had told me that 5 years ago I wouldn't have believed it. Did they go into why over MSSQL?


MSSQL currently does not have horizontal sharding capabilities like this, or easy UPSERT functionality.
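For a sense of what "easy UPSERT" means here, a minimal sketch of Postgres-style INSERT ... ON CONFLICT. It's run via Python's sqlite3 purely so the example is standalone; SQLite adopted the same clause (3.24+), and the table/function names are invented for illustration:

```python
import sqlite3

# Postgres-style upsert: insert a row, or atomically update the existing
# one on a primary-key conflict, in a single statement.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

def record_hit(name: str) -> None:
    conn.execute(
        "INSERT INTO counters (name, hits) VALUES (?, 1) "
        "ON CONFLICT (name) DO UPDATE SET hits = hits + 1",
        (name,),
    )

for page in ["home", "home", "about", "home"]:
    record_hit(page)

rows = dict(conn.execute("SELECT name, hits FROM counters"))
print(rows)  # {'home': 3, 'about': 1}
```

In SQL Server the equivalent typically requires a MERGE statement or an explicit existence check in a transaction, which is the gap the comment is pointing at.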


I've dabbled with SQL Data Warehouse (via Azure); wouldn't this be the horizontal functionality? It has some limitations, but curious how it compares to Citus.


Yes, but that's a different product, designed with columnstores and still missing the upsert. This use-case was for lots of telemetry data with high update rates and very selective queries using indexes rather than large aggregations. Scale-out OLTP was better suited for these analytics than a normal OLAP system.


Thank you, that makes sense.


The main question is: did Microsoft want an expert PgSQL team to work on Azure PostgreSQL (and maybe to create a proprietary competitor to Aurora)? Or did they acquire Citus for its product, to improve and market it further?

It feels like it was the first. If so, it means bad news for Citus product as it will most likely be ignored for a while. That will be really sad, as I don't know any actively supported automated sharding solution for PgSQL other than Citus. There is PostgresXL[1], but there isn't much focus to make it community friendly.

[1]: https://www.postgres-xl.org/overview/


I don't think anyone should expect acquihiring an expert Postgres team to work on a proprietary product to work well, because the programmers' skills are eminently transferrable.

Half the team would probably wander off to work for one of the other postgres-centered companies (and quite possibly continue to work on the open source Citus code).


Fun fact: the team that built Citus Cloud began with 3 people that came over from Heroku after building its (proprietary) Postgres cloud service.


This is more of a competitor to Redshift than Aurora.


Citus improves the performance of OLAP query loads but it's not an analytics solution first. They say so themselves -

https://www.citusdata.com/blog/2018/06/07/what-is-citus-good...

---

When we first started building Citus, we began on the OLAP side. As time has gone on Citus has evolved to have full transactional support, first when targeting a single shard, and now fully distributed across your Citus database cluster.

Today, we find most who use the Citus database do so for either:

(OLTP) Fully transactional database powering their system of record or system of engagement (often multi-tenant)

(HTAP) For providing real-time insights directly to internal or external users across large amounts of data.


Great news for Citus, Microsoft, Postgres and for people using open source relational databases. This makes so much sense. (I know this comment might read naive to some but I’m genuinely excited right now)


I'm pretty excited as well... Especially if this means improvements to Azure's PostgreSQL options. DBaaS is one of the areas where cloud providers give a LOT of value, more so as long as the interfaces you use can be used internally/locally for development.

Similarly, I really appreciate MS-SQL for Linux on Docker as it is a lot easier to setup for CI/CD and local for dev and testing and is nearly transparent going to Azure SQL or MS SQL Enterprise for hosted deployments. I'd much rather use PostgreSQL with PLv8 than MS-SQL though.


I wonder how long it will be before they shut down their own Citus Cloud hosted offering, which is hosted on AWS. Seems obvious that it will become part of Azure soon.


I doubt they'd disrupt their AWS operations right away - this certainly won't be the first time that a MSFT team/subsidiary has used AWS.

What's more worrying to me is if they try to do both - build out a Citus offering on Azure, and simultaneously try to keep high-reliability of their AWS Citus Cloud, which may be the most reliable option for some time. It's tough for any organization, no matter how much capital has been injected, to keep a laser-sharp eye on two inevitably-competing initiatives, each of which have their own performance and automation characteristics. I don't want the one person in the company who knows, say, cloud hard drive recovery patterns like the back of their hand and had previously been the EBS guru, to suddenly be pulled into the new Azure optimization project... and that's not something that capital injections can necessarily fix.

That said, this could accelerate their development timelines overall, and it guarantees stability for the product for quite a while. Overall I think this is good news! Citus is one of those things that you want to have in your back pocket when building any type of app on Postgres, and we certainly see it in our company as a long-term "escape hatch" when we're forced to make database-heavy design decisions at currently relatively-small scale. This deal keeps it alive and prospering!


It's going to be really funny if Microsoft ends up using Open Source software to compete against its proprietary service-based competitors. Sort of like how GCP runs k8s... you can use the free tool, or you can use the managed service, and the community helps build the thing. In theory, you retain competitive advantage because you have the most expertise in the product.

The Googles of the world lose out on professional services, but Microsoft could still make a bundle of money by just consulting on the tools without even managing them. You might even make higher margins by not managing the service.


Congrats Citus team. Just please keep the blog alive! Craig's post are some of my favorite Postgres reads.


So, a part of Microsoft will advocate for SQL server, and another part will develop for PostgreSQL? Isn't it weird? Why would Microsoft want this?


Because MS is more and more in the business of selling the operation of software as a service, instead of selling licenses for their customers to operate themselves.

Think of them like a wedding event rental company: they are more than happy to rent you their own brand of tables, flatware, and silverware, but if you want another brand that's fine too, as long as you buy from them.


Microsoft is hedging their bets. PostgreSQL has the potential to disrupt the traditional relational database market, so if they're going to be disrupted then better to do it themselves.

I expect they'll also try to port Citus Data functionality to the SQL Server platform.


I doubt Microsoft is as worried about "disruption" as it is about expanding their reach. The number of developers who currently run Postgres (hell, even the number of customers Citus currently has) is far greater than the number likely to completely switch from SQL Server to Postgres.


Maybe MS is just so rich that they can afford to do stuff like that. But that sounds very bad for Citus. Right now they are PostgreSQL experts. With MS, they will be salesmen trying to sell extra bloat that will also work with SQL Server. A pity to lose such engineering effort from the PostgreSQL community.


Disrupt how? Haven't they been part of the traditional relational database market for years now? What's changing?


People outside open source are noticing.

Also, PostgreSQL has improved by large margins in the last five years. The software is more akin to a pyramid than a skyscraper: the foundation took a long time, but now that it's complete there is a strong base for growth.


It's a little bit like Linux in the late 90s. Postgres is increasingly being used in financial services, for instance, because it's "good enough" -- many teams really don't need Oracle's or SQL Server's feature-set, and Postgres has enough 'interesting' features of its own.

It also lets teams own their DB infrastructure and play around with deployment patterns that'd make little sense on Oracle et al because of cost reasons.

It's not that Postgres is "killing" commercial databases, it's just that more and more people are, as you said, noticing that they don't need a commercial database for lots of use cases. And support and consultancy -- and even in-team skills -- for Postgres are often available.


Shifts in the database world happen at a glacial pace. They've been a part of that world for a long time, yes, but they're steadily becoming more and more legit and acceptable as an alternative to Oracle and SQL Server.


Postgres is a nice open-source platform, but the commercial engines are far more advanced in many areas and are not going to be disrupted anytime soon. If you can use Postgres for your needs today, then it's likely that any relational system would've worked and you didn't need a commercial system in the first place.

SQL Server platform already has one of the most advanced optimizers and distributed planning with its use in Polybase, stretch tables, SQL MPP and Azure SQL DW.


They are more advanced on many levels, though a bit behind on the dev-friendliness side (postgres is kind there), that's true.

However I manage roughly 2000 instances of commercial databases. I'd say maybe a 10th could not be hosted on postgres.

It gives postgres a huge disruption potential and the management, in all the big firms I know, is actively looking at it.


Yes, PG has many more usability features. Simple things like UPSERT are UX improvements if you don't need the advanced capabilities of MSSQL.
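For readers who haven't used it: Postgres' "UPSERT" is the `INSERT ... ON CONFLICT` clause (9.5+). A minimal sketch, using a hypothetical `page_hits` table (the table and column names are made up for illustration):

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE page_hits (
    url  text PRIMARY KEY,
    hits bigint NOT NULL DEFAULT 0
);

-- Insert a row, or bump the counter if the key already exists.
-- EXCLUDED refers to the row that was proposed for insertion.
INSERT INTO page_hits (url, hits)
VALUES ('/home', 1)
ON CONFLICT (url)
DO UPDATE SET hits = page_hits.hits + EXCLUDED.hits;
```

One statement, fully transactional, no read-then-write race to handle in application code.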

I'm sure PG can take on more of the standard RDBMS workloads today but I don't think that's really making a big dent on SQL Server as the bulk of their revenue comes from the serious enterprise scenarios.


> I don't think that's really making a big dent on SQL Server as the bulk of their revenue comes from the serious enterprise scenarios.

I'd say postgres can take 90% of the revenue. I've worked in 3 of the top 10 European banks. Most of the SQL Server instances do not even need partitioning, for example. Let alone Always On, Hekaton, and so forth.

People mostly buy peace of mind. Until they are charged millions and start questioning the stupidly expensive bills, asking "do we need that?"

Really I have all the metrics to back this up: CPU usage/Availability requirements/data size... It is literally my job to collect those.

Also, proper window function support, great PostGIS, great JSON support, and open source ecosystem support are no small things, and they are huge cost savers.


Revenue for what? Postgres is free.

If you mean the licensing and support from commercial distributions and vendors then, as you already recognize, these decisions will fall to which vendor they trust more. That will usually end up being Microsoft.


Professional PostgreSQL support is not free, and the banks would most likely buy support from some company, just like they currently buy support for SQL Server.


>>> "If you mean the licensing and support from commercial distributions and vendors then, as you already recognize, these decisions will fall to which vendor they trust more."

Enterprises are more than just banks, and vendor relationships matter. They're not fungible and are rarely based on price.


You do not trust a vendor who rips you off.


It's not the fault of the vendor if you decide to buy their product. If you didn't need the product features then you shouldn't buy it but enterprise deals are rarely about the absolute price.


Indeed. And as I said this reasoning might eat 90% of sql server's revenue.


I moved from SQL Server to Postgres 5 years ago, and I would argue the opposite, i.e. PG is way better. For example, PG supports UTF-8, CSV, and jsonb. It has way more standard SQL features and support, such as window analytical and aggregation functions (string_agg is particularly powerful), and more SQL join options: join with the USING clause is very handy, and LATERAL join is standard SQL, very powerful, and much easier to understand than SQL Server's obtuse version.

Its user-defined functions are far more powerful, even with standard SQL, let alone the umpteen other languages you can use, such as Python or JavaScript, and functions can be chained for incredible power. Many dev-friendly features make you much more productive, e.g. DROP SCHEMA and the ability to easily use powerful editors such as Sublime Text or VSCode.

With recent parallel query improvements, PG has mostly caught up on performance. Its only glaring weakness vs Oracle now is the lack of automatic incremental materialized view refresh. MS's auto materialized views had so many limitations they were almost pointless, last time I looked; PG's materialized views have no restrictions but are not that useful because they lack incremental refresh.
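To make two of the features above concrete, here is a small sketch combining string_agg with a LATERAL join. The `customers` and `orders` tables and their columns are hypothetical, purely for illustration:

```sql
-- Hypothetical schema, for illustration only:
--   customers(id, name)
--   orders(id, customer_id, item, created_at)

SELECT c.name, recent.items
FROM customers c
CROSS JOIN LATERAL (
    -- Unlike a plain subquery in FROM, a LATERAL subquery may
    -- reference columns of preceding FROM items (c.id here).
    SELECT string_agg(o.item, ', ' ORDER BY o.created_at DESC) AS items
    FROM orders o
    WHERE o.customer_id = c.id
) AS recent;
```

One row per customer, with that customer's order items aggregated into a single comma-separated string, newest first.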


Yes PG has more usability features but it is not a match on performance at all and lacks the advanced features that enterprises want. I don't know why this is so shocking to hear. These commercial engines have billions of dollars in research and engineering, they're not just standing still and aren't obsolete because PG finally got parallel query (which is only workable in v11 and still years behind the others). There's a reason why all the enterprise PG distributions add so many other features and even basics like connection pooling, because that's what it takes to compete in the enterprise space with its vast and complex requirements.

As I said, "If you can use Postgres then you didn't need a commercial system in the first place."


At the end of the day, even if you find a better alternative, database engines are a bear to switch. Our database is still SQL Server, even though we've switched our development platform 3 times, and I'd love to be on Postgres.


Well that's exactly how technology market disruption works. The disruptive product sneaks into the low end of the market without any of the established competitors really noticing. Then the disruptive product gradually moves up market and eats everyone's lunch.

15 years ago Windows Server was far more advanced than any Linux distribution. What does the server OS market look like today?


Postgres also has a number of commercial derivatives, so you get the advantage of starting small (open source) and moving to a more optimized derivative when it becomes necessary.

Aster and Greenplum perform exceptionally well for what they do. If SQL Server were better, that's exactly what companies would use. Commercial engines are definitely more advanced, but some Postgres derivatives can absolutely outperform enterprise platforms in some use cases, and vice versa.


Azure gives you Linux and Windows as options for hosting, because money is money and engineers are not free.


For a similar business reason as to why Azure offers Linux, Microsoft develops for Android and iPhone and Oracle owns MySQL.

Microsoft doesn't regard PostgreSQL as a direct threat to SQL Server and it's happy to make money where it can so long as it doesn't perceive something to be a mortal threat.


That is what I am observing too.


Offering services is far more lucrative than a single product. That's the entire cloud computing business model.

Azure is happy to take your money to run SQL Server or Postgres, just like how Amazon has been running Aurora side-by-side with Oracle and SQL Server for years now.


If you could switch away from Oracle, you would switch to MySQL, so Oracle bought MySQL to keep control of the enterprise market. If you could switch away from MS SQL, you would probably switch to PostgreSQL, so MS wants to control part of the PostgreSQL enterprise market. If you could switch away from Facebook, you would probably go to Instagram. Whatever you choose, the money ends up in the same pocket.


Same reason why Microsoft wants libraries for languages like Ruby to talk to SQL Server (at the cost of ASP.net adoption - I had the pleasure of speaking to that team at RubyConf one year) or to run SQL Server on Linux (at the cost of Windows licenses). Expanding their breadth benefits the company overall, and they know there will be some who never run SQL Server or Windows, yet 2019 Microsoft wants to have solutions for those folks.


Why did Facebook buy Instagram? Look at all the various travel sites: they are all owned by Priceline.


Same thing as Oracle owning MySQL. Keep your friends close but enemies closer.


They shifted a lot of focus to Azure.

Azure Database for PostgreSQL competes against RDS and to a lesser extent, Google's Cloud SQL for PostgreSQL.

They'll still sell MS SQL Server too. But sometimes PostgreSQL is a better fit for your stack, or your preference, and they want your money for them to provide that too.


for azure?


It is all about choice. As both AWS and Azure user, I can attest that Microsoft is doing a lot more with OSS and contributing than AWS. We are both a SQL Server and PostgreSQL shop and are excited about this move by Microsoft and Citus.


hosted postgres is already available on Azure, a la RDS.


A little off topic, but I wonder how long it will be before MS acquires Docker Inc. Seems like an even better fit for them now that they own GitHub. GitHub + Docker Hub on the developer engagement side and Docker Enterprise on the traditional enterprise side.


I'm wondering how much the OCI and CRI-O has impacted Docker's value proposition. Docker Hub seems more and more like the real product, though I guess you could argue that the container runtime was never really a product in the first place.


Private repos on Docker Hub is definitely a product, especially if you provide a seamless path for moving from source code on GitHub to images on Docker Hub to containers deployed in the cloud or on premise.

MS could certainly also do good business selling Docker's friendly Enterprise orchestration tools (including their new Kubernetes based tool) which check all the Enterprise requirements for security, policy, identity management etc.


Docker the company is a lame duck, and docker the software is being rapidly supplanted by podman and buildah. There would be no point.


I haven't used Citus but once thought about cstore_fdw. How much of this acquisition is about cstore_fdw? I'm curious because in the data warehousing space my experience has been that column store databases totally rule when it comes to speed on analytics. I know SQL Server has columnstore indexes, but those require you to create them, whereas with a genuine column store you get the performance boost by virtue of how the data is stored.


Very little, I'm guessing; cstore_fdw is not remotely competitive with mature analytics DBMS.

see here: https://tech.marksblogg.com/benchmarks.html


Great link, thanks for sharing.


SQL Server indexes can be either clustered or non-clustered, which determines whether table data is stored by index order. If you have a clustered columnstore index then the table is actually physically stored in a column-oriented format. Combined with vectorized processing, an impressive query optimizer, and in-memory tables, MSSQL is one of the fastest OLAP systems available.

Also, cstore_fdw is rather obsolete and more of an experiment. It's a rough wrapper around ORC files and is missing many features, many advancements, and an execution engine needed to match the performance and usability of a real OLAP database.
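For reference, the T-SQL for the clustered case described above looks roughly like this (the table and index names are hypothetical):

```sql
-- SQL Server: with a *clustered* columnstore index, the table itself
-- is physically stored column-oriented; a nonclustered columnstore
-- index would instead be a secondary, column-format copy of the data.
CREATE TABLE dbo.FactSales (
    SaleDate date  NOT NULL,
    StoreId  int   NOT NULL,
    Amount   money NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;
```

So the "you have to create them" point above is real, but once created, the clustered variant changes the physical storage of the whole table, not just an auxiliary structure.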


Any stats/articles on SQL server having an impressive query optimizer? I have personally found it almost entirely devoid of ability but maybe we just had awkward queries.


For data analytics I use ClickHouse instead of PostgreSQL. There is a PostgreSQL Foreign Data Wrapper (FDW) for the ClickHouse database, but I have never used it.


Does anyone know any details about the financials of the deal? Is this an acquihire or more?


There's no way this was just an acquihire: Citus represents some truly impressive computer science.


Impressive computer science does not at all correlate with commercial success.

If anything, that makes it more likely to be an acquihire.


Happy for the folks at Citus. I use Citus at work and it's amazing. Hope things stay the same after the acquisition.


My sentiments as well. Great team to work with and really like the product.


Speaking as a data professional and SQL addict, I was always impressed when I came across Citus Data posts. Good acquisition by Microsoft.


Why is an acquisition a win for the company? Seems to me like the big company is killing the small one and absorbing its soul(brand).

Shouldn't sustainability be the primary goal instead of making big bucks temporarily?


So, how much did YC make?


Maybe now they will actually add a free tier so people can sign up for this, develop their product using a free tier, and upgrade when they launch, as is the natural progression with most other cloud products. I think before there were some complexity and/or financial issues preventing this but with Microsoft's wallet it shouldn't be an issue.


There’s a community version.


That doesn't really help. What I want is a seamless service I can just spin up an SQL server on with some cheap or free plan and tiny capacity, and then have it horizontally scale based on usage without me ever having to do or change anything, potentially to the point where it's dealing with terabytes and terabytes of data, thousands of connections, a large monthly bill, etc. Google's Cloud Spanner was supposed to be this, but the minimum monthly fee is $90ish which makes it impractical for anyone who doesn't want to waste money while developing. Citus has traditionally turned up its nose on HN at users who don't already have a massive database, but you gotta start somewhere, and the product is a lot better if you can stay in the same DB ecosystem from 0 users to 1,000,000 users and beyond.

e.g. something like the free "tiny turtle" plan here: https://www.elephantsql.com/plans.html, but that can auto-scale up to citus-scale things, without user intervention, as needed.


Congratulations to the Citus Data team! I don't have anything significant to add, but I loved the free socks you gave out :)


Wonder if they will have Microsoft socks now :)


5 kinds, and with a lot of work you'll be able to figure out which kind is cotton, if they put the people in charge of .NET naming in control of the sock-buying division (which they probably should, at that).


On that page it appears that the photo is a composite, mashed together from various sources. And not a good job either.


Yeah...looks like only the first, and maybe the second, guy was really on the stairwell. All the other people appear to be photoshopped in :-)


If Microsoft invests significantly in PL/SQL support and Oracle compatibility, they can bleed Oracle big time.


Great news for another Turkish company acquisition. I wonder what the acquisition price was.


It’s a very unsettling future we’ve ended up in, where I see that Citus has been purchased and I’m pleased it was by Microsoft.


What are the odds that Citus's Enterprise is released to community version like Github's private repos in near future?


I wonder if the Citus folks will have any influence on the future of SQL Server. Maybe they’ll bring a plug-in system to it?


This is awesome, congrats. Any chance you all may change the license to something more permissive? :fingers crossed:


Congrats to folks at Citus -- they've matured quickly to reach this point in just a couple of years.


genuine question:

Why do people think this is good, when the MySQL acquisition by Oracle was bad? Both companies have an internal conflict of interest by already owning closed-source SQL servers, and both have a history of attacking open source communities.


I wouldn't say it is "good", but there are two reasons why it probably isn't so bad: Citus is a company active in Postgres, not "the Postgres company" like MySQL AB was. And their business model seems to fit what Microsoft is doing now, so the open source part is hopefully not in real danger.


That means PostgreSQL will hopefully get more resources development upstream.


Whaaaaaaaaaaaaaaaaaat!?

Is this a move to defend SQL Server or expand Azure’s PG offering?


I think SQL Server is a huge business for them already. This looks like it's for Azure and OSS, and expanding there.


Congrats to the Citus Team! Really a great group of people over there.


I hope this means tPostgres or pgtsql gets some Microsoft resources!


Next achievement is selling the operating system to Micros~1


Great news, their product is awesome and this will hopefully let more people use it.

Knowing that Citus is available as an option if you need to scale makes Postgres that much more compelling of a default choice for data store.


The photomontage is pathetically bad...


Azure Cloud Spanner time?


it is known as Azure CosmosDB


I mean, isn't CosmosDB more like Cloud Datastore/Megastore?


I think CosmosDB just is not too appealing with its layered multi-model approach, compared to ArangoDB's native way of supporting multiple data models in one DB.


crap :(



