While it's great that AWS has indeed contributed fixes to upstream Elasticsearch...

_msw_ · on Jan 22, 2021

Disclosure: I work at Amazon on cloud infrastructure, but not on the codebase in question. I helped with parts of the blog post to try to explain some of the nuance about how things are set up with Elasticsearch as an "upstream".

The 9 PRs were only meant to demonstrate the "upstream first" practice, and aren't an exhaustive list. They also don't cover the additional work in the Apache Lucene project that benefits Elasticsearch as well, which is where larger code investments are being made (since that's the right place for much of what's being built to live).

t0mas88 · on Jan 21, 2021

A lot can be said about AWS and open source, but it is clear they have created some "secret sauce" on the networking, storage and virtualization side of things, which is their core business. Unlike Elastic, they never promised open source and never used it as marketing. So it is completely fair for them to keep those things closed, as they provide a big part of the competitive advantage of their cloud.

Those components underpin things like Aurora (which does tricks with storage and replication that MySQL can't) and this warm/hot storage. So there is probably no practical way to open source those Elasticsearch changes without opening up their storage system as well, and even then it wouldn't run outside of AWS.

catmanjan · on Jan 21, 2021

The way I see it is the secret sauce is just black boxing it so you don't have to/can't worry about those things...

t0mas88 · on Jan 22, 2021

There is a SlideShare about their networking approach for VPC somewhere; it's quite clever. But I can't find anything for EBS and what features it might have that make Aurora replicate the way it does.

robhu · on Jan 22, 2021

I'd be interested in seeing that slideshare if you have it to hand?

t0mas88 · on Jan 22, 2021

Edit: I found this video https://www.youtube.com/watch?v=St3SE4LWhKo which uses the slides I mean, but it looks like there are a few versions from different events. Google for "another day, another billion packets".

I remember what it looks like and tried to Google it, but can't find it. The gist of it was that they didn't create any tunnels or VPN-like infrastructure to create VPC, because it wouldn't scale to large numbers of nodes, especially because your servers can be on completely different hosts all over their internal network.