While it's great that AWS has indeed contributed fixes to upstream Elasticsearch, they link to 9 PRs that are generally on the trivial end of the scale. (Though I don't doubt that the PR that adds a missing synchronized keyword might have been gnarly and time-consuming to debug, and I know that diff size does not necessarily correlate with importance.)
For a project AWS was making hundreds of millions in revenue on four years ago (as per an ex-AWS employee), patting yourself on the back for such a trivial amount of contributions is a bit disingenuous. They might have contributed more, but if there was something significant, they probably would have mentioned it.
Disclosure: I work at Amazon on cloud infrastructure, but not on the codebase in question. I helped with parts of the blog post to try to explain some of the nuance about how things are set up with Elasticsearch as an "upstream".
The 9 PRs were only meant to demonstrate the "upstream first" practice, and aren't exhaustive. The list also doesn't cover the additional work in the Apache Lucene project that benefits Elasticsearch as well, which is where larger code investments are being made (since that's the right place for them to live, for much of what's being built).
A lot can be said about AWS and open source, but it is clear they have created some "secret sauce" on the networking, storage and virtualization side of things, which is their core business. Unlike Elastic, they never promised open source and never used it as marketing. So it is completely fair for them to keep those things closed, as they provide a big part of the competitive advantage of their cloud.
Those components underpin things like Aurora (which does tricks with storage and replication that MySQL can't) and this warm/hot storage. So there is probably no practical way to open source those Elasticsearch changes without opening up their storage system as well, and even then it wouldn't run outside of AWS.
There is a slideshare about their networking approach for VPC somewhere; it's quite clever. But I can't find anything for EBS and what features it might have that make Aurora replicate the way it does.
Edit: I found this video: https://www.youtube.com/watch?v=St3SE4LWhKo. It uses the slides I mean, but it looks like there are a few versions from different events. Google for "another day, another billion packets".
I remember what it looks like and tried to Google it, but can't find it. The gist of it was that they didn't create any tunnels or VPN-like infrastructure to build VPC, because it wouldn't scale to large numbers of nodes, especially because your servers can be on completely different hosts all over their internal network.
They did not attempt to contribute notable new features like "ultrawarm" upstream, nor open source them at all: https://aws.amazon.com/about-aws/whats-new/2020/05/aws-annou...