This is one of the best things I've read; I thought I was pretty good at Flexbox (and its tailwindcss building blocks), but this scratched my itch for some theoretical foundations.
I bet the engineers at Instagram were unaware of Python's performance profile when they chose it; you should let them know that they should just switch to a different language.
Meta is just a small startup though; they probably don't have the resources or the skills to switch to a better language, even after they've heard the gospel.
I don't know if you're joking or not, but that is exactly true. Meta went as far as creating their own PHP engine and then a new PHP-compatible language because they didn't have the resources to switch from PHP.
Instagram is presumably in the same position. Switching language is basically impossible once you have a certain amount of code. I'm sure they were aware of the performance issues with Python but they probably said "we'll worry about it later" when they were a small startup and now it's too late.
Well, Facebook also created their own hacked up version of PHP (called Hack) that's presumably easier to migrate PHP to.
Hack is actually surprisingly pleasant, basically about the best language they could have made starting from PHP. (I know, that's damning with faint praise. But I actually mean this unironically. It has TypeScript vibes.)
I was excited about Hack when it came out. Unfortunately PHP took just enough from it to kill it. I gave up on it once Composer stopped supporting it, after backwards compatibility with PHP was no longer a goal.
IMHO Hack's best feature was native support for XHP... which (also unfortunately) isn't something PHP decided to take.
I only used Hack when I was very briefly working for Facebook. (And I used PHP once before nearly 20 years ago by now for some web site I had 'inherited', back when PHP was truly an awful language)
Did what? Rewrite Instagram into another language? Do you have any source on this?
Last time I checked they're working on improving Python performance instead (yes I know they forked it into Cinder, but they're trying to upstream their optimizations [0]). Which is very similar to what we're doing at Shopify.
Of course, 100% of Instagram isn't in Python; I'm certain there are lots of supporting services in C++ etc., but AFAIK the Instagram "frontend" is still largely a Python/Django app.
The joke is that if Meta thought that replacing all the Python code they have with something else was worth it, they'd have done it already.
> The joke is that if Meta thought that replacing all the Python code they have with something else was worth it, they'd have done it already.
"Worth it" depends on both how much performance improvement you get, and how hard it is to replace. Did you consider maybe the rewriting effort is so humongous that it is not worth doing despite large performance improvements? Thus making the joke not funny at all...
That's exactly the joke though. Every time Ruby (or Python) is discussed on HN we get the same old tired question of "why don't they just rewrite in Rust".
But that's some silly engineer tunnel vision; squeezing the very last bit of performance out of a system isn't a goal in itself. You just need it to be efficient enough that it costs you significantly less to run than the amount of revenue it brings in.
I can bet you that moving off Python must have been pitched dozens and dozens of times by Meta engineers, but deemed not worth it, because execution speed isn't the only important characteristic.
So yes, I find it hilarious when HN commenters suggest companies should rewrite all their software into whatever language is currently seen as the most performant.
It's usually dismissed because companies think short term, and switching languages is a project with huge short term disadvantages and huge long term advantages.
1. A 10% performance improvement at Instagram could lead to many millions in revenue "instantly". That is not laughable at any company.
2. It won't be a 5000% performance improvement. Facebook uses its own fork of Python that is heavily optimized. Probably still far from C++, but you should be thinking about languages like Java when talking about performance.
"Better" is a very subjective term when discussing languages, and I hope such discussions can be more productive and meaningful.
Yeah, it's definitely welcome, but even if it is double the performance (it doesn't seem to be quite there in my experience), fast languages are still 25-50x faster. It's like walking twice as fast when the alternative is driving.
Well, it really depends on whether that alternative is open to you, and at what cost.
So eg lots of machine learning code is held together by duct tape and Python. Most of the heavy lifting is done by Python modules implemented in (faster) non-Python languages.
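To make that split concrete, here's a minimal sketch (illustrative only, not taken from any real ML codebase): the Python side just orchestrates, while the arithmetic runs inside NumPy's compiled kernels.

```python
# Minimal sketch of the usual split: Python orchestrates, the heavy
# lifting happens inside compiled code (here, NumPy's C kernels).
import numpy as np

def dot_pure_python(a, b):
    # Interpreted loop: every iteration pays Python's per-operation overhead.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_numpy(a, b):
    # Same arithmetic, but the loop runs in a single compiled call.
    return float(np.dot(a, b))

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
print(dot_pure_python(a, b), dot_numpy(a, b))  # same result, very different cost
```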
The parts that remain in Python could potentially be sped up by migrating them, too. But that migration would likely not do much for overall performance, while still being pretty expensive (in terms of engineering effort).
For organisations in these kinds of situations, it makes a lot of sense to hope for / contribute to a faster Python. Especially if it's a drop-in replacement (like Python 3.12 is for 3.9).
What really makes me hopeful is actually JavaScript: on the face of it, JavaScript is about the worst language to have a fast implementation of. But thanks to advances in clever compiler and interpreter techniques, JavaScript is one of the decently fast languages these days. Especially if you are willing to work in a restricted subset of the language for substantial parts of your code.
I'm hoping Python can benefit from similar efforts. Especially since they don't need to re-invent the wheel, but can learn from the earlier and ongoing JavaScript efforts.
(I myself made some tiny efforts for CPython performance and correctness. Some of them were even accepted into their repository.)
> Facebook uses its own fork of Python that is heavily optimized.
So likely the 5000% improvement is no longer possible because they already did multiple 10% improvements? I don't know how this counters the original point.
All clues point to FB going this route because they had too much code already in PHP, and not because the performance improvement would be small.
In any case, "facebook does it" is not a good argument that something is the right thing to do. Might be, might not be. FB isn't above wrong decisions. Else we should buy "real estate" in the metaverse.
Looking at the traffic, isn't this literally MITM'ing all your traffic? This actually should be marked as [Flagged]; there's absolutely no reason why anyone should be using this.
The Go ecosystem (at least in public) heavily encourages committing all generated code, since Go code is meant to be functional via a simple `go get`. Even a popular project like Kubernetes is full of generated protobufs committed to the codebase.
While I also think CLAs are eerie and go against the open source spirit, I don't think a CLA alone puts a project at "high risk". I'm not sure about the FAANG open source projects that are used as libraries (Guava, React, ...). These projects fundamentally don't jeopardize those companies' businesses, and they serve to increase developer goodwill amongst engineers. Nobody can predict the future, but I can't imagine these projects being relicensed.
A more plausible scenario is them becoming abandonware, but even in those cases the community can carry the torch.
Agreed, and there are a few projects with CLAs that are ranked lower due to mitigating factors, like K8s [1]. I honestly don't get why they have a CLA, anyone know?
The impact of developer good will is difficult to measure, so I don't attempt it. Redis burned community good will so badly with their relicensing that several forks rapidly emerged. Seemed like a predictably poor decision to me.
I also don't want to pick favorite companies, because it's subjective; companies can change strategies or even sell projects off. What if Meta decided to sell React to a patent-troll-like company instead of just abandoning it?
> I honestly don't get why they have a CLA, anyone know?
There are valid reasons to have a CLA: confirmation that your contributions are not encumbered by an employer contract is a good one. What there is rarely an excuse for is a copyright assignment, which often gets bundled into a CLA.
The only non-nefarious example of copyright assignment that I can think of is the FSF, but only because they have such a strong record on software freedom.
This is great; how does this compare to Day One? I've been using Day One for multiple years (1,000-day streak) and was wondering if I should stick with it. It does what I want, but it seems like it's been lacking in product development post-acquisition compared to other offerings.
Thanks for the interest. It's definitely not at the same level of polish as Day One yet. I used to use Day One years ago when it was still a one-time purchase app, but even back then it had nice things like recording your location and local weather conditions. I'd love to add simple niceties like that eventually.
I'm trying to position my app less like a traditional journal and more of a "jot down quick thoughts" place, hence the social media-like interface. But I also want to add more zettelkasten-like features as well, so it's kind of a shoebox app for whatever you want to keep track of for later.
What is there to learn from a "storage industry expert" or major vendors? Network-attached block-level storage at AWS's scale hasn't been done before.
Some of it, like random IOPS, spindle bias, etc., was well known.
Well known among implementers, at least; vendors were mostly locked into the vertical scaling model.
I ran an SGI cluster running CXFS in 2000, as an example, and by the time EBS launched, I was spending most of my SAN architect time trying to get away from central storage.
There were absolutely new problems and amazing solutions by the EBS team, but there was information.
Queueing theory was required for any meaningful SAN deployment, as an example, and RPM/3600 had always been a metric for HD performance under random I/O.
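For a rough sense of the arithmetic everyone in that world carried around, here's a back-of-the-envelope sketch (illustrative numbers only; nothing below comes from the article or from EBS):

```python
# Back-of-the-envelope model of random I/O on a spinning disk.
# Numbers are illustrative, not taken from the article or from EBS.

def avg_rotational_latency_ms(rpm: float) -> float:
    # On average the platter spins half a revolution before the sector arrives.
    return (60_000 / rpm) / 2

def random_iops(rpm: float, avg_seek_ms: float, transfer_ms: float = 0.1) -> float:
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms
    return 1000 / service_time_ms

def mm1_response_time_ms(service_ms: float, utilization: float) -> float:
    # Textbook M/M/1 queueing result: response time blows up as the disk saturates.
    return service_ms / (1 - utilization)

# A 7200 RPM drive with ~8 ms average seek gives on the order of 80 random IOPS,
# and queueing makes latency far worse once the drive is busy.
print(f"{random_iops(7200, 8.0):.0f} IOPS")
print(f"{mm1_response_time_ms(12.3, 0.9):.0f} ms at 90% utilization")
```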
>What is there to learn from an "storage industry expert" or major vendors?
I mean, literally every problem they outlined.
>Compounding this latency, hard drive performance is also variable depending on the other transactions in the queue. Smaller requests that are scattered randomly on the media take longer to find and access than several large requests that are all next to each other. This random performance led to wildly inconsistent behavior. Early on, we knew that we needed to spread customers across many disks to achieve reasonable performance. This had a benefit, it dropped the peak outlier latency for the hottest workloads, but unfortunately it spread the inconsistent behavior out so that it impacted many customers.
Right - which we all knew about in the 90s, and NetApp more or less solved with WAFL.
>We made a small change to our software that staged new writes onto that SSD, allowing us to return completion back to your application, and then flushed the writes to the slower hard disk asynchronously.
So a write cache, which again every major vendor had from the beginning of time. NetApp used NVRam cards, EMC used dedicated UPSs to give their memory time to de-stage.
Etc. etc.
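To illustrate the staging pattern the quote describes, here's a toy write-back cache sketch (assumptions only; real arrays worry about ordering, crash consistency, NVRAM/battery durability, and back-pressure, and this is not how NetApp, EMC, or EBS actually implement it):

```python
# Toy sketch of a write-back staging cache: acknowledge the write once it's
# on the fast device, destage to the slow device asynchronously.
import queue
import threading

class WriteBackCache:
    def __init__(self, fast_dev: dict, slow_dev: dict):
        self.fast_dev = fast_dev      # stand-in for SSD / NVRAM
        self.slow_dev = slow_dev      # stand-in for the spinning disk
        self.pending = queue.Queue()
        threading.Thread(target=self._destage, daemon=True).start()

    def write(self, block_id: int, data: bytes) -> None:
        self.fast_dev[block_id] = data     # staged on the fast device
        self.pending.put((block_id, data))
        # Completion returns to the caller here, before the slow write happens.

    def read(self, block_id: int) -> bytes:
        # Serve from the staging area first so readers see acknowledged writes.
        return self.fast_dev.get(block_id, self.slow_dev.get(block_id))

    def _destage(self):
        while True:
            block_id, data = self.pending.get()
            self.slow_dev[block_id] = data  # flushed to the slow device later
```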
>network attached block level storage at AWS's scale hasn't been done before.
This is just patently false. It's not like EBS is one giant repository of storage. The "scale" they push individual instances to isn't anything unique. The fact that they're deploying more pods in total than any individual enterprise isn't really relevant, beyond the fact that they're getting even greater volume discounts from their suppliers. At some point, whether I'm managing 100 of the same thing or 1,000, if I've built proper automation my only additional overhead is replacing failed hardware.
Downvote away; watching HN prefer re-inventing the wheel over asking someone who has already been there what the landmines are seems to be a common theme.
I'm guessing that at cloud scale, more innovation & scalability is needed for the control plane (not to mention the network itself).
Regarding a durable, asynchronously destaged write cache, I think EMC Symmetrix already had such a feature at the end of the '80s or around 1990 (can't find the source anymore).
> whether I'm managing 100 of the same thing or 1,000 - if I've built proper automation my only additional overhead is replacing failed hardware
Hahahah surely this is a joke, right?
If it’s so easy and you already had solved all these problems, why didn’t someone already build it? Why didn’t you build EBS, since you apparently have all the answers?
For example, BigQuery has natural support for arrays and nested data, and it's quite nice / essential for good data modeling. For instance, "tags" can be stored as `Array<Struct<Key, Value>>`, and this can be used to implement things like "search for records with particular tags".
This reduces the cognitive burden of remembering which tables join with which, especially if we know that a relationship is only relevant in one context, i.e. tags can only be joined to the main table, and no other joins make sense.
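For concreteness, a hypothetical query along those lines might look like this (made-up project/dataset/table names, assuming `tags` is `ARRAY<STRUCT<key STRING, value STRING>>`, run through the standard google-cloud-bigquery client):

```python
# Hypothetical example: find rows carrying a particular tag when `tags`
# is modeled as ARRAY<STRUCT<key STRING, value STRING>>.
# Project/dataset/table names are made up; requires GCP credentials.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT id, name
FROM `my_project.my_dataset.resources`
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(tags) AS tag
  WHERE tag.key = 'env' AND tag.value = 'prod'
)
"""

for row in client.query(sql).result():
    print(row["id"], row["name"])
```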