Instagram engineers themself wrote a bit about their backend infrastructure. One of the more important topics was how they shard the data [0] and this is also linked to in this blog post.
The potential for hotspotting decreases with the number of inserts per second. Like if you only did 1 insert per second and timed it right you could put all of those inserts on one server, but this would likely not overload the server.
It's virtually impossible for anyone to hotspot in a meaningful way with this system.
Nah this will totally be an issue. You can be on the extreme end of replication or the extreme end of sharding and experience performance problems. Sharding is more likely to hotspot depending on where hot data is consistent.
The solution in most cases is a simple database that acts as a pointer database user db -> user's db. That is generated on the creation of a user.
From here you create some simple cold storage models ( if user isn't active ) and some warm models which will scale out the db if the user's db grows it shards and replicates for more read access. But the last thing you want is to slow replication or have one DB that can't move to balance resource utilization. There are some new DB tech that does this without even sweating the deets.
I wonder how they horizontally scaled shards. If they had 2k logical shards they probably had much fewer real/physical shards. So a single database was holding many sharding keys. So when a new physical shard gets added, the data needs to somehow be replicated. That is only true if w reading is the problem. If only a relatively short time period of data is hot you can probably just move over the logical shard IDs to the new physical shard without moving existing data. This requires keeping track of when which physical shard became active.
> How can they know the internal infrastructure but have to assume the app language?
At that time that was the only solution that would have made sense given the achieved behaviour, performance AND development effort.
I did spend a lot of time Cordova(PhoneGap) and all the other HTML5 app thingies for iOS at the time.
Not sure why that particular in my opinion pretty obvious choice bothers you that much. That is very much the reason why they didn't even bother releasing it for Android until almost two years later.
Enough with the "rules" just because you don't find it novel enough.
The author is on HN and says "Just my own brain reading through old talks and articles from Instagram engineering and Excalidraw for the diagrams. I did my best to put together all the info I learned from them into a comprehensive and simple manner".
If it was compiled from more than 1 article, it becomes an original article. The post should have only novel idea and info. is an arbitrary requirement and not the meaning of that rule.
Though I would add if author were taking anything verbatim, that should be highlighted as a quote with the original source. (edit: reading more, author has already done that.)
They sure seem bothered a lot by something trivial (god forbid the post which was NOT made for HN anyway quoted some original sources and didn't go into the detail they'd like it to).
Somebody took the effort of compiling an article on several sources, and we're throwing the rulebook on them.
How can they know the internal infrastructure but have to assume the app language?
Edit: So the entire piece is taken almost verbatim as-is from a couple old articles on instagram engineering blog.
It might as well just redirect to: https://instagram-engineering.com/what-powers-instagram-hund...
This is against the guidelines:
> Please submit the original source. If a post reports on something found on another site, submit the latter.