> we can assume that Instagram was written using Objective-C and a combination o...

rmetzler · on Sept 16, 2023

Instagram engineers themselves wrote a bit about their backend infrastructure. One of the more important topics was how they shard the data [0], which this blog post also links to.

[0]: https://instagram-engineering.com/sharding-ids-at-instagram-...
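For reference, a minimal sketch of the ID layout that post describes: 41 bits of millisecond timestamp, 13 bits of logical shard ID, and 10 bits of per-shard sequence, packed into one 64-bit integer. The function names and the epoch constant here are illustrative assumptions, not Instagram's actual code (the original generates IDs inside the database).

```python
# Assumed custom epoch in milliseconds; any fixed past instant works.
EPOCH_MS = 1314220021721
SHARD_BITS = 13   # room for 8192 logical shards
SEQ_BITS = 10     # sequence wraps at 1024 per shard

def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    """Pack (timestamp, logical shard, sequence) into one sortable integer."""
    ts = now_ms - EPOCH_MS
    shard = shard_id % (1 << SHARD_BITS)
    counter = seq % (1 << SEQ_BITS)
    return (ts << (SHARD_BITS + SEQ_BITS)) | (shard << SEQ_BITS) | counter

def split_id(packed: int) -> tuple[int, int, int]:
    """Recover (created_ms, logical shard, sequence) from a packed ID."""
    counter = packed & ((1 << SEQ_BITS) - 1)
    shard = (packed >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    ts = packed >> (SHARD_BITS + SEQ_BITS)
    return ts + EPOCH_MS, shard, counter
```

Because the timestamp sits in the high bits, IDs sort by creation time, and the shard can be read back out of any ID without a lookup.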

Sparkyte · on Sept 16, 2023

If it wasn't just an image site the potential for hotspotting would be insane!

Size isn't a bad thing anymore since price has dropped exponentially since the inception of Instagram.

I'm positive they would have used a more modern technology if it had existed back then.

Fantastic read though.

_3u10 · on Sept 16, 2023

The potential for hotspotting decreases with the number of inserts per second. Like if you only did 1 insert per second and timed it right you could put all of those inserts on one server, but this would likely not overload the server.

It's virtually impossible for anyone to hotspot in a meaningful way with this system.

Sparkyte · on Sept 16, 2023

Nah, this will totally be an issue. You can be on the extreme end of replication or the extreme end of sharding and still experience performance problems. Sharding is more likely to hotspot, depending on where the hot data consistently lands.

The solution in most cases is a simple database that acts as a pointer database (user db -> user's db). The pointer is generated when the user is created.

From here you create some simple cold storage models (if the user isn't active) and some warm models that scale out the db: if the user's db grows, it shards and replicates for more read access. But the last thing you want is slow replication, or one DB that can't move to balance resource utilization. There is some new DB tech that does this without even sweating the deets.
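A rough sketch of that pointer-database idea, assuming an in-memory dict stands in for the directory and that new users go to whichever database currently holds the fewest users. The class and method names are made up for illustration.

```python
class UserDirectory:
    """Hypothetical pointer database: maps a user ID to the DB holding that user's data."""

    def __init__(self, databases):
        self.user_counts = {db: 0 for db in databases}
        self.pointers = {}

    def create_user(self, user_id):
        # The pointer is written once, at user creation.
        if user_id in self.pointers:
            return self.pointers[user_id]
        db = min(self.user_counts, key=self.user_counts.get)
        self.pointers[user_id] = db
        self.user_counts[db] += 1
        return db

    def locate(self, user_id):
        return self.pointers[user_id]

    def move(self, user_id, new_db):
        # Rebalancing only rewrites the pointer; copying the data is out of scope here.
        old_db = self.pointers[user_id]
        self.user_counts[old_db] -= 1
        self.user_counts[new_db] = self.user_counts.get(new_db, 0) + 1
        self.pointers[user_id] = new_db
```

The trade-off is an extra lookup on every request, in exchange for being able to move a hot user without changing how IDs are computed.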

coldtea · on Sept 16, 2023

It IS an original source. It's not just a repost or a report on another older article. It's a reworking of those articles.

nulltype · on Sept 16, 2023

I suspect they mean it’s a secondary source not a primary one

nielsole · on Sept 16, 2023

I wonder how they horizontally scaled shards. If they had 2k logical shards, they probably had far fewer real/physical shards, so a single database was holding many sharding keys. When a new physical shard gets added, the data needs to somehow be replicated. That is only true if reading is the problem. If only a relatively short time period of data is hot, you can probably just move the logical shard IDs over to the new physical shard without moving existing data. This requires keeping track of when each physical shard became active.
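To make the bookkeeping concrete, here is a minimal sketch of that time-based routing, assuming each logical shard keeps a sorted history of (start time, physical shard) entries and that a row's creation time in milliseconds is known (for example, from a timestamp embedded in its ID). All names are hypothetical, and this is speculation about one possible scheme, not how Instagram did it.

```python
import bisect

class ShardRouter:
    """Hypothetical router: logical shard -> physical shard, versioned by time."""

    def __init__(self, num_logical, initial_physical):
        # Parallel lists per logical shard: start times (ms) and the host active from then on.
        self.starts = {s: [0] for s in range(num_logical)}
        self.hosts = {s: [initial_physical] for s in range(num_logical)}

    def reassign(self, logical, from_ms, new_physical):
        # New rows created at or after from_ms go to new_physical; old rows stay put.
        if from_ms <= self.starts[logical][-1]:
            raise ValueError("reassignments must be strictly later than the last one")
        self.starts[logical].append(from_ms)
        self.hosts[logical].append(new_physical)

    def route(self, logical, created_ms):
        # Find the latest assignment whose start time is <= the row's creation time.
        # Assumes created_ms >= 0.
        i = bisect.bisect_right(self.starts[logical], created_ms) - 1
        return self.hosts[logical][i]
```

Reads for old rows keep hitting the old physical shard, so this only relieves load if the recent data is the hot part.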

rjzzleep · on Sept 16, 2023

> How can they know the internal infrastructure but have to assume the app language?

At that time that was the only solution that would have made sense given the achieved behaviour, performance AND development effort.

I did spend a lot of time with Cordova (PhoneGap) and all the other HTML5 app thingies for iOS at the time.

Not sure why that particular (and, in my opinion, pretty obvious) choice bothers you that much. That is very much the reason why they didn't even bother releasing it for Android until almost two years later.

danparsonson · on Sept 16, 2023

They had a different person/team working on the front end and/or they don't remember?

_xivi · on Sept 16, 2023

[flagged]

coldtea · on Sept 16, 2023

Enough with the "rules" just because you don't find it novel enough.

The author is on HN and says "Just my own brain reading through old talks and articles from Instagram engineering and Excalidraw for the diagrams. I did my best to put together all the info I learned from them into a comprehensive and simple manner".

You can take it up with them.

blackoil · on Sept 16, 2023

If it was compiled from more than one article, it becomes an original article. "The post should have only novel ideas and info" is an arbitrary requirement, and not the meaning of that rule.

Though I would add that if the author were taking anything verbatim, it should be highlighted as a quote with the original source. (edit: reading more, the author has already done that.)

_rm · on Sept 16, 2023

[flagged]

bluepizza · on Sept 16, 2023

Not OP. No anger in his words. Please don't make OP feel inadequate for expressing themselves clearly.

coldtea · on Sept 16, 2023

They sure seem bothered a lot by something trivial (god forbid the post which was NOT made for HN anyway quoted some original sources and didn't go into the detail they'd like it to).

Somebody took the effort of compiling an article from several sources, and we're throwing the rulebook at them.

throwaway290 · on Sept 16, 2023

The author is not the submitter. Rulebook is thrown for submitting, not for writing.

jacquesm · on Sept 16, 2023

Unless you're a mind reader you are way out of line with this comment.

rhuru · on Sept 16, 2023

Looks like those wordpress bots have finally figured out HN as well.

_rm · on Sept 16, 2023

[flagged]

kitd · on Sept 16, 2023

Ah, but that's just the sort of thing an AI Wordpress bot would say.

surfingdino · on Sept 16, 2023

Looks like someone may have been using ChatGPT to produce that post.

engineercodex · on Sept 16, 2023

Author here. No ChatGPT was used.

Just my own brain reading through old talks and articles from Instagram engineering and Excalidraw for the diagrams.

I did my best to put together all the info I learned from them into a comprehensive and simple manner.

renegade-otter · on Sept 16, 2023

Actually, ChatGPT is very useful for fixing your writing.

Sometimes when a paragraph I write reads a little too harsh to the ear, I ask ChatGPT to rewrite it - it's still my original thought.

It's really effective, but I tend to tone it down a bit to sound like myself since the output can be too formal, dry, and "academic".