
Not really. People are not posting data into Netflix. Netflix is mostly read-only. That is a huge complexity reducer.



Is it? It's pretty rare to download assets from servers that you're uploading to. Sometimes you have truly interactive app servers, but that's a pretty small percentage of web traffic. Shared state is not the typical problem to solve on the internet, though it is a popular one to discuss.


Whatever your service is, the database is usually the bottleneck. The database limits latency, scaling, and availability.

Of course, how much depends on the service. In particular: how much concurrent writing is happening, whether that state needs to be updated globally in real time as a result of the writing, and whether local caching is in play and the cache must be invalidated as a result of those writes.

Most of the relevant problems disappear if you can just replicate most of the data without worrying that someone is updating it, and you also have no cache-invalidation issues. No race conditions. No real-time replication issues.
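
One way to make "no cache invalidation" concrete is content addressing (a hypothetical sketch; the class and names here are illustrative, not anything Netflix actually uses): if an asset's key is derived from its bytes, a changed asset gets a new key, so every cached or replicated copy stays valid forever.

```python
import hashlib

class ImmutableAssetStore:
    """Illustrative content-addressed store: entries never go stale."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so storing the same
        # bytes twice is idempotent and yields the same key.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Safe to serve from any replica or cache: the key pins the
        # exact bytes, so there is nothing to invalidate or re-sync.
        return self._blobs[key]

store = ImmutableAssetStore()
key = store.put(b"poster-v1 bytes")
assert store.get(key) == b"poster-v1 bytes"
```

This is exactly the property the comment relies on: readers can hit any copy, anywhere, without coordinating with writers.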


> Whatever your service is, the database is usually the bottleneck. The database limits latency, scaling, and availability.

Database-driven traffic is still a tiny percentage of internet traffic. It's harder to tell these days with encryption but on any given page-load on any project I've worked on, most of the traffic is in assets, not application data.

Now, latency might be a different issue, but it seems ridiculous to me to consider "downloading a file" to be a niche concern—it's just that most people offload that concern to other people.


> It's harder to tell these days with encryption but on any given page-load on any project I've worked on, most of the traffic is in assets, not application data.

Yet you have to design the whole infrastructure so that this tiny margin works flawlessly, because otherwise the service usually isn't serving its purpose.

Read-only assets are the easy part, which was my original claim.


> Read-only assets are the easy part, which was my original claim.

I don't think this is true at all given the volume. At that kind of scale, everything is hard. It's just a different sort of hard from contended resources. Hell, even that is "easy" these days with CRDTs (and I say this with dripping sarcasm).
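
For readers who haven't met CRDTs: the classic toy example is a grow-only counter (a minimal sketch, not production code). Each replica increments only its own slot, and merging takes the element-wise max, so merges are commutative, associative, and idempotent; concurrent updates converge without coordination.

```python
class GCounter:
    """Grow-only counter CRDT: replicas converge by merging."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count from that replica

    def increment(self, n: int = 1):
        # A replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter"):
        # Element-wise max makes merge order-independent and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

The sarcasm above is earned, though: counters and sets merge nicely, but mapping a real business domain onto CRDT semantics is anything but easy.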


Asset volume is just a cost issue these days. You can reduce the cost with clever caching, programming-language choices, or higher compression rates, but in the end it is no longer a real problem in the overall infrastructure architecture. Read-only assets can be copied, duplicated, and cached without any worry that they might need to be re-synced soon.


That's true as the internet becomes more money-focused: companies become more interested in shoving messages at users (ads/propaganda) than in letting users say anything. Even ISP plans have asymmetric specs.

I'm honestly much more impressed by free apps like YouTube and TikTok in terms of throughput; they have MUCH more traffic since users don't pay!


Every time you like/dislike/watchlist a movie you're posting data. When you're watching a movie your progress is constantly updated, posting data. Simple stuff, but there are possibly hundreds of thousands of concurrent users doing that at any given moment.


Yes, but it still accounts for only a fraction of the purpose of their infrastructure. There are no hard global real-time sync requirements.

> When you're watching a movie your progress is constantly updated, posting data

This can be implemented on the server side, with read requests only.
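
A hypothetical sketch of that idea (all names and the fixed segment length are illustrative assumptions): the streaming server already sees which video segments a client fetches, so it can record approximate watch progress from those reads alone, with no explicit write API on the client.

```python
SEGMENT_SECONDS = 10  # assumed fixed segment length for the sketch

# (user, movie) -> furthest playback position observed, in seconds
progress = {}

def on_segment_request(user: str, movie: str, segment_index: int):
    """Called whenever the server serves a video segment to a client."""
    position = (segment_index + 1) * SEGMENT_SECONDS
    key = (user, movie)
    # Take the max so rewinds or re-fetches don't move progress backwards.
    progress[key] = max(progress.get(key, 0), position)

# Client streams the first three 10-second segments of a movie.
for seg in (0, 1, 2):
    on_segment_request("alice", "m1", seg)
assert progress[("alice", "m1")] == 30
```

The obvious limitation is granularity: the server can't tell that the client paused halfway through a segment it already delivered, which is where client-side reporting comes back in.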

A proper comparison would be YouTube where people upload videos and comment stuff in real-time.


There are many cases where the server side takes you 98% of the way there, but it still makes economic sense to spend shitloads on getting that last 2%.

In this case it's not the same whether your server sends a 10-second segment and the viewer watches all of it, or the server sends a 10-second segment but the client pauses at the 5s mark (which requires client-side logic to report).

Might sound trivial, but at Netflix scale there's guaranteed to be a developer dedicated to that, probably a team, and maybe even a department.


> A proper comparison would be YouTube where people upload videos and comment stuff in real-time.

Even in this one sentence you're conflating two types of interaction. Surely downloading videos is yet a third, and possibly the rest of the assets on the site a fourth.

Why not just say the exact problem you think is worthy of discussion, with your full chest, if you so clearly have one in mind?


I'd make a distinction on:

- the entropy of the data: a video is orders of magnitude larger than browsing metadata.

- the compute required: other than an ML algorithm optimizing for engagement, there's no computationally intensive business-domain work (throughput-related challenges don't count).

- finally, the programming complexity, in terms of the business domain, just isn't there.

I mean, my main argument is that a video provider is a simple business requirement. Sure, you can make something simple at huge scale and that is a challenge. Granted.


I thought about the complexity in terms of compute, but I guess if there's no user input then there's little compute possible, as all functions are effectively pure and static. At the very least their results are cacheable, or the input is centralized (admins/show producers).



