> The other thing is that cloud hardware is generally very very slow and many engineers don't seem to appreciate how bad it is.

This. Mostly disk latency, for me. People who have only ever known DBaaS have no idea how absurdly fast they can be when you don’t have compute and disk split by network hops, and your disks are NVMe.

Of course, it doesn’t matter, because the 10x latency hit is overshadowed by the miasma of everything else in a modern stack. My favorite is introducing a caching layer because you can’t write performant SQL, and your DB would struggle to deliver it anyway.

> Of course, it doesn’t matter, because the 10x latency hit is overshadowed by the miasma of everything else in a modern stack.

This. The complaints about performance seem to come from people who aren't aware of actual latency numbers.

Sure, reading data from a local drive can take less than 1 ms, whereas a block storage service like AWS EBS can take more than 10 ms. An order of magnitude slower. Gosh, that's a lot.

But whatever your disk access needs, your response still has to be sent over the wire to clients, and that takes somewhere between 100 and 250 ms.

Will your users even notice a difference if your response times are 110ms instead of 100ms? Come on.


While network latency may overshadow that of a single query, many apps have many such queries to accomplish one action, and it can start to add up.

I was referring more to how it's extremely rare to have a stack as simple as request --> LB --> app --> DB. Instead, the app is almost always a set of microservices, even when that wasn't warranted, and each service is still making calls to DBs. Many of the services depend on other services, so there's no parallelization there. Then there's the caching layer stuck between service --> DB, because by and large the RDBMS isn't understood or managed well, so the fix is to just throw Redis between them.
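
To make that concrete, here's a minimal sketch of the shape of the problem (service names and timings are entirely made up): each hop depends on the previous one's result, so the per-hop network and DB costs add up instead of overlapping.

    import asyncio

    # Hypothetical services and latencies, purely illustrative. The
    # awaits are serial because each call needs the previous result,
    # so the latencies sum rather than overlap.
    async def call(service: str, ms: float) -> float:
        await asyncio.sleep(ms / 1000)  # stand-in for network + DB time
        return ms

    async def handle_request() -> float:
        auth = await call("auth-svc", 15)     # must complete first
        user = await call("user-svc", 20)     # needs the auth result
        orders = await call("order-svc", 30)  # needs the user record
        return auth + user + orders           # strictly serial

    print(asyncio.run(handle_request()))  # 65.0 ms before any disk stall

And a cache miss or a slow disk read inside any one of those services gets inherited by the entire chain.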


> While network latency may overshadow that of a single query, many apps have many such queries to accomplish one action, and it can start to add up.

I don't think this is a good argument. Even though disk latencies can add up, unless you're doing IO-heavy operations (which should really be async calls anyway), they are always a few orders of magnitude smaller than the overall response time.

The hypothetical gain from eliminating 100% of your IO latency tops out at a couple of dozen milliseconds. In platform-as-a-service offerings such as AWS's DynamoDB or Azure's CosmosDB, which involve a few network calls, an index query normally takes between 10 and 20 ms. You barely get above single-digit percentage gains even if you drive disk latencies down to zero.
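
Back-of-envelope version of that argument (all figures below are illustrative assumptions, not measurements):

    # Assumed: a 150 ms WAN round trip to the client, three managed-DB
    # queries at ~15 ms each, of which ~10 ms total is block storage
    # latency that a local NVMe drive could eliminate.
    wan_round_trip_ms = 150
    index_query_ms = 15
    queries_per_request = 3
    disk_share_ms = 10

    before = wan_round_trip_ms + queries_per_request * index_query_ms
    after = before - disk_share_ms
    print(f"{before} ms -> {after} ms, "
          f"{disk_share_ms / before:.0%} saved")  # 195 ms -> 185 ms, 5% saved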

In relative terms, if you're operating an app where single-millisecond latency deltas matter, you get far greater reductions in response time from regional and edge deployments than from switching to bare metal. And forget about doing regional deployments by running your hardware in-house.

There are many reasons why any discussion of performance needs to start with measured numbers and with figuring out where the bottlenecks actually are.


Did you miss where I said “…each service is still making calls to DBs. Many of the services depend on other services…?”

I’ve seen API calls that result in hundreds of DB calls. While yes, of course refactoring should be done to drop that, the fact remains that if even a small number of those calls have to read from disk, the latency starts adding up.
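
Rough arithmetic for that point (the figures are assumptions for illustration, not benchmarks): even when only a small fraction of sequential calls miss the cache and touch storage, the NVMe-vs-EBS gap becomes user-visible.

    calls = 200           # DB calls behind one API call
    cache_hit_ms = 0.1    # served from buffer pool / cache
    nvme_miss_ms = 0.2    # local NVMe read
    ebs_miss_ms = 5.0     # network block storage read
    miss_rate = 0.10      # only 10% of calls actually hit disk

    def total_ms(miss_ms: float) -> float:
        # sequential calls, so hits and misses simply add up
        return calls * ((1 - miss_rate) * cache_hit_ms + miss_rate * miss_ms)

    print(f"NVMe: {total_ms(nvme_miss_ms):.0f} ms, "
          f"EBS: {total_ms(ebs_miss_ms):.0f} ms")  # NVMe: 22 ms, EBS: 118 ms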

It’s also not uncommon to have a horrendously suboptimal schema, with UUIDv4 as the PK, JSON blobs, etc. Querying those often results in lots of disk reads simply due to how an RDBMS lays out data. The only way those result in anything resembling acceptable UX is with local NVMe drives for the DB, because EBS just isn’t going to cut it.
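
A toy model of the UUIDv4 point (a sketch, not a benchmark): a B-tree keeps rows in key order, so with a sequential key the most recent inserts share a page or two, while random UUIDs scatter them across the whole index, and on a cold cache each scattered page is a separate disk read.

    import uuid

    ROWS_PER_PAGE = 100
    N = 100_000  # rows already in the table

    seq_inserted = list(range(N))                     # e.g. BIGINT identity
    uuid_inserted = [uuid.uuid4().int for _ in range(N)]

    def pages_for_recent_rows(inserted):
        order = sorted(inserted)                      # on-disk key order
        position = {k: i for i, k in enumerate(order)}
        # distinct index pages holding the 100 most recent inserts
        return len({position[k] // ROWS_PER_PAGE for k in inserted[-100:]})

    print("sequential PK:", pages_for_recent_rows(seq_inserted))  # 1 page
    print("UUIDv4 PK:", pages_for_recent_rows(uuid_inserted))     # ~95 pages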


It's still a problem if you need to do multiple sequential IO requests that depend on each other (example: read an index to find a record, then read the record itself) and thus can't be parallelized. These batches of IO must sometimes themselves run sequentially, with no parallelization possible, and suddenly this bottlenecks the total throughput of your system.
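
A sketch with assumed per-read latencies (roughly the 10x local-vs-network-storage gap from upthread): the record read can't start until the index read returns, so the two latencies add on every lookup, and dependent lookups serialize on top of that.

    def lookup_ms(read_ms: float) -> float:
        index_read = read_ms             # find where the record lives
        record_read = read_ms            # only then fetch the record
        return index_read + record_read  # strictly sequential

    # 50 dependent lookups in one request (assumed figures):
    print(50 * lookup_ms(0.2), "ms on local NVMe")     # 20.0 ms
    print(50 * lookup_ms(2.0), "ms on block storage")  # 200.0 ms

At that point storage round trips cap your throughput, not just your latency.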


