Hacker News

Actually, I think we are on the same side. :-)

There is still a small difference though: when I execute a write and sync, I'm pretty sure it either succeeded or didn't. I'm not sure if there is any way for the disk to say "don't know", which the OS would then pass on to my program. Even if there is, I guess such a case is so rare that it is probably excluded from any error model. Like cosmic rays. And I can always retry, since it's very unlikely that there is a connection issue only between my program and the disk but not between other programs and that disk, as they all run on the same machine.
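For reference, a minimal sketch of the "write and sync" pattern meant here, in Python. The helper name `durable_write` is made up for illustration; it only reports what the OS reported back, and assumes the OS surfaces any failure as an `OSError`.

```python
import os


def durable_write(path: str, data: bytes) -> bool:
    """Hypothetical helper: True if write + fsync both reported success,
    False if the OS reported an error at any step."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError:
        return False
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to flush the file's data to the storage device.
        os.fsync(fd)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

From the caller's side there are only two answers, and a `False` can simply be retried.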

Over a network things are different, because the connection might be gone for a long time, and now I'm not aware of the state and I can't check it. I also can't tell whether the connection might come back; the machine I was talking to might have burned down. If I, myself, burn down, then I don't need to worry anyway.

That's the difference. This is specific to write/sync though. As you explained, just because things are running on the same machine does not mean they are necessarily more reliable than a "remote" call.




Why does there have to be a difference?

When modeling rpc calls as async calls returning a result, you can be just as sure that it has completed (or failed) as with a local call.

And considering write can use an underlying mounted network share, a spun-down HDD, or a thumb drive on a failing USB port, the same failure modes exist in either case.

In either case you're just submitting commands to another device's queue (and waiting for a result) over a hot-pluggable connection.
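A rough sketch of what "an async call returning a result" could look like, in Python with `asyncio`. All names here (`Outcome`, `call_with_deadline`, and the three demo operations) are hypothetical. The same wrapper is used whether the operation behind it is a disk or a remote machine; a deadline that passes without an answer is reported as its own outcome, since the caller then knows neither success nor failure.

```python
import asyncio
from enum import Enum


class Outcome(Enum):
    OK = "ok"            # the operation reported success
    FAILED = "failed"    # the operation reported an error
    UNKNOWN = "unknown"  # no answer arrived before the deadline


async def call_with_deadline(op, timeout: float) -> Outcome:
    """Await op() and classify how it ended (hypothetical helper)."""
    try:
        await asyncio.wait_for(op(), timeout)
        return Outcome.OK
    except asyncio.TimeoutError:
        return Outcome.UNKNOWN
    except Exception:
        return Outcome.FAILED


# Stand-ins for a device or service; not real I/O.
async def fast_op():
    await asyncio.sleep(0)


async def failing_op():
    raise ConnectionError("device or peer reported an error")


async def stalled_op():
    await asyncio.sleep(10)  # simulates a queue that never answers in time
```

For example, `asyncio.run(call_with_deadline(stalled_op, 0.05))` returns `Outcome.UNKNOWN`, regardless of whether the stalled thing is a spun-down HDD or a server.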


Because either the disk is down for everyone on that machine or for no one. That it is only down for my program is theoretically possible (e.g. a bug in the OS or driver) but it's so unlikely that it's usually moved into the "cosmic rays" category.

This is not true for networks (the internet).


I don't disagree with that, but does that matter for your program?

Your code still has to handle the same failure modes regardless.

I've had many situations where e.g. reading locally cached files on a smartphone eMMC had worse latency and throughput than doing the same read over the network on the server. The cache actually made performance worse in every way.


Yeah it matters. It's the reason why e.g. postgres offers data consistency and still operates in parallel on the disk.

If it were the same doing this over a network, then we could have a distributed, consistent postgres. What would I give for that. :-)


Well, I wouldn't use that as an example, as Postgres actually misimplemented write/sync in the past, causing data corruption ;)

Regarding having a distributed postgres, that's not an issue actually. As long as all writing workers have access to the same synchronization primitives you can easily use postgres with e.g. k8s volumes in single writer, multiple reader mode. And single master, multiple read replica postgres deployments are common too.

The guarantees you're talking about aren't given by the storage implementation, but by the fact that all write workers run on the same machine.


There is no distributed postgres that behaves like a single one (except for being a bit slower sometimes or so).


All I claimed was that, if you let postgres run against network mounted storage it works the same as if you used it with local storage.

Do you see what I meant?


Sure, but what's the point? There still is no distributed postgres. There are read replicas, but that's not the same thing.

So why do you think that is?


I dunno what the point is supposed to be, you brought it up



