A simple distributed algorithm for small idempotent information

adolph · on Feb 21, 2014

Idempotence by example: Looking up some customer's name and address in a database are typically idempotent (in fact nullipotent), since this will not cause the database to change. Similarly, changing a customer's address is typically idempotent, because the final address will be the same no matter how many times it is submitted. However, placing an order for a car for the customer is typically not idempotent, since running the method/call several times will lead to several orders being placed. Canceling an order is idempotent, because the order remains canceled no matter how many requests are made.

https://en.wikipedia.org/wiki/Idempotence

vishnugupta · on Feb 21, 2014

While the wiki article seems to do a decent job of explaining mathematics behind idempotency it sheds no light on why it's so important in distributed systems. antirez talks about idempotency in sort of mathematical sense, though I could be wrong as I've just skimmed through the article.

Since failures are a norm in distributed systems one would want clients to retry a request without worrying about double updations (placing the order twice in the example above, debiting the money twice etc.,).

Just elaborate this further. Let's say the "place order" request didn't even reach the server. In this case client can just retry the request. However, if the request did reach the server but the response didn't reach the client then client can't safely retry without creating duplicate order. The problem here for the client is that there's no way for it to distinguish between a lost request vs lost response.

In such cases idempotency is achieved by requiring clients to send a unique identifier with each request. Server can now ensure that for a given request identifier the operation is performed just once.

bch · on Feb 21, 2014

Knowing the Raft[0][1] protocol, I felt this post was disingenuous going on for its entire length, and only mentioning Raft basically as a footnote, w/ no other references.

I think Raft deserves a bit more prominence and credit here, Antirez.

  [0] (pdf) https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
  [1] I kept reading thinking to myself repeatedly "This is just Raft".

antirez · on Feb 21, 2014

In the very first part of the post: "The algorithm is mostly borrowed from Raft, however because of the premises it uses only a subset of Raft that is trivial to implement."

Maybe you only scanned the article.

bch · on Feb 21, 2014

:) my apologies -- I started reading your introduction and then moved on to the meat of the article...

antirez · on Feb 21, 2014

np ;-) after all as an heavy HN reader I understand very well the impossibility to read whole posts...

bkirwi · on Feb 21, 2014

It looks like the protocol is broken; can anyone confirm?

Suppose a handful of the processes trigger an election simultaneously, each with a `currentEpoch` of `n`, so that each process has voted but no proposal has a majority. Now all processes have a `lastVoteEpoch` of `n`.

According to the description, no process will vote more than once in a single epoch. However, a process will can only propose a new value using its `currentEpoch`. Since each proposal must have a `currentEpoch` of `n`, but no process is still allowed to vote in that epoch, no more state transitions are possible and the frequency cannot change.

antirez · on Feb 21, 2014

Check the "retry" part, basically when there is a split-brain during an election, a new process will try again and eventually will succeed. This is copied from Raft.

It works because when some instance retries, it increments again the currentEpoch, so ti is a new voting process from the point of view of the voters.

Btw it is not needed that to retry is the instance that did it the previous time, because even if the voting failed, the currentEpoch of the process that tried to get elected, will be propagated across the system, so every process will get the new epoch, and can try with an updated epoch to get voted.

mjb · on Feb 21, 2014

Right. For the same reason, the Paxos protocol allows multiple prepare votes from each acceptor. It makes this safe by requiring that each acceptor tell the proposer of the value for the highest ballot number it has accepted, and requires the proposer to switch proposal values when receiving such a message. This is, at first glance, one of the strange edges of the Paxos (synod) protocol, but it makes a lot of sense once you see an example where it is needed.

Raft has somewhat similar constructs, but explained slightly differently.

antirez · on Feb 21, 2014

Hey, actually Raft uses randomized retries, since I copied this part (and most of the rest) from Raft. Voting can result into split brain as this condition is recovered, and a retry time that is big enough compared to RTT guarantees liveness.

gbhn · on Feb 21, 2014

It looked similar. Can you highlight the differences from Raft? Is it just that state is small so it can accompany the various voting/heartbeat messages?

antirez · on Feb 21, 2014

Because the state is small and does not depend on the previous computation, the problem becomes trivial, and is accomplished just by broadcasting the full information with an attached unique version. No need for a log, rules to apply or not entries from the log to state machine, and so forth.

bkirwi · on Feb 21, 2014

Hmm -- if a process bumps its `currentEpoch` for every proposal, and the delays are randomized, it looks like this would fail only probabilistically and may eventually succeed. I'll have to take a closer look later today...

antirez · on Feb 21, 2014

Exactly that. It works only as long as the retry time is randomized and much larger than then network RTT.