Thanks for the article! I love lamport clocks, and the lightweight methods are cool.
I read through the article a couple of times, and was unsure about something:
When the browser resets, it does not appear to copy down the current state and clock value from the server, right? I'm basing that on this from the article:
The browser resets, all the state is dropped.
browser: settings: none, clock: 0
backend: settings: foo, clock: 2
I don't think that lamport clocks can be compared if they are not representing any concept of causality or knowledge, right? This implementation appears to be two separate integers (one for the client and one for the server), with no guarantee as to their ability to communicate with each other. One bug you discovered was that the server might have a value 2, and the client might have a value 2, but they would disagree on the state that value "2" referred to. A bug indeed!
Your fix was to handle the case where the client and server tied. However, I was left wondering: What about the case where the knowledge of the client is higher than the server, but the server still should not trust it? What if, in your example step 4, the client had made 3 changes without the ability to sync with the server?
It would be this:
The user changes the settings three times, but this time the state is not propagated to the backend (e.g. due to network hiccups).
browser: settings: none, clock: 3
backend: settings: foo, clock: 2
In that case, by the time the client talks to the server, the server would think that the client had "won", but the fact that the client had a clock value of "3" would be meaningless, right? Wouldn't you still want the server to win in this case?
Yes, you are right. To rephrase, you say that a browser could first increment its clock to the value larger that the backends, and then synchronize and "win" with the backend, setting the backend value to some bogus "none".
I'd say that it's "fine" in a way that the browser and the backend agree to "something" and are in sync. The case we wanted to prevent was that the browser and the backed hold different values and cannot agree which value to converge to.
So what is needed here is some sort of data type that is replicated, and guaranteed to converge. A convergent, replicated data type… a CRDT! In this case it rather seems like you wanted a regular-old time stamp, not a lamport timestamp, which gives you a “last-write-wins register” CRDT.
But perhaps you wanted something that actually handles the conflicts, like for a list of subscribers to an event. In that case you would want to use a vector clock instead of a lamport timestamp, and then when neither clock dominates the other (aka when they are tied), you take both sets and merge them. This is an incomplete outline of the “add-wins set” CRDT.
While I agree that CRDTs could work here, what struck me is you comment about a “regular-old timestamp”. In distributed systems, this isn’t simple. In fact, Lamport Clocks are a specific solution to guaranteeing a consistent monotonically increasing clock such that one time is known to occur after the other. A single instance, or primary db, can provide this, but in distributed systems this is the primary problem. There is no “regular-old timestamp” in distributed systems.
Right. You can't guarantee clocks in a distributed system are synchronised, unless you control all the nodes. Lamport clocks don't use time, they work on causal order. This came after that, not this happened at a particular time.
Definitely true. In this case, where nodes are browsers (so clocks can easily be synced to within a few seconds of one another) and changes are user settings (so they change less than once an hour), a last-write-wins register seems like a great choice for an atomic setting like a boolean.
For settings where we can more intelligently handle conflicts (e.g. sets), we don't need the timestamps, because we can take advantage of vector clocks instead.
Interesting. Incidentally, I would claim that TDD done properly is in fact a lightweight formal method. It helps to be conversant with some kind of formal semantics, but you can definitely play a bit fast and loose by treating your tests as a lightweight specification. In that case any formal properties I want the code to have that can't practically be expressed in the test is included in a comment on the test. Also, fuzzing is a way to narrow that gap further.
Property-based tests let us state the formal properties we want, even though we can't verify them. For example, ScalaCheck lets me state the following:
forAll (x: Int) { assert(x.abs >= 0) }
This is useful as a spec, as documentation, and as a test suite: e.g. ScalaCheck can try to disprove such properties by checking a bunch of random inputs.
Whilst such approaches don't let us prove such statements (except for exhaustively checking small input spaces, e.g. `forAll (x: Boolean, y: Boolean) ...` only needs four tests), I find it just as easy as normal unit testing, and far less effort than formal proofs (e.g. using Coq or Agda)
(FYI that property is in fact false, since it doesn't hold for -2147483648; ScalaCheck's random generators are biased towards such "problematic" values, e.g. -inf, NaN, etc., so it usually find such things)
Believe me, I take Dijkstra's message to heart. And a quality test suite can reify the decisions of a formal specification as expressed in PTS or some other suitable abstraction. Think more here's the proof, more or less rigorous as circumstances suggest reasonable, and here's some tests that capture as much of that in code as practicable to help give us some further confidence that we didn't err in our manipulations.
That's a category error. The formal spec isn't a proof, it's what is to be proven. Without a spec the notion of programming errors has no formal meaning. In that case, the complaints about the bug are always with respect to pleasantness. Given a formal specification, it's possible to write a program that will reliably satisfy that specification. And the tooling is better than ever. The Dafny language out of Microsoft Research is one interesting example. One can also do it by hand, possibly using an SMT solver to check one's work.
Yes, as a consultant Mr. Beck has no concern about code correctness. He only needs it to please the client sufficiently to maintain his reputation. Since users are used to low quality software, that’s a practicable bar to meet just winging it.
And as a seller of programming books, he increases his target audience by excluding the small percentage of mathematically literate programmers and appealing to the majority who find logic intimidating.
I’d encourage interested readers to look at some of Leslie’s other papers. His work from decades past was very forward thinking and useful in today’s world.
His work has directly shaped how many distributed systems reach consensus. Including recent projects like Solana. Also check out VDF and proof of history.
Yes, that's exactly right. Lamport clocks provide a partial ordering of events, based on the idea of "happened before". Specifically, Lamport defines the happens before operator →, such that within any one thread/process if a is done then b is done then a→b, and if a is one process sending c message and d is another process receiving the message then c→d. This allows a model where single-threaded processes communicate through messages, and allow us to reason about things like "if a could have caused b to happen, then a→b". Logical clocks are an extension of this model.
Bonus all-time great paper review: "Jim Gray once told me that he had heard two different opinions of this paper: that it’s trivial and that it’s brilliant. I can’t argue with the former, and I am disinclined to argue with the latter." (from Lamport at https://www.microsoft.com/en-us/research/publication/time-cl...).
I read through the article a couple of times, and was unsure about something: When the browser resets, it does not appear to copy down the current state and clock value from the server, right? I'm basing that on this from the article:
I don't think that lamport clocks can be compared if they are not representing any concept of causality or knowledge, right? This implementation appears to be two separate integers (one for the client and one for the server), with no guarantee as to their ability to communicate with each other. One bug you discovered was that the server might have a value 2, and the client might have a value 2, but they would disagree on the state that value "2" referred to. A bug indeed! Your fix was to handle the case where the client and server tied. However, I was left wondering: What about the case where the knowledge of the client is higher than the server, but the server still should not trust it? What if, in your example step 4, the client had made 3 changes without the ability to sync with the server? It would be this: In that case, by the time the client talks to the server, the server would think that the client had "won", but the fact that the client had a clock value of "3" would be meaningless, right? Wouldn't you still want the server to win in this case?