How the Dropbox Datastore API Handles Conflicts – Part Two

jchrisa · on Aug 30, 2013

The problem with this approach is that it requires you to resolve conflicts when you first see them. So you can't do workflows that accumulate more than two versions of what happened. Nor can you resolve the conflict asynchronously.

For an approach that doesn't ever force you to throw away or merge data, see the data structures in my OSCON talk. https://speakerdeck.com/jchris/couchbase-lite-oscon

I owe the world a write-up explaining these slides. Or come see me talk at StrangeLoop in St. Louis or Couchbase SF in September. http://www.couchbase.com/couchbase-sf-2013

erikb · on Aug 30, 2013

I could interprete it as a multiple-branch model with Git. But it's still rather confusing for me what use it would have for anybody outside of source code repos (meaning: I don't have the knowledge, not that there wouldn't be a usecase).

For some end users it's already confusing to have different states in the same linear history on different devices because of upload times and devices being offline. I think giving most users multi-branch capabilities would overpower them and they might just ignore all branches but one, manually merging parts of older branches into the newer ones.

So I guess you assume usecases for application writers? I'm also not deep enough into this topic that I could imagine any usecases. Would you mind giving some examples?

jchrisa · on Aug 31, 2013

As a database, the #1 rule is don't lose data. So a model that allows you to defer conflict merges, and to do the merge on any peer, keeps the database doing what it does best, and gives maximum flexibility to the application.

For instance on a customer record you might have three sales folks all qualify a lead as a 4, one adds a fax number, one adds an email address, and one adds the same fax but a different email, and a note about the best time to call. If the database forced the application to merge on encountering each update, there'd never be a chance to set all three versions side by side and make an informed comparison. Instead it would be luck that A and B merge, and then the result merges with C. Or some other order.

Where it gets really useful is if you have different actors that have ownership of the merge on different fields. So for instance you could partially resolve the conflict by merging some fields automatically but present a human with the full set of options for a bigger octopus merge on, say, the notes field.

pkj · on Aug 31, 2013

If a client app is processing the cust record, how would cust.email look like ? Typically I would assume a single value, but in case of unresolved conflicts will it be a iterator/array ?

jchrisa · on Aug 31, 2013

The client will have N copies of the customer record, so the record schema doesn't change. Instead all the potentially valid versions of it are preserved until a peer (any client or server) marks the conflict as resolved.

erikb · on Sept 5, 2013

this is not necessarily the number one rule. There are other rules as well, e.g. being 100% consistent, fast or available. I'm not a database expert but I don't think you can have it all.

pkj · on Aug 31, 2013

Trying to wrap my head around this. Seems difficult without clear usecases.

Let's say I have 10 devices d1,d2....d10 making updates to "a" on the server and went offline. a==20 and last update was by d5 before everyone went offline.

When the devices come back up, the fate of "a" depends on the rulesets. Following are 3 possible high-level combinations.

i) All devices have "remote" rule. On reconnection, everyone rollback "a" to 20. They are essentially back to the time before going offline. Even the device which did the last update(d5) before going offline is rolled back too, which seems bit odd. Still simple to reason with..

ii) All devices have "local" rule. On reconnection, the last device to reconnect updates "a". It is then broadcasted to all other devices. Note that it is not the last device to update "a". Rather it is the last to reconnect (Now, even if all of them reconnect at same time, depending on the queueing at server, the one at the tail wins). Not really simple..

iii) Mix of "remote" and "local" Let's say d1 had "local" rule and all others had "remote". On reconnection, d1's "a" will be propagated to everyone. This is irrespective of the order of reconnection (I am assuming that between reconnections "a" is not modified). This is pretty simple and perfectly predictable. Now, if we have more than one "local", we start getting non-deterministic, and at the extreme move to case ii)

smarx · on Aug 31, 2013

Note that every device submits its change with an expected "parent revision." The server checks the change against the server's current revision, and the change is accepted if and only if the server revision matches the parent revision of the submitted change.

So when devices d1 through d10 make a (simultaneous, I assume) change, they all submit their change with the same parent revision. Assuming they were up-to-date before they submitted that change, exactly one of the device's changes will succeed (whichever reaches the server first). For example, if the previous revision was 100, they'll all submit changes with the parent revision 100, and the first one to reach the server will succeed, at which point the server revision will be increased to 101. When the other changes come in, they'll all fail because their parent revision doesn't match the server.

I have to slightly revise your scenario and say all these devices went offline and then made a local change. This may be what you meant, but I want to clarify that the changes were queued up locally but not yet sent to the server.

So in (i), where the conflict resolution strategy is "remote," what will happen is that one device's change will win (whichever reaches the server first), and all other devices will throw out their change in favor of the change that made it to the server. It's not the case that everybody's changes are rolled back.

In (ii), the first device to connect submits its change and is accepted by the server. Subsequent devices submit their change (with parent revision of 100), see that they're out of date (server revision is now 101 or higher), and resubmit their change with the new parent revision, effectively clobbering any changes that have been made on the server. So each device in turn clobbers the value on the server, and the ultimate value is whichever change was submitted last.

For (iii), really don't do that. Having different devices use different conflict resolution strategies is a bad idea, and I can't really think of a valid scenario fro that. Can you?

So to sum up, (i) would mean "first change to reach the server wins," and (ii) would mean "last change to reach the server wins."

pkj · on Aug 31, 2013

Thanks for the comprehensive clarification. Yes you are right that the use of revision-id would allow "first change to reach the server wins" for case (i). It would result in a simple and fair outcome. I did not see the mention of revision usage in the parent link. But it makes absolute sense.

Think we agree on case ii). On case iii) I still think that having 1 local and N-1 remote might be useful when we want to prioritize a particular writer over others. Borrowing the sales example from commenter jchrisa, consider a new user sales-head (local rule). He syncs the data uploaded by his sales folks (remote rule) and then goes offline. When he is done editing and comes online, he wants to make sure his delta takes precedence irrespective of any previous changes by sales folks during his being offline. Since he has local rule, his update will just win. Further, he does not want sales guys who were offline and come online after him to overwrite his last update immediately. I am assuming that the sales folks with "remote" rule will see the data with newer server version and accept it.

smarx · on Aug 31, 2013

I suppose your scenario for (iii) is reasonable in the case of sharing a datastore among multiple users. In that case, you'd probably still want all of that user's devices to use the same conflict resolution strategy.

BTW, (for now) the Datastore API doesn't support sharing among multiple users.

Given the relationship the Datastore API has between the client and the server, it's also possible for a client to use a custom resolution rule (though this isn't yet exposed in any of our clients), which would let you do smarter things like look at the changes that were made and decide based on app-specific logic which one should win. I personally think that's a big advantage, but as I said, it's not one that's yet been fully realized.

boomzilla · on Aug 30, 2013

Awesome, this is one of the best explanations of version conflict resolution in distributed storage.

The best engineers not only needs to write great code, they need to write the best documentations too :)

joejohnson · on Aug 30, 2013

I might not understand what the Datastore API is for, but if I wanted to make a simple app that could sync to Word Docs between to clients, would these .doc files get represented in this database structure (tables, records and values)? Is that data structure just for example purposes?

How would this algorithm handle changes to the same text file? Is the file a record, and each row considered a value in that record?

EDIT: Looks like this API is only for "structured data like contacts, to-do items, and game state." https://www.dropbox.com/developers/datastore

smarx · on Aug 30, 2013

Your edit is correct. To sync files, you would use the Core API or Sync API.

(https://www.dropbox.com/developers/core and https://www.dropbox.com/developers/sync)

joejohnson · on Aug 30, 2013

I'd be interested to hear how conflicts are handled with those APIs.

smarx · on Aug 30, 2013

The API uses (optional) optimistic concurrency. So when you write a file, you can specify the expected parent revision, and if the revisions don't match, the operation will fail. If you want, you can have Dropbox instead just overwrite whatever file is present on the server or write the new file but rename it in the process (e.g. "myfile (1).txt").

This is pretty much all that's possible to do generally for files, since their structure is opaque. One of the ideas behind the Datastore API is to support developers structuring their data in such a way that changes can be merged automatically.

joshuak · on Aug 30, 2013

Why does chrome prompt me for keychain access on this page?

I'm not logged in, and it doesn't log me in.

minikites · on Aug 30, 2013

I've heard around (I don't use Chrome personally) that Chrome just prompts for keychain access until you give up and grant it access all the time.

joshuak · on Aug 30, 2013

Yes, but even then it only needs to prompt when it wants information from the keychain to fill in forms and such. The question is why would this blog page trigger such an event?

iza · on Aug 30, 2013

There is a sign in dropdown form in the upper right.