My questions were in the context of effective caching in the query process, sorr...

noahdesu · on Nov 12, 2017

> A process on a single machine somewhere has to get enough of the indexes and data into memory to answer questions about it.

This common challenge is perhaps magnified in systems that have deep storage indexes. For example, the link you posted seems to suggest that queries in Datomic may have cache misses that require pointer chasing down into storage, adding latency. Depending on where that latency cost is eaten, it could have a lot of side effects (e.g. long running query vs reducing transaction throughput).

This problem is likely exacerbated in the key-value store running on zlog because the red-black tree can become quite tall. Aggressive, optimistic caching on the db nodes, and server-side pointer chasing helps. It's definitely an issue.

I don't know anything about Datomic internals, so disclaimer, I'm only speculating about why Datomic doesn't have low cost clones: which is that the underlying storage solution isn't inherently copy-on-write. That is, Datomic emulates this so when a clone is made there isn't an easy metadata change that creates the logical clone.

Despite the lack of features etc in the kv-store running on zlog, database snapshots and clones are both the same cost of updating the root pointer of the tree.