My questions were in the context of effective caching in the query process, sorry that wasn't clear. A process on a single machine somewhere has to get enough of the indexes and data into memory to answer questions about it. Though I guess there are other types of queries you might want to do, like streamy mapreduce type stuff that doesn't need the whole log.
I will have to think about if a key-value-time storage backend would have any implications in Datomic. One thing Datomic Pro doesn't have is low-cost database clones through structure sharing, and it doesn't make sense to me why.
> A process on a single machine somewhere has to get enough of the indexes and data into memory to answer questions about it.
This common challenge is perhaps magnified in systems that have deep storage indexes. For example, the link you posted seems to suggest that queries in Datomic may have cache misses that require pointer chasing down into storage, adding latency. Depending on where that latency cost is eaten, it could have a lot of side effects (e.g. long running query vs reducing transaction throughput).
This problem is likely exacerbated in the key-value store running on zlog because the red-black tree can become quite tall. Aggressive, optimistic caching on the db nodes, and server-side pointer chasing helps. It's definitely an issue.
I don't know anything about Datomic internals, so disclaimer, I'm only speculating about why Datomic doesn't have low cost clones: which is that the underlying storage solution isn't inherently copy-on-write. That is, Datomic emulates this so when a clone is made there isn't an easy metadata change that creates the logical clone.
Despite the lack of features etc in the kv-store running on zlog, database snapshots and clones are both the same cost of updating the root pointer of the tree.
I will have to think about if a key-value-time storage backend would have any implications in Datomic. One thing Datomic Pro doesn't have is low-cost database clones through structure sharing, and it doesn't make sense to me why.
Here is a description of some storage problems that arise with large Datomic databases, though it's not clear to me if ZLog can help with this < http://www.dustingetz.com/ezpyZXF1ZXN0LXBhcmFtcyB7OmVudGl0eS..., >