Differential Datalog (github.com/vmware)
231 points by maximilianroos on March 20, 2021 | 62 comments



I always fancied a database that worked like a dynamic version of Prolog, with facts that could be added over time, queries that could use inference, etc. I couldn't really find anything that worked like that, though; everything seemed to either have a fixed set of facts or not support inference. Looking for something like that is what first got me interested in differential dataflow.

The fact that this is bottom up makes me think that you probably can't go crazy with the number of rules.


Prolog provides this with so-called dynamic predicates.

For example, if we declare the predicate f/1 dynamic using the standard directive dynamic/1:

    :- dynamic(f/1).
then we can assert new clauses for f/1 using the standard predicates asserta/1 and assertz/1:

    ?- asserta(f(a)), assertz(f(b)).
       true.
We then get:

    ?- f(X).
       X = a
    ;  X = b.
Rules can also be asserted:

    ?- use_module(library(lists)), asserta((f(X):-length(_,X))).
       true.
yielding:

    ?- f(X).
       X = 0
    ;  X = 1
    ;  X = 2
    ;  X = 3
    ;  ...
I used Scryer Prolog to run these examples.

Dynamic changes to the database have advantages and drawbacks. On the plus side, they let us incorporate new knowledge and rules, and we automatically benefit from built-in features such as argument indexing. However, running the same query twice can lead to different results, changes in the order of goals can affect the meaning of the conjunction, and we can no longer reason about different predicates in isolation. Dynamic changes of the database are said to be side-effects, since they affect the program in a way that is outside the logical interpretation of the given definitions.


>> Dynamic changes to the database have advantages and drawbacks. On the plus side, they let you incorporate new knowledge and rules, and you automatically benefit from built-in features such as argument indexing. However, running the same query twice can lead to different results, changes in the order of goals can affect the meaning of the conjunction, and you can no longer reason about different predicates in isolation. Dynamic changes of the database are said to be side-effects, since they affect the program in a way that is outside the logical interpretation of the given definitions.

I wonder if that is part of the benefits advertised by DDlog: the ability to update a program without having to worry about clause ordering (which one does not have to with datalog and bottom-up evaluation) or about side effects in general.

Very tentatively, I'm guessing that, since datalog programs lack negation, any new facts or rules added to the (extensional) database will update the "output" of the database (its Herbrand model) monotonically, i.e. you'll see new facts coming out but no facts will "disappear". I'm a bit unsure because the DDlog GitHub page talks about functions and typed datalog and I'm not really sure how that works now.


Most modern datalogs, this one included, support negation, with some limitations.


Well, of course you can have negation and functions in "a" datalog language (there are different variants), but the point is that bottom-up evaluation is then no longer decidable.

Out of curiosity, what kind of negation do you mean? Negation as failure or classical negation? And what do you mean about limitations?


Hmm, I thought stratifying Datalog allowed bottom-up evaluation to remain decidable despite the presence of negation? And surely there are at least some kinds of functions that would also preserve decidability, even without stratification?

I admit I don't know very much about Datalog, and I know you know a substantial amount, so maybe I'm just confused. But I'd like to understand how.


Eh, to be honest I don't really know a substantial amount about Datalog. My specialty is Inductive Logic Programming, and I'm interested in Datalog because its decidability makes it a convenient representation language for machine-learned logic programs. A very simple version of Datalog is sufficient for my purposes so I haven't really dabbled much in variations or more modern versions.

So I might well be wrong about decidability depending on the absence of negation. Now that I think of it again, the problem with negation, in particular negation-as-failure (NAF) in Prolog, is that it breaks the monotonicity of inference. Monotonicity in this context means that the model of a logic program (the set of facts that are immediate consequences of the facts and rules in the program, and that are derived in a bottom-up fashion with a TP operator in, let's say, "traditional" Datalog) can only increase with the introduction of new facts. With NAF, this is not the case: the introduction of a new fact may make a previously derived true fact now false. So maybe the problem with negation, at least NAF, is that it breaks the soundness of the inference procedure, because facts already derived as true become false when new facts are derived.

I give a (very crude, sorry) example of how Datalog's bottom-up evaluation works in an earlier comment:

https://news.ycombinator.com/item?id=26522737

In short, bottom-up evaluation proceeds in discrete steps where at each step a new set of facts is derived from the facts and rules known so far. Newly derived facts are added to the program, so the set of facts in the program increases until no new facts can be derived. Without NAF, when a new set of facts is derived and added to the program, the truth value of already derived facts cannot change. But _with_ NAF, bottom-up evaluation may introduce a new fact in step k that makes a fact derived in step k - j false. So now we have an unsound derivation procedure.
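
To make that concrete, here's a toy sketch in plain Python (my own hypothetical three-fact program, nothing to do with DDlog's actual evaluator) of a naive bottom-up loop that just accumulates facts, and so keeps b around even after c invalidates its derivation:

    # Hypothetical toy program:
    #   fact:  a.
    #   rule1: b :- a, \+ c.   (NAF on c)
    #   rule2: c :- a.
    def step(facts):
        new = set(facts)
        if "a" in facts and "c" not in facts:  # rule1 fires only while c is unknown
            new.add("b")
        if "a" in facts:                       # rule2
            new.add("c")
        return new

    facts = {"a"}
    while True:
        nxt = step(facts)
        if nxt == facts:
            break
        facts = nxt

    print(facts)  # {'a', 'b', 'c'}: b was derived at step 1, before c existed;
                  # naive accumulation never retracts it, although c now
                  # falsifies rule1's body.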

I wonder if this unsoundness actually translates to undecidability. Suppose the above happens - we introduce a new fact A and some existing fact B becomes false. What can we do to avoid this? Well, we can't know the truth value of B after the derivation of A before actually deriving A (because we don't know the truth of A before we can derive it), so we can't avoid deriving A. What we can do is go back and re-evaluate the truth of each derived fact, find that B is now false, and remove it from the program. But what if removing B allows a new fact, C, to be derived which was previously false because of B, and C is such that A is now false? Well, A cannot be derived given C, and without A we must derive B. Which means we have to get rid of C again and allow A back in. So now we're stuck in a loop.

I don't know if this is actually something that can happen or not, I was just thinking through the problem right now given your question, so I may be talking nonsense. In any case, if there is such a problem with NAF, then maybe modern datalogs have a different negation scheme, allowing classical negation (as does Answer Set Programming). But I really don't know about that, hence my question to the OP about the kind of negation they mean.

Which was a genuine question btw! I really don't know how modern Datalogs work.

Anyway I'll go check a few sources and see if my sketch proof of undecidability above makes any sense and I'll come back to let you know what I find :)


So this might take a while: there is a lot of literature on datalog variants with and without negation or recursion etc. I will have to check with some colleagues. I'll try to come back to you by Tuesday at the latest.

In the meantime you might find the following reference useful (or not, because it's typically thick with opaque notation):

http://webdam.inria.fr/Alice/

It's a textbook on databases. There are several chapters on datalog, including negation and recursion, which may be of some help, but so far I can't form an intuition about when datalog is decidable with negation. I think it is, but with restrictions, e.g. stratification, as you say, or restricting to non-recursive programs, disallowing nonmonotonic updates (as in my sketch proof above), etc. Indeed, it's a bit frustrating because the literature seems to be endlessly fragmented into sub-languages with different properties that are difficult to summarise concisely, and I just could not find a straightforward "this works" answer in the couple of hours I've looked since my previous comment.

As I say in my comment above, there is a substantial amount I _don't_ know about datalog :)


Ouch. There are 6 upvotes on this comment, which I take to mean there are (at least) 6 people waiting to hear what I come up with.

Unfortunately, dear co-HN'ers, I've come back to you empty-handed (and two days late... oh god). I contacted a colleague who is much more knowledgeable about bottom-up evaluation than me but I haven't heard back yet, and that contact, plus my light reading of sources, is really 90% of the bandwidth I have to devote to this right now. I have a paper deadline on the first of May that necessarily has to take up most of my brain. If any of you 6 (or anyone else!) is still interested by that time, please email me at the email in my sig and I'll be happy to dig further then, because this is something I'm actually interested in, too.

Sorry for letting everyone down.


You're not letting us down! We're collaboratively exploring a fascinating space of promising ideas!

Please don't feel obligated to answer the following before May, or even read it.

I didn't realize the space of Datalogs was so broad—I had thought that there was a single "stratified semantics" that was the standard. The stratified semantics that I knew about permits bottom-up evaluation with negation as failure to be decidable by putting negations of any predicate X/n into a stratum strictly higher than X/n (which is evaluated later), and non-negated consequences of X/n into a stratum equal to or higher than X/n's stratum.

Like, as I understand it, if we have

   foo(X) :- bar(X), \+ baz(X, 3).
   quux(X) :- foo(X).
then we put baz/2 in a lower stratum than foo/1, and quux/1 in a stratum not lower than foo/1. That way, by the time we are trying to infer foo/1 facts and quux/1 facts, we've already finished inferring all the baz/2 facts. Then, there's no way that we can infer more baz/2 facts in the future that could invalidate our foo/1 or quux/1 inferences.

And this avoids the kind of non-monotonic loop you're talking about: if inferring fact A invalidates the inference of fact B, then B is in a strictly higher stratum than A, so we haven't inferred B yet at the time that we infer A.

I think that with unrestricted negation the inconsistency or nontermination problem you mention does exist, in which there exists no (consistent) model for a program (is that the right terminology?); the simplest example would be:

    epimenides :- \+ epimenides.
The stratification restriction avoids this because it would require the stratum of epimenides/0 to be strictly greater than itself, which a standard CLP(FD) solver like SWI-Prolog's will easily tell you cannot be done:

    ?- use_module(library(clpfd)).
    %   library(error) compiled into error 0.00 sec, 17,872 bytes
    %  library(apply) compiled into apply 0.00 sec, 29,088 bytes
    %  library(assoc) compiled into assoc 0.00 sec, 36,240 bytes
    %  library(lists) compiled into lists 0.00 sec, 25,304 bytes
    %  library(pairs) compiled into pairs 0.00 sec, 9,040 bytes
    % library(clpfd) compiled into clpfd 0.03 sec, 736,792 bytes
    true.

    ?- X #> X.
    false.
I think there's an additional problem as well, corresponding to Henkin sentences—programs for which models do exist, but for which there is no unique minimal model. For example, we could try to formalize George W. Bush's foreign policy:

    withUs(You) :- person(You), \+ withTheTerrorists(You).
    withTheTerrorists(You) :- person(You), \+ withUs(You).
    person(chirac).
Now we have two minimal models, one in which person(chirac), withUs(chirac) and one in which person(chirac), withTheTerrorists(chirac). (Thus we can explain Freedom Fries, one of the most ridiculous and terrifying aspects of the politics of the early 02000s.) Further inference rules might rule out one, the other, or both of these models.

In the same way, the stratification rule rejects this program: withUs/1 needs to be above (evaluated later than) withTheTerrorists/1, but also vice versa. I'm not sure how to state this so SWIPL's CLP(FD) can tell that it's unsolvable, but that's probably because I'm a total noob at logic in general:

    ?- X #> Y, Y #> X.
    X#=<Y+ -1,
    Y#=<X+ -1.

    ?- X #> Y, Y #> X, label([X, Y]).
    ERROR: Arguments are not sufficiently instantiated
Of course it is not a hard problem to assign strata—a straightforward topological sort will immediately reject the loop.
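
For instance, here's one way the check could go, sketched in plain Python as a longest-path-style fixpoint rather than an explicit toposort (hypothetical rule triples, not any real engine's code); a cycle through a negation pumps the strata past the number of predicates and gets rejected:

    # Each rule is (head, positive_body_preds, negated_body_preds).
    def stratify(rules, preds):
        stratum = {p: 0 for p in preds}
        changed = True
        while changed:
            changed = False
            for head, pos, neg in rules:
                need = max([stratum[p] for p in pos] +
                           [stratum[n] + 1 for n in neg] + [0])
                if stratum[head] < need:
                    stratum[head] = need
                    changed = True
            if max(stratum.values()) > len(preds):
                return None            # negation inside a cycle: unstratifiable
        return stratum

    # foo(X) :- bar(X), \+ baz(X, 3).   quux(X) :- foo(X).
    print(stratify([("foo", ["bar"], ["baz"]), ("quux", ["foo"], [])],
                   ["foo", "bar", "baz", "quux"]))
    # {'foo': 1, 'bar': 0, 'baz': 0, 'quux': 1}

    # withUs/1 and withTheTerrorists/1 negating each other:
    print(stratify([("withUs", ["person"], ["withTheTerrorists"]),
                    ("withTheTerrorists", ["person"], ["withUs"])],
                   ["withUs", "withTheTerrorists", "person"]))
    # None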

The stratification rule is conservative, though; not only will it reject all the undecidable problems, it will also reject some programs where a unique minimal model does exist; as a trivial example:

    withUs(chirac).
    withUs(You) :- person(You), \+ withTheTerrorists(You).
    withTheTerrorists(You) :- person(You), \+ withUs(You).
    person(chirac).
So it makes sense that people would look for looser restrictions that still guarantee decidability.

Anyway, that's my understanding of the situation! I might be mistaken about basic things.

Thanks for the reference to the book! It looks great and I'll try to read it. Or, I guess, work through it.


>> In the same way, the stratification rule rejects this program: withUs/1 needs to be above (evaluated later than) withTheTerrorists/1, but also vice versa. I'm not sure how to state this so SWIPL's CLP(FD) can tell that it's unsolvable, but that's probably because I'm a total noob at logic in general:

    ?- X #> Y, Y #> X.
    X#=<Y+ -1,
    Y#=<X+ -1.

    ?- X #> Y, Y #> X, label([X, Y]).
    ERROR: Arguments are not sufficiently instantiated
I'm not very sure about this either; I haven't used CLP(FD) much. triska, the OP in this thread, is the author of the CLP(FD) library and he could help, but I guess he's probably moved on from this thread by now.


>> You're not letting us down! We're collaboratively exploring a fascinating space of promising ideas!

Thanks, that's a great kind of attitude to have. dang (always a champion of intellectual curiosity) will be proud of us :)

I did have a quick look through your comment and I think it's insightful and more correct than not, far as I can tell.

About this:

  epimenides :- \+ epimenides.
That is a contradiction and so does not have any model, as you say. While stratification would indeed avoid this, deriving an empty model (or failing to derive a model) is correct behaviour, because the formula is unsatisfiable. However, it is a well-formed formula, certainly in First Order Logic, so it's not an invalid program per se. In general, contradictions and tautologies tend to be treated specially, or omitted from discussion in logic programming theory, I find, which makes sense because they're kind of obvious corner cases that nobody wants to spend too much time on. "p if not p" should always fail quickly.

Regarding minimal models, according to the book I linked, there are two datalogs with negation, datalog¬ (which does not allow recursion) and datalog¬¬, and they indeed eliminate the guarantee that a program has a unique minimal model under fixpoint semantics: so while a program is still guaranteed to have _a_ fixpoint that is reachable in finite time, this fixpoint is not guaranteed to be unique or minimal.

As to my sketch proof of undecidability, that would actually not work in pure datalog, where the deletion of facts is not allowed. Deletion of facts is allowed under datalog¬¬, whose syntax accepts rules with negated _heads_ that are interpreted as signifying a fact should be deleted. There are different options for the treatment of negated rule heads (e.g. deriving A and ¬A simultaneously can be interpreted as a contradiction that makes the program unsatisfiable), but now the guarantee of termination is lost because there is no guaranteed fixpoint. This is all from chapter 14 of the Alice book, and the example given is of a program that flip-flops between two models, much simpler than my three-atom case above, which I think after all is probably fictitious :)

>> I think there's an additional problem as well, corresponding to Henkin sentences—programs for which models do exist, but for which there is no unique minimal model.

I didn't know about Henkin sentences and I don't know how datalog (of any kind) semantics deal with them. An interesting lead for further reading, thanks.


Thank you! I'll be playing around with these ideas. BTW, I sent you an email, so we can correspond further when you aren't on a paper deadline :)


Thanks, yeah, I saw your email last night. Was too tired to reply. Well, please accept this as a reply, and thank you for getting in touch :)


The notable limitation being that you can't combine negation and recursion.


Thanks, I think that's the main intuition to draw from this. But I figured OP (kragen) wanted to understand the why, also.


Indeed!


This is interesting and Scryer Prolog looks like a great project. I wonder if it's possible to run it in a DB-like mode - with durable changes, and accessible as a server.


There is already https://developer.logicblox.com/ but it's proprietary, owned by Infor nowadays.


I have been very interested in CHR (constraint handling rules) for this kind of activity. It was developed (I think) to provide a systematic basis for the various constraint libraries to 'speak the same language' instead of being implemented without knowledge of each other.

I don't see much literature out there on it (after the mid-2000s), but it always seemed a conceptually elegant way of doing forward-chaining (and inference) in the way of most rules engines. It only has a few simple rule types: simplification (removes data/rules), propagation (adds new data), and simpagation.

The best part is that it embeds very easily in Prolog.


The company I work for, Grakn Labs (grakn.ai), builds a database that does datalog + negation type rules! You can add/remove data as you go, and also add/remove rules in the schema over time. The terminology the post here uses is different from ours, but roughly we do "backward chaining", which works from the query back to the set of facts that were inserted, which the article calls "top-down".


Do you handle incremental changes to the input data? Because that's the major selling point of DDlog: low-latency responses to changes in inputs.


We don't do this right now. Optimising for incremental changes is actually better done by maintaining forward-chaining inferences (like the OP) rather than backward chaining, and then invalidating and updating just the inferences that have modified dependencies. Which you want often depends on the use case: if your data changes frequently, backward chaining is generally better, but if it's largely static you can get large performance wins by materialising all the inferences once, via forward chaining.


Isn’t datalog with backward chaining just prolog? Datalog was known for forward chaining (bottom-up) while prolog was known for backward chaining; datalog with backward chaining sounds like a misnomer.


You're not wrong. The other big difference between datalog and prolog is that datalog is a subset of prolog's functionality that enables forward chaining to be safe (which gets trickier with negation, for example). I was mostly referring to the fact that Graql implements datalog semantics/capabilities, plus negation, but the execution strategy is backward chaining.


This gives you dynamic facts, feed-forward rule application, and soon even dynamic rules.

It doesn't actually care much about the number of rules, and more about how much memory it costs to materialize them. There are delta-join techniques to do at least most of that in linear memory (w.r.t. input size) though, and DDlog has at least experimental support for them.


It's great to see differential-dataflow and timely-dataflow (both by Frank McSherry) come to life recently with this and materialize.com


Aside from DDlog's custom syntax, how much semantic overlap is there with query languages of the Datomic/Crux/Datalevin/Datascript flavour? Apart from the differential features. This one seems to be big on static type definitions at least.

edit: I guess this explains some of the static nature: "A DDlog program is compiled into a Rust library"


You don't really use dynamic datalog queries with DDlog. If you can't afford to keep a materialized view up to date (which would turn your queries into simple KV-lookup operations), you instead write rules that handle your queries in a streaming fashion: the "prepare" statement happens at compile time, then you stream tuples of parameters at runtime and listen for outputs.

But, do not worry: Support for run-time rule updates is in the works and would allow for arbitrary dynamic datalog queries.


So could you use this to implement always-up-to-date materialized views in Postgres? It says it uses relational data for inputs and outputs. It says it operates in-memory but also that it starts up from a set of known facts (which sounds like it could just read the current matview at startup). This would be a really cool extension for someone to build!


Yes. Almost exactly what you're describing exists [0] already.

[0]: https://materialize.com/


Looks like it’s Postgres protocol compatible rather than running as a Postgres extension?

I’ve been keeping an eye on work that’s been done to support this internally in Postgres. Looks like it’s still moving along. I guess it’s a pretty tricky puzzle (especially once you add aggregates). Depending on the performance implications, it would sit in a pretty sweet spot for some use cases. For example, at the moment I have a form of materialised view that’s simply too slow to build from scratch. My compromise is to run it periodically, figure out which subset of the table needs refreshing and update just those rows.

https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...


It doesn't run as an extension (at least that's not the typical deployment, and I'd be surprised if it did).

You can feed a CDC stream to Materialize and either feed the output back to Postgres, or just query it directly in Materialize (via a normal Postgres client).

It's very likely that you can cut the typical maintenance latency down below one second. Currently that likely involves using a Kafka broker between the two databases, but feel free to plop in the community Slack [0].

[0]: http://materialize.com/s/chat


What a horrible landing page. Is it a standalone product? Does it integrate with my SQL database? Is it a server-side application or a client-side thing? What is the pricing model? So many questions, zero answers.

Am in the process of exploring Kafka to do something similar, and I think it will work, but it requires quite a complex setup (compared to an incrementally updated materialized view).


It's typically integrated with Kafka sources (and often also Kafka sinks), but you can just use any normal PostgreSQL client driver and normal SQL.

It does integrate with your postgres database, but logical replication target mode is still a TODO. Don't worry though, this only means the change data stream has to go through a Kafka broker. There should be a guide on this (if not, plop in the community Slack and ask).

The license allows for free single-node deployments; the SaaS variant is currently in beta, so you'll have to ask. On-premise clusters for production usage are "contact sales", but AFAIK that's due to the low sales volume (it turns out vertical scaling is easy and often enough), so don't let that scare you off.


>> It says it operates in-memory but also that it starts up from a set of known facts (which sounds like it could just read the current matview at startup).

That "it starts up from a set of known facts" refers to the "bottom-up" evaluation of datalog programs, in contrast with the top-down evaluation in Prolog.

Consider for instance the following database of facts and rules:

  grandparent(X,Y):- parent(X,Z), parent(Z,Y). % (1)
  parent(maria, kostas).
  parent(kostas,yannis).
Prolog's top-down evaluation starts with the "head" of rule (1), grandparent(X,Y) and proceeds to its "body", (parent(X,Z), parent(Z,Y)) to instantiate the variables {X,Y,Z} according to the facts in the database and derive the new fact grandparent(maria,yannis).

Datalog's bottom-up evaluation instead starts with the facts in the program: parent(maria,kostas), parent(kostas, yannis) and adds those to the set of immediate consequences of the program (TP is called the "immediate consequence operator"):

  TP₀ = {parent(maria,kostas), parent(kostas,yannis)}
Now the "parent" facts in the body of the "grandparent" rule can be instantiated as follows:

  (parent(maria,kostas), parent(kostas,yannis)).
And finally the head of the rule can be instantiated to derive the same new atom as with Prolog's top-down evaluation:

  TP₁ = TP₀ ∪ {grandparent(maria,yannis)}
The advantage of bottom-up evaluation is decidability, as long as a program is datalog, which means no function symbols other than constants (which are function symbols of arity 0, i.e. with 0 arguments) and no negation. For instance, given the following program with a left-recursive clause, datalog's bottom-up evaluation terminates:

  ancestor(X,Y):- parent(X,Y). % (2)
  ancestor(X,Y):- ancestor(X,Z), ancestor(Z,Y). % (3)
  parent(maria,kostas).
  parent(kostas,yannis).
Given this program, datalog's bottom-up evaluation will first add the two facts of "parent" to the immediate consequences of the program; then instantiate the fact parent(X,Y) in the body of rule (2) to the two facts of "parent" in the program; and finally instantiate the head of rule (2), deriving two new facts of "ancestor":

  TP₁ = {ancestor(maria,kostas), ancestor(kostas,yannis), parent(maria,kostas),
  parent(kostas,yannis)}
Then the two facts of "ancestor" in the body of rule (3) will be instantiated to the facts derived in the previous step and a new fact derived from the instantiation of the variables in the head of the rule:

  TP₂ = TP₁ ∪ {ancestor(maria,yannis)}
And then the program will terminate because there are no more new facts to derive.
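
A very crude sketch of that fixpoint loop in plain Python (my own hypothetical encoding of the facts as tuples, nothing to do with how any real datalog engine represents them), just to show that it terminates:

    facts = {("parent", "maria", "kostas"), ("parent", "kostas", "yannis")}

    def tp(db):
        derived = set(db)
        # ancestor(X,Y) :- parent(X,Y).
        for (rel, x, y) in db:
            if rel == "parent":
                derived.add(("ancestor", x, y))
        # ancestor(X,Y) :- ancestor(X,Z), ancestor(Z,Y).
        for (r1, x, z1) in db:
            for (r2, z2, y) in db:
                if r1 == r2 == "ancestor" and z1 == z2:
                    derived.add(("ancestor", x, y))
        return derived

    while True:
        new = tp(facts)
        if new == facts:   # fixpoint: no more new facts, so we stop
            break
        facts = new

    print(sorted(facts))   # the two parent facts plus ancestor(maria,kostas),
                           # ancestor(kostas,yannis) and ancestor(maria,yannis)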

By contrast, Prolog's top-down evaluation will first correctly derive the two new "ancestor" facts from the head of rule (2) but then will try to evaluate the first ancestor(X,Z) fact in the body of rule (3). This will take it back to the head of rule (3) and evaluation will be stuck in an infinite recursion.

The trade-off is that bottom-up evaluation cannot handle negation or functions (including the cons function used to construct lists) and so is, in that sense, incomplete, whereas Prolog can handle both. For this reason, datalog is preferred as a database language that offers a limited ability for reasoning (certainly richer than what is possible e.g. with SQL, which essentially only allows facts), whereas Prolog is preferred for general-purpose programming.


I wonder how DDlog compares to Souffle. I think the biggest difference is that Souffle is batch-oriented. It is less clear how different the expressiveness, type systems or recommended patterns are. The syntax of DDlog seems to be closer to Souffle than to Datomic with EDN.


I don't know anything about Souffle so I can't directly answer the question. But there is some related material.

The DDlog repo includes a Souffle-to-DDlog converter: https://github.com/vmware/differential-datalog/blob/v0.38.0/...

The Souffle converter is somewhat incomplete: https://github.com/vmware/differential-datalog/issues/174


Very interesting! I started playing with Rust, timely-dataflow and differential-dataflow last weekend, but it was a lot to learn all at once. This looks like it's solved or suggested a path to productionizing differential-dataflow without needing a separate rust service and lots of operation tooling. It would be cool to use something like this for Notion's user-defined databases and queries.


You'll be happy to hear that logic/rule updates at runtime recently transitioned from the "todo... at some time" to the "todo... working on it" state.

Do these databases ever grow beyond what you can afford to keep in RAM/swap? I'm working towards high-throughput out-of-core support in differential-dataflow, but can't get much actual coding done in my off time. Understanding/concepts/discussions are not affected, so feel free to contact me.


I would love to have the option to suspend computation for unwanted outputs somehow (to disk) so I can scale way, way down with only partial data. My solution for “fits in RAM” is just “shard it”... but really I haven’t gotten far enough in my experiments to need to yet.


> You'll be happy to hear that logic/rule updates at runtime recently transitioned from the "todo... at some time" to the "todo... working on it" state.

What exactly do you mean by "logic/rule updates"? ...in dd I can change/modify the actual dataflow?


Rego, the language used in Open Policy Agent, is based on and extends Datalog. It's gaining a lot of traction in the past couple of years for evaluating authorization policies.


There's an indirect relationship between Rego and DDlog, at least people-wise. OPA comes from Styra, which was founded by Tim Hinrichs and Teemu Koponen. Teemu designed nlog, which is also a Datalog variant, for use in NVP, which was the network virtualization product at Nicira and was later renamed NSX after VMware acquired Nicira. Tim also worked with Teemu (and me) on NSX at VMware. And Teemu was one of the architects of OVN (the biggest open source application for DDlog), with Tim also having some influence. And Teemu also knows Leonid and Mihai (the principal authors of DDlog).

Some of the episodes of my OVS Orbit podcast may be relevant:

* Episode 5: nlog, with Teemu Koponen from Styra and Yusheng Wang from VMware (https://ovsorbit.org/#e5)

* Episode 44: Cocoon-2, with Leonid Ryzhyk from VMware Research (https://ovsorbit.org/#e44)

* Episode 58: Toward Leaner, Faster ovn-northd, with Leonid Ryzhyk from VMware Research Group (https://ovsorbit.org/#e58)


Oso[1] is another authorization language, but based on Prolog. It'd be interesting to compare the capabilities between the two.

[1] https://www.osohq.com


I'm curious what VMware uses this for.


I don't know how much I can say, since I work for VMware, but it's finding new uses all the time and I've personally brought it up with other teams.

I'd known of it for a while before seeing a presentation on how one service team used it. They had a stream processing problem where they were beginning to lag the input by up to an hour. With differential datalog the lag dropped to sub-second.

It's seriously cool technology.


I'm told it's primarily used for "networking stuff and big data processing", but "log processing" and "network switches" were also mentioned as (I assume active) use-cases.


Section 4 of [1] lists applications, including OVN, though it appears to be used primarily for experimental or research projects at the moment. That should change given enough time… maybe?

1: https://github.com/vmware/differential-datalog/blob/master/d...



Yes, and I'm really impressed with what we've been able to do with it in OVN lately. The patches aren't posted yet (probably Monday) but (with Leonid Ryzhyk's help) I managed to take one benchmark from about 110 seconds in the C version of OVN to about 35 seconds with DDlog.

We have other internal-to-VMware use cases for DDlog. The OVN use case is the one that's easiest to evangelize since it's all open source.


Maybe for hot migration?


Pretty sure not (posted in reply to the sibling). This is a VMware research project, but it looks like the target use case is OVN:

https://blogs.vmware.com/opensource/2018/07/26/open-source-o...

https://blogs.vmware.com/research/2020/12/07/whats-new-diffe...

https://twitter.com/Ben_Pfaff/status/1047905803250724865


Oh interesting, thanks!


Speaking with my VMware hat on, I don't know of use cases there.


A related project is declarative dataflow, which is an interactive and reactive dataflow engine built on top of differential dataflow. https://github.com/comnik/declarative-dataflow


Is this something like LINQ in C#? Declarative programming over arbitrary structures? From skimming the examples I don't really see the advertised incremental aspect; could someone explain it a bit more?


It's partially inspired by LINQ, so the similarity you see is expected.

It's not really arbitrary structures, though you're mostly free in what record type you use in a relation (structs and tagged enums are typical).

The incremental part is that you can feed it changes to the input (additions/retractions of facts) and get changes to the outputs back with low latency (you can alternatively just use it to keep an index up-to-date, where you can quickly look up based on a key (like a materialized view in SQL)).

This [0] section in the readme of the underlying incremental dataflow framework may help get the concept across, but feel free to follow up if you're still not seeing the incrementality.

[0]: https://github.com/TimelyDataflow/differential-dataflow#an-e...
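
To give a flavour of "changes in, changes out", here's a toy sketch in plain Python (not differential-dataflow's actual API, just a hypothetical distinct-keys view maintained over a stream of (record, +1/-1) updates); the view only emits its own deltas, never a full recomputation:

    from collections import defaultdict

    counts = defaultdict(int)   # multiplicity of each input record
    per_key = defaultdict(int)  # derived: number of records per key

    def apply_change(record, delta):
        """Apply an addition (+1) or retraction (-1) and report any output delta."""
        key, _ = record
        counts[record] += delta
        old, per_key[key] = per_key[key], per_key[key] + delta
        if old == 0 and per_key[key] > 0:
            print(f"+ key {key!r} enters the view")
        elif old > 0 and per_key[key] == 0:
            print(f"- key {key!r} leaves the view")

    apply_change(("alice", "msg-1"), +1)   # + key 'alice' enters the view
    apply_change(("alice", "msg-2"), +1)   # (no output change)
    apply_change(("alice", "msg-1"), -1)   # (no output change)
    apply_change(("alice", "msg-2"), -1)   # - key 'alice' leaves the view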


Building it from source looks tricky but the concepts are quite interesting.


If you're talking about building the DDlog-to-Rust compiler, then it's not that hard. Really it's just a matter of installing Haskell then typing "stack build". But each release also comes with binaries for GNU/Linux, OS X, and Windows, e.g. for 0.38.0: https://github.com/vmware/differential-datalog/releases/tag/...


I wonder how long before someone makes a SaaS.


That exists[0] already, but with Postgres-compatible SQL.

But the idea of SaaS-ing the datalog variant is interesting.

[0]: https://materialize.com/



