Very interesting. I came to many similar conclusions completely independently, and even attempted to build a type-safe in-memory Datalog in TypeScript.
I also came to the conclusion that just exposing Datalog triples as a query language would never feel right, and tried to expose a GraphQL-like language that generated the Datalog triples.
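A minimal sketch of that idea, assuming a hypothetical nested query shape where each key names a reference attribute to follow: the GraphQL-like tree is walked depth-first and compiled into Datalog-style triple patterns, with a fresh variable per nested entity. The attribute names are borrowed from the pull example quoted later in this thread; `Query`, `Pattern`, and `compile` are illustrative names, not any library's API.

```typescript
// A triple pattern: [entity, attribute, value]; strings starting with "?" are variables.
type Pattern = [string, string, string];

// Hypothetical nested query shape: each key is a reference attribute to follow.
interface Query {
  [attribute: string]: Query;
}

// Walk the nested query depth-first, emitting one pattern per edge
// and allocating a fresh variable for each child entity.
function compile(query: Query, rootVar = "?e0"): Pattern[] {
  const patterns: Pattern[] = [];
  let counter = 0;
  const walk = (q: Query, parentVar: string): void => {
    for (const [attribute, child] of Object.entries(q)) {
      counter += 1;
      const childVar = `?e${counter}`;
      patterns.push([parentVar, attribute, childVar]);
      walk(child, childVar);
    }
  };
  walk(query, rootVar);
  return patterns;
}

const patterns = compile({ "team/task": { "task/owner": {} } });
// [["?e0", "team/task", "?e1"], ["?e1", "task/owner", "?e2"]]
```

The user only ever writes the nested object; the flat patterns are what a Datalog engine would actually join.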
IMO React Relay offers something similar with its normalized cache. Relay has great DX too and can be totally type-safe. To my knowledge, Datalog is way too dynamic for static analysis.
That being said, I would love to try Instant out. I'm really happy to see innovation in this area.
Thank you for the kind words! Both Joe and I are around if you have any feedback. In terms of type safety, we're thinking of it as an added layer. I'm optimistic that, since InstaQL provides objects as the interface, we could have more idiomatic types down the road.
This comment thread looks like it could benefit from Dgraph, which is an open-source database with a GraphQL layer, built on top of an RDF n-quad store (Badger).
I used to work on Firebase. I also use Firebase heavily in my work and personal projects.
I am normally quite skeptical of “Firebase but X” projects because they often seem to be running head first into the issues that the Firebase team so skillfully avoided.
This isn’t one of those projects. The author of this essay clearly gets what you need to actually build a backend for a modern app that’s also simple enough to get started with. I’m very excited about this project.
Are transactions supported in that solution or in the mentioned Datalog-related ecosystem? I have built my own similar reactive in-memory triple store with TypeScript's type safety, but then I realized I still need transactions, which are a bit of a pain: transactions bundle the otherwise separate triples, so the atomic, independent logic of triples and their related effects breaks a bit. (I am sure it's solvable.)
The example code uses a `transact` function. But it really depends on what you mean by “transaction”. We don’t use a triple store at Notion, but we do use an abstraction that ensures a collection of operations either all succeed or all fail. We don’t support “interactive” transactions, where you can read, modify, and write as an atomic group. That just isn’t desirable in a multiplayer or offline system; in cases where we need that kind of consistency, we use a normal HTTP API, which is online-only.
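The non-interactive, all-or-nothing kind of transaction can be sketched roughly like this, assuming a hypothetical in-memory triple store keyed by entity and attribute. Every operation in the batch is applied to a staged copy, and the live data is only replaced if all of them succeed. `TripleStore`, `Op`, and the "deleting a missing attribute fails" rule are illustrative assumptions, not Notion's or Instant's implementation.

```typescript
type Value = string | number | boolean;
type Op =
  | { kind: "set"; entity: string; attribute: string; value: Value }
  | { kind: "delete"; entity: string; attribute: string };

class TripleStore {
  // entity -> attribute -> value
  private data = new Map<string, Map<string, Value>>();

  get(entity: string, attribute: string): Value | undefined {
    return this.data.get(entity)?.get(attribute);
  }

  // Apply every op to a staged copy; commit only if none of them fails.
  transact(ops: Op[]): boolean {
    const staged = new Map(
      [...this.data].map(([e, attrs]) => [e, new Map(attrs)] as const)
    );
    try {
      for (const op of ops) {
        const attrs = staged.get(op.entity) ?? new Map<string, Value>();
        if (op.kind === "set") {
          attrs.set(op.attribute, op.value);
        } else if (!attrs.delete(op.attribute)) {
          // Hypothetical rule: deleting a missing attribute aborts the batch.
          throw new Error(`no ${op.attribute} on ${op.entity}`);
        }
        staged.set(op.entity, attrs);
      }
    } catch {
      return false; // one op failed: discard the whole batch
    }
    this.data = staged;
    return true;
  }
}
```

Copying the whole store keeps the sketch short; a real system would more likely stage only the touched rows or keep an undo log. Note there is no way to read inside the batch, which is exactly the "non-interactive" restriction described above.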
Cool, thanks! Now a question: how does a dev pick up the related knowledge? I have a BSc and an extra 7 years in development, but this area was totally gray to me. My motivation was only to come up with a scalable solution for offline-first apps with some kind of automatic persistence support, so at the end of the day my design goals were quite similar. How do you come up with stuff at Notion? Are there must-have books, or is it just going with gut, experience, and existing solutions you're aware of?
As jitl points out, in multiplayer settings, what you need is some way to commit a series of transactions all together or not at all. We support this.
In the case where you want to read the database inside your transaction, we take inspiration from Datomic. Datomic runs all mutations in one high-memory box. You can provide functions that run in that box. This way, you can guarantee that the reads inside your transaction have the latest value. There's a lot of UX to figure out there, and this would be something to try to avoid in an offline-available setting.
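The Datomic-inspired approach can be sketched as a single writer that runs transaction functions one at a time: the function receives the current state and returns the writes to apply, so any read inside it sees the latest committed value. This is a minimal in-memory sketch under stated assumptions: `Writer`, `TxFn`, and `increment` are hypothetical names, the state is a flat key-to-number map, and JavaScript's single thread stands in for the single transactor box. Datomic's actual transaction functions have a different API.

```typescript
// A flat key -> number state, read-only from the transaction function's view.
type State = ReadonlyMap<string, number>;
// A transaction function reads the current state and returns the writes to apply.
type TxFn = (state: State) => Map<string, number>;

class Writer {
  private state = new Map<string, number>();

  // Runs one transaction function at a time against the latest committed
  // state, then applies its writes before the next function can run.
  transact(fn: TxFn): void {
    const writes = fn(this.state);
    for (const [key, value] of writes) {
      this.state.set(key, value);
    }
  }

  read(key: string): number | undefined {
    return this.state.get(key);
  }
}

// The read-modify-write happens inside the writer, so two increments
// submitted by different clients can't both read the same stale value.
const increment =
  (key: string): TxFn =>
  (state) =>
    new Map([[key, (state.get(key) ?? 0) + 1]]);
```

This only works because every transaction reaches the writer, which is why it sits awkwardly with offline clients.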
Yes exactly this UX issue bothers me a lot, I wouldn't even go there. :D
What about migrations? Do you support them? That's another thing I need in my offline-first project; one of my other projects died because of the lack of them. (I need something that plays well with Expo.io.)
Migrations are very tough when you go offline-first. Cambria [^1] is an interesting read. For Instant, we are schemaless and think about offline more like a cache, so in our case it's less of a problem.
Thank you! Having had one of my projects die because of it (lack of migrations, offline-first), I promised myself I'd solve that problem first before writing a meaningful line of business code in a future project. :)
I'm working on building a database in the same space as InstantDB. Currently, it's an "object/graph database using Protobuf". There's a check to ensure updated Protobuf definitions are backwards-compatible. Of course, this still implicitly relies on using Protobufs correctly (i.e. a missing value is the same as zero/empty/null/nil), even though I'm trying to make it safe by default.
I'm curious what your needs are. Would you mind elaborating on what kind of migrations your project would have needed to not die?
Oh, I just realized I've already been following you. :) Is it possible that you've been using cyclejs/xstream at some point? :) Or maybe it's from the Future of Programming Slack (or whatever it's called :)).
The basic intuitions they have are correct, but it feels like there is a serious lack of, or disregard for, the theoretical background needed to talk about this stuff properly.
> (pull db '[* {:team/task [* {:task/owner [*]}]}] team-id)
That's not Datalog. It's kind of a mix between a conjunctive query and a regular path query.
Datalog isn't even really a query language. It's a class of query languages with a specific expressive power.
Whereas SQL is essentially conjunctive queries with non-stratified negation, Datalog is recursive conjunctive queries with either no negation or only stratified negation.
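To make the recursion point concrete: the classic Datalog program `reach(X, Y) :- edge(X, Y).` plus `reach(X, Z) :- reach(X, Y), edge(Y, Z).` computes transitive closure, which a single non-recursive conjunctive query cannot express. Below is a minimal semi-naive evaluation of just that one program, assuming facts are pairs of strings; it is a hand-written illustration, not a general Datalog engine.

```typescript
type Fact = [string, string];

// Semi-naive evaluation of:
//   reach(X, Y) :- edge(X, Y).
//   reach(X, Z) :- reach(X, Y), edge(Y, Z).
function transitiveClosure(edges: Fact[]): Set<string> {
  const key = (a: string, b: string) => `${a}->${b}`;
  // Base rule: every edge is a reach fact.
  const reach = new Set(edges.map(([a, b]) => key(a, b)));
  let delta: Fact[] = edges; // facts first derived in the previous round
  while (delta.length > 0) {
    const next: Fact[] = [];
    // Recursive rule: only join the newly derived facts against edge/2.
    for (const [x, y] of delta) {
      for (const [y2, z] of edges) {
        if (y === y2 && !reach.has(key(x, z))) {
          reach.add(key(x, z));
          next.push([x, z]);
        }
      }
    }
    delta = next; // reach the fixpoint when a round derives nothing new
  }
  return reach;
}
```

For the chain `a->b, b->c, c->d` this derives all six reachable pairs, including `a->d`, and it terminates on cycles because already-known facts are never re-added. Negation is absent here, so stratification doesn't come up.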
Datalog also has nothing to do with triples, they are two completely orthogonal concepts.
I've been slaving away in this very space for years, and this post heavily reminds me of my initial hubris. Building something truly foundational that can be universally implemented and perform well in all potential target languages, from JavaScript to WASM via e.g. Zig and Rust, while at the same time being both conceptually simple and easy to implement, is really, really difficult, with a lot of pain lurking in the details.
> These triples say that the Layer with id 1 has a fontSize 20 and backgroundColor blue. Since they are different rows, there’s no conflict.
This sounds a lot like Bigtable (https://cloud.google.com/bigtable), which also uses last-write-wins conflict resolution. So this is adding a GraphQL + frontend layer to it?
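The quoted idea can be sketched in a few lines: each (entity, attribute) pair is its own row, so concurrent writes to different attributes never conflict, and only writes to the same attribute are resolved by last-write-wins. This is a minimal sketch assuming each write carries a hypothetical logical `timestamp`, with `clientId` as a deterministic tiebreaker; it is not Instant's or Bigtable's actual scheme.

```typescript
interface Triple {
  entity: string;
  attribute: string;
  value: unknown;
  timestamp: number; // hypothetical logical clock
  clientId: string; // tiebreaker for equal timestamps
}

// Keep exactly one winner per (entity, attribute) row.
function merge(rows: Map<string, Triple>, incoming: Triple): void {
  const rowKey = `${incoming.entity}|${incoming.attribute}`;
  const current = rows.get(rowKey);
  const newer =
    current === undefined ||
    incoming.timestamp > current.timestamp ||
    (incoming.timestamp === current.timestamp && incoming.clientId > current.clientId);
  if (newer) rows.set(rowKey, incoming);
}
```

Because the winner depends only on `(timestamp, clientId)`, every replica converges to the same rows regardless of the order in which writes arrive.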
I'm not as familiar with Bigtable's data model. AFAIK they don't use a triple-store-like system, but their data model does look interesting. [^1]
The concept you have is on point, though. You can think of it as moving a graph database over to the frontend and introducing a GraphQL-like language for it.
You also need to fit whole-view updates into the 16 ms frame budget (not just one query, but every query on the page that is impacted by a change, as well as downstream reactive views). At some tipping point it can be faster to move relational queries to the cloud (sacrificing local-first) and treat local-first as an edge case (not all page components need to be live in offline mode; depending on the app, you may just need document edits, not relations).
The research paper looks intriguing. I'll look into it. I am not as convinced about moving queries to the cloud, when it comes to the north stars of Figma / Linear / Notion. It would be hard to get the same kind of UX.
At Firebase we sometimes pondered the feasibility of a SQL version, but SQL seems littered with semantic footguns that don't lend themselves to offline, secure, scalable, and event-driven distributed applications. We knew everybody wanted better query expressivity, but it was very difficult to see a path to delivering that in a mobile-friendly, client-side, secure package.
Triple store/Datalog actually seems like a decent compromise between SQL and NoSQL that could work out! Awesome idea!
I would really use this for many of my projects!
For project A, I used Firebase which was great initially but eventually ran into the limitations well described in Stopa's essay.
For project B, I decided to write everything from scratch, but the amount of work it required was crazy, and of course the result is quite brittle.
For some context, that blog post discusses what I saw as missing pieces of the stack circa 2020. The startup I’m building (https://driftingin.space) is roughly the “lambda for websockets” part, whereas Instant is closer to the “generalized CRDT data layer” part.
Thanks for taking a look at the essay. As you mentioned, it's indeed possible if you store all commands and sync with IndexedDB. The edge cases get quite complicated, though. We're optimistic that a local layer that handles transactions and sync could make things a lot easier.
We used to do this for an app, but moved away from that eventually, because:
1. You need to support old commands (and payload schemas) indefinitely, or need migrations for them instead.
2. In highly interactive apps, this needs a lot of chatter. A peer that is offline for a few weeks may need to sync many many commands before they get to the latest version, and people generally don't like waiting on sync to complete. And syncing while users are interacting with the app gets complex pretty fast.
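Point 1 is often handled by versioning each command's payload and upgrading ("upcasting") older versions before applying them. A minimal sketch, assuming a hypothetical `rename` command whose v1 payload used `name` and whose v2 payload uses `title`; these types and the `upcast` function are illustrative only.

```typescript
// Version 1 of the hypothetical rename command stored its payload as `name`...
type RenameV1 = { type: "rename"; version: 1; id: string; name: string };
// ...version 2 renamed that field to `title`.
type RenameV2 = { type: "rename"; version: 2; id: string; title: string };
type StoredCommand = RenameV1 | RenameV2;

// Upgrade any stored command to the latest shape before applying it.
// Every past version needs a branch here for as long as old peers can sync.
function upcast(cmd: StoredCommand): RenameV2 {
  if (cmd.version === 1) {
    return { type: "rename", version: 2, id: cmd.id, title: cmd.name };
  }
  return cmd; // already the latest version
}
```

The comment on `upcast` is the cost described in point 1: the upgrade branches can never be deleted while a long-offline peer might still send v1 commands.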
We tried to alleviate point 2 using things like compaction and merging of commands, but it eventually ended up being much more complex than syncing the latest UI state, which was a much smaller payload in our case.
It also eliminated many of the corner-case bugs in our command-history compaction logic, where some clients could end up in invalid or unexpected states in very specific scenarios that were very hard to reproduce. Doing aggressive compaction while retaining effective order can get tricky if the object model is complex and deeply interlinked.
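The simplest form of the compaction mentioned above keeps only the last `set` per (object, field) and preserves the relative order of the survivors. A minimal sketch, assuming a hypothetical log of independent field-level `set` commands only; the tricky cases described above come from richer, interlinked commands (creates, deletes, moves), which this sketch deliberately does not handle.

```typescript
interface SetCommand {
  objectId: string;
  field: string;
  value: unknown;
}

// Drop every set that a later set to the same (objectId, field) overwrites,
// keeping the surviving commands in their original order.
function compact(log: SetCommand[]): SetCommand[] {
  const rowKey = (cmd: SetCommand) => `${cmd.objectId}|${cmd.field}`;
  const lastIndex = new Map<string, number>();
  log.forEach((cmd, i) => lastIndex.set(rowKey(cmd), i));
  return log.filter((cmd, i) => lastIndex.get(rowKey(cmd)) === i);
}
```

This is safe only because independent field sets commute; once commands depend on each other, dropping or reordering them is where the invalid states come from.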