A Graph-Based Firebase (stopa.io)
177 points by stopachka on Aug 25, 2022 | 51 comments



Very interesting. I came to many similar conclusions completely independently, and even attempted to build a type-safe in-memory Datalog in TypeScript.

I also came to the conclusion that just exposing Datalog triples as a query language would never feel right, and tried to expose a GraphQL-like language that generated the Datalog triples.

IMO React Relay is a great similar offering with its normalized cache. Relay has great DX too and can be totally type safe. To my knowledge Datalog is way too dynamic for static analysis.
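As a rough illustration of what "a GraphQL-like language that generates triples" can mean, here is a minimal TypeScript sketch. It is not how InstaQL or Relay works; the `Query` and `Pattern` types and the `toPatterns` function are hypothetical names invented for this example, and it assumes attributes are named `namespace/relation` and that each relation name appears only once in a query (otherwise the generated variable names would collide).

```typescript
// Hypothetical nested query: each key is a relation to follow.
type Query = { [relation: string]: Query };

// Hypothetical triple pattern: [entity variable, attribute, value variable].
type Pattern = [string, string, string];

// Walk the nested query and emit one triple pattern per relation.
function toPatterns(
  namespace: string,
  query: Query,
  parentVar: string = `?${namespace}`
): Pattern[] {
  const patterns: Pattern[] = [];
  for (const [relation, child] of Object.entries(query)) {
    const childVar = `?${relation}`;
    patterns.push([parentVar, `${namespace}/${relation}`, childVar]);
    // Recurse: the child relation becomes the namespace for nested attributes.
    patterns.push(...toPatterns(relation, child, childVar));
  }
  return patterns;
}
```

For example, `toPatterns("team", { task: { owner: {} } })` produces one pattern joining teams to tasks via `team/task` and one joining tasks to owners via `task/owner`.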

That being said, I would love to try Instant out. I'm really happy to see innovation in this area.


Thank you for the kind words! Both Joe and I are around if you have any feedback. In terms of type-safety, we're thinking of it as an added layer. I'm optimistic that, since InstaQL provides objects as the interface, we could have more idiomatic types down the road.


This comment thread looks like it could benefit from Dgraph, which is an open source GraphQL-layer database built on top of an RDF n-quad store (badger).


I used to work on Firebase. I also use Firebase heavily in my work and personal projects.

I am normally quite skeptical of “Firebase but X” projects because they often seem to be running head first into the issues that the Firebase team so skillfully avoided.

This isn’t one of those projects. The author of this essay clearly gets what you need to actually build a backend for a modern app that’s also simple enough to get started with. I’m very excited about this project.


Thank you for the kind words.


Are transactions supported in that solution or in the mentioned Datalog-related ecosystem? I have built my own similar reactive in-memory triple store with TypeScript's type safety, but then I realized I still need transactions, which are a bit of a pain: transactions bundle the otherwise separate triples, so the atomic, independent logic of triples and related effects breaks a bit. (I am sure it's solvable.)


The example code uses a `transact` function. But it really depends on what you mean by “transaction”. We don’t use a triple store at Notion, but we do use an abstraction that ensures a collection of operations either all succeed or all fail. We don’t support “interactive” transactions, where you can read, modify, write as an atomic group. This just isn’t desirable in a multiplayer or offline system. In cases where we need that kind of consistency, we use a normal HTTP API which is online-only.
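A minimal TypeScript sketch of the "all succeed or all fail" idea, for readers who want something concrete. This is not Notion's or Instant's implementation; the `Op` type and `applyAll` function are hypothetical, and the store is just an in-memory map.

```typescript
// Hypothetical operations for illustration only.
type Op =
  | { kind: "set"; key: string; value: number }
  | { kind: "increment"; key: string; by: number };

// Returns a new store if every operation succeeds. If any operation fails,
// it throws and the original store is left untouched.
function applyAll(
  store: ReadonlyMap<string, number>,
  ops: Op[]
): Map<string, number> {
  const draft = new Map(store); // work on a copy so a failure cannot leak
  for (const op of ops) {
    if (op.kind === "set") {
      draft.set(op.key, op.value);
    } else {
      const current = draft.get(op.key);
      if (current === undefined) {
        throw new Error(`cannot increment missing key: ${op.key}`);
      }
      draft.set(op.key, current + op.by);
    }
  }
  return draft;
}
```

Note that the operations never return values the caller can branch on mid-batch, which is what separates this from an "interactive" transaction.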


Cool, thanks! Now a question: how does a dev pick up the related knowledge? I have a BSc and an extra 7 years in development, but this area was totally gray to me. My motivation was only to come up with a scalable solution for offline-first apps with some kind of automatic persistence support, so at the end of the day my design goals were quite similar. How do you come up with stuff at Notion? Are there must-have books, or is it just going with gut, experience and the existing solutions you are aware of?


As jitl points out, in multiplayer settings, what you need is some way to commit a series of transactions all together or not at all. We support this.

In the case where you want to read the database inside your transaction, we take inspiration from Datomic. Datomic runs all mutations in one high-memory box. You can provide functions that run in that box. This way, you can guarantee that the reads inside your transaction have the latest value. There's a lot of UX to figure out there, and this would be something to try to avoid in an offline-available setting.


Yes exactly this UX issue bothers me a lot, I wouldn't even go there. :D

What about migrations? Do you support them? That's another thing I need in my offline-first project; one of my other projects died because of the lack of them. (I need something which plays well with Expo.io)


Migrations are very tough when you go offline-first. Cambria [^1] is an interesting read. For Instant, we are schemaless and think about offline more like a cache. In our case it's less of a problem.

[^1]: https://www.inkandswitch.com/cambria/


Thank you! Having had one of my projects die from it (lack of migrations, offline-first), I promised myself I'd solve that problem first, before writing a meaningful line of business code in a future project. :)


I'm working on building a database in the same space as InstantDB. Currently, it's an "object/graph database using Protobuf". There's a check to ensure updated Protobuf definitions are backwards-compatible. Of course, this still implicitly relies on using Protobufs correctly (i.e. a missing value is the same as zero/empty/null/nil), even though I'm trying to make it safe by default.

I'm curious what your needs are. Would you mind elaborating on what kind of migrations your project would have needed to not die?


Oh another thing: do you guys have twitter or something? :) Would love to follow the project and devs too.


We don't have a business twitter, but I'm @stopachka, and my cofounder is @JoeAverbukh. Thank you for the kind words :)


Oh I just realized I have already been following you. :) Is it possible that you've been using cyclejs/xstream at some point? :) Or maybe from the Future of Programming slack (or whatever it is called :))


I haven't checked these out, but I'm definitely intrigued. Peeked at cyclejs -- I'm a fan of FRP, and more recently structured concurrency.


Isn't FRP just naturally superior? :)


The basic intuitions they have are correct, but it feels like there is a serious lack of or disregard for the theoretical background needed to talk about this stuff properly.

> (pull db '[* {:team/task [* {:task/owner [*]}]}] team-id)

That's not Datalog. It's kind of a mix between a conjunctive query and a regular path query.

Datalog isn't even really a query language. It's a class of query languages with a specific expressive power.

Whereas SQL is essentially conjunctive queries with non-stratified negation, Datalog is recursive conjunctive queries with no or only stratified negation.
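To make "recursive conjunctive queries" concrete, the textbook example is transitive closure: `path(x, y) :- edge(x, y).` and `path(x, z) :- path(x, y), edge(y, z).` Below is a minimal TypeScript sketch of naive bottom-up evaluation for that one program. It is not a general Datalog engine, and the `Edge` type and `transitiveClosure` function are names invented for this illustration.

```typescript
type Edge = [string, string];

// Naive bottom-up evaluation of:
//   path(x, y) :- edge(x, y).
//   path(x, z) :- path(x, y), edge(y, z).
// The rules are re-applied until no new facts appear (a fixpoint).
function transitiveClosure(edges: Edge[]): Edge[] {
  const key = ([x, y]: Edge): string => JSON.stringify([x, y]);
  // Base rule: every edge is a path.
  const path = new Map<string, Edge>(
    edges.map((e): [string, Edge] => [key(e), e])
  );
  let changed = true;
  while (changed) {
    changed = false;
    // Recursive rule: join known paths with edges on the shared variable y.
    for (const [x, y] of Array.from(path.values())) {
      for (const [from, to] of edges) {
        const derived: Edge = [x, to];
        if (from === y && !path.has(key(derived))) {
          path.set(key(derived), derived);
          changed = true;
        }
      }
    }
  }
  return Array.from(path.values());
}
```

Because the set of possible facts is finite and facts are only ever added, the loop terminates even when the input graph has cycles.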

Datalog also has nothing to do with triples, they are two completely orthogonal concepts.

I've been slaving away in this very space for years and this post heavily reminds me of my initial hubris. Building something truly foundational that can be universally implemented and perform in all potential target languages from Javascript, to WASM via e.g. Zig and Rust, while at the same time being both conceptually simple, and easy to implement, is really really really difficult with a lot of pain lurking in the details.


do you write about your findings and ideas? i would be interested to read more.


I regularly regret not writing a blog...

But if you want to talk about some ideas feel free to drop by at discord.gg/tribles


> These triples say that the Layer with id 1 has a fontSize 20 and backgroundColor blue. Since they are different rows, there’s no conflict.

This sounds a lot like Bigtable (https://cloud.google.com/bigtable), which also does last-write-wins conflict resolution. So this is adding a GraphQL + frontend layer to it?
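For readers unfamiliar with the idea, a minimal TypeScript sketch of last-write-wins over triples might look like the following. The `Triple` shape, its `timestamp` field and the `merge` function are assumptions made for illustration, not Instant's or Bigtable's actual data model.

```typescript
// Hypothetical triple: entity, attribute, value, plus a write timestamp.
interface Triple {
  entity: string;
  attribute: string;
  value: string | number;
  timestamp: number;
}

// Last-write-wins merge. Each (entity, attribute) pair is its own row, so
// writes to different attributes never conflict. For the same pair, the
// triple with the larger timestamp wins (on a tie, the first one seen stays).
function merge(triples: Triple[]): Map<string, Triple> {
  const rows = new Map<string, Triple>();
  for (const t of triples) {
    const key = `${t.entity}/${t.attribute}`;
    const existing = rows.get(key);
    if (existing === undefined || t.timestamp > existing.timestamp) {
      rows.set(key, t);
    }
  }
  return rows;
}
```

Merging a `fontSize` write and a `backgroundColor` write for the same layer keeps both, while two `fontSize` writes collapse to the later one.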


I'm not as familiar with Bigtable's data model. Afaik they don't use a triple-store like system, but their data model does look interesting. [^1]

The concept you have is on point though. You can think of it as us having moved a graph database over to the frontend, and introduced a GraphQL-like language for it.

[^1]: https://static.googleusercontent.com/media/research.google.c...


Very cool, super exciting to see a clean frontend to it. As a big Firebase fan, I'm looking forward to what you build :)


Another product that does all this (relations, reactivity, offline, permissions, sync, etc…) really well, is Realm.

It is beyond awesome for building mobile apps, but mobile only. Annoyingly they have never released a web version. No idea why.

I would still look into it for ideas. It is super powerful and a joy to work with. I would love to see something like that working in the browser.


One possible solution to the "sync" layer (how do you sync the local db shard with the global consensus) looks like this: https://www.researchgate.net/publication/359578461_Continuou... (2022)

You also need to fit whole view updates into a 16ms frame budget (not just one query, but every query on the page that is impacted by a change, as well as downstream reactive views). At some tipping point it can be faster to move relational queries to the cloud (sacrificing local-first) and treat local-first as an edge case (not all page components need to be live in offline mode; depending on the app, you may just need document edits, not relations).


The research paper looks intriguing. I'll look into it. I am not as convinced about moving queries to the cloud, when it comes to the north stars of Figma / Linear / Notion. It would be hard to get the same kind of UX.


At Firebase we sometimes pondered the feasibility of a SQL version, but the semantics of SQL seem littered with footguns that don't lend themselves to offline, secure, scalable and event-driven distributed applications. We knew everybody wanted better query expressivity, but it was very difficult to see a path to delivering that in a mobile-friendly, client-side, secure package.

A triple store with Datalog actually seems like a decent compromise between SQL and no-SQL that could actually work out! Awesome idea!


I'm encouraged reading this. Thank you.


Does Instant have any plans for iOS and Android SDKs? One of the biggest wins with Firebase is that it's really easy to integrate across platforms.


Yes! Right now one of our users implemented an RN integration. We'd love to support iOS and Android soon.


I would really use this for many of my projects! For project A, I used Firebase, which was great initially, but I eventually ran into the limitations well described in Stopa's essay. For project B, I decided to write everything from scratch, but the amount of work it required is crazy, and of course the result is quite brittle.


Thrilled to hear this, thank you :).


Another project with similar goals in mind: https://paulbutler.org/2020/the-webassembly-app-gap/

Also backed by YC.


Thanks for the mention!

For some context, that blog post discusses what I saw as missing pieces of the stack circa 2020. The startup I’m building (https://driftingin.space) is roughly the “lambda for websockets” part, whereas Instant is closer to the “generalized CRDT data layer part”.


Is this blog post a company? What company?



This looks like an evolution of Meteor, doesn't it?


Meteor certainly inspires us!


What would you say are your biggest differences?


Relations, and less vertical integration


Very cool! What're the differences between this and something like Replicache?


I haven't looked too deeply at Replicache, but the two differences as I understand it:

1. Replicache is a layer you add over an existing backend. Instant handles the backend.

2. They expose a key-value store. We expose a graph store.

Both 1&2 have pros and cons each way. I think we're inspired by the same problems, and their docs are really well done.


Are there any plans to open source instant in the future ?


We like the open-core model. We have definite plans to open-source the client. We're still thinking about the best way to do this for the server.


I use Hasura for this purpose. With some hooks, you can achieve offline mode, too.

You need some slightly tricky hacks to use Hasura's permission system the way you want, though.


Thanks for taking a look at the essay. As you mentioned, it is indeed possible if you store all commands and sync with IndexedDB. The edge cases get quite complicated though. We're optimistic that a local layer that handles transactions and sync could make things a lot easier.


Hasura does not offer syncing.

Syncing is hard :)


Aha, yup. My way of doing offline is to just store all commands offline and sync only the commands. Syncing the view is hard.


We used to do this for an app, but moved away from that eventually, because:

1. You need to support old commands (and their payload schemas) indefinitely, or else you need migrations for them.

2. In highly interactive apps, this needs a lot of chatter. A peer that is offline for a few weeks may need to sync many many commands before they get to the latest version, and people generally don't like waiting on sync to complete. And syncing while users are interacting with the app gets complex pretty fast.

We tried to alleviate 2 using things like compaction and merging of commands, but eventually it ended up being much more complex than syncing the latest ui state which was a much smaller payload for our case.
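As a hedged illustration of what the simplest form of command compaction can look like, here is a TypeScript sketch. It assumes every command is an independent "set this key to this value" write, which is exactly the assumption that stops holding once the object model is deeply interlinked; the `SetCommand` type and `compact` function are hypothetical.

```typescript
// Hypothetical command: an independent write of one key.
interface SetCommand {
  seq: number; // position in the original command log
  key: string; // which field the command writes
  value: unknown;
}

// Keep only the latest write per key, preserving the relative order of
// the surviving commands.
function compact(log: SetCommand[]): SetCommand[] {
  const latest = new Map<string, SetCommand>();
  for (const command of log) {
    const seen = latest.get(command.key);
    if (seen === undefined || command.seq > seen.seq) {
      latest.set(command.key, command);
    }
  }
  return Array.from(latest.values()).sort((a, b) => a.seq - b.seq);
}
```

Once commands depend on each other (for example, a write to an object that an earlier command created), dropping "superseded" commands this way is no longer safe.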

It also eliminated many of the corner-case bugs in our command history compaction logic, where some clients could end up in invalid or unexpected states in very specific scenarios that were very hard to reproduce. Doing aggressive compaction while retaining effective order can get tricky if the object model is complex and deeply interlinked.

Not saying it is not a valid approach, but YMMV.


Tell me more about this!



