We'd love to share our work with you: Restate, a system for workflows-as-code (durable execution). With SDKs in JS/Java/Kotlin and a lightweight runtime built in Rust/Tokio.
https://github.com/restatedev/
https://restate.dev/
It is free and open, SDKs are MIT-licensed, runtime permissive BSL (basically just the minimal Amazon defense).
We worked on that for a bit over a year. A few points I think are worth mentioning:
- Restate's runtime is a single binary, self-contained, no dependencies aside from a durable disk. It contains basically a lightweight integrated version of a durable log, workflow state machine, state storage, etc. That makes it very compact and easy to run both on a laptop and a server.
- Restate implements durable execution not only for workflows, but the core building block is durable RPC handlers (or event handler). It adds a few concepts on top of durable execution, like virtual objects (turn RPC handlers into virtual actors), durable communication, and durable promises. Here are more details: https://restate.dev/programming-model
- Core design goal for APIs was to keep a familiar style. An app developer should look at Restate examples and say "hey, that looks quite familiar". You can let us know if that worked out.
- Basically every operation (handler invocation, step, ...) goes through a consensus layer, for a high degree of resilience and consistency.
- The lightweight log-centric architecture gives Restate still good latencies: For example around 50ms roundtrip (invoke to result) for a 3-step durable workflow handler (Restate on EBS with fsync for every step).
We'd love to hear what you think of it!
Question for OP: I'd bet Flink's Statefuns comes in Restate's story. Could you please comment on this? Maybe Statefuns we're sort of a plugin, and you guys wanted to rebase to the core of a distributed function?