Making Serverless Orchestration 25x Faster

plandis · 2024-06-17T19:12:00.000000Z

> You simply write your workflow as a TypeScript function which calls your steps, implemented in other TypeScript functions. The framework automatically instruments each step to record its output in the database after executing.

In my mind, the key value proposition of orchestration is that it solves business processes whose parts have different owners. Because DBOS requires you to tightly couple the logic for all your states, it sounds like it solves, at best, part of the problem.

KraftyOne · 2024-06-17T20:03:48.000000Z

Author here--can you break that down a little more? A DBOS workflow can call out to external APIs using communicator functions (https://docs.dbos.dev/tutorials/communicator-tutorial), so it works well with other loosely-coupled services.

alanyilunli · 2024-06-17T19:15:06.000000Z

Pardon my naivety, but why was TypeScript chosen as the interface for writing transactional workflows? I can't help but think that a backend language more popular for those use cases would be more relevant.

Maybe the idea is that those who are using those other languages may have other workarounds already?

Zambyte · 2024-06-17T19:19:30.000000Z

The overlap between the target audiences of TypeScript and "serverless" systems seems much larger than the overlap between the target audiences of "backend languages" and "serverless".

bnchrch · 2024-06-17T19:22:52.000000Z

I think the naivety lies in underestimating how many backend systems are written in TypeScript, and how large a share of the developer population knows or uses TS.

Either way, it's a lot.

(Aside: The backend has plenty of bad languages. For example Python is considered a backend language but is considerably worse than most languages in speed, ergonomics, transitive dependencies, and so on... )

jpgvm · 2024-06-17T19:25:05.000000Z

It's a lot but Java is still king. The SV bubble seems to forget just how dominant Java is in backend software (well until they get a job at FANG and realise Java never left the building).

jrajav · 2024-06-17T20:01:38.000000Z

Is there data to support this viewpoint? I've heard it often, but it always seems to lean on anecdata or caveats that the concrete datasets out there have blind spots (which is plausible, but certainly not a given). And those concrete datasets don't usually tend to put Java near the top for professional usage, or any metric for that matter.

ithkuil · 2024-06-17T19:53:56.000000Z

Raw popularity is not the only parameter.

Toolchain weight, integration cost, ease of building DSLs, error message quality ... All those things matter when choosing a language to be embedded as a DSL into another system.

jpgvm · 2024-06-17T20:05:21.000000Z

True. Though the last time Stonebraker tried this (VoltDB) they did lean on Java for DB side custom logic.

I wasn't really commenting on that, though, but rather on the overall popularity of TS vs Java, which is mostly about the influence of the echo chamber and how it distorts people's perception of what is out there.

ithkuil · 2024-06-17T20:17:13.000000Z

The echo chamber is real but it cuts both ways: if you want your product to be successful you need a first wave of adopters, and their echo chamber will shape your product through their feedback.

Once your product reaches a wider audience you may notice that the balance is different, but that doesn't mean you'd necessarily be in a better place now had you chosen what makes sense for the wider audience (e.g. Java) in the first iteration. Perhaps you would never have reached the point of worrying about how to please the masses, because your product wouldn't have had any success in that first selection environment.

mind-blight · 2024-06-17T19:09:44.000000Z

This seems really cool. I've just been running into scenarios where this kind of durable execution is really helpful. I've been doing basic things with RabbitMQ plus a job server, but there are definitely limitations.

I'd love to hear from folks who have experience using things like Beam or Spark. One of the biggest pain points I've encountered is that there are dozens of "mature" products to solve this problem that all differ slightly in their setup and tradeoffs.

robertlagrant · 2024-06-17T19:23:06.000000Z

I always thought Temporal[0] would be a brilliant choice for that sort of durable processing.

[0] https://temporal.io

dangoodmanUT · 2024-06-17T19:27:14.000000Z

Temporal is great until it's not XD

nogridbag · 2024-06-17T19:48:37.000000Z

Care to elaborate or link to your thoughts? Temporal has always been on my radar. I wanted to do a deeper dive, but the cost seems pretty high for the managed service.

jahewson · 2024-06-17T20:33:01.000000Z

Good concept, buggy implementation. That’s not ok for a distributed system that glues together important pieces of your product.

robertlagrant · 2024-06-17T21:36:11.000000Z

I'd also love to know more. What sort of thing is an issue?

chipdart · 2024-06-17T19:49:17.000000Z

> This seems really cool. I've just been running into scenarios where this kind of durable execution is really helpful.

You might find Azure's Durable Functions right up your alley. With Durable Functions you can break workflows down into activities, which are invoked from orchestrator functions or other activities like regular functions, while the runtime handles the orchestration and state machine updates.

jahewson · 2024-06-17T20:09:35.000000Z

These docs don’t fill me with confidence. That’s… weird given who’s behind the project. What gives?

> Workflows provide the following reliability guarantees:

> 1. They always run to completion.

You can’t guarantee that.

> 2. […] Regardless of what failures occur during a workflow's execution, it executes each of its transactions once and only once.

“Executed” is the wrong word here: if the database goes down halfway through a transaction, it has been executed neither zero times nor once.

> 3. Communicators execute at least once but are never re-executed after they successfully complete. If a failure occurs inside a communicator, the communicator may be retried

That’s not what at-least-once means? In 2. “execute” means “run to completion” but here the logic only works if it means “try”.

> Workflows must be deterministic: if called multiple times with the same inputs, they should always do the same thing.

Deterministic is the wrong word here; the correct word is idempotent. For example, a simple counting function is deterministic but not idempotent.
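
To make the distinction concrete, here is a minimal TypeScript sketch. All names (`increment`, `setToTen`, `isIdempotentAt`) are hypothetical and exist only for illustration; nothing here comes from the DBOS docs.

```typescript
// Hypothetical illustration; none of these names come from DBOS.

// Deterministic but not idempotent: the same input always gives the same
// output, yet applying it twice differs from applying it once.
function increment(counter: number): number {
  return counter + 1;
}

// Deterministic and idempotent: a second application changes nothing.
function setToTen(_counter: number): number {
  return 10;
}

// Checks the idempotence property f(f(x)) === f(x) at one sample input.
function isIdempotentAt(f: (x: number) => number, x: number): boolean {
  return f(f(x)) === f(x);
}
```

Here `isIdempotentAt(increment, 0)` is false while `isIdempotentAt(setToTen, 0)` is true, even though both functions are deterministic.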

KraftyOne · 2024-06-17T20:44:06.000000Z

Author here--thanks for the feedback! We'll update the docs to clarify the first three points assume that the database and application always restart and return online if they go offline.

For the last point, we'll clarify we mean that the code of the workflow function must be deterministic. For example, a workflow shouldn't make an HTTP GET request and use its result to determine what to do next, even though that's technically idempotent. Instead, it should make the HTTP request in a communicator (https://docs.dbos.dev/tutorials/communicator-tutorial) so its output can be saved and reused during recovery if necessary.
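
A toy sketch of why that matters, assuming an in-memory log in place of a database and synchronous steps for brevity. This is not the DBOS Transact API; `StepLog`, `runStep`, `fetchPrice`, and `workflow` are hypothetical names.

```typescript
// Toy sketch of step-output recording; NOT the DBOS Transact API.
// All names below are hypothetical.

// Maps a step ID to the output recorded when the step first completed.
type StepLog = Map<string, unknown>;

// Executes a step unless its output was already recorded, in which case the
// recorded output is returned and the step is not run again.
function runStep<T>(log: StepLog, stepId: string, step: () => T): T {
  if (log.has(stepId)) {
    return log.get(stepId) as T;
  }
  const output = step();
  log.set(stepId, output); // a real system would persist this durably
  return output;
}

let externalCalls = 0;

// Stand-in for a non-deterministic external call such as an HTTP GET:
// it returns a different value every time it is invoked.
function fetchPrice(): number {
  externalCalls += 1;
  return 100 + externalCalls;
}

// The workflow body itself is deterministic: its only source of
// non-determinism is wrapped in a recorded step.
function workflow(log: StepLog): string {
  const price = runStep(log, "fetchPrice", fetchPrice);
  return price > 101 ? "sell" : "hold";
}
```

Re-running `workflow` with the same log replays the recorded price and takes the same branch. Re-running it with an empty log calls `fetchPrice` again and may take a different branch, which is the divergence the determinism requirement is meant to rule out.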

jahewson · 2024-06-17T21:41:11.000000Z

Yeah, I see, you mean deterministic in the strongest sense - no state, no side-effects. A pure function. Might be worth phrasing it that way as React devs are familiar with that terminology but not the former.

FpUser · 2024-06-17T19:50:59.000000Z

>"Increasingly, developers are using reliable workflows to help build applications. Reliable workflows are programs that always run to completion–if they’re interrupted, they automatically resume from where they left off."

This "increasingly" has been in wide use for ages. Personally, I was doing it in the 90s.

ldjkfkdsjnv · 2024-06-17T19:15:27.000000Z

Temporal is the final boss when it comes to orchestration technology.

jpgvm · 2024-06-17T19:43:36.000000Z

Maybe, that remains to be seen yet. I do however have high hopes for durable execution as a model even if Temporal doesn't end up being the eventual winner.

FridgeSeal · 2024-06-17T20:01:33.000000Z

> You simply write your workflow as a TypeScript function

Infinitely slower if you don’t use TypeScript though lol.

hxboo · 2024-06-17T19:12:26.000000Z

Is it comparing apples to apples? DBOS looks more like Spring than AWS Lambda imo.

secondrow · 2024-06-17T19:51:36.000000Z

It depends on what part of DBOS you're looking at. DBOS Transact is the framework (TypeScript) used to develop apps/workflows such as those in the benchmark.

DBOS Cloud hosts and executes DBOS Transact apps/workflows a la (AWS Lambda+Step Functions). So it is apples:apples. Functionally, DBOS Cloud is like Lambda and Step Functions in one.

localfirst · 2024-06-17T19:39:19.000000Z

is it just me or am i seeing less and less serverless showing up in roles? seems like there was a big rush during the hype around 2021, and people went back to ec2/kubernetes

jpgvm · 2024-06-17T19:48:05.000000Z

That is because that is exactly what happened. It was tried, people went too far and tried to build entire applications in FaaS and it was largely unsuccessful. Cue a bunch of migrations onto k8s to contain costs and get back control over process lifetime, better integration with existing monitoring/tracing, etc, etc.

I am probably the farthest from a fan of serverless, but I have developed some appreciation for all the tech that went into it and have found some good use cases for serverless and serverless-like things.

The one I am most bullish on is serverless at the edge. Edge compute is too expensive when provisioned the traditional way (as static reserved memory + CPU etc) and the kind of tasks you want to do at the edge (request manipulation, early AuthZ, etc) are amenable to serverless requirements/limitations. Cloudflare Workers is what I am primarily familiar with but I imagine Lambda@Edge and Fastly's solution are similar.

Is serverless dead? No. But the hype around building whole apps on Lambda and that actually being good is.

tracker1 · 2024-06-17T20:51:48.000000Z

Not to mention, if your app is horizontally large, and depending on how it's configured, cold starts can really kill your overall performance.

localfirst · 2024-06-18T00:03:04.000000Z

what use cases do edge functions have? what is so critically latency sensitive

orthecreedence · 2024-06-17T20:26:12.000000Z

I think people realized that they had ultimately reinvented PHP and sometimes a stateful app server is just fine for your 100 req/day app.