Show HN: Inngest – Developer platform for background jobs and workflows

machiaweliczny · on June 20, 2023

Seems like a similar API/use case to Temporal. Do I get it right that your system is similar but easier to use, as it basically exposes similar functionality via HTTP and a higher-level API?

Do I get it right that the difference between this and, for example, ActiveJob in Rails is that you handle multi-step workflows well, where there's a need to coordinate and wait for some event/thing to finish (or just sleep)? And the benefit is that it's easy to read the whole flow, as it's an async function?

danfarrelly · on June 20, 2023

Exactly - we have many users that have come over after using Temporal. We designed our SDK to be more lightweight and flexible. We want it to feel more like just writing normal code, not a new coding paradigm. For example, you can define steps right within your function, not as separate "activities."

Being HTTP based (push vs. pull), it's easier to manage and works natively with serverless and servers.

Inngest is also event-driven, so you can fan-out and do things like have your workflow wait for another event. Our `step.waitForEvent()` allows you to pause a function until another event is received, creating dynamic jobs that can wait for additional actions or input. Also, using events allows us to replay failures super easily.
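To make the "pause until another event arrives" idea concrete, here is a minimal in-memory sketch. It is not the Inngest SDK and says nothing about how `step.waitForEvent()` is implemented; `EventBus`, `waitFor`, `send`, and the `user/email.verified` event name are all hypothetical names made up for illustration.

```typescript
// Hypothetical in-memory model of "pause until a matching event arrives".
// None of these names come from the Inngest SDK.
interface AppEvent {
  name: string;
  data: Record<string, string>;
}

interface Waiter {
  name: string;
  match: (event: AppEvent) => boolean;
  resume: (event: AppEvent) => void;
}

class EventBus {
  private waiters: Waiter[] = [];

  // Register a paused workflow that should continue when a matching event is sent.
  waitFor(
    name: string,
    match: (event: AppEvent) => boolean,
    resume: (event: AppEvent) => void
  ): void {
    this.waiters.push({ name, match, resume });
  }

  // Deliver an event: every matching waiter resumes (fan-out) and is removed.
  // Returns how many paused workflows were resumed.
  send(event: AppEvent): number {
    const stillWaiting: Waiter[] = [];
    const ready: Waiter[] = [];
    for (const waiter of this.waiters) {
      if (waiter.name === event.name && waiter.match(event)) {
        ready.push(waiter);
      } else {
        stillWaiting.push(waiter);
      }
    }
    this.waiters = stillWaiting;
    for (const waiter of ready) {
      waiter.resume(event);
    }
    return ready.length;
  }
}

const bus = new EventBus();
const log: string[] = [];

// Two independent workflows pause on the same event for user 42; a third waits for user 7.
bus.waitFor("user/email.verified", (e) => e.data.userId === "42", () => log.push("onboarding resumed for 42"));
bus.waitFor("user/email.verified", (e) => e.data.userId === "42", () => log.push("welcome email queued for 42"));
bus.waitFor("user/email.verified", (e) => e.data.userId === "7", () => log.push("onboarding resumed for 7"));

// One event fans out to both workflows waiting on user 42.
const resumed = bus.send({ name: "user/email.verified", data: { userId: "42" } });
```

A real system would also persist the waiters and give each one a timeout, which this sketch leaves out.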

re: ActiveJob - Yeah, multi-step workflows are a huge difference. We manage step retries and the function state for you. That makes things like sleep and coordinating between events easy. As you mentioned, it leads to simpler function definitions, so almost any engineer can write workflows quickly and read the code in a single place, reducing bugs due to disconnected jobs.
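One common way to get "managed step retries and function state" is to memoize each step's result under an ID and re-run the whole function, skipping steps that already finished. The sketch below assumes that approach; the thread does not confirm it is how Inngest works internally, and `StepRunner`, `run`, and `workflow` are hypothetical names.

```typescript
// Hypothetical sketch of step memoization with retries; not the Inngest SDK.
type StepState = Map<string, unknown>;

class StepRunner {
  constructor(private state: StepState, private maxAttempts = 3) {}

  // Run a step once per workflow: a stored result is returned without re-executing,
  // and a failing step is retried up to maxAttempts times before the error surfaces.
  run<T>(id: string, fn: () => T): T {
    if (this.state.has(id)) {
      return this.state.get(id) as T;
    }
    let lastError: unknown;
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        const result = fn();
        this.state.set(id, result);
        return result;
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }
}

let chargeCalls = 0;
let emailCalls = 0;

// The whole flow reads top to bottom as ordinary code.
function workflow(step: StepRunner): string {
  const chargeId = step.run("charge", () => {
    chargeCalls++;
    return "ch_1";
  });
  return step.run("email", () => {
    emailCalls++;
    if (emailCalls === 1) throw new Error("transient failure");
    return `receipt sent for ${chargeId}`;
  });
}

const state: StepState = new Map();
const first = workflow(new StepRunner(state)); // "email" fails once, then succeeds on retry
const second = workflow(new StepRunner(state)); // replays entirely from stored state
```

The second call does no work at all: both steps are served from `state`, so the charge is never repeated even though the function body ran again.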

tomredman · on June 20, 2023

The timing of this is pretty awesome for me. I’m building a product that requires fairly heavy, scheduled background services. Originally, I had built these services intermingled with my client and API, but it was not ideal from a development or deployment process. Plus we had rolled our own monitoring which was itself a PITA to maintain. With Inngest, I moved all of our background services and processing to a separate sub-repo, and we can develop, deploy and monitor entirely independently from the rest of our product which has really sped things up. Love it. Would recommend for anything event-based!

What sealed it for me was that the few times I ran into issues, often due to my own mistakes, their support was nearly real-time and worked with me to either solve the problem or dig in on their end to see where the issue was. Honestly, more than anything, the support gives me confidence to fully commit to this and use it across all my production apps.

Anyway, great stuff all, you’ve built something awesome here.

danfarrelly · on June 20, 2023

> we had rolled our own monitoring which was itself a PITA to maintain

Thanks! What type of monitoring were you looking for? We have some basic metrics now, but know we need to improve this. What metrics, alerting, observability are important for you?

distracteddev90 · on June 20, 2023

Not the original commenter but I manage a similar system:

1. Wait timings for jobs.

2. Run timings for jobs.

3. Timeout occurrences and stdout/stderr logs of those runs.

4. Retry metrics, and if there is a retry limit, then metrics on jobs that were abandoned.

One thing that is easy to overlook is giving users the ability to define a specific “urgency” for their jobs which would allow for different alerting thresholds on things like running time or waiting.
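As a rough sketch of what per-urgency alerting could look like: each job carries an urgency label, and the same wait or run time is judged against different limits depending on that label. Everything here (`Urgency`, `thresholdsByUrgency`, `alertsFor`, and the numbers) is hypothetical and only illustrates the idea above.

```typescript
// Hypothetical sketch: alert thresholds keyed by job urgency.
type Urgency = "low" | "normal" | "critical";

interface Thresholds {
  maxWaitMs: number;
  maxRunMs: number;
}

// Illustrative numbers only; real thresholds depend on the workload.
const thresholdsByUrgency: Record<Urgency, Thresholds> = {
  low: { maxWaitMs: 10 * 60 * 1000, maxRunMs: 30 * 60 * 1000 },
  normal: { maxWaitMs: 60 * 1000, maxRunMs: 5 * 60 * 1000 },
  critical: { maxWaitMs: 5 * 1000, maxRunMs: 30 * 1000 },
};

interface JobRun {
  id: string;
  urgency: Urgency;
  waitMs: number; // time spent queued before starting
  runMs: number; // time spent executing
  attempts: number;
  maxAttempts: number;
  succeeded: boolean;
}

// Returns one message per threshold the run violated.
function alertsFor(run: JobRun): string[] {
  const limits = thresholdsByUrgency[run.urgency];
  const alerts: string[] = [];
  if (run.waitMs > limits.maxWaitMs) {
    alerts.push(`${run.id}: waited ${run.waitMs}ms (limit ${limits.maxWaitMs}ms)`);
  }
  if (run.runMs > limits.maxRunMs) {
    alerts.push(`${run.id}: ran ${run.runMs}ms (limit ${limits.maxRunMs}ms)`);
  }
  if (!run.succeeded && run.attempts >= run.maxAttempts) {
    alerts.push(`${run.id}: abandoned after ${run.attempts} attempts`);
  }
  return alerts;
}

// Identical timings, different urgency.
const timings = { waitMs: 20000, runMs: 10000, attempts: 1, maxAttempts: 3, succeeded: true };
const criticalAlerts = alertsFor({ id: "charge-card", urgency: "critical", ...timings });
const lowAlerts = alertsFor({ id: "weekly-digest", urgency: "low", ...timings });
```

With these numbers, a 20-second wait triggers an alert for the critical job but not for the low-urgency one.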

danfarrelly · on June 20, 2023

This is great - we do capture all logs for each run including any retries, so you can see errors and general successes. All of these other metrics we have internally, but need to expose to our users!

Observability is super key for background work, even more so since it's not always tied to a specific user action, so you need to have a trail to understand issues.

> One thing that is easy to overlook is giving users the ability to define a specific “urgency” for their jobs which would allow for different alerting thresholds on things like running time or waiting.

We are adding prioritization for functions soon, so this is helpful for thinking about telemetry for jobs with different priorities or urgency.

re: timeouts - managing timeouts usually means managing dead-letter queues, and our goal is to remove the need to think about DLQs at all by building metrics and smarter retry/replay logic right into the Inngest platform.

jtwebman · on June 20, 2023

Sorry, but DLQs make it easier to do those alerts where a human needs to look at something ASAP. Not sure they can be gotten rid of, but maybe you call them something else.

goodoldneon · on June 20, 2023

Inngest engineer here!

Agreed that alerting is important! We alert on job failures, plus we integrate with observability tools like Sentry.

For DLQs, you're right that they have value. We aren't killing DLQs but rather rethinking them with better ergonomics. Instead of having a dumping ground for unacked messages, we're developing a "replay" feature that lets you retry failed jobs over a period of time. Our planned replay feature will run failures in a separate queue, which can be cancelled at any time. The replay itself can be retried as well if there's still a problem.
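Since the replay feature is described as planned, the following is only a guess at the shape of the idea, not Inngest's design: pick the failures from a time window, run them through their own queue, allow cancellation, and keep whatever still fails so it can be replayed again. `Replay`, `FailedRun`, `runNext`, and `runAll` are hypothetical names.

```typescript
// Hypothetical sketch of "replay failures from a time window in a separate, cancellable queue".
interface FailedRun {
  id: string;
  failedAt: number; // timestamp in ms
  payload: string;
}

class Replay {
  private queue: FailedRun[];
  private cancelled = false;
  readonly succeeded: string[] = [];
  readonly stillFailing: FailedRun[] = [];

  constructor(
    failures: FailedRun[],
    from: number,
    to: number,
    private handler: (payload: string) => void
  ) {
    // Only failures inside the chosen window are queued; the originals are untouched.
    this.queue = failures.filter((f) => f.failedAt >= from && f.failedAt <= to);
  }

  cancel(): void {
    this.cancelled = true;
  }

  // Processes one queued failure. Returns false when cancelled or when the queue is empty.
  runNext(): boolean {
    if (this.cancelled) return false;
    const run = this.queue.shift();
    if (run === undefined) return false;
    try {
      this.handler(run.payload);
      this.succeeded.push(run.id);
    } catch (_err) {
      this.stillFailing.push(run);
    }
    return true;
  }

  runAll(): void {
    while (this.runNext()) {
      // keep draining until cancelled or empty
    }
  }
}

const failures: FailedRun[] = [
  { id: "a", failedAt: 100, payload: "ok" },
  { id: "b", failedAt: 150, payload: "bad" },
  { id: "c", failedAt: 900, payload: "ok" }, // outside the window below
];

// First replay: the bug affecting "bad" payloads is not fixed yet.
const buggyHandler = (p: string): void => {
  if (p === "bad") throw new Error("still broken");
};
const firstReplay = new Replay(failures, 0, 200, buggyHandler);
firstReplay.runAll();

// Second replay: retry only what is still failing, now with a fixed handler.
const fixedHandler = (_p: string): void => {};
const secondReplay = new Replay(firstReplay.stillFailing, 0, 200, fixedHandler);
secondReplay.runAll();
```

The point of the separate queue is that a replay is a bounded, inspectable unit of work, where a dead-letter queue just accumulates messages.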

thakobyan · on June 20, 2023

Building reliable background jobs and engineering workflows has almost always been challenging in any company I worked for, and I’m glad there is a company now that is trying to excel at the DX aspect of this problem.

danfarrelly · on June 20, 2023

Thanks for the comment! What were some of the most painful parts of this at the companies that you worked for?

mindvirus · on June 20, 2023

Schema management!

danfarrelly · on June 20, 2023

This is a great one. I've experienced this myself, especially when you change an event/message and then you need to handle that change in your job/workflow. Things can break pretty easily so you need to have versioning for both.

This is why we've built event schema versioning and versioning for functions baked into the platform. We have big plans for the schema management side of things that bring concepts of data governance to engineering teams. It shouldn't just be for data teams. As a bonus, we can then easily generate language types from schemas.
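For readers who have not hit this problem: one generic way to cope with a changed event shape is to tag each event with a version and upgrade old versions at the edge, so the job only ever handles the latest shape. This sketch is that generic pattern, not Inngest's schema versioning; `SignupV1`, `SignupV2`, `upgradeSignup`, and `handleSignup` are hypothetical.

```typescript
// Hypothetical sketch: versioned event payloads with an upgrade step.
interface SignupV1 {
  v: 1;
  email: string;
}

interface SignupV2 {
  v: 2;
  email: string;
  plan: "free" | "pro"; // field added in version 2
}

type SignupEvent = SignupV1 | SignupV2;

// Bring any known version up to the latest shape.
// The switch is exhaustive, so adding a SignupV3 to the union without a case
// here becomes a compile-time error instead of a runtime surprise.
function upgradeSignup(event: SignupEvent): SignupV2 {
  switch (event.v) {
    case 1:
      return { v: 2, email: event.email, plan: "free" }; // assumed default for old events
    case 2:
      return event;
  }
}

// The job itself only deals with the latest version.
function handleSignup(event: SignupEvent): string {
  const latest = upgradeSignup(event);
  return `${latest.email} signed up on the ${latest.plan} plan`;
}

const fromOldProducer = handleSignup({ v: 1, email: "a@example.com" });
const fromNewProducer = handleSignup({ v: 2, email: "b@example.com", plan: "pro" });
```

The types also show why generated language types are a nice bonus: once the schema is the source of truth, the compiler flags every job that has not been updated for a new version.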

What else about schema management is a pain? What have you used for this?

tianzhou · on June 20, 2023

Congrats on the launch. Building a reliable background job / workflow infra is hard. Temporal has lifted the bar significantly, glad to see new development.

As for the schema management part, we at bytebase.com have also built an OSS product to tackle this specifically.

frant1c · on June 20, 2023

We've been using Inngest at Secta.ai for the last ~6 months, happy to answer any questions!

DX is great! Writing the jobs feels very natural, much much simpler than Temporal. The development server is neat and makes debugging jobs very easy. TypeScript SDK is idiomatic, the types are properly inferred & propagated throughout the whole app.

The nice thing about writing step functions for Inngest vs regular "async worker queues" is that we can express logic, e.g. "if X then wait for event Y", with a layer of caching/retries on top.

dimitropoulos · on June 22, 2023

I don't have a question per se, but this post was the first time I've heard about your company. I find it to be a really interesting offering and I've told a few people about it. Having done a small amount of portrait photography in the past (professional headshots, I mean), I think people underestimate how intensely difficult it can be. Rock on!

zenorocha · on June 20, 2023

Background jobs are the kind of problem that every company has, but there’s still room to find the best DX possible. I’m glad there are people tackling this problem.

GGO · on June 20, 2023

Can you elaborate on why you chose to go with the SSPL license? I want to open source a project and have been deciding between SSPL and AGPL. I am held back by OSI stating that the SSPL does not comply with its Open Source Definition because it discriminates against specific fields of endeavor, describing it as a "fauxpen" source license.

danfarrelly · on June 20, 2023

Good question. This was a hard decision for us last year, and as an early-stage startup we chose SSPL for the time being to offer some protection. AGPL allows anyone to deploy your system and re-sell it, but SSPL requires that person to open source the additions they make for their platform, which benefits the project itself.

*Caveat*: This is super nuanced and hotly debated, so this is high level and there's no perfect answer here.

Mid-term, we plan to move from SSPL to a more open license as we further develop our open source project.

devty · on June 21, 2023

Curious to know about your background (:wink:)!

What led you guys to work on this problem? What inspired you guys?

mirzap · on June 20, 2023

Looks interesting and promising. Is it Open Source? Can it be self-hosted?

tonyhb · on June 20, 2023

The executor, queue, state, drivers, etc. are all on GitHub (https://github.com/inngest/inngest).

Over the last year we've been iterating on the internals a lot to build things like:

- Concurrency (shared nothing, auto-scalable)

- Batching (have one fn run with 100 events, vs 1:1 mapping)

- Prioritization

- Replay

- Parallelization

- Branch deploys

- Rate limiting

The changes have been heavy, and it would be really hard for self-hosted people to handle the migrations necessary for these. Now that this is slowing, self hosting is realistically something that's possible soon. We'd prefer to offer self hosting when it's easy and ready, vs something that's a burden to operate.
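To illustrate the batching item from the list above ("one fn run with 100 events, vs 1:1 mapping"), here is a minimal sketch that groups events so one invocation handles up to a fixed number of them. `toBatches` and the event shape are hypothetical and unrelated to how Inngest implements batching.

```typescript
// Hypothetical sketch: group events so one function run handles up to maxBatchSize events.
function toBatches<T>(events: T[], maxBatchSize: number): T[][] {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
    throw new RangeError("maxBatchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += maxBatchSize) {
    batches.push(events.slice(i, i + maxBatchSize));
  }
  return batches;
}

interface PageView {
  userId: number;
}

const pageViews: PageView[] = Array.from({ length: 250 }, (_, i) => ({ userId: i }));

// With 1:1 mapping this would be 250 invocations; batched by 100 it is far fewer.
let invocations = 0;
const batchSizes: number[] = [];
for (const batch of toBatches(pageViews, 100)) {
  invocations++;
  batchSizes.push(batch.length); // e.g. one bulk insert per batch instead of one per event
}
```

A production batcher would normally also flush on a timer so a partially filled batch does not wait forever; that part is omitted here.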

dimitropoulos · on June 20, 2023

why only TypeScript?

I realize you can't please everyone at all times, but I'd love to have a Rust or Zig SDK option. Go is a good start in that direction, I guess.

danfarrelly · on June 20, 2023

We started with TypeScript because it's not well supported by current solutions, and none of them support serverless. We wanted to solve serverless first, as doing so has made supporting long-running servers easy.

A lot of folks in the TS/JS community also don't often build distributed systems, and it's easy to get wrong. So we think they're hungry for something like Inngest, where they don't need to manage or spend weeks learning some complex system. Plus, TS gives us typing for all events/messages.

We already have a working Go SDK that we use internally and we have a test harness that will enable us to add other languages like Rust or Zig more easily. We even have a community member building a PoC for Elixir.

darwin67 · on June 20, 2023

No promises here, but I'd love to look into Rust in the future. There are essentially no background job systems for Rust, iirc.

imsh4yy · on June 20, 2023

Having an event-driven infrastructure is a big missing piece in the serverless world and I'm glad to see you guys stepping in and filling this gap!