What I've settled on is "store most job state in the DB, use task queues just to...

vhiremath4 · on Sept 25, 2023

Ultimately you have to figure out the separation of concerns between the job state and other core state. The options range from “all state stored in message and will never become out of sync” to “no state stored in message and will never become out of sync”. In between you have “some state stored in db and some in message”, and what I’ve found useful is keeping in the db whatever needs high end-state integrity (or, as you said, just making sure jobs are cancellable/idempotent).

Tangible example:

We have a video transcoder queue. The state of the video model in our db can change as the video is being finalized in various ways. The transcoder generates thumbnails and assets from the video and also updates its state in the db. So we store job information in the message about what thumbnails we want to generate and the video ID but nothing else. This allows us to look up the video row, see if the same media was already transcoded from the video (and cancel the job), and, if not, run the job and update the video row.
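
A rough sketch of that worker, to make the shape concrete. This is not our actual code: the table layout, the `media_hash` / `transcoded_hash` columns, and the names `make_db`, `transcode`, and `handle_message` are all made up for illustration, and SQLite stands in for the real db. The point is only that the message carries the video ID and the wanted thumbnails, and everything else is read from the row when the job runs.

```python
import sqlite3


def make_db():
    # Hypothetical schema: media_hash is the current media for the video,
    # transcoded_hash is the media the last successful transcode ran against.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE videos ("
        "id INTEGER PRIMARY KEY, media_hash TEXT NOT NULL, "
        "transcoded_hash TEXT, thumbnails TEXT NOT NULL DEFAULT '')"
    )
    return db


def transcode(video_id, media_hash, sizes):
    # Stand-in for the real non-db compute work; returns the sizes it "generated".
    return sorted(sizes)


def handle_message(db, message):
    """Process one queue message. Returns "missing", "skipped", or "done"."""
    video_id = message["video_id"]
    wanted = set(message["thumbnails"])

    # The message holds no video state, so read the current row every time.
    row = db.execute(
        "SELECT media_hash, transcoded_hash, thumbnails FROM videos WHERE id = ?",
        (video_id,),
    ).fetchone()
    if row is None:
        return "missing"

    media_hash, transcoded_hash, thumbs = row
    have = set(thumbs.split(",")) if thumbs else set()

    # Same media already transcoded with everything we want: cancel the job.
    if transcoded_hash == media_hash and wanted <= have:
        return "skipped"

    # If the media changed since the last run, the old thumbnails are stale.
    if transcoded_hash != media_hash:
        have = set()

    made = transcode(video_id, media_hash, wanted - have)
    new_thumbs = ",".join(sorted(have | set(made)))

    # Record the result on the video row in a single committed update.
    with db:
        db.execute(
            "UPDATE videos SET transcoded_hash = ?, thumbnails = ? WHERE id = ?",
            (media_hash, new_thumbs, video_id),
        )
    return "done"
```

Delivering the same message twice is harmless here: the second delivery finds the row already up to date and returns "skipped", which is what makes the job idempotent without putting any video state in the message.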

Also (and I know you’re not saying this), I’ve never understood the argument that keeping queues in Postgres leads to higher data integrity via transaction guarantees. The job is still running in another process outside of the db. The only time this could be true is if the job itself mostly updates state in the db, and that is a small minority of queued workloads (the majority need to do non-db compute work).