What I've settled on is "store most job state in the DB, use task queues just to poke workers into working on the jobs".
Storing the job state in the DB means you can query state nicely. It's not going to exactly show the state of things but it's helpful for working through a live incident (especially when most job queues just delete records as work is processed).
And if you make all the background tasks idempotent anyway, then you're almost always safe running a thing like "send a job to the task queue to handle this job".
If you rely _just_ on message queues, there are a lot of times where you can have performance issues, yet have a lot of trouble knowing what's going on (for example, RabbitMQ might tell you the size of your queues, but offer little-to-no inspection of the data inside them).
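To make that concrete, here is a minimal sketch of the pattern, assuming SQLite as the DB and an in-process `queue.Queue` standing in for the real task queue. The `jobs` table, its columns, and the function names are all hypothetical, not anyone's actual schema. The message carries only a job ID, and the worker claims the job with a conditional update, so a duplicate poke is harmless.

```python
import queue
import sqlite3

# Assumption: SQLite stands in for the real DB, queue.Queue for the task queue.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs ("
    " id INTEGER PRIMARY KEY,"
    " payload TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)
pokes = queue.Queue()


def enqueue(payload):
    """Record the job in the DB, then poke a worker with just the job ID."""
    with db:  # commits on success
        cur = db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    job_id = cur.lastrowid
    pokes.put(job_id)
    return job_id


def handle_poke(job_id):
    """Claim and run the job; return False if it was already claimed or done."""
    with db:
        cur = db.execute(
            "UPDATE jobs SET status = 'running'"
            " WHERE id = ? AND status = 'pending'",
            (job_id,),
        )
    if cur.rowcount == 0:
        return False  # duplicate poke: nothing to do
    # ... the actual work would happen here ...
    with db:
        db.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    return True


def status_counts():
    """The kind of query you want during an incident."""
    return dict(db.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status"))
```

Because finished rows stay in the table, `status_counts()` still has something to show after the queue itself has drained.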
Ultimately you have to figure out the separation of concerns between the job state and other core state. The options range from “all state stored in message and will never become out of sync” to “no state stored in message and will never become out of sync”. In between you have “some state stored in db and some in message”, and what I’ve found useful is keeping anything that needs high end-state integrity in the db (or, as you said, just making sure jobs are cancellable/idempotent).
Tangible example:
We have a video transcoder queue. The state of the video model in our db can change as the video is being finalized in various ways. The transcoder generates thumbnails and assets from the video and also updates its state in the db. So the message stores the video ID and which thumbnails we want to generate, but nothing else. This allows us to look up the video row, see if the same media was already transcoded from the video (and cancel the job if so), and, if not, run the job and update the video row.
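Roughly, the worker side looks like the sketch below. This is an illustration under assumptions, not our real code: the `videos` and `thumbnails` tables, the message shape, and `handle_message` are made up for the example, SQLite stands in for the db, and the transcoding itself is elided.

```python
import json
import sqlite3

# Assumption: hypothetical schema; SQLite stands in for the real db.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, state TEXT NOT NULL)")
db.execute(
    "CREATE TABLE thumbnails ("
    " video_id INTEGER NOT NULL,"
    " size TEXT NOT NULL,"
    " PRIMARY KEY (video_id, size))"
)


def handle_message(raw):
    """Process one queue message; returns 'cancelled' or 'ran'."""
    msg = json.loads(raw)  # carries only the video ID and wanted thumbnails
    video_id = msg["video_id"]
    wanted = set(msg["thumbnails"])

    # Everything else comes from the current video row, not the message.
    row = db.execute(
        "SELECT state FROM videos WHERE id = ?", (video_id,)
    ).fetchone()
    if row is None:
        return "cancelled"  # video is gone

    have = {
        size
        for (size,) in db.execute(
            "SELECT size FROM thumbnails WHERE video_id = ?", (video_id,)
        )
    }
    missing = wanted - have
    if not missing:
        return "cancelled"  # same media was already transcoded

    with db:  # record the results and the new video state together
        for size in sorted(missing):
            # ... generate the thumbnail here ...
            db.execute(
                "INSERT INTO thumbnails (video_id, size) VALUES (?, ?)",
                (video_id, size),
            )
        db.execute(
            "UPDATE videos SET state = 'transcoded' WHERE id = ?", (video_id,)
        )
    return "ran"
```

Delivering the same message twice runs the work once and cancels the second time, which is what makes it safe to re-send jobs freely.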
Also (and I know you’re not saying this), I’ve never understood the argument that keeping queues in Postgres leads to higher data integrity via transaction guarantees. The job still runs in another process outside of the db. The only time this could be true is if the job itself mostly updates state in the db, and that’s a small minority of queued workloads (the majority need to do non-db compute work).