Automating this with linter and formatter is great. It moves the argument over style and format to a one liner change to a lint config instead of mingling it with the with the main code change.
If you continue to have people bringing up arguments over the linter and formatter after an initial agreement is made, then you can talk to those people
pop() or read() in a loop, yes. can read 1 message or many messages at a time.
what we do at Tembo in our infrastructure is pause for up to a few seconds if a read() returns no messages. when there are messages, then we read() with no pause in between. when the queues are empty it amounts to less than one query per second. there is not much cost to reading frequently if you use a client side connection pool, or a server side pool like pgbouncer.
I agree. But it can be useful to have a guarantee, even for a specified period of time, that the message will only be seen once. For example, if the processing of that message is very expensive, such as if that message results in API requests to a very expensive SaaS service. It may be idempotent to process that message more times than necessary, but doing so may be cost prohibitive if you are billed per request. I think this is a case where using the VT to help you only process that message one time could help out quite a bit.
Agreed that this property is useful for efficiency. The danger comes from users believing this property is a guarantee and making correctness decisions based on it. There is no such thing as a distributed lock or exclusive hold.
This is more or less where my brain was heading. The utility of exactly once system is quite high. But the utility of coding knowing that it is at least once, hopefully, and if not we'll do a catch-up, means that I will code knowing that my work must be safe in an environment with these constraints.
The best argument I have for a postgres background worker queue is simplicity. You're already managing a db, why not have the db simplify your startup's infrastructure. Honestly, with heroku, sidekiq, and redis it isn't even simplifying. But for startups simplicity is useful so they can scale when they need to and avoid complexity when possible. But that's the only argument I'd make.
IMO, it is most valuable when you are looking for ways of reducing complexity. For a lot of projects, if you're already running Postgres then it is maybe not worth the added complexity of bringing in another technology.
If the message never reaches the queue (network error, database is down, app is down, etc), then yes that is a 0 delivery scenario. Once the message reaches the queue though, it is guaranteed that only a single consumer can read the message for the duration of the visibility timeout. FOR UPDATE guarantees only a single consumer can read the record, and the visibility timeout means we don't have to hold a lock. After that visibility timeout expires, it is an at-least-once scenario. Any suggestion for how we could change the verbiage on the readme to make that more clear?
You are talking about distributed systems, nobody expects to read "exactly once" delivery. If I read that on the docs, I consider that a huge red flag.
And the fact is that what you describe is a performance optimization, I still have to write my code so that it is idempotent, so that optimization does not affect me in any other way, because exactly once is not a thing.
All of this to say, I'm not even sure it's worth mentioning?
You should let the Apache Flink team know, they mention exactly-once processing on their home page (under "correctness guarantees") and in their list of features.
Exactly once delivery is not the same as exactly once processing.
Exactly once delivery is an impossibility; see the two generals problem. Exactly once processing is usually at-least-once delivery coupled with deduplication or an otherwise idempotent action
You CAN have "at most once" processing, since there is always the option of processing zero messages due to systems being down, but it's a joint effort, nobody is able to offer it "for free" (as in, no effort to implement it)
> Once the message reaches the queue though, it is guaranteed that only a single consumer can read the message for the duration of the visibility timeout.
But does the message get persisted and replicated before any consumer gets the message (and after the submission of the message is acked)? If it's a single node, the answer is simply "no": the hard drive can melt before anyone reads the message and the data is lost. It's not "exactly once" if nobody gets the message.
And if the message is persisted and replicated, but there's subsequently a network partition, do multiple consumers get to read the message? What happens if writing the confirmation from the consumer fails, does the visibility timeout still expire?
> After that visibility timeout expires, it is an at-least-once scenario.
That's not really what "at least once" refers to. That's normal operation, and sets the normal expectations of how the system should work under normal conditions. What matters is what happens under abnormal conditions.
> FOR UPDATE guarantees only a single consumer can read the record, and the visibility timeout means we don't have to hold a lock
This is not always true!
Postgres does snapshot isolation within transactions that begins on the first statement within a transaction. This means that some isolation levels like REPEATABLE READ can feel misleadingly “safer” but actually break your guarantee. From the docs:
> Note also that if one is relying on explicit locking to prevent concurrent changes, one should either use Read Committed mode, or in Repeatable Read mode be careful to obtain locks before performing queries. A lock obtained by a repeatable read transaction guarantees that no other transactions modifying the table are still running, but if the snapshot seen by the transaction predates obtaining the lock, it might predate some now-committed changes in the table. A repeatable read transaction's snapshot is actually frozen at the start of its first query or data-modification command (SELECT, INSERT, UPDATE, DELETE, or MERGE), so it is possible to obtain locks explicitly before the snapshot is frozen.
Well that only is a problem with locks if you actually need locks to satisfy your guarantee and you acquire the lock after your first read. However repeatable reads are never a good guarantee without locking
As u/Fire-Dragon-DoL says, you can have an exactly-once guarantee within this small system, but you can't extend it beyond that. And even then, I don't think you can have it. If the DB is full, you can't get new messages and they might get dropped, and if the threads picking up events get stuck or die then messages might go unprocessed and once again the guarantee is violated. And if processing an event requires having external side-effects that cannot be rolled back then you're violating the at-most-once part of the exactly-once guarantee.
> you can have an exactly-once guarantee within this small system
It's actually harder to do this in a small system. I submit a message to the queue, and it's saved and acknowledged. Then the hard drive fails before the message is requested by a consumer. It's gone. Zero deliveries, or "at most once".
> If the DB is full, you can't get new messages and they might get dropped
In this case, I'd expect the producer to receive a failure, so technically there's nothing to deliver.
> if the threads picking up events get stuck or die
While this obviously affects delivery, this is a concurrency bug and not a fundamental design choice. The failures folks usually refer to in this context are ones that are outside of your control, like hardware failures or power outages.