
This response lacks nuance and focuses too much on the operational side of queues.

Not all queues are “realistically bounded”. Let’s take a hypothetical silly “order queue” where each entry represents a €10 item that’s been purchased. The queue feeds into a boxing machine that puts the item in a box and posts it to the user.

The point at which a queue hits an upper bound would represent billions of euros of orders. That’s not really a realistic failure mode to hit without noticing long beforehand.

Meanwhile, if you always want your queue empty then your boxing machine is going to be shipping thousands of small boxes because the moment an order arrives, it’s processed into a box.

It’s much more efficient to have a small, managed, and time-bounded backlog of orders that the boxing machine uses to batch orders together into a smaller number of larger boxes.



Wrong. Your thinking is where the nuance is lacking. As I said, you need to think of the system in terms of its derivative. The rate of data consumed must, on average, be faster than the rate of production. Counting data going in as positive and data consumed as negative, the net of these rates must be a negative velocity.

There is no system on the face of the earth that can sustain an average positive velocity. That is categorical failure.

This is mathematically and logically true no matter how big your queue is.

Even for the situation you're describing... a really big ass queue, the average velocity of that system must still be negative. Unless your queue is so big that it can absorb an average positive velocity over the lifetime of the system, which is kind of absurd.

Most of the time your queues should be empty. If your queues are often observed holding data without being full, it means you're still operating at a negative velocity, but you've tuned your system to run at the border of failure.

It means your system hits unsustainable velocities often. You may want this for cost reasons, but such a system is not robust enough for my comfort. The queue should be empty most of the time, with occasional spikes at most.
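
A minimal sketch of the velocity argument, if it helps (all the numbers here are made up for illustration): fixed consumption rate, variable production, and the queue depth over time. If average production is even slightly above consumption, depth grows without bound; slightly below, and the queue hovers near empty.

    import random

    random.seed(1)
    CONSUME_PER_TICK = 10  # fixed service rate (assumed)

    def simulate(mean_produce, ticks=10_000):
        """Final queue depth after `ticks` steps of produce-then-consume."""
        depth = 0
        for _ in range(ticks):
            produced = random.randint(0, 2 * mean_produce)  # variable inflow, mean = mean_produce
            depth = max(0, depth + produced - CONSUME_PER_TICK)
        return depth

    print(simulate(mean_produce=9))   # negative net velocity: depth stays near 0
    print(simulate(mean_produce=11))  # positive net velocity: depth ~ 1 * ticks = 10,000

No queue size changes the second outcome; a bigger queue only delays when it overflows.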


This is a very long-winded way of saying “you should have enough capacity to process all items in the queue before the queue fills up and everything explodes”.

I agree. But that’s not the argument the post is making, nor does it support the position that “queues should be empty”. No, they shouldn’t. It’s fine to let a queue build up (and thus not be empty), then process all the items in the queue in a short time frame. There are efficiency savings if you are able to do this.


No. That is not what I'm saying. I'm saying that queues must, on average, be empty most of the time. And if you are seeing queues that are filled most of the time, your entire system is operating on the border of failure.

This is true even when you have a big ass queue. The system must on average be more empty than it is filled.

This is not to say that your system can never have spikes in data. Your system can have spikes often, but they can never be the norm: the spikes in production cannot push the average production rate above the average consumption rate.


I don’t see how this holds true. Let’s say I have a process that takes 1 hour to process a batch of up to 1000 items (i.e. it doesn’t scale linearly with input size). So, 1 hour to process 1 item and the same 1 hour to process 1000 items.

If I have 1 message per minute being added to the queue, I can safely run 2 of these batch processes once a day to clear the queue. Or, more generally, I can launch ceil(queue_size / 1000) tasks once a day.

Because surely it doesn’t matter whether the queue is empty; what matters is the rate at which messages can be processed.

With 2 processes running once a day, the rate would be 2000/24/60 ≈ 1.4 messages per minute in aggregate.

The queue can be non-empty 23 hours per day without issue and without being on the brink of a failure state.

Edit: you edited your comment whilst I was replying to specifically mention consumption rate. The aggregate consumption rate is all that matters, but it doesn’t follow that the queue “has to be more empty than not empty”. Those are different issues.
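
For what it’s worth, the arithmetic checks out in a quick simulation (numbers straight from my example: 1 message/minute in, batch workers that each clear up to 1000 items in one fixed hour, launched once a day):

    MINUTES_PER_DAY = 24 * 60
    inflow_per_day = 1 * MINUTES_PER_DAY      # 1 message/minute -> 1440/day

    queue = 0
    for day in range(7):
        queue += inflow_per_day               # queue fills all day
        workers = -(-queue // 1000)           # ceil(queue / 1000) batch tasks
        queue -= min(queue, workers * 1000)   # each clears up to 1000 in its 1 hour
        print(f"day {day}: {workers} workers, queue after run = {queue}")

    print(2000 / MINUTES_PER_DAY)             # aggregate capacity ~= 1.39 msg/min

The queue drains to zero once a day and the system is stable, even though it is non-empty almost all of the time.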


You're talking about a different scenario. I'm talking about continuous inflow and outflow, where outflow is processing (fixed rate) and inflow is traffic (variable). This is the classic assumption. In this state, the space used in the queue is an indicator of mismatched consumption and production velocities.

You're talking about a single batch job that runs once a day and empties the queue in the same amount of time no matter how much data is in it. Thus the more data you grab per batch, the more efficient the processing, so you grab data once per day instead of continuously. This is a clear exception from the assumed norm that needs to be mentioned.

That is not to say that your case is invalid. Many analytic databases have ingestion patterns like yours. But the case is exceptional enough that, if it is not explicitly mentioned, it can be assumed that such a batch job is not a primitive in the model being discussed.


I disagree that it’s exceptional, even if my examples have been caricatures.

Very few technical problems have no elasticity at all in their outflows, and a lot of problems benefit from some form of batching. Heck, even something as simple as inserting records into a non-analytic database benefits greatly from batching (inserting N rows rather than 1, and thus reducing TPS), as does calling an external service, writing files to some storage, running inferences, writing to a ledger, or anything else.

Batching, in all forms, exists absolutely everywhere from super-scalar CPU uops to storage to networking to the physical world. The exception is to do something without batching, even if the whole stack you’re using from the ground up is built on it.

Given the common use case of batching, which inherently requires items to be queued for a period of time, how can you say that queues should always be empty? They would statistically rarely be empty. Which is a good thing. Because it enables batching.

If you want to queue and insert 1000 records into a database with 1000 transactions across 1000 workers, and then say your workload is highly optimised because there are never any records in your queue: cool. Not very true though.
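
To make the N-rows-vs-1 point concrete, here’s a self-contained sketch using sqlite3 from the standard library as a stand-in for any OLTP database (the table and row counts are invented; the gap is far larger against a real networked database, where every commit pays a round trip and an fsync):

    import sqlite3, time

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    rows = [(i, 10.0) for i in range(1000)]

    t0 = time.perf_counter()
    for row in rows:                          # 1000 transactions, one row each
        conn.execute("INSERT INTO orders VALUES (?, ?)", row)
        conn.commit()
    print("row-at-a-time:", time.perf_counter() - t0)

    t0 = time.perf_counter()
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()                             # one transaction, 1000 rows
    print("batched:      ", time.perf_counter() - t0)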


>Very few technical problems don’t have any elasticity in outflows,

They have elasticity, but you don't care for it. You run it at the maximum. As you mentioned, the only case where you wouldn't is if you have some transaction that doesn't scale linearly with frames/events.

>Given the common use case of batching, which inherently requires items to be queued for a period of time, how can you say that queues should always be empty? They would statistically rarely be empty. Which is a good thing. Because it enables batching.

In my experience, while batching is not uncommon, streaming is the much more common use case. So we can agree to disagree here.


It's not really that exceptional. You can do similar things with e.g. batch flushes every 5 ms for synchronous web requests and the same ideas will apply. This can give a massive increase in throughput while possibly decreasing overall response time for OLTP.


It's common, but exceptional enough that it should be mentioned, because it assumes a specialized database that does batch ingestion.

Let's say you flew from one city to another. People assume you took a plane. If you took a helicopter, then even though helicopters are common, it should be mentioned; otherwise it won't be assumed.


It doesn't take a special database though. You can do this sort of thing in your application in like 10-20 lines of code (assuming you have a framework with queues and promises). It works great with mysql/postgresql, which both benefit dramatically from batching queries.
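
Here’s a sketch of that queues-and-promises pattern with asyncio, using the 5 ms flush interval from my other comment (the doubled value stands in for a real batched query; every name here is illustrative, not any particular framework’s API):

    import asyncio

    pending = []  # (item, future) pairs awaiting the next flush

    async def flusher():
        """Every 5 ms, drain the backlog and process it as one batch."""
        while True:
            await asyncio.sleep(0.005)
            batch, pending[:] = pending[:], []
            for item, fut in batch:
                fut.set_result(item * 2)      # stand-in for one batched DB write

    async def handle_request(item):
        """Looks synchronous to the caller: enqueue, then await the batch result."""
        fut = asyncio.get_running_loop().create_future()
        pending.append((item, fut))
        return await fut

    async def main():
        task = asyncio.ensure_future(flusher())
        print(await asyncio.gather(*(handle_request(i) for i in range(10))))
        task.cancel()

    asyncio.run(main())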


The context here is external queues like Kafka. If you're writing stuff to a queue inside your web app, then it must be a relatively small load. This is different.

For analytics databases, the batch ingestion requirements are usually so big that you have to save the data to the file system, hence the need for an external queue.

Heap-allocated queues are different; at that stage I would say the data is already in "processing" and has been popped off the external queue.


I guess "relatively small load" is... relative, but I've written that kind of thing to be able to handle ~70k sustained OLTP requests per second (including persistence to postgres) when load testing locally on my laptop. In any case the same thing applies to external queues. Your workers will often be more efficient if you pull off chunks of work to process together.


By small load I mean size: the frames in your batch jobs are tiny and few in number if you're getting 70k of them per second.

What you're doing here is more similar to streaming. In that case my velocity measurements are more applicable: you want your queues to be empty in general. A batch job that runs every hour or something like that is not what you're doing here.

If you ran your load test for 5 minutes and you see your queues are 50 percent full, that means 10 minutes in you'll hit OOM, assuming your load test runs at a constant rate.

If your queues are mostly empty, then the system can handle the load you gave it and has room for spikes. It's just math.


Queues can be full most of the time without the system being in a precarious state. Take for example an extremely spiky pattern of processing: once a day, a big batch of jobs is delivered. You know ahead of time how many items there are and have a pretty good handle on how long it takes to process one item. The items should be processed before the next batch arrives. You can then tune your system to empty the queue just before the next batch arrives. The queue is non-empty the majority of time, yet the system is perfectly stable.

Now, that is an extreme example, but similar behavior exists. It’s fine to have queues partially filled most of the time, as long as your average rate of processing exceeds the rate of incoming items by some margin.


As I said in the other post, I am talking about the average and most likely use case: a flat and continuous rate of data processing and varied rates of data production.

Spiking the processing is not what people think about with queues and is most likely NOT what the author of the article is talking about.

Typically there's no reason why you would spike your processing unless such processing doesn't scale linearly with the number of items processed (as the author in a sibling post to yours mentioned). Such a case must be deliberately introduced into the discussion as an exception to the model, because it's not what people think about when discussing this. Or maybe it is, given that 3 people brought up the same exact exception. Welp, if that's the case then I can only say the author of this article and I certainly aren't referring to this use case.

Edit: I'm rate limited, so I'm referencing another post in my response that you won't see until the rate limit dies and I post it.


> You may want this for cost reasons, but such a system is not robust enough for my comfort

Your comfort is secondary to the business needs that the system is designed to fulfill. In practice, some risks are better to accept than to mitigate.


My comfort is a measure of the business's aversion to risk. It depends on the businesses of our respective industries.

Either way, you have admitted that you tuned the system to operate with an acceptance of risk, likely for cost reasons. This is fine, if you deliberately chose to do this.


> This is fine, if you deliberately chose to do this.

Yet you bravely asserted that

> At best you can account for occasional spikes as the article stated but a queue even half full should never be the operational norm.

Suddenly this thing that you said "should never be" changed to "is fine".


When humans communicate, we don't talk about exceptions where you deliberately choose to operate outside the norm. If you deliberately choose to drive a car without a seat belt, knowing the risk, that's fine.

You want to skydive while deliberately choosing not to use a parachute? That's fine. But don't expect me to adjust my communication to account for your exception. You chose to do this.

But let's not pretend that you don't know about this fact of human communication. It's an obvious thing, and you know it. Yet you chose to characterize your post in a way that makes my statement look unreasonable and your actions look normal. Please don't do this. If you choose to skydive without a parachute, that's fine, but admit you are operating with risk for the sake of cost. Do not try to mischaracterize my intentions; that is just rude.



