I'm no expert, but I don't think this is true. Couldn't tasks arrive into the queue at the same rate that they are processed, resulting in a fixed queue size at 100% utilization?
Put another way, in the second "No good" diagram showing one task being worked on with none in the queue, another task could arrive before the current one is finished.
I suspect the counterargument might be that the task arrival rate is not usually that predictable, but even so, I expect that the amount of variance predicts how true this theory is in practice.
> What’s to prevent the system from bouncing between “1 task in processor & 1 task in queue” and “1 task in processor & 2 tasks in queue” while maintaining 100% utilization?
> Nothing! That could totally happen in a queueing system. However, the arrival process would need to be tuned quite precisely to the processing rate. You would need to watch the status of the processor and, when a task finishes, only then insert a new task into the queue. But this implies a shell game: if you always have a task ready to put into the queue when it needs to be padded back out, then isn’t that task in a sense already “queued”?
> Instead, in queueing theory we usually assume a random arrival process: sometimes a minute may pass between arrivals into the queue; sometimes only a second. So the system can’t bounce for arbitrarily long between 1-in-queue and 2-in-queue states. Eventually, one of two things will randomly occur:
> 1. From the 1-in-queue state, the active task finishes processing before a new task arrives, bringing queue size to 0.
> 2. From the 2-in-queue state, a new task arrives in the queue before the active task finishes, causing the queue size to grow to 3.
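To see that drift concretely, here's a minimal simulation sketch (mine, not the article's), assuming Poisson arrivals and exponential service times with equal rates, i.e. an M/M/1 queue at 100% utilization:

```python
import random

def peak_queue(rate, horizon, seed=0):
    """Event-driven M/M/1 simulation with arrival rate == service rate
    (100% utilization); returns the largest queue length observed."""
    rng = random.Random(seed)
    t, in_system, peak = 0.0, 0, 0
    next_arrival = rng.expovariate(rate)
    next_departure = float("inf")           # server idle
    while t < horizon:
        if next_arrival <= next_departure:  # next event: an arrival
            t = next_arrival
            in_system += 1
            peak = max(peak, in_system)
            next_arrival = t + rng.expovariate(rate)
            if in_system == 1:              # server was idle: start serving
                next_departure = t + rng.expovariate(rate)
        else:                               # next event: a departure
            t = next_departure
            in_system -= 1
            next_departure = (t + rng.expovariate(rate)
                              if in_system else float("inf"))
    return peak

# The longer you watch, the bigger the excursions get.
for horizon in (10**3, 10**4, 10**5):
    print(horizon, peak_queue(rate=1.0, horizon=horizon))
```

The queue empties again and again, but the peak backlog keeps climbing the longer you run it, which is the "eventually, one of two things will randomly occur" dynamic playing out over and over.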
A classic misunderstanding of queueing. If you have perfect control over arrivals, you can consider them as already queued. But, e.g., bank customers, grocery-store checkout customers, and HTTP requests to an online service all arrive at random intervals; you don't have control over their arrival timing.
This is not helped by the language used in the post. This was supposed to be an introduction to queueing; people shouldn't come away with their classic misunderstandings completely intact.
> When a system has steady-state 100% utilization, its queue will grow to infinity. (Emph mine.)
This is clearly false; there are counterexamples where the queue does not grow that way. You don't have control over the arrival rate, etc., but it could just be your lucky day. It would be more accurate to say that you can't guarantee that either the queue length or the wait time is finite. The author didn't include the words "assume the worst case" or anything like that. I feel like that's a pretty crucial piece of framing that's missing.
The author didn't assume the worst case; he just didn't set the timeframe. Every queue with random arrival times whose processors operate at 100% of their capacity is guaranteed to eventually grow indefinitely (as in: you can name any size, and the queue will eventually become larger than it).
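For intuition (my heuristic sketch, not from the article): at 100% utilization the queue length behaves like a random walk reflected at zero, which in the diffusion limit looks like reflected Brownian motion:

```latex
\mathbb{E}[Q(t)] \sim c\sqrt{t} \quad (t \to \infty),
\qquad
\mathbb{P}\Big(\sup_{s \le t} Q(s) > M\Big) \xrightarrow[t \to \infty]{} 1
\ \text{ for every fixed } M.
```

So the queue keeps returning to empty, yet its running maximum still climbs past any level you name.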
But in practice you can't maintain a 100% workload indefinitely either.
Random dice rolls don’t preclude rolling 1 again and again forever, right? There’s an assumption here about the distribution of arrival times that hasn’t been stated.
The model (M/M/1) is set up using averages, not the worst or best case. If you roll a die infinitely many times and get nothing but 1s, then you have either got a loaded die (in which case its distribution is not the generally assumed uniform distribution from 1 to n, but a point mass at 1) or are extremely lucky; either way, the situation is far from average.
> There's an assumption here about the distribution of arrival times that hasn't been stated.
Yes, it has, though not as explicitly as you might like. The author explicitly stated that they were considering an M/M/1 queue, and the "What about 90%?" section specifically calls out which distributions they're using (the same ones you'd expect from that page on M/M/1 queues).
For the system under discussion, the average arrival rate is the first parameter, and the average processing rate is the second parameter. If you decide to only consider the case where everyone shows up half as often as the average, then you've changed the distribution being considered. Similarly, if you decide to only consider the case where tasks are completed twice as fast, then you've, again, changed the distribution.
Queuing theory isn't generally about picking a specific circumstance and saying, "Look, you can get really lucky and clear the queue in 10 minutes instead of 10 years, even though on average we would never expect that." The primary motivation (in applying it) is to consider the average circumstances. On average, if the arrival rate in an M/M/1 queue is the same as or faster than the processing rate, the queue size will grow indefinitely. Sure, you could get lucky and have a really slow day for customers and a really on-their-game worker; that is statistically possible. But over an arbitrarily long period of time we'd expect the arrival rate to outpace the processing rate, and the queue to grow.
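For reference, the textbook M/M/1 steady-state formulas (standard results, not quoted from the article), with arrival rate λ, service rate μ, and utilization ρ = λ/μ < 1:

```latex
\rho = \frac{\lambda}{\mu}, \qquad
L = \frac{\rho}{1 - \rho} \ \text{(mean number in system)}, \qquad
W = \frac{1}{\mu - \lambda} \ \text{(mean time in system)}.
```

At ρ = 0.9 that's an average of 9 tasks in the system; as ρ → 1 both L and W diverge, and for ρ ≥ 1 no steady state exists at all.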
Yes, I understand that, at least at a high level. I’m making a narrow point that the article didn’t teach us this and it actually was essential to understanding the claim. Quoting the article again:
> I won’t go into the M/M/1 part, as it doesn’t really end up affecting the Important Thing I’m telling you about. What does matter, however, is that ‘∞’.
This turned out to be wrong. It does end up affecting the Important Thing and the discussion was severely incomplete without it.
The steady state is like Big-O notation; it's already talking about the limiting behavior (as time grows to infinity). A single day's deviation doesn't really matter: your queue might shrink and grow a little day to day, but on average it'll grow more than it shrinks, and you're ultimately fucked.
If your utilization looked like 100% on one day, 0% on another, and flipped back and forth (allowing the queue to clear), then by definition your steady state isn't 100%; it's more like 50%.
And of course, if your time bound is limited, or your pool of potential customers is limited, then naturally everything is limited. But like big-O, the notion is meant to describe what happens as your input grows to infinity.
Yes, that is a very basic point: one basic reason to use queues is that they are buffers for uncertainty. If things are perfectly predictable, as you see on some robots, you allocate exactly what you need and no extra.
The uncertainty exists in two places, though: it's not just the events coming in, but also the time to process each queued event, which is another source of uncertainty in the same system and can have the same effect.
The author mentions "M/M/1"; this specifies that there is only one queue-handling process, that events come in with the sort of randomness that raindrops have (no periodicity), and that handling an event has neither "timeouts" nor "long tails", where once a handler has been occupied for 3 or 4 T (T being the average time to resolve an event) the prediction for how much time remains either drops to 0 (timeout) or escalates dramatically (long tail). The exponential distribution is used partly because it sits nicely in between these: it is a model where, over a small time dt, the handler picks a random number with success probability p = dt/T; if they pick the right random number, the event is handled; otherwise a new random number is generated, repeat ad nauseam.
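To illustrate that last description (a hypothetical sketch of the same idea, not the author's code): repeating a Bernoulli trial with success probability p = dt/T every dt reproduces, as dt shrinks, an exponentially distributed handling time with mean T:

```python
import random

def handling_time(T, dt=0.01, rng=random.random):
    """Every interval dt, 'pick a random number' with success
    probability p = dt / T; return the time until first success."""
    t = 0.0
    while True:
        t += dt
        if rng() < dt / T:
            return t

samples = [handling_time(T=3.0) for _ in range(10_000)]
print(sum(samples) / len(samples))   # ~3.0, the exponential mean
# As dt -> 0: P(time > t) = (1 - dt/T)**(t/dt) -> exp(-t/T),
# i.e. the memoryless exponential distribution with mean T.
```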
The basic lessons are surprisingly transferable, though. For example, traffic on the highway does not have this raindrop sort of randomness, because cars kind of repel each other; they space out much more uniformly than you would predict on short time scales. The handling time is also nothing like this model of random guessing. Nevertheless, traffic jams still form at high utilization! And they form for a really interesting reason, which is that the n handlers (lanes, we call them) have correlations between them. All these queueing models optimistically assume that one handler never stops another ("hey, have you ever seen anything like this?", "where can I find that password?", etc.).
But in traffic, a car can only change from one lane to another if both lanes synchronize speed. With a fixed ability to accelerate, this depends on the amount of space between you and the next car, and on the amount of space in the lane you want to switch to: as utilization increases, both of these spaces shrink and synchronization becomes a massive source of inefficiency.
The same happens at a company, too: an employee might spend more time waiting for meetings, and finding convenient times to schedule them, in order to synchronize two streams of work, than the work itself takes. It can be very frustrating!
It does not immediately grow towards infinity, but it is unbounded. Over a sufficiently long period of time, any given queue size will be observed.
Assuming the input and output are just a little bit random, and are only equal on average: you will have periods during which the input and output balance out. You will have periods during which the output is faster, and the queue will shrink to 0 (but no lower). And you will have periods where the output is slower, and the queue grows, with no bound.
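A toy version of that picture (my sketch, abstracting arrivals and completions to fair coin flips) is a random walk floored at zero:

```python
import random

def floored_walk(steps, seed=1):
    """+1 with prob 1/2 (an arrival wins), -1 with prob 1/2 (a
    completion wins), never below 0; returns final and peak length."""
    rng = random.Random(seed)
    q = peak = 0
    for _ in range(steps):
        q = max(0, q + rng.choice((1, -1)))
        peak = max(peak, q)
    return q, peak

for steps in (10**4, 10**6):
    print(steps, floored_walk(steps))
```

The walk hits 0 over and over (the output can't outpace an empty queue), but its peak keeps growing, roughly like the square root of the number of steps.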