The assumption you're looking for is there but relatively inconspicuous. He mentioned an M/M/1/∞ queue but then didn't go into the details about the M/M/1 part. From the Wikipedia page[0]:
> an M/M/1 queue represents the queue length in a system having a single server, where arrivals are determined by a Poisson process...
So tasks do not arrive at any fixed rate; arrivals follow a Poisson process, which means the gaps between successive arrivals are exponentially distributed. In the article he did reference this fact right before the first plot:
> (For the wonks: I used a Poisson arrival process and exponentially distributed processing times)
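To see what "Poisson arrival process" means concretely, here is a minimal sketch: the gaps between arrivals are drawn from an exponential distribution, so some are much shorter than the average and some much longer. The rate of 1 arrival per second is purely illustrative, not from the article.

```python
import random

rng = random.Random(42)
rate = 1.0  # hypothetical: 1 arrival per second on average

# In a Poisson process, inter-arrival gaps are exponentially
# distributed with mean 1/rate -- they are random, not fixed.
gaps = [rng.expovariate(rate) for _ in range(5)]
print(gaps)
```

Run it and you'll see five gaps scattered around the mean of 1.0 rather than evenly spaced ticks.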
Finally, he answered this question more directly in the comments on the article:
> What’s to prevent the system from bouncing between “1 task in processor & 1 task in queue” and “1 task in processor & 2 tasks in queue” while maintaining 100% utilization?
> Nothing! That could totally happen in a queueing system. However, the arrival process would need to be tuned quite precisely to the processing rate. You would need to watch the status of the processor and, when a task finishes, only then insert a new task into the queue. But this implies a shell game: if you always have a task ready to put into the queue when it needs to be padded back out, then isn’t that task in a sense already “queued”?
> Instead, in queueing theory we usually assume a random arrival process: sometimes a minute may pass between arrivals into the queue; sometimes only a second. So the system can’t bounce for arbitrarily long between 1-in-queue and 2-in-queue states. Eventually, one of two things will randomly occur:
> 1. From the 1-in-queue state, the active task finishes processing before a new task arrives, bringing queue size to 0.
> 2. From the 2-in-queue state, a new task arrives in the queue before the active task finishes, causing the queue size to grow to 3.
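The random walk described in those two quoted outcomes is exactly why waiting time blows up as utilization approaches 1. A minimal M/M/1 simulation sketch makes this visible; the rates `lam` and `mu` below are illustrative values I chose, not numbers from the article.

```python
import random

def avg_wait(lam, mu, n=100_000, seed=1):
    """Average time a task waits (excluding service) in an M/M/1
    queue: Poisson arrivals at rate lam, exponential service at
    rate mu, simulated over n tasks."""
    rng = random.Random(seed)
    t = 0.0      # arrival clock
    free = 0.0   # time the single server next becomes free
    total = 0.0
    for _ in range(n):
        t += rng.expovariate(lam)   # Poisson process: exponential gaps
        start = max(t, free)        # wait only if the server is busy
        total += start - t
        free = start + rng.expovariate(mu)
    return total / n

# Theory: with rho = lam/mu, E[wait] = rho / (mu - lam),
# which diverges as rho -> 1 (utilization -> 100%).
print(avg_wait(0.5, 1.0))  # rho = 0.5: modest waits
print(avg_wait(0.9, 1.0))  # rho = 0.9: much longer waits
```

At 50% utilization the average wait stays near one service time; at 90% it is several times larger, matching the article's point that queues explode near full utilization.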
Of course I know what was missing. My comment was more about making the implicit more explicit. I am not the only one who immediately thought of the above as a counter-example to the bold text (repeated twice) in the article.
[0]: https://en.wikipedia.org/wiki/M/M/1_queue