The assumption you're looking for is there but relatively inconspicuous. He mentioned an M/M/1/∞ queue but then didn't go into the details about the M/M/1 part. From the Wikipedia page[0]:
> an M/M/1 queue represents the queue length in a system having a single server, where arrivals are determined by a Poisson process...
So tasks do not arrive at any fixed rate; arrivals follow a Poisson process, which means the gaps between successive arrivals are exponentially distributed. In the article he did reference this fact right before the first plot:
> (For the wonks: I used a Poisson arrival process and exponentially distributed processing times)
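To see what "Poisson arrival process" means concretely, here is a minimal sketch: the gaps between arrivals are drawn from an exponential distribution, so some are much shorter than the average and some much longer. The rate of 1 arrival per second is purely illustrative, not from the article.

```python
import random

rng = random.Random(42)
rate = 1.0  # hypothetical: 1 arrival per second on average

# In a Poisson process, inter-arrival gaps are exponentially
# distributed with mean 1/rate -- they are random, not fixed.
gaps = [rng.expovariate(rate) for _ in range(5)]
print(gaps)
```

Run it and you'll see five gaps scattered around the mean of 1.0 rather than evenly spaced ticks.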
Finally, he answered this question more directly in the comments on the article:
> What’s to prevent the system from bouncing between “1 task in processor & 1 task in queue” and “1 task in processor & 2 tasks in queue” while maintaining 100% utilization?
> Nothing! That could totally happen in a queueing system. However, the arrival process would need to be tuned quite precisely to the processing rate. You would need to watch the status of the processor and, when a task finishes, only then insert a new task into the queue. But this implies a shell game: if you always have a task ready to put into the queue when it needs to be padded back out, then isn’t that task in a sense already “queued”?
> Instead, in queueing theory we usually assume a random arrival process: sometimes a minute may pass between arrivals into the queue; sometimes only a second. So the system can’t bounce for arbitrarily long between 1-in-queue and 2-in-queue states. Eventually, one of two things will randomly occur:
> 1. From the 1-in-queue state, the active task finishes processing before a new task arrives, bringing queue size to 0.
> 2. From the 2-in-queue state, a new task arrives in the queue before the active task finishes, causing the queue size to grow to 3.
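The random walk described in those two quoted outcomes is exactly why waiting time blows up as utilization approaches 1. A minimal M/M/1 simulation sketch makes this visible; the rates `lam` and `mu` below are illustrative values I chose, not numbers from the article.

```python
import random

def avg_wait(lam, mu, n=100_000, seed=1):
    """Average time a task waits (excluding service) in an M/M/1
    queue: Poisson arrivals at rate lam, exponential service at
    rate mu, simulated over n tasks."""
    rng = random.Random(seed)
    t = 0.0      # arrival clock
    free = 0.0   # time the single server next becomes free
    total = 0.0
    for _ in range(n):
        t += rng.expovariate(lam)   # Poisson process: exponential gaps
        start = max(t, free)        # wait only if the server is busy
        total += start - t
        free = start + rng.expovariate(mu)
    return total / n

# Theory: with rho = lam/mu, E[wait] = rho / (mu - lam),
# which diverges as rho -> 1 (utilization -> 100%).
print(avg_wait(0.5, 1.0))  # rho = 0.5: modest waits
print(avg_wait(0.9, 1.0))  # rho = 0.9: much longer waits
```

At 50% utilization the average wait stays near one service time; at 90% it is several times larger, matching the article's point that queues explode near full utilization.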
Of course I know what was missing. My comment was more about making the implicit more explicit. I am not the only one who immediately thought of the above as a counter-example to the bold text (repeated twice) in the article.
[0]: https://en.wikipedia.org/wiki/M/M/1_queue