Ah. That's classic queuing theory. It has a problem.
The early work on network congestion came from Kleinrock, who wrote the classic "Queueing Systems". Kleinrock did his PhD thesis at MIT on Western Union Plan 55-A, a telegram switching system which can be thought of as Sendmail built out of relays and paper tape. Message switches look like a classic arrival rate / service rate problem. They have little customer-level back-pressure; you can send an email regardless of whether the transmission system is backed up. So an open-loop analysis works fine.
The ARPANET had flow control on each link. Nothing could send a message until there was a buffer ready to receive it. So no packets were lost due to congestion. All overload is stopped at the sender. That approach is immune to congestion collapse, but not to lockup.
Then came the pure datagram networks, and TCP/IP. Anybody can send an IP datagram any time they want to, regardless of the network status. So overloads and packet loss are possible. TCP uses retransmission to hide that, imperfectly. This introduces a new set of problems, some of which were non-obvious at the time.
Classical queuing theory is open-loop. Arrival rate is considered to be independent of wait time. In the real world, it's not. Not even for store cashiers. If arrival rate exceeds service rate, the line length does not really grow without bound except in desperate situations. Customers leave without buying and take their business elsewhere. If there is no cashier idle time, the line length will increase only until the customer loss rate increases to match. Many retail managers do not get this.
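The difference is easy to see in a toy model (a rough sketch with made-up numbers, not a model of any real store):

    import random

    def simulate(ticks=100_000, p_arrive=0.6, p_serve=0.5, balk_at=None):
        # Single cashier, discrete time.  Each tick at most one customer
        # arrives (prob p_arrive) and at most one is served (prob p_serve).
        # With balk_at set, an arrival who sees a line that long walks out
        # instead of joining -- that's the closed-loop behavior.
        line = balked = 0
        for _ in range(ticks):
            if random.random() < p_arrive:
                if balk_at is not None and line >= balk_at:
                    balked += 1            # lost sale, not a longer line
                else:
                    line += 1
            if line and random.random() < p_serve:
                line -= 1
        return line, balked

    random.seed(1)
    print("open loop:   final line length", simulate()[0])
    print("closed loop: final line length, lost customers", simulate(balk_at=10))

With the balk threshold in place the cashier still never sits idle; the overload shows up as lost customers instead of an ever-longer line.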
I coined the term "congestion collapse" in 1984.[1] In 1985, I wrote, in my "On Package Switches with Infinite Storage" RFC, "We have thus shown that a datagram network with infinite storage, first-in-first-out queuing, and a finite packet lifetime will, under overload, drop all packets."[2]
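The intuition, as a back-of-the-envelope sketch (my numbers here are made up; this isn't the RFC's analysis): with offered load at twice the link rate, the queuing delay in an unbounded FIFO grows without bound, and once that delay exceeds the packet lifetime, nothing survives long enough to be delivered.

    from collections import deque

    def fifo_with_infinite_storage(ticks=50_000, arrivals_per_tick=2,
                                   sends_per_tick=1, lifetime=100):
        # Unbounded FIFO buffer, offered load twice the link rate, finite
        # packet lifetime.  The queue, and so the queuing delay, keeps
        # growing; once the delay passes the lifetime, every packet is
        # already dead by the time it reaches the head of the queue.
        q = deque()                        # enqueue time of each buffered packet
        delivered = expired = 0
        for now in range(ticks):
            for _ in range(arrivals_per_tick):
                q.append(now)
            for _ in range(sends_per_tick):
                if now - q.popleft() > lifetime:
                    expired += 1           # too old when the link got to it
                else:
                    delivered += 1
        return delivered, expired

    print(fifo_with_infinite_storage())    # deliveries stop early on; after that, everything expires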
Until then, people had been doing mostly classic queuing theory analysis. That's not enough.
Back then, memory was very expensive, and people were obsessing over how much memory was needed in a router. It was felt that adding more memory would solve the congestion problem. I pointed out that wouldn't work. Now that memory is cheap, that problem appears as "bufferbloat".
Those two RFCs started people thinking about this as a closed-loop problem. Van Jacobson later did much work in this area. I was out of it by 1986. Decades later, people are still fussing with the feedback control problems implicit in that result.
As the original poster points out here, this comes up in other situations, especially chains of services. If you get congestion in the middle of the chain, things will not go well, and there's a good chance of something that looks like congestion collapse, where throughput goes to nearly zero. It's better to push congestion out towards the endpoints.
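In a service chain, pushing congestion to the endpoints mostly comes down to bounded hand-offs with back-pressure or early rejection. A minimal sketch (the names here are mine, made up for illustration):

    import queue

    # Hypothetical two-stage service.  The hand-off between stages is a
    # bounded queue, so when the downstream stage falls behind, new work
    # is refused at the front door -- the caller sees the congestion and
    # can back off -- instead of piling up invisibly in the middle.

    handoff = queue.Queue(maxsize=100)     # bounded: the middle can push back

    def accept(request):
        try:
            handoff.put_nowait(request)
            return "accepted"
        except queue.Full:
            return "rejected, retry later" # congestion surfaces at the endpoint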
We still don't have good solutions to congestion in the middle of a pure datagram network. What saved the Internet was fiber optic backbones and cheap long-haul bandwidth. There was a period in the 1990s when traffic had built up but backbone bandwidth was still expensive. The long-haul links choked and the Internet had "storms". There used to be an "Internet Weather Center", where you could check on how congested the major routers were.
I also coined the term "fair queuing". That can be a useful technique for services well above the IP datagram level. Don't use a FIFO queue; queue based on who's sending. If some source is sending too much, let them compete against themselves for the service. Others can still get through. This provides resilience against denial of service attacks.
I put that on a web site of mine some years ago, and for two months someone was pounding on it with useless requests without affecting response time for anybody else. (It wasn't an attack, just ineptitude using a public API.)
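At the application level, the per-sender version is simple enough to sketch (this is just my illustration, not any particular implementation; the names are invented):

    from collections import defaultdict, deque

    class FairQueue:
        # Per-sender fair queuing: each sender gets its own FIFO, and
        # service round-robins across senders, so a sender flooding
        # requests only competes with itself.
        def __init__(self):
            self.pending = defaultdict(deque)   # sender -> queued requests
            self.ring = deque()                 # senders with work pending

        def enqueue(self, sender, request):
            if not self.pending[sender]:        # sender had nothing queued
                self.ring.append(sender)
            self.pending[sender].append(request)

        def dequeue(self):
            if not self.ring:
                return None
            sender = self.ring.popleft()
            request = self.pending[sender].popleft()
            if self.pending[sender]:            # still busy: back of the ring
                self.ring.append(sender)
            return sender, request

    fq = FairQueue()
    for i in range(1000):
        fq.enqueue("flooder", f"junk-{i}")
    fq.enqueue("alice", "real request")
    print(fq.dequeue())   # ('flooder', 'junk-0')
    print(fq.dequeue())   # ('alice', 'real request') -- not stuck behind the flood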
As someone who's been doing async-fanout physical replication of Postgres instances for the longest time using pgBackRest's S3 repo support (the primary writes to S3; the replicas read from S3; the primary's uplink doesn't get saturated serving the writes), I've always wondered where the equivalent for message-queue systems was. Glad to see it :)
[1] https://datatracker.ietf.org/doc/html/rfc896
[2] https://datatracker.ietf.org/doc/html/rfc970