I've never understood these "fancy" algorithms. I've always just used HTB with simple egress-only rate and ceil limits tied to the actual ISP upload rate, and it has never failed to make the latency problem go away.
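Concretely, something like this (interface name and rates are placeholders, not my actual numbers):

    # HTB as the root qdisc; unclassified traffic falls into class 1:10
    tc qdisc replace dev eth0 root handle 1: htb default 10

    # Single class with rate/ceil pinned a bit below the ISP upload
    # rate, so the queue builds here rather than in the modem
    tc class add dev eth0 parent 1: classid 1:10 htb rate 17mbit ceil 17mbit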
What am I missing? Are things like cake/codel "easier" to deploy than creating a single paired qdisc and class?
Token buckets have burstiness problems. CAKE lets you run at 99%+ of your link speed, assuming a stable medium, without bufferbloat, thanks to the packet pacing it does. CAKE also does several other things via COBALT (CoDel + BLUE), and clever 8-way set-associative hashing that helps out when you have many flows.
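As for the "easier to deploy" question: a CAKE shaper is basically a one-liner (interface and bandwidth here are placeholders):

    # CAKE as the root qdisc, shaped slightly below line rate; AQM,
    # per-flow fairness, and pacing all come built in
    tc qdisc replace dev eth0 root cake bandwidth 17mbit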
Ironically, we made fq_codel the default in many Linux distributions starting in 2012. HTB will pick up the default qdisc when you set that, so we figured many small shaping tools were just picking it up automagically.
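For anyone wiring it up by hand, the relevant pieces look like this (interface and class handles are placeholders matching the sketch above):

    # System-wide default qdisc, as the distros started shipping it
    sysctl -w net.core.default_qdisc=fq_codel

    # Or attach fq_codel explicitly as the leaf of an HTB class, in
    # case your setup doesn't inherit the default automatically
    tc qdisc add dev eth0 parent 1:10 fq_codel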
Most people do not measure in-stream latency, but rather latency across multiple streams. So, whether or not you picked up fq_codel, a packet capture will show you whether the TCP RTTs are being managed. Note that on a box where the TCP traffic originates and then transits HTB on the same machine, another subsystem (TSQ, TCP Small Queues) manages the rates fairly well before packets ever hit HTB.
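One rough way to eyeball that, assuming a capture taken during a saturating upload (file and interface names are placeholders):

    # Capture outbound traffic during the load test
    tcpdump -i eth0 -w upload.pcap

    # Pull Wireshark's per-ACK RTT estimates; with working AQM these
    # stay near the path's base RTT instead of climbing as the queue fills
    tshark -r upload.pcap -Y tcp.analysis.ack_rtt \
        -T fields -e frame.time_relative -e tcp.analysis.ack_rtt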