
I'd assume it's because you can just reconfigure pgbouncer to point everything at the secondary, rather than having to update all the applications that use it. It centralizes the configuration of which database is active.
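To make that concrete, here is a minimal sketch of the application side (hypothetical names; psycopg2 is just an example client library). The application only ever talks to pgbouncer's stable address, so a failover never touches application code or connection strings:

    import psycopg2

    conn = psycopg2.connect(
        host="pgbouncer.internal",  # never changes during a failover
        port=6432,                  # pgbouncer's default listen port
        dbname="appdb",
        user="app",
        password="secret",
    )

Which Postgres server actually serves the queries is decided entirely by the host in pgbouncer's own [databases] entry, edited and reloaded in one place.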



How much effort is required to reconfigure pgbouncer across hundreds of app containers versus moving a single IP (without using pacemaker/corosync)?


Zero. Update the configuration in a central location. Push an update to the configuration-manager agent on each node (using a dead simple gossip protocol). The agent wakes up, pulls the new configuration, writes it, and reloads pgbouncer. With an aggressive gossip protocol, the whole process shouldn't take more than 5 seconds on a large cluster, let alone a couple of nodes.
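A minimal sketch of what such an agent might do once it's told a new configuration is available (all paths, URLs, and names here are hypothetical; the central source could just as well be etcd, Consul, or a plain file server):

    import subprocess
    import urllib.request

    CONFIG_URL = "http://config.internal/active-primary"   # hypothetical endpoint
    PGBOUNCER_INI = "/etc/pgbouncer/pgbouncer.ini"

    def current_primary() -> str:
        # Ask the central store which Postgres host should be active.
        with urllib.request.urlopen(CONFIG_URL, timeout=2) as resp:
            return resp.read().decode().strip()             # e.g. "10.0.0.12"

    def rewrite_databases_entry(host: str) -> None:
        # Point the single pooled database at the new primary.
        with open(PGBOUNCER_INI) as f:
            lines = f.readlines()
        with open(PGBOUNCER_INI, "w") as f:
            for line in lines:
                if line.startswith("appdb ="):
                    line = f"appdb = host={host} port=5432 dbname=appdb\n"
                f.write(line)

    def reload_pgbouncer() -> None:
        # A SIGHUP or "RELOAD" on the admin console also works; either way
        # pgbouncer re-reads its config without a full restart.
        subprocess.run(["systemctl", "reload", "pgbouncer"], check=True)

    if __name__ == "__main__":
        rewrite_databases_entry(current_primary())
        reload_pgbouncer()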


That's a fair number of moving parts. What if one of the agents misses the update and keeps using the old server? What happens during those 5 seconds while the hosts are switching? It doesn't sound like an atomic, instant change.


It's gossip; there is no single precise gossip protocol, and I just mentioned one approach. It's not difficult to modify it so that each node receives the message multiple times, to ensure everyone sees the change. Centralized configuration management systems like ZooKeeper or etcd suffer from the same problem: message propagation takes time. Even with a floating IP, a couple of requests may fail until the routing tables converge, so there is no atomic configuration switch in either case.

The argument was whether a floating IP is easier to deploy than a pure software approach, and it isn't: both are equally challenging. One requires experienced network engineers, the other requires experienced distributed-system designers. Once a node has failed, the effort to switch away from it is zero with either approach.
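For what it's worth, the core of such a protocol is tiny. A toy sketch (hypothetical names; the interesting bits are the fanout and the duplicate suppression by version number):

    import random

    FANOUT = 3  # how many random peers each node re-gossips to

    class Node:
        def __init__(self, name):
            self.name = name
            self.peers = []              # other Node objects, filled in below
            self.version = 0
            self.primary = "10.0.0.11"

        def receive(self, version, primary):
            if version <= self.version:
                return                   # duplicate or stale announcement: ignore
            self.version, self.primary = version, primary
            self.apply_locally()
            for peer in random.sample(self.peers, min(FANOUT, len(self.peers))):
                peer.receive(version, primary)   # re-gossip to random peers

        def apply_locally(self):
            # a real agent would rewrite pgbouncer.ini here and reload pgbouncer
            print(f"{self.name}: now pointing at {self.primary} (v{self.version})")

    # Usage: build a small cluster and announce the failover to a single node.
    nodes = [Node(f"node{i}") for i in range(10)]
    for n in nodes:
        n.peers = [m for m in nodes if m is not n]
    nodes[0].receive(1, "10.0.0.12")

Because forwarding is random, delivery is probabilistic rather than atomic, which is exactly the few-second convergence window discussed above; periodic re-announcement or anti-entropy sweeps are the usual way to catch a straggler that misses every copy.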



