Some of the high-end Xeon systems have gobs of PCIe lanes. If you have a four CPU socket system, you can hang four x16 PCIe cards off of each processor.
Both have a maximum of 32 PCI Express lanes, so in theory you can hang at most two PCIe x16 cards off each CPU. But since a few of the lanes are reserved for other peripherals, in a 4-GPUs-per-CPU setup the cards will in practice run at narrower link widths, e.g. x8/x8/x4/x4. However, using a system with a PCIe root complex [1] can improve the GPU<->GPU communication speed.
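To put rough numbers on the lane budget, here is a minimal Python sketch. The 32-lanes-per-CPU figure and the x8/x8/x4/x4 split come from the comment above; the per-lane bandwidth assumes PCIe 3.0 (8 GT/s with 128b/130b encoding), which is my assumption, not something stated in the thread. The function name `slot_bandwidth` is made up for illustration.

```python
# Approximate per-direction bandwidth of one lane in GB/s.
# ASSUMPTION: PCIe 3.0 (8 GT/s, 128b/130b encoding). The thread does not
# say which PCIe generation is in use, so adjust this for your hardware.
GBPS_PER_LANE = 8 * 128 / 130 / 8  # roughly 0.985 GB/s


def slot_bandwidth(widths, lanes_per_cpu=32):
    """Check a list of link widths against a CPU's lane budget.

    Returns (lanes_used, lanes_left, per_slot_bandwidth_in_GB_per_s).
    Raises ValueError if the layout needs more lanes than the CPU has.
    """
    used = sum(widths)
    if used > lanes_per_cpu:
        raise ValueError(
            f"layout needs {used} lanes, CPU only has {lanes_per_cpu}"
        )
    bandwidth = [round(w * GBPS_PER_LANE, 2) for w in widths]
    return used, lanes_per_cpu - used, bandwidth


if __name__ == "__main__":
    # Two full-width cards versus four cards at reduced width.
    for layout in ([16, 16], [8, 8, 4, 4]):
        used, left, bw = slot_bandwidth(layout)
        print(f"x{layout}: {used} lanes used, {left} left, GB/s per slot: {bw}")
```

Under these assumptions, two x16 cards use up all 32 lanes, while the x8/x8/x4/x4 layout uses 24 and leaves 8 for other peripherals, at the cost of each GPU getting a half or a quarter of the x16 bandwidth.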