My initial guess is they have nothing to do with each other. It would be like explaining why the next idea pops into your head: you can construct a rational explanation, but there's no way to test it.
My thoughts too, based on a limited understanding of GPT. But the more pressure you apply toward compressing the neural network during training, the more circuitry these paths are likely to share. It would be interesting to see just how much, and which parts, could be folded together before you start to lose significant fidelity (though unfortunately the fidelity seems too low today to even try that).
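A toy sketch of the "fold together until fidelity drops" idea. This is not how GPT weights actually behave; it just uses low-rank SVD truncation of a random matrix as a stand-in for compression, and measures how much a layer's outputs degrade as more of it is folded away:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained layer: a weight matrix and some probe inputs.
# (Random data here; a real experiment would use actual model weights.)
W = rng.standard_normal((256, 256))
X = rng.standard_normal((256, 64))
y_full = W @ X  # uncompressed, "full-fidelity" outputs

# Decompose the layer so we can keep only its strongest directions.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

errs = {}
for k in (256, 128, 64, 16):
    # Rank-k approximation: the "folded together" version of the layer.
    W_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Relative output error versus the uncompressed layer.
    errs[k] = np.linalg.norm(W_k @ X - y_full) / np.linalg.norm(y_full)
    print(f"rank {k}: relative output error {errs[k]:.3f}")
```

Sweeping `k` downward traces out exactly the curve the comment asks about: how aggressively you can compress before the error becomes significant.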