In the network partition example, you say that in the smaller partition, changes cannot be committed because they cannot be replicated to a majority of nodes (as the smaller partition is... smaller). How is the partition to know this? The system can't tell the difference between a node leaving the network and a node undergoing a (temporary) partition.
To give an example, say I have n machines in datacenter A, and n*.99 in datacenter B. datacenter A gets destroyed, permanently. Does datacenter B now reject all (EDIT: where reject = not commit) requests until a human comes along to tell it that datacenter A isn't coming back?
> To give an example, say I have n machines in datacenter A, and n*.99 in datacenter B. datacenter A gets destroyed, permanently. Does datacenter B now reject all (EDIT: where reject = not commit) requests until a human comes along to tell it that datacenter A isn't coming back?
In CAP terms, you are choosing CP with Raft. So yes, the system is unavailable until an external agent fixes it. In other words, a majority of the nodes must be online for the system to be "available".
What would happen if nodes were added to each side of a network partition (unknown to the other side), so that each side believed it had a majority? Or is the "writing" side of the partition determined at partition time, and not changed until the partition is healed?
A new node needs a round of Raft to announce its presence to the other nodes. So you can only add new nodes (automatically) when you have a 'live' system, i.e. one that can still commit.
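To make that concrete, here is a rough Python sketch. It is not Raft itself and is far simpler than what the paper describes; `PartitionSide`, `has_quorum` and `try_add_node` are made-up names for illustration. The assumption is that a join only takes effect once a majority of the currently agreed members have accepted it, so the minority side can never grow itself into a majority.

```python
from dataclasses import dataclass


@dataclass
class PartitionSide:
    """One side of a partition: the agreed configuration plus who is reachable."""
    members: frozenset      # last committed cluster configuration
    reachable: frozenset    # nodes this side can currently talk to

    def has_quorum(self) -> bool:
        # Quorum is counted against the committed configuration,
        # not against whoever happens to be reachable.
        return len(self.members & self.reachable) > len(self.members) // 2

    def try_add_node(self, new_node: str) -> bool:
        # Assumption: a join is just another entry that must be accepted by
        # a majority of the *current* members before the configuration changes.
        if not self.has_quorum():
            return False
        self.members = self.members | {new_node}
        self.reachable = self.reachable | {new_node}
        return True


cluster = frozenset({"a", "b", "c", "d", "e"})
big = PartitionSide(members=cluster, reachable=frozenset({"a", "b", "c"}))
small = PartitionSide(members=cluster, reachable=frozenset({"d", "e"}))

added_on_big = big.try_add_node("f")      # 3 of 5 members reachable
added_on_small = small.try_add_node("g")  # only 2 of 5 members reachable
```

In this toy model the two-node side still believes the cluster has five members, so plugging new machines in next to it changes nothing: their joins are never accepted.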
For a cluster of 2n + 1 nodes, majority = ceil((2n + 1)/2) = n + 1. So by counting the nodes it can reach in its partition against the configured cluster size, a node can figure out whether it is on the majority or the minority side.
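Plugging the datacenter example from upthread into that arithmetic, as a small Python sketch. The function names are made up for illustration, and n = 100 is just a value picked so that n*.99 is a whole number.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def in_majority(reachable_including_self: int, cluster_size: int) -> bool:
    # The comparison is against the configured cluster size,
    # not against the number of nodes that happen to be reachable.
    return reachable_including_self >= majority(cluster_size)


n = 100
dc_a, dc_b = n, 99     # n machines in A, n * .99 machines in B
total = dc_a + dc_b    # 199 configured nodes

quorum = majority(total)                 # 100
b_can_commit = in_majority(dc_b, total)  # False: 99 < 100
a_can_commit = in_majority(dc_a, total)  # True: 100 >= 100
```

Using `cluster_size // 2 + 1` rather than a plain ceil keeps the result a strict majority for even cluster sizes as well (4 nodes need 3, not 2).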
See section 6 in the paper for details of its implementation.
The goraft implementation (https://github.com/goraft) uses an explicit leave command that gets added to the log. That way, if no leave command was received for a node, a partition can be assumed.
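A toy Python illustration of that idea. This is not goraft's actual API: the log format and the names `membership` and `classify_silent` are invented here. The assumption is only that membership is derived by replaying committed join/leave entries, so a node that goes quiet without a leave entry still counts toward the cluster size.

```python
def membership(log):
    """Replay committed ("join" | "leave", node) entries into a member set."""
    members = set()
    for command, node in log:
        if command == "join":
            members.add(node)
        elif command == "leave":
            members.discard(node)
    return members


def classify_silent(node, log):
    # No committed leave entry means the node is still a member, so its
    # silence is treated as a failure or a partition, not as a departure.
    return "partitioned-or-down" if node in membership(log) else "left"


log = [("join", "a"), ("join", "b"), ("join", "c"), ("leave", "c")]
current_members = membership(log)
```

With this log, `classify_silent("c", log)` gives `"left"`, while `classify_silent("b", log)` gives `"partitioned-or-down"`, because nothing in the log says b was ever removed.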