Coordinator Election
Introduction
Many approaches in distributed computing involved a special
machine that has a unique role or purpose within the system
- consider the Time server for time synchronization
- The server that polls each machine and synchronizes time
It is not sufficient that the special machine isappointed
by an administrator, as distributed systems need to re-appoint another special machine in the event of afailed coordinator
Election is not Synchronisation
- When it comes to synchronisation, the selected process needs to know that it can enter the critical section. The other processes need not know that they cannot enter the critical section.
- It is not necessary for any process, other than the selected process, to know who has been selected.
- On the contrary, all participants must know which process is the coordinator.
- Fault tolerance is of secondary importance to Synchronisation. In contrast, election is typically required in the event of a fault.
Bully Algorithm
The Bully Algorithm is a method used in distributed systems to elect a coordinator or leader. It ensures that there is always a coordinator available to handle requests and maintain system stability.
In the Bully Algorithm, each process in the system has a unique identifier
or process ID. The process with the highest ID is considered the coordinator. However, if a lower-ranked process believes that the current coordinator has failed or become unresponsive, it initiates an election
process to select a new coordinator.
The Bully Algorithm works as follows:
- When a process detects that the current coordinator has failed, it sends an election message to all processes with higher IDs.
- If a higher-ID process receives an election message, it responds by sending an OK message back to the initiating process.
- If no higher-ID process responds within a certain time frame, the initiating process assumes that it is the highest-ranked active process and declares itself as the new coordinator.
- Once elected, the new coordinator sends a coordination message to all other processes to inform them of its selection.
- If any lower-ranked processes receive this coordination message, they update their knowledge of the new coordinator and continue normal operations.
The Bully Algorithm ensures that there is always a coordinator available in case of failures or unresponsiveness. It also guarantees that only one process becomes the coordinator at any given time.
However, it's important to note that the Bully Algorithm does not handle situations where multiple processes initiate an election simultaneously
or when network partitions occur
. Additional mechanisms are needed to address these scenarios in distributed systems.
Assumptions
- All messages are delivered within T(m) time. This is called message propagation time.
- Once a message is received, a reply will be sent within T(p) time. This is called message handling time.
- T(m) and T(p) are known.
Failure detection
When a node did not receive a response from the coordinator
Election
within
- It
broadcast
the election message to all the node that have ahigher id
than itself to check whether they are alive - those with lower id it does not care since it know that it is higher than them
- If the node receive a
No
response, it fails the election - The other node that receives the elections request will be aware of the coordinator failure and try to start election itself
- If a node did not receive a
No
response then it is considered as the coordinator - Election can be done concurrently
Example
Announcement
The node that won the election will announce to the rest of the node that it is the new coordinator
Edge cases
- When a node won the election, but it failed before announcement is made
- The other node detects that it still cant communicate with the coordinator, which is 5 in this case since 4 has not yet made its announcement
- The other nodes will repeat the election process
- The old node wake up after announcement
- When a node wake up, it will start election
- The old node wake up during discovery, before announcement
- wait for the request and reject
- 4's election failed
- The old node wake up after discovery, before announcement
- Both new elected coordinator and the old node will send out a message that it is the coordinator
- since the old node has a higher node, it will announce to the new node, and the new node will announce to the rest that the old node is the new coordinator
- In this case,
5
wake up and send the announcement to4
and4
will broadcast to the rest of the node that5
is the coordinator
Communication complexities
the communication complexity of the bully algorithm can be O(N^2) in the worst case scenario where every process sends an Election message to every other higher-ranking process. However, with efficient handling of responses and timeouts, this complexity can be significantly reduced in practice.
Ring Election
Processors are logically connected as a ring
Each node in the system is aware of its successor, its successor's successor
- Each node is aware of the full logical ring structure
Discovery Process
- A node realizes that the coordinator has failed when it communicate with the coordinator
- After realizing that the coordinator has failed, a node passes an election message with its ID around the ring.
- The election message is passed to its successor
- However, if the successor has failed, it is skipped and the message is sent to successor’s successor and so on.
- After receiving an election message, a machine appends its own ID and passes the modified election message around the ring.
- Once an election message is received and if the receiver machine finds its own ID in the message, then we conclude that the message had made its way all around the ring.
Announcement
- Once an election message had made its way all around the ring, the machine with the highest ID is chosen as the coordinator.
- At this point, the coordination message is passed similarly around the ring.
Adding new node
2 Loops to traverse the ring, one round to discover, one round to announce
This process can be simplified by the new node asking for the current ring structure from a current node in the group and the new node will compute the new ring structure and start broadcasting
When multiple node trying to insert at the same time, we only allow one retriever of the node structure at the same time
Edge cases
- 5 wakes up after 4 sent the message to 1
- reject and re-elect itself as the new coordinator
- 5 wakes up during the election
- still the same since its id is append to the message
Invitation Algorithm
- In practice communication failure, high latency, and/or congestion can partition a network. We can assume that a collection of machines, under the direction of a coordinator, can perform useful work, even if other such groups exist.
- Thus, we have an arrangement such that each group of machines that can communicate among themselves is directed by a coordinator.
- Different groups of processors, each operating under the direction of a different coordinator, can co-exist.
Network partition
The Invitation Algorithm provides a protocol for forming groups of available machines within network partitions, and then creating larger groups as failed machines are returned to service or network partitions are rectified.
- A node realizes that the coordinator has failed, elects itself as the coordinator and broadcasts the coordinator message to all other reachable nodes
- A node responds to such a message and tries to elect itself as the coordinator similarly
- if a response is received from a higher ID machine, then the machine starting the election process retires
- You will have multiple partitions after the election
Partition merge
Discover
Periodically all the coordinators of the individual partitions will check for the existence of all other coordinator by broadcasting the message, all other coordinator will respond to the message if they are the coordinator
Merging
Once the coordinators knows each other's existence and the coordinator with the high id will initiate the merge to the lower coordinator node
announcement
After the lower coordinator node agree to join the other network, it will broadcast to the members of it partition to join the new partition
Advantage
Partially solves the problem of detecting node failures
Issues
- All processing stops when the partitions are being merged
- Does not guarantee global consistency of data structure across all notes.
- Consistency of data structures is only guaranteed within a partition but not across partitions