Coordinator Election

##lesson

Introduction

Many approaches in distributed computing involved a special machine that has a unique role or purpose within the system

consider the Time server for time synchronization
The server that polls each machine and synchronizes time
It is not sufficient that the special machine is appointed by an administrator, as distributed systems need to re-appoint another special machine in the event of a failed coordinator

Election is not Synchronisation

When it comes to synchronisation, the selected process needs to know that it can enter the critical section. The other processes need not know that they cannot enter the critical section.
It is not necessary for any process, other than the selected process, to know who has been selected.
On the contrary, all participants must know which process is the coordinator.
Fault tolerance is of secondary importance to Synchronisation. In contrast, election is typically required in the event of a fault.

Bully Algorithm

The Bully Algorithm is a method used in distributed systems to elect a coordinator or leader. It ensures that there is always a coordinator available to handle requests and maintain system stability.

In the Bully Algorithm, each process in the system has a unique identifier or process ID. The process with the highest ID is considered the coordinator. However, if a lower-ranked process believes that the current coordinator has failed or become unresponsive, it initiates an election process to select a new coordinator.

The Bully Algorithm works as follows:

When a process detects that the current coordinator has failed, it sends an election message to all processes with higher IDs.
If a higher-ID process receives an election message, it responds by sending an OK message back to the initiating process.
If no higher-ID process responds within a certain time frame, the initiating process assumes that it is the highest-ranked active process and declares itself as the new coordinator.
Once elected, the new coordinator sends a coordination message to all other processes to inform them of its selection.
If any lower-ranked processes receive this coordination message, they update their knowledge of the new coordinator and continue normal operations.

The Bully Algorithm ensures that there is always a coordinator available in case of failures or unresponsiveness. It also guarantees that only one process becomes the coordinator at any given time.

However, it's important to note that the Bully Algorithm does not handle situations where multiple processes initiate an election simultaneously or when network partitions occur. Additional mechanisms are needed to address these scenarios in distributed systems.

Assumptions

All messages are delivered within T(m) time. This is called message propagation time.
Once a message is received, a reply will be sent within T(p) time. This is called message handling time.
T(m) and T(p) are known.

Failure detection

When a node did not receive a response from the coordinator

Election

within $2 \times T (m) + T (p)$ It will start a election process

It broadcast the election message to all the node that have a higher id than itself to check whether they are alive
those with lower id it does not care since it know that it is higher than them
If the node receive a No response, it fails the election
The other node that receives the elections request will be aware of the coordinator failure and try to start election itself
If a node did not receive a No response then it is considered as the coordinator
Election can be done concurrently

Example

Announcement

The node that won the election will announce to the rest of the node that it is the new coordinator

Edge cases

When a node won the election, but it failed before announcement is made
1. The other node detects that it still cant communicate with the coordinator, which is 5 in this case since 4 has not yet made its announcement
2. The other nodes will repeat the election process
The old node wake up after announcement
1. When a node wake up, it will start election
The old node wake up during discovery, before announcement
1. wait for the request and reject
2. 4's election failed
The old node wake up after discovery, before announcement
1. Both new elected coordinator and the old node will send out a message that it is the coordinator
2. since the old node has a higher node, it will announce to the new node, and the new node will announce to the rest that the old node is the new coordinator
3. In this case, 5 wake up and send the announcement to 4 and 4 will broadcast to the rest of the node that 5 is the coordinator

Communication complexities

the communication complexity of the bully algorithm can be O(N^2) in the worst case scenario where every process sends an Election message to every other higher-ranking process. However, with efficient handling of responses and timeouts, this complexity can be significantly reduced in practice.

Ring Election

Processors are logically connected as a ring
Each node in the system is aware of its successor, its successor's successor

Each node is aware of the full logical ring structure

Discovery Process

A node realizes that the coordinator has failed when it communicate with the coordinator
After realizing that the coordinator has failed, a node passes an election message with its ID around the ring.
The election message is passed to its successor
However, if the successor has failed, it is skipped and the message is sent to successor’s successor and so on.
After receiving an election message, a machine appends its own ID and passes the modified election message around the ring.
Once an election message is received and if the receiver machine finds its own ID in the message, then we conclude that the message had made its way all around the ring.

Announcement

Once an election message had made its way all around the ring, the machine with the highest ID is chosen as the coordinator.
At this point, the coordination message is passed similarly around the ring.

Adding new node

2 Loops to traverse the ring, one round to discover, one round to announce

This process can be simplified by the new node asking for the current ring structure from a current node in the group and the new node will compute the new ring structure and start broadcasting

Important

When multiple node trying to insert at the same time, we only allow one retriever of the node structure at the same time

Edge cases

5 wakes up after 4 sent the message to 1
1. reject and re-elect itself as the new coordinator
5 wakes up during the election
1. still the same since its id is append to the message

Invitation Algorithm

In practice communication failure, high latency, and/or congestion can partition a network. We can assume that a collection of machines, under the direction of a coordinator, can perform useful work, even if other such groups exist.
Thus, we have an arrangement such that each group of machines that can communicate among themselves is directed by a coordinator.
Different groups of processors, each operating under the direction of a different coordinator, can co-exist.

Network partition

The Invitation Algorithm provides a protocol for forming groups of available machines within network partitions, and then creating larger groups as failed machines are returned to service or network partitions are rectified.

A node realizes that the coordinator has failed, elects itself as the coordinator and broadcasts the coordinator message to all other reachable nodes
A node responds to such a message and tries to elect itself as the coordinator similarly
if a response is received from a higher ID machine, then the machine starting the election process retires
You will have multiple partitions after the election

Partition merge

Discover

Periodically all the coordinators of the individual partitions will check for the existence of all other coordinator by broadcasting the message, all other coordinator will respond to the message if they are the coordinator

Merging

Once the coordinators knows each other's existence and the coordinator with the high id will initiate the merge to the lower coordinator node

announcement

After the lower coordinator node agree to join the other network, it will broadcast to the members of it partition to join the new partition

Advantage

Partially solves the problem of detecting node failures

Issues

All processing stops when the partitions are being merged
Does not guarantee global consistency of data structure across all notes.
- Consistency of data structures is only guaranteed within a partition but not across partitions