Distributed Mutual Exclusion

Distributed Mutual Exclusion Introduction

To maintain mutual exclusion among n distributed processes/machines/nodes

while (true){
	Perform local operations
	acquire()
		execute critical section
	release()
}

During the critical section, a machine can interact with remote processes or manipulate shared memory.

Properties of Distributed Mutual Exclusion

Safety

Safety means that only one process is allowed to access the shared resource at a time, preventing concurrent access that could lead to data corruption or unexpected behavior.
To ensure safety, a distributed mutual exclusion algorithm should guarantee that, at any given moment, either no process is in the critical section or only one process is in the critical section.

Liveness

Liveness guarantees that the system does not deadlock and does not remain indefinitely in a state where no process can make progress.

Fairness

Fairness ensures that processes requesting access to the shared resource are treated fairly in terms of granting access. It prevents situations where a particular process is continually granted access to the critical section while others are repeatedly denied, leading to potential starvation or unfairness.
Concretely, fairness means that every process that requests access to the critical section is eventually granted access, and no process is unfairly prioritized over others.
There is a bounded wait time, and requests should be served in the order of their request times.

Other Expectations

Assumptions

  1. The network is reliable
    1. All messages sent reach their destination eventually
    2. Not realistic; we will get to that later in the course
  2. The network is asynchronous
    1. Messages may take a long time to arrive
  3. Machines may fail at any time (not all protocols handle this in their vanilla form)

General Idea

  1. Entry: Before entering the critical section, a machine must get permission from the other machines
  2. Exit: When exiting the critical section, a machine must let the others know that it has finished executing the critical section
  3. Receiving a request: For fairness, machines allow other machines that asked for permission earlier than they did to proceed

Centralized lock server

(Diagram: machines 1-4 and a central lock server. i) A machine sends REQUEST; ii) the server replies OK once that machine is at the head of the queue; iii) the machine sends RELEASE after its critical section; iv) the server sends OK to the new head of the queue.)

  1. Entry: Send a REQUEST to the central server and wait for permission
  2. On receiving a request: The server puts the machine in a queue and delays sending the response until that machine is at the head of the queue (FIFO)
  3. Exit: Once the critical section has been executed, the machine sends a RELEASE message to the central lock server
  4. On receiving a release: The server removes the machine from the head of the queue and grants permission to the next machine in line
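As a concrete illustration, here is a minimal sketch of the server-side logic, assuming a message loop that dispatches to these handlers; the send(machine, message) transport and the machine ids are hypothetical:

from collections import deque

class LockServer:
    def __init__(self, send):
        self.queue = deque()   # FIFO queue of machines waiting for the lock
        self.send = send       # hypothetical transport: send(machine, message)

    def on_request(self, machine):
        self.queue.append(machine)
        if self.queue[0] == machine:       # queue was empty: grant immediately
            self.send(machine, "OK")

    def on_release(self, machine):
        assert self.queue[0] == machine    # only the lock holder may release
        self.queue.popleft()
        if self.queue:                     # grant the lock to the new head
            self.send(self.queue[0], "OK")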

Disadvantages

The central server is a single point of failure, and it becomes a performance bottleneck since every acquire and release goes through it.

Ring-based protocol

(Diagram: five machines arranged in a ring; machine 1 currently holds the token and can run its critical section before passing the token on.)

Machines are logically structured in a ring. A token is passed along the ring, and only the machine holding the token can execute the critical section.

If a machine wants to enter the critical section, it holds on to the token and forwards it to the next node only after the execution is completed.
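
A sketch of the per-machine logic, assuming each machine knows its successor; send_token() is a hypothetical helper that forwards the token along the ring:

class RingNode:
    def __init__(self, send_token, critical_section):
        self.wants_cs = False                      # set when acquire() is called
        self.send_token = send_token               # hypothetical: pass token to successor
        self.critical_section = critical_section   # application code

    def on_token(self):
        if self.wants_cs:
            self.critical_section()   # only the token holder may run this
            self.wants_cs = False
        self.send_token()             # pass the token along the ring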

Property analysis

Safety: Guaranteed. Only the machine holding the token can execute the critical section, so at most one process can access it at a time, preventing race conditions.
Liveness: In the worst case, a machine has to wait for the token to travel through the entire ring before acquiring the lock, which may take a significant amount of time; liveness may therefore be compromised in some situations.
Fairness: Not guaranteed. The protocol does not take into account when requests come in or where the token is currently located, so machines that are further from the token may wait longer than machines that requested after them.

Fair ring-based protocol

The token now carries the timestamp T of the earliest known, still outstanding request.

Entry of critical section

Upon receiving the token, there are three possible cases, assuming Tm is the timestamp of the machine's own outstanding request.
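
Since the notes leave the three cases unstated, the sketch below is only one plausible handling, under the assumption that a machine enters when its own request is the earliest one recorded in the token (so a fresh request may need an extra lap of the ring to get recorded first); forward(T) is a hypothetical helper that passes the token on:

INF = float("inf")

class FairRingNode:
    def __init__(self, forward):
        self.t_m = INF          # timestamp of our outstanding request, INF if none
        self.forward = forward  # hypothetical: forward(T) passes the token on

    def request(self, ts):
        self.t_m = ts           # we now have an outstanding request

    def on_token(self, T):
        if self.t_m == T:             # case 1: our request is the earliest
            self.run_critical_section()
            self.t_m = INF
            T = INF                   # the earliest request has been served
        elif self.t_m < T:            # case 2: our request is earlier than recorded
            T = self.t_m              # record it; we enter on the next lap
        # case 3 (self.t_m > T): an earlier request is waiting; just pass it on
        self.forward(T)

    def run_critical_section(self):
        pass  # application code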

Safety: Still guaranteed. Only the machine holding the token can enter the critical section, so no two processes can be inside concurrently.
Liveness: Not guaranteed, and in this respect the fair version is worse than the unfair ring-based algorithm: serving the earliest request can require an extra loop of the token around the ring, and a machine that fails or holds the token too long stalls everyone.
Fairness: Guaranteed. Because the token carries the timestamp of the earliest outstanding request, requests are served in request-time order rather than by position in the ring.

Lamport's Shared Priority Queue

Each machine locally maintains a queue that forms its part of a shared priority queue.
If a machine wants to run a critical section, it broadcasts a REQUEST carrying its timestamp T to all other machines.

Guaranteed properties

  1. All other machines are aware of a machine's request to enter the critical section
  2. Each machine is aware of any earlier request

(Sequence diagram: machine 2 broadcasts Req(T) to machines 1 and 3. Each receiver checks its queue: if there is no outstanding request, it appends Req(T) and replies yes; if there is, it appends Req(T) but holds the reply. Machine 2 records who has responded, runs the critical section, pops the head of its queue, and broadcasts RELEASE; on receiving the RELEASE, machines 1 and 3 pop the head of their queues.)

Enter the critical section

A machine enters the critical section once it has received replies from all other machines and its own request is at the head of its local queue.

Exiting the critical section

  1. Based on the stored record of who has replied, the machine, when exiting the critical section, releases the lock and broadcasts a RELEASE message to the rest of the machines
  2. Pop the head of the queue

Receiving an acquire request

When a machine receives a request from another machine to carry out a critical section:

  1. It checks whether there is any outstanding request locally
  2. If no:
    1. Append the request to the queue
    2. Reply that the requester can enter
  3. If yes:
    1. Append the request to the queue, but hold on to the reply

Receiving the release request

Pop the head of the queue.
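
Putting the handlers together, here is a sketch of one machine's bookkeeping, following the variant in these notes (always queue an incoming request, but hold the reply while an earlier request of our own is outstanding); send() and the Lamport timestamps are assumed helpers:

import heapq

class LamportNode:
    def __init__(self, my_id, peers, send):
        self.my_id, self.peers, self.send = my_id, peers, send
        self.queue = []        # local part of the shared priority queue
        self.pending = None    # (timestamp, id) of our outstanding request
        self.replies = set()   # record of who has responded
        self.held = []         # replies we are holding back

    def request(self, ts):
        self.pending = (ts, self.my_id)
        heapq.heappush(self.queue, self.pending)
        for p in self.peers:
            self.send(p, ("REQUEST", ts, self.my_id))

    def on_request(self, ts, sender):
        heapq.heappush(self.queue, (ts, sender))        # always queue it
        if self.pending and self.pending < (ts, sender):
            self.held.append(sender)                    # our request is earlier: hold
        else:
            self.send(sender, ("REPLY", self.my_id))

    def on_reply(self, sender):
        self.replies.add(sender)

    def can_enter(self):
        # Enter once everyone has replied and we are at the head of the queue.
        return len(self.replies) == len(self.peers) and self.queue[0] == self.pending

    def release(self):
        heapq.heappop(self.queue)                       # pop our own request (head)
        self.pending, self.replies = None, set()
        for p in self.held:                             # send the held replies
            self.send(p, ("REPLY", self.my_id))
        self.held = []
        for p in self.peers:
            self.send(p, ("RELEASE", self.my_id))

    def on_release(self, sender):
        heapq.heappop(self.queue)                       # pop the head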

Multiple Nodes Requesting

Property analysis

Safety analysis

Base setup: machines P1 and P2 both make a request.

Safety: Yes. Requests are totally ordered by (timestamp, machine id), so whichever of P1 and P2 requested earlier ends up at the head of every queue, and the later requester cannot be at the head of its own queue until that earlier request has been released.

Optimised Version

Steps

  1. When a node receives a request and it does not have any earlier outstanding request, it simply replies and does not store the request in the priority queue
  2. Once a node receives replies from all the other nodes, it starts execution
  3. After executing the critical section, it pops the head of the queue
  4. It then sends a reply to the remaining outstanding requests in the queue and clears the queue
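
A sketch of the optimized node, reusing the LamportNode sketch from earlier; the changes are that a request is queued only when the reply is deferred, and the deferred replies themselves take the place of the RELEASE broadcast:

import heapq

class OptimizedNode(LamportNode):
    def on_request(self, ts, sender):
        if self.pending and self.pending < (ts, sender):
            heapq.heappush(self.queue, (ts, sender))   # defer: our request is earlier
        else:
            self.send(sender, ("REPLY", self.my_id))   # reply, store nothing

    def on_reply(self, sender):
        self.replies.add(sender)
        if len(self.replies) == len(self.peers):       # all replies received
            self.enter_critical_section()

    def enter_critical_section(self):
        pass  # application code runs the critical section here

    def release(self):
        heapq.heappop(self.queue)                      # pop our own request (head)
        while self.queue:                              # reply to deferred requests
            ts, sender = heapq.heappop(self.queue)
            self.send(sender, ("REPLY", self.my_id))
        self.pending, self.replies = None, set()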

Property analysis

Safety: Still guaranteed. A node defers its reply whenever it has an earlier outstanding request of its own, so two nodes can never both collect replies from everyone at the same time.

Caveats

It does not handle fault tolerance: a node can proceed only if it receives a reply from every other node, so a single failed node blocks everyone.

Voting Protocol

Idea

Each machine has a single VOTE it can give out at a time. A machine may enter the critical section once it has collected VOTEs from a majority of the machines.

Properties

  1. At most one process can have the majority of the VOTEs at any given time

Steps

  1. Broadcast the request
  2. Wait for a majority of VOTEs to arrive
  3. Execute the critical section
  4. Release the votes back
    Needing only a majority (rather than a reply from every node) is what makes this fault tolerant

(The voter side is sketched in code after the complex case below.)

Simple case

(Diagram: p1 requests and collects three VOTEs, a majority of five; it runs the critical section and then releases the votes.)

Complex case

When a machine receives a request, it checks whether it has already voted; if it has, it puts the request in a queue.
When the released vote comes back, it votes for the node at the top of the queue.

(Diagram: p5 and p2 request concurrently; three machines vote for p5, giving it the majority; p5 runs the critical section and releases its votes; the freed votes then go to p2, which runs its critical section.)
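
A sketch of the voter side, covering both the simple and the complex case; send() is the same assumed transport as before. On the requesting side, a node counts incoming VOTEs and enters the critical section once it holds more than n/2 of them:

from collections import deque

class Voter:
    def __init__(self, my_id, send):
        self.my_id, self.send = my_id, send
        self.voted_for = None    # who currently holds our vote
        self.backlog = deque()   # requests waiting for our vote

    def on_request(self, sender):
        if self.voted_for is None:
            self.voted_for = sender
            self.send(sender, ("VOTE", self.my_id))
        else:
            self.backlog.append(sender)   # already voted: queue the request

    def on_release(self, sender):
        self.voted_for = None
        if self.backlog:                  # vote for the next request in line
            self.voted_for = self.backlog.popleft()
            self.send(self.voted_for, ("VOTE", self.my_id))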

Problem

Deadlock

(Diagram: p1, p2, and p3 request concurrently and each receives exactly one vote; no machine has a majority, so none can proceed: deadlock.)

Property analysis

Safety: Mutual exclusion is guaranteed, since at most one machine can hold a majority of the votes, but deadlock is possible.
Liveness: Bounded by the maximum RTT to the fastest n/2+1 machines (the time to gather a majority of votes), assuming no deadlock occurs.
Fairness: No.

Deadlock prevention

Case 1: P1 has not entered the critical section

(Diagram: p1 requests at time T+X and collects votes; p4 then requests at the earlier time T, so the voters send RESCIND VOTE to p1. Since p1 has not yet entered the critical section, it returns the votes with RELEASE, and the voters vote for p4 instead.)

Case 2: P1 is in the midst of the critical section

(Diagram: p1 requests at T+X, gets the vote, and enters the critical section; p3 then requests at the earlier time T, and the voter sends RESCIND_VOTE. p1 ignores it until it finishes, then sends RELEASE, and the vote goes to p3.)
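
A sketch of how the rescind mechanism might look on both sides, extending the voter sketch above (message names follow the diagrams; the exact bookkeeping is an assumption). The backlog becomes a priority queue ordered by request timestamp, so a vote is rescinded whenever an earlier request arrives:

import heapq

class RescindingVoter:
    def __init__(self, my_id, send):
        self.my_id, self.send = my_id, send
        self.voted_for = None   # (timestamp, id) of the request we voted for
        self.backlog = []       # pending requests, ordered by timestamp

    def on_request(self, ts, sender):
        if self.voted_for is None:
            self.voted_for = (ts, sender)
            self.send(sender, ("VOTE", self.my_id))
        else:
            heapq.heappush(self.backlog, (ts, sender))
            if (ts, sender) < self.voted_for:           # an earlier request arrived
                self.send(self.voted_for[1], ("RESCIND_VOTE", self.my_id))

    def on_release(self, sender):
        self.voted_for = None
        if self.backlog:                                # vote for the earliest waiter
            self.voted_for = heapq.heappop(self.backlog)
            self.send(self.voted_for[1], ("VOTE", self.my_id))

class RescindingRequester:
    def __init__(self, my_id, send):
        self.my_id, self.send = my_id, send
        self.votes = set()    # voters currently backing us
        self.in_cs = False

    def on_rescind(self, voter):
        if not self.in_cs:    # case 1: not yet in the critical section
            self.votes.discard(voter)
            self.send(voter, ("RELEASE", self.my_id))
        # case 2: already in the critical section; ignore, since the normal
        # RELEASE sent after the critical section returns the vote anyway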

Advantages

It is more fault tolerant than the other protocols discussed previously: it works even when (N/2 - 1) machines in the network are down.

Maekawa Voting Protocol

The goal is to reduce the number of messages exchanged while collecting and releasing VOTEs.
It is not necessary to broadcast the REQUEST message. Instead, each machine has its own VOTING set: it sends requests within this set and collects VOTEs only from the machines in its voting set.

Design

Designing the voting set

(Figure: voting-set design. The machines are arranged in a √n × √n grid, and each machine's voting set is its row and column, i.e., 2 × (√n − 1) other machines; any two voting sets overlap.)
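
A small sketch of that grid construction, under the assumption that n is a perfect square:

import math

def voting_set(machine, n):
    """Voting set of `machine` (0..n-1): its row and column in a sqrt(n) x sqrt(n) grid."""
    k = math.isqrt(n)                    # assumes n is a perfect square
    r, c = divmod(machine, k)
    row = {r * k + j for j in range(k)}
    col = {i * k + c for i in range(k)}
    return row | col                     # includes the machine itself

# Any two voting sets overlap (a's row and b's column always share a cell),
# so two machines can never both assemble full votes at the same time.
assert voting_set(0, 16) & voting_set(15, 16)
# Requests go to the 2 * (sqrt(n) - 1) members other than the machine itself.
assert len(voting_set(0, 16)) - 1 == 2 * (4 - 1)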

Question

What is the number of VOTEs to be collected before a machine can enter the critical section?

Answer

All the nodes in the voting set have to reply.
This is only somewhat fault tolerant: it does not require every node in the system to be alive, but it does require all the machines in the voting set to be alive.

Question

What is the total number of messages exchanged?

Answer

2 × (√n − 1) + 2 × (√n − 1) + 2 × (√n − 1) = 6 × (√n − 1)
No. of REQUESTs + No. of VOTEs + No. of RELEASEs
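
For example, with n = 16 machines arranged in a 4 × 4 grid, each voting set contains 2 × (4 − 1) = 6 other machines, so one critical-section entry costs 6 × 3 = 18 messages, compared with 3 × (n − 1) = 45 for a protocol that sends its REQUEST, reply, and RELEASE to all n − 1 other machines.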

Quiz

Consider the basic version of the voting protocol (i.e., without deadlock prevention) with five (5) nodes. Assume two out of these five nodes, say N1 and N2, wish to enter the critical section concurrently. N1 receives three votes and N2 receives two votes. Is it possible for N3/N4/N5 to have two requests of N1 in the queue? If not, explain why with an example.

No, it is not possible for N3/N4/N5 to have two requests of N1 in the queue.

Let's understand this by breaking down the voting protocol of distributed mutual exclusion:

  1. When a node (say N1) wants to enter a critical section, it sends a request to all nodes in the system.
  2. On receiving a request, each node gives its vote to one request at a time using some priority rules (e.g., based on timestamps).
  3. A node is allowed to enter the critical section when it has received votes from a majority of nodes in the system.

In this scenario, N1 and N2 are trying to enter the critical section concurrently. N1 receives three votes and N2 receives two votes, so N1 will be allowed to enter the critical section.

Now, let's consider your question: is it possible for any of the remaining nodes (N3/N4/N5) to have two requests from N1 queued up?

No, because a node has at most one outstanding request at a time: until N1's current request is processed completely (i.e., until N1 exits the CS and releases its votes), no other node can have another request from N1 in its queue.

Example: assume node N1 sends two consecutive requests, R1 and then R2. Before N3/N4/N5 could receive R2, they must first receive the release message for R1 from N1. Therefore, at any given moment there can be at most one request from N1 in their queues; two requests from the same node cannot exist simultaneously in the queue of N3/N4/N5.

In short: according to the voting protocol, a node (N1 in this case) sends a request message to all other nodes only once when it wants to enter the critical section, and it will not send another request until it has exited the critical section. Therefore, each of N3/N4/N5 can have at most one request from N1 in its queue at any given time.

(Sequence diagram: N1 collects the votes, runs the critical section, and sends RELEASE; only after the release can a subsequent request from N1 be received and voted on.)

For the voting protocol, when a node finishes executing its critical section it releases the votes, but does it also pop the request from the queues of the other nodes that did not vote for it?

In a distributed mutual exclusion system using the voting protocol, when a node finishes execution of its critical section, it does indeed release the votes it has acquired. However, it does not directly pop or remove requests from other nodes' queues who did not vote for it.

The responsibility of managing and updating its own request queue falls on each individual node. Once a node receives a release message from the node that has just exited its critical section, this triggers the former to update its own queue by removing the corresponding request.

Therefore, while the exiting node initiates this process by sending out release messages, it's up to each recipient node to then handle its own queue management in response.

By design, the ring-based protocol for mutual exclusion is not fault tolerant. Design a simple protocol to make the ring-based protocol fault tolerant. Specifically, discuss the following: 1) the messages sent and received and the protocol to handle them, 2) how to handle nodes leaving the network, 3) how to handle nodes joining the network, 4) how to handle intermittent faults.

Designing a fault-tolerant ring-based protocol for mutual exclusion involves ensuring that the system can handle failures and can recover to a normal state even when some of its components fail.

  1. Messages sent and received and the protocol to handle them:

In a basic ring-based protocol, nodes exchange two types of messages: REQUEST and RELEASE. In our fault-tolerant protocol, we will introduce an additional message type: HEARTBEAT.

If a node receives a REQUEST message and it's not currently in or waiting for the critical section, it forwards the request to its successor. If it receives a RELEASE message, it removes the corresponding request from its queue and forwards the release message.

  2. How to handle nodes leaving the network:

When a node leaves (either voluntarily or due to failure), its predecessor needs to establish contact with its successor. To facilitate this, each node maintains information about both its predecessor and successor in the ring.

Periodically, each node sends out a HEARTBEAT message; if several consecutive heartbeats from any given node are not received, that node is assumed to have left or failed. The predecessor of this presumed-failed node then establishes contact with its next known active successor in order to maintain continuity in the ring.

  3. How to handle nodes joining the network:

When a new node wants to join, it sends a JOIN request to any one of the existing nodes. That existing node responds with information about itself and its current successor. The new node then positions itself between these two nodes by updating their predecessor/successor information appropriately.

  4. How to handle intermittent faults:

Intermittent faults can be handled by a combination of heartbeat messages and temporary reconfiguration of the network. If a node doesn't respond to several consecutive heartbeats, it's assumed to be faulty and temporarily removed from the network. Once it starts sending heartbeats again, it can be reintegrated into the network using the same protocol as for joining nodes.

To ensure data consistency, any changes made in the critical section during the fault period should be broadcast or sent directly to the reintegrated node before it can enter the critical section.
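
As an illustration of the heartbeat mechanism, here is a sketch of the monitoring logic each node could run (the interval and miss limit are arbitrary choices, and send() is the same assumed transport):

import time

HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats (arbitrary choice)
MISSED_LIMIT = 3           # silent intervals before declaring a node failed

class RingMonitor:
    def __init__(self, members, me, send):
        self.members = members              # current ring order, e.g. [0, 1, 2, 3]
        self.me, self.send = me, send
        self.last_seen = time.monotonic()   # last heartbeat from our successor

    def successor(self):
        i = self.members.index(self.me)
        return self.members[(i + 1) % len(self.members)]

    def predecessor(self):
        i = self.members.index(self.me)
        return self.members[(i - 1) % len(self.members)]

    def on_heartbeat(self, sender):
        if sender == self.successor():
            self.last_seen = time.monotonic()

    def tick(self):
        # Called every HEARTBEAT_INTERVAL: announce ourselves to our predecessor,
        # and splice out the successor if it has been silent for too long.
        self.send(self.predecessor(), ("HEARTBEAT", self.me))
        if time.monotonic() - self.last_seen > MISSED_LIMIT * HEARTBEAT_INTERVAL:
            self.members.remove(self.successor())   # presumed failed; bypass it
            self.last_seen = time.monotonic()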

Consider Lamport's shared priority queue protocol for distributed mutual exclusion without Ricart and Agrawala's optimization. You have three machines P1, P2 and P3, none of which is malicious. Assume P1 and P2 request to enter the critical section concurrently. Is it possible to encounter a scenario where the following happens: upon receiving the RELEASE message from P2, machine P3 removes the request of P1 from P3's priority queue? If the scenario is impossible, provide a proof. If the scenario is possible, draw a sequence diagram or discuss the scenario point by point as it happens. You may assume that the network layer does not drop any message.

The scenario described is impossible under Lamport's shared priority queue protocol for distributed mutual exclusion. The protocol works in such a way that each machine maintains a priority queue of requests ordered by the timestamp of the request and the id of the machine: the machine with the smallest timestamp (i.e., the earliest request) is given priority, and if two machines have equal timestamps, the one with the smaller id is given preference.

In this scenario, P1 and P2 request to enter the critical section concurrently. According to Lamport's algorithm, all machines, including P3, will add these requests to their priority queues as they receive them. The order will depend on their timestamps and ids.

When P2 exits its critical section, it sends a RELEASE message to all other machines in the system, including P3. Upon receiving this RELEASE message from P2, P3 removes P2's request from its priority queue, because the message indicates that P2 has finished executing its critical section and its request can safely be removed from other machines' queues. Note that this is indeed the head of P3's queue: P2 could only have entered the critical section if its request was ordered before any outstanding request of P1 (with FIFO channels, an earlier request from P1 would have reached P2 before P1's reply and kept P2 waiting), so when the RELEASE arrives, the request at the head of P3's queue is P2's.

However, this does not affect P1's request in any way, because the RELEASE message came from P2, not P1. Even after receiving the RELEASE from P2, machine P3 cannot remove P1's request from its queue, because it has not received a RELEASE message for that specific request, i.e., from machine P1.

Therefore, there is no possible scenario in which, upon receiving a RELEASE message from one machine (P2), another machine (P3) removes a different machine's (P1's) request from its queue. This proves that the scenario is impossible under Lamport's shared priority queue protocol for distributed mutual exclusion without Ricart and Agrawala's optimization.