Clocks and Synchronisation

Clocks

In distributed systems, if the timing of the node is wrong, there might be different impact to the program
different machine have different understanding about the clocks

Real time system: requires task to be completed within a given time

Physcial Clock

Centralized Systems

Distributed Systems

Resynchronizing Clocks

How we synchronize the clock

Ultimate goal is to have all nodes have the same time and correct time

Cristian's Algorithm

Assumptions

  1. Server has zero computation cost
    1. Take into account of server latency
    2. server timestamp + (RTT-server latency)/2
    3. server to append the computation time back
  2. the send and receive delay is symmetric
    1. Keep track of recent RTTs and repeat request when an RTT is an outlier
    2. αi=Tserveri+RTTi2
    3. otherwise take a weighted average of n measurements
      1. Avg.Client.Click(n)=Wαn+(1W)Avg.Client.Clock(n1)
    4. average the error
      1. error(i)=αiTclienti
      2. Avg.error.client(n)=Werror(n)+(1W)Avg.error.Client(n1)
    5. Recursive call give the most recent measurement more weight
    6. Tclientn+avg.error.client(n)
  3. client clock does not change meanwhile

check if your computer clock is in sync

Berkeley Algorithm

Similar idea to Christian Algorithm but does not have dedicated server
It synchroize time base on polling from other machine server and compute the time.
The server also broad cast to all machines

Berkeley algorithmreq timereq timeRTT1RTT2Client 1Client 2Server
C1 has time T1=T1+RTT12+RTT2RTT1
C2 has time T2=T2+RTT22
Server time T3
Tavg=T1+T2+T33

Adjusting

After calculating the Tavg we can make adjustment to all three servers
adjustadjustadjustClient 1Client 2Server
Its more scalable than Cristian Protocol as it is not bottlenecked by a single server, even though it requires more network calls to reach synchronization

Real-world synchronization

Network Time Protocol (NTP)Time SourceStratum 1Stratum 2Stratum 3Client
Validation and fault tolerance for having multiple servers and synchronization within each level

Synchronization

Server atStratum X+1Server atStratum XT1T2T3T4
RTT=(T4T1)(T3T2)
Adjustments at X+1 T3+RTT2

Which Server to choose

  1. Lower Stratum servers more accurate
  2. Reliability of communication/latency

Challenges

Logical Clock

As compared to real physical clock, we might be interested in the logical clock (when does message arrive)
We can only have partial order in distributed system
imagine a group project, people work independently on different task, but at certain stage, it should be synchronized and the order matters

Happens-before Relation

Implication

if A->B then L(A)<L(B) where L(A) is the logical clock value
Each machine's logical clock is local to the machine itself

p1p2p1.1p1.3p1.4p1.2p2.1p2.2p2.3p2.4p2.5p2.6p2.7p2.8p1.5send msg with clock valueL(p1.1)=1L(p1.2)=2L(p2.1)=1L(p2.2)=2L(p2.3)=3L(p2.4)=4L(p2.5)=5max(L(p1.2),L(p2.5))+1 => L(p2.6)=6L(p2.7)=7L(p1.3)=3max(L(p1.3),L(p2.7))+1 => L(p1.4)=8L(p1.5)=9L(p2.7)=8
max(L(p1.2),L(p2.5))+1 is to make sure that the logical clock value of receiving is alway higher than the sending end

Total Ordering of Events

Total ordering of events in a distributed system refers to the establishment of a consistent and agreed-upon sequence for all events that occur across multiple nodes or processes in the system. It ensures that all nodes observe events in the same order, regardless of the actual physical time or the order in which they occur at each node.

In a distributed system, where multiple nodes can operate concurrently and independently, it is challenging to establish a global time reference. Therefore, total ordering provides a logical ordering mechanism that allows nodes to agree on the sequence of events.

Total ordering can be achieved through various techniques such as:

  1. Logical Clocks: Nodes assign logical timestamps to events based on their own local clocks. These timestamps are used to establish a partial order among events, which can later be transformed into a total order.

  2. Vector Clocks: Similar to logical clocks, vector clocks assign vectors of logical timestamps to events. Each node maintains its own vector clock, and when events are exchanged between nodes, their vector clocks are updated accordingly.

  3. Lamport Timestamps: Proposed by Leslie Lamport, Lamport timestamps are used to establish partial ordering among events based on causality. Each event is assigned a unique timestamp, and if one event causally depends on another event, its timestamp will be greater than the previous event's timestamp.

  4. Physical Clock Synchronization: Nodes synchronize their physical clocks using protocols like Network Time Protocol (NTP) or Precision Time Protocol (PTP). Once synchronized, physical timestamps can be used as an approximation for total ordering.

By establishing total ordering of events in a distributed system, applications can ensure consistency and reliability in various scenarios such as distributed databases, replicated systems, messaging systems, and consensus algorithms like Paxos or Raft.

By establishing total ordering of events in a distributed system, applications can ensure consistency and reliability in various scenarios such as distributed databases, replicated systems, messaging systems, and consensus algorithms like Paxos or Raft.

In a total ordering If the logical clock value is the same then they must be concurrent

Causality Violation

Causality violation in distributed systems refers to a situation where the order of events observed by different processes or clients does not agree with the causal relationship between those events. In other words, it is a violation of the cause-effect relationship between events in a distributed system.

Lamport's logical clock can be used for a total ordering

Example

m1m2m312345c1c2c3Failed
M1 -> Move File 'X' to C2
M2-> Use File 'x' from C2
M3-> read record 'y' from file 'x'

Explanation

M1 states "Move File 'X' to C2." This message indicates that file 'X' is being moved from C1 to C2.

M2 instructs C3 to use File 'X' from C2. This message assumes that file 'X' has already been moved to C2. However, since M2 is sent after M1, it violates the causality principle because M2 depends on the successful execution of M1. Therefore, M2 should not refer to File 'X' in C2 before it is actually moved.

M3 reads record 'Y' from File 'X'. This message assumes that File 'X' exists in C2. However, since M2 violates causality, M3 also violates causality because it tries to read from File 'X' in C2 before it is actually moved.

Potential causality violation

Intuition

We want to detect potential causality violation

detect the order in which the message are send and receive is different across all nodes in the system

Send(M1)-> Send(M2)
Send(M1)-> Rec(M1)
Send(M2)-> Rec(M2)
Rec(M2)-> Send(M3)
Send(M3)-> Rec(M3)
Rec(M3)-> Rec(M1)

Therefore
Rec(M2)->Rec(M1)

Vector Clock

There is no information embedded in the logical clock value that the file is sent or not

How vector clock works

(0,0,0)(0,0,0)(0,0,0)(1,0,0)(2,0,0)(2,0,0)(1,0,0)(1,0,0)(0,0,0)(1,0,0)+1(1,0,1)Receive OperationMax(1,0,2)(1,1,2)(2,2,2)(1,0,2)Vector Clock Operation(0,0,0,0)(0,0,0,0)(0,0,0,0)(0,0,0,0)(1,0,0,0)(1,0,0,0)(1,1,0,0)(2,0,0,0)(2,0,0,0)(2,0,1,0)(2,0,2,0)(2,0,2,0)(2,0,2,1)(1,2,0,0)(2,2,3,0)(2,0,2,2)(2,0,2,2)(3,0,2,2)(4,0,2,2)(4,0,2,2)(4,2,4,2)(2,0,2,3)(2,0,2,3)(4,2,5,3)

Comparing vector

Comparison of vector time stamps is crucial for the detection of causality violation
comparing vector clocks requires element-wise comparison

if A->B then V(A)<V(B)

Event 1 -> Event 2Both event arein the same machineBoth events are in different machineV1 = [V1.1,V1.2,...,V1.x,...V1.m]V2 = [V1.1,V1.2,...,V1.x+ ,...V1.m]Event 1 is sendEvent 2 is receiveV2=max(V1,..(internal clock))+ (0,0,...1,...,0)

Detecting a causality violation

Upon receiving a message, its vector clock (recall that the vector clock is associated with each sent message) is checked with the local clock.

If the local clock is after (in line with the vector comparison) the message clock, then a causality violation is flagged.

Why is that so

Every machine only updates its own clock and not touching the rest of the vector.
If there is an advance clock describe previously can only happen if there exist another machine that receive the message that increments the clock
m1m2m312345c1c2c3Failed
For this example, 2 updates the c1 clock and is captured by c3 which is then sent to c2 resulting in c2 having a more advanced c1 clock than the clock sent by m1

1.     Discuss one disadvantage of Berkley’s clock synchronization protocol as compared to Cristian’s protocol. Illustrate the shortcoming with an example. You may assume that the network is ideal in the sense that sending and receiving messages take equal amount of time. No marks will be awarded in the absence of the example.

Berkley’s clock synchronization protocol and Cristian’s protocol are both used in distributed systems to synchronize clocks. However, each has its own advantages and disadvantages. One key disadvantage of Berkley’s protocol in comparison to Cristian's protocol is that it involves more communication overhead.

Berkley’s protocol operates by having a coordinator node (often the node with the master clock) send a message to all other nodes in the system, asking them for their current time values. Each node then responds with their time value. The coordinator collects all these responses, calculates an average time value, and sends this average back to all nodes. Each node then adjusts its clock accordingly.

This process introduces a higher level of communication overhead compared to Cristian's protocol because every node must communicate with every other node multiple times during each synchronization operation. This can be problematic in large distributed systems where there are many nodes, resulting in high network traffic and increased latency.

On the other hand, Cristian's protocol works by having each client send a request to a single server for its current time, receive a response from the server, and adjust its clock accordingly. This process involves significantly less communication between nodes compared to Berkley’s protocol.

Example: Imagine a network of 10 nodes using Berkley’s algorithm for synchronisation. For each synchronisation operation, the coordinator would need to send out 10 messages (one to each node), receive 10 responses (one from each node), calculate an average time value, and then send out another 10 messages (the adjusted time value) - that's a total of 30 messages for one round of synchronisation.

In contrast, if this same network was using Cristian's algorithm instead - each client would just need to send one message request to the server and receive one response back - that's just two messages per client or 20 messages total for all clients for one round of synchronisation - significantly less than Berkley’s protocol.