Clocks and Synchronisation

##lesson #clocks

Clocks

In distributed systems, if the timing of the node is wrong, there might be different impact to the program
different machine have different understanding about the clocks

Real time system: requires task to be completed within a given time

Physcial Clock

Centralized Systems

One or more processors share a common bus
Time synchronization is not much of concern

Distributed Systems

Each system has its own timer
Timers are based on oscillation of quartz crystal
These materials are not perfect and drift away from true time

Resynchronizing Clocks

Coordiantion of Physical Clocks
- possible but can never be exact
- distributed algorithms should be tolerant to clock drift
A typical clock has relative error 10^-5
- 100 ticks/second implies 36000 ticks/hour normally
- with error, 360000 ticks/hour +- ticks
- in the worst case, two system drift 8 ticks apart in an hour
Given we need to maintain all clocks in a distributed systems within 2 ticks
- 4 synchronization per hour

How we synchronize the clock

Ultimate goal is to have all nodes have the same time and correct time

Cristian's Algorithm

clients sends a request to timeserver
Time server responds with a timestamp
client synchronizes with the server provided timestamp
measure the round trip time(RTT)

Assumptions

Server has zero computation cost
1. Take into account of server latency
2. server timestamp + (RTT-server latency)/2
3. server to append the computation time back
the send and receive delay is symmetric
1. Keep track of recent RTTs and repeat request when an RTT is an outlier
2. $α^{i} = T_{s e r v e r}^{i} + \frac{R T T^{i}}{2}$
3. otherwise take a weighted average of n measurements
  1. $A v g . C l i e n t . C l i c k (n) = W * α^{n} + (1 - W) * A v g . C l i e n t . C l o c k (n - 1) *$
4. average the error
  1. $e r r o r (i) = α^{i} - T_{c l i e n t}^{i}$
  2. $A v g . e r r o r . c l i e n t (n) = W * e r r o r (n) + (1 - W) * A v g . e r r o r . C l i e n t (n - 1)$
5. Recursive call give the most recent measurement more weight
6. $T_{c l i e n t}^{n} + a v g . e r r o r . c l i e n t (n)$
client clock does not change meanwhile

check if your computer clock is in sync

Berkeley Algorithm

Similar idea to Christian Algorithm but does not have dedicated server
It synchroize time base on polling from other machine server and compute the time.
The server also broad cast to all machines

C1 has time $T_{1}^{'} = T_{1} + \frac{R T T_{1}}{2} + R T T_{2} - R T T_{1}$
C2 has time $T_{2^{'}} = T_{2} + \frac{R T T_{2}}{2}$
Server time $T_{3}$
$T_{a v g} = \frac{T_{1}^{'} + T_{2}^{'} + T_{3}}{3}$

Adjusting

After calculating the $T_{a v g}$ we can make adjustment to all three servers

Its more scalable than Cristian Protocol as it is not bottlenecked by a single server, even though it requires more network calls to reach synchronization

Real-world synchronization

Network Time Protocol (NTP)
Instead of one time server, it uses a hierachy of time servers arranged into a strata
First stratum contains servers with highly accurate time
The second stratum servers synchronize with the first, the third stratum servers synchronize with the second and so on

Validation and fault tolerance for having multiple servers and synchronization within each level

Synchronization

$R T T = (T_{4} - T_{1}) - (T_{3} - T_{2})$
Adjustments at X+1 $T_{3} + \frac{R T T}{2}$

Which Server to choose

Lower Stratum servers more accurate
Reliability of communication/latency

Challenges

How to find a time server
How to handle faults (if server goes down)
What if the time server or a client is compromised

Logical Clock

As compared to real physical clock, we might be interested in the logical clock (when does message arrive)
We can only have partial order in distributed system
imagine a group project, people work independently on different task, but at certain stage, it should be synchronized and the order matters

Happens-before Relation

If A and B are events in the same component and A occurs before B, then A->B
Let A and B be different system to interact, If A is a send message and B receives this message, then A->B

Implication

If A->B (A happens before B) and B->C (B happens before C), then A->C (transitive)
If A->B is not true, then A and B may proceed concurrently. This means A and B may occur in any unknown order

if A->B then L(A)<L(B) where L(A) is the logical clock value
Each machine's logical clock is local to the machine itself

max(L(p1.2),L(p2.5))+1 is to make sure that the logical clock value of receiving is alway higher than the sending end

Total Ordering of Events

Total ordering of events in a distributed system refers to the establishment of a consistent and agreed-upon sequence for all events that occur across multiple nodes or processes in the system. It ensures that all nodes observe events in the same order, regardless of the actual physical time or the order in which they occur at each node.

In a distributed system, where multiple nodes can operate concurrently and independently, it is challenging to establish a global time reference. Therefore, total ordering provides a logical ordering mechanism that allows nodes to agree on the sequence of events.

Total ordering can be achieved through various techniques such as:

Logical Clocks: Nodes assign logical timestamps to events based on their own local clocks. These timestamps are used to establish a partial order among events, which can later be transformed into a total order.
Vector Clocks: Similar to logical clocks, vector clocks assign vectors of logical timestamps to events. Each node maintains its own vector clock, and when events are exchanged between nodes, their vector clocks are updated accordingly.
Lamport Timestamps: Proposed by Leslie Lamport, Lamport timestamps are used to establish partial ordering among events based on causality. Each event is assigned a unique timestamp, and if one event causally depends on another event, its timestamp will be greater than the previous event's timestamp.
Physical Clock Synchronization: Nodes synchronize their physical clocks using protocols like Network Time Protocol (NTP) or Precision Time Protocol (PTP). Once synchronized, physical timestamps can be used as an approximation for total ordering.

By establishing total ordering of events in a distributed system, applications can ensure consistency and reliability in various scenarios such as distributed databases, replicated systems, messaging systems, and consensus algorithms like Paxos or Raft.

In a total ordering If the logical clock value is the same then they must be concurrent

Total ordering is achieved by the logical clock value

Causality Violation

Causality violation in distributed systems refers to a situation where the order of events observed by different processes or clients does not agree with the causal relationship between those events. In other words, it is a violation of the cause-effect relationship between events in a distributed system.

Lamport's logical clock can be used for a total ordering

However, it might not consider cases whether all messages are received in the exact same order in all machines
One machine can receive A first then B but other machine receive B first then A

Example

M1 -> Move File 'X' to C2
M2-> Use File 'x' from C2
M3-> read record 'y' from file 'x'

Explanation

M1 states "Move File 'X' to C2." This message indicates that file 'X' is being moved from C1 to C2.

M2 instructs C3 to use File 'X' from C2. This message assumes that file 'X' has already been moved to C2. However, since M2 is sent after M1, it violates the causality principle because M2 depends on the successful execution of M1. Therefore, M2 should not refer to File 'X' in C2 before it is actually moved.

M3 reads record 'Y' from File 'X'. This message assumes that File 'X' exists in C2. However, since M2 violates causality, M3 also violates causality because it tries to read from File 'X' in C2 before it is actually moved.

Potential causality violation

Intuition

We want to detect potential causality violation

Any action a host takes may be affected by any message it has previously received

detect the order in which the message are send and receive is different across all nodes in the system

eg. if machine P receives M2 before M1 but M1 was sent before M2, then a potential causality violation has occurred

Send(M1)-> Send(M2)
Send(M1)-> Rec(M1)
Send(M2)-> Rec(M2)
Rec(M2)-> Send(M3)
Send(M3)-> Rec(M3)
Rec(M3)-> Rec(M1)

Therefore
Rec(M2)->Rec(M1)

Vector Clock

There is no information embedded in the logical clock value that the file is sent or not

Embedding information in the clock so that we can analyse causality issues
we cannot use scalar value since the information is lost
we use vector clock, that is not fully synchronized with all machines

How vector clock works

size of vector clock is the number of machine
update clock asynchronsely when there is message exchange
Only local clock value is increment by one on event excution
others are untouched unless message exchange
sending entire vector with each message
on receive, merge both vectors for each element in the vector, select the greater of the corresponding elements from each, increment the component for self

Comparing vector

Comparison of vector time stamps is crucial for the detection of causality violation
comparing vector clocks requires element-wise comparison

if A->B then V(A)<V(B)

Property 1: All elements of $V_{A}$ must be less than or equal to respective elements in $V_{B}$
Property 2: At least one of the element in $V_{A}$ is strictly less than the respective element of $V_{B}$
If propertiy 1 or property 2 is not satisfied then the event can be concurrent

Detecting a causality violation

Upon receiving a message, its vector clock (recall that the vector clock is associated with each sent message) is checked with the local clock.

If the local clock is after (in line with the vector comparison) the message clock, then a causality violation is flagged.

Why is that so

Every machine only updates its own clock and not touching the rest of the vector.
If there is an advance clock describe previously can only happen if there exist another machine that receive the message that increments the clock

For this example, 2 updates the c1 clock and is captured by c3 which is then sent to c2 resulting in c2 having a more advanced c1 clock than the clock sent by m1

1. Discuss one disadvantage of Berkley’s clock synchronization protocol as compared to Cristian’s protocol. Illustrate the shortcoming with an example. You may assume that the network is ideal in the sense that sending and receiving messages take equal amount of time. No marks will be awarded in the absence of the example.

Berkley’s clock synchronization protocol and Cristian’s protocol are both used in distributed systems to synchronize clocks. However, each has its own advantages and disadvantages. One key disadvantage of Berkley’s protocol in comparison to Cristian's protocol is that it involves more communication overhead.

Berkley’s protocol operates by having a coordinator node (often the node with the master clock) send a message to all other nodes in the system, asking them for their current time values. Each node then responds with their time value. The coordinator collects all these responses, calculates an average time value, and sends this average back to all nodes. Each node then adjusts its clock accordingly.

This process introduces a higher level of communication overhead compared to Cristian's protocol because every node must communicate with every other node multiple times during each synchronization operation. This can be problematic in large distributed systems where there are many nodes, resulting in high network traffic and increased latency.

On the other hand, Cristian's protocol works by having each client send a request to a single server for its current time, receive a response from the server, and adjust its clock accordingly. This process involves significantly less communication between nodes compared to Berkley’s protocol.

Example: Imagine a network of 10 nodes using Berkley’s algorithm for synchronisation. For each synchronisation operation, the coordinator would need to send out 10 messages (one to each node), receive 10 responses (one from each node), calculate an average time value, and then send out another 10 messages (the adjusted time value) - that's a total of 30 messages for one round of synchronisation.

In contrast, if this same network was using Cristian's algorithm instead - each client would just need to send one message request to the server and receive one response back - that's just two messages per client or 20 messages total for all clients for one round of synchronisation - significantly less than Berkley’s protocol.