Clocks and Synchronisation
Clocks
In distributed systems, if the timing of the node is wrong, there might be different impact to the program
different machine have different understanding about the clocks
Real time system: requires task to be completed within a given time
Physcial Clock
Centralized Systems
- One or more processors share a common bus
- Time synchronization is not much of concern
Distributed Systems
- Each system has its own timer
- Timers are based on oscillation of quartz crystal
- These materials are not perfect and drift away from true time
Resynchronizing Clocks
- Coordiantion of Physical Clocks
- possible but can never be exact
- distributed algorithms should be tolerant to clock drift
- A typical clock has relative error 10^-5
- 100 ticks/second implies 36000 ticks/hour normally
- with error, 360000 ticks/hour +- ticks
- in the worst case, two system drift 8 ticks apart in an hour
- Given we need to maintain all clocks in a distributed systems within 2 ticks
- 4 synchronization per hour
How we synchronize the clock
Ultimate goal is to have all nodes have the same time and correct time
Cristian's Algorithm
- clients sends a request to timeserver
- Time server responds with a timestamp
- client synchronizes with the server provided timestamp
- measure the
round trip time(RTT)
Assumptions
- Server has
zero
computation cost- Take into account of server latency
server timestamp + (RTT-server latency)/2
- server to append the
computation time
back
- the send and receive delay is
symmetric
- Keep track of recent
RTTs
and repeat request when anRTT
is an outlier - otherwise take a weighted average of
n
measurements - average the error
Recursive call
give the most recent measurement more weight
- Keep track of recent
- client clock does not change meanwhile
check if your computer clock is in sync
Berkeley Algorithm
Similar idea to Christian Algorithm but does not have dedicated server
It synchroize time base on polling from other machine server and compute the time.
The server also broad cast to all machines
C1
has time
C2
has time
Server
time
Adjusting
After calculating the
Its more scalable than Cristian Protocol as it is not bottlenecked by a single server, even though it requires more network calls to reach synchronization
Real-world synchronization
- Network Time Protocol (NTP)
- Instead of one time server, it uses a
hierachy
of time servers arranged into astrata
- First stratum contains servers with highly accurate time
- The second stratum servers synchronize with the first, the third stratum servers synchronize with the second and so on
Validation and fault tolerance for having multiple servers and synchronization within each level
Synchronization
Adjustments at X+1
Which Server to choose
- Lower Stratum servers more accurate
- Reliability of communication/latency
Challenges
- How to find a time server
- How to handle faults (if server goes down)
- What if the time server or a client is
compromised
Logical Clock
As compared to real physical clock, we might be interested in the logical clock (when does message arrive)
We can only have partial order in distributed system
imagine a group project, people work independently on different task, but at certain stage, it should be synchronized and the order matters
Happens-before Relation
-
If A and B are events in the same component and A occurs before B, then A->B
-
Let A and B be different system to interact, If A is a send message and B receives this message, then A->B
Implication
- If A->B (A happens before B) and B->C (B happens before C), then A->C (
transitive
) - If
A->B
isnot true
, then A and B may proceedconcurrently
. This means A and B may occur inany unknown order
if A->B
then L(A)<L(B)
where L(A)
is the logical clock value
Each machine's logical clock is local to the machine itself
max(L(p1.2),L(p2.5))+1
is to make sure that the logical clock value of receiving
is alway higher than the sending
end
Total Ordering of Events
Total ordering of events in a distributed system refers to the establishment of a consistent and agreed-upon sequence for all events that occur across multiple nodes or processes in the system. It ensures that all nodes observe events in the same order, regardless of the actual physical time or the order in which they occur at each node.
In a distributed system, where multiple nodes can operate concurrently and independently, it is challenging to establish a global time reference. Therefore, total ordering provides a logical ordering mechanism that allows nodes to agree on the sequence of events.
Total ordering can be achieved through various techniques such as:
-
Logical Clocks: Nodes assign logical timestamps to events based on their own local clocks. These timestamps are used to establish a partial order among events, which can later be transformed into a total order.
-
Vector Clocks: Similar to logical clocks, vector clocks assign vectors of logical timestamps to events. Each node maintains its own vector clock, and when events are exchanged between nodes, their vector clocks are updated accordingly.
-
Lamport Timestamps: Proposed by Leslie Lamport, Lamport timestamps are used to establish partial ordering among events based on causality. Each event is assigned a unique timestamp, and if one event causally depends on another event, its timestamp will be greater than the previous event's timestamp.
-
Physical Clock Synchronization: Nodes synchronize their physical clocks using protocols like Network Time Protocol (NTP) or Precision Time Protocol (PTP). Once synchronized, physical timestamps can be used as an approximation for total ordering.
By establishing total ordering of events in a distributed system, applications can ensure consistency and reliability in various scenarios such as distributed databases, replicated systems, messaging systems, and consensus algorithms like Paxos or Raft.
By establishing total ordering of events in a distributed system, applications can ensure consistency and reliability in various scenarios such as distributed databases, replicated systems, messaging systems, and consensus algorithms like Paxos or Raft.
In a total ordering If the logical clock value is the same then they must be concurrent
- Total ordering is achieved by the
logical clock value
Causality Violation
Causality violation in distributed systems refers to a situation where the order of events observed by different processes or clients does not agree
with the causal relationship between those events. In other words, it is a violation of the cause-effect
relationship between events in a distributed system.
Lamport's logical clock can be used for a total ordering
- However, it might not consider cases whether all messages are received in the exact same order in all machines
- One machine can receive A first then B but other machine receive B first then A
Example
M1 -> Move File 'X' to C2
M2-> Use File 'x' from C2
M3-> read record 'y' from file 'x'
Explanation
M1 states "Move File 'X' to C2." This message indicates that file 'X' is being moved from C1 to C2.
M2 instructs C3 to use File 'X' from C2. This message assumes
that file 'X' has already been moved to C2. However, since M2 is sent after M1, it violates the causality principle because M2 depends
on the successful execution
of M1. Therefore, M2 should not refer to File 'X' in C2 before it is actually moved.
M3 reads record 'Y' from File 'X'. This message assumes that File 'X' exists in C2. However, since M2 violates causality, M3 also violates causality because it tries to read from File 'X' in C2 before it is actually moved.
Potential causality violation
Intuition
We want to detect potential causality violation
- Any action a host takes may be affected by any message it has previously received
detect the order in which the message are send and receive is different across all nodes in the system
- eg. if machine P receives M2 before M1 but M1 was sent before M2, then a potential causality violation has occurred
Send(M1)-> Send(M2)
Send(M1)-> Rec(M1)
Send(M2)-> Rec(M2)
Rec(M2)-> Send(M3)
Send(M3)-> Rec(M3)
Rec(M3)-> Rec(M1)
Therefore
Rec(M2)->Rec(M1)
Vector Clock
There is no information
embedded in the logical clock value that the file is sent or not
- Embedding
information
in the clock so that we can analyse causality issues - we cannot use scalar value since the information is lost
- we use vector clock, that is not fully synchronized with all machines
How vector clock works
- size of vector clock is the number of machine
- update clock asynchronsely when there is message exchange
- Only local clock value is increment by one on event excution
- others are untouched unless message exchange
- sending entire vector with each message
- on receive, merge both vectors for each element in the vector, select the greater of the corresponding elements from each, increment the component for self
Comparing vector
Comparison of vector time stamps is crucial for the detection of causality violation
comparing vector clocks requires element-wise comparison
if A->B
then V(A)<V(B)
- Property 1: All elements of
must be less than or equal to respective elements in - Property 2: At least one of the element in
is strictly less than
the respective element of
Ifpropertiy 1
orproperty 2
is not satisfied then the event can be concurrent
Detecting a causality violation
Upon receiving a message, its vector clock (recall that the vector clock is associated with each sent message) is checked with the local clock.
If the local clock is after
(in line with the vector comparison) the message clock, then a causality violation
is flagged.
Why is that so
Every machine only updates its own clock and not touching the rest of the vector.
If there is an advance clock describe previously can only happen if there exist another machine that receive the message that increments the clock
For this example, 2 updates the c1
clock and is captured by c3
which is then sent to c2
resulting in c2
having a more advanced c1
clock than the clock sent by m1
1. Discuss one disadvantage of Berkley’s clock synchronization protocol as compared to Cristian’s protocol. Illustrate the shortcoming with an example. You may assume that the network is ideal in the sense that sending and receiving messages take equal amount of time. No marks will be awarded in the absence of the example.
Berkley’s clock synchronization protocol and Cristian’s protocol are both used in distributed systems to synchronize clocks. However, each has its own advantages and disadvantages. One key disadvantage of Berkley’s protocol in comparison to Cristian's protocol is that it involves more communication overhead.
Berkley’s protocol operates by having a coordinator node (often the node with the master clock) send a message to all other nodes in the system, asking them for their current time values. Each node then responds with their time value. The coordinator collects all these responses, calculates an average time value, and sends this average back to all nodes. Each node then adjusts its clock accordingly.
This process introduces a higher level of communication overhead compared to Cristian's protocol because every node must communicate with every other node multiple times during each synchronization operation. This can be problematic in large distributed systems where there are many nodes, resulting in high network traffic and increased latency.
On the other hand, Cristian's protocol works by having each client send a request to a single server for its current time, receive a response from the server, and adjust its clock accordingly. This process involves significantly less communication between nodes compared to Berkley’s protocol.
Example: Imagine a network of 10 nodes using Berkley’s algorithm for synchronisation. For each synchronisation operation, the coordinator would need to send out 10 messages (one to each node), receive 10 responses (one from each node), calculate an average time value, and then send out another 10 messages (the adjusted time value) - that's a total of 30 messages for one round of synchronisation.
In contrast, if this same network was using Cristian's algorithm instead - each client would just need to send one message request to the server and receive one response back - that's just two messages per client or 20 messages total for all clients for one round of synchronisation - significantly less than Berkley’s protocol.