Data versioning
Introduction
Dynamo provides eventual consistency
, which allows for updates to be propagated to all replicas asynchronously.
- A
put()
call may return to its caller before the update has been applied to all the replicas (As long as W-1 machines responded more details) - A subsequent
read()
request may get the out dated information
Design details
- Dynamo treats the result of
each modification
as anew
andimmutable
version of the data - Multiple versions of the data can be present in the system
Conflicting versions
In the event of concurrent update and failures, there might be multiple versions of the data, resulting in a conflict
Client must perform reconciliation in order to collapse
multiple branches of data evolution back into one
Merge
Multiple versions of shopping cart
available
Merging is performed to make sure no data is lost
deleted item may resurface
Causality
Dynamo uses vector clocks to capture causality between different versions of the same object
Vector clocks is effectively a list of (node,counter)
pairs in this case
- A client creates a new object
Sx
handles the write and create a new objectD1
with vector clock[Sx,1]
- client update the object, request still handled by
Sx
hence only increment the sequence number by1
and a new objectD2
- D2
descends
from D1, hence overwriting D1- There may be replicas of D1 elsewhere that has not seen D2
- Client try to update again, but this time handle by
Sy
, resulting in creation ob objectD3
, since it is a new machines, new vector clock is appended, thus the final vector clock([Sx,2],[Sy,1])
- Then another client try to update again, but the request is handled by
Sz
andSz
only has the version ofD2
available, hence arriving at objectD4
with([Sx,2],[Sz,1])
- Base on the vector clock, a node that is aware of
D3
and receivesD4
will know that they dont have a causal relationship and reconciliation have to be carried out - Assume a client performs a read, the summary of the clocks of
D3
andD4
will be sent - If reconciliation is performed a final version of
D5
is written bySx
with the combination of all the vector clocks([Sx,3],[Sy,1],[Sz,1])
One can determine whether two versions of an object are on parallel branches
or causal ordering
by examine they vector clocks
if the properties 1 and 2
of vector comparison if not satisfied then there may be a conflict and require reconciliation, else can be ignored
Vector clock issues
The size of the vector clocks may gets too big, if many servers coordinate the writes to a object
- the writes are usually handled by one of the top N nodes in the preference list
- In case of network partition or multiple server failures, write request may be handled by nodes that are not in the top N nodes
- This causing the size of the vector clocks to grow (if not it will always be < N)
Vector clock truncation scheme
dynamo store a timestamp(along with (node,counter)
pair) that indicates the last time the node updated the data item.
When the number of (node,counter)
pair exceeds a threshold
The oldest pair is deleted
May lead to inefficiencies in reconciliation as the descendant relationship cannot be derived accurately