Interfacing
Introduction
From the paper, there are only two operations that the interface needs to expose: get and put. However, there are many considerations and implementation details around these two operations that we need to take care of
How requests are handled
- It provides two core operations: get(key) and put(key, value, context)
- The key is treated as an opaque array of bytes. Dynamo applies a hash function #updateLink on the key to generate a storage location
- The value is also treated as an opaque array of bytes representing the data object
Write context
There may be instances where multiple versions of the same data are present, hence a write context has to be passed to the put request to determine which version of the data the write applies to
The write context contains information such as the vector clock
more about the use of vector clocks in versioning here
The context information is stored along with the object so that the system can verify the validity of the context object supplied to the put request
For concurrent writes, the coordinator node generates a new write context that subsumes all the vector clocks of the conflicting versions. This new reconciled write context is returned to the client on success.
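A minimal sketch of how a coordinator might subsume conflicting vector clocks into one reconciled context. The paper gives no code; `merge_clocks` and the dict-of-counters clock representation are assumptions for illustration.

```python
def merge_clocks(clocks):
    """Subsume a set of conflicting vector clocks by taking the
    element-wise maximum of every (node, counter) entry.
    Hypothetical helper, not the paper's actual implementation."""
    merged = {}
    for clock in clocks:
        for node, counter in clock.items():
            merged[node] = max(merged.get(node, 0), counter)
    return merged

# Two concurrent versions produced by different coordinators:
v1 = {"A": 2, "B": 1}
v2 = {"A": 1, "C": 3}
reconciled_context = merge_clocks([v1, v2])  # subsumes both branches
```

The merged clock causally descends from every conflicting version, which is what lets the reconciled write supersede all of them.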
Execution
- the node to handle the request is selected by either
  - a load balancer that selects based on load information
  - a partition-aware client library
- a coordinator is the node that handles the read and write operations, typically the first among the top N nodes in the preference list
  - If the requests are received through a load balancer, requests may be routed to any random node in the ring
  - If the node receiving the request is not in the top N nodes, it will not coordinate the request but will forward it to the first node among the top N nodes in the preference list
- This may lead to imbalanced load distribution
  - can be solved by allowing any of the top N nodes in the preference list to be the coordinator
  - The coordinator is then chosen to be the node that replied the fastest to the previous read request
  - This information should be stored in the context
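The forwarding rule above can be sketched as a small helper. `choose_coordinator` and its arguments are hypothetical names, assuming the preference list is already ordered by rank.

```python
def choose_coordinator(receiving_node, preference_list, n):
    """If the node that received the request is among the top N nodes
    of the key's preference list, it coordinates the request itself;
    otherwise it forwards to the first top-N node. Hypothetical helper."""
    top_n = preference_list[:n]
    if receiving_node in top_n:
        return receiving_node
    return top_n[0]  # forward to the first node in the top N

# A load balancer routed the request to node "D", which is outside
# the top 3 for this key, so it forwards to "A":
coordinator = choose_coordinator("D", ["A", "B", "C", "D"], n=3)
```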
- Clients can configure Dynamo's consistency and durability parameters like R, W and N on a per-instance basis to control tradeoffs.
- R (Read quorum size) - The minimum number of nodes that must participate in a successful read operation
- W (Write quorum size) - The minimum number of nodes that must participate in a successful write operation
- N (Replication factor) - The number of nodes that each data item is replicated to, more can be found in Replication
- Increasing R and W improves consistency but reduces availability. Setting lower values improves availability but risks inconsistency.
- N controls the durability and number of replicas for each data item. Higher N means more copies are stored.
- Typical values are R=2, W=2 and N=3. Because the quorum is sloppy, this still provides only eventual consistency, but high availability
- Setting W=1 and N=1 means a write is considered successful even if it succeeds on only one node. This maximizes write availability.
- Setting R=1 means reads can be satisfied from any node without checking other replicas. This improves read performance.
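The R/W/N tradeoff can be captured in a small config sketch. The `QuorumConfig` class and `quorums_overlap` check are illustrative assumptions; only the R + W > N overlap rule comes from the paper.

```python
from dataclasses import dataclass

@dataclass
class QuorumConfig:
    n: int  # replication factor: copies of each data item
    r: int  # read quorum: nodes that must answer a read
    w: int  # write quorum: nodes that must acknowledge a write

    def quorums_overlap(self) -> bool:
        # R + W > N guarantees every read quorum intersects
        # every write quorum (a quorum-like overlap property)
        return self.r + self.w > self.n

typical = QuorumConfig(n=3, r=2, w=2)       # common Dynamo setting
write_optimised = QuorumConfig(n=3, r=1, w=1)  # fast but risks stale reads
```

Lowering R or W trades the overlap guarantee for latency and availability, which is exactly the per-instance tuning the notes describe.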
Put request handling
- Upon receiving a put() request for a key, the coordinator generates a vector clock for the new version and writes the new version locally
- The coordinator then sends the new version with the vector clock to the N highest-ranked reachable nodes; if at least W-1 nodes respond, the write is considered successful
- The client must specify which version it is updating by passing the context it obtained from an earlier read operation
- Replication is required for consistency and durability
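The coordinator-side put flow above can be sketched as follows. Everything here (`increment`, `put`, the `send_write` transport callback) is a hypothetical simplification: the coordinator's own local write counts as one replica, so only W-1 remote acknowledgements are needed.

```python
def increment(clock, node):
    """Advance this node's counter in a dict-based vector clock."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def put(key, value, context, coordinator, replicas, w, send_write):
    """Sketch of coordinator-side put: bump the vector clock from the
    client-supplied context (the version being updated), then send the
    new version to the N-1 other highest-ranked reachable nodes.
    The write succeeds once W-1 of them acknowledge."""
    new_clock = increment(context.get("clock", {}), coordinator)
    acks = sum(1 for r in replicas if send_write(r, key, value, new_clock))
    return acks >= w - 1, new_clock

# Simulated transport in which every replica acknowledges:
ok, clock = put("user:42", b"cart-v2", {"clock": {"A": 1}},
                coordinator="A", replicas=["B", "C"], w=2,
                send_write=lambda *_: True)
```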
Get request handling
- The coordinator requests all existing versions of the data for that key from the N highest-ranked reachable nodes in the preference list for that key
- If there are multiple branches that cannot be syntactically reconciled, it returns all the objects at the leaves, with the corresponding version information in the context
- The divergent versions are then reconciled, and the reconciled version superseding the current versions is written back
- A better visualisation here
Do we really need all the versions of the data?
What if too many versions accumulate and the query becomes slow?
For divergent versions, do we return the reconciled version?
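The leaf-version filtering on a get can be sketched like this. `descends` and `get_leaves` are hypothetical helpers assuming dict-based vector clocks: a version is dropped only when another version's clock causally descends from it; concurrent branches all survive and are returned to the client.

```python
def descends(a, b):
    """True if clock a causally descends from clock b
    (every counter in b is matched or exceeded in a)."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def get_leaves(versions):
    """Sketch of coordinator-side get: discard versions that some other
    version causally supersedes; whatever remains are the leaves of the
    version tree, all of which are returned for reconciliation."""
    return [v for v in versions
            if not any(v is not other and descends(other["clock"], v["clock"])
                       for other in versions)]

versions = [
    {"value": b"x", "clock": {"A": 1}},  # superseded by the next version
    {"value": b"y", "clock": {"A": 2}},
    {"value": b"z", "clock": {"B": 1}},  # concurrent branch, also a leaf
]
leaves = get_leaves(versions)  # keeps the two concurrent leaves
```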
State machine
Each client request results in the creation of a state machine on the node that received the client request.
The state machine contains all the logic for identifying the nodes responsible for a key, sending the requests, waiting for responses, potentially doing retries, processing the replies, and packaging the response to the client
They use a state machine to keep track of the current status of the client request; a desired output is only produced when the state machine terminates
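A toy version of such a per-request state machine for a read. The states and the `read_replica` callback are invented for illustration (the paper describes the machine only in prose); the point is that the result is produced only when the machine reaches its terminal state.

```python
from enum import Enum, auto

class State(Enum):
    SEND = auto()     # identify and contact the responsible nodes
    WAIT = auto()     # gather responses until R replies arrive
    PROCESS = auto()  # package the replies for the client
    DONE = auto()     # terminal state: output is produced

def run_read(replicas, r, read_replica):
    """Drive a single read request through the state machine,
    returning the packaged responses once R replicas have answered."""
    state, responses, result = State.SEND, [], None
    while state is not State.DONE:
        if state is State.SEND:
            targets = list(replicas)
            state = State.WAIT
        elif state is State.WAIT:
            for node in targets:
                responses.append(read_replica(node))
                if len(responses) >= r:
                    break
            state = State.PROCESS
        elif state is State.PROCESS:
            result = responses[:r]  # reply to the client with R responses
            state = State.DONE
    return result
```

Retries would fit naturally as a transition from WAIT back to SEND when too few replicas respond, which is why the paper models the request as a state machine rather than a single call.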
Read operation
To read more about versioning, see Data versioning