This is the part about CAP theorem that I don't understand. A system which is not tolerant to partitions, can be made to be consistent and available as per CAP. I don't understand the tolerant part...and its implications. Basically when I give up tolerance what have I exactly given up because the only two things system really needs is consistency and availability. Partitions cause lost of availability or consistency because of shared state.
Is it possible to be not tolerant and still be consistent and available? Or does being not tolerant inherently assume either not consistent or not available or both? A single node system doesn't have any problems with partitions, but when the single node goes down, it is not available. Thus I believe it is not possible to have a available system without replication and under network partitions replicated systems will become inconsistent if they have to be available. An alternate way of defining CAP theorem would be to use replication, consistency and latency as the base constructs. A node never coming up has infinite latency, whereas partitions would have some upper bound on latency (by including human intervention).
Much of the discussion around CAP theorem has been based around the NoSQL dbs. These systems use the key-value data model. The main problem is that we always think of interactions as read or write operations. Whenever a write occurs it modifies the original value in arbitrary possible ways, making it hard to measure the cost of being inconsistent. Consider a system where we think of writes as changes. Further if these changes have the property that irrespective of in which order they are applied they will yield the same result. In such a case all we need is a reliable way of sending messages to nodes. The degree of inconsistency is directly related to the latency in delivering messages.
The second area where I believe we can reduce the impact of inconsistency is by moving the decision of choosing consistency or availability in the db nodes itself. Consider a system which stores objects with code instead of just data. A change is just an invocation of a method on the object that changes the object state. Now if this method is aware of the currently available nodes, it can choose to be inconsistent or available depending upon the impact of the change. Example: Consider a shopping cart application that needs to decrease its inventory count by 1. Also assume that we have 10 replicas. Now when 10 concurrent clients want to buy the same item and somehow these operations end up happening concurrently on all the replicas, we have two extreme options. One to serialize them and second to let them happen in parallel and be inconsistent. Now consider that we have only 1 item, in that case running this in parallel will cause problems as we are selling stuff we don't have. Only one of the requests should succeed. But if we had 1000 such items, the impact of inconsistency is not much. Each node will think the current item count is 1000, whereas it would have changed from 1000 to 990. Thus if we know the impact or magnitude of inconsistency, we can take a better call at runtime to decide if we should prefer availability or consistency. For example in the case above, it is OK to be inconsistent if each node does 100 transactions without consulting with any other node. Depending upon number of nodes it is consistent with, this number can be refined as long as possible to give availability assuming other nodes will use the same strategy.
No comments:
Post a Comment