Friday, November 12, 2010

Partition Tolerance & Calculated Consistency

This is the part of the CAP theorem that I don't understand. A system which is not tolerant to partitions can, as per CAP, be made consistent and available. I don't understand the tolerance part...and its implications. When I give up partition tolerance, what exactly have I given up, since the only two things a system really needs are consistency and availability? Partitions cause loss of availability or consistency because of shared state.

Is it possible to be intolerant of partitions and still be consistent and available? Or does giving up partition tolerance inherently imply giving up consistency, availability, or both? A single-node system doesn't have any problems with partitions, but when that single node goes down, it is not available. Thus I believe it is not possible to have an available system without replication, and under network partitions replicated systems must become inconsistent if they are to stay available. An alternative way of stating the CAP theorem would be to use replication, consistency and latency as the base constructs. A node that never comes up has infinite latency, whereas partitions have some upper bound on latency (if only through human intervention).

Much of the discussion around the CAP theorem has been centered on the NoSQL databases. These systems use the key-value data model. The main problem is that we always think of interactions as read or write operations. A write can modify the original value in arbitrary ways, which makes it hard to measure the cost of being inconsistent. Instead, consider a system where we think of writes as changes, and these changes have the property that they yield the same result irrespective of the order in which they are applied. In such a case, all we need is a reliable way of sending messages to nodes. The degree of inconsistency is then directly related to the latency in delivering messages.
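
A minimal Lua sketch of the idea (the names here are mine, for illustration only): if every write is an order-independent change, replicas that receive the same set of messages converge no matter how delivery is interleaved.

-- Each change is "add this element to a set". Applying the same
-- changes in any order produces the same state on every replica.
function apply(state, change)
  state[change] = true
  return state
end

local replicaA = apply(apply({}, "x"), "y")  -- received x, then y
local replicaB = apply(apply({}, "y"), "x")  -- received y, then x
-- replicaA and replicaB are now identical; only delivery latency,
-- not delivery order, determines how long they disagree.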

The second area where I believe we can reduce the impact of inconsistency is by moving the decision of choosing consistency or availability into the db nodes themselves. Consider a system which stores objects with code instead of just data. A change is then just an invocation of a method on the object that changes the object's state. If this method is aware of the currently available nodes, it can choose to be inconsistent or available depending upon the impact of the change.

Example: consider a shopping cart application that needs to decrease its inventory count by 1, and assume we have 10 replicas. When 10 concurrent clients want to buy the same item and these operations somehow end up happening concurrently on all the replicas, we have two extreme options: serialize them, or let them happen in parallel and be inconsistent. If we have only 1 item, running them in parallel causes real problems because we are selling stuff we don't have; only one of the requests should succeed. But if we have 1000 such items, the impact of inconsistency is small: each node will think the current count is still 1000, whereas it has actually dropped from 1000 to 990.

Thus if we know the impact or magnitude of the inconsistency, we can make a better call at runtime about whether to prefer availability or consistency. In the case above, it is OK to be inconsistent as long as each node does no more than 100 transactions without consulting any other node. This budget can be refined based on how many nodes a replica is currently consistent with, preserving availability as long as possible, assuming the other nodes use the same strategy.
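
Here is a rough Lua sketch of such a method. All the names and the budget scheme are illustrative assumptions, not a real API:

-- Hypothetical: 1000 items across 10 replicas gives each replica a
-- local budget of 100 sales it can make without talking to anyone.
function buy(item)
  if item.localBudget > 0 then
    -- Cheap, available path: our view of the global count may be
    -- stale, but the staleness is bounded by the budget.
    item.localBudget = item.localBudget - 1
    return "SOLD"
  end
  -- Budget exhausted: fall back to a consistent, coordinated path.
  -- coordinateWithReplicas is a stand-in for whatever consensus or
  -- locking mechanism the system provides.
  return coordinateWithReplicas(item)
end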

Monday, November 08, 2010

Earlier posts on FOSS

These two are on the future of selling software as a business.

Free Software Bubble



This one talks about how to make money by writing open source software.

How to make money from open source?

This one tries to provide a solution to the issues presented above by redefining the role of operating systems. It goes on to provide a business model for software in which the OS plays an important role.

Thoughts on OS

This one has some advice for Microsoft, but it seems Apple heard it first.

Why Microsoft should make Windows 7 Mobile free?

Friday, October 22, 2010

Scriptable Object Cache

UPDATE: The scriptable object cache is now available as Cacheismo

Over the last few months, I have become a fan of Lua, especially because of its coroutine support. In fact, more than Lua, I have become a great fan of Kahlua, the Java implementation of the Lua language. The first product of this fascination was a Lua HTTP proxy. This is what it looks like:

function proxy(context, httpRequest)
  -- build two copies of the incoming request
  local request  = httpRequest
  local request1 = NewLuaHttpRequest("HTTP/1.1", request:getMethod(), request:getUri())
  local request2 = NewLuaHttpRequest("HTTP/1.1", request:getMethod(), request:getUri())
  request1:setHeader("Host", "10.10.8.76")
  request2:setHeader("Host", "10.10.8.76")
  local requests = {request1, request2}
  -- fire both requests in parallel; this call does not return
  -- until both responses are available
  local responses = luaSendRequestAndGetResponseInParallel(context, requests)
  -- send back the concatenation of the two response bodies
  local finalResponse = responses[1]
  finalResponse:setContent(responses[1]:getContent() .. responses[2]:getContent())
  luaSendResponse(context, finalResponse)
end

All this code does is make two parallel requests when it gets a request, and send back the concatenated response. Fairly simple? Not quite. To the naked eye this is single-threaded blocking code, but when you combine it with the power of Lua coroutines you get a proxy which is as simple to write as single-threaded blocking code, while underneath you have the full power of Java non-blocking IO. You could probably have 100K concurrent connections running this code concurrently using maybe 4 or 8 threads. That is where simplicity meets speed and speed meets power and flexibility. It would be trivial to build upon this a node.js alternative which is much easier to code, without losing the speed, and with platform independence as a bonus.
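
Conceptually (and this is my illustration, not Kahlua's or the proxy's actual internals), the "blocking" call can be built on coroutines roughly like this:

-- The script runs inside a coroutine. A "blocking" call registers the
-- IO with the non-blocking event loop and then suspends itself; the
-- event loop resumes the coroutine once all responses have arrived.
function luaSendRequestAndGetResponseInParallel(context, requests)
  context:sendAsync(requests)   -- hypothetical non-blocking send
  return coroutine.yield()      -- suspend; resumed with the responses
end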

Now to the main topic: the scriptable object cache. As I mentioned in the last post about the Java memcached, I wanted to add some Lua to it. The result is a cache which can contain objects instead of bytes. This is where redis fits in, but what if you want to store an object specific to your needs? Here is how to implement Set functionality.


-- constructor: a set is just a Lua table used as a hash
function new()
  local hashset = {}
  return hashset
end

function put(hashset, key)
  hashset[key] = 1
  return "STORED"
end

function exists(hashset, key)
  if (hashset[key] == 1) then
    return "EXISTS"
  end
  return "NOT_FOUND"
end

function delete(hashset, key)
  hashset[key] = nil
  return "DELETED"
end

function count(hashset)
  -- #hashset only counts the array part of a table, so for a
  -- hash-keyed set we have to count the keys ourselves
  local n = 0
  for k, v in pairs(hashset) do
    n = n + 1
  end
  return n
end

function getall(hashset)
  local result = ""
  for k, v in pairs(hashset) do
    result = result .. k .. " "
  end
  return result
end


function union(hashset, hashset2)
  local newhashset = {}
  for k, v in pairs(hashset) do
    newhashset[k] = 1
  end
  for k, v in pairs(hashset2) do
    newhashset[k] = 1
  end
  return getall(newhashset)
end

function intersection(hashset, hashset2)
  local toIter
  local toLookup
  local newHashSet = {}

  -- iterate over the smaller set, look up in the bigger one
  -- (count() rather than #, which doesn't work for hash keys)
  if (count(hashset) > count(hashset2)) then
    toIter = hashset2
    toLookup = hashset
  else
    toIter = hashset
    toLookup = hashset2
  end

  for k, v in pairs(toIter) do
    if (toLookup[k] == 1) then
      newHashSet[k] = 1
    end
  end
  return getall(newHashSet)
end


function __init__(context) 
  context:registerConstructor("new")
  context:registerMethod("exists")
  context:registerMethod("put")
  context:registerMethod("count")
  context:registerMethod("delete")
  context:registerMethod("getall")
  context:registerParamsAsKeyMethod("union")
  context:registerParamsAsKeyMethod("intersection")
end

Once you code or copy this, the following interface exists:
> new set1     # creates a new set
> invoke set1 put a 
> invoke set1 put b
> new set2 
> invoke set2 put b
> invoke set2 put c
> invoke set1 union set2

I haven't written the client, so I don't know what the performance is, but I wouldn't be surprised if it runs at about 50% of the speed of memcached. But wait: what about the speed benefits of not having to do a get followed by an update? And by the way, how do you update with memcached? Using CAS in a loop?

What we have done here is invert the responsibility: instead of doing stuff in application code and putting the result in the cache, we now do the stuff atomically in the cache itself.

Benefits:
  1. Define your own objects using Lua scripts
  2. Base code is Java, so it runs on any platform and OS
  3. No CAS in a loop. CAS only tells you things have gone wrong; here you have the flexibility of doing the right thing in the first place.
  4. No need to define key patterns to store stuff using different keys. Define your object instead, along with its methods and what they return. As a bonus, all operations on a given object are atomic.
  5. Performance. Updating a 1000-entry list in memcached means getting the object, updating it, and storing it back using CAS; if the CAS fails, repeat. Here you just define a single function which does the update and you are done (see the sketch below).
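
To spell out benefit 5, here is the shape of the read-modify-write loop a plain memcached client needs. This is a sketch against a hypothetical Lua client whose gets/cas calls mirror the memcached protocol commands of the same names:

-- plain memcached: optimistic update, retried until nobody races us
repeat
  local list, casId = client:gets("mylist")  -- fetch value plus CAS token
  table.insert(list, newEntry)               -- modify locally
until client:cas("mylist", list, casId)      -- store; fails if list changed

-- scriptable object cache: the whole update runs atomically in the cache
-- > invoke mylist append newEntry
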
This is not complete yet. I plan to add persistence and simplify the interfaces a bit.  Stay tuned. 


Memcached in Java

Over the last few weeks I was playing around with some non-blocking Java code. I had read that jmemcache, which is built on Netty, runs at about 50% of the speed of memcached. So I wrote a simple cache implementation, picked up some code from the jmemcache project, and had my own memcached in Java. It works at about 80%-90% of the speed of memcached, which I believe is very good given the quadruple data copy involved: Java byte arrays to Java byte buffer to native byte buffer to the kernel.

Benefits:

  • Works on all platforms (32 bit, 64 bit)
  • Works on all OSes which have Java: Windows, Linux, Solaris, HP-UX, Mac, etc.
  • No slab reallocation issues as we don't have any slabs
  • Multi threaded
  • Pure java. No third party dependency.
Drawbacks:
  • 10%-20% performance drop
  • CPU utilization is higher than memcached's
  • Objects are not tightly packed, so it uses a bit more memory

Testing was done using the xmemcached client and their benchmark code.

PS: Java's gathering write still has a memory leak on 64-bit Linux.

Tuesday, October 19, 2010

Banks can only Collapse

Banks don't make losses.  They can only collapse and so do governments.

With a variable or adjustable-rate home loan or mortgage, banks can always keep interest rates high enough for the consumer. Hence the only way for banks to lose money is when consumers are unable to pay their loans, i.e. defaults. This is not the norm, but under extreme circumstances it is the only choice left for the consumer.

This smells like the old kingdoms, where a stupid king would keep raising taxes on the people without bothering to help them, resulting in either a takeover or a revolution. All systems are created by people, for the people, and when they start being unfair and unjust, people break the system. It is the responsibility of the government to create regulations and laws that make systems fair and just, so that people don't break them. Breaking systems is costly, but sometimes systems don't leave enough choice.

Money is a system. It is a system of trust. Economic depression happens when people don't trust money anymore. In other words, the economy works when people trust money, and it is the responsibility of the government to make sure they do. It is hard to factor how much people trust money into the equations, and hence it is left out, but it is the most important factor for the economy to work. The Godfather and its Hindi remake Sarkar show a glimpse of that other kind of economy.

Any business that cannot make losses can only collapse: governments, banks, corruption, Windows, Apple, Google Search, Facebook, Ebay, etc. When there is no choice, people invent something new.