Thursday, February 24, 2011

Percentage Vs Absolute

Some things have absolute costs and some have percentage costs.  Tax, Interest, Bribes, Bonus, Affiliate Programs, Retailer Margin, etc are some examples where we pay in percentages. Most other stuff works with absolutes.

Absolute seems to work best in the producer consumer scenario.

Percentages do best when the relationship is more of a partnership.

Sometimes I wonder if instead of paying money to each other, we could invest in each other, we would have had better shot at world peace, mutual respect and being human.

Friends,  is what we call the kind of people we invest in.

Wednesday, December 22, 2010

The 2G Debate

Government sold 2G licenses at very low prices causing huge losses. These licenses are valid for 20 years. Why 20 years? Why not give licenses per year? Why not per month? And why license?

How about government implementing a gateway which could allot spectrum dynamically with say few seconds of granularity, with rates that are based on congestion rates at any time of the day. Basically any telecom provider can get as much spectrum as he needs, programmatically, at granularity of few seconds within few milliseconds. It is not impossible to build such a system and with fraction of the cost at which the spectrum was sold and would give maximum flexibility to all operators so that they only pay for what they use and gives government maximum money because rates can be changed by government as and when appropriate.

This would be a great system to maximize government revenue and offers maximum flexibility to every operator. Chances are this will never be built. How will operators make money and how will politicians make money in that case.  See Non-Markets.

UPDATE: http://arstechnica.com/business/2012/09/fcc-to-make-spectrum-sharing-reality-whether-carriers-want-it-or-not/

Friday, November 12, 2010

Partition Tolerance & Calculated Consistency

This is the part about CAP theorem that I don't understand. A system which is not tolerant to partitions, can be made to be consistent and available as per CAP.  I don't understand the tolerant part...and its implications. Basically when I give up tolerance what have I exactly given up because the only two things system really needs is consistency and availability. Partitions cause lost of availability or consistency because of shared state.

Is it possible to be not tolerant and still be consistent and available? Or does being not tolerant inherently assume either not consistent or not available or both? A single node system doesn't have any problems with partitions, but when the single node goes down, it is not available.  Thus I believe it is not possible to have a available system without replication and under network partitions replicated systems will become inconsistent if they have to be available. An alternate way of defining CAP theorem would be to use replication, consistency and latency as the base constructs. A node never coming up has infinite latency, whereas partitions would have some upper bound on latency (by including human intervention).

Much of the discussion around CAP theorem has been based around the NoSQL dbs. These systems use the key-value data model. The main problem is that we always think of interactions as read or write operations. Whenever a write occurs it modifies the original value in arbitrary possible ways, making it hard to measure the cost of being inconsistent. Consider a system where we think of writes as changes. Further if these changes have the property that irrespective of in which order they are applied they will yield the same result. In such a case all we need is a reliable way of sending messages to nodes. The degree of inconsistency is directly related to the latency in delivering messages.

The second area where I believe we can reduce the impact of inconsistency is by moving the decision of choosing consistency or availability in the db nodes itself. Consider a system which stores objects with code instead of just data. A change is just an invocation of a method on the object that changes the object state. Now if this method is aware of the currently available nodes, it can choose to be inconsistent or available depending upon the impact of the change.  Example: Consider a shopping cart application that needs to decrease its inventory count by 1. Also assume that we have 10 replicas. Now when 10 concurrent clients want to buy the same item and somehow these operations end up happening concurrently on all the replicas, we have two extreme options. One to serialize them and second to let them happen in parallel and be inconsistent. Now consider that we have only 1 item, in that case running this in parallel will cause problems as we are selling stuff we don't have. Only one of the requests should succeed.  But if we had 1000 such items, the impact of inconsistency is not much. Each node will think the current item count is 1000, whereas it would have changed from 1000 to 990. Thus if we know the impact or magnitude of inconsistency, we can take a better call at runtime to decide if we should prefer availability or consistency.  For example in the case above, it is OK to be inconsistent if each node does 100 transactions without consulting with any other node. Depending upon number of nodes it is consistent with, this number can be refined as long as possible to give availability assuming other nodes will use the same strategy.

Monday, November 08, 2010

Earlier posts on FOSS

These two are on the future of selling software as a business.

Free Software Bubble



This one talks about how to make money by writing open source software.

How to make money from open source?

This one tries to provide a solution to the issues presented above by redefining the role of operating systems. It goes on to provide a business model for software in which OS plays an important role.

Thoughts on OS

This one has some advise for Microsoft but seems like Apple heard it first.

Why Microsft should make windows 7 mobile free?

Friday, October 22, 2010

Scriptable Object Cache

UPDATE:  Scriptable object cache is now available Cacheismo

Over the last few months, I have become a fan of Lua, especially because of the coroutine support. In fact more than Lua, I have become great fan of Kahlua, the java implementation of the Lua Language. The first product of this fascination was a Lua Http Proxy. This is what it looks like:

function proxy(context, httpRequest)
   local request = httpRequest
   local request1 = NewLuaHttpRequest("HTTP/1.1", request:getMethod(), request:getUri())
  local request2 = NewLuaHttpRequest("HTTP/1.1", request:getMethod(), request:getUri())
  local requests = {request1, request2}
         request1:setHeader("Host","10.10.8.76")
          request2:setHeader("Host","10.10.8.76") 
  local responses = luaSendRequestAndGetResponseInParallel(context, requests)
   local finalResponse = responses[1]
          finalResponse:setContent(responses[1]:getContent() .. responses[2]:getContent()) 
         luaSendResponse(context, finalResponse)
   end
end

All this code does is that when it gets a request, it makes two parallel requests and then sends back concatenated response. Fairly simple ?  No. To the naked eye this is a single threader blocking code, but when you combine this with the power of Lua coroutines you get a proxy which is as simple to write as single threaded blocking code, but underneath you have the full power of java non blocking IO. You can probably have 100K concurrent connections running this code concurrently using may be 4 or 8 threads. That is where simplicity meeds speed and speed meets power or flexibility. It will be trivial to build upon this to build a node.js alternative which is much much easier to code without loosing the speed and gaining platform independence in bonus.

Now to the main topic Scriptable Object Cache. As I mentioned in the last post about the java memcache,  I wanted to add some Lua to it. The result was a cache which could contain object instead of bytes. This is where redis fits in, but what if you wanted to store a object specific to your needs. Here is how to implement Set functionality.


function new() 
  local  hashset = {}
  return hashset
end

function put(hashset, key) 
  hashset[key]=1   
  return "STORED"
end

function exists(hashset, key)
  if (hashset[key] == 1) then
     return "EXISTS"
  end
  return "NOT_FOUND"
end

function delete(hashset, key)
     hashset[key] = nil
  return "DELETED"
end 

function count(hashset)
  return #hashset
end

function getall(hashset)  
        local result = ""
for k,v in pairs(hashset) do
           result = result .. k .. " " 
        end
   return result
end


function union(hashset, hashset2)  
        local newhashset = {}
for k,v in pairs(hashset) do
              newhashset[k] = 1
        end
for k,v in pairs(hashset2) do
             newhashset[k] = 1
        end
   return getall(newhashset)
end

function intersection(hashset, hashset2)  
        local toIter
        local toLookup
        local newHashSet = {}

        if (#hashset > #hashset2) then 
            toIter = hashset2
            toLookup = hashset
        else 
            toIter = hashset
            toLookup = hashset2
        end

for k,v in pairs(toIter) do
           if (toLookup[k] == 1) then 
               newHashSet[k] = 1
           end
        end
   return getall(newHashSet)
end


function __init__(context) 
  context:registerConstructor("new")
  context:registerMethod("exists")
  context:registerMethod("put")
  context:registerMethod("count")
  context:registerMethod("delete")
  context:registerMethod("getall")
  context:registerParamsAsKeyMethod("union")
  context:registerParamsAsKeyMethod("intersection")
end

Once you code or copy this the following interface exists:
> new set1     # creates a new set
> invoke set1 put a 
> invoke set1 put b
> new set2 
> invoke set2 put b
> invoke set2 put c
> invoke set1 union set2

I haven't written the client, so don't know what is the performance, but I wouldn't be surprised if it runs at about 50% the speed of memcached.  But wait what about the speed benefits because of not having to do get and update? And by the way how do you update? Using CAS in a loop?  

What we have done here is inverted the responsibility. Instead of doing stuff in the code and putting it in the cache, what we can do now is do the stuff atomically in the cache itself. 

Benefits:
  1. Define you own objects using Lua scripts 
  2. Base code is java so runs on any platform and OS
  3. No CAS in loop. CAS only tells you things have gone wrong. Here you have the flexibility of doing the right thing in the first place.
  4. No need to define key patterns to store stuff using different keys. Define your object instead. Define its methods and what they return. As a bonus all stuff on a given object is atomic.
  5. Performance. Updating a 1000 entry list in memcached would need getting the object, updating it and store using CAS. If fails repeat. Now just define a single function which does the update and you are done. 
This is not complete yet. I plan to add persistence and simplify the interfaces a bit.  Stay tuned.