Friday, October 22, 2010

Scriptable Object Cache

UPDATE: The scriptable object cache is now available as Cacheismo.

Over the last few months I have become a fan of Lua, especially because of its coroutine support. In fact, even more than Lua, I have become a great fan of Kahlua, the Java implementation of the Lua language. The first product of this fascination was a Lua HTTP proxy. This is what it looks like:

function proxy(context, httpRequest)
  local request = httpRequest
  -- Build two copies of the incoming request to send upstream
  local request1 = NewLuaHttpRequest("HTTP/1.1", request:getMethod(), request:getUri())
  local request2 = NewLuaHttpRequest("HTTP/1.1", request:getMethod(), request:getUri())
  request1:setHeader("Host", "10.10.8.76")
  request2:setHeader("Host", "10.10.8.76")
  local requests = {request1, request2}
  -- Reads like a blocking call; underneath, the coroutine is suspended
  -- until both responses have arrived
  local responses = luaSendRequestAndGetResponseInParallel(context, requests)
  -- Reuse the first response and append the body of the second one
  local finalResponse = responses[1]
  finalResponse:setContent(responses[1]:getContent() .. responses[2]:getContent())
  luaSendResponse(context, finalResponse)
end

All this code does is this: when it gets a request, it makes two parallel requests and then sends back the concatenated response. Fairly simple? No. To the naked eye this is single-threaded blocking code, but when you combine it with the power of Lua coroutines, you get a proxy that is as simple to write as single-threaded blocking code, while underneath you have the full power of Java non-blocking IO. You can probably have 100K concurrent connections running this code using maybe 4 or 8 threads. That is where simplicity meets speed, and speed meets power or flexibility. It would be trivial to build on this to create a node.js alternative that is much, much easier to code, without losing the speed and with platform independence as a bonus.
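The coroutine trick behind the proxy can be sketched in plain Lua. This is a minimal illustration under stated assumptions, not Kahlua's or the proxy's actual implementation: `sendInParallel`, `fakeFetch`, `runLoop` and `makeHandler` are hypothetical names, and the "IO" completes instantly instead of going through Java non-blocking sockets. Each handler reads like blocking code, but `sendInParallel` yields to the loop, which later resumes the handler with the responses, so several handlers share one thread.

```lua
-- Hypothetical stand-in for non-blocking IO: every request "completes" instantly.
local function fakeFetch(request)
  return "body-of-" .. request
end

-- Looks like a blocking call, but actually suspends the calling coroutine and
-- hands the pending requests to the event loop.
local function sendInParallel(requests)
  return coroutine.yield(requests)
end

-- A tiny single-threaded event loop that drives many handler coroutines.
local function runLoop(handlers)
  local tasks = {}
  for _, fn in ipairs(handlers) do
    local co = coroutine.create(fn)
    local ok, pending = coroutine.resume(co)  -- run until the first yield
    assert(ok, pending)
    tasks[#tasks + 1] = { co = co, pending = pending }
  end
  local active = true
  while active do
    active = false
    for _, task in ipairs(tasks) do
      if coroutine.status(task.co) == "suspended" then
        local responses = {}
        for i, req in ipairs(task.pending) do
          responses[i] = fakeFetch(req)
        end
        -- Resume the handler; the responses become the return value of yield.
        local ok, pending = coroutine.resume(task.co, responses)
        assert(ok, pending)
        task.pending = pending
        active = true
      end
    end
  end
end

-- Two "connections", each written as straight-line, blocking-style code.
local results = {}
local function makeHandler(name)
  return function()
    local responses = sendInParallel({ name .. "1", name .. "2" })
    results[name] = responses[1] .. "+" .. responses[2]
  end
end

runLoop({ makeHandler("x"), makeHandler("y") })
print(results.x)  --> body-of-x1+body-of-x2
```

In a real server the loop would resume a coroutine only when its sockets report that the responses are ready, which is what lets a handful of threads serve many connections.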

Now to the main topic: the scriptable object cache. As I mentioned in the last post about memcached in Java, I wanted to add some Lua to it. The result is a cache that can contain objects instead of bytes. This is where Redis fits in, but what if you want to store an object specific to your needs? Here is how to implement set functionality:


function new()
  local hashset = {}
  return hashset
end

function put(hashset, key)
  hashset[key] = 1
  return "STORED"
end

function exists(hashset, key)
  if hashset[key] == 1 then
    return "EXISTS"
  end
  return "NOT_FOUND"
end

function delete(hashset, key)
  hashset[key] = nil
  return "DELETED"
end

-- The # operator only counts the array part of a table, so it returns 0
-- for a set keyed by strings; count the entries explicitly instead.
function count(hashset)
  local n = 0
  for _ in pairs(hashset) do
    n = n + 1
  end
  return n
end

function getall(hashset)
  local result = ""
  for k, v in pairs(hashset) do
    result = result .. k .. " "
  end
  return result
end

function union(hashset, hashset2)
  local newhashset = {}
  for k, v in pairs(hashset) do
    newhashset[k] = 1
  end
  for k, v in pairs(hashset2) do
    newhashset[k] = 1
  end
  return getall(newhashset)
end

-- Iterate over the smaller set and look each key up in the larger one.
function intersection(hashset, hashset2)
  local toIter
  local toLookup
  local newHashSet = {}

  if count(hashset) > count(hashset2) then
    toIter = hashset2
    toLookup = hashset
  else
    toIter = hashset
    toLookup = hashset2
  end

  for k, v in pairs(toIter) do
    if toLookup[k] == 1 then
      newHashSet[k] = 1
    end
  end
  return getall(newHashSet)
end


function __init__(context) 
  context:registerConstructor("new")
  context:registerMethod("exists")
  context:registerMethod("put")
  context:registerMethod("count")
  context:registerMethod("delete")
  context:registerMethod("getall")
  context:registerParamsAsKeyMethod("union")
  context:registerParamsAsKeyMethod("intersection")
end

Once you write (or copy) this script, the following interface is available:
> new set1     # creates a new set
> invoke set1 put a 
> invoke set1 put b
> new set2 
> invoke set2 put b
> invoke set2 put c
> invoke set1 union set2
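To make the `new`/`invoke` interface more concrete, here is a minimal sketch of how a host might load such a script and dispatch these text commands to it. The `Host` table, its `execute` method, the replies `CREATED` and `ERROR`, and the reading of `registerParamsAsKeyMethod` (each parameter names another stored object, which is passed in place of the string) are all assumptions for illustration; the actual cache is written in Java and its protocol may differ. To stay self-contained, the block carries a trimmed copy of the set script whose `union` sorts the keys only so the output is deterministic.

```lua
-- Trimmed copy of the set script (only what this demo needs).
local setScript = {}

function setScript.new()
  return {}
end

function setScript.put(set, key)
  set[key] = 1
  return "STORED"
end

function setScript.union(a, b)
  local keys = {}
  for k in pairs(a) do keys[k] = true end
  for k in pairs(b) do keys[k] = true end
  local list = {}
  for k in pairs(keys) do list[#list + 1] = k end
  table.sort(list)  -- sorted only so the demo output is deterministic
  return table.concat(list, " ")
end

function setScript.__init__(context)
  context:registerConstructor("new")
  context:registerMethod("put")
  context:registerParamsAsKeyMethod("union")
end

-- Hypothetical host: keeps named objects and dispatches text commands.
local unpack = table.unpack or unpack  -- Lua 5.2+ / Lua 5.1
local Host = {}
Host.__index = Host

function Host.new(script)
  local self = setmetatable({ script = script, objects = {},
                              methods = {}, keyMethods = {} }, Host)
  -- Stand-in for the context object that __init__ receives (assumed shape).
  local context = {
    registerConstructor = function(_, name) self.constructor = name end,
    registerMethod = function(_, name) self.methods[name] = true end,
    registerParamsAsKeyMethod = function(_, name) self.keyMethods[name] = true end,
  }
  script.__init__(context)
  return self
end

function Host:execute(line)
  local words = {}
  for w in line:gmatch("%S+") do words[#words + 1] = w end

  if words[1] == "new" then
    self.objects[words[2]] = self.script[self.constructor]()
    return "CREATED"
  elseif words[1] == "invoke" then
    local obj, method = self.objects[words[2]], words[3]
    if obj == nil then return "NOT_FOUND" end
    local isKey = self.keyMethods[method]
    if not (isKey or self.methods[method]) then return "ERROR" end
    local args = {}
    for i = 4, #words do
      local arg = words[i]
      if isKey then
        arg = self.objects[arg]  -- the parameter names another stored object
        if arg == nil then return "NOT_FOUND" end
      end
      args[#args + 1] = arg
    end
    return self.script[method](obj, unpack(args))
  end
  return "ERROR"
end

-- The command sequence from the listing above.
local host = Host.new(setScript)
for _, cmd in ipairs({ "new set1", "invoke set1 put a", "invoke set1 put b",
                       "new set2", "invoke set2 put b", "invoke set2 put c" }) do
  host:execute(cmd)
end
print(host:execute("invoke set1 union set2"))  --> a b c
```

Because the set lives inside the cache, `union` receives both objects directly and no set data travels back to the client before the operation.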

I haven't written the client yet, so I don't know what the performance is, but I wouldn't be surprised if it runs at about 50% of the speed of memcached. But wait, what about the speed benefit of not having to do a get followed by an update? And, by the way, how do you update in memcached? Using CAS in a loop?

What we have done here is invert the responsibility. Instead of doing the work in application code and then putting the result in the cache, we now do the work atomically in the cache itself.
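The difference between the two update styles can be sketched in a few lines of Lua. `Store` is a hypothetical in-process stand-in for a memcached-like server with `gets`/`cas` semantics (memcached's text protocol does have `gets` and `cas` commands, but these functions are not a real client API), `append` and `invoke` are illustrative names, and the concurrent writer is simulated so the retry is deterministic. The client-side version has to copy, modify and retry on conflict; the scriptable-cache version sends a single operation to the object instead.

```lua
-- Hypothetical in-process stand-in for a memcached-like store with gets/cas.
local Store = { data = {}, version = {} }

function Store.gets(key)
  return Store.data[key], Store.version[key] or 0
end

function Store.cas(key, value, expected)
  if (Store.version[key] or 0) ~= expected then
    return false  -- someone else wrote in between; the caller must retry
  end
  Store.data[key] = value
  Store.version[key] = expected + 1
  return true
end

-- Client-side update: get, copy and modify, CAS, and repeat on conflict.
local function appendWithCas(key, item, interfere)
  local attempts = 0
  repeat
    attempts = attempts + 1
    local list, ver = Store.gets(key)
    local copy = {}
    for i, v in ipairs(list or {}) do copy[i] = v end
    copy[#copy + 1] = item
    if interfere then interfere(); interfere = nil end  -- simulated concurrent writer
  until Store.cas(key, copy, ver)
  return attempts
end

-- Simulated concurrent writer: rewrites the value, which bumps its version.
local function concurrentWriter()
  local list, ver = Store.gets("list")
  Store.cas("list", list, ver)
end

-- Scriptable-cache style: send the operation to the object instead of the data.
-- In the real cache this would run atomically on the server; here it is a local
-- call and, for simplicity, does not bump the CAS version.
local function invoke(key, method, ...)
  return method(Store.data[key], ...)
end

local function append(list, item)
  list[#list + 1] = item
  return "STORED"
end

Store.cas("list", {}, 0)                                  -- create an empty list
local attempts = appendWithCas("list", "x", concurrentWriter)
print(attempts)                                           --> 2
print(invoke("list", append, "y"))                        --> STORED
```

With a 1000-entry list, every CAS retry copies and resends the whole list, while the invoked operation only ships the new item.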

Benefits:
  1. Define your own objects using Lua scripts.
  2. The base code is Java, so it runs on any platform and OS.
  3. No CAS in a loop. CAS only tells you that something has gone wrong; here you have the flexibility to do the right thing in the first place.
  4. No need to define key patterns to store stuff under different keys. Define your object instead, along with its methods and what they return. As a bonus, every operation on a given object is atomic.
  5. Performance. Updating a 1000-entry list in memcached requires getting the object, updating it and storing it back using CAS; if that fails, you repeat. Now you just define a single function that does the update and you are done.

This is not complete yet. I plan to add persistence and simplify the interfaces a bit. Stay tuned.


Memcached in Java

Over the last few weeks I have been playing around with some non-blocking Java code. I had read that jmemcache, which is built on Netty, runs at about 50% of the speed of memcached. So I wrote a simple cache implementation, picked up some code from the jmemcache project, and had my own memcached in Java. It works at about 80%-90% of the speed of memcached, which I believe is very good given the quadruple data copy involved: Java byte arrays to a Java byte buffer to a native byte buffer to the kernel.

Benefits:

  • Works on all platforms (32 bit, 64 bit)
  • Works on every OS that has Java: Windows, Linux, Solaris, HP-UX, Mac, etc.
  • No slab reallocation issues, as there are no slabs
  • Multi-threaded
  • Pure Java. No third-party dependencies.

Drawbacks:

  • 10%-20% performance drop
  • CPU utilization is higher than memcached's
  • Objects are not tightly packed, so it uses a bit more memory

Testing was done using the xmemcached client and their benchmark code.

PS: Java's gathering write still has a memory leak on 64-bit Linux.

Tuesday, October 19, 2010

Banks can only Collapse

Banks don't make losses. They can only collapse, and the same goes for governments.

With variable or adjustable-rate home loans and mortgages, banks can always keep the interest rates consumers pay high enough to stay profitable. Hence the only way for banks to lose money is when consumers are unable to pay their loans, i.e. defaults. This is not the norm, but under extreme circumstances it is the only choice left for the consumer.

This smells like the old kingdoms, where a foolish king keeps raising taxes on the people without bothering to help them, resulting in either a takeover or a revolution. All systems are created by people, for the people, and when they start being unfair and unjust, people break them. It is the responsibility of the government to create regulations and laws that make systems fair and just, so that people don't break them. Breaking systems is costly, but sometimes systems don't leave enough choice.

Money is a system. It is a system of trust. Economic depression happens when people don't trust money anymore. In other words, the economy works when people trust money, and it is the responsibility of the government to make sure they do. It is hard to factor how much people trust money into the equations, and hence it is left out, but it is the most important factor for the economy to work. The Godfather and its Hindi remake Sarkar show a glimpse of that other kind of economy.

Any business that cannot make losses can only collapse: governments, banks, corruption, Windows, Apple, Google Search, Facebook, eBay, etc. When there is no choice, people invent something new.

Thursday, October 14, 2010

The not so virtue of selfishness

Ayn Rand talked a lot about selfishness and how that is good for everybody. Corruption is a direct consequence of selfishness. See Corruption and Capitalism.

How do you define selfishness? The more I think about it, the more it boils down to being logical or making the best possible choice for yourself, both of which are incomplete definitions.

Example: say I am a doctor. I need to catch a flight. The only taxi ready to go to the airport is charging me 10X the regular price.

By conventional thinking, the taxi driver is selfish. But so am I, because paying is the logical and best possible choice I have. Once I pay 10X the price and understand the virtue of selfishness, I will also start charging my patients 10X the price when I know they don't have a choice. Yet again, the patients are selfish, as they are being logical and making the best possible choice, and so am I.

What happens here is that, by being selfish, people start screwing each other when they know the other person doesn't have a choice. Banks do it, oil companies do it; Facebook, Apple, Google, Microsoft, IBM, Cisco and Oracle do it. Someday the pharmacy will do it, airlines will do it, your maid will do it, your wife will do it, your children will do it... everyone will do it.

So by being selfish, what we get is a screwed-up society to live in. But if I thought this through and realized that if I start screwing people, I will eventually end up in a screwed-up society that I don't want, I might stop screwing people when they don't have a choice. Now this, again, is a selfish choice to make.

This is my problem with selfishness. What you do by being selfish could be anything; it is limited only by your own ability to think and decide for yourself. Being selfish is good if and only if everyone has the same IQ.

Tuesday, October 05, 2010

On Reality, Truth & levels of Abstractions

Much of the philosophical literature is full of notions of identity, consciousness and perception. "A table is a table and not a chair" kind of stuff: what actually exists and what is just our perception.

A chair is a chair. It is made of wood. But no, a chair can also be made of plastic and metal. Even a wooden chair has metal nails and maybe some glue and polish. A sofa is also a chair; it could have leather or cloth. The wood itself is made of molecules, molecules are made of atoms, atoms are made of electrons, protons and neutrons, and protons and neutrons are in turn made of even smaller sub-atomic particles.

Technically, reality is based on sub-atomic particles, which we can't see.

This brings us to the core conjecture of this post: Reality & Truth is just a comfortable level of abstraction.

It would be useless to talk about a fried chicken recipe at the level of sub-atomic particles or molecules, or to discuss the design of a building in terms of atoms. If you are building an atomic reactor, yes, that is the level at which you need to think. Maybe sub-atomic particles are made of even smaller units, and if someone building an atomic reactor cannot explain his truths with the current theory, he will invent a better level of abstraction to deal with them.

Things are grey, but it is efficient and simple to talk in black and white. A company distinguishes between people based on their roles: manager, developer, QA, architect, etc. Political parties distinguish between people based on voting units, religion, caste, gender, education, etc. Banks distinguish the same set of people based on their net worth or ability to repay a loan. The police look at them as criminals or non-criminals. Doctors classify people in yet another way. If we didn't do this kind of classification to abstract out what is important and what is not, it would be impossible to get anything done in the world. For everyone, this is the reality, and at the same time it is a comfortable level of abstraction.

Every calculation in the world that uses pi is incomplete, because it can only use a finite number of its digits, and yet we use it because it simplifies life. You can choose as many digits as you like, whatever you are comfortable with. Reality & Truth is just a comfortable level of abstraction: as long as they work, all abstractions are a good enough substitute for reality. The only problem is that we get so comfortable with these abstractions that we stop thinking beyond them, which, by the way, was the very reason for creating them in the first place. All we need to do is be aware of these abstractions, so that instead of beating our heads over why things are not making sense, we can think beyond the abstractions and invent new ones that make sense, until they also break. The bigger problem is that we share these abstractions with other people, and unless they too feel the need, it is very hard to make them adjust to or agree on a new, "uncomfortable" abstraction (they are comfortable with the current setup).
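The point about digits of pi can be made concrete with a little arithmetic. This is a minimal Lua sketch, assuming a circle the size of the Earth (mean radius of about 6371 km) purely as an example: it truncates pi to a few digits and compares the resulting circumference with the one computed from `math.pi`, which is itself only a double-precision approximation. `truncatePi` and `circumferenceError` are illustrative names.

```lua
-- Truncate pi to a given number of decimal digits (e.g. 2 -> 3.14).
local function truncatePi(digits)
  local scale = 10 ^ digits
  return math.floor(math.pi * scale) / scale
end

-- Assumed example: a circle with the Earth's mean radius, in kilometres.
local radiusKm = 6371

-- How far off (in km) the circumference is when pi is truncated.
local function circumferenceError(digits)
  local reference = 2 * math.pi * radiusKm
  return reference - 2 * truncatePi(digits) * radiusKm
end

for _, digits in ipairs({ 2, 4, 8 }) do
  print(string.format("%d digits: off by %.6f km", digits, circumferenceError(digits)))
end
```

With 2 digits the Earth's circumference is off by roughly 20 km, with 4 digits by about 1.2 km, and with 8 digits by less than 5 cm: each level is "true enough" for some purposes and not for others.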

Newton did a great job of describing motion, and then Einstein questioned how you measure it. Both theories are still in use, depending on which suits the given problem. Before Google, search engines looked at each page as a collection of words; Google instead characterized each page by the link structure around it. When Zynga copied other games on Facebook, they had figured out that it was not about the quality of the game, but about how you use the Facebook platform to grow the user base. While IBM was thinking the PC was the business, Microsoft figured out that the OS is what makes the difference and is the strategic point of control.

Well, the point is that many innovations in business or science are simply the result of looking at the problem differently, using a different level of abstraction. The same is true for any form of knowledge we have ever encountered. Yet the world continues to struggle over what reality is ("this is reality and this is not"), fighting wars, writing blogs, doing marketing and propaganda. I guess if we could knock off the words reality and truth and simply use abstraction, the world would be much, much more peaceful.