Friday, October 22, 2010

Scriptable Object Cache

UPDATE:  Scriptable object cache is now available Cacheismo

Over the last few months, I have become a fan of Lua, especially because of the coroutine support. In fact more than Lua, I have become great fan of Kahlua, the java implementation of the Lua Language. The first product of this fascination was a Lua Http Proxy. This is what it looks like:

function proxy(context, httpRequest)
   local request = httpRequest
   local request1 = NewLuaHttpRequest("HTTP/1.1", request:getMethod(), request:getUri())
  local request2 = NewLuaHttpRequest("HTTP/1.1", request:getMethod(), request:getUri())
  local requests = {request1, request2}
  local responses = luaSendRequestAndGetResponseInParallel(context, requests)
   local finalResponse = responses[1]
          finalResponse:setContent(responses[1]:getContent() .. responses[2]:getContent()) 
         luaSendResponse(context, finalResponse)

All this code does is that when it gets a request, it makes two parallel requests and then sends back concatenated response. Fairly simple ?  No. To the naked eye this is a single threader blocking code, but when you combine this with the power of Lua coroutines you get a proxy which is as simple to write as single threaded blocking code, but underneath you have the full power of java non blocking IO. You can probably have 100K concurrent connections running this code concurrently using may be 4 or 8 threads. That is where simplicity meeds speed and speed meets power or flexibility. It will be trivial to build upon this to build a node.js alternative which is much much easier to code without loosing the speed and gaining platform independence in bonus.

Now to the main topic Scriptable Object Cache. As I mentioned in the last post about the java memcache,  I wanted to add some Lua to it. The result was a cache which could contain object instead of bytes. This is where redis fits in, but what if you wanted to store a object specific to your needs. Here is how to implement Set functionality.

function new() 
  local  hashset = {}
  return hashset

function put(hashset, key) 
  return "STORED"

function exists(hashset, key)
  if (hashset[key] == 1) then
     return "EXISTS"
  return "NOT_FOUND"

function delete(hashset, key)
     hashset[key] = nil
  return "DELETED"

function count(hashset)
  return #hashset

function getall(hashset)  
        local result = ""
for k,v in pairs(hashset) do
           result = result .. k .. " " 
   return result

function union(hashset, hashset2)  
        local newhashset = {}
for k,v in pairs(hashset) do
              newhashset[k] = 1
for k,v in pairs(hashset2) do
             newhashset[k] = 1
   return getall(newhashset)

function intersection(hashset, hashset2)  
        local toIter
        local toLookup
        local newHashSet = {}

        if (#hashset > #hashset2) then 
            toIter = hashset2
            toLookup = hashset
            toIter = hashset
            toLookup = hashset2

for k,v in pairs(toIter) do
           if (toLookup[k] == 1) then 
               newHashSet[k] = 1
   return getall(newHashSet)

function __init__(context) 

Once you code or copy this the following interface exists:
> new set1     # creates a new set
> invoke set1 put a 
> invoke set1 put b
> new set2 
> invoke set2 put b
> invoke set2 put c
> invoke set1 union set2

I haven't written the client, so don't know what is the performance, but I wouldn't be surprised if it runs at about 50% the speed of memcached.  But wait what about the speed benefits because of not having to do get and update? And by the way how do you update? Using CAS in a loop?  

What we have done here is inverted the responsibility. Instead of doing stuff in the code and putting it in the cache, what we can do now is do the stuff atomically in the cache itself. 

  1. Define you own objects using Lua scripts 
  2. Base code is java so runs on any platform and OS
  3. No CAS in loop. CAS only tells you things have gone wrong. Here you have the flexibility of doing the right thing in the first place.
  4. No need to define key patterns to store stuff using different keys. Define your object instead. Define its methods and what they return. As a bonus all stuff on a given object is atomic.
  5. Performance. Updating a 1000 entry list in memcached would need getting the object, updating it and store using CAS. If fails repeat. Now just define a single function which does the update and you are done. 
This is not complete yet. I plan to add persistence and simplify the interfaces a bit.  Stay tuned. 

No comments:

Post a Comment