Thursday, October 13, 2011

Using Cacheismo Cluster for Real Time Analytics


Cacheismo now supports invoking requests on other server.
  - getFromServer
    This function takes the server name and a key as an argument and returns the result to the script.
  - getInParallel
    This function takes map of server names to list of keys and gets values for all the keys from all the servers in parallel. Once all the results are received, the script gets a map of server names to map of keys and received values.

Here is a simple example code which gets to top accessed keys from the cacheismo cluster using a simplified, framework-less map-reduce.  The contents below should belong to file names mapreduce.lua

-- function called on every cacheismo node to get 
-- the top keys by access count
-- virtual key : mapreduce:topKeys:


function topKeys(count) 
    local keys  = accessCounter:getKeyAccessCount()
    -- sort keys by access count
    table.sort (keys)
    local result = {}
    local i      = 1
    for (i in 1..count) do 
       result[i] = keys[i]
    end
    return result
end


-- function called on one of the cacheismo nodes 
-- which distributes work to all other nodes using getInParallel
-- virtual key : mapreduce:manageGetTop:10 


function manageGetTop(count) 
     local query = {}
     for k,v in pairs(servers) do 
        query[k] = "mapreduce:topKeys:"..count
     end
     local results = getInParallel(query)
     local total = {}
     for k,v in pairs(results) do 
           for k1,v1 in pairs(v) do 
              total[k1] = v1
           end
     end
     table.sort (total)
     local topkeys = {}
     for (i in 1..count) do 
       topkeys[i] = total[i]
     end
end


I have taken some liberties with the syntax and their is no accessCounter object by default in cacheismo, but is fairly easy to create in the script.  Note that the above implementation doesn't have ant intermediate nodes, but is trivial to create by calling get for mapreduce:manageGetTop instead of calling mapreduce:topKeys.

Single cacheimso server might be serving thousands of such queries because everything is happening in non-blocking manner but we give user the illusion of synchronous stack using lua coroutines.

Links:
Cacheismo Introduction
Cacheismo Lua API
Cacheimso Sliding Window Counter Implementation in Lua
Cacheismo Quota Implementation in Lua
Cacheismo Memory Allocation
Cacheismo Source Code
Cacheismo Support

No comments:

Post a Comment