Friday, September 30, 2011

Rate Limiting/Quota with Cacheismo

Cacheismo is a non-blocking, single-threaded, high-performance scriptable memcache. Instead of storing bytes or blobs in the cache, it can store objects defined in Lua. There is no need to get an object, modify it, and then save it back (with retries): you modify object state directly in the cache itself by invoking methods on your objects. Because it is single threaded, modifications are automatically atomic. It is best suited for managing volatile application state at high speed (70K requests per second per core). Memcached is a better fit if your objects don't change much, but if they change frequently, memcached's retry-based update mechanism gives significantly lower performance.

It comes with a default implementation of a quota object, which you can change to suit your needs without touching the C code. Here is how to use it.

Note that these are memcached get requests with special keys encoding our intent. The keys follow this syntax:
<ObjectType>:<Method>:<ObjectKey>:<Arg1>:<Arg2>

The object type is "quota". Cacheismo has other object types as well, and more can be added by writing new Lua scripts. The methods of the quota object are "new", "addandcheck", and "reset". The object key we use here is "userKey"; you would need one key per user. The number of arguments varies by method: "new" takes two, "addandcheck" takes one, and "reset" takes none.
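Since virtual keys are plain strings, building them is simple concatenation. A minimal sketch (the helper name make_virtual_key is mine, not part of cacheismo):

```python
def make_virtual_key(obj_type, method, object_key, *args):
    """Build a cacheismo virtual key: <ObjectType>:<Method>:<ObjectKey>:<Arg1>:<Arg2>..."""
    parts = [obj_type, method, object_key] + [str(a) for a in args]
    return ":".join(parts)

# The three quota calls used in the transcript below:
print(make_virtual_key("quota", "new", "userKey", 10, "hour"))  # quota:new:userKey:10:hour
print(make_virtual_key("quota", "addandcheck", "userKey", 10))  # quota:addandcheck:userKey:10
print(make_virtual_key("quota", "reset", "userKey"))            # quota:reset:userKey
```

The resulting string is then issued as an ordinary memcached get, as in the transcript below.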

get quota:new:userKey:10:hour 
VALUE quota:new:userKey:10:hour 0 7 
CREATED
-- Creates a new quota object in cacheismo that enforces 10 requests per hour.
get quota:addandcheck:userKey:10
VALUE quota:addandcheck:userKey:10 0 8
QUOTA_OK
-- Checks whether the user can use 10 units of quota. This succeeds. It would also be easy to create a default object in this call when one doesn't exist, instead of forcing users to always call new first.
get quota:addandcheck:userKey:1
VALUE quota:addandcheck:userKey:1 0 14
QUOTA_EXCEEDED
-- Checks whether the user can use 1 more unit of quota. This fails because the user exhausted the quota in the previous call.
get quota:reset:userKey
VALUE quota:reset:userKey 0 8
QUOTA_OK
-- Reset the quota back to 0.


Cacheismo can do around 70K such operations per second on a single core. As with memcached, if a single instance is not enough, multiple instances can be used. Cacheismo speaks the memcached ASCII protocol, so any memcached client can be used to access it.

Introduction to cacheismo

=========================================================
-- simple quota implementation 
local quotaTypesMap = { month = 1, day = 1, hour = 1 }


local function findExpiresAt(quotaType)
    local current = os.date("*t")
    if (quotaType == "month") then
        current.month = current.month + 1
        current.day   = 1
        current.hour  = 0
        current.min   = 0
        current.sec   = 1
    elseif (quotaType == "day") then
        current.day   = current.day + 1
        current.hour  = 0
        current.min   = 0
        current.sec   = 1
    else
        current.hour  = current.hour + 1
        current.min   = 0
        current.sec   = 1
    end
    return os.time(current)
end


local function new(limit, quotaType) 
  if (quotaTypesMap[quotaType] == nil) then
      return nil
  end
  local  object    = {}
  object.limit     = tonumber(limit)
  object.type      = quotaType
  object.count     = 0
  object.expiresAt = findExpiresAt(quotaType)
  return object
end


local function addandcheck(object, value)
   local toUseValue = 1 
   if (value ~= nil) then
      toUseValue = tonumber(value)
   end
   if (os.time() > object.expiresAt) then
       object.count = 0;
       object.expiresAt = findExpiresAt(object.type) 
   end
   if ((object.count + toUseValue) > object.limit) then
       return -1
   else 
       object.count = object.count + toUseValue
       return 0
   end
end


local function reset(object) 
   if (os.time() > object.expiresAt) then
       object.count = 0
       object.expiresAt = findExpiresAt(object.type) 
   end
   object.count = 0
   return 0
end
===================================================================

Cacheismo Memory Allocation

Cacheismo uses two different kinds of memory allocators. One, called chunkpool, is used to store the items; the other, called fallocator, is used for request processing. The two have different design goals: chunkpool is designed to be reasonably fast but very compact, minimizing memory wastage, while fallocator doesn't care about memory wastage and tries to be very fast. Here are the details.

fallocator 
fallocator is used when we are allocating memory temporarily and it is certain that it will be freed very soon. For example, when reading data from a socket and parsing the request, many small strings are generated, but these are released as soon as the request is over, which can be a matter of microseconds. Cacheismo creates one fallocator per connection. The fallocator not only allocates memory but also keeps track of it; if some bad code misses a free, the memory is reclaimed when the fallocator is destroyed, essentially when the connection is closed.

fallocator internally allocates a big buffer (4KB) and keeps track of how much of the buffer is used. Malloc simply returns buffer+used as the pointer and increments used by the size of the allocation. If the current buffer cannot satisfy the request, a new buffer is allocated and memory is given out of it. Used buffers are kept in a linked list. On free, the fallocator decreases the refcount on the buffer from which that memory was allocated; when the refcount reaches zero, the buffer is removed from the used-buffers list and freed. The benefit of this refcount-based scheme is that it keeps memory usage to a minimum: mallocs and frees in a loop will not exhaust memory, because the refcount mechanism ensures the buffers are released. If an allocation is bigger than 4KB, the fallocator falls back to malloc but still keeps track of the buffer.

Moreover, the 4KB buffers are cached to avoid going to malloc every time. This scheme gives us O(1) malloc and O(1) free in the normal case (when not falling back to system malloc). It avoids fragmentation of memory, though it can use more memory than strictly required at any point in time. It also gives some protection against memory leaks.
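The bookkeeping can be sketched as a toy model in Python (not the actual C code; buffer contents are omitted, and a (buffer_index, offset) pair stands in for the pointer the real allocator returns):

```python
class Fallocator:
    """Toy model of cacheismo's fallocator: bump allocation out of 4KB
    buffers, freed via per-buffer refcounts instead of per-allocation
    free lists."""
    BUF_SIZE = 4096

    def __init__(self):
        self.buffers = []  # one [used, refcount] pair per buffer

    def malloc(self, size):
        if size > self.BUF_SIZE:
            # the real fallocator falls back to system malloc but still tracks it
            self.buffers.append([size, 1])
            return (len(self.buffers) - 1, 0)
        if not self.buffers or self.buffers[-1][0] + size > self.BUF_SIZE:
            self.buffers.append([0, 0])        # start a fresh 4KB buffer
        buf = self.buffers[-1]
        offset = buf[0]
        buf[0] += size                         # bump the used pointer: O(1)
        buf[1] += 1                            # one more live allocation
        return (len(self.buffers) - 1, offset)

    def free(self, handle):
        buf = self.buffers[handle[0]]
        buf[1] -= 1                            # drop the refcount: O(1)
        if buf[1] == 0:
            buf[0] = 0                         # whole buffer is reusable again
```

Mallocs and frees in a loop keep reusing the same buffer, because the refcount keeps returning to zero.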

chunkpool
chunkpool is used for storing the cache items, and the goal here is to maximize memory utilization. chunkpool imposes two constraints on the application. First, it never allocates more than 4KB of contiguous memory: if the application needs 8KB, it can at best get two 4KB blocks, not a single 8KB block. Applications need to be designed around this limitation. The second is not so much a constraint as advice. Apart from malloc, chunkpool exposes a function called relaxedMalloc which can return less memory than the application asked for: if the application asks for 4KB, it might get 256 bytes or 1KB when no 4KB block is available. If the returned memory is too small, the application can free it and deal with the out-of-memory condition, or, where possible, use a large number of small buffers instead of a small number of large ones.

Because of these restrictions, fragmentation doesn't make the system unusable; it only adds memory-management overhead. Internally there are slabs of size 16 + n*16, up to 4096 bytes. Initially all memory is allocated to the 4KB slab. If the application asks for, say, 1KB of memory, we split a 4KB buffer into two buffers, one of 3KB and one of 1KB. The 1KB buffer goes to the application and the 3KB buffer is stored in the 3KB slab. chunkpool uses a skiplist to keep track of non-empty slabs: if at runtime the slab for the requested size is empty, it quickly finds the next non-empty slab, splits a buffer, and returns memory to the application. On free, chunkpool tries to merge free buffers into bigger ones.

One limitation that I will fix later is that chunkpool uses the first four bytes of each buffer for memory management. The implication is that a free buffer can only merge with its logical next buffer, not its logical previous buffer; fixing this would require storing the buffer size at the end of the buffer as well, not just at the start. To work around the problem, chunkpool runs an extra GC pass that tries to merge 8MB of pages at a time. It is called once every second, and is triggered only if at least 12% of memory is free and the average chunk size is less than 256 bytes; otherwise it is a no-op. Even though chunkpool is capable of addressing up to 64GB of memory, it is advisable not to go beyond about 8GB.
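The split-on-demand behaviour can be sketched like this (a toy model: a sorted list of non-empty sizes stands in for the skiplist, and merging on free is omitted):

```python
import bisect

class Chunkpool:
    """Toy model of cacheismo's chunkpool: free chunks are kept per size
    (16-byte granularity, 4096-byte maximum), and a larger chunk is split
    when the slab for the exact size is empty."""
    MAX_CHUNK = 4096
    GRAIN = 16

    def __init__(self, total_4k_chunks):
        self.free = {self.MAX_CHUNK: total_4k_chunks}  # size -> chunk count
        self.sizes = [self.MAX_CHUNK]                  # sorted non-empty sizes

    def _take(self, size):
        self.free[size] -= 1
        if self.free[size] == 0:
            del self.free[size]
            self.sizes.remove(size)

    def _put(self, size):
        if size not in self.free:
            bisect.insort(self.sizes, size)
            self.free[size] = 0
        self.free[size] += 1

    def malloc(self, size):
        size = -(-size // self.GRAIN) * self.GRAIN     # round up to 16 bytes
        i = bisect.bisect_left(self.sizes, size)
        if i == len(self.sizes):
            return None                                # out of memory
        chunk = self.sizes[i]                          # next non-empty slab
        self._take(chunk)
        if chunk > size:
            self._put(chunk - size)                    # remainder back to pool
        return size
```

Allocating 1KB from a single 4KB chunk leaves a 3KB chunk in the pool, mirroring the 1KB/3KB split described above.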

chunkpool is a very efficient allocation manager for cache data when compared with a slab allocator. It saves the user the trouble of thinking about slab sizes up front, and it maintains a high hit rate when object sizes change because of changes in application code. Cacheismo is not faster than memcached in raw TPS, but once you take hit rate into account, it saves you lots of DB accesses and increases overall application performance. You can download it from here.

Wednesday, September 28, 2011

Cacheismo Cluster Update

I have started work on the cacheismo cluster and should have something working in the next couple of days. The interface for Lua scripts to get values from the cluster needs a small change: instead of specifying only the key, the caller will need to specify both the dataKey and the virtualKey when looking up a value. Cacheismo will use the dataKey to decide which server the virtualKey should be sent to, so that we don't end up in an endless loop. Lookup for plain dataKeys is simpler anyway.

The cacheismo cluster specification (the set of servers in the cluster) will be exposed via special virtual keys, so that it can be updated just like any other object. Each server will monitor all the other servers in the cluster and update its consistent hashing table so that requests are routed appropriately. This places a restriction on the number of servers we can add to a cluster. Maybe at some point we will need nested or multi-level consistent hashing, where a special node stands for multiple other nodes, something like a router connecting one network with another.
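The consistent hashing table each node keeps can be sketched as follows (node names and the replica count are illustrative, not cacheismo's actual values):

```python
import hashlib
import bisect

class ConsistentHash:
    """Sketch of a consistent-hashing ring: each server appears at many
    points on the ring, and a key is routed to the first server point
    clockwise of the key's hash."""
    def __init__(self, nodes, replicas=100):
        self.ring = []                        # sorted (hash, node) points
        for node in nodes:
            for r in range(replicas):
                h = self._hash("%s#%d" % (node, r))
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, data_key):
        """Route by dataKey; the whole virtualKey is then sent to this node."""
        h = self._hash(data_key)
        i = bisect.bisect(self.ring, (h,))    # first point clockwise of h
        return self.ring[i % len(self.ring)][1]
```

Adding or removing a server only remaps the keys adjacent to its points on the ring, which is why each node can rebuild the table incrementally as cluster membership changes.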

UPDATE: Cacheismo Cluster Update 2

Monday, September 26, 2011

Cacheismo Next Steps

The most important limitation of the current approach is that cacheismo cannot be used in clustered mode, because virtual-key hashing might choose a server different from the one where the actual key is stored.
hash("set:new:myKey") is not equal to hash("set$myKey") is not equal to hash("set:count:myKey")

I did some experiments with Lua threads and they look pretty cheap to create. The current plan is to make cacheismo nodes aware of each other and add client-side support to the cacheismo server. Instead of just looking up the key in local server memory, cacheismo will also hash the actual key on the server side and fetch the result from the responsible cacheismo server. This doubles the latency but ensures that any virtual key can be thrown at any server and it will fetch the result.

For example, consider something like "set:intersection:mySet1:mySet2:mySet3", and assume mySet1, mySet2 and mySet3 are stored on different servers. A trivial implementation would fetch mySet1, mySet2 and mySet3 to the server executing the request and do the processing there. Another possibility is to fetch only the sizes of mySet1, mySet2 and mySet3, execute two parallel requests, "set:intersection:smallest:other1" and "set:intersection:smallest:other2", and then do a local intersection over the results.
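The size-first strategy can be sketched with plain Python sets (run locally here, ignoring the network hops that the cluster version would make in parallel):

```python
def distributed_intersection(sets):
    """Sketch of the size-first strategy: fetch only the sizes, start from
    the smallest set, and intersect it against the others.  In the cluster
    each '&' step would be a virtual-key request to the node owning that set."""
    by_size = sorted(sets, key=len)           # sizes are cheap to fetch
    result = set(by_size[0])                  # only the smallest set travels
    for other in by_size[1:]:
        result &= other                       # "set:intersection:smallest:otherN"
        if not result:
            break                             # early out: nothing left to match
    return result
```

Starting from the smallest set bounds the amount of data shipped between servers by the smallest operand rather than the largest.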

Since cacheismo is scriptable, it is always possible to change this logic. I will hide the non-blocking details behind Lua coroutines, so writing such objects will not require "knowing" the asynchronous I/O aspects, just a friendly function call like getValueForKey.

It might make it possible to do some in memory map-reduce with recursive calls for virtual keys running parallel stacks spread over the network, executing at the same time.

UPDATE: Cacheismo Cluster Update

Saturday, September 24, 2011

Introducing Cacheismo

Cacheismo is a scriptable object cache which can be used as a replacement for memcached. It supports the memcached protocol (TCP and ASCII only). It automatically manages slab sizes; no user configuration is required. Slabs are intelligently managed and reconfigure themselves as usage changes. PUT never ever fails. Details follow:
- Efficient use of memory 
The slab allocator used in memcached wastes memory when slab sizes and object sizes are not in sync. For example, with slab sizes of 64/128/256/512/1024 bytes, an object of size 513 is stored in a 1024-byte slab, wasting almost half the memory. Cacheismo tries to keep the memory used by an object as close as possible to the object's size.
- Reclaimable Memory 
Every byte is reclaimable in cacheismo. PUT never fails. You will never have slab full issues. More on memory management.
- True LRU support. Dependable cache
Memcached evicts items from the slab in which the object will be placed. Cacheismo always evicts the true LRU item, irrespective of size. This ensures that the most important data is always in the cache. No surprises: if something is not in the cache, it either expired or was least used.
- Server Side Scriptable objects
The best part of cacheismo is that it is fully scriptable in Lua. Instead of cluttering your code with workarounds for memcached's read/write semantics, you can move that logic inside cacheismo. See use cases for cacheismo: Rate Limiting Quota With Cacheismo, Using Cacheismo Cluster For Real-time Analytics.
Sample objects (map, set, quota and sliding-window counters) written in Lua can be found in the scripts directory. You can create your own objects and place them in the scripts directory. The interface for accessing scriptable objects is implemented via memcached get requests. For example:
get set:new:myKey would create a new set object referred to via myKey
get set:put:myKey:a would put key a in the set myKey
get set:count:myKey would return the number of elements in the set
get set:union:myKey1:myKey2 would return the union of sets myKey1 and myKey2
See scripts/set.lua for other functions. I call this approach virtual keys. The reason for using it: you can use existing memcached libraries to access cacheismo. Building: cacheismo depends on libevent and lua (LuaJIT can also be used). The complete implementation is about 5000 lines of code and fairly organised.
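The "true LRU support" described above can be sketched in miniature (a Python sketch, not cacheismo's C implementation; note that put never fails because eviction frees whatever space is needed):

```python
from collections import OrderedDict

class LRUCache:
    """Sketch of a single global LRU: eviction always removes the
    least-recently-used item regardless of its size, unlike per-slab
    eviction."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()           # key -> (value, size), MRU at end

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # touch: now most recently used
        return self.items[key][0]

    def put(self, key, value, size):
        if key in self.items:
            self.used -= self.items.pop(key)[1]
        while self.items and self.used + size > self.capacity:
            _, (_, old_size) = self.items.popitem(last=False)  # evict true LRU
            self.used -= old_size
        self.items[key] = (value, size)
        self.used += size
```

Because eviction walks a single recency order, a large new object can displace several small cold ones, and a hot object is never evicted just because its size class is full.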
The blue line is (memcachedTPS - cacheismoTPS) / memcachedTPS. This essentially measures how much faster memcached is compared to cacheismo. As you can see, at most it is 20% faster.
The black line is (cacheismoHitRATE - memcachedHitRATE) / cacheismoHitRATE. This essentially measures how much more memory-efficient cacheismo is compared to memcached. For the first half of the test we don't see any difference, but in the second half cacheismo is at most about 80% more efficient. The reason for this behavior is that memcached can't reassign slabs once they are used, whereas cacheismo keeps adjusting to changing object sizes. Over time memcached commits memory to particular slabs and hence its hit rate decreases.
Get source from GitHub
Google Group
UPDATE: Cacheismo Next Steps

Friday, September 23, 2011

What to do when your code suddenly slows down by 60%?

before running the tests multiple times to confirm that this indeed is the case...
before looking into the recent changes...
before looking into the test code...
before checking if swap is being used...
before checking if logging is enabled...
before doubting the kernel upgrade...
before using strace -c to find out unusual system calls...
before using valgrind --tool=callgrind to profile the code...

The first thing to do is to check cat /proc/cpuinfo and see if the CPU MHz is the same as the advertised CPU speed. Laptops usually run in power-saving mode. Not sure what is wrong with 64-bit Ubuntu 11.04 running on a Lenovo Thinkpad T410: even with power plugged in, it decided to run at 933 MHz instead of 2.53 GHz.
Thanks to google for this link.

Saturday, August 20, 2011

I am not with Anna

Call me pessimist, but I don't see corruption going away any time soon.

A loose definition of corruption would be using your "position" for "personal gains" instead of doing what is "right". The operative word here is "position". How corrupt can a beggar be? How corrupt can a government be? Clearly a government holds such an important position that it is very likely to be corrupt. It pays to be a corrupt government.

How about the Lokpal? Does the Lokpal hold an important position? Yes it does, and in some sense a position more powerful than the government itself. Does it pay to be a corrupt Lokpal? Handsomely; almost as much as a corrupt government. I believe in Murphy's Law about as much as I believe in gravity.

The Lokpal is as susceptible to corruption as the government or any government employee. The process of choosing the Lokpal is similar to aristocracy; the difference is that instead of using it to choose the government itself, we are using it to choose the Lokpal. In some sense what Anna is saying is that aristocracy is a better form of governance than democracy. Is it?

Solving the corruption problem needs fundamental understanding and play between the concepts of "position", "personal gains" and what is "right".

Position: Power corrupts, and absolute power corrupts absolutely. How can we change the power equation? Democracy gives people the power to change the government every five years; the government gets absolute power for five years. Here is my take on a better democracy: Dynamic Government.

Personal Gains: This is usually money, but it could be anything that can lead to money in the future. And it need not be personal; it could easily go to friends and family. I believe the concept of money is broken; see Reinventing Money. India badly needs an inheritance tax. In some sense money is equivalent to "position" if the person in "position" is corrupt. Hence absolute money for a long duration is equivalent to absolute power and will lead to corruption. See If Money Expired And People Had Choice.

What is Right: This one is hard to crack. Moral is right according to religion. Legal is right according to law. Rational is right according to reason, but who knows what we are missing and how logically our minds think. Religion and law are created by mortals and could be wrong at times. I don't know the answers here, but this is the direction I am looking at: On Reality Truth And Levels Of Abstractions and The Idiot Religion.

I think corruption is a design issue, design of human interactions, how they help each other to survive together.

Wednesday, August 17, 2011

Mini Speed Breaker

The idea behind a speed breaker is breaking speed. A speed breaker's effectiveness drops when it is hard to see; it is useless if the driver cannot see it, and in fact it can do more harm than good.

The twist: use a mini speed breaker before the actual speed breaker to warn the driver of the one coming. It is better than painting speed breakers or putting up boards, because it doesn't need visibility to communicate. Basically, if you hit a mini speed breaker, reduce your speed because a big one will follow.

Tuesday, July 12, 2011

Optimizing time in getting off the plane

Two cases:  With or without check-in baggage?
  • Get your hand baggage as soon as possible and stand in the line.
  • This will take around 5 minutes.
  • Get into the bus, but make sure you take the seat close to the gate or preferably stand
  • Get out of the bus and go out (no check-in baggage) 
  • Get out of the bus and wait for baggage to arrive. 
  • Usually your baggage will come out first if you were the last one to check in for the flight (FILO).
  • Wait for baggage 
When getting out of the arrival airport is determined by when you arrived at the departure airport, why do people still fight to be the first ones off the plane?


Tuesday, May 24, 2011

Why Low Scoring IPL Matches Last Long

IPL makes money via advertisements during the match. If a match finishes in 10 overs instead of 20, that means lost revenue. It seems that, no matter what, both "strategic timeouts" have to happen in every match. I am not aware of any match in which a team won before the "strategic timeout", irrespective of the required runs.

Sometimes I wonder if sports is a form of Reality TV.

Tuesday, May 10, 2011

Named Queries in Relational Databases

Databases don't provide a way to create named queries with runtime arguments. Yes, we have views, but most databases are accessed via programming languages where we need to construct the SQL query. We do have prepared statements, but again they are programming-language constructs, not named database-level constructs.

Benefits:
1) No need to parse queries; they are callable by name.
2) Prevents SQL injection attacks, since queries are never built from strings at request time.
3) Easy integration with programming languages: db.execute("myQuery", arg1, arg2, ...)

4) Instead of trying to optimize every SQL query every time, the database has the list of the most important queries it needs to answer. In the short run this lets the database save time by pre-planning execution; in the long run it can optimize the layout of files or create indexes automatically to suit the access pattern.
5) Easy to change DB logic. As long as the query returns the same fields (maybe more), you can change the actual query logic without changing application code.
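Points 1 to 3 can be sketched as a registry of named, parameterized statements (layered here over sqlite3 on the client side; the table, data, and query names are made up, and a real implementation would live inside the database server):

```python
import sqlite3

class NamedQueries:
    """Sketch of database-level named queries: register once, call by name.
    Inside the server, registration would be the point where the query is
    parsed and its execution pre-planned."""
    def __init__(self, conn):
        self.conn = conn
        self.queries = {}

    def register(self, name, sql):
        self.queries[name] = sql             # parsed once, up front

    def execute(self, name, *args):
        # args are bound parameters, never spliced into the SQL text:
        # no re-parsing per call and no SQL injection
        return self.conn.execute(self.queries[name], args).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'asha'), (2, 'ravi')")

db = NamedQueries(conn)
db.register("userById", "SELECT name FROM users WHERE id = ?")
print(db.execute("userById", 2))             # [('ravi',)]
```

The application only ever sees the name "userById", so point 5 follows too: the SQL behind the name can change without touching callers.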

There is a lot in a name.

Monday, May 02, 2011

The better side of corruption

Corruption is an enabler. It thrives on inefficiency and the shortcomings of the system, and provides a way to get things done for a price. People pay for what is valuable to them, so at the very least corruption is great validation: it confirms that something is important. The answer to "how do I make something better?" becomes self-evident: fix the things where there is corruption.

I wanted to move from Reliance CDMA to Airtel GSM. After two weeks, six visits of 30 minutes each, and asking ten times when I should disconnect my Reliance connection, the ultimate answer was: "Sorry, your request couldn't be processed because you have not disconnected your Reliance connection."

I wish the Airtel guys were corrupt. They would have known what people want, how much they want it, and the best interface to deliver it.

Monday, April 18, 2011

Customer Support

Customers really don't want Customer Support, they just want working products. 

Friday, April 08, 2011

Price & Retailer Margin

Assuming price is determined by the rules of demand and supply, how does one decide the retailer margin?

Let P be the price, M the retailer's margin (as a fraction), and C the expected number of units sold per month. Then

Earnings per month for the retailer = P * M * C
This number should be high enough for the retailer to make a living after deducting the cost of running the business.
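With made-up numbers, the equation reads:

```python
def retailer_monthly_earnings(price, margin, units_per_month):
    """Earnings per month = P * M * C, with the margin M as a fraction."""
    return price * margin * units_per_month

# Hypothetical mobile-phone retailer: high price, decent volume, thin margin.
print(retailer_monthly_earnings(15000, 0.03, 100))  # 45000.0 per month
```

With the high-price, high-volume product a 3% margin already covers a month's living, which is the pattern in the first example below.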

Examples:

  • Mobile-phone prices are high and the number of units sold per month is good, so margins are low
  • Taps, showers, lights, basins, etc. sell at low volumes, prices are OK, and margins are very high
  • FMCG products have very low margins
  • Kitchen chimneys and burners are high priced, volumes are low, and margins are high
  • Cigarettes have high volume, hence margins are low
  • High-end cosmetics have very low volumes, hence reasonable margins
  • Books have low volume and an OK unit price, so margins should be high
  • Textbooks have high volume and an OK unit price, but they are sold only once a year, very much like crackers on Diwali
  • Pet food also has low volume and hence high margins
  • Furniture again: low volume, high price, hence margins ought to be reasonable
I guess the opportunity for e-commerce is creating business efficiency for low-volume products: driving their prices down by passing the profits on to customers, making it impossible for some retailers to justify their business. High-volume products will always have lower margins, so price is not what will drive buying on the net; other factors like trust, service, and range will. It also opens up the market to products which couldn't find retail space because they couldn't satisfy this equation.

Another side effect of this equation will be counter-intuitive to most economists: under specific circumstances, when demand goes down, prices don't go down, they increase. The reason is that the retailer needs to meet his monthly needs; if he makes less money, the only way back to his original income level is increasing the price, not decreasing it. The specific circumstances are:

  • All retailers are friends (taxi operators, unions, etc.)
  • Customer lifetime value is the same as that of the current transaction. For example, a tourist hiring a taxi is not going to remember or call the same guy on his next visit.
  • The customer has little time to make the decision; the less time, the less choice he has.
  • The customer has already committed part of the money.
  • Does competition exist?
  • Is trust part of the equation?

Wednesday, April 06, 2011

Old Poem

Did the egg come from the hen,
or did the hen come from the egg?
Did the poem make the poet,
or did the poet make the poem?
I am caught in a dilemma:
what is true, and what is truth?

Is there any definition of truth?
Is there a line between truth and lies?
Is what we have seen the truth?
Has anyone ever seen the truth?

Is truth that which appears,
or is it what you understand?
But that which you understand,
do you really understand it?

I have not understood the truth;
whoever has, explain it to me.
I am tangled in my own confusion;
O Lord, unite me with myself.



Mean And/Or Stupid

Mean and Stupid are two labels that can be used to describe people, just like tall, dark, handsome, or generous, wealthy, shy, happy, content, etc. All labels are dangerous, but these two are particularly so.

Label someone mean and it gives us the license to be mean to them.
Label someone stupid and it gives us the license to not listen to them. Someone who doesn't appreciate something intelligent said to them is, by definition, stupid.

Thinking someone is mean and/or stupid eventually makes us mean and stupid, and being mean and stupid in turn causes other people to be mean and stupid. It is a vicious circle: fiction creating hardcore reality which is actually factitious.

Just think how many people are mean and stupid just because we think they are. "You are what you think you are" is all right, but "you are what you think someone else is" is scary.

Thursday, March 17, 2011

Speeding up Kahlua

Earlier post Scriptable Object Cache

The first implementation of the Lua-based HTTP proxy was fairly slow. On a 1.6 GHz quad-core Intel processor, it could proxy about 1K requests per second. The final number was about 15K messages per second with a 5KB request and a 5KB response.

The first problem was LuaTableImpl; it is very slow. The trouble is that a Lua table is both a hashmap and a list, with a single iterator. One of the methods in the Lua table interface is:


Object next(Object key);

Hack number one is to use the default Java HashMap instead of the built-in LuaTableImpl, taking care that "array" access and "next" still work.

Problem number two is the overhead of creating the runtime. In my case, every request needs a new runtime, a LuaState object associated with that request. This was very expensive. Here is what needs to be done to create a LuaState and set it up for the proxy code to execute:

String     fileName = "/luaTestCode/test.lua";
LuaState   state    = new LuaState(System.out);
File       luaFile  = new File(fileName);
LuaClosure closure  = LuaCompiler.loadis(new FileInputStream(luaFile),
                          luaFile.getName(), state.getEnvironment());
state.call(closure, null);

LuaConverterManager manager = new LuaConverterManager();
LuaNumberConverter.install(manager);
LuaTableConverter.install(manager);
LuaJavaClassExposer exposer = new LuaJavaClassExposer(state, manager);
exposer.exposeClass(LuaContext.class);
exposer.exposeClass(LuaHttpRequest.class);
exposer.exposeClass(LuaHttpResponse.class);

Object proxy = state.getEnvironment().rawget("proxy_function");
Object value = state.call(proxy, null);


As you can see, we have a file read, followed by compilation, followed by some reflection code which sets up Java functions in the Lua runtime. All of this is expensive and would need to be done for every message. To avoid this, I used a couple of hacks. The first was to avoid compiling again and again. This one was simple: the output of the compilation step is a LuaClosure. I changed the code so that instead of a LuaClosure I get a LuaPrototype, the real output of compilation, which can be reused to create as many LuaClosure objects as needed without paying the price of file access and compilation. The second hack was to avoid the setup cost of the LuaState object. Digging into the code revealed that a LuaState (without the stack) is nothing but the environment, a Lua table. So I added a clone method to the LuaTable implementation and used it to cheaply construct new LuaStates which had all the Java classes correctly exposed. The new code looks something like this:

public class LuaContext {
    private LuaState   state;
    private LuaClosure closure;
    private Object     coroutine;

    private static final LuaState            BASE;
    private static final LuaConverterManager MANAGER;
    private static final LuaJavaClassExposer EXPOSER;

    static {
        BASE    = new LuaState(System.out);
        MANAGER = new LuaConverterManager();
        LuaNumberConverter.install(MANAGER);
        LuaTableConverter.install(MANAGER);
        EXPOSER = new LuaJavaClassExposer(BASE, MANAGER);
        EXPOSER.exposeClass(LuaContext.class);
        EXPOSER.exposeClass(LuaHttpRequest.class);
        EXPOSER.exposeClass(LuaHttpResponse.class);
    }

    public LuaContext(LuaPrototype proto) throws IOException {
        this.state     = BASE.clone();
        this.closure   = new LuaClosure(proto, state.getEnvironment());
        this.state.call(this.closure, null);
        this.coroutine = this.state.getEnvironment().rawget("proxy_function");
    }
}
That is it. After adding support for storing state (Lua objects), a lock, and some basic JSON, I was able to write some cool proxy features in Lua itself.

Here is how I could do redirect handling at server side...

function getServerAndPath(response)
    local newURL = response:getHeader("Location")
    local server = nil
    local path   = nil
    if (newURL ~= nil) then
        local h1, h2 = string.find(newURL, "http://")
        if (h2 ~= nil) then
            local s1, s2 = string.find(newURL, "/", h2 + 1)
            server = string.sub(newURL, h2 + 1, s1 - 1)
            path   = string.sub(newURL, s1)
        end
    end
    return server, path
end


function isRedirectResponseCode(code)
      if (code == 301) then return true end
      if (code == 302) then return true end
      if (code == 303) then return true end
      if (code == 307) then return true end
      return false
end

function proxy(context, httpRequest)
    local targetServer = "yahoo.com"
    local path         = httpRequest:getUri()
    local responseCode = 301
    local httpResponse = nil

    while (isRedirectResponseCode(responseCode)) do
        httpRequest:setHeader("Host", targetServer)
        httpRequest:setUri(path)
        httpResponse = luaSendRequestAndGetResponse(context, httpRequest)
        responseCode = httpResponse:getStatus()
        print("got response with status " .. responseCode)
        targetServer, path = getServerAndPath(httpResponse)
    end
    luaSendResponse(context, httpResponse)
end


Here is another example, this one for least-connections server load balancing.

-- simple lb implementation 

function newLB()
    local object       = {}
    object.lock        = NewLuaLock()
    object.connections = {}
    return object
end

function addServer(object, serverName) 
    object.lock:lock()
    if (object.connections[serverName] == nil) then
        object.connections[serverName] = 0
    end
    object.lock:unlock()
end

function findLeastUsedServer(object)
    local min    = math.huge
    local result = nil

    object.lock:lock()
    for server, count in pairs(object.connections) do
        if (count < min) then
            min    = count
            result = server
        end
    end
    -- guard against an empty server list (indexing with nil would error)
    if (result ~= nil) then
        object.connections[result] = min + 1
    end
    object.lock:unlock()
    return result
end

function decreaseServerCount(object, server)
    object.lock:lock()
    if (object.connections[server] ~= nil) then
        if (object.connections[server] > 0) then
            object.connections[server] = object.connections[server] - 1
        end
    end
    object.lock:unlock()
end

--------------------------------------------------------------

function getLBResource()
    local objectMap  = getObjectMap()
    local lbResource = objectMap:get("lb")

    if (lbResource == nil) then
        lbResource = newLB()
        addServer(lbResource, "google.com")
        addServer(lbResource, "yahoo.com")
        objectMap:put("lb", lbResource)
    end
    return lbResource
end


function proxy(context, httpRequest)
    local lbResource = getLBResource()
    local server     = findLeastUsedServer(lbResource)
    httpRequest:setHeader("Host", server)
    local response = luaSendRequestAndGetResponse(context, httpRequest)
    decreaseServerCount(lbResource, server)
    luaSendResponse(context, response)
end
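The least-connections bookkeeping can also be exercised outside Cacheismo. This is a hypothetical standalone sketch: `NewLuaLock` is replaced by a no-op stub, and `pick` condenses the selection-and-reserve logic of `findLeastUsedServer`.

```lua
-- No-op stand-in for Cacheismo's NewLuaLock so the sketch runs in plain Lua.
local function NewLuaLock()
    return { lock = function() end, unlock = function() end }
end

local lb = {
    lock        = NewLuaLock(),
    connections = { ["google.com"] = 0, ["yahoo.com"] = 0 },
}

-- pick the server with the fewest in-flight connections and reserve a slot
local function pick(object)
    local min, result = math.huge, nil
    object.lock:lock()
    for server, count in pairs(object.connections) do
        if (count < min) then
            min, result = count, server
        end
    end
    if (result ~= nil) then
        object.connections[result] = min + 1
    end
    object.lock:unlock()
    return result
end

-- two consecutive picks land on different servers, because the first
-- pick bumps its server's count from 0 to 1
local first  = pick(lb)
local second = pick(lb)
print(first, second)
```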

The final example shows how to filter all the junk information out of the twitter public timeline and cache the result for a while at the proxy.

function proxy(context, httpRequest)
    local globalCache     = getHttpResponseCache()
    local twitterResponse = globalCache:get("timeline")

    if (twitterResponse == nil) then
        local twitterRequest = NewLuaHttpRequest("HTTP/1.1", "GET", "/1/statuses/public_timeline.json")
        twitterRequest:setHeader("Host", "api.twitter.com")
        twitterResponse = luaSendRequestAndGetResponse(context, twitterRequest)

        local jsonArray    = NewLuaJSONArrayFromText(twitterResponse:getContent())
        local newJsonArray = NewLuaJSONArray()

        -- keep only the tweet text, drop everything else
        for i = 0, (jsonArray:length() - 1) do
            local current = jsonArray:getJSONObjectAtIndex(i)
            local text    = current:getString("text")
            newJsonArray:putString(text)
        end
        twitterResponse:setContent(newJsonArray:toString())
        -- cache for five minutes
        globalCache:put("timeline", 300, twitterResponse)
    end
    luaSendResponse(context, twitterResponse)
end


Wednesday, March 16, 2011

I don't like horns

First, I don't use the horn much myself. And I get irritated when others (over)use it.

The idea behind the horn is "Be careful". It is like a broadcast SOS message. But why should others be careful; why don't you yourself be careful? To put it another way, I think the ability to tell others to be careful gives us the luxury of NOT being careful ourselves. Maybe we should just ban horns and learn to drive without this ability. Hopefully, once in-car computers are running Android, we might get to see beyond line of sight and be better informed, or horns might get replaced by inter-car communication systems, which one could hack and filter.


Wednesday, March 09, 2011

Twitter Business Model

How will twitter make money?

Two things they did are:

  • New website 
  • Promoted Tweets
Why promoted tweets? Yes, for money, but more importantly because they will show up in all applications that use the twitter API. Twitter apps don't run in twitter's context; twitter runs in the twitter app's context. Facebook, on the other hand, provides the context in which facebook apps run. The important implication is who owns the screen real estate. In the case of facebook, it is facebook. In the case of twitter, it is the twitter apps. Hence any advertising mechanism from twitter has to be part of the "content" served by the twitter API. But even this doesn't work if clients can filter content.

So twitter started by making people remove the twitter/tweet name from their app names and website names. They purchased a few of the clients, did revenue sharing with some others, and then added promoted tweets. All this, along with the new website, is an effort to own the screen real estate, because fundamentally that is what you need if you want to show advertisements.

Here are some ideas on how twitter can make money.
  • Speed - How much time does it take for a tweet to show up on a follower's screen? It could be immediate, or delayed, or delayed so much that it never reaches the intended person. Some people would pay for speed in sending, some would pay for speed in receiving. Some might even pay for slowing it down.
  • Are all tweets equal? And is chronological order the best way of reading tweets? If someone can save my time by filtering and sorting them for me, I could pay for that. It could be the most "active" tweets (replies), the most liked tweets (retweets), or a person doing the filtering. Keep me connected, but don't waste my time. Maybe organize them into tweetshots - collections of related tweets on a topic, from a person, or within a time interval. Outsource this problem, say via a twitter proxy interface, and let developers innovate inside the twitter platform instead of outside it. Another way to expose the same functionality could be virtual users: @politics could be a tweet channel that filters tweets related to politics. Basically, instead of a hard link between producer and consumer, make it a soft link.
  • Make twitter a marketplace. Paid tweets. Subscribe to Seth Godin's tweets for $1 per month; maybe some people would find it worth the price. Similarly, twitter could have tweets as advertisements, so I as a subscriber can sell my attention for money. Let's say I price my attention at 20 cents per tweet. Anyone I am not following can send me a tweet by paying me 20 cents. My attention is cheap, but the CEO of a company might price theirs at $100.
  • Twitter Analytics - Instead of specifying a single shortened URL in the tweet, auto-generate a different URL for each user. This gives twitter a way to monitor exactly who read it, when, and what else they are reading.
  • Most broadcast mediums end up causing spam: email, groups, etc. Twitter is in a unique position because it controls the medium and has power over the complete ecosystem. No one can tweet at me until I follow them; email sucks at this. A communication medium with a single point of control can solve the spam problem.
  • Many people use twitter to complain in public. The purpose of that complaint is not to bring down the company's reputation, but to make the company listen. Reputation management in social media is now almost a buzzword. Twitter can become the unified CRM for companies. #complaint >apple could be slowed down, giving the company a chance to handle the complaint before letting it spread. People would love it if it makes communicating with companies easy, and companies would love it if they could somehow manage the complaint instead of gambling with their reputation.
  • Twitter for computers. Twitter is a communication medium for people, but the same technology could be used for structured messages (JSON?). Twitter for rentals (need a place, or have one to rent out). Twitter for products (price changes/discounts). The point is, instead of launching websites, providing APIs, or sending emails to other companies, some companies would benefit from just publishing that information in public and letting other companies consume it in a standard way. So if you are recruiting, just tweet it to a recruiting topic with a standard interface and let all the recruiters in the world help you. Or if you need a quote for 100 Lenovo laptops with 4GB RAM, just tweet it. In short: global structured-message pub/sub, using a standard twitter client.
I like the company and would love to see it making money.

Thursday, February 24, 2011

Percentage Vs Absolute

Some things have absolute costs and some have percentage costs. Tax, interest, bribes, bonuses, affiliate programs, retailer margins, etc. are some examples where we pay in percentages. Most other things work in absolutes.

Absolute seems to work best in the producer consumer scenario.

Percentages do best when the relationship is more of a partnership.

Sometimes I wonder: if instead of paying money to each other we invested in each other, we would have a better shot at world peace, mutual respect, and being human.

Friends is what we call the kind of people we invest in.