Monday, September 26, 2011

Cacheismo Next Steps

Most important limitation of current approach is that cacheismo cannot be used in clustered mode. The reason being virtual key hashing might choose a server different from where the actual key is stored.
hash("set:new:myKey") is not equal to hash("set$myKey") is not equal to hash("set:count:myKey")

I did some experiments with lua threads and they look pretty cheap to create. The current plan is to make cacheismo nodes aware of each other and add client side support to cacheismo server. Instead of just looking up the key in local server memory, cacheismo will also do the hashing at server side on the actual key and fetch the results from the responsible cacheismo server.  This would double the latency but ensure that any virtual key can be thrown at any server and it will fetch the results.

For example consider something like "set:intersection:mySet1:mySet2:mySet3".
Lets assume that  mySet1, mySet2 and mySet3 are stored on different servers.
Trivial implementation would be to get mySet1, mySet2 and mySet3 to server executing the request and do the processing. Another possibility is to get sizes of mySet1, mySet2 and mySet3 and execute two parallel "set:intersection:smallest:other1" and "set:intersection:smallest:other2", followed by local intersection over the result.

Since cacheismo is scriptable, it is always possible to change this logic.  I will hide the non-blocking details using lua co-routines so writing such objects will not require "knowing" the asynchronous IO aspects, just some friendly function call like getValueForKey.

It might make it possible to do some in memory map-reduce with recursive calls for virtual keys running parallel stacks spread over the network, executing at the same time.

UPDATE: Cacheismo Cluster Update

Saturday, September 24, 2011

Introducing Cacheismo

Cacheismo is scriptable object cache which can be used as a replacement for memcached. It supports memcache protocol (tcp and ascii only). It automatically manages slab sizes, no user configuration is required.  Slabs are intelligently managed and they reconfigure themselves as per  the usage. PUT never ever fails. Details follow:
- Efficient use of memory 
Slab allocator used in memcached would waste memory if slab sizes and object sizes are not in sync. For example when using slab sizes of 64/128/256/512/1024 bytes object with size 513 is stored in 1024 bytes slab, wasting almost half the memory. Cacheismo tries to  maintain memory used by object as close as possible to the object size
- Reclaimable Memory 
Every byte is reclaimable in cacheismo. PUT never fails. You will never have slab full issues. More on memory management.
- True LRU support. Dependable cache
Memcached will evict items from the slab in which object will be placed. Cacheismo will always use the LRU, irrespective of the size. This ensures that the most important data is always in the cache. No surprises. If something is not in the cache, it either expired or was least used.
- Server Side Scriptable objects
The best part of using cacheismo is that is it fully scriptable in lua. Instead of cluttering your code with workarounds for  memcached read/write semantics, you can move that logic inside  cacheismo.  See Use cases for cacheismo Rate Limiting Quota With Cacheismo Using Cacheismo Cluster For Real-time Analytics
Sample objects map, set, quota and sliding window counters written in lua can be found in scripts directory. You could create  your own objects and place them in the scripts directory. The interface for accessing scriptable objects is implemented via memcached get requests. For example:  get set:new:mykey   would create a new set object refered via myKey get set:put:myKey:a   would put key a in the set myKey get set:count:myKey   would return number of elements in the set get set:union:myKey1:myKey2   would return union of sets myKey1 and myKey2 See scripts/set.lua for other functions. I call this approach virtual keys. Reason for using it ... You can use existing memcached libraries to access cacheismo. Building - cacheismo depends on libevent and lua (luagit can  also be used). Complete implementation is in about 5000 lines  of code and fairly organised. 
Blue line is (memcachedTPS-cacheismoTPS)/memcachedTPS. This essentially measures how much faster is memcached compared to cacheismo. As you can see at max it is 20% faster.
Black line is (cacheismoHitRATE-memcachedHitRATE)/cacheismoHitRATE. This essentially means how much memory efficient is cacheismo compared to memcached. For the first half of the test case we don't see any difference. But in the second half cacheismo is at max about 80% efficient. The reason for this behavior is that memcached can't reassign slabs once they are used whereas cacheismo continues to adjust with changing object sizes. Over time memcached commits memory to some slabs and hence its hit rate decreases over time.
Get source from GitHub
Google Group
UPDATE: Cacheismo Next Steps

Friday, September 23, 2011

What to do when your code suddenly slows down by 60%?

before running the tests multiple times to confirm that this indeed is the case...
before looking into the recent changes...
before looking into the test code...
before checking if swap is being used...
before checking if logging is enabled...
before doubting the kernel upgrade...
before using strace -c to find out unusual system calls...
before using valgrind --tool=callgrind to profile the code...

The first thing to do is to check cat /proc/cpuinfo and check in cpu Mz is same as advertised cpu speed. Laptops usually run in power saving mode. Not sure what is wrong with 64 bit Ubuntu 11.04, running on Lenovo Thinkpad T410. Even with power plugged in, it decided to run at 933Mz instead of 2.53 Gz.
Thanks to google for this link.

Saturday, August 20, 2011

I am not with Anna

Call me pessimist, but I don't see corruption going away any time soon.

Loose definition of corruption would be using your "position" for "personal gains" instead of doing what is "right".  The operating word here is "position". How much corrupt can a beggar be? How much corrupt can a government be? Clearly government holds such an important position that it is very likely to be corrupt.  It pays to be corrupt government.

How about Lokpal? Does Lokpal holds important position? Yes it does and in some sense it holds a position more powerful than the government itself. Does it pays to be a corrupt Lokpal? Handsomely. Almost as much as a corrupt government.  I kind of believe in Murphy's Law as much as I believe in gravity.

Lokpal is as susceptible to corruption as much is government or a government employee. The process of choosing Lokpal is similar to aristocracy.  The difference is instead of choosing the government itself we are using it to choose the Lokpal. In some sense what Anna is saying is that Aristocracy is a better form of governance than democracy.  Is it?

Solving the corruption problem needs fundamental understanding and play between the concepts of "position", "personal gains" and what is "right".

Position:  Power corrupts, absolute power corrupts absolutely. How can we change the power equation?  Democracy gives people the power to change the government every five years. Government gets absolute power for five years. Here is my take on better democracy Dynamic Government.

Personal Gains:  This is usually money but it could be anything that can lead to money in the future. Again it need not be personal. It could easily go to friends and family.  I believe that the concept of money is broken. See Reinventing Money.  India badly needs to have Inheritance Tax.  In some sense money is equivalent to "position" if the person in "position" is corrupt. Hence absolute money for long duration is equivalent to absolute power and will lead to corruption. See  If Money Expired And People Had Choice .

What is Right: This one is hard to crack. Moral is right according to religion. Legal is right according to Law. Rational is right according to reason - but who knows what we are missing and how logical our minds think. Religion and Law are created by mortals only and could be wrong at times. I don't know the answers here but this is the direction I am looking at - On Reality Truth And Levels Of Abstractions and The Idiot Religion

I think corruption is a design issue, design of human interactions, how they help each other to survive together.

Wednesday, August 17, 2011

Mini Speed Breaker

The idea behind speed breaker is breaking the speed. The effectiveness of speed breaker drops with visibility. It is useless if driver cannot see it. In fact it can do more harm instead of providing safety.

Twist is, use a mini speed breaker before the actual speed breaker to give indication to the driver for the coming speed breaker. It is better than painting speed breakers or putting up boards because it doesn't needs  visibility for communication.  Basically if you hit a mini speed breaker, just reduce your speed as a big one will follow.