Wednesday, June 27, 2012

Of Money and Banks

Looking back, I seem to have too much to say about these topics.
In chronological order...

This one talks about how money is an abstraction for human interdependence, and how its properties differ from the properties of human interdependence.
http://chakpak.blogspot.in/2010/03/reinventing-money.html

These two are a mixed bag about unethical banking practices like loan-switching charges and floating interest rates (as if the fixed one were better), and about how we should move away from loans and allow only investments.
http://chakpak.blogspot.in/2010/08/home-loan-switching-charges-floating.html

http://chakpak.blogspot.in/2010/08/interest.html

Read this one. What if money expired? Is choice enough in capitalism?
http://chakpak.blogspot.in/2010/09/if-money-expired-people-had-choice.html

Banks should be open to failures for their own good.
http://chakpak.blogspot.in/2010/10/banks-can-only-collapse.html

http://chakpak.blogspot.in/2011/04/price-retailer-margin.html

If you are employed, you will love this one. 
http://chakpak.blogspot.in/2012/01/loan-or-subscription.html

More detailed analysis of frequency of choice - the missing piece in capitalism. 

Tuesday, June 19, 2012

Be Original

Cut: Copy-Paste

UPDATE: Maybe this is not so clear after all. The idea was to take the sentence which is the complete opposite of "being original", namely "cut-copy-paste", and use the same words to convey its opposite. If I were to code it, this would appear as

Cut (Copy-Paste)

Essentially cut (stop) doing copy-paste if you want to be original.

Wednesday, June 13, 2012

Where is money?

It is not in the banks.
Not in your wallet.
Definitely not in the stock markets.

Take out the piece of paper you call money from your wallet and you will find this written on it:
"I promise to pay the bearer the sum of X rupees" - Governor, RBI.

If money were money, you would already have it, so why is the Governor of RBI promising to "pay" you? Well, it turns out what it means is this: under normal circumstances people will trust these pieces of paper, but even if they don't, I promise I will pay you something of value worth these pieces of paper.

Money is the promise we make to each other and the only species which honors promises is humans. You are money. Your wife, your kids, your parents, your neighbors, people in your office, people on the road, the shopkeepers, the drivers, the plumber ...

EVERYONE is money and NOT EVERYTHING.

Things don't have value; people see value in things. This is simple stuff, one less indirection. The billions of dollars/rupees (pick your currency), the paper in the hands of a few individuals or corporations, are nothing if the rest of the world doesn't honor its promise to be of value to those who hold this paper. I have no idea what the RBI Governor would give to a guy whose money no one accepts.

It is people who produce something valuable, and it is other people who appreciate that value and pay for it. Money is not what you have in the bank, wallet, or stocks; the value of that piece of paper lies in the people around you and their willingness to do something of value for you in exchange for it. The value of money is in the people. Anything that stops other people from producing value decreases the value of that paper you hold so dear. And many times what stops people is that they don't have that paper with them. The richest companies in the world are mostly monopolies (oil and gas, infrastructure, public sector). The next richest are banks. Somehow capitalism fails to bring efficiency to banking even with the private sector; every one of them seems to be making money. Give 5% interest on deposits and charge 15% on loans. Stupid regulations.

People seem to trust money more than they trust people, and paradoxically, trust in money is nothing but trust in people.

Tuesday, June 12, 2012

Sanitary Napkin


What could be more honest than "It Sucks". 
"Periodically Yours" also sounds alright.

Monday, May 14, 2012

Alternative Implementation of Call Center

Most call centers need sophisticated call management software to distribute calls among the call center workers. Asterisk and OpenPBX are a few open source alternatives. Basically, the core design is based on a proxy/middleware architecture. I am sure this is expensive.

An alternative way to implement the same functionality would be to use smartphone apps. Instead of putting the switching logic on the server, the same can be accomplished on the client. The idea is this: instead of pushing logic into the IVR or call proxy, use the internet connection and the computing power of the smartphone, let your normal webserver (in your favorite language) handle the logic aspect of the call, and then just make the app call the right number for the job. The approach has multiple benefits.
  • Use the phone to store the answers to the most stupid questions asked, like the name of the customer, the address, even the phone number, his choice of language, and customer-specific ids like the account number, customer identification number, or maybe the receipt number. No need to waste time.
  • Even if the stored information is insufficient, you could let the customer enter it on his mobile device instead of doing it over the IVR.
  • The customer can talk to the same guy he talked to earlier and save time. This helps the customer by skipping the context and produces faster response times.
  • No need to buy a special number. Just expose a web API which tells the phone app which number to call based on the information stored in the phone. Put whatever logic you need into that web API, like: if the customer has called 5 times and the issue is not yet resolved, automatically route the call to the manager. The numbers could be regular mobile numbers or desk phone numbers of the employees.
  • Very easy for the customer to give feedback. You have the complete power of the webapp at your disposal.
  • Very easy to implement, say, a work queue: the customer simply presses a button and says "call me back". Why wait?
Most "customers" already have smartphones, and the ones that don't will have them in the next few years. By using the internet connection on phones and their computing power, it is possible to build a much better customer management solution at a fraction of the cost, by offloading the "logic" to the app and web API and using a regular phone number to do the actual talking. Somehow we are still stuck pushing logic into the phone line.

Thursday, May 03, 2012

Life doesn't teach us much

Contrary to popular belief, I think that life doesn't teach us much. If it did, we would all have become very much like each other. We are, but then we are not.

I think after a point life simply reinforces what we already know.  

Sunday, April 15, 2012

Patents - Necessary Evil

The deal with patents is fairly simple. Innovate, file patent, get exclusive rights for some period of time on your invention and make money from your invention. Now the problems.

  • Too many patents being filed
  • Too many generic patents
  • Patent trolls
  • Defensive patents
In simple terms, the problem is uncertainty. I have developed this cool technology and I have no clue if someone already has a patent on it. File stupid patents, because if you don't, someone else will, and that is a threat. Oops, someone sued me; how much do I need to pay to settle this, apart from the lawyer fees? Patents add huge variance to predicting costs/profits in business.

The fix is simple enough - bring certainty and sanity to the patent ecosystem. 

Time is money

The exclusive right to use your invention is nothing but a way to make money. And the simplest fix is to base exclusiveness on money and not on time. What this means is that at the time of filing a patent, you need to put a monetary value on it which states the worth of the patent. The cost of filing the patent would be based on this monetary value (call it a patent tax). This has two important repercussions.
  • Patent trolling is controlled, because a finite value is attached to the patent. Irrespective of how patents are "sold" to others for use, the total payout cannot exceed the declared value of the patent.
  • Stupid patents are avoided, because you need to attach a monetary value, which will be taxed. Unless you are sure the patent is worth that value, you probably don't want to waste money on it.
I guess what I am talking about is making patents a market, where patents can be purchased and used to make life on the planet better, by figuring out a way to attach monetary value to patents instead of finding it out through expensive court battles. This makes the process efficient, reduces uncertainty, and provides value to both the inventor and the user of the technology. The idea, as I understood it, was always to reward the inventor, never to prevent others from using the invention. Preventing others from using it is just one way to reward the inventor, not the only way. If we can find alternative ways of ensuring the same, we can perhaps bring some sanity to the system, so that both the inventor and society as a whole can benefit from the invention at reasonable costs, predictably.

Friday, April 13, 2012

Ford Mantra

I have been driving a Ford for eight years now. I like the car, but spare parts are stupidly expensive. I wanted to get the windscreen changed, but decided against it when I was told the price: Rs 9000. Compare that to the windscreen cost of a Skoda Rapid - Rs 3600. Now that is a hell of a lot of difference.

What does that tell us about Ford's business model? Sell cheap cars; make money on spare parts and services. Which again brings me to the old dilemma of choice in capitalism - namely, you don't have a choice after you make a choice.

I think some rule like "the cost of all the parts in the car, put together, cannot be more than the price of the car" would probably help bring some sanity to the system. If everyone needs to go through this experience to learn it, then Ford surely has a very bright future. I think the statement "You can fool some of the people some of the time, all people some of the time, some people all the time, but not all people all the time" is missing a crucial component: people have finite lifetimes. Then the game starts all over again. If new generations don't listen to the old, the cycle of fooling can run indefinitely, because people keep being replaced by new people.

Fundamentally this is all about frequency of choice.

Sunday, April 08, 2012

Two Favorite Patterns in C

The first one is for error handling.


#define IfTrue(x, level, format, ...)           \
    if (!(x)) {                                 \
        LOG(level, format, ##__VA_ARGS__)       \
        goto OnError;                           \
    }


This is the simplest way of simulating try/catch in C. Yeah, it uses goto, which is considered bad programming practice and what not, but it makes C code beautiful and understandable. Take a look at the code below for creating a server connection. These code samples are from cacheismo.

connection_t connectionServerCreate(u_int16_t port, char* ipAddress,
                                    connectionHandler_t* handler) {
    connectionImpl_t* pC = ALLOCATE_1(connectionImpl_t);

    IfTrue(pC, ERR, "Error allocating memory");
    pC->fd = socket(AF_INET, SOCK_STREAM, 0);
    IfTrue(pC->fd > 0, ERR, "Error creating new socket");
    {
        int flags = fcntl(pC->fd, F_GETFL, 0);
        IfTrue(fcntl(pC->fd, F_SETFL, flags | O_NONBLOCK) == 0,
               ERR, "Error setting non blocking");
    }
    memset((char*) &pC->address, 0, sizeof(pC->address));
    pC->address.sin_family      = AF_INET;
    pC->address.sin_addr.s_addr = INADDR_ANY;
    pC->address.sin_port        = htons(port);
    if (ipAddress) {
        pC->address.sin_addr.s_addr = inet_addr(ipAddress);
    }
    IfTrue(bind(pC->fd, (struct sockaddr*) &pC->address, sizeof(pC->address)) == 0,
           ERR, "Error binding");
    IfTrue(listen(pC->fd, DEFAULT_BACKLOG) == 0, ERR, "Error listening");
    pC->isServer = 1;
    pC->CH       = handler;
    goto OnSuccess;
OnError:
    if (pC) {
        connectionClose(pC);
        pC = 0;
    }
OnSuccess:
    return pC;
}

It is linear code. This avoids multiple exit points and repetitive error handling code. Less nesting of "if" blocks makes the code easy to follow. Error handling/cleanup happens at the end and is common to all possible errors in the function, which also means less code.

The second pattern I use often is opaque objects.

typedef void* chunkpool_t;


chunkpool_t  chunkpoolCreate(u_int32_t maxSizeInPages);
void         chunkpoolDelete(chunkpool_t chunkpool);
void*        chunkpoolMalloc(chunkpool_t chunkpool, u_int32_t size);
void         chunkpoolFree(chunkpool_t  chunkpool, void* pointer);

Almost every type is opaque. What does it accomplish? Freedom. Freedom to change the implementation of the objects, because the rest of the code only uses functions to access the object and doesn't know how the object is actually implemented. This also forces me to think hard about the minimal interface for accessing the object, because it is painful to keep writing new methods. I use this for almost all objects, except those whose only job is to be containers of data, with no functionality.

I do use function pointers when they make sense, but that would be a topic for another post. Writing high performance software is fun, but making sure it is easy to code and easy to change makes the journey pleasant.

Friday, April 06, 2012

The Debt Of Humanity

We are all in debt. I don't mean financial debt, your home loan and stuff like that. I mean the debt of humanity. What would our chances of surviving have been if we were born a million years ago? Death during labor, infections, lack of food and shelter. Instead of fighting with each other, we chose to live together and developed language to talk. The rest is history. We have taken the concept of being together from a few families to villages, towns, cities, nations, and now we are almost at the edge of time when all of humanity is considered one big family. And the reason is simple - nations also fight and don't know how to live together.
No matter how much we feel for our country, the truth is that the vaccines that saved us were invented by someone else; the language that we speak, and which runs a multi-billion dollar BPO industry, is not ours. Bangalore is the Silicon Valley of India, but the computers, programming languages, and operating systems we use were not invented here.
We are much more deeply connected today than we were yesterday, but our ability to see these connections has diminished over time. I am not talking about facebook friends, but about those who work at facebook to make them possible; those who work at google to make search simpler and democratize the mobile OS. I am talking about the people who make our cars and those who make sure you get petrol/diesel at the station; the ones who run the refineries, the ones who dig oil out of wells, and the ones who build the pipelines; the ones who build the roads and those who build the equipment to build the roads; the ones who invest their lifetimes researching life-saving drugs; the ones who ensure we have electricity in our homes. The list is endless.
Everyone on the planet is in some way making life easier for the rest of us. Whether they realize it is debatable. Whether we realize it is also debatable. But I do feel that we would never have gotten here without the rest of us.

The Best

We know what is the best (product/service/decision/policy/whatever). Actually, we have known the best all our lives. Sony makes the best TVs, Ferrari is the fastest car, the iphone is the best phone, land and gold are the best investments, and so on.

At some point in the past, the best TV was the theater; there was no fastest car, only fastest horses; there was no phone. People with more gold were probably robbed, and people with more land ended up killing each other to get more land.


Beware of the future. Best is yet to come.

Saturday, March 17, 2012

Frequency of Choice

I have talked about this earlier also, but the topic is so close to my heart that I wanted to have a dedicated post.

As far as I understand it, the crux of capitalism is choice. In economics, the word choice is substituted by the word market. What is a market? A market is where consumers exercise choice. If consumers don't have a choice, it is not a market. The assumption is that capitalism thrives on competition. Competition creates choice. Consumers will choose the best products at the lowest prices, forcing companies to innovate and reduce prices. The best will survive.

This is all true and then not quite true. Two main problems:

  • Most people don't like to think. Even if they can, the complexity of the world is high enough that figuring out what is best for them is close to impossible. Eventually it comes down to either brands or price, because they make the decision simple.
  • Frequency of choice. Since this is all I want to talk about, I will use the next paragraph.
We are good at the stuff we do often - the old "practice makes a man perfect" thing. We buy petrol, vegetables, groceries, etc. almost every day. Price changes are felt; drops in quality are noticed. But then there are things that we don't do often: joining a new job, getting married, buying a car or home, taking a loan, choosing a college, getting the home painted, buying a TV or refrigerator or AC, casting our vote, choosing a laptop or OS, choosing an email client, signing up on a social network, etc. Many of these choices are irreversible, or if not irreversible, then altering our choice is very expensive. This is where capitalism fails miserably, because it is no longer about choosing from alternatives but about the choice of altering our choice. For products with short life spans like vegetables or toothpaste, altering a choice is not expensive - vegetables last a few days, toothpaste a few weeks, and you can choose a better product next time - but with products that last years or decades or in some cases lifetimes, it is the altering of choice that is required, not choice among products.

Specifics:
 
Consider the home loan business in India. Floating rates have been around for a long time now. What they float on is unknown, and once you take the loan you realize that the "unknown" is the whim of the bank. Usually your floating home loan interest rate will increase by 20%-40% within a few months of taking the loan, and now there is no choice. Well, there is a choice to switch to another home loan, but only if you pay 2%-4% of your home loan value as switching charges. This is as monopolistic as it gets, and we call it capitalism, the mecca of markets and choice. Even banks don't know if they are giving a good or bad interest rate to the customer; how then can the customer decide if he is getting a good deal, and whether that deal is good enough for the next 20 years? No one can. The only way I can know I am getting a good deal is if I can switch my home loan any moment I desire. That is what would make it a market.

The same happens when switching a job (notice period), casting a vote (5-year gap), buying a car (10% value drop as it leaves the showroom), and in many other places. In computing, the advent of SAAS-based companies has started filling this gap by giving customers a monthly choice to continue using them, where once there was a difficult one-time choice of finding the best product. Amazon EC2 gives the choice to use machines, and the OS on them, by the hour. I think governments which call themselves followers of capitalism have missed a point. It is not choice alone that matters; it is the frequency of choice that is at the core of efficient markets.

Tuesday, March 06, 2012

Threadpool and the task queue

Every architecture makes way for a threadpool and a task queue. Multiple threads wait on the queue, always ready to pick up the next task and execute it. Once implemented, the next task is tuning it. How many threads? What is the size of the queue? Blocking queue, or throw an error when full? Retry handler?

Before you start worrying about all this, ask a simple question: how much time does it take to execute the task? If it is not at least a couple of orders of magnitude greater than the time it takes to do a context switch, don't bother with the threadpool/queue; just execute it right there, on your current thread.

Here is why:
  • The task queue has a lock. The more threads there are and the more often it is accessed, the more contention and the more time it takes to submit a task - an extra context switch just to acquire the lock. Basically you are doing serialization before getting to parallelism. More threads + more tasks => more time per insert. Think of it like talking to a customer care executive (CCE). You do lots of IO over the IVR and finally reach the CCE, and the guy, instead of answering your questions, connects you to another guy, and you need to explain the problem once again. That is pretty much how a context switch works. If you need to talk for 10-20 minutes, it might be worth it, but if all it takes is a few seconds of conversation, it just wastes time.
  • Once the task is submitted, it needs to wake up some thread. That is a context switch; it costs time.
  • By the time this new thread wakes up, because of the lock and the time elapsed, most of the variables it needs are out of cache... more time. Read up on lock semantics for the JVM.
  • How do you do error handling from the task? Extra code, extra states.
You can avoid all this by executing the task inline... a normal function call. It will run faster. It is easy to write and debug. The assumption here is that the task really takes a short time to execute and is mostly cpu intensive. A webserver using a threadpool is understandable: a single request might need to do file IO, access some locked resources, possibly make multiple database queries. Those are the kinds of things that make sense in a threadpool... things that are complex enough to be simplified by a new/dedicated "thread of execution". For everything else, a function call is the most efficient.

Monday, March 05, 2012

Out of select

Some non-blocking architectures use separate threads for IO and others as worker threads. One problem with this design is extra latency: even though a worker is done with its work, the IO thread is still blocked on select/epoll. One simple way to wake up the selector under such conditions is to open a pipe per selector thread. The read end is made part of the selector fd set, and the write end is used by the worker thread to write one byte on the pipe, which wakes up the selector. This ensures that whenever a new fd needs to be registered with select, the selector wakes up as soon as possible instead of timing out.

Wednesday, February 08, 2012

Adsense for TV ads

TV advertising is a $64 billion industry in the USA alone. And it works on TRP rating systems, which are nothing but statistical formulas applied to the TV viewing timings of a few thousand homes per country. The simplest way to disrupt this market is by controlling the remote control. A wifi-enabled remote running Android/iOS can give real-time information about who is watching which channel at what point in time. And once you have that, you already have Adsense for TV.

And that is what I think Apple and Google are going to fight for over the next few years.

What is the mystery “entertainment device” Google is testing?

Apple patents new touchscreen remote control for a future Apple TV

IntoNow, the ipad app purchased by yahoo, was a nice step in this direction using audio matching technology. I guess this whole set of social TV apps is mostly about finding out who is watching what at any point in time.

The same objective can be fulfilled by making TVs more intelligent (Samsung TV apps) and also by network-connected set-top boxes. But I feel a universal remote is a much cleaner and simpler way to go about this. If google were to crack this, they would be not only the entry point to everything we do in the browser but also the entry point to all the devices we use in our houses: when we switch lights on/off, how many times the microwave is used, how much TV we watch, etc.

It will be interesting to see how they market this and at what price points.
 

Thursday, February 02, 2012

Cacheismo learnings

I already knew lua, memory management, writing servers, and the other technical bits. I did learn the automake stuff, to make sure people can compile it.

But the best part was something else: marketing. I guess I failed miserably at that one. I don't know if anyone has downloaded the cacheismo code and tried to compile it, or if anyone is using it. I think it is one of my best works, and it is free, and I don't know what more I need to do to convince people that it is better than memcached.

I tried the following:

  • I wrote a mail on memcached group explaining cacheismo.
  • I wrote to author of highscalability.com blog. He was kind enough to include a link. 
  • I created cacheismo google group. Only my friends joined. (Thanks!). No questions so far.
  • I tried to answer some questions on stackoverflow about memcached. I looked at problems which people face but can't solve with memcached, tried answering them to the best of my knowledge, and also provided information about how they can be solved using cacheismo. Someone removed all my posts from stackoverflow :(

So I guess even if there exist people who might find cacheismo useful, it is kind of impossible for them to find it, unless of course they magically search for cacheismo on google. So the question is: what is the plan? And the answer is: nothing much. I am not actively working on cacheismo. I will be more than happy to help anyone who wants to use it. I need to solve the discovery problem, and the plan there is to keep posting to stackoverflow... until the person who deletes my posts gets tired of it. Quora is another option. And maybe some videos on youtube.

Maybe caching is not such a big problem for people and memcached is good enough. Well, in that case I will write some more servers. An HTTP proxy, something like haproxy but configurable in lua, might be fun. Or maybe a websocket server for HTML5 applications.

Wednesday, January 11, 2012

Loan or Subscription

Some things are expensive. If sold only at their price points, they would have a very small addressable market. Even if they are useful, even if they would provide more value over time than what they cost, making the choice to buy them is fairly hard.

The first solution to this problem was delayed payments: companies would allow customers to settle bills later, helping them maintain cash flow. The second solution was loans from banks, and the latest one is credit cards. All these mechanisms allow customers to have something of value now and pay for it later (hopefully while realizing the value of the thing they purchased). But the risk of not being able to realize the value stays, as the purchase is not reversible.

In my opinion the best solution to this problem is the service model: pay as you use. Many of the servers used in companies have vanished - the email server, the content management system, the svn repository, etc. We have gmail for email, google docs for content, github for svn, and all of these are on the services model.

I guess subscription helps in figuring out the "price" of something. Figuring out the lifetime value delivered by a product is hard. By forcing ourselves to think about the value of something over a month, we have a better chance of falling in a range that is acceptable to customers. The second benefit is frequency of choice: every month, we give our customers a chance to stop using us and select someone else. This has two benefits. One, it makes decision-making simple for the customer; and two, it brings focus to the company, because we need to keep "selling" ourselves, keep delighting the customer, to stay in business.

Software is not the only thing that can be sold as a service. Almost everything can fit in this scheme of things. If you are selling something expensive that provides its value over time, subscription is your best answer to the "barrier to buy". Product/service is superficial packaging over the value you provide. Package your value in the way that makes sense for your customers.

Tuesday, January 10, 2012

Life is not a game

In a game, either you win or your opponent does. In life, you may both win, you may both lose, or the usual gaming rules apply.

Relational DB and Objects


Strange observation.

DBs mature over time and adapt to new requirements much more easily than code. Code breaks from release to release and after some time needs major changes to stay in shape.

Relational DBs are more Zen-compatible - A is A and A is not A. Code is usually not so flexible. More strangely, relational dbs are less flexible (schema, everything is a table) and end up being more flexible, whereas code is flexible (extend, copy-paste, interfaces, override) and ends up being inflexible.

I wonder if we could associate code (C, Java, PHP) with tables/columns or joins and write our web applications in SQL. Immediately all the hashmaps and arrays become moot: no for/while loops, just select/where. It becomes so much easier to create as many consistent views as you like by writing new SQL queries. If queries can be named, they can be used as first-level objects for creating higher-level queries. Empty tables that only have code associated with them, but no data, can be used for providing all the decoration or access control or mappings.

I haven't seen anything like this, but I do think it is very much possible to build. With all the ORM tools and frameworks, we are almost at the point where talking to the DB is automated. I guess all I want to say is that SQL is probably a good framework for managing not just data, but also code. Let's give it a shot.

Friday, January 06, 2012

The Ambulance Lane

Hardly a day passes when I don't see an ambulance screaming on the road while I am coming back from the office. In spite of all the good intentions of the frustrated drivers, most of the time the ambulance is stuck with the rest of the traffic.

I think one way to make things a little better would be to fix, say, the leftmost or the rightmost lane as a virtual Ambulance Lane. The idea is to align people's good intentions so that instead of randomly deciding how to give way to an ambulance, it is pre-decided for them. How does it help? Let's say we decide the rightmost lane is the Ambulance Lane. If you hear an ambulance and you are in the rightmost lane, move to the left and make the right lane free. If you are not in the right lane, make space for vehicles coming from the right. If you are in the rightmost lane and you can't move to the left, honk, put on your parking lights, and jump the red light. That is it.

As long as people follow this one rule, I am sure we could help more ambulances reach hospitals on time.

Thursday, December 22, 2011

Economics of a slow judicial system

Let me go first to the non-obvious, and we will deal with the obvious later.

If it takes 10 or 20 years for a court to give a judgement, who benefits? I guess it is the lawyers. The question is: is the judiciary slow, or is it the lawyers who use the loopholes of the system to make cases last forever? If the judiciary were fast and people didn't fight much in the courts, the only people who stand to lose are the lawyers. The net income of lawyers depends on the number of concurrent cases they are handling. To maximize their income, they should make sure that none of the cases they fight reach any judgement, because that is like losing a customer, and why would a lawyer want that? Just a thought. No facts.

The obvious part is, even if you committed a crime and have enough money, lawyers will keep the judgement out of the way to let you live free.

This is another instance where demand and supply completely break down. The business depends on the idea of selling justice but not delivering it... because the moment it is delivered, no more money can be extracted from the customer.

Wednesday, November 30, 2011

Why are power outlets below the desk?

I have one pain in my life that I go through twice a day: the pain of plugging in my laptop cable in the morning and taking it out in the evening. For some reason all carpenters/designers seem to find power outlets hidden below the desk more appropriate than providing one above the desk.

The reason is that offices, or rather modern offices, were designed in the era of desktops, when it made sense to keep all the wires hidden below. Once connected, no one needed to touch them ever again. Unfortunately most of the world has moved on to laptops, and the modern office design has become a little obsolete.

Friday, October 21, 2011

Use cases for Cacheismo

Cacheismo is a scriptable in-memory Lua object/key-value cache, written in C, which works on most POSIX-compliant systems. It is based on a simple idea: exposing objects with methods instead of opaque values.

  • The simplest use case is to use cacheismo as a key-value cache, just like memcached. It supports the memcached TCP ASCII protocol. There is no need to specify or tune slab sizes, and you might be surprised by the improvement in your cache hit rate. This comes from a special memory management scheme that optimizes memory usage.  Cacheismo Memory Management 
  • The second use case for cacheismo is storing server-side state which is frequently modified. Typically this is where you would use CAS operations in memcached. Examples include counters, rate limiting, last-N users, etc. If you use memcached for storing session objects, it is essential to map users to specific machines so that session information is not corrupted by concurrent modifications to session state. With cacheismo you can keep your session object on the cacheismo server and let any app server handle any user request. Cacheismo allows creating server-side objects in the Lua scripting language which can be atomically modified by get operations on special keys called virtual keys. Cacheismo running on a single core can do at most about 80K get operations per second, and this includes running a minimal script on every get operation. Your mileage will depend upon the complexity of the script and the size of your objects.   Sliding Window Counter Quota With Cacheismo Cacheismo Lua API
  • Cacheismo also supports talking to other cacheismo servers from the scripting environment, via two functions: getFromServer and getInParallel. These can be used to create synchronous replication, proxy functionality to route requests to different servers, and in-memory parallel computations.  Cacheismo Cluster Using Cacheismo Cluster For Real Time Analytics 
  • Cacheismo also supports querying for existing keys through scripts. This can be used to create a replica of an existing system if the current system needs to be shut down for any reason. Finding Keys In Cacheismo
  • Cacheismo supports server-side consistent hashing. If you find it difficult to update all your app machines when new servers are added, you can use this feature to do seamless upgrades. This comes with a network cost: one hop to the first cacheismo server, which finds the right node, and a second to the right node for the key. 
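The two-hop routing in the last bullet relies on consistent hashing. Cacheismo's own implementation is in C; the sketch below shows the idea in Python (server names and the replica count are invented for illustration), including why adding a node only moves a small fraction of keys:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = []                      # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        # each server appears `replicas` times on the ring for smoother balance
        for i in range(self.replicas):
            self.ring.append((self._hash("%s#%d" % (server, i)), server))
        self.ring.sort()

    def node_for(self, key):
        # first ring entry clockwise from the key's hash (wrap around at the end)
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, ""))
        if idx == len(self.ring):
            idx = 0
        return self.ring[idx][1]

ring = HashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
print(ring.node_for("user:42"))
```

When a fourth server is added, only the keys whose ring segment it takes over move; the rest keep hitting their old node, which is what makes seamless upgrades possible.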

Thursday, October 20, 2011

Education for Education

Education loans are based on the same model as other loans; the difference is that the interest rates are lower than for other kinds of loans. The model makes sense, since banks are in it for money. But what if this was done by some non-profit organization whose only function is to make it easy for people to get educated?

Let's say it costs C1 at time T1 to get education E, and at some time T2 in the future the cost of the same education is C2. Let's say the "payback" amount for an education loan is simply a function of the current cost of that education. You can pay back at any point in time, after 2 years or 10 years or 20 years, or make partial payments. Basically, what you get at the start is the "cost of education" and what you pay back is also the "cost of education". Essentially I am decoupling the "cost of money", which is the interest that banks charge over time, from the "cost of education".

Under normal circumstances, this would ensure that a non-profit organization which has funds to support the education of N persons will continue to have funds to support N persons at all times in the future. When I looked up Wikipedia for the cost of education in the US, it looks like this would have been a very bad deal, given that the cost of education in the US has increased 2.5 times more than inflation (which is a good indicator of the interest rate regime). I don't know what is going on here.
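To make the decoupling concrete, here is a back-of-the-envelope comparison in Python. All the numbers are hypothetical (an assumed 8% loan rate versus education costs growing 10% a year); in such a regime the cost-indexed payback exceeds the interest-based one, which is exactly the US situation described above:

```python
# Hypothetical numbers, purely for illustration.
c1 = 100_000                 # cost of education at time T1
years = 10
interest = 0.08              # assumed education-loan interest rate
education_inflation = 0.10   # cost of education growing faster than money

# Conventional loan: payback tracks the cost of money (compound interest).
loan_payback = c1 * (1 + interest) ** years

# Proposed model: payback is the *current* cost of the same education,
# which is what keeps the fund able to educate N people forever.
indexed_payback = c1 * (1 + education_inflation) ** years

print(round(loan_payback))
print(round(indexed_payback))
```

Whenever education costs outpace interest rates, the bank's deal is actually cheaper for the student, and the non-profit's fund shrinks in "educations" even though it grows in money.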

The economic feedback loop is pretty big here.
 => High cost of education
     => Student debt
         => Unable to pay
             => Economic instability or political unrest
                 => Depression?
     => Choose not to go for higher education
         => Fewer qualified people for the job
             => Higher salaries for qualified people
                 => Now debt for higher education is justified
     => Universities make losses, fewer students enroll
         => Universities decrease prices
             => Stability - cost of education is justified

It takes years before we know the mistakes and rectify them. I keep coming back to the same point: money is a bad abstraction. We need a better model for making people help other people now, in the belief that they will be helped when needed in the future.

Index Based Salaries

Prices change. Some change every day, others maybe once a month. Salaries are fixed, unless the company decides to change them.

When the government decides to increase petrol prices, the price of almost everything increases, because almost every business depends on transportation. But salaries don't change. When SBI increases the cash reserve ratio or the base rate, EMIs increase and production decreases, but salaries don't change.

It is kind of unfair that everyone in business can increase/decrease prices based either on market conditions or on changes in the cost of doing business, but salaried people don't have that choice. The only choices they have are: talk to your manager, or find a new job.

What if salary was a function? Say Hari Sadu has a home loan, drives a car and has children studying in school. If his salary was linked to his home loan rate, the cost of petrol and the cost of education, he would be a very happy employee. Just like businesses can adjust their prices, if salaries could be adjusted the same way, it would create a much faster feedback loop for the economy to adjust to new conditions. If a business decides that it cannot support the salary function of some employee and wants to decrease his salary, it will have to come out clean and open about it. The current system simply says things are the same (salary stays the same), but actually the employee is taking a hit, and it stays unsaid, unacknowledged. A 10% salary raise at the yearly performance evaluation might be less than inflation, or much, much less than the "impact of inflation" on a given employee. In this new system, a raise is a raise: it gives the employee more spending power.

This is obviously a bad move for companies because it makes one of the fixed costs of business variable. But what it also does is bring transparency. Just as businesses understand and adjust to the prices of other businesses, it makes sense to understand and adjust to the impact of prices on your employees. If a home loan rate increase by HDFC is eating into the margins of Wipro or Infosys, they are in a better position to negotiate with HDFC than each employee on his own, and then maybe HDFC will figure out a deal where Wipro or Infosys charges less for software. I guess all I am talking about is businesses being more people-aware than money-aware, because money in itself can do nothing; only people can.
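As a sketch, such a salary function could be a base salary adjusted by a weighted basket of the price indices that matter to that particular employee. The weights and numbers below are invented purely for illustration:

```python
# "Salary as a function" of price indices - illustrative numbers only.
base_salary = 50_000

# How strongly this employee's cost of living tracks each index
# (a home-loan-heavy employee would weight home_loan_rate higher).
weights = {"petrol": 0.10, "home_loan_rate": 0.25, "school_fees": 0.15}

def salary(indices):
    """indices maps each name to its value relative to the date the
    salary was agreed: 1.0 means unchanged, 1.2 means up 20%."""
    adjustment = sum(weights[name] * (indices[name] - 1.0)
                     for name in weights)
    return base_salary * (1.0 + adjustment)

# Petrol up 20%, home loan rate up 10%, school fees unchanged:
print(round(salary({"petrol": 1.2, "home_loan_rate": 1.1, "school_fees": 1.0})))
```

The point is not the particular formula but that the adjustment is explicit: if the company wants to drop a term from the basket, it has to say so, instead of letting inflation erode the salary silently.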

Tuesday, October 18, 2011

Finding Keys In Cacheismo

In the last post I talked about how to do map-reduce-like stuff with cacheismo. It occurred to me that there was no method to get all the keys. The global hashmap object now supports a new method getPrefixMatchingKeys which, as the name implies, returns the keys matching a certain prefix. Typically objects are stored with the template objectType$objectKey. Thus an object created by the file quota.lua with key myKey will be stored under the key quota$myKey. To do something with all the quota objects:
   
local keys = getHashMap():getPrefixMatchingKeys("quota$")
for k,v in pairs(keys) do 
   print(v)
end

If you would like the quota object itself to support, say, a getAllKeys method, add this to quota.lua in the scripts directory.



function getAllKeys()
    local keys   = getHashMap():getPrefixMatchingKeys("quota$")
    local result = ""
    for k,v in pairs(keys) do
        -- the concatenation must be assigned back to result
        result = result..v.."\n"
    end
    return result
end


Now a get request with the key quota:getAllKeys will return a newline-separated list of all active quota objects in the map.

This is good enough, but probably not very interesting. I am planning to support indexes as first-class objects, to make it fast to reach interesting objects. These indexes will automatically remove objects which are deleted or forced to expire because of LRU caching. So if you need to find all the quota objects that are close to their quota limit, create an index on the quota-used value.
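The planned indexes could look something like the following sketch (Python, purely illustrative, not cacheismo's API): a sorted structure keyed on a numeric value such as quota used, with entries removed whenever the underlying object is deleted or expires:

```python
import bisect

class ValueIndex:
    """Index over objects by a numeric value (e.g. quota used),
    kept sorted so objects close to a limit can be found quickly."""
    def __init__(self):
        self.entries = []   # sorted list of (value, key)
        self.current = {}   # key -> value, so delete/expiry can remove entries

    def update(self, key, value):
        self.remove(key)                      # drop any stale entry first
        bisect.insort(self.entries, (value, key))
        self.current[key] = value

    def remove(self, key):
        # called on delete or LRU expiry of the underlying object
        old = self.current.pop(key, None)
        if old is not None:
            self.entries.remove((old, key))

    def at_least(self, threshold):
        # all keys whose value >= threshold, in ascending value order
        i = bisect.bisect_left(self.entries, (threshold, ""))
        return [k for _, k in self.entries[i:]]

idx = ValueIndex()
idx.update("quota$alice", 90)
idx.update("quota$bob", 40)
print(idx.at_least(80))
```

A real implementation inside the cache would hook update/remove into the object lifecycle rather than relying on scripts to call them.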

Thursday, October 13, 2011

Using Cacheismo Cluster for Real Time Analytics


Cacheismo now supports invoking requests on other servers.
  - getFromServer
    This function takes a server name and a key as arguments and returns the result to the script.
  - getInParallel
    This function takes a map of server names to lists of keys and gets the values for all the keys from all the servers in parallel. Once all the results are received, the script gets a map of server names to maps of keys and received values.

Here is a simple example which gets the top accessed keys from the cacheismo cluster using a simplified, framework-less map-reduce. The contents below should go in a file named mapreduce.lua.

-- function called on every cacheismo node to get 
-- the top keys by access count
-- virtual key : mapreduce:topKeys:


function topKeys(count) 
    local keys = accessCounter:getKeyAccessCount()
    -- sort keys by access count
    table.sort(keys)
    local result = {}
    for i = 1, count do 
       result[i] = keys[i]
    end
    return result
end


-- function called on one of the cacheismo nodes 
-- which distributes work to all other nodes using getInParallel
-- virtual key : mapreduce:manageGetTop:10 


function manageGetTop(count) 
     local query = {}
     for k,v in pairs(servers) do 
        query[k] = "mapreduce:topKeys:"..count
     end
     local results = getInParallel(query)
     local total = {}
     for k,v in pairs(results) do 
           for k1,v1 in pairs(v) do 
              total[k1] = v1
           end
     end
     table.sort(total)
     local topkeys = {}
     for i = 1, count do 
        topkeys[i] = total[i]
     end
     return topkeys
end


I have taken some liberties with the syntax, and there is no accessCounter object by default in cacheismo, but one is fairly easy to create in a script. Note that the above implementation doesn't have any intermediate nodes, but that is trivial to add by calling get for mapreduce:manageGetTop instead of mapreduce:topKeys.

A single cacheismo server might be serving thousands of such queries, because everything happens in a non-blocking manner while the user gets the illusion of a synchronous stack through Lua coroutines.

Links:
Cacheismo Introduction
Cacheismo Lua API
Cacheismo Sliding Window Counter Implementation in Lua
Cacheismo Quota Implementation in Lua
Cacheismo Memory Allocation
Cacheismo Source Code
Cacheismo Support

Tuesday, October 11, 2011

Multi-threaded vs Multi-process

Cacheismo doesn't use threads. Neither does redis. Memcached is multi-threaded. How to choose?
Multi-threaded applications are hard to code and understand. Non-blocking multi-threaded applications are close to a nightmare. I had the honor of writing a multi-threaded non-blocking HTTP proxy @apigee. It was written in C, had JNI bindings with Java, and could talk to our custom kernel interface using shared memory (to avoid kernel-space to user-space data copies). Having done that, I chose not to use threads in cacheismo.

The scalability of a multi-threaded application across multiple cores is simply a function of how much work can be done in parallel. The fewer the conflicts between the threads, the better the performance. The first problem is distribution of work among the threads. If multiple threads are working concurrently on the same piece of data, it is hard for them to make progress. Memcached uses a single selector thread to bind connections to different threads, which is not optimal: it becomes a bottleneck when thousands of clients try to connect to the server at the same time. Maybe all connections of one thread are sleeping while all connections of some other thread are killing the CPU; even though there is work to do, some threads sit idle because the work doesn't belong to them. This is hardly possible with memcached (it barely uses CPU), but very much possible with cacheismo running user scripts. Now consider the hashmap itself: one hashmap, multiple threads, an unavoidable lock. A possible solution here is striping. What about slabs? Possibilities include per-thread slabs, some form of thread-local caching, or per-slab locks instead of one lock over the whole allocator.
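Lock striping, mentioned above, is easier to see in code. Here is a minimal sketch in Python (not cacheismo code): the map is split into stripes, each guarded by its own lock, so threads operating on keys in different stripes don't contend:

```python
import threading

class StripedMap:
    """Hash map with lock striping: one lock per stripe instead of a
    single global lock, reducing contention between threads."""
    def __init__(self, stripes=16):
        self.stripes = stripes
        self.locks = [threading.Lock() for _ in range(stripes)]
        self.maps = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # which stripe (and therefore which lock) owns this key
        return hash(key) % self.stripes

    def put(self, key, value):
        i = self._stripe(key)
        with self.locks[i]:
            self.maps[i][key] = value

    def get(self, key):
        i = self._stripe(key)
        with self.locks[i]:
            return self.maps[i].get(key)

m = StripedMap()
m.put("user:1", "alice")
print(m.get("user:1"))
```

This is exactly the "finer granularity" trade-off described next: the data structure gets more complex, and operations spanning stripes (resizing, iteration) now need extra coordination.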

In short, to make multi-threaded code work at optimal levels, we have to look at our objects at a finer granularity than we originally planned. We have to make our design sub-optimal to make sure threads make progress, even at the cost of under-utilization of resources.
The multi-process model works the opposite way. Consider your complete code/process as an object and create multiples of it. Have some way to distribute requests to these individual objects/processes. In many cases this is possible, specifically when either the clients have knowledge of the servers (consistent hashing) or some intelligent proxy can do it for us (HTTP load balancer). Even this can be non-optimal, but at least it is simple. What else? Memory leaks are easier to manage in a multi-process model: just restart the process. Since memory is managed at the process level, it can easily be reclaimed by killing the process, and if we have multiple of them, the service is not disrupted for every user; in a multi-threaded model, a restart impacts all users. No locks are needed in multi-process code, unless we use shared memory. The process boundary gives exclusive access to memory, which needs no further management.

I don't see multi-process code as in any way inferior to multi-threaded code. It is simple to code, simple to debug and simple to maintain. The other beauty is that if, instead of running 1 process with n threads, you run n processes with a single thread each, a crash in the first case causes full system failure, whereas in the second case you still have (n-1)/n of the system up. It is easy to upgrade things one at a time because we have more than one of them. With a single server, there is no option but to shut it down, upgrade and start.


The only problem I see with multi-process code is duplication of basic data structures and configuration. For example, running each request in a separate Java process in Tomcat or JBoss would be a really bad idea: there is simply too much shared state - DB connection pool, byte code, runtime native code, session information, etc. The other problem is a bit superficial: instead of managing one process, we have to manage multiple of them (use scripts?).
If your solution is either stateless or can be partitioned such that there is no interdependence, multi-process will make more sense. It is a difficult choice, but just don't use threads if you can survive without them, especially with non-blocking code. Try cacheismo and see for yourself.

Tuesday, October 04, 2011

Cacheismo Cluster Update 2

Today I could set a value on server 1 and get it back from server 2. The script on server 2 was changed a bit.


function handleGET(command, key) 
    local cacheItem = getHashMap():get(key)
    if (cacheItem ~= nil) then
        command:writeCacheItem(cacheItem)
        cacheItem:delete()
    else 
        local result = command:getFromExternalServer("127.0.0.1:11212", key)
        if (result ~= nil) then
            writeStringAsValue(command, key, result)
        end 
    end
end

This is where I feel Lua beats JavaScript, because of its excellent coroutine support. I could easily change the above code to a while loop that goes through all the servers in the cluster looking for a value, and it would not change the complexity of the code a bit; try doing that with JavaScript callbacks and it would look like a mess.

I am thinking now that the original idea of using a dataKey and a virtualKey is a bit lame. What I will do instead is make the server an explicit argument, just like above. For consistent hashing we will have an object exposed in Lua which can be used to find the correct server, but it will be optional to use. This gives users the flexibility to use some cacheismo instances as pure proxies and others as caches. Some of the servers can be part of one consistent hashing group and others can be part of another. If for some reason users want to use one of the servers as a naming server to map keys to servers, they can use it that way. Essentially users have full flexibility with respect to how to map keys to servers - consistent hashing, multiple servers hidden behind a proxy (cacheismo), a naming server to map keys to servers, or any combination of these.

Other possibilities include replication and quorum. These can be accomplished with the current API itself, but I will add parallel get support to make them as fast as possible.

Friday, September 30, 2011

Cacheismo Lua API


The scripting capabilities of cacheismo are very powerful. The code for handling commands like get/set/add etc. is completely written in Lua using the Lua interface. Here is the code for handling the SET command:


local function handleSET(command) 
  local hashMap   = getHashMap()   
  local cacheItem = hashMap:get(command:getKey())
  if (cacheItem ~= nil) then 
      hashMap:delete(command:getKey())
      cacheItem:delete()
      cacheItem = nil
  end 
  cacheItem = command:newCacheItem()
  if (cacheItem ~= nil) then 
      hashMap:put(cacheItem)
      command:writeString("STORED\r\n")
  else 
     command:writeString("SERVER_ERROR Not Enough Memory\r\n")
  end
  return 0
end



This is the list of the Lua methods which can be called by new scripts.

getHashMap()
- Global method, return instance of the global hashmap object.

setLogLevel(level)
- Sets the logging level. Four levels are defined: DEBUG=0, INFO=1,
  WARN=2 and ERR=3. The function takes a numeric argument.

Table 
The standard table object in Lua is extended to support the following methods

marshal(table) 
- returns a string object which represents the serialized table.

unmarshal(serializedTable)
- returns a fully constructed table from the given serialized table string

clone(table) 
- returns a deep copy of the given table


HashMap
The global hashmap object retrieved from getHashMap() supports following methods

get(key) 
- returns a cacheItem object for the given key if found, else nil

put(cacheItem) 
- takes a cacheItem object and puts it in the hashMap
     
delete(key) 
- deletes the cacheItem object associated with the given key from the hashMap.

deleteLRU(requiredSpace) 
- deletes as many LRU objects as required to free up at least requiredSpace.


CacheItem
The data in the hashMap is stored in the form of cacheItem objects. We need a
cacheItem object to store anything in the map, and a cacheItem is what we get
from the map when we do a get. The object supports the following methods. These
objects are read-only.

getKey()
- returns the key associated with the cacheItem

getKeySize()
- returns the number of bytes in the key

getExpiryTime()
- returns the time in seconds since epoch when the current item will expire.

getFlags()
- returns the flags associated with the cacheItem

getDataSize()
- returns the size of the data stored in this cacheItem

getData()
- returns the data stored in the cacheItem as a Lua string. This is normally not
  used because Lua scripts don't need access to the actual data unless using virtual keys.

delete()
- deletes the reference to the cacheItem object. Important to call this after get
from hashMap, after using the cacheItem.


Command
This represents the request coming from the client. This can be modified if
required by the script.

getCommand()
- returns the memcached command being used.
  One of get, add, set, replace, prepend, append, cas, incr, decr, gets, delete,
  stats, flush_all, version, quit or verbosity
     
getKey()
- returns the key used in the command. This is nil in the case of multi-get and
  other commands which don't take a key as an argument.

setKey(newKey)
- sets the given string newKey as the key for the command object.

getKeySize()
- returns the size in bytes for the key

getExpiryTime()
- returns the expiryTime specified in the command.

setExpiryTime(newTime)
- sets the expiry time to new value newTime.

getFlags()
- returns the flags specified in the command. It is also used by the verbosity
  command to return the logging level.
 
setFlags(newFlags)
- sets the flags in the command object to the new flags.

getDelta()
- returns the delta value used in incr/decr commands

setDelta(newDelta)
- sets the delta value to new delta value

getNoReply()
- returns the boolean noReply from the command.

setNoReply(newNoReply)
- sets the value for the noReply

getDataSize()
- returns the size of data in the command, in bytes. This works for set, add, etc.

setData(newData)
- replaces the data in the command object with the newData string

getData()
- returns the data in the command object as lua string

newCacheItem()
- creates and returns a cacheItem object using the data in the command object.
  This is used with set/add/etc

writeCacheItem(cacheItem)
- writes the cacheItem in the memcached response format on the client socket.
   "VALUE key flags size\r\ndata\r\n"
  cas is not supported.

writeString(dataString)
- writes arbitrary string to the client socket. This is useful for writing
  "END\r\n", "STORED" and other standard memcached response strings.

hasMultipleKeys()
- returns the number of multi-get keys in the command

getMultipleKeys()
- returns a table of all the keys in the multi-get command

Apart from these, few helper methods defined in config.lua are also useful.

writeStringAsValue(command, key, value) 
- writes an arbitrary string in the "VALUE key flags size\r\ndata\r\n" format.
  flags is 0 and size is calculated using string.len


executeReadOnly(command, originalKey, objectType, cacheKey, func, ...) 
- helper function for read-only operations on lua tables stored in the cache.
  It retrieves the data from the cache, uses table.unmarshal to construct the
  lua table and calls the function func with this lua table as the first
  argument. Any extra args passed to this function are passed on to func.


executeReadWrite(command, originalKey, objectType, cacheKey, func, ...) 
- helper function for read/write operations on lua tables stored in the cache.
  The original object is deleted and a new object is created with the same key
  with the latest data values.
 

executeNew(command, originalKey, objectType, cacheKey, func, ...)
- helper function for creation of new objects based on lua tables. If the key
  is in use, it is deleted before creating the new object.

See set.lua, map.lua, quota.lua and swcounter.lua for example usage.


Cacheismo sliding window counter

Sliding window counters are useful for maintaining information about the last n transactions. One example could be a site like Pingdom storing the last n ping times for a site. Cacheismo comes with a default sliding window counter example which can be modified to suit your needs. The current implementation supports new, add, getlast, getmin, getmax and getavg.

Note that these are memcached get requests with special keys encoding our intent. The keys follow this syntax:
<ObjectType>:<Method>:<ObjectKey>:<Arg1>:<Arg2>

The object type is "swcounter". Cacheismo has other object types as well, and more can be added by adding new Lua scripts. The methods for the swcounter object are "new", "add", "getmin", "getmax", "getavg" and "getlast". The object key we are using is "myCounter"; you need one key per counter. The number of arguments varies with the method: here new takes 1 arg (number of values to store) and add takes 1 arg (value to store). The rest of the methods take no arguments.
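For clarity, here is how such a virtual key splits into its parts, sketched in Python purely for illustration (cacheismo itself does this dispatch inside its Lua scripts):

```python
def parse_virtual_key(key):
    """Split a virtual key of the form
    <ObjectType>:<Method>:<ObjectKey>:<Arg1>:<Arg2>... into its parts."""
    parts = key.split(":")
    if len(parts) < 3:
        raise ValueError("not a virtual key: " + key)
    object_type, method, object_key = parts[0], parts[1], parts[2]
    args = parts[3:]          # zero or more method arguments
    return object_type, method, object_key, args

# e.g. the "new" request from the session below:
print(parse_virtual_key("swcounter:new:myCounter:2"))
```

The dispatcher then looks up the script for object_type and calls the named method on the object stored under object_key, passing along the args.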

get swcounter:new:myCounter:2
VALUE  swcounter:new:myCounter:2 0 8 
CREATED 
-- Create a new counter which stores the last 2 values
get swcounter:add:myCounter:100
VALUE  swcounter:add:myCounter:100 0 7 
SUCCESS 
--store 100 as first value 
get swcounter:add:myCounter:50
VALUE  swcounter:add:myCounter:50 0 7 
SUCCESS 
-- store 50 as second value
get swcounter:getmin:myCounter
VALUE  swcounter:getmin:myCounter 0 2
50
-- get the min entry among all the stored values
get swcounter:getmax:myCounter
VALUE  swcounter:getmax:myCounter 0 3
100
-- get the max entry among all the stored values
get swcounter:getavg:myCounter
VALUE  swcounter:getavg:myCounter 0 2
75
-- get the average value for all the stored values

The complete implementation in lua looks like this.
====================================================
-- sliding window counter 
local function new(windowsize) 
  local  object = {}
  if (windowsize == nil) then
     windowsize = "16"
  end
  object.size   = tonumber(windowsize)
  object.index  = 0
  object.values = {}
  local count   = 0
  for count = 1,object.size do 
      object.values[count] = 0
  end
  return object
end

local function add(object, value) 
   object.index = object.index+1
   if (object.index > object.size) then 
       object.index = 1 
   end 
   object.values[object.index] = tonumber (value) 
end

local function getlast(object)
   if (object.index ~= 0) then
       return object.values[object.index]
   end
   return 0
end

local function getmin(object)
   local count = 0
   local min   = math.huge
   for count = 1,object.size do
       if (object.values[count] < min) then
          min = object.values[count]
       end 
   end
   return min
end

local function getmax(object)
   local count = 0
   local max   = 0
   for count = 1,object.size do
       if (object.values[count] > max) then
          max = object.values[count]
       end 
   end
   return max
end

local function getavg(object)
   local count = 0
   local total   = 0
   for count = 1, object.size do
        total = total + object.values[count]
   end
   return (total/object.size)
end
=============================================================