Saturday, March 17, 2012

Frequency of Choice

I have talked about this earlier also, but the topic is so close to my heart that I wanted to have a dedicated post.

As far as I understand it, the crux of capitalism is choice. In economics the word choice is substituted by market. What is market? Market is where consumers exercise choice. If consumers don't have a choice it is not a market. The assumption is capitalism thrives on competition.  Competition creates choice. Consumers will choose the best products at lowest prices forcing companies to innovate and reduce prices. The best will survive.

This is all true and then not quite true. Two main problems:

  • Most people don't like to think. Even  if they can, the complexity of world is sufficiently high that figuring out what is best for them is close to impossible. Eventually it is either brands or price because they make the decision simple. 
  • Frequency of choice. Since this is all I want to talk about, I will use the next paragraph.
We are good at stuff we do often, the old practice makes the man perfect thing. We buy petrol, vegetables, groceries, etc almost every day. Prices changes are felt, drop in quality is noticed. But then their are things that we don't do often. Things like joining new job, getting married, buying car or home, taking a loan, choosing college, getting home painted, buying TV or refrigerator or AC, casting our vote, choosing a laptop or OS, choosing email client, signing up on a social network, etc.  Many of these choices are irreversible or if not irreversible then choosing the alter our choice is very expensive.  This is where capitalism fails miserably because it is no longer about choosing from alternatives but the choice of altering our choice. For products with short life spans like vegetables or toothpaste altering a choice is not expensive. Vegetables will last few days, toothpaste few weeks and you can choose better product next time, but with products that last years or decades or in some cases lifetimes, it is the altering of choice which is required not choice among products. 

Consider home loan business in India. Floating rates have been around for long time now.  What do they float on is unknown and once you take the loan you realize that the "unknown" is whim of the Bank. Usually your floating home loan interest rate will increase by 20%-40% within few months of taking the loan and now their is no choice.  Well their is a choice to switch to other home loan, but only if you pay 2%-4% of your home loan value as switching charges.  This is as monopolistic as it gets and we call it capitalism, the mecca of markets and choice.  Even banks don't know if they are giving a good/bad interest rate to the customer, then how can customer decide if he is getting a good deal and that deal is good enough for the next 20 years. No one can. The only way I can know if I getting a good deal is if I can switch my home loan any moment I desire to switch. That is what will make it a market.

The same happens when switching a job (notice period), casting a vote (5 years gap), buying a car (10% value drop when it get out of the showroom) and at many other places. In computers, the advent of SAAS based companies have started filling this gap by providing monthly choice to the customers to continue to use them which once was a difficult choice of finding the best product. Amazon EC2 gives choice to use machines by hour and OS by hour. I think governments which call themselves followers of capitalism have missed a point.  It is not the choice alone that matters, it is the frequency of the choice that is at the core of efficient markets.  

Tuesday, March 06, 2012

Threadpool and the task queue

Every architecture makes way for threadpool and a task queue.  Multiple thread wait on the queue always ready to pick up the next task and execute them. Once implemented, the next task is tuning it. How many threads? What is the size of the queue? Blocking queue or throws error on full? Retry handler?

Before you start worrying about this ask a simple question. How much time does it takes to execute the task? If it is not at least couple of orders of magnitude greater than time it takes to do context switch, don't bother about the threadpool/queue, just execute it right there, on your current thread.

Here is why?
  • Task queue has a lock. More threads and more often it is accessed, more contention, more time to submit the task. Extra context switch just to acquire the lock. Basically you are doing serialization before getting to parallelism here. More threads + more tasks => more time per insert. Think of it like talking to a customer care executive(CCE). You do lots of IO using IVR and finally reach the CCE and the guy instead of answering your questions connects you to another guy and you need to explain the problem once again. That is pretty much how context switch works. If you need to talk for 10-20 minutes, it might be worth it, but if all it takes is few seconds of conversation, it just wastes time.   
  • Once the task is submitted, it need to wakeup some thread. That is context switch, costs time.
  • By the time this new thread wakes up because of lock and time elapsed most of the variable it needs are out of cache...more time. Read lock semantics for JVM. 
  • How do you do error handling from the task? Extra code, extra states.
You can avoid all this by executing the task inline....normal function call. It will run faster.  It is easy to write/debug. The assumption here is that the task really takes short time to execute and it mostly cpu intensive. Webserver using threadpool is understandable. Single request might need to do file IO, access some locked resources, possibly make multiple database queries. These are kind of things that make sense in threadpool...things that are complex enough to be simplified by using  a new/dedicated "thread of execution".  For other things, function call is the most efficient.  

Monday, March 05, 2012

Out of select

Some non-blocking architectures use separate threads for IO and others as worker threads. One problem with this design is extra latency because even though the worker is done with its work, the IO thread is blocked on select/epoll. One simple way to wake up selector under such conditions is to open a pipe per selector thread. The read end is made part of selector fd set and the write end is used by the worker thread to write one byte on the pipe which wakes up the selector. This ensures that whenever new fd needs to be registered with select, it wakes up as soon as possible instead of timing out on read timeout.