Tuesday, March 06, 2012

Threadpool and the task queue

Every architecture makes room for a threadpool and a task queue. Multiple threads wait on the queue, always ready to pick up the next task and execute it. Once it is implemented, the next job is tuning it. How many threads? What is the size of the queue? Does a full queue block or throw an error? Is there a retry handler?
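To make those tuning knobs concrete, here is a minimal sketch using the JDK's ThreadPoolExecutor. The class name PoolTuningSketch and its helper methods are made up for illustration. It assumes a fixed thread count, a bounded ArrayBlockingQueue and the "throw an error on full" choice (AbortPolicy). It is one possible setup, not a recommendation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only.
class PoolTuningSketch {

    // How many threads? What is the size of the queue? Both are constructor arguments.
    static ThreadPoolExecutor newPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(
                threads, threads,                      // core and max pool size
                0L, TimeUnit.MILLISECONDS,             // keep-alive for extra threads (none here)
                new ArrayBlockingQueue<>(queueSize),   // bounded task queue
                // AbortPolicy throws RejectedExecutionException when the queue is full.
                // ThreadPoolExecutor.CallerRunsPolicy would run the task on the
                // submitting thread instead.
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Submits `tasks` tasks that all block until released, and counts how many
    // submissions the full queue turns away.
    static int countRejected(int threads, int queueSize, int tasks) {
        ThreadPoolExecutor pool = newPool(threads, queueSize);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> awaitQuietly(release));
                } catch (RejectedExecutionException e) {
                    rejected++; // a retry handler would go here
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + countRejected(2, 3, 10));
    }
}
```

With 2 threads and a queue of 3, only 5 blocked tasks fit (2 running + 3 queued), so the other 5 of the 10 submissions are rejected. Every one of those numbers is something you now have to pick and defend.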

Before you start worrying about this, ask a simple question. How much time does it take to execute the task? If it is not at least a couple of orders of magnitude greater than the time it takes to do a context switch, don't bother with the threadpool/queue; just execute it right there, on your current thread.

Here is why:
  • The task queue has a lock. The more threads there are and the more often the queue is accessed, the more contention there is and the longer it takes to submit a task. There may be an extra context switch just to acquire the lock. Basically you are serializing before you get to any parallelism. More threads + more tasks => more time per insert. Think of it like talking to a customer care executive (CCE). You do lots of IO through the IVR and finally reach the CCE, and the guy, instead of answering your question, connects you to another guy and you need to explain the problem once again. That is pretty much how a context switch works. If you need to talk for 10-20 minutes, it might be worth it, but if all it takes is a few seconds of conversation, it just wastes time.
  • Once the task is submitted, it needs to wake up some thread. That is a context switch, and it costs time.
  • By the time this new thread wakes up, between the lock and the time elapsed, most of the variables it needs are out of cache...more time. Read up on lock semantics for the JVM.
  • How do you handle errors from the task? Extra code, extra states.
You can avoid all this by executing the task inline....a normal function call. It will run faster. It is easier to write and debug. The assumption here is that the task really takes a short time to execute and is mostly CPU intensive. A webserver using a threadpool is understandable. A single request might need to do file IO, access some locked resources, possibly make multiple database queries. These are the kind of things that make sense in a threadpool...things that are complex enough to be simplified by using a new/dedicated "thread of execution". For other things, a function call is the most efficient.
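If you want to see the difference on your own machine, here is a rough sketch that runs the same tiny CPU-bound task inline and through a fixed threadpool. The class InlineVsPoolSketch and the task tinyTask are invented for this example, and the timing in main is a crude System.nanoTime measurement, not a proper benchmark, so treat the numbers as a hint only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical comparison class, for illustration only.
class InlineVsPoolSketch {

    // A made-up task that takes far less time than a context switch.
    static int tinyTask(int x) {
        return x * 31 + 7;
    }

    // Inline: a normal function call. Errors are ordinary exceptions on this thread.
    static long sumInline(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += tinyTask(i);
        }
        return sum;
    }

    // Via the pool: every call goes through the queue, a worker thread and a Future.
    static long sumViaPool(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                final int x = i;
                Callable<Integer> task = () -> tinyTask(x);
                futures.add(pool.submit(task));
            }
            long sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get(); // waits for the worker to finish this task
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // The extra error-handling state: the task's failure arrives wrapped.
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int n = 200_000;
        long t0 = System.nanoTime();
        long inline = sumInline(n);
        long t1 = System.nanoTime();
        long pooled = sumViaPool(n, 4);
        long t2 = System.nanoTime();
        System.out.printf("inline: sum=%d, %d us%n", inline, (t1 - t0) / 1_000);
        System.out.printf("pool:   sum=%d, %d us%n", pooled, (t2 - t1) / 1_000);
    }
}
```

Both versions compute the same sum. The pooled one just pays for the queue, the wakeups and the Future plumbing on every call, and note how much of sumViaPool is error handling rather than work.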
