Monday, May 14, 2018

Counting Representation

Democracy is great, and so is the idea of representative government. Where it starts to hurt is that the majority wins. By definition, a majority represents only the majority. Even that is questionable. The winner does represent a majority when there are just two parties. But with more than two, we can be fairly sure that the majority is not represented. For example, in a closely contested three-party election, 34% of the votes can be enough to win, which excludes the other 66% (the real majority). Essentially, a majority of votes does not translate into a majority in representation.

Surprisingly, it is easy to fix. All we need to change is what we count. The basic idea is to add the dimension of time to representation. Instead of the crude approximation of majority, we can have complete representation for everyone over time. As the saying popular in management circles goes, you only get what you measure.

How do we measure representation? As we discussed earlier, votes don't translate to majority representation. Well, some of the votes do translate to representation: the votes cast for the winning party. The votes that are not cast, and the votes cast for a party that did not win, are useless. They don't count toward representation.

Without much ado, here are the rules of the game:

  1. Votes which result in representation are "spent"
  2. Votes that don't get representation are "not spent"
  3. Votes that are not cast are "not spent"
In other words, votes can be stored and used across elections. Each vote is worth 5 years of representation. Votes work just like currency, as a store of "political value". For example, say we have a two-party system with 100 people, and party A wins the election with 51 votes. These 51 votes are counted as spent. The remaining 49 people get to keep their unrepresented votes for later use. In the next election, those 49 people will have 2 votes each and the other 51 will have one vote each. It is easy to see that with 98 votes, these 49 people will easily be able to get their representation.
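The 100-person example can be checked with a few lines of Python. This is only a sketch under some assumptions of mine that the rules above do not spell out: every voter casts all of their stored votes, nobody abstains, everyone receives one fresh vote before each election, and ties go to whichever party was tallied first. The names hold_election and grant_new_votes are made up for this illustration.

```python
from collections import Counter

def hold_election(balances, party_of):
    """Tally all stored votes, pick a winner, and mark the winners' votes as spent."""
    tally = Counter()
    for voter, votes in balances.items():
        tally[party_of[voter]] += votes
    # Assumption: ties are broken in favour of the party tallied first.
    winner = max(tally, key=tally.get)
    for voter in balances:
        if party_of[voter] == winner:
            balances[voter] = 0  # rule 1: votes that got representation are spent
        # rules 2 and 3: everyone else keeps their stored votes
    return winner, dict(tally)

def grant_new_votes(balances):
    """Assumption: each voter receives one new vote before every election."""
    for voter in balances:
        balances[voter] += 1

# 100 voters: 51 support party A, 49 support party B, one vote each to start.
party_of = {i: ("A" if i < 51 else "B") for i in range(100)}
balances = {i: 1 for i in range(100)}

first_winner, first_tally = hold_election(balances, party_of)
grant_new_votes(balances)
second_winner, second_tally = hold_election(balances, party_of)

print(first_winner, first_tally)
print(second_winner, second_tally)
```

In the second election the 49 supporters of B carry 98 votes against 51 for A, which matches the numbers in the example.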

Given 65 years of life expectancy and 18 years as the voting age, people get to vote around 10 times in their life. If the number of political parties is less than or equal to 10, even some 10% of the population will get a chance to form a government during their lifetime. To make this work with an even larger number of parties, we will need to either have more frequent elections or allow inheritance of "political value", just like property and money. This makes it possible for any arbitrary group of people to eventually form a government, perhaps once in a few hundred years.

It is easy to see that this method of counting representation leads to representation of all people over time. It will probably help stop the madness around "winning" elections and all that goes into it. Why? Because if you win, you make it easy for others to win the next time. If you lose, your chances of success increase over time.

From what I could reason about, the system has two equilibria. One: if people are cleanly partitioned into groups, over time each group gets representation and the world becomes fair in terms of representation. If the groups are greedy, they will choose policies which advance their own group. Sadly, the same strategy will be used by each of the other groups. Sad, but fair.

The second equilibrium would be towards enlarging these groups. Essentially, if two groups can sort out their differences and work towards what is common and important for them, we get one less group and, hopefully, policies which work towards the welfare of all the people in this larger group. Applying the same logic recursively will bring us to some manageable number of groups, or even just one. We might even see groups splitting when they cannot reconcile their differences, but that still leaves the process fair, just and representative.

No matter which way the wind blows, we can always be sure that everyone is getting represented over time. If nothing else, it reduces the cost of running political parties and hopefully that is the money which can be used for advancement of the country and providing public goods. 

Criticism is most welcome!  

Saturday, July 29, 2017

On eventually consistent file listing


Cheap S3 storage comes with an unusual cost: correctness. One of the key problems when working with highly partitioned data stored in S3 is eventual consistency in file listing. In this post we will discuss what exactly this problem is and how we can think about mitigating its impact.

File listing has been a very common operation since the invention of files. Given a directory or path, it gives us the list of files under that path. Tons of lines of code written over file systems depend on the correctness of this operation. Unfortunately, this assumption breaks when the listing is done on S3 or, for that matter, any blob store.

Deconstructing file listing 

One way to think about eventual consistency of file listing is to argue that we get a wrong answer. This is correct to some extent but not powerful enough to do something about it. To do something about it, we need to dig a bit deeper and understand the nature of this wrongness. I find it useful to characterise this in the following form:
  • Ghost files 
  • Conceived files 
Let's try to understand what they mean. Ghost files are files which are returned by the listing operation but have actually been deleted from the file system. This is a very common reason for job failures in Spark and Hive. They are called ghosts for obvious reasons. Conceived files, on the other hand, are files which actually exist but were not returned by the listing API. In the happy (immediately unhappy) path, eventual consistency causes jobs to fail, because further operations on ghost files keep failing, irrespective of the number of retries. In the unhappy (short-term happy) path, we have data loss because of conceived files: they are simply missing from the final result, which gives incorrect answers.

Given these two constructs, we can argue that the wrongness of a listing operation will occur either because of ghost files (files which shouldn't be present in the listing but are) or because of conceived files (files which should be present in the listing but are not). We can now have separate solutions for dealing with the detection and consequences of these two file classes.

Dealing with Ghost files

Ghost files are files which shouldn't have appeared in the listing to start with. Once they show up, they cause different problems depending on what operations we are doing with these files. The most common problem would be subsequent file-not-found errors. One way to deal with this is to do a fresh listing operation and do a set subtraction.
Let A be the result of listing operation at time t1
and B be the result of listing operation at time t2,
where t2 > t1.
The set A-B, i.e. the files which are in A but not in B, is essentially the list of currently known ghost files. Once detected, we can choose to deal with them in some form. One simple way is to ignore the failures caused by ghost files, because we know they should fail. The other option is to remove them from our task queue, because we know they are not part of the final solution set. We might need to iterate multiple times (say, till a fixed point) to find all the ghost files.
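The set subtraction can be sketched in Python as follows. The function names and the sample paths are made up for illustration, and the two listings are plain in-memory lists standing in for whatever the real listing API returns.

```python
def find_ghost_files(listing_t1, listing_t2):
    """Return A - B: paths seen in the earlier listing but absent from the later one."""
    return set(listing_t1) - set(listing_t2)

def drop_ghosts(task_queue, ghosts):
    """Remove known ghost files from the task queue, keeping the original order."""
    return [path for path in task_queue if path not in ghosts]

# Hypothetical listings taken at t1 and t2 (t2 > t1).
listing_a = ["data/part-0", "data/part-1", "data/part-2"]
listing_b = ["data/part-0", "data/part-2"]

ghosts = find_ghost_files(listing_a, listing_b)
remaining = drop_ghosts(listing_a, ghosts)

print(ghosts)
print(remaining)
```

Here data/part-1 is reported as a ghost and dropped from the queue. A single pair of listings only reveals the ghosts known so far, so a real job would repeat this step.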

Dealing with Conceived files

Conceived files are the files which didn't even show up.
Let's again let A be the result of the listing operation at time t1
and B be the result of listing operation at time t2,
where t2 > t1.
The set B-A, i.e. the files which are in B but were not in A, is essentially the list of currently known conceived files. These are files which we would have missed if we had done only a single listing operation. Once detected, we can choose to deal with them in some form. Handling conceived files is relatively simple. We just need to add them to our working set. We might need to iterate multiple times (say, till a fixed point) to find all the conceived files and treat them as if they were part of the original listing operation.
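Iterating till a fixed point could look roughly like the sketch below. The function reconcile_listing and its parameters are hypothetical, and list_files stands for any callable that returns the current listing. I assume here that two consecutive listings that agree are good enough to stop; that is a heuristic, not a proof that the store has become consistent.

```python
import time

def reconcile_listing(list_files, max_rounds=10, pause_seconds=0.0):
    """Repeat the listing until two consecutive results agree (or rounds run out)."""
    working_set = set(list_files())
    for _ in range(max_rounds):
        if pause_seconds:
            time.sleep(pause_seconds)
        latest = set(list_files())
        conceived = latest - working_set   # B - A: files we had missed
        ghosts = working_set - latest      # A - B: files that no longer exist
        if not conceived and not ghosts:
            break                          # fixed point: nothing changed
        working_set = (working_set | conceived) - ghosts
    return working_set

# A fake listing API that returns a different snapshot on each call.
snapshots = iter([
    {"part-0", "part-1"},
    {"part-0", "part-2"},
    {"part-0", "part-2", "part-3"},
    {"part-0", "part-2", "part-3"},
])
final_set = reconcile_listing(lambda: next(snapshots))
print(sorted(final_set))
```

In the fake run, part-1 turns out to be a ghost, part-2 and part-3 are conceived files, and the loop stops once the fourth listing matches the third.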

It is tempting to ask why we don't simply wait until the system has become consistent before starting the work. In theory that works; in practice we don't know how much time it will take. Starting with whatever information we can get from the listing API gives us a head start, and we can keep revising our task set depending on what further listing calls reveal. What we get through this approach is correctness, but without introducing any performance penalties.

In conclusion, we can deal with eventual consistency in file listing operations by repeating the listing operation, detecting ghost and conceived files and modifying our work queues to take our new knowledge about the listing status into account.

Saturday, July 08, 2017

Don't screw yourself

It is a good principle to practice in life. In other terms, it means: don't do something stupid, or something that is not good for you. Here and now, this is easy to practice. Add time and space to it and we have no clue what it means.

One of the popular ways of screwing yourself involves the passage of time. Smoking is a classic example. Not saving, eating unhealthy food, not exercising, not learning new things: the list is endless.

Self-screwing is not the only option. We can screw ourselves through others as well. One of the simplest ways to punch yourself in the face is to punch someone else. Or consider not respecting the right of other people to join the traffic. When we block someone, they block someone else. Since roads are inherently connected, these small steps help create larger deadlocks.

The point I wanted to make is: we are interconnected, and these interconnections make it difficult to view our actions in isolation. These connections connect us not only to one another now but also to our future selves. Screwing others is just another way to screw yourself...not now, not here..but someday and somewhere.

Monday, February 01, 2016

Apartment Security: Can we make it smarter using smartphones

We love our families, and one place it shows up is when it comes to securing our apartments. What are our expectations? Within reasonable limits, we want to prevent unauthorised people from entering our apartment. Specifically, we want to answer the following two questions about visitors before they can step inside:
  1. Who are you?
  2. Are you invited?
We do a good job of answering the second question by enforcing the rule that, for every visitor, security should call up the resident and confirm before letting anyone in, assuming we have an intercom. But we really don’t know much about the first. At best we ask the visitor to make an entry in the register, which can be neither proved nor disproved. We just hope it is correct and useful in case we need it. In spite of these measures, a few important questions go unanswered.
  • What if something goes wrong? How do we help the police find the culprits? CCTV helps, but we know that in spite of CCTV footage, the accused in the Bangalore ATM assault case is still absconding after 2 years. VIDEO: CCTV vs Phone Number
  • We do take care of the entry of visitors, but are we absolutely sure about what happens between them leaving the flat and showing up at the main gate? How can we prevent visitors from loitering around the apartment complex once they are done with the primary visit?
Unfortunately, we don’t know much. SiftApps is trying to solve this problem. Most of the time, when we bring in additional security measures, we face the dilemma of the additional time that both the security guard and the visitor have to spend establishing credentials and authorisation. For example, to ensure that we have a valid phone number for the visitor, we could come up with a rule that every visitor needs to make a call to the security guard’s phone, and that it is the responsibility of the security guard to enter the phone number himself in the visitor’s register. Obviously, this will increase the time the visitor has to spend at the gate. Moreover, we need “better” security guards. This still leaves us with the problem of knowing about or preventing loitering by visitors after they leave the flat and before they show up at the main gate.

SiftApps has given considerable thought to this problem. How can we make security not only effective but also efficient in terms of the time spent admitting a visitor? Another area we focused on is making the process simple enough for ordinary security guards to understand and follow. Specifically, we tried to answer the following questions:
  • Can we use smartphones instead of servers and computer terminals? 
  • How does instant connectivity help us become smarter collectively, without requiring non-trivial effort from any of us?
The solution that we have come up with solves all the problems that I have described above. It consists of two apps: one for the security guard and the other for residents. The two fundamental questions that we want to answer about a visitor remain:
  • Who are you?
  • Are you invited?
The resident app allows residents to “invite” visitors. It requires residents to input the visitor's details, including the visitor's phone number. Using this information, the app generates a unique code and sends it to the visitor by SMS. This code is for one-time use only and encodes all the information we need about the visitor, including the resident who invited him. When the visitor arrives, all he has to do is share the code with the security guard, who will input it in the app. The security guard’s app will validate the code, record the entry time of the visitor and display the result of the validation to the security guard, who can then allow or deny entry to the visitor. VIDEO: How Sift Works? Note that we don’t need visitors to have smartphones, nor do they need to install any app.
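To make the invite flow concrete, here is a rough Python sketch. It is not how the Sift apps are implemented: the class InviteBook, its methods and the sample visitor details are all hypothetical, the invite is looked up by a random 6-digit code in memory rather than being encoded in the code itself, and the SMS step is left out.

```python
import secrets
import time

class InviteBook:
    """In-memory stand-in for the server that both apps would talk to."""

    def __init__(self):
        self._invites = {}

    def invite(self, resident_flat, visitor_name, visitor_phone):
        """Resident side: create a one-time code for a visitor."""
        code = f"{secrets.randbelow(10**6):06d}"
        while code in self._invites:  # avoid reusing a code already issued
            code = f"{secrets.randbelow(10**6):06d}"
        self._invites[code] = {
            "resident_flat": resident_flat,
            "visitor_name": visitor_name,
            "visitor_phone": visitor_phone,
            "used": False,
            "entry_time": None,
        }
        return code  # in the real flow this would be sent to the visitor by SMS

    def admit(self, code):
        """Guard side: validate the code, record entry time, return invite or None."""
        invite = self._invites.get(code)
        if invite is None or invite["used"]:
            return None  # unknown code or already used: deny entry
        invite["used"] = True
        invite["entry_time"] = time.time()
        return invite

book = InviteBook()
code = book.invite("B-204", "Sample Visitor", "555-0100")
first_attempt = book.admit(code)    # valid: the visitor is admitted
second_attempt = book.admit(code)   # the same code cannot be used twice
```

The guard only needs to type the code; everything else about the visitor comes back from the lookup.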

At this stage we have a validated phone number for the visitor, and we have made sure that any visitor can be admitted or denied entry in less than 10 seconds by checking the code in the security guard’s app.

We went ahead and created another, optional flow which takes care of the other problem I mentioned earlier, namely how to prevent unwanted loitering by visitors once they leave the flat and before they show up at the main gate.

We created a notion of exit code in the app, which is very similar to the entry code I explained above. It can be generated via the resident app when the visitor is about to leave. It is also SMSed to the visitor. When the visitor shows up at the main gate, we know how much time he has spent between generation of the exit code and its actual usage. This way we have complete information about the visitor:
  • his validated phone number 
  • time of entry 
  • time of leaving the flat 
  • time of showing up at the gate 
  • and the resident who invited the visitor 
It is not far fetched to imagine how this information can be used to create real time alerts for security to manage security in a proactive manner. Consider how the following will help in better security:
  • Ability to know exactly how many visitors are in the apartment at any point in time
  • Ability to raise alert to the security if a delivery person has not completed the visit within some pre-configured time
  • Ability to notify security if a person has left the flat, but not shown up at the main gate within some pre-configured time 
  • Ability to track multi-flat visitors and tracking of exit time at each of the flats 
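Two of these alerts can be sketched from the timestamps listed earlier. The Visit record, the function names and the 10-minute threshold below are assumptions for illustration only, not the actual Sift data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Visit:
    visitor_phone: str
    resident_flat: str
    entry_time: float                        # seconds, when the entry code was used
    left_flat_time: Optional[float] = None   # when the exit code was generated
    gate_exit_time: Optional[float] = None   # when the exit code was used at the gate

def visitors_inside(visits):
    """Visitors who entered and have not yet shown up at the main gate."""
    return [v for v in visits if v.gate_exit_time is None]

def loitering(visits, now, max_walk_seconds=600):
    """Visitors who left the flat but did not reach the gate within the limit."""
    return [
        v for v in visits
        if v.left_flat_time is not None
        and v.gate_exit_time is None
        and now - v.left_flat_time > max_walk_seconds
    ]

visits = [
    Visit("555-0100", "A-101", entry_time=0, left_flat_time=100, gate_exit_time=160),
    Visit("555-0101", "B-204", entry_time=50, left_flat_time=200),
    Visit("555-0102", "C-303", entry_time=300),
]

inside_now = visitors_inside(visits)
late = loitering(visits, now=1000)
print(len(inside_now), [v.resident_flat for v in late])
```

With these sample records, two visitors are still inside, and the visitor to B-204 is flagged because 800 seconds have passed since the exit code was generated.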

It is a very good solution which gives us the information we need without causing any extra effort on the part of the residents or the security guard. It doesn’t require any additional infrastructure, except maybe a smartphone for the security guard. All our visitor information is available in digital format, easy to search and look up. The privacy of our visitors is maintained, as this information is kept securely on servers and not in some paper register which is open to scrutiny by any visitor who is willing to read it.

I would love to hear your feedback and would be happy to answer any questions. Please get in touch with us if you would like to see a demo or start your free trial. Thanks for your time. Looking forward to helping you secure your apartment.

Email: iamrohit  at
Watch this 95-second video to see Sift in action. VIDEO: How Sift Works?

Saturday, September 12, 2015

One time password

What is the point of hiding one time passwords?

  • Since I am entering it for the first and last time in my life, let me see it so that I don't type something wrong.
  • It is OK for other people to see it. It is anyway useless immediately after use. 
I wish Apple would open a bank so that banks have someone to copy.