Tuesday, October 20, 2009

Zen and the art of filing bugs

This is a bit of an old post; just re-posting it here.

Finding bugs in software is the primary job of every QA engineer. What is a bug? Unacceptable behavior of the software is termed a bug. This is a very generic definition, and it is meant to be so. Thus one of the crucial aspects of being able to tell whether some behavior is a bug or not is to fully understand what acceptable behavior of the software is.

It is unfortunate that we don't yet have any metrics for the quality of a bug. I don't have any new metrics myself, but I would like to express my views on the quality of bugs. In the rest of this document I will try to explain how using open and fixed bug counts as a metric of software quality and tester effectiveness is not only wrong but also hampers the overall growth of the company.

Non-Bugs

Consider a simple car. It is not a bug if the car doesn't go beyond 200 km/h, but it definitely is a bug if that happens with a Ferrari. What is acceptable from a sports car is not the same as what is acceptable from a simple car. Developing this understanding is crucial for any tester who wants to contribute to software quality.

Finding and filing “bugs” which are not bugs has two shortcomings.
1. The tester wastes his time and energy in finding and filing them.
2. The developer wastes his time either trying to fix or mask them, or convincing the tester that they are not bugs.

From a company perspective it is not a good situation if a majority of tester and developer time is wasted on such “non-bugs”. The basic cause of such situations is lack of proper documentation. It is of utmost importance that the “acceptability” of any software is defined to a reasonable accuracy, so that anyone in the organization can distinguish “bugs” from “non-bugs” without much effort. As the percentage of such bugs increases, more and more tester and developer time is wasted that could have been devoted to developing the product and finding actual bugs. This problem typically happens when a new tester joins a team and, in his quest to be productive from day one, ends up filing such bugs.

Duplicate Bugs a.k.a Bug Bloat

Another class of bugs that impacts the company's growth just like “non-bugs” is duplicate bugs. Unfortunately, these cannot be resolved by proper documentation. A duplicate bug is a bug which is a side effect of some other, actual bug. The two bugs might have totally different test cases and may present totally different behaviors. On the face of it they might seem totally unrelated. But when the actual bug is fixed, the side effect is automatically fixed.

The developer is the only person who can establish whether a given bug is a duplicate, because a duplicate bug is not a duplicate in behavior of the original bug, but a duplicate because of implementation characteristics. If a developer says that a given bug is a duplicate, it would be good if the tester accepted that hypothesis and verified it when the actual bug is fixed. In general, this is not what happens.

Consider a car with a faulty battery that gets discharged as soon as the car is stopped. A tester can easily file the following bugs:
1. After the car is stopped, it doesn't start.
2. After the car is stopped, we cannot switch on the lights.
3. After the car is stopped, the horn doesn't work.
4. After the car is stopped, the wipers don't work.
5. After the car is stopped, the music system doesn't work.

And maybe many more. If the tester has no understanding of what a battery is and how it relates to the functioning of the other subsystems in the car, he can keep filing as many bugs as he wants. They are all duplicates and will go away only when the battery is fixed.

The reuse of code in the form of functions and data structures, and the general dependencies between subsystems, create ample opportunities for such bugs to exist in software. I won't say they are the same bug; from a user's perspective they are all different bugs. But from the developer's perspective they are all the same, and hence the concept of a duplicate bug. From the tester's perspective it might be a great opportunity to increase the bug count, but at the end of the day it is the company that suffers from such “bug bloat”. Why?

1. The tester tests and opens N bugs, spending N units of time instead of one.
2. The developer goes through each of the bugs, reads the logs, talks with the tester, tries to convince him that these are duplicates, marks them as duplicates, only to find them reopened and reassigned, and at the end of the day fixes just one and marks all the others as fixed.
3. The tester retests all the bugs and finds that all of them are fixed.

Again, we see that valuable tester and developer time is wasted in communication and in following process, while as far as the product is concerned only one bug really got fixed. The company definitely stands at a loss, because the same time could have been spent finding and fixing real bugs, improving the product, and adding other features to it.

It is highly unfortunate that there is no silver bullet for solving this issue. It can be minimized if the tester understands, or tries to understand, such dependencies. We could say that if a developer says a bug is a duplicate, it is a duplicate, no questions asked. That would be the way of trust. The problem is that we might miss some bugs, because the developer only “thinks” they are duplicates. On the other hand, we can use fact as the basis and file every bug as it exists. If we take that path, we end up decreasing our productivity when the bugs actually are duplicates.

If we have more automated tests than ad hoc tests, we will be in a good position to solve such problems. When a bug is discovered in a particular component, the developer can go through the list of tests to tell which other test cases will fail because of this bug. This would be one way to minimize “bug bloat”.
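As a rough illustration, such a mapping could be as simple as two tables and a graph walk. This is only a minimal sketch; the component and test names (continuing the car analogy) are made up, and a real system would derive these tables from its test plan.

```python
# Hypothetical sketch: given a component dependency graph, list the test
# cases likely to fail because of a bug in one component.

# component -> components that depend on it
DEPENDENTS = {
    "battery": ["starter", "lights", "horn", "wiper", "music_system"],
    "starter": [], "lights": [], "horn": [], "wiper": [], "music_system": [],
}

# component -> automated test cases exercising it
TESTS = {
    "battery": ["test_battery_charge"],
    "starter": ["test_engine_start"],
    "lights": ["test_headlights_on"],
    "horn": ["test_horn_sound"],
    "wiper": ["test_wiper_sweep"],
    "music_system": ["test_music_playback"],
}

def affected_tests(buggy_component):
    """Return all tests that may fail due to a bug in buggy_component,
    walking the dependency graph transitively."""
    seen, stack, tests = set(), [buggy_component], []
    while stack:
        comp = stack.pop()
        if comp in seen:
            continue
        seen.add(comp)
        tests.extend(TESTS.get(comp, []))
        stack.extend(DEPENDENTS.get(comp, []))
    return tests

print(affected_tests("battery"))
# A failure in the battery flags the starter, lights, horn, wiper and
# music-system tests as probable duplicates of the same root cause.
```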

Currently our test plans lack specificity. Every test case looks at the system as a whole, without any consideration of the individual components of the system. We can move a step beyond black box testing and start exploring “grey box” testing. When I say grey box, I only mean that test cases are aware of the components they are going to test. I don't mean that we should stop doing black box testing.

All I mean is that each test case should have a well-defined purpose, and it should be known which components are exercised by each test case. If we know the system, its components, and the interdependencies between them, we can come up with a reasonable ordering of the test cases, which will help us find bugs faster, and find them in the test suite of the component responsible for the bug. To conclude:
1. Test cases shouldn't just have an ID; they should have a context and a purpose.
2. The order in which test cases are executed should not be their ID order in the test plan, but an ordering developed by discussing it with the developer. A component owner must be able to tell which other components' “sanity” must be tested before his component is tested, and he should be able to tell the order in which the test cases of his component should be run. This could be further enhanced by capturing the interdependencies between the actual test cases in a component. Given the number of test cases in a component, it might make sense to further segregate the test cases of a component into various classes and define dependencies at the “component test case class” level. A sketch of such dependency-driven ordering follows this list.
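As a minimal sketch, the “sanity first” ordering is just a topological sort of the component dependency graph. The component names below are hypothetical; in practice the graph would come out of the discussions with component owners described above.

```python
# "Sanity first" ordering as a topological sort, using Python's
# standard graphlib module (Python 3.9+).
from graphlib import TopologicalSorter

# component -> components whose sanity must pass before it is tested
SANITY_DEPS = {
    "battery": set(),
    "starter": {"battery"},
    "lights": {"battery"},
    "music_system": {"battery", "lights"},  # assumed dependency
}

order = list(TopologicalSorter(SANITY_DEPS).static_order())
print(order)
# e.g. ['battery', 'starter', 'lights', 'music_system'] (one valid order)
# A battery failure now stops the run before it can masquerade as a
# dozen downstream "bugs" in unrelated-looking components.
```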

Again, I am not saying that every test case should or will fall nicely into such a structure. Testing the flow of messages while configuration changes are being made would be one example where the success of the test case depends on the correct working of all the subsystems. Clearly all such test cases should be run only when all the components pass their sanity tests. Further, we should first check configuration changes that impact one component at a time, before checking configuration changes that impact more than one component.

With a system as complex as ours, we need to make sure that our testing strategy is smart enough to reduce the time it takes to find and fix bugs. I believe it is important that we minimize “bug bloat”, because it hampers the productivity of the team in a big way.

Defining the Quality of a Bug

Bugs have quality. Given that the purpose of QA is to increase the quality of the product, I propose that the quality of a bug is in direct proportion to the increase in product quality gained by fixing that bug, and in inverse proportion to the time spent on fixing it. A high-quality bug is one which makes it obvious to the developer what needs to be done to fix it.

This definition assumes that the code is well written and doesn't have any design flaws, and hence that a bug arises from some developer oversight at some place in the code. This is valid most of the time.

Given this definition, the time spent in fixing a bug could come from two places:
1. The complexity of the test case used in finding the bug. If the test case is too complex and involves too many components, it is hard to find out the root cause.
2. The details specified while filing the bug. Missing or irrelevant logs, an incorrect summary, or a missing specification of the system state all add to the time spent analyzing the bug, and analysis usually takes the biggest chunk of the time spent fixing a bug.

When filing a bug, the tester must try to find the minimal set of simple steps required to reproduce it. A bug which says “the system crashes when I run all my test cases in a loop with 100 threads” is a very low-quality bug if the real issue was that one particular test case was leaking buffers.

Testers often confuse themselves with the assumption that bugs found with a “complex test scenario” are good-quality bugs. This is true if and only if the “complex test scenario” is the only manifestation of the bug; it is a very low-quality bug if the same result can be produced by doing something very simple. Many a time the “complex test scenario” is the only known manifestation because the tester never tried anything simpler.
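One mechanical way to build the habit of trying something simpler is to shrink the failing scenario step by step. Below is a naive sketch, assuming a reproduces() predicate that re-runs a candidate scenario and reports whether the bug still appears; it is a greedy simplification in the spirit of delta debugging, not any specific tool.

```python
# Naive sketch of shrinking a failing scenario to a minimal one. The
# reproduces() predicate is assumed to re-run the scenario and report
# whether the bug still shows up.

def minimize(steps, reproduces):
    """Repeatedly drop steps that are not needed to trigger the bug."""
    changed = True
    while changed:
        changed = False
        for i in range(len(steps)):
            candidate = steps[:i] + steps[i + 1:]
            if candidate and reproduces(candidate):
                steps = candidate
                changed = True
                break
    return steps

# Example with a fake predicate: the "bug" only needs steps B and D.
failing = ["A", "B", "C", "D", "E"]
print(minimize(failing, lambda s: "B" in s and "D" in s))
# ['B', 'D'] -- a far higher-quality bug report than all five steps.
```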

The quality of the bugs directly determines the quality of the product. A tester can literally stop the growth of the product by filing too many low-quality bugs, because developers will end up spending most of their time analyzing bugs rather than fixing them.

Low-quality testing has the power to jeopardize product development. Maintaining the quality of bugs is very important, and for the success of the company it is important that this quality is strived for.

Filing Bugs

As defined in the last section, the filing of bugs plays an important part in determining the quality of the bug, and hence the quality of the product.
Filing essentially means capturing enough information about the bug that the developer can start fixing it as soon as he looks at it. This is an ideal, and may not always be reached, but we can strive for it. A bug falls short of it for various reasons; a filing template addressing these gaps is sketched after the list.
1. The bug doesn't provide definite steps to reproduce it, or the steps are so complex that it takes a long time to execute them and find the root cause.
2. The bug doesn’t have the required log files.
3. The bug doesn’t provide stack trace for crashes.
4. The bug doesn’t capture the details of the environment in which the bug was seen.
5. The bug doesn’t capture the details of the operation that was performed.
6. The bug doesn't occur when the system is in a clean state.
7. The bug doesn’t capture the history of the operation performed.
8. The bug doesn’t have a purpose or intention with respect to what component or feature or behavior is being tested.
9. The bug is filed without testing the “softer” versions of the test case. The bug essentially describes a single “point” in the “test space”, without exploring in any direction around that point.
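A simple template can make most of these gaps visible before the bug is filed. The sketch below is hypothetical; the field names are illustrative and not the schema of any particular bug tracker.

```python
# Hypothetical bug-report template covering the gaps listed above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BugReport:
    summary: str                      # one line, accurate, specific
    component: str                    # what is being tested, and why
    steps_to_reproduce: List[str]     # minimal, definite steps
    environment: str                  # build, OS, hardware, config
    system_state: str                 # clean state? history of operations?
    logs: List[str] = field(default_factory=list)    # relevant log files
    stack_trace: Optional[str] = None                # mandatory for crashes
    simpler_variants_tried: List[str] = field(default_factory=list)
    # the "softer" versions of the test case explored around the failure

    def ready_to_file(self) -> bool:
        """Crude completeness check before the bug reaches a developer."""
        return bool(self.summary and self.steps_to_reproduce
                    and self.environment and self.logs)
```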

There are many wrong ways to file a bug. The one thing the tester must keep in mind while filing is: “I want this to get fixed, as soon as possible”, and then try to provide as much information as possible to make the developer's task as easy as possible. Obviously there are trade-offs: it shouldn't be the case that the tester spends, say, 10 hours providing information about the bug which the developer could have deduced in a few minutes. Go the extra mile, find out from the developer what he wants in order to make the bug easy to fix, but avoid what is unnecessary and what is difficult.

To conclude, the real job of testing is not to find bugs, as is generally believed, but to get bugs fixed. By having a structured approach to testing, we can find and fix bugs faster and improve the quality of the product in less time, which gives us more time to add new features to the product. We have a lot to gain by controlling the quality of our testing, and a lot to lose by not doing so.

Lots of literature is available on building complex systems, but none on testing them. The methods of testing simple software, when applied to complex systems, cause as many catastrophes as building complex systems with simple-software methodologies does. In the following sections, I will list some insights that I believe can simplify the process of finding and fixing bugs in complex systems.

The core concept that I will exploit is that a complex system is generally built from simple components.

What is a component? A component is an encapsulated entity which provides a specific functionality to the system. It could be a library used by one or more processes in the system. It could be an executable which provides some functionality at runtime. It could be a kernel module, or the kernel itself. For that matter, it could be the complete OS or any other facility provided by the OS. It could even be a thread in a process.

This definition can be applied repeatedly to any complex system to further subdivide it into its constituents. At the lowest level of this spectrum are the utility routines (assuming we are not testing the standard libraries and system calls). At this lowest level, the routines can be tested using unit tests. Why do we test routines using unit tests? Because routines can be tested independently, without depending on anything else in the system. In fact, the system is directly dependent on the correct functioning of each of the routines in each of its components and libraries. Though unit testing is done in a very controlled environment, it lays the foundation on which the whole system stands.

1. Core libraries should have unit tests.
2. Unit testing should be extended to test the routines in an environment which is as close to the real system as possible. For example, if a library is intended to be used from multiple threads, it makes sense to test it in a multithreaded environment. If a routine could be called from multiple threads of multiple processes, it should be tested for that behavior. A sketch of such a test follows.
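As a minimal sketch of point 2, here is a unit test that exercises a routine from many threads at once. The Counter class is a made-up stand-in for whatever library routine is under test; a real test would import the actual library.

```python
# Sketch of extending a unit test toward the real environment: the same
# routine exercised from many threads at once.
import threading
import unittest

class Counter:
    """Toy library routine standing in for the code under test."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # remove the lock and this test should fail
            self._value += 1

    @property
    def value(self):
        return self._value

class TestCounterConcurrent(unittest.TestCase):
    def test_many_threads(self):
        counter, n_threads, n_incr = Counter(), 16, 1000
        def worker():
            for _ in range(n_incr):
                counter.increment()
        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # every increment must be accounted for, or the routine is racy
        self.assertEqual(counter.value, n_threads * n_incr)

if __name__ == "__main__":
    unittest.main()
```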
