Dynamic Analysis Session

Introduction to Dynamic Analysis

Dynamic analysis is “the analysis of the properties of a running software system” [Ball1999]. It is complementary to static analysis techniques: some properties that cannot be studied through static analysis can be examined with dynamic analysis, and vice versa. The applications of dynamic analysis techniques are very broad: program comprehension, system verification, resource profiling, test analysis, etc.

In this session, we will use dynamic analysis for two purposes:

  • Coverage Measurement: Collecting information about the quality of a test suite and the confidence we can have in it during regression testing.
  • Program Understanding: Collecting information to understand the inner workings of the system and to identify refactoring opportunities.
(Note: OORP - p.xx refers to a page in the book "Object-Oriented Reengineering Patterns".)
 
Measuring and Assessing Test Coverage (OORP p121-144)

The presence of a test suite can be an important support during reengineering. Tests help to:
  1. Reveal unwanted side effects of refactoring. Frequent execution of a regression test suite can reveal defects early and fast, thereby forming a harness that protects the developer.
  2. Understand the inner workings of (part of) a system. In particular, unit tests show typical usage scenarios as well as scenarios in which the system cannot or must not function.
  3. Give developers trust in the quality of their work. Running the test suite confirms or refutes the assumptions the developer makes about the system.
  4. Write new tests. Existing tests serve as examples or even as the basis for further tests. You can systematically extend tests based on criteria important for the project (testing old bugs, increasing the tests for a critically important part, testing new functionality, etc).
The presence of automated tests does, however, not guarantee their quality. Do the tests cover the whole system, or are some parts left untested? Which parts are covered, and to what extent? Hence, measuring test coverage is a useful, even necessary, way to assess the quality and usefulness of a test suite in the context of reengineering.

We will use two systems as subjects to experiment with:

 
Download the source code packages and import them into your Eclipse workspace as "Existing Projects". Alternatively, you can download a ready-made Eclipse workspace containing both projects.
 
We try two dynamic analysis tools for test coverage:
 
Install them into your Eclipse IDE using the provided install links. In Eclipse, go to "Help" -> "Install New Software". Enter the link at the top and select the tool you want to install. Don't forget to turn off the option "Contact all update sites during install to find required software"!
 
Measuring Test Coverage with Emma

Measuring test coverage requires information about which code is executed during a test run. To obtain this information, additional code is typically injected in an extra step of the build process, before compilation. For virtual machine technology, this manipulation can also be performed at the bytecode level. In either case, the injected code monitors execution during the test run and reports to the test coverage measurement and reporting tool.
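The effect of such injected code can be sketched as follows. This is a toy illustration only: the probe() calls, block identifiers, and coverage map are all invented here, and a real tool like Emma instruments Java bytecode rather than working this way.

```python
# Toy sketch of coverage instrumentation: a hypothetical probe() call is
# injected at the start of every basic block, and the reporting tool reads
# the probe map after the test run.
probes = {"abs:entry": False, "abs:negative": False, "abs:non_negative": False}

def probe(block_id):
    probes[block_id] = True  # the injected code: mark this block as executed

def my_abs(x):               # "instrumented" version of a trivial method
    probe("abs:entry")
    if x < 0:
        probe("abs:negative")
        return -x
    probe("abs:non_negative")
    return x

my_abs(5)                    # a "test run" that hits only one branch
hit = sum(probes.values())
print(f"block coverage: {hit}/{len(probes)}")  # -> block coverage: 2/3
```

The reporting side of a coverage tool essentially aggregates such probe data per class, method, and line.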

There are three ways to obtain the measurement using the test coverage tool Emma for Java. In this lab session, however, we use EclEmma, an Emma plugin for Eclipse. Provided Eclipse can compile the system under study, you can launch a "Coverage run" and obtain coverage results reported as tables and as colored source code within the IDE.

The procedure is similar with Clover.

Evaluating Test Coverage

Using Emma, inspect the different types of coverage.

  • How do they differ from each other?
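As a reminder of why coverage types can disagree, consider this toy Python illustration (invented for this handout, not Emma output): full line coverage does not imply full branch coverage, because an if without an else has a branch that executes no extra line.

```python
# A test that executes every line can still miss a branch: the implicit
# "condition is false, skip the body" path below adds no line of its own.
def clamp_non_negative(x):
    if x < 0:        # this line is covered by the call below, but only
        x = 0        # the True branch is taken; the implicit False
    return x         # branch (x >= 0) is never exercised

clamp_non_negative(-5)   # executes every line: 100% line coverage,
                         # yet only 1 of 2 branches is covered
```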

Comment out a test and run Emma again. Do this a couple of times.

  • What impact does it have on the coverage?

Now do the same with Clover, and compare the results.

  • Are the coverage results similar for both tools?

Both tools highlight covered code when opening a source file: compare the coverage of the same file using this view.

  • Are the coverage results similar for both tools?
  • Which tool provides the most information in this view?

Analyse the two subject systems with Emma and/or Clover and answer the following questions:

  • How do the results coincide with how you would intuitively test the system? Do more complex or critical parts have higher coverage?
  • What would you propose as a good level of code coverage?
  • Do you think 100% code coverage is feasible?
  • Comparing the two tools, which do you prefer and why?
 
Mutation Testing
 
Mutation testing is a method for assessing the quality of a test suite in detail. It simulates real faults and checks whether the test suite is strong enough to catch them: faults are injected into the software, and the number of injected faults that make at least one test fail is counted. The process requires the following steps:
  1. Faulty versions of the software are created by introducing a single fault into the system (Mutation). This is done by applying a known transformation (Mutation Operator) to a certain part of the code.
  2. After generating the faulty versions of the software (Mutants), the test suite is executed on each of them.
  3. If an error or failure occurs during the execution of the test suite, the mutant is marked as killed (Killed Mutant). If all tests pass, the test suite could not catch the fault and the mutant has survived (Survived Mutant).
Note that mutation testing demands a green test suite (one in which all tests pass) to run correctly. The final result is calculated by dividing the number of killed mutants by the total number of mutants. A test suite is said to achieve full test adequacy when it kills all mutants; such test suites are called mutation-adequate test suites.
 
Mutation Testing with LittleDarwin
 
There are several tools available that can perform mutation testing on Java code. In this lab, we use LittleDarwin, a mutation testing tool written in Python that can analyze a wide range of Java software. It is built for easy deployment in an industrial environment, and it can handle the complicated build structures often found in such settings. To perform the experiment:
 
  • First, download this script.
  • Create a new directory, and copy the downloaded script there. 
  • Now open a terminal in this directory by right-clicking in the file manager and choosing "Open in Terminal". 
  • Run the script using this command: 
    • sh ./get_deps.sh
  • Wait for the process to finish. Now you can run LittleDarwin using this command to see the command line help:
    • cd LittleDarwin
    • ../python/bin/python ./LittleDarwin.py
 

Mutation Testing of JPacMan3

 
Now we want to use mutation testing to determine the quality of the test suite of JPacMan3. To do that, we run LittleDarwin on the supplied jpacman project using the following command:
  • ../python/bin/python ./LittleDarwin.py -m -b -p ./jpacman3/src/main -t ./jpacman3/ -c mvn,test 
      • -m is for enabling the mutation phase
      • -b is for enabling the test execution phase 
      • -p is to provide the path to production code files
      • -t is to provide the path to the build directory 
      • -c is to provide the commands to run the test suite with
 
This is a time-consuming process: for this project, completing the analysis takes between 15 and 25 minutes. After it finishes, you can find the report in ./jpacman3/src/mutated/report.html.
Run it for a moment to see how it works, but you don't have to complete the process. You can find the report here.
 

Evaluating Mutation Coverage

 
Look at the report from LittleDarwin, and the report from EclEmma side by side.
  • Find classes that have a low mutation coverage and a high statement coverage. What does this indicate?
  • In these classes, find a survived mutant that is within a covered statement (you can use the line numbers and the before-after comments inside each mutant to find them in the code). Why does this happen? 
  • Find classes that have a high mutation coverage and a low statement coverage. What does this indicate?
  • In these classes, look for a killed mutant that is not within a covered statement. Why does this happen?
  • Given the previous observations, do you think statement coverage is a reliable metric for the quality of the test suite? Why (not)?
  • Considering the new information, revisit your answers to the previous section. Does anything need to change?
  • Can the accuracy of mutation testing be improved? If so, how?
  • How can mutation testing help writing new tests?

Other Resources

  • The slides can be found here.
  • For more information on dynamic analysis tools, see for example the Wikipedia page on "Dynamic Program Analysis".
  • Martin Fowler has a nice short article on test coverage in his blog.
  • Jia et al. performed a thorough literature study on mutation testing that summarizes the topic very well. You can access their paper here.
Attachments:
  • Ball1999.pdf (248.28 KB)
  • Cornelissen2009.pdf (4.6 MB)