Lava: A System for Mutation Testing of Java Programs

Sebastian Danicic

Abstract:

Whilst on Holiday in Sri Lanka, I decided to write some Java Mutants

\scalebox{0.5}{\includegraphics{p207.ps}}
This is what I can see from my verandah where I am doing this project.

Mutation Testing

Suppose we have written a big complicated program. Before we release it, we want to be as confident as possible that it works properly. So we test it by running it many times using a different input each time. Each time we run a test, we check that our program has behaved as expected. If it has, then we adopt a smug self-satisfied smile. Otherwise, we scratch our head and debug our program to try and find the error.

The set of all inputs that we try on our program is known as our test set. Suppose that our program behaves correctly for all elements of our test set. How confident can we be that our program is correct? Clearly, big test sets are good.

Mutations are a way of testing how good a test set is. The idea is, we introduce a `bug' by changing a program a little bit (by mutating it) and seeing if our `mutated program' behaves differently to the original with respect to any element of the test set. If this is the case we say the mutant has been `killed' by the test set. Of course, some mutant will be semantically equivalent to the original program so these will never be killed however good the test set is. A test set is good if it kills lots and lots of mutants.

Talking of mutants, this is Gizmo our Sri Lankan cat.
\scalebox{0.5}{\includegraphics{p203.ps}}
He is waiting to be fed. He miaows incessantly.

Higher Order Mutants

If we mutate a mutant we get what is know as a mutant of order 2. If we mutate this mutant we get a mutant or order 3 etc. A mutant of order $n$ can be expressed as a sequence of $n$ mutations. In general, a mutant of order $n$ is syntactically `further away' from the original program and semantically more likely to be different from the original too.

The Coupling Hypothesis

The Coupling Hypothesis states there is no point in generating higher order mutants. That is, we can measure the adequacy of a test set with mutants of order one. We do not get a more precise measure of our test set by using higher order mutants. By the way, here are some cows we saw on the beach yesterday.
\scalebox{0.1}{\includegraphics{pict3410.ps}} We can say a test set kills a mutant if at least one of its members kills the mutant. The adequacy of a test set is related to the number of non-equivalent mutants that it kills. One of the main aims of this work was to test the coupling hypothesis.

The System

Quite often in such experiments, toy languages are invented. Toy programs are written in this toy language, mutations are produced and tested on toy test sets. We decided to generate mutants for real Java programs. We also decided that our system should be able to cope with full Java 1.1. This meant that our system would have ready made Java programs to mutate with ready made test sets. Another design decision that we made was that the test sets would consist of input files. So the programs upon which our system can be applied are ones which expect all their input from a file or from standard input. The system was implemented using two tools from Sun Microsystems: JavaCC, a parser generator and JJTree, a preprocessor for JavaCC [tm] that inserts parse tree building actions at various places in the JavaCC source. These tools come with some ready made Java grammars which helped enormously in the production of our System. The one we used was Java1.1.jj. This is the grammar that includes the Java 1.1 language extensions. It is based on the Java Language Specification by Gosling, Joy, and Steele augmented by the inner class document released with JDK 1.1. It has the nice self referential property that the Java parser (JavaParser.class) resulting from processing Java1.1.jj correctly parses its own source (JavaParser.java).

The Allowable Mutations

As a first step we decided to perform mutations only on operators. This allowed for a finite yet complete set of mutants for any given program. The operators were divided into 8 different classes. Each operator was only allowed to be exchanged for one in its own class. This increased the probability that a mutant would compile. The operator classes were:

Boolean Constants true,false
Boolean Operators &&,||
Relational Operators ==,<=,>=,<,>,!=
Incr/Decr Operators ++,--
Arithmetic Operators +,-,*,/
Binary Bit Operators &,|,^,%,<<,>>,>>>
Arithmetic Assignment Operators +=,-=,*=,/=
Binary Bit Assignment Operators &=,|=,^=,%=,<<=,>>=,>>>=

Mutating JavaParser.java

An obvious choice of program to mutate was the the Java Parser itself. As test set we can use any collection of Java Programs. We decided to use programs in the demo directory j2sdk1.4.2_06/demo containing 359 examples of Java source code. These include examples that use Swing and other Java Foundation Classes, and the Java Platform Debugger Architecture and would therefore use a healthy variety of Java syntax to test the parser.

As stated earlier JavaParser.java is produced as a result of running JavaCC with the Java Grammar file Java1.1.jj. JavaParser.java has more than 5000 lines of code and contains over 1000 occurrence of mutatable operators as defined in the table above.

Defining a Mutation

The Java Parsers converts the input program into a sequence of tokens. We can therefore define a mutation as a pair, $(t,n)$ where $t$ represents the position in the sequence of tokens of the token we wish to mutate and $n$ represents the element of the class that we are replace the token with. For example if the 32nd token is a $+$ sign then the mutation $(32,3)$ represents replacing this plus sign with a $*$ sign. The pair $(32,1)$ in this case would be the identity mutation since we would be replacing a $+$ with a $+$.

The implementation

I have produced a Java package called mutations with a lot of useful mutation producing code.

Let us look now at the program allTokens.java. Each token is represented by an integer value which is stored as an int constant, one for each token type. For example, JavaParserConstants.FALSE is the integer value for the token FALSE and JavaParserConstants.LT is the value representing the token $<$. The two dimensional array allowables is such that allowables[i] represents the array of elements for which the token represented by integer $i$ can be replaced. For example

allowables[JavaParserConstants.GT] = new
	int[]{
			JavaParserConstants.LT,
			JavaParserConstants.EQ,
			JavaParserConstants.NE,
			JavaParserConstants.GE,
			JavaParserConstants.LE
	     };
Says that the Greater than sign ($>$) can only be replaced by any one of $<=$,$==$,!=,$>=$, $<=$.

We represent a token simply as an object of type tok, where tok is given by:

package mutations;
import mutations.*;
import java.io.*;
public class tok implements Serializable
{
	
        public int type;
        public String image;
        public String commentsEtc;

        public tok( int t,String i,String c)
        {
                type=t;
                image=i;
                commentsEtc=c;
        }

        public String toString()
        {
           return commentsEtc+image;
        }
}

So a token can be thought of as triple consisting of an int token type as just described, a String, image containing the actual value of the token, and another String containing all the comments an white space that occurred in between this token and the previous one.

The class allTokens.java contains two very useful methods: programToVector(String s) which converts program s to Vector of toks. and printProgram(Vector v) which prints out the program represented by the Vector of toks v.

For example printProgram(programToVector(``Hello.java'')) will print the program Hello.java exactly as it is.

Given a Vector of tokss, v, the method arrayOfMutatable(v) produces an array consisting of the positions in $v$ of all the mutatable tokens in program represented by $v$. For example if the zeroth element of this array was 59, this would mean the 59th token was the first which was from one of the token classes described earlier. The number of elements of this array represents the number of different tokens that can be changed in order to produce a first order mutant.

How to Produce a Mutant

The following code would is an example of a complete Java program to produce a mutant. In fact it find the `first' mutant of itself.
import mutations.*;
import java.util.*;
class doit
{
        public static void main(String args[]) throws Exception
        {
                allTokens a=new allTokens();
                Vector v=a.programToVector("doit.java");
                int am[] =a.arrayOfMutatable(v);
                tok firstMutatableToken=(tok)(v.elementAt(am[0]));
                int newType=a.allowables[firstMutatableToken.type][0];
                tok newTok=a.replace(firstMutatableToken,newType);
                Vector w=a.replaceToken(v,am[0],newTok);
                a.printProgram(w);
        }
}
The output is the illegal Java Program:
import mutations.+;
import java.util.*;
class doit
{
        public static void main(String args[]) throws Exception
        {
                allTokens a=new allTokens();
                Vector v=a.programToVector("doit.java");
                int am[] =a.arrayOfMutatable(v);
                tok firstMutatableToken=(tok)(v.elementAt(am[0]));
                int newType=a.allowables[firstMutatableToken.type][0];
                tok newTok=a.replace(firstMutatableToken,newType);
                Vector w=a.replaceToken(v,am[0],newTok);
                a.printProgram(w);
        }
}
The first mutatable token is *. It has been changed to a +. This is a problem. We wish to exclude mutants that do not compile. This problem is tackled in a later section. Clearly this program can be adapted to produce any or every mutant.

How to Produce a Random Mutant

To produce a random mutation simply take an arbitrary element of array am above, look up its token type, $j$ and replace it with a random element of allowables[j]. Very simple methods for producing random mutations (randomMutate) can be found in allTokens.java.
  public tok randomMutate(tok t)
  {
        int [ ]  a =allowables[t.type];
        int type= a[useful.rand(a.length)];
        return replace(t,type);
	}
The array allowables[t.type] represents the set of token types that can replace token t. We replace it by a random element of the array. All we need to do now is randomly pick a token to randomly mutate:

public  Vector randomMutate(Vector w)
{
	
        int a[] =arrayOfMutatable(w);
        int pos=a[useful.rand(a.length)];
        tok t= randomMutate((tok)w.elementAt(pos));
        return replaceToken(w,pos,t);
}

Producing All Mutants

Since the number of operator mutants is finite, we can compute the set of all allowable mutants in a very staightforward way. This is left as an exercise for the reader!
Hint: arrayOfMutatable(w) gives positions of all tokens which can be mutated. For each of these, we look up the type, $t$ and look up all mutations of $t$ in allowables[t].

Seeing if a Mutant is Killed by a Test Set

There are two useful shell scripts for running experiments: isKilled and allMutants. The script isKilled assumes that the code for the original Java Parser is in subdirectory called original and a mutation of the Java Parser is in a subdirectory called mutated. Both have been compiled. The script simply runs each program on a whole batch of Java programs and using Unixdiff checks whether the outputs are different. If there are never any differences then the mutant has not been killed.

The code for isKilled is:

declare -i z
z=0
for i in  `find /usr/share/j2sdk1.4.2_06/ -regex .*[^\/][\.]java$`
#for each Java program in a list of Java Programs do:-
do
cd original/
java JavaParser $i > /tmp/a  2>&1 #run the original parser
# and redirect the resulting standard output and standard error to /tmp/a
cd ../mutated
java JavaParser $i > /tmp/b  2>&1 #run the mutated parser
# and redirect the resulting standard output and standard error to /tmp/b
diff /tmp/a /tmp/b  > /dev/null 2>&1
result=$?
#use diff to compare the two and store in result
[ $result -ne 0 ]  &&  echo "killed!" && exit 1
#if the result is not zero it's been killed so exit the script
#if not try to kill it with the next input.
z=($z+1) 
echo $z
cd ..
done

The AllMutants script Generates 1000 random mutants each mutant is compiled and compiled and then if no error isKilled is run. A count is kept of all the mutants which a, compile and b are killed. The code for AllMutants is

#! /bin/bash

declare -i m
m=0
declare -i k
k=0
#go round a thousand times
for i in `java useful.range 1 1000` 
do
java allTokens original/JavaParser.java  > mutated/JavaParser.java
#mutate the JavaParser and store in file in 'mutated' directory
cd mutated
javac JavaParser.java  > /dev/null 2>&1
#compile the mutated JavaParser
result=$?
#store the result in variable: result
cd ..
[ $result -eq 0 ] && m=( $m+1 ) &&  ! ./isKilled && k=( $k+1 )
#if it compiles add one to m try to kill it. 
#if it's killed add one to k
echo so far $i mutations tried, $m of those compiled and $k killed 
done

What if a Mutant doesn't terminate for some input?

If we are running experiments that try to kill mutants, we do not want our mutants to fail to terminate as this would mean that our experiment would also fail to terminate. The most practical way round this problem is simply to kill a process it it runs for more than a certain length of time and assume it was not going to terminate. If the original did terminate then we can assume the mutant has also been killed. If neither terminates after the specified time we can assume the mutant has not been killed. (We do not do this here as we assume the original is correct and always terminates.) The code to do this may need some tuning depending on the system on which it is being run. Here is the Unix script for doing this. It is called keepGoing and should be run in parallel with allMutants.
 while true 
 do
 sleep 3
 echo now
 res1=`ps u |grep JavaParser|grep -v grep|cut -c 10-15`
 echo $res1
 sleep 3
 res2=`ps u |grep JavaParser|grep -v grep|cut  -c 10-15`
 [ "$res1" -eq "$res2" ] && ( echo "deleted" ) && ( kill -9 $res1 )
  echo $res2
 done

Limitations of The System

The system works best if the test consists of files which are read by the orginal program and the mutants.

Download

The basic mutator can be downloaded from http://sebastian.doc.gold.ac.uk/lava.jar

To generate a random mutant from myProg.java on standard output, type:

java -jar lava.jar myProg.java
Note: I don't think it works for prograns that include the Java 1.5 extensions such as generics. If anyone is interested in updating it to handle full Java 1.5 please contact me. This would make a very nice student project.



s.danicic@gold.ac.uk
Seb's Home Page
Dept of Comptuing
Last updated 2007-04-24