=Paper=
{{Paper
|id=None
|storemode=property
|title=An Evaluation of Java Code Coverage Testing Tools
|pdfUrl=https://ceur-ws.org/Vol-920/p72-kajo-mece.pdf
|volume=Vol-920
|dblpUrl=https://dblp.org/rec/conf/bci/Kajo-MeceT12
}}
==An Evaluation of Java Code Coverage Testing Tools==
<pdf width="1500px">https://ceur-ws.org/Vol-920/p72-kajo-mece.pdf</pdf>
<pre>
         An Evaluation of Java Code Coverage Testing Tools
                         Elinda Kajo-Mece                                                                     Megi Tartari
              Faculty of Information Technology                                                   Faculty of Information Technology
                Polytechnic University of Tirana                                                   Polytechnic University of Tirana
                           ekajo@fti.edu.al                                                              mtartari@fti.edu.al

ABSTRACT                                                                               2. SELECTED TOOLS AND EVALUATION
Code coverage metric is considered as the most important metric                        CRITERIA
used in analysis of software projects for testing. Code coverage
analysis also helps in the testing process by finding areas of a                         Among various automated testing tools [8], we have selected
program not exercised by a set of test cases, creating additional                      two tools to perform the Code Coverage Analysis [1][2][3], as a
test cases to increase coverage, and determine the quantitative                        manner to evaluate the efficiency of our tests we created in the
measure of the code, which is an indirect measure of quality.                          JUnit framework [4][7]. In this paragraph we will summarize
There are a large number of automated tools to find the coverage                       briefly the main features of these to: EMMA and CodeCover
of test cases in Java. Choosing an appropriate tool for the                            coverage tools. The main reasons for which we choose them are:
application to be tested may be a complicated process. To make it                      1. These tools are 100 % open-source.
ease we propose an approach for measuring characteristics of
                                                                                       2. These tools have a large market share compared with the other
these testing tools in order to evaluate them systematically and to
                                                                                       open source coverage tools.
select the appropriate one.
                                                                                       3. These have multiple report type format.
Keywords                                                                               4. These tools are for both open-source and commercial
Code coverage metrics, testing tools, test case, test suite                            development projects.
1. INTRODUCTION                                                                        EMMA Tool
  The levels of quality, maintainability, and stability of software                      We used EclEmma 2.1.0, a plug-in for Eclipse, which is our Java
can be improved and measured through the use of automated                              development environment. Emma distinguishes itself from other
tools throughout the software development process. In software                         tools by going after a unique feature combination: development
testing[5][6],software metrics enable the appropriate quantitative                     while keeping individual developer's work fast and iterative. Such
information, to support us in the decision-making on the most                          a tool is essential for detecting dead code and verifying which parts
efficient and appropriate testing tools for our programs.                              of an application are actually exercised by the test suite and
  The most mentioned metric for assessment in the software field                       interactive use. The main features of Emma, which represent its
are the Code Coverage metrics. These metrics are considered as                         advantages are: Emma can instrument classes for coverage either
the most important metric, often used in the analysis of software                      offline ( before they are loaded) or on the fly (using an
projects for the testing process.                                                      instrumenting application class loader); Supported coverage types:
                                                                                       class, method, line, basic block; Emma can detect
  Today we have available several tools that perform this
                                                                                       when a single source code line is covered only partially; Output
coverage analysis, but we will select the most appropriate tools,
                                                                                       report types: plain text, HTML, XML.
which are Java open-source code coverage tools like Emma and
CodeCover.                                                                             CodeCover Tool
  To conclude with, according to some criteria, that we will take                        CodeCover is an extensible open source code coverage tool. It
into consideration for the evaluation of this code coverage tools,                     provides several ways to increase test quality. It shows the quality
we will judge for the most efficient tool to be used by the software                   of test suite and helps to develop new test cases and rearrange test
testing team. These criteria are: Human-Interface Design (HID),                        cases to save some of them. So we get a higher quality and a better
Ease of Use (EU), Reporting Features (RF), Response Time (RT).                         test productivity. The main features of CodeCover are: Supports
  In Section 2 we will mention the coverage metrics [9] used in                        statement coverage, branch coverage, loop coverage and strict
our experiments; we will shortly explain the tools [8] we have                         condition coverage; Performs source instrumentation for the most
selected to perform the code coverage analysis for our tests;                          accurate coverage measurement; CLKI interface, for easy use from
describe briefly how JUnit framework is implemented in each of                         the command line; Ant interface, for easy integration into an
these tools [10] [11], since JUnit is our experimental                                 existing build process; Correlation Matrix to find redundant test
environment, where we program unit tests for our software and                          cases and optimize your test suite; The source code is highlighted
the last part of this section consists of selecting some criteria                      according to the measured date.
based on which we will then judge which of the tools is more                             The testing environment we used to project the set of tests for
effective to use in the testing process. In Section 3 we will                          our input programs was JUnit 3.
summarize the results of our experiments for each tool and                               We choose as input programs six sorting algorithms: Bubble
analyze them to bring us in the conclusion which of the tools is                       Sort, Selection Sort, Insertion Sort, Heap Sort, Merge Sort, Quick
more effective. In Section 4 we give the conclusions of our work.                      Sort. The main reason why we choose these algorithms is the
                                                                                       facility we face on computing the Cyclomatic Complexity (CC),
BCI’12, September 16–20, 2012, Novi Sad, Serbia.                                       which is crucial on defining the number of test cases needed to
Copyright © 2012 by the paper’s authors. Copying permitted only for private and        achieve a good coverage percentage of the program code. To
academic purposes. This volume is published and copyrighted by its editors.            proceed in the testing process for each of this sorting algorithm, we
Local Proceedings also appeared in ISBN 978-86-7031-200-5, Faculty of Sciences,
University of Novi Sad.                                                                first build Java programs for each of them.


                                                                                  72
  To achieve our goal we chose some criteria, based on which we             66.7 %. This result contradicts the result taken after the execution
will evaluate which testing tool is the most efficient. So we chose         of Emma tool on the same set of test cases, which is relatively
Human Interface Design (HID) as an indicator of the level of                high with an average of 87 % (Fig.1). This contradict, led us to
difficulty to learn the tool's procedures on purchase and the               increase the number of test cases for a higher quality of tests. For
likelihood of errors, in using the tool over a long period of time;         Quick Sort we built 4 more test cases (Fig.2), which produced a
Ease of Use (EU) to judge if the tool is easy to use to ensure              maximum result of 100 % code coverage with both tools.
timely, adequate, and continual integration into the software
development process; Reporting Features (RF) to show the degree
of variety regarding the formats that tools use to report their
coverage results; Response Time (RT) used to evaluate the tool's
performance with regards to response time. In addition to these
criteria, we will also evaluate the number and quality of test cases
to judge for the most appropriate tool for the software testing
process.

3. EXPERIMENTS AND ANALYSIS
  In this section we will summarize the experiments we have
performed on the selected algorithms. Initially, we built the Java             Figure 1: Emma Coverage report initially with three test
programs for each of our sorting algorithms. Then we designed                                  cases for QuickSort.
the set of testing units by using the JUnit testing framework [7] in
Java. Finally we performed the Analysis of Code Coverage, to
evaluate these tests through the selected code coverage tools. This
analysis calculates the coverage percentage, that serves as an
indirect measure of the quality of tests. Based on these
measurements, we can then create additional test cases [4][7]to
increase code coverage.
  In table 1 we summarized the quantitative information regarding
our experiments. In the last column we show the number of final
test cases we built for each of the Java programs of the sorting               Figure 2: CodeCoverage report finally with seven test cases
algorithms. We used the term "final test cases" because we                                         for QuickSort.
continuously improved our coverage results by increasing the
number of test cases, until the addition of another test case does
not anymore affect the coverage result, that means we have
achieved a high level of code coverage.

            Table 1: Experimental Program Details

   Input        LOC       NOC        NOM         CC       No.of
 Programs                                                TestCase
Bubble           53          2         3          4         11
Selection        55          2         3          4         11
                                                                                   Figure 3: Code Coverage report after execution of
Insertion        53          2         3          4          11
                                                                                 CodeCover initially with three test cases for QuickSort.
Heap             84          2         11        13          16
Merge            67          1         3         11           9
Quick            63          1         6         11           7
LOC-Lines of Code,NOM-Number of Methods,NOC-Number
          of Classes,CC-Cyclomatic Complexity

Based on these coverage results and also the computed criteria
chosen for evaluation, we performed the analysis process to define
the best tool.
In the figures below we see the coverage reports produced after             Figure 4: Code Coverage report after execution of CodeCover
the execution of Emma and CodeCover for two cases: 1) When                            finally with seven test cases for QuickSort.
we projected a small set of tests; 2) When we projected a larger
set of tests in order to improve quality of the testing process. To
show briefly the experimental procedure we followed to achieve              During our experiments, we noticed that this contradict, that
our objective, we will take as an example the experimental results          relates to the fact that for the same set of test cases the execution
for Quick Sort algorithm. For Quick Sort we initially projected             of Emma gives us a higher coverage tool than the result reported
only 3 test cases (Fig.1). The CodeCover tool produced low BC               from CodeCover, we concluded that CodeCover gives a more
(Branch Coverage) and LC (Loop Coverage) coverage metrics of                accurate information regarding the code coverage.


                                                                       73
In Section 2, we mentioned the Correlation Matrix as a way to             test case for each functional unit of the program, and to avoid
find redundant test cases, which does not increase the coverage           programming long test cases that try to cover a considerable part
percentage. It shows a kind of dependency relationship between            of the program.
test cases of the same input program. In JUnit3 testing framework,
dependency between tests is not supported, that is why we should
always try to avoid dependency between test cases. In the figure
below is shown the Correlation Matrix for Quick Sort.


                                                                                Figure 7: Code Coverage report after execution of
                                                                              CodeCover finally with eleven test cases for BubbleSort.


                                                                           We haven't showed Emma coverage report, because it is
                                                                           relatively high since the first case, where we projected only 7
                                                                           tests.
                                                                           The results gained for SelectionSort are 46.7% for LC metric in
                                                                           the case of 7 tests and 80 % in the final case of 11 test cases; for
                                                                           Insertion are 60% for LC metric in the first case and 86.7% for
                                                                           the final case. So far, we see that in general the most
                                                                           "problematic" coverage metric is the Loop Coverage metric.
                                                                           This happens mainly because of the for loop, that requires more
                                                                           test cases to be covered. This is shown in fig.10, where yellow
                                                                           signifies the partial coverage of the for loop.
Figure 5: The Correlation Matrix produced by CodeCover for
               QuickSort with seven test cases.


From the figure above, we see that blue squares (meaning that
there is 100 % dependency between test cases), exist only in the
case where the same number of test case intersect. So we can say
that we have proceeded according to the main rule of JUnit,that is
to avoid dependency between test cases.
Below we will show by figures the results of the Code Coverage
Analysis performed by Emma and CodeCover tools for the other
five input sorting programs.
For Bubble, Selection and Insertion Sort we initially projected 7
test cases, then in order to achieve a relatively high coverage we        Figure 8: A partial coverage of a for loop, crucial for the Lool
projected 11 test cases. The coverage result report produced by                              Covrage metric (80 %).
CodeCover for BubbleSort is shown below for both cases.
                                                                          For MergeSort we initially projected 4 test cases, which according
                                                                          to CodeCover produced a low LC indicator of 60 %,. Then we
                                                                          extended this set of test cases to 7test cases, gaining a new
                                                                          percentage of LC of 86.7 % ( the reason why it is not 100 % is
                                                                          because there are many loops in the program, not only the for
                                                                          loops, but also while).
Figure 6: Code Coverage report after execution of CodeCover               For Heap Sort we initially projected 8 test cases, giving a LC
         initially with seven test case for BubbleSort.                   metric of 33.3 % and a CC metric (Condition Coverage) of 80
                                                                          %.Then we improved this set of tests by extending it to 16 test
                                                                          cases, that improved considerably both the LC and CC metric to
From the figure above, we see a low percentage of 53.3 % for the          respectively : 88.9 % and 100 %.
LC (Loop Coverage) metric. That is why we finally projected 11
                                                                          Through the graph below we show the improvements we achieved
test cases to increase this low percentage as shown in the figure
                                                                          in our experiments until we gained a high code by showing the
below, where the new LC metric is 86.7 %, which is considered a
                                                                          initial result we gained when we projected a small set of test cases
high coverage percentage. By improving our experimental work
                                                                          and the final result after we increased the number of test cases for
on the testing process repeatedly we came into the conclusion that
                                                                          a higher coverage.
to achieve a high coverage percentage the secret is to project one


                                                                     74
                                                                           through the detailed coverage analysis for each program method,
                                                                           it allows us to define the unnecessary test cases, that does not
                                                                           increase coverage of the program, affecting so negatively the
                                                                           execution time of the test suite by decreasing it. We argued this
                                                                           conclusion by taking as an example QuickSort, where for an
                                                                           initial set of 3 test cases while Emma reported an average
                                                                           coverage of 87%, CodeCover reported a low Loop Coverage of
                                                                           66.7 %.The same fact was present in all our set of input sorting
                                                                           programs. So in order to project a successful testing process for
  Figure 9: The percentage of improvement in code coverage                 our input programs, we should base on CodeCover coverage
  achieved by increasing the number of test cases for the six              reports, to decide whether it is necessary to increase the number
                      sorting programs.                                    of test cases or not. During our experimental work, where we
                                                                           continuously improved the testing process, we came into the
                                                                           conclusion that the most problematic coverage metric is Loop
In table 2, we have summarized the results produced by Emma
                                                                           Coverage. This happens mainly because of the for loop, that
and CodeCover tools after performing the Code Coverage
                                                                           requires extra tests to be fully covered. So our coverage results
Analysis on each of the input programs (the sorting algorithms).
                                                                           for all our input programs reached a Loop Coverage metric in the
                                                                           range 46.7 % to 66.7%, which is considered very low. But not
Table 2: Analysis & Implementation of Emma and CodeCover                   only the Loop Coverage metric was responsible for low coverage
                Using Various Sort Programs                                percentages in the beginning of our work, but also the manner in
                                                                           which we projected our tests affects coverage result. So to
                                                                           achieve a high code coverage, we have to avoid programming
                                                                           long test cases that try to cover a considerable part of the
                                                                           program, but instead we must project one test case for each
                                                                           functional unit of the program. We arrive in the same conclusion
                                                                           if we see table 3, that shows the computed criteria chosen to
                                                                           completely evaluate the testing tools. From this table we infer that
                                                                           the CodeCover tool is easy to use, has a very good response time
                                                                           for every command given, has very good reporting features
                                                                           compared with Emma tool.
 SC-Statement Coverage, BLC-Block Coverage, BC-Branch
 Coverage, LC-Loop Coverage, MC-Method Coverage, CC                        5. REFERENCES
Condition Coverage, FC-File Coverage, CLC-Class Coverage                   [1] Lawrance, J., Clarke, S., Burnett, M., and G. Rothermel. 2005. How
                                                                           Well Do Professional Developers Test with Code Coverage
                                                                           Visualizations? An Empirical Study. In Proceedings of the IEEE
After analyzing the code coverage results produced after the               Symposium on Visual Languages and Human-Centric Computing
execution of Emma and CodeCover on the various sorting                     (September 2005).
programs, we concluded that CodeCover gives a more accurate                [2] Tikir, M. M., and Hollingsworth, J. K. 2002. Efficient instrumentation
coverage information than Emma. To complete the process of                 for code coverage testing. In Proceedings of the ACM SIGSOFT 2002
evaluating the effectiveness of these testing tools, we will show          International Symposium on Software Testing and Analysis (Rome, Italy,
in table 3 the computed criteria [4] [5] selected to evaluate these        July 22-24, 2002).
tools.                                                                     [3] Cornett, S. 1996-2011. Code Coverage Analysis. Bullseye Testing
                                                                           Technology.
               Table 3: Analysis of Tool Metrics
                                                                           [4] Beust, C., and Suleiman, H. 2007. Next Generation Java Testing:
                                                                           TestNg and Advanced Concepts. Addison Wesley, 1-21, 132-150.
                                                                           [5] Ammann, P., and Offutt, J. 2008. Introduction to Software Testing,
                                                                           Cembridge University Press, 268-277.
Based on these values (which we partially gained in their official         [6] Sommerville, I. 2007. Software Engineering ( 8th edition).Harlow:
websites, as they are open-source tools), we judged that the best          Addison Wesley, 537-565.
and more effective tool to be used during the software testing             [7] JUnit Best Practices-Java World,
process is CodeCover.                                                      http://www.javaworld.com/javaworld/jw-12-2000/jw-1221-junit.html

4. CONCLUSIONS                                                             [8] Prasad, K.V.K.K. 2006. Software testing tools.
Based on the results summarized in table 2, that shows achieved            [9] Durrani, Q. 2005.Role of Software Metrics in Software Engineering
code coverage metric reported from each tool, we conclude that             and Requirements Analysis. In Proceeding of IEEE ICICT First
CodeCover tool reports a more accurate coverage information                International Conference of Information and Communication
                                                                           Technologies. (August 27-28).
than Emma, which does not supply us with sufficient information,
based on which we can judge over the quality of tests, that is why         [10] EMMA: a free Java code coverage tool http://Emma.sourceforge.net
we suggest the use of the CodeCover tool. CodeCover is more                [11] CodeCover Tutorial
efficient to perform the Code Coverage Analysis, because                   http://www.codecoveragetools.com/code_coverage_java.html


                                                                      75

</pre>