=Paper=
{{Paper
|id=None
|storemode=property
|title=An Evaluation of Java Code Coverage Testing Tools
|pdfUrl=https://ceur-ws.org/Vol-920/p72-kajo-mece.pdf
|volume=Vol-920
|dblpUrl=https://dblp.org/rec/conf/bci/Kajo-MeceT12
}}
==An Evaluation of Java Code Coverage Testing Tools==
An Evaluation of Java Code Coverage Testing Tools Elinda Kajo-Mece Megi Tartari Faculty of Information Technology Faculty of Information Technology Polytechnic University of Tirana Polytechnic University of Tirana ekajo@fti.edu.al mtartari@fti.edu.al ABSTRACT 2. SELECTED TOOLS AND EVALUATION Code coverage metric is considered as the most important metric CRITERIA used in analysis of software projects for testing. Code coverage analysis also helps in the testing process by finding areas of a Among various automated testing tools [8], we have selected program not exercised by a set of test cases, creating additional two tools to perform the Code Coverage Analysis [1][2][3], as a test cases to increase coverage, and determine the quantitative manner to evaluate the efficiency of our tests we created in the measure of the code, which is an indirect measure of quality. JUnit framework [4][7]. In this paragraph we will summarize There are a large number of automated tools to find the coverage briefly the main features of these to: EMMA and CodeCover of test cases in Java. Choosing an appropriate tool for the coverage tools. The main reasons for which we choose them are: application to be tested may be a complicated process. To make it 1. These tools are 100 % open-source. ease we propose an approach for measuring characteristics of 2. These tools have a large market share compared with the other these testing tools in order to evaluate them systematically and to open source coverage tools. select the appropriate one. 3. These have multiple report type format. Keywords 4. These tools are for both open-source and commercial Code coverage metrics, testing tools, test case, test suite development projects. 1. INTRODUCTION EMMA Tool The levels of quality, maintainability, and stability of software We used EclEmma 2.1.0, a plug-in for Eclipse, which is our Java can be improved and measured through the use of automated development environment. Emma distinguishes itself from other tools throughout the software development process. In software tools by going after a unique feature combination: development testing[5][6],software metrics enable the appropriate quantitative while keeping individual developer's work fast and iterative. Such information, to support us in the decision-making on the most a tool is essential for detecting dead code and verifying which parts efficient and appropriate testing tools for our programs. of an application are actually exercised by the test suite and The most mentioned metric for assessment in the software field interactive use. The main features of Emma, which represent its are the Code Coverage metrics. These metrics are considered as advantages are: Emma can instrument classes for coverage either the most important metric, often used in the analysis of software offline ( before they are loaded) or on the fly (using an projects for the testing process. instrumenting application class loader); Supported coverage types: class, method, line, basic block; Emma can detect Today we have available several tools that perform this when a single source code line is covered only partially; Output coverage analysis, but we will select the most appropriate tools, report types: plain text, HTML, XML. which are Java open-source code coverage tools like Emma and CodeCover. CodeCover Tool To conclude with, according to some criteria, that we will take CodeCover is an extensible open source code coverage tool. It into consideration for the evaluation of this code coverage tools, provides several ways to increase test quality. It shows the quality we will judge for the most efficient tool to be used by the software of test suite and helps to develop new test cases and rearrange test testing team. These criteria are: Human-Interface Design (HID), cases to save some of them. So we get a higher quality and a better Ease of Use (EU), Reporting Features (RF), Response Time (RT). test productivity. The main features of CodeCover are: Supports In Section 2 we will mention the coverage metrics [9] used in statement coverage, branch coverage, loop coverage and strict our experiments; we will shortly explain the tools [8] we have condition coverage; Performs source instrumentation for the most selected to perform the code coverage analysis for our tests; accurate coverage measurement; CLKI interface, for easy use from describe briefly how JUnit framework is implemented in each of the command line; Ant interface, for easy integration into an these tools [10] [11], since JUnit is our experimental existing build process; Correlation Matrix to find redundant test environment, where we program unit tests for our software and cases and optimize your test suite; The source code is highlighted the last part of this section consists of selecting some criteria according to the measured date. based on which we will then judge which of the tools is more The testing environment we used to project the set of tests for effective to use in the testing process. In Section 3 we will our input programs was JUnit 3. summarize the results of our experiments for each tool and We choose as input programs six sorting algorithms: Bubble analyze them to bring us in the conclusion which of the tools is Sort, Selection Sort, Insertion Sort, Heap Sort, Merge Sort, Quick more effective. In Section 4 we give the conclusions of our work. Sort. The main reason why we choose these algorithms is the facility we face on computing the Cyclomatic Complexity (CC), BCI’12, September 16–20, 2012, Novi Sad, Serbia. which is crucial on defining the number of test cases needed to Copyright © 2012 by the paper’s authors. Copying permitted only for private and achieve a good coverage percentage of the program code. To academic purposes. This volume is published and copyrighted by its editors. proceed in the testing process for each of this sorting algorithm, we Local Proceedings also appeared in ISBN 978-86-7031-200-5, Faculty of Sciences, University of Novi Sad. first build Java programs for each of them. 72 To achieve our goal we chose some criteria, based on which we 66.7 %. This result contradicts the result taken after the execution will evaluate which testing tool is the most efficient. So we chose of Emma tool on the same set of test cases, which is relatively Human Interface Design (HID) as an indicator of the level of high with an average of 87 % (Fig.1). This contradict, led us to difficulty to learn the tool's procedures on purchase and the increase the number of test cases for a higher quality of tests. For likelihood of errors, in using the tool over a long period of time; Quick Sort we built 4 more test cases (Fig.2), which produced a Ease of Use (EU) to judge if the tool is easy to use to ensure maximum result of 100 % code coverage with both tools. timely, adequate, and continual integration into the software development process; Reporting Features (RF) to show the degree of variety regarding the formats that tools use to report their coverage results; Response Time (RT) used to evaluate the tool's performance with regards to response time. In addition to these criteria, we will also evaluate the number and quality of test cases to judge for the most appropriate tool for the software testing process. 3. EXPERIMENTS AND ANALYSIS In this section we will summarize the experiments we have performed on the selected algorithms. Initially, we built the Java Figure 1: Emma Coverage report initially with three test programs for each of our sorting algorithms. Then we designed cases for QuickSort. the set of testing units by using the JUnit testing framework [7] in Java. Finally we performed the Analysis of Code Coverage, to evaluate these tests through the selected code coverage tools. This analysis calculates the coverage percentage, that serves as an indirect measure of the quality of tests. Based on these measurements, we can then create additional test cases [4][7]to increase code coverage. In table 1 we summarized the quantitative information regarding our experiments. In the last column we show the number of final test cases we built for each of the Java programs of the sorting Figure 2: CodeCoverage report finally with seven test cases algorithms. We used the term "final test cases" because we for QuickSort. continuously improved our coverage results by increasing the number of test cases, until the addition of another test case does not anymore affect the coverage result, that means we have achieved a high level of code coverage. Table 1: Experimental Program Details Input LOC NOC NOM CC No.of Programs TestCase Bubble 53 2 3 4 11 Selection 55 2 3 4 11 Figure 3: Code Coverage report after execution of Insertion 53 2 3 4 11 CodeCover initially with three test cases for QuickSort. Heap 84 2 11 13 16 Merge 67 1 3 11 9 Quick 63 1 6 11 7 LOC-Lines of Code,NOM-Number of Methods,NOC-Number of Classes,CC-Cyclomatic Complexity Based on these coverage results and also the computed criteria chosen for evaluation, we performed the analysis process to define the best tool. In the figures below we see the coverage reports produced after Figure 4: Code Coverage report after execution of CodeCover the execution of Emma and CodeCover for two cases: 1) When finally with seven test cases for QuickSort. we projected a small set of tests; 2) When we projected a larger set of tests in order to improve quality of the testing process. To show briefly the experimental procedure we followed to achieve During our experiments, we noticed that this contradict, that our objective, we will take as an example the experimental results relates to the fact that for the same set of test cases the execution for Quick Sort algorithm. For Quick Sort we initially projected of Emma gives us a higher coverage tool than the result reported only 3 test cases (Fig.1). The CodeCover tool produced low BC from CodeCover, we concluded that CodeCover gives a more (Branch Coverage) and LC (Loop Coverage) coverage metrics of accurate information regarding the code coverage. 73 In Section 2, we mentioned the Correlation Matrix as a way to test case for each functional unit of the program, and to avoid find redundant test cases, which does not increase the coverage programming long test cases that try to cover a considerable part percentage. It shows a kind of dependency relationship between of the program. test cases of the same input program. In JUnit3 testing framework, dependency between tests is not supported, that is why we should always try to avoid dependency between test cases. In the figure below is shown the Correlation Matrix for Quick Sort. Figure 7: Code Coverage report after execution of CodeCover finally with eleven test cases for BubbleSort. We haven't showed Emma coverage report, because it is relatively high since the first case, where we projected only 7 tests. The results gained for SelectionSort are 46.7% for LC metric in the case of 7 tests and 80 % in the final case of 11 test cases; for Insertion are 60% for LC metric in the first case and 86.7% for the final case. So far, we see that in general the most "problematic" coverage metric is the Loop Coverage metric. This happens mainly because of the for loop, that requires more test cases to be covered. This is shown in fig.10, where yellow signifies the partial coverage of the for loop. Figure 5: The Correlation Matrix produced by CodeCover for QuickSort with seven test cases. From the figure above, we see that blue squares (meaning that there is 100 % dependency between test cases), exist only in the case where the same number of test case intersect. So we can say that we have proceeded according to the main rule of JUnit,that is to avoid dependency between test cases. Below we will show by figures the results of the Code Coverage Analysis performed by Emma and CodeCover tools for the other five input sorting programs. For Bubble, Selection and Insertion Sort we initially projected 7 test cases, then in order to achieve a relatively high coverage we Figure 8: A partial coverage of a for loop, crucial for the Lool projected 11 test cases. The coverage result report produced by Covrage metric (80 %). CodeCover for BubbleSort is shown below for both cases. For MergeSort we initially projected 4 test cases, which according to CodeCover produced a low LC indicator of 60 %,. Then we extended this set of test cases to 7test cases, gaining a new percentage of LC of 86.7 % ( the reason why it is not 100 % is because there are many loops in the program, not only the for loops, but also while). Figure 6: Code Coverage report after execution of CodeCover For Heap Sort we initially projected 8 test cases, giving a LC initially with seven test case for BubbleSort. metric of 33.3 % and a CC metric (Condition Coverage) of 80 %.Then we improved this set of tests by extending it to 16 test cases, that improved considerably both the LC and CC metric to From the figure above, we see a low percentage of 53.3 % for the respectively : 88.9 % and 100 %. LC (Loop Coverage) metric. That is why we finally projected 11 Through the graph below we show the improvements we achieved test cases to increase this low percentage as shown in the figure in our experiments until we gained a high code by showing the below, where the new LC metric is 86.7 %, which is considered a initial result we gained when we projected a small set of test cases high coverage percentage. By improving our experimental work and the final result after we increased the number of test cases for on the testing process repeatedly we came into the conclusion that a higher coverage. to achieve a high coverage percentage the secret is to project one 74 through the detailed coverage analysis for each program method, it allows us to define the unnecessary test cases, that does not increase coverage of the program, affecting so negatively the execution time of the test suite by decreasing it. We argued this conclusion by taking as an example QuickSort, where for an initial set of 3 test cases while Emma reported an average coverage of 87%, CodeCover reported a low Loop Coverage of 66.7 %.The same fact was present in all our set of input sorting programs. So in order to project a successful testing process for Figure 9: The percentage of improvement in code coverage our input programs, we should base on CodeCover coverage achieved by increasing the number of test cases for the six reports, to decide whether it is necessary to increase the number sorting programs. of test cases or not. During our experimental work, where we continuously improved the testing process, we came into the conclusion that the most problematic coverage metric is Loop In table 2, we have summarized the results produced by Emma Coverage. This happens mainly because of the for loop, that and CodeCover tools after performing the Code Coverage requires extra tests to be fully covered. So our coverage results Analysis on each of the input programs (the sorting algorithms). for all our input programs reached a Loop Coverage metric in the range 46.7 % to 66.7%, which is considered very low. But not Table 2: Analysis & Implementation of Emma and CodeCover only the Loop Coverage metric was responsible for low coverage Using Various Sort Programs percentages in the beginning of our work, but also the manner in which we projected our tests affects coverage result. So to achieve a high code coverage, we have to avoid programming long test cases that try to cover a considerable part of the program, but instead we must project one test case for each functional unit of the program. We arrive in the same conclusion if we see table 3, that shows the computed criteria chosen to completely evaluate the testing tools. From this table we infer that the CodeCover tool is easy to use, has a very good response time for every command given, has very good reporting features compared with Emma tool. SC-Statement Coverage, BLC-Block Coverage, BC-Branch Coverage, LC-Loop Coverage, MC-Method Coverage, CC 5. REFERENCES Condition Coverage, FC-File Coverage, CLC-Class Coverage [1] Lawrance, J., Clarke, S., Burnett, M., and G. Rothermel. 2005. How Well Do Professional Developers Test with Code Coverage Visualizations? An Empirical Study. In Proceedings of the IEEE After analyzing the code coverage results produced after the Symposium on Visual Languages and Human-Centric Computing execution of Emma and CodeCover on the various sorting (September 2005). programs, we concluded that CodeCover gives a more accurate [2] Tikir, M. M., and Hollingsworth, J. K. 2002. Efficient instrumentation coverage information than Emma. To complete the process of for code coverage testing. In Proceedings of the ACM SIGSOFT 2002 evaluating the effectiveness of these testing tools, we will show International Symposium on Software Testing and Analysis (Rome, Italy, in table 3 the computed criteria [4] [5] selected to evaluate these July 22-24, 2002). tools. [3] Cornett, S. 1996-2011. Code Coverage Analysis. Bullseye Testing Technology. Table 3: Analysis of Tool Metrics [4] Beust, C., and Suleiman, H. 2007. Next Generation Java Testing: TestNg and Advanced Concepts. Addison Wesley, 1-21, 132-150. [5] Ammann, P., and Offutt, J. 2008. Introduction to Software Testing, Cembridge University Press, 268-277. Based on these values (which we partially gained in their official [6] Sommerville, I. 2007. Software Engineering ( 8th edition).Harlow: websites, as they are open-source tools), we judged that the best Addison Wesley, 537-565. and more effective tool to be used during the software testing [7] JUnit Best Practices-Java World, process is CodeCover. http://www.javaworld.com/javaworld/jw-12-2000/jw-1221-junit.html 4. CONCLUSIONS [8] Prasad, K.V.K.K. 2006. Software testing tools. Based on the results summarized in table 2, that shows achieved [9] Durrani, Q. 2005.Role of Software Metrics in Software Engineering code coverage metric reported from each tool, we conclude that and Requirements Analysis. In Proceeding of IEEE ICICT First CodeCover tool reports a more accurate coverage information International Conference of Information and Communication Technologies. (August 27-28). than Emma, which does not supply us with sufficient information, based on which we can judge over the quality of tests, that is why [10] EMMA: a free Java code coverage tool http://Emma.sourceforge.net we suggest the use of the CodeCover tool. CodeCover is more [11] CodeCover Tutorial efficient to perform the Code Coverage Analysis, because http://www.codecoveragetools.com/code_coverage_java.html 75