=Paper=
{{Paper
|id=Vol-1771/paper7
|storemode=property
|title=Code Coverage Analysis of Combinatorial Testing
|pdfUrl=https://ceur-ws.org/Vol-1771/paper7.pdf
|volume=Vol-1771
|authors=Eun-Hye Choi,Osamu Mizuno,Yifan Hu
|dblpUrl=https://dblp.org/rec/conf/apsec/ChoiMH16
}}
==Code Coverage Analysis of Combinatorial Testing==
4th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2016)

Eun-Hye Choi*, Osamu Mizuno†, Yifan Hu†

*National Institute of Advanced Industrial Science and Technology (AIST), Ikeda, Japan. Email: e.choi@aist.go.jp

†Kyoto Institute of Technology, Kyoto, Japan. Email: o-mizuno@kit.ac.jp, y-hu@se.is.kit.ac.jp

Abstract—Combinatorial t-way testing with small t is known as an efficient black-box testing technique to detect parameter interaction failures. So far, several empirical studies have reported the effectiveness of t-way testing on fault detection. However, few studies have investigated the effectiveness of t-way testing on code coverage, which is one of the most important coverage criteria widely used for software testing. This paper presents a quantitative analysis to evaluate the code-coverage effectiveness of t-way testing. Using three open source utility programs, we compare t-way testing with exhaustive (all-combination) testing w.r.t. code coverage and test suite sizes.

Keywords—Combinatorial testing; t-way testing; Exhaustive testing; Code coverage; Line coverage; Branch coverage.

I. Introduction

Combinatorial testing [15], [20] is a common black-box testing technique to detect failures caused by parameter interactions. Modern software systems have many parameters, and thus their interactions are too numerous to be tested exhaustively. Combinatorial t-way testing [15], [20], where t is called an interaction strength, addresses this problem by testing all value combinations of t parameters with small t, instead of testing all parameter-value combinations exhaustively. t-way testing has been applied to, e.g., conformance testing for the DOM (Document Object Model) Events standard [19], rich web applications [18], commercial MP3 players [25], and a ticket gate system for transportation companies [14].

Kuhn et al. [16] investigated the fault detection effectiveness of t-way testing; their results showed that t-way testing with small interaction strength t (≤ 4) can efficiently detect most interaction failures while significantly reducing the number of test cases compared to exhaustive testing, which tests all parameter-value combinations. Other studies [2], [8], [25] also supported the result by Kuhn et al. [16].

On the other hand, as far as we know, only the work by Giannakopoulou et al. [10] has reported the effectiveness of t-way testing on code coverage. They compared code coverage between their model-checker-based exhaustive testing and 3-way testing on two program modules for a NASA air transportation system.

Code coverage, which measures what percentage of source code is executed by a test suite, has been considered one of the most important coverage metrics for software testing and is required by many industrial software development standards (e.g., [1]). Therefore, the code-coverage effectiveness of t-way testing would be of great interest to practitioners who consider applying t-way testing to their software testing.

Note that t-way testing is black-box testing; it is thus difficult for it to achieve 100% code coverage, and its code coverage depends on the system under test (SUT) model, e.g., the parameters and their values designed for t-way testing. Therefore, in order to evaluate the code-coverage effectiveness of t-way testing, we compare the code coverage obtained by t-way testing with that obtained by exhaustive testing, similarly to [10].

In order to quantitatively analyze the code-coverage effectiveness of t-way testing compared to exhaustive testing, we set up the following two research questions:

* RQ1: How high code coverage can t-way testing achieve compared to exhaustive testing? Can t-way testing obtain higher code coverage earlier than exhaustive testing? How large an interaction strength t is necessary for t-way testing to achieve code coverage close to that of exhaustive testing?
* RQ2: With the same number of test cases, how different are t-way testing and exhaustive testing on code coverage?

To evaluate the code-coverage effectiveness of t-way testing, RQ1 compares t-way testing and exhaustive testing in their original sizes, while RQ2 compares them at the same sizes.

To answer these research questions, we perform a case study that analyzes t-way test suites with 1 ≤ t ≤ 4 on two kinds of widely used code coverage: line coverage (i.e., statement coverage) and branch coverage (i.e., decision coverage). For the empirical case study, we use seventeen versions of three C program projects, flex, grep, and make, from the Software-artifact Infrastructure Repository (SIR) [7]. To prepare t-way test suites, we first construct SUT models with constraints from the test plans in Test Specification Language (TSL) [21] included in the repository. We next generate t-way test suites for the SUT models using two state-of-the-art t-way test generation tools, ACTS [3], [26] and PICT [6], [31]. We evaluate the code-coverage effectiveness of t-way testing by comparing the t-way test suites and the exhaustive test suites on line coverage and branch coverage together with test suite sizes.

Paper Organization: Section II-A explains combinatorial t-way testing and Section II-B describes related work on the effectiveness evaluation of t-way testing. Section III describes our experimental setting, and Section IV explains the experimental results, which answer the research questions. Section V concludes this paper.

TABLE I
An example SUT model.

  Parameter           Values
  Debug mode (= p1)   on, off
  Bypass use (= p2)   on, off
  Fast scanner (= p3) FastScan (= 1), FullScan (= 2), off

  Constraint: (Fast scanner = FullScan) → (Bypass use ≠ on)

TABLE II
An example of all possible pairs of parameter-values.

  Param. pair  Parameter-value pairs
  (p1, p2)     (on, on), (on, off), (off, on), (off, off)
  (p1, p3)     (on, 1), (on, 2), (on, off), (off, 1), (off, 2), (off, off)
  (p2, p3)     (on, 1), (on, off), (off, 1), (off, 2), (off, off)

TABLE III
A 2-way test suite T1.

  #  p1   p2   p3
  1  on   on   1
  2  on   off  2
  3  off  off  1
  4  off  on   off
  5  off  off  2
  6  on   off  off

TABLE IV
An exhaustive test suite T2.

  #   p1   p2   p3
  1   on   on   1
  2   on   on   off
  3   on   off  1
  4   on   off  2
  5   on   off  off
  6   off  on   1
  7   off  on   off
  8   off  off  1
  9   off  off  2
  10  off  off  off

II. Background and Related Work

A. Combinatorial t-way Testing

The System Under Test (SUT) for combinatorial testing is modeled by parameters, their associated values drawn from finite sets, and constraints between parameter-values. For instance, the SUT model shown in Table I has three parameters (p1, p2, p3); the first two parameters have two possible values each and the third has three. Constraints among parameter-values exist when some parameter-value combinations cannot occur. The example SUT has a constraint such that p2 ≠ on if p3 = 2, i.e., the combination of p2 = on and p3 = 2 (FullScan) is not allowed.

More formally, a model of an SUT is defined as follows:

Definition 1 (SUT model). An SUT model is a triple ⟨P, V, φ⟩, where
* P is a finite set of parameters p1, ..., p|P|,
* V is a family that assigns a finite value domain Vi to each parameter pi (1 ≤ i ≤ |P|), and
* φ is a constraint on parameter-value combinations.

A test case is a value assignment to the parameters that satisfies the SUT constraint. For example, the 3-tuple (p1 = on, p2 = on, p3 = 1) is a test case for our example SUT model. We call a sequence of test cases a test suite.

An exhaustive test suite (i.e., an all-combination test suite) is a sequence of all possible test cases, i.e., a test suite that covers all parameter-value combinations satisfying the SUT constraint. In general, exhaustive testing is impractical since it stipulates testing all possible test cases, and thus its size (the number of test cases) increases exponentially with the number of parameters.

Combinatorial t-way testing (e.g., pairwise testing when t = 2) alternatively stipulates testing all t-way parameter-value combinations satisfying the SUT constraint at least once. We call t the interaction strength. An exhaustive test suite corresponds to a t-way test suite with t = |P|.

Definition 2 (t-way test suite). Let ⟨P, V, φ⟩ be an SUT model. We say that a tuple of t (1 ≤ t ≤ |P|) parameter-values is possible iff it does not contradict the SUT constraint φ. A t-way test suite for the SUT model is a test suite that covers all possible t-tuples of parameter-values in the SUT model.

Example 1. Consider the SUT model in Table I and t = 2. There exist 15 possible t-tuples (pairs) of parameter-values, as shown in Table II. The test suite T1 in Table III is a 2-way (pairwise) test suite since it covers all the possible parameter-value pairs in Table II. T2 in Table IV is a 3-way test suite and corresponds to the exhaustive test suite, since the number of parameters in the example model is three.

Many algorithms to efficiently construct t-way test suites have been proposed. Approaches to generate t-way test suites for SUT models with constraints include greedy algorithms (e.g., AETG [5], PICT [6], [31], and ACTS [3], [26]), heuristic search (e.g., CASA [9], HHSA [12], and TCA [17]), and SAT-based approaches (e.g., Calot [23], [24]).

B. Related Work: Effectiveness Evaluation of t-way Testing

The effectiveness of t-way testing with small interaction strength t on fault detection has been reported by several empirical studies so far [13], [15], but the code coverage of t-way testing has not been studied well. Table V summarizes the effectiveness metrics studied in related work.

TABLE V
Related work.

  Metrics studied                    Code coverage  Fault detection
  Kuhn et al. (2004) [16]                           X
  Zhang et al. (2012) [25]                          X
  Petke et al. (2015) [22]                          X
  Henard et al. (2015) [11]                         X
  Choi et al. (2016) [4]                            X
  Giannakopoulou et al. (2011) [10]  X
  This paper                         X

Kuhn et al. [16] investigated parameter interactions inducing actual failures in four systems: software for medical devices, a browser, a server, and a database system. As a result, 29–68% of the faults involved a single parameter; 70–97% (89–99%) of the faults involved up to two (three) parameter interactions; 96–100% of the faults involved up to four and five parameter interactions; and no fault involved more than six parameters. From this result, the authors concluded that most failures are triggered by parameter interactions with small t (at most four to six), and thus t-way testing with 4 ≤ t ≤ 6 could provide the fault detection ability of "pseudo-exhaustive" testing.

Zhang et al. [25] also found that failures of actual commercial MP3 players are triggered by t-way parameter interactions with at most t = 4.

Petke et al. [22] more thoroughly studied the efficiency of early fault detection by t-way testing with 2 ≤ t ≤ 6. They used six projects, flex, make, grep, gzip, nanoxml, and siena, from the Software-artifact Infrastructure Repository (SIR) and showed the number of faults detected after 25%, 50%, 70%, and 100% of test cases are executed.

Henard et al. [11] used five projects, grep, sed, flex, make, and gzip, also from SIR, and compared the number of faults detected by test suite prioritization with t-way coverage (2 ≤ t ≤ 4) against other black-box and white-box prioritizations.

Choi et al. [4] used three projects, flex, grep, and make, from SIR and investigated the correlation of fault detection effectiveness with two evaluation metrics, called weight coverage and KL divergence, for prioritized t (= 2)-way testing.

To our knowledge, only the work by Giannakopoulou et al. [10] has reported code coverage of t-way testing. Their target system is a component of the Tactical Separation Assisted Flight Environment (TSAFE) of the Next Generation Air Transportation System (NextGen) by the NASA Ames Research Center. In their work, t (= 3)-way testing and their model-checker (JPF [30]) based exhaustive testing are compared w.r.t. code coverage: line coverage, branch coverage, loop coverage, and strict condition coverage, computed using CodeCover [28]. Giannakopoulou et al. reported that, for two program modules, the differences in code coverage between 3-way testing and exhaustive testing are 0–2% for the four coverage metrics they used, while the numbers of test cases are 6,047 for 3-way testing but 9.9 × 10^6 for exhaustive testing.

TABLE VI
Subject programs.

  Proj.  Ver.  Identifier  Year of release  LoC
  flex   v0    2.4.3       1993             10,163
         v1    2.4.7       1994             10,546
         v2    2.5.1       1995             13,000
         v3    2.5.2       1996             13,048
         v4    2.5.3       1996             13,142
         v5    2.5.4       1997             13,144
  grep   v0    2.0         1996             8,163
         v1    2.2         1998             11,945
         v2    2.3         1999             12,681
         v3    2.4         1999             12,780
         v4    2.4.1       2000             13,280
         v5    2.4.2       2000             13,275
  make   v0    3.75        1996             17,424
         v1    3.76.1      1997             18,525
         v2    3.77        1998             19,610
         v3    3.78.1      1999             20,401
         v4    3.79.1      2000             23,188

Parameters:
  ...
  Debug mode:    # -d
    Debug_on.
    Debug_off.
  Bypass use:    # -Cr
    Bypass_on.   [property Bypass]
    Bypass_off.
  Fast scanner:  # -f, -Cf
    FastScan.    [property FastScan]
    FullScan.    [if !Bypass][property FullScan]
    off.         [property f&Cfoff]
  ...

Fig. 1. A part of the test plan for flex in TSL.
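To make Definitions 1 and 2 and Example 1 concrete, the following sketch encodes the example SUT model of Table I and checks a candidate suite for pairwise coverage. The encoding (dictionaries, function names) is our own illustration, not an artifact of the paper.

```python
from itertools import combinations, product

# The example SUT model of Table I (this encoding is our own).
params = {"p1": ["on", "off"],        # Debug mode
          "p2": ["on", "off"],        # Bypass use
          "p3": ["1", "2", "off"]}    # Fast scanner: FastScan=1, FullScan=2, off

def satisfies(test):
    """SUT constraint of Table I: (Fast scanner = FullScan) -> (Bypass use != on)."""
    return not (test["p3"] == "2" and test["p2"] == "on")

def possible_tuples(t):
    """All possible t-tuples: those occurring in some constraint-satisfying test case."""
    names = sorted(params)
    tuples = set()
    for values in product(*(params[n] for n in names)):
        test = dict(zip(names, values))
        if satisfies(test):
            for subset in combinations(names, t):
                tuples.add(tuple((n, test[n]) for n in subset))
    return tuples

def is_t_way_suite(suite, t):
    """Definition 2: the suite must cover every possible t-tuple."""
    covered = set()
    for test in suite:
        for subset in combinations(sorted(test), t):
            covered.add(tuple((n, test[n]) for n in subset))
    return possible_tuples(t) <= covered

# The 2-way suite T1 of Table III covers all 15 pairs of Table II.
T1 = [{"p1": "on",  "p2": "on",  "p3": "1"},
      {"p1": "on",  "p2": "off", "p3": "2"},
      {"p1": "off", "p2": "off", "p3": "1"},
      {"p1": "off", "p2": "on",  "p3": "off"},
      {"p1": "off", "p2": "off", "p3": "2"},
      {"p1": "on",  "p2": "off", "p3": "off"}]
print(len(possible_tuples(2)), is_t_way_suite(T1, 2))  # 15 True
```

Here "possible" is computed as extendability to a full valid test case; for the single forbidden pair of this model that coincides with the non-contradiction condition of Definition 2.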
In this paper, we more thoroughly analyze the code-coverage effectiveness of t-way testing with 1 ≤ t ≤ 4 using three open source utility programs.

III. Experiments

A. Subject Programs

To investigate the code coverage of t-way testing, we use three open source C program projects, flex, grep, and make, from the Software-artifact Infrastructure Repository (SIR) [32]. flex is a lexical analyzer generator, grep is a program to search text matching regular expressions, and make is a program to control the compile and build process. These programs have been widely used to evaluate testing techniques in studies including [4], [11], [22]. Table VI shows, for each version of the programs we use, the version identifier, the year of release, and the lines of code (LoC) calculated using cloc [27].

B. Subject Test Suites

1) SUT Models: For each project, flex, grep, and make, we construct an SUT model for t-way testing whose parameters, values, and constraints are fully extracted from the test plan in TSL (Test Specification Language) included in SIR. For example, Figure 1 shows a part of the test plan in TSL for project flex. From this TSL specification, we construct the SUT model for flex whose parameters include Debug mode (= p1), Bypass use (= p2), and Fast scanner (= p3); p2 has two values including Bypass_on (= on); p3 has three values including FullScan (= 2); and the constraints include (p3 = 2) → (p2 ≠ on). Table I corresponds to the part of the SUT model for flex constructed from the part of the test plan in Figure 1.

Table VII shows the size of the SUT model constructed for each project. In the table, the size of parameter-values is expressed as k; g1^k1 g2^k2 ... gn^kn, which indicates that the number of parameters is k and, for each i, there are ki parameters that have gi values. The size of constraints is expressed as l; l1^h1 l2^h2 ... lm^hm, which indicates that the constraint is described in conjunctive normal form (CNF) with l Boolean variables, each representing an assignment of a value to a parameter, and, for each j, there are hj clauses with lj literals. For the example SUT model in Figure 1, the size of parameter-values is 3; 2^2 3^1 and the size of constraints is 2; 2^1.

TABLE VII
Constructed SUT models.

  Proj.  Parameter-values                              Constraints
  flex   29; 3^23 4^4 6^2                              97; 2712 221 242 2517 269
  grep   14; 2^4 3^1 4^3 5^1 6^1 9^1 11^1 13^1 20^1    87; 2433 327 48 75 161 241 271 281 3110
  make   22; 2^2 3^12 4^4 5^2 6^1 7^1                  79; 2526 211 221 231 243 257 269

TABLE VIII
Sizes and code coverage of exhaustive test suites.

  Proj.  Size        Line coverage               Branch coverage
               Avg.    Min.    Max.        Avg.    Min.    Max.
  flex   525   0.7968  0.7789  0.8312      0.8544  0.8151  0.9316
  grep   470   0.4961  0.4726  0.5900      0.4948  0.4746  0.5826
  make   793   0.4543  0.4234  0.4726      0.5373  0.5126  0.5494

Fig. 2. Ratio of code coverage of t-way testing (1 ≤ t ≤ 4) over that of exhaustive testing for all versions of projects.

Fig. 3. Example code coverage growths of t-way testing (1 ≤ t ≤ 4) and exhaustive testing for one version (v1) of grep.

2) Test Suites: We use t-way test suites with 1 ≤ t ≤ 4 generated by ACTS [3], [26] and PICT [6], [31] for our constructed SUT models with constraints. ACTS and PICT are state-of-the-art open source t-way test generation tools developed by NIST (National Institute of Standards and Technology) and Microsoft, respectively. For comparison, we also use exhaustive test suites, each of which contains all possible test cases. The exhaustive test suite for the test plan of each project is included in SIR.
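The coverage figures reported for a whole test suite aggregate per-test-case results: a line or branch counts as covered if at least one test case in the suite executes it, and cumulative coverage growths (as plotted in Fig. 3) come from the same union taken prefix by prefix. A minimal sketch of this aggregation, with hypothetical data standing in for parsed gcov output:

```python
# Hypothetical per-test-case data: the set of source line (or branch) ids
# each test case executes, e.g. as one would obtain from gcov after each run.
def suite_coverage(per_test, total):
    """Coverage of a whole suite: fraction of lines/branches hit by any test case."""
    return len(set().union(*per_test)) / total

def coverage_growth(per_test, total):
    """Cumulative coverage after each test case, in execution order."""
    seen, growth = set(), []
    for covered in per_test:
        seen |= covered
        growth.append(len(seen) / total)
    return growth

per_test = [{1, 2, 3}, {2, 4}, {1, 5}]  # three test cases, ten lines in total
print(suite_coverage(per_test, 10))     # 0.5
print(coverage_growth(per_test, 10))    # [0.3, 0.4, 0.5]
```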
3) Evaluation Metrics: To evaluate the code coverage of each test suite, we analyze the following two kinds of code coverage, computed using gcov [29]:

* Line coverage: the percentage of program lines executed.
* Branch coverage: the percentage of branches of conditional statements executed.

gcov is a source code analysis tool delivered as a standard utility with the GNU C/C++ compiler; it reports how many lines and branches are executed.

IV. Results

Table VIII shows the size, i.e., the number of test cases, and the code coverage (line coverage and branch coverage) of the exhaustive test suite for each project, while Table IX and Table X show those of the subject t-way test suites (1 ≤ t ≤ 4) generated by ACTS and PICT. Table IX shows the sizes of the subject t-way test suites together with their ratios over the sizes of the exhaustive test suites. Table X summarizes the line coverage and branch coverage of the subject t-way test suites for each project. In Table VIII and Table X, we show the average, minimum, and maximum values of code coverage over the versions of each project.

For example, for project flex, the sizes of the 2-way test suites (52 by ACTS and 51 by PICT) are less than 10% of the size of the exhaustive test suite (525), from Table VIII and Table IX. On the other hand, for flex, the line coverage and branch coverage of the 2-way test suites (the exhaustive test suite) are on average 0.7927 (0.7968) and 0.8522 (0.8544), from Table VIII and Table X.

TABLE IX
Sizes of t-way test suites (1 ≤ t ≤ 4): number of test cases (ratio over the number of exhaustive test cases).

  Proj.        1-way        2-way        3-way         4-way
  flex  ACTS   30 (5.71%)   52 (9.90%)   91 (17.33%)   155 (29.52%)
        PICT   30 (5.71%)   51 (9.71%)   90 (17.14%)   154 (29.33%)
  grep  ACTS   40 (8.51%)   76 (16.17%)  183 (38.94%)  328 (69.79%)
        PICT   40 (8.51%)   77 (16.38%)  180 (38.30%)  326 (69.36%)
  make  ACTS   27 (3.40%)   34 (4.29%)   44 (5.55%)    68 (8.58%)
        PICT   27 (3.40%)   34 (4.29%)   45 (5.67%)    69 (8.70%)

TABLE X
Code coverage of t-way test suites (1 ≤ t ≤ 4).

  Proj.       Line coverage                    Branch coverage
              1-way   2-way   3-way   4-way    1-way   2-way   3-way   4-way
  flex  Avg.  0.7683  0.7927  0.7934  0.7931   0.8407  0.8522  0.8522  0.8522
        Min.  0.7481  0.7755  0.7763  0.7755   0.8018  0.8136  0.8136  0.8136
        Max.  0.8145  0.8264  0.8267  0.8267   0.9225  0.9281  0.9281  0.9281
  grep  Avg.  0.4917  0.4959  0.4961  0.4961   0.4769  0.4880  0.4933  0.4948
        Min.  0.4668  0.4723  0.4726  0.4726   0.4549  0.4676  0.4712  0.4746
        Max.  0.5875  0.5897  0.5900  0.5900   0.5726  0.5786  0.5845  0.5826
  make  Avg.  0.4451  0.4539  0.4540  0.4540   0.5321  0.5364  0.5366  0.5366
        Min.  0.4168  0.4230  0.4230  0.4230   0.5053  0.5117  0.5117  0.5117
        Max.  0.4628  0.4724  0.4724  0.4726   0.5442  0.5484  0.5484  0.5484

A. RQ1: t-way testing vs. exhaustive testing

To compare the code coverage of t-way testing and exhaustive testing, we investigate the following metric:

  RCov(T_t, EX) = Cov(T_t) / Cov(EX),

which denotes the ratio of the code coverage of a t-way test suite T_t over that of the exhaustive test suite EX.

Table XI summarizes the values of RCov(T_t, EX) with 1 ≤ t ≤ 4 for line coverage and branch coverage for each project and for all projects. In the table, we also show the number of cases (versions) where RCov(T_t, EX) ≥ 99.5%, i.e., where t-way testing achieves more than 99.5% of the coverage obtained by exhaustive testing, over the number of all cases for each project.

TABLE XI
Comparison of code coverage between t-way testing (1 ≤ t ≤ 4) and exhaustive testing.

  Avg. of RCov(T_t, EX)         Line coverage                        Branch coverage
  (R = Cov(T_t)/Cov(EX))   1-way    2-way    3-way    4-way     1-way    2-way    3-way    4-way
  flex                     96.41%   99.49%   99.58%   99.54%    98.40%   99.74%   99.74%   99.74%
  grep                     99.11%   99.95%   100.00%  100.00%   96.32%   98.58%   99.67%   100.00%
  make                     97.97%   99.91%   99.93%   99.92%    99.03%   99.84%   99.88%   99.87%
  Avg.                     97.82%   99.77%   99.83%   99.81%    97.85%   99.36%   99.76%   99.87%

  # (R ≥ 99.5%) / # all cases
  flex                     0/12     8/12     8/12     8/12      0/12     12/12    12/12    12/12
  grep                     4/12     12/12    12/12    12/12     0/12     0/12     7/12     12/12
  make                     0/10     10/10    10/10    10/10     0/10     10/10    10/10    10/10
  Total                    4/34     30/34    30/34    30/34     0/34     22/34    29/34    34/34

Figure 2 presents box plots of RCov(T_t, EX) for all projects. Each box plot shows the mean (circle in the box), the median (thick horizontal line), the first/third quartiles (hinges), and the highest/lowest values within 1.5 × the inter-quartile range of the hinges (whiskers).

* How high code coverage can t-way testing achieve compared to exhaustive testing?

From Table XI and Figure 2, we can see that t-way testing with even small t achieves high values of RCov(T_t, EX), i.e., high ratios of code coverage relative to exhaustive testing. In our case study, 1-way (2-way) testing covers on average 97.82% (99.77%) of the line coverage and 97.85% (99.36%) of the branch coverage of exhaustive testing. With 3-way (4-way) testing, line coverage is on average 99.83% (99.81%) and branch coverage is on average 99.76% (99.87%) of the coverage of exhaustive testing.

* Can t-way testing obtain higher code coverage earlier than exhaustive testing?

Figure 3 shows example line coverage and branch coverage growths of the t-way test suites (1 ≤ t ≤ 4) and the exhaustive test suite for one version (v1) of project grep. (These coverage growths represent typical cases in our experimental results.) We can see that t-way testing with smaller t obtains higher code coverage earlier than exhaustive testing and than t-way testing with larger t. For the example case in Figure 3, to obtain 48% line coverage (46% branch coverage), 1-way, 2-way, 3-way, and 4-way testing respectively require 35, 42, 56, and 56 (36, 47, 71, and 71) test cases, while exhaustive testing requires 219 (265) test cases.

* How large an interaction strength t is necessary for t-way testing to achieve code coverage close to that of exhaustive testing?

Surprisingly, in our case study, all t-way test suites with 1 ≤ t ≤ 4 obtain more than 95% of the code coverage of the exhaustive test suites. In particular, for project grep, the 4-way test suites obtain the same line coverage and branch coverage as the exhaustive test suite. As described in Table XI, in 30 of all 34 cases, the 2-way, 3-way, and 4-way test suites achieve more than 99.5% of the line coverage of the exhaustive test suites. For branch coverage, in all cases, the 4-way test suites achieve more than 99.5% of the coverage of the exhaustive test suites.

From these results, t-way testing with small t (1 ≤ t ≤ 4) can efficiently obtain code coverage close to that of exhaustive testing while requiring far fewer test cases.

B. RQ2: t-way testing vs. exhaustive testing in the same sizes

To compare the code coverage of t-way testing and exhaustive testing at the same sizes, we investigate the following metric:

  RCov(T_t, EX*) = Cov(T_t) / Cov(EX*),

which denotes the ratio of the code coverage of a t-way test suite T_t over that of a subset, hereafter denoted EX*, of the exhaustive test suite EX whose size is the same as that of T_t. In our experiments, we constructed EX* 100 times by randomly selecting |T_t| test cases from the exhaustive test suite EX, and we use the average value of the code coverage over the 100 EX*.

Table XII summarizes the values of RCov(T_t, EX*) with 1 ≤ t ≤ 4 for line coverage and branch coverage for each project and for all projects. In the table, we also show the number of cases where RCov(T_t, EX*) ≥ 105%, i.e., where t-way testing achieves more than 105% of the coverage obtained by same-size exhaustive testing, over the number of all cases for each project. Figure 4 presents box plots of RCov(T_t, EX*) for all projects.

TABLE XII
Comparison of code coverage between t-way testing (1 ≤ t ≤ 4) and exhaustive testing with the same sizes.

  Avg. of RCov(T_t, EX*)        Line coverage                          Branch coverage
  (R* = Cov(T_t)/Cov(EX*))  1-way    2-way    3-way    4-way      1-way    2-way    3-way    4-way
  flex                      116.45%  117.26%  114.45%  111.04%    116.65%  116.12%  113.65%  110.57%
  grep                      109.93%  108.39%  105.36%  103.31%    111.26%  110.44%  107.36%  104.94%
  make                      106.63%  106.45%  106.11%  105.36%    105.99%  105.80%  105.51%  104.85%
  Avg.                      111.26%  110.95%  108.79%  106.64%    111.61%  111.08%  109.04%  106.90%

  # (R* ≥ 105%) / # all cases
  flex                      12/12    12/12    12/12    12/12      12/12    12/12    12/12    12/12
  grep                      12/12    12/12    2/12     2/12       12/12    12/12    2/12     2/12
  make                      10/10    10/10    10/10    6/10       10/10    10/10    10/10    2/10
  Total                     34/34    34/34    24/34    20/34      34/34    34/34    24/34    16/34

Fig. 4. Ratios of code coverage of t-way testing (1 ≤ t ≤ 4) over that of exhaustive testing with the same sizes for all versions of projects.

* With the same number of test cases, how different are t-way testing and exhaustive testing on code coverage?

From Table XII and Figure 4, we can see that t-way test suites with 1 ≤ t ≤ 4 achieve higher line coverage and branch coverage than exhaustive test suites of the same sizes. In particular, t-way testing with smaller t obtains higher values of RCov(T_t, EX*), i.e., higher ratios of code coverage relative to same-size exhaustive testing. As described in Table XII, in all cases, 1-way and 2-way testing achieve more than 105% of the code coverage of same-size exhaustive testing. For 3-way and 4-way testing, the numbers of cases that achieve more than 105% of the line (branch) coverage of same-size exhaustive testing are 24 and 20 (24 and 16) of all 34 cases.

From these results, t-way testing with smaller t can obtain higher code coverage than exhaustive testing with the same number of test cases.

V. Conclusion

This paper analyzed the code-coverage effectiveness of combinatorial t-way testing with small t. As a result of our empirical evaluation using a collection of open source utility programs, t-way testing with small t (1 ≤ t ≤ 4) efficiently covers more than 95% of the code coverage achieved by exhaustive testing, while requiring far fewer test cases. In addition, when compared at the same test suite sizes, t-way testing with smaller t obtains a higher ratio of code coverage over that of exhaustive testing.

In this paper, we evaluated two widely used code coverage metrics, line coverage and branch coverage. Further work includes evaluating other metrics such as loop coverage and condition coverage. Another direction is to investigate both the code-coverage effectiveness and the fault detection effectiveness of t-way testing and to analyze the relation between them on real software projects.

Acknowledgments

The authors would like to thank the anonymous referees for their helpful comments and suggestions to improve this paper. This work was partly supported by JSPS KAKENHI Grant Number 16K12415.

References

[1] International Organization for Standardization. ISO 26262: Road vehicles – Functional safety, November 2011.
[2] K. Z. Bell and M. A. Vouk. On effectiveness of pairwise methodology for testing network-centric software. In Proc. of the International Conference on Information and Communication Technology, pages 221–235. IEEE, 2005.
[3] M. N. Borazjany, L. Yu, Y. Lei, R. Kacker, and R. Kuhn. Combinatorial testing of ACTS: A case study. In Proc. of the 5th International Conference on Software Testing, Verification and Validation (ICST), pages 591–600. IEEE, 2012.
[4] E. Choi, S. Kawabata, O. Mizuno, C. Artho, and T. Kitamura. Test effectiveness evaluation of prioritized combinatorial testing: a case study. In Proc. of the International Conference on Software Quality, Reliability & Security (QRS), pages 61–68. IEEE, 2016.
[5] M. B. Cohen, M. B. Dwyer, and J. Shi. Constructing interaction test suites for highly-configurable systems in the presence of constraints: A greedy approach. IEEE Trans. Software Eng., 34(5):633–650, 2008.
[6] J. Czerwonka. Pairwise testing in the real world: Practical extensions to test-case scenarios. In Proc. of the 24th Pacific Northwest Software Quality Conference, pages 419–430, 2006.
[7] H. Do, S. Elbaum, and G. Rothermel. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering, 10(4):405–435, 2005.
[8] G. B. Finelli. NASA software failure characterization experiments. Reliability Engineering & System Safety, 32(1):155–169, 1991.
[9] B. J. Garvin, M. B. Cohen, and M. B. Dwyer. Evaluating improvements to a meta-heuristic search for constrained interaction testing. Empirical Software Engineering, 16(1):61–102, 2011.
[10] D. Giannakopoulou, D. Bushnell, J. Schumann, H. Erzberger, and K. Heere. Formal testing for separation assurance. Annals of Mathematics and Artificial Intelligence, 63(1):5–30, 2011.
[11] C. Henard, M. Papadakis, M. Harman, Y. Jia, and Y. Le Traon. Comparing white-box and black-box test prioritization. In Proc. of the 38th International Conference on Software Engineering (ICSE), pages 523–534. ACM, 2016.
[12] Y. Jia, M. B. Cohen, M. Harman, and J. Petke. Learning combinatorial interaction testing strategies using hyperheuristic search. In Proc. of the 37th International Conference on Software Engineering (ICSE), pages 540–550. IEEE/ACM, 2015.
[13] R. N. Kacker, D. R. Kuhn, Y. Lei, and J. F. Lawrence. Combinatorial testing for software: An adaptation of design of experiments. Measurement, 46(9):3745–3752, 2013.
[14] T. Kitamura, A. Yamada, G. Hatayama, C. Artho, E. Choi, N. T. B. Do, Y. Oiwa, and S. Sakuragi. Combinatorial testing for tree-structured test models with constraints. In Proc. of the International Conference on Software Quality, Reliability & Security (QRS), pages 141–150. IEEE, 2015.
[15] D. R. Kuhn, R. N. Kacker, and Y. Lei. Introduction to Combinatorial Testing. CRC Press, 2013.
[16] D. R. Kuhn, D. R. Wallace, and A. M. Gallo. Software fault interactions and implications for software testing. IEEE Trans. Software Eng., 30(6):418–421, 2004.
[17] J. Lin, C. Luo, S. Cai, K. Su, D. Hao, and L. Zhang. TCA: An efficient two-mode meta-heuristic algorithm for combinatorial test generation. In Proc. of the 30th International Conference on Automated Software Engineering (ASE), pages 494–505. ACM/IEEE, 2015.
[18] C. Maughan. Test case generation using combinatorial based coverage for rich web applications. PhD thesis, Utah State University, 2012.
[19] C. Montanez, D. R. Kuhn, M. Brady, R. M. Rivello, J. Reyes, and M. K. Powers. An application of combinatorial methods to conformance testing for document object model events. NISTIR-7773, 2010.
[20] C. Nie and H. Leung. A survey of combinatorial testing. ACM Computing Surveys, 43(2):11, 2011.
[21] T. J. Ostrand and M. J. Balcer. The category-partition method for specifying and generating functional tests. Commun. ACM, 31(6):676–686, 1988.
[22] J. Petke, M. Cohen, M. Harman, and S. Yoo. Practical combinatorial interaction testing: Empirical findings on efficiency and early fault detection. IEEE Trans. Software Eng., 41(9):901–924, 2015.
[23] A. Yamada, A. Biere, C. Artho, T. Kitamura, and E. Choi. Greedy combinatorial test case generation using unsatisfiable cores. In Proc. of the 31st International Conference on Automated Software Engineering (ASE), pages 614–624. IEEE/ACM, 2016.
[24] A. Yamada, T. Kitamura, C. Artho, E. Choi, Y. Oiwa, and A. Biere. Optimization of combinatorial testing by incremental SAT solving. In Proc. of the 8th International Conference on Software Testing, Verification and Validation (ICST), pages 1–10. IEEE, 2015.
[25] Z. Zhang, X. Liu, and J. Zhang. Combinatorial testing on ID3v2 tags of MP3 files. In Proc. of the 5th International Conference on Software Testing, Verification and Validation (ICST), pages 587–590. IEEE, 2012.
[26] ACTS. Available: http://csrc.nist.gov/groups/SNS/acts/
[27] cloc – Count Lines of Code. Available: http://cloc.sourceforge.net
[28] CodeCover – an open-source glass-box testing tool. Available: http://codecover.org
[29] gcov – a test coverage program. Available: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
[30] Java PathFinder. Available: http://babelfish.arc.nasa.gov/trac/jpf
[31] PICT – Pairwise Independent Combinatorial Tool. Available: http://github.com/Microsoft/pict
[32] Software-artifact Infrastructure Repository (SIR). Available: http://sir.unl.edu/
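As an illustration of the same-size comparison described in Section IV-B, the following sketch constructs EX* by sampling |T_t| test cases from the exhaustive suite 100 times and averages the resulting coverage, as the paper does. The data representation and names are our own; the hypothetical per-test coverage sets stand in for real gcov results.

```python
import random

def rcov_same_size(cov_t, ex_per_test, size, total, trials=100, seed=0):
    """Average RCov(T_t, EX*): sample `size` tests from the exhaustive suite
    `trials` times, average the coverage of these EX*, and take the ratio
    Cov(T_t) / avg Cov(EX*)."""
    rng = random.Random(seed)
    covs = []
    for _ in range(trials):
        subset = rng.sample(ex_per_test, size)            # one random EX*
        covs.append(len(set().union(*subset)) / total)    # its coverage
    return cov_t / (sum(covs) / trials)

# Degenerate check: every exhaustive test covers the same two of four lines,
# so each EX* has coverage 0.5 and a suite with coverage 0.75 scores 1.5.
print(rcov_same_size(0.75, [{1, 2}] * 5, size=2, total=4))  # 1.5
```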