



  Code Coverage Analysis of Combinatorial Testing
Eun-Hye Choi∗, Osamu Mizuno†, Yifan Hu†
∗ National Institute of Advanced Industrial Science and Technology (AIST), Ikeda, Japan
Email: e.choi@aist.go.jp
† Kyoto Institute of Technology, Kyoto, Japan
Email: o-mizuno@kit.ac.jp, y-hu@se.is.kit.ac.jp


Abstract—Combinatorial t-way testing with small t is known as an efficient black-box testing technique to detect parameter interaction failures. So far, several empirical studies have reported the effectiveness of t-way testing on fault detection. However, few studies have investigated the effectiveness of t-way testing on code coverage, which is one of the most important coverage criteria widely used for software testing. This paper presents a quantitative analysis to evaluate the code-coverage effectiveness of t-way testing. Using three open source utility programs, we compare t-way testing with exhaustive (all-combination) testing w.r.t. code coverage and test suite sizes.

Keywords-Combinatorial testing; t-way testing; Exhaustive testing; Code coverage; Line coverage; Branch coverage.

I. Introduction

Combinatorial testing [15], [20] is a common black-box testing technique to detect failures caused by parameter interactions. Modern software systems have many parameters, and thus their interactions are too numerous to be tested exhaustively. Combinatorial t-way testing [15], [20], where t is called an interaction strength, addresses this problem by testing all value combinations of t parameters with small t, instead of testing all parameter-value combinations exhaustively. t-way testing has been applied to, e.g., conformance testing for the DOM (Document Object Model) Events standard [19], rich web applications [18], commercial MP3 players [25], and a ticket gate system for transportation companies [14].

Kuhn et al. [16] investigated the fault detection effectiveness of t-way testing; their results showed that t-way testing with a small interaction strength t (≤ 4) can efficiently detect most interaction failures while significantly reducing the number of test cases compared to exhaustive testing, which tests all parameter-value combinations. Other studies [2], [8], [25] also supported the result by Kuhn et al. [16].

On the other hand, as far as we know, only the work by Giannakopoulou et al. [10] has reported the effectiveness of t-way testing on code coverage. They compared code coverage between their model-checker-based exhaustive testing and 3-way testing for two program modules of a NASA air transportation system.

Code coverage, which measures what percentage of source code is executed by a test suite, has been considered one of the most important coverage metrics for software testing and is required by many industrial software development standards (e.g., [1]). Therefore, the code coverage effectiveness of t-way testing would be of great interest to practitioners who consider applying t-way testing to their software.

Note that t-way testing is a black-box technique, so it is difficult to achieve 100% code coverage with it, and its code coverage depends on the system under test (SUT) model, e.g., the parameters and their values, designed for t-way testing. Therefore, in order to evaluate the code coverage effectiveness of t-way testing, we compare the code coverage obtained by t-way testing with that obtained by exhaustive testing, similarly to [10].

In order to quantitatively analyze the code coverage effectiveness of t-way testing compared to exhaustive testing, we set up the following two research questions:

• RQ1: How high code coverage can t-way testing achieve compared to exhaustive testing? Can t-way testing obtain higher code coverage earlier than exhaustive testing? How large an interaction strength t is necessary for t-way testing to achieve code coverage close to that of exhaustive testing?
• RQ2: With the same number of test cases, how different are t-way testing and exhaustive testing on code coverage?

For evaluating the code coverage effectiveness of t-way testing, RQ1 compares t-way testing and exhaustive testing in their original sizes, while RQ2 compares them in the same sizes.

To answer the above research questions, we perform a case study that analyzes t-way test suites with 1 ≤ t ≤ 4 on two kinds of widely used code coverage: line coverage (i.e., statement coverage) and branch coverage (i.e., decision coverage). For the empirical case study, we use seventeen versions of three C program projects, flex, grep, and make, from the Software-artifact Infrastructure Repository (SIR) [7]. To prepare t-way test suites, we first construct SUT models with constraints from the test plans in the Test Specification Language (TSL) [21] included in the repository. We next generate t-way test suites for the SUT models using two state-of-the-art t-way test generation tools, ACTS [3], [26] and PICT [6], [31]. We then evaluate the code coverage effectiveness of t-way testing by comparing the t-way test suites and the exhaustive test suites with respect to line coverage, branch coverage, and test suite sizes.

Paper Organization: Section II-A explains combinatorial t-way testing and Section II-B describes related work on the effectiveness evaluation of t-way testing. Section III describes our experimental setting, and Section IV explains the experimental results, which answer the research questions. Section V concludes this paper.








TABLE I
An example SUT model.

  Parameter              Values
  Debug mode (= p1)      on, off
  Bypass use (= p2)      on, off
  Fast scanner (= p3)    FastScan (= 1), FullScan (= 2), off

  Constraint: (Fast scanner = FullScan) → (Bypass use ≠ on)

TABLE II
An example of all possible pairs of parameter-values.

  Param. pairs   Parameter-value pairs
  (p1, p2)       (on, on), (on, off), (off, on), (off, off)
  (p1, p3)       (on, 1), (on, 2), (on, off), (off, 1), (off, 2), (off, off)
  (p2, p3)       (on, 1), (on, off), (off, 1), (off, 2), (off, off)

TABLE III
A 2-way test suite T1.

        p1    p2    p3
  1     on    on    1
  2     on    off   off
  3     off   off   1
  4     off   on    off
  5     off   off   2
  6     on    off   2

TABLE IV
An exhaustive test suite T2.

        p1    p2    p3
  1     on    on    1
  2     on    on    off
  3     on    off   1
  4     on    off   2
  5     on    off   off
  6     off   on    1
  7     off   on    off
  8     off   off   1
  9     off   off   2
  10    off   off   off

TABLE V
Related work.

                                        Metrics studied
                                        Code coverage   Fault detection
  Kuhn et al. (2004) [16]                                      X
  Zhang et al. (2012) [25]                                     X
  Petke et al. (2015) [22]                                     X
  Henard et al. (2015) [11]                                    X
  Choi et al. (2016) [4]                                       X
  Giannakopoulou et al. (2011) [10]           X
  This paper                                  X

II. Background and Related Work

A. Combinatorial t-way Testing

The system under test (SUT) for combinatorial testing is modeled by parameters, their associated values from finite sets, and constraints between parameter-values. For instance, the SUT model shown in Table I has three parameters (p1, p2, p3); the first two parameters have two possible values and the other has three. Constraints among parameter-values exist when some parameter-value combinations cannot occur. The example SUT has a constraint that p2 ≠ on if p3 = 2, i.e., the combination of p2 = on and p3 = 2 is not allowed.

More formally, a model of an SUT is defined as follows:

Definition 1 (SUT model). An SUT model is a triple ⟨P, V, φ⟩, where
• P is a finite set of parameters p1, ..., p|P|,
• V is a family that assigns a finite value domain Vi to each parameter pi (1 ≤ i ≤ |P|), and
• φ is a constraint on parameter-value combinations.

A test case is a value assignment to the parameters that satisfies the SUT constraint. For example, the 3-tuple (p1 = on, p2 = on, p3 = 1) is a test case for our example SUT model. We call a sequence of test cases a test suite.

An exhaustive test suite (i.e., an all-combination test suite) is a sequence of all possible test cases, i.e., a test suite that covers all parameter-value combinations satisfying the SUT constraint. In general, exhaustive testing is impractical since it stipulates testing all possible test cases, and thus its size (the number of test cases) increases exponentially with the number of parameters.

Combinatorial t-way testing (e.g., pairwise testing when t = 2) alternatively stipulates testing all t-way parameter-value combinations satisfying the SUT constraint at least once. We call t an interaction strength. An exhaustive test suite corresponds to a t-way test suite with t = |P|.

Definition 2 (t-way test suite). Let ⟨P, V, φ⟩ be an SUT model. We say that a tuple of t (1 ≤ t ≤ |P|) parameter-values is possible iff it does not contradict the SUT constraint φ. A t-way test suite for the SUT model is a test suite that covers all possible t-tuples of parameter-values in the SUT model.

Example 1. Consider the SUT model in Table I and t = 2. There exist 15 possible t-tuples (pairs) of parameter-values, as shown in Table II. The test suite T1 in Table III is a 2-way (pairwise) test suite since it covers all the possible parameter-value pairs in Table II. T2 in Table IV is a 3-way test suite and corresponds to the exhaustive test suite since the number of parameters in the example model is three.
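To make Definition 2 and Example 1 concrete, the following small Python sketch (illustrative only, not part of the original study; the dictionary-based encoding of parameters and test cases is our own choice) enumerates the possible pairs of the Table I model and checks that T1 covers all of them:

    from itertools import combinations, product

    # SUT model of Table I: parameters, value domains, and the constraint phi.
    params = {"p1": ["on", "off"],        # Debug mode
              "p2": ["on", "off"],        # Bypass use
              "p3": ["1", "2", "off"]}    # Fast scanner: FastScan (1), FullScan (2), off

    def satisfies(assignment):
        # Constraint: (Fast scanner = FullScan) -> (Bypass use != on), i.e. p3 = 2 -> p2 != on.
        return not (assignment.get("p3") == "2" and assignment.get("p2") == "on")

    def possible_tuples(t):
        # All t-tuples of parameter-values that do not contradict the constraint (Definition 2).
        tuples = set()
        for ps in combinations(sorted(params), t):
            for vs in product(*(params[p] for p in ps)):
                if satisfies(dict(zip(ps, vs))):
                    tuples.add(tuple(zip(ps, vs)))
        return tuples

    def is_t_way(suite, t):
        # A t-way test suite covers every possible t-tuple at least once.
        covered = {tuple((p, tc[p]) for p in ps)
                   for tc in suite for ps in combinations(sorted(tc), t)}
        return possible_tuples(t) <= covered

    # T1 from Table III.
    T1 = [{"p1": "on",  "p2": "on",  "p3": "1"},
          {"p1": "on",  "p2": "off", "p3": "off"},
          {"p1": "off", "p2": "off", "p3": "1"},
          {"p1": "off", "p2": "on",  "p3": "off"},
          {"p1": "off", "p2": "off", "p3": "2"},
          {"p1": "on",  "p2": "off", "p3": "2"}]

    print(len(possible_tuples(2)))  # 15, as in Table II
    print(is_t_way(T1, 2))          # True: T1 is a pairwise test suite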
Many algorithms to efficiently construct t-way test suites have been proposed so far. Approaches to generate t-way test suites for SUT models with constraints include greedy algorithms (e.g., AETG [5], PICT [6], [31], and ACTS [3], [26]), heuristic search (e.g., CASA [9], HHSA [12], and TCA [17]), and SAT-based approaches (e.g., Calot [23], [24]).

B. Related Work: Effectiveness Evaluation of t-way Testing

The effectiveness of t-way testing with small interaction strength t on fault detection has been reported by several empirical studies so far [13], [15], but the code coverage of t-way testing has not been studied well. Table V summarizes the effectiveness metrics studied in related work.








TABLE VI
Subject programs.

  Proj.   Ver.   Identifier   Year of release   LoC
  flex    v0     2.4.3        1993              10,163
          v1     2.4.7        1994              10,546
          v2     2.5.1        1995              13,000
          v3     2.5.2        1996              13,048
          v4     2.5.3        1996              13,142
          v5     2.5.4        1997              13,144
  grep    v0     2.0          1996               8,163
          v1     2.2          1998              11,945
          v2     2.3          1999              12,681
          v3     2.4          1999              12,780
          v4     2.4.1        2000              13,280
          v5     2.4.2        2000              13,275
  make    v0     3.75         1996              17,424
          v1     3.76.1       1997              18,525
          v2     3.77         1998              19,610
          v3     3.78.1       1999              20,401
          v4     3.79.1       2000              23,188

Kuhn et al. [16] investigated parameter interactions inducing actual failures of four systems: software for medical devices, a browser, a server, and a database system. As a result, 29–68% of the faults involved a single parameter; 70–97% (89–99%) of the faults involved up to two (three) parameter interactions; 96–100% of the faults involved up to four and five parameter interactions; and no fault involved more than six parameters. From this result, the authors concluded that most failures are triggered by parameter interactions with small t (at most four to six) and thus t-way testing with 4 ≤ t ≤ 6 could provide the fault detection ability of "pseudo-exhaustive" testing.

Zhang et al. [25] also found that failures of actual commercial MP3 players are triggered by parameter interactions with at most t = 4.

Petke et al. [22] more thoroughly studied the efficiency of early fault detection by t-way testing with 2 ≤ t ≤ 6. They used six projects, flex, make, grep, gzip, nanoxml, and siena, from the Software-artifact Infrastructure Repository (SIR) and reported the number of faults detected after 25%, 50%, 70%, and 100% of the test cases are executed.

Henard et al. [11] used five projects, grep, sed, flex, make, and gzip, also from SIR, and compared the number of faults detected by test suite prioritization based on t-way coverage (2 ≤ t ≤ 4) with other black-box and white-box prioritization techniques.

Choi et al. [4] used three projects, flex, grep, and make, from SIR and investigated the correlation of fault detection effectiveness with two evaluation metrics, called weight coverage and KL divergence, for prioritized t (= 2)-way testing.

To our knowledge, only the work by Giannakopoulou et al. [10] has reported code coverage of t-way testing. Their target system is a component of the Tactical Separation Assisted Flight Environment (TSAFE) of the Next Generation Air Transportation System (NextGen) by the NASA Ames Research Center. In their work, t (= 3)-way testing and their model-checker (JPF [30]) based exhaustive testing are compared w.r.t. code coverage: line coverage, branch coverage, loop coverage, and strict condition coverage, which are computed using CodeCover [28].

Giannakopoulou et al. reported that, for two program modules, the differences in code coverage between 3-way testing and exhaustive testing are 0–2% for the four coverage metrics they used, while the numbers of test cases are 6,047 for 3-way testing but 9.9 × 10^6 for exhaustive testing. In this paper, we more thoroughly analyze the code coverage effectiveness of t-way testing with 1 ≤ t ≤ 4 using three open source utility programs.

III. Experiments

A. Subject Programs

To investigate code coverage of t-way testing, we use three open source C programs, flex, grep, and make, from the Software-artifact Infrastructure Repository (SIR) [32]. flex is a lexical analyzer generator. grep is a program to search text for matches of regular expressions. make is a program to control the compile and build process. These programs have been widely used to evaluate testing techniques by researchers in studies including [4], [11], [22]. Table VI shows, for each version of the programs we use, the version identifier, the year of release, and the lines of code (LoC) calculated using cloc [27].

Parameters:
...
 Debug mode:            # -d
    Debug_on.
    Debug_off.
 Bypass use:            # -Cr
    Bypass_on.          [property Bypass]
    Bypass_off.
 Fast scanner:          # -f, -Cf
    FastScan.           [property FastScan]
    FullScan.           [if !Bypass][property FullScan]
    off.                [property f&Cfoff]
...

Fig. 1. A part of the test plan for flex in TSL.

B. Subject Test Suites

1) SUT Models: For each project, flex, grep, and make, we construct an SUT model for t-way testing whose parameters, values, and constraints are fully extracted from the test plan in TSL (Test Specification Language), which is included in SIR. For example, Figure 1 shows a part of the test plan in TSL for project flex. From the TSL specification, we construct the SUT model for flex whose parameters include Debug mode (= p1), Bypass use (= p2), and Fast scanner (= p3); p2 has two values including Bypass_on (= on), p3 has three values including FullScan (= 2), and the constraints include (p3 = 2) → (p2 ≠ on). Table I corresponds to a part of the SUT model for flex, which is constructed from the part of the test plan in Figure 1.








Table VII shows the size of the SUT model constructed for each project. In the table, the size of parameter-values is expressed as k; g1^k1 g2^k2 ... gn^kn, which indicates that the number of parameters is k and that, for each i, there are ki parameters that have gi values. The size of constraints is expressed as l; l1^h1 l2^h2 ... lm^hm, which indicates that the constraint is described in conjunctive normal form (CNF) with l variables, each of whose Boolean values represents an assignment of a value to a parameter, and that, for each j, there are hj clauses that have lj literals. For the example SUT model in Figure 1, the size of parameter-values is 3; 2^2 3^1 and the size of constraints is 2; 2^1.

TABLE VII
Constructed SUT models.

  Proj.                       Model size
  flex   Parameter-values     29; 3^23 4^4 6^2
         Constraints          97; 2^712 22^1 24^2 25^17 26^9
  grep   Parameter-values     14; 2^4 3^1 4^3 5^1 6^1 9^1 11^1 13^1 20^1
         Constraints          87; 2^433 3^27 4^8 7^5 16^1 24^1 27^1 28^1 31^10
  make   Parameter-values     22; 2^2 3^12 4^4 5^2 6^1 7^1
         Constraints          79; 2^526 21^1 22^1 23^1 24^3 25^7 26^9

TABLE VIII
Sizes and code coverage of exhaustive test suites.

  Proj.   Size          Line coverage   Branch coverage
  flex    525    Avg.   0.7968          0.8544
                 Min.   0.7789          0.8151
                 Max.   0.8312          0.9316
  grep    470    Avg.   0.4961          0.4948
                 Min.   0.4726          0.4746
                 Max.   0.5900          0.5826
  make    793    Avg.   0.4543          0.5373
                 Min.   0.4234          0.5126
                 Max.   0.4726          0.5494

Fig. 2. Ratio of code coverage of t-way testing (1 ≤ t ≤ 4) over that of exhaustive testing for all versions of projects. [Box plots with Line and Branch panels; plot omitted.]

Fig. 3. Example code coverage growths of t-way testing (1 ≤ t ≤ 4) and exhaustive testing for one version (v1) of grep. [Coverage vs. number of executed test cases, Line and Branch panels; plot omitted.]

2) Test Suites: We use t-way test suites with 1 ≤ t ≤ 4 that are generated by ACTS [3], [26] and PICT [6], [31] for our constructed SUT models with constraints. The tools ACTS and PICT are state-of-the-art open source t-way test generation tools developed by NIST (National Institute of Standards and Technology) and Microsoft, respectively. For comparison, we also use exhaustive test suites, each of which contains all possible test cases. The exhaustive test suite for the test plan of each project is included in SIR.
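As an illustration of this step, the example SUT model of Table I could be fed to PICT roughly as sketched below; the model text mirrors Table I, but the file name, the exact constraint wording, and the command-line invocation are our assumptions rather than the setup used in this study.

    import subprocess

    # Sketch only: the Table I SUT model written as a PICT model file.
    model = """\
    Debug mode:   on, off
    Bypass use:   on, off
    Fast scanner: FastScan, FullScan, off

    IF [Fast scanner] = "FullScan" THEN [Bypass use] <> "on";
    """

    with open("flex_example.pict", "w") as f:
        f.write(model)

    # /o:2 requests a pairwise (2-way) suite; /o:3 would request a 3-way suite.
    result = subprocess.run(["pict", "flex_example.pict", "/o:2"],
                            capture_output=True, text=True, check=True)
    print(result.stdout)   # one generated test case per line, columns = parameters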
3) Evaluation Metrics: To evaluate the code coverage of each test suite, we analyze the following two kinds of code coverage, which are computed using gcov [29]:

• Line coverage: the percentage of program lines executed.
• Branch coverage: the percentage of branches of conditional statements executed.

gcov is a source code analysis tool, delivered as a standard utility with the GNU C/C++ compiler, which reports how many lines and branches are executed.

IV. Results

Table VIII shows the size, i.e., the number of test cases, and the code coverage (line coverage and branch coverage) of the exhaustive test suite for each project, while Table IX and Table X show those of the subject t-way test suites (1 ≤ t ≤ 4) generated by ACTS and PICT. Table IX shows the sizes of the subject t-way test suites together with their ratios over the sizes of the exhaustive test suites. Table X summarizes the line coverage and branch coverage of the subject t-way test suites for each project. In Table VIII and Table X, we show the average, minimum, and maximum values of code coverage over the versions of each project.

For example, for project flex, the sizes of the 2-way test suites (52 by ACTS and 51 by PICT) are less than 10% of the size of the exhaustive test suite (525), from Table VIII and Table IX. On the other hand, for flex, the line coverage and branch coverage of the 2-way test suites (the exhaustive test suite) are on average 0.7927 (0.7968) and 0.8522 (0.8544), from Table VIII and Table X.

A. RQ1: t-way testing vs. exhaustive testing

To compare the code coverage between t-way testing and exhaustive testing, we investigate the following metric:

    RCov(Tt, EX) = Cov(Tt) / Cov(EX),

which denotes the ratio of the code coverage of a t-way test suite Tt over that of the exhaustive test suite EX.
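As a rough illustration of this metric (our own arithmetic, using the averages already reported in Tables VIII and X): for flex, the 2-way line coverage is 0.7927 on average while the exhaustive line coverage is 0.7968, giving RCov ≈ 0.7927 / 0.7968 ≈ 0.9949, in line with the 99.49% reported for flex in Table XI (the table averages the per-version ratios, so the two computations need not coincide exactly).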








TABLE IX
Sizes of t-way test suites (1 ≤ t ≤ 4).
# of test cases (ratio over # of the exhaustive test cases)

  Proj.          1-way          2-way           3-way            4-way
  flex   ACTS    30  (5.71%)    52  (9.90%)     91   (17.33%)    155  (29.52%)
         PICT    30  (5.71%)    51  (9.71%)     90   (17.14%)    154  (29.33%)
  grep   ACTS    40  (8.51%)    76  (16.17%)    183  (38.94%)    328  (69.79%)
         PICT    40  (8.51%)    77  (16.38%)    180  (38.30%)    326  (69.36%)
  make   ACTS    27  (3.40%)    34  (4.29%)     44   (5.55%)     68   (8.58%)
         PICT    27  (3.40%)    34  (4.29%)     45   (5.67%)     69   (8.70%)

TABLE X
Code coverage of t-way test suites (1 ≤ t ≤ 4).

                       Line coverage                       Branch coverage
  Proj.        1-way    2-way    3-way    4-way    1-way    2-way    3-way    4-way
  flex   Avg.  0.7683   0.7927   0.7934   0.7931   0.8407   0.8522   0.8522   0.8522
         Min.  0.7481   0.7755   0.7763   0.7755   0.8018   0.8136   0.8136   0.8136
         Max.  0.8145   0.8264   0.8267   0.8267   0.9225   0.9281   0.9281   0.9281
  grep   Avg.  0.4917   0.4959   0.4961   0.4961   0.4769   0.4880   0.4933   0.4948
         Min.  0.4668   0.4723   0.4726   0.4726   0.4549   0.4676   0.4712   0.4746
         Max.  0.5875   0.5897   0.5900   0.5900   0.5726   0.5786   0.5845   0.5826
  make   Avg.  0.4451   0.4539   0.4540   0.4540   0.5321   0.5364   0.5366   0.5366
         Min.  0.4168   0.4230   0.4230   0.4230   0.5053   0.5117   0.5117   0.5117
         Max.  0.4628   0.4724   0.4724   0.4726   0.5442   0.5484   0.5484   0.5484

TABLE XI
Comparison of code coverage between t-way testing (1 ≤ t ≤ 4) and exhaustive testing.

                                            Line coverage                             Branch coverage
                             Proj.   1-way     2-way     3-way     4-way     1-way     2-way     3-way     4-way
  Avg. of RCov(Tt, EX)       flex    96.41%    99.49%    99.58%    99.54%    98.40%    99.74%    99.74%    99.74%
  (R = Cov(Tt)/Cov(EX))      grep    99.11%    99.95%   100.00%   100.00%    96.32%    98.58%    99.67%   100.00%
                             make    97.97%    99.91%    99.93%    99.92%    99.03%    99.84%    99.88%    99.87%
                             Avg.    97.82%    99.77%    99.83%    99.81%    97.85%    99.36%    99.76%    99.87%
  # (R ≥ 99.5%)              flex    0 / 12    8 / 12    8 / 12    8 / 12    0 / 12   12 / 12   12 / 12   12 / 12
  / # all cases              grep    4 / 12   12 / 12   12 / 12   12 / 12    0 / 12    0 / 12    7 / 12   12 / 12
                             make    0 / 10   10 / 10   10 / 10   10 / 10    0 / 10   10 / 10   10 / 10   10 / 10
                             Total   4 / 34   30 / 34   30 / 34   30 / 34    0 / 34   22 / 34   29 / 34   34 / 34

Table XI summarizes the values of RCov(Tt, EX) with 1 ≤ t ≤ 4 for line coverage and branch coverage for each project and for all projects. In the table, we also show the number of cases where RCov(Tt, EX) ≥ 99.5%, i.e., t-way testing achieves more than 99.5% of the coverage obtained by exhaustive testing, over the number of all cases (versions) for the projects.

Figure 2 presents the box plots of the results of RCov(Tt, EX) for all projects. Each box plot shows the mean (circle in the box), the median (thick horizontal line), the first/third quartiles (hinges), and the highest/lowest values within 1.5 × the inter-quartile range of the hinges (whiskers).

• How high code coverage can t-way testing achieve compared to exhaustive testing?

From Table XI and Figure 2, we can see that t-way testing with even small t can achieve high values of RCov(Tt, EX), i.e., high ratios of code coverage over the code coverage of exhaustive testing.

In our case study, 1-way (2-way) testing covers on average 97.82% (99.77%) of the line coverage of exhaustive testing and on average 97.85% (99.36%) of the branch coverage of exhaustive testing. With 3-way (4-way) testing, line coverage is on average 99.83% (99.81%) and branch coverage is on average 99.76% (99.87%) of the coverage of exhaustive testing.

• Can t-way testing obtain higher code coverage earlier compared to exhaustive testing?

Figure 3 shows example line coverage growths and branch coverage growths of t-way test suites (1 ≤ t ≤ 4) and the exhaustive test suites for one version (v1) of project grep. (The coverage growths represent typical cases of our experimental results.) We can see that t-way testing with smaller t obtains higher code coverage earlier compared to exhaustive testing and to t-way testing with larger t.

For the example case in Figure 3, to obtain 48% line coverage (46% branch coverage), 1-way, 2-way, 3-way, and 4-way testing respectively require 35, 42, 56, and 56 (36, 47, 71, and 71) test cases, while exhaustive testing requires 219 (265) test cases.
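The coverage growth curves in Figure 3 are simply cumulative coverage values computed as test cases are executed one by one. A minimal sketch of that computation is shown below; it assumes per-test-case sets of covered line identifiers are already available (e.g., extracted from gcov output), which is our assumption about the data format rather than a detail given in the paper.

    def coverage_growth(per_test_covered, total_lines):
        """Cumulative line coverage after each executed test case.

        per_test_covered: list of sets, covered line ids per test case in execution order.
        total_lines: total number of executable lines in the program.
        """
        covered, growth = set(), []
        for lines in per_test_covered:
            covered |= lines                       # union of everything covered so far
            growth.append(len(covered) / total_lines)
        return growth

    # Hypothetical toy data: 3 test cases over a 10-line program.
    print(coverage_growth([{1, 2, 3}, {2, 3, 4, 5}, {8}], 10))  # [0.3, 0.5, 0.6]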








Fig. 4. Ratios of code coverage of t-way testing (1 ≤ t ≤ 4) over that of exhaustive testing with the same sizes for all versions of projects. [Box plots with Line and Branch panels; plot omitted.]

• How large t is necessary for t-way testing to achieve code coverage close to that of exhaustive testing?

Surprisingly, in our case study, all t-way test suites with 1 ≤ t ≤ 4 obtain more than 95% of the code coverage of the exhaustive test suites. In particular, for project grep, the 4-way test suites obtain the same line coverage and branch coverage as the exhaustive test suite. As described in Table XI, in 30 of all 34 cases, the 2-way, 3-way, and 4-way test suites achieve more than 99.5% of the line coverage of the exhaustive test suites. For branch coverage, in all cases, the 4-way test suites achieve more than 99.5% of the coverage of the exhaustive test suites.

  From the results, t-way testing with small t (1 ≤ t ≤ 4) can efficiently obtain code coverage close to that of exhaustive testing while requiring far fewer test cases.

B. RQ2: t-way testing vs. exhaustive testing in the same sizes

To compare the code coverage between t-way testing and exhaustive testing with the same sizes, we investigate the following metric:

    RCov(Tt, EX*) = Cov(Tt) / Cov(EX*),

which denotes the ratio of the code coverage of a t-way test suite Tt over that of a subset, hereafter denoted by EX*, of the exhaustive test suite EX whose size is the same as that of Tt. In our experiments, we construct EX* 100 times by randomly selecting |Tt| test cases from the exhaustive test suite EX and use the average value of the code coverage over the 100 EX*.
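The following short Python sketch mirrors that sampling procedure; the coverage() callback that maps a set of test cases to a coverage value is hypothetical and stands in for the gcov-based measurement used in the paper.

    import random

    def avg_random_subset_coverage(exhaustive_suite, size, coverage, repeats=100, seed=0):
        """Average coverage of `repeats` random size-|Tt| subsets EX* of the exhaustive suite EX."""
        rng = random.Random(seed)
        total = 0.0
        for _ in range(repeats):
            ex_star = rng.sample(exhaustive_suite, size)   # one random subset EX* of size |Tt|
            total += coverage(ex_star)                     # e.g., measured line or branch coverage
        return total / repeats

    # RCov(Tt, EX*) would then be coverage(Tt) / avg_random_subset_coverage(EX, len(Tt), coverage).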
Table XII summarizes the values of RCov(Tt, EX*) with 1 ≤ t ≤ 4 for line coverage and branch coverage for each project and for all projects. In the table, we also show the number of cases where RCov(Tt, EX*) ≥ 105%, i.e., t-way testing achieves more than 105% of the coverage obtained by exhaustive testing with the same size, over the number of all cases for the projects. Figure 4 presents the box plots of the results of RCov(Tt, EX*) for all projects.

• With the same number of test cases, how different are t-way testing and exhaustive testing on code coverage?

From Table XII and Figure 4, we can see that t-way test suites with 1 ≤ t ≤ 4 achieve higher line coverage and branch coverage than exhaustive test suites of the same sizes. In particular, t-way testing with smaller t obtains higher values of RCov(Tt, EX*), i.e., higher ratios of code coverage over that of exhaustive testing of the same size.

As described in Table XII, in all cases, 1-way and 2-way testing achieve more than 105% of the code coverage of exhaustive testing with the same size. For 3-way and 4-way testing, the numbers of cases that achieve more than 105% of the line (branch) coverage of exhaustive testing with the same sizes are 24 and 20 (24 and 16) among all 34 cases.

  From the results, t-way testing with smaller t can obtain higher code coverage than exhaustive testing with the same number of test cases.

V. Conclusion

This paper analyzes the code coverage effectiveness of combinatorial t-way testing with small t. As a result of our empirical evaluation using a collection of open source utility programs, t-way testing with small t (1 ≤ t ≤ 4) efficiently covers more than 95% of the code coverage achieved by exhaustive testing, while requiring far fewer test cases. In addition, when compared at the same test suite sizes, t-way testing with smaller t obtains a higher ratio of code coverage over that of exhaustive testing.

In this paper, we evaluate two kinds of widely used code coverage metrics, line coverage and branch coverage. Further work includes evaluating other metrics such as loop coverage and condition coverage. Another direction for future work is to investigate both the code coverage effectiveness and the fault detection effectiveness of t-way testing and to analyze the relation between them on real software projects.

Acknowledgments

The authors would like to thank the anonymous referees for their helpful comments and suggestions to improve this paper. This work was partly supported by JSPS KAKENHI Grant Number 16K12415.








TABLE XII
Comparison of code coverage between t-way testing (1 ≤ t ≤ 4) and exhaustive testing with the same sizes.

                                            Line coverage                                 Branch coverage
                             Proj.   1-way      2-way      3-way      4-way      1-way      2-way      3-way      4-way
  Avg. of RCov(Tt, EX*)      flex    116.45%    117.26%    114.45%    111.04%    116.65%    116.12%    113.65%    110.57%
  (R* = Cov(Tt)/Cov(EX*))    grep    109.93%    108.39%    105.36%    103.31%    111.26%    110.44%    107.36%    104.94%
                             make    106.63%    106.45%    106.11%    105.36%    105.99%    105.80%    105.51%    104.85%
                             Avg.    111.26%    110.95%    108.79%    106.64%    111.61%    111.08%    109.04%    106.90%
  # (R* ≥ 105%)              flex    12 / 12    12 / 12    12 / 12    12 / 12    12 / 12    12 / 12    12 / 12    12 / 12
  / # all cases              grep    12 / 12    12 / 12     2 / 12     2 / 12    12 / 12    12 / 12     2 / 12     2 / 12
                             make    10 / 10    10 / 10    10 / 10     6 / 10    10 / 10    10 / 10    10 / 10     2 / 10
                             Total   34 / 34    34 / 34    24 / 34    20 / 34    34 / 34    34 / 34    24 / 34    16 / 34

References

[1] International Organization for Standardization. ISO 26262: Road vehicles - Functional safety, November 2011.
[2] K. Z. Bell and M. A. Vouk. On effectiveness of pairwise methodology for testing network-centric software. In Proc. of the International Conference on Information and Communication Technology, pages 221-235. IEEE, 2005.
[3] M. N. Borazjany, L. Yu, Y. Lei, R. Kacker, and R. Kuhn. Combinatorial testing of ACTS: A case study. In Proc. of the 5th International Conference on Software Testing, Verification and Validation (ICST), pages 591-600. IEEE, 2012.
[4] E. Choi, S. Kawabata, O. Mizuno, C. Artho, and T. Kitamura. Test effectiveness evaluation of prioritized combinatorial testing: a case study. In Proc. of the International Conference on Software Quality, Reliability & Security (QRS), pages 61-68. IEEE, 2016.
[5] M. B. Cohen, M. B. Dwyer, and J. Shi. Constructing interaction test suites for highly-configurable systems in the presence of constraints: A greedy approach. IEEE Trans. Software Eng., 34(5):633-650, 2008.
[6] J. Czerwonka. Pairwise testing in the real world: Practical extensions to test-case scenarios. In Proc. of the 24th Pacific Northwest Software Quality Conference, pages 419-430. Citeseer, 2006.
[7] H. Do, S. Elbaum, and G. Rothermel. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering, 10(4):405-435, 2005.
[8] G. B. Finelli. NASA software failure characterization experiments. Reliability Engineering & System Safety, 32(1):155-169, 1991.
[9] B. J. Garvin, M. B. Cohen, and M. B. Dwyer. Evaluating improvements to a meta-heuristic search for constrained interaction testing. Empirical Software Engineering, 16(1):61-102, 2011.
[10] D. Giannakopoulou, D. Bushnell, J. Schumann, H. Erzberger, and K. Heere. Formal testing for separation assurance. Annals of Mathematics and Artificial Intelligence, 63(1):5-30, 2011.
[11] C. Henard, M. Papadakis, M. Harman, Y. Jia, and Y. Le Traon. Comparing white-box and black-box test prioritization. In Proc. of the 38th International Conference on Software Engineering (ICSE), pages 523-534. ACM, 2016.
[12] Y. Jia, M. B. Cohen, M. Harman, and J. Petke. Learning combinatorial interaction testing strategies using hyperheuristic search. In Proc. of the 37th International Conference on Software Engineering (ICSE), pages 540-550. IEEE/ACM, 2015.
[13] R. N. Kacker, D. R. Kuhn, Y. Lei, and J. F. Lawrence. Combinatorial testing for software: An adaptation of design of experiments. Measurement, 46(9):3745-3752, 2013.
[14] T. Kitamura, A. Yamada, G. Hatayama, C. Artho, E. Choi, N. T. B. Do, Y. Oiwa, and S. Sakuragi. Combinatorial testing for tree-structured test models with constraints. In Proc. of the International Conference on Software Quality, Reliability & Security (QRS), pages 141-150. IEEE, 2015.
[15] D. R. Kuhn, R. N. Kacker, and Y. Lei. Introduction to combinatorial testing. CRC Press, 2013.
[16] D. R. Kuhn, D. R. Wallace, and A. M. Gallo. Software fault interactions and implications for software testing. IEEE Trans. Software Eng., 30(6):418-421, 2004.
[17] J. Lin, C. Luo, S. Cai, K. Su, D. Hao, and L. Zhang. TCA: An efficient two-mode meta-heuristic algorithm for combinatorial test generation. In Proc. of the 30th International Conference on Automated Software Engineering (ASE), pages 494-505. ACM/IEEE, 2015.
[18] C. Maughan. Test case generation using combinatorial based coverage for rich web applications. PhD Thesis, Utah State University, 2012.
[19] C. Montanez, D. R. Kuhn, M. Brady, R. M. Rivello, J. Reyes, and M. K. Powers. An application of combinatorial methods to conformance testing for document object model events. NISTIR-7773, 2010.
[20] C. Nie and H. Leung. A survey of combinatorial testing. ACM Computing Surveys, 43(2):11, 2011.
[21] T. J. Ostrand and M. J. Balcer. The category-partition method for specifying and generating functional tests. Commun. ACM, 31(6):676-686, 1988.
[22] J. Petke, M. Cohen, M. Harman, and S. Yoo. Practical combinatorial interaction testing: Empirical findings on efficiency and early fault detection. IEEE Trans. Software Eng., 41(9):901-924, 2015.
[23] A. Yamada, A. Biere, C. Artho, T. Kitamura, and E. Choi. Greedy combinatorial test case generation using unsatisfiable cores. In Proc. of the 31st International Conference on Automated Software Engineering (ASE), pages 614-624. IEEE/ACM, 2016.
[24] A. Yamada, T. Kitamura, C. Artho, E. Choi, Y. Oiwa, and A. Biere. Optimization of combinatorial testing by incremental SAT solving. In Proc. of the 8th International Conference on Software Testing, Verification and Validation (ICST), pages 1-10. IEEE, 2015.
[25] Z. Zhang, X. Liu, and J. Zhang. Combinatorial testing on ID3v2 tags of MP3 files. In Proc. of the 5th International Conference on Software Testing, Verification and Validation (ICST), pages 587-590. IEEE, 2012.
[26] ACTS. Available: http://csrc.nist.gov/groups/SNS/acts/.
[27] cloc - Count Lines of Code. Available: http://cloc.sourceforge.net.
[28] CodeCover - an open-source glass-box testing tool. Available: http://codecover.org.
[29] gcov - a test coverage program. Available: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html.
[30] Java PathFinder. Available: http://babelfish.arc.nasa.gov/trac/jpf.
[31] Pairwise Independent Combinatorial Tool (PICT). Available: http://github.com/Microsoft/pict.
[32] Software-artifact Infrastructure Repository. Available: http://sir.unl.edu/.



