Testing Configuration Knowledge-Bases
                                                         Franz Wotawa and Ingo Pill1


Abstract. Writing tests for configuration knowledge-bases is a dif-            In this paper, our focus is on such faults in configuration
ficult task. One not minor reason is the huge search space. For ex-         knowledge-bases an their consequences. Of course, another source
haustive testing, all possible combinations of configuration parame-        for failure is in the configuration algorithm’s implementation, i.e.,
ters must be considered. In practice, exhaustive testing is thus often      the reasoning engine, itself. While such faults are outside our paper’s
impossible, due to the sheer, exponential, number of combinations.          focus, the generated tests can also be used to test the reasoner.
Consequently it becomes necessary to focus on the most important               Regarding fault detection and isolation, the size of a knowledge-
configurations first. This abstract challenge is well-known in the test-    base is of certain interest. That is, if the knowledge-base itself, or the
ing community, and can be addressed by exploiting combinatorial             configuration space, is very small, exhaustive testing might even be
testing. Combinatorial testing deals with reducing the number of test       feasible and a valid option for specific situations. However, in case
inputs by aiming at exhaustive combinations of parameter subsets.           of huge knowledge-bases or huge configuration spaces, exhaustive
That is, ensuring that a test-suite contains tests covering all value       testing is practically impossible. For instance, and without loosing
combinations for all parameter subsets for (or up to) a given size. In      generality, let us assume the example application of parameter con-
this paper, we formulate the configuration-testing problem and show         figuration. There the purpose is to find a value assignment to all avail-
how combinatorial testing can be used in a corresponding test case          able system parameters, in order to receive a setup that implements
generation process, in order to achieve a huge reduction in the num-        a desired functionality. If we have parameters p1 , . . . , pn , each tak-
ber of required test cases.                                                 ing values from a domain D with size k, an exhaustive search would
                                                                            require us to test kn possible configurations, which, for the obvious
                                                                            reasons, is most likely infeasible for those values for k and n expe-
1    INTRODUCTION                                                           rienced in practice. Therefore, we require effective alternatives that
A configuration, i.e., something that results from a particular ar-         allow us to systematically focus our testing efforts.
rangement of parts or components (according to the Merriam Web-                In system testing, where we often have to test in the context of
ster dictionary2 ), can be considered as a system aggregating specific      alternative “environments’, we suffer from a similar problem. For in-
parts in order to implement a desired functionality or behavior. In         stance, if we want to test a web page, we have to consider various
model-based configuration, we use a knowledge-base in order to rep-         hardware platforms from PCs to smart phones and tablets, a variety
resent those components’ functionality, given user requirements, and        of operating systems, a set of web browsers commonly used, and so
any other knowledge that is necessary for defining or constructing the      on. Testing the web page in the context of all the possible “configu-
system. Such additional knowledge encompasses, for instance, con-           rations” is of course an achy task that requires a lot of resources. An
straints prohibiting physically impossible (and thus conflicting) ar-       empirical study (see e.g., [12]) showed, however, that not all combi-
rangements. Obviously, the outcome of any configuration algorithm           nations of parameter value assignments are necessary for revealing a
depends heavily on the model’s quality. In some sense, quality in this      bug. Rather, it seems sufficient to consider local parameter configu-
case can be considered as being “as close as necessary (and possible)       rations. Implementing the concept of combinatorial testing (see also
to reality”, so that we need to capture the “appropriate” knowledge         Section 4), we aim to cover all local parameter combinations up to,
and do that in the right way.                                               or of, a given size in a test suite. That is, all the combinations for
   In case of faults in the knowledge base, e.g., when we miss some         (all possible) chosen “local” subsets of parameters, which allows us
constraint that prohibits some impossible configuration, a derived          to dramatically reduce the number of required tests. Of course the
configuration might be incorrect for at least some specific scenarios       choice of the subset size directly influences the “locality” of the test
or corner cases. Thus, testing, which is basically unavoidable for ver-     case generation process.
ification and validation problems, is not only essential for hardware          In this paper, we discuss the testing problem for configuration
and programs, but also for knowledge-bases. We certainly have to            knowledge-bases and propose the use of combinatorial testing for
ensure that a configuration behaves as desired. The evidence is even        automated test input generation. We introduce our preliminary def-
stronger when moving from static configuration, e.g., configuring a         initions using a simplified example from the e-vehicle domain, and
product based on user needs, to dynamic configurations where the            furthermore discuss two different testing aspects. First, we consider
system might adapt itself for a certain situation. For example, a robot     testing of different configurations. And second, when considering the
might adapt its control behavior in case of a broken wheel, that is, on     desired functionality as being changeable, there arises the question of
its view of the world that it stores in an internal knowledge-base as       whether there actually is a valid configuration for a certain combina-
foundation for its reasoning. In such cases, a reliable and, to a certain   tion of functionalities.
degree, expected and “safe” behavior has to be ensured.                        Our paper is organized as follows. First, we discuss some related
                                                                            research with a focus on testing of knowledge-based systems in gen-
1 Technische Universität Graz, email: {wotawa,ipill}@ist.tugraz.at
2 http://www.merriam-webster.com/dictionary/configuration
                                                                            eral. Afterwards, we introduce the foundations of configuration using
a running example. We then use the same example to discuss com-             i.e., combinatorial testing, which seems to suit configuration very
binatorial testing. After the introduction into combinatorial testing,      well.
we discuss testing of configuration knowledge-bases in more detail.
Finally, we conclude the paper and outline future research directions.      3   THE CONFIGURATION PROBLEM
                                                                            For illustration purposes, let us consider the following simplified ex-
2   RELATED RESEARCH                                                        ample from the domain of vehicle configurations. In Figure 1, we
Knowledge-based systems are used for various purposes like con-             illustrate an example comprising an electric vehicle that contains an
figuration, diagnosis, and also decision support, e.g., for high-level      electric motor, electric consumers like an air-condition, and a battery
control of systems. For all these application areas, systems have to        that delivers the required electricity. Battery size and other factors,
be predictable, that is, they have to behave as expected and do not         like the driving mode, substantially influence the range of the vehicle.
cause any trouble leading to a loss of resources or even harm peo-          The configuration knowledge base for our example comprises four
ple. Despite this fact, it is interesting to note that there has not been   components, i.e., an air-condition, a motor, the driving mode, and the
a huge number of papers dealing with testing, verification, and val-        battery - each of them offering some options, which then vary from
idation of knowledge-based systems. Robert Plant [17, 18] was one           configuration to configuration. Let us now assume that there are two
of the first dealing with verification, validation, and testing of expert   engine types (standard and powerful), three types of air-condition
systems and knowledge-based systems in general. There is also an            (none, manual, and electronic), two driving modes (leisure and race),
earlier survey available (see [13]) that deals with tools for validation    and three different batteries (type1, type2, type3), each providing a
and verification of knowledge-based systems.                                different electric capacity. Clearly, the configured vehicle’s range de-
   Regarding testing of knowledge-based systems, it is also worth           pends heavily on the battery and actual power consumption. That is,
mentioning El-Korany and colleagues’ work [5], where their focus            for instance, if there is too much power consumption, some partic-
is on the testing methodology. There the authors distinguish differ-        ular range can never be achieved. The range, however, reflects an
ent cases where testing is required, i.e., inference knowledge testing      important part of a customer’s needs. While customer A is satisfied
and task knowledge testing. The objective behind their work was to          when she can drive the car for one day in a city for no more than 100
increase the level of correctness of knowledge-based systems. Other         km, customer B expects his car being able to cover more than 200
work includes [9], where Hartung and Håkansson discuss test au-            km before it has to be recharged. Other customer requirements might
tomation for knowledge-based systems. Their approach works for              concern air-conditioning, or the availability of a particular driving
production rules that are extracted from the knowledge-bases.               mode.
   Hayes and Parzen [10] focused more on the question of “to what
degree a knowledge-based systems fulfills its purpose”, that is, as
indicated in the title of their publication, on achieving the desired
behavior. In order to answer the question about the quality of de-
cisions coming from a knowledge-based system, Hayes and Parzen
introduced a special metric (QUEM) to judge the quality of the solu-
tions. The proposed approach is essential for measuring the overall
performance of a knowledge-based system.
   To the best of our knowledge, there is only little work on test-
ing configuration motors or motors that make use of configuration
methods like recommenders. Felfernig and colleagues [8, 7] discuss
the use of testing, i.e., white-box testing, and development environ-
ments in the context of recommender applications. Other work from
Felfernig and colleagues [6] mainly focuses on the second step of
debugging, i.e., fault localization and correction, but still requires
test cases for finding inconsistencies between the behavior coded in
a knowledge base and the expected behavior, which originates from
knowledge engineers or customers of the configuration system.
   Tiihonen and colleagues [20] described a rule-based configura-
tor and also introduced a more or less model-independent testing                     Figure 1. Configuration problem of an electric vehicle
method. In their approach the configurator is tested using randomly
generated requirements given to the configurator. Besides discussing
the underlying methodology Tiihonen et al. also presents empirical
                                                                               In the following, we discuss the formalization of our e-vehicle
results gained from 4 different configuration models. In contrast to
                                                                            configuration example, but let us introduce the definition of a con-
Tiihonen and colleague the testing approach proposed in this paper
                                                                            figuration problem first.
is not a random testing approach. Moreover, our focus is more on
testing the configuration knowledge-base and not the whole configu-         Definition 1 (Configuration problem) A configuration problem is
rator. Although, the obtained tests can be used later for testing con-      a tuple (SD ∪ REQ, PARTS, MODES) where SD is the system de-
crete implementations.                                                      scription, REQ are the requirements, and PARTS are the configurable
   In our paper, we rely on previous research in the domain of              parts that are allowed to be set to particular modes from MODES.
testing knowledge-based systems, but focus on the specific case of
knowledge-based systems for configuration. We distinguish different            We assume that SD and REQ are first order logic formulae. Other
cases for testing and suggest to use a specific testing methodology,        formalisms might also be used requiring the existence of consistency
checks and reasoning capabilities. Our definition of the configuration      The power consumption of all vehicle parts should never exceed
problem assumes that the functionality or behavior of the parts (from       the available power, so that we add an integrity constraint:
PARTS) are defined in SD for a particular mode (from MODES). For
our e-vehicle example, we consider 4 different parts: the electric mo-               avpow(bat, B) ∧ cons(emot, E) ∧ cons(ac, A) ∧
tor (emot), the air-condition (ac), the driving mode (dm) and the bat-               cons(dm, D) → B > (E + A + D)
tery (bat), i.e., PARTS = {emot, ac, dm, bat}.
                                                                            Finally, we have to map power consumption and available power
   What is missing, is the configuration knowledge and the require-
                                                                            to the vehicle’s maximum distance (without recharging) class.
ments. Regarding the latter, we assume that we want to distinguish
slow acceleration cars (slowacc) from fast acceleration cars (fastacc),          avpow(bat, B) ∧ cons(emot, E) ∧ cons(ac, A) ∧
as well as the availability of air-condition cooling (cooling). More-            cons(dm, D) ∧ B − (E + A + D) > 99 → city
over, a user might specify the maximum distance before recharging,
which might be city for less than or equal 100 km, interurban for dis-           avpow(bat, B) ∧ cons(emot, E) ∧ cons(ac, A) ∧
tances up to 250 km, and max, otherwise. In the following, we depict             cons(dm, D) ∧ B − (E + A + D) > 200 → interurban
SD for our running example. In the system description, we make use
of the predicate mode that assigns a component a certain parameter               avpow(bat, B) ∧ cons(emot, E) ∧ cons(ac, A) ∧
value, cons for fixing the power consumption of a part, and avpow                cons(dm, D) ∧ B − (E + A + D) > 400) → max
for stating the available electrical power for batteries.
                                                                            It is worth noting that the above definition allows to derive differ-
Electric motor: The standard engine provides slow acceleration              ent maximum distances at the same time. If the distance is larger
  only, but draws less electrical power. The powerful motor provides        than 400 city, interurban, and max become valid. This could be
  fast acceleration but consumes more electricity as a downside.            avoided via integrity constraints or chaining to constraints, in or-
                                                                            der to get non-overlapping definitions. However, this definition is
        mode(emot, standard) → (cons(emot, 300) ∧ slowacc)                  intended such as to allow to specify a minimum capability.
        mode(emot, powerful) → (cons(emot, 400) ∧ fastacc)
                                                                             Now let us we define formally what we understand about a con-
Air-condition: If there is no air-condition, then there is no power
                                                                          figuration. Intuitively, a configuration has to do with a mode assign-
  consumption and also no cooling. The manual air-condition draws
                                                                          ment, which corresponds to choosing a certain part, e.g., setting the
  less power than the electronic one. Both provide cooling.
                                                                          battery to type1 means that we want this battery in our configuration.
          mode(ac, none) → (cons(ac, 0) ∧ ¬cooling)
          mode(ac, manual) → (cons(ac, 100) ∧ cooling)                    Definition 2 (Configuration)
          mode(ac, electronic) → (cons(ac, 150) ∧ cooling)                Let (SD∪REQ, PARTS, MODES) be a configuration problem. A con-
                                                                          figuration is an assignment of a particular mode to each of the parts,
Driving mode: A leisure driver consumes no additional electricity         i.e., a set C is a configuration, if and only if |C| = |PARTS| and
  on top of the power required to drive the motor. A racy driver          ∀ p ∈ PARTS : ∃ mode(p, m) ∈ C where m ∈ MODES.
  draws more power due to higher acceleration.
                                                                             As this definition ignores REQ and SD, it induces the whole con-
                 mode(dm, leisure) → (cons(dm, 0))                        figuration space. Being interested only in valid configurations, i.e.,
                 mode(dm, race) → (cons(dm, 100))                         those that do not contradict REQ and SD, we define them as follows:
Battery: The three battery types have varying capacities.
                                                                          Definition 3 (Valid configuration) Let C be a configuration for the
                mode(bat, type1) → (avpow(bat, 450))                      configuration problem (SD ∪ REQ, PARTS, MODES). Configuration
                mode(bat, type2) → (avpow(bat, 600))                      C is valid if and only if SD ∪ REQ ∪ C is satisfiable.
                mode(bat, type3) → (avpow(bat, 800))
                                                                            Clearly Definition 3 does not ensure that a valid configuration
Other constraints: There are several further domain-dependent             meets the requirements. Hence, we define a suitable configuration.
  constraints: The racy driving mode can only be obtained when
  having a powerful motor, i.e., it is not possible to have fast accel-   Definition 4 (Suitable configuration) Let C be a valid configura-
  eration without the right motor.                                        tion for the configuration problem (SD ∪ REQ, PARTS, MODES). C
                                                                          is suitable iff the requirements REQ can be derived from the system
                    ¬ (mode(dm, race) ∧ ¬fastacc)
                                                                          description and the configuration, i.e., SD ∪ C |= REQ.
  In addition, we have to ensure that a component cannot be in more
  than one mode simultaneously,, and that some available functions           The user requirements have a direct impact on the space of suit-
  are in contradiction, e.g., slow and fast acceleration.                 able configurations. Clearly, REQ = {city} has more suitable con-
                                                                          figurations than the requirements REQ = {city, cooling}. The given
        ¬ (mode(emot, standard) ∧ mode(emot, manual))                     definitions of configuration are close to those of reconfiguration and
        ¬ (mode(none, standard) ∧ mode(manual, manual))                   parameter configuration, e.g. from [19, 15]. However, to some extent,
        ¬ (mode(none, standard) ∧ mode(manual, electronic))               generative configuration, e.g., [19], can also be handled, when as-
        ¬ (mode(none, manual) ∧ mode(manual, electronic))                 suming a boundary for involved components and connections. Each
        ¬ (mode(dm, leisure) ∧ mode(dm, race))                            potential component and connection has to be defined in the sys-
        ¬ (mode(bat, type1) ∧ mode(bat, type2))                           tem description having two modes. One is for indicating the use of
        ¬ (mode(bat, type1) ∧ mode(bat, type3))                           a component or connection in a configuration, and the other for stat-
        ¬ (mode(bat, type2) ∧ mode(bat, type3))                           ing that the component or connection is not used. In addition, some
        ¬ (slowacc ∧ fastacc)                                             integrity constraints have to be specified, in order to ensure that in a
final configuration there is no connection without the corresponding       methodology. Even more details are offered in the next section. For
components. Note that for larger systems and configurations such a         brevity let us consider the component modes as inputs:
bounded variant might lead to a description that cannot be used for
computing configurations in reasonable time, which does not contra-                           input      values
dict the observation that the given definitions - in principle - allow                        emot       standard, powerf ul
for specifying different configuration problems.                                              ac         none, manual, electronic
   Let us come back to our running example and the definition of                              dm         leisure, race
suitable configurations. When stating REQ = {city, cooling} we can                            bat        type1, type2, type3
obtain the suitable configuration                                             When searching for all two-way combinations, i.e., combinations
                                                                         of values for two particular inputs, we would obtain results similar
                   mode(emot, standard), mode(ac, manual),
                                                                           or equivalent to the one depicted in Table 1. There for each combina-
                     mode(dm, leisure), mode(bat, type1)
                                                                           tion of two inputs, all possible value combinations are given, which
but also                                                                   results in 9 test cases. For comparison reasons, we also depict the test
                                                                         cases for strength 3 in Table 2. It is worth noting that, when consid-
               mode(emot, powerful), mode(ac, electronic),                 ering all combinations, we would finally obtain 36 test cases.
                  mode(dm, leisure), mode(bat, type2)
                                                                                  Table 1.    All two-way interactions for the e-vehicle example
among others. The configuration
                                                            
                    mode(emot, powerful), mode(ac, none),
                     mode(dm, leisure), mode(bat, type1)                                     emot          ac             dm          bat
                                                                                     1       powerf ul     none           race        type1
would be a valid one, but is not suitable as cooling is not established.             2       standard      none           leisure     type2
For computing configurations meeting requirements, we refer the in-                  3       powerf ul     none           leisure     type3
terested reader to [19] or [15].                                                     4       standard      manual         race        type1
   What remains now, is the question whether the formalized config-                  5       powerf ul     manual         leisure     type2
                                                                                     6       standard      manual         race        type3
uration problem represents reality and results in the desired configu-               7       powerf ul     electronics    leisure     type1
rations. Hence, we need to test the configuration knowledge-base. To                 8       standard      electronics    race        type2
this end, in the next section we introduce a certain testing methodol-               9       powerf ul     electronics    race        type3
ogy suitable for this task.

                                                                              The significant advantage of combinatorial testing is that the num-
4   COMBINATORIAL TESTING                                                  ber of test cases can be reduced while still considering combinations
Combinatorial testing is a method for the algorithmic computation of       of input values. In order to implement combinatorial testing as a test
tests and in particular test input data for a system under test (SUT).     case generation method, the following steps are required:
   An answer to the question of how much test input data we should
generate in order to reveal undetected faults is of great practical im-    1. First, someone has to write a model of the input space, comprising
portance. As mentioned before, for n inputs with k possible values,           the inputs and their value domains.
an exhaustive approach would require us to test kn combinations.           2. The combinatorial design procedure takes this input space and
When missing an important combination, so that a fault remains in             generates an array where each row is simple a test case describing
the source code, the consequences might be catastrophic, especially           the value for each input considering the given strength t.
for safety-critical systems. Recently, researchers suggested not to        3. Every row is delivered back as a single test case describing poten-
consider all input value combinations, but only certain ones focusing         tial input data (but not the expected output).
on an exhaustive “local” search (see e.g. [3, 23, 24]). The underlying        Another benefit of combinatorial testing is that Steps 2 and 3 can
idea is that while input combinations might be required in order to        be automated completely. There are tools available for computing
reveal a bug, in practice, we can restrict the size of considered com-     the test cases, e.g., the ACTS combinatorial test generation tool [16].
binations and consider multiple “local” combinations in a test case.       ACTS has been developed jointly by the US National Institute Stan-
   Combinatorial testing formalizes this idea of considering a certain     dards and Technology (NIST) and the University of Texas at Arling-
combination of inputs - in our case parameter subsets of size -, e.g., 2   ton and currently has more than 1,400 individual and corporate users.
or 3, where all possible value combinations are tried. Regarding the          A drawback of combinatorial testing is that only test input data is
considered combination of inputs we distinguish the strength of com-       generated. Hence, the oracle problem, i.e., classifying the computed
binatorial testing, e.g., strength 2 or 3. Each strength t (where t ≥ 2)   output as being correct or not, still remains for combinatorial testing.
requires that each t-wise tuple of values of the different system pa-      However, at least, combinatorial testing offers a structured and well
rameters is covered at least once in the test suite, which reduces the     defined method for test input data generation that can be effectively
necessary number of test cases substantially. Of course, the strength t    used in practice.
could also be set to the maximum in order to do an exhaustive search.         Regarding an algorithm for computing test cases using combinato-
The natural question is then if this method is sufficient. In [12], for    rial testing, we refer the reader to the available literature. The under-
example, the authors report on an empirical study considering vari-        lying data structure for computing the test is the mixed-level covering
ous programs from different domains and showed that it was enough          array which can be defined as follows (see [4] ).
to consider six-way interactions in order to detect all faults.
   We now illustrate combinatorial testing in the context of our run-      Definition 5 A mixed-level covering array which we will denote as
ning example, where we restrict our focus purely on the testing            MCA(t, k, (g1 , . . . , gk )) is an k × N array in which the entries of
                                                                                 Testing is based on test cases. We formalize test cases in a simpli-
      Table 2.   All three-way interactions for the e-vehicle example
                                                                              fied form appropriate for our purposes.

                                                                              Definition 6 (Test case) A test case for a SUT is a tuple (IN, OUT)
                 emot          ac              dm          bat                where IN is a formalization of the input values, and OUT defines the
           1     standard      none            leisure     type1              expected output when executing the SUT using IN.
           2     powerf ul     none            race        type1
           3     standard      none            race        type2                  We say that a test case (IN, OUT) is a passing test case for a SUT
           4     powerf ul     none            leisure     type2              if the execution of SUT using IN returns an output that is not in con-
           5     standard      none            leisure     type3              tradiction with OUT. Otherwise, we say that the test case is a failing
           6     powerf ul     none            race        type3              test case. A test suite is a set of test cases. In order to test a SUT, we
           7     standard      manual          race        type1
           8     powerf ul     manual          leisure     type1              are interested in having a test suite that comprises at least one fail-
           9     standard      manual          leisure     type2              ing test case. If there is no such test case, we assume the SUT to be
          10     powerf ul     manual          race        type2              correct with respect to the test suite.
          11     standard      manual          race        type3                  After discussing some basic testing principles, the question re-
          12     powerf ul     manual          leisure     type3
          13     standard      electronics     leisure     type1
                                                                              mains of how to actually test configuration knowledge-bases. Ac-
          14     powerf ul     electronics     race        type1              cording to Definition 1, the formalized knowledge covers the system
          15     standard      electronics     race        type2              description SD and the requirements REQ. What we actually want
          16     powerf ul     electronics     leisure     type2              to ensure is that when querying the knowledge-base using a certain
          17     standard      electronics     leisure     type3              request, we obtain the expected result. Hence, for testing purposes,
          18     powerf ul     electronics     race        type3
                                                                              we are interested mainly in testing SD and not REQ.
                                                                                  There are some additional aspects when discussing testing con-
the i-th row arise from an alphabet of size gi . Let {i1 , . . . , it } ⊆     figuration knowledge-bases. For testing ordinary programs, the role
{1, . . . , k} and considerQthe subarray of size t × N by selecting rows      of input and output variables is well known. For configuration prob-
                             t                                                lems, someone might, however, also consider REQ as input and the
of the MCA. There are i=1 gi possible t-tuples that could appear
as columns, and an MCA requires that each appears at least once.              set of suitable configurations SCONF as output. It might also be de-
The parameter t is also called the strength of the MCA.                       sirable to ask for the requests to be obtained when assuming a certain
                                                                              configuration. In terms of configurations, most likely there are some
   The mixed level covering array defines all possible combinations           valid configurations that are not suitable. Others are not even valid.
of t inputs having a finite value domain of gi for an input i. It is worth    According to Myers 5th testing principle, however, we also have to
noting that in combinatorial testing we have to have finite domains           check the invalid and unexpected cases.
(which is perfectly fine in case of configuration knowledge-bases).               We now formalize these two testing problems. Let us assume a
We might also remark that the technique for discretizing the param-           system description SD that describes configuration knowledge re-
eter values is referred to as input parameter modeling in combina-            garding PARTS and MODES. The first testing problem is for check-
torial testing [11]. After discussing combinatorial testing, we show          ing whether the derived suitable configurations are the correct ones.
how combinatorial testing can be effectively used for testing config-
uration knowledge-bases in the next section.                                  Definition 7 (Testing configuration) The testing configuration
                                                                              problem concerns testing the knowledge-base in its capabilities for
                                                                              deriving the expected configurations, and can be characterized as
5    TESTING KNOWLEDGE-BASES                                                  follows:
The obvious purpose of testing is to reveal a SUT’s faults. To this           Input: SD, PARTS, and MODES
end, the SUT is executed using certain input values, and the resulting        Objective: Finding test cases of the form (REQ, SCONF), where
behavior is logged. This behavior is compared with the expected one.            REQ are requirements, and CONF is a set of expected configura-
In case of deviations, a fault is detected and we certainly get inter-          tions for the configuration problem (SD∪REQ, PARTS, MODES).
ested in the corresponding root causes. In his ACM Turing Lecture               Note that CONF might be empty in case of inconsistencies. Other-
1972, Edsger W. Dijkstra mentioned that ”program testing can be                 wise, CONF is expected to comprise suitable configurations only.
a very effective way to show the presence of bugs, but is hopelessly
inadequate for showing their absence”. Hence, someone might be                   The second testing problem is related to checking whether given
interested in efficiently detecting deviations, i.e., finding the right in-   configurations lead to the derivation of the correct requirements, if
put that causes the misbehavior. Finding such an input might be like          there are any.
finding a needle in a haystack. Testing methods like combinational
                                                                              Definition 8 (Testing requirement derivation) The testing re-
testing help in this respect.
                                                                              quirement derivation problem captures the case where we are
   For a more detailed view on testing, we recommend Myers
                                                                              interested in testing the capabilities of the knowledge base to derive
book [14], where he - aside covering other issues - introduces 10
                                                                              requirements from conflicts. It can be characterized as follows:
testing principles. In the 5th one, Myers mentions that ”test cases
must be written for input conditions that are invalid and unexpected          Input: SD, PARTS, and MODES
as well as for those that are valid and expected”. Hence, there is a          Objective: Finding test cases, of the form (C, R) where C is a con-
requirement not only to test for expected results, but also to execute          figuration and R is the expected result. Obviously, R might be
a SUT using input values for which the SUT was not designed. In                 ⊥ in case the configuration itself lead to an inconsistency, i.e.,
case of a configuration knowledge base, this means that we have to              SD ∪ C |= ⊥. R might comprises all requests for which C is a
use also queries where we expect no solution due to inconsistencies             suitable configuration, or might be empty if there are no requests
arising during resolution.                                                      for which C is suitable.
   In order to solve both configuration specific testing problems, we      sidering 2-way combinations. If no FAIL is obtained, the process
need a method for computing input values, i.e., requirements respec-       can be continued for 3-way combinations or even stronger ones, of
tively configurations, and the resulting values. For the first part, we    course re-using previously obtained classifications. The process can
can easily make use of combinatorial testing with the advantage            definitely stop when strength t in combinatorial testing (for obtain-
of a reduced number of test cases to be computed while still re-           ing t-way combinations) reaches the number of variables used. Ex-
taining the capabilities for revealing a faulty behavior. Computing        perimental surveys suggest that it seems enough to consider 6-way
the expected outcome in an automated fashion, however, is not di-          combinations (see [12]).
rectly possible, because of a missing specification. Hence, we have           Algorithm 1 summarizes the steps necessary for computing a test
to rely on the knowledge engineer to provide this information. In the      suite in order to solve the testing configuration problem. The algo-
testing community, this problem is referred to as the oracle prob-         rithm for solving testing requirement derivation problem is very sim-
lem. There are some related methods like model-based testing (e.g.,        ilar. Algorithm 2 shows the necessary steps. The only differences are
see [22, 21]) or metamorphic testing (e.g., see [1, 2]). The latter uses   in the for-loop of the algorithm, where we have to take care of the
symmetries in the functions or systems to be tested in order to gain       different situations. Both algorithms terminate assuming a finite set
information about the correct behavior. For example, when testing          of requirements and configurations. When ignoring the time required
the sinus function implementation, we can make use of the property         for theorem prover, computing a configuration, and user interaction,
sin(x) = sin(2π + x). If available, such techniques can be also used       the time required for executions is mainly bound by the time required
for testing configuration knowledge-bases. However, in the following       for combinatorial testing.
we discuss the overall testing process ignoring metamorphic testing.
                                                                           Algorithm 2 TEST REQ(SD, PARTS, MODES, CM)
Algorithm 1 TEST CONF(SD, PARTS, MODES, CM)
                                                                           Input: A system description SD, its component set PARTS, their
Input: A system description SD, its component set PARTS, their             modes MODES, and a combinatorial testing model CM for configu-
modes MODES, and a combinatorial testing model CM for require-             rations.
ments.                                                                     Output: A test suite TS where also the result of the test is stored for
Output: A test suite TS where also the result of the test is stored for    each test case
each test case
                                                                            1: TS := ∅
 1: TS := ∅                                                                 2: t := 2
 2: t := 2                                                                  3: flag := FALSE
 3: flag := FALSE                                                           4: repeat
 4: repeat                                                                  5:    Call the combinatorial testing algorithm using CM and t and
 5:    Call the combinatorial testing algorithm using CM and t and               store the result in T .
      store the result in T .                                               6:   for all t ∈ T do
 6:   for all t ∈ T do                                                      7:      Convert t to its corresponding configuration representation
 7:      Convert t to its corresponding requirements representation                 C.
         REQ.                                                               8:      if SD ∪ C |= ⊥ then
 8:      Call     the     configuration   engine   on     (SD ∪             9:          R := ⊥.
         REQ, PARTS, MODES) and store the result in SCONF.                 10:      else
 9:      Present REQ and SCONF to the user for obtaining a classi-         11:          Call the the theorem prover with input SD ∪ C and store
         fication UC ∈ {PASS, FAIL, ?}                                                  the derivable requirements in R.
10:      if UC = FAIL then                                                 12:      end if
11:         Ask the user for SCONF                                         13:      Present C and R to the user for obtaining a classification
12:         flag = TRUE                                                             UC ∈ {PASS, FAIL, ?}
13:      end if                                                            14:      if UC = FAIL then
14:      TS := TS ∪ {(REQ, SCONF, UC)}                                     15:          Ask the user for R
15:   end for                                                              16:          flag = TRUE
16:   t := t + 1                                                           17:      end if
17: until f lag or no more t-way combinations are possible                 18:      TS := TS ∪ {(C, R, UC)}
18: return TS                                                              19:   end for
                                                                           20:   t := t + 1
                                                                           21: until flag or no more t-way combinations are possible
   In the proposed testing methodology for configuration knowledge-
                                                                           22: return TS
bases, we make use of combinatorial testing for generating the inputs
for both problems, the testing configuration as well as the testing re-
quirement derivation problem. We use these inputs, and a configura-           Finally, it is worth discussing the computation of combinatorial
tion engine (respectively a theorem prover) for generating the current     tests in Algorithm 1 and Algorithm 2. For Algorithm 2, we al-
output. The input and the corresponding output is given to the user        ready computed the test cases in the previous section. See, for ex-
(e.g., the knowledge engineer) for classifying the result as FAIL or       ample, Table 1 for all two-way combinations. There, test case 4
PASS. Note that we also have to consider that the user has no clear        would lead to an inconsistency when calling the theorem prover,
understanding of the expected outcome. In this case, the classifica-       because the standard motor would lead to slowacc which contra-
tion inconclusive (i.e., ?) can be used. This test input generation and    dicts the rules ¬(mode(dm, race) ∧ ¬fastacc) in combination with
classification process that keeps the user in the loop, is started con-    ¬(slowacc ∧ fastacc). Hence, we would be able to detect the case
where a knowledge-base is missing some of the mentioned rules.                testing for computing input data needed. We argued that combina-
   For obtaining the combinatorial tests for Algorithm 1, the situation       torial testing is very well suited for configuration testing, because
is a little different (but not much). There, we are interested in require-    of ensuring a good fault detection capability while still reducing the
ment combinations. As discussed before, there might be cases where            number of input combinations to consider. In practice, limited com-
we do not want to specify all requirements. Hence, we have to find            binations, i.e., five- to six-way combinations have turned out to be
a model for the combinatorial testing algorithm where we are able             sufficient for revealing faults that have not been found before. The
to take not care on a certain requirement. For our e-vehicle exam-            question whether, e.g., six-way combinations are enough for config-
ple, we have three different requirement categories: cooling, driving         uration knowledge-base testing, will have to be addressed by future
distance, and acceleration, each of them with the following possible          research and corresponding experiments.
values:                                                                          In future research also the proposed approach has to be em-
                                                                              pirically evaluated. For such an evaluation, large configuration
                input                   values                                knowledge-bases should be used. Moreover, by introducing faults in
                cooling                 true, false,                          the knowledge-bases someone would be able to check, whether the
                driving distance        city, interurban, max,                proposed approach is capable of detecting faults. Ideally, the fault de-
                acceleration            slowacc, fastacc,                     tection capabilities should be compared with other approaches, e.g.,
                                                                              random testing. Another interesting question is due to the testing ca-
   Note that the value is used to indicate that this requirement is cur-      pabilities of existing knowledge-based configuration tools. Do they
rently not active. When using this model as input to the ACTS tool,           support testing? Which testing strategies do they suggestion? These
we are able to obtain 12 combinatorial tests of strength 2 depicted           two questions among others can be answered, when carrying out a
in Table 3. Each row comprises requirements for our configuration             case study with the objective of evaluating existing configuration so-
model. Some of the requirements may lead to suitable configura-               lutions. It is worth noting that we focussed more on the principles
tions, some may not. This clarification has to be performed when              of testing configuration knowledge-bases in this paper and provided
considering the test cases in Algorithm 1.                                    a solution. We leave a detailed empirical analysis of the proposed
                                                                              approach for future research.
    Table 3. All two-way interactions for the requirements of the e-vehicle
                                                                              ACKNOWLEDGEMENTS
                                                                              The research presented in this paper has been carried as part of the
                     driving distance      acceleration   cooling
                                                                              eDAS project funded by the European Commission FP-7 grant agree-
                1    city                  slowacc        false
                2    city                  fastacc                            ment number: 608770.
                3    city                                 true
                4    interurban            slowacc
                5    interurban            fastacc        true                REFERENCES
                6    interurban                           false               [1] T.Y. Chen, S.C. Cheung, and S.M. Yiu, ‘Metamorphic testing: a new
                7    max                   slowacc        true                    approach for generating next test cases’, Technical report, Department
                8    max                   fastacc        false                   of Computer Science, Hong Kong University of Science and Technol-
                9    max                                                          ogy, Hong Kong, (1998). Technical Report HKUST-CS98-01.
               10                          slowacc        true                [2] T.Y. Chen, J. Feng, and T.H. Tse, ‘Metamorphic testing of programs on
               11                          fastacc        false                   partial differential equations: a case study’, in Proceedings of the 26th
               12                                                                 Annual International Computer Software and Applications Conference
                                                                                  (COMPSAC ’02), pp. 327–333, Los Alamitos, CA, (2002). IEEE Com-
                                                                                  puter Society.
                                                                              [3] David M. Cohen, Siddhartha R. Dalal, Michael L. Fredman, and Gard-
   From the results obtained using our running example we are able                ner C. Patton, ‘The AETG system: An approach to testing based
to conclude that combinatorial testing – in principle – can be used               on combinatorial design’, IEEE Trans. Softw. Eng., 23(7), 437–444,
to solve the two testing problems, which correspond to configura-                 (1997).
tion knowledge-bases. These two problems correspond to the two                [4] Charles J. Colbourn, ‘Covering arrays’, in Handbook of Combinato-
different questions someone would ask during and after the devel-                 rial Designs, eds., Charles J. Colbourn and Jeffrey H. Dinitz, Discrete
                                                                                  Mathematics and Its Applications, 361–365, CRC Press, Boca Raton,
opment of configuration knowledge-bases. The first question, deals                Fla., 2nd edn., (2006).
with the challenge of ensuring whether a knowledge-base is able               [5] Abeer El-Korany, Ahmed Rafea, Hoda Baraka, and Saad Eid, ‘A struc-
to derive expected configurations. The second question is related to              tured testing methodology for knowledge-based systems’, in 11th In-
the evaluation whether a knowledge-base allows for deriving con-                  ternational Conference on Database and Expert Systems Applications
                                                                                  (DEXA), pp. 427–436. Springer, (2000).
figurations that fulfill the given requirements. Both questions have          [6] A. Felfernig, G. Friedrich, D. Jannach, and M. Stumptner,
to be addressed within the development of configurators and their                 ‘Consistency-based diagnosis of configuration knowledge bases’, Ar-
knowledge-bases in order to gain trust in their correctness.                      tificial Intelligence, 152(2), 213–234, (2004).
                                                                              [7] A. Felfernig, G. Friedrich, D. Jannach, and M. Zanker, ‘An integrated
                                                                                  environment for the development of knowledge-based recommender
6      CONCLUSION                                                                 applications’, International Journal of Electronic Commerce (IJEC),
                                                                                  11(2), 11–34, (2006).
In this paper, we raised the question of how to test configuration            [8] A. Felfernig, K. Isak, and T. Kruggel, ‘Testing knowledge-based rec-
knowledge-bases. We focused on model-based configuration and de-                  ommender systems’, OEGAI Journal, 4, 12–18, (2005).
fined two testing problems. One for checking whether obtained con-            [9] Ronald Hartung and Anne Håkansson, ‘Automated testing for knowl-
                                                                                  edge based systems’, in Knowledge-Based Intelligent Information and
figurations are in line with the requirements, and the other for test-            Engineering Systems, eds., Bruno Apolloni, RobertJ. Howlett, and
ing whether the correct set of configurations is returned for given               Lakhmi Jain, volume 4692 of Lecture Notes in Computer Science, 270–
requirements. The proposed testing method relies on combinatorial                 278, Springer Berlin Heidelberg, (2007).
[10] Caroline C. Hayes and Michael I. Parzen, ‘Quem: An achievement test       [18] Robert T. Plant, ‘Expert system development and testing: A knowledge
     for knowledge-based systems’, IEEE Transactions on Knowledge and               engineer’s perspective’, Journal of Systems and Software, 19(2), 141–
     Data Engineering, 9(6), 838–847, (November/December 1997).                     146, (1992).
[11] D.R. Kuhn, R.N. Kacker, and Y. Lei, Introduction to Combinatorial         [19] Markus Stumptner, Gerhard Friedrich, and Alois Haselböck, ‘Gen-
     Testing, Chapman & Hall/CRC Innovations in Software Engineering                erative constraint-based configuration of large technical systems’, AI
     and Software Development Series, Taylor & Francis, 2013.                       EDAM, 12, 307–320, (9 1998).
[12] D.R. Kuhn, R.N. Kacker, Y. Lei, and J. Hunter, ‘Combinatorial software    [20] Juha Tiihonen, Timo Soininen, Ilkka Niemelä, and Reijo Sulonen, ‘Em-
     testing’, Computer, 94–96, (August 2009).                                      pirical testing of a weight constraint rule based configurator’, in ECAI
[13] Stephen Murrell and Robert T. Plant, ‘A survey of tools for the valida-        2002 Configuration Workshop, pp. 17–22, (2002).
     tion and verification of knowledge-based systems: 1985-1995’, Deci-       [21] J. Tretmans, ‘Model-based testing and some steps towards test-based
     sion Support Systems, 21(4), 307–323, (1997).                                  modelling’, in Proceedings of the 11th International School on Formal
[14] Glenford J. Myers, The Art of Software Testing, John Wiley & Sons,             Methods for Eternal Networked Software Systems (SFM 2011), (2011).
     Inc., 2 edn., 2004.                                                       [22] M. Utting and B. Legeard, Practical Model-Based Testing - A Tools
[15] Iulia Nica and Franz Wotawa, ‘(re-)configuration of communication              Approach, Morgan Kaufmann Publishers Inc., 2006.
     networks in the context of m2m applications’, in Proceedings of the       [23] Cemal Yilmaz, Myra B Cohen, and Adam A Porter, ‘Covering arrays
     15th Workshop on Configuration, Vienna, Austria, (2013).                       for efficient fault characterization in complex configuration spaces’,
[16] NIST, User Guide for ACTS.             which is online available at            Software Engineering, IEEE Transactions on, 32(1), 20–34, (2006).
     csrc.nist.gov/groups/SNS/acts/documents/acts user guide v2 r1.1.pdf;      [24] Linbin Yu, Yu Lei, R.N. Kacker, and D.R. Kuhn, ‘Acts: A combinatorial
     last visited on June 20th, 2014.                                               test generation tool’, in Software Testing, Verification and Validation
[17] Robert Plant, ‘Rigorous approach to the development of knowledge-              (ICST), 2013 IEEE Sixth International Conference on, pp. 370–375,
     based systems’, Knowl.-Based Syst., 4(4), 186–196, (1991).                     (2013).