Testing Configuration Knowledge-Bases Franz Wotawa and Ingo Pill1 Abstract. Writing tests for configuration knowledge-bases is a dif- In this paper, our focus is on such faults in configuration ficult task. One not minor reason is the huge search space. For ex- knowledge-bases an their consequences. Of course, another source haustive testing, all possible combinations of configuration parame- for failure is in the configuration algorithm’s implementation, i.e., ters must be considered. In practice, exhaustive testing is thus often the reasoning engine, itself. While such faults are outside our paper’s impossible, due to the sheer, exponential, number of combinations. focus, the generated tests can also be used to test the reasoner. Consequently it becomes necessary to focus on the most important Regarding fault detection and isolation, the size of a knowledge- configurations first. This abstract challenge is well-known in the test- base is of certain interest. That is, if the knowledge-base itself, or the ing community, and can be addressed by exploiting combinatorial configuration space, is very small, exhaustive testing might even be testing. Combinatorial testing deals with reducing the number of test feasible and a valid option for specific situations. However, in case inputs by aiming at exhaustive combinations of parameter subsets. of huge knowledge-bases or huge configuration spaces, exhaustive That is, ensuring that a test-suite contains tests covering all value testing is practically impossible. For instance, and without loosing combinations for all parameter subsets for (or up to) a given size. In generality, let us assume the example application of parameter con- this paper, we formulate the configuration-testing problem and show figuration. There the purpose is to find a value assignment to all avail- how combinatorial testing can be used in a corresponding test case able system parameters, in order to receive a setup that implements generation process, in order to achieve a huge reduction in the num- a desired functionality. If we have parameters p1 , . . . , pn , each tak- ber of required test cases. ing values from a domain D with size k, an exhaustive search would require us to test kn possible configurations, which, for the obvious reasons, is most likely infeasible for those values for k and n expe- 1 INTRODUCTION rienced in practice. Therefore, we require effective alternatives that A configuration, i.e., something that results from a particular ar- allow us to systematically focus our testing efforts. rangement of parts or components (according to the Merriam Web- In system testing, where we often have to test in the context of ster dictionary2 ), can be considered as a system aggregating specific alternative “environments’, we suffer from a similar problem. For in- parts in order to implement a desired functionality or behavior. In stance, if we want to test a web page, we have to consider various model-based configuration, we use a knowledge-base in order to rep- hardware platforms from PCs to smart phones and tablets, a variety resent those components’ functionality, given user requirements, and of operating systems, a set of web browsers commonly used, and so any other knowledge that is necessary for defining or constructing the on. Testing the web page in the context of all the possible “configu- system. Such additional knowledge encompasses, for instance, con- rations” is of course an achy task that requires a lot of resources. An straints prohibiting physically impossible (and thus conflicting) ar- empirical study (see e.g., [12]) showed, however, that not all combi- rangements. Obviously, the outcome of any configuration algorithm nations of parameter value assignments are necessary for revealing a depends heavily on the model’s quality. In some sense, quality in this bug. Rather, it seems sufficient to consider local parameter configu- case can be considered as being “as close as necessary (and possible) rations. Implementing the concept of combinatorial testing (see also to reality”, so that we need to capture the “appropriate” knowledge Section 4), we aim to cover all local parameter combinations up to, and do that in the right way. or of, a given size in a test suite. That is, all the combinations for In case of faults in the knowledge base, e.g., when we miss some (all possible) chosen “local” subsets of parameters, which allows us constraint that prohibits some impossible configuration, a derived to dramatically reduce the number of required tests. Of course the configuration might be incorrect for at least some specific scenarios choice of the subset size directly influences the “locality” of the test or corner cases. Thus, testing, which is basically unavoidable for ver- case generation process. ification and validation problems, is not only essential for hardware In this paper, we discuss the testing problem for configuration and programs, but also for knowledge-bases. We certainly have to knowledge-bases and propose the use of combinatorial testing for ensure that a configuration behaves as desired. The evidence is even automated test input generation. We introduce our preliminary def- stronger when moving from static configuration, e.g., configuring a initions using a simplified example from the e-vehicle domain, and product based on user needs, to dynamic configurations where the furthermore discuss two different testing aspects. First, we consider system might adapt itself for a certain situation. For example, a robot testing of different configurations. And second, when considering the might adapt its control behavior in case of a broken wheel, that is, on desired functionality as being changeable, there arises the question of its view of the world that it stores in an internal knowledge-base as whether there actually is a valid configuration for a certain combina- foundation for its reasoning. In such cases, a reliable and, to a certain tion of functionalities. degree, expected and “safe” behavior has to be ensured. Our paper is organized as follows. First, we discuss some related research with a focus on testing of knowledge-based systems in gen- 1 Technische Universität Graz, email: {wotawa,ipill}@ist.tugraz.at 2 http://www.merriam-webster.com/dictionary/configuration eral. Afterwards, we introduce the foundations of configuration using a running example. We then use the same example to discuss com- i.e., combinatorial testing, which seems to suit configuration very binatorial testing. After the introduction into combinatorial testing, well. we discuss testing of configuration knowledge-bases in more detail. Finally, we conclude the paper and outline future research directions. 3 THE CONFIGURATION PROBLEM For illustration purposes, let us consider the following simplified ex- 2 RELATED RESEARCH ample from the domain of vehicle configurations. In Figure 1, we Knowledge-based systems are used for various purposes like con- illustrate an example comprising an electric vehicle that contains an figuration, diagnosis, and also decision support, e.g., for high-level electric motor, electric consumers like an air-condition, and a battery control of systems. For all these application areas, systems have to that delivers the required electricity. Battery size and other factors, be predictable, that is, they have to behave as expected and do not like the driving mode, substantially influence the range of the vehicle. cause any trouble leading to a loss of resources or even harm peo- The configuration knowledge base for our example comprises four ple. Despite this fact, it is interesting to note that there has not been components, i.e., an air-condition, a motor, the driving mode, and the a huge number of papers dealing with testing, verification, and val- battery - each of them offering some options, which then vary from idation of knowledge-based systems. Robert Plant [17, 18] was one configuration to configuration. Let us now assume that there are two of the first dealing with verification, validation, and testing of expert engine types (standard and powerful), three types of air-condition systems and knowledge-based systems in general. There is also an (none, manual, and electronic), two driving modes (leisure and race), earlier survey available (see [13]) that deals with tools for validation and three different batteries (type1, type2, type3), each providing a and verification of knowledge-based systems. different electric capacity. Clearly, the configured vehicle’s range de- Regarding testing of knowledge-based systems, it is also worth pends heavily on the battery and actual power consumption. That is, mentioning El-Korany and colleagues’ work [5], where their focus for instance, if there is too much power consumption, some partic- is on the testing methodology. There the authors distinguish differ- ular range can never be achieved. The range, however, reflects an ent cases where testing is required, i.e., inference knowledge testing important part of a customer’s needs. While customer A is satisfied and task knowledge testing. The objective behind their work was to when she can drive the car for one day in a city for no more than 100 increase the level of correctness of knowledge-based systems. Other km, customer B expects his car being able to cover more than 200 work includes [9], where Hartung and Håkansson discuss test au- km before it has to be recharged. Other customer requirements might tomation for knowledge-based systems. Their approach works for concern air-conditioning, or the availability of a particular driving production rules that are extracted from the knowledge-bases. mode. Hayes and Parzen [10] focused more on the question of “to what degree a knowledge-based systems fulfills its purpose”, that is, as indicated in the title of their publication, on achieving the desired behavior. In order to answer the question about the quality of de- cisions coming from a knowledge-based system, Hayes and Parzen introduced a special metric (QUEM) to judge the quality of the solu- tions. The proposed approach is essential for measuring the overall performance of a knowledge-based system. To the best of our knowledge, there is only little work on test- ing configuration motors or motors that make use of configuration methods like recommenders. Felfernig and colleagues [8, 7] discuss the use of testing, i.e., white-box testing, and development environ- ments in the context of recommender applications. Other work from Felfernig and colleagues [6] mainly focuses on the second step of debugging, i.e., fault localization and correction, but still requires test cases for finding inconsistencies between the behavior coded in a knowledge base and the expected behavior, which originates from knowledge engineers or customers of the configuration system. Tiihonen and colleagues [20] described a rule-based configura- tor and also introduced a more or less model-independent testing Figure 1. Configuration problem of an electric vehicle method. In their approach the configurator is tested using randomly generated requirements given to the configurator. Besides discussing the underlying methodology Tiihonen et al. also presents empirical In the following, we discuss the formalization of our e-vehicle results gained from 4 different configuration models. In contrast to configuration example, but let us introduce the definition of a con- Tiihonen and colleague the testing approach proposed in this paper figuration problem first. is not a random testing approach. Moreover, our focus is more on testing the configuration knowledge-base and not the whole configu- Definition 1 (Configuration problem) A configuration problem is rator. Although, the obtained tests can be used later for testing con- a tuple (SD ∪ REQ, PARTS, MODES) where SD is the system de- crete implementations. scription, REQ are the requirements, and PARTS are the configurable In our paper, we rely on previous research in the domain of parts that are allowed to be set to particular modes from MODES. testing knowledge-based systems, but focus on the specific case of knowledge-based systems for configuration. We distinguish different We assume that SD and REQ are first order logic formulae. Other cases for testing and suggest to use a specific testing methodology, formalisms might also be used requiring the existence of consistency checks and reasoning capabilities. Our definition of the configuration The power consumption of all vehicle parts should never exceed problem assumes that the functionality or behavior of the parts (from the available power, so that we add an integrity constraint: PARTS) are defined in SD for a particular mode (from MODES). For our e-vehicle example, we consider 4 different parts: the electric mo- avpow(bat, B) ∧ cons(emot, E) ∧ cons(ac, A) ∧ tor (emot), the air-condition (ac), the driving mode (dm) and the bat- cons(dm, D) → B > (E + A + D) tery (bat), i.e., PARTS = {emot, ac, dm, bat}. Finally, we have to map power consumption and available power What is missing, is the configuration knowledge and the require- to the vehicle’s maximum distance (without recharging) class. ments. Regarding the latter, we assume that we want to distinguish slow acceleration cars (slowacc) from fast acceleration cars (fastacc), avpow(bat, B) ∧ cons(emot, E) ∧ cons(ac, A) ∧ as well as the availability of air-condition cooling (cooling). More- cons(dm, D) ∧ B − (E + A + D) > 99 → city over, a user might specify the maximum distance before recharging, which might be city for less than or equal 100 km, interurban for dis- avpow(bat, B) ∧ cons(emot, E) ∧ cons(ac, A) ∧ tances up to 250 km, and max, otherwise. In the following, we depict cons(dm, D) ∧ B − (E + A + D) > 200 → interurban SD for our running example. In the system description, we make use of the predicate mode that assigns a component a certain parameter avpow(bat, B) ∧ cons(emot, E) ∧ cons(ac, A) ∧ value, cons for fixing the power consumption of a part, and avpow cons(dm, D) ∧ B − (E + A + D) > 400) → max for stating the available electrical power for batteries. It is worth noting that the above definition allows to derive differ- Electric motor: The standard engine provides slow acceleration ent maximum distances at the same time. If the distance is larger only, but draws less electrical power. The powerful motor provides than 400 city, interurban, and max become valid. This could be fast acceleration but consumes more electricity as a downside. avoided via integrity constraints or chaining to constraints, in or- der to get non-overlapping definitions. However, this definition is mode(emot, standard) → (cons(emot, 300) ∧ slowacc) intended such as to allow to specify a minimum capability. mode(emot, powerful) → (cons(emot, 400) ∧ fastacc) Now let us we define formally what we understand about a con- Air-condition: If there is no air-condition, then there is no power figuration. Intuitively, a configuration has to do with a mode assign- consumption and also no cooling. The manual air-condition draws ment, which corresponds to choosing a certain part, e.g., setting the less power than the electronic one. Both provide cooling. battery to type1 means that we want this battery in our configuration. mode(ac, none) → (cons(ac, 0) ∧ ¬cooling) mode(ac, manual) → (cons(ac, 100) ∧ cooling) Definition 2 (Configuration) mode(ac, electronic) → (cons(ac, 150) ∧ cooling) Let (SD∪REQ, PARTS, MODES) be a configuration problem. A con- figuration is an assignment of a particular mode to each of the parts, Driving mode: A leisure driver consumes no additional electricity i.e., a set C is a configuration, if and only if |C| = |PARTS| and on top of the power required to drive the motor. A racy driver ∀ p ∈ PARTS : ∃ mode(p, m) ∈ C where m ∈ MODES. draws more power due to higher acceleration. As this definition ignores REQ and SD, it induces the whole con- mode(dm, leisure) → (cons(dm, 0)) figuration space. Being interested only in valid configurations, i.e., mode(dm, race) → (cons(dm, 100)) those that do not contradict REQ and SD, we define them as follows: Battery: The three battery types have varying capacities. Definition 3 (Valid configuration) Let C be a configuration for the mode(bat, type1) → (avpow(bat, 450)) configuration problem (SD ∪ REQ, PARTS, MODES). Configuration mode(bat, type2) → (avpow(bat, 600)) C is valid if and only if SD ∪ REQ ∪ C is satisfiable. mode(bat, type3) → (avpow(bat, 800)) Clearly Definition 3 does not ensure that a valid configuration Other constraints: There are several further domain-dependent meets the requirements. Hence, we define a suitable configuration. constraints: The racy driving mode can only be obtained when having a powerful motor, i.e., it is not possible to have fast accel- Definition 4 (Suitable configuration) Let C be a valid configura- eration without the right motor. tion for the configuration problem (SD ∪ REQ, PARTS, MODES). C is suitable iff the requirements REQ can be derived from the system ¬ (mode(dm, race) ∧ ¬fastacc) description and the configuration, i.e., SD ∪ C |= REQ. In addition, we have to ensure that a component cannot be in more than one mode simultaneously,, and that some available functions The user requirements have a direct impact on the space of suit- are in contradiction, e.g., slow and fast acceleration. able configurations. Clearly, REQ = {city} has more suitable con- figurations than the requirements REQ = {city, cooling}. The given ¬ (mode(emot, standard) ∧ mode(emot, manual)) definitions of configuration are close to those of reconfiguration and ¬ (mode(none, standard) ∧ mode(manual, manual)) parameter configuration, e.g. from [19, 15]. However, to some extent, ¬ (mode(none, standard) ∧ mode(manual, electronic)) generative configuration, e.g., [19], can also be handled, when as- ¬ (mode(none, manual) ∧ mode(manual, electronic)) suming a boundary for involved components and connections. Each ¬ (mode(dm, leisure) ∧ mode(dm, race)) potential component and connection has to be defined in the sys- ¬ (mode(bat, type1) ∧ mode(bat, type2)) tem description having two modes. One is for indicating the use of ¬ (mode(bat, type1) ∧ mode(bat, type3)) a component or connection in a configuration, and the other for stat- ¬ (mode(bat, type2) ∧ mode(bat, type3)) ing that the component or connection is not used. In addition, some ¬ (slowacc ∧ fastacc) integrity constraints have to be specified, in order to ensure that in a final configuration there is no connection without the corresponding methodology. Even more details are offered in the next section. For components. Note that for larger systems and configurations such a brevity let us consider the component modes as inputs: bounded variant might lead to a description that cannot be used for computing configurations in reasonable time, which does not contra- input values dict the observation that the given definitions - in principle - allow emot standard, powerf ul for specifying different configuration problems. ac none, manual, electronic Let us come back to our running example and the definition of dm leisure, race suitable configurations. When stating REQ = {city, cooling} we can bat type1, type2, type3 obtain the suitable configuration When searching for all two-way combinations, i.e., combinations   of values for two particular inputs, we would obtain results similar mode(emot, standard), mode(ac, manual), or equivalent to the one depicted in Table 1. There for each combina- mode(dm, leisure), mode(bat, type1) tion of two inputs, all possible value combinations are given, which but also results in 9 test cases. For comparison reasons, we also depict the test   cases for strength 3 in Table 2. It is worth noting that, when consid- mode(emot, powerful), mode(ac, electronic), ering all combinations, we would finally obtain 36 test cases. mode(dm, leisure), mode(bat, type2) Table 1. All two-way interactions for the e-vehicle example among others. The configuration   mode(emot, powerful), mode(ac, none), mode(dm, leisure), mode(bat, type1) emot ac dm bat 1 powerf ul none race type1 would be a valid one, but is not suitable as cooling is not established. 2 standard none leisure type2 For computing configurations meeting requirements, we refer the in- 3 powerf ul none leisure type3 terested reader to [19] or [15]. 4 standard manual race type1 What remains now, is the question whether the formalized config- 5 powerf ul manual leisure type2 6 standard manual race type3 uration problem represents reality and results in the desired configu- 7 powerf ul electronics leisure type1 rations. Hence, we need to test the configuration knowledge-base. To 8 standard electronics race type2 this end, in the next section we introduce a certain testing methodol- 9 powerf ul electronics race type3 ogy suitable for this task. The significant advantage of combinatorial testing is that the num- 4 COMBINATORIAL TESTING ber of test cases can be reduced while still considering combinations Combinatorial testing is a method for the algorithmic computation of of input values. In order to implement combinatorial testing as a test tests and in particular test input data for a system under test (SUT). case generation method, the following steps are required: An answer to the question of how much test input data we should generate in order to reveal undetected faults is of great practical im- 1. First, someone has to write a model of the input space, comprising portance. As mentioned before, for n inputs with k possible values, the inputs and their value domains. an exhaustive approach would require us to test kn combinations. 2. The combinatorial design procedure takes this input space and When missing an important combination, so that a fault remains in generates an array where each row is simple a test case describing the source code, the consequences might be catastrophic, especially the value for each input considering the given strength t. for safety-critical systems. Recently, researchers suggested not to 3. Every row is delivered back as a single test case describing poten- consider all input value combinations, but only certain ones focusing tial input data (but not the expected output). on an exhaustive “local” search (see e.g. [3, 23, 24]). The underlying Another benefit of combinatorial testing is that Steps 2 and 3 can idea is that while input combinations might be required in order to be automated completely. There are tools available for computing reveal a bug, in practice, we can restrict the size of considered com- the test cases, e.g., the ACTS combinatorial test generation tool [16]. binations and consider multiple “local” combinations in a test case. ACTS has been developed jointly by the US National Institute Stan- Combinatorial testing formalizes this idea of considering a certain dards and Technology (NIST) and the University of Texas at Arling- combination of inputs - in our case parameter subsets of size -, e.g., 2 ton and currently has more than 1,400 individual and corporate users. or 3, where all possible value combinations are tried. Regarding the A drawback of combinatorial testing is that only test input data is considered combination of inputs we distinguish the strength of com- generated. Hence, the oracle problem, i.e., classifying the computed binatorial testing, e.g., strength 2 or 3. Each strength t (where t ≥ 2) output as being correct or not, still remains for combinatorial testing. requires that each t-wise tuple of values of the different system pa- However, at least, combinatorial testing offers a structured and well rameters is covered at least once in the test suite, which reduces the defined method for test input data generation that can be effectively necessary number of test cases substantially. Of course, the strength t used in practice. could also be set to the maximum in order to do an exhaustive search. Regarding an algorithm for computing test cases using combinato- The natural question is then if this method is sufficient. In [12], for rial testing, we refer the reader to the available literature. The under- example, the authors report on an empirical study considering vari- lying data structure for computing the test is the mixed-level covering ous programs from different domains and showed that it was enough array which can be defined as follows (see [4] ). to consider six-way interactions in order to detect all faults. We now illustrate combinatorial testing in the context of our run- Definition 5 A mixed-level covering array which we will denote as ning example, where we restrict our focus purely on the testing MCA(t, k, (g1 , . . . , gk )) is an k × N array in which the entries of Testing is based on test cases. We formalize test cases in a simpli- Table 2. All three-way interactions for the e-vehicle example fied form appropriate for our purposes. Definition 6 (Test case) A test case for a SUT is a tuple (IN, OUT) emot ac dm bat where IN is a formalization of the input values, and OUT defines the 1 standard none leisure type1 expected output when executing the SUT using IN. 2 powerf ul none race type1 3 standard none race type2 We say that a test case (IN, OUT) is a passing test case for a SUT 4 powerf ul none leisure type2 if the execution of SUT using IN returns an output that is not in con- 5 standard none leisure type3 tradiction with OUT. Otherwise, we say that the test case is a failing 6 powerf ul none race type3 test case. A test suite is a set of test cases. In order to test a SUT, we 7 standard manual race type1 8 powerf ul manual leisure type1 are interested in having a test suite that comprises at least one fail- 9 standard manual leisure type2 ing test case. If there is no such test case, we assume the SUT to be 10 powerf ul manual race type2 correct with respect to the test suite. 11 standard manual race type3 After discussing some basic testing principles, the question re- 12 powerf ul manual leisure type3 13 standard electronics leisure type1 mains of how to actually test configuration knowledge-bases. Ac- 14 powerf ul electronics race type1 cording to Definition 1, the formalized knowledge covers the system 15 standard electronics race type2 description SD and the requirements REQ. What we actually want 16 powerf ul electronics leisure type2 to ensure is that when querying the knowledge-base using a certain 17 standard electronics leisure type3 request, we obtain the expected result. Hence, for testing purposes, 18 powerf ul electronics race type3 we are interested mainly in testing SD and not REQ. There are some additional aspects when discussing testing con- the i-th row arise from an alphabet of size gi . Let {i1 , . . . , it } ⊆ figuration knowledge-bases. For testing ordinary programs, the role {1, . . . , k} and considerQthe subarray of size t × N by selecting rows of input and output variables is well known. For configuration prob- t lems, someone might, however, also consider REQ as input and the of the MCA. There are i=1 gi possible t-tuples that could appear as columns, and an MCA requires that each appears at least once. set of suitable configurations SCONF as output. It might also be de- The parameter t is also called the strength of the MCA. sirable to ask for the requests to be obtained when assuming a certain configuration. In terms of configurations, most likely there are some The mixed level covering array defines all possible combinations valid configurations that are not suitable. Others are not even valid. of t inputs having a finite value domain of gi for an input i. It is worth According to Myers 5th testing principle, however, we also have to noting that in combinatorial testing we have to have finite domains check the invalid and unexpected cases. (which is perfectly fine in case of configuration knowledge-bases). We now formalize these two testing problems. Let us assume a We might also remark that the technique for discretizing the param- system description SD that describes configuration knowledge re- eter values is referred to as input parameter modeling in combina- garding PARTS and MODES. The first testing problem is for check- torial testing [11]. After discussing combinatorial testing, we show ing whether the derived suitable configurations are the correct ones. how combinatorial testing can be effectively used for testing config- uration knowledge-bases in the next section. Definition 7 (Testing configuration) The testing configuration problem concerns testing the knowledge-base in its capabilities for deriving the expected configurations, and can be characterized as 5 TESTING KNOWLEDGE-BASES follows: The obvious purpose of testing is to reveal a SUT’s faults. To this Input: SD, PARTS, and MODES end, the SUT is executed using certain input values, and the resulting Objective: Finding test cases of the form (REQ, SCONF), where behavior is logged. This behavior is compared with the expected one. REQ are requirements, and CONF is a set of expected configura- In case of deviations, a fault is detected and we certainly get inter- tions for the configuration problem (SD∪REQ, PARTS, MODES). ested in the corresponding root causes. In his ACM Turing Lecture Note that CONF might be empty in case of inconsistencies. Other- 1972, Edsger W. Dijkstra mentioned that ”program testing can be wise, CONF is expected to comprise suitable configurations only. a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence”. Hence, someone might be The second testing problem is related to checking whether given interested in efficiently detecting deviations, i.e., finding the right in- configurations lead to the derivation of the correct requirements, if put that causes the misbehavior. Finding such an input might be like there are any. finding a needle in a haystack. Testing methods like combinational Definition 8 (Testing requirement derivation) The testing re- testing help in this respect. quirement derivation problem captures the case where we are For a more detailed view on testing, we recommend Myers interested in testing the capabilities of the knowledge base to derive book [14], where he - aside covering other issues - introduces 10 requirements from conflicts. It can be characterized as follows: testing principles. In the 5th one, Myers mentions that ”test cases must be written for input conditions that are invalid and unexpected Input: SD, PARTS, and MODES as well as for those that are valid and expected”. Hence, there is a Objective: Finding test cases, of the form (C, R) where C is a con- requirement not only to test for expected results, but also to execute figuration and R is the expected result. Obviously, R might be a SUT using input values for which the SUT was not designed. In ⊥ in case the configuration itself lead to an inconsistency, i.e., case of a configuration knowledge base, this means that we have to SD ∪ C |= ⊥. R might comprises all requests for which C is a use also queries where we expect no solution due to inconsistencies suitable configuration, or might be empty if there are no requests arising during resolution. for which C is suitable. In order to solve both configuration specific testing problems, we sidering 2-way combinations. If no FAIL is obtained, the process need a method for computing input values, i.e., requirements respec- can be continued for 3-way combinations or even stronger ones, of tively configurations, and the resulting values. For the first part, we course re-using previously obtained classifications. The process can can easily make use of combinatorial testing with the advantage definitely stop when strength t in combinatorial testing (for obtain- of a reduced number of test cases to be computed while still re- ing t-way combinations) reaches the number of variables used. Ex- taining the capabilities for revealing a faulty behavior. Computing perimental surveys suggest that it seems enough to consider 6-way the expected outcome in an automated fashion, however, is not di- combinations (see [12]). rectly possible, because of a missing specification. Hence, we have Algorithm 1 summarizes the steps necessary for computing a test to rely on the knowledge engineer to provide this information. In the suite in order to solve the testing configuration problem. The algo- testing community, this problem is referred to as the oracle prob- rithm for solving testing requirement derivation problem is very sim- lem. There are some related methods like model-based testing (e.g., ilar. Algorithm 2 shows the necessary steps. The only differences are see [22, 21]) or metamorphic testing (e.g., see [1, 2]). The latter uses in the for-loop of the algorithm, where we have to take care of the symmetries in the functions or systems to be tested in order to gain different situations. Both algorithms terminate assuming a finite set information about the correct behavior. For example, when testing of requirements and configurations. When ignoring the time required the sinus function implementation, we can make use of the property for theorem prover, computing a configuration, and user interaction, sin(x) = sin(2π + x). If available, such techniques can be also used the time required for executions is mainly bound by the time required for testing configuration knowledge-bases. However, in the following for combinatorial testing. we discuss the overall testing process ignoring metamorphic testing. Algorithm 2 TEST REQ(SD, PARTS, MODES, CM) Algorithm 1 TEST CONF(SD, PARTS, MODES, CM) Input: A system description SD, its component set PARTS, their Input: A system description SD, its component set PARTS, their modes MODES, and a combinatorial testing model CM for configu- modes MODES, and a combinatorial testing model CM for require- rations. ments. Output: A test suite TS where also the result of the test is stored for Output: A test suite TS where also the result of the test is stored for each test case each test case 1: TS := ∅ 1: TS := ∅ 2: t := 2 2: t := 2 3: flag := FALSE 3: flag := FALSE 4: repeat 4: repeat 5: Call the combinatorial testing algorithm using CM and t and 5: Call the combinatorial testing algorithm using CM and t and store the result in T . store the result in T . 6: for all t ∈ T do 6: for all t ∈ T do 7: Convert t to its corresponding configuration representation 7: Convert t to its corresponding requirements representation C. REQ. 8: if SD ∪ C |= ⊥ then 8: Call the configuration engine on (SD ∪ 9: R := ⊥. REQ, PARTS, MODES) and store the result in SCONF. 10: else 9: Present REQ and SCONF to the user for obtaining a classi- 11: Call the the theorem prover with input SD ∪ C and store fication UC ∈ {PASS, FAIL, ?} the derivable requirements in R. 10: if UC = FAIL then 12: end if 11: Ask the user for SCONF 13: Present C and R to the user for obtaining a classification 12: flag = TRUE UC ∈ {PASS, FAIL, ?} 13: end if 14: if UC = FAIL then 14: TS := TS ∪ {(REQ, SCONF, UC)} 15: Ask the user for R 15: end for 16: flag = TRUE 16: t := t + 1 17: end if 17: until f lag or no more t-way combinations are possible 18: TS := TS ∪ {(C, R, UC)} 18: return TS 19: end for 20: t := t + 1 21: until flag or no more t-way combinations are possible In the proposed testing methodology for configuration knowledge- 22: return TS bases, we make use of combinatorial testing for generating the inputs for both problems, the testing configuration as well as the testing re- quirement derivation problem. We use these inputs, and a configura- Finally, it is worth discussing the computation of combinatorial tion engine (respectively a theorem prover) for generating the current tests in Algorithm 1 and Algorithm 2. For Algorithm 2, we al- output. The input and the corresponding output is given to the user ready computed the test cases in the previous section. See, for ex- (e.g., the knowledge engineer) for classifying the result as FAIL or ample, Table 1 for all two-way combinations. There, test case 4 PASS. Note that we also have to consider that the user has no clear would lead to an inconsistency when calling the theorem prover, understanding of the expected outcome. In this case, the classifica- because the standard motor would lead to slowacc which contra- tion inconclusive (i.e., ?) can be used. This test input generation and dicts the rules ¬(mode(dm, race) ∧ ¬fastacc) in combination with classification process that keeps the user in the loop, is started con- ¬(slowacc ∧ fastacc). Hence, we would be able to detect the case where a knowledge-base is missing some of the mentioned rules. testing for computing input data needed. We argued that combina- For obtaining the combinatorial tests for Algorithm 1, the situation torial testing is very well suited for configuration testing, because is a little different (but not much). There, we are interested in require- of ensuring a good fault detection capability while still reducing the ment combinations. As discussed before, there might be cases where number of input combinations to consider. In practice, limited com- we do not want to specify all requirements. Hence, we have to find binations, i.e., five- to six-way combinations have turned out to be a model for the combinatorial testing algorithm where we are able sufficient for revealing faults that have not been found before. The to take not care on a certain requirement. For our e-vehicle exam- question whether, e.g., six-way combinations are enough for config- ple, we have three different requirement categories: cooling, driving uration knowledge-base testing, will have to be addressed by future distance, and acceleration, each of them with the following possible research and corresponding experiments. values: In future research also the proposed approach has to be em- pirically evaluated. For such an evaluation, large configuration input values knowledge-bases should be used. Moreover, by introducing faults in cooling true, false, the knowledge-bases someone would be able to check, whether the driving distance city, interurban, max, proposed approach is capable of detecting faults. Ideally, the fault de- acceleration slowacc, fastacc, tection capabilities should be compared with other approaches, e.g., random testing. Another interesting question is due to the testing ca- Note that the value is used to indicate that this requirement is cur- pabilities of existing knowledge-based configuration tools. Do they rently not active. When using this model as input to the ACTS tool, support testing? Which testing strategies do they suggestion? These we are able to obtain 12 combinatorial tests of strength 2 depicted two questions among others can be answered, when carrying out a in Table 3. Each row comprises requirements for our configuration case study with the objective of evaluating existing configuration so- model. Some of the requirements may lead to suitable configura- lutions. It is worth noting that we focussed more on the principles tions, some may not. This clarification has to be performed when of testing configuration knowledge-bases in this paper and provided considering the test cases in Algorithm 1. a solution. We leave a detailed empirical analysis of the proposed approach for future research. Table 3. All two-way interactions for the requirements of the e-vehicle ACKNOWLEDGEMENTS The research presented in this paper has been carried as part of the driving distance acceleration cooling eDAS project funded by the European Commission FP-7 grant agree- 1 city slowacc false 2 city fastacc ment number: 608770. 3 city true 4 interurban slowacc 5 interurban fastacc true REFERENCES 6 interurban false [1] T.Y. Chen, S.C. Cheung, and S.M. Yiu, ‘Metamorphic testing: a new 7 max slowacc true approach for generating next test cases’, Technical report, Department 8 max fastacc false of Computer Science, Hong Kong University of Science and Technol- 9 max ogy, Hong Kong, (1998). Technical Report HKUST-CS98-01. 10 slowacc true [2] T.Y. Chen, J. Feng, and T.H. Tse, ‘Metamorphic testing of programs on 11 fastacc false partial differential equations: a case study’, in Proceedings of the 26th 12 Annual International Computer Software and Applications Conference (COMPSAC ’02), pp. 327–333, Los Alamitos, CA, (2002). IEEE Com- puter Society. [3] David M. Cohen, Siddhartha R. Dalal, Michael L. Fredman, and Gard- From the results obtained using our running example we are able ner C. Patton, ‘The AETG system: An approach to testing based to conclude that combinatorial testing – in principle – can be used on combinatorial design’, IEEE Trans. Softw. Eng., 23(7), 437–444, to solve the two testing problems, which correspond to configura- (1997). tion knowledge-bases. These two problems correspond to the two [4] Charles J. Colbourn, ‘Covering arrays’, in Handbook of Combinato- different questions someone would ask during and after the devel- rial Designs, eds., Charles J. Colbourn and Jeffrey H. Dinitz, Discrete Mathematics and Its Applications, 361–365, CRC Press, Boca Raton, opment of configuration knowledge-bases. The first question, deals Fla., 2nd edn., (2006). with the challenge of ensuring whether a knowledge-base is able [5] Abeer El-Korany, Ahmed Rafea, Hoda Baraka, and Saad Eid, ‘A struc- to derive expected configurations. The second question is related to tured testing methodology for knowledge-based systems’, in 11th In- the evaluation whether a knowledge-base allows for deriving con- ternational Conference on Database and Expert Systems Applications (DEXA), pp. 427–436. Springer, (2000). figurations that fulfill the given requirements. Both questions have [6] A. Felfernig, G. Friedrich, D. Jannach, and M. Stumptner, to be addressed within the development of configurators and their ‘Consistency-based diagnosis of configuration knowledge bases’, Ar- knowledge-bases in order to gain trust in their correctness. tificial Intelligence, 152(2), 213–234, (2004). [7] A. Felfernig, G. Friedrich, D. Jannach, and M. Zanker, ‘An integrated environment for the development of knowledge-based recommender 6 CONCLUSION applications’, International Journal of Electronic Commerce (IJEC), 11(2), 11–34, (2006). In this paper, we raised the question of how to test configuration [8] A. Felfernig, K. Isak, and T. Kruggel, ‘Testing knowledge-based rec- knowledge-bases. We focused on model-based configuration and de- ommender systems’, OEGAI Journal, 4, 12–18, (2005). fined two testing problems. One for checking whether obtained con- [9] Ronald Hartung and Anne Håkansson, ‘Automated testing for knowl- edge based systems’, in Knowledge-Based Intelligent Information and figurations are in line with the requirements, and the other for test- Engineering Systems, eds., Bruno Apolloni, RobertJ. Howlett, and ing whether the correct set of configurations is returned for given Lakhmi Jain, volume 4692 of Lecture Notes in Computer Science, 270– requirements. The proposed testing method relies on combinatorial 278, Springer Berlin Heidelberg, (2007). [10] Caroline C. Hayes and Michael I. Parzen, ‘Quem: An achievement test [18] Robert T. Plant, ‘Expert system development and testing: A knowledge for knowledge-based systems’, IEEE Transactions on Knowledge and engineer’s perspective’, Journal of Systems and Software, 19(2), 141– Data Engineering, 9(6), 838–847, (November/December 1997). 146, (1992). [11] D.R. Kuhn, R.N. Kacker, and Y. Lei, Introduction to Combinatorial [19] Markus Stumptner, Gerhard Friedrich, and Alois Haselböck, ‘Gen- Testing, Chapman & Hall/CRC Innovations in Software Engineering erative constraint-based configuration of large technical systems’, AI and Software Development Series, Taylor & Francis, 2013. EDAM, 12, 307–320, (9 1998). [12] D.R. Kuhn, R.N. Kacker, Y. Lei, and J. Hunter, ‘Combinatorial software [20] Juha Tiihonen, Timo Soininen, Ilkka Niemelä, and Reijo Sulonen, ‘Em- testing’, Computer, 94–96, (August 2009). pirical testing of a weight constraint rule based configurator’, in ECAI [13] Stephen Murrell and Robert T. Plant, ‘A survey of tools for the valida- 2002 Configuration Workshop, pp. 17–22, (2002). tion and verification of knowledge-based systems: 1985-1995’, Deci- [21] J. Tretmans, ‘Model-based testing and some steps towards test-based sion Support Systems, 21(4), 307–323, (1997). modelling’, in Proceedings of the 11th International School on Formal [14] Glenford J. Myers, The Art of Software Testing, John Wiley & Sons, Methods for Eternal Networked Software Systems (SFM 2011), (2011). Inc., 2 edn., 2004. [22] M. Utting and B. Legeard, Practical Model-Based Testing - A Tools [15] Iulia Nica and Franz Wotawa, ‘(re-)configuration of communication Approach, Morgan Kaufmann Publishers Inc., 2006. networks in the context of m2m applications’, in Proceedings of the [23] Cemal Yilmaz, Myra B Cohen, and Adam A Porter, ‘Covering arrays 15th Workshop on Configuration, Vienna, Austria, (2013). for efficient fault characterization in complex configuration spaces’, [16] NIST, User Guide for ACTS. which is online available at Software Engineering, IEEE Transactions on, 32(1), 20–34, (2006). csrc.nist.gov/groups/SNS/acts/documents/acts user guide v2 r1.1.pdf; [24] Linbin Yu, Yu Lei, R.N. Kacker, and D.R. Kuhn, ‘Acts: A combinatorial last visited on June 20th, 2014. test generation tool’, in Software Testing, Verification and Validation [17] Robert Plant, ‘Rigorous approach to the development of knowledge- (ICST), 2013 IEEE Sixth International Conference on, pp. 370–375, based systems’, Knowl.-Based Syst., 4(4), 186–196, (1991). (2013).