Testing Product Configuration Knowledge Bases
Declaratively
Konstantin Herud1 , Joachim Baumeister1,2
1 denkbares GmbH, Germany
2 University of Würzburg, Germany


Abstract
Product configuration typically makes use of declarative knowledge to model the properties of complex products. The development of such product knowledge bases is similar to the development of code bases. Key challenges include collaboration, maintainability, extensibility, and quality assurance. New features, requirements, and regulations lead to frequent and error-prone iterations. Analogous to software engineering, automated testing is critical to ensure the integrity of product knowledge. While the general NP-complete complexity of configuration problems generates much academic interest, these business-relevant challenges receive less attention. This paper thus presents ongoing work on quality assurance in product configuration using regression testing. To this end, we first formally define a novel data structure for performing the tests. We then explore the challenges of collaboratively engineering testing knowledge in practice. Finally, we illustrate a grammar for formulating the tests with several application scenarios.

Keywords
Product Configuration, Regression Testing, Declarative Knowledge, Quality Assurance, Collaboration, Maintainability




1. Motivation
Product configuration describes a broad area that deals with the composition and individualiza-
tion of generic components. A typical example is the selection of a custom computer. Instead
of choosing from a predefined set of options, customers can assemble the exact computer
they want from components such as different processors, cases, and monitors. As a result,
customers are more likely to make a purchase. The various components are contained in a
product catalog and are subject to certain compatibilities with each other. However, Felfernig et
al. [1] show that modern product configuration is also used for much more complex problems.
Examples include railway interlocking systems [2], cement plants [3], mobile phone networks
[4], offers, contracts, user manuals, or technical documentation [5], and services like elevator
maintenance [6]. In such cases, simple compatibilities are not sufficient, since, for example, legal
regulations, spatial and temporal requirements, or other physical conditions must be taken into
account. Thus, manual modeling of the domain becomes necessary. Despite the long history
and profitability of product configuration, there are still several issues that make engineering
product knowledge a challenge.


LWDA’22: Lernen, Wissen, Daten, Analysen. October 05–07, 2022, Hildesheim, Germany
konstantin.herud@denkbares.com (K. Herud); joachim.baumeister@denkbares.com (J. Baumeister)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
   Some of these challenges are similar to software development. While efficient code execution
is important, the real challenge is developing high-quality code in the first place and maintaining
it over a long time. As different parties make continuous changes to the code, bugs are easily
introduced. The same applies to product configuration. However,
configuration knowledge must be completely free of errors to prevent customers from ordering
products that can neither be manufactured nor sold. A system for automated quality assess-
ment of knowledge integrity is thus essential. One tool to identify newly introduced errors is
regression testing. However, because configuration problems are typically NP-complete, it is
difficult to develop meaningful tests at all. Unlike in software development, it is not sufficient
to describe a set of expected inputs and outputs. Although expected inputs are usually well
defined, the number of their possible combinations grows exponentially with the number of
feature values. This quickly exceeds the capacity of hardware and developers. To address this
complexity, our work develops a novel data structure for regression testing in product
configuration. Analogous to the development of product knowledge itself, declarative
formulation is a guiding principle. Often, different parties with different expertise maintain the
knowledge. These parties should not be concerned with the details of a procedural, and thus
technical, formulation. Test knowledge is inherently declarative: it specifies what the desired
behavior of a function is rather than how that behavior is achieved. This notion leads us to the
idea of test-driven knowledge development. New requirements are first formulated as tests and
thereby documented simultaneously. Based on this, the requirement can be understood in the
long term, and arbitrary refactoring can be performed on the knowledge.
    This paper therefore addresses two questions:

    • How can products be regressively tested despite an exponential set of possible configura-
      tions?
    • How does the collaborative development and maintenance of test knowledge between
      parties of different expertise work?

Our work outlines a vision of a set of methods for quality assurance and collaborative develop-
ment of complex product configuration knowledge bases.
   For this purpose, we first briefly define product knowledge and the configuration process
in general in Section 2. We then present our data structure for formulating test knowledge
in Section 3. In Section 4, we then take a look at how tests are developed in practice and at
their lifecycle. We illustrate this view with several application scenarios in Section 5. Finally,
we conclude our work with a brief look at the related literature and an overview of future
challenges.
2. Product Configuration
The main idea of product configuration is the feature-based personalization of a product. There
are several efforts towards a general ontology to model the knowledge about a product. Since the set
of possible configurations grows exponentially with the amount of selectable options, it is not
feasible to enumerate all results in a database. Instead, configuration problems are commonly
modeled as constraint satisfaction problems. Here, a set of variables and their domains exists,
i. e., components and their types, or, in general, features of a product. This typically involves
the notion of customizing systems out of generic components that form a part-of-hierarchy.
We refer to this as structural knowledge. Furthermore, the interactions of the variables are
modeled with constraints. We subsequently describe this as behavioral knowledge. Finally, we
define the configuration process itself. Common extensions of a general ontology consist of
a hierarchy and dynamic activation of variables. In this way, complex components are recursively
composed of more specific components, which are in turn configurable. Since
this hierarchical modeling introduces much complexity with concepts such as partonomy and
taxonomy that are beyond the scope of this work, we stick with the notion of features, i. e.,
configurable aspects of the product. To avoid the irrelevant overhead of this complexity, we
keep our definition as simple as possible to provide the basis for regression testing. We align
ourselves with the definitions in [7].
Definition 1 (Feature). The properties of a product are specified by 𝑛 features. A feature 𝑓 is a
variable defined by its feature type. The type of a feature determines its non-empty
domain, domain (𝑓 ). Primitive feature types describe numeric, boolean, and textual domains. Concrete
feature types define a discrete and finite domain of selectable options.
   We use typewriter font for concrete examples and write features with a capital initial letter.
Domain values are completely capitalized. For example, the color of a car could be described by
a feature Color, where domain (Color) = {BLACK, WHITE, RED}. One reason for using the
term feature is to distinguish between different types that capture different characteristics of a
product. For example, there are also primitive data types that cannot be classified under the term
component. Note that this definition can give rise to a potentially infinite number of possible
product configurations, e. g., by having a feature with domain (𝑓 ) = R. The implication is that
in this case no finite set of configurations exists. Thus no finite set of classical test cases with
expected input and output can cover the entire space of valid configurations.
   Since the individual configurations cannot be explicitly enumerated, they are implicitly
specified by constraints. With the previous definition, only the configurable dimensions of
a product are defined. However, a large part of the knowledge is formed by the constraints
on these dimensions. We call these constraints behavioral knowledge. The set of all possible
configurations results from the Cartesian product of the domains of all features. However, usually
only a fraction of them form valid configurations, i.e., combinations of the components that
can be produced and sold. To capture the set of all valid solutions in this space of all possible
solutions, constraints are required.
Definition 2 (Constraint). A constraint 𝑐 is a function that maps a configuration 𝑋 to a boolean
truth value, i. e.,
                                      𝑐 : 𝑋 → {⊤, ⊥} .                                        (1)
  While the configuration will be discussed in more detail in a moment, in our work we are
mainly interested in first-order, propositional, and arithmetic constraints. Although the
type of constraints is unbounded in theory, usually there are common patterns to describe the
behavior of a product. Note that we abuse notation here to describe constraints 𝑐 as formulas
for which an interpretation 𝐼 exists such that 𝐼 (𝑐) evaluates to true or false.
    1. Allowed or forbidden value combinations of different components, e. g.,
                             (Body = CITY     ∧ Drive = FRONT_WHEEL)
                           ∨ (Body = SPORT    ∧ Drive = REAR_WHEEL)
                           ∨ (Body = OFF_ROAD ∧ Drive = ALL_WHEEL)
      This pattern lists possible feature combinations in table-like disjunctive normal form.
   2. Requirements that formulate arbitrary conditions that have to be fulfilled, e. g.,
                                         WeightInKG <= 3500

      Although these are in nature similar to combinations, they formulate more concise
      constraints that go beyond (in)equality. They may involve arithmetic, e. g., the “weight
      of all components must not exceed a certain value”.
   3. Implications, which are similar to requirements. Here, a condition must first hold before
      the consequence must be met, e. g.,
                                 Body = SUV → HorsePower >= 100

Other examples outside the scope of this work describe default value assignments, involve
temporal conditions, or concern the presence and absence of components in the hierarchy. The
combination of features, feature types, and constraints of a product forms a knowledge base.
Definition 3 (Knowledge Base). A configuration knowledge base is a triple (𝐹, 𝐷, 𝐶), where 𝐹
is the set of all features, 𝐷 is the set of all feature types, and 𝐶 is a set of constraints over 𝐹 .
  The knowledge base is then used to offer individual configurations to customers.
Definition 4 (Configuration). A configuration 𝑋 is a set of at most one value assignment 𝑥 to
each feature 𝑓 ∈ 𝐹 in a knowledge base (𝐹, 𝐷, 𝐶).
                           𝑋 = {𝑓 = 𝑥 | 𝑓 ∈ 𝐹 ∧ 𝑥 ∈ domain (𝑓 )} .                                (2)
𝑋 is complete, if (3) holds, and valid if (4) holds.
                  complete (𝑋) : ∀𝑓 ∈ 𝐹 : ∃𝑥 ∈ domain (𝑓 ) : (𝑓 = 𝑥) ∈ 𝑋                          (3)
                       valid (𝑋) : ∀𝑐 ∈ 𝐶 : 𝑐 (𝑋) = ⊤                                             (4)
 A complete configuration is therefore merely the result of a finished feature-binding process,
where exactly one value exists for each feature. The user requirements 𝑈 are a partial configuration,
where each feature assignment 𝑓 = 𝑥 is explicitly given by the user. Neither complete (𝑈 ) nor
valid (𝑈 ) have to be true. We are interested in the set of solutions 𝑆, where each solution 𝑠 ∈ 𝑆
is complete and valid. Note that 𝑠 ⊇ 𝑈 , since it is completed from 𝑈 . We refer to 𝑈 as satisfiable
if 𝑆 ̸= ∅ and unsatisfiable otherwise. The configuration process is typically a sequence where
𝑈 grows incrementally. The customer starts with 𝑈 = ∅ and can select only valid or arbitrary
values depending on the environment.
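
To make Definitions 1–4 concrete, the following minimal Python sketch models a knowledge base in memory. It is an illustration under simplifying assumptions, not part of our framework; the car fragment and all identifiers are hypothetical.

from typing import Any, Callable, Dict, List

Configuration = Dict[str, Any]                 # feature name -> assigned value
Constraint = Callable[[Configuration], bool]   # Definition 2: c(X) -> {T, F}

class KnowledgeBase:                           # Definition 3: (F, D, C)
    def __init__(self, domains: Dict[str, List[Any]],
                 constraints: List[Constraint]):
        self.domains = domains                 # F and D: feature -> domain(f)
        self.constraints = constraints         # C

    def complete(self, x: Configuration) -> bool:
        # Eq. (3): every feature carries exactly one value from its domain
        return all(f in x and x[f] in dom for f, dom in self.domains.items())

    def valid(self, x: Configuration) -> bool:
        # Eq. (4): every constraint maps the configuration to true
        return all(c(x) for c in self.constraints)

# The car fragment from above as a table-like combination constraint
kb = KnowledgeBase(
    domains={"Body": ["CITY", "SPORT", "OFF_ROAD"],
             "Drive": ["FRONT_WHEEL", "REAR_WHEEL", "ALL_WHEEL"]},
    constraints=[lambda x: (x.get("Body"), x.get("Drive")) in
                 {("CITY", "FRONT_WHEEL"), ("SPORT", "REAR_WHEEL"),
                  ("OFF_ROAD", "ALL_WHEEL")}])

assert kb.complete({"Body": "SPORT", "Drive": "REAR_WHEEL"})
assert kb.valid({"Body": "SPORT", "Drive": "REAR_WHEEL"})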
3. Regression Testing
Regression testing is commonly understood as the repeated evaluation of test cases to ensure
that modifications in already tested functions do not cause new errors [8]. These new errors
can arise, for example, from fixing old errors, refactoring in general, or by implementing new
requirements and regulations. The term regression is used when a new version does not correctly
maintain existing functionality. For the purpose of identifying these regressions, test cases are
implemented. For testing in product configuration, our work applies a strict separation of
concerns: testing is for quality assurance of the knowledge base, not of the configuration environment
or a reasoning engine. By reasoning engine we refer to a system that is able to infer logical
consequences from a set of asserted axioms and facts. For example, a common choice is Answer
Set Programming (ASP) [9]. Here the axioms are first-order logical rules — the so-called problem
encoding that is used to guide the search. The facts are ground atoms, which describe a problem
instance. These instantiate propositional rules with which the problem is ultimately solved.
However, a central idea in product configuration is the clear separation between knowledge
modeling and reasoning engine. It is inefficient to address all possible reasoning tasks with a
single solving technique, such as providing explanations in the case of failure, enumeration
of models, optimization, or continuous value computations [10]. A decoupling of modeling
and reasoning is therefore essential. In this work, we assume that there exists an arbitrary but
correctly working reasoning engine with proper axioms. The goal is thus to ensure that the
knowledge accurately models the product. This means that the solution space 𝑆 only allows
valid configurations and that customers cannot order invalid ones. A special requirement
resulting from this is that not a single test is allowed to fail.
   Although various synergies exist between knowledge and software engineering, there is a
key difference for regression testing in product configuration. While it is usually possible to
develop tests to cover the entire software, NP complexity generally prevails for constraint-level
configuration problems [11]. This complexity makes it difficult to develop dedicated tests for
the exponential set of allowed and disallowed configurations. Therefore, it is not sufficient
to define a set of valid and invalid user requirements as test cases that are expected to be
satisfiable or unsatisfiable. Instead, we extend the expressive power of the tests to the power
of the configuration ontology itself. This means that the tests are formulated declaratively as
constraints — equivalently to the development of the product knowledge itself.

Definition 5 (Test Case). A test case is a triple (𝑈, 𝑡, 𝑚), where 𝑈 is a potentially empty set of
feature assignments 𝑓 = 𝑥 : 𝑓 ∈ 𝐹 ∧ 𝑥 ∈ domain (𝑓 ), 𝑡 is a constraint following Definition 2,
and 𝑚 ∈ {universal , existential } is a reasoning mode.

   A novelty here is the introduction of a reasoning mode 𝑚. The two modes universal and
existential specify whether the test constraint must hold in at least one or all of the configu-
rations that can be derived from the user requirements. Instead of developing and executing
separate test cases for each possible configuration, efficient algorithms like CDCL can then be
used to enumerate models [12]. The test constraint can thus be evaluated quickly in practice
despite the exponential amount of configurations.
Definition 6 (Universal Testing). Given a test case (𝑈, 𝑡, universal ), the test constraint 𝑡 must
hold in all configurations 𝑠 ⊇ 𝑈 : valid (𝑠) ∧ complete (𝑠).

                                        ∀𝑠 ∈ 𝑆 : 𝑡 (𝑠) = ⊤                                         (5)

  This type of reasoning is necessary for both positive and negative hard requirements. Section 5
will illustrate both reasoning modes in more detail.

Definition 7 (Existential Testing). Given a test case (𝑈, 𝑡, existential ), at least one configuration
𝑠 ⊇ 𝑈 : valid (𝑠) ∧ complete (𝑠) must exist in which the test constraint 𝑡 holds.

                                        ∃𝑠 ∈ 𝑆 : 𝑡 (𝑠) = ⊤                                         (6)

  This type of reasoning is important, for example, for a guided configuration process, where the
user is pointed to the set of selectable valid values. Thus, it would be conceivable to implement
concrete features as a dropdown menu, where the selectable options are loaded dynamically
depending on the current requirements. A test case could then ensure that an option is still
available. Both modes can be negated by negating the test constraint and using the opposite
reasoning mode. The opposite of universal testing thus changes from ¬∀𝑠 ∈ 𝑆 : 𝑐 (𝑠) = ⊤ to
∃𝑠 ∈ 𝑆 : 𝑐 (𝑠) = ⊥. Correspondingly, existential testing changes from ¬∃𝑠 ∈ 𝑆 : 𝑐 (𝑠) = ⊤ to
∀𝑠 ∈ 𝑆 : 𝑐 (𝑠) = ⊥. A conceivable default mode (see below) would test the constraint against a
configuration that complies with all satisfiable default rules; all feature assignments not derived
from defaults could then be chosen non-deterministically. However, before discussing these modes
in more detail, we define our collected test knowledge as a test suite.

Definition 8 (Test Suite). A test suite is a tuple (𝑀, 𝑇 ), where 𝑀 is a knowledge base following
Definition 3 and 𝑇 is a finite set of test cases following Definition 5.

  The tests of a test suite can be executed sequentially by a portfolio of reasoning engines, but
can also be fully parallelized. Since each test case is potentially associated with an NP-complete
evaluation, the performance of the test system plays a critical role. In addition, optimization
can be performed, such as grouping several test cases with the same user requirements and the
same reasoning mode.
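   As an illustration of Definitions 5–8, the following Python sketch evaluates test cases against a knowledge base as modeled in the sketch of Section 2. The brute-force enumerator is only a stand-in assumption; an actual implementation would delegate to a reasoning engine with CDCL-based model enumeration [12] rather than crossing all domains.

from itertools import product

def enumerate_solutions(kb, user_requirements):
    # Yields every complete and valid configuration s that extends U.
    features = list(kb.domains)
    for values in product(*(kb.domains[f] for f in features)):
        s = dict(zip(features, values))
        if all(s[f] == x for f, x in user_requirements.items()) and kb.valid(s):
            yield s

def run_test(kb, test_case):
    U, t, mode = test_case                     # Definition 5: (U, t, m)
    solutions = enumerate_solutions(kb, U)
    if mode == "universal":                    # Eq. (5): t holds in all s
        return all(t(s) for s in solutions)
    return any(t(s) for s in solutions)        # Eq. (6): t holds in some s

def run_suite(kb, tests):
    # Definition 8: a suite passes only if not a single test case fails.
    return all(run_test(kb, tc) for tc in tests)

# Example: every sports car must be rear-wheel driven
assert run_test(kb, ({"Body": "SPORT"},
                     lambda s: s["Drive"] == "REAR_WHEEL", "universal"))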
  According to Junker [13], the configuration task also consists of an explanation of failure if
no configuration can be found that satisfies all requirements. Thus, further reasoning modes
would be conceivable, for example to reason about properties of unsatisfiable configurations.
Another mode could be used to test default assignments if they are supported by the ontology.
Here, defaults are a separate set of constraints according to the scheme

                                       constraint → 𝑓 = 𝑥                                          (7)

where 𝑓 ∈ 𝐹 and 𝑥 ∈ domain (𝑓 ). An example is Body = SPORT → Seats = SPORT. Finding
a configuration, for example, can then additionally be treated as an optimization problem to
satisfy as many defaults as possible. Besides the purpose of grouping functionally similar
options, defaults also serve to determinize the reasoning process. However, additional reasoning
modes remain open as future work.
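A sketch of such a default mode, under the assumption that defaults are given as soft constraints of the form of (7), encoded as (condition, feature, value) triples, and reusing enumerate_solutions from the previous sketch, could score each solution by the number of satisfied defaults and keep only the maximizing ones:

def satisfied_defaults(s, defaults):
    # A default "condition -> f = x" is satisfied if the condition does not
    # hold in s or the consequence holds in s.
    return sum(1 for cond, f, x in defaults if not cond(s) or s.get(f) == x)

def default_solutions(kb, U, defaults):
    # Treat finding a configuration as an optimization problem: keep only
    # the solutions that satisfy as many defaults as possible.
    solutions = list(enumerate_solutions(kb, U))
    best = max((satisfied_defaults(s, defaults) for s in solutions), default=0)
    return [s for s in solutions if satisfied_defaults(s, defaults) == best]

# Example default: Body = SPORT -> Seats = SPORT
defaults = [(lambda s: s.get("Body") == "SPORT", "Seats", "SPORT")]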
4. Engineering Testing Knowledge
Having formally defined both product configuration and regression testing, in this section we
take a look at it from a practical standpoint. For this, we outline in Section 4.1 the collaborative
development of test knowledge and its challenges using a domain-specific grammar. Then, in
Section 4.2, we describe the lifecycle of the test knowledge, its integration into the development
process, and most importantly, the execution of the tests.

4.1. Development
Various front ends are conceivable for developing the testing knowledge and thus setting up the
data structure in Definition 8. These front ends must be adapted to the expected developers of the
tests. Ideally, the front end is based on existing technology for developing product knowledge.
Very technical developers, for example, could be granted direct access to a programming
interface. Technically-averse actors, for instance subject matter experts, should be supported
in other ways. One way would be to add a developer mode to the configuration environment.
On the one hand, this allows manually creating a configuration to be tested, analogous to the
customer experience. On the other hand, test constraints can also be created in the developer
view for each step, for example with the support of graphical editors and pre-built templates.
The respective steps of the development process are then serialized into individual test cases.
While this method is very convenient, it also comes with disadvantages. The linear navigation
through the configuration process leads to a similar set of user requirements and thus similar
test cases. This in turn can leave many dimensions of the solution space of valid configurations
insufficiently tested, while a few other dimensions are tested redundantly.
   A third solution would be a middle ground, such as the implementation of a Domain Specific
Language (DSL). Technical details such as the different reasoning modes should be abstracted. A
strongly declarative solution could be closely oriented to natural language. A DSL has the advan-
tage of being able to use the synergies with text-based software development. This makes it very
easy, for example, to integrate versioning with helpful difference views. Furthermore, a semantic
wiki can serve as a platform for collaborative development, structuring and maintenance of
test knowledge. In addition, text is a suitable interface for pointing out anomalies, for example
through syntax highlighting. Ultimately, there is an integrated development environment (IDE)
that combines all of these functions. The basis for this is the DSL. Such a language is briefly
outlined in Listing 1.
   Here, we only show the formulation of the constraints. As lines 1–3 show, a statement begins
with the reasoning mode. According to lines 5–11, a constraint is then a logical expression
that can be evaluated to true or false following Definition 2. Note the precedence of the rules.
Essential for this is the compare rule in line 10 to evaluate constants obtained from formulas.
Formulas in lines 13–23 represent calculations and functions that can be evaluated to constants.
The last line refers to the hidden feature rule, which is used to query configuration-dependent
values. Since typically different people without a computer science background develop tests,
one goal of the DSL is to avoid computer science-specific concepts and terminology. For example,
“&&” and “||” often serve as logical AND and OR in programming languages, but are replaced here
by their natural language counterparts. Also, Unicode symbols are avoided, such as “→”, “∨”,
     and “∧”, which are inconvenient to type. Note that the grammar lacks operations to manipulate
     the configuration. However, a simple option would be to specify a serialized configuration
     with feature assignments for each test case, which is then loaded from a database or file. A
     self-contained test case, on the other hand, would start from a blank configuration and include
     operations to set and modify values. The written DSL code is then decomposed into a set of
     user requirements, where each element of the set arises after an operation to modify a value.
     We show examples of usage in Section 5.
1             test := universal | existential
2        universal := 'require' constraint
3      existential := 'allow' constraint
4
5       constraint := implication
6      implication := disjunction ('implies' disjunction)*
7      disjunction := conjunction ('or' conjunction)*
8      conjunction := negation ('and' negation)*
9         negation := 'not' negation | '(' constraint ')' | compare
10         compare := formula (operator formula)+
11        operator := '<' | '<=' | '>' | '>=' | '=' | '!='
12
13         formula := addition
14        addition := subtraction ('+' subtraction)*
15     subtraction := multiplication ('-' multiplication)*
16  multiplication := division ('*' division)*
17        division := sign ('/' sign)*
18            sign := '-' sign | '+' sign | other
19
20           other := '(' formula ')' | aggregation | atom
21     aggregation := function '(' formula (',' formula)* ')'
22        function := 'count' | 'sum' | 'min' | 'max' | 'equal'
23            atom := 'true' | 'false' | number | feature
Listing 1: A simplified grammar to define test cases. For simplicity, we adopt the symbols
          {(, ), ?, *, +} of regular expressions with their usual meaning. The grammar is not
          functional as given, since some terminal rules are missing. Whitespace is ignored. The
          terminal symbol feature is not defined, but is used to identify features 𝑓 ∈ 𝐹 . Likewise,
          number is not defined, but describes integer and floating point values. The start rule
          is test.
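
To illustrate how Listing 1 could be implemented, the following Python sketch parses a small fragment of the grammar (test, 'and'-conjunctions, and a compare rule simplified to two atoms). The tokenizer and the syntax trees it produces are illustrative choices of ours, not the actual parser of the framework.

import re

# keywords before identifiers; \b prevents matching keyword prefixes of names
TOKEN = re.compile(
    r"require\b|allow\b|and\b|not\b|<=|>=|!=|[<>=()]|[A-Za-z_]\w*|\d+(?:\.\d+)?")

def tokenize(source):
    return TOKEN.findall(source)        # whitespace is skipped implicitly

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def test(self):                     # test := universal | existential
        mode = {"require": "universal", "allow": "existential"}[self.eat()]
        return {"mode": mode, "constraint": self.conjunction()}

    def conjunction(self):              # conjunction := compare ('and' compare)*
        node = self.compare()
        while self.peek() == "and":
            self.eat("and")
            node = ("and", node, self.compare())
        return node

    def compare(self):                  # simplified: atom operator atom
        left, op, right = self.eat(), self.eat(), self.eat()
        return (op, left, right)

parsed = Parser(tokenize("require Body = SPORT and WeightInKG <= 3500")).test()
# {'mode': 'universal', 'constraint': ('and', ('=', 'Body', 'SPORT'),
#                                             ('<=', 'WeightInKG', '3500'))}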


     4.2. Execution
     To detect regressions, tests must be run automatically after changes and new versions. Here,
     software development can serve as inspiration. Correspondingly, a continuous integration (CI)
     system can be implemented that runs the tests after each change or after manual triggering [14].
Figure 1 shows a prototypical implementation of the test framework developed in this paper.
The automated tests assert the integrity of new knowledge before it is accepted as the central
consensus. Then, if all tests are successful, a resulting artifact can be delivered, for example. An
artifact might be a compiled text file in a format that is ready for a reasoning engine. Following
this idea, the concept of test-driven development can be adapted to product configuration. In
order to integrate new features, requirements or regulations into the product knowledge, test
cases are first developed for them. The lifecycle of product knowledge thus consists of iterations
of the following steps [15]:
   1. Clear definition and documentation of the new requirements
   2. Formulation of the defined requirements as a test case
   3. The implemented test is expected to fail, but does not necessarily do so
   4. Initial formulation of the new product knowledge
   5. The implemented tests are now successful
   6. Refactoring of the existing knowledge
It is often easier to formulate conditions of validity than to formulate the knowledge itself. For
example, it is easier to specify that the weight of a product must lie within a certain interval
than to specify the calculation behind it. Another big advantage is that any requirements for
the product are clearly documented. This is crucial in the case of regressions. Tests can fail
for various reasons. In the simplest case, for example, the names of features change. If the
configuration ontology contains a hierarchy, the position of knowledge in the hierarchy may
change. It may also happen that old knowledge is invalidated, for example by new regulations.
Therefore, besides the product knowledge, the test knowledge must also be maintained. Both
are, however, only possible if the purpose of the knowledge can be comprehended afterwards
by people other than the original developers. Good software is often self-explanatory. The
origin and purpose of knowledge is not necessarily so. If there is ambiguity about whether old
functions are obsolete in case of regression, then legacy data accumulates which has a strong
negative impact on the quality of the knowledge base.
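
A minimal sketch of such a CI step, reusing run_test from the sketch in Section 3, might run all regression tests on every change and block delivery on the first regression; the deliver callback is a hypothetical placeholder for compiling and publishing the solver-ready artifact.

import sys
import time

def continuous_integration(kb, tests, deliver):
    start = time.time()
    failed = [i for i, tc in enumerate(tests) if not run_test(kb, tc)]
    passed = len(tests) - len(failed)
    print(f"{passed}/{len(tests)} tests passed in {time.time() - start:.2f}s")
    if failed:
        print("regression in test case(s):", failed)
        sys.exit(1)         # not a single test may fail: block the delivery
    deliver(kb)             # all tests succeeded: deliver the artifact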




Figure 1: A screenshot of a prototypical implementation for the continuous integration of knowledge
changes in the semantic wiki KnowWE [16]. At a defined trigger all regression tests are executed before
the knowledge base is delivered. An overview shows the last executions and their status of success. In
addition, quick information about the test execution is displayed, e. g., how many tests were executed,
how long the test duration was and how long the runtime of the test process was including the setup of
the configuration inputs. If required, further details on individual tests can be viewed.
5. Application Scenarios
Regression testing is used by companies to verify that the solution space of sellable configurations
is correctly defined. We separate this interest into two categories:
      1. Whitelist Testing: On the one hand, the set of manufacturable configurations should be
         clearly defined in order to prevent losses in profit from selling fewer products due to
         missing configurations.
      2. Blacklist Testing: On the other hand, non-manufacturable configurations are to be pre-
         vented in order to avoid problems with the correction of erroneous orders in downstream
         systems such as supply chain management and more importantly, the production line.
Since whitelisting explicitly describes allowed configurations, knowledge is much easier to
maintain. This works, for instance, by describing allowed instead of forbidden feature com-
binations. For example, the list in Section 2 shows allowed combinations. The opposite —
forbidden combinations — would be a negation of the entire expression. This would hide which
configurations are excluded by the constraint. The situation is similar with test knowledge. It is
hard to test the unknown unknowns. Whitelisting is therefore the recommended approach.

5.1. Whitelist Testing
Whitelist testing is concerned with the known properties of valid configurations. In the easiest
case, tests are developed for an existing product whose knowledge base is merely being expanded
or maintained. In this case, test cases can be generated automatically, which, for example, ensure
that the sold configurations of a past period are still sellable with the revisions. A typical test
case here consists of the user requirements 𝑈 at that time and the feature assignments of the
sold configuration. 𝑈 is then universally tested, since the requirements come explicitly from the
user and thus must be included in any derived configuration. The assignments of the reasoning
engine 𝐴 at that time are existentially tested to ensure that the configuration could still be
derived in exactly the same way. For example,

         𝑈 = {Body = SPORT, ExteriorColor = RED, Wheels = 21_INCH_SPIDER} ,
         𝐴 = {Seats = SPORT, InteriorColor = BLACK, . . . } .

A test case may then look like the following.
// load / setup configuration
require Body          = SPORT
require ExteriorColor = RED
require Wheels        = 21_INCH_SPIDER
allow   Seats         = SPORT AND
        InteriorColor = BLACK
...
Since the tests are generated automatically, each line can easily consist of a single constraint.
Otherwise, universal constraints can be linked with a logical AND to save typing. Note that
for existential tests, however, it makes a difference whether they are formulated as a single
constraint or as separate constraints. If the ontology supports defaults, then a third reasoning mode would
be conceivable, that explicitly tests configurations that maximize all satisfiable defaults (see
Section 3). For example, if Body = SPORT the implication Seats = SPORT should be tested as
default instead of existentially.
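
The following Python sketch illustrates how such test cases could be generated automatically from past orders. We assume hypothetical order records that store the explicit user requirements 𝑈 and the assignments 𝐴 completed by the reasoning engine; the generated triples follow Definition 5.

def tests_from_order(order):
    U = dict(order["user_requirements"])       # e.g. {"Body": "SPORT", ...}
    A = dict(order["engine_assignments"])      # e.g. {"Seats": "SPORT", ...}
    # each user requirement must hold in every configuration derived from U
    tests = [(U, (lambda s, f=f, x=x: s.get(f) == x), "universal")
             for f, x in U.items()]
    # the engine assignments must still be jointly derivable from U
    tests.append((U, (lambda s: all(s.get(f) == x for f, x in A.items())),
                  "existential"))
    return tests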
   However, usually it is not old knowledge that needs to be tested, but new knowledge. Instead
of a set of inputs and expected outputs, the requirements of the new knowledge must then be
specified as testable criteria. For example, one property of any product that is often part of
the knowledge base is its price calculation. Here, prices are often dynamically composed of
surcharges and price reductions. A surcharge could result if a product does not use a uniform
color, but different colors for different components. This requirement is first formulated as a
test.
require equal(ExteriorColor, InteriorColor, CoverColor)
        implies ColorSurcharge = false

require not equal(ExteriorColor, InteriorColor, CoverColor)
        implies ColorSurcharge = true
The test starts with 𝑈 = ∅, since it must apply universally to all derivable configurations.
Then the functionality is implemented in the product knowledge itself and can be refactored
as desired. Note that the derivation of ColorSurcharge here is potentially very similar to
the test constraint itself. However, this redundancy documents the original requirement of the
product knowledge in the event that the knowledge is changed.

5.2. Blacklist Testing
Blacklist testing is concerned with the unknown properties of invalid configurations. Because
of this uncertainty, blacklist tests are discouraged. However, they are often required by poor
knowledge modeling practices. For example, by using combination constraints (see Section 2)
to define forbidden combinations instead of allowed ones. Equivalently to tests with sold
configurations, the reasoning engine can be used to generate a set of non-manufacturable
configurations. To do this, 𝑈 is randomly generated repeatedly and ¬valid (𝑈 ) is checked
in each case. However, the set of invalid configurations typically exceeds the set of valid
configurations by several orders of magnitude. This severely limits the quality assurance of
these blacklist tests. Much better suited are universal tests with general statements, where
𝑈 = ∅. As an example, consider again the price. A universal statement would be, for example,
that the price must always lie in an interval that is statically calculated from the minimum and
maximum prices of the components.
require 12,517.32 <= sum(PriceInterior,
                         PriceExterior,
                         Surcharges) - PriceReductions
                  <= 95,569.14
This computation could also be done dynamically, which would require additional operations
in the DSL, e. g., to access the minimum and maximum of the feature domains.
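
As a sketch of such a dynamic computation, assuming the price components are features with finite numeric domains in the knowledge-base model from Section 2, the interval bounds could be derived as follows; all feature names are hypothetical.

PRICE_PARTS = ("PriceInterior", "PriceExterior", "Surcharges")

def static_price_bounds(kb):
    lo = sum(min(kb.domains[f]) for f in PRICE_PARTS)
    hi = sum(max(kb.domains[f]) for f in PRICE_PARTS)
    # a price reduction lowers the total, so it flips minimum and maximum
    return (lo - max(kb.domains["PriceReductions"]),
            hi - min(kb.domains["PriceReductions"]))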
   Another problem could involve the weight of a product. For example, in the European Union,
a passenger car may not exceed the weight of 3.5 tons. Therefore, to prevent configurations
from exceeding this weight, a constraint exists that limits the sum of the weights of all parts
accordingly. The weight must always be rounded up to the next kilogram so that the limit is
not exceeded unnoticed due to rounding. A corresponding test is implemented:
require WeightInKG <= 3500


6. Conclusion
Automated testing for quality assurance in all areas of computer science has a long history. This
ranges from validation and verification in knowledge engineering [17, 18, 19] to a complete
portfolio of testing techniques such as unit, integration, system, and acceptance tests in software
engineering [8]. The development of tests often requires a similar amount of work as the
implementation of the functionality itself. Nevertheless, in the long run this extra effort leads
to a reduction of the total work due to the optimization of maintenance, refactoring, and
extensibility. The consequences of an ordered but faulty configuration that has to be recalled
from production alone justify this effort. Our work has thus presented a novel data structure
for domain specific testing in product configuration. It is based on the declarative development
of test constraints akin to the development of product knowledge itself. In addition, reasoning
modes were introduced to avoid the combinatorial problems associated with simple test cases
of expected inputs and outputs and the exponential set of configurations. The options for
formulating the test cases must be tailored to the respective developers and their expertise.
Our work has exemplified this by presenting a domain-specific language as a compromise
between technically adept and technically averse developers. At the same time, the use of text-based
development opens up many opportunities to adapt established software engineering practices.
   Nevertheless, several challenges remain. One of them is the evaluation of quality assurance
itself using coverage metrics of the tested knowledge. In contrast to software, there is no
control flow due to the declarativity of our approach, which means that existing metrics such as
path or branch coverage are not applicable [20, 21]. As in much other work [21, 22], more
research is needed on how to draw conclusions from our test cases about their quality
and the extent of knowledge tested. Another challenge is that the presented framework for
testing is itself error-prone, as the formulation of the test constraints is non-trivial. Various
papers tackle the task of verifying test suites themselves for this purpose [23, 24, 25]. Finally,
the performance of test case execution is a challenge that suffers from the NP complexity of
configuration problems. For this, a system to automatically decide which test cases have to be
executed depending on the changes made to the knowledge base is conceivable. This is to avoid
executing test cases that succeed regardless of occurring regressions and are therefore irrelevant.
The execution of only relevant test cases then leads to an improvement of the intended frequent
and continuous integration of changes. Ultimately, regression testing represents only one of
many tools to ensure knowledge quality such as those used in software engineering. We strive
for a portfolio of all these methods in the future and are, at the time of writing, in the process
of evaluating the presented methods with industrial partners.
References
 [1] A. Felfernig, L. Hotz, C. Bagley, J. Tiihonen, Knowledge-based configuration: From research
     to business cases, Morgan Kaufmann, Oxford, England, 2014.
 [2] A. Falkner, H. Schreiner, Siemens: Configuration and reconfiguration in industry,
     Knowledge-Based Configuration: From Research to Business Cases (2014) 199–210.
     doi:10.1016/B978-0-12-415817-7.00016-5.
 [3] K. Orsvärna, M. H. Bennick, Tacton: Use of Tacton Configurator at FLSmidth, in: Knowledge-
     Based Configuration, Morgan Kaufmann, 2014.
 [4] I. Nica, F. Wotawa, R. Ochenbauer, C. Schober, H. F. Hofbauer, S. Boltek, Kapsch: Re-
     configuration of mobile phone networks, in: Knowledge-Based Configuration, Morgan
     Kaufmann, 2014.
 [5] R. Rabiser, M. Vierhauser, M. Lehofer, P. Grünbacher, T. Männistö, Configuring and
     generating technical documents, in: Knowledge-Based Configuration, Morgan Kaufmann,
     2014.
 [6] J. Tiihonen, W. Mayer, M. Stumptner, M. Heiskala, Configuring services and processes, in:
     Knowledge-Based Configuration, Morgan Kaufmann, 2014.
 [7] K. Herud, J. Baumeister, O. Sabuncu, T. Schaub, Conflict handling in product configuration
     using answer set programming, FLoC 2022 ICLP Workshops (2022).
 [8] M. Pezzè, M. Young, Software testing and analysis - process, principles and techniques,
     Wiley, 2007.
 [9] J. Tiihonen, M. Heiskala, A. Anderson, T. Soininen, Wecotin - A practical logic-based sales
     configurator, AI Commun. 26 (2013) 99–131. URL: https://doi.org/10.3233/AIC-2012-0547.
     doi:10.3233/AIC-2012-0547.
[10] A. A. Falkner, G. Friedrich, A. Haselböck, G. Schenner, H. Schreiner, Twenty-five years
     of successful application of constraint technologies at siemens, AI Mag. 37 (2016) 67–80.
     URL: https://doi.org/10.1609/aimag.v37i4.2688. doi:10.1609/aimag.v37i4.2688.
[11] R. Dechter, Constraint processing, Elsevier Morgan Kaufmann, 2003. URL: http://www.
     elsevier.com/wps/find/bookdescription.agents/678024/description.
[12] S. Jabbour, J. Lonlac, L. Sais, Y. Salhi, Extending modern SAT solvers for models enu-
     meration, in: J. Joshi, E. Bertino, B. Thuraisingham, L. Liu (Eds.), Proceedings of the
     15th IEEE International Conference on Information Reuse and Integration, IRI 2014, Red-
     wood City, CA, USA, August 13-15, 2014, IEEE Computer Society, 2014, pp. 803–810. URL:
     https://doi.org/10.1109/IRI.2014.7051971. doi:10.1109/IRI.2014.7051971.
[13] U. Junker, Configuration, in: F. Rossi, P. van Beek, T. Walsh (Eds.), Handbook of
     Constraint Programming, volume 2 of Foundations of Artificial Intelligence, Elsevier,
     2006, pp. 837–873. URL: https://doi.org/10.1016/S1574-6526(06)80028-3. doi:10.1016/
     S1574-6526(06)80028-3.
[14] J. Baumeister, J. Reutelshoefer, Developing knowledge systems with continuous integration,
     in: S. N. Lindstaedt, M. Granitzer (Eds.), I-KNOW 2011, 11th International Conference on
     Knowledge Management and Knowledge Technologies, Graz, Austria, September 7-9, 2011,
     ACM, 2011, p. 33. URL: https://doi.org/10.1145/2024288.2024328. doi:10.1145/2024288.
     2024328.
[15] K. Beck, Test Driven Development. By Example, Addison-Wesley Longman, Amsterdam,
     2002.
[16] J. Baumeister, J. Reutelshoefer, F. Puppe, Knowwe: a semantic wiki for knowledge engi-
     neering, Appl. Intell. 35 (2011) 323–344. URL: https://doi.org/10.1007/s10489-010-0224-5.
     doi:10.1007/s10489-010-0224-5.
[17] R. Knauf, A. Gonzalez, K. Jantke, Validating rule-based systems: a complete methodology,
     in: IEEE SMC’99 Conference Proceedings. 1999 IEEE International Conference on Systems,
     Man, and Cybernetics (Cat. No.99CH37028), volume 5, 1999, pp. 744–749 vol.5. doi:10.
     1109/ICSMC.1999.815644.
[18] J. Baumeister, Continuous Knowledge Engineering with Semantic Wikis, habilitation,
     Universität Würzburg, 2010.
[19] J. Baumeister, Advanced empirical testing, Knowl. Based Syst. 24 (2011) 83–94. URL:
     https://doi.org/10.1016/j.knosys.2010.07.008. doi:10.1016/j.knosys.2010.07.008.
[20] H. Zhu, P. A. V. Hall, J. H. R. May, Software unit test coverage and adequacy, ACM
     Comput. Surv. 29 (1997) 366–427. URL: https://doi.org/10.1145/267580.267590. doi:10.
     1145/267580.267590.
[21] F. Belli, O. Jack, Declarative paradigm of test coverage, Softw. Test. Verification
     Reliab. 8 (1998) 15–47. doi:10.1002/(SICI)1099-1689(199803)8:1<15::AID-STVR146>3.0.CO;2-D.
[22] F. Belli, O. Jack, A test coverage notion for logic programming, in: Sixth International
     Symposium on Software Reliability Engineering, ISSRE 1995, Toulouse, France, October
     24-27, 1995, IEEE Computer Society, 1995, pp. 133–142. URL: https://doi.org/10.1109/ISSRE.
     1995.497651. doi:10.1109/ISSRE.1995.497651.
[23] S. Boroday, A. Petrenko, A. Ulrich, Test suite consistency verification, in: 2008 East-
     West Design & Test Symposium, EWDTS 2008, Lviv, Ukraine, October 9-12, 2008, IEEE
     Computer Society, 2008, pp. 235–239. URL: https://doi.org/10.1109/EWDTS.2008.5580145.
     doi:10.1109/EWDTS.2008.5580145.
[24] P. H. Deussen, S. Tobies, Formal test purposes and the validity of test cases, in: D. A. Peled,
     M. Y. Vardi (Eds.), Formal Techniques for Networked and Distributed Systems - FORTE 2002,
     22nd IFIP WG 6.1 International Conference Houston, Texas, USA, November 11-14, 2002,
     Proceedings, volume 2529 of Lecture Notes in Computer Science, Springer, 2002, pp. 114–129.
     URL: https://doi.org/10.1007/3-540-36135-9_8. doi:10.1007/3-540-36135-9\_8.
[25] C. Jard, T. Jéron, P. Morel, Verification of test suites, in: H. Ural, R. L. Probert,
     G. von Bochmann (Eds.), Testing of Communicating Systems: Tools and Techniques,
     IFIP TC6/WG6.1 13th International Conference on Testing Communicating Systems (Test-
     Com 2000), August 29 - September 1, 2000, Ottawa, Canada, volume 176 of IFIP Conference
     Proceedings, Kluwer, 2000, pp. 3–18.