=Paper=
{{Paper
|id=Vol-1453/06_ReinfrankNinausWotawaFelfernig_IntelligentSupportingTechniques_Confws-15_p31
|storemode=property
|title=Intelligent supporting techniques for the maintenance of constraint-based configuration systems
|pdfUrl=https://ceur-ws.org/Vol-1453/06_ReinfrankNinausWotawaFelfernig_IntelligentSupportingTechniques_Confws-15_p31.pdf
|volume=Vol-1453
|dblpUrl=https://dblp.org/rec/conf/confws/ReinfrankNWF15
}}
==Intelligent supporting techniques for the maintenance of constraint-based configuration systems==
<pdf width="1500px">https://ceur-ws.org/Vol-1453/06_ReinfrankNinausWotawaFelfernig_IntelligentSupportingTechniques_Confws-15_p31.pdf</pdf>
<pre>
Intelligent Supporting Techniques for the Maintenance of
         Constraint-based Configuration Systems12
                            Florian Reinfrank, Gerald Ninaus, Franz Wotawa, Alexander Felfernig
                                              Institute for Software Technology
                                                Graz University of Technology
                                           Inffeldgasse 16b/II, 8010 Graz, Austria
                                             {firstname.lastname}@ist.tugraz.at


Abstract. Constraint-based systems like knowledge-based recom-                     The knowledge engineer has to extend the current knowledge base
mendation and configuration are well established technologies in                   with new product attributes (e.g., introducing a new product feature
many different product areas like cars, computers, notebooks, and                  eBike) and questions (e.g., ’Do you want an electric engine assis-
financial services. Such systems reduce the number of valid products               tance?’). In complex constraint-based configuration system updates
and configurations regarding the customers’ preferences. The rela-                 are time consuming and error prone because unexperienced knowl-
tionship between product variables and customer questions is rep-                  edge engineers have to adapt the knowledge base and unexpected
resented by constraints. Nowadays, constraint-based configuration                  dependencies between constraints exist.
systems represent volatile product assortments: Variables must be                     In this paper we show how we can support knowledge engineers
adapted, new product features lead to new questions for the cus-                   when they maintain a constraint-based configuration system. The
tomer, and / or the constraints must be updated. We call such sce-                 approaches can be used in many scenarios like knowledge-based
narios maintenance tasks.                                                          recommendation, knowledge-based configuration, or feature models.
   In complex constraint-based configuration systems the mainte-                   Due to complex restrictions to adapt the product assortment regard-
nance task is time consuming and error prone. Previous research fo-                ing the customers’ preferences, intelligent techniques can be used if
cused on the detection of conflicts, repair actions for the conflicts,             the preferences can not be fulfilled.
and redundant constraints. In this paper we give an overview about                    This paper is organized as follows. Section 2 gives an overview
these techniques and present new approaches like recommendation,                   about constraint-based configuration systems. It introduces a running
well-formedness violation, simulation, and knowledge base verifica-                example and relevant definitions for this paper. Our new approaches
tion for the support of knowledge engineers.                                       to support knowledge engineers in maintaining constraint-based con-
                                                                                   figuration systems are described in Section 3. A summary in Section
                                                                                   4 concludes this paper.
1     Introduction
In e-Commerce applications constraint-based configuration systems                  2   Related Work
are used to show which combinations of product variable assign-
ments can be combined and offered to potential customers. Due to                   In this Section we give an overview about constraint-based configu-
complex restrictions to adapt the product assortment regarding the                 ration systems, introduce a running example for this paper and define
customers’ preferences, intelligent techniques can be used if the cus-             relevant terms which are necessary to explain the intelligent support-
tomers’ preferences can not be fulfilled.                                          ing techniques from Section 3.
   A knowledge engineer develops and maintains such knowledge                         For our constraint-based configuration system we use the con-
bases. Based on knowledge (for example, knowledge of bikes) from                   straint satisfaction problem (CSP) modeling technique [14]. A CSP
domain experts, the engineer defines product variables and variable                is a triple (V, D, C) and consists of a set of variables V and a set
domains (e.g., the domain of the product variable BikeT ype’ is                    of domains D where each domain dom(vi ) represents all valid as-
M ountainBike, CityBike, and RacerBike), prepares additional                       signments for a variable vi , e.g., dom(vi ) = {val1 , . . . , valn }. The
questions presented to potential customers (e.g., ‘What is the main                set C contains all constraints which restrict the number of valid in-
usage of the bike?’), and develops relationships (constraints) be-                 stances of a constraint-based configuration system. Basically, a con-
tween questions and products (e.g., if the main usage of the bike is               straint consists of a set of assignments a for variables and relations
everyday lif e then the bike should be of the type CityBike).                      between them. The set A(ci ) is the set of assignments a constraint ci
   Such knowledge bases must be updated over time. For exam-                       has. If a constraint contains only one assignment, we denote such
ple, in the last years bikes with an electric engine became popular.               constraints unary constraint or assignment [13]. Furthermore, the
                                                                                   constraints can be divided into two different types. First, the set CKB
1 We thank the anonymous reviewers for their helpful comments.
2 The work presented in this paper has been conducted within the scope
                                                                                   contains all constraints which describe the domain. For example, it is
    of the research project ICONE (Intelligent Assistance for Configuration
                                                                                   not allowed to use mountain bike tires (T ireW idth > 50mm) for
    Knowledge Base Development and Maintenance) funded by the Austrian             racing bikes (BikeT ype = RacerBike), s.t. c = ¬(BikeT ype =
    Research Promotion Agency (827587).                                            RacerBike ∧ T ireW idth > 50mm). Second, the committed cus-


                                                                              31                Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                             Proceedings of the 17th International Configuration Workshop
                                                                                                                  September 10-11, 2015, Vienna, Austria
 tomers’ preferences are represented as constraints in the set CR .
   The following example is denoted as a CSP and shows a bike
 knowledge base. It contains variables which represent product
 variables as well as customer requirements. The set CR contains an
 example for customer requirements.

 V = {BikeT ype, F rameSize, eBike, T ireW idth, U niCycle,
         U sage}
 D={
     dom(BikeT ype) = {M ountainBike, CityBike,
                             RacerBike},
     dom(F rameSize) = {40cm, 50cm, 60cm},
     dom(eBike) = {true, f alse},
     dom(T ireW idth) = {23mm, 37mm, 57mm},
     dom(U niCycle) = {true, f alse},
                                                                                                  Figure 1. Different types of anomalies.
     dom(U sage) = {Competition, EverydayLif e,
                         HillClimbing}
                                                                                  Definition 3 ’Consistent Instance’: An instance (complete or in-
 }
                                                                                  complete) is consistent, if no constraint in C is violated.
 CKB = {
     c0 := BikeT ype = M ountainBike → T ireW idth >                              In constraint-based configuration systems it can happen, that the sys-
             37mm ∧ F rameSize ≥ 50cm;                                            tem can not offer consistent instances to a user (anomaly) because it
     c1 := BikeT ype = RacerBike → T ireW idth =                                  is not possible to satisfy all constraints (see Definition 3). Such a ’no
             23mm ∧ F rameSize = 60cm;                                            solution could be found’ dilemma is caused by at least one conflict
     c2 := BikeT ype = CityBike → T ireW idth =                                   between a) the constraints in the knowledge base CKB and the cus-
             37mm ∧ F rameSize ≥ 50cm;                                            tomer requirements CR or b) within the set CKB itself. Definition 4
     c3 := ¬(BikeT ype 6= CityBike ∧ eBike = true);                               introduces a formal representation of a conflict.
     c4 := U sage = EverydayLif e → BikeT ype =
             CityBike;                                                            Definition 4 ’Conflict’: A conflict is a set of constraints CS ⊆
     c5 := U sage = HillClimbing → BikeT ype =                                    {CKB ∪ CR } which can not be fulfilled by the CSP, s.t. CS is incon-
             M ountainBike;                                                       sistent.
     c6 := U sage = Competition → BikeT ype =
             RacerBike ∧ F rameSize = 60cm;                                       If we have an inconsistency in our knowledge base, we can say that
     c7 := eBike = true → T ireW idth = 37mm;                                     CKB ∪ CR is always a conflict set. To have a more detailed informa-
     c8 := U niCycle = f alse;                                                    tion about the inconsistency, we introduce the term ’minimal conflict’
 }                                                                                which is described in Definition 5.
 CR = {
              c9 : F rameSize = 50cm;                                             Definition 5 ’Minimal Conflict’: A minimal conflict CS is a conflict
              c10 : U sage = Competition;                                         (see Definition 4) and the set CS only contains constraints which are
              c11 : eBike = true;                                                 responsible for the conflict, s.t. @c∈CS CS \ {c} is inconsistent.
 }
 C = CKB ∪ CR                                                                     When we focus on the set CR and say, that CKB is consistent, our ex-
 The example contains some anomalies in terms of conflicts, redun-                ample contains two minimal conflict sets. CS1 = {c9 , c10 } because
 dancies and well-formedness violations. Figure 1 gives an overview               it is not possible to have a bike for competition with a frame size of
 of different types of anomalies. In the following, we list definitions           50cm and CS2 = {c10 , c11 } because bikes used for competition
 to define the anomalies.                                                         do not support eBikes. The example shows that a knowledge base
 The constraint set C restricts the set of valid instances. While CKB             can have more than one conflict. In such cases we can help users to
 remains stable during a user session she can add her preferences in              resolve the conflicts with diagnosis. A diagnosis ∆ is a set of con-
 the set CR . An instance is given, if at least one customer preference           straints. The removal of the set ∆ from CR leads to a consistent
 is added to CR . Definition 1 introduces the term ’instance’.                    knowledge base, formally described in Definition 6.

 Definition 1 ’Instance’: An instance is given if at least one con-               Definition 6 ’Diagnosis’: A diagnosis ∆ is a set of constraints ∆ ⊆
 straint in the set CR , s.t. CR 6= ∅.                                            CR ∪CKB . When removing the set ∆ from CR ∪CKB , the knowledge
                                                                                  base will be consistent, s.t. CR ∪ CKB \ ∆ is consistent.
 In a complete instance all variables in the knowledge base have at
 least one assignment. Definition 2 introduces the definition for a               Assuming that CKB is consistent (see Definition 3), we can say that
 complete instance.                                                               the knowledge base always will be consistent if we remove CR . In
 Definition 2 ’Complete Instance’: An instance is complete iff all                Definition 7 we introduce the term ’minimal diagnosis’ which helps
 product variables have an assignment, such that ∀v∈V v 6= ∅.                     to reduce the number of constraints within a diagnosis.

 Instances can either fulfill all constraints in a constraint set C (con-         Definition 7 ’Minimal Diagnosis’: A minimal diagnosis ∆ is a di-
 sistent) or not (inconsistent). Definition 3 defines the term ’consistent        agnosis (see Definition 6) and there doesn’t exist a subset ∆0 ⊂ ∆
 instance’.                                                                       which has the same property of being a diagnosis.


Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                     32
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
The example configuration knowledge base contains two minimal                   In our example the variable pair U sage and BikeT ype is unnec-
diagnoses. The removal of the set ∆1 = {c9 , c11 } or ∆2 = {c10 }               essary refined because whenever U sage = EverydayLif e the
leads to a consistent configuration knowledge base. After having cal-           BikeT ype = CityBike, U sage = HillClimbing always leads
culated diagnoses and removed the constraints which are in one diag-            to BikeT ype = M ountainBike, and U sage = Competition is
nosis, we can ensure a consistent knowledge base which is necessary             always combined with the assignment BikeT ype = RacerBike.
for calculating redundancies and well-formedness violations.                    If such a violation occurs, we can recommend the knowledge engi-
A redundancy is a set of redundant constraints in the knowledge                 neer to remove the variable U sage and replace it with the variable
base. A constraint c is redundant, if a knowledge base KB 0 with-               BikeT ype in the constraints.
out the constraint c has the same semantics3 as the knowledge base
KB which contains the constraint. Redundant constraints are for-
mally described in Definition 8.                                                3     Intelligent Support for the Maintenance of
                                                                                      Constraint-based configuration systems
Definition 8 ’Redundant constraint’: A constraint c is redundant iff
                                                                                In this Section we describe existing (conflict and redundancy man-
the removal of the constraint from CKB leads to the same semantics,
                                                                                agement) and new (recommendation, well-formedness, simulation,
s.t. CKB \ {c} |= c.
                                                                                metrics) intelligent techniques to support knowledge engineers in
In our example, the constraint c7 is redundant, since only CityBikes            their maintenance tasks.
can be eBikes (c3 ) and have tires with a width of 37mm (c2 ).
While conflicts, diagnoses, and redundancies focus on constraints,
                                                                                3.1    Intelligent Recommendation
well-formedness violations identify anomalies based on variables
and domain elements [2]. We now introduce well-formedness vio-                  Constraint-based knowledge bases can have hundreds or thousands
lations in constraint-based configuration systems.                              of variables, domain elements, and constraints. If there is a main-
The first well-formedness violation focuses on dead domain ele-                 tenance task (e.g., inserting new tire sizes), recommendation tech-
ments. A dead domain element is an element which can never be                   niques help to differentiate between relevant and not relevant infor-
assigned to its variable in a consistent instance (see Definition 3).           mation within the knowledge base. For example, the tires of a bike
Definition 9 introduces a formal description of dead elements.                  probably have an influence on the frame of a bike but does not influ-
                                                                                ence the bell of a bike. In such cases, recommendation techniques de-
Definition 9 ’Dead domain elements’: A domain element val ∈                     tect items (variables, domain elements, constraints, test cases) which
dom(v) is dead iff it is never in a consistent instance, s.t. CKB ∪             are influenced by the tires and the knowledge engineer can focus on
{v = val; } is inconsistent.                                                    these items. We describe four different types of recommendation to
                                                                                support knowledge engineers in their maintenance tasks [4].
The assignments F rimeSize = 40cm and U niCycle = true;
                                                                                The first recommendation approach is the most viewed recommenda-
can never be part of a consistent instance because M ountainBikes
                                                                                tion which is user-independent. It can be useful for new engineers of
and CityBikes require at least 50cm and RacerBikes require a
                                                                                a product domain.
F rameSize of 60cm and our current knowledge base does not sup-
                                                                                Second, recently added lists new items (products, product vari-
port U niCycles.
                                                                                ables, questions, and constraints) in the knowledge base. It is user-
On the other hand, we can have domain elements which are assigned
                                                                                dependent since it considers the last log in of the knowledge engineer
to each consistent instance. We denote such domain elements full
                                                                                and helps to get a fast understanding of the previous changes in the
mandatory and introduce definition 10.
                                                                                knowledge base.
Definition 10 ’Full mandatory’: A domain element val1 ∈                         The next type of recommendation is collaborative filtering. This type
dom(vi ) is full mandatory iff there is no consistent (complete or in-          of recommendation takes the ratings for items into account and looks
complete) instance where the variable vi does not have the assign-              for knowledge engineers with similar ratings. In our case, we don’t
ment val1 , s.t. CKB ∪ {vi 6= val1 } is inconsistent.                           have ratings but use the interaction with items as ratings. If a knowl-
                                                                                edge engineer looks at products, she ’rates’ the item with 1. 2 will
The knowledge base can never be consistent if U niCycle 6= f alse.              be added by the knowledge engineer if she is editing an item. Table
In that case, we can say that the domain element f alse of the do-              1 shows an example for a collaborative filtering recommendation for
main dom(U niCycle) is full mandatory and U niCycle = true                      knowledge engineer u0 based on our example in Section 2.
can never be in a consistent knowledge base (dead domain element).
Another well-formedness violation is called unnecessary refinement.                            c0     c1    c2    c3     c4    c5    c6    c7     c8
Such an unnecessary refinement consists of two variables. If the first                   u0           1           1      2     1            ?
variable has an assignment, it is possible to predict the assignment                     u1           1      1           1     1           1
                                                                                         u2           1      2     1                  2    2
of the second variable because the second variable can only have                         u3     1            1                        2
exactly one consistent assignment. A formal definition is given in
Definition 11.                                                                  Table 1. An example for collaborative filtering. 1 means that the item ci is
                                                                                viewed by the user uj , 2 means that the item is edited and ’ ’ means that the
Definition 11 ’Unnecessary refinement’: A knowledge base con-                   item is neither viewed nor edited by the user.
tains a variable pair vi , vj . For each domain element val1 of vari-
able vi , we can say that variable vj always has the same assignment            In table 1 we try to find out if we should recommend item c7 to
vj = val2 , s.t. ∀val1 ∈dom(vi ) ∃val2 ∈dom(vj ) vi = val1 ∧ vj 6= val2         knowledge engineer u0 . The common process to find recommend-
is inconsistent.                                                                able items is twofold. First, we try to find knowledge engineers with
3 We use the term ’semantics’ to describe a knowledge base KB 0 with the        similar interests. In our example, u1 and u2 have nearly the same
  same solution set as KB.                                                      items viewed or edited. Second, we have to evaluate if the similar


                                                                           33                Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                          Proceedings of the 17th International Configuration Workshop
                                                                                                               September 10-11, 2015, Vienna, Austria
 knowledge engineers are interested in the item. Therefore, we use the               Algorithm 2 QuickXPlain’ (CKB , ∆, CR ):∆
 Pearson correlation [3, 6]. In our example, u1 , and u2 have viewed /                         . CKB : Set of constraints which can’t be part of a conflict
 edited item c7 and we can recommend c7 to knowledge engineer c0 .                                 . ∆: Set of constraints which can be part of a conflict
 Another recommendation approach is the usage of content-based fil-                                       . CR : Set of constraints which will be evaluated
 tering. The basic idea is to find similar items compared to a reference               if ∆ 6= ∅ and inconsistent(CKB ) then
 item. We take the variable names and domain values from a con-                            return ∅;
 straint and evaluate the similarities between the reference item and                  end if
 all other items. The similarities are measured by the TF-IDF (term                    if singleton(CR ) then
 frequency and inverse document frequency) algorithm [6, 8] where                          return CR ;
 the item is the document and the terms are the variables and domain                   end if
 elements. Table 2 shows the similarity values between constraint c7                   k ← d r2 e;
 as reference constraint with the other constraints.                                   C1 ← {c1 , ..., ck } ∈ CR ;
                           constraint   similarity                                     C2 ← {ck+1 , ..., cr } ∈ CR ;
                              c0          0.50                                         ∆1 ← QuickXP lain0 (CKB ∪ C1 , C1 , C2 );
                              c1          0.17                                         ∆2 ← QuickXP lain0 (CKB ∪ ∆1 , ∆1 , C1 );
                              c2          0.50                                         return(∆1 ∪ ∆2 );
                              c3          0.00
                              c4          0.00
                              c5          0.00                                       Contrary to QuickXPlain, FastDiag is an algorithm to calculate a
                              c6          0.33                                       minimal diagnosis and has C and CR as input. C contains all con-
                              c8          0.00                                       straints, s.t. C = CKB ∪ CR . If C is an empty set, FastDiag has
 Table 2. Similarities between constraint c7 with the other constraints based        no diagnosable set and the algorithm stops. It also stops if the set
 on content based recommendation                                                     C \ CR is inconsistent, because this set contains inconsistencies, but
                                                                                     will not be diagnosed. If both preconditions are fulfilled, Algorithm
 With this approach, we can say, that there is a high relationship be-               4 will calculate one diagnosis.
 tween constraint c7 with the constraints c0 and c2 and a weak rela-
 tionship with the constraints c6 and c1 .                                           Algorithm 3 FAST D IAG(CR , C):∆
                                                                                                       . CR : Set of constraints which will be diagnosed
                                                                                              . C: inconsistent knowledge base including all constraints
 3.2    Intelligent Anomaly Management                                                 if C = ∅ ∨ inconsistent(CKB − C) then
 As mentioned in Section 2, there are many anomalies in our example                        return ∅;
 knowledge base. In the following, we describe algorithms for detect-                  else
 ing conflicts, diagnoses, redundancies, and well-formedness viola-                        return DIAG(∅, CR , C)
 tions as well as explanations for those anomalies. These algorithms                   end if
 reduce the time to detect the anomalies and explain the anomaly to
 get a higher understanding of the knowledge base.                                   First of all, DIAG checks whether C is consistent. If it is consistent,
 Junker [7] described a divide-and-conquer approach to detect con-                   each subset of C is also consistent and no constraint in C can be a
 flicts in knowledge bases. The algorithm takes two sets of constraints              part of the diagnosis. Otherwise, CR will be divided into two sub-
 as input. CKB is a set of constraints which can not be part of a di-                sets C1 and C2 . Each subset will be removed from C separately and
 agnosis. The constraints in the set CR will be taken into account for               checked again in a recursive manner. If C \ C1 is consistent, we can
 calculating a diagnosis. If the set CR is not empty and CR ∪ CKB                    say that C2 is consistent and an empty set will be returned. If it is in-
 is not consistent, the algorithm returns a set of constraints which is a            consistent, at least one constraint in C1 must be part of the diagnosis
 minimal conflict (see Definition 5).                                                and therefore C1 will be divided and tested again unless |C| = 1.
                                                                                     The algorithm returns ∆1 ∪ ∆2 which is a minimal diagnosis.
 Algorithm 1 QuickXPlain (CKB , CR ):∆                                               With the previous algorithms we can a.) support customers when
                        . CKB : set of not diagnosable constraints                   they do not get any products for their preferences (CKB ∪ CR is
                               . CR : set of diagnosed constraints                   inconsistent) and b.) support knowledge engineers when they main-
   if isEmpty(CR ) or consistent(CKB ∪ CR ) then                                     tain a constraint-based configuration system with conflicts in CKB .
        return ∅;                                                                    For a detailed description of the visualization of conflicts we refer
   else                                                                              the reader to [16].
        return QuickXP lain0 (CKB , ∆, CR );                                         When we can assume that CKB is consistent, we can continue with
   end if                                                                            redundancy and well-formedness checks. Please note that the follow-
                                                                                     ing algorithms are applied to the constraint set CKB and ignore CR ,
 The algorithm QuickXPlain’ takes three sets as input. While ∆ is                    s.t. C = CKB .
 initially empty, the set CKB contains the constraints which can not                 The first approach for detecting redundancies has been proposed by
 be part of a conflict and the constraints which are part of a conflict              Piette [9]. The approach is the following: a knowledge base aggre-
 are in the set CR . Note that the set CKB can also be empty. The                    gated with its negotiation must be inconsistent, formally described as
 algorithm is a recursive divide-and-conquer algorithm. It splits the                C ∪C is inconsistent and C = {¬c0 ∨¬c1 ∨...∨¬cn }. By removing
 set CR into two parts (C1 and C2 ) and adds the part C1 to CKB . C2                 a constraint ci separately from C, the algorithm checks whether the
 will be evaluated by doing a recursive call with C2 as the set which                result of C − {ci } ∪ C is still inconsistent. If this is the case, then the
 has to be evaluated.                                                                constraint ci is redundant and can be removed. Finally, the algorithm


Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                        34
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
Algorithm 4 DIAG(∆, CR , C):∆                                                 Algorithm 7 C ORE D(B, ∆, C): ∆
                                    . ∆: Set of diagnosed constraints                                                       . B: Consideration set
                    . CR : Set of constraints which will be diagnosed                                                 . ∆: Constraints added to B
                                                    . C: CR ∪ CKB                            . C: set of constraints to be checked for redundancy
  if ∆ 6= ∅ and consistent(C) then                                              if ∆ 6= ∅ and inconsistent(B) then
      return ∅;                                                                     return ∅;
  end if                                                                        end if
  if singleton(CR ) then                                                        if singleton(C) then
      return CR ;                                                                   return(C);
  end if                                                                        end if
  k ← d |C2R | e;                                                               k ← d |C|
                                                                                        2
                                                                                          e;
  C1 ← {c0 , ..., ck } ∈ CR ;                                                   C1 ← {c1 , c2 , ..., ck } ∈ C;
  C2 ← {ck+1 , ..., cn } ∈ CR ;                                                 C2 ← {ck+1 , ck+2 , ..., cn } ∈ C;
  ∆1 ← DIAG(C1 , C2 , C − C1 );                                                 ∆1 ← C ORE D(B ∪ C2 , C2 , C1 );
  ∆2 ← DIAG(∆1 , C1 , C − ∆1 );                                                 ∆2 ← C ORE D(B ∪ ∆1 , ∆1 , C2 );
  return(∆1 ∪ ∆2 );                                                             return(∆1 ∪ ∆2 );

returns the set C without redundant constraints.                              Hence the shifted constraint can not be part of an anomaly and further
                                                                              anomalies can be detected.
Algorithm 5 SEQUENTIAL(C): ∆
                                                                              Both algorithms (SEQUENTIAL and CoreDiag) can be used to de-
                                                . C: knowledge base           tect redundant constraints. As mentioned in Section 2 a constraint
                                          . C: the complement of C            consists of a set of variable assignments A(ci ). When we want to
                                    . ∆: set of redundant constraints         test if an assignment of a constraint is redundant, we have to remove
  Ct ← C;                                                                     the assignment from the constraint and check, if the knowledge base
  for all ci in Ct do                                                         is still redundant. For a detailed description of assignment-based re-
      if isInconsistent(Ct − ci ∪ C) then                                     dundancy detection we refer the reader to [11].
           Ct ← Ct − {ci };                                                   We also have to discuss the usefulness of redundant constraints. On
      end if                                                                  the one hand, desired redundancies can help to increase the under-
  end for                                                                     standing of a knowledge base. For example, if many implications
  ∆ ← C − Ct ;                                                                (e.g. A → B; B → C; C → D) are in the knowledge base, a con-
  return ∆;                                                                   straint A → D may helps to understand the knowledge base. On the
                                                                              other hand, redundant constraints can increase the effort for updates.
Another approach (CoreDiag) has been proposed by Felfernig et al.             If the redundant constraint are not identified by the knowledge engi-
[5]. Instead of a linear approach, they adapt the QuickXPlain al-             neer, the knowledge base does not have a correct behavior anymore.
gorithm. The divide-and-conquer approach of this algorithm checks             Next, we describe the algorithms to detect well-formedness viola-
whether removing a set of constraints C1 leads to an inconsistency            tions. First, Algorithm 8 takes sets of constraints (C) and variables
formally described as C − C1 ∪ C is inconsistent. If it is not incon-         (V ) as input parameters and returns a set of variable assignments.
sistent, C1 must be further divided and tested again.                         Each of the assignments can never be consistent with C. The sugges-
                                                                              tion for the knowledge engineer is, that the domain elements which
Algorithm 6 C ORE D IAG (CKB): ∆                                              will be returned by the algorithm can be deleted.
                                        . C : set with all constraints
                                          . C: the complement of C            Algorithm 8 DeadDomainElement (C, V ): ∆
                                    . ∆: set of redundant constraints                                            . C: knowledge base constraints
  C ← {¬c1 ∨ ¬c2 ∨ ... ∨ ¬cn };                                                                                    . V : knowledge base variables
  return(C − C ORE D(C, C, C));                                                                    . ∆ set with inconsistent variable assignments
                                                                                for all vi ∈ V do
CoreD (Algorithm 7) checks, if B ⊆ C is inconsistent. An inconsis-                  for all domj ∈ dom(vi ) do
tency of B ∪ C means that the subset is not redundant and no con-                       C 0 = C ∪ {vi = domj }
straint of B will be a part of ∆. singleton(C) = true means that                        if inconsistent(C 0 ) then
|C| is redundant and will be returned. Otherwise the constraint set C                       ∆ ← {vi = domj }
will be further divided and the subsets will be checked recursively.                    end if
With the presented approaches we can calculate one conflict, diag-                  end for
nosis, or constraint set without redundancies. In complex knowledge             end forreturn ∆
bases we can assume that many anomalies are in the knowledge base.
For calculating all conflicts / diagnoses / redundant constraint sets,        While we can evaluate if a domain element can never be in a con-
we use Reiter’s HSDAG approach [12]. This approach takes the re-              sistent instance, we can also check if a domain element must be in
sult of one of the algorithms above and expands branches for each             a consistent instance of a knowledge base. We denote such domain
constraint in the result set. The constraint will be inserted into the        elements as full mandatory. Algorithm 9 checks whether the knowl-
set which can not be part of the result, e.g. a constraint ci will be         edge base will be inconsistent, if the domain element domj is not
removed from CR and added to CKB in the QuickXPlain algorithm.                selected.


                                                                         35               Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                       Proceedings of the 17th International Configuration Workshop
                                                                                                            September 10-11, 2015, Vienna, Austria
 Algorithm 9 FullMandatory (C, V ): ∆                                          each type of anomaly. We take the set of constraints (e.g. set of dead
                                    . C: knowledge base constraints            elements) and add this set to CKB in the algorithm. Now we have
                                      . V : knowledge base variables           ensured, that the constraint which describes the anomaly, can’t be
                      . ∆ set with inconsistent variable assignments           part of ∆ in the algorithm. Next, we have to negate the constraint set
   for all vi ∈ V do                                                           which describes the anomaly. Since the negation of the anomaly can
       for all domj ∈ dom(vi ) do                                              never be consistent, QuickXPlain will return the set of constraints
           C 0 = C ∪ {vi 6= domj }                                             which is responsible for the anomaly. For example, the dead domain
           if inconsistent(C 0 ) then                                          element U niCycle = true will be negated and added to C. In that
               ∆ ← {vi 6= domj }                                               case, QuickXPlain will return the set {c8 } as an explanation for the
           end if                                                              dead domain element.
       end for
   end forreturn ∆                                                             3.3    Simulation
                                                                               Due to the huge complexity of calculating all possible instances for
 If variable vi contains a full mandatory domain element, we can say,          all possible assignments (see Section 2) in constraint-based configu-
 that each other domain element of vi is a dead element. If a do-              ration systems we use Gibbs’ simulation to estimate the consistency
 main element is full mandatory, we suggest the knowledge engineer             rate cr for a specific set of assignments A [11]. With this approxima-
 to delete all other domain elements or to remove the variable itself.         tion, we can . . .
 Finally, we introduce an algorithm to detect unnecessary refinements
 between variables (see Definition 11). Algorithm 10 returns a set of            . . . estimate the restriction rate (number of consistent instances
 constraints. Each of these constraints describe one unnecessary re-             compared to all instances) and evaluate the knowledge base (see
 finement between two variables and each domain element between                  Section 3.4).
 both variables. The assignments between the variables are conjunc-              . . . generate test cases for boundary value analysis [11].
 tive and each domain element of variable vi is in a disjunctive order,          . . . rank diagnoses and conflicts (assuming that a knowledge base
 e.g. (vi = vali1 ∧ vj = valj1 ) ∨ (vi = vali2 ∧ vj = valj2 ) ∨ (vi =            with nearly the same restriction rate compared to the current
 vali3 ∧ vj = valj3 ).                                                           knowledge base is preferred).
                                                                                 . . . generate reports for variety management (e.g. ’How many
 Algorithm 10 UnnecessaryRefinement (C, V ): ∆                                   bikes can be used for Competition, EverydayLif e, and
                                     . C: knowledge base constraints             HillClimbing?’).
                                       . V : knowledge base variables          An assignment is a constraint which contains one variable av , one
                                              . ∆ set with constraints         domain element ad , and a relationship between variable and domain
   for all vi ∈ V do                                                           element ar (see Section 2). Examples for assignments are eBike =
       for all vj ∈ V |vi 6= vj do                                             true; and BikeT ype = M ountainBike. Algorithm 11 is divided
           A = ∅;                              . set with assignments          into three functions and shows the basic algorithm for estimating the
           for all domk ∈ dom(vi ) do                                          consistency rate for a set of assignments.
               dompair = f alse;                                               The function Gibbs(KB, A) is the main function of this algorithm.
               C 0 ← C ∪ {vi = domk }                                          It has a knowledge base KB and a set of assignments A as input.
               for all doml ∈ dom(vj ) do                                      The knowledge base contains sets of variables V ∈ KB and con-
                   C 00 ← C 0 ∪ {vj 6= doml }                                  straints C ∈ KB. The set CC (checks) contains all results from
                   if inconsistent(C 00 ) ∧ dompair = f alse then              consistency checks. A consistency check is either consistent (1) or
                        dompair = true;                                        inconsistent (0). The number of minimum calls is constant and given
                        A ← A ∪ {vi = domk ∧ vj = doml }                       in variable mincalls. The total number of consistent checks is given
                   end if                                                      in variable consistent. threshold is a constant and required to test,
               end for                                                         if the current set of consistency checks has a high accuracy. If the
           end for                                                             variable verif y is greater than the threshold, we can not guaran-
           if |A| = |dom(vi )| then                                            tee, that the current result is accurate. Therefore, we have to execute
               ∆ ← ∆ ∪ disjunctive(A)                                          the loop again. In the while-loop we first have to generate a set of
           end if                                                              new random assignments. Since assignments are also special types
       end for                                                                 of constraints, we add them to the set C ∈ KB and do a consistency
   end forreturn ∆                                                             check again. If randA ∪ C ∈ KB is consistent, we add 1 to the set
                                                                               CC and increment the variable consistent. Otherwise, we add 0 to
 The performance of this algorithm depends on the number of vari-              the set CC. Finally, we verify all previous consistency checks. If the
 ables, their domain size, the number of unnecessary refinements, and          variable verif y is lower than the variable threshold and we have
 the performance of the solver. In our short study with 14 knowledge           more consistency checks than mincalls, we can return the number
 bases (up to 34 variables and domain sizes from two to 47) the de-            of consistent checks divided by the total number of checks.
 tection of unnecessary refinements requires up to 375 ms (with 42             The function generateRandAssign(KB) is responsible for the
 unnecessary refinements) for the detection of all possible unneces-           generation of new assignments. Random(C) returns the number
 sary refinements (Intel Xeon @ 2.4Ghz * 6 cores, 24GB RAM).                   of assignments which has to be generated randomly. Random(V )
 To get a deep understanding of the anomalies we need to explain               takes a variable from the knowledge base. If the variable is al-
 them to the knowledge engineer [2]. For the calculation of an ex-             ready part of another assignment, the variable won’t be used again.
 planation we use the QuickXPlain algorithm (see Algorithm 2) for              Random(R) selects a relation between the variable and the domain


Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                  36
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
Algorithm 11 GibbsSampling                                                   3.4    Knowledge base Evaluation
  function G IBBS(KB, A): ∆                                                  As mentioned in the previous Sections, we can analyze a knowl-
     CC = ∅                . set of consistency check results {0, 1}         edge base in different ways and collect a lot of information about
     mincalls = 200                                        . constant        the knowledge base. Finally, we can also evaluate the knowledge
     threshold = 0.01                                      . constant        base in terms of metrics. Those metrics a) help to get information
     consistent = 0                                                          about the quality of the knowledge base and b) get information about
     verif y = Double.M ax V alue                                            the quality of previous changes. Next, we will describe some met-
     while n < mincalls ∨ verif y > threshold do                             rics, use them to answer five questions, and measure three goals for
         randA = A ∪ GENERATE R ANDA SSIGN(KB)                               constraint-based configuration systems (goal-question-metrics). The
         C.addAll(randA)                                  . C ∈ KB           metrics are based on a literature review focusing on knowledge engi-
         if isConsistent(KB) then                                            neering. An overview of the literature review is given in [10]. In the
              consistent + +                                                 following list we describe several metrics.
              CC.add(1)
         else                                                                • Number of variables |V | ∈ KB.P
              CC.add(0)                                                                                                           |dom(vi )|
                                                                             • Average domain size domsize: vi ∈V |V |
         end if
         C.removeAll(randA)                                                  • Number of constraints |CKB | ∈ KB
         verif y = VERIFY C HECKS(CC)                                        • Number of minimal conflicts |CS|: see Definition 3
         n++                                                                 • Minimal cardinality CS M CCS: the lowest number of constraints
     end while                                                                 in a conflict set
     return consistent/n                                                     • Number of minimal diagnoses |∆|: see Definition 5
  end function                                                               • Minimal cardinality diagnosis M C∆: the lowest number of con-
  function GENERATE R ANDA SSIGN(KB):A                                         straints in a diagnosis
     A=∅                                     . A: set of assignments         • Number of redundancy sets |R|
     n = Random(C) > 0:                    . generate n assignments          • Maximal cardinality redundancy set M CR: the largest number of
     for i = 0; i < n; i + + do                                                constraints in a redundancy set
         av = Random(V )                                  . V ∈ KB           • dead elements DE: number of dead elements compared to the
         ar = Random(R)                                                        total number of all domain elements
         ad = Random(dom(av ))                                                                                       (
                                                                                             P        P                0 C ∪ {vi = dj } =
                                                                                                                                        6 ∅
         A.add(a)                                                                               vi ∈V   dj ∈dom(vi )
     end for                                                                                                           1 else
                                                                                    DE =
     return A                                                                                                 |V | × domsize
  end function
  function VERIFY C HECKS(CC):∆                                              • full mandatory F M : number of full mandatory domain elements
     CC1 = CC.split(0, |CC|/2)                                                 compared to the total number of all domain elements
     CC2 = CC.split((|CC|/2) + 1, |CC|)                                                                            (
     mean1 = mean(CC1 )                                                                    P       P                 0 C ∪ {vi 6= dj } = ∅
                                                                                             vi ∈V    dj ∈dom(vi )
     mean2 = mean(CC2 )                                                                                              1 else
                                                                                    FM =
     if mean1 ≥ mean2 then                                                                                  |V | × domsize
         return mean1 − mean2
     else                                                                    • unnecessary refinement U R: whenever a variable vi has an as-
         return mean2 − mean1                                                  signment, we can predict the assignment of variable vj , s.t.
     end if                                                                    dom(vi ) → dom(vj )
                                                                                                    |C|
  end function                                                               • restriction rate RR: |V |
                                                                                                          P         #vars(ci )
                                                                                                             c ∈C
                                                                                                              #vars(C)
                                                                                                                                 |C|
                                                                             • restriction rate RR2 :    i
                                                                                                                |V |
                                                                                                                            where #vars(ci ) is the
                                                                               number of variables in ci .
                                                                             • variable influence factor V IF (vi ): number of constraints in
elements. In our case, variables can have textual domain elements              which a variable vi appears
                                                                                                              related to the number of constraints,
                                                                                                          1 vi ∈ ci
(e.g. the brand of a bike) or numeric domain elements (e.g. the price
                                                                                                          
                                                                                                  P
                                                                                                    ci ∈C 
of a bike). While the set of relations for textual domain elements is                                     0 else

R = {=, 6=}, the set is extended to R = {=, 6=, <, ≤, >, ≥} for                e.g., V IF (vi ) =           |C|
                                                                                                                        .
numerical domain elements. Finally, Random(dom(av )) selects a               • variable influence
                                                                                         s
                                                                                                  factor V IFall : average influence of all variables
                                                                                                          P
                                                                                                           vj ∈V V IF (vj )
domain element from dom(av ) randomly.                                         P           (V IF (vi )−         |V |
                                                                                                                            )2
The function verif yChecks(CC) tests if the number of consistent                   vi ∈V                  |V |
and inconsistent checks are normally distributed. Therefore, we first        • coverage coverage: G IBBS S AMPLING(KB, ∅) (see Section 3.3)
divide the set with the consistency check results CC into two parts.
We evaluate the mean of both sets CC1 and CC2 and test if both               With the metrics we collected a lot of information about the knowl-
mean values are near to each other. If they have nearly the same             edge base. To evaluate the knowledge base, we aggregate the metrics
value, we can say that the consistent checks are normally distributed        and use the goal-question-metrics approach [1] to quantify the qual-
in both sets and return the difference between mean1 and mean2.              ity of the knowledge base.


                                                                        37                Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                       Proceedings of the 17th International Configuration Workshop
                                                                                                            September 10-11, 2015, Vienna, Austria
  The aggregation of metrics will be used to answer questions relat-             of knowledge engineers about the knowledge base. Further research
  ing to one or more goals. Next, we are listing the questions and the           should also be done in the context of stakeholder integration. For ex-
  corresponding metrics for each question:                                       ample, in the software engineering process it is common that several
                                                                                 stakeholders (e.g. customers and users) can participate in the engi-
Q1: Is the configuration knowledge base complete?:                               neering process. For the integration of different stakeholders and to
    |V |, domsize, |C|, coverage, |CS|, |∆|                                      optimize the knowledge engineering, further research should also be
Q2: Does the knowledge base contain anomalies?:                                  done in the context of knowledge engineering processes and knowl-
    |CS|, |∆||R|, DE, F M, U R                                                   edge base development.
Q3: Does the configuration knowledge base have an admissible per-
    formance?:
    |V |, domsize, |C|, |R|DE, F M, U R                                          REFERENCES
Q4: Is the configuration knowledge base modifiable?:                              [1] Victor R. Basili.         Software modeling and measurement: The
    M CCS, M C∆, M CR, DE, F M, U R, RR, RR2 ,                                        goal/question/metric paradigm. Technical report, University of Mary-
    V IFall , Coverage                                                                land at College Park College Park, MD, USA, College Park, MD, USA,
                                                                                      1992.
Q5: Is the configuration knowledge base understandable?:                          [2] Alexander Felfernig, David Benavides, Jos A. Galindo, and Florian Re-
    M CCS, M C∆, M CR, DE, F M, U R, RR, RR2 ,                                        infrank. Towards anomaly explanation in feature models. Workshop on
    V IFall , coverage                                                                Configuration, pages 117 – 124, 2013.
                                                                                  [3] Alexander Felfernig, Paul Blazek, Florian Reinfrank, and Gerald Nin-
  Based on the answers for these questions we can evaluate the quality                aus. Intelligent User Interfaces for Configuration Environments, vol-
  of a knowledge base. The quality will be measured in terms of three                 ume 1, pages 89 – 106. Morgan Kaufmann, 2014.
                                                                                  [4] Alexander Felfernig, Stefan Reiterer, Martin Stettinger, Florian Rein-
  goals which we will list in the following:                                          frank, Michael Jeran, and Gerald Ninaus. Recommender systems for
                                                                                      configuration knowledge engineering. Workshop on Configuration,
G1: A configuration knowledge base must be maintainable, such that                    pages 51 – 54, 2013.
    it is easy to change the semantics of the knowledge base in a de-             [5] Alexander Felfernig, Christoph Zehentner, and Paul Blazek. Corediag:
    sired manner (corresponding questions: Q2 anomalies, Q4 modi-                     Eliminating redundancy in constraint sets. In Martin Sachenbacher,
    fiability)                                                                        Oskar Dressler, and Michael Hofbaur, editors, DX 2011. 22nd Interna-
                                                                                      tional Workshop on Principles of Diagnosis, pages 219 – 224, Murnau,
G2: A configuration knowledge base must be understandable, such                       GER, 2010.
    that the effort for a maintainability task for a knowledge engineer           [6] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard
    can be evaluated (corresponding questions: Q2 anomalies, Q5 un-                   Friedrich. Recommender Systems: An Introduction, volume 1. Univer-
    derstandability)                                                                  sity Press, Cambridge, 2010.
                                                                                  [7] Ulrich Junker. Quickxplain: preferred explanations and relaxations for
G3: A configuration knowledge base must be functional, such that it                   over-constrained problems. In Proceedings of the 19th national confer-
    represents a part of the real world (e.g. a bike configuration knowl-             ence on Artifical intelligence, AAAI’04, pages 167–172. AAAI Press,
    edge base; corresponding questions: Q1 completeness, Q2 anoma-                    2004.
    lies, Q3 performance).                                                        [8] Michael Pazzani and Daniel Billsus. Content-based recommendation
                                                                                      systems. In Peter Brusilovsky, Alfred Kobsa, and Wolfgang Nejdl, ed-
  The results of the GQM-approach can be explained by a compari-                      itors, The Adaptive Web, volume 4321 of Lecture Notes in Computer
                                                                                      Science, pages 325–341. Springer Berlin / Heidelberg, 2007.
  son with the measurements of previous versions of the knowledge                 [9] Cédric Piette. Let the solver deal with redundancy. In Proceedings of
  base. The comparison can show, if maintainability, understandabil-                  the 2008 20th IEEE International Conference on Tools with Artificial
  ity, and functionality increases or decreases over time and explain                 Intelligence - Volume 01, pages 67–73, Washington, DC, USA, 2008.
  the changes (based on a comparison of the answers for the questions                 IEEE Computer Society.
                                                                                 [10] Florian Reinfrank, Gerald Ninaus, Bernhard Peischl, and Franz
  and metrics). For a detailed description of the GQM-approach for
                                                                                      Wotawa. A goal-question-metrics model for configuration knowledge
  constraint-based configuration systems we refer the reader to [10].                 bases. Configuration Workshop, 2015.
                                                                                 [11] Florian Reinfrank, Gerald Ninaus, Franz Wotawa, and Alexander
                                                                                      Felfernig. Maintaining constraint-based configuration systems: Chal-
  4    Summary                                                                        lenges ahead. Configuration Workshop, 2015.
                                                                                 [12] Raymond Reiter. A theory of diagnosis from first principles. Artificial
  In this paper we presented approaches to improve the mainte-
                                                                                      Intelligence, 32(1):57–95, 1987.
  nance for constraint-based configuration systems. We described the             [13] Stuart Russell and Peter Norvig. Artificial Intelligence. A modern ap-
  state-of-the-art in conflict and redundancy management and intro-                   proach, volume 2. Prentice Hall, New Jersey, 2003.
  duced recommendation for the support of knowledge engineers. New               [14] Edward Tsang. Foundations of Constraint Satisfaction. Academic
  anomaly detection algorithms can be used to detect well-formedness                  Press, 1993.
                                                                                 [15] Franz Wotawa, Florian Reinfrank, Gerald Ninaus, and Alexander
  violations. Simulation techniques in the context of constraint-based                Felfernig. icone: intelligent environment for the development and main-
  configuration systems allow us to approximate metrics for the goal-                 tenance of configuration knowledge bases. IJCAI 2015 Joint Workshop
  question-metrics approach. We implemented these approaches in our                   on Constraints and Preferences for Configuration and Recommenda-
  web-based system called ’iCone’ (Intelligent environment for the                    tion, 2015.
                                                                                 [16] Franz Wotawa, Martin Stettinger, Florian Reinfrank, Gerald Ninaus,
  development and maintenance of configuration knowledge bases)
                                                                                      and Alexander Felfernig. Conflict management for constraint-based
  [15].4                                                                              recommendation. IJCAI 2015 Joint Workshop on Constraints and Pref-
  While we presented novel approaches to support knowledge engi-                      erences for Configuration and Recommendation, 2015.
  neers, further research has to be done in the verification of the new
  recommendation, simulation, and metrics evaluation techniques. Fur-
  thermore, micro tasks can be used to collect and verify assumptions
  4           http://ase-projects-studies.ist.tugraz.at:
      8080/iCone/


Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                    38
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria

</pre>