Intelligent Supporting Techniques for the Maintenance of Constraint-based Configuration Systems12 Florian Reinfrank, Gerald Ninaus, Franz Wotawa, Alexander Felfernig Institute for Software Technology Graz University of Technology Inffeldgasse 16b/II, 8010 Graz, Austria {firstname.lastname}@ist.tugraz.at Abstract. Constraint-based systems like knowledge-based recom- The knowledge engineer has to extend the current knowledge base mendation and configuration are well established technologies in with new product attributes (e.g., introducing a new product feature many different product areas like cars, computers, notebooks, and eBike) and questions (e.g., ’Do you want an electric engine assis- financial services. Such systems reduce the number of valid products tance?’). In complex constraint-based configuration system updates and configurations regarding the customers’ preferences. The rela- are time consuming and error prone because unexperienced knowl- tionship between product variables and customer questions is rep- edge engineers have to adapt the knowledge base and unexpected resented by constraints. Nowadays, constraint-based configuration dependencies between constraints exist. systems represent volatile product assortments: Variables must be In this paper we show how we can support knowledge engineers adapted, new product features lead to new questions for the cus- when they maintain a constraint-based configuration system. The tomer, and / or the constraints must be updated. We call such sce- approaches can be used in many scenarios like knowledge-based narios maintenance tasks. recommendation, knowledge-based configuration, or feature models. In complex constraint-based configuration systems the mainte- Due to complex restrictions to adapt the product assortment regard- nance task is time consuming and error prone. Previous research fo- ing the customers’ preferences, intelligent techniques can be used if cused on the detection of conflicts, repair actions for the conflicts, the preferences can not be fulfilled. and redundant constraints. In this paper we give an overview about This paper is organized as follows. Section 2 gives an overview these techniques and present new approaches like recommendation, about constraint-based configuration systems. It introduces a running well-formedness violation, simulation, and knowledge base verifica- example and relevant definitions for this paper. Our new approaches tion for the support of knowledge engineers. to support knowledge engineers in maintaining constraint-based con- figuration systems are described in Section 3. A summary in Section 4 concludes this paper. 1 Introduction In e-Commerce applications constraint-based configuration systems 2 Related Work are used to show which combinations of product variable assign- ments can be combined and offered to potential customers. Due to In this Section we give an overview about constraint-based configu- complex restrictions to adapt the product assortment regarding the ration systems, introduce a running example for this paper and define customers’ preferences, intelligent techniques can be used if the cus- relevant terms which are necessary to explain the intelligent support- tomers’ preferences can not be fulfilled. ing techniques from Section 3. A knowledge engineer develops and maintains such knowledge For our constraint-based configuration system we use the con- bases. Based on knowledge (for example, knowledge of bikes) from straint satisfaction problem (CSP) modeling technique [14]. A CSP domain experts, the engineer defines product variables and variable is a triple (V, D, C) and consists of a set of variables V and a set domains (e.g., the domain of the product variable BikeT ype’ is of domains D where each domain dom(vi ) represents all valid as- M ountainBike, CityBike, and RacerBike), prepares additional signments for a variable vi , e.g., dom(vi ) = {val1 , . . . , valn }. The questions presented to potential customers (e.g., ‘What is the main set C contains all constraints which restrict the number of valid in- usage of the bike?’), and develops relationships (constraints) be- stances of a constraint-based configuration system. Basically, a con- tween questions and products (e.g., if the main usage of the bike is straint consists of a set of assignments a for variables and relations everyday lif e then the bike should be of the type CityBike). between them. The set A(ci ) is the set of assignments a constraint ci Such knowledge bases must be updated over time. For exam- has. If a constraint contains only one assignment, we denote such ple, in the last years bikes with an electric engine became popular. constraints unary constraint or assignment [13]. Furthermore, the constraints can be divided into two different types. First, the set CKB 1 We thank the anonymous reviewers for their helpful comments. 2 The work presented in this paper has been conducted within the scope contains all constraints which describe the domain. For example, it is of the research project ICONE (Intelligent Assistance for Configuration not allowed to use mountain bike tires (T ireW idth > 50mm) for Knowledge Base Development and Maintenance) funded by the Austrian racing bikes (BikeT ype = RacerBike), s.t. c = ¬(BikeT ype = Research Promotion Agency (827587). RacerBike ∧ T ireW idth > 50mm). Second, the committed cus- 31 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria tomers’ preferences are represented as constraints in the set CR . The following example is denoted as a CSP and shows a bike knowledge base. It contains variables which represent product variables as well as customer requirements. The set CR contains an example for customer requirements. V = {BikeT ype, F rameSize, eBike, T ireW idth, U niCycle, U sage} D={ dom(BikeT ype) = {M ountainBike, CityBike, RacerBike}, dom(F rameSize) = {40cm, 50cm, 60cm}, dom(eBike) = {true, f alse}, dom(T ireW idth) = {23mm, 37mm, 57mm}, dom(U niCycle) = {true, f alse}, Figure 1. Different types of anomalies. dom(U sage) = {Competition, EverydayLif e, HillClimbing} Definition 3 ’Consistent Instance’: An instance (complete or in- } complete) is consistent, if no constraint in C is violated. CKB = { c0 := BikeT ype = M ountainBike → T ireW idth > In constraint-based configuration systems it can happen, that the sys- 37mm ∧ F rameSize ≥ 50cm; tem can not offer consistent instances to a user (anomaly) because it c1 := BikeT ype = RacerBike → T ireW idth = is not possible to satisfy all constraints (see Definition 3). Such a ’no 23mm ∧ F rameSize = 60cm; solution could be found’ dilemma is caused by at least one conflict c2 := BikeT ype = CityBike → T ireW idth = between a) the constraints in the knowledge base CKB and the cus- 37mm ∧ F rameSize ≥ 50cm; tomer requirements CR or b) within the set CKB itself. Definition 4 c3 := ¬(BikeT ype 6= CityBike ∧ eBike = true); introduces a formal representation of a conflict. c4 := U sage = EverydayLif e → BikeT ype = CityBike; Definition 4 ’Conflict’: A conflict is a set of constraints CS ⊆ c5 := U sage = HillClimbing → BikeT ype = {CKB ∪ CR } which can not be fulfilled by the CSP, s.t. CS is incon- M ountainBike; sistent. c6 := U sage = Competition → BikeT ype = RacerBike ∧ F rameSize = 60cm; If we have an inconsistency in our knowledge base, we can say that c7 := eBike = true → T ireW idth = 37mm; CKB ∪ CR is always a conflict set. To have a more detailed informa- c8 := U niCycle = f alse; tion about the inconsistency, we introduce the term ’minimal conflict’ } which is described in Definition 5. CR = { c9 : F rameSize = 50cm; Definition 5 ’Minimal Conflict’: A minimal conflict CS is a conflict c10 : U sage = Competition; (see Definition 4) and the set CS only contains constraints which are c11 : eBike = true; responsible for the conflict, s.t. @c∈CS CS \ {c} is inconsistent. } C = CKB ∪ CR When we focus on the set CR and say, that CKB is consistent, our ex- The example contains some anomalies in terms of conflicts, redun- ample contains two minimal conflict sets. CS1 = {c9 , c10 } because dancies and well-formedness violations. Figure 1 gives an overview it is not possible to have a bike for competition with a frame size of of different types of anomalies. In the following, we list definitions 50cm and CS2 = {c10 , c11 } because bikes used for competition to define the anomalies. do not support eBikes. The example shows that a knowledge base The constraint set C restricts the set of valid instances. While CKB can have more than one conflict. In such cases we can help users to remains stable during a user session she can add her preferences in resolve the conflicts with diagnosis. A diagnosis ∆ is a set of con- the set CR . An instance is given, if at least one customer preference straints. The removal of the set ∆ from CR leads to a consistent is added to CR . Definition 1 introduces the term ’instance’. knowledge base, formally described in Definition 6. Definition 1 ’Instance’: An instance is given if at least one con- Definition 6 ’Diagnosis’: A diagnosis ∆ is a set of constraints ∆ ⊆ straint in the set CR , s.t. CR 6= ∅. CR ∪CKB . When removing the set ∆ from CR ∪CKB , the knowledge base will be consistent, s.t. CR ∪ CKB \ ∆ is consistent. In a complete instance all variables in the knowledge base have at least one assignment. Definition 2 introduces the definition for a Assuming that CKB is consistent (see Definition 3), we can say that complete instance. the knowledge base always will be consistent if we remove CR . In Definition 2 ’Complete Instance’: An instance is complete iff all Definition 7 we introduce the term ’minimal diagnosis’ which helps product variables have an assignment, such that ∀v∈V v 6= ∅. to reduce the number of constraints within a diagnosis. Instances can either fulfill all constraints in a constraint set C (con- Definition 7 ’Minimal Diagnosis’: A minimal diagnosis ∆ is a di- sistent) or not (inconsistent). Definition 3 defines the term ’consistent agnosis (see Definition 6) and there doesn’t exist a subset ∆0 ⊂ ∆ instance’. which has the same property of being a diagnosis. Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 32 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria The example configuration knowledge base contains two minimal In our example the variable pair U sage and BikeT ype is unnec- diagnoses. The removal of the set ∆1 = {c9 , c11 } or ∆2 = {c10 } essary refined because whenever U sage = EverydayLif e the leads to a consistent configuration knowledge base. After having cal- BikeT ype = CityBike, U sage = HillClimbing always leads culated diagnoses and removed the constraints which are in one diag- to BikeT ype = M ountainBike, and U sage = Competition is nosis, we can ensure a consistent knowledge base which is necessary always combined with the assignment BikeT ype = RacerBike. for calculating redundancies and well-formedness violations. If such a violation occurs, we can recommend the knowledge engi- A redundancy is a set of redundant constraints in the knowledge neer to remove the variable U sage and replace it with the variable base. A constraint c is redundant, if a knowledge base KB 0 with- BikeT ype in the constraints. out the constraint c has the same semantics3 as the knowledge base KB which contains the constraint. Redundant constraints are for- mally described in Definition 8. 3 Intelligent Support for the Maintenance of Constraint-based configuration systems Definition 8 ’Redundant constraint’: A constraint c is redundant iff In this Section we describe existing (conflict and redundancy man- the removal of the constraint from CKB leads to the same semantics, agement) and new (recommendation, well-formedness, simulation, s.t. CKB \ {c} |= c. metrics) intelligent techniques to support knowledge engineers in In our example, the constraint c7 is redundant, since only CityBikes their maintenance tasks. can be eBikes (c3 ) and have tires with a width of 37mm (c2 ). While conflicts, diagnoses, and redundancies focus on constraints, 3.1 Intelligent Recommendation well-formedness violations identify anomalies based on variables and domain elements [2]. We now introduce well-formedness vio- Constraint-based knowledge bases can have hundreds or thousands lations in constraint-based configuration systems. of variables, domain elements, and constraints. If there is a main- The first well-formedness violation focuses on dead domain ele- tenance task (e.g., inserting new tire sizes), recommendation tech- ments. A dead domain element is an element which can never be niques help to differentiate between relevant and not relevant infor- assigned to its variable in a consistent instance (see Definition 3). mation within the knowledge base. For example, the tires of a bike Definition 9 introduces a formal description of dead elements. probably have an influence on the frame of a bike but does not influ- ence the bell of a bike. In such cases, recommendation techniques de- Definition 9 ’Dead domain elements’: A domain element val ∈ tect items (variables, domain elements, constraints, test cases) which dom(v) is dead iff it is never in a consistent instance, s.t. CKB ∪ are influenced by the tires and the knowledge engineer can focus on {v = val; } is inconsistent. these items. We describe four different types of recommendation to support knowledge engineers in their maintenance tasks [4]. The assignments F rimeSize = 40cm and U niCycle = true; The first recommendation approach is the most viewed recommenda- can never be part of a consistent instance because M ountainBikes tion which is user-independent. It can be useful for new engineers of and CityBikes require at least 50cm and RacerBikes require a a product domain. F rameSize of 60cm and our current knowledge base does not sup- Second, recently added lists new items (products, product vari- port U niCycles. ables, questions, and constraints) in the knowledge base. It is user- On the other hand, we can have domain elements which are assigned dependent since it considers the last log in of the knowledge engineer to each consistent instance. We denote such domain elements full and helps to get a fast understanding of the previous changes in the mandatory and introduce definition 10. knowledge base. Definition 10 ’Full mandatory’: A domain element val1 ∈ The next type of recommendation is collaborative filtering. This type dom(vi ) is full mandatory iff there is no consistent (complete or in- of recommendation takes the ratings for items into account and looks complete) instance where the variable vi does not have the assign- for knowledge engineers with similar ratings. In our case, we don’t ment val1 , s.t. CKB ∪ {vi 6= val1 } is inconsistent. have ratings but use the interaction with items as ratings. If a knowl- edge engineer looks at products, she ’rates’ the item with 1. 2 will The knowledge base can never be consistent if U niCycle 6= f alse. be added by the knowledge engineer if she is editing an item. Table In that case, we can say that the domain element f alse of the do- 1 shows an example for a collaborative filtering recommendation for main dom(U niCycle) is full mandatory and U niCycle = true knowledge engineer u0 based on our example in Section 2. can never be in a consistent knowledge base (dead domain element). Another well-formedness violation is called unnecessary refinement. c0 c1 c2 c3 c4 c5 c6 c7 c8 Such an unnecessary refinement consists of two variables. If the first u0 1 1 2 1 ? variable has an assignment, it is possible to predict the assignment u1 1 1 1 1 1 u2 1 2 1 2 2 of the second variable because the second variable can only have u3 1 1 2 exactly one consistent assignment. A formal definition is given in Definition 11. Table 1. An example for collaborative filtering. 1 means that the item ci is viewed by the user uj , 2 means that the item is edited and ’ ’ means that the Definition 11 ’Unnecessary refinement’: A knowledge base con- item is neither viewed nor edited by the user. tains a variable pair vi , vj . For each domain element val1 of vari- able vi , we can say that variable vj always has the same assignment In table 1 we try to find out if we should recommend item c7 to vj = val2 , s.t. ∀val1 ∈dom(vi ) ∃val2 ∈dom(vj ) vi = val1 ∧ vj 6= val2 knowledge engineer u0 . The common process to find recommend- is inconsistent. able items is twofold. First, we try to find knowledge engineers with 3 We use the term ’semantics’ to describe a knowledge base KB 0 with the similar interests. In our example, u1 and u2 have nearly the same same solution set as KB. items viewed or edited. Second, we have to evaluate if the similar 33 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria knowledge engineers are interested in the item. Therefore, we use the Algorithm 2 QuickXPlain’ (CKB , ∆, CR ):∆ Pearson correlation [3, 6]. In our example, u1 , and u2 have viewed / . CKB : Set of constraints which can’t be part of a conflict edited item c7 and we can recommend c7 to knowledge engineer c0 . . ∆: Set of constraints which can be part of a conflict Another recommendation approach is the usage of content-based fil- . CR : Set of constraints which will be evaluated tering. The basic idea is to find similar items compared to a reference if ∆ 6= ∅ and inconsistent(CKB ) then item. We take the variable names and domain values from a con- return ∅; straint and evaluate the similarities between the reference item and end if all other items. The similarities are measured by the TF-IDF (term if singleton(CR ) then frequency and inverse document frequency) algorithm [6, 8] where return CR ; the item is the document and the terms are the variables and domain end if elements. Table 2 shows the similarity values between constraint c7 k ← d r2 e; as reference constraint with the other constraints. C1 ← {c1 , ..., ck } ∈ CR ; constraint similarity C2 ← {ck+1 , ..., cr } ∈ CR ; c0 0.50 ∆1 ← QuickXP lain0 (CKB ∪ C1 , C1 , C2 ); c1 0.17 ∆2 ← QuickXP lain0 (CKB ∪ ∆1 , ∆1 , C1 ); c2 0.50 return(∆1 ∪ ∆2 ); c3 0.00 c4 0.00 c5 0.00 Contrary to QuickXPlain, FastDiag is an algorithm to calculate a c6 0.33 minimal diagnosis and has C and CR as input. C contains all con- c8 0.00 straints, s.t. C = CKB ∪ CR . If C is an empty set, FastDiag has Table 2. Similarities between constraint c7 with the other constraints based no diagnosable set and the algorithm stops. It also stops if the set on content based recommendation C \ CR is inconsistent, because this set contains inconsistencies, but will not be diagnosed. If both preconditions are fulfilled, Algorithm With this approach, we can say, that there is a high relationship be- 4 will calculate one diagnosis. tween constraint c7 with the constraints c0 and c2 and a weak rela- tionship with the constraints c6 and c1 . Algorithm 3 FAST D IAG(CR , C):∆ . CR : Set of constraints which will be diagnosed . C: inconsistent knowledge base including all constraints 3.2 Intelligent Anomaly Management if C = ∅ ∨ inconsistent(CKB − C) then As mentioned in Section 2, there are many anomalies in our example return ∅; knowledge base. In the following, we describe algorithms for detect- else ing conflicts, diagnoses, redundancies, and well-formedness viola- return DIAG(∅, CR , C) tions as well as explanations for those anomalies. These algorithms end if reduce the time to detect the anomalies and explain the anomaly to get a higher understanding of the knowledge base. First of all, DIAG checks whether C is consistent. If it is consistent, Junker [7] described a divide-and-conquer approach to detect con- each subset of C is also consistent and no constraint in C can be a flicts in knowledge bases. The algorithm takes two sets of constraints part of the diagnosis. Otherwise, CR will be divided into two sub- as input. CKB is a set of constraints which can not be part of a di- sets C1 and C2 . Each subset will be removed from C separately and agnosis. The constraints in the set CR will be taken into account for checked again in a recursive manner. If C \ C1 is consistent, we can calculating a diagnosis. If the set CR is not empty and CR ∪ CKB say that C2 is consistent and an empty set will be returned. If it is in- is not consistent, the algorithm returns a set of constraints which is a consistent, at least one constraint in C1 must be part of the diagnosis minimal conflict (see Definition 5). and therefore C1 will be divided and tested again unless |C| = 1. The algorithm returns ∆1 ∪ ∆2 which is a minimal diagnosis. Algorithm 1 QuickXPlain (CKB , CR ):∆ With the previous algorithms we can a.) support customers when . CKB : set of not diagnosable constraints they do not get any products for their preferences (CKB ∪ CR is . CR : set of diagnosed constraints inconsistent) and b.) support knowledge engineers when they main- if isEmpty(CR ) or consistent(CKB ∪ CR ) then tain a constraint-based configuration system with conflicts in CKB . return ∅; For a detailed description of the visualization of conflicts we refer else the reader to [16]. return QuickXP lain0 (CKB , ∆, CR ); When we can assume that CKB is consistent, we can continue with end if redundancy and well-formedness checks. Please note that the follow- ing algorithms are applied to the constraint set CKB and ignore CR , The algorithm QuickXPlain’ takes three sets as input. While ∆ is s.t. C = CKB . initially empty, the set CKB contains the constraints which can not The first approach for detecting redundancies has been proposed by be part of a conflict and the constraints which are part of a conflict Piette [9]. The approach is the following: a knowledge base aggre- are in the set CR . Note that the set CKB can also be empty. The gated with its negotiation must be inconsistent, formally described as algorithm is a recursive divide-and-conquer algorithm. It splits the C ∪C is inconsistent and C = {¬c0 ∨¬c1 ∨...∨¬cn }. By removing set CR into two parts (C1 and C2 ) and adds the part C1 to CKB . C2 a constraint ci separately from C, the algorithm checks whether the will be evaluated by doing a recursive call with C2 as the set which result of C − {ci } ∪ C is still inconsistent. If this is the case, then the has to be evaluated. constraint ci is redundant and can be removed. Finally, the algorithm Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 34 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria Algorithm 4 DIAG(∆, CR , C):∆ Algorithm 7 C ORE D(B, ∆, C): ∆ . ∆: Set of diagnosed constraints . B: Consideration set . CR : Set of constraints which will be diagnosed . ∆: Constraints added to B . C: CR ∪ CKB . C: set of constraints to be checked for redundancy if ∆ 6= ∅ and consistent(C) then if ∆ 6= ∅ and inconsistent(B) then return ∅; return ∅; end if end if if singleton(CR ) then if singleton(C) then return CR ; return(C); end if end if k ← d |C2R | e; k ← d |C| 2 e; C1 ← {c0 , ..., ck } ∈ CR ; C1 ← {c1 , c2 , ..., ck } ∈ C; C2 ← {ck+1 , ..., cn } ∈ CR ; C2 ← {ck+1 , ck+2 , ..., cn } ∈ C; ∆1 ← DIAG(C1 , C2 , C − C1 ); ∆1 ← C ORE D(B ∪ C2 , C2 , C1 ); ∆2 ← DIAG(∆1 , C1 , C − ∆1 ); ∆2 ← C ORE D(B ∪ ∆1 , ∆1 , C2 ); return(∆1 ∪ ∆2 ); return(∆1 ∪ ∆2 ); returns the set C without redundant constraints. Hence the shifted constraint can not be part of an anomaly and further anomalies can be detected. Algorithm 5 SEQUENTIAL(C): ∆ Both algorithms (SEQUENTIAL and CoreDiag) can be used to de- . C: knowledge base tect redundant constraints. As mentioned in Section 2 a constraint . C: the complement of C consists of a set of variable assignments A(ci ). When we want to . ∆: set of redundant constraints test if an assignment of a constraint is redundant, we have to remove Ct ← C; the assignment from the constraint and check, if the knowledge base for all ci in Ct do is still redundant. For a detailed description of assignment-based re- if isInconsistent(Ct − ci ∪ C) then dundancy detection we refer the reader to [11]. Ct ← Ct − {ci }; We also have to discuss the usefulness of redundant constraints. On end if the one hand, desired redundancies can help to increase the under- end for standing of a knowledge base. For example, if many implications ∆ ← C − Ct ; (e.g. A → B; B → C; C → D) are in the knowledge base, a con- return ∆; straint A → D may helps to understand the knowledge base. On the other hand, redundant constraints can increase the effort for updates. Another approach (CoreDiag) has been proposed by Felfernig et al. If the redundant constraint are not identified by the knowledge engi- [5]. Instead of a linear approach, they adapt the QuickXPlain al- neer, the knowledge base does not have a correct behavior anymore. gorithm. The divide-and-conquer approach of this algorithm checks Next, we describe the algorithms to detect well-formedness viola- whether removing a set of constraints C1 leads to an inconsistency tions. First, Algorithm 8 takes sets of constraints (C) and variables formally described as C − C1 ∪ C is inconsistent. If it is not incon- (V ) as input parameters and returns a set of variable assignments. sistent, C1 must be further divided and tested again. Each of the assignments can never be consistent with C. The sugges- tion for the knowledge engineer is, that the domain elements which Algorithm 6 C ORE D IAG (CKB): ∆ will be returned by the algorithm can be deleted. . C : set with all constraints . C: the complement of C Algorithm 8 DeadDomainElement (C, V ): ∆ . ∆: set of redundant constraints . C: knowledge base constraints C ← {¬c1 ∨ ¬c2 ∨ ... ∨ ¬cn }; . V : knowledge base variables return(C − C ORE D(C, C, C)); . ∆ set with inconsistent variable assignments for all vi ∈ V do CoreD (Algorithm 7) checks, if B ⊆ C is inconsistent. An inconsis- for all domj ∈ dom(vi ) do tency of B ∪ C means that the subset is not redundant and no con- C 0 = C ∪ {vi = domj } straint of B will be a part of ∆. singleton(C) = true means that if inconsistent(C 0 ) then |C| is redundant and will be returned. Otherwise the constraint set C ∆ ← {vi = domj } will be further divided and the subsets will be checked recursively. end if With the presented approaches we can calculate one conflict, diag- end for nosis, or constraint set without redundancies. In complex knowledge end forreturn ∆ bases we can assume that many anomalies are in the knowledge base. For calculating all conflicts / diagnoses / redundant constraint sets, While we can evaluate if a domain element can never be in a con- we use Reiter’s HSDAG approach [12]. This approach takes the re- sistent instance, we can also check if a domain element must be in sult of one of the algorithms above and expands branches for each a consistent instance of a knowledge base. We denote such domain constraint in the result set. The constraint will be inserted into the elements as full mandatory. Algorithm 9 checks whether the knowl- set which can not be part of the result, e.g. a constraint ci will be edge base will be inconsistent, if the domain element domj is not removed from CR and added to CKB in the QuickXPlain algorithm. selected. 35 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria Algorithm 9 FullMandatory (C, V ): ∆ each type of anomaly. We take the set of constraints (e.g. set of dead . C: knowledge base constraints elements) and add this set to CKB in the algorithm. Now we have . V : knowledge base variables ensured, that the constraint which describes the anomaly, can’t be . ∆ set with inconsistent variable assignments part of ∆ in the algorithm. Next, we have to negate the constraint set for all vi ∈ V do which describes the anomaly. Since the negation of the anomaly can for all domj ∈ dom(vi ) do never be consistent, QuickXPlain will return the set of constraints C 0 = C ∪ {vi 6= domj } which is responsible for the anomaly. For example, the dead domain if inconsistent(C 0 ) then element U niCycle = true will be negated and added to C. In that ∆ ← {vi 6= domj } case, QuickXPlain will return the set {c8 } as an explanation for the end if dead domain element. end for end forreturn ∆ 3.3 Simulation Due to the huge complexity of calculating all possible instances for If variable vi contains a full mandatory domain element, we can say, all possible assignments (see Section 2) in constraint-based configu- that each other domain element of vi is a dead element. If a do- ration systems we use Gibbs’ simulation to estimate the consistency main element is full mandatory, we suggest the knowledge engineer rate cr for a specific set of assignments A [11]. With this approxima- to delete all other domain elements or to remove the variable itself. tion, we can . . . Finally, we introduce an algorithm to detect unnecessary refinements between variables (see Definition 11). Algorithm 10 returns a set of . . . estimate the restriction rate (number of consistent instances constraints. Each of these constraints describe one unnecessary re- compared to all instances) and evaluate the knowledge base (see finement between two variables and each domain element between Section 3.4). both variables. The assignments between the variables are conjunc- . . . generate test cases for boundary value analysis [11]. tive and each domain element of variable vi is in a disjunctive order, . . . rank diagnoses and conflicts (assuming that a knowledge base e.g. (vi = vali1 ∧ vj = valj1 ) ∨ (vi = vali2 ∧ vj = valj2 ) ∨ (vi = with nearly the same restriction rate compared to the current vali3 ∧ vj = valj3 ). knowledge base is preferred). . . . generate reports for variety management (e.g. ’How many Algorithm 10 UnnecessaryRefinement (C, V ): ∆ bikes can be used for Competition, EverydayLif e, and . C: knowledge base constraints HillClimbing?’). . V : knowledge base variables An assignment is a constraint which contains one variable av , one . ∆ set with constraints domain element ad , and a relationship between variable and domain for all vi ∈ V do element ar (see Section 2). Examples for assignments are eBike = for all vj ∈ V |vi 6= vj do true; and BikeT ype = M ountainBike. Algorithm 11 is divided A = ∅; . set with assignments into three functions and shows the basic algorithm for estimating the for all domk ∈ dom(vi ) do consistency rate for a set of assignments. dompair = f alse; The function Gibbs(KB, A) is the main function of this algorithm. C 0 ← C ∪ {vi = domk } It has a knowledge base KB and a set of assignments A as input. for all doml ∈ dom(vj ) do The knowledge base contains sets of variables V ∈ KB and con- C 00 ← C 0 ∪ {vj 6= doml } straints C ∈ KB. The set CC (checks) contains all results from if inconsistent(C 00 ) ∧ dompair = f alse then consistency checks. A consistency check is either consistent (1) or dompair = true; inconsistent (0). The number of minimum calls is constant and given A ← A ∪ {vi = domk ∧ vj = doml } in variable mincalls. The total number of consistent checks is given end if in variable consistent. threshold is a constant and required to test, end for if the current set of consistency checks has a high accuracy. If the end for variable verif y is greater than the threshold, we can not guaran- if |A| = |dom(vi )| then tee, that the current result is accurate. Therefore, we have to execute ∆ ← ∆ ∪ disjunctive(A) the loop again. In the while-loop we first have to generate a set of end if new random assignments. Since assignments are also special types end for of constraints, we add them to the set C ∈ KB and do a consistency end forreturn ∆ check again. If randA ∪ C ∈ KB is consistent, we add 1 to the set CC and increment the variable consistent. Otherwise, we add 0 to The performance of this algorithm depends on the number of vari- the set CC. Finally, we verify all previous consistency checks. If the ables, their domain size, the number of unnecessary refinements, and variable verif y is lower than the variable threshold and we have the performance of the solver. In our short study with 14 knowledge more consistency checks than mincalls, we can return the number bases (up to 34 variables and domain sizes from two to 47) the de- of consistent checks divided by the total number of checks. tection of unnecessary refinements requires up to 375 ms (with 42 The function generateRandAssign(KB) is responsible for the unnecessary refinements) for the detection of all possible unneces- generation of new assignments. Random(C) returns the number sary refinements (Intel Xeon @ 2.4Ghz * 6 cores, 24GB RAM). of assignments which has to be generated randomly. Random(V ) To get a deep understanding of the anomalies we need to explain takes a variable from the knowledge base. If the variable is al- them to the knowledge engineer [2]. For the calculation of an ex- ready part of another assignment, the variable won’t be used again. planation we use the QuickXPlain algorithm (see Algorithm 2) for Random(R) selects a relation between the variable and the domain Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 36 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria Algorithm 11 GibbsSampling 3.4 Knowledge base Evaluation function G IBBS(KB, A): ∆ As mentioned in the previous Sections, we can analyze a knowl- CC = ∅ . set of consistency check results {0, 1} edge base in different ways and collect a lot of information about mincalls = 200 . constant the knowledge base. Finally, we can also evaluate the knowledge threshold = 0.01 . constant base in terms of metrics. Those metrics a) help to get information consistent = 0 about the quality of the knowledge base and b) get information about verif y = Double.M ax V alue the quality of previous changes. Next, we will describe some met- while n < mincalls ∨ verif y > threshold do rics, use them to answer five questions, and measure three goals for randA = A ∪ GENERATE R ANDA SSIGN(KB) constraint-based configuration systems (goal-question-metrics). The C.addAll(randA) . C ∈ KB metrics are based on a literature review focusing on knowledge engi- if isConsistent(KB) then neering. An overview of the literature review is given in [10]. In the consistent + + following list we describe several metrics. CC.add(1) else • Number of variables |V | ∈ KB.P CC.add(0) |dom(vi )| • Average domain size domsize: vi ∈V |V | end if C.removeAll(randA) • Number of constraints |CKB | ∈ KB verif y = VERIFY C HECKS(CC) • Number of minimal conflicts |CS|: see Definition 3 n++ • Minimal cardinality CS M CCS: the lowest number of constraints end while in a conflict set return consistent/n • Number of minimal diagnoses |∆|: see Definition 5 end function • Minimal cardinality diagnosis M C∆: the lowest number of con- function GENERATE R ANDA SSIGN(KB):A straints in a diagnosis A=∅ . A: set of assignments • Number of redundancy sets |R| n = Random(C) > 0: . generate n assignments • Maximal cardinality redundancy set M CR: the largest number of for i = 0; i < n; i + + do constraints in a redundancy set av = Random(V ) . V ∈ KB • dead elements DE: number of dead elements compared to the ar = Random(R) total number of all domain elements ad = Random(dom(av )) ( P P 0 C ∪ {vi = dj } = 6 ∅ A.add(a) vi ∈V dj ∈dom(vi ) end for 1 else DE = return A |V | × domsize end function function VERIFY C HECKS(CC):∆ • full mandatory F M : number of full mandatory domain elements CC1 = CC.split(0, |CC|/2) compared to the total number of all domain elements CC2 = CC.split((|CC|/2) + 1, |CC|) ( mean1 = mean(CC1 ) P P 0 C ∪ {vi 6= dj } = ∅ vi ∈V dj ∈dom(vi ) mean2 = mean(CC2 ) 1 else FM = if mean1 ≥ mean2 then |V | × domsize return mean1 − mean2 else • unnecessary refinement U R: whenever a variable vi has an as- return mean2 − mean1 signment, we can predict the assignment of variable vj , s.t. end if dom(vi ) → dom(vj ) |C| end function • restriction rate RR: |V | P #vars(ci ) c ∈C #vars(C) |C| • restriction rate RR2 : i |V | where #vars(ci ) is the number of variables in ci . • variable influence factor V IF (vi ): number of constraints in elements. In our case, variables can have textual domain elements which a variable vi appears related to the number of constraints, 1 vi ∈ ci (e.g. the brand of a bike) or numeric domain elements (e.g. the price  P ci ∈C  of a bike). While the set of relations for textual domain elements is 0 else R = {=, 6=}, the set is extended to R = {=, 6=, <, ≤, >, ≥} for e.g., V IF (vi ) = |C| . numerical domain elements. Finally, Random(dom(av )) selects a • variable influence s factor V IFall : average influence of all variables P vj ∈V V IF (vj ) domain element from dom(av ) randomly. P (V IF (vi )− |V | )2 The function verif yChecks(CC) tests if the number of consistent vi ∈V |V | and inconsistent checks are normally distributed. Therefore, we first • coverage coverage: G IBBS S AMPLING(KB, ∅) (see Section 3.3) divide the set with the consistency check results CC into two parts. We evaluate the mean of both sets CC1 and CC2 and test if both With the metrics we collected a lot of information about the knowl- mean values are near to each other. If they have nearly the same edge base. To evaluate the knowledge base, we aggregate the metrics value, we can say that the consistent checks are normally distributed and use the goal-question-metrics approach [1] to quantify the qual- in both sets and return the difference between mean1 and mean2. ity of the knowledge base. 37 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria The aggregation of metrics will be used to answer questions relat- of knowledge engineers about the knowledge base. Further research ing to one or more goals. Next, we are listing the questions and the should also be done in the context of stakeholder integration. For ex- corresponding metrics for each question: ample, in the software engineering process it is common that several stakeholders (e.g. customers and users) can participate in the engi- Q1: Is the configuration knowledge base complete?: neering process. For the integration of different stakeholders and to |V |, domsize, |C|, coverage, |CS|, |∆| optimize the knowledge engineering, further research should also be Q2: Does the knowledge base contain anomalies?: done in the context of knowledge engineering processes and knowl- |CS|, |∆||R|, DE, F M, U R edge base development. Q3: Does the configuration knowledge base have an admissible per- formance?: |V |, domsize, |C|, |R|DE, F M, U R REFERENCES Q4: Is the configuration knowledge base modifiable?: [1] Victor R. Basili. Software modeling and measurement: The M CCS, M C∆, M CR, DE, F M, U R, RR, RR2 , goal/question/metric paradigm. Technical report, University of Mary- V IFall , Coverage land at College Park College Park, MD, USA, College Park, MD, USA, 1992. Q5: Is the configuration knowledge base understandable?: [2] Alexander Felfernig, David Benavides, Jos A. Galindo, and Florian Re- M CCS, M C∆, M CR, DE, F M, U R, RR, RR2 , infrank. Towards anomaly explanation in feature models. Workshop on V IFall , coverage Configuration, pages 117 – 124, 2013. [3] Alexander Felfernig, Paul Blazek, Florian Reinfrank, and Gerald Nin- Based on the answers for these questions we can evaluate the quality aus. Intelligent User Interfaces for Configuration Environments, vol- of a knowledge base. The quality will be measured in terms of three ume 1, pages 89 – 106. Morgan Kaufmann, 2014. [4] Alexander Felfernig, Stefan Reiterer, Martin Stettinger, Florian Rein- goals which we will list in the following: frank, Michael Jeran, and Gerald Ninaus. Recommender systems for configuration knowledge engineering. Workshop on Configuration, G1: A configuration knowledge base must be maintainable, such that pages 51 – 54, 2013. it is easy to change the semantics of the knowledge base in a de- [5] Alexander Felfernig, Christoph Zehentner, and Paul Blazek. Corediag: sired manner (corresponding questions: Q2 anomalies, Q4 modi- Eliminating redundancy in constraint sets. In Martin Sachenbacher, fiability) Oskar Dressler, and Michael Hofbaur, editors, DX 2011. 22nd Interna- tional Workshop on Principles of Diagnosis, pages 219 – 224, Murnau, G2: A configuration knowledge base must be understandable, such GER, 2010. that the effort for a maintainability task for a knowledge engineer [6] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard can be evaluated (corresponding questions: Q2 anomalies, Q5 un- Friedrich. Recommender Systems: An Introduction, volume 1. Univer- derstandability) sity Press, Cambridge, 2010. [7] Ulrich Junker. Quickxplain: preferred explanations and relaxations for G3: A configuration knowledge base must be functional, such that it over-constrained problems. In Proceedings of the 19th national confer- represents a part of the real world (e.g. a bike configuration knowl- ence on Artifical intelligence, AAAI’04, pages 167–172. AAAI Press, edge base; corresponding questions: Q1 completeness, Q2 anoma- 2004. lies, Q3 performance). [8] Michael Pazzani and Daniel Billsus. Content-based recommendation systems. In Peter Brusilovsky, Alfred Kobsa, and Wolfgang Nejdl, ed- The results of the GQM-approach can be explained by a compari- itors, The Adaptive Web, volume 4321 of Lecture Notes in Computer Science, pages 325–341. Springer Berlin / Heidelberg, 2007. son with the measurements of previous versions of the knowledge [9] Cédric Piette. Let the solver deal with redundancy. In Proceedings of base. The comparison can show, if maintainability, understandabil- the 2008 20th IEEE International Conference on Tools with Artificial ity, and functionality increases or decreases over time and explain Intelligence - Volume 01, pages 67–73, Washington, DC, USA, 2008. the changes (based on a comparison of the answers for the questions IEEE Computer Society. [10] Florian Reinfrank, Gerald Ninaus, Bernhard Peischl, and Franz and metrics). For a detailed description of the GQM-approach for Wotawa. A goal-question-metrics model for configuration knowledge constraint-based configuration systems we refer the reader to [10]. bases. Configuration Workshop, 2015. [11] Florian Reinfrank, Gerald Ninaus, Franz Wotawa, and Alexander Felfernig. Maintaining constraint-based configuration systems: Chal- 4 Summary lenges ahead. Configuration Workshop, 2015. [12] Raymond Reiter. A theory of diagnosis from first principles. Artificial In this paper we presented approaches to improve the mainte- Intelligence, 32(1):57–95, 1987. nance for constraint-based configuration systems. We described the [13] Stuart Russell and Peter Norvig. Artificial Intelligence. A modern ap- state-of-the-art in conflict and redundancy management and intro- proach, volume 2. Prentice Hall, New Jersey, 2003. duced recommendation for the support of knowledge engineers. New [14] Edward Tsang. Foundations of Constraint Satisfaction. Academic anomaly detection algorithms can be used to detect well-formedness Press, 1993. [15] Franz Wotawa, Florian Reinfrank, Gerald Ninaus, and Alexander violations. Simulation techniques in the context of constraint-based Felfernig. icone: intelligent environment for the development and main- configuration systems allow us to approximate metrics for the goal- tenance of configuration knowledge bases. IJCAI 2015 Joint Workshop question-metrics approach. We implemented these approaches in our on Constraints and Preferences for Configuration and Recommenda- web-based system called ’iCone’ (Intelligent environment for the tion, 2015. [16] Franz Wotawa, Martin Stettinger, Florian Reinfrank, Gerald Ninaus, development and maintenance of configuration knowledge bases) and Alexander Felfernig. Conflict management for constraint-based [15].4 recommendation. IJCAI 2015 Joint Workshop on Constraints and Pref- While we presented novel approaches to support knowledge engi- erences for Configuration and Recommendation, 2015. neers, further research has to be done in the verification of the new recommendation, simulation, and metrics evaluation techniques. Fur- thermore, micro tasks can be used to collect and verify assumptions 4 http://ase-projects-studies.ist.tugraz.at: 8080/iCone/ Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 38 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria