=Paper=
{{Paper
|id=Vol-1453/19_ReinfrankNinausPeischlWotawa_AGoal-Question-MetricsModel_Confws-15_p123
|storemode=property
|title=A goal-question-metrics model for configuration knowledge bases
|pdfUrl=https://ceur-ws.org/Vol-1453/19_ReinfrankNinausPeischlWotawa_AGoal-Question-MetricsModel_Confws-15_p123.pdf
|volume=Vol-1453
|dblpUrl=https://dblp.org/rec/conf/confws/ReinfrankNPW15
}}
==A goal-question-metrics model for configuration knowledge bases==
A Goal-Question-Metrics Model for Configuration Knowledge Bases Florian Reinfrank, Gerald Ninaus, Bernhard Peischl, Franz Wotawa Institute for Software Technology Graz University of Technology Inffeldgasse 16b/II, 8010 Graz, Austria {firstname.lastname}@ist.tugraz.at Abstract. Configuration knowledge bases are a well- knowledge engineers has to develop and maintain the knowl- established technology for describing configurable products edge base. like cars, computers, and financial services. Such knowledge In this paper we describe how to measure the quality of bases are characterized by sets of constraints, variables, and configuration knowledge bases.1 Therefore, we use the goal domains. Lot of research has been done for testing knowledge question metric approach (GQM). The first step in the GQM bases, finding conflicts, and recommending repair actions. approach is to define possible goals for the knowledge base, In contrast, less work has been done in the area of measur- like understandability, maintainability, and functionality. The ing the quality of configuration knowledge bases. Such qual- achievement of the goals will be measured by answers for a set ity measurements can help knowledge engineers to improve of questions. These answers will be aggregated and weighted. the maintainability, understandability, and functionality of After the aggregation and the weightings of answers, the qual- knowledge bases. Based on a literature review we first give an ity of the current version of the configuration knowledge base overview of the state-of-the-art in knowledge base metrics. We can be measured in terms of fulfilling the goals. Third, ques- will extend the current research by using the goal-question- tions will be answered by sets of metrics. In this paper we metrics (GQM) approach of the software engineering disci- define goals, questions, and metrics and show, how we can pline to find gaps for the characterization of knowledge bases. measure them. These results will help knowledge engineers in We will also identify further metrics to complete the model. focusing on relevant aspects of the configuration knowledge The results of this paper help knowledge engineers to reduce base to improve the management of a configuration knowl- the effort to develop and maintain configuration knowledge edge base, e.g., the efforts for maintainability, understand- bases. ability, and functionality of a knowledge base. The remainder of this paper is organized as follows: Sec- tion 2 introduces a working example for this work, anomalies 1 Introduction and their definitions. Section 3 gives an overview of the state Products like cars, computers, and software product lines of the art for goals, questions, and metrics for configuration can be customized according to consumers’ preferences. For knowledge bases and other research areas. In Section 4 we supporting users to get valid configurations and to support the discuss the GQM model and compare it with the results of manufacturing department in companies to get an overview other research areas. Finally, Section 5 summarizes this paper of their production lines, models of their products are neces- and gives an overview of further research in this area. sary [29]. Knowledge bases describe a part of the real world (e.g., the set of valid product configurations for bikes). The implementation of a knowledge base is typically done coop- 2 Configuration Knowledge Base eratively between domain experts and knowledge engineers A configuration knowledge base can be defined as a [5, 42]. Configuration knowledge bases can be represented, triple (V, D, C) with a set of variables V where each vari- for example, as constraint satisfaction problems (CSP [39]). able v ∈ V has a domain dom(v) ∈ D. For example, a Configuration knowledge bases (CKB) represent the com- bike configuration knowledge base could contain the variables plete set of valid product configurations. Adding, changing, V = {Ref lector, P edal, F ramecolor} where each variable has and removing constraints of such knowledge bases is neces- a domain D = {dom(Ref lector) = {yes, no}, dom(P edal) sary, because the set of valid product configurations changes = {Standard, Clip}, dom(F ramecolor) = {Green, Red}}. over time. Humans in general and knowledge engineers in par- The assignments to a variable (e.g., P edal = Clip) are de- ticular tend to keep efforts related to knowledge acquisition fined as constraints c ∈ C [39]. While such assignments and maintenance as low as possible. Due to cognitive limita- tions [11, 21, 37] anomalies such as inconsistencies, redundan- 1 Please consider, that the approach presented in this paper can cies, and well-formedness violations are in CKBs. The CKB also be used in other models like knowledge-based recommenda- maintenance task gets even more complicated, if a couple of tion [9] and feature models [4]. 123 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria are defined by users, other constraints are defined by do- flicts, redundancies and forms of well-formedness violations. main experts and are restricting the number of valid com- For a detailed description of anomalies we refer the reader to binations of variable assignments. For example, a constraint [14, 16, 20, 28]. ci = ¬(P edal = Standard ∧ F ramecolor = Red) does not In our example, we can see that the set of constraints allow a consistent configuration with P edal = Standard and {c0 , c2 , c3 , c5 } is in conflict because there does not exist a valid F ramecolor = Red. We call user assignments customer re- assignment for these constraints such that all constraints are quirements c ∈ CR and constraints which are describing the fulfilled. We call such scenarios conflicts [23] and they are relationship between customer requirements and product vari- defined in the Definitions 3 and 4. ables knowledge constraints c ∈ CKB . The sets CR and CKB Definition 3: a conflict is given, iff there exists a set of describe C, such that CR ∪ CKB = C [18]. Customers try constraints CS which can not result in a valid configuration to find a valid configuration for the configuration knowledge (see Definitions 1 and 2), such that CS ⊆ C is inconsistent. base which means, that they assign values to variables {CR } Definition 4: a minimal conflict is given, if the constraint and the set of assignments is not in conflict with CKB (see set CS leads to a conflict (see Definition 3), and there does Definitions 1 and 2). not exist a subset of CS with the same property of being a Definition1: A consistent configuration is defined as a conflict, such that, @CS 0 ⊂ CS is inconsistent. set of customer constraints CR . This set of constraints is not The example of this paper contains one minimal conflict: in conflict with CKB , such that, there exists one possibility to CS1 = {c0 , c2 , c3 , c5 } because this constraint set leads to no assign a value to each variable, formally defined as CKB ∪ CR valid configuration of the configuration knowledge base and is consistent. it is not possible to remove a constraint from CS1 without Definition2: A configuration is a consistent complete loosing the property of beeing a minimal conflict (see Defini- configuration, iff the knowledge base is a consistent configu- tions 3 and 4). A minimal conflict can be calculated with the ration (see Definition 1) and each variable has an assignment, QUICKXPLAIN algorithm [23]. If our model has more than such that @vi ∈V vi = ∅. one conflict and we want to have all minimal conflicts we Figure 1 gives an example about a configuration knowledge need to calculate an acyclic graph (HSDAG) which is defined base for a bike domain. The graphic is based on the notation in [35]. of feature models [6] where each variable is represented as a Solutions for such conflicts are called diagnoses [35, 17]. By feature and each feature has the same domain (dom(vi ) = removing a set of constraints ∆ from the configuration knowl- {true, f alse}). The notation of constraints of feature models edge base, we receive at least one valid assignment for each is described in Figure 1. variable of a configuration knowledge base, formally described The following configuration knowledge base (CKB) reflects in the Definitions 5 and 6. a constraint-based (CSP-based [39]) representation of the Definition 5: A set of constraints ∆ is called diagnosis, configuration model depicted in Figure 1. Each of the iff the removal of ∆ from the knowledge base C leads to a constraints in Figure 1 is part of the set CKB . valid configuration (see Definition 1 and 2), such that C \ ∆ is consistent. V ={ Definition 6: A set of constraints ∆ is called minimal di- Bike, Ref lector, P edal, F ramecolor, Green, Red agnosis, iff it is a diagnoses (see Definition 5) and it is not Standard, Clip possible to remove a constraint from ∆ without loosing the } property of being a diagnoses, such that C \ ∆0 ⊂ ∆ is con- D={ sistent. dom(Bike) = dom(Ref lector) = In our example the constraint sets ∆1 = {c0 }, ∆2 = {c2 }, dom(P edal) = dom(F ramecolor) = ∆3 = {c3 } and ∆4 = {c5 } are minimal diagnosis because re- dom(Green) = dom(Red) = dom(Standard) = moving one of these constraint sets would lead to a consistent dom(Clip) = {true, f alse} configuration knowledge base. } While conflicts and diagnosis are well discussed, little at- CKB = { tention has been done to redundancies in configuration knowl- c0 : Bike = true; edge bases. Piette [28] and Felfernig et al. [20] focused on the c1 : Bike = true ↔ Ref lector = true; problem of redundant constraint sets in knowledge bases and c2 : Bike = true ↔ P edal = true; defined the term redundancy (see Definition 7). c3 : Bike = true ↔ F ramecolor = true; Definition 7: A set of constraints R is redundant, if the c4 : Ref lector = true → P edal = true; removal of R leads to the same semantics of C, such that, c5 : ¬(P edal = true ∧ F ramecolor = true); C \ R |= R. c6 : F ramecolor = true ↔ (Green = true∨ In our example, we have two different sets of redundant Red = true); constraints: R1 = {c2 } and R2 = {c4 }. A redundancy does c7 : (Standard = true ↔ (Clip = f alse∧ not have an impact on the semantics of a knowledge base P edal = true)) ∧ (Clip = true ↔ (Standard but probably leads to a higher effort for maintenance tasks of = f alse ∧ P edal = true)); knowledge bases and decreases the performance of the config- } uration knowledge base. We can calculate such sets with the SEQUENTIAL [28] or CoreDiag [20] algorithm. Both algo- In this example there are different types of anomalies. rithms use the negation of C (C = ¬c0 ∨ ¬c1 ∨ ... ∨ ¬cn }) for Anomalies are patterns in data that do not conform to a well calculating redundant constraints. For checking the semantics defined notion of normal behavior [10]. Anomalies can be con- of C \ R the algorithms check, if C \ R ∪ C is inconsistent. An Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 124 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria Figure 1. Feature Model of the bike configuration example. inconsistency means, that the set R is redundant. For get- Another well-formedness violation is called unnecessary re- ting all different sets of redundant constraints we can also use finement. Such an unnecessary refinement consists of two vari- HSDAG [35]. ables. If the first variable has an assignment, it is possible to The third type of anomalies are well-formedness viola- predict the assignment of the other variable. A formal defini- tions. Such violations do not have an impact on the consis- tion is given in Definition 10. tency of a knowledge base (see Definitions 1 and 2). While Definition 10 (Unnecessary refinement): A configura- well-formedness is well discussed in the research area of fea- tion knowledge base contains a variable pair vi , vj . For each ture models (see e.g., [6]), less research has focused on well- domain element val1 of variable vi , we can say that vari- formedness violations in configuration knowledge bases. How able vj always has the same assignment vj = val2 , s.t. we use well-formedness violations for calculating metrics for ∀val1 ∈dom(vi ) ∃val2 ∈dom(vj ) vi = val1 ∧ vj 6= val2 is inconsis- configuration knowledge bases will be described in the next tent. Section. In our example the variable pair Bike and Ref lector Anomalies and other aspects of a configuration knowledge is unnecessary refined because whenever Bike = true base have impacts on the maintainability, understandability, the Ref lector = true, and Bike = f alse respectively and functionality of the knowledge base. In the following, we Ref lector = f alse leads to an inconsistency. If such a vio- give an overview of goals, questions, and metrics for config- lation occurs, we can recommend the knowledge engineer to uration knowledge bases which helps knowledge engineers to remove the variable Ref lector and rename the variable Bike get an understanding of the quality of the knowledge base. with BikeW ithRef lectors. While conflicts, diagnoses, and redundancies focus on con- straints, well-formedness violations identify anomalies based on variables and domain elements. We now introduce well- 3 A GQM model for Configuration formedness violations in configuration knowledge bases. Knowledge Bases The first well-formedness violation focuses on dead domain For the overview of the metrics for configuration knowl- elements. A dead domain element is an element which can edge bases we use the GQM method. For each goal we use a never be assigned to its variable in a consistent instance (see set of questions to define the achievement of each goal. It is Definition 1). Definition 8 introduces a formal description of also necessary that the goals, questions, and metrics can be dead elements. calculated automatically, and with explanations [2, 36]. Definition 8:: A domain element val1 ∈ dom(vi ) is dead iff In this section we first give an overview of the possible it is never part of a consistent instance, s.t. CKB ∪{vi = val1 ; } goals for configuration knowledge bases. Thereafter we give an is inconsistent. overview of the questions in (configuration) knowledge bases. On the other hand, we can have domain elements which Finally we operationalize the questions by listing metrics for have to be assigned to each consistent instance. We denote configuration knowledge bases. such domain elements full mandatory (see Definition 9). Definition 9:: A domain element val1 ∈ dom(vi ) is full mandatory, iff there is no consistent (complete or incom- 3.1 Goals for Configuration Knowledge plete) instance where the variable vi does not have the as- Bases signment val1 , s.t. CKB ∪ {vi 6= val1 } is inconsistent. The configuration knowledge base can never be consistent, Nabil et al. [27] define five basic goals for knowledge bases. if Bike = f alse or Ref lector = f alse or P edal = f alse. Reusability means, that the knowledge base can be reused in For that domain elements, we can say that these domain ele- another application area. The flexibility defines the possibility ments are full mandatory and the other domain element f alse to change the semantics of the configuration knowledge base. is a dead domain element. Note that all domain elements of a Understandability defines the possibility that knowledge en- domain are false optional if one domain element is full manda- gineers have correct assumptions. Functionality describes the tory. applicability of the knowledge base. For example, if the model does not describe the real product assortment, the knowledge 125 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria base has no functionality. Extendability describes the possibil- figurations, and generating user recommendations. This per- ity to extend the knowledge base. In our example (see Figure formance mainly influences the functionality of a system (la- 1) we can extend the model by adding the bike type (V 0 = V ∪ tency) and the reusability. {T ype, M ountainBike, CityBike}; D0 = D ∪ {dom(T ype) = Q3: Does the configuration knowledge base have an admis- dom(M ountainBike) = dom(CityBike) = {true, f alse}};). sible performance? Lethbridge [25] identifies three goals for knowledge bases. First, it is necessary that knowledge engineers can monitor If it is necessary to develop and maintain the knowledge their work. Therefore it is necessary to offer baselines for base a high modifiability will help to reduce the effort for the its continous improvement. Another aspect is the support for update operation. The modifiability has a positive impact on knowledge engineers when they maintain a knowledge base. the reusability and maintainability of a knowledge base. For Finally, Lethbridge also focuses on the understandability of example, when updating redundant constraints in a knowl- knowledge bases. edge base (e.g. constraint c2 in the example in Section 2) it’s From the perspective of software product lines there are probably necessary to update the redundant constraints (c4 ) three goals: the analyzability focuses on the capability of a sys- too. This may lead to a low functionality because the knowl- tem to be diagnosed for anomalies. Changeability is the pos- edge base doesn’t have the correct behavior. sibility and ease of change in a model when modifications are Q4: Is the configuration knowledge base modifiable? necessary. Understandability also means the likelihood that knowledge engineers and designers understand the knowledge The development effort describes the effort when updat- base [2]. ing a configuration knowledge base. This effort contains the To sum up, we define the following goals for configuration time for the update operation. This includes the update of knowledge bases: the semantics of the knowledge base and the time, which is required to remove all new errors. This effort has an impact on • A configuration knowledge base must be maintainable, the maintainability and reusability of a knowledge base and is such that it is easy to change the semantics of the knowl- mainly influenced by the understandability of a configuration edge base in a desired manner [2, 27]. knowledge base. • A configuration knowledge base must be understandable, Q5: Is the configuration knowledge base understandable? such that the effort for a maintainability task for a knowl- edge engineer can be evaluated [2, 25, 27]. Not each goal has a relationship with each question. In • A configuration knowledge base must be functional, such Table 1 we give an overview of the relationship between goals that it represents a part of the real world (e.g. a bike con- and questions: figuration knowledge base) [27]. Question / Goal MT US FT 3.2 Questions for Configuration Knowledge Q1 (completeness) + Q2 (anomalies) - - - Bases Q3 (performance) + After defining the goals for configuration knowledge bases Q4 (modifiability) + Q5 (understandability) + we describe the questions relating to one or more goals. The concordance with the application will be defined by the com- Table 1. Relations between goals and metrics (MT = maintain- pleteness. It suggests the applicability of the current state of ability, US = usability, FT = functionality) the knowledge base for representing the application area. For instance, in our example (see Figure 1) a Bike can have a green and a red frame color. If it is possible to combine those 3.3 Metrics for Configuration Knowledge colors, the coverage is high. If a frame can only have either a Bases red or a green frame color, the model does not represent the application area and the coverage will be low. The metrics are based on a literature review focusing on Q1: Is the configuration knowledge base complete? knowledge engineering [3, 5, 7, 16, 25, 26, 27, 30, 31, 32, 40] as well as on software product line engineering [2, 6, 22, 24]. Anomalies are a well researched area in the context of con- The assumptions in this section are based on the literature of figuration knowledge bases [30]. The term ’anomalies’ is used configuration knowledge bases and other research areas like synonymously for errors and subsumes the terms inconsis- feature models and software product lines, software engineer- tencies, redundancies, and well-formedness violations. Errors ing, and rule-based knowledge bases. can have an impact on each of the goals, since it has negative After having defined the questions for configuration knowl- impacts on the reusability, maintainability, and understand- edge bases, the next task is to quantify the metrics. There- ability. It can also have a negative impact on the functionality, fore, we describe possible metrics for configuration knowledge if there exists an inconsistency in the knowledge base. bases. Most of the metrics require a consistent CKB. The Q2: Does the configuration knowledge base contain anoma- metrics are based on literature study in configuration, feature lies? model and software engineering research areas. The next list shows some metrics derived from MOOSE and function point analysis [2, 12, 25, 36]: The performance describes the time which is required to calculate characteristics of a knowledge base. These charac- • Number of variables |V |: In the example (see Figure 1) teristics are e.g., error checking, calculating consistent con- |V | = 8. Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 126 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria P vi ∈V |dom(vi )| • Average domain size: domsize = |V | =2 ( • Number of constraints: |C| = 8 P P 0 C ∪ {vi 6= dj } = ∅ vi ∈V dj ∈dom(vi ) 1 else The number of minimal conflicts |CS| is the first FM = (2) |V | × domsize anomaly metric [23]. In our example (see Section 2), we have 1 minimal conflict, such that |CS| = 1. We can also evaluate Since each domain in our example knowledge base has two the smallest number of constraints in a conflict set. The low- values (true, f alse) we can say, that whenever a domain ele- est number of constraints in a conflict set CS is the minimal ment is dead, the other value becomes full mandatory auto- cardinality of conflict sets M CCS and can be defined as matically. When domains have more than two values, it can @CSj : |CSj | < |CSi |. The example in Section 2 has a mini- be the case, that a domain element is dead but there is no mal cardinality M CCS = 4. other domain value with the property of being a full manda- We can also evaluate diagnoses for knowledge bases. For tory domain element. example, with the FastDiag algorithm [17, 19] we can cal- The third well-formedness violation is called unnecessary culate the number of diagnoses |∆| and the number of refinement (U R). Such a violation occurs when there are constraints in a minimal cardinality diagnosis M C∆. two variables and the domain element of the first variable in A minimal cardinality diagnosis ∆i is a minimal diagnosis a valid configuration can be suggested by the assignment of a which has the property of having the smallest number of con- second variable. An unnecessary refinement can be described straints in the diagnosis, such that, @∆j : |∆j | < |∆i |. The as dom(vi ) → dom(vj ). example described in Section 2, contains 4 minimal diagnoses In the example in Section 2 we can say that the vari- (∆1 = {c0 }, ∆2 = {c3 }, ∆3 = {c5 }, ∆4 = {c1 , c2 }, |∆| = 4) ables Standard and Clip are an unnecessary refinement, and a minimal cardinality diagnosis of 1 (M C∆ = 1). because whenever Standard = true Clip = f alse and The number of redundant constraints can also be used as Standard = f alse Clip = true. In that case, we can recom- a measure for knowledge bases [20, 28, 30, 31] if the config- mend, that the domain of the variable P edal can be replaced uration knowledge base is consistent.2 The number of sets of by Standard, Clip and the variables Standard and Clip can redundant constraints is denoted as |R| and the maxi- be removed from the knowledge base without changing the mum cardinality of a redundancy set Ri is 1. We cal- semantics of the knowledge base. culate the maximum cardinality for Ri by checking, if there The restriction rate RR compares the number of con- exists another set Rj which has a bigger cardinality, such straints with the number of variables. In the example de- |C| that Ri has the property of having the maximum cardinality, scribed in Section 2 the restriction rate RR = |V | = 88 = 1. iff @Rj |Rj | < |Ri |. The example in Section 2 contains one A value greater than 1 means that there is a high restriction set with redundant constraints (R1 = {c4 }, |R| = 1) and the [2, 25]. maximum cardinality of these sets is 1 (M CR = |1|). The metric RR is influenced by the design of the knowledge A domain element domi ∈ dom(vj ) is a dead domain base. For example, while one knowledge engineer requires a element, iff there does not exist a valid configuration, such single constraint for subsuming the constraints c0 ∧ c1 ∧ c2 ∧ c3 that, C ∪ {vj = domi ; } 6= ∅ [5, 6]. When assuming that another knowledge engineer is using four single constraints. To C 0 = C ∪ {c8 : Standard = true; } it is not possible, that consider these different design approaches in the metric, the Clip is also true, such that, C 0 ∪ {c9 : P edal = true} = ∅, restriction rate RR2 is considering the number of variables #vars(ci ) such that, DE = 1. We use the sum of all dead elements as a P #vars(C) c ∈C |C| i in a constraint, such that, RR2 = |V | where metric 0 < DE < 1 by using Equation 1 where a value nearer 0 means that there are no or less dead elements and a value #vars(ci ) is the number of variables in ci . nearer to 1 means that a high number of domain elements Another metric from the domain of software engineering in the knowledge base can not be selected in a consistent is the variable inheritance factor V IF [1]. Adapted for configuration knowledge base. configuration knowledge bases, we define V IF as the num- ber of constraints in which a variable vi appears related ( to the number of constraints, e.g., V IF (F ramecolor) = P P 0 C ∪ {vi = dj } = 6 ∅ P 1 vf ramecolor ∈ ci vi ∈V dj ∈dom(vi ) ci ∈C 1 else 0 else DE = (1) |C| = 0.375 because the variable |V | × domsize f ramecolor appears in three constraints and |C| = 8. To receive a CKB metric we calculate the V IFall for all A domain element domi ∈ dom(vj ) becomes dead if another variables. When calculating the arithmetic mean of the V IFall domain element domk ∈ dom(vl ), vj 6= vl is selected (Condi- of all variables, we can evaluate the importance distribution of tionally dead domain elements CD [6]). In the example (see all variables. A value near to 0 means, that all variables have Section 2) the constraint c7 does not allow the configuration the same importance and should be considered in the same Standard = true ∧ Clip = true. way. On the other hand, a high value means that there are On the other hand, a domain element can be full manda- some important and less important variables in the knowledge tory (F M ). Full mandatory means, that there does not exist base. In such cases, it makes sense to focus on the important a consistent instance of the knowledge base where this domain variabless when maintaining the knowledge base. V IFall = element isn’t selected, formaly described as: P vj ∈V V IF (vj ) P (V IF (vi )− |V | )2 2 To receive a consistent configuration knowledge base we remove vi ∈V |V | the constraint c5 from the set C. Finally, we evaluate the metric coverage. The coverage 127 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria measures the number of all consistent complete configura- For a detailed description of the calculation of metrics tions (see Section 2) compared to the maximum number of focusing on anomalies (conflicts, redundancies, and well- complete configurations in a knowledge base. In our exam- formedness violations) and the coverage metric we refer the ple Q|V in Section 2 the maximum number of configurations is reader to [33]. | i=0 |dom(vi )| = 256 (8 variables and each variable has a domain size 2). This will be compared with the number of consistent configurations. In our example we have the follow- 4 Discussion ing consistent configurations: { In this Section we want to discuss relevant aspects of sev- Bike = true∧ eral metrics and give an insight in the implementation of the Ref lector = true∧ goal-question-metrics in the iCone interface. P edal = true∧ Most of the research in the area of configuration knowl- F ramecolor = true∧ edge engineering focuses on the area of verifying configura- (Standard = f alse ∧ Clip = true ∧ Green = true∧ tion knowledge bases (functionality goal, see Section 3) and Red = f alse)∨ ignores the question how to validate the knowledge base [30] (Standard = f alse ∧ Clip = true ∧ Green = f alse∧ (maintainabilty and understandabilty). Felfernig et al. Red = true)∨ [15] present an empirical study about the understandability (Standard = f alse ∧ Clip = true ∧ Green = true∧ of constraints in knowledge bases but there does not exist a Red = true)∨ metric for the understandability of constraints and the knowl- (Standard = true ∧ Clip = f alse ∧ Green = true∧ edge base. Red = f alse)∨ Briand et al. [8] measured the effects of the structural com- (Standard = true ∧ Clip = f alse ∧ Green = f alse∧ plexity of software and its relationship to the maintainabil- Red = true)∨ ity of software. Bagheri and Gasevic [2] transferred this model (Standard = true ∧ Clip = f alse ∧ Green = true∧ into the area of feature models and found out, that the num- Red = true) ber of leaf features, the cyclomatic complexity, the flexibility } of configuration, and the number of valid configurations in- Now we can compare the number of consistent configura- fluence the maintainability of feature models. While the sim- tions (= 6) with the number of all configurations (= 256). ple metrics are easy to transfer into configuration knowledge This leads to a coverage of 6/256 ∗ 100 = 2.34375% which is bases, the depth of a tree or the number of valid configurations very low and the example configuration knowledge base is very can not be calculated. restrictive. For the example knowledge base it is quite easy The number of redundant constraints is an important to evaluate all possible combinations of variables and domain metric since a low number of redundant constraints can im- elements. For knowledge bases with more variables, domain prove the maintenance task, simplify the understandability, elements, and constraints we have millions and more possi- and reduce the time for calculating valid configurations. An ble combinations of variable assignments. For such scenarios important issue in that case is, that redundant constraints can we introduced the simulation technique in the context of also improve the understandability of a configuration knowl- knowledge based systems to approximate the coverage. For a edge base. If a redundant constraint is declared as a desired detailed description to approximate this metric in large con- redundant constraint, the metric should not contain such con- figuration knowledge bases we refer the reader to [33]. straints, but should list it as a desired redundancy. Finally, we can refer the questions to the metrics. Table 2 In a simple configuration knowledge base like the example gives an overview of the relationships between the questions in Section 2 it is easy to calculate the consistency of each and the metrics. possible configuration for the coverage metric. For example, in a configuration knowledge base with a medium number of Q1 Q2 Q3 Q4 Q5 variables (e.g. 10) and average domain size (e.g. 5) we have |V | + - approximately 10M possible configurations. Since it is not domsize + - possible to calculate so many possible configurations in real- |C| + - time, we developed a simulation strategy to approximate the |CS| - - - |∆| - - - number of consistent configurations. For a detailed description M CCS + of the simulation strategy, we refer the reader to [34]. M C∆ + While showing the GQM to knowledge engineers can help |R| - - - understand and maintain the configuration knowledge base, M CR + it is also important to interpret the results. Therefore we im- DE - - - - plemented a history for each metric in our iCone-interface3 . FM - - - - UR - - - - When updates in a configuration knowledge base in the iCone- RR - - system are saved, a new version of the knowledge base will be RR2 - - created and metrics will be actualized. In Figure 2 we can see V IFall - - the changes of the value of the DEAD elements metric. Coverage - - 3 iCone is an ’intelligent environment for the devel- Table 2. Relations between metrics (rows) and questions opment and maintenance of configuration knowledge (columns). A ’-’ means, that the metric has a negative impact on bases’ (http://ase-projects-studies.ist.tugraz.at: the question, ’+’ represents a positive impact. 8080/iCone/index.jsp). Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 128 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria measure the metrics of an existing configuration knowledge base, but we can not identify the causes of bad configuration knowledge base engineering. To give recommendations for op- timizing the knowledge base engineering process, we have to observe the whole process. 5 Conclusion This paper introduces a goal-question-metric approach to evaluate configuration knowledge bases. We gave an overview of configuration knowledge bases and introduced a running ex- ample for this paper and defined goals, questions, and metrics for configuration knowledge bases. Furthermore, we showed how to calculate time-consuming metrics efficiently and pre- sented the iCone-visualization of metrics. It also points out some practical issues when dealing with metrics. Future research should evaluate the relations between goals, questions, and metrics. Our future work will contain empiri- cal evaluations about the correlation between the goals, ques- Figure 2. Visualization of changes for the metric DEAD. The tions, and metrics and their weightings in the aggregation y-axis shows the number of DEAD variables in each version of the process from metrics to questions and from questions to goals. configuration knowledge base (x-axis). Future work should take a look at the knowledge engineer- ing process. When calculating metrics for knowledge engi- Felfernig [13] gives an overview of the usage of function neers, we can list some possible improvements. Preece gives point analysis for configuration knowledge bases. Therefore an overview about verification and validation techniques for he analyzed the input of a configuration knowledge base from knowledge bases [30]. He aligns different V&V techniques to customers and the complexity of a configuration knowledge different models of a knowledge base, e.g., conceptual and de- base. They use the customer requirements as external input sign models and an implemented system. The metrics listed (EI), the data, which are required by the user as external in Section 3 only focus on the implemented system. A GQM query (EQ), the consistent domain elements of variables as for the conceptual and design model does not exist. external output (EO), knowledge elements as internal logical Another relevant fact is the volatility of requirements [36]. files (ILF), and external information like the product assort- If the requirements for the configuration knowledge base are ment from an ERP-system as external interface file (EIF). changing frequently, it is also hard to keep the CKB up to While this approach takes input and output into account, it date. For calculating metrics referring to the quality of a CKB does not evaluate the quality (e.g., the number of dead domain and its requirements, it is necessary to integrate a require- elements) of the input and output. ments management system in the CKB maintenance tool or We have implemented the GQM and the FPA approach in offer an interface for both tools. our iCone implementation [41]. Table 3 gives an overview of the performance. Note, that the time contains the calcula- tion / approximation of the metrics and the calculation of all Acknowledgements anomalies. The notebook domain is calculated six times and The work presented in this paper has been conducted within the mobile phone domain is calculated seven times. the scope of the research project ICONE (Intelligent Assis- tance for Configuration Knowledge Base Development and Notebooks Mobile phones Maintenance) funded by the Austrian Research Promotion Product variants 115.00 13,999.00 Product variables 28.00 34.00 Agency (827587). product variable domain sizes 1.00 - 45.00 2.00 - 47.00 Customer variables 4.00 5.00 avg. customer variable dom. size 3.75 4.00 REFERENCES constraints 12.00 8.00 [1] F. B. Abreu and W. Melo. Evaluating the impact of object- min. calc. time 669 msec. 6,811 msec. oriented design on software quality. Proceedings of the 3rd in- max. calc. time 1,715 msec. 18,643 msec. ternational software metrics symposium, pages 90 – 99, 1996. median calc. time 1,213 msec. 10,842 msec. [2] Ebrahim Bagheri and Dragan Gasevic. Assessing the main- mean calc. time 1,252 msec. 11,307 msec. tainability of software product line feature models using struc- tural metrics. Software Quality Journal, 19(3):579–612, 2011. Table 3. Duration for the calculation of all anomalies (conflicts, [3] Valerie Barr. Applications of rule-base coverage measures to diagnoses, redundancies, well-formedness violations), metrics, goal- expert system evaluation. AAAI, 1997. [4] Don Batory, David Benavides, and Antonio Ruiz-Cortes. Au- question-metrics and function-point-analysis for two configuration tomated analysis of feature models: challenges ahead. Com- knowledge bases (notebooks and mobile phones) mun. ACM, 49:45–47, December 2006. [5] Joachim Baumeister, Frank Puppe, and Dietmar Seipel. Refactoring methods for knowledge bases. In Engineer- In this paper we gave an overview of metrics in configu- ing Knowledge in the age of the Semantic Web: 14th in- ration knowledge bases, focusing on knowledge base engi- ternational conference, EKAW, LNAI 3257, pages 157–171. neering processes [38] is out of scope of this paper. We can Springer, 2004. 129 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria [6] David Benavides, Sergio Segura, and Antonio Ruiz-Cortés. bases. International Journal of Software Engineering and Automated analysis of feature models 20 years later: A lit- Knowledge Engineering, 8:16–1, 1998. erature review. Information Systems, 35:615–636, September [26] Mala Mehrotra, Dimitri Bobrovnikoff, Vinay Chaudhri, and 2010. Patrick Hayes. A clustering approach for knowledge base anal- [7] Jim Blythe, Jihie Kim, Surya Ramachandran, and Yolanda ysis. American Association for Artificial Intelligence, 2002. Gil. An integrated environment for knowledge acquisition. [27] Doaa Nabil, Abeer El-Korany, and A. Sharaf Eldin. Towards IUI, pages 14 – 17, 2001. a suite of quality metrics for kadss-domain knowledge. Expert [8] L.C. Briand, J. Wust, S.V. Ikonomovski, and H. Lounis. In- Systems with Applications, 35:654 – 660, 2008. vestigating quality factors in object-oriented designs: an in- [28] Cédric Piette. Let the solver deal with redundancy. In Pro- dustrial case study. In Software Engineering, 1999. Proceed- ceedings of the 2008 20th IEEE International Conference on ings of the 1999 International Conference on, pages 345–354, Tools with Artificial Intelligence - Volume 01, pages 67–73, 1999. Washington, DC, USA, 2008. IEEE Computer Society. [9] Robin Burke. Knowledge-based recommender systems. In [29] Joseph B. Pine, editor. Mass Customization. The new fron- Encyclopedia of library and information systems, page 2000. tier in business competition. Harvard Business School, 1992. Marcel Dekker, 2000. [30] Alun Preece. Building the right system right evaluating v&v [10] Varun Chandola, Arindam Banerjee, and Vipin Kumar. methods in knowledge engineering, 1998. Anomaly detection: A survey. ACM Comput. Surv., 41:15:1– [31] Alun D. Preece and Rajjan Shinghal. Foundation and applica- 15:58, July 2009. tion of knowledge base verification. International Journal of [11] Yu-Chen Chen, Rong-An Shang, and Chen-Yu Kao. The ef- Intelligent Systems 1994;9(8):683702. Duftschmid, S. Miksch fects of information overload on consumers’ subjective state /, 22:23–41, 1994. towards buying decision in the internet shopping environ- [32] ALUN D. Preece, STPHANE Talbot, and Laurence Vignol- ment. Electronic Commerce Research and Applications, 8:48 let. Evaluation of verification tools for knowledge-based sys- – 58, 2009. tems. International Journal of Human-Computer Studies, [12] S. R. Chidamber and C. F. Kemerer. A metrics suite for 47(5):629 – 658, 1997. object-oriented design. IEEE Transactions on Software En- [33] Florian Reinfrank, Gerald Ninaus, and Alexander Felfernig. gineering, 20(6):476 – 493, 1994. Intelligent techniques for the maintenance of constraint-based [13] Alexander Felfernig. Effort estimation for knowledge-based systems. Configuration Workshop, 2015. configuration systems. In Frank Maurer and Günther Ruhe, [34] Florian Reinfrank, Gerald Ninaus, Franz Wotawa, and editors, SEKE, pages 148–154, 2004. Alexander Felfernig. Maintaining constraint-based configu- [14] Alexander Felfernig, Lothar Hotz, Claire Bagley, and Juha ration systems: Challenges ahead. Configuration Workshop, Tiihonen, editors. Knowledge-based configuration. From re- 2015. search to business cases, volume 1. Morgan Kaufmann, 2014. [35] Raymond Reiter. A theory of diagnosis from first principles. [15] Alexander Felfernig, Monika Mandl, Anton Pum, and Monika Artificial Intelligence, 32(1):57–95, 1987. Schubert. Empirical knowledge engineering: Cognitive as- [36] Pedro F. Salvetto, Milton F. Martinez, Carlos D. Luna, and pects in the development of constraint-based recommenders. Javier Segovia. A very early estimation of software develop- In Nicols Garca-Pedrajas, Francisco Herrera, Colin Fyfe, Jos ment time and effort using neural networks. Workshop de Bentez, and Moonis Ali, editors, Trends in Applied Intelligent Ingeniera de Software y Base de Datos, 2004. Systems, volume 6096 of Lecture Notes in Computer Science, [37] Cheri Speier. The influence of information presentation for- pages 631–640. Springer Berlin / Heidelberg, 2010. mats on complex task decision-making performance. Inter- [16] Alexander Felfernig, Florian Reinfrank, and Gerald Ninaus. national Journal of Human-Computer Studies, 64(11):1115 – Resolving anomalies in configuration knowledge bases. IS- 1131, 2006. MIS, 1(1):1 – 10, 2012. [38] Rudi Studer, V. Richard Benjamins, and Dieter Fensel. [17] Alexander Felfernig and Monika Schubert. Personalized Knowledge engineering: Principles and methods. Data & diagnoses for inconsistent user requirements. AI EDAM, Knowlege Engineering, 25:161 – 197, 1998. 25(2):175–183, 2011. [39] Edward Tsang. Foundations of Constraint Satisfaction. Aca- [18] Alexander Felfernig, Monika Schubert, and Stefan Reiterer. demic Press, 1993. Personalized diagnosis for over-constrained problems. IJCAI, [40] William van Melle, Edward H. Shortliffe, and Bruce G. pages 1990 – 1996, 2013. Buchanan. Emycin: A knowledge engineer’s tool for con- [19] Alexander Felfernig, Monika Schubert, and Christoph Zehent- structing rule-based expert systems. In Bruce G. Buchanan ner. An efficient diagnosis algorithm for inconsistent con- and Edward H. Shortliffe, editors, Rule-Based Expert Sys- straint sets. AI EDAM, 26(1):53–62, 2012. tems. The Mycin Experiments of the Stanford Heuristic Pro- [20] Alexander Felfernig, Christoph Zehentner, and Paul Blazek. gramming Project, pages 302–313. Addison-Wesley, 1984. Corediag: Eliminating redundancy in constraint sets. In Mar- [41] Franz Wotawa, Florian Reinfrank, Gerald Ninaus, and tin Sachenbacher, Oskar Dressler, and Michael Hofbaur, ed- Alexander Felfernig. icone: intelligent environment for the de- itors, DX 2011. 22nd International Workshop on Principles velopment and maintenance of configuration knowledge bases. of Diagnosis, pages 219 – 224, Murnau, GER, 2010. IJCAI 2015 Joint Workshop on Constraints and Preferences [21] Yoav Ganzach and Yaacov Schul. The influence of quantity of for Configuration and Recommendation, 2015. information and goal framing on decision. Acta Psychologica, [42] Du Zhang and Doan Nguyen. Prepare: A tool for knowledge 89:23 – 36, 1995. base verification. IEEE Transactions on Knowledge and Data [22] Herman Hartmann and Tim Trew. Using feature diagrams Engineering, 6(6):983–989, 1994. with context variability to model multiple product lines for software supply chains. In Proceedings of the 2008 12th In- ternational Software Product Line Conference, pages 12–21, Washington, DC, USA, 2008. IEEE Computer Society. [23] Ulrich Junker. Quickxplain: preferred explanations and relax- ations for over-constrained problems. In Proceedings of the 19th national conference on Artifical intelligence, AAAI’04, pages 167–172. AAAI Press, 2004. [24] Kim Lauenroth and Klaus Pohl. Towards automated con- sistency checks of product line requirements specifications. In Proceedings of the twenty-second IEEE/ACM interna- tional conference on Automated software engineering, ASE ’07, pages 373–376, New York, NY, USA, 2007. ACM. [25] Timothy Lethbridge. Metrics for concept-oriented knowledge Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 130 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria