Alexander Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, G. Ninaus 51 Recommender Systems for Configuration Knowledge Engineering ∗ A. Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, and G. Ninaus Graz University of Technology Inffeldgasse 16b, A-8010 Graz, Austria {felfernig,reiterer,stettinger,reinfrank,jeran,ninaus}@ist.tugraz.at Abstract knowledge engineering environments. Adaptive user inter- faces for knowledge engineering have the potential to effec- The knowledge engineering bottleneck is still a ma- tively support engineers and domain experts in activities such jor challenge in configurator projects. In this pa- as learning (knowledge base understanding), finding (the rel- per we show how recommender systems can sup- evant items in the knowledge base), and testing & debugging port knowledge base development and maintenance (removing the source of faulty behavior). processes. We discuss a couple of scenarios for In order to offer more adaptivity in configurator devel- the application of recommender systems in knowl- opment environments, we propose the application of differ- edge engineering and report the results of empirical ent types of recommendation technologies [Jannach et al., studies which show the importance of user-centered 2010] which proactively support domain experts and engi- configuration knowledge organization. neers when creating and adapting configuration knowledge. Such technologies should dispose of a basic understanding of 1 Introduction cognitive processes when persons develop and maintain con- figuration knowledge bases. They should support functional- Product knowledge changes frequently [Soloway, 1987]. ities such as recommending relevant items (variables, compo- Therefore, it must be possible to conduct knowledge base nent types, constraints, diagnoses, etc.) and simultaneously development and maintenance operations efficiently. Since omitting specific items that are not relevant. Recommender the early developments of configurator applications in the systems have the potential to provide such a support (see, e.g., late 1970’s and early 1980’s [McDermott, 1982], knowledge [Robillard et al., 2010]). representations have been improved in terms of (1) model- There are three basic recommendation approaches. First, based approaches which allow a clear separation of do- collaborative filtering [Konstan et al., 1997] determines rec- main knowledge and problem solving algorithms, (2) higher- ommendations based on the preferences of nearest neigh- level knowledge representations which allow a component- bors (users with similar preferences compared to the current oriented representation of configuration knowledge (see, e.g., user). In this context, items are recommended to the cur- [Stumptner et al., 1998]), and (3) graphical knowledge rep- rent user which have received a positive rating by the near- resentations (e.g., [Felfernig et al., 2000; 2001]) which al- est neighbors but are not known to the current user. Second, low a compact representation. In addition to new knowl- content-based filtering [Pazzani and Billsus, 1997] recom- edge representations, intelligent diagnosis approaches have mends items that are not known to the current user and are been developed which help a knowledge engineer to identify similar to items that have already been purchased by her/him. and repair erroneous configuration knowledge [Junker, 2004; Similarity between items can be determined, for example, Felfernig et al., 2004; 2009; 2013]. on the basis of the similarity of keywords used to describe Due to diversification strategies of companies, product the item. Third, knowledge-based recommenders recommend and service assortments are becoming increasingly large and items by using constraints or similarity metrics [Burke, 2000; complex [Huffman and Kahn, 1998]. The complexity of Felfernig and Burke, 2008]. the underlying knowledge bases increases to the same extent This paper is organized as follows. In Section 2 we intro- which requires additional concepts that help a knowledge en- duce example scenarios for the application of recommender gineer to conduct knowledge base development and mainte- technologies in knowledge engineering. Thereafter, we report nance operations in an efficient fashion. Furthermore, knowl- results of related empirical studies (see Section 3). In Section edge bases are often developed by a group of persons with 4 we provide a discussion of related work. Conclusions and a different knowledge, goals, and focuses with regard to devel- discussion of future research issues are given in Section 5. opment and maintenance operations. This situation requires adaptive user interfaces to be integrated into configuration 2 Recommenders for Knowledge Engineering ∗ The work presented in this paper has been funded by the Aus- Collaborative Recommendation of Constraints. Collabo- trian Research Promotion Agency (Project: ICONE (827587)). rative filtering (CF) recommender systems have shown to be Michel Aldanondo and Andreas Falkner, Editors Proceedings of the 15th International Configuration Workshop August 29-30, 2013, Vienna, Austria 52 Alexander Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, G. Ninaus one of the best choices to achieve serendipity effects, i.e., to 2 → v3 = 1, c4 : v3 = 1 → v1 6= 1, c5 : v3 = 1 → (v4 = be surprised (in a positive sense) by item recommendations 2 ∧ v1 > v5 ), c6 : v4 ≥ 1 → v5 ≤ 4, c7 : v5 = 1 → v3 = one did not expect when starting the recommendation pro- 2 ∨ v3 = 3}. cess. In situations were knowledge engineers do not know the On the basis of this simple knowledge base, we can configuration knowledge base very well, collaborative recom- calculate the similarities between the individual constraints mendations can be exploited to support a more focused anal- (ca , cb ) by using Formula 1. In this formula, V = ysis of the knowledge base. The availability of navigation variables(ca ) ∪ variables(cb ), co−occurrence(v, ca , cb ) data from other knowledge engineers is the major precondi- = 1 if v is contained in both constraints on the same tion for determining recommendations with collaborative fil- position, co−occurrence(v, ca , cb ) = 0.5 if v is con- tering. Table 1 shows an example of navigation data that de- tained in both constraints but on a different position, and scribes in which order knowledge engineers (users) accessed co−occurrence(v, ca , cb ) = 0 of no co-occurrence exists. the constraints of a knowledge base. For simplicity we as- Note that this is one possible approach to similarity determi- sume that each of the users accessed each constraint (but in nation. We also compared this approach with operator-based different order). Similar applications of collaborative filter- similarity and a random assignment of constraints to clusters. ing can be imagined for the recommendation of variables (or component types) and instances of a component catalog. P Table 1 stores the information in which order the con- v∈V co–occurrence(v, ca , cb ) sim(ca , cb ) = (1) straints have been visited by knowledge engineers (users), |V | for example, user 1 analyzed the constraints in the order The similarities between the pairs of individual constraints [c5 , c2 , c3 , c1 , c4 , c6 ]. Let us assume that the current user has are depicted in Table 2. already visited the constraints c5 and (then) c2 . The nearest neighbors of the current user (users with a similar navigation ci ∈ C c1 c2 c3 c4 c5 c6 c7 behavior) are the users 1, 2, and 4. The majority of these users analyzed constraint c1 in the third step – this one will c1 1.0 - - - - - - be recommended to the current user. Note that this recom- c2 0.33 1.0 - - - - - mendation approach is currently under evaluation, therefore c3 0.16 0.33 1.0 - - - - no related empirical results will be reported in Section 3. c4 0.16 0.5 0.16 1.0 - - - c5 0.1 0.25 0.1 0.37 1.0 - - user c1 c2 c3 c4 c5 c6 c6 0.0 0.0 0.0 0.0 0.12 1.0 - 1 4 2 3 5 1 6 c7 0.0 0.33 0.33 0.16 0.12 0.16 1.0 2 3 2 5 6 1 4 3 1 3 2 4 6 5 Table 2: Similarities between individual constraints. 4 3 2 4 5 1 6 On the basis of these individual similarities we are able to current ? 2 ? ? 1 ? determine a set of corresponding clusters (k = 2). The de- termination of such clusters is exemplified in Table 3. First, Table 1: Recommending constraints (ci ) with CF. we (randomly) select two constraints as initial cluster cen- ters (centroids): c1 and c5 (denoted by cs). In iteration 2 the Content-based Clustering of Constraints. Another pos- center of cluster 1 changes to c2 and we have to re-calculate sibility to support knowledge engineers is to cluster con- the cluster assignment. After this iteration, the assignment is straints with the goal to improve the overall clarity of the stable, i.e., the cluster centers (c2 and c5 ) remain the same. knowledge base. We will exemplify this on the basis of k- means clustering [Witten and Frank, 2005]. Following this iteration c1 c2 c3 c4 c5 c6 c7 approach, we have to generate k initial centroids which act as (first) representatives of future clusters. In the following, 1 1(cs) 1 1 2 2(cs) 2 2 each object (in our case: constraint) is assigned to the group 2 1 1(cs) 1 1 2(cs) 2 1 (cluster) with the closest (most similar) centroid. Thereafter, centroids are recalculated. In our case, a centroid is defined as Table 3: k-means clustering of C = {c1 , c2 , ..., c7 }. the object with the highest overall similarity to the other ob- jects in the cluster. The algorithm terminates if the centroids For the visualization of the constraints {c1 , c2 , ..., c7 } this are stable (do not change). k-means clustering is guaranteed means that the knowledge base would be presented in terms to terminate but is not necessarily optimal since the outcome of two constraint groups: {c1 , c2 , c3 , c4 , c7 } and {c5 , c6 }. depends on the initial centroids ([Witten and Frank, 2005]). Knowledge-based Refactoring Recommendations. The For demonstration purposes we introduce the following way in which semantics is expressed has an impact on the simple configuration problem which is represented as a ba- understandability of the knowledge base. For example, users sic constraint satisfaction problem (CSP = (V, D, C)) where need less time to understand the semantics of a knowledge V represents a set of variables {v1 , v2 , ..., v5 }, D represents base if implications are expressed in terms of A → B com- the set of corresponding domains (dom(vi ) = {1..5}), and C pared to the alternative representation of ¬A ∨ B. Explicit represents the following set of constraints. knowledge about the cognitive complexity of constraint rep- {c1 : v1 = 3 → v2 > 1, c2 : v1 = 3 ∧ v3 = 1, c3 : v2 = resentations can be exploited to recommend structural and Michel Aldanondo and Andreas Falkner, Editors Proceedings of the 15th International Configuration Workshop August 29-30, 2013, Vienna, Austria Alexander Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, G. Ninaus 53 semantics-preserving adaptations of knowledge structures. Requires Incompatibility Such recommendations are knowledge-based, since they are X→Y X → ¬Y explicitly encoded in refactoring rules. ¬X ∨ Y ¬X ∨ ¬Y ¬Y → ¬X Y → ¬X 3 Empirical Evaluation ¬(X ∧ ¬Y ) ¬(X ∧ Y ) Y ←X ¬Y ← X For the content-based clustering of constraints and knowledge-based refactoring recommendations we now Table 6: Five different possibilities of representing requires present the results of two empirical studies. In the first study, and incompatibility relationships. we compared the applicability of three different clustering strategies with regard to knowledge engineering tasks (find a solution, find a minimal conflict) (see, e.g., [Junker, 2004]). Study B is based on an within-subjects design (N=66) with Study A: Clustering of Constraints. For two different two configuration knowledge bases. Knowledge base kbb1 configuration knowledge bases (kba1 , kba2 ) we conducted a consisted of a set of requires constraints and kbb2 consisted study based on an within-subjects design (N=40). Each study of a set of incompatibility constraints. Each study partici- participant (students of computer science who visited a re- pant (again, computer science students who visited a related lated course on knowledge engineering) had the task of (1) knowledge engineering course) had the task of finding a solu- finding a solution (in kba1 ) and (2) finding a minimal conflict tion for the given CSP. Each participant was confronted with (in kba2 ).1 There were no time limits regarding task com- one version of kbb1 and one version of kbb2 conform the pletion. Each student was assigned to one type of cluster- schema depicted in Table 6. For example, if a student re- ing (one out of variable-based similarity, operator-based sim- ceived the X → Y version of kbb1 then she/he also received ilarity, and random clustering), i.e., we did not vary the type the X → ¬Y version of kbb2 . The knowledge bases kbb1 of clustering per student. The knowledge bases (kba1 , kba2 ) and kbb2 were (again) defined in a domain-independent fash- were defined as CSPs in a domain-independent fashion in or- ion (see Study A). The basic properties of the used knowledge der to avoid an additional cognitive complexity related to the bases are summarized in Table 7. understanding of a product domain. The basic properties of the used knowledge bases are summarized in Table 4. Knowledge base #(vi ∈ V ) vi domain size #(ci ∈ C) kbb1 5 5 7 Knowledge base #(vi ∈ V ) vi domain size #(ci ∈ C) kbb2 3 3 5 kba1 5 5 15 Table 7: Knowledge bases used in Study B. kba2 10 3 10 Table 4: Knowledge bases used in Study A. The outcome of this experiment is shown in Table 8. kbb1 : SOL errors kbb2 : SOL errors The outcome of this experiment is shown in Table 5. X→Y 21.43% X → ¬Y 14.29% Grouping approach kba1 : SOL kba2 : CON ¬X ∨ Y 50.0% ¬X ∨ ¬Y 34.62% ¬Y → ¬X 96.43% Y → ¬X 50.0% Similar variables 21.43% 42.86% ¬(X ∧ ¬Y ) 73.08% ¬(X ∧ Y ) 42.31% Similar operators 30.77% 53.85% Y ←X 25.0% ¬Y ← X 16.67% Random 38.46% 76.92% Table 8: Error rates in solution identification (SOL) depend- Table 5: Error rates for completing the tasks find a solution ing on constraint representation. (SOL) and find a conflict (CON) depending on clustering ap- proach (variable-based, operator-based, or random). A result of the study is that basic implications (→) should be preferred to other representations in order to maximize un- From the three compared approaches to the clustering of derstandability. The only type of knowledge representation constraints in a configuration knowledge base, variable sim- with a similar performance is the reverse implication, how- ilarity based clustering clearly outperforms operator-based ever, when comparing both alternatives, the standard impli- clustering and random clustering of constraints. cation seems to be the better choice. Study B: Cognitive Complexities. There are different possibilities to represent equivalent semantics on the basis of 4 Related Work a constraint, for example, the requires relationship X → Y can be represented in terms of ¬X ∨ Y . The incompatibil- There is a long history of research on the improvement of ity relationship ¬(X ∧ Y ) can be represented as X → ¬Y . knowledge engineering processes. Early research focused on Table 6 depicts five different possibilities to express requires model-based knowledge representations that allowed a sepa- and incompatibility relationships. ration of domain and problem solving knowledge. An exam- ple of such a representation are constraint technologies which 1 We used these tasks to measure knowledge understanding. Fur- became extremely popular as a technological basis for indus- ther more differentiated tasks are within the scope of future work. trial applications [Freuder, 1997]. In a next step, graphical Michel Aldanondo and Andreas Falkner, Editors Proceedings of the 15th International Configuration Workshop August 29-30, 2013, Vienna, Austria 54 Alexander Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, G. Ninaus knowledge representations [Felfernig et al., 2000] and intel- of configuration knowledge bases. Artificial Intelligence, ligent techniques for knowledge base testing and debugging 152(2):213–234, 2004. have been developed [Felfernig et al., 2004]. The need of an [Felfernig et al., 2009] A. Felfernig, G. Friedrich, M. Schu- intuitive access to a corpus of software artifacts is also one of bert, M. Mandl, M. Mairitsch, and E. Teppan. Plausible the major requirements for software comprehension [Storey, repairs for inconsistent requirements. In 21st Intl. Joint 2006]. In this context, recommender systems [Jannach et Conference on Artificical Intelligence (IJCAI’09), pages al., 2010] have already been identified as a valuable means 791–796, Pasadena, CA, 2009. to provide intelligent support for the navigation in large and complex software spaces (see, e.g., [Robillard et al., 2010]). [Felfernig et al., 2010] A. Felfernig, M. Mandl, A. Pum, and The application of recommendation technologies for support- M. Schubert. Empirical knowledge engineering: Cogni- ing knowledge engineering processes is a new research area. tive aspects in the development of constraint-based rec- Research contributions in this field have the potential to sig- ommenders. In IEA/AIE 2010, pages 631–640, Cordoba, nificantly improve the overall quality of knowledge engineer- Spain, 2010. ing processes. In [Felfernig et al., 2010] basic knowledge [Felfernig et al., 2013] A. Felfernig, M. Schubert, and S. Re- representations are compared, for example, the use of → to iterer. Personalized Diagnosis for Over-Constrained Prob- represent an implication vs. the use of ¬ and ∨. This work lems. In 23rd International Conference on Artificial Intel- is an important step towards a discipline of empirical knowl- ligence, Peking, China, 2013. edge engineering with a clear focus on usability aspects and [Freuder, 1997] E. Freuder. In pursuit of the holy grail. Con- cognitive efforts needed to complete knowledge engineering straints, 2(1):57–61, 1997. tasks. The work presented in this paper is a continuation of the work of [Felfernig et al., 2010]. It takes a more detailed [Huffman and Kahn, 1998] C. Huffman and B. Kahn. Va- look at different alternative representations of requires and riety for Sale: Mass Customization or Mass Confusion. incompatibility relationships and introduces a new concepts Journal of Retailing, 74:491–513, 1998. related to the content-based clustering of constraints. [Jannach et al., 2010] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems. CUP, 2010. 5 Conclusions [Junker, 2004] U. Junker. Quickxplain: Preferred expla- In this paper we showed how recommenders can be exploited nations and relaxations for over-constrained problems. to support knowledge engineering tasks. Examples are col- In 19th National Conference on Artificial Intelligence laborative filtering of constraint sets, clustering of constraints, (AAAI04), pages 167–172, San Jose, CA, 2004. and knowledge-based recommendation of refactoring oper- [Konstan et al., 1997] J. Konstan, B. Miller, D. Maltz, ations. Future work will include the development of fur- J. Herlocker, L. Gordon, and J. Riedl. Grouplens: applying ther recommendation algorithms, for example, the inclusion collaborative filtering to usenet news. Communications of of content-based filtering and further clustering algorithms the ACM, 40(3):77–87, 1997. as well as further empirical studies with more differentiated [McDermott, 1982] J. McDermott. R1: A Rule-based Con- maintenance tasks. Finally, we will focus on an in-depth anal- figurer of Computer Systems. Artificial Intelligence Jour- ysis of existing research in the area of cognition psychology nal, 19:39–88, 1982. which can further advance the state of the art in (configura- tion) knowledge engineering. [Pazzani and Billsus, 1997] M. Pazzani and D. Billsus. Learning and revising user profiles: the identification of References interesting websites. Mach. Learn., 27:313–331, 1997. [Robillard et al., 2010] M. Robillard, R. Walker, and T. Zim- [Burke, 2000] R. Burke. Knowledge-based recommender mermann. Recommendation systems for software engi- systems. Library and Inf. Systems, 69(32):180–200, 2000. neering. IEEE Software, 27(4):80–86, 2010. [Felfernig and Burke, 2008] A. Felfernig and R. Burke. [Soloway, 1987] E. et al. Soloway. Assessing the Maintain- Constraint-based recommender systems: Technologies abiliy of XCON-in-RIME: Coping with the Problem of and research issues. In ACM International Conference on very large Rule-bases. In Proc. of AAAI-87, pages 824– Electronic Commerce (ICEC08), pages 17–26, 2008. 829, Seattle, Washington, USA, July 13–17 1987. [Felfernig et al., 2000] A. Felfernig, G. E. Friedrich, and [Storey, 2006] M. Storey. Theories, tools and research meth- D. Jannach. UML as Domain Specific Language for the ods in program comprehension: past, present and future. Construction of Knowledge-based Configuration Systems. Software Quality Journal, 14:187–208, 2006. IJSEKE, 10(4):449–469, 2000. [Stumptner et al., 1998] M. Stumptner, G. Friedrich, and [Felfernig et al., 2001] A. Felfernig, G. Friedrich, and A. Haselböck. Generative Constraint-based Configuration D. Jannach. Conceptual modeling for configuration of of Large Technical Systems. AI EDAM, 12(4):307–320, mass-customizable products. Artificial Intelligence in En- 1998. gineering, 15(2):165–176, 2001. [Witten and Frank, 2005] I. Witten and E. Frank. Data Min- [Felfernig et al., 2004] A. Felfernig, G. Friedrich, D. Jan- ing. Morgan Kaufman, 2005. nach, and M. Stumptner. Consistency-based diagnosis Michel Aldanondo and Andreas Falkner, Editors Proceedings of the 15th International Configuration Workshop August 29-30, 2013, Vienna, Austria