Alexander Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, G. Ninaus                                                    51


              Recommender Systems for Configuration Knowledge Engineering ∗

                A. Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, and G. Ninaus
                                          Graz University of Technology
                                      Inffeldgasse 16b, A-8010 Graz, Austria
                        {felfernig,reiterer,stettinger,reinfrank,jeran,ninaus}@ist.tugraz.at


                            Abstract                                   knowledge engineering environments. Adaptive user inter-
                                                                       faces for knowledge engineering have the potential to effec-
        The knowledge engineering bottleneck is still a ma-            tively support engineers and domain experts in activities such
        jor challenge in configurator projects. In this pa-            as learning (knowledge base understanding), finding (the rel-
        per we show how recommender systems can sup-                   evant items in the knowledge base), and testing & debugging
        port knowledge base development and maintenance                (removing the source of faulty behavior).
        processes. We discuss a couple of scenarios for                   In order to offer more adaptivity in configurator devel-
        the application of recommender systems in knowl-               opment environments, we propose the application of differ-
        edge engineering and report the results of empirical           ent types of recommendation technologies [Jannach et al.,
        studies which show the importance of user-centered             2010] which proactively support domain experts and engi-
        configuration knowledge organization.                          neers when creating and adapting configuration knowledge.
                                                                       Such technologies should dispose of a basic understanding of
1       Introduction                                                   cognitive processes when persons develop and maintain con-
                                                                       figuration knowledge bases. They should support functional-
Product knowledge changes frequently [Soloway, 1987].                  ities such as recommending relevant items (variables, compo-
Therefore, it must be possible to conduct knowledge base               nent types, constraints, diagnoses, etc.) and simultaneously
development and maintenance operations efficiently. Since              omitting specific items that are not relevant. Recommender
the early developments of configurator applications in the             systems have the potential to provide such a support (see, e.g.,
late 1970’s and early 1980’s [McDermott, 1982], knowledge              [Robillard et al., 2010]).
representations have been improved in terms of (1) model-                 There are three basic recommendation approaches. First,
based approaches which allow a clear separation of do-                 collaborative filtering [Konstan et al., 1997] determines rec-
main knowledge and problem solving algorithms, (2) higher-             ommendations based on the preferences of nearest neigh-
level knowledge representations which allow a component-               bors (users with similar preferences compared to the current
oriented representation of configuration knowledge (see, e.g.,         user). In this context, items are recommended to the cur-
[Stumptner et al., 1998]), and (3) graphical knowledge rep-            rent user which have received a positive rating by the near-
resentations (e.g., [Felfernig et al., 2000; 2001]) which al-          est neighbors but are not known to the current user. Second,
low a compact representation. In addition to new knowl-                content-based filtering [Pazzani and Billsus, 1997] recom-
edge representations, intelligent diagnosis approaches have            mends items that are not known to the current user and are
been developed which help a knowledge engineer to identify             similar to items that have already been purchased by her/him.
and repair erroneous configuration knowledge [Junker, 2004;            Similarity between items can be determined, for example,
Felfernig et al., 2004; 2009; 2013].                                   on the basis of the similarity of keywords used to describe
   Due to diversification strategies of companies, product             the item. Third, knowledge-based recommenders recommend
and service assortments are becoming increasingly large and            items by using constraints or similarity metrics [Burke, 2000;
complex [Huffman and Kahn, 1998]. The complexity of                    Felfernig and Burke, 2008].
the underlying knowledge bases increases to the same extent               This paper is organized as follows. In Section 2 we intro-
which requires additional concepts that help a knowledge en-           duce example scenarios for the application of recommender
gineer to conduct knowledge base development and mainte-               technologies in knowledge engineering. Thereafter, we report
nance operations in an efficient fashion. Furthermore, knowl-          results of related empirical studies (see Section 3). In Section
edge bases are often developed by a group of persons with              4 we provide a discussion of related work. Conclusions and a
different knowledge, goals, and focuses with regard to devel-          discussion of future research issues are given in Section 5.
opment and maintenance operations. This situation requires
adaptive user interfaces to be integrated into configuration           2    Recommenders for Knowledge Engineering
    ∗
     The work presented in this paper has been funded by the Aus-      Collaborative Recommendation of Constraints. Collabo-
trian Research Promotion Agency (Project: ICONE (827587)).             rative filtering (CF) recommender systems have shown to be


                                                                                          Michel Aldanondo and Andreas Falkner, Editors
                                                                             Proceedings of the 15th International Configuration Workshop
                                                                                                       August 29-30, 2013, Vienna, Austria
52                                                          Alexander Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, G. Ninaus

one of the best choices to achieve serendipity effects, i.e., to           2 → v3 = 1, c4 : v3 = 1 → v1 6= 1, c5 : v3 = 1 → (v4 =
be surprised (in a positive sense) by item recommendations                 2 ∧ v1 > v5 ), c6 : v4 ≥ 1 → v5 ≤ 4, c7 : v5 = 1 → v3 =
one did not expect when starting the recommendation pro-                   2 ∨ v3 = 3}.
cess. In situations were knowledge engineers do not know the                  On the basis of this simple knowledge base, we can
configuration knowledge base very well, collaborative recom-               calculate the similarities between the individual constraints
mendations can be exploited to support a more focused anal-                (ca , cb ) by using Formula 1. In this formula, V =
ysis of the knowledge base. The availability of navigation                 variables(ca ) ∪ variables(cb ), co−occurrence(v, ca , cb )
data from other knowledge engineers is the major precondi-                 = 1 if v is contained in both constraints on the same
tion for determining recommendations with collaborative fil-               position, co−occurrence(v, ca , cb ) = 0.5 if v is con-
tering. Table 1 shows an example of navigation data that de-               tained in both constraints but on a different position, and
scribes in which order knowledge engineers (users) accessed                co−occurrence(v, ca , cb ) = 0 of no co-occurrence exists.
the constraints of a knowledge base. For simplicity we as-                 Note that this is one possible approach to similarity determi-
sume that each of the users accessed each constraint (but in               nation. We also compared this approach with operator-based
different order). Similar applications of collaborative filter-            similarity and a random assignment of constraints to clusters.
ing can be imagined for the recommendation of variables (or
component types) and instances of a component catalog.                                               P
   Table 1 stores the information in which order the con-                                                v∈V co–occurrence(v, ca , cb )
                                                                                   sim(ca , cb ) =                                                 (1)
straints have been visited by knowledge engineers (users),                                                              |V |
for example, user 1 analyzed the constraints in the order                     The similarities between the pairs of individual constraints
[c5 , c2 , c3 , c1 , c4 , c6 ]. Let us assume that the current user has    are depicted in Table 2.
already visited the constraints c5 and (then) c2 . The nearest
neighbors of the current user (users with a similar navigation               ci ∈ C         c1     c2      c3          c4       c5      c6    c7
behavior) are the users 1, 2, and 4. The majority of these
users analyzed constraint c1 in the third step – this one will                  c1          1.0     -       -           -        -       -     -
be recommended to the current user. Note that this recom-                       c2         0.33    1.0      -           -        -       -     -
mendation approach is currently under evaluation, therefore                     c3         0.16   0.33     1.0          -        -       -     -
no related empirical results will be reported in Section 3.                     c4         0.16    0.5    0.16        1.0        -       -     -
                                                                                c5          0.1   0.25     0.1        0.37     1.0       -     -
              user      c1    c2     c3    c4    c5    c6                       c6          0.0    0.0     0.0        0.0      0.12     1.0    -
                1       4     2      3     5     1     6                        c7          0.0   0.33    0.33        0.16     0.12    0.16   1.0
                2       3     2      5     6     1     4
                3       1     3      2     4     6     5                        Table 2: Similarities between individual constraints.
                4       3     2      4     5     1     6
                                                                              On the basis of these individual similarities we are able to
             current     ?    2       ?     ?    1      ?
                                                                           determine a set of corresponding clusters (k = 2). The de-
                                                                           termination of such clusters is exemplified in Table 3. First,
      Table 1: Recommending constraints (ci ) with CF.
                                                                           we (randomly) select two constraints as initial cluster cen-
                                                                           ters (centroids): c1 and c5 (denoted by cs). In iteration 2 the
   Content-based Clustering of Constraints. Another pos-
                                                                           center of cluster 1 changes to c2 and we have to re-calculate
sibility to support knowledge engineers is to cluster con-
                                                                           the cluster assignment. After this iteration, the assignment is
straints with the goal to improve the overall clarity of the
                                                                           stable, i.e., the cluster centers (c2 and c5 ) remain the same.
knowledge base. We will exemplify this on the basis of k-
means clustering [Witten and Frank, 2005]. Following this
                                                                               iteration      c1      c2         c3     c4      c5      c6    c7
approach, we have to generate k initial centroids which act
as (first) representatives of future clusters. In the following,                   1         1(cs)     1         1      2      2(cs)    2     2
each object (in our case: constraint) is assigned to the group                     2           1     1(cs)       1      1      2(cs)    2     1
(cluster) with the closest (most similar) centroid. Thereafter,
centroids are recalculated. In our case, a centroid is defined as               Table 3: k-means clustering of C = {c1 , c2 , ..., c7 }.
the object with the highest overall similarity to the other ob-
jects in the cluster. The algorithm terminates if the centroids               For the visualization of the constraints {c1 , c2 , ..., c7 } this
are stable (do not change). k-means clustering is guaranteed               means that the knowledge base would be presented in terms
to terminate but is not necessarily optimal since the outcome              of two constraint groups: {c1 , c2 , c3 , c4 , c7 } and {c5 , c6 }.
depends on the initial centroids ([Witten and Frank, 2005]).                  Knowledge-based Refactoring Recommendations. The
   For demonstration purposes we introduce the following                   way in which semantics is expressed has an impact on the
simple configuration problem which is represented as a ba-                 understandability of the knowledge base. For example, users
sic constraint satisfaction problem (CSP = (V, D, C)) where                need less time to understand the semantics of a knowledge
V represents a set of variables {v1 , v2 , ..., v5 }, D represents         base if implications are expressed in terms of A → B com-
the set of corresponding domains (dom(vi ) = {1..5}), and C                pared to the alternative representation of ¬A ∨ B. Explicit
represents the following set of constraints.                               knowledge about the cognitive complexity of constraint rep-
   {c1 : v1 = 3 → v2 > 1, c2 : v1 = 3 ∧ v3 = 1, c3 : v2 =                  resentations can be exploited to recommend structural and


Michel Aldanondo and Andreas Falkner, Editors
Proceedings of the 15th International Configuration Workshop
August 29-30, 2013, Vienna, Austria
Alexander Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, G. Ninaus                                                     53

semantics-preserving adaptations of knowledge structures.                               Requires        Incompatibility
Such recommendations are knowledge-based, since they are                                 X→Y               X → ¬Y
explicitly encoded in refactoring rules.                                                ¬X ∨ Y             ¬X ∨ ¬Y
                                                                                       ¬Y → ¬X             Y → ¬X
3     Empirical Evaluation                                                             ¬(X ∧ ¬Y )          ¬(X ∧ Y )
                                                                                         Y ←X              ¬Y ← X
For the content-based clustering of constraints and
knowledge-based refactoring recommendations we now
                                                                       Table 6: Five different possibilities of representing requires
present the results of two empirical studies. In the first study,
                                                                       and incompatibility relationships.
we compared the applicability of three different clustering
strategies with regard to knowledge engineering tasks (find a
solution, find a minimal conflict) (see, e.g., [Junker, 2004]).           Study B is based on an within-subjects design (N=66) with
   Study A: Clustering of Constraints. For two different               two configuration knowledge bases. Knowledge base kbb1
configuration knowledge bases (kba1 , kba2 ) we conducted a            consisted of a set of requires constraints and kbb2 consisted
study based on an within-subjects design (N=40). Each study            of a set of incompatibility constraints. Each study partici-
participant (students of computer science who visited a re-            pant (again, computer science students who visited a related
lated course on knowledge engineering) had the task of (1)             knowledge engineering course) had the task of finding a solu-
finding a solution (in kba1 ) and (2) finding a minimal conflict       tion for the given CSP. Each participant was confronted with
(in kba2 ).1 There were no time limits regarding task com-             one version of kbb1 and one version of kbb2 conform the
pletion. Each student was assigned to one type of cluster-             schema depicted in Table 6. For example, if a student re-
ing (one out of variable-based similarity, operator-based sim-         ceived the X → Y version of kbb1 then she/he also received
ilarity, and random clustering), i.e., we did not vary the type        the X → ¬Y version of kbb2 . The knowledge bases kbb1
of clustering per student. The knowledge bases (kba1 , kba2 )          and kbb2 were (again) defined in a domain-independent fash-
were defined as CSPs in a domain-independent fashion in or-            ion (see Study A). The basic properties of the used knowledge
der to avoid an additional cognitive complexity related to the         bases are summarized in Table 7.
understanding of a product domain. The basic properties of
the used knowledge bases are summarized in Table 4.                        Knowledge base      #(vi ∈ V )    vi domain size     #(ci ∈ C)
                                                                               kbb1                5               5                 7
    Knowledge base     #(vi ∈ V )    vi domain size     #(ci ∈ C)              kbb2                3               3                 5
        kba1                5              5                15
                                                                                 Table 7: Knowledge bases used in Study B.
        kba2               10              3                10

          Table 4: Knowledge bases used in Study A.                        The outcome of this experiment is shown in Table 8.

                                                                               kbb1 : SOL        errors     kbb2 : SOL      errors
    The outcome of this experiment is shown in Table 5.
                                                                                X→Y             21.43%      X → ¬Y         14.29%
        Grouping approach      kba1 : SOL      kba2 : CON                       ¬X ∨ Y           50.0%      ¬X ∨ ¬Y        34.62%
                                                                               ¬Y → ¬X          96.43%      Y → ¬X          50.0%
         Similar variables      21.43%           42.86%
                                                                               ¬(X ∧ ¬Y )       73.08%      ¬(X ∧ Y )      42.31%
         Similar operators      30.77%           53.85%
                                                                                Y ←X             25.0%      ¬Y ← X         16.67%
             Random             38.46%           76.92%
                                                                       Table 8: Error rates in solution identification (SOL) depend-
Table 5: Error rates for completing the tasks find a solution
                                                                       ing on constraint representation.
(SOL) and find a conflict (CON) depending on clustering ap-
proach (variable-based, operator-based, or random).
                                                                         A result of the study is that basic implications (→) should
                                                                       be preferred to other representations in order to maximize un-
   From the three compared approaches to the clustering of             derstandability. The only type of knowledge representation
constraints in a configuration knowledge base, variable sim-           with a similar performance is the reverse implication, how-
ilarity based clustering clearly outperforms operator-based            ever, when comparing both alternatives, the standard impli-
clustering and random clustering of constraints.                       cation seems to be the better choice.
   Study B: Cognitive Complexities. There are different
possibilities to represent equivalent semantics on the basis of        4     Related Work
a constraint, for example, the requires relationship X → Y
can be represented in terms of ¬X ∨ Y . The incompatibil-              There is a long history of research on the improvement of
ity relationship ¬(X ∧ Y ) can be represented as X → ¬Y .              knowledge engineering processes. Early research focused on
Table 6 depicts five different possibilities to express requires       model-based knowledge representations that allowed a sepa-
and incompatibility relationships.                                     ration of domain and problem solving knowledge. An exam-
                                                                       ple of such a representation are constraint technologies which
    1
      We used these tasks to measure knowledge understanding. Fur-     became extremely popular as a technological basis for indus-
ther more differentiated tasks are within the scope of future work.    trial applications [Freuder, 1997]. In a next step, graphical


                                                                                           Michel Aldanondo and Andreas Falkner, Editors
                                                                              Proceedings of the 15th International Configuration Workshop
                                                                                                        August 29-30, 2013, Vienna, Austria
54                                                      Alexander Felfernig, S. Reiterer, M. Stettinger, F. Reinfrank, M. Jeran, G. Ninaus

knowledge representations [Felfernig et al., 2000] and intel-             of configuration knowledge bases. Artificial Intelligence,
ligent techniques for knowledge base testing and debugging                152(2):213–234, 2004.
have been developed [Felfernig et al., 2004]. The need of an           [Felfernig et al., 2009] A. Felfernig, G. Friedrich, M. Schu-
intuitive access to a corpus of software artifacts is also one of         bert, M. Mandl, M. Mairitsch, and E. Teppan. Plausible
the major requirements for software comprehension [Storey,                repairs for inconsistent requirements. In 21st Intl. Joint
2006]. In this context, recommender systems [Jannach et                   Conference on Artificical Intelligence (IJCAI’09), pages
al., 2010] have already been identified as a valuable means               791–796, Pasadena, CA, 2009.
to provide intelligent support for the navigation in large and
complex software spaces (see, e.g., [Robillard et al., 2010]).         [Felfernig et al., 2010] A. Felfernig, M. Mandl, A. Pum, and
The application of recommendation technologies for support-               M. Schubert. Empirical knowledge engineering: Cogni-
ing knowledge engineering processes is a new research area.               tive aspects in the development of constraint-based rec-
Research contributions in this field have the potential to sig-           ommenders. In IEA/AIE 2010, pages 631–640, Cordoba,
nificantly improve the overall quality of knowledge engineer-             Spain, 2010.
ing processes. In [Felfernig et al., 2010] basic knowledge             [Felfernig et al., 2013] A. Felfernig, M. Schubert, and S. Re-
representations are compared, for example, the use of → to                iterer. Personalized Diagnosis for Over-Constrained Prob-
represent an implication vs. the use of ¬ and ∨. This work                lems. In 23rd International Conference on Artificial Intel-
is an important step towards a discipline of empirical knowl-             ligence, Peking, China, 2013.
edge engineering with a clear focus on usability aspects and           [Freuder, 1997] E. Freuder. In pursuit of the holy grail. Con-
cognitive efforts needed to complete knowledge engineering                straints, 2(1):57–61, 1997.
tasks. The work presented in this paper is a continuation of
the work of [Felfernig et al., 2010]. It takes a more detailed         [Huffman and Kahn, 1998] C. Huffman and B. Kahn. Va-
look at different alternative representations of requires and             riety for Sale: Mass Customization or Mass Confusion.
incompatibility relationships and introduces a new concepts               Journal of Retailing, 74:491–513, 1998.
related to the content-based clustering of constraints.                [Jannach et al., 2010] D. Jannach, M. Zanker, A. Felfernig,
                                                                          and G. Friedrich. Recommender Systems. CUP, 2010.
5    Conclusions                                                       [Junker, 2004] U. Junker. Quickxplain: Preferred expla-
In this paper we showed how recommenders can be exploited                 nations and relaxations for over-constrained problems.
to support knowledge engineering tasks. Examples are col-                 In 19th National Conference on Artificial Intelligence
laborative filtering of constraint sets, clustering of constraints,       (AAAI04), pages 167–172, San Jose, CA, 2004.
and knowledge-based recommendation of refactoring oper-                [Konstan et al., 1997] J. Konstan, B. Miller, D. Maltz,
ations. Future work will include the development of fur-                  J. Herlocker, L. Gordon, and J. Riedl. Grouplens: applying
ther recommendation algorithms, for example, the inclusion                collaborative filtering to usenet news. Communications of
of content-based filtering and further clustering algorithms              the ACM, 40(3):77–87, 1997.
as well as further empirical studies with more differentiated          [McDermott, 1982] J. McDermott. R1: A Rule-based Con-
maintenance tasks. Finally, we will focus on an in-depth anal-
                                                                          figurer of Computer Systems. Artificial Intelligence Jour-
ysis of existing research in the area of cognition psychology
                                                                          nal, 19:39–88, 1982.
which can further advance the state of the art in (configura-
tion) knowledge engineering.                                           [Pazzani and Billsus, 1997] M. Pazzani and D. Billsus.
                                                                          Learning and revising user profiles: the identification of
References                                                                interesting websites. Mach. Learn., 27:313–331, 1997.
                                                                       [Robillard et al., 2010] M. Robillard, R. Walker, and T. Zim-
[Burke, 2000] R. Burke. Knowledge-based recommender
                                                                          mermann. Recommendation systems for software engi-
   systems. Library and Inf. Systems, 69(32):180–200, 2000.
                                                                          neering. IEEE Software, 27(4):80–86, 2010.
[Felfernig and Burke, 2008] A. Felfernig and R. Burke.                 [Soloway, 1987] E. et al. Soloway. Assessing the Maintain-
   Constraint-based recommender systems: Technologies                     abiliy of XCON-in-RIME: Coping with the Problem of
   and research issues. In ACM International Conference on                very large Rule-bases. In Proc. of AAAI-87, pages 824–
   Electronic Commerce (ICEC08), pages 17–26, 2008.                       829, Seattle, Washington, USA, July 13–17 1987.
[Felfernig et al., 2000] A. Felfernig, G. E. Friedrich, and            [Storey, 2006] M. Storey. Theories, tools and research meth-
   D. Jannach. UML as Domain Specific Language for the                    ods in program comprehension: past, present and future.
   Construction of Knowledge-based Configuration Systems.                 Software Quality Journal, 14:187–208, 2006.
   IJSEKE, 10(4):449–469, 2000.
                                                                       [Stumptner et al., 1998] M. Stumptner, G. Friedrich, and
[Felfernig et al., 2001] A. Felfernig, G. Friedrich, and                  A. Haselböck. Generative Constraint-based Configuration
   D. Jannach. Conceptual modeling for configuration of                   of Large Technical Systems. AI EDAM, 12(4):307–320,
   mass-customizable products. Artificial Intelligence in En-             1998.
   gineering, 15(2):165–176, 2001.
                                                                       [Witten and Frank, 2005] I. Witten and E. Frank. Data Min-
[Felfernig et al., 2004] A. Felfernig, G. Friedrich, D. Jan-              ing. Morgan Kaufman, 2005.
   nach, and M. Stumptner. Consistency-based diagnosis


Michel Aldanondo and Andreas Falkner, Editors
Proceedings of the 15th International Configuration Workshop
August 29-30, 2013, Vienna, Austria