=Paper= {{Paper |id=Vol-2306/paper11 |storemode=property |title=Formal Concept Analysis and Pattern Structures for Recommendation Systems |pdfUrl=https://ceur-ws.org/Vol-2306/paper11.pdf |volume=Vol-2306 |authors=Nyoman Juniarta |dblpUrl=https://dblp.org/rec/conf/ekaw/Juniarta18 }} ==Formal Concept Analysis and Pattern Structures for Recommendation Systems== https://ceur-ws.org/Vol-2306/paper11.pdf
Formal Concept Analysis and Pattern Structures
        for Recommendation Systems

                                 Nyoman Juniarta

        Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
                             nyoman.juniarta@loria.fr


      Abstract. This work focuses on the application of Formal Concept
      Analysis and pattern structures in the problem of recommendation. We
      focus on the collaborative recommendation, where we give a suggestion
      to a new user based on a database of previous users. Here we can look the
      pattern in the database using two approaches: interval pattern structures
      or gradual pattern mining.

      Keywords: biclustering, FCA, pattern structures, recommendation


1   Introduction
CrossCult (http://www.crosscult.eu) is a European project whose idea is to
support the emergence of a European cultural heritage by allowing visitors in
different cultural sites (e.g. museum, historic city, archaeological site) to improve
the quality of their visit by using adapted computer-based devices and to con-
sider the visit at a European level. Such improvement can be accomplished by
studying, among others, the possibility to build a dynamic recommendation sys-
tem. This system should be able to produce a relevant suggestion on which part
of a cultural site may be interesting for a specific visitor. Here, our objective is
to study a dynamic recommendation system for visitors in a museum. Given a
new visitor Vn , the task is to suggest a museum item that may be interesting
for him/her. Based on how a suggestion is made to a new visitor Vn , a recom-
mendation system can be classified into one of the three following categories
[1]:
 – Content-based recommendations: The system makes a suggestion based only
   on the previous visited items of Vn . For example, if Vn visited mostly the
   items from prehistoric era, then the system recommends another item from
   that era.
 – Collaborative recommendations: The system looks for previous users who
   have similar interest to Vn , and makes a suggestion based on their visited
   items. For example, if many of Vn ’s similar users have visited item I, then
   the system recommends this item.
 – Hybrid approaches: The combination of content-based and collaborative ap-
   proaches.
In this work, we explore the second category (collaborative recommendation) for
museum visitors.
2      State-of-the-Art
2.1     Formal Concept Analysis and Pattern Structures
Formal Concept Analysis (FCA) is a mathematical framework based on lattice
theory and used for classification, data analysis, and knowledge discovery, intro-
duced in [4]. From a formal context, FCA detects all formal concepts, who can
be arranged in a concept lattice.
Definition 1. A formal context is a triple (G, M, I), where G is a set of objects,
M is a set of attributes, and I is a binary relation between G and M, i.e. I ⊆
G × M.
   If an object g has an attribute m, then (g, m) ∈ I. An example of a formal
context is shown in Figure 1.




                  m1 m2 m3 m4
               g1          ×
               g2 ×     ×
               g3 × × ×
               g4    × × ×

          Fig. 1. A formal context.        Fig. 2. Concept lattice for the formal con-
                                           text in Figure 1.


      The Galois connection for a formal context (G, M, I) is defined as follows:
Definition 2. For a subset of objects A ⊆ G, A0 is the set of attributes that
are possessed by all objects in A, i.e.: A0 = {m ∈ M |∀g ∈ A, (g, m) ∈ I}
    Dually, for a subset of attributes B ⊆ M , B 0 is the set of objects that have
all attributes in B, i.e.: B 0 = {g ∈ G|∀m ∈ B, (g, m) ∈ I}
    A formal concept is the pair (A, B), where A ⊆ G and B ⊆ M , and such
that A0 = B and B 0 = A. For such concept, A is called its extent and B is its
intent.
   From Figure 1, we see that {g3 , g4 }0 = {m2 , m3 } and {m2 , m3 }0 = {g3 , g4 }.
Therefore, ({g3 , g4 }, {m2 , m3 }) is a formal concept. It should be noticed that
the extent and the intent of a concept (A, B) are closed sets, i.e. A = A00 and
B = B 00 . A formal concept (A, B) is a subconcept of a formal concept (C, D) –
denoted by (A, B) ≤ (C, D) – if A ⊆ C (or equivalently D ⊆ B).
    If (A, B) < (C, D) and there is no (E, F ) such that (A, B) < (E, F ) <
(C, D), then (A, B) is a cover of (C, D). A concept lattice can be formed using
the ≤ relation which defines the partial order among concepts. For the context
in Figure 1, the formal concepts and their corresponding lattice are shown in
Figure 2 (extent and intent is below and above each concept respectively).
    FCA is restricted to specific datasets where each attribute is binary (e.g.
has only yes/no value). For more complex values (e.g. numbers, strings, trees,
graphs. . . ), FCA is then generalized into pattern structures [3].
Definition 3. A pattern structure is a triple (G, (D, u), δ), where G is a set of
objects, (D, u) is a complete meet-semilattice of descriptions, and δ : G → D
maps an object to a description.
    The operator u is a similarity operator that returns the common elements
to two descriptions. A description can be a set, a sequence, or other complex
structure. In the binary case where descriptions are sets, u corresponds to set
intersection (∩), i.e. {a, b, c} u {a, b, d} = {a, b}, and v corresponds to subset in-
clusion (⊆). The order between any two descriptions is given by the subsumption
relation:
                               d1 v d2 ⇐⇒ d1 u d2 = d1
Definition 4. The Galois connection for a pattern structure (G, (D, u), δ) is
defined by:
    A = g∈A δ(g), A ⊆ G,
          F
    d = {g ∈ G|d v δ(g)}, d ∈ D.
    A pattern concept is a pair (A, d), A ⊆ G and d ∈ D, where A = d and
 
d = A.


                                          m1 m2 m3
                                     g1   5 7 6
                                     g2   6 8 4
                                     g3   4 8 5
                                     g4   4 9 8
                                     g5   5 8 5
                                     gt   4 7 ?

        Table 1. A numerical dataset with 6 objects (one as a target object).



    Interval pattern structures (ip-structures) is one of possible extensions of
FCA to study more complex descriptions. In ip-structures, the description of an
object is an interval for each attribute. Consider the dataset given in Table 1
where G = {g1 , g2 , g3 } is the set of objects and M = {m1 , m2 , m3 , m4 } is the set
of attributes. Here the description of g1 is δ(g1 ) = h[5, 5], [7, 7], [6, 6]i, while the
description of g4 is δ(g4 ) = h[4, 4], [9, 9], [8, 8]i.
    The similarity operator u is the smallest interval containing both descriptions
in each attribute. Therefore, δ(g1 ) u δ(g4 ) = h[4, 5], [7, 9], [6, 8]i. A description
with larger interval is “subsumed by” (v) descriptions with smaller ones. The
corresponding Galois connection is illustrated as follows:

                         {g1 , g4 } = δ(g1 ) u δ(g4 )
                                      = h[4, 5], [7, 9], [6, 8]i
                                  
             h[4, 5], [7, 9], [6, 8]i = {g ∈ G|h[4, 5], [7, 9], [6, 8]i v δ(g)}
                                      = {g1 , g4 }

   Similar to formal concept, interval pattern concept (ip-concept) can also be
arranged in a lattice. The ip-concept lattice for Table 1 is illustrated in Figure 3.
For further background concerning ip-structures, see [5].




       Fig. 3. Lattice of interval pattern concept for g1 , g2 , and g3 in Table 1.




2.2   Gradual Pattern Mining
In a numerical dataset, a gradual pattern is a pattern that is observed from at
least two objects across at least two attributes. Usually this pattern is described
as a correlation between two (or more) attributes: “the more/less A corresponds
to the more/less B”. This type of pattern was originally proposed in [2] and
studied by [7] as an instrument to aid fuzzy controllers.
     Consider again the matrix in Table 1. This numerical matrix can be trans-
formed into a sign matrix as shown in Table 2. It contains information about
the attribute comparison between any pair of objects. For example, pair (g1 , g2 )
has ‘%’ in m1 because according to this attribute, g1 < g2 . Then, from this sign
matrix, we can find a coherent-sign-changes bicluster (for detailed explanation
about biclustering, please refer to [6]). One of such bicluster is ({p3 , p5 , p6 , p7 ,
p8 , p9 , p10 }, {m1 , m2 , m3 }). This bicluster represent the pattern “the more/less
m1 , the less/more m2 , the less/more m3 (the ‘=’ can be regarded as either ‘&’
or ‘%’).


                                     Pair         m1 m2 m3
                                 p1 = (g1 , g2 ) % % &
                                 p2 = (g1 , g3 ) & % &
                                 p3 = (g1 , g4 ) & % %
                                 p4 = (g1 , g5 ) = % &
                                 p5 = (g2 , g3 ) & = %
                                 p6 = (g2 , g4 ) & % %
                                 p7 = (g2 , g5 ) & = %
                                 p8 = (g3 , g4 ) = % %
                                 p9 = (g3 , g5 ) % = =
                                 p10 = (g4 , g5 ) % & &

                            Table 2. Sign matrix for Table 1.




3     Proposed Approach

3.1    Recommendation Based on Interval Pattern Structures

Table 1 can be regarded as a rating matrix, where G is the set of users and M
is the set of movies. Here we have a target user gt for which we will estimate
his/her rating for movie m3 . Based on this estimation, we can decide whether
we recommend m3 to gt .
    To do that we can traverse the lattice in Figure 3 from the top node. Here
we search the concept(s) with the most specific interval that contains m1 : 4 and
m2 : 7. We will arrive at concept ({g1 , g3 , g5 }, h[4, 5], [7, 8], [5, 6]i). Therefore, we
can estimate that gt will likely give a rating between 5 and 6 for m3 .


3.2    Recommendation Based on Gradual Pattern Mining

For explaining this approach, we will also use the example in Table 1 as user–
movie rating matrix. The user gt has rated the movie m1 and m2 , and his/her
rating for m3 is to be estimated. Therefore, from the sign matrix in Table 2, we
have to find the largest bicluster that contains m1 and m2 .
    Suppose that the largest bicluster is ({p3 , p5 , p6 , p7 , p8 , p9 , p10 }, {m1 , m2 , m3 })
that represents the pattern R1 “the more/less m1 , the less/more m2 , the less/more
m3 . Then, we have to compare the ratings from gt to the ratings from every other
users, as shown in Table 3. Here the pairs that follow R1 are pa , pc , and pd , and
the estimation of m3 ’s rating from gt is in the range [6,8].
                            Pairs        m1 m2 m3 estimation
                         pa = (g1 , gt ) & =        ≥6
                         pb = (g2 , gt ) & &         –
                         pc = (g3 , gt ) = &        ≤5
                         pd = (g4 , gt ) = &        ≤8
                         pe = (g5 , gt ) & &         –

            Table 3. Comparison of gt and every other users in Table 1.


4    Methodology
Our research problem is to build a more sophisticated collaborative recommen-
dation system for visitors in a museum. The problem is then formulated as rating
prediction, i.e. predicting the given rating from a new user based on patterns
studied from previous users. We focus on whether FCA and pattern structures
are applicable to our problem. In literature, certain types of pattern structures
have been studied to solve the task of recommendation. On the other hand, the
applicability of interval pattern structures for this task is not yet explored, and
this becomes our objective.
    Finally, the proposed recommendation system is evaluated based on the ac-
curacy of recommended items. Given that the CrossCult project doesn’t have
any substantial visitor–item rating dataset, the approaches will be tested on a
movie dataset.
    Currently, the authors work on gradual pattern mining using coherent-sign-
changes biclustering. In order to take into account the ‘=’ sign, we will apply
the interval pattern structures.


Acknowledgments
The thesis of the author is financed by the Région Grand-Est and the Euro-
pean project CrossCult (http://www.crosscult.eu/), under the supervision
of Amedeo Napoli, Chedy Raı̈ssi, and Miguel Couceiro.


References
1. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender sys-
   tems: A survey of the state-of-the-art and possible extensions. IEEE transactions
   on knowledge and data engineering 17(6), 734–749 (2005)
2. Dubois, D., Prade, H.: Gradual inference rules in approximate reasoning. Informa-
   tion Sciences 61(1-2), 103–122 (1992)
3. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Interna-
   tional Conference on Conceptual Structures. pp. 129–142. Springer (2001)
4. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations.
   Springer-Verlag, 2nd edn. (1999)
5. Kaytoue, M.: Traitement de données numériques par analyse formelle de concepts
   et structures de patrons. Ph.D. thesis, Université de Lorraine (2011)
6. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis:
   a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics
   (TCBB) 1(1), 24–45 (2004)
7. Subašić, P., Hirota, K.: Similarity rules and gradual rules for analogical and in-
   terpolative reasoning with imprecise data. Fuzzy Sets and Systems 96(1), 53–75
   (1998)