    Building a Representation Context Based on
         Attribute Exploration Algorithms

 Jaume Baixeries1, Victor Codocedo2, Mehdi Kaytoue3, and Amedeo Napoli4

     1 Computer Science Department, Universitat Politècnica de Catalunya, 08032 Barcelona, Catalonia
     2 Departamento de Informática, Universidad Técnica Federico Santa María, Campus San Joaquín, Santiago de Chile
     3 Infologic R&D, 99 avenue de Lyon, 26500 Bourg-Lès-Valence, France
     4 Université de Lorraine, CNRS, Inria, LORIA, 54000 Nancy, France

                  Corresponding author: victor.codocedo@inf.utfsm.cl



          Abstract. In this paper we discuss the problem of constructing the
          representation context of a pattern structure. First, we present a naive
          algorithm, a variation of attribute exploration, that computes the
          representation context of a pattern structure. Then, we study a sampling
          technique for reducing the size of the representation context. Finally,
          we show how to build such a reduced context and we discuss the possible
          ways of doing it.


1       Introduction

Formal Concept Analysis (FCA) has dealt with non-binary data mainly in two
different ways. One is by scaling [9] the original dataset into a formal context.
This method has some drawbacks, as it generates a formal context with larger
dimensions than the original dataset and, depending on the scaling method used,
some information may be lost. Another way to deal with non-binary data is by
generalizing FCA into “pattern structures” [10,15], which allow handling complex
data directly. Thus, instead of analysing a binary relation between some objects
and their attributes, one analyses a relation between objects and their
representations, the values of which form a meet-semilattice. Pattern structures
have proved useful for different unrelated tasks, for instance to analyze gene
expressions [14,13], compute database dependencies [1,2] or compute biclusters [6],
and they can handle structured data such as trees, intervals, graphs, sequences,
fuzzy and heterogeneous data, among others [12].
    Any pattern structure can be represented by an equivalent formal context,
which is consequently called a “representation context” [10,4]. By equivalent we
mean that there is a bijection between the intents in the representation context
and the pattern-intents in the pattern structure. In fact, this means that both
contain the same set of extents.
    Pattern structures are difficult to calculate precisely because of the complex
nature of the object representations they model. Often, pattern concept lattices

are very large and their associated sets of pattern concepts are difficult to han-
dle. In this article we tackle this problem by mining a reduced “representation
context” of pattern structures by means of irreducible patterns (IPs) [7,11]. IPs
act as a basis –in the linear-algebraic sense– of the space of patterns, i.e. each
pattern can be represented as a linear combination –the join ⊔– of a subset of IPs.
In the standard FCA notation, IPs correspond to the intents of ∧-irreducible
concepts. More specifically, we propose an algorithm that calculates the extents
of a pattern structure while simultaneously calculating a small representation
context. The latter allows for a cheap calculation of most pattern extents and is
incrementally completed with samples obtained from the actual pattern structure.
The algorithm is based on attribute exploration [8] (object exploration in this
case) which, instead of a domain expert, uses an oracle to validate the question
“is the set X an extent of the pattern structure?”. In the negative case, the oracle
should be able to sample a new attribute to add to the representation context,
heuristically selected to be an IP (or close to it).
     We first present a naive algorithm based on NextClosure [8] and attribute
exploration techniques. Then, we present an alternative way to “call the oracle”
with a sampling strategy which strongly depends on the nature of the object
descriptions. We show that the sampling strategy is able to obtain very small
representation contexts and that, in some cases, the representation contexts it is
most likely to generate contain irreducible attributes only.
     This work mainly relies on studies about the formalization of representation
contexts such as [10] and [4], and on the book [8]. Moreover, a short version of
this paper was published in the proceedings of the ICFCA 2019 Conference [5].


2   Notation and Background

In the following we introduce some definitions needed for the development of
this article. The notations used are based on [9]. A formal context (G, M, I) is a
triple where G is a set of objects, M is a set of attributes and I ⊆ G × M is an
incidence relation where (g, m) ∈ I denotes that “object g has the attribute m”.
The derivation operators are denoted as (·)′ : ℘(G) → ℘(M) and (·)′ : ℘(M) → ℘(G).
    A many-valued context M = (G, N, W, J) is a data table where, in addition to
G and N, we define a set of values Wn for each attribute n ∈ N (where
W = ∪{Wn | n ∈ N}) such that n(g) = w denotes that “the value of the attribute n
for the object g is w” (with w ∈ Wn). Additionally, in this document we will
consider each Wn as an ordered set where Wn^i denotes the i-th element in the set.
An example of a many-valued context can be found in Table 1 where a(r1) = 1,
Wb = {1, 2} and Wb^2 = 2.
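
    For concreteness, the following short Python sketch (ours, not part of the
original formalization) encodes the many-valued context of Table 1 as a nested
dictionary and recovers the ordered value sets Wn; the names table1 and W1 are
our own.

# Table 1 as a many-valued context: object -> column -> value (our encoding).
table1 = {
    "r1": {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1},
    "r2": {"a": 2, "b": 1, "c": 1, "d": 1, "e": 1},
    "r3": {"a": 3, "b": 1, "c": 2, "d": 2, "e": 2},
    "r4": {"a": 3, "b": 2, "c": 3, "d": 2, "e": 2},
    "r5": {"a": 3, "b": 1, "c": 2, "d": 3, "e": 2},
    "r6": {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2},
    "r7": {"a": 1, "b": 1, "c": 2, "d": 4, "e": 2},
}

# Wn: the ordered set of values of column n.
W1 = {n: sorted({row[n] for row in table1.values()}) for n in "abcde"}

assert table1["r1"]["a"] == 1      # a(r1) = 1
assert W1["b"] == [1, 2]           # Wb = {1, 2}
assert W1["b"][1] == 2             # Wb^2 = 2 (second element, 1-indexed in the text)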
    The pattern structure framework is a generalization of FCA [10] where ob-
jects may have complex descriptions. A pattern structure is a triple (G, (D, ⊓), δ)
where G is a set of objects, (D, ⊓) is a meet-semilattice of complex object
descriptions, and δ is a function that assigns to each object in G a description
in D. The derivation operators of a pattern structure are denoted as
(·)□ : ℘(G) → D and (·)□ : D → ℘(G):

                        A□ = ⊓{δ(g) | g ∈ A}          for A ⊆ G

                        d□ = {g ∈ G | d ⊑ δ(g)}       for d ∈ D
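
    For instance, in the interval pattern structures [14] used in the examples of
Sections 3 and 4, a description is a vector of intervals, ⊓ is the componentwise
convex hull, and d1 ⊑ d2 holds when every interval of d1 contains the corresponding
interval of d2. The following Python sketch (ours, building on the table1 encoding
above) illustrates both derivation operators under these assumptions.

# Interval pattern structure derivations (our sketch, following [14]).
# A description is a dict column -> (lo, hi); delta(g) uses degenerate intervals.

def delta(table, g):
    return {n: (v, v) for n, v in table[g].items()}

def meet(d1, d2):
    # d1 ⊓ d2: componentwise convex hull of the intervals (the similarity).
    return {n: (min(d1[n][0], d2[n][0]), max(d1[n][1], d2[n][1])) for n in d1}

def subsumed(d1, d2):
    # d1 ⊑ d2: every interval of d1 contains the corresponding interval of d2.
    return all(d1[n][0] <= d2[n][0] and d2[n][1] <= d1[n][1] for n in d1)

def pattern_of(table, A):
    # A□ = ⊓{δ(g) | g ∈ A}  (assumes A is non-empty)
    d = None
    for g in A:
        d = delta(table, g) if d is None else meet(d, delta(table, g))
    return d

def extent_of(table, d):
    # d□ = {g ∈ G | d ⊑ δ(g)}
    return {g for g in table if subsumed(d, delta(table, g))}

    For example, pattern_of(table1, {"r4", "r7"}) yields the description
⟨[1, 3], [1, 2], [2, 3], [2, 4], [2, 2]⟩, and applying extent_of to it returns
{r3, r4, r5, r6, r7}, as in row 8 of Table 3.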

    Pattern structures come in different flavors depending on the nature of the
representation for which they are intended. Regardless of the nature of data,
any pattern structure can be represented with a formal context, as the next
definition shows:

Definition 1 ([10]). Let (G, (D, ⊓), δ) be a pattern structure, and let (G, M, I) be
a context such that M ⊆ D and (g, m) ∈ I ⇐⇒ m ⊑ δ(g).
   If every intent of (G, (D, ⊓), δ) is of the form

                        ⊔X = ⊓{d ∈ D | (∀x ∈ X) x ⊑ d}

    for some X ⊆ M (i.e. M is ⊔-dense in (D, ⊓)), then (G, M, I) is a “representa-
tion context” of (G, (D, ⊓), δ).
    The condition that M is ⊔-dense in (D, ⊓) means that any element in D can be
represented (uniquely) as the meet of the filters of a set of descriptions X ⊆ M
in (D, ⊓).
    Representation contexts yield concept lattices that are isomorphic to the
pattern concept lattices of their corresponding pattern structures. Moreover,
there is a bijection between the intents in the representation context and the
pattern-intents in the pattern structure. For any pattern-intent d ∈ D in the
pattern structure we have an intent X ⊆ M in the representation context such
that d = ⊔X and X = ↓d ∩ M, where ↓d is the ideal of d in (D, ⊓).


3     Computing a Representation Context: a First and
      Naive Approach

3.1   The NaiveRepContext Algorithm

We first present a simple and naive method for building a representation context
from a given pattern structure. Algorithm 1 shows this method in the form of a
procedure called NaiveRepContext. This procedure is based on the standard
NextClosure algorithm [8], to which some changes –marked with an asterisk–
have been made. These changes represent the interaction of the algorithm
with the oracle.
    The algorithm receives as inputs a copy of a many-valued context M =
(G, N, W, J) and implementations of both derivation operators (·)□ corresponding
to the pattern structure (G, (D, ⊓), δ) defined over M. To distinguish the
attributes in the many-valued context from those in the representation context
created by Algorithm 1, we will refer to N as a set of columns in the many-valued
context.
    The algorithm starts by building the representation context K. The set of
objects is the same set of objects as in the pattern structure, while the set of
attributes and the incidence relation are initially empty. Line 12 checks whether
a set of objects (B′′) in the representation context is an extent in the pattern
structure (i.e. whether B′′ = B□□). If this is the case, the algorithm continues
executing NextClosure. Otherwise, the algorithm adds to the representation
context a new attribute corresponding to the pattern associated with the
mismatching closure. It also adds the corresponding object-attribute pairs to the
incidence relation, i.e. (h, B□) for h ∈ B□□, as defined in line 14. Line 24 returns
the calculated representation context.


Algorithm 1 A Naive Calculation of the Representation Context.
 1: procedure NaiveRepContext(M, (·)□ , (·)□ )                          . M = (G, N, W, J)
 2:    M←∅
 3:    I←∅
 4:    K ← (G, M, I)
 5:    A←∅
 6:    while A ≠ G do
 7:       for g ∈ G in reverse order do
 8:           if g ∈ A then
 9:               A ← A \ {g}
10:            else
11:                B ← A ∪ {g}
12:                if B′′ ≠ B□□ then                                             . (*)
13:                    M ← M ∪ {B□ }                                               . (*)
14:                    I ← I ∪ {(h, B□ ) | h ∈ B□□ }                              . (*)
15:                end if
16:                if B′′ \A contains no element < g then
17:                    A ← B′′
18:                    Exit For
19:                end if
20:            end if
21:        end for
22:        Output A
23:    end while
24:    return K
25: end procedure
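
The following Python rendering of Algorithm 1 is a sketch of ours: it reuses
pattern_of and extent_of from the sketch in Section 2, stores representation-context
attributes as (pattern, extent) pairs, and already includes the cheap pre-test
discussed in Section 3.2 (the pattern structure is only queried when B differs
from B′′).

def naive_rep_context(table):
    # Our sketch of Algorithm 1; the incidence relation is implicit in the extents.
    G = sorted(table)
    pos = {g: i for i, g in enumerate(G)}       # lectic order of objects
    attrs = []                                  # M, as (pattern, extent) pairs
    extents = []                                # the extents output in lectic order

    def ctx_closure(B):
        # B'' in the current representation context: intersect the extents of
        # all attributes whose extent contains B (G if there is none).
        closed = set(G)
        for _, ext in attrs:
            if B <= ext:
                closed &= ext
        return closed

    A = set()
    while A != set(G):
        for g in reversed(G):
            if g in A:
                A.discard(g)
                continue
            B = A | {g}
            closed = ctx_closure(B)
            if closed != B:                     # cheap pre-test (Section 3.2)
                pat = pattern_of(table, B)      # B□   (oracle call)
                ext = extent_of(table, pat)     # B□□  (oracle call)
                if closed != ext:               # line 12: B'' ≠ B□□
                    attrs.append((pat, ext))    # lines 13-14: add the attribute
                    closed = set(ext)
            # canonicity test of NextClosure
            if not any(pos[h] < pos[g] for h in closed - A):
                A = set(closed)
                break
        extents.append(frozenset(A))
    return attrs, extents

On the encoding table1 of Table 1, this sketch should add exactly one attribute per
False row of Table 3, i.e. it should reproduce the 26 attributes of Table 4.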




Proposition 1.
Algorithm 1 computes a representation context (G, M, I) of (G, (D, ⊓), δ).

Proof. We show that (G, M, I) meets the conditions in Definition 1. Similarly to
NextClosure, Algorithm 1 enumerates all closures –given an arbitrary closure
operator– in lectic order. However, Algorithm 1 uses two different closure
operators, namely the standard closure operator of FCA defined over the
representation context under construction, (·)′′, and the one defined by the two
derivation operators of the pattern structure, i.e. (·)□□. Both closure operators
are made to coincide by the new instructions in the algorithm. When this is not
the case (i.e. B′′ ≠ B□□ for a given B ⊆ G), a new attribute is added to the
representation context in the shape of B□. Additionally, the pair (h, B□) is added
to the incidence relation of the representation context for all objects h ∈ B□□.
This in turn ensures that B′′ = B□□ holds in the modified representation context.
    A consequence of the equality B′′ = B□□ is that the set of extents in the
representation context is the same as the set of extents in the pattern structure.
This also means that there is a one-to-one correspondence between the intents in
the representation context and the patterns in the pattern structure, i.e. for any
extent B′′ = B□□, the intent B′′′ = B′ corresponds to the pattern B□□□ = B□
(B′′′ = B′ and B□□□ = B□ are properties of the derivation operators [9]). Thus,
any element in D which can be represented as B□ for an arbitrary B ⊆ G is of the
form ⊔B′ or ⊓{d ∈ D | (∀m ∈ B′) m ⊑ d}. Consequently, M is ⊔-dense in (D, ⊓)
and (G, M, I) is a representation context of the pattern structure (G, (D, ⊓), δ). ∎


3.2   An Example of Execution

Table 3 shows the execution trace of NaiveRepContext over an interval pattern
structure defined over the many-valued formal context in Table 1. The columns in
Table 3 are: row number, a candidate set B in Algorithm 1, its closure in the
representation context, its closure in the pattern structure, the result of the test
in line 12 of Algorithm 1, and the pattern-intent B□. Notice that there are 26
entries in the last column of the table; they correspond to the 26 different
attributes of the resulting representation context in Table 4. The latter are
enumerated in the order they appear in Table 3.
    Examining the first row in Table 3, we observe that the algorithm initially
calculates the closure of the set {r7}. At this point the representation context is
empty, so the closure of {r7} corresponds exactly to G. However, using the pattern
structure we find out that {r7}□□ = {r7} (the test B′′ = B□□ fails) and we need to
add something to the representation context. Algorithm 1 adds to M the attribute
corresponding to {r7}□, which is labelled as attribute d1 in the representation
context of Table 4 (thus ensuring that {r7}′′ = {r7}□□). The procedure is the same
for the following sets in the lectic enumeration until we calculate the closure
of {r3}.
    Two important things occur at this point: firstly, {r3}□□ = {r3}; secondly,
there is enough information in the representation context so that {r3}′′ = {r3} as
well. Consequently, no new attribute is added to the representation context.
    Let us discuss this example in more depth. The representation context at
this point is formed by the same elements in Table 4 truncated at column d10.
At this point, we should observe that {r3}□ = ⊔{r3}′ = ⊔{d6, d8, d9, d10}.
    Another important observation is that we do not really need to verify in the
pattern structure that {r3}′′ = {r3}□□. Because closure is an extensive operation
(B ⊆ B′′ and B ⊆ B□□), at any point in the execution of Algorithm 1 we have
that B ⊆ B□□ ⊆ B′′. Thus, B = B′′ =⇒ B = B□□. This is indeed quite
important since, as shown in Table 3, B′′ = B□□ is true more often than not
(in this example, 38 times out of 64). By avoiding the calculation of B□□ within
the pattern structure, i.e. by working in the representation context instead, we
avoid the costly calculation of pattern similarities and subsumptions.
      Table 1: Example Dataset 1            Table 2: Example Dataset 2
                   a b c d e                               a b c d
              r1   1 1 1 1 1                          r1   1 4 3 2
              r2   2 1 1 1 1                          r2   2 1 4 3
              r3   3 1 2 2 2                          r3   3 2 1 4
              r4   3 2 3 2 2                          r4   4 3 2 1
              r5   3 1 2 3 2
              r6   1 1 2 2 2
              r7   1 1 2 4 2



4     Computing Smaller Representation Contexts

4.1   Extending the NaiveRepContext Algorithm with Sampling

Algorithm 1 is able to calculate the representation context of the interval pattern
structure created for the many-valued context in Table 1, although rather
inefficiently, since it creates a different attribute for almost every interval
pattern in the pattern structure. As previously described, Algorithm 1 avoids
creating attributes for some interval patterns when there is enough information in
the partial representation context. More formally, it does not include a new
attribute d in the representation context if and only if there exists a set Y ⊆ M
such that d = ⊔Y or, equivalently, d□ = ⋂{m□ | m ∈ Y}. Knowing this, we are
interested in examining whether there is a way to calculate a smaller
representation context given an interval pattern structure definition.
    Table 6 shows the execution trace of Algorithm 1 over the interval pattern
structure defined over Table 2. We can observe that the algorithm generates 14
different attributes in the representation context, one for each interval pattern
concept except for the top and bottom concepts. Notice that this pattern struc-
ture contains 2^4 = 16 interval pattern concepts and, since it has 4 objects, we
can conclude that its associated concept lattice is Boolean.
    This example is interesting because it is a worst-case scenario for Algorithm 1.
In fact, it generated the largest possible representation context for the interval
pattern structure derived from Table 2. Moreover, it verified each extent closure
in the pattern structure. This example is even more interesting considering that
for such an interval pattern structure there is a known smallest representation
context which corresponds to a contranominal scale [8] that for this example
would contain only 4 attributes as shown in Table 5.
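
Both observations can be checked mechanically. The sketch below (ours, reusing
pattern_of and extent_of from Section 2) encodes Table 2, verifies that every
non-empty set of objects is an interval pattern extent (so the concept lattice is
Boolean), and verifies that the four attributes Xi = G \ {ri} of Table 5 already
close every subset of G.

from itertools import combinations

# Table 2 as a many-valued context (our encoding).
table2 = {
    "r1": {"a": 1, "b": 4, "c": 3, "d": 2},
    "r2": {"a": 2, "b": 1, "c": 4, "d": 3},
    "r3": {"a": 3, "b": 2, "c": 1, "d": 4},
    "r4": {"a": 4, "b": 3, "c": 2, "d": 1},
}
G2 = sorted(table2)

# (a) every non-empty subset of G is an interval pattern extent.
assert all(extent_of(table2, pattern_of(table2, set(B))) == set(B)
           for k in range(1, 5) for B in combinations(G2, k))

# (b) the contranominal scale Xi = G \ {ri} (Table 5) closes every subset of G.
contranominal = [set(G2) - {g} for g in G2]

def closure(B):
    closed = set(G2)
    for ext in contranominal:
        if set(B) <= ext:
            closed &= ext
    return closed

assert all(closure(B) == set(B) for k in range(5) for B in combinations(G2, k))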
    To make matters worse, we can show that Algorithm 1 would behave in the same
way for an interval pattern structure with an associated Boolean concept lattice of
any size. Actually, this is a consequence of the lectic enumeration of object sets
performed by Algorithm 1, which implies that whenever B0 ⊆ B1 (with B0, B1
closed sets in G), B0 is enumerated before B1. Since a pattern-intent B1□ is not
included in the representation context only when there exists a set Y ⊆ M such
that ⋂{m□ | m ∈ Y} = B1, we have that (∀m ∈ Y) B1 ⊆ m□. Consequently, B1□
would not be included in the representation context only if all the extents m□
with m ∈ Y had already been enumerated, which cannot be the case because of the
lectic enumeration.
             Table 3: Execution Trace of NaiveRepContext over Table 1

                B                          B′′                      B□□               B′′ = B□□                    B□
 1              r7                           G                          r7                F alse   h[1, 1], [1, 1], [2, 2], [4, 4], [2, 2]i
 2              r6                           G                          r6                F alse   h[1, 1], [1, 1], [2, 2], [2, 2], [2, 2]i
 3           r6 , r 7                        G                       r6 , r 7             F alse   h[1, 1], [1, 1], [2, 2], [2, 4], [2, 2]i
 4              r5                           G                          r5                F alse   h[3, 3], [1, 1], [2, 2], [3, 3], [2, 2]i
 5           r5 , r 7                        G                       r5 , r 7             F alse   h[1, 3], [1, 1], [2, 2], [3, 4], [2, 2]i
 6           r5 , r 6                        G                    r3 , r 5 , r 6          F alse   h[1, 3], [1, 1], [2, 2], [2, 3], [2, 2]i
 7              r4                           G                          r4                F alse   h[3, 3], [2, 2], [3, 3], [2, 2], [2, 2]i
 8           r4 , r 7                        G               r3 , r4 , r5 , r6 , r7       F alse   h[1, 3], [1, 2], [2, 3], [2, 4], [2, 2]i
 9           r4 , r 6            r3 , r4 , r5 , r6 , r7           r3 , r 4 , r 6          F alse   h[1, 3], [1, 2], [2, 3], [2, 2], [2, 2]i
10           r4 , r 5            r3 , r4 , r5 , r6 , r7           r3 , r 4 , r 5          F alse   h[3, 3], [1, 2], [2, 3], [2, 3], [2, 2]i
11              r3                          r3                          r3                 T rue                      -
12           r3 , r 7            r3 , r4 , r5 , r6 , r7         r3 , r 5 , r 6 , r 7      F alse   h[1, 3], [1, 1], [2, 2], [2, 4], [2, 2]i
13           r3 , r 6                    r3 , r 6                    r3 , r 6              T rue                      -
14        r3 , r 6 , r 7            r3 , r 5 , r 6 , r 7        r3 , r 5 , r 6 , r 7       T rue                      -
15           r3 , r 5                    r3 , r 5                    r3 , r 5              T rue                      -
16        r3 , r 5 , r 7            r3 , r 5 , r 6 , r 7        r3 , r 5 , r 6 , r 7       T rue                      -
17        r3 , r 5 , r 6              r3 , r 5 , r 6              r3 , r 5 , r 6           T rue                      -
18      r3 , r 5 , r 6 , r 7        r3 , r 5 , r 6 , r 7        r3 , r 5 , r 6 , r 7       T rue                      -
19           r3 , r 4                    r3 , r 4                    r3 , r 4              T rue                      -
20        r3 , r 4 , r 7         r3 , r4 , r5 , r6 , r7      r3 , r4 , r5 , r6 , r7        T rue                      -
21        r3 , r 4 , r 6              r3 , r 4 , r 6              r3 , r 4 , r 6           T rue                      -
22      r3 , r 4 , r 6 , r 7     r3 , r4 , r5 , r6 , r7      r3 , r4 , r5 , r6 , r7        T rue                      -
23        r3 , r 4 , r 5              r3 , r 4 , r 5              r3 , r 4 , r 5           T rue                      -
24      r3 , r 4 , r 5 , r 7     r3 , r4 , r5 , r6 , r7      r3 , r4 , r5 , r6 , r7        T rue                      -
25      r3 , r 4 , r 5 , r 6     r3 , r4 , r5 , r6 , r7         r3 , r 4 , r 5 , r 6      F alse   h[1, 3], [1, 2], [2, 3], [2, 3], [2, 2]i
26 r3 , r4 , r5 , r6 , r7        r3 , r4 , r5 , r6 , r7      r3 , r4 , r5 , r6 , r7        T rue                      -
27              r2                           G                          r2                F alse   h[2, 2], [1, 1], [1, 1], [1, 1], [1, 1]i
28           r2 , r 7                        G                  r1 , r 2 , r 6 , r 7      F alse   h[1, 2], [1, 1], [1, 2], [1, 4], [1, 2]i
29           r2 , r 6               r1 , r 2 , r 6 , r 7          r1 , r 2 , r 6          F alse   h[1, 2], [1, 1], [1, 2], [1, 2], [1, 2]i
30           r2 , r 5                        G                    r2 , r 3 , r 5          F alse   h[2, 3], [1, 1], [1, 2], [1, 3], [1, 2]i
31           r2 , r 4                        G                    r2 , r 3 , r 4          F alse   h[2, 3], [1, 2], [1, 3], [1, 2], [1, 2]i
32           r2 , r 3                    r2 , r 3                    r2 , r 3              T rue                      -
33        r2 , r 3 , r 7                     G             r1 , r2 , r3 , r5 , r6 , r7    F alse   h[1, 3], [1, 1], [1, 2], [1, 4], [1, 2]i
34        r2 , r 3 , r 6       r1 , r2 , r3 , r5 , r6 , r7      r1 , r 2 , r 3 , r 6      F alse   h[1, 3], [1, 1], [1, 2], [1, 2], [1, 2]i
35        r2 , r 3 , r 5              r2 , r 3 , r 5              r2 , r 3 , r 5           T rue                      -
36      r2 , r 3 , r 5 , r 7   r1 , r2 , r3 , r5 , r6 , r7 r1 , r2 , r3 , r5 , r6 , r7     T rue                      -
37      r2 , r 3 , r 5 , r 6   r1 , r2 , r3 , r5 , r6 , r7 r1 , r2 , r3 , r5 , r6         F alse   h[1, 3], [1, 1], [1, 2], [1, 3], [1, 2]i
38        r2 , r 3 , r 4              r2 , r 3 , r 4              r2 , r 3 , r 4           T rue                      -
39      r2 , r 3 , r 4 , r 7                 G                           G                 T rue                      -
40      r2 , r 3 , r 4 , r 6                 G               r1 , r2 , r3 , r4 , r6       F alse   h[1, 3], [1, 2], [1, 3], [1, 2], [1, 2]i
41      r2 , r 3 , r 4 , r 5                 G                  r2 , r 3 , r 4 , r 5      F alse   h[2, 3], [1, 2], [1, 3], [1, 3], [1, 2]i
42 r2 , r3 , r4 , r5 , r7                    G                           G                 T rue                      -
43 r2 , r3 , r4 , r5 , r6                    G             r1 , r2 , r3 , r4 , r5 , r6    F alse   h[1, 3], [1, 2], [1, 3], [1, 3], [1, 2]i
44              r1                    r1 , r 2 , r 6                    r1                F alse   h[1, 1], [1, 1], [1, 1], [1, 1], [1, 1]i
45           r1 , r 7               r1 , r 2 , r 6 , r 7          r1 , r 6 , r 7          F alse   h[1, 1], [1, 1], [1, 2], [1, 4], [1, 2]i
46           r1 , r 6                    r1 , r 6                    r1 , r 6              T rue                      -
47        r1 , r 6 , r 7              r1 , r 6 , r 7              r1 , r 6 , r 7           T rue                      -
48           r1 , r 5            r1 , r2 , r3 , r5 , r6      r1 , r2 , r3 , r5 , r6        T rue                      -
49           r1 , r 4            r1 , r2 , r3 , r4 , r6      r1 , r2 , r3 , r4 , r6        T rue                      -
50           r1 , r 3               r1 , r 2 , r 3 , r 6        r1 , r 2 , r 3 , r 6       T rue                      -
51           r1 , r 2                 r1 , r 2 , r 6                 r1 , r 2             F alse   h[1, 2], [1, 1], [1, 1], [1, 1], [1, 1]i
52        r1 , r 2 , r 7            r1 , r 2 , r 6 , r 7        r1 , r 2 , r 6 , r 7       T rue                      -
53        r1 , r 2 , r 6              r1 , r 2 , r 6              r1 , r 2 , r 6           T rue                      -
54      r1 , r 2 , r 6 , r 7        r1 , r 2 , r 6 , r 7        r1 , r 2 , r 6 , r 7       T rue                      -
55        r1 , r 2 , r 5         r1 , r2 , r3 , r5 , r6      r1 , r2 , r3 , r5 , r6        T rue                      -
56        r1 , r 2 , r 4         r1 , r2 , r3 , r4 , r6      r1 , r2 , r3 , r4 , r6        T rue                      -
57        r1 , r 2 , r 3            r1 , r 2 , r 3 , r 6        r1 , r 2 , r 3 , r 6       T rue                      -
58 r1 , r2 , r3 , r6 , r7 r1 , r2 , r3 , r5 , r6 , r7 r1 , r2 , r3 , r5 , r6 , r7          T rue                      -
59      r1 , r 2 , r 3 , r 5     r1 , r2 , r3 , r5 , r6      r1 , r2 , r3 , r5 , r6        T rue                      -
60 r1 , r2 , r3 , r5 , r6 , r7 r1 , r2 , r3 , r5 , r6 , r7 r1 , r2 , r3 , r5 , r6 , r7     T rue                      -
61      r1 , r 2 , r 3 , r 4     r1 , r2 , r3 , r4 , r6      r1 , r2 , r3 , r4 , r6        T rue                      -
62 r1 , r2 , r3 , r4 , r6 , r7               G                           G                 T rue                      -
63 r1 , r2 , r3 , r4 , r5 r1 , r2 , r3 , r4 , r5 , r6 r1 , r2 , r3 , r4 , r5 , r6          T rue                      -
64               G                           G                           G                 T rue                      -
 Table 4: Representation Context of the Interval Pattern Structure of Table 1

      d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15 d16 d17 d18 d19 d20 d21 d22 d23 d24 d25 d26
   r1                                             × ×             × × × ×             × × × ×
   r2                                         × × × × × × × × × × ×                               ×
   r3                × ×× × × ×                           × × × × × × × ×
   r4                   ××× ×             ×                   ×               × × ×
   r5          ××× × × × ×                                ×       ×       ×       × ×
   r6 × ×            × ××             × ×         × ×             × × × ×             ×       ×
   r7 × × ×                ×          ×           ×               ×                           ×


              Table 5: Contranominal scale generated by Algorithm 3

                                                      X1 X2 X3 X4
                                                 r1          ×      ×    ×
                                                 r2    ×            ×    ×
                                                 r3    ×     ×           ×
                                                 r4    ×     ×      ×




    Given a pattern structure with an associated Boolean concept lattice, we
would ideally like Algorithm 1 to add only irreducible attributes to the
representation context. In this way we would maintain the property that M is
⊔-dense in (D, ⊓) with a minimum number of attributes. Table 6 shows in gray those
rows that represent the relevant attributes to add to the representation context.
Adding only these four attributes would be enough to represent the set of interval
pattern concepts. Nevertheless, algorithms that compute the irreducible attributes
of a closure system have exponential complexity in the size of its formal
context [3]. Instead, we resort to a sampling-based strategy to retrieve attributes
with a large image in G.



         Table 6: Execution Trace of NaiveRepContext over Table 2

                     B                B′′             B□□           B′′ = B□□                 B□
          1        r4         r1 , r 2 , r 3 , r 4      r4              F alse    h[4, 4], [3, 3], [2, 2], [1, 1]i
          2        r3         r1 , r 2 , r 3 , r 4      r3              F alse    h[3, 3], [2, 2], [1, 1], [4, 4]i
          3      r3 , r 4     r1 , r 2 , r 3 , r 4   r3 , r4            F alse    h[3, 4], [2, 3], [1, 2], [1, 4]i
          4        r2         r1 , r 2 , r 3 , r 4      r2              F alse    h[2, 2], [1, 1], [4, 4], [3, 3]i
          5      r2 , r 4     r1 , r 2 , r 3 , r 4   r2 , r4            F alse    h[2, 4], [1, 3], [2, 4], [1, 3]i
          6      r2 , r 3     r1 , r 2 , r 3 , r 4   r2 , r3            F alse    h[2, 3], [1, 2], [1, 4], [3, 4]i
          7   r2 , r 3 , r 4 r1 , r 2 , r 3 , r 4 r 2 , r 3 , r 4       F alse    h[2, 4], [1, 3], [1, 4], [1, 4]i
          8        r1         r1 , r 2 , r 3 , r 4      r1              F alse    h[1, 1], [4, 4], [3, 3], [2, 2]i
          9      r1 , r 4     r1 , r 2 , r 3 , r 4   r1 , r4            F alse    h[1, 4], [3, 4], [2, 3], [1, 2]i
         10      r1 , r 3     r1 , r 2 , r 3 , r 4   r1 , r3            F alse    h[1, 3], [2, 4], [1, 3], [2, 4]i
         11 r1 , r3 , r4 r1 , r2 , r3 , r4 r1 , r3 , r4                 F alse    h[1, 4], [2, 4], [1, 3], [1, 4]i
         12      r1 , r 2     r1 , r 2 , r 3 , r 4   r1 , r2            F alse    h[1, 2], [1, 4], [3, 4], [2, 3]i
         13 r1 , r2 , r4 r1 , r2 , r3 , r4 r1 , r2 , r4                 F alse    h[1, 4], [1, 4], [2, 4], [1, 3]i
         14 r1 , r2 , r3 r1 , r2 , r3 , r4 r1 , r2 , r3                 F alse    h[1, 3], [1, 4], [1, 4], [2, 4]i
         15 r1 , r2 , r3 , r4 r1 , r2 , r3 , r4 r1 , r2 , r3 , r4       T rue                    -
4.2   Sampling Attributes for Interval Pattern Structures

Our sampling method is based on picking large convex regions in the space. To
achieve this, we use a many-valued context denoted as (G, N, W, J) to differentiate
it from the sets in the representation context (G, M, I). Given a set B and its
closure in the representation context B′′ –which we know to be different from
B□□– the goal of the sampling procedure is to find a set X such that B ⊆ X and
B′′ ⊈ X. The sampling procedure starts by picking a random column n ∈ N in the
many-valued context. Then, we randomly pick a side (left or right) from which to
trim the ordered set Wn, generating a new set Ŵn. A candidate set
X = {g ∈ G | n(g) ∈ Ŵn} is created and we check whether B ⊆ X. If so, we return X
if and only if B′′ ⊈ X. Otherwise, we pick a random column from N and proceed with
the same instructions. This is a basic description of the sampling algorithm given
in Algorithm 3; some details on the creation of Ŵn are described only in the
pseudocode.
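
The following Python sketch is our reading of Algorithm 3: the window bookkeeping
and index handling are our interpretation of the pseudocode, W maps each column to
its sorted list of values as before, and B2 stands for B′′.

import random

def sample(B, B2, table, W):
    # Sketch of Sample(B, B'', M): repeatedly trim one value from a random
    # column's ordered value set, keeping B inside the induced extent X, until
    # X no longer contains B'' (passed here as B2). Assumes B ⊆ B2 and that B2
    # strictly contains the pattern closure of B, as guaranteed by Algorithm 2.
    cols = list(W)
    # per-column state: [side to trim next (0 = right, 1 = left), lo, hi]
    state = {n: [random.randint(0, 1), 0, len(W[n]) - 1] for n in cols}
    while True:
        n = random.choice(cols)
        side, lo, hi = state[n]
        new_lo, new_hi = (lo + 1, hi) if side else (lo, hi - 1)
        state[n][0] = 1 - side                    # alternate the trimmed side
        if new_lo > new_hi:
            continue                              # this column cannot shrink further
        X = {g for g in table
             if W[n][new_lo] <= table[g][n] <= W[n][new_hi]}
        if B <= X:
            state[n][1], state[n][2] = new_lo, new_hi   # accept the trim
            if not (B2 <= X):
                return X

Unlike line 13 of Algorithm 3, this sketch also accepts single-value windows
(new_lo = new_hi); this is a design choice of ours and does not change the example
that follows.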
     Algorithm 3 is able to generate large convex regions by considering single
dimensions of the space. It basically tries to answer the question: what is the
largest region in any single dimension that contains the set B? For example, let us
try to sample an extent for the first row of Table 6, where B = {r4} and
B′′ = {r1, r2, r3, r4}. The many-valued context is shown in Table 2. Let us pick
column b, for which Wb = {1, 2, 3, 4}. Next, we pick the right side to create
Ŵb = {1, 2, 3}. The candidate set is then X1 = {r2, r3, r4}, for which we have that
B ⊆ X1 and B′′ ⊈ X1, so we return X1 = {r2, r3, r4}.
     We can observe that in the previous example {r2, r3, r4} corresponds to the
image of an irreducible attribute in the representation context. Of course, this is
a happy accident because we have made not-so-random decisions. Truly random
decisions may lead to adding reducible attributes to the representation context.
However, since the random decisions are likely to pick large regions in the space,
and thus attributes with a large image in G, the sampled attributes are likely to
be irreducible.
     Algorithm 2 adapts such a sampling method to the generation of the
representation context. Line 14 calls a sampling procedure such as the one
described in Algorithm 3. Line 13 uses a while instruction instead of an if
instruction: in this case the closure B′′ should converge to B□□ through the
addition of one or more attributes to the representation context. This convergence
is ensured since, in the worst-case scenario, Algorithm 3 returns the extent B□□
as a new attribute for the representation context.
     Let us finish the previous example by using Algorithm 2. We notice that with
X1 = {r2, r3, r4} as the first attribute of the representation context we have that
{r4}′′ = {r2, r3, r4}, which is again different from {r4}□□. Consequently,
Algorithm 2 calls for a new sample which, by the same procedure described above,
could be X2 = {r1, r3, r4} (another happy accident). Notice that it cannot be
{r2, r3, r4} again because of line 18 of Algorithm 3. Next, {r4}′′ = {r3, r4} calls
for a new sample. This new sample could be X3 = {r1, r2, r4}, which renders
{r4}′′ = {r4}□□. Algorithm 2 then proceeds to calculate {r3}′′, which in the current
state of the representation context yields {r3, r4}. If the Sample procedure re-
Algorithm 2 A General Algorithm for Computing a Representation Context.
 1: procedure RepresentationContext(M, (·)□ , (·)□ )                  . M = (G, N, W, J)
 2:    M←∅
 3:    I←∅
 4:    K ← (G, M, I)
 5:    A←∅
 6:    while A ≠ G do
 7:       for g ∈ G in reverse order do
 8:           if g ∈ A then
 9:               A ← A \ {g}
10:            else
11:                B ← A ∪ {g}
12:                if B ≠ B′′ then
13:                    while B′′ ≠ B□□ do                                      . (*)
14:                         X ← Sample(B, B 00 , M)                               . (*)
15:                         M ← M ∪ {mX }                     . (mX is a new attribute)
16:                         I ← I ∪ {(h, mX ) | h ∈ X}                            . (*)
17:                    end while
18:                end if
19:                if B′′ \A contains no element < g then
20:                    A ← B′′
21:                    Exit For
22:                end if
23:            end if
24:        end for
25:        Output A
26:    end while
27:    return K
28: end procedure


turns X4 = {r1, r2, r3}, we then have that {r3}′′ = {r3}□□. It should be noticed
that at this point the representation context is complete w.r.t. the interval
pattern structure, that is, any subsequent closure will be calculated in the
representation context alone. Table 5 shows the contranominal scale corresponding
to the representation context generated.
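
Putting the pieces together, the following sketch of ours mirrors Algorithm 2,
reusing pattern_of, extent_of, sample and the table2 encoding from the earlier
sketches. On Table 2 it terminates after adding a handful of sampled extents and,
when the samples are as lucky as in the example above, exactly the four attributes
of the contranominal scale of Table 5.

def rep_context_sampling(table, W):
    # Sketch of Algorithm 2: same NextClosure skeleton as Algorithm 1, but when
    # B'' and B□□ disagree, sampled extents are added until they coincide.
    G = sorted(table)
    pos = {g: i for i, g in enumerate(G)}
    attrs = []                                   # extents of the sampled attributes

    def ctx_closure(B):
        closed = set(G)
        for ext in attrs:
            if B <= ext:
                closed &= ext
        return closed

    A = set()
    while A != set(G):
        for g in reversed(G):
            if g in A:
                A.discard(g)
                continue
            B = A | {g}
            closed = ctx_closure(B)
            if closed != B:                              # line 12
                target = extent_of(table, pattern_of(table, B))   # B□□
                while closed != target:                  # lines 13-17
                    attrs.append(sample(B, closed, table, W))
                    closed = ctx_closure(B)
            if not any(pos[h] < pos[g] for h in closed - A):
                A = set(closed)
                break
    return attrs

W2 = {n: sorted({row[n] for row in table2.values()}) for n in "abcd"}
sampled_extents = rep_context_sampling(table2, W2)   # e.g. [{r2,r3,r4}, {r1,r3,r4}, ...]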

5    Conclusions
In this paper we have presented a sampling strategy in order to compute a
smaller representation context for an interval pattern structure. We propose two
algorithms to achieve such a computation, a first naive version based on object
exploration, and a second improved version that uses a sampling oracle to quickly
find irreducible patterns. These irreducible patterns can be considered the basis
of a pattern structure. This paper is a first step towards the computation of a
minimal representation of a pattern structure by means of sampling techniques.
Acknowledgments. This research work has been supported by the recognition of
2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya), and the grant
TIN2017-89244-R from MINECO (Ministerio de Economía y Competitividad).


References
 1. Jaume Baixeries, Mehdi Kaytoue, and Amedeo Napoli. Computing Similarity De-
    pendencies with Pattern Structures. In Concept Lattices and their Applications,
Algorithm 3 An interval pattern extent sampler algorithm.
 1: procedure Sample(B, B′′, M)                                      . M = (G, N, W, J)
 2:    f ound ← F alse
 3:    states ← list of size |N|
 4:    for n ∈ N do
 5:        side ← pick randomly from {0, 1}
 6:        states[n] ← (side, 0, |Wn |)
 7:    end for
 8:    while not f ound do
 9:        n ← pick randomly from N
10:         side, i, j ← states[n]
11:         a = i + side
12:         b = j + (side − 1)
13:         if a < b then
14:             X ← {g ∈ G | Wn^a ≤ n(g) ≤ Wn^b}
15:             side ← not side
16:             if B ⊆ X then
17:                 i, j ← a, b
18:                 if B′′ ⊈ X then
19:                      f ound ← T rue
20:                 end if
21:             end if
22:             states[n] = (side, i, j)
23:         end if
24:     end while
25:     return X
26: end procedure



    volume 1062, pages 33–44. CEUR Workshop Proceedings (ceur-ws.org), 2013.
 2. Jaume Baixeries, Mehdi Kaytoue, and Amedeo Napoli. Characterizing Functional
    Dependencies in Formal Concept Analysis with Pattern Structures. Annals of
    Mathematics and Artificial Intelligence, 72(1–2):129–149, 2014.
 3. Alexandre Bazin. Comparing algorithms for computing lower covers of implication-
    closed sets. In Proceedings of the Thirteenth International Conference on Concept
    Lattices and Their Applications, Moscow, Russia, July 18-22, 2016., pages 21–31,
    2016.
 4. Aleksey Buzmakov. Formal Concept Analysis and Pattern Structures for mining
    Structured Data. PhD thesis, University of Lorraine, Nancy, France, 2015.
 5. Víctor Codocedo, Jaume Baixeries, Mehdi Kaytoue, and Amedeo Napoli. Sampling
    Representation Contexts with Attribute Exploration. In Diana Cristea, Florence Le
    Ber, and Baris Sertkaya, editors, Proceedings of the 15th International Conference
    on Formal Concept Analysis (ICFCA), Lecture Notes in Computer Science 11511,
    pages 307–314. Springer, 2019.
 6. Víctor Codocedo and Amedeo Napoli. Lattice-based biclustering using Partition
    Pattern Structures. In Proceedings of ECAI 2014, volume 263 of Frontiers in
    Artificial Intelligence and Applications, pages 213–218. IOS Press, 2014.
 7. Brian A. Davey and Hilary A. Priestley. Introduction to Lattices and Order. Cam-
    bridge University Press, Cambridge, UK, 1990.
 8. Bernhard Ganter and Sergei Obiedkov. Conceptual Exploration. Springer, Berlin,
    2016.
 9. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis. Springer, Berlin,
    1999.
10. Bernhard Ganter and Sergei O. Kuznetsov. Pattern Structures and Their Pro-
    jections. In Proceedings of ICCS 2001, Lecture Notes in Computer Science 2120,
    pages 129–142. Springer, 2001.
11. Michel Habib and Lhouari Nourine. Representation of lattices via set-colored
    posets. Discrete Applied Mathematics, 249:64–73, 2018.
12. Mehdi Kaytoue, Víctor Codocedo, Aleksey Buzmakov, Jaume Baixeries, Sergei O.
    Kuznetsov, and Amedeo Napoli. Pattern Structures and Concept Lattices for Data
    Mining and Knowledge Processing. In Proceedings of ECML-PKDD 2015 (Part
    III), volume 9286 of Lecture Notes in Computer Science, pages 227–231. Springer,
    2015.
13. Mehdi Kaytoue, Sergei O. Kuznetsov, and Amedeo Napoli. Revisiting Numerical
    Pattern Mining with Formal Concept Analysis. In Toby Walsh, editor, Proceedings
    of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011),
    pages 1342–1347. IJCAI/AAAI, 2011.
14. Mehdi Kaytoue, Sergei O. Kuznetsov, Amedeo Napoli, and Sébastien Duplessis.
    Mining gene expression data with pattern structures in Formal Concept Analysis.
    Information Sciences, 181(10):1989–2001, 2011.
15. Sergei O. Kuznetsov. Pattern Structures for Analyzing Complex Data. In Hiroshi
    Sakai, Mihir K. Chakraborty, Aboul Ella Hassanien, Dominik Slezak, and William
    Zhu, editors, RSFDGrC, volume 5908 of Lecture Notes in Computer Science, pages
    33–44. Springer, 2009.