=Paper= {{Paper |id=Vol-3151/short5 |storemode=property |title= |pdfUrl=https://ceur-ws.org/Vol-3151/short5.pdf |volume=Vol-3151 |authors=Pedro H. B. Ruas,Rokia Missaoui,Mark A. J. Song,Léonard Kwuida }} ==== https://ceur-ws.org/Vol-3151/short5.pdf
    Mining the Groceries Database using Triadic
                Concept Analysis

    Pedro H. B. Ruas1 , Rokia Missaoui2 , Mark A. J. Song1 , Léonard Kwuida3
        1
            Pontifical Catholic University of Minas Gerais Belo Horizonte, Brazil
                       pedrohbruas@gmail.com, song@pucminas.br
                           2
                             Université du Québec en Outaouais
                                 rokia.missaoui@uqo.ca
                3
                   Bern University of Applied Sciences, Bern, Switzerland
                                 leonard.kwuida@bfh.ch



        Abstract. In this paper we illustrate the potential of Triadic Concept
        Analysis (TCA) in extracting useful patterns from triadic formal con-
        texts. To that end, we exploit our recent contributions to TCA to an-
        alyze the Groceries database and identify triadic patterns expressed by
        concepts and association rules, including implications.


1     Introduction

Since datasets are frequently expressed by ternary and more generally n-ary re-
lations, one can observe a recent and increasing interest to propose new solutions
for the analysis and exploration of these multidimensional data, and specially tri-
adic contexts [2, 4, 6, 7, 10]. This is the case of multidimensional social networks,
social resource sharing systems such as folksonomies, and security policies. For
instance, in the latter application, a 4-ary or quaternary relation indicates that
a user is authorized to use resources with given privileges under restricted con-
ditions. For example, User 1000 is allowed to access to Files F1 , F2 and F3 with
the privilege to read F1 and update the last two files in the first three working
days of the week.
    In this paper, we aim at illustrating the utilization of our software platform
using a subset of the well-known dataset named the Groceries database [1] of
transactions made by customers at a given date for a set of products. Subsets and
variants of this dataset have been extensively used by data mining researchers
under the name of “market basket analysis” to discover associations between
products bought by customers by looking for combinations of items that occur
together frequently in customer transactions.
    The rest of the paper is organized as follows. In Section 2 we briefly recall the
main notions of Triadic Concept Analysis. Section 3 presents our platform, the
original dataset and its preprocessing as well as a few examples of the generated
patterns which are triadic concepts and association rules, including implications.
Finally, Section 4 summarizes the paper and presents the future work in terms
of the platform enrichment and validation using synthetic and real data.
RealDataFCA'2021: Analyzing Real Data with Formal Concept Analysis, June 29,
2021, Strasbourg, France
       Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
2    Triadic Concept Analysis

Triadic concept analysis was originally introduced by Lehmann and Wille [9] as
an extension to FCA, to analyze data described by three sets K1 (objects), K2
(attributes) and K3 (conditions) together with a 3-ary relation Y ⊆ K1 ×K2 ×K3 .
K := (K1 , K2 , K3 , Y ) is called a triadic context. A triple (a1 , a2 , a3 ) in Y means
that object a1 possesses attribute a2 under condition a3 . Table 1 shows a row of
the context, which partially describes the transactions made by Customer 3782
in K1 who bought items in K2 , including pip fruit in January, chicken and oil
on February, and curd on January and September as listed in Table 1. Here the
condition set K3 represents the twelve months (J, F, . . . , D) of the year.


      K    Chicken Oil         Pip Fruit Curd       ...      ...
           J F ... S J F ... S J F ... S J F ... S J F ... S J F ... S
       3782 1          1       1         1       1 ...       ...
             Table 1. A row of the triadic context K := (K1 , K2 , K3 , Y )




    A triadic concept or closed tri-set of K is a triple c = (A1 , A2 , A3 ) (also
denoted by A1 ×A2 ×A3 ) with A1 ⊆ K1 , A2 ⊆ K2 , A3 ⊆ K3 and A1 ×A2 ×A3 ⊆ Y
is maximal with respect to inclusion in Y . A1 is called the extent of c, A2 its
intent, and A3 its modus. We name (A2 , A3 ) the feature of c. For example,
({2512, 4320}, {canned beer}, {January, April}) is one of two triadic concepts
(see the green box in Figure 2) with the same extent {2512, 4320} which indicates
that the two identified customers have two common features. The first feature
means that both of them bought canned beer in January and April while the
second one indicates that they purchased butter and canned beer in January.
    Let L = (C, ≤1 ) be a poset of nodes such that each node represents the set
of triadic concepts in C with the same extent, and the relation ≤1 is induced
by the inclusion on extents. We defined in [10] an adapted version of the iPred
algorithm to link triadic concepts according to the quasi-order on extents. This
allows the Hasse diagram construction of triadic concepts.
    A pair (B2 , B3 ) is called a triadic feature-based generator [10] (or F-generator)
associated with (i.e., compatible with) the feature (A2 , A3 ) in a concept (A1 , A2 , A3 )
if A2 × A3 ⊆ (B2 , B3 )(1)(1) , where the (i) -derivation [9] is defined by:
                (i)
              Xi      := {(aj , ak ) ∈ Kj ×Kk | (ai , aj , ak ) ∈ Y ∀ai ∈ Xi }         (1)
      (Xj , Xk )(i) := {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Xj ×Xk }.   (2)

   As an illustration, let us consider the portion of the Hasse diagram shown
in Figure 2. We can see that each node represents the set of features and F-
generators associated with one extent. For example, there are two features and
two F-generators attached to the node whose extent is {2512, 4320}, The dotted
red box attached to the extent {2512} contains ten F-generators. One of them
is ({coffee, pastry}, {September}) which is simply denoted by (coffee, pastry -
September ) in Figure 2.
    Biedermann [5] defined a triadic implication of the form (A → D)C with
the meaning that “whenever A occurs under all conditions in C, then D also
occurs under the same conditions”. Later on, Ganter and Obiedkov [8] extended
Biedermann’s definition by proposing three types of implications. We recall two
of them in the following.
                                                              C
    A conditional attribute implication takes the form: A −→ D, where A and D
are subsets of K2 , and C is a subset of K3 . It means that A implies D under all
conditions in C. In particular, the implication holds for any subset in C. This
implication is then linked to Biedermann’s definition of triadic implication as
                 C
follows [8]: A −→ D ⇐⇒ (A → D)C1 for all C1 ⊆ C.
    In a dual way, an attributional condition implication is an exact association
                        C
rule of the form A −→ D, where A and D are subsets of K3 , and C is a subset
of K2 .
    Let (B2 , B3 ) be a feature-based generator associated with (A2 , A3 ) of a triadic
concept whose extent is A1 , i.e., B2 ⊆ A2 and B3 ⊆ A3 . Then, we define in [10]
the Biedermann conditional attribute implication (BCAI) as (B2 → A2 \ B2 )B3
with a support equal to |A1 |/|K1 | if B2 ⊂ A2 and B3 ⊆ A3 , where K1 stands
for the set of objects.
    Dually, the Biedermann attributional condition implication (BACI) (B3 →
A3 \ B3 )B2 holds with a support equal to |A1 |/|K1 | if B3 ⊂ A3 and B2 ⊆ A2 .
A2 involved in a BCAI is constrained to be maximal among all the intents of
the features associated with (B2 , B3 ) and A3 inside a BACI must be maximal
among all the modi of the features associated with (B2 , B3 ).
    Given a feature-based generator g = (B2 , B3 ) and a feature (A2 , A3 ), both
associated with the node whose extent is A1 such that A2 is the maximal in-
tent in ((B2 , B3 )(1) )(1) that contains B2 . Let the quasi-order (X1 , X2 , X3 ) .1
(A1 , A2 , A3 ) holds between the current concept c = (A1 , A2 , A3 ) and c1 =
(X1 , X2 , X3 ) found in the lower cover of the node that contains c. Then, we
claim the following statement:
    If A2 ⊂ X2 and X3 ⊆ A3 and B2 ⊂ X2 and B3 ⊆ X3 , then the following
Biedermann conditional attribute association rule BCAAR holds: (B2 → X2 \
A2 )B3 with a support equal to |X1 |/|K1 | and a confidence of |X1 |/|A1 |, where
K1 stands for the set of objects.
    In a dual way, the Biedermann attributional condition association rule BACAR
can be defined.


3     A Case Study
3.1   Platform
Our development platform for TCA is implemented mostly with Python and has
the following modules [10] as presented in Figure 1:
1. Call of Data-Peeler procedure [6] to get triadic concepts
2. Computation of the precedence links by adapting the iPred algorithm [3] to
   the triadic setting
3. Calculation of two types of triadic generators, including the feature-based
   generators
4. Computation of association rules, including implications
5. Adaptation of stability and separation indices to the triadic framework.

  In this paper, we will mainly illustrate our work on Module 2 and parts of
Modules 3 and 4.




            Fig. 1. The architecture modules of our proposed solution.




3.2   Data set

The Groceries database [1] contains 38765 transactions, 3898 customers, 167
products (items) and 728 distinct transaction dates. From this database, we
extracted and analyzed many samples of different sizes w.r.t. to the number
of customers, the number of items and variants of transaction dates (day and
month, month only, day of the week, ..). The subset we are analyzing in this
paper contains 8121 transactions made by 1000 customers who bought 30 items
during a given month (rather than a specific date) between 2014 and 2015. To
select items, we took the 23 frequently bought products (after removing the first
top 5 ones) and added seven other items which are turkey, chicken, chocolate,
meat, ham, ice cream, and napkins. The identified customers are those who
bought at least one of the 30 selected products. A portion of the input data
after the preprocessing step (but before a conversion into a triadic context) can
be seen in Table 2.
    The reason we used such a sample is due to the fact that it allows us to
illustrate the meaning of the generated patterns that are either triadic concepts,
implications or association rules with a confidence lower than 1. Due to the
sparsity of the initial dataset and many large subsets, the set of implications
was frequently empty or small and many generated association rules were trivial
with a very low support.
    The sample is then converted into a triadic context where the value 1 in a
cell indicates that a given customer bought an item at least once during a given
month. For example, Customer 3782 purchased a pip fruit on January as shown
in Table 1.


               Costumer number         Item           Month
                     2512              butter         January
                     2512              coffee        September
                     2512          bottled water     September
                     2512          domestic eggs       April
                     2512         root vegetables     January
                     2512      fruit/vegetable juice September
                     2512           newspapers       September
                     2512          bottled water       April
                     2512               ham            April
                     2512          domestic eggs      January
                     2512           canned beer       January
                     2512           canned beer        April
                     2512              pastry        September

                               Table 2. Input data.




3.3   Patterns

In the following we show a part of the output produced for the subset of the
Groceries database by first displaying a part of the Hasse diagram and then a
set of implications and association rules.
    Figure 2 shows a portion of the Hasse diagram of triadic concepts where
the node in the green box represents the extent of two triadic concepts as a
set of two customers who share two features found inside the corresponding
blue box. The first feature tells us that Customers 2512 and 4320 bought the
item canned beer on April and January while the second feature indicates that
they both purchased butter and canned beer on January. Since the set of F-
generators is equal to the set of features associated with the extent {2512, 4320},
no implication can be generated from the node labeled with this extent. If we look
at the upper covers of this current node, we observe three groups of customers:
those who bought canned beer in January (winter season), those who purchased
canned beer in April (spring) and those who bought butter in January.
Fig. 2. Portion of the Hasse diagram annotated with extents, features and F-generators.


    The node in the purple box concerns Customer 2512 with four features dis-
played in the blue box and 10 F-generators in the red box. We can then extract
the following BCAI using the first feature and the first F-generator:
(bottled water, f ruit/vegetable juice → coffee, newspapers, pastry)September
[0.1%, 100%]. It means that whenever this customer (one out of 1000 = 0.1%)
buys f ruit/vegetable juice and bottled water at least once on September, then
he/she purchases coffee, newspapers and pastry during this month. In a similar
way, we can identify the following BACI from another node in the diagram (not
shown in Figure 2):
(F ebruary, April → N ovember)root vegetables [2%, 100%]. It means that buying
root vegetables on February and April implies making the same purchase on
November with a support of 2%.
    All the empirical tests are executed using multi-threading on a Ubuntu 19.10
based system with 32GB of RAM memory and an Intel i7-4790 3.6GHz 8-core
processor.
    In Table 3 we present a few statistics about the size of the output as well as
the execution time of each one of the platform component.
             Statistics                                Groceries dataset
             Nb. of links                                   11124
             Nb. of distinct extents                         3128
             Nb. of triadic concepts                         3912
             Nb.of minimal F-generators                      5124
             Nb. of implications (sup >0)                    2164
             Nb. of association rules (conf. <1)             7290

             Execution time in seconds per step
             T-iPred                                         2.524
             F-generator computation                       95.721
             Implication computation                         0.053
             Association rule computation (conf. <1)        16.281
             Total time                                    114.579

Table 3. Statistics and execution times in seconds for the sample (1000 × 30 × 12) of
the Groceries dataset.




4   Conclusion and Further Development

In this paper we illustrated the application of algorithms for Triadic Concept
Analysis to a subset of the Groceries database as a triadic context to discover
concepts and association rules, including implications. The merits of TCA lie in
the fact that patterns are in a compact form and under many perspectives in
the sense that for a given extent (set of objects) of a triadic concept, we obtain
different views expressed by distinct features, and for a given generator and
compatible features, we may get more than one association rule or implication.
    Our current work is to complete the development of our software solution to
make it an open source for everyone and continue our investigation of new kinds
of association rules [10] on other datasets.
    Finally, in order to help researchers validate their present and future contri-
butions in TCA and more generally in Polyadic Concept Analysis [4, 6, 11], the
FCA research community members need to join their forces and share their best
experiences in collecting, exchanging and using clean and coherent multidimen-
sional datasets. Software tools to generate synthetic data of different sizes and
density levels are also welcome to evaluate the performance and scalability of
the designed algorithms.


Acknowledgment

This study was financed in part by the NSERC (Natural Sciences and Engineer-
ing Research Council of Canada) and by the Coordenação de Aperfeiçoamento
de Pessoal de Nı́vel Superior - Brazil (CAPES) - Finance Code 001.
References
 1. Groceries         dataset         for         market         basket        analysis.
    https://www.kaggle.com/heeraldedhia/groceries-dataset
 2. Ananias, K.H., Missaoui, R., B. Ruas, P.H., Zarate, L.E., Song, M.A.: Triadic
    concept approximation. Information Sciences (2021)
 3. Baixeries, J., Szathmary, L., Valtchev, P., Godin, R.: Yet a Faster Algorithm for
    Building the Hasse Diagram of a Concept Lattice. In: ICFCA’09. pp. 162–177
    (2009)
 4. Bazin, A.: On implication bases in n-lattices. Discrete Applied Mathematics 273,
    21–29 (2020), advances in Formal Concept Analysis: Traces of CLA 2016
 5. Biedermann, K.: How triadic diagrams represent conceptual structures. In: ICCS.
    pp. 304–317 (1997)
 6. Cerf, L., Besson, J., Robardet, C., Boulicaut, J.: Data peeler: Contraint-based
    closed pattern mining in n-ary relations. In: Proceedings of the SIAM International
    Conference on Data Mining, SDM 2008, April 24-26, 2008, Atlanta, Georgia, USA.
    pp. 37–48. SIAM (2008)
 7. Felde, M., Stumme, G.: Triadic exploration and exploration with multiple experts.
    CoRR abs/2102.02654 (2021)
 8. Ganter, B., Obiedkov, S.: Implications in triadic formal contexts. In: Wolff, K.E.,
    Pfeiffer, H.D., Delugach, H.S. (eds.) Conceptual Structures at Work. pp. 186–195.
    Springer Berlin Heidelberg, Berlin, Heidelberg (2004)
 9. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In: ICCS.
    pp. 32–43 (1995)
10. Missaoui, R., Ruas, P.H., Kwuida, L., Song, M.A.: Pattern discovery in triadic
    contexts. In: ICCS. pp. 117–131. Springer (2020)
11. Voutsadakis, G.: Polyadic concept analysis. Order 19(3), 295–304 (2002)