<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining the Groceries Database using Triadic Concept Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro H. B. Ruas</string-name>
          <email>pedrohbruas@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rokia Missaoui</string-name>
          <email>rokia.missaoui@uqo.ca</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark A. J. Song</string-name>
          <email>song@pucminas.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leonard Kwuida</string-name>
          <email>leonard.kwuida@bfh.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bern University of Applied Sciences</institution>
          ,
          <addr-line>Bern</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ponti cal Catholic University of Minas Gerais Belo Horizonte</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universite du Quebec en Outaouais</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we illustrate the potential of Triadic Concept Analysis (TCA) in extracting useful patterns from triadic formal contexts. To that end, we exploit our recent contributions to TCA to analyze the Groceries database and identify triadic patterns expressed by concepts and association rules, including implications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Since datasets are frequently expressed by ternary and more generally n-ary
relations, one can observe a recent and increasing interest to propose new solutions
for the analysis and exploration of these multidimensional data, and specially
triadic contexts [
        <xref ref-type="bibr" rid="ref10 ref2 ref4 ref6 ref7">2, 4, 6, 7, 10</xref>
        ]. This is the case of multidimensional social networks,
social resource sharing systems such as folksonomies, and security policies. For
instance, in the latter application, a 4-ary or quaternary relation indicates that
a user is authorized to use resources with given privileges under restricted
conditions. For example, User 1000 is allowed to access to Files F1, F2 and F3 with
the privilege to read F1 and update the last two les in the rst three working
days of the week.
      </p>
      <p>
        In this paper, we aim at illustrating the utilization of our software platform
using a subset of the well-known dataset named the Groceries database [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] of
transactions made by customers at a given date for a set of products. Subsets and
variants of this dataset have been extensively used by data mining researchers
under the name of \market basket analysis" to discover associations between
products bought by customers by looking for combinations of items that occur
together frequently in customer transactions.
      </p>
      <p>The rest of the paper is organized as follows. In Section 2 we brie y recall the
main notions of Triadic Concept Analysis. Section 3 presents our platform, the
original dataset and its preprocessing as well as a few examples of the generated
patterns which are triadic concepts and association rules, including implications.
Finally, Section 4 summarizes the paper and presents the future work in terms
of the platform enrichment and validation using synthetic and real data.</p>
    </sec>
    <sec id="sec-2">
      <title>Triadic Concept Analysis</title>
      <p>
        Triadic concept analysis was originally introduced by Lehmann and Wille [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] as
an extension to FCA, to analyze data described by three sets K1 (objects), K2
(attributes) and K3 (conditions) together with a 3-ary relation Y K1 K2 K3.
K := (K1; K2; K3; Y ) is called a triadic context. A triple (a1; a2; a3) in Y means
that object a1 possesses attribute a2 under condition a3. Table 1 shows a row of
the context, which partially describes the transactions made by Customer 3782
in K1 who bought items in K2, including pip fruit in January, chicken and oil
on February, and curd on January and September as listed in Table 1. Here the
condition set K3 represents the twelve months (J; F; : : : ; D) of the year.
      </p>
      <p>K
3782</p>
      <p>Chicken Oil Pip Fruit Curd . . . . . .</p>
      <p>J F . . . S J F . . . S J F . . . S J F . . . S J F . . . S J</p>
      <p>1 1 1 1 1 . . . . . .</p>
      <p>Table 1. A row of the triadic context K := (K1; K2; K3; Y )
F . . . S</p>
      <p>A triadic concept or closed tri-set of K is a triple c = (A1; A2; A3) (also
denoted by A1 A2 A3) with A1 K1, A2 K2, A3 K3 and A1 A2 A3 Y
is maximal with respect to inclusion in Y . A1 is called the extent of c, A2 its
intent, and A3 its modus. We name (A2; A3) the feature of c. For example,
(f2512; 4320g; fcanned beerg; fJ anuary; Aprilg) is one of two triadic concepts
(see the green box in Figure 2) with the same extent f2512; 4320g which indicates
that the two identi ed customers have two common features. The rst feature
means that both of them bought canned beer in January and April while the
second one indicates that they purchased butter and canned beer in January.</p>
      <p>
        Let L = (C; 1) be a poset of nodes such that each node represents the set
of triadic concepts in C with the same extent, and the relation 1 is induced
by the inclusion on extents. We de ned in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] an adapted version of the iPred
algorithm to link triadic concepts according to the quasi-order on extents. This
allows the Hasse diagram construction of triadic concepts.
      </p>
      <p>
        A pair (B2; B3) is called a triadic feature-based generator [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] (or F-generator)
associated with (i.e., compatible with) the feature (A2; A3) in a concept (A1; A2; A3)
if A2 A3 (B2; B3)(1)(1), where the (i)-derivation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is de ned by:
Xi(i) := f(aj ; ak) 2 Kj
      </p>
      <p>Kk j (ai; aj ; ak) 2 Y 8ai 2 Xig
(Xj ; Xk)(i) := fai 2 Ki j (ai; aj ; ak) 2 Y for all (aj ; ak) 2 Xj
Xkg:
(1)
(2)</p>
      <p>As an illustration, let us consider the portion of the Hasse diagram shown
in Figure 2. We can see that each node represents the set of features and
Fgenerators associated with one extent. For example, there are two features and
two F-generators attached to the node whose extent is f2512; 4320g, The dotted
red box attached to the extent f2512g contains ten F-generators. One of them
is (fco ee, pastryg; fSeptemberg) which is simply denoted by (co ee, pastry
September ) in Figure 2.</p>
      <p>
        Biedermann [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] de ned a triadic implication of the form (A ! D)C with
the meaning that \whenever A occurs under all conditions in C, then D also
occurs under the same conditions ". Later on, Ganter and Obiedkov [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] extended
Biedermann's de nition by proposing three types of implications. We recall two
of them in the following.
      </p>
      <p>
        A conditional attribute implication takes the form: A C! D, where A and D
are subsets of K2, and C is a subset of K3. It means that A implies D under all
conditions in C. In particular, the implication holds for any subset in C. This
implication is then linked to Biedermann's de nition of triadic implication as
follows [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: A C! D () (A ! D)C1 for all C1 C.
      </p>
      <p>In a dual way, an attributional condition implication is an exact association</p>
      <p>C! D, where A and D are subsets of K3, and C is a subset
rule of the form A
of K2.</p>
      <p>
        Let (B2; B3) be a feature-based generator associated with (A2; A3) of a triadic
concept whose extent is A1, i.e., B2 A2 and B3 A3. Then, we de ne in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
the Biedermann conditional attribute implication (BCAI) as (B2 ! A2 n B2)B3
with a support equal to jA1j=jK1j if B2 A2 and B3 A3, where K1 stands
for the set of objects.
      </p>
      <p>Dually, the Biedermann attributional condition implication (BACI) (B3 !
A3 n B3)B2 holds with a support equal to jA1j=jK1j if B3 A3 and B2 A2.
A2 involved in a BCAI is constrained to be maximal among all the intents of
the features associated with (B2; B3) and A3 inside a BACI must be maximal
among all the modi of the features associated with (B2; B3).</p>
      <p>Given a feature-based generator g = (B2; B3) and a feature (A2; A3), both
associated with the node whose extent is A1 such that A2 is the maximal
intent in ((B2; B3)(1))(1) that contains B2. Let the quasi-order (X1; X2; X3) .1
(A1; A2; A3) holds between the current concept c = (A1; A2; A3) and c1 =
(X1; X2; X3) found in the lower cover of the node that contains c. Then, we
claim the following statement:</p>
      <p>If A2 X2 and X3 A3 and B2 X2 and B3 X3, then the following
Biedermann conditional attribute association rule BCAAR holds: (B2 ! X2 n
A2)B3 with a support equal to jX1j=jK1j and a con dence of jX1j=jA1j, where
K1 stands for the set of objects.</p>
      <p>In a dual way, the Biedermann attributional condition association rule BACAR
can be de ned.
3
3.1</p>
    </sec>
    <sec id="sec-3">
      <title>A Case Study</title>
      <p>
        Platform
Our development platform for TCA is implemented mostly with Python and has
the following modules [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] as presented in Figure 1:
1. Call of Data-Peeler procedure [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to get triadic concepts
2. Computation of the precedence links by adapting the iPred algorithm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to
the triadic setting
3. Calculation of two types of triadic generators, including the feature-based
generators
4. Computation of association rules, including implications
5. Adaptation of stability and separation indices to the triadic framework.
      </p>
      <p>
        In this paper, we will mainly illustrate our work on Module 2 and parts of
Modules 3 and 4.
The Groceries database [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] contains 38765 transactions, 3898 customers, 167
products (items) and 728 distinct transaction dates. From this database, we
extracted and analyzed many samples of di erent sizes w.r.t. to the number
of customers, the number of items and variants of transaction dates (day and
month, month only, day of the week, ..). The subset we are analyzing in this
paper contains 8121 transactions made by 1000 customers who bought 30 items
during a given month (rather than a speci c date) between 2014 and 2015. To
select items, we took the 23 frequently bought products (after removing the rst
top 5 ones) and added seven other items which are turkey, chicken, chocolate,
meat, ham, ice cream, and napkins. The identi ed customers are those who
bought at least one of the 30 selected products. A portion of the input data
after the preprocessing step (but before a conversion into a triadic context) can
be seen in Table 2.
      </p>
      <p>The reason we used such a sample is due to the fact that it allows us to
illustrate the meaning of the generated patterns that are either triadic concepts,
implications or association rules with a con dence lower than 1. Due to the
sparsity of the initial dataset and many large subsets, the set of implications
was frequently empty or small and many generated association rules were trivial
with a very low support.</p>
      <p>The sample is then converted into a triadic context where the value 1 in a
cell indicates that a given customer bought an item at least once during a given
month. For example, Customer 3782 purchased a pip fruit on January as shown
in Table 1.
In the following we show a part of the output produced for the subset of the
Groceries database by rst displaying a part of the Hasse diagram and then a
set of implications and association rules.</p>
      <p>Figure 2 shows a portion of the Hasse diagram of triadic concepts where
the node in the green box represents the extent of two triadic concepts as a
set of two customers who share two features found inside the corresponding
blue box. The rst feature tells us that Customers 2512 and 4320 bought the
item canned beer on April and January while the second feature indicates that
they both purchased butter and canned beer on January. Since the set of
Fgenerators is equal to the set of features associated with the extent f2512; 4320g,
no implication can be generated from the node labeled with this extent. If we look
at the upper covers of this current node, we observe three groups of customers:
those who bought canned beer in January (winter season), those who purchased
canned beer in April (spring) and those who bought butter in January.</p>
      <p>The node in the purple box concerns Customer 2512 with four features
displayed in the blue box and 10 F-generators in the red box. We can then extract
the following BCAI using the rst feature and the rst F-generator:
(bottled water; f ruit=vegetable juice ! co ee, newspapers; pastry)September
[0:1%; 100%]. It means that whenever this customer (one out of 1000 = 0:1%)
buys f ruit=vegetable juice and bottled water at least once on September, then
he/she purchases co ee, newspapers and pastry during this month. In a similar
way, we can identify the following BACI from another node in the diagram (not
shown in Figure 2):
(F ebruary; April ! N ovember)root vegetables [2%; 100%]. It means that buying
root vegetables on February and April implies making the same purchase on
November with a support of 2%.</p>
      <p>All the empirical tests are executed using multi-threading on a Ubuntu 19.10
based system with 32GB of RAM memory and an Intel i7-4790 3.6GHz 8-core
processor.</p>
      <p>In Table 3 we present a few statistics about the size of the output as well as
the execution time of each one of the platform component.</p>
      <p>Statistics
Nb. of links
Nb. of distinct extents
Nb. of triadic concepts
Nb.of minimal F-generators
Nb. of implications (sup &gt;0)
Nb. of association rules (conf. &lt;1)
Execution time in seconds per step
T-iPred
F-generator computation
Implication computation
Association rule computation (conf. &lt;1)
Total time</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Further Development</title>
      <p>In this paper we illustrated the application of algorithms for Triadic Concept
Analysis to a subset of the Groceries database as a triadic context to discover
concepts and association rules, including implications. The merits of TCA lie in
the fact that patterns are in a compact form and under many perspectives in
the sense that for a given extent (set of objects) of a triadic concept, we obtain
di erent views expressed by distinct features, and for a given generator and
compatible features, we may get more than one association rule or implication.</p>
      <p>
        Our current work is to complete the development of our software solution to
make it an open source for everyone and continue our investigation of new kinds
of association rules [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] on other datasets.
      </p>
      <p>
        Finally, in order to help researchers validate their present and future
contributions in TCA and more generally in Polyadic Concept Analysis [
        <xref ref-type="bibr" rid="ref11 ref4 ref6">4, 6, 11</xref>
        ], the
FCA research community members need to join their forces and share their best
experiences in collecting, exchanging and using clean and coherent
multidimensional datasets. Software tools to generate synthetic data of di erent sizes and
density levels are also welcome to evaluate the performance and scalability of
the designed algorithms.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>This study was nanced in part by the NSERC (Natural Sciences and
Engineering Research Council of Canada) and by the Coordenac~ao de Aperfeicoamento
de Pessoal de N vel Superior - Brazil (CAPES) - Finance Code 001.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Groceries dataset for market basket analysis</article-title>
          . https://www.kaggle.com/heeraldedhia/groceries-dataset
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ananias</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ruas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.H.</given-names>
            ,
            <surname>Zarate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.E.</given-names>
            ,
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.A.</surname>
          </string-name>
          :
          <article-title>Triadic concept approximation</article-title>
          .
          <source>Information Sciences</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baixeries</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godin</surname>
          </string-name>
          , R.:
          <article-title>Yet a Faster Algorithm for Building the Hasse Diagram of a Concept Lattice</article-title>
          .
          <source>In: ICFCA'09</source>
          . pp.
          <volume>162</volume>
          {
          <issue>177</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bazin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>On implication bases in n-lattices</article-title>
          .
          <source>Discrete Applied Mathematics</source>
          <volume>273</volume>
          ,
          <volume>21</volume>
          {
          <fpage>29</fpage>
          (
          <year>2020</year>
          ),
          <source>advances in Formal Concept Analysis: Traces of CLA 2016</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Biedermann</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>How triadic diagrams represent conceptual structures</article-title>
          .
          <source>In: ICCS</source>
          . pp.
          <volume>304</volume>
          {
          <issue>317</issue>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cerf</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Besson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robardet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulicaut</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Data peeler: Contraint-based closed pattern mining in n-ary relations</article-title>
          .
          <source>In: Proceedings of the SIAM International Conference on Data Mining, SDM 2008, April 24-26</source>
          ,
          <year>2008</year>
          , Atlanta, Georgia, USA. pp.
          <volume>37</volume>
          {
          <fpage>48</fpage>
          .
          <string-name>
            <surname>SIAM</surname>
          </string-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Felde</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
          </string-name>
          , G.:
          <article-title>Triadic exploration and exploration with multiple experts</article-title>
          .
          <source>CoRR abs/2102</source>
          .02654 (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Implications in triadic formal contexts</article-title>
          . In: Wol ,
          <string-name>
            <given-names>K.E.</given-names>
            ,
            <surname>Pfei</surname>
          </string-name>
          <string-name>
            <given-names>er</given-names>
            , H.D.,
            <surname>Delugach</surname>
          </string-name>
          , H.S. (eds.) Conceptual Structures at Work. pp.
          <volume>186</volume>
          {
          <fpage>195</fpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>A triadic approach to formal concept analysis</article-title>
          .
          <source>In: ICCS</source>
          . pp.
          <volume>32</volume>
          {
          <issue>43</issue>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruas</surname>
            ,
            <given-names>P.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwuida</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Pattern discovery in triadic contexts</article-title>
          .
          <source>In: ICCS</source>
          . pp.
          <volume>117</volume>
          {
          <fpage>131</fpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Voutsadakis</surname>
          </string-name>
          , G.:
          <article-title>Polyadic concept analysis</article-title>
          .
          <source>Order</source>
          <volume>19</volume>
          (
          <issue>3</issue>
          ),
          <volume>295</volume>
          {
          <fpage>304</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>