=Paper=
{{Paper
|id=Vol-3151/short5
|storemode=property
|title=
|pdfUrl=https://ceur-ws.org/Vol-3151/short5.pdf
|volume=Vol-3151
|authors=Pedro H. B. Ruas,Rokia Missaoui,Mark A. J. Song,Léonard Kwuida
}}
====
Mining the Groceries Database using Triadic Concept Analysis Pedro H. B. Ruas1 , Rokia Missaoui2 , Mark A. J. Song1 , Léonard Kwuida3 1 Pontifical Catholic University of Minas Gerais Belo Horizonte, Brazil pedrohbruas@gmail.com, song@pucminas.br 2 Université du Québec en Outaouais rokia.missaoui@uqo.ca 3 Bern University of Applied Sciences, Bern, Switzerland leonard.kwuida@bfh.ch Abstract. In this paper we illustrate the potential of Triadic Concept Analysis (TCA) in extracting useful patterns from triadic formal con- texts. To that end, we exploit our recent contributions to TCA to an- alyze the Groceries database and identify triadic patterns expressed by concepts and association rules, including implications. 1 Introduction Since datasets are frequently expressed by ternary and more generally n-ary re- lations, one can observe a recent and increasing interest to propose new solutions for the analysis and exploration of these multidimensional data, and specially tri- adic contexts [2, 4, 6, 7, 10]. This is the case of multidimensional social networks, social resource sharing systems such as folksonomies, and security policies. For instance, in the latter application, a 4-ary or quaternary relation indicates that a user is authorized to use resources with given privileges under restricted con- ditions. For example, User 1000 is allowed to access to Files F1 , F2 and F3 with the privilege to read F1 and update the last two files in the first three working days of the week. In this paper, we aim at illustrating the utilization of our software platform using a subset of the well-known dataset named the Groceries database [1] of transactions made by customers at a given date for a set of products. Subsets and variants of this dataset have been extensively used by data mining researchers under the name of “market basket analysis” to discover associations between products bought by customers by looking for combinations of items that occur together frequently in customer transactions. The rest of the paper is organized as follows. In Section 2 we briefly recall the main notions of Triadic Concept Analysis. Section 3 presents our platform, the original dataset and its preprocessing as well as a few examples of the generated patterns which are triadic concepts and association rules, including implications. Finally, Section 4 summarizes the paper and presents the future work in terms of the platform enrichment and validation using synthetic and real data. RealDataFCA'2021: Analyzing Real Data with Formal Concept Analysis, June 29, 2021, Strasbourg, France Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 2 Triadic Concept Analysis Triadic concept analysis was originally introduced by Lehmann and Wille [9] as an extension to FCA, to analyze data described by three sets K1 (objects), K2 (attributes) and K3 (conditions) together with a 3-ary relation Y ⊆ K1 ×K2 ×K3 . K := (K1 , K2 , K3 , Y ) is called a triadic context. A triple (a1 , a2 , a3 ) in Y means that object a1 possesses attribute a2 under condition a3 . Table 1 shows a row of the context, which partially describes the transactions made by Customer 3782 in K1 who bought items in K2 , including pip fruit in January, chicken and oil on February, and curd on January and September as listed in Table 1. Here the condition set K3 represents the twelve months (J, F, . . . , D) of the year. K Chicken Oil Pip Fruit Curd ... ... J F ... S J F ... S J F ... S J F ... S J F ... S J F ... S 3782 1 1 1 1 1 ... ... Table 1. A row of the triadic context K := (K1 , K2 , K3 , Y ) A triadic concept or closed tri-set of K is a triple c = (A1 , A2 , A3 ) (also denoted by A1 ×A2 ×A3 ) with A1 ⊆ K1 , A2 ⊆ K2 , A3 ⊆ K3 and A1 ×A2 ×A3 ⊆ Y is maximal with respect to inclusion in Y . A1 is called the extent of c, A2 its intent, and A3 its modus. We name (A2 , A3 ) the feature of c. For example, ({2512, 4320}, {canned beer}, {January, April}) is one of two triadic concepts (see the green box in Figure 2) with the same extent {2512, 4320} which indicates that the two identified customers have two common features. The first feature means that both of them bought canned beer in January and April while the second one indicates that they purchased butter and canned beer in January. Let L = (C, ≤1 ) be a poset of nodes such that each node represents the set of triadic concepts in C with the same extent, and the relation ≤1 is induced by the inclusion on extents. We defined in [10] an adapted version of the iPred algorithm to link triadic concepts according to the quasi-order on extents. This allows the Hasse diagram construction of triadic concepts. A pair (B2 , B3 ) is called a triadic feature-based generator [10] (or F-generator) associated with (i.e., compatible with) the feature (A2 , A3 ) in a concept (A1 , A2 , A3 ) if A2 × A3 ⊆ (B2 , B3 )(1)(1) , where the (i) -derivation [9] is defined by: (i) Xi := {(aj , ak ) ∈ Kj ×Kk | (ai , aj , ak ) ∈ Y ∀ai ∈ Xi } (1) (Xj , Xk )(i) := {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Xj ×Xk }. (2) As an illustration, let us consider the portion of the Hasse diagram shown in Figure 2. We can see that each node represents the set of features and F- generators associated with one extent. For example, there are two features and two F-generators attached to the node whose extent is {2512, 4320}, The dotted red box attached to the extent {2512} contains ten F-generators. One of them is ({coffee, pastry}, {September}) which is simply denoted by (coffee, pastry - September ) in Figure 2. Biedermann [5] defined a triadic implication of the form (A → D)C with the meaning that “whenever A occurs under all conditions in C, then D also occurs under the same conditions”. Later on, Ganter and Obiedkov [8] extended Biedermann’s definition by proposing three types of implications. We recall two of them in the following. C A conditional attribute implication takes the form: A −→ D, where A and D are subsets of K2 , and C is a subset of K3 . It means that A implies D under all conditions in C. In particular, the implication holds for any subset in C. This implication is then linked to Biedermann’s definition of triadic implication as C follows [8]: A −→ D ⇐⇒ (A → D)C1 for all C1 ⊆ C. In a dual way, an attributional condition implication is an exact association C rule of the form A −→ D, where A and D are subsets of K3 , and C is a subset of K2 . Let (B2 , B3 ) be a feature-based generator associated with (A2 , A3 ) of a triadic concept whose extent is A1 , i.e., B2 ⊆ A2 and B3 ⊆ A3 . Then, we define in [10] the Biedermann conditional attribute implication (BCAI) as (B2 → A2 \ B2 )B3 with a support equal to |A1 |/|K1 | if B2 ⊂ A2 and B3 ⊆ A3 , where K1 stands for the set of objects. Dually, the Biedermann attributional condition implication (BACI) (B3 → A3 \ B3 )B2 holds with a support equal to |A1 |/|K1 | if B3 ⊂ A3 and B2 ⊆ A2 . A2 involved in a BCAI is constrained to be maximal among all the intents of the features associated with (B2 , B3 ) and A3 inside a BACI must be maximal among all the modi of the features associated with (B2 , B3 ). Given a feature-based generator g = (B2 , B3 ) and a feature (A2 , A3 ), both associated with the node whose extent is A1 such that A2 is the maximal in- tent in ((B2 , B3 )(1) )(1) that contains B2 . Let the quasi-order (X1 , X2 , X3 ) .1 (A1 , A2 , A3 ) holds between the current concept c = (A1 , A2 , A3 ) and c1 = (X1 , X2 , X3 ) found in the lower cover of the node that contains c. Then, we claim the following statement: If A2 ⊂ X2 and X3 ⊆ A3 and B2 ⊂ X2 and B3 ⊆ X3 , then the following Biedermann conditional attribute association rule BCAAR holds: (B2 → X2 \ A2 )B3 with a support equal to |X1 |/|K1 | and a confidence of |X1 |/|A1 |, where K1 stands for the set of objects. In a dual way, the Biedermann attributional condition association rule BACAR can be defined. 3 A Case Study 3.1 Platform Our development platform for TCA is implemented mostly with Python and has the following modules [10] as presented in Figure 1: 1. Call of Data-Peeler procedure [6] to get triadic concepts 2. Computation of the precedence links by adapting the iPred algorithm [3] to the triadic setting 3. Calculation of two types of triadic generators, including the feature-based generators 4. Computation of association rules, including implications 5. Adaptation of stability and separation indices to the triadic framework. In this paper, we will mainly illustrate our work on Module 2 and parts of Modules 3 and 4. Fig. 1. The architecture modules of our proposed solution. 3.2 Data set The Groceries database [1] contains 38765 transactions, 3898 customers, 167 products (items) and 728 distinct transaction dates. From this database, we extracted and analyzed many samples of different sizes w.r.t. to the number of customers, the number of items and variants of transaction dates (day and month, month only, day of the week, ..). The subset we are analyzing in this paper contains 8121 transactions made by 1000 customers who bought 30 items during a given month (rather than a specific date) between 2014 and 2015. To select items, we took the 23 frequently bought products (after removing the first top 5 ones) and added seven other items which are turkey, chicken, chocolate, meat, ham, ice cream, and napkins. The identified customers are those who bought at least one of the 30 selected products. A portion of the input data after the preprocessing step (but before a conversion into a triadic context) can be seen in Table 2. The reason we used such a sample is due to the fact that it allows us to illustrate the meaning of the generated patterns that are either triadic concepts, implications or association rules with a confidence lower than 1. Due to the sparsity of the initial dataset and many large subsets, the set of implications was frequently empty or small and many generated association rules were trivial with a very low support. The sample is then converted into a triadic context where the value 1 in a cell indicates that a given customer bought an item at least once during a given month. For example, Customer 3782 purchased a pip fruit on January as shown in Table 1. Costumer number Item Month 2512 butter January 2512 coffee September 2512 bottled water September 2512 domestic eggs April 2512 root vegetables January 2512 fruit/vegetable juice September 2512 newspapers September 2512 bottled water April 2512 ham April 2512 domestic eggs January 2512 canned beer January 2512 canned beer April 2512 pastry September Table 2. Input data. 3.3 Patterns In the following we show a part of the output produced for the subset of the Groceries database by first displaying a part of the Hasse diagram and then a set of implications and association rules. Figure 2 shows a portion of the Hasse diagram of triadic concepts where the node in the green box represents the extent of two triadic concepts as a set of two customers who share two features found inside the corresponding blue box. The first feature tells us that Customers 2512 and 4320 bought the item canned beer on April and January while the second feature indicates that they both purchased butter and canned beer on January. Since the set of F- generators is equal to the set of features associated with the extent {2512, 4320}, no implication can be generated from the node labeled with this extent. If we look at the upper covers of this current node, we observe three groups of customers: those who bought canned beer in January (winter season), those who purchased canned beer in April (spring) and those who bought butter in January. Fig. 2. Portion of the Hasse diagram annotated with extents, features and F-generators. The node in the purple box concerns Customer 2512 with four features dis- played in the blue box and 10 F-generators in the red box. We can then extract the following BCAI using the first feature and the first F-generator: (bottled water, f ruit/vegetable juice → coffee, newspapers, pastry)September [0.1%, 100%]. It means that whenever this customer (one out of 1000 = 0.1%) buys f ruit/vegetable juice and bottled water at least once on September, then he/she purchases coffee, newspapers and pastry during this month. In a similar way, we can identify the following BACI from another node in the diagram (not shown in Figure 2): (F ebruary, April → N ovember)root vegetables [2%, 100%]. It means that buying root vegetables on February and April implies making the same purchase on November with a support of 2%. All the empirical tests are executed using multi-threading on a Ubuntu 19.10 based system with 32GB of RAM memory and an Intel i7-4790 3.6GHz 8-core processor. In Table 3 we present a few statistics about the size of the output as well as the execution time of each one of the platform component. Statistics Groceries dataset Nb. of links 11124 Nb. of distinct extents 3128 Nb. of triadic concepts 3912 Nb.of minimal F-generators 5124 Nb. of implications (sup >0) 2164 Nb. of association rules (conf. <1) 7290 Execution time in seconds per step T-iPred 2.524 F-generator computation 95.721 Implication computation 0.053 Association rule computation (conf. <1) 16.281 Total time 114.579 Table 3. Statistics and execution times in seconds for the sample (1000 × 30 × 12) of the Groceries dataset. 4 Conclusion and Further Development In this paper we illustrated the application of algorithms for Triadic Concept Analysis to a subset of the Groceries database as a triadic context to discover concepts and association rules, including implications. The merits of TCA lie in the fact that patterns are in a compact form and under many perspectives in the sense that for a given extent (set of objects) of a triadic concept, we obtain different views expressed by distinct features, and for a given generator and compatible features, we may get more than one association rule or implication. Our current work is to complete the development of our software solution to make it an open source for everyone and continue our investigation of new kinds of association rules [10] on other datasets. Finally, in order to help researchers validate their present and future contri- butions in TCA and more generally in Polyadic Concept Analysis [4, 6, 11], the FCA research community members need to join their forces and share their best experiences in collecting, exchanging and using clean and coherent multidimen- sional datasets. Software tools to generate synthetic data of different sizes and density levels are also welcome to evaluate the performance and scalability of the designed algorithms. Acknowledgment This study was financed in part by the NSERC (Natural Sciences and Engineer- ing Research Council of Canada) and by the Coordenação de Aperfeiçoamento de Pessoal de Nı́vel Superior - Brazil (CAPES) - Finance Code 001. References 1. Groceries dataset for market basket analysis. https://www.kaggle.com/heeraldedhia/groceries-dataset 2. Ananias, K.H., Missaoui, R., B. Ruas, P.H., Zarate, L.E., Song, M.A.: Triadic concept approximation. Information Sciences (2021) 3. Baixeries, J., Szathmary, L., Valtchev, P., Godin, R.: Yet a Faster Algorithm for Building the Hasse Diagram of a Concept Lattice. In: ICFCA’09. pp. 162–177 (2009) 4. Bazin, A.: On implication bases in n-lattices. Discrete Applied Mathematics 273, 21–29 (2020), advances in Formal Concept Analysis: Traces of CLA 2016 5. Biedermann, K.: How triadic diagrams represent conceptual structures. In: ICCS. pp. 304–317 (1997) 6. Cerf, L., Besson, J., Robardet, C., Boulicaut, J.: Data peeler: Contraint-based closed pattern mining in n-ary relations. In: Proceedings of the SIAM International Conference on Data Mining, SDM 2008, April 24-26, 2008, Atlanta, Georgia, USA. pp. 37–48. SIAM (2008) 7. Felde, M., Stumme, G.: Triadic exploration and exploration with multiple experts. CoRR abs/2102.02654 (2021) 8. Ganter, B., Obiedkov, S.: Implications in triadic formal contexts. In: Wolff, K.E., Pfeiffer, H.D., Delugach, H.S. (eds.) Conceptual Structures at Work. pp. 186–195. Springer Berlin Heidelberg, Berlin, Heidelberg (2004) 9. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In: ICCS. pp. 32–43 (1995) 10. Missaoui, R., Ruas, P.H., Kwuida, L., Song, M.A.: Pattern discovery in triadic contexts. In: ICCS. pp. 117–131. Springer (2020) 11. Voutsadakis, G.: Polyadic concept analysis. Order 19(3), 295–304 (2002)