=Paper=
{{Paper
|id=Vol-3308/paper15
|storemode=property
|title=GALACTIC: towards a generic and scalable platform
for complex and heterogeneous data using Formal
Concept Analysis
|pdfUrl=https://ceur-ws.org/Vol-3308/Paper15.pdf
|volume=Vol-3308
|authors=Christophe Demko,Salah Eddine Boukhetta,Jérêmy Richard,Guillaume Savarit,Karell Bertet,Cyril Faucher,Damien Mondou
|dblpUrl=https://dblp.org/rec/conf/cla/DemkoBRSBFM22
}}
==GALACTIC: towards a generic and scalable platform
for complex and heterogeneous data using Formal
Concept Analysis==
GALACTIC: towards a generic and scalable platform for complex and heterogeneous data using Formal Concept Analysis Christophe Demko1,∗ , Salah Eddine Boukhetta1 , Jérémy Richard1 , Guillaume Savarit1 , Karell Bertet1 , Cyril Faucher1 and Damien Mondou1 1 Laboratory L3i, La Rochelle University, La Rochelle, France Abstract In a recent paper, we presented a new pattern discovery algorithm, NextPriorityConcept, in order to take into account complex and heterogeneous data using Formal Concept Analysis. We implemented this algorithm and developed a python 3 library whose acronym GALACTIC means Galois Lattices, Concept Theory, Implicational systems and Closures. It is opened to the community using a BSD-3 license and its architecture allows the writing of plugins to take into account new datatypes. In this article we will present the architecture of our software solution, we will explain how to add new plugins to the core of our system by giving the UML diagram of each kind of plugins and we will give some examples of plugins developed within our team. Keywords Formal Concept Analysis, Pattern Structures, Complex Data, Heterogeneous data 1. Introduction GALACTIC is a new platform for computing patterns from heterogeneous and complex data that extend the approach of pattern structures [1] and logical concept analysis [2]. It’s a development platform for a generic implementation of the NextPriorityConcept[3] algorithm allowing easy integration of new plugins for characteristics, descriptions, strategies and meta-strategies. The GALACTIC eco-system is organized with: • A core which implements the NextPriorityConcept algorithm and a lot of tools for visualizing lattices and reduced contexts in python notebooks; • A set of characteristic plugins defining new types of data; • A set of description plugins defining new types of descriptions and their associated predicates; • A set of strategy plugins defining new types of strategies for a given characteristic; Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16𝑡ℎ International Conference on Concept Lattices and Their Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 191–198. ∗ Corresponding author. Envelope-Open christophe.demko@univ-lr.fr (C. Demko); salah.boukhetta@univ-lr.fr (S. Boukhetta); jeremy.richard2@univ-lr.fr (J. Richard); guillaume.savarit1@univ-lr.fr (G. Savarit); karell.bertet@univ-lr.fr (K. Bertet); cyril.faucher@univ-lr.fr (C. Faucher); damien.mondou@univ-lr.fr (D. Mondou) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) Characteristic Description Strategy Data reader Next Priority Concept Figure 1: the NextPriorityConcept algorithm process • A set of measure plugins useful for the filter meta-strategies; • A set of data reader plugins allowing GALACTIC to read any type of data file; • A set of applications using the core library and the different plugins; • A set of localization plugins for translating the different applications. Each plugin must register with the core library by declaring an entry point in the configuration file of the setuptools 1 (setup.py ) named py_galactic_extension . The declared function informs the library that a new extension is available. The construction of the lattice is carried out as shown in Fig. 1: the data are read from a file; characteristics are extracted; a description is produced for each concept; strategies generate selectors for the exploration of potential new concepts and the NextPriorityConcept algorithm selects the predecessors and maintains the lattice structure. 2. GALACTIC platform 2.1. Data reader plugins A data reader plugin is responsible for reading a file according to its extension and producing the list of individuals of the population. It must implement two class methods: • read(cls, data_file: TextIO): Iterable[Any] that reads a file and produces an iterable of objects. • extensions(cls): Iterator[str] that produces the list of file extensions supported by this plugin. 1 https://pypi.org/project/setuptools/ 2.2. Characteristic plugins Each concept is composed of a subset of objects together with a set of predicates describing them, each predicate being specific to one type of characteristics. Such generic use of predicates makes it possible to consider heterogeneous data as input, i.e. numeric, discrete or more complex data. A characteristic plugin (cf Fig. 3 in appendix) is responsible of extracting a value from a python object. It must implement the __call__ magic method which will be applied to each individual of the population. This method should return the characteristic of the individual. It is also reasonable to implement the other magic methods __eq__ , __hash__ , __str__ . 2.3. Description plugins The algorithm introduces the notion of description 𝛿 as an application to provide predicates describing a set of objects 𝐴 according to their characteristics, that corresponds to a concept (𝐴, 𝛿(𝐴)). At each iteration, predicates describing the objects 𝐴 of the current concept are computed ”on the fly” by a specific treatment for each type of characteristics, depending on whether it is numeric, discrete or more complex, and the final description 𝛿 is the union of these predicates. In order to obtain a lattice, the description must verify 𝛿(𝐴) ⊑ 𝛿(𝐴′ ) for 𝐴′ ⊆ 𝐴. NextPriorityConcept computes the concept lattice of the context composed of individuals in rows and predicates issued from the description in columns[3]. A description plugin (cf Fig. 4 in appendix) is used to describe a collection of individuals using a set of predicates. It usually defines a new predicate class and a new description class. The description class is responsible of calculating a set of descriptors representing the convex hull of a collection of individuals. These descriptors must be predicates that describe half-spaces on the set of individuals. The convex hull is represented by the intersection of the set of descriptors. 2.4. Strategy plugins The algorithm also introduces the notion of strategy 𝜎 to provide selectors generating the predecessors of a concept (𝐴, 𝛿(𝐴)). The selectors propose a way to refine or cut the description 𝛿(𝐴). The purpose of a strategy plugin is then to produce a set of selectors restricting the set of individuals in order to obtain new potential concepts containing fewer individuals. The NextPriorityConcept algorithm selects among the set of selectors produced by the strategies that generates new effective concepts (i.e. with a reduction of individuals). The lattice property is maintained using propagation of cross and residuals constraints and a level by level generation of concepts using a priority queue[3]. 2.4.1. Basic strategies A basic strategy plugin proposes selectors to the NextPriorityConcept algorithm. These selectors act as cut into the set of individuals. A basic strategy must be initialized with a description and must implement the selectors method (cf Fig. 5 in appendix). 2.4.2. Meta strategies and measure plugins The core of the GALACTIC library defines two meta-strategies that act as filters for other strate- gies: • LimitFilter which selects predecessors whose measure is above/below a threshold; • SelectionFilter which selects the best/worst predecessors relatively to a measure. They are initialized with a set of strategies and with a measure. A measure is a class for measuring a predicate on a concept. It must implement the __call__ magic method with a concept and a predicate as parameters (cf Fig. 6 in appendix). 3. Some particular plugins We implemented some basic description and strategies plugins to manage boolean, categorical, numerical characteristic, but also more complex characteristics such as temporal sequences[4]. 3.1. Logical description and strategy plugin When characteristics are of the same type, NextPriorityConcept offers the possibility to process them separately or together. An immediate way to process with several characteristics together would be to merge the predicates obtained in individual cases, both for the descriptions and for the strategies. But it is possible to obtain more relevant predicates by using a specific process. For boolean characteristics for example, the classical FCA approach describes a set of objects 𝐴 thanks to the set of attributes 𝐵 = {𝑥 ∶ 𝑎 ∈ 𝐴 and 𝑎 possesses 𝑥} and the strategy would consider as selectors the set of all the attributes that are not in 𝐵. These two sets described by predicates of the form ”possesses attribute 𝑥” would respectively corresponds to 𝛿(𝐴) and 𝜎 (𝐴). It is possible to consider other descriptions of 𝐴 by the introduction of negative attributes, for example, the disjunction of clauses or its minimalization. 𝑥 if 𝑎 possesses 𝑥 ⋁ ⋀{ (1) 𝑎∈𝐴 𝑥 otherwise We defined plugins allowing to manage several boolean variables and their negations in the same description using the well-known Quine-McCluskey’s dual algorithm (or the method of prime implicants) that computes the minimization of a set of disjunctive clauses with a time complexity in 𝑂(3𝑛 log 𝑛)[5] where 𝑛 is the number of attributes. For example, using the context defined in Fig 2, the concept containing the even numbers is defined by {𝑜,̄ 𝑒, 𝑐 + 𝑠,̄ 𝑐 + 𝑝} and the concept containing the odd numbers is defined by {𝑒,̄ 𝑜, 𝑐 ̄ + 𝑠, 𝑝 + 𝑠}. The basic strategy associated to that description consists in simply adding a boolean variable or its negation to a concept to generate its potential predecessors. # composite even odd prime square 0 � � � $0: #10 1 � � 2 � � ... 3 � � 4 � � � $4: #5 $5: #5 5 � � not o not e 6 � � e o 7 � � (c or not s) (not c or s) 8 � � (c or p) (p or s) 9 � � � Figure 2: Digits example context and a fragment of the concept lattice with the even number and odd number concepts 3.2. Numerical description and strategy plugin As already proposed in pattern structure theory[6], it is possible to describe a set of points 𝐴 of ℝ𝑛 using the usual convex hull represented by a polytope, the description 𝛿(𝐴) is then composed of predicates describing the borders of the convex hull. For points in two and three dimensions, output-sensitive algorithms are known to compute the convex hull in time 𝑂(𝑛 log 𝑛), where 𝑛 is the number of points. We defined a description calculating the convex hull, and two strategies for cutting the 𝑛-dimensional space by projecting the data on the main axis of inertia: • we can define two predicates, 𝑥 ≥ 𝑚 − 𝛼𝜎 and 𝑥 ≤ 𝑚 + 𝛼𝜎 where 𝛼 is a parameter of this strategy, 𝑚 is the mean and 𝜎 is the standard deviation; • we can define a set of similar predicates by using the quantiles of the numerical data. The number of quantiles is a parameter of the strategy. 3.3. Entropy measure plugin The measure plugins are mainly used in the LimitFilter and SelectionFilter meta- strategies. They must give a numerical value evaluating a predicate in relation to a concept. In the core, we developed the confidence, support and cardinality measures and we defined a plugin for evaluating the entropy relatively to a categorical characteristic, i.e. a class attribute. For a given concept (𝐴, 𝑃), a predicate 𝑝 that select only some individuals from 𝐴 and a categorical characteristic Ω, the entropy 𝐻𝜃 (𝐴, 𝑝, Ω) is defined by: 𝐻𝜃 (𝐴, 𝑝, Ω) = 𝜃𝐻 ({𝑎 ∈ 𝐴|𝑝(𝑎)}, Ω) + (1 − 𝜃)𝐻 ({𝑎 ∈ 𝐴|¬𝑝(𝑎)}, Ω) 𝐻 (𝐴′ , Ω) = ∑ 𝑝𝜔 (𝐴′ , Ω)𝑙𝑜𝑔(𝑝𝜔 (𝐴′ , Ω)) 𝜔∈𝐷Ω |{𝑎 ∈ 𝐴′ |Ω(𝑎) = 𝜔}| 𝑝𝜔 (𝐴′ , Ω) = |𝐴′ | where 𝜃 is a [0, 1] parameter and 𝑝𝜔 (𝐴′ , Ω) is the ratio of individual in 𝐴′ that have their characteristic Ω equal to 𝜔. 4. Conclusion In this article, we described the software architecture of our GALACTIC platform that implements the NextPriorityConcept algorithm to generate formal concepts for heterogeneous and complex data. We explained the different types of plugins (reader, characteristic, description, strategy, measure) and we specified the methods to implement new ones. Then we presented some interesting plugins that we had developed, that can be used either to mine data or to develop new plugins. We are currenly working for plugins on time series. Inspired from pattern structures, the generated lattices are often smaller with more relevant concepts thanks to the notion of strategy, and a user driven approach to discover patterns is possible, in an interactive way for instance. The GALACTIC engine is data agnostic since it only handles predicates generated by strategies and descriptions for each type of characteristics. We are currently working on adding a graphical interface and the possibility of changing strategies on demand. This will ease the use of the GALACTIC platform for non-IT users. Future work will also be devoted to the extraction of minimal generators and logical rules from the concepts. References [1] B. Ganter, S. Kuznetsov, Pattern structures and their projections, in: LNCS of International Conference on Conceptual Structures (ICCS’01), 2001, pp. 129–142. [2] S. Ferré, O. Ridoux, A logical generalization of formal concept analysis, LNCS 1867 (2000) 371–384. [3] Ch. Demko, K. Bertet, C. Faucher, J.-F. Viaud, S. O. Kuznetsov, NextPriorityConcept: A new and generic algorithm computing concepts from complex and heterogeneous data, Theoretical Computer Science 845 (2020) 1–20. doi:10.1016/j.tcs.2020.08.026 . [4] S. E. Boukhetta, C. Demko, K. Bertet, J. Richard, C. Cayèré, Temporal sequence mining using fca and galactic, in: International Conference on Conceptual Structures, Springer, 2021, pp. 185–199. [5] W. V. O. Quine, The problem of simplifying truth functions, The American Mathematical 59 (1952) 521–531. [6] M. Kaytoue, V. Codocedo, A. Buzmakov, J. Baixeries, S. Kuznetsov, A. Napoli, Pattern structures and concept lattices for data mining and knowledge processing, in: In Proceedings of ECML-PKDDl, 2015, pp. 227–231. Appendix galactic.characteristics Characteristic myplugin MyCharacteristic The characteristic is a callable objet __eq__(self, other: Any): bool able to extract __hash__(self): int a value from __str__(self): str any python object. __call__(self, individual: Any): Any Figure 3: characteristic plugin galactic.characteristics Characteristic contains galactic.descriptions Description Predicate space: Tuple[Characteristic] myplugin MyDescription The description can be called on any collection of individuals. It produces a set of descriptors __eq__(self, other: Any): bool __hash__(self): int representing the convex hull __call__(self, individuals: Collection[Any]): Iterator[MyPredicate] of the individuals. produces MyPredicate A predicate defines __eq__(self, other: Any): bool a half-space __hash__(self): int for the set of the individuals. __str__(self): str __call__(self, individual: Any): Optional[bool] Figure 4: description plugin galactic.strategies Strategy A basic strategy should be descriptions: Iterator[Description] initialized with a description. mybasicplugin MyBasicStrategy BasicStrategy __init__(self, description: Description) selectors(self, concept: Concept): Iterator[Predicate] produces A basic strategy produces galactic.descriptions a set of selectors from a description space. Predicate Each selector could lead to a predecessor of the concept Figure 5: basic strategy plugin galactic.strategies Strategy descriptions: Iterator[Description] MetaStrategy Filter strategies: Tuple[Strategy] measure: Measure __init__(self, *args: Strategy) __init__(self, *args: Strategy, measure: Measure) selectors(self, concept: Concept): Iterator[Predicate] contains Measure LimitFilter SelectionFilter mymeasureplugin MyMeasure A measure can be called on a concept for a specific __call__(self, concept: Concept, predicate: Predicate): float predicate. Figure 6: measure plugin