=Paper= {{Paper |id=Vol-3308/paper15 |storemode=property |title=GALACTIC: towards a generic and scalable platform for complex and heterogeneous data using Formal Concept Analysis |pdfUrl=https://ceur-ws.org/Vol-3308/Paper15.pdf |volume=Vol-3308 |authors=Christophe Demko,Salah Eddine Boukhetta,Jérêmy Richard,Guillaume Savarit,Karell Bertet,Cyril Faucher,Damien Mondou |dblpUrl=https://dblp.org/rec/conf/cla/DemkoBRSBFM22 }} ==GALACTIC: towards a generic and scalable platform for complex and heterogeneous data using Formal Concept Analysis== https://ceur-ws.org/Vol-3308/Paper15.pdf
GALACTIC: towards a generic and scalable platform
for complex and heterogeneous data using Formal
Concept Analysis
Christophe Demko1,∗ , Salah Eddine Boukhetta1 , Jérémy Richard1 , Guillaume Savarit1 ,
Karell Bertet1 , Cyril Faucher1 and Damien Mondou1
1
    Laboratory L3i, La Rochelle University, La Rochelle, France


                                         Abstract
                                         In a recent paper, we presented a new pattern discovery algorithm, NextPriorityConcept, in order to
                                         take into account complex and heterogeneous data using Formal Concept Analysis. We implemented this
                                         algorithm and developed a python 3 library whose acronym GALACTIC means Galois Lattices, Concept
                                         Theory, Implicational systems and Closures. It is opened to the community using a BSD-3 license and
                                         its architecture allows the writing of plugins to take into account new datatypes. In this article we will
                                         present the architecture of our software solution, we will explain how to add new plugins to the core
                                         of our system by giving the UML diagram of each kind of plugins and we will give some examples of
                                         plugins developed within our team.

                                         Keywords
                                         Formal Concept Analysis, Pattern Structures, Complex Data, Heterogeneous data




1. Introduction
GALACTIC is a new platform for computing patterns from heterogeneous and complex data that
extend the approach of pattern structures [1] and logical concept analysis [2]. It’s a development
platform for a generic implementation of the NextPriorityConcept[3] algorithm allowing
easy integration of new plugins for characteristics, descriptions, strategies and meta-strategies.
The GALACTIC eco-system is organized with:

                  • A core which implements the NextPriorityConcept algorithm and a lot of tools for
                    visualizing lattices and reduced contexts in python notebooks;
                  • A set of characteristic plugins defining new types of data;
                  • A set of description plugins defining new types of descriptions and their associated
                    predicates;
                  • A set of strategy plugins defining new types of strategies for a given characteristic;

Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16𝑡ℎ International Conference on Concept Lattices and Their
Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 191–198.
∗
    Corresponding author.
Envelope-Open christophe.demko@univ-lr.fr (C. Demko); salah.boukhetta@univ-lr.fr (S. Boukhetta);
jeremy.richard2@univ-lr.fr (J. Richard); guillaume.savarit1@univ-lr.fr (G. Savarit); karell.bertet@univ-lr.fr
(K. Bertet); cyril.faucher@univ-lr.fr (C. Faucher); damien.mondou@univ-lr.fr (D. Mondou)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
                                  Characteristic          Description




                                                                           Strategy




                                  Data reader
                                                   Next Priority Concept


Figure 1: the NextPriorityConcept algorithm process




        • A set of measure plugins useful for the filter meta-strategies;
        • A set of data reader plugins allowing GALACTIC to read any type of data file;
        • A set of applications using the core library and the different plugins;
        • A set of localization plugins for translating the different applications.

   Each plugin must register with the core library by declaring an entry point in the configuration
file of the setuptools 1 (setup.py ) named
py_galactic_extension . The declared function informs the library that a new extension is
available.
   The construction of the lattice is carried out as shown in Fig. 1: the data are read from
a file; characteristics are extracted; a description is produced for each concept; strategies
generate selectors for the exploration of potential new concepts and the NextPriorityConcept
algorithm selects the predecessors and maintains the lattice structure.


2. GALACTIC platform
2.1. Data reader plugins
A data reader plugin is responsible for reading a file according to its extension and producing
the list of individuals of the population. It must implement two class methods:

        • read(cls, data_file: TextIO): Iterable[Any] that reads a file and produces an
          iterable of objects.
        • extensions(cls): Iterator[str] that produces the list of file extensions supported
          by this plugin.




1
    https://pypi.org/project/setuptools/
2.2. Characteristic plugins
Each concept is composed of a subset of objects together with a set of predicates describing
them, each predicate being specific to one type of characteristics. Such generic use of predicates
makes it possible to consider heterogeneous data as input, i.e. numeric, discrete or more complex
data.
   A characteristic plugin (cf Fig. 3 in appendix) is responsible of extracting a value from a
python object. It must implement the __call__ magic method which will be applied to each
individual of the population. This method should return the characteristic of the individual. It
is also reasonable to implement the other magic methods __eq__ , __hash__ , __str__ .

2.3. Description plugins
The algorithm introduces the notion of description 𝛿 as an application to provide predicates
describing a set of objects 𝐴 according to their characteristics, that corresponds to a concept
(𝐴, 𝛿(𝐴)). At each iteration, predicates describing the objects 𝐴 of the current concept are
computed ”on the fly” by a specific treatment for each type of characteristics, depending on
whether it is numeric, discrete or more complex, and the final description 𝛿 is the union of these
predicates. In order to obtain a lattice, the description must verify 𝛿(𝐴) ⊑ 𝛿(𝐴′ ) for 𝐴′ ⊆ 𝐴.
NextPriorityConcept computes the concept lattice of the context composed of individuals in
rows and predicates issued from the description in columns[3].
   A description plugin (cf Fig. 4 in appendix) is used to describe a collection of individuals using
a set of predicates. It usually defines a new predicate class and a new description class. The
description class is responsible of calculating a set of descriptors representing the convex hull of
a collection of individuals. These descriptors must be predicates that describe half-spaces on the
set of individuals. The convex hull is represented by the intersection of the set of descriptors.

2.4. Strategy plugins
The algorithm also introduces the notion of strategy 𝜎 to provide selectors generating the
predecessors of a concept (𝐴, 𝛿(𝐴)). The selectors propose a way to refine or cut the description
𝛿(𝐴). The purpose of a strategy plugin is then to produce a set of selectors restricting the set
of individuals in order to obtain new potential concepts containing fewer individuals. The
NextPriorityConcept algorithm selects among the set of selectors produced by the strategies
that generates new effective concepts (i.e. with a reduction of individuals). The lattice property
is maintained using propagation of cross and residuals constraints and a level by level generation
of concepts using a priority queue[3].

2.4.1. Basic strategies
A basic strategy plugin proposes selectors to the NextPriorityConcept algorithm. These
selectors act as cut into the set of individuals. A basic strategy must be initialized with a
description and must implement the selectors method (cf Fig. 5 in appendix).
2.4.2. Meta strategies and measure plugins
The core of the GALACTIC library defines two meta-strategies that act as filters for other strate-
gies:

    • LimitFilter which selects predecessors whose measure is above/below a threshold;
    • SelectionFilter which selects the best/worst predecessors relatively to a measure.

  They are initialized with a set of strategies and with a measure. A measure is a class for
measuring a predicate on a concept. It must implement the __call__ magic method with a
concept and a predicate as parameters (cf Fig. 6 in appendix).


3. Some particular plugins
We implemented some basic description and strategies plugins to manage boolean, categorical,
numerical characteristic, but also more complex characteristics such as temporal sequences[4].

3.1. Logical description and strategy plugin
When characteristics are of the same type, NextPriorityConcept offers the possibility to
process them separately or together. An immediate way to process with several characteristics
together would be to merge the predicates obtained in individual cases, both for the descriptions
and for the strategies. But it is possible to obtain more relevant predicates by using a specific
process.
  For boolean characteristics for example, the classical FCA approach describes a set of objects
𝐴 thanks to the set of attributes 𝐵 = {𝑥 ∶ 𝑎 ∈ 𝐴 and 𝑎 possesses 𝑥} and the strategy would
consider as selectors the set of all the attributes that are not in 𝐵. These two sets described by
predicates of the form ”possesses attribute 𝑥” would respectively corresponds to 𝛿(𝐴) and 𝜎 (𝐴).
  It is possible to consider other descriptions of 𝐴 by the introduction of negative attributes,
for example, the disjunction of clauses or its minimalization.

                                           𝑥   if 𝑎 possesses 𝑥
                                 ⋁ ⋀{                                                          (1)
                                 𝑎∈𝐴
                                           𝑥   otherwise
  We defined plugins allowing to manage several boolean variables and their negations in the
same description using the well-known Quine-McCluskey’s dual algorithm (or the method of
prime implicants) that computes the minimization of a set of disjunctive clauses with a time
complexity in 𝑂(3𝑛 log 𝑛)[5] where 𝑛 is the number of attributes. For example, using the context
defined in Fig 2, the concept containing the even numbers is defined by {𝑜,̄ 𝑒, 𝑐 + 𝑠,̄ 𝑐 + 𝑝} and
the concept containing the odd numbers is defined by {𝑒,̄ 𝑜, 𝑐 ̄ + 𝑠, 𝑝 + 𝑠}.
  The basic strategy associated to that description consists in simply adding a boolean variable
or its negation to a concept to generate its potential predecessors.
              #    composite      even     odd   prime     square
              0       �            �                         �
                                                                                  $0: #10
              1                             �                �
              2                    �                �                             ...

              3                             �       �
              4        �           �                         �
                                                                         $4: #5             $5: #5
              5                             �       �
                                                                       not o            not e
              6        �           �                                   e                o
              7                             �       �                  (c or not s)     (not c or s)
              8        �           �                                   (c or p)         (p or s)
              9        �                    �                �


Figure 2: Digits example context and a fragment of the concept lattice with the even number and odd
number concepts


3.2. Numerical description and strategy plugin
As already proposed in pattern structure theory[6], it is possible to describe a set of points 𝐴 of
ℝ𝑛 using the usual convex hull represented by a polytope, the description 𝛿(𝐴) is then composed
of predicates describing the borders of the convex hull. For points in two and three dimensions,
output-sensitive algorithms are known to compute the convex hull in time 𝑂(𝑛 log 𝑛), where 𝑛
is the number of points.
   We defined a description calculating the convex hull, and two strategies for cutting the
𝑛-dimensional space by projecting the data on the main axis of inertia:

    • we can define two predicates, 𝑥 ≥ 𝑚 − 𝛼𝜎 and 𝑥 ≤ 𝑚 + 𝛼𝜎 where 𝛼 is a parameter of this
      strategy, 𝑚 is the mean and 𝜎 is the standard deviation;
    • we can define a set of similar predicates by using the quantiles of the numerical data. The
      number of quantiles is a parameter of the strategy.

3.3. Entropy measure plugin
The measure plugins are mainly used in the LimitFilter and SelectionFilter meta-
strategies. They must give a numerical value evaluating a predicate in relation to a concept.
In the core, we developed the confidence, support and cardinality measures and we defined a
plugin for evaluating the entropy relatively to a categorical characteristic, i.e. a class attribute.
   For a given concept (𝐴, 𝑃), a predicate 𝑝 that select only some individuals from 𝐴 and a
categorical characteristic Ω, the entropy 𝐻𝜃 (𝐴, 𝑝, Ω) is defined by:


               𝐻𝜃 (𝐴, 𝑝, Ω) = 𝜃𝐻 ({𝑎 ∈ 𝐴|𝑝(𝑎)}, Ω) + (1 − 𝜃)𝐻 ({𝑎 ∈ 𝐴|¬𝑝(𝑎)}, Ω)
                  𝐻 (𝐴′ , Ω) =         ∑ 𝑝𝜔 (𝐴′ , Ω)𝑙𝑜𝑔(𝑝𝜔 (𝐴′ , Ω))
                                   𝜔∈𝐷Ω
                                    |{𝑎 ∈ 𝐴′ |Ω(𝑎) = 𝜔}|
                  𝑝𝜔 (𝐴′ , Ω) =
                                            |𝐴′ |
  where 𝜃 is a [0, 1] parameter and 𝑝𝜔 (𝐴′ , Ω) is the ratio of individual in 𝐴′ that have their
characteristic Ω equal to 𝜔.


4. Conclusion
In this article, we described the software architecture of our GALACTIC platform that implements
the NextPriorityConcept algorithm to generate formal concepts for heterogeneous and
complex data.
   We explained the different types of plugins (reader, characteristic, description, strategy,
measure) and we specified the methods to implement new ones. Then we presented some
interesting plugins that we had developed, that can be used either to mine data or to develop
new plugins. We are currenly working for plugins on time series.
   Inspired from pattern structures, the generated lattices are often smaller with more relevant
concepts thanks to the notion of strategy, and a user driven approach to discover patterns is
possible, in an interactive way for instance. The GALACTIC engine is data agnostic since it only
handles predicates generated by strategies and descriptions for each type of characteristics.
   We are currently working on adding a graphical interface and the possibility of changing
strategies on demand. This will ease the use of the GALACTIC platform for non-IT users. Future
work will also be devoted to the extraction of minimal generators and logical rules from the
concepts.


References
[1] B. Ganter, S. Kuznetsov, Pattern structures and their projections, in: LNCS of International
    Conference on Conceptual Structures (ICCS’01), 2001, pp. 129–142.
[2] S. Ferré, O. Ridoux, A logical generalization of formal concept analysis, LNCS 1867 (2000)
    371–384.
[3] Ch. Demko, K. Bertet, C. Faucher, J.-F. Viaud, S. O. Kuznetsov, NextPriorityConcept:
    A new and generic algorithm computing concepts from complex and heterogeneous data,
    Theoretical Computer Science 845 (2020) 1–20. doi:10.1016/j.tcs.2020.08.026 .
[4] S. E. Boukhetta, C. Demko, K. Bertet, J. Richard, C. Cayèré, Temporal sequence mining
    using fca and galactic, in: International Conference on Conceptual Structures, Springer,
    2021, pp. 185–199.
[5] W. V. O. Quine, The problem of simplifying truth functions, The American Mathematical
    59 (1952) 521–531.
[6] M. Kaytoue, V. Codocedo, A. Buzmakov, J. Baixeries, S. Kuznetsov, A. Napoli, Pattern
    structures and concept lattices for data mining and knowledge processing, in: In Proceedings
    of ECML-PKDDl, 2015, pp. 227–231.
Appendix


                                                galactic.characteristics

                                                          Characteristic




                                      myplugin

                                                         MyCharacteristic                          The characteristic
                                                                                                   is a callable objet
                                           __eq__(self, other: Any): bool                          able to extract
                                           __hash__(self): int                                     a value from
                                           __str__(self): str                                      any python object.
                                           __call__(self, individual: Any): Any




Figure 3: characteristic plugin

                                                                                                    galactic.characteristics

                                                                                                                Characteristic




                                                                                                                contains

                                                                                     galactic.descriptions

                                                                                                  Description
                                                                                                                                 Predicate
                                                                                        space: Tuple[Characteristic]




                      myplugin

                                                    MyDescription                                  The description can be called
                                                                                                   on any collection of individuals.
                                                                                                   It produces a set of descriptors
                        __eq__(self, other: Any): bool
                        __hash__(self): int                                                        representing the convex hull
                        __call__(self, individuals: Collection[Any]): Iterator[MyPredicate]        of the individuals.


                                                               produces



                                                                MyPredicate
                                                                                                   A predicate defines
                                            __eq__(self, other: Any): bool                         a half-space
                                            __hash__(self): int                                    for the set of the individuals.
                                            __str__(self): str
                                            __call__(self, individual: Any): Optional[bool]




Figure 4: description plugin
                    galactic.strategies

                                       Strategy
                                                                                    A basic strategy should be
                       descriptions: Iterator[Description]                          initialized with a description.




                                                                      mybasicplugin

                                                                                               MyBasicStrategy
                                           BasicStrategy
                                                                         __init__(self, description: Description)
                                                                         selectors(self, concept: Concept): Iterator[Predicate]



                                                                                                          produces


                                                             A basic strategy produces            galactic.descriptions
                                                             a set of selectors
                                                             from a description space.
                                                                                                          Predicate
                                                             Each selector could
                                                             lead to a predecessor
                                                             of the concept



Figure 5: basic strategy plugin

           galactic.strategies

                                      Strategy

                      descriptions: Iterator[Description]




                                    MetaStrategy
                                                                                                          Filter
             strategies: Tuple[Strategy]
                                                                                measure: Measure
             __init__(self, *args: Strategy)
                                                                                __init__(self, *args: Strategy, measure: Measure)
             selectors(self, concept: Concept): Iterator[Predicate]



                                                                                       contains



                                                                      Measure               LimitFilter               SelectionFilter




                                    mymeasureplugin

                                                                   MyMeasure                                       A measure can be called
                                                                                                                   on a concept for a specific
                                       __call__(self, concept: Concept, predicate: Predicate): float               predicate.




Figure 6: measure plugin