<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GALACTIC: towards a generic and scalable platform for complex and heterogeneous data using Formal Concept Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>ChristopheDemko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Salah Eddine Boukhett</string-name>
          <email>salah.boukhetta@univ-lr.f</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jérémy Richard</string-name>
          <email>jeremy.richard2@univ-lr.f</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillaume Savarit</string-name>
          <email>guillaume.savarit1@univ-lr.f</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karell Berte</string-name>
          <email>karell.bertet@univ-lr.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cyril Faucher</string-name>
          <email>yril.faucher@univ-lr.f</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Damien Mondou</string-name>
          <email>damien.mondou@univ-lr.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop Proceedings</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Formal Concept Analysis, Pattern Structures, Complex Data, Heterogeneous data</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Laboratory L3i, La Rochelle University</institution>
          ,
          <addr-line>La Rochelle</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In a recent paper, we presented a new pattern discovery algoriNtehxmt,PriorityConcept, in order to take into account complex and heterogeneous data using Formal Concept Analysis. We implemented this algorithm and developed apython 3 library whose acronymGALACTIC means Galois LatticesC,oncept Theory,Implicational systems andClosures. It is opened to the community using a BSD-3 license and its architecture allows the writing of plugins to take into account new datatypes. In this article we will present the architecture of our software solution, we will explain how to add new plugins to the core of our system by giving the UML diagram of each kind of plugins and we will give some examples of plugins developed within our team.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        extend the approach of pattern structu1r]easn[d logical concept analysis2[]. It’s a development
platform for a generic implementation of NthexetPriorityConcept[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] algorithm allowing
easy integration of new plugins for characteristics, descriptions, strategies and meta-strategies.
The GALACTIC eco-system is organized with:
      </p>
      <p>predicates;
• A core which implements theNextPriorityConcept algorithm and a lot of tools for
visualizing lattices and reduced contexts in python notebooks;
• A set ofcharacteristic plugins defining new types of data;
• A set ofdescription plugins defining new types of descriptions and their associated
• A set ofstrategy plugins defining new types of strategies for a given characteristic;</p>
      <p>Strategy
Data reader</p>
      <p>Next Priority Concept
• A set ofmeasure plugins useful for the filter meta-strategies;
• A set ofdata reader plugins allowingGALACTIC to read any type of data file;
• A set ofapplications using the core library and the diferent plugins;
• A set oflocalization plugins for translating the diferent applications.</p>
      <p>Each plugin must register with the core library by declaring an entry point in the configuration
ifle of the setuptools1 (setup.py) named
py_galactic_extension. The declared function informs the library that a new extension is
available.</p>
      <p>The construction of the lattice is carried out as showFing.i1n: the data are read from
a file; characteristics are extracted; a description is produced for each concept; strategies
generate selectors for the exploration of potential new concepts NanedxtPhreiorityConcept
algorithm selects the predecessors and maintains the lattice structure.</p>
    </sec>
    <sec id="sec-2">
      <title>2. GALACTIC platform</title>
      <sec id="sec-2-1">
        <title>2.1. Data reader plugins</title>
        <p>A data reader plugin is responsible for reading a file according to its extension and producing
the list of individuals of the population. It must implement two class methods:
• read(cls, data_file: TextIO): Iterable[Any] that reads a file and produces an
iterable of objects.
• extensions(cls): Iterator[str] that produces the list of file extensions supported
by this plugin.
1https://pypi.org/project/setuptools/</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Characteristic plugins</title>
        <p>Each concept is composed of a subset of objects together with a set of predicates describing
them, each predicate being specific to one type of characteristics. Such generic use of predicates
makes it possible to consider heterogeneous data as input, i.e. numeric, discrete or more complex
data.</p>
        <p>A characteristic plugin (Fcifg. 3 in appendix) is responsible of extracting a value from a
python object. It must implement t_h_ecall__ magic method which will be applied to each
individual of the population. This method should return the characteristic of the individual. It
is also reasonable to implement the other magic meth_o_desq__, __hash__, __str__.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Description plugins</title>
        <p>The algorithm introduces the notiondeoscfription  as an application to provide predicates
describing a set of objects according to their characteristics, that corresponds to a concept
(, ()) . At each iteration, predicates describing the ob jecotfsthe current concept are
computed ”on the fly” by a specific treatment for each type of characteristics, depending on
whether it is numeric, discrete or more complex, and the final descri ptiisotnhe union of these
predicates. In order to obtain a lattice, the description must()ve⊑ri(f y ′) for ′ ⊆  .
NextPriorityConcept computes the concept lattice of the context composed of individuals in
rows and predicates issued from the description in colum3]n.s[</p>
        <p>A description plugin (cFfig. 4 in appendix) is used to describe a collection of individuals using
a set of predicates. It usually defines a new predicate class and a new description class. The
description class is responsible of calculating a set of descriptors representing the convex hull of
a collection of individuals. These descriptors must be predicates that describe half-spaces on the
set of individuals. The convex hull is represented by the intersection of the set of descriptors.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Strategy plugins</title>
        <p>The algorithm also introduces the notionstroaftegy  to provide selectors generating the
predecessors of a concep(t, ()) . The selectors propose a way to refine or cut the description
() . The purpose of a strategy plugin is then to produce a set of selectors restricting the set
of individuals in order to obtain new potential concepts containing fewer individuals. The
NextPriorityConcept algorithm selects among the set of selectors produced by the strategies
that generates new efective concepts (i.e. with a reduction of individuals). The lattice property
is maintained using propagation of cross and residuals constraints and a level by level generation
of concepts using a priority queu3e].[</p>
        <sec id="sec-2-4-1">
          <title>2.4.1. Basic strategies</title>
          <p>A basic strategy plugin proposes selectors to NtehxetPriorityConcept algorithm. These
selectors act as cut into the set of individuals. A basic strategy must be initialized with a
description and must implement tsheelectors method (cfFig. 5 in appendix).</p>
        </sec>
        <sec id="sec-2-4-2">
          <title>2.4.2. Meta strategies and measure plugins</title>
          <p>The core of theGALACTIC library defines two meta-strategies that act as filters for other
strategies:
• LimitFilter which selects predecessors whose measure is above/below a threshold;
• SelectionFilter which selects the best/worst predecessors relatively to a measure.</p>
          <p>They are initialized with a set of strategies and with a measure. A measure is a class for
measuring a predicate on a concept. It must implement__tchaell__ magic method with a
concept and a predicate as parametersF(cigf. 6 in appendix).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Some particular plugins</title>
      <p>We implemented some basic description and strategies plugins to manage boolean, categorical,
numerical characteristic, but also more complex characteristics such as temporal seq4u].ences[</p>
      <sec id="sec-3-1">
        <title>3.1. Logical description and strategy plugin</title>
        <p>When characteristics are of the same tyNpex,tPriorityConcept ofers the possibility to
process them separately or together. An immediate way to process with several characteristics
together would be to merge the predicates obtained in individual cases, both for the descriptions
and for the strategies. But it is possible to obtain more relevant predicates by using a specific
process.</p>
        <p>For boolean characteristics for example, the classical FCA approach describes a set of objects
 thanks to the set of attribu t=es{ ∶  ∈</p>
        <p>and  possesses } and the strategy would
consider as selectors the set of all the attributes that are n.oTtheisne two sets described by
predicates of the form ”possesses attribu”tweould respectively corresponds t()o
and  ()
.</p>
        <p>It is possible to consider other descriptio ns boyf the introduction of negative attributes,
for example, the disjunction of clauses or its minimalization.</p>
        <p>⋁ ⋀ {
∈


if  possesses 
otherwise
(1)</p>
        <p>
          We defined plugins allowing to manage several boolean variables and their negations in the
same description using the well-known Qui-nMecCluskey’s dual algorithm (or the method of
prime implicants) that computes the minimization of a set of disjunctive clauses with a time
complexity in(3  log ) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] where is the number of attributes. For example, using the context
defined in Fig 2, the concept containing the even numbers is defined{b, y,̄ + , ̄ + }
and
the concept containing the odd numbers is defined{b, ȳ,  +̄ ,  + }
.
        </p>
        <p>The basic strategy associated to that description consists in simply adding a boolean variable
or its negation to a concept to generate its potential predecessors.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Numerical description and strategy plugin</title>
        <p>As already proposed in pattern structure the6o],riyt[ is possible to describe a set of poi ntsof
ℝ using the usual convex hull represented by a polytope, the descript()ionis then composed
of predicates describing the borders of the convex hull. For points in two and three dimensions,
output-sensitive algorithms are known to compute the convex hull in( timloge ) , where
is the number of points.</p>
        <p>We defined a description calculating the convex hull, and two strategies for cutting the
 -dimensional space by projecting the data on the main axis of inertia:
• we can define two predicates, ≥  −  and  ≤  +  where is a parameter of this
strategy , is the mean and is the standard deviation;
• we can define a set of similar predicates by using the quantiles of the numerical data. The
number of quantiles is a parameter of the strategy.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Entropy measure plugin</title>
        <p>The measure plugins are mainly used in thLeimitFilter and SelectionFilter
metastrategies. They must give a numerical value evaluating a predicate in relation to a concept.
In the core, we developed the confidence, support and cardinality measures and we defined a
plugin for evaluating the entropy relatively to a categorical characteristic, i.e. a class attribute.</p>
        <p>For a given concept(, ) , a predicate that select only some individuals fro mand a
categorical characterisΩti,cthe entropy  (, , Ω) is defined by:</p>
        <p>( ′, Ω) =
  (, , Ω) =  ({ ∈ |()}, Ω) + (1 − ) ({ ∈ |¬()}, Ω)
 ( ′, Ω) =
∑   ( ′, Ω)(  ( ′, Ω))
∈ Ω
|{ ∈  ′|Ω() = }|</p>
        <p>
          | ′|
where is a [
          <xref ref-type="bibr" rid="ref1">0, 1</xref>
          ] parameter an d  ( ′, Ω) is the ratio of individual in′ that have their
characteristiΩc equal to .
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this article, we described the software architecture ofGoAuLrACTIC platform that implements
the NextPriorityConcept algorithm to generate formal concepts for heterogeneous and
complex data.</p>
      <p>We explained the diferent types of plugins (reader, characteristic, description, strategy,
measure) and we specified the methods to implement new ones. Then we presented some
interesting plugins that we had developed, that can be used either to mine data or to develop
new plugins. We are currenly working for plugins on time series.</p>
      <p>Inspired from pattern structures, the generated lattices are often smaller with more relevant
concepts thanks to the notion of strategy, and a user driven approach to discover patterns is
possible, in an interactive way for instance. TGhAeLACTIC engine is data agnostic since it only
handles predicates generated by strategies and descriptions for each type of characteristics.</p>
      <p>We are currently working on adding a graphical interface and the possibility of changing
strategies on demand. This will ease the use of tGhAeLACTIC platform for non-IT users. Future
work will also be devoted to the extraction of minimal generators and logical rules from the
concepts.
descriptions: Iterator[Description]
mybasicplugin</p>
      <p>produces
galactic.descriptions
galactic.strategies
descriptions: Iterator[Description]</p>
      <p>MetaStrategy
strategies: Tuple[Strategy]</p>
      <p>Filter
measure: Measure</p>
      <p>contains
Measure</p>
      <p>LimitFilter</p>
      <p>SelectionFilter
mymeasureplugin</p>
      <p>MyMeasure</p>
      <p>A measure can be called
on a concept for a specific
predicate.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ganter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <article-title>Pattern structures and their projections</article-title>
          ,
          <source>in: LNCS of International Conference on Conceptual Structures (ICCS'01)</source>
          ,
          <year>2001</year>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ridoux</surname>
          </string-name>
          ,
          <article-title>A logical generalization of formal concept analysis</article-title>
          ,
          <source>LNCS</source>
          <year>1867</year>
          (
          <year>2000</year>
          )
          <fpage>371</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Ch. Demko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Bertet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Faucher</surname>
            ,
            <given-names>J.-F.</given-names>
          </string-name>
          <string-name>
            <surname>Viaud</surname>
            ,
            <given-names>S. O.</given-names>
          </string-name>
          <article-title>KuznetsoNv,extPriorityConcept: A new and generic algorithm computing concepts from complex and heterogeneous data</article-title>
          ,
          <source>Theoretical Computer Science</source>
          <volume>845</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
          <year>d1o0i</year>
          .: 1016/j.tcs.
          <year>2020</year>
          .
          <volume>08</volume>
          .026.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Boukhetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Demko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bertet</surname>
          </string-name>
          , J. Richard, C. Cayèré,
          <article-title>Temporal sequence mining using fca and galactic</article-title>
          , in: International Conference on Conceptual Structures, Springer,
          <year>2021</year>
          , pp.
          <fpage>185</fpage>
          -
          <lpage>199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W. V. O.</given-names>
            <surname>Quine</surname>
          </string-name>
          ,
          <article-title>The problem of simplifying truth functions</article-title>
          ,
          <source>The American Mathematical</source>
          <volume>59</volume>
          (
          <year>1952</year>
          )
          <fpage>521</fpage>
          -
          <lpage>531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaytoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Codocedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Buzmakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baixeries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Pattern structures and concept lattices for data mining and knowledge processing</article-title>
          ,
          <source>in: In Proceedings of ECML-PKDDl</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>