=Paper= {{Paper |id=Vol-3308/paper17 |storemode=property |title=FCA4J: A Java Library for Relational Concept Analysis and Formal Concept Analysis |pdfUrl=https://ceur-ws.org/Vol-3308/Paper17.pdf |volume=Vol-3308 |authors=Alain Gutierrez,Marianne Huchard,Pierre Martin |dblpUrl=https://dblp.org/rec/conf/cla/GutierrezH022 }} ==FCA4J: A Java Library for Relational Concept Analysis and Formal Concept Analysis== https://ceur-ws.org/Vol-3308/Paper17.pdf
FCA4J: A Java Library for Relational Concept Analysis
and Formal Concept Analysis
Alain Gutierrez¹,*, Marianne Huchard¹,* and Pierre Martin²,*
¹ LIRMM, Univ Montpellier, CNRS, Montpellier, France
² CIRAD, UPR AIDA, F-34398 Montpellier, France; AIDA, Univ Montpellier, CIRAD, Montpellier, France


Abstract
Formal Concept Analysis (FCA) and its extensions have shown their efficacy and relevance in various
application domains. We recently conducted a set of experiments using Relational Concept Analysis and
FCA in the domain of agro-ecology. This motivated the development of a library named FCA4J, which
includes Java implementations of algorithms to build structures and implications and to manage data, in
particular relational context families. This paper presents the main features of FCA4J, its ecosystem, and
a few use cases.

Keywords
Formal Concept Analysis, Relational Concept Analysis, software library, Java




1. Introduction
The development of the Knomana knowledge-based system [1], which deals with pesticidal plant
use, required the use of Relational Concept Analysis (RCA) to take into account the
entity-relationship model [2] to which its data model conforms, with the aim of performing
knowledge discovery and data mining [3]. The first results obtained using RCAexplore were very
relevant [4], but raised new questions that had to be tackled by computing implication sets.
As RCAexplore did not offer such calculations, and no software computed the Duquenne-Guigues
basis of implications [5] on a relational dataset, we initiated the development of a library named
FCA4J. The latter was developed in Java to be cross-platform and to facilitate its distribution.
Since then, various types of analysis, including Formal Concept Analysis (FCA) computations,
have been produced for Knomana [6, 7], which progressively enriched this library. FCA4J is
currently available¹ and distributed under the BSD3 licence.
   The objective of this paper is to introduce FCA4J. Section 2 presents the existing libraries on
FCA. The functionalities of FCA4J are presented in Section 3, and examples of usage in Section 4.
Section 5 concludes and provides perspectives on FCA4J enrichment.



Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16th International Conference on Concept Lattices and Their
Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 207–212.
* Corresponding author.
Email: alain.gutierrez@lirmm.fr (A. Gutierrez); marianne.huchard@lirmm.fr (M. Huchard); pierre.martin@cirad.fr (P. Martin)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073




¹ https://www.lirmm.fr/fca4j/

2. Existing FCA libraries
Multiple concrete applications have been developed using FCA [8, 9, 10]. Most of the software
tools that support these applications are listed in [11], including the following libraries.
  Colibri and qdfca² both aim to compute a lattice from a formal context. Colibri was initially
developed in C (and thus has to be compiled before being run), but a Java release³
was later developed to facilitate its distribution. Running qdfca requires the standard
Ruby library. Galicia v3⁴ is a Java library distributed as a jar file, which computes various
FCA structures and various rule bases. fcaR [12] is an R package that also handles fuzzy
FCA. FCAlib⁵ is a Java library distributed as a jar file. It includes basic FCA algorithms
such as implicational closure and implements an attribute exploration algorithm that can be
customized according to expert needs. FcaStone⁶ converts between the file formats of
commonly used FCA tools, and between FCA formats and other graphical formats. Running it
requires Perl. The libraries presented above consider only formal contexts in
their computations. Both Galicia v1⁷ and RCAexplore⁸ support RCA. They are developed in Java
and distributed as jar files. They aim to compute a family of relational conceptual structures.


3. FCA4J Features
Figure 1 shows the FCA4J features organized along six high-level categories: (1) input file formats,
(2) output file formats, (3) pre-processing commands, (4) structure computation commands, (5) Java
data structures used to compute structures, and (6) Duquenne-Guigues Basis of Implications
(DGBI) computation parameters. This section introduces each of these categories.

Input and output formats Several external formats are supported by FCA4J to facilitate
interoperability with other software. As FCA input, it supports XML data files, i.e. the XML format
of Galicia v3 and CEX from ConExp, and text files, such as SLF (an HTK standard lattice
format), CXT (Burmeister's ConImp), and CSV. For the latter, data can be provided as Boolean
or multi-valued attributes. As FCA output, it generates DOT, XML, JSON, TXT, and CSV
files. A file can be converted to another format using the command Convert.
Regarding RCA, it supports RCFT (the relational context family file format of RCAexplore),
RCFTGZ (a compressed version of RCFT), and RCFAL (a JSON file containing an adjacency
list).
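To make the idea of format conversion concrete, the following minimal sketch reads a formal context in the Burmeister CXT layout and writes it back as a 0/1 CSV table. It is not FCA4J's parser or its Convert command: the class name CxtSketch and its methods are hypothetical, and the sketch assumes the common CXT layout (a line "B", a possibly empty name line, the object count, the attribute count, a blank line, the object names, the attribute names, then one row of X and . characters per object).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Illustrative reader for the Burmeister CXT layout; not FCA4J's own parser. */
final class CxtSketch {
    final List<String> objects = new ArrayList<>();
    final List<String> attributes = new ArrayList<>();
    final List<BitSet> rows = new ArrayList<>(); // rows.get(i): attributes owned by object i

    /** Parses "B", name line, object count, attribute count, blank line, names, X/. rows. */
    static CxtSketch parse(String text) {
        String[] lines = text.split("\\R", -1);
        if (lines.length < 5 || !lines[0].trim().equals("B")) {
            throw new IllegalArgumentException("missing 'B' header");
        }
        int nObj = Integer.parseInt(lines[2].trim());
        int nAtt = Integer.parseInt(lines[3].trim());
        int pos = 5; // skip header, name, the two counts and the blank separator
        if (lines.length < pos + 2 * nObj + nAtt) {
            throw new IllegalArgumentException("truncated context");
        }
        CxtSketch ctx = new CxtSketch();
        for (int i = 0; i < nObj; i++) ctx.objects.add(lines[pos++].trim());
        for (int j = 0; j < nAtt; j++) ctx.attributes.add(lines[pos++].trim());
        for (int i = 0; i < nObj; i++) {
            String row = lines[pos++].trim();
            if (row.length() != nAtt) {
                throw new IllegalArgumentException("row " + i + " has the wrong width");
            }
            BitSet bits = new BitSet(nAtt);
            for (int j = 0; j < nAtt; j++) {
                if (row.charAt(j) == 'X' || row.charAt(j) == 'x') bits.set(j);
            }
            ctx.rows.add(bits);
        }
        return ctx;
    }

    /** Writes the context as a 0/1 CSV table with a header row of attribute names. */
    String toCsv() {
        StringBuilder sb = new StringBuilder();
        for (String a : attributes) sb.append(',').append(a);
        sb.append('\n');
        for (int i = 0; i < objects.size(); i++) {
            sb.append(objects.get(i));
            for (int j = 0; j < attributes.size(); j++) {
                sb.append(',').append(rows.get(i).get(j) ? '1' : '0');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Calling CxtSketch.parse(text).toCsv() on a small CXT file yields one CSV line per object. The CSV dialect actually read and written by FCA4J may differ from this one.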

² https://rubygems.org/gems/qdfca
³ https://code.google.com/archive/p/colibri-java/
⁴ v3 at https://sourceforge.net/projects/galicia/files/galicia/
⁵ https://github.com/julianmendez/fcalib
⁶ https://github.com/upriss/fcastone
⁷ v1 at http://www.iro.umontreal.ca/~galicia/
⁸ http://dataqual.engees.unistra.fr/logiciels/rcaExplore

Figure 1: An overview of all features of FCA4J

Pre-processing commands FCA4J provides advanced commands for data pre-processing.
Applied to a formal context, the command Inspect displays information (size, density,
etc.), Binarize applies scaling operations to binarize multi-valued data, and Clarify eliminates
any duplicated objects and attributes. Reduce keeps only irreducible objects and attributes.
Family, applied to a relational context family, enables the creation and the management of a
relational context family using a set of formal contexts and relations.
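As an illustration of what inspection and clarification compute, the sketch below derives the density of a context and removes duplicated objects and attributes. It is a simplified stand-in and not the code behind the Inspect or Clarify commands. The class PreprocessSketch and its methods are hypothetical names, and a context is assumed to be stored as one java.util.BitSet of attribute indices per object.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative pre-processing helpers; each row is the attribute set of one object. */
final class PreprocessSketch {
    /** Convenience constructor for a BitSet holding the given indices. */
    static BitSet bits(int... idx) {
        BitSet b = new BitSet();
        for (int i : idx) b.set(i);
        return b;
    }

    /** Fraction of filled cells in the (objects x attributes) incidence table. */
    static double density(List<BitSet> rows, int nAtt) {
        if (rows.isEmpty() || nAtt == 0) return 0.0;
        long crosses = 0;
        for (BitSet r : rows) crosses += r.cardinality();
        return (double) crosses / ((long) rows.size() * nAtt);
    }

    /** Object clarification: keeps the first occurrence of each distinct row. */
    static List<BitSet> clarifyObjects(List<BitSet> rows) {
        Set<BitSet> seen = new LinkedHashSet<>();
        for (BitSet r : rows) seen.add((BitSet) r.clone());
        return new ArrayList<>(seen);
    }

    /** Swaps rows and columns of a table whose rows use indices below width. */
    static List<BitSet> transpose(List<BitSet> rows, int width) {
        List<BitSet> cols = new ArrayList<>();
        for (int j = 0; j < width; j++) cols.add(new BitSet(rows.size()));
        for (int i = 0; i < rows.size(); i++) {
            for (int j = 0; j < width; j++) {
                if (rows.get(i).get(j)) cols.get(j).set(i);
            }
        }
        return cols;
    }

    /** Attribute clarification: clarify the columns, then rebuild the rows. */
    static List<BitSet> clarifyAttributes(List<BitSet> rows, int nAtt) {
        List<BitSet> columns = clarifyObjects(transpose(rows, nAtt));
        return transpose(columns, rows.size());
    }
}
```

Reduction goes one step further than clarification by also discarding objects and attributes that are not irreducible; it is not covered by this sketch.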

Computing structure commands FCA4J computes various structures. The command Lattice
computes the concept lattice (AddExtent option) or an Iceberg lattice (Iceberg option). The command
AocPoset builds the subset of the concept lattice restricted to the concepts introducing an
object or an attribute. Several algorithms are available, namely Hermes, Ares, Ceres, and Pluton.
Irreducible lists the irreducible objects and attributes. RuleBasis computes the canonical basis of
implications (Duquenne-Guigues). Two algorithms are available: LinCbO [13] and LinCbO with
pruning [14, 15]. RCA applies Relational Concept Analysis and creates a family of conceptual
structures (AOCposets, Iceberg lattices or concept lattices) from a relational context family.
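To recall what a concept lattice and an Iceberg lattice contain, the following sketch enumerates all concept intents of a small context and then keeps those whose extent reaches a minimum support. It uses the textbook intersection-based enumeration, which is deliberately naive; it is neither AddExtent nor any of the algorithms shipped with FCA4J. The class ConceptSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Naive concept enumeration; each row is the attribute set of one object. */
final class ConceptSketch {
    /** Convenience constructor for a BitSet holding the given indices. */
    static BitSet bits(int... idx) {
        BitSet b = new BitSet();
        for (int i : idx) b.set(i);
        return b;
    }

    /** All intents: the full attribute set closed under intersection with object rows. */
    static Set<BitSet> intents(List<BitSet> rows, int nAtt) {
        Set<BitSet> result = new LinkedHashSet<>();
        BitSet all = new BitSet(nAtt);
        all.set(0, nAtt);
        result.add(all);
        for (BitSet row : rows) {
            List<BitSet> fresh = new ArrayList<>();
            for (BitSet intent : result) {
                BitSet meet = (BitSet) intent.clone();
                meet.and(row);
                fresh.add(meet);
            }
            result.addAll(fresh); // duplicates are discarded by the set
        }
        return result;
    }

    /** Extent of an intent: indices of the objects owning every attribute of the intent. */
    static BitSet extent(BitSet intent, List<BitSet> rows) {
        BitSet ext = new BitSet(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            BitSet missing = (BitSet) intent.clone();
            missing.andNot(rows.get(i));
            if (missing.isEmpty()) ext.set(i);
        }
        return ext;
    }

    /** Iceberg filter: keeps intents whose extent covers at least minSupport of the objects. */
    static List<BitSet> icebergIntents(List<BitSet> rows, int nAtt, double minSupport) {
        List<BitSet> kept = new ArrayList<>();
        for (BitSet intent : intents(rows, nAtt)) {
            if (extent(intent, rows).cardinality() >= minSupport * rows.size()) {
                kept.add(intent);
            }
        }
        return kept;
    }
}
```

With a minimum support of 0.5, only the concepts shared by at least half of the objects are kept. The enumeration stores every intent in memory and is quadratic in the number of concepts, so it is only suitable for small examples.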

Java data structures FCA4J integrates most of the relevant Java data structures through a design
pattern, in order to enable the user to select the most appropriate one to perform an FCA or RCA
computation. The available data structure types are BoolArray, BitSet, AVLtree, RBTree,
IntOrderedSet, and fca4j.ISet. Some of these types come with several implementations, such as
RBTree, which uses either the java.util.TreeSet collection or the FastUtil framework⁹. Other data
structures are provided by the Koloboke¹⁰ and hppc¹¹ collections.

⁹ https://fastutil.di.unimi.it
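The selection of a set implementation behind a common interface can be sketched as follows. The paper does not name the design pattern used by FCA4J, so this example assumes a simple factory. All names below (IntSetSketch, BitSetBacked, TreeSetBacked, SetFactorySketch) are hypothetical and do not reflect the actual fca4j.ISet API.

```java
import java.util.BitSet;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical minimal set abstraction; the real fca4j.ISet interface may differ. */
interface IntSetSketch {
    void add(int x);
    boolean contains(int x);
    int size();
}

/** Dense representation: one bit per possible non-negative element. */
final class BitSetBacked implements IntSetSketch {
    private final BitSet bits = new BitSet();
    public void add(int x) { bits.set(x); }
    public boolean contains(int x) { return bits.get(x); }
    public int size() { return bits.cardinality(); }
}

/** Sparse representation: java.util.TreeSet is backed by a red-black tree. */
final class TreeSetBacked implements IntSetSketch {
    private final TreeSet<Integer> items = new TreeSet<>();
    public void add(int x) { items.add(x); }
    public boolean contains(int x) { return items.contains(x); }
    public int size() { return items.size(); }
}

/** Maps an implementation name, e.g. taken from a command-line option, to a constructor. */
final class SetFactorySketch {
    static Supplier<IntSetSketch> forName(String name) {
        switch (name) {
            case "BitSet": return BitSetBacked::new;
            case "RBTree": return TreeSetBacked::new;
            default: throw new IllegalArgumentException("unknown implementation: " + name);
        }
    }
}
```

An algorithm written against IntSetSketch obtains its sets through SetFactorySketch.forName(name).get(), so that switching from a dense to a sparse representation does not require changing the algorithm itself.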

DGBI computation parameters Various configurations of the DGBI computation are available.
The formal context can be clarified, the closure computation can take the history into account, and
the calculation can run in a single thread or in multiple threads (using a fork/join pool).
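The closure operation at the heart of implication basis computations, and the choice between single-threaded and multi-threaded execution, can be illustrated with the sketch below. It computes the closure of an attribute set under a list of implications with a naive fixpoint loop, without the history-based optimisation mentioned above, and optionally distributes independent closure computations over a java.util.concurrent.ForkJoinPool. It is not the LinCbO implementation of FCA4J; ClosureSketch, Implication and closeAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

/** Naive closure under implications, run sequentially or on a fork/join pool. */
final class ClosureSketch {
    /** An implication premise -> conclusion over attribute indices. */
    static final class Implication {
        final BitSet premise;
        final BitSet conclusion;
        Implication(BitSet premise, BitSet conclusion) {
            this.premise = premise;
            this.conclusion = conclusion;
        }
    }

    static BitSet bits(int... idx) {
        BitSet b = new BitSet();
        for (int i : idx) b.set(i);
        return b;
    }

    static boolean subset(BitSet a, BitSet b) {
        BitSet rest = (BitSet) a.clone();
        rest.andNot(b);
        return rest.isEmpty();
    }

    /** Applies the rules repeatedly until no rule adds a new attribute. */
    static BitSet close(BitSet start, List<Implication> rules) {
        BitSet closed = (BitSet) start.clone();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Implication r : rules) {
                if (subset(r.premise, closed) && !subset(r.conclusion, closed)) {
                    closed.or(r.conclusion);
                    changed = true;
                }
            }
        }
        return closed;
    }

    /** Closes every start set; uses a fork/join pool when threads is greater than 1. */
    static List<BitSet> closeAll(List<BitSet> starts, List<Implication> rules, int threads) {
        List<BitSet> out = new ArrayList<>();
        if (threads <= 1) {
            for (BitSet s : starts) out.add(close(s, rules));
            return out;
        }
        List<Callable<BitSet>> tasks = new ArrayList<>();
        for (BitSet s : starts) tasks.add(() -> close(s, rules));
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            // invokeAll returns the futures in the same order as the tasks
            for (Future<BitSet> f : pool.invokeAll(tasks)) out.add(f.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("closure computation failed", e);
        } finally {
            pool.shutdown();
        }
        return out;
    }
}
```

Both execution modes return the same closures in the same order; only the scheduling differs. Whether multi-threading pays off depends on the number and size of the closures to compute.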


4. FCA4J ecosystem and use case
The FCA4J website fully describes the commands presented in Section 3. In addition, its Getting
started page provides a dataset and a set of command lines to produce artifacts, e.g. concept
lattice, Iceberg50, AOCposet, Duquenne-Guigues set of implications. This section presents the
current ecosystem of FCA4J and then a few use cases.

Ecosystem FCA4J does not provide any viewer. To navigate within a conceptual structure
family represented as a set of lattices, we developed the online visualiser RCAviz¹² [16]. Figure
2 presents the process that combines FCA4J and RCAviz. The RCFT file containing the data
is provided as input to FCA4J, which generates the JSON file (the input file format required by
RCAviz) using the command RCA via the terminal. The user then has to open this file with
RCAviz to navigate within the lattices and from one to another. RCAviz does not drive FCA4J,
so that the two applications remain independent. Moreover, FCA4J was integrated into the
Cogui¹³ platform to reason on knowledge through the representation of the inference rules,
expressed as predicate sets, as an AOCposet. The aim is to perform a lazy evaluation of the
predicates, by favouring the easiest to compute.

Use cases FCA4J is currently used by students of the Software Engineering Master at
Montpellier University in several lab sessions, as practical support to learn FCA and RCA and to
study Java class refactoring (using FCA) and UML class model refactoring (using RCA).
   FCA4J is also used by students during their Master internships applied to Knomana. For
instance, L. Mahrach used FCA4J to explore the combination of RCA and DGBI with the aim
of rendering knowledge suitable to experts [6]. The results showed that the way of splitting
data between tables could have a positive or negative impact on the implications' readability by
the experts. Moreover, J. Saoud identified various types of knowledge elements and patterns
that constitute the implications from the DGBI [7]. The results show that a post-process can be
conducted to remove tacit knowledge elements from implications, to improve their readability
by the experts. Ongoing work by N. Saab evaluates the use of FCA4J to detect and correct
various types of anomalies in the Knomana dataset and the associated sets of implications [17].




¹⁰ https://github.com/leventov/Koloboke
¹¹ https://github.com/carrotsearch/hppc
¹² https://rcaviz.lirmm.fr
¹³ https://www.lirmm.fr/cogui/

Figure 2: Process for navigating within a relational context family, provided as an RCFT file,
using FCA4J and RCAviz.


5. Conclusion
Two releases of FCA4J are currently distributed as jar files. The full release contains all the
elements presented in this document. The light release does not contain the libraries FastUtil,
hppc, and Koloboke as they are very large. This removal reduces the number of data structures
available in FCA4J as well as the size of the distributed jar file, and thus facilitates the
integration of FCA4J in an Apache Maven repository.
   In the context of a Master student project, we are currently investigating the use of FCA4J
commands in a Jupyter notebook [18]. One investigated scenario is to design a workflow involving
tools such as scikit-learn¹⁴, Weka¹⁵ or Orange¹⁶, which includes association rule computation
and classification of the rules in a concept lattice based on the premise attributes.


Acknowledgments
The authors would like to thank Huaxi (Yulin) Zhang for her co-supervision of the Master
project [18]. Part of this work was supported by the French National Research Agency under the
Investments for the Future Program (ANR-16-CONV-0004), and through the project SmartFCA
(ANR-21-CE23-0023).



¹⁴ https://scikit-learn.org/stable/
¹⁵ https://www.cs.waikato.ac.nz/ml/weka/
¹⁶ https://orangedatamining.com/

References
 [1] P. J. Silvie, P. Martin, M. Huchard, P. Keip, A. Gutierrez, S. Sarter, Prototyping a knowledge-
     based system to identify botanical extracts for plant health in Sub-Saharan Africa, Plants
     10 (2021). doi:10.3390/plants10050896.
 [2] M. R. Hacene, M. Huchard, A. Napoli, P. Valtchev, Relational concept analysis: mining
     concept lattices from multi-relational data, Ann. Math. Artif. Intell. 67 (2013) 81–108.
 [3] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, The KDD process for extracting useful
     knowledge from volumes of data, Commun. ACM 39 (1996) 27–34.
 [4] P. Keip, A. Gutierrez, M. Huchard, F. Le Ber, S. Sarter, P. Silvie, P. Martin, Effects of
     Input Data Formalisation in Relational Concept Analysis for a Data Model with a Ternary
     Relation, in: Formal Concept Analysis, LNCS, Springer International Publishing, 2019, pp.
     191–207.
 [5] J. L. Guigues, V. Duquenne, Famille minimale d’implications informatives résultant d’un
     tableau de données binaires, Math. et Sci. Hum. 24 (1986) 5–18.
 [6] L. Mahrach, A. Gutierrez, M. Huchard, P. Keip, P. Marnotte, P. Silvie, P. Martin, Combining
     implications and conceptual analysis to learn from a pesticidal plant knowledge base,
     volume 12879 of LNCS, Springer, 2021, pp. 57–72.
 [7] J. Saoud, A. Gutierrez, M. Huchard, P. Marnotte, P. Silvie, P. Martin, Explicit versus
     Tacit Knowledge in Duquenne-Guigues Basis of Implications: Preliminary Results, in:
     Workshop – Analyzing Real Data with Formal Concept Analysis (RealDataFCA’2021), 2021,
     pp. 20–27.
 [8] S. Ferré, M. Kaytoue, M. Huchard, S. O. Kuznetsov, A. Napoli, Formal concept analysis:
     From Knowledge discovery to Knowledge processing, in: A Guided Tour of Artificial
     Intelligence Research, volume II, Springer, 2018.
 [9] J. Poelmans, D. I. Ignatov, S. O. Kuznetsov, G. Dedene, Formal concept analysis in knowledge
     processing: A survey on applications, Expert Syst. Appl. 40 (2013) 6538–6560.
[10] P. Cordero, M. Enciso, D. López, A. Mora, A conversational recommender system for
     diagnosis using fuzzy rules, Expert Syst. Appl. 154 (2020) 113449.
[11] U. Priss, Formal Concept Analysis Home page: FCA Software,
     https://upriss.github.io/fca/fcasoftware.html, 2022. [Last access 30-April-2022].
[12] P. Cordero, M. Enciso, D. López-Rodríguez, Á. Mora, fcaR, Formal Concept Analysis
     with R, The R Journal (2022) Accepted.
[13] R. Janostik, J. Konecny, P. Krajča, LinCbO: Fast algorithm for computation of the Duquenne-
     Guigues basis, arXiv:2011.04928, 2020.
[14] R. Janostik, J. Konecny, P. Krajča, Pruning techniques in LinCbO for computation of the
     Duquenne-Guigues basis, in: The 16th Int. Conf. on Formal Concept Analysis (ICFCA)
     2021, number 12733 in LNCS/LNAI, Springer, 2021.
[15] R. Janostik, J. Konecny, P. Krajča, LinCbO: Fast algorithm for computation of the Duquenne-
     Guigues basis, Inf. Sci. 572 (2021) 223–240.
[16] E. Muller, M. Huchard, P. Martin, P. Poncelet, A. Sallaberry, RCAviz: Visualizing and
     Exploring Relational Conceptual Structures, in: Proc. of CLA 2022 - 16th Int. Conf. on
     Concept Lattices and Their Applications, 2022, pp. 135–148.
[17] N. Saab, M. Huchard, P. Martin, Evaluating Formal Concept Analysis Software for Anomaly
     Detection and Correction, in: ETAFCA 2022: Existing Tools and Applications for Formal
     Concept Analysis, Workshop @ CLA 2022, 2022, pp. 215–220.
     URL: https://cs.ttu.ee/events/etafca-2022/.
[18] T. Bros, R. Haoulani, J. C. Alla, N. Seoudi, FCA Notebook - Master 1st year - Research
     initiation report, Master's thesis, Université de Montpellier, France, 2022. In French,
     supervision: Huaxi (Yulin) Zhang, Marianne Huchard.