FCA4J: A Java Library for Relational Concept Analysis and Formal Concept Analysis

Alain Gutierrez¹,*, Marianne Huchard¹,* and Pierre Martin²,*

¹ LIRMM, Univ Montpellier, CNRS, Montpellier, France
² CIRAD, UPR AIDA, F-34398 Montpellier, France; AIDA, Univ Montpellier, CIRAD, Montpellier, France

Abstract
Formal Concept Analysis (FCA) and its extensions have shown their efficacy and relevance in various application domains. We recently conducted a set of experiments using Relational Concept Analysis and FCA in the domain of agro-ecology. This motivated the development of a library named FCA4J, which includes Java implementations of algorithms to build structures and implications and to manage data, in particular relational context families. This paper presents the main features of FCA4J, its ecosystem, and a few use cases.

Keywords
Formal Concept Analysis, Relational Concept Analysis, software library, Java

1. Introduction

The development of the Knomana knowledge-based system [1], which deals with pesticidal plant use, required the use of Relational Concept Analysis (RCA) to handle its data model, which conforms to the entity-relationship model [2], with the aim of performing knowledge discovery and data mining [3]. The first results obtained using RCAexplore were very relevant [4], but raised new questions to be tackled through the computation of implication sets. As RCAexplore did not offer such computations, and no software computed the Duquenne-Guigues basis of implications [5] on a relational dataset, we initiated the development of a library named FCA4J. It was developed in Java to be cross-platform and to facilitate its distribution. Since then, various types of analysis, including Formal Concept Analysis (FCA) computations, have been produced for Knomana [6, 7], which progressively enriched this library. FCA4J is currently available¹ and distributed under the BSD 3-Clause licence. The objective of this paper is to introduce FCA4J.
Section 2 presents the existing libraries on FCA. The functionalities of FCA4J are presented in Section 3, and examples of usage in Section 4. Section 5 concludes and provides perspectives on FCA4J enrichment.

Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16th International Conference on Concept Lattices and Their Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 207–212.
* Corresponding author.
alain.gutierrez@lirmm.fr (A. Gutierrez); marianne.huchard@lirmm.fr (M. Huchard); pierre.martin@cirad.fr (P. Martin)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
¹ https://www.lirmm.fr/fca4j/

2. Existing FCA libraries

Multiple concrete applications have been developed using FCA [8, 9, 10]. Most of the software supporting these applications is listed in [11], including the following libraries. Colibri and qdfca² both aim to compute a lattice from a formal context. Colibri was initially developed in C (and so must be compiled before it can be run), but a Java release³ was later developed to facilitate its distribution. Running qdfca requires the standard Ruby library. Galicia v3⁴ is a Java library distributed as a jar file, which computes various FCA structures and various rule bases. fcaR [12] is an R package that also handles fuzzy FCA. FCAlib⁵ is a Java library distributed as a jar file. It includes basic FCA algorithms such as implicational closure and implements an attribute exploration algorithm that can be customized according to expert needs. FcaStone⁶ converts between the file formats of commonly used FCA tools and between FCA formats and other graphical formats. Running it requires Perl. The libraries presented above take only a formal context as input to their computations.
Both Galicia v1⁷ and RCAexplore⁸ support RCA. They are developed in Java and distributed as jar files. They aim to compute a family of relational conceptual structures.

² https://rubygems.org/gems/qdfca
³ https://code.google.com/archive/p/colibri-java/
⁴ v3 at https://sourceforge.net/projects/galicia/files/galicia/
⁵ https://github.com/julianmendez/fcalib
⁶ https://github.com/upriss/fcastone
⁷ v1 at http://www.iro.umontreal.ca/~galicia/
⁸ http://dataqual.engees.unistra.fr/logiciels/rcaExplore

3. FCA4J Features

Figure 1 shows the FCA4J features organized along six high-level categories: (1) input file formats, (2) output file formats, (3) pre-processing commands, (4) structure computation commands, (5) Java data structures used to compute structures, and (6) Duquenne-Guigues Basis of Implications (DGBI) computation parameters. This section introduces each of these categories.

Figure 1: An overview of all features of FCA4J

Input and output formats. Several external formats are supported by FCA4J to facilitate interoperability with other software. As FCA input, it supports XML data files, i.e. the XML format from Galicia v3 and CEX from ConExp, and text files, such as SLF (an HTK standard lattice format), CXT (Burmeister's ConImp), and CSV. For the latter, it accepts data provided as Boolean and multi-valued attributes. As FCA output, it generates DOT, XML, JSON, TXT, and CSV files. A file can be converted to another format using the command Convert. Regarding RCA, it supports RCFT (the relational context family file format of RCAexplore), RCFTGZ (a compressed version of RCFT), and RCFAL (a JSON file containing an adjacency list).

Pre-processing command. FCA4J provides advanced commands required for data pre-processing. Applied to a formal context, the command Inspect displays information (size, density, etc.), Binarize applies scaling operations to binarize multi-valued data, and Clarify eliminates any duplicated objects and attributes.
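To make the clarification step concrete, the following self-contained Java sketch removes duplicated objects (rows) and attributes (columns) from a small Boolean context and reports its density. It is only an illustration of the operation under simple assumptions (a non-empty Boolean matrix): the class ClarifySketch and its methods are hypothetical names and are not part of the FCA4J API, whose Clarify and Inspect commands are run from the command line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only: hypothetical names, NOT the FCA4J API. */
public class ClarifySketch {

    /** Fraction of crosses in a non-empty Boolean context (rows = objects, columns = attributes). */
    public static double density(boolean[][] ctx) {
        int crosses = 0;
        for (boolean[] row : ctx) {
            for (boolean cell : row) {
                if (cell) crosses++;
            }
        }
        return (double) crosses / (ctx.length * ctx[0].length);
    }

    /** Keeps the first occurrence of each distinct row, preserving order. */
    public static boolean[][] dropDuplicateRows(boolean[][] m) {
        Set<String> seen = new HashSet<>();
        List<boolean[]> kept = new ArrayList<>();
        for (boolean[] row : m) {
            // Arrays.toString gives a content-based key for the row
            if (seen.add(Arrays.toString(row))) kept.add(row.clone());
        }
        return kept.toArray(new boolean[0][]);
    }

    /** Swaps rows and columns of a non-empty rectangular matrix. */
    public static boolean[][] transpose(boolean[][] m) {
        boolean[][] t = new boolean[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }

    /** Clarification: remove duplicated objects, then duplicated attributes. */
    public static boolean[][] clarify(boolean[][] ctx) {
        boolean[][] distinctObjects = dropDuplicateRows(ctx);
        // Duplicate columns are duplicate rows of the transposed context
        return transpose(dropDuplicateRows(transpose(distinctObjects)));
    }

    public static void main(String[] args) {
        boolean[][] ctx = {
            {true,  true,  false},
            {true,  true,  false},  // same attributes as the first object
            {false, false, true}
        };
        boolean[][] clarified = clarify(ctx);
        System.out.println(clarified.length + " objects x " + clarified[0].length + " attributes");
        System.out.println("density = " + density(clarified));
    }
}
```

On this toy context, the two identical objects and the two identical attributes are merged, leaving a context with 2 objects and 2 attributes.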
Reduce keeps only irreducible objects and attributes. Family, applied to a relational context family, enables the creation and the management of a relational context family using a set of formal contexts and relations.

Computing structure command. FCA4J computes various structures. The command Lattice computes the concept lattice (AddExtent option) or an Iceberg lattice (Iceberg option). The command AocPoset builds the subset of the concept lattice containing only the concepts that introduce an object or an attribute. Various algorithms are available, i.e. Hermes, Ares, Ceres, and Pluton. Irreducible lists irreducible objects and attributes. RuleBasis computes the canonical basis of implications (Duquenne-Guigues). Two algorithms are available, LinCbO [13] and LinCbO with pruning [14, 15]. RCA applies Relational Concept Analysis and creates a family of conceptual structures (AOCposets, Iceberg lattices or concept lattices) from a relational context family.

Java data structures. FCA4J integrates a number of Java data structures through a design pattern, so that the user can select the most appropriate one to perform an FCA or RCA computation. The available data structure types are BoolArray, BitSet, AVLtree, RBTree, IntOrderedSet, and fca4j.ISet. These types come in various implementations: RBTree, for instance, uses either the java.util.TreeSet collection or the FastUtil framework⁹. Other data structures are provided by the Koloboke¹⁰ and hppc¹¹ collections.

⁹ https://fastutil.di.unimi.it

DGBI computation parameters. Various configurations of the DGBI computation are available. The formal context can be clarified, the closure computation can take the history into account, and the computation can run in a single thread or in multiple threads (fork/join pool method).

4. FCA4J ecosystem and use cases

The FCA4J website fully describes the commands presented in Section 3. In addition, its Getting started page provides a dataset and a set of command lines to produce artifacts, e.g.
concept lattice, Iceberg50, AOCposet, Duquenne-Guigues set of implications. This section presents the current ecosystem of FCA4J and then a few use cases.

Ecosystem. FCA4J does not provide any viewer. To navigate within a family of conceptual structures represented as a set of lattices, we developed the online visualiser RCAviz¹² [16]. Figure 2 presents the process that combines FCA4J and RCAviz. The RCFT file containing the data is provided as input to FCA4J, which generates the JSON file (the input file format required by RCAviz) using the command RCA from the terminal. The user then opens this file with RCAviz to navigate within the lattices and from one to another. FCA4J is not driven by RCAviz, in order to keep these applications independent. Moreover, FCA4J was integrated into the Cogui¹³ platform to reason on knowledge by representing the inference rules, expressed as sets of predicates, as an AOCposet. The aim is to perform a lazy evaluation of the predicates, favouring those that are easiest to compute.

Use cases. FCA4J is currently used by students of the Software Engineering Master at Montpellier University in several laboratory works as practical support to learn FCA and RCA, and to study Java class refactoring (using FCA) and UML class model refactoring (using RCA). FCA4J is also used by students during their master internships applied to Knomana. For instance, L. Mahrach used FCA4J to explore the combination of RCA and DGBI with the aim of rendering knowledge suitable to experts [6]. The results showed that the way data are split between tables could have a positive or negative impact on the readability of the implications by the experts. Moreover, J. Saoud identified various types of knowledge elements and patterns that constitute the implications of the DGBI [7]. The results show that post-processing can remove tacit knowledge elements from implications to improve their readability by the experts. An on-going work by N.
Saab evaluates the use of FCA4J to detect and correct various types of anomalies in the Knomana dataset and the associated sets of implications [17].

¹⁰ https://github.com/leventov/Koloboke
¹¹ https://github.com/carrotsearch/hppc
¹² https://rcaviz.lirmm.fr
¹³ https://www.lirmm.fr/cogui/

Figure 2: Process for navigating within a relational context family, provided as an RCFT format file, using FCA4J and RCAviz.

5. Conclusion

Two releases of FCA4J are currently distributed as jar files. The full release contains all the elements presented in this document. The light release does not contain the libraries FastUtil, hppc, and Koloboke, as they are very large. This removal reduces the number of data structures available in FCA4J as well as the size of the distributed jar file, and thus facilitates the integration of FCA4J in an Apache Maven repository.

In the context of a Master student project, we are currently investigating the use of FCA4J commands in a Jupyter notebook [18]. One investigated scenario is to design a workflow involving tools such as scikit-learn¹⁴, Weka¹⁵ or Orange¹⁶, that includes association rule computation and classification of the rules in a concept lattice based on the premise attributes.

Acknowledgments

The authors would like to thank Huaxi (Yulin) Zhang for her co-supervision of the Master project [18]. Part of this work was supported by the French National Research Agency under the Investments for the Future Program (ANR-16-CONV-0004), and through the project SmartFCA (ANR-21-CE23-0023).

¹⁴ https://scikit-learn.org/stable/
¹⁵ https://www.cs.waikato.ac.nz/ml/weka/
¹⁶ https://orangedatamining.com/

References

[1] P. J. Silvie, P. Martin, M. Huchard, P. Keip, A. Gutierrez, S. Sarter, Prototyping a knowledge-based system to identify botanical extracts for plant health in sub-saharan africa, Plants 10 (2021). doi:10.3390/plants10050896.
[2] M. R. Hacene, M. Huchard, A. Napoli, P.
Valtchev, Relational concept analysis: mining concept lattices from multi-relational data, Ann. Math. Artif. Intell. 67 (2013) 81–108.
[3] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, The KDD process for extracting useful knowledge from volumes of data, Commun. ACM 39 (1996) 27–34.
[4] P. Keip, A. Gutierrez, M. Huchard, F. Le Ber, S. Sarter, P. Silvie, P. Martin, Effects of Input Data Formalisation in Relational Concept Analysis for a Data Model with a Ternary Relation, in: Formal Concept Analysis, LNCS, Springer International Publishing, 2019, pp. 191–207.
[5] J. L. Guigues, V. Duquenne, Famille minimale d’implications informatives résultant d’un tableau de données binaires, Math. et Sci. Hum. 24 (1986) 5–18.
[6] L. Mahrach, A. Gutierrez, M. Huchard, P. Keip, P. Marnotte, P. Silvie, P. Martin, Combining implications and conceptual analysis to learn from a pesticidal plant knowledge base, volume 12879 of LNCS, Springer, 2021, pp. 57–72.
[7] J. Saoud, A. Gutierrez, M. Huchard, P. Marnotte, P. Silvie, P. Martin, Explicit versus Tacit Knowledge in Duquenne-Guigues Basis of Implications: Preliminary Results, in: Workshop – Analyzing Real Data with Formal Concept Analysis (RealDataFCA’2021), 2021, pp. 20–27.
[8] S. Ferré, M. Kaytoue, M. Huchard, S. O. Kuznetsov, A. Napoli, Formal concept analysis: From Knowledge discovery to Knowledge processing, in: A Guided Tour of Artificial Intelligence Research, volume II, Springer, 2018.
[9] J. Poelmans, D. I. Ignatov, S. O. Kuznetsov, G. Dedene, Formal concept analysis in knowledge processing: A survey on applications, Expert Syst. Appl. 40 (2013) 6538–6560.
[10] P. Cordero, M. Enciso, D. López, A. Mora, A conversational recommender system for diagnosis using fuzzy rules, Exp. Syst. App. 154 (2020) 113449.
[11] U. Priss, Formal Concept Analysis Home page: FCA Software, https://upriss.github.io/fca/fcasoftware.html, 2022. [Last access 30-April-2022].
[12] P. Cordero, M. Enciso, D.
López-Rodríguez, Á. Mora, fcaR, Formal Concept Analysis with R, The R Journal (2022) Accepted.
[13] R. Janostik, J. Konecny, P. Krajča, LinCbO: Fast algorithm for computation of the Duquenne-Guigues basis, arXiv:2011.04928, 2020.
[14] R. Janostik, J. Konecny, P. Krajča, Pruning techniques in LinCbO for computation of the Duquenne-Guigues basis, in: The 16th Int. Conf. on Formal Concept Analysis (ICFCA) 2021, number 12733 in LNCS/LNAI, Springer, 2021.
[15] R. Janostik, J. Konecny, P. Krajča, LinCbO: Fast algorithm for computation of the Duquenne-Guigues basis, Inf. Sci. 572 (2021) 223–240.
[16] E. Muller, M. Huchard, P. Martin, P. Poncelet, A. Sallaberry, RCAviz: Visualizing and Exploring Relational Conceptual Structures, in: Proc. of CLA 2022 - 16th Int. Conf. on Concept Lattices and Their Applications, 2022, pp. 135–148.
[17] N. Saab, M. Huchard, P. Martin, Evaluating Formal Concept Analysis Software for Anomaly Detection and Correction, in: ETAFCA 2022: Existing Tools and Applications for Formal Concept Analysis, Workshop @ CLA 2022, 2022, pp. 215–220. URL: https://cs.ttu.ee/events/etafca-2022/.
[18] T. Bros, R. Haoulani, J. C. Alla, N. Seoudi, FCA Notebook - Master 1st year - Research initiation report, Master’s thesis, Université de Montpellier, France, 2022. In French, supervision: Huaxi (Yulin) Zhang, Marianne Huchard.