=Paper=
{{Paper
|id=Vol-3308/paper16
|storemode=property
|title=fcaR, Spreading FCA to the Data Science World
|pdfUrl=https://ceur-ws.org/Vol-3308/Paper16.pdf
|volume=Vol-3308
|authors=Pablo Cordero,Manuel Enciso,Domingo López-Rodríguez,Ángel Mora
|dblpUrl=https://dblp.org/rec/conf/cla/CorderoEL022
}}
==fcaR, Spreading FCA to the Data Science World==
<pdf width="1500px">https://ceur-ws.org/Vol-3308/Paper16.pdf</pdf>
<pre>
fcaR, Spreading FCA to the Data Science World
Pablo Cordero (1, ∗), Manuel Enciso (2), Domingo López-Rodríguez (1) and Ángel Mora (1)
(1) Departamento de Matemática Aplicada, Universidad de Málaga, Spain
(2) Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Spain


Abstract
Formal concept analysis (FCA) has become a mature tool for extracting helpful knowledge for real
problems based on solid mathematical foundations rooted in logic and lattice theory. However, it
remains a stranger in areas such as machine learning, big data, artificial intelligence and
databases. The R language is one of the main languages used in data science, and this work
describes an R package called fcaR that implements FCA's core notions and techniques. One of its
main goals is to spread FCA to the rest of the world. The main facilities of the tool are shown
with a running example.

Keywords
R programming language, Data science, Formal concept analysis


1. Introduction
We assume that the main FCA works [1, 2] are known. This short introduction presents some
features of the developed package and the main references for the mathematical methods
implemented in fcaR.
   Classic FCA is devoted to the study of binary datasets (formal contexts) where variables
are called attributes. Extensions of FCA (see [3, 4]) have been developed to model real-world
problems for datasets containing imprecise, graded or vague information that is not adequately
represented as binary values. This fuzzy extension is able to model problems with numerical
and categorical attributes since these can be scaled to a truth value describing the degree of
fulfilment of the attribute.
   As is well known, from a dataset (binary or fuzzy), FCA computes maximal clusters of objects
and attributes, named concepts, together with a hierarchy between these concepts. Relationships
between the attributes (rules or implications) are computed at the same computational cost.
   We emphasize the notion of if-then rules as an efficient way to compact knowledge and enable
automatic handling by using logic. In this direction, [5] introduced a logic, named simplification
logic for functional dependencies (SL_FD), firmly based on a simplification rule, which allows
us to narrow the functional dependency set by removing redundant attributes. Although the
semantics of implications or if-then rules differ in other areas, the logic can be used there too.
Several automated deduction methods based directly on this inference system have been developed
for classical systems and fuzzy systems [6, 7, 8, 9, 10].

Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16th International Conference on Concept Lattices and Their
Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 199–205.
∗ Corresponding author.
Email: pcordero@uma.es (P. Cordero); enciso@uma.es (M. Enciso); dominlopez@uma.es (D. López-Rodríguez);
amora@uma.es (Á. Mora)
ORCID: 0000-0002-5506-6467 (P. Cordero); 0000-0002-0531-4055 (M. Enciso); 0000-0002-0172-1585 (D. López-Rodríguez);
0000-0003-4548-8030 (Á. Mora)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   Also, a generalization of SL_FD to the fuzzy framework [11] was developed. FASL, fuzzy
attribute simplification logic, has become a helpful reasoning tool for the fuzzy extension.
   As we have said, one of the main goals of the fcaR package is to provide a user-friendly
computational interface to the principal operators and methods of binary and fuzzy FCA, including
the mentioned logic tools. The use of the R language can spread FCA to other communities. As of
today, the package has 25000 downloads. It is published on CRAN
(https://cran.rstudio.com/web/packages/fcaR/index.html), is actively maintained at
https://github.com/Malaga-FCA-group/fcaR, and is documented with vignettes at
https://neuroimaginador.github.io/fcaR/.
   The work is organized as follows: Section 2 describes the internal classes implemented in the
library. Section 3 shows how to use the package. In Section 4, a real application of the package
is shown. Finally, some conclusions and future work are presented in Section 5.


2. Structure of fcaR
The fcaR package provides data structures which allow the user to work seamlessly with formal
contexts and sets of implications. More explicitly, the following main classes are implemented,
using the R6 object-oriented-programming paradigm in R:

    • FormalContext encapsulates the definition of a formal context (𝐺, 𝑀, 𝐼), where 𝐺 is the
      set of objects, 𝑀 the set of attributes and 𝐼 the (fuzzy) relationship matrix, and provides
      methods to operate on the context using FCA tools.
    • ImplicationSet represents a set of implications over a specific formal context.
    • Set encapsulates a class for storing variables (attributes or objects) in an efficient way.

   An advantage of the object-oriented programming style of the R language is that all the
knowledge (concepts, implications, minimal generators, etc.) is stored inside the formal context
object fc.
   The main and computationally hard methods of FCA have been developed in C and linked to
fcaR.


3. fcaR
In this section, we present the essential methods of the FCA framework using a well-known
running example about planets. From a dataset, we build a formal context object, named fc, in
R using the function FormalContext.
   Sets of attributes or objects are stored in variables of type Set. For the variable fc
containing the formal context, several methods and fields are available: fc$clarify(),
fc$attributes, fc$objects, fc$concepts, fc$implications, etc.
   As an example, with the planets dataset (Table 1), we compute the intent of a set of objects,
and the extent and the closure of sets of attributes:
              small   medium  large   near   far   moon   no_moon
    Mercury     ×                       ×                    ×
    Venus       ×                       ×                    ×
    Earth       ×                       ×             ×
    Mars        ×                       ×             ×
    Jupiter                     ×             ×       ×
    Saturn                      ×             ×       ×
    Uranus               ×                    ×       ×
    Neptune              ×                    ×       ×
    Pluto       ×                             ×       ×

Table 1
Planets dataset.


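> library(fcaR)
> # planets: the binary dataset of Table 1, bundled with fcaR
> fc <- FormalContext$new(planets)
> # Intent of a set of objects
> set_objects <- Set$new(fc$objects)
> set_objects$assign(Mars = 1, Earth = 1)
> fc$intent(set_objects)
{small, near, moon}
> # Extent of a set of attributes
> set_attributes1 <- Set$new(fc$attributes)
> set_attributes1$assign(medium = 1, far = 1)
> fc$extent(set_attributes1)
{Uranus, Neptune}
> # Closure of a set of attributes
> set_attributes2 <- Set$new(fc$attributes)
> set_attributes2$assign(medium = 1)
> fc$closure(set_attributes2)
{medium, far, moon}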
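
The three operators can also be reproduced by hand. The following is a minimal base R sketch, not the fcaR implementation: it rebuilds the binary context of Table 1 and defines the two derivation operators and their composition. The names planet_attrs, attrs_all, ctx, intent, extent and closure are introduced here for illustration only.

```r
# Attributes of each planet, transcribed from Table 1.
planet_attrs <- list(
  Mercury = c("small", "near", "no_moon"),
  Venus   = c("small", "near", "no_moon"),
  Earth   = c("small", "near", "moon"),
  Mars    = c("small", "near", "moon"),
  Jupiter = c("large", "far", "moon"),
  Saturn  = c("large", "far", "moon"),
  Uranus  = c("medium", "far", "moon"),
  Neptune = c("medium", "far", "moon"),
  Pluto   = c("small", "far", "moon")
)
attrs_all <- c("small", "medium", "large", "near", "far", "moon", "no_moon")

# Binary incidence matrix: one row per object, one column per attribute.
ctx <- t(sapply(planet_attrs, function(a) as.integer(attrs_all %in% a)))
colnames(ctx) <- attrs_all

# Intent: the attributes shared by every object in `objs`.
intent <- function(objs) {
  attrs_all[colSums(ctx[objs, , drop = FALSE]) == length(objs)]
}

# Extent: the objects that have every attribute in `attrs`.
extent <- function(attrs) {
  rownames(ctx)[rowSums(ctx[, attrs, drop = FALSE]) == length(attrs)]
}

# Closure of an attribute set: intent of its extent.
closure <- function(attrs) intent(extent(attrs))

print(intent(c("Mars", "Earth")))    # "small" "near" "moon"
print(extent(c("medium", "far")))    # "Uranus" "Neptune"
print(closure("medium"))             # "medium" "far" "moon"
```

The printed values agree with the fcaR session above; the sketch only covers the binary case, not graded (fuzzy) contexts.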

  To extract knowledge, we use some methods associated with the variable fc. Some concepts
and implications are shown next:

> fc$find_concepts()
> fc$concepts[3:4]
A set of 2 concepts:
1: ({Jupiter, Saturn, Uranus, Neptune, Pluto}, {far, moon})
2: ({Jupiter, Saturn}, {large, far, moon})
> fc$find_implications()
> fc$implications[1:2]
Implication set with 2 implications.
Rule 1: {no_moon} -> {small, near}
Rule 2: {far} -> {moon}
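
What it means for such a rule to be valid in the context can be checked directly. The sketch below, in base R and independent of fcaR, tests whether every object having all attributes of the left-hand side also has all attributes of the right-hand side. The names planet_attrs and holds are hypothetical helpers used only here.

```r
# Attributes of each planet, transcribed from Table 1.
planet_attrs <- list(
  Mercury = c("small", "near", "no_moon"),
  Venus   = c("small", "near", "no_moon"),
  Earth   = c("small", "near", "moon"),
  Mars    = c("small", "near", "moon"),
  Jupiter = c("large", "far", "moon"),
  Saturn  = c("large", "far", "moon"),
  Uranus  = c("medium", "far", "moon"),
  Neptune = c("medium", "far", "moon"),
  Pluto   = c("small", "far", "moon")
)

# lhs -> rhs holds when each object containing lhs also contains rhs.
holds <- function(lhs, rhs, context = planet_attrs) {
  has_lhs <- vapply(context, function(a) all(lhs %in% a), logical(1))
  all(vapply(context[has_lhs], function(a) all(rhs %in% a), logical(1)))
}

print(holds("no_moon", c("small", "near")))  # TRUE  (Rule 1)
print(holds("far", "moon"))                  # TRUE  (Rule 2)
print(holds("small", "near"))                # FALSE (Pluto is small and far)
```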

  For the concepts and implications stored inside the variable fc, the main methods and
algorithms developed are available:
    • For concepts: infimum(), supremum(), top(), bottom(), plot(), size(),
      join_irreducibles(), meet_irreducibles(), lower_neighbours(), etc.
    • For implications: apply_rules(), cardinality(), to_basis(), filter(), etc.


4. A case study
In this section, we present a case study that applies fcaR to a real-world problem. The goal
is to extract knowledge about the features of tourist destinations given a user profile.
   The dataset vegas (see more information in the package) stores more than 500 TripAdvisor
reviews of hotels on the Las Vegas Strip. The main attributes are:

    • Period of Stay : 4 categories are present in the original data, which produces as many
      binary variables: Dec-Feb , Mar-May , Jun-Aug and Sep-Nov .
    • Traveler type : five binary categories are created from the original data: Business ,
      Couples , Families , Friends and Solo .
    • Pool , Gym , Tennis court , Spa , Casino , Free internet : binary variables for the services
      offered by each destination hotel.
    • Stars : five binary variables are created, according to the number of stars of the hotel, 3 ,
      3.5 , 4 , 4.5 and 5 .
    • Score : the score assigned in the review, from 1 to 5 ; five binary variables are created.
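
The conversion of categorical variables into binary ones described in this list is a nominal scaling. A minimal base R sketch follows, assuming a small made-up data frame (reviews) rather than the actual vegas data; the helper scale_nominal is hypothetical and is not part of fcaR.

```r
# Illustrative data only: three invented reviews, not taken from vegas.
reviews <- data.frame(
  "Traveler type" = c("Couples", "Business", "Families"),
  "Stars"         = c("4.5", "3", "5"),
  "Spa"           = c(TRUE, FALSE, TRUE),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Nominal scaling: one binary column "<column>=<value>" per distinct value.
scale_nominal <- function(df, column) {
  values <- sort(unique(df[[column]]))
  out <- outer(df[[column]], values, "==")
  storage.mode(out) <- "integer"
  colnames(out) <- paste0(column, "=", values)
  out
}

# Binary context: scaled categorical columns plus the already binary one.
binary_context <- cbind(
  scale_nominal(reviews, "Traveler type"),
  scale_nominal(reviews, "Stars"),
  Spa = as.integer(reviews$Spa)
)
print(binary_context)
```

Each row then has exactly one 1 among the columns generated from the same original variable, which is the form in which attribute names such as Traveler type=Couples appear in the rules below.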

   We can load the dataset, create a FormalContext object, and compute concepts and
implications with:

> data(vegas)
> fc <- FormalContext$new(vegas)
> fc$find_implications()

  In this case, it is complicated to visualize the lattice with 2082 concepts, thus we opt for
plotting a sublattice where we impose a minimum support:
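
[Figure: sublattice obtained with a minimum support; intents of its concepts, from top to bottom]
    Level 0: {}
    Level 1: {Free internet}, {Casino}, {Gym}, {Pool}
    Level 2: {Casino, Free internet}, {Gym, Free internet}, {Gym, Casino},
             {Pool, Free internet}, {Pool, Casino}, {Pool, Gym}
    Level 3: {Gym, Casino, Free internet}, {Pool, Casino, Free internet},
             {Pool, Gym, Free internet}, {Pool, Gym, Casino}
    Level 4: {Pool, Gym, Casino, Free internet}
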
  This exploration gives some hints about the most important attributes in the dataset. After
that, the set of implications is manipulated to remove redundancies and remove those rules
with zero support:

> fc$implications$apply_rules(c("simplification",
+                               "composition",
+                               "generalization"))
> fc$implications <- fc$implications[fc$implications$support() > 0]

   We are now in a position to pose the question that must be answered by means of the extracted
knowledge: for a couple searching for a hotel in Las Vegas with a spa, which additional
services would lead to the highest score (5)?
   In order to answer this question, let us begin with a subset of the implications, those related
to couples travelling:

> base_implications <- fc$implications$filter("Traveler type=Couples")


  Then, specify the minimum services (Spa) in a Set :

> Setattr1 <- Set$new(fc$attributes)
> Setattr1$assign("Traveler type=Couples" = 1, "Spa" = 1)

  And compute the closure by using the simplification logic, since we are interested in the
knowledge that can be inferred from the condition given by the set :

> cl <- base_implications$closure(Setattr1, reduce = TRUE)
> specific_implications <- cl$implications
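
The idea behind a closure that also returns a reduced implication set can be sketched in base R. This is an illustrative reading of the simplification approach, not fcaR's internal code: a rule whose left-hand side is contained in the current set fires and is dropped, and every other rule is simplified by deleting the attributes already in the set. The function sl_closure and the toy rules (which are valid in the planets context of Table 1) are assumptions made for this example.

```r
# Closure of `S` under `rules`, together with the rules simplified
# with respect to that closure. Each rule is list(lhs = ..., rhs = ...).
sl_closure <- function(S, rules) {
  repeat {
    changed <- FALSE
    kept <- list()
    for (r in rules) {
      if (all(r$lhs %in% S)) {
        # The rule fires: absorb its conclusion and drop the rule.
        new_attrs <- setdiff(r$rhs, S)
        if (length(new_attrs) > 0) {
          S <- union(S, new_attrs)
          changed <- TRUE
        }
      } else {
        # Simplify: remove attributes that are already known to hold.
        rhs <- setdiff(r$rhs, S)
        if (length(rhs) > 0) {
          kept[[length(kept) + 1]] <- list(lhs = setdiff(r$lhs, S), rhs = rhs)
        }
      }
    }
    rules <- kept
    if (!changed) break
  }
  list(closure = S, implications = rules)
}

# Toy rules over the planets attributes.
rules <- list(
  list(lhs = "no_moon", rhs = c("small", "near")),
  list(lhs = "far", rhs = "moon"),
  list(lhs = "medium", rhs = "far"),
  list(lhs = c("near", "moon"), rhs = "small")
)

res <- sl_closure("medium", rules)
print(res$closure)               # "medium" "far" "moon"
print(length(res$implications))  # 2
```

In this run the remaining rules are {no_moon} -> {small, near} and {near} -> {small}: the last one is the original {near, moon} -> {small} with moon removed because it already belongs to the closure.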

   There are 36 implications representing the knowledge in the formal context for the required
case. Since the stated problem is to extract the additional features needed to get a score of 5,
let us filter the new ImplicationSet by this condition on the RHS, redundancies having been
removed previously:

> specific_implications$filter(rhs = c("Score=5"))
Implication set with 5 implications.
Rule 1: {Period of stay=Mar-May, Stars=4.5} -> {Score=5}
Rule 2: {Period of stay=Jun-Aug, Stars=4.5} -> {Score=5}
Rule 3: {Period of stay=Jun-Aug, Tennis court, Stars=3.5} -> {Score=5}
Rule 4: {Period of stay=Dec-Feb, Tennis court, Stars=3.5} -> {Score=5}
Rule 5: {Period of stay=Dec-Feb, Tennis court, Stars=3} -> {Score=5}


  From these implications, we can infer the additional services that would make a perfect stay
for the user.


5. Conclusions
The main objective of this work has been the development of an R package that is useful
not just for the FCA community but, in general, for knowledge retrieval from binary or
fuzzy (graded) datasets. It is the first R package implementing the core methods in FCA.
  To sum up, the fcaR package is designed to:

    • Manage formal contexts (datasets), implementing the core notions of formal concept
      analysis: objects, attributes, derivation operators, concepts, closures, implications, etc.
    • Extract the concepts and the concept lattice from a context.
    • Find implications (exact association rules) that are true in the context.
    • Provide tools to visualize the extracted knowledge.
    • Implement the simplification logic for fuzzy and binary settings as the core of automated
      methods based on logic to remove redundancy in an easy way (only applying the rules of
      the logic), to compute closures and make recommendations.

   Thus, fcaR implements a wide range of features, and with the help of the included
documentation and vignettes, any user can start analysing datasets with FCA tools.
   From the point of view of efficiency, the fcaR package uses the vectorization and
parallelization capabilities of the R language, whereas algorithmic bottlenecks have been
implemented in C. In addition, we have used sparse matrices as the main internal data structure
of the package.
   Currently, several extensions and enhancements of the package are under active development:
improving the efficiency of the fuzzy algorithms, adding other algorithms from the FCA community
to compute the concept lattice or the implication basis, and incorporating advanced algorithms,
such as the calculation of direct bases of implications and of minimal generators, that
have proved useful in practical applications.


Acknowledgments
Supported by Grants TIN2017-89023-P, UMA2018-FEDERJA-001 and PGC2018-095869-B-I00 of
the Junta de Andalucia, and European Social Fund.


References
 [1] R. Wille, Restructuring lattice theory: An approach based on hierarchies of concepts, in:
     Ordered Sets, Springer, 1982, pp. 445–470.
 [2] B. Ganter, R. Wille, Formal Concept Analysis - Mathematical Foundations, Springer, 1999.
     URL: https://doi.org/10.1007/978-3-642-59830-2.
 [3] R. Belohlávek, V. Vychodil, Attribute dependencies for data with grades I, International
     Journal of General Systems 45 (2016) 864–888.
     URL: https://doi.org/10.1080/03081079.2016.1205711.
 [4] R. Belohlávek, V. Vychodil, Attribute dependencies for data with grades II, International
     Journal of General Systems 46 (2017) 66–92.
     URL: https://doi.org/10.1080/03081079.2016.1205712.
 [5] P. Cordero, M. Enciso, A. Mora, I. P. de Guzmán, SLFD logic: Elimination of data
     redundancy in knowledge representation, in: IBERAMIA, volume 2527 of LNCS, Springer, 2002,
     pp. 141–150. URL: https://doi.org/10.1007/3-540-36131-6_15.
 [6] A. Mora, M. Enciso, P. Cordero, I. P. de Guzmán, An efficient preprocessing
     transformation for functional dependencies sets based on the substitution paradigm, in:
     CAEPIA 2003, volume 3040 of LNCS, Springer, 2003, pp. 136–146.
     URL: https://doi.org/10.1007/978-3-540-25945-9_14.
 [7] P. Cordero, M. Enciso, A. Mora, M. Ojeda-Aciego, Computing minimal generators from
     implications: a logic-guided approach, in: CLA 2012, volume 972 of CEUR Workshop
     Proceedings, CEUR-WS.org, 2012, pp. 187–198. URL: http://ceur-ws.org/Vol-972/paper16.pdf.
 [8] A. Mora, P. Cordero, M. Enciso, I. Fortes, G. Aguilera, Closure via functional dependence
     simplification, International Journal of Computer Mathematics 89 (2012) 510–526.
     URL: https://doi.org/10.1080/00207160.2011.644275.
 [9] E. Rodríguez Lorenzo, K. Bertet, P. Cordero, M. Enciso, A. Mora, The direct-optimal basis
     via reductions, in: CLA 2014, volume 1252 of CEUR Workshop Proceedings, CEUR-WS.org, 2014,
     pp. 145–156. URL: http://ceur-ws.org/Vol-1252/cla2014_submission_18.pdf.
[10] E. Rodríguez Lorenzo, K. V. Adaricheva, P. Cordero, M. Enciso, A. Mora, From an
     implicational system to its corresponding d-basis, in: CLA 2015, volume 1466 of CEUR
     Workshop Proceedings, CEUR-WS.org, 2015, pp. 217–228.
     URL: http://ceur-ws.org/Vol-1466/paper18.pdf.
[11] R. Belohlávek, P. Cordero, M. Enciso, A. Mora, V. Vychodil, Automated prover for attribute
     dependencies in data with grades, International Journal of Approximate Reasoning 70
     (2016) 51–67. URL: https://doi.org/10.1016/j.ijar.2015.12.007.

</pre>