<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Journal of Computer Mathematics 89 (2012) 510-526. URL:
https://doi.org/10.1080/00207160.2011.644275.
[9] E. Rodríguez Lorenzo</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>fcaR, Spreading FCA to the Data Science World</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pablo Cordero</string-name>
          <email>pcordero@uma.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Enciso</string-name>
          <email>enciso@uma.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domingo López-Rodríguez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ángel Mora</string-name>
          <email>amora@uma.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Departamento de Matemática Aplicada, Universidad de Málaga</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2002</year>
      </pub-date>
      <volume>2527</volume>
      <fpage>141</fpage>
      <lpage>150</lpage>
      <abstract>
        <p>Formal concept analysis (FCA) has become a mature tool for extracting useful knowledge from real problems, based on solid mathematical foundations rooted in logic and lattice theory. However, it remains a stranger in areas such as machine learning, big data, artificial intelligence and databases. R is one of the main languages used in data science, and this work describes an R package called fcaR that implements the core notions and techniques of FCA. One of its main goals is to spread FCA to other communities, such as data science. The main facilities of the tool are illustrated with a running example.</p>
      </abstract>
      <kwd-group>
        <kwd>R programming language</kwd>
        <kwd>Data science</kwd>
        <kwd>Formal concept analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>us to narrow the functional dependency set by removing redundant attributes. Although the
semantics of implications or if-then rules differs across areas, the logic can be used there too.
Several automated deduction methods based directly on this logic and its inference system
have been developed for classical and fuzzy systems [6, 7, 8, 9, 10].</p>
      <p>A generalization of this logic to the fuzzy framework has also been developed [11]: FASL, the fuzzy
attribute simplification logic, has become a helpful reasoning tool for the fuzzy extension of FCA.</p>
      <p>As mentioned, one of the main goals of the fcaR package is to provide a user-friendly
computational interface to the principal operators and methods of binary and fuzzy FCA, including the
logic tools described above. Using the R language can help spread FCA to other communities. The
package is published in the CRAN repository (https://cran.rstudio.com/web/packages/fcaR/index.html),
where it has reached 25,000 downloads at the time of writing. It is actively maintained at
https://github.com/Malaga-FCA-group/fcaR and comes with vignettes, available at
https://neuroimaginador.github.io/fcaR/, to help new users.</p>
      <p>The paper is organized as follows: Section 2 describes the internal classes implemented in the
library. Section 3 shows how to use the package. Section 4 presents a real application of the package.
Finally, Section 5 presents some conclusions and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Structure of fcaR</title>
      <p>The fcaR package provides data structures that allow the user to work seamlessly with formal
contexts and sets of implications. More explicitly, the following main classes are implemented
using the R6 object-oriented programming system for R:
• FormalContext encapsulates the definition of a formal context <inline-formula><tex-math>(G, M, I)</tex-math></inline-formula>, where <inline-formula><tex-math>G</tex-math></inline-formula> is the
set of objects, <inline-formula><tex-math>M</tex-math></inline-formula> the set of attributes and <inline-formula><tex-math>I</tex-math></inline-formula> the (fuzzy) relationship matrix, and provides
methods to operate on the context using FCA tools.
• ImplicationSet represents a set of implications over a specific formal context.
• Set stores sets of attributes or objects in an efficient way.</p>
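      <p>To illustrate what a Set holds, the following base R sketch represents a set of attributes as a named vector of membership degrees over the attribute universe: 0 means absent, 1 fully present, and intermediate values give a fuzzy set. The helpers new_set() and set_included() are hypothetical and not part of the fcaR API, and the internal storage of the actual Set class may differ.</p>
      <preformat preformat-type="code">
```r
# Sketch only: new_set() and set_included() are hypothetical helpers, not fcaR functions.
# Attribute universe taken from the planets example
M = c("small", "medium", "large", "near", "far", "moon", "no_moon")

# A set over the universe as a named vector of membership degrees in [0, 1]
new_set = function(universe, ...) {
  degrees = c(...)
  stopifnot(all(names(degrees) %in% universe), all(degrees >= 0), !any(degrees > 1))
  s = setNames(numeric(length(universe)), universe)  # every element starts at degree 0
  s[names(degrees)] = degrees
  s
}

# Degree-wise inclusion: A is contained in B if A(m) never exceeds B(m)
set_included = function(A, B) !any(A > B[names(A)])

A = new_set(M, far = 1)
B = new_set(M, far = 1, moon = 1)
C = new_set(M, far = 1, moon = 0.5)  # fuzzy set: "moon" holds to degree 0.5
set_included(A, B)  # TRUE
set_included(B, A)  # FALSE
set_included(C, B)  # TRUE
```
      </preformat>
      <p>Binary sets are the special case in which every degree is 0 or 1, so the same inclusion test serves both settings.</p>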
      <p>An advantage of this object-oriented style is that all the extracted knowledge
(concepts, implications, minimal generators, etc.) is stored inside the formal context object
itself, named fc in the examples below.</p>
      <p>The main, computationally intensive FCA methods have been implemented in C and linked to
fcaR.</p>
    </sec>
    <sec id="sec-3">
      <title>3. fcaR</title>
      <p>In this section, we present the essential methods of the FCA framework using the well-known
planets running example. From a dataset, we build a formal context object, named fc, in
R using FormalContext$new().</p>
      <p>Sets of attributes or objects are stored in objects of class Set. The formal context fc
provides, among others, the following methods and fields: fc$clarify(), fc$attributes,
fc$objects, fc$concepts and fc$implications.</p>
      <p>As an example, using the planets dataset (Table 1), we can compute the extent and the
closure of a set of attributes, as well as the intent of a set of objects.</p>
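      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>The planets dataset.</p>
        </caption>
        <table>
          <thead>
            <tr><th/><th>small</th><th>medium</th><th>large</th><th>near</th><th>far</th><th>moon</th><th>no_moon</th></tr>
          </thead>
          <tbody>
            <tr><td>Mercury</td><td>×</td><td/><td/><td>×</td><td/><td/><td>×</td></tr>
            <tr><td>Venus</td><td>×</td><td/><td/><td>×</td><td/><td/><td>×</td></tr>
            <tr><td>Earth</td><td>×</td><td/><td/><td>×</td><td/><td>×</td><td/></tr>
            <tr><td>Mars</td><td>×</td><td/><td/><td>×</td><td/><td>×</td><td/></tr>
            <tr><td>Jupiter</td><td/><td/><td>×</td><td/><td>×</td><td>×</td><td/></tr>
            <tr><td>Saturn</td><td/><td/><td>×</td><td/><td>×</td><td>×</td><td/></tr>
            <tr><td>Uranus</td><td/><td>×</td><td/><td/><td>×</td><td>×</td><td/></tr>
            <tr><td>Neptune</td><td/><td>×</td><td/><td/><td>×</td><td>×</td><td/></tr>
            <tr><td>Pluto</td><td>×</td><td/><td/><td/><td>×</td><td>×</td><td/></tr>
          </tbody>
        </table>
      </table-wrap>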
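      <p>The derivation operators behind these computations can be sketched in base R on the planets context, encoded as a 0/1 incidence matrix. The functions extent(), intent() and closure() below are hypothetical illustrations of the standard definitions (the extent A' collects the objects having every attribute of A, the intent B' collects the attributes shared by every object of B, and the closure of A is A''), not the fcaR implementation.</p>
      <preformat preformat-type="code">
```r
# Planets context (Table 1) as a 0/1 incidence matrix: rows = objects G, columns = attributes M
M = c("small", "medium", "large", "near", "far", "moon", "no_moon")
rows = list(
  Mercury = c("small", "near", "no_moon"),
  Venus   = c("small", "near", "no_moon"),
  Earth   = c("small", "near", "moon"),
  Mars    = c("small", "near", "moon"),
  Jupiter = c("large", "far", "moon"),
  Saturn  = c("large", "far", "moon"),
  Uranus  = c("medium", "far", "moon"),
  Neptune = c("medium", "far", "moon"),
  Pluto   = c("small", "far", "moon")
)
ctx = t(sapply(rows, function(r) as.integer(M %in% r)))
colnames(ctx) = M

# Extent A': objects that have every attribute in A
extent = function(ctx, A) rownames(ctx)[rowSums(ctx[, A, drop = FALSE]) == length(A)]

# Intent B': attributes shared by every object in B
intent = function(ctx, B) colnames(ctx)[colSums(ctx[B, , drop = FALSE]) == length(B)]

# Closure A'': all attributes implied by A in this context
closure = function(ctx, A) intent(ctx, extent(ctx, A))

extent(ctx, c("far", "moon"))       # "Jupiter" "Saturn" "Uranus" "Neptune" "Pluto"
intent(ctx, c("Jupiter", "Saturn")) # "large" "far" "moon"
closure(ctx, "no_moon")             # "small" "near" "no_moon"
```
      </preformat>
      <p>These values agree with the concept ({Jupiter, Saturn}, {large, far, moon}) and the implication {no_moon} -&gt; {small, near} that fcaR reports for this context in this section.</p>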
      <p>To extract knowledge, we use some of the methods associated with fc. Some concepts
and implications are shown next:
&gt; fc$find_concepts()
&gt; fc$concepts[3:4]
A set of 2 concepts:
1: ({Jupiter, Saturn, Uranus, Neptune, Pluto}, {far, moon})
2: ({Jupiter, Saturn}, {large, far, moon})
&gt; fc$find_implications()
&gt; fc$implications[1:2]
Implication set with 2 implications.</p>
      <p>Rule 1: {no_moon} -&gt; {small, near}
Rule 2: {far} -&gt; {moon}</p>
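      <p>An implication A -&gt; B holds in a context exactly when B is contained in the closure A''. The following sketch checks the two rules above on the planets context. It also computes a support value, taken here (an assumption, not necessarily fcaR's definition) as the fraction of objects having all attributes of the premise A. For a valid implication, this equals the fraction of objects having all of A ∪ B. holds() and premise_support() are hypothetical helpers.</p>
      <preformat preformat-type="code">
```r
# Planets context (Table 1) and derivation operators
M = c("small", "medium", "large", "near", "far", "moon", "no_moon")
rows = list(
  Mercury = c("small", "near", "no_moon"), Venus   = c("small", "near", "no_moon"),
  Earth   = c("small", "near", "moon"),    Mars    = c("small", "near", "moon"),
  Jupiter = c("large", "far", "moon"),     Saturn  = c("large", "far", "moon"),
  Uranus  = c("medium", "far", "moon"),    Neptune = c("medium", "far", "moon"),
  Pluto   = c("small", "far", "moon")
)
ctx = t(sapply(rows, function(r) as.integer(M %in% r)))
colnames(ctx) = M
extent = function(ctx, A) rownames(ctx)[rowSums(ctx[, A, drop = FALSE]) == length(A)]
intent = function(ctx, B) colnames(ctx)[colSums(ctx[B, , drop = FALSE]) == length(B)]
closure = function(ctx, A) intent(ctx, extent(ctx, A))

# A -> B holds iff every attribute of B lies in the closure of A
holds = function(ctx, A, B) all(B %in% closure(ctx, A))

# Support of the premise: fraction of objects having all attributes in A
premise_support = function(ctx, A) length(extent(ctx, A)) / nrow(ctx)

holds(ctx, "no_moon", c("small", "near"))  # TRUE
holds(ctx, "far", "moon")                  # TRUE
holds(ctx, "small", "near")                # FALSE: Pluto is small but far
premise_support(ctx, "far")                # 5/9, about 0.556
```
      </preformat>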
      <p>In turn, the concepts and implications stored in fc give access to the main methods
and algorithms developed in the package:
• For concepts: infimum(), supremum(), top(), bottom(), plot(), size(),
join_irreducibles(), meet_irreducibles(), lower_neighbours(), etc.
• For implications: apply_rules(), cardinality(), to_basis(), filter(), etc.</p>
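      <p>For intuition about infimum() and supremum(), recall the standard lattice operations on two concepts (A1, B1) and (A2, B2). Their meet has extent A1 ∩ A2 and intent (A1 ∩ A2)'. Their join has intent B1 ∩ B2 and extent (B1 ∩ B2)'. The base R sketch below applies these formulas to the planets context. concept_meet() and concept_join() are hypothetical names, not the fcaR methods.</p>
      <preformat preformat-type="code">
```r
# Planets context (Table 1) and derivation operators
M = c("small", "medium", "large", "near", "far", "moon", "no_moon")
rows = list(
  Mercury = c("small", "near", "no_moon"), Venus   = c("small", "near", "no_moon"),
  Earth   = c("small", "near", "moon"),    Mars    = c("small", "near", "moon"),
  Jupiter = c("large", "far", "moon"),     Saturn  = c("large", "far", "moon"),
  Uranus  = c("medium", "far", "moon"),    Neptune = c("medium", "far", "moon"),
  Pluto   = c("small", "far", "moon")
)
ctx = t(sapply(rows, function(r) as.integer(M %in% r)))
colnames(ctx) = M
extent = function(ctx, A) rownames(ctx)[rowSums(ctx[, A, drop = FALSE]) == length(A)]
intent = function(ctx, B) colnames(ctx)[colSums(ctx[B, , drop = FALSE]) == length(B)]

# Meet (infimum): intersect the extents, then derive the intent
concept_meet = function(ctx, c1, c2) {
  objs = intersect(c1$extent, c2$extent)
  list(extent = objs, intent = intent(ctx, objs))
}

# Join (supremum): intersect the intents, then derive the extent
concept_join = function(ctx, c1, c2) {
  attrs = intersect(c1$intent, c2$intent)
  list(extent = extent(ctx, attrs), intent = attrs)
}

c1 = list(extent = c("Jupiter", "Saturn"), intent = c("large", "far", "moon"))
c2 = list(extent = c("Uranus", "Neptune"), intent = c("medium", "far", "moon"))
concept_join(ctx, c1, c2)  # ({Jupiter, Saturn, Uranus, Neptune, Pluto}, {far, moon})
concept_meet(ctx, c1, c2)  # bottom concept: no objects, all attributes
```
      </preformat>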
    </sec>
    <sec id="sec-4">
      <title>4. A case study</title>
      <p>In this section, we present a case study that applies fcaR to a real-world problem. The goal
is to extract knowledge about the features of tourist destinations given a user profile.</p>
      <p>The vegas dataset (documented in the package) stores more than 500 TripAdvisor
reviews of hotels on the Las Vegas Strip. The main attributes are:
• Period of stay: the 4 categories present in the original data produce as many
binary variables: Dec-Feb, Mar-May, Jun-Aug and Sep-Nov.
• Traveler type: five binary variables are created from the original data: Business,
Couples, Families, Friends and Solo.
• Pool, Gym, Tennis court, Spa, Casino, Free internet: binary variables for the services
offered by each hotel.
• Stars: five binary variables are created according to the number of stars of the hotel: 3,
3.5, 4, 4.5 and 5.
• Score: the score assigned in the review, from 1 to 5, gives five binary variables.</p>
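      <p>The conversion of each categorical variable into one binary attribute per category can be sketched in base R as follows. The data frame reviews and the helper binarize() are hypothetical, and this is not necessarily how the vegas dataset was built. The column names follow the Variable=Level convention visible in the attribute names used in this section (e.g. Traveler type=Couples).</p>
      <preformat preformat-type="code">
```r
# Hypothetical toy data: one row per review, categorical columns stored as strings
reviews = data.frame(
  "Period of stay" = c("Dec-Feb", "Jun-Aug", "Dec-Feb"),
  "Stars"          = c("4.5", "3", "5"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# One 0/1 column per observed level, named "Variable=Level"
binarize = function(df) {
  blocks = lapply(names(df), function(v) {
    levs = sort(unique(df[[v]]))
    # outer() compares every value with every level: rows = reviews, columns = levels
    m = matrix(as.integer(outer(df[[v]], levs, "==")), nrow = nrow(df))
    colnames(m) = paste0(v, "=", levs)
    m
  })
  do.call(cbind, blocks)
}

bin = binarize(reviews)
colnames(bin)
# "Period of stay=Dec-Feb" "Period of stay=Jun-Aug" "Stars=3" "Stars=4.5" "Stars=5"
```
      </preformat>
      <p>Each review activates exactly one attribute per original variable, which is why every row of bin sums to the number of categorical columns.</p>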
      <p>We can load the dataset, create a FormalContext object, and compute concepts and
implications with:
&gt; data(vegas)
&gt; fc &lt;- FormalContext$new(vegas)
&gt; fc$find_implications()</p>
      <p>Since the lattice has 2082 concepts and is hard to visualize, we plot instead a sublattice
of the concepts that satisfy a minimum support:</p>
      <p>[Figure: sublattice of the concepts with a minimum support. Its 16 nodes are labelled with every subset of {Pool, Gym, Casino, Free internet}, from {} to {Pool, Gym, Casino, Free internet}.]</p>
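      <p>A naive way to obtain all concepts, and then keep those above a minimum support, is to close every subset of attributes. The sketch below does this on the small planets context (Table 1) instead of vegas, because the number of closures grows as 2^|M|. The support of a concept is taken here as the fraction of objects in its extent, which is an assumption. all_concepts() and concept_support() are hypothetical helpers, and fcaR's own algorithms are not reproduced here.</p>
      <preformat preformat-type="code">
```r
# Planets context (Table 1) and derivation operators
M = c("small", "medium", "large", "near", "far", "moon", "no_moon")
rows = list(
  Mercury = c("small", "near", "no_moon"), Venus   = c("small", "near", "no_moon"),
  Earth   = c("small", "near", "moon"),    Mars    = c("small", "near", "moon"),
  Jupiter = c("large", "far", "moon"),     Saturn  = c("large", "far", "moon"),
  Uranus  = c("medium", "far", "moon"),    Neptune = c("medium", "far", "moon"),
  Pluto   = c("small", "far", "moon")
)
ctx = t(sapply(rows, function(r) as.integer(M %in% r)))
colnames(ctx) = M
extent = function(ctx, A) rownames(ctx)[rowSums(ctx[, A, drop = FALSE]) == length(A)]
intent = function(ctx, B) colnames(ctx)[colSums(ctx[B, , drop = FALSE]) == length(B)]
closure = function(ctx, A) intent(ctx, extent(ctx, A))

# Close every subset of M (2^|M| closures: suitable for toy contexts only)
all_concepts = function(ctx) {
  M = colnames(ctx)
  intents = lapply(0:(2^length(M) - 1), function(mask) {
    A = M[as.integer(intToBits(mask))[seq_along(M)] == 1]  # subset encoded by the bits of mask
    closure(ctx, A)
  })
  # Closures come back in column order, so unique() removes repeated intents
  lapply(unique(intents), function(B) list(extent = extent(ctx, B), intent = B))
}

# Support of a concept: fraction of objects in its extent
concept_support = function(ctx, cp) length(cp$extent) / nrow(ctx)

concepts = all_concepts(ctx)
supp = sapply(concepts, function(cp) concept_support(ctx, cp))
frequent = concepts[supp >= 0.5]
length(concepts)  # 12
length(frequent)  # 4
```
      </preformat>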
      <p>This exploration hints at the most important attributes in the dataset. Next, the set of
implications is simplified with the rules of the simplification logic to remove redundancies, and
the rules with zero support are discarded:
&gt; fc$implications$apply_rules(c("simplification",
+ "composition",
+ "generalization"))
&gt; fc$implications &lt;- fc$implications[fc$implications$support() &gt; 0]</p>
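      <p>To give an idea of what rule-based simplification does, the following sketch applies two equivalences to a list of implications stored as pairs of attribute vectors. Composition merges A -&gt; B and A -&gt; C into A -&gt; B ∪ C. The generalization step drops C -&gt; D whenever another kept implication A -&gt; B satisfies A ⊆ C and D ⊆ B. The representation and the helpers compose() and generalize() are hypothetical. They do not reproduce fcaR's apply_rules(), and the simplification rule itself is omitted.</p>
      <preformat preformat-type="code">
```r
# An implication is a pair of sorted attribute vectors
imp = function(lhs, rhs) list(lhs = sort(lhs), rhs = sort(rhs))
subset_of = function(x, y) all(x %in% y)

# Composition: {A -> B, A -> C} is equivalent to {A -> union(B, C)}
compose = function(imps) {
  keys = sapply(imps, function(x) paste(x$lhs, collapse = ","))
  lapply(split(imps, keys), function(group)
    imp(group[[1]]$lhs, unique(unlist(lapply(group, function(x) x$rhs)))))
}

# Generalization: drop C -> D when another kept A -> B has A subset of C and D subset of B
generalize = function(imps) {
  keep = rep(TRUE, length(imps))
  for (i in seq_along(imps)) {
    for (j in seq_along(imps)) {
      if (all(c(i != j, keep[j],
                subset_of(imps[[j]]$lhs, imps[[i]]$lhs),
                subset_of(imps[[i]]$rhs, imps[[j]]$rhs)))) {
        keep[i] = FALSE  # imps[[i]] is entailed by imps[[j]]
        break
      }
    }
  }
  imps[keep]
}

imps = list(
  imp("far", "moon"),
  imp(c("large", "far"), "moon"),  # redundant given far -> moon
  imp("no_moon", "small"),
  imp("no_moon", "near")
)
simplified = generalize(compose(imps))
length(simplified)  # 2: far -> moon and no_moon -> {near, small}
```
      </preformat>
      <p>Both steps only rewrite or drop implications that are derivable from the remaining ones, so the simplified list is equivalent to the original.</p>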
      <p>We are now in a position to pose the question to be answered by means of the extracted
knowledge: for a couple searching for a hotel in Las Vegas with a spa, which additional services
would lead to the highest score (5)?</p>
      <p>To answer this question, we begin with the subset of the implications related to travelling
couples:
&gt; base_implications &lt;- fc$implications$filter("Traveler type=Couples")</p>
      <p>Then, we specify the user profile (Couples) and the minimum required service (Spa) in a Set:
&gt; Setattr1 &lt;- Set$new(fc$attributes)
&gt; Setattr1$assign("Traveler type=Couples" = 1, "Spa" = 1)</p>
      <p>Next, we compute the closure using the simplification logic, since we are interested in the
knowledge that can be inferred from the condition given by this set:
&gt; cl &lt;- base_implications$closure(Setattr1, reduce = TRUE)
&gt; specific_implications &lt;- cl$implications</p>
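      <p>For comparison, the classical way to compute the closure of a set of attributes with respect to a set of implications is a fixpoint iteration. It repeatedly adds the right-hand side of every implication whose left-hand side is already contained in the current set. The sketch below implements this textbook procedure. It is not the simplification-logic closure called above, which, judging from the code, also returns a reduced implication set (cl$implications). imp_closure() is a hypothetical helper. The three example implications hold in the planets context of Table 1.</p>
      <preformat preformat-type="code">
```r
# Each implication is a list with character vectors lhs (premise) and rhs (conclusion)
imp_closure = function(attrs, imps) {
  repeat {
    before = length(attrs)
    for (x in imps) {
      # Fire the rule if its premise is already contained in the current set
      if (all(x$lhs %in% attrs)) attrs = union(attrs, x$rhs)
    }
    if (length(attrs) == before) break  # nothing added in a full pass: fixpoint reached
  }
  attrs
}

# Three implications that hold in the planets context of Table 1
imps = list(
  list(lhs = "near", rhs = "small"),
  list(lhs = "no_moon", rhs = "near"),
  list(lhs = "far", rhs = "moon")
)
imp_closure("no_moon", imps)  # "no_moon" "near" "small"
imp_closure("far", imps)      # "far" "moon"
```
      </preformat>
      <p>The implications are deliberately listed so that the closure of no_moon needs two passes: near is only added by the second rule, after the first rule has already been examined.</p>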
      <p>The resulting 36 implications represent the knowledge in the formal context for the required
case. Since the goal is to extract the additional features needed to get a score of 5, we filter
the new ImplicationSet by this condition on the right-hand side (RHS), redundancies having already
been removed:
&gt; specific_implications$filter(rhs = c("Score=5"))
Implication set with 5 implications.</p>
      <p>Rule 1: {Period of stay=Mar-May, Stars=4.5} -&gt; {Score=5}
Rule 2: {Period of stay=Jun-Aug, Stars=4.5} -&gt; {Score=5}
Rule 3: {Period of stay=Jun-Aug, Tennis court, Stars=3.5} -&gt; {Score=5}
Rule 4: {Period of stay=Dec-Feb, Tennis court, Stars=3.5} -&gt; {Score=5}
Rule 5: {Period of stay=Dec-Feb, Tennis court, Stars=3} -&gt; {Score=5}</p>
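      <p>From these implications, we can infer the additional features that would make a perfect stay
for the user. For instance, in 4.5-star hotels no additional service is needed for stays in Mar-May or
Jun-Aug (Rules 1 and 2). In 3- and 3.5-star hotels, a tennis court is the additional service that leads
to a score of 5, depending on the period of stay (Rules 3 to 5).</p>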
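      <p>This reading of the rules can also be mechanized. The sketch below stores the five rules listed above as plain R lists, which is a hypothetical representation and not fcaR's ImplicationSet. It keeps the rules whose right-hand side contains Score=5 and counts how often each attribute appears in their premises. filter_rhs() and premise_counts() are hypothetical helpers.</p>
      <preformat preformat-type="code">
```r
# The five rules obtained above, as lists of premise (lhs) and conclusion (rhs)
rules = list(
  list(lhs = c("Period of stay=Mar-May", "Stars=4.5"), rhs = "Score=5"),
  list(lhs = c("Period of stay=Jun-Aug", "Stars=4.5"), rhs = "Score=5"),
  list(lhs = c("Period of stay=Jun-Aug", "Tennis court", "Stars=3.5"), rhs = "Score=5"),
  list(lhs = c("Period of stay=Dec-Feb", "Tennis court", "Stars=3.5"), rhs = "Score=5"),
  list(lhs = c("Period of stay=Dec-Feb", "Tennis court", "Stars=3"), rhs = "Score=5")
)

# Keep the rules whose conclusion contains the target attribute
filter_rhs = function(rules, target) Filter(function(r) target %in% r$rhs, rules)

# Count in how many of those rules each premise attribute appears
premise_counts = function(rules) table(unlist(lapply(rules, function(r) r$lhs)))

counts = premise_counts(filter_rhs(rules, "Score=5"))
counts[["Tennis court"]]  # 3
counts[["Stars=4.5"]]     # 2
```
      </preformat>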
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The main objective of this work has been the development of an R package that is useful
not just for the FCA community but, more generally, for knowledge retrieval from binary or
fuzzy (graded) datasets. It is the first R package implementing the core methods of FCA.</p>
      <p>To sum up, the fcaR package is designed to:
• Manage formal contexts (datasets), implementing the core notions of formal concept
analysis: objects, attributes, derivation operators, concepts, closures, implications, etc.
• Extract the concepts and the concept lattice from a context.
• Find implications (exact association rules) that are true in the context.
• Provide tools to visualize the extracted knowledge.
• Implement the simplification logic for binary and fuzzy settings as the core of logic-based
automated methods that remove redundancy simply by applying the rules of the logic, compute
closures and make recommendations.</p>
      <p>Thus, fcaR implements a wide range of features, and with the help of the included
documentation and vignettes, any user can start analysing datasets with FCA tools.</p>
      <p>Regarding efficiency, the fcaR package uses the vectorization and parallelization
capabilities of the R language, whereas the algorithmic bottlenecks have been implemented in C. In
addition, sparse matrices are the main internal data structure of the package.</p>
      <p>Several extensions and enhancements of the package are currently under active development:
improving the efficiency of the fuzzy algorithms; adding other algorithms from the FCA
community to compute the concept lattice or the implication basis; and incorporating advanced
algorithms, such as the computation of direct bases of implications and minimal generators, which
have proved useful in practical applications.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Supported by Grants TIN2017-89023-P, UMA2018-FEDERJA-001 and PGC2018-095869-B-I00 of the Junta de Andalucia, and European Social Fund.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref2">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          ,
          <article-title>Restructuring lattice theory: An approach based on hierarchies of concepts</article-title>
          ,
          <source>in: Ordered Sets</source>
          , Springer,
          <year>1982</year>
          , pp.
          <fpage>445</fpage>
          -
          <lpage>470</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ganter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          ,
          <source>Formal Concept Analysis - Mathematical Foundations</source>
          , Springer,
          <year>1999</year>
          . URL: https://doi.org/10.1007/978-3-642-59830-2.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Belohlávek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vychodil</surname>
          </string-name>
          ,
          <article-title>Attribute dependencies for data with grades I</article-title>
          ,
          <source>International Journal of General Systems</source>
          <volume>45</volume>
          (
          <year>2016</year>
          )
          <fpage>864</fpage>
          -
          <lpage>888</lpage>
          . URL: https://doi.org/10.1080/03081079.2016.1205711.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Belohlávek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vychodil</surname>
          </string-name>
          ,
          <article-title>Attribute dependencies for data with grades II</article-title>
          ,
          <source>International Journal of General Systems</source>
          <volume>46</volume>
          (
          <year>2017</year>
          )
          <fpage>66</fpage>
          -
          <lpage>92</lpage>
          . URL: https://doi.org/10.1080/03081079.2016.1205712.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>