<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unified External Data Access Implementation in Formal Concept Analysis Research Toolbox</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexey A. Neznanov</string-name>
          <email>ANeznanov@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey A. Parinov</string-name>
          <email>AParinov@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>20 Myasnitskaya Ulitsa, Moscow, 101000</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Formal Concept Analysis (FCA) provides mathematical models, methods and algorithms for data analysis. However, there is still no readily available software system that would provide data analysts with unified, intelligible and transparent access to various external data sources with large amounts of heterogeneous data for subsequent FCA-based knowledge discovery. The lack of such tools complicates the spread of FCA methods among analysts of big data and miners of unstructured data. In this paper, we describe advances and new functionality in the external data querying and preprocessing subsystems of the Formal Concept Analysis Research Toolbox (FCART), which help to process data of different types in a unified way.</p>
      </abstract>
      <kwd-group>
        <kwd>Formal Concept Analysis</kwd>
        <kwd>Knowledge Extraction</kwd>
        <kwd>Data Mining</kwd>
        <kwd>Text Mining</kwd>
        <kwd>Social Network Mining</kwd>
        <kwd>Software</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        By now, mathematical models of Formal Concept Analysis (FCA) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are widely used
for solving various problems of Knowledge Discovery and Artificial Intelligence
[
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ]. Some systems use FCA ideas implicitly, by processing closed sets of attributes
or objects. In this paper, we concentrate on the explicit implementation of FCA
methods as part of an analyst’s workflow in a software system. The three main problems
here can be stated as follows.
1. How to generate suitable input data for FCA-based methods?
2. How to keep initial data properties and metadata while analyzing the object-attribute
representation by FCA-based methods?
3. How to combat the high computational complexity of FCA-based methods in the
context of an integral analyst’s workflow?
      </p>
      <p>
        Around the middle of the last decade, there were several successful
implementations for transforming a relatively small formal context into a line diagram
and computing implications and association rules. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] we discussed
well-known FCA-based tools, like ConExp [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Conexp-clj [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Galicia [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Tockit [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
ToscanaJ [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Lattice Miner [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], OpenFCA [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Coron [
        <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
        ], Cubist [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Most of
the reviewed software tools are local applications that require initial data in the form
of a binary or many-valued context in one of the common formats (CSV, CXT or
others). Thus, such programs cannot be used at the stage of data gathering and
preprocessing, but their input formats should be included in the list of
supported formats for future integration.
      </p>
      <p>
        Formal Concept Analysis Research Toolbox (FCART) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] supports an iterative
methodology of data mining and knowledge discovery. One of the goals of
developing FCART is to create a system for convenient analysis of heterogeneous data
gathered from external data sources, e.g., SQL databases, NoSQL databases and Social
Network Services. FCART has been successfully applied to analyzing data in medicine,
criminalistics, sociology, and trend detection [
        <xref ref-type="bibr" rid="ref15 ref3">3, 15</xref>
        ].
      </p>
      <p>
        In previous papers, we described the system architecture, the main workflow and the
stages of data extraction from various external sources. Here we would like to
describe recent progress in the distributed version [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] of FCART and its Intermediate
Data Storage (IDS) subsystem. This progress is mainly related to new functionality in
data preprocessing.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Description of the problems</title>
      <p>Data analysis is highly dependent on preprocessing, i.e., the transformation from the
source data format to the target data format in which the data are processed. An
important function of any data analysis system is to support the analyst in
preprocessing transformations, making them transparent and easy.</p>
      <sec id="sec-2-1">
        <title>A gap between FCA analytical artifacts and external data</title>
        <p>From an analyst’s point of view, there is a gap between the workflow over FCA
analytical artifacts and the legacy data. Fig. 1 illustrates this gap between
“analyzable” and “external” data. It should be emphasized that this is not a gap between
concrete data formats or access protocols; it is a gap in ways of thinking and
knowledge representation.</p>
        <p>The four main questions of the object-attribute-value (or object-attribute)
representation of data are easy to state: 1) What are the objects? 2) What are the attributes?
3) How do we gather the values of attributes? 4) How do we interpret the values of attributes?</p>
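        <p>As a simple hypothetical illustration (not taken from a concrete project): in a
collection of news posts, the objects may be the posts, the attributes may be keywords and
metadata fields, and one object may be represented as follows:
{
  "object": "post-42",
  "attributes": {
    "politics": true,
    "economy": false,
    "word_count": 754
  }
}</p>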
        <p>
          However, such questions give rise to a great many technological questions.
For now, we can observe project-specific data preprocessing techniques in concrete data
analysis projects. Can we propose a fully unified approach? In general, the answer is
no. However, we can try to adapt some common techniques for the most popular classes
of initial data formats and external data sources. On the one hand, we can see the
appearance of such terms as “Data Tidying” [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] for some “human readable” variants
of ETL (Extract-Transform-Load) processes. On the other hand, there is continuous
development of such monster software as Oracle Data Integrator Enterprise Edition
[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] or the less monstrous Microsoft PowerBI [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>Fig. 1. The gap between analyzable data (data snapshots as multivalued contexts,
binary contexts, concept lattices and other artifacts of the analyst-oriented,
integrated, preprocessed representation) and external heterogeneous data with specific
APIs (e.g., relational data accessed by SQL queries).</p>
        <p>Two more problems should be kept in mind:
1. Basic FCA algorithms have very high computational complexity.
2. Big concept lattices are not suitable for interactive processing and visualizing.</p>
        <p>
          There have been many attempts to adapt FCA-based methods to complex tasks. For
example, building Iceberg Lattices [
          <xref ref-type="bibr" rid="ref20 ref21 ref22">20, 21, 22</xref>
          ], visualizing other fragments of
lattices, and using incremental lattice construction algorithms.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>A few comments about methodology</title>
        <p>The core of FCART supports knowledge discovery techniques based on Formal
Concept Analysis, clustering, multimodal clustering, pattern structures and others.
From the analyst’s point of view, the basic FCA workflow in FCART has four stages. At
each stage, a user can import or export every artifact or add it to a report.
1. Filling the Intermediate Data Storage (IDS) of FCART from various external SQL,
XML/JSON and other data sources (querying an external source is described by an
External Data Query Description, EDQD). An EDQD can be constructed with a
visual External Data Browser (see below).
2. Loading a data snapshot from the IDS into an analytic session (a snapshot is
described by a Snapshot Profile). A data snapshot is a data table with annotated
structured and text attributes (a many-valued context) loaded into the system by
accessing the IDS.
3. Transforming a snapshot into a binary context (a transformation is described by a
Scaling Query).
4. Building and visualizing the formal concept lattice and other artifacts based on the
binary context within an analytic session.</p>
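        <p>As a minimal sketch of the idea behind stage 3 (the JSON below is a hypothetical
illustration, not the actual Scaling Query syntax): a many-valued attribute of a snapshot is
replaced by several binary attributes defined by conditions on its values:
{
  "source_attribute": "age",
  "binary_attributes": [
    { "name": "age_below_30", "condition": "value in [0, 30)" },
    { "name": "age_30_or_more", "condition": "value in [30, 150)" }
  ]
}</p>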
        <p>
          Later in this paper, we will mainly discuss the first stage and the use of EDQDs. Hadley
Wickham wrote in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]: “there has been little research on how to make data cleaning
as easy and effective as possible”. The second and third stages, with an example of
Snapshot Profile construction, were initially described in [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>FCART architecture and the role of the IDS</title>
        <p>The current distributed version of FCART consists of the following four parts:
1. FCART AuthServer for authentication and authorization, as well as integration of
algorithmic and storage resources.
2. FCART Intermediate Data Storage (IDS) for storage and preprocessing (initial
converting, indexing of text fields, etc.) of big datasets.
3. FCART Thick Client (Client) for interactive data processing and visualization in an
integrated graphical multi-document user interface.
4. FCART Web-based solvers (Web-Solvers) for implementing independent
resource-intensive computations.</p>
        <p>The IDS plays an important role in the effectiveness of the whole data analysis process
because all data from external data storages, session data and intermediate analytic artifacts
are saved in the IDS. All interaction between the user and external data storages goes through
the IDS. All interactions between the Client, Web-Solvers and the IDS go through a RESTful
Web-API. An HTTP request to the IDS web-service is constructed from two parts: a prefix
part and a command part. The prefix part contains the domain name and local path (e.g.,
http://zeus2.hse.ru:8444/). The command part describes what the IDS has to do and
represents some function of the Web-API. Using web-service commands, the FCART
client can query data from external data storages in a uniform and efficient way.</p>
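        <p>For illustration, a request may look like
http://zeus2.hse.ru:8444/get_documents?db=demo&amp;collection=news, where
http://zeus2.hse.ru:8444/ is the prefix part and the rest is the command part (the command
and parameter names here are hypothetical placeholders, not actual Web-API functions).</p>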
        <p>Earlier, we already implemented populating the IDS from external data sources; now
we extend the set of providers and improve the data providers’ EDQDs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Worlds of data and data representation in IDS</title>
      <p>
        Readers may have noticed that the simplest case of legacy data for object-attribute
representation is relational data that meet the well-known conditions of E. Codd [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
In this case, we virtually have a multivalued context. At the current state of Internet
development, we should distinguish at least the following types of data sources:
1. Relational data sources (directly queried by SQL).
2. NoSQL document collections (queried by XQuery or similar query languages).
3. Text collections with a full-text index (queried by special full-text queries).
4. Social Network Services (with plenty of different access APIs).
      </p>
      <sec id="sec-3-1">
        <title>Data integration problems and FCART Intermediate Data Storage</title>
        <p>
          Documents are kept in many data formats (ISO standards alone describe more than 400
formats; see, for example, [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]). After the open data revolution [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and Web
infrastructure integration in the Internet [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], the most popular formats for information
interchange are Comma-Separated Values (CSV) [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], Extensible Markup Language
(XML) [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] and JavaScript Object Notation (JSON) [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Extensible Markup
Language (XML) is a markup language that defines a set of rules for encoding
documents in a format that is both human-readable and machine-readable. The main
goal of XML is to store metainformation together with the information itself. Hundreds of
document formats using XML syntax have been developed, including RSS, Atom,
SOAP, and XHTML. XML-based formats have become the default for many
office-productivity tools, including Microsoft Office (Office Open XML), OpenOffice.org
and LibreOffice (OpenDocument), and Apple's iWork. XML has also been employed
as the base language for communication protocols, such as XMPP.
        </p>
        <p>XML and its extensions have regularly been criticized for verbosity and
complexity. JSON is a lightweight alternative which focuses on representing (serializing)
programming-language-level objects with complex data structures, rather than
documents, which may contain both highly structured and relatively unstructured
content. JSON is an open standard format that uses human-readable text to transmit
data objects consisting of attribute–value pairs.</p>
        <p>Traditional relational databases are not convenient for fast processing of big
amounts of unstructured textual data with metadata. Document-oriented databases
operating with documents in XML or JSON format have been successfully used for storing,
retrieving and managing big amounts of textual data in the last decade. Both the FCART IDS
and the FCART Client can handle XML and JSON documents as input formats. The XML
format is complex and at the same time relatively hard to process; the JSON format is
easier to use and lightweight. FCART uses JSON internally as the main format for
data serialization and intercomponent communication.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Main terms and terminology problems</title>
        <p>
          A preliminary problem of the discussed concepts is a terminological one. Table 1
illustrates the difference in approaches to defining terms for basic data-related
concepts in SQL servers (as stated in the SQL ISO Standard [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]), full-text indexing
systems (as stated in the Elasticsearch reference [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]) and document-oriented NoSQL
storages (as stated in the MongoDB reference [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ]). For instance, an SQL table row roughly corresponds to a document in both
Elasticsearch and MongoDB, an SQL table to an Elasticsearch type and a MongoDB collection,
and an SQL column to a field. One can look at the term “Index” as a good example of a
polysemantic word. Graph-oriented databases use absolutely different terms for atomic
elements (vertices, nodes, links, edges, arcs) and for data structures that reflect
incidence, adjacency, neighbourhood, etc.
        </p>
        <p>
          In the IDS, we use a data representation in the form of “Databases” with a hierarchical
structure of “Collections” of JSON “Documents”. Each Document may
contain heterogeneous “Fields”. Each Collection can possess metadata, which
describes the structure of Documents and the data types of Fields using JSON Schema [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ].
It is a very powerful approach, which gives the ability to validate Documents with
compound data types.
        </p>
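        <p>A minimal sketch of such Collection metadata as a JSON Schema document (the
collection structure and field names here are hypothetical):
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Documents of a hypothetical news Collection",
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "body": { "type": "string" },
    "keywords": { "type": "array", "items": { "type": "string" } },
    "author": {
      "type": "object",
      "properties": {
        "name": { "type": "string" }
      },
      "required": ["name"]
    }
  },
  "required": ["title", "body"]
}</p>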
      </sec>
    </sec>
    <sec id="sec-4">
      <title>External Data Queries in IDS</title>
      <p>Extracting data is complicated by the fact that any Internet data source may have its
own API; consider, for example, Social Network Services as data sources. That is
why one needs a mechanism to describe how data should be extracted, preprocessed
and stored in the IDS. For unified data access, we developed the External Data Query
Description (EDQD) language. Each EDQD is a JSON-formatted document. It aims
to unify access to different data sources. By using EDQDs, FCART represents data
from various data sources as an IDS Collection. There is no way to create a single
query with fixed fields that would work with various data sources, because each data
source has its own set of functions and its own API. However, we developed the most
common EDQD types and field sets for the data source types mentioned above (Fig. 2).</p>
      <p>Fig. 2. External data access through the IDS: EDQDs for IDS, SQL, Text, SD and SNS
connect external data (relational data accessed by SQL queries, network data as graph
models) to the Intermediate Data Storage, whose web-service interface (HTTP request
parser, import/export tools, communication subsystem) sits on top of the indexing
subsystem (MongoDB, Elasticsearch) holding JSON Collections; Snapshot Profiles
produce data snapshots (multivalued contexts).</p>
      <p>
        An EDQD query is a JSON object that includes the following fields:
─ ID – a unique identifier (GUID).
─ TYPE – the type of the data source. For now, FCART supports the following
types: “FS Folder”, “FS File”, “TSQL”, “REST”, “SOAP”, “Facebook”, etc. Also, an
EDQD with type “IDS” can refer to documents which are already stored in the IDS.
─ URI [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] – the path to the data source. This field is optional.
─ CS – the connection string [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]. This field is optional.
─ QUERY – the query to the data source.
─ TARGET – the target data storage. For now, it can be set to a URI file path or “IDS”.
─ TRANSFORMATION – the type of each source field, the connection
between source fields and fields of the target data storage, and all transformations such
as scaling, indexing, etc. (A generic skeleton is shown after this list.)
      </p>
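      <p>A minimal skeleton of an EDQD with the common fields listed above may look as
follows (all concrete values are hypothetical placeholders, not taken from a real project):
{
  "ID": "{6F9619FF-8B86-D011-B42D-00CF4FC964FF}",
  "TYPE": "FS File",
  "URI": "file://localhost/c|/source/data.json",
  "CS": "",
  "QUERY": "",
  "TARGET": "IDS",
  "TRANSFORMATION": {
    "field": {
      "name": "text",
      "target_field": "body",
      "type": "text",
      "indexing": true
    }
  }
}</p>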
      <p>These are the common fields for all EDQD queries. Below we describe the
data-source-specific EDQD queries.</p>
      <p>Creating an EDQD is a complex task, which calls for a visual development tool. For
now, External Data Browsers have been prepared to help the user construct EDQDs for
local JSON/XML files, unstructured text files and SQL data sources. Other types of
EDQD can be created via direct JSON editing.</p>
      <sec id="sec-4-1">
        <title>Query to a SQL data source</title>
        <p>The EDQD for a SQL data source is the most straightforward. For now, the IDS supports
connections to Microsoft SQL Server 2014 (and its earlier versions) and PostgreSQL 9.5.2
(and its earlier versions). The EDQD for SQL has the following fields:
─ “ID” – GUID (Globally Unique Identifier).
─ “TYPE” – the DBMS type. Can be “TSQL” or “PS”.
─ “URI” – this field is empty for this EDQD query type.
─ “CS” – the connection string.
─ “QUERY” – a T-SQL or PL/pgSQL query.
─ “TRANSFORMATION” – for now, it describes the mapping of a column name to a
target field path in the JSON document.</p>
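        <p>A sketch of an EDQD for a SQL data source (the connection string, query and column
mapping below are hypothetical placeholders):
{
  "ID": "{6F9619FF-8B86-D011-B42D-00CF4FC964FF}",
  "TYPE": "TSQL",
  "URI": "",
  "CS": "Server=dbhost;Database=demo;User Id=analyst;Password=secret;",
  "QUERY": "SELECT Id, Title, Body FROM Articles",
  "TARGET": "IDS",
  "TRANSFORMATION": {
    "field": {
      "name": "Body",
      "target_field": "body",
      "type": "text",
      "indexing": true
    }
  }
}</p>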
      </sec>
      <sec id="sec-4-2">
        <title>Query to unstructured text files</title>
        <p>The EDQD for Text (a collection of files with unstructured text) provides the ability to
extract and transform data from unstructured texts. To analyze data from an unstructured
text file, we need to create an inverted index. Using an inverted index reduces the search
time for every text word.</p>
        <p>
          The inverted index is a central component of an indexing search engine. A goal of a
search engine implementation is to optimize the speed of the query “find the
documents where word X occurs”. Once a forward index is developed, which stores
lists of words per document, it is then inverted into an inverted index. Querying
the forward index would require sequential iteration through each document and each
word to verify a matching document. The time, memory, and processing resources needed to
perform such a query are not always technically realistic. Instead of listing the words
per document, as the forward index does, the inverted index data structure
lists the documents per word [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ].
        </p>
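        <p>A tiny illustration with two hypothetical documents (represented here as JSON
objects): the forward index stores the words of each document, while the inverted index
stores, for each word, the documents containing it:
{
  "forward_index": {
    "doc1": ["old", "mill", "lane"],
    "doc2": ["mill", "road"]
  },
  "inverted_index": {
    "old": ["doc1"],
    "mill": ["doc1", "doc2"],
    "lane": ["doc1"],
    "road": ["doc2"]
  }
}</p>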
        <p>
          To create the inverted index, we use a full-text search engine. For now, there are many
full-text search engines which provide rapid search, a complicated query language and a
REST interface. Solr [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] and Elasticsearch [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] are the most powerful and popular
search engines. In the previous paper [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ], we described a detailed comparison
of Solr and Elasticsearch as the basis for implementing the full-text manipulation part of the IDS.
In that paper, we showed the speed advantage of Elasticsearch in the situation of indexing and
inserting data at the same time. The ability to search text data is important because unstructured
text is often a part of other data types, e.g., structured documents (CSV, JSON,
XML) and documents extracted from social network services (user information, posts).
        </p>
        <p>Initial sets of automatically extracted keywords may be very big. We have
additional instruments for such sparse contexts with many uniform attributes, like
sorting and searching attributes (Fig. 3) or analyzing attribute usage statistics. But a
more proper way to generate the initial context is to use an adjustable query.</p>
        <p>Example of an EDQD for a query to a folder with text files:
{
  "ID": "{6F9619FF-8B86-D011-B42D-00CF4FC964FF}",
  "TYPE": "FS folder",
  "URI": "file://localhost/c|/source/",
  "CS": "",
  "QUERY": "",
  "TARGET": "file://localhost/c|/target",
  "TRANSFORMATION": {
    "field": {
      "name": "",
      "target_field": "body",
      "type": "text",
      "indexing": true
    }
  }
}</p>
        <p>The field “TRANSFORMATION” is the most interesting part of the EDQD for a local
text-files folder. “Name” refers to the field of the resulting IDS document that is affected.
“Target_field” describes the field name in the IDS document. “Type” describes the type of
the source field. The value of the EDQD field “Type” determines the operations and
transformations which are applicable to a document field. By now, FCART supports an
indexing operation on the “text” type. By default, the value of the “Indexing” field is false.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Query to a web-service</title>
        <p>The EDQD for a web-service interface provides the ability to extract data from a
web-service. In the current version, FCART supports REST and SOAP interfaces. The EDQD
for a web-service has the following fields:
─ “ID” – GUID (Globally Unique Identifier).
─ “TYPE” – the web-service type. Can be “REST” or “SOAP”.
─ “URI” – the URI of the web-service.
─ “CS” – this field is empty.
─ “QUERY” – a JSON document which contains the query.
─ “TRANSFORMATION” – a JSON document which describes the field mapping.</p>
        <p>Example of an EDQD query to an Elasticsearch REST interface:
{
  "ID": "{6F9619FF-8B86-D011-B42D-00CF4FC964FF}",
  "TYPE": "REST",
  "URI": "http://elasticsearch:1234/index_name/mapping_name/",
  "CS": "",
  "QUERY": {
    "query": {
      "bool": {
        "must": [
          { "match": { "address": "mill" } },
          { "match": { "address": "lane" } }
        ]
      }
    }
  },
  "TRANSFORMATION": {
    "field": {
      "target_field": "body",
      "indexing": true
    }
  }
}</p>
        <p>REST interfaces can iterate over the set of elements returned by a query. The QUERY
field contains a JSON document written in the Elasticsearch query language
(https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html); the query
above matches documents whose “address” field contains both “mill” and “lane”.</p>
        <p>The TRANSFORMATION field contains a JSON document which describes the target
field and the preprocessing operation. The current version of FCART supports only the
indexing operation.</p>
      </sec>
      <sec id="sec-4-3">
        <title>EDQD query to a Social Network Service</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion and future work</title>
      <p>In this paper, the main problems of external data access in FCA-based analytics software
were addressed, and some real cases were examined while implementing new
functionality in the FCART system. The demo version of FCART client is available
at https://cs.hse.ru/en/ai/issa/proj_fcart and the test version of the IDS Web-service is
available at http://zeus2.hse.ru:8444.</p>
      <p>For FCA-based data analysis, the fundamental requirements for software are as
follows:
1. The ability to merge heterogeneous data sources in a query to external data.
2. The ability to cache frequent queries.
3. The automatic population of query metadata.
4. The support of many formats of local data files, to communicate with other
software tools easily.
5. The support of a priori prescribed constraints on FCA algorithms and
visualization schemes.
6. The availability of common and special “quick and dirty” methods of query
result visualization with low computational complexity.</p>
      <p>When prototyping clinical decision support system components, we realized the
importance of having both local and web-based versions of the preprocessing tools. Thus,
the unification of external data access tools is the first step in satisfying analysts’ informal
wishes. We also understand the importance of other subsystems, including efficient data
transformation algorithms, dashboards, etc. However, without unified and
reproducible access to initial data, no one can build a real data analysis workflow.</p>
      <p>The improved query mechanisms work faster, are more intelligible and provide the
necessary information to the data analyst. The next steps in our development process are
adding new External Data Browsers, increasing the efficiency of EDQD processing and
standardizing a new API for running Web-Solvers inside the IDS instead of the Client.</p>
      <p>Acknowledgments. This work was carried out by the authors within the project “Mining Data with
Complex Structure and Semantic Technologies” supported by the Basic Research Program of the
National Research University Higher School of Economics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          , Springer,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Poelmans</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dedene</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Formal Concept Analysis in knowledge processing: A survey on models and techniques</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>40</volume>
          (
          <issue>16</issue>
          ),
          <year>2013</year>
          pp.
          <fpage>6601</fpage>
          -
          <lpage>6623</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Poelmans</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dedene</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Formal Concept Analysis in knowledge processing: A survey on applications</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>40</volume>
          (
          <issue>16</issue>
          ),
          <year>2013</year>
          , pp.
          <fpage>6538</fpage>
          -
          <lpage>6560</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Neznanov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parinov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          <article-title>About Universality and Flexibility of FCA-based Software Tools</article-title>
          .
          <source>Proceedings of the 3rd International Workshop "What can FCA do for Artificial Intelligence?"</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Yevtushenko</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          <article-title>System of data analysis “Concept Explorer” (In Russian)</article-title>
          .
          <source>7th National Conference on Artificial Intelligence (KII-2000)</source>
          , Russia,
          <year>2000</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Conexp-clj (http://daniel.kxpq.de/math/conexp-clj)</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grosser</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roume</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Hacene</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          <article-title>GALICIA: an open platform for lattices</article-title>
          . In: Using Conceptual Structures, Contributions to the
          <source>11th International Conference on Conceptual Structures (ICCS'03)</source>
          ,
          <year>2003</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>254</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Tockit: Framework for Conceptual Knowledge Processing (http://www.tockit.org)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Becker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hereth</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>ToscanaJ: An Open Source Tool for Qualitative Data Analysis</article-title>
          .
          <source>Workshop FCAKDD of the 15th European Conference on Artificial Intelligence (ECAI-2002)</source>
          . Lyon, France,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lahcen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwuida</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Lattice Miner: A Tool for Concept Lattice Construction and Exploration</article-title>
          .
          <source>Supplementary Proceedings of International Conference on Formal Concept Analysis (ICFCA'10)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Borza</surname>
            ,
            <given-names>P.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sabou</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sacarea</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>OpenFCA, an open source formal concept analysis toolbox</article-title>
          .
          <source>IEEE International Conference on Automation Quality and Testing Robotics (AQTR)</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>The Coron Data Mining Platform</article-title>
          (http://coron.loria.fr)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          <article-title>ZART: A Multifunctional Itemset Mining Algorithm</article-title>
          .
          <source>5th International Conference on Concept Lattices and Their Applications (CLA'07)</source>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. Cubist Project (http://www.cubist-project.eu)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Neznanov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ilvovsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>FCART: A New FCA-based System for Data Analysis and Knowledge Discovery</article-title>
          .
          <source>Contributions to the 11th International Conference on Formal Concept Analysis, Dresden</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Neznanov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parinov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          <article-title>Distributed architecture of data analysis system based on formal concept analysis approach</article-title>
          .
          <source>Studies in Computational Intelligence</source>
          ,
          <volume>616</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>271</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wickham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Tidy Data</article-title>
          .
          <source>Journal of Statistical Software</source>
          ,
          <volume>59</volume>
          (
          <issue>10</issue>
          ),
          <year>2014</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          Oracle Data Integrator Enterprise Edition
          (http://www.oracle.com/us/products/middleware/data-integration/enterpriseedition/overview/index.html)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. Microsoft PowerBI (http://powerbi.microsoft.com)</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Rouane</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nehme</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>On-line maintenance of iceberg concept lattices</article-title>
          .
          <source>ICCS-2004</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boc</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makarenkov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Fast Mining of Iceberg Lattices: A Modular Approach Using Generators</article-title>
          .
          <source>CLA-2011</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          <article-title>A Formal Concept Analysis Approach to Data Mining: The QuICL Algorithm for Fast Iceberg Lattice Construction</article-title>
          .
          <source>Computer and Inf. Science</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <year>2014</year>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Neznanov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          <article-title>Information Retrieval and Knowledge Discovery with FCART</article-title>
          .
          <source>Proceedings of the Workshop Formal Concept Analysis Meets Information Retrieval (FCAIR</source>
          <year>2013</year>
          ), CEUR-
          <volume>977</volume>
          ,
          <year>2013</year>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>82</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Codd</surname>
            ,
            <given-names>E. F.</given-names>
          </string-name>
          <article-title>Further Normalization of the Data Base Relational Model</article-title>
          .
          <source>Data Base Systems: Courant Computer Science Symposia Series 6</source>
          , Prentice-Hall,
          <year>1972</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25. ISO Standards catalogue 35.040: Character sets and information coding
          (http://iso.org/iso/products/standards/catalogue_ics_browse.htm?ICS1=35&amp;ICS2=040)
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>26. Open Data Watch (http://opendatawatch.com)</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27. RFC-3986: Uniform Resource Identifier (URI): Generic Syntax
          (https://tools.ietf.org/html/rfc3986)
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28. RFC-4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files
          (http://tools.ietf.org/html/rfc4180)
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <article-title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</article-title>
          ,
          <source>W3C Recommendation</source>
          ,
          <year>2008</year>
          (https://www.w3.org/TR/REC-xml)
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30. Standard ECMA-404:
          <source>The JSON Data Interchange Format</source>
          ,
          <year>2013</year>
          (http://www.ecma-international.org/publications/standards/Ecma-404.htm)
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31. ISO/IEC 9075-2:2011 Information technology - Database languages - SQL - Part 2:
          Foundation (https://www.iso.org/obp/ui/#iso:std:iso-iec:9075:-2:ed-4:v1:en)
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32. Elasticsearch Reference (https://www.elastic.co/guide/en/elasticsearch/reference)
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>33. MongoDB Reference Glossary (https://docs.mongodb.org/manual/reference/glossary)</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34. The home of JSON Schema (http://json-schema.org)
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35. The Connection Strings Reference (http://www.connectionstrings.com)
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36. The inverted index description (https://en.wikipedia.org/wiki/Inverted_index)
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>37. Apache Solr (http://lucene.apache.org/solr)</mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Neznanov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          <string-name>
            <surname>Parinov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          <article-title>Full-text Search in Intermediate Data Storage of FCART</article-title>
          .
          <source>RuZA2015 Workshop</source>
          ,
          <year>2015</year>
          (http://ceur-ws.org/Vol-1552/paper7.pdf)
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>39. XPath Syntax (http://www.w3schools.com/xsl/xpath_syntax.asp)</mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40. JSONPath (http://goessner.net/articles/JsonPath)
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41. RFC-7111: URI Fragment Identifiers for the text/csv Media Type
          (https://tools.ietf.org/html/rfc7111)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>