<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Global Optimization in Learning with Important Data: an FCA-Based Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yury Kashnitsky</string-name>
          <email>ykashnitsky@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergei O. Kuznetsov</string-name>
          <email>skuznetsov@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Nowadays, decision tree learning is one of the most popular classification and regression techniques. Though decision trees are not very accurate on their own, they make very good base learners for advanced tree-based methods such as random forests and gradient boosted trees. However, applying ensembles of trees deteriorates the interpretability of the final model. Another problem is that decision tree learning amounts to a greedy search for a good classification hypothesis in terms of some information-based criterion such as Gini impurity or information gain, whereas for small data sets a global search may be feasible. In this paper, we propose an FCA-based lazy classification technique where each test instance is classified with a set of the best (in terms of some information-based criterion) rules. In a set of benchmarking experiments, the proposed strategy is compared with decision tree and nearest neighbor learning.</p>
      </abstract>
      <kwd-group>
        <kwd>Formal Concept Analysis</kwd>
        <kwd>lazy learning</kwd>
        <kwd>global optimization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The classification task in machine learning aims to use historical data
(a training set) to predict unknown discrete variables in new data (a test
set). While there are dozens of popular methods for solving the classification
problem, choosing a method for a particular task usually involves an
accuracy-interpretability trade-off. Neural networks, random forests and ensemble
techniques (boosting, bagging, stacking, etc.) are known to outperform simple
methods in difficult tasks. Kaggle competitions also bear testimony to that:
usually, winners resort to ensemble techniques, mainly to gradient boosting [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
These algorithms are widespread in application scenarios where
classification performance is the main objective. In optical character
recognition, voice recognition, information retrieval and many other tasks, we
are typically satisfied with a trained model as long as it has a low generalization error.
      </p>
      <p>
        However, in many applications we need a model to be interpretable as well
as accurate. Classification rules built from data and examined by experts
may be justified or proved. In medical diagnostics, when making highly
responsible decisions, e.g., predicting whether a patient has cancer (i.e., dealing with
“important data”), experts prefer to extract readable rules from a machine
learning model in order to “understand” it and justify the decision. In credit scoring,
for instance, applying ensemble techniques can be very effective, but the model
is often obliged to have “sound business logic”, that is, to be interpretable [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>Eager (non-lazy) algorithms construct classifiers that contain an explicit
hypothesis mapping unlabelled test instances to their predicted labels. A decision
tree classifier, for example, uses a stored model to classify instances by tracing
the instance through the tests at the interior nodes until a leaf containing the
label is reached. In eager algorithms, the main work is done at the phase of
building a classifier.</p>
      <p>
        In the lazy classification paradigm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], however, no explicit model is constructed;
the inductive process is performed by a classifier that maps each test instance
to a label using the training set.
      </p>
      <sec id="sec-2-1">
        <title>Lazy decision trees</title>
        <p>
          The authors of [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] point out the following problem with decision tree learning:
while the entropy measures used in C4.5 and ID3 are guaranteed to decrease on
average, the entropy of a specific child may not change or may even increase. In other
words, a single decision tree may find only a locally optimal hypothesis in terms of
an entropy measure such as Gini impurity or pairwise mutual information. Moreover,
using a single tree may lead to many splits that are irrelevant for a given test instance.
A decision tree built for each test instance individually can avoid splits on
attributes that are irrelevant for that specific instance. Thus, such “customized”
decision trees (actually classification paths) built for a specific test instance may
be much shorter and hence may provide a short explanation for the classification.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Lazy associative classification</title>
        <p>
          Associative classifiers build a classifier from association rules mined from
training data; such rules have the class attribute as their conclusion. This approach
was shown to yield improved accuracy over decision trees, as it performs a
global search for rules satisfying some quality constraints [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Decision trees, on
the contrary, perform a greedy search for rules by selecting the most promising
attributes.
        </p>
        <p>
          Unfortunately, associative classifiers tend to output too many rules, many of
which may never be used to classify a test instance. The lazy
associative classification algorithm overcomes these problems by
generating only the rules whose premises are subsets of the test instance's
attributes [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Thus, in the lazy associative classification paradigm only those rules
are generated that might be used in the classification of a test instance, which leads
to a reduced set of classification rules for each test instance.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Decision trees in terms of Formal Concept Analysis</title>
        <p>
          In [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] the authors utilize concept lattices to represent each concept intent (a
closed set of attributes) as a decision tree node and a concept lattice itself – as
a set of overlapping decision trees. The construction of a decision tree is thus
reduced to selecting one of the downward paths in a concept lattice via some
information criterion.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Lazy classification for complex structure data</title>
        <p>
          The modification of the lazy classification algorithm capable of handling
complex structure data was first proposed in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The main difference from the Lazy
Associative Classification algorithm is that the method is designed to analyze
arbitrary objects with complex descriptions (intervals, sequences, graphs etc.).
This setting was implemented for interval credit scoring data [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and for graphs
in a toxicology prediction task [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Definitions</title>
      <p>
        Here we introduce some notions from Formal Concept Analysis [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] which
help us to organize the search space for classification hypotheses.
Definition 1. A formal context in FCA is a triple K = (G, M, I), where G is a
set of objects, M is a set of attributes, and the binary relation I ⊆ G × M shows
which object possesses which attribute. gIm denotes that object g has attribute
m. For subsets of objects and attributes A ⊆ G and B ⊆ M, the Galois operators are
defined as follows:
      </p>
      <p>A′ = {m ∈ M | gIm ∀g ∈ A},</p>
      <p>B′ = {g ∈ G | gIm ∀m ∈ B}.</p>
      <p>A pair (A, B) such that A ⊆ G, B ⊆ M, A′ = B and B′ = A is called
a formal concept of the context K. The sets A and B are closed and are called the
extent and the intent of the formal concept (A, B), respectively.</p>
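      <p>As a minimal illustration (ours, not the paper's implementation), the two derivation operators can be coded directly from the formulas above, assuming the relation I is stored as a set of (object, attribute) pairs:
```python
# Derivation (Galois) operators of a formal context K = (G, M, I),
# with I stored as a set of (object, attribute) pairs.
def up(A, I, M):
    """A' = attributes shared by every object in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def down(B, I, G):
    """B' = objects possessing every attribute in B."""
    return {g for g in G if all((g, m) in I for m in B)}

G = {1, 2, 3}
M = {"a", "b", "c"}
I = {(1, "a"), (1, "b"), (2, "a"), (3, "b"), (3, "c")}

A = {1, 2}
B = up(A, I, M)            # {'a'}
assert down(B, I, G) == A  # B' = A, so ({1, 2}, {'a'}) is a formal concept
```
      </p>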
      <p>Example 1. Let us consider a “classical” toy example of a classification task. The
training set is represented in Table 1. All categorical attributes are binarized
into “dummy” attributes. The table shows a formal context K = (G, M, I) with
G = {1, …, 10}, M = {or, oo, os, tc, tm, th, hn, w} (let us omit the class attribute
“play”) and I a binary relation defined on G × M, an element of which
is represented by a cross (×) in the corresponding cell of the table.</p>
      <p>A concept lattice for this formal context is depicted to the right of Table 1. It
should be read as follows: for a given element (formal concept) of the lattice,
its intent (a closed set of attributes) is given by all attributes whose labels can
be reached in an ascending lattice traversal. Similarly, the extent (a closed set of
objects) of a certain lattice element (formal concept) can be traced in a downward
lattice traversal from the given point. For instance, the big blue-and-black circle
depicts the formal concept ({1, 2, 5}, {or, tc, hn}).</p>
      <p>
        Such a concept lattice is a concise way of representing all closed itemsets
(formal concepts’ intents) of a formal context. Closed itemsets, in turn, can serve
as a condensed representation of classification rules [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In what follows, we
develop the idea of a hypothesis search space represented by a concept lattice.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Concept lattice as a hypothesis search space</title>
      <p>Below we describe and illustrate the proposed approach in the binary- and
numeric-attribute cases when dealing with binary classification. The approach
extends naturally to the multiclass case with the corresponding adjustments to
the information criteria formulas.</p>
      <sec id="sec-4-1">
        <title>Binary-attribute case</title>
        <p>In case of training and test data represented as binary tables, the proposed
algorithm is described as Algorithm 1.</p>
        <p>
          Let Ktrain = (Gtrain; M0 [ M 0 [ ctrain; Itrain) and Ktest = (Gtrain; M0 [
M 0; Itest) be formal contexts representing a training set and a test set
correspondingly. We state clearly that the set of attributes is dichotomized:
M = M0 [ M 0 where 8g 2 Gtrain; m 2 M0 9 m 2 M 0 : gItrainm ! :gItrainm.
Let CbO(K; min_supp) be the algorithm used to find all formal concepts of a
formal context K with support greater or equal to min_supp (by default we use
a modification of the InClose-2 program implementation [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] of the CloseByOne
algorithm [13]). Let inf : M [ ctrain ! R be an information criterion used to
rate classification rules (we use Gini impurity by default). Finally, let min_supp
and n_rules be the parameters of the algorithm (the minimal support of each
classification rule’s premise and the number of rules to be used for prediction of
each test instance’s class attribute).
        </p>
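        <p>Since inf rates a rule by the class distribution in the corresponding concept's extent, a one-function sketch of the default criterion, Gini impurity, may clarify what is being optimized (a hypothetical helper of ours, not the authors' implementation):
```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a multiset of class labels:
    0 for a pure extent, maximal for a uniform class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["play"] * 4))        # 0.0 (pure extent)
print(gini_impurity(["play", "no"] * 2))  # 0.5 (50/50 split)
```
        </p>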
        <p>With these designations, the main steps of the proposed algorithm for each
test instance are the following:
1. For each test object we keep only its attributes in the training set (step 1
in Algorithm 1). Formally, we build a new formal context Kt = (Gtrain, g′t ∪ ctrain, Itrain)
with the same objects Gtrain as in the training context Ktrain and with the
attributes g′t ∪ ctrain of the test object. We clarify what this means in the case
of real-valued attributes in subsection 4.2.
2. With CbO(Kt, min_supp), we find all formal concepts of the formal context Kt
satisfying the constraint on minimal support. We build formal concepts in a
top-down manner (increasing the number of attributes) and backtrack when the
support of a formal concept's intent falls below min_supp. The parameter
min_supp bounds the support of any hypothesis mined to classify the test
object and is therefore analogous to the min_samples_leaf parameter of a
decision tree. While generating formal concepts, we keep track of the values of
the class attribute for all training objects possessing all the corresponding
attributes (i.e., for all objects in the formal concept's extent), and we
calculate the value of the information criterion inf (Gini impurity by default)
for each formal concept's intent.
3. The mined formal concepts are then sorted by the value of the criterion inf
from “best” to “worst”.
4. Retaining the first n_rules concepts with the best values of the chosen
information criterion, we obtain a set of rules to classify the current test
object. For each concept we define a classification rule with the concept's
intent as the antecedent and the most common value of the class attribute among
the objects of the concept's extent as the consequent.
5. Finally, we predict the value of the class attribute for the current test
object via a majority vote among the consequents of the n_rules “best”
classification rules. We also save the rules for each test object in a
dictionary rtest.</p>
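        <p>The five steps above can be sketched in a brute-force form for small binary datasets. This is a hypothetical illustration of ours: it enumerates closed subsets of the test object's attributes directly instead of running CbO, and all function and variable names are our own:
```python
from collections import Counter
from itertools import combinations

def classify_lazily(X_train, y_train, x_test, min_supp=0.5, n_rules=3):
    """Brute-force sketch of steps 1-5: mine closed, frequent subsets of
    the test object's attributes, rank them by Gini impurity, and take a
    majority vote among the n_rules best rules."""
    G = range(len(X_train))
    intents = set()
    # Steps 1-2: candidate premises are subsets of the test object's attributes.
    for r in range(1, len(x_test) + 1):
        for B in combinations(sorted(x_test), r):
            extent = tuple(g for g in G if set(B) <= X_train[g])
            if not extent or len(extent) / len(X_train) < min_supp:
                continue  # support constraint, cf. min_samples_leaf
            # Closure: attributes of the test object shared by the whole extent.
            closure = frozenset(x_test.intersection(*(X_train[g] for g in extent)))
            intents.add((closure, extent))
    def gini(extent):
        counts = Counter(y_train[g] for g in extent)
        return 1.0 - sum((c / len(extent)) ** 2 for c in counts.values())
    # Steps 3-4: keep the n_rules lowest-impurity concepts.
    best = sorted(intents, key=lambda ce: gini(ce[1]))[:n_rules]
    # Step 5: each rule votes with the majority class of its extent.
    votes = [Counter(y_train[g] for g in ext).most_common(1)[0][0]
             for _, ext in best]
    return Counter(votes).most_common(1)[0][0]
```
The enumeration is exponential in the number of the test object's attributes, which is exactly why the paper relies on CloseByOne with support-based backtracking rather than on exhaustive subset generation.
        </p>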
      </sec>
      <sec id="sec-4-2">
        <title>Numeric-attribute case</title>
        <p>In our approach, we deal with numeric attributes similarly to what is done
in the CART algorithm [14]: we sort the values of a numeric attribute and
place binarization thresholds where the target attribute changes. Let us
demonstrate step 1 of Algorithm 1 in the case of binary and numeric
attributes with a sample from the Kaggle “Titanic: Machine Learning from Disaster”
competition dataset.1</p>
        <sec id="sec-4-2-1">
          <title>Footnote 1: https://www.kaggle.com/c/titanic</title>
          <p>Algorithm 1 Lazy Lattice-based Optimization (LLO)
Input: Ktrain = (Gtrain, M0 ∪ M̄0 ∪ ctrain, Itrain),
Ktest = (Gtest, M0 ∪ M̄0, Itest),
min_supp ∈ ℝ+, n_rules ∈ ℕ,
CbO(K, min_supp) : K → S,
sort(S, inf) : S → S,
inf : M ∪ ctrain → ℝ
Output: ctest, rtest
ctest = ∅, rtest = ∅
for gt ∈ Gtest do
1. Kt = (Gtrain, g′t, Itrain)
2. St = {(A, B) | A ⊆ Gtrain, B ⊆ g′t, A′ = B, B′ = A, |A|/|Gtrain| ≥ min_supp} = CbO(Kt, min_supp)
3. St = sort(St, inf)
4. {Bi}, i ∈ [1, n_rules] = {Bj | (Aj, Bj) ∈ St, j ∈ [1, n_rules]}
5. ci = argmax({count(ctrain,j) | j ∈ B′i})
6. rtest[i] = {Bi → ci}, i = 1, …, n_rules
7. ctest[gt] = argmax({count(cj) | j = 1, …, n_rules})
end for
Example 2. Table 2 shows a sample from the Titanic dataset. Let us build a
formal context to classify passenger no. 7 with attributes Pclass=2, Age=28,
City=C. If we sort the data by age in ascending order, we see where the target
attribute “Survived” switches from 0 to 1 or vice versa.</p>
          <p>Age      16 18 30 39 42 62
Survived  1  0  0  1  0  1</p>
          <p>Thus we have a set of thresholds to discretize the attribute “Age”:
T = {17, 34.5, 40.5, 52}. The formal context K7 (corresponding to Kt for t = 7
in Algorithm 1) is presented in Table 3.</p>
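          <p>The threshold construction above can be sketched as follows (a hypothetical helper mirroring the CART-style discretization, not the paper's code):
```python
def split_thresholds(values, labels):
    """Midpoints between consecutive sorted values where the target changes."""
    pairs = sorted(zip(values, labels))
    return [(a + b) / 2
            for (a, la), (b, lb) in zip(pairs, pairs[1:])
            if la != lb]

ages     = [16, 18, 30, 39, 42, 62]
survived = [ 1,  0,  0,  1,  0,  1]
print(split_thresholds(ages, survived))  # [17.0, 34.5, 40.5, 52.0]
```
The output reproduces the set T from Example 2: each threshold sits halfway between two neighboring ages whose “Survived” values differ.
          </p>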
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>Complexity</title>
        <p>The algorithm is based on the CloseByOne lattice-building algorithm, whose
time complexity was shown [15] to be O(|G||M|²|L|) for a formal context
(G, M, I) and the corresponding lattice L. Simply put, the complexity is linear
in the number of objects, quadratic in the number of attributes and linear in the
number of formal concepts built.</p>
        <p>In the proposed algorithm, CloseByOne is run for each test object (step 2
in Algorithm 1), and information criterion values are calculated for each
formal concept. Calculating entropy or the Gini index is linear in the number
of objects, as it requires computing the supports of attribute sets. This is
done on the go, within the same step, while building the lattice.</p>
        <p>Therefore, the time complexity of classifying |Gt| test instances with the
proposed algorithm based on a training formal context (G, M, I) is approximately
O(|Gt||G||M|²|L̄|), where |L̄| is the average lattice size for the formal contexts
described in step 2 of Algorithm 1.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Example</title>
      <p>Let us illustrate the proposed algorithm with the toy example from Table 1. To
classify object no. 10, we perform the following steps according to Algorithm 1:
1. Let us fix Gini impurity as the information criterion of interest and the
parameters min_supp = 0.5 and n_rules = 3. Thus, we are going to classify the
test instance with 3 rules, each supported by at least 5 objects and having the
highest gain in Gini impurity.
2. The case Outlook=sunny, Temperature=cool, Humidity=high, Windy=false
corresponds to the set of attributes {os, tc, hh, w} describing the test instance.
Or, if we consider the negations of the attributes, this case is described by
the set of attributes {ōr, ōo, os, tc, t̄m, t̄h, h̄n, w}.
3. We build a formal context with objects being the training set instances and
attributes being those of the test instance: {ōr, ōo, os, tc, t̄m, t̄h, h̄n, w}. The
corresponding binary table is shown in Table 4.
4. A concept lattice organizing all formal concepts of this formal context is shown
to the right of Table 4. The horizontal line separates the concepts whose
extents contain at least 5 objects (above the line, support ≥ min_supp = 0.5).
5. The 9 formal concepts with support at least min_supp = 0.5 give rise to 9
classification rules. The top 3 rules with the highest gain in Gini impurity are
given in Table 5.
6. The “best” rules mined in the previous step unanimously classify the test
instance Outlook=sunny, Temperature=cool, Humidity=high, Windy=false
as appropriate for playing tennis.</p>
    </sec>
    <sec id="sec-6">
      <title>Experiments</title>
      <p>As we have stated, in this paper we deal with “important data” problems,
those where both accurate and interpretable results are needed. We compare the
proposed classification algorithm (denoted LLO, for “Lazy Lattice-based
Optimization”) with the Scikit-learn [16] implementations of CART [14] and kNN on
several datasets from the UCI machine learning repository.2</p>
      <p>We used pairwise mutual information as a criterion for rule selection. CART
and kNN parameters were chosen in stratified 5-fold cross-validation and are
given in Table 7.</p>
      <p>For each dataset, the parameter min_supp of LLO was set to CART's
min_samples_leaf divided by the number of objects. We used n_rules = 5
classification rules to vote for a test instance's label.</p>
      <p>As can be seen, the proposed approach performs better than CART on
most of the datasets, while kNN is often better when the number of attributes is
small. Admittedly, the running times of LLO are far from perfect, owing
to the computationally demanding nature of the algorithm.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions and further work</title>
      <p>In this paper, we have shown how searching for classification hypotheses in
a formal concept lattice for each test instance individually may yield accurate</p>
      <sec id="sec-7-1">
        <title>Footnote 2: http://repository.seasr.org/Datasets/UCI/csv/</title>
        <p>results while keeping the classification model interpretable. The proposed
strategy is computationally demanding but may be used for “small data” problems
where prediction delay is not as important as classification accuracy and
interpretability.</p>
        <p>Further, we plan to apply the idea of searching for classification
hypotheses in a concept lattice to data with complex structure, such as molecular graphs. We
plan to implement the same strategy of lazy classification by searching for
succinct classification rules in a pattern concept lattice. The designed framework
might help to learn sets of rules for tasks such as biological activity (toxicology,
mutagenicity, etc.) prediction. We are also going to interpret random forests as
a search for an optimal hypothesis in a concept lattice and try to compete with
this popular classification method.</p>
        <p>13. Kuznetsov, S.O.: A fast algorithm for computing all intersections of objects from
an arbitrary semilattice. Nauchno-Tekhnicheskaya Informatsiya, Seriya 2 -
Informatsionnye protsessy i sistemy (1) (1993) 17-20
14. Breiman, L., Friedman, J., Stone, C., Olshen, R.: Classification and Regression
Trees. The Wadsworth and Brooks-Cole statistics-probability series. Taylor &amp;
Francis (1984)
15. Kuznetsov, S.O., Obiedkov, S.A.: Comparing performance of algorithms for
generating concept lattices. Journal of Experimental &amp; Theoretical Artificial Intelligence
14(2-3) (2002) 189-216
16. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine
Learning in Python. Journal of Machine Learning Research 12 (2011) 2825-2830</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papadopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vologiannidis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>D'yakonov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puurula</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Read</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Semenov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Wise 2014 challenge: Multi-label classification of print media articles to topics</article-title>
          .
          <source>In: 15th International Conference on Web Information Systems Engineering (WISE</source>
          <year>2014</year>
          ).
          <source>Proceedings Part II. Volume 8787 of Lecture Notes in Computer Science</source>
          .,
          <source>Springer (October 12-14</source>
          <year>2014</year>
          )
          <fpage>541</fpage>
          -
          <lpage>548</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>An overview of personal credit scoring: Techniques and future work</article-title>
          .
          <source>International Journal of Intelligence Science</source>
          <volume>2</volume>
          (
          <issue>4A</issue>
          ) (
          <year>2012</year>
          )
          <fpage>181</fpage>
          -
          <lpage>189</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aha</surname>
          </string-name>
          , D.W., ed.:
          <source>Lazy Learning</source>
          . Kluwer Academic Publishers, Norwell, MA, USA (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Lazy decision trees</article-title>
          .
          <source>In: Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 1. AAAI'96</source>
          , AAAI Press (
          <year>1996</year>
          )
          <fpage>717</fpage>
          -
          <lpage>724</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Veloso</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meira Jr.</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Lazy Associative Classification</article-title>
          .
          <source>In: Proceedings of the Sixth International Conference on Data Mining. ICDM '06</source>
          , Washington, DC, USA, IEEE Computer Society (
          <year>2006</year>
          )
          <fpage>645</fpage>
          -
          <lpage>654</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Belohlavek</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Baets</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Outrata</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychodil</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Inducing decision trees via concept lattices</article-title>
          .
          <source>International Journal of General Systems</source>
          <volume>38</volume>
          (
          <issue>4</issue>
          ) (
          <year>2009</year>
          )
          <fpage>455</fpage>
          -
          <lpage>467</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Scalable knowledge discovery in complex data with pattern structures</article-title>
          . In Maji, P.,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murty</surname>
            ,
            <given-names>M.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pal</surname>
          </string-name>
          , S.K., eds.:
          <source>PReMI</source>
          . Volume
          <volume>8251</volume>
          of Lecture Notes in Computer Science., Springer (
          <year>2013</year>
          )
          <fpage>30</fpage>
          -
          <lpage>39</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Masyutin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kashnitsky</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Lazy classification with interval pattern structures: Application to credit scoring</article-title>
          .
          <source>In: CEUR Workshop Proceedings</source>
          . Volume
          <volume>1430</volume>
          . (
          <year>2015</year>
          )
          <fpage>43</fpage>
          -
          <lpage>54</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kashnitsky</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Lazy associative graph classification</article-title>
          .
          <source>In: CEUR Workshop Proceedings</source>
          . Volume
          <volume>1430</volume>
          . (
          <year>2015</year>
          )
          <fpage>63</fpage>
          -
          <lpage>74</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal Concept Analysis: Mathematical Foundations. 1st edn</source>
          . Springer-Verlag New York, Inc., Secaucus, NJ, USA (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hata</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veloso</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ziviani</surname>
          </string-name>
          , N.:
          <article-title>Learning accurate and interpretable classifiers using optimal multi-criteria rules</article-title>
          .
          <source>JIDM</source>
          <volume>4</volume>
          (
          <issue>3</issue>
          ) (
          <year>2013</year>
          )
          <fpage>204</fpage>
          -
          <lpage>219</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>In-Close2, a high performance formal concept miner</article-title>
          .
          <source>In: Conceptual Structures for Discovering Knowledge - 19th International Conference on Conceptual Structures, ICCS</source>
          <year>2011</year>
          ,
          <article-title>Derby, UK</article-title>
          . Proceedings. (
          <year>2011</year>
          )
          <fpage>50</fpage>
          -
          <lpage>62</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>