<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lazy Learning of Classification Rules for Complex Structure Data</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we address the machine learning classification problem and classify each test instance with a set of interpretable and accurate rules. We resort to the idea of lazy classification and the mathematical apparatus of Formal Concept Analysis to develop an abstract framework for this task. In a set of benchmarking experiments, we compare the proposed strategy with decision tree learning. We discuss a generalization of the proposed framework to the case of complex structure data, such as molecular graphs, in tasks such as predicting the biological activity of chemical compounds.</p>
      </abstract>
      <kwd-group>
        <kwd>formal concept analysis</kwd>
        <kwd>lazy classification</kwd>
        <kwd>complex structure</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The classification task in machine learning aims to use historical data (a training set) to predict the unknown values of a discrete target variable for new data (a test set). While there are dozens of popular methods for solving the classification problem, there is usually an accuracy-interpretability trade-off when choosing a method for a particular task. Neural networks, random forests, and ensemble techniques (boosting, bagging, stacking, etc.) are known to outperform simpler methods on difficult tasks. Kaggle competitions also bear testimony to that: the winners usually resort to ensemble techniques, mainly to gradient boosting [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The algorithms mentioned are widespread in those application scenarios where classification performance is the main objective. In optical character recognition, speech recognition, information retrieval, and many other tasks, we are typically satisfied with a trained model if it has a low generalization error.
      </p>
      <p>
        However, in many applications we need a model to be interpretable as well as accurate. Classification rules built from data and examined by experts may be justified or proved. In medical diagnostics, when making high-stakes decisions (e.g., predicting whether a patient has cancer), experts prefer to extract readable rules from a machine learning model in order to “understand” it and justify the decision. In credit scoring, for instance, applying ensemble techniques can be very effective, but the model is often required to have “sound business logic”, that is, to be interpretable [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Another point of interest in this paper is dealing with complex structure data in classification tasks. While there are various popular techniques for handling time series, sequences, and graph data, we discuss how pattern structures (as a formal data representation) and lazy associative classification (as a learning paradigm) can help to learn succinct classification rules in tasks with complex structure data.</p>
    </sec>
    <sec id="sec-2">
      <title>Definitions</title>
      <p>
        Here we introduce some notions from Formal Concept Analysis [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] which help
us to organize the search space for classification hypotheses.
      </p>
      <p>Definition 1. A formal context in FCA is a triple K = (G, M, I) where G is a set of objects, M is a set of attributes, and the binary relation I ⊆ G × M specifies which object possesses which attribute. gIm denotes that object g has attribute m. For subsets of objects and attributes A ⊆ G and B ⊆ M, the Galois operators are defined as follows:</p>
      <p>A′ = {m ∈ M | gIm ∀g ∈ A},</p>
      <p>B′ = {g ∈ G | gIm ∀m ∈ B}.</p>
      <p>A pair (A, B) such that A ⊆ G, B ⊆ M, A′ = B, and B′ = A is called a formal concept of the context K. The sets A and B are closed and are called the extent and the intent of the formal concept (A, B), respectively.</p>
      <p>
        Example 1. Let us consider a “classical” toy example of a classification task from
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The training set is represented in Table 1. All categorical attributes are binarized into “dummy” attributes. The table shows a formal context K = (G, M, I) with G = {1, …, 10}, M = {or, oo, os, tc, tm, th, hn, w} (let us omit the class attribute “play”) and I, a binary relation on G × M whose elements are represented by crosses (×) in the corresponding cells of the table.
      </p>
      <p>A concept lattice for this formal context is depicted in Fig. 1. It should be read as follows: for a given element (formal concept) of the lattice, its intent (a closed set of attributes) is given by all attributes whose labels can be reached in an ascending lattice traversal. Similarly, the extent (a closed set of objects) of a certain lattice element (formal concept) can be traced in a downward lattice traversal from the given point. For instance, the big blue-and-black circle depicts the formal concept ({1, 2, 5}, {or, tc, hn}).</p>
      <p>
        Such a concept lattice is a concise way of representing all closed itemsets (formal concepts’ intents) of a formal context. Closed itemsets, in turn, can serve as a condensed representation of classification rules [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In what follows, we develop the idea of a hypothesis search space represented by a concept lattice.
      </p>
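      <p>To make the link between a context and its closed itemsets concrete, here is a brute-force sketch (ours; exponential in |M| and thus suitable for toy contexts only) that enumerates all formal concepts by closing every attribute subset.</p>
      <preformat><![CDATA[
from itertools import combinations

# Brute-force enumeration of all formal concepts of a small (hypothetical)
# context: close every attribute subset B via B -> (B', B'') and keep the
# distinct results. Exponential in |M|; for toy examples only.
G = {1, 2, 3}
M = ("a", "b", "c")
I = {(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "a"), (3, "b")}

def down(B):  # B': objects having all attributes of B
    return frozenset(g for g in G if all((g, m) in I for m in B))

def up(A):    # A': attributes shared by all objects of A
    return frozenset(m for m in M if all((g, m) in I for g in A))

concepts = set()
for r in range(len(M) + 1):
    for B in combinations(M, r):
        extent = down(B)                     # B'
        concepts.add((extent, up(extent)))   # (B', B'') is a formal concept

for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(set(extent) or "{}", set(intent) or "{}")
]]></preformat>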
      <sec id="sec-2-1">
        <title>Pattern Structures</title>
        <p>
          Pattern structures are a natural extension of Formal Concept Analysis to objects with arbitrary partially ordered descriptions [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The order on a set of descriptions D allows one to define a semilattice (D, ⊓), i.e., for any di, dj, dk ∈ D: di ⊓ di = di, di ⊓ dj = dj ⊓ di, di ⊓ (dj ⊓ dk) = (di ⊓ dj) ⊓ dk. Please refer to [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
for details.
        </p>
        <p>
          Definition 2. Let G be a set (of objects), let (D, ⊓) be a meet-semilattice (of all possible object descriptions) and let δ : G → D be a mapping between objects and descriptions. The set δ(G) := {δ(g) | g ∈ G} generates a complete subsemilattice (Dδ, ⊓) of (D, ⊓) if every subset X of δ(G) has an infimum ⊓X in (D, ⊓). A pattern structure is a triple (G, D, δ), where D = (D, ⊓), provided that the set δ(G) := {δ(g) | g ∈ G} generates a complete subsemilattice (Dδ, ⊓) [
          <xref ref-type="bibr" rid="ref6 ref8">6,8</xref>
          ].
Definition 3. Patterns are elements of D. Patterns are naturally ordered by the subsumption relation ⊑: given c, d ∈ D, one has c ⊑ d ⇔ c ⊓ d = c. The operation ⊓ is also called a similarity operation. A pattern structure (G, D, δ) gives rise to the following derivation operators (·)□:
        </p>
        <p>A□ = ⨅g∈A δ(g) for A ⊆ G,</p>
        <p>d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓).</p>
        <p>Pairs (A, d) satisfying A ⊆ G, d ∈ D, A□ = d, and d□ = A are called pattern concepts of (G, D, δ).</p>
        <p>Example 2. Closed sets of graphs can be represented with a pattern structure. Let {1, 2, 3} be a set of objects and {G1, G2, G3} the set of their molecular graphs. [Figure: three molecular graphs G1, G2, G3, each built on a C–C core with substituents such as CH3, NH2, OH, and Cl; see the original figure.]</p>
        <p>
          A set of objects {1, 2, 3}, their molecular graphs D = {G1, G2, G3} (δ(i) = Gi, i = 1, …, 3), and a similarity operator ⊓ defined in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] comprise a pattern structure ({1, 2, 3}, (D, ⊓), δ).
        </p>
        <p>Here is the set of all pattern concepts of this pattern structure: the top concept ({1, 2, 3}, G1 ⊓ G2 ⊓ G3); the pairwise similarities ({1, 2}, G1 ⊓ G2), ({1, 3}, G1 ⊓ G3), ({2, 3}, G2 ⊓ G3); the object concepts ({1}, {G1}), ({2}, {G2}), ({3}, {G3}); and the bottom concept (∅, {G1, G2, G3}). [The graph patterns themselves are shown in the original figure.]</p>
        <p>
Please refer to [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for clarification of this example.
        </p>
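        <p>Implementing the graph similarity operator of [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is beyond a short illustration, but the pattern-structure machinery itself is compact. The sketch below (ours, with hypothetical data) replaces graph descriptions by sets of labels, so that the similarity operation becomes set intersection and subsumption reduces to set inclusion.</p>
        <preformat><![CDATA[
from functools import reduce

# A minimal pattern structure (G, (D, u), delta) sketch in which descriptions
# are sets, the similarity operation u is set intersection, and c [= d holds
# iff c u d = c. Hypothetical data; the paper's example uses labeled graphs.
delta = {
    1: frozenset({"C", "CH3", "NH2"}),
    2: frozenset({"C", "NH2", "OH"}),
    3: frozenset({"C", "Cl"}),
}
G = set(delta)

def meet(c, d):      # the similarity operation u
    return c & d

def box_ext(A):      # A[] = meet of delta(g) over all g in A
    return reduce(meet, (delta[g] for g in A))

def box_int(d):      # d[] = {g in G | d [= delta(g)}
    return {g for g in G if meet(d, delta[g]) == d}

A = {1, 2}
d = box_ext(A)        # frozenset({'C', 'NH2'})
print(d, box_int(d))  # (A, d) is a pattern concept since box_int(d) == A
]]></preformat>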
        <p>Further, we show how pattern concept lattices help to organize the search space for classification hypotheses.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related work</title>
      <p>Eager (non-lazy) algorithms construct classifiers that contain an explicit
hypothesis mapping unlabelled test instances to their predicted labels. A decision
tree classifier, for example, uses a stored model to classify instances by tracing
the instance through the tests at the interior nodes until a leaf containing the
label is reached. In eager algorithms, the main work is done during the phase of building the classifier.</p>
      <p>
        In lazy classification paradigm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], however, no explicit model is constructed; the inductive process is performed by a classifier that maps each test instance to a label using the training set directly.
      </p>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] point out the following problem with decision tree learning: while the entropy measures used in C4.5 and ID3 are guaranteed to decrease on average, the entropy of a specific child may stay the same or even increase. In other words, a single decision tree may find only a locally optimal hypothesis in terms of an impurity measure such as Gini impurity or pairwise mutual information. Moreover, using a single tree may lead to many splits that are irrelevant for a given test instance. A decision tree built for each test instance individually can avoid splits on attributes that are irrelevant for that specific instance. Thus, such “customized” decision trees (actually, classification paths) built for a specific test instance may be much shorter and hence may provide a short explanation for the classification.
      </p>
      <p>
        Associative classifiers build a classifier using association rules mined from training data; such rules have the class attribute in their conclusion. This approach was shown to yield improved accuracy over decision trees, as associative classifiers perform a global search for rules satisfying certain quality constraints [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Decision trees, on the contrary, perform a greedy search for rules by selecting the most promising attributes.
      </p>
      <p>
        Unfortunately, associative classifiers tend to output too many rules, many of which might never be used to classify a test instance. The lazy associative classification algorithm overcomes these problems by generating only the rules whose premises are subsets of the test instance’s attributes [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Thus, in the lazy associative classification paradigm, only those rules are generated that might be used in classifying the given test instance. This leads to a reduced set of classification rules for each test instance.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] the authors generalize the lazy associative classification
framework to operate with complex data descriptions such as intervals, sequences,
processes and graphs.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] the authors use concept lattices to represent each concept intent (a closed set of attributes) as a decision tree node, and the concept lattice itself as a set of overlapping decision trees. The construction of a decision tree is thus reduced to selecting one of the downward paths in the concept lattice via some information criterion.
      </p>
    </sec>
    <sec id="sec-3a">
      <title>The search for classification hypotheses in a concept lattice</title>
      <sec id="sec-3-1">
        <title>Binary-attribute case</title>
        <p>For training and test data represented as binary tables, we propose Algorithm
1.</p>
        <p>For each test instance, we keep only its attributes in the training set (steps 1-2 of Algorithm 1). We clarify what this means in the case of real-valued attributes in the subsection on the numeric-attribute case below.</p>
        <p>
          Then we utilize a modification of the In-Close algorithm [
          <xref ref-type="bibr" rid="ref14 ref15">14,15</xref>
          ] to find all formal concepts of a formal context with the attributes of a test instance (step 3 in Algorithm 1). We build formal concepts in a top-down manner (increasing the number of attributes) and backtrack when the cardinality of a formal concept intent exceeds k. The parameter k bounds the length of any possible hypothesis mined to classify the test instance and is therefore analogous to the depth of a decision tree. We speed up the computation of closed attribute sets (formal concept intents) by storing them in a separate data structure (the set S in the pseudocode).
        </p>
        <p>While generating formal concepts, we retain the values of the class attribute for all training instances that have all the corresponding attributes (i.e., for all objects in the formal concept extent). We calculate the value of some information criterion (such as Gini impurity, Gini gain, or pairwise mutual information) for each formal concept intent (step 4 in Algorithm 1). Retaining the top n concepts with the maximal values of the chosen criterion, we obtain a set of rules to classify the current test instance. For each concept we define a classification rule with the concept intent as a premise and the most common value of the class attribute among the instances of the concept extent as a conclusion.</p>
        <p>Finally, we predict the value of the class attribute for the current test instance simply via a majority rule among the n “best” classification rules (step 5 in Algorithm 1). Then the calculated formal concept intents are stored (step 6), and the cycle is repeated for the next test instance.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Numeric-attribute case</title>
        <p>
          In our approach, we deal with numeric attributes similarly to what is done in the C4.5 algorithm [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. We compute percentiles x1, …, xt for each numeric attribute x and introduce 2t new binary attributes of the form “x ≥ x1”, “x &lt; x1”, …, “x ≥ xt”, “x &lt; xt”. Let us demonstrate steps 1 and 2 of Algorithm 1 in the case of binary and numeric attributes with a sample from the Kaggle “Titanic: Machine Learning from Disaster” competition dataset (https://www.kaggle.com/c/titanic).
        </p>
        <p>Example 3. Fig. 2 shows a sample from the Titanic dataset. Let us form a formal context to classify a passenger with attributes “Pclass = 3, SibSp = 0, Age = 34.5”. We use the 25th and 50th percentiles of the Age attribute to binarize it. The corresponding binary table is shown in Table 2.</p>
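        <p>A possible sketch of this binarization step (ours; the column naming is hypothetical) is given below.</p>
        <preformat><![CDATA[
import numpy as np

# Percentile-based binarization of a numeric attribute, as described above:
# each numeric attribute x yields pairs of binary attributes "x >= x_i" and
# "x < x_i" for the chosen percentiles x_1, ..., x_t. (Our own sketch;
# the column naming is hypothetical.)
def binarize_numeric(values, percentiles=(25, 50)):
    values = np.asarray(values, dtype=float)
    thresholds = np.percentile(values, percentiles)
    columns = {}
    for p, thr in zip(percentiles, thresholds):
        columns["x >= p%d (%g)" % (p, thr)] = values >= thr
        columns["x < p%d (%g)" % (p, thr)] = values < thr
    return columns

ages = [22.0, 38.0, 26.0, 35.0, 34.5]   # hypothetical Age sample
for name, column in binarize_numeric(ages).items():
    print(name, column.astype(int))
]]></preformat>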
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Example</title>
      <p>Let us illustrate the proposed algorithm with the toy example from Table 1. To classify object no. 10, we perform the steps below (listed after the pseudocode of Algorithm 1).</p>
      <sec id="sec-4-1">
        <title>1 https://www.kaggle.com/c/titanic</title>
      <preformat>
Algorithm 1. The proposed algorithm (binary-attribute case).

Input:
  K_train = (G_train, M ∪ {c_train}, I_train), a formal context (the training set);
  K_test = (G_test, M, I_test), a formal context (the test set);
  CbO(K, k), the algorithm used to find all formal concepts of a formal context K
    with intent cardinality not exceeding k;
  inf, an information criterion used to rate classification rules
    (such as Gini impurity, Gini gain, or pairwise mutual information);
  k, the maximal cardinality of each classification rule’s premise (a parameter);
  n, the number of rules used to predict each test instance’s class attribute
    (a parameter).
Output:
  c_test, the predicted values of the class attribute for the test instances in K_test.

S = ∅; c_test = [].  Initialize the set of formal concept intents (a.k.a. closed
  itemsets) that will be used to form classification rules for each test instance
  from G_test, and the list of predicted labels for the test instances.

for each test instance g_t ∈ G_test do
  1. Let M_t be the set of attributes of the test instance g_t together with the
     negations of the attributes not in g_t′;
  2. Build a formal context K_t = (G_train, M_t, I_t) where I_t = I ∩ (G × M_t).
     Informally, keep only the part of the context K_train with attributes from M_t;
  3. With the CbO algorithm and the set S of already computed formal concept intents,
     find all formal concepts of K_t with intent cardinality not exceeding the
     parameter k;
  4. Meanwhile, calculate the value of the criterion inf for each concept intent and
     keep the n intents with the highest values. For each “top-ranked” concept intent
     B_i determine c_i, the most common class among the objects from B_i′. Thus, form
     {B_i → c_i}, i = 1…n, a set of classification rules for g_t;
  5. Predict the value of the class attribute for g_t via a majority rule among
     {B_i → c_i}, i = 1…n. Append it to c_test;
  6. Add the calculated intents to S.
end for
      </preformat>
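      <p>For concreteness, the following is a compact executable sketch of Algorithm 1 (our sketch, not the authors’ implementation): it enumerates closed intents by brute force instead of CbO, marks negated attributes with a “~” prefix, and scores intents by plain Gini gain; all of these are simplifications of ours.</p>
      <preformat><![CDATA[
from collections import Counter
from itertools import combinations

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def classify(train, labels, test_attrs, k=2, n_rules=5):
    """train: list of frozensets of attributes (negations included, step 1);
    labels: class value per training object; test_attrs: attributes of the
    test instance. Returns the majority vote among the n best rules."""
    base = gini(labels)
    rules, seen = [], set()
    for r in range(1, k + 1):                 # steps 2-3: intents of size <= k
        for premise in combinations(sorted(test_attrs), r):
            premise = frozenset(premise)
            extent = [i for i, a in enumerate(train) if premise <= a]
            if not extent:
                continue
            # closure of the premise within the restricted context K_t
            intent = frozenset.intersection(*(train[i] for i in extent)) & test_attrs
            if len(intent) > k or intent in seen:
                continue                      # analogue of CbO backtracking
            seen.add(intent)
            ext_labels = [labels[i] for i in extent]
            score = base - gini(ext_labels)   # step 4: criterion inf (Gini gain)
            pred = Counter(ext_labels).most_common(1)[0][0]
            rules.append((score, intent, pred))
    top = sorted(rules, key=lambda t: -t[0])[:n_rules]
    # step 5: majority rule among the n best classification rules
    return Counter(pred for _, _, pred in top).most_common(1)[0][0]

# Hypothetical toy data; "~" marks a negated attribute.
train = [frozenset({"os", "tc", "~w"}), frozenset({"os", "~tc", "w"}),
         frozenset({"~os", "tc", "~w"}), frozenset({"os", "tc", "w"})]
labels = ["yes", "no", "yes", "yes"]
print(classify(train, labels, frozenset({"os", "tc", "~w"})))
]]></preformat>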
      <p>1. Let us fix Gini impurity as the information criterion of interest and the parameters k = 2 and n = 5. Thus, we are going to classify the test instance with 5 rules, each with at most 2 attributes in its premise, having the highest gain in Gini impurity.
2. The case “Outlook=sunny, Temperature=cool, Humidity=high, Windy=false” corresponds to the set of attributes {os, tc} describing the test instance. Or, if we also consider the negations of the attributes the instance does not have, the case is described by the set {¬or, ¬oo, os, tc, ¬tm, ¬th, ¬hn, ¬w}.
3. We build a formal context with objects being the training set instances and attributes being those of the test instance: {¬or, ¬oo, os, tc, ¬tm, ¬th, ¬hn, ¬w}. The corresponding binary table is shown in Table 3.
4. The line diagram of the concept lattice for the formal context given by Table 3 is shown in Fig. 3. The horizontal line separates the concepts with intents having at most 2 attributes.
5. The 13 formal concepts with intents having at most 2 attributes give rise to 13 classification rules. The top 5 rules with the highest gain in Gini impurity are given in Table 4.
6. The “best” rules mined in the previous step unanimously classify the test instance “Outlook=sunny, Temperature=cool, Humidity=high, Windy=false” as appropriate for playing tennis.</p>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>
        We compare the proposed classification algorithm (“PCL” for Pattern
Concept Lattice based classification) with the results from [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] on several datasets from the UCI machine learning repository (https://archive.ics.uci.edu/ml).
      </p>
      <p>We used pairwise mutual information as the criterion for rule selection. The parameters k ∈ {3, …, 7} and n ∈ {1, …, 5} were chosen via 5-fold cross-validation. The described algorithms were implemented in Python 2.7.3 and run on a dual-core CPU (Core i3-370M, 2.4 GHz) with 3.87 GB RAM.</p>
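      <p>As an illustration of this tuning step, the grid search over k and n could look like the sketch below (ours; it reuses classify(), train, and labels from the Algorithm 1 sketch above, and the fold-splitting scheme is an assumption).</p>
      <preformat><![CDATA[
import random
from statistics import mean

# 5-fold cross-validated grid search over k and n, as described above.
# Reuses classify(), train and labels from the Algorithm 1 sketch; with
# realistically sized data every fold is non-empty.
def cv_accuracy(train, labels, k, n_rules, folds=5, seed=0):
    idx = list(range(len(train)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        held_out = idx[f::folds]
        if not held_out:
            continue
        rest = [i for i in idx if i not in held_out]
        tr_X, tr_y = [train[i] for i in rest], [labels[i] for i in rest]
        hits = [classify(tr_X, tr_y, train[i], k, n_rules) == labels[i]
                for i in held_out]
        scores.append(mean(hits))
    return mean(scores)

best_k, best_n = max(((k, n) for k in range(3, 8) for n in range(1, 6)),
                     key=lambda kn: cv_accuracy(train, labels, *kn))
print("chosen parameters:", best_k, best_n)
]]></preformat>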
      <p>
        The algorithm was also tested on the 2001 Predictive Toxicology Challenge (PTC) dataset (http://www.predictive-toxicology.org/ptc/). Please refer to [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] for the description of the problem and
some notions on pattern structures with descriptions given by labeled graphs.
Here we compare the results of the proposed algorithm (Pattern Concept Lattice based classification) and the previously developed graphlet-based lazy associative classification on the PTC dataset. The results are shown in Table 6.
      </p>
      <p>To clarify, in both algorithms k-graphlet (the parameter “K nodes” in Table 6) graph intersections were built. In “GLAC”, each test instance is classified via voting among all classification hypotheses. In “PCL”, only the n best (according to some
      <sec id="sec-5-1">
        <title>3 http://www.predictive-toxicology.org/ptc/</title>
information criterion) closed hypotheses are chosen (here we used n = 5). As we can see, “PCL” works slightly better on this dataset, suggesting that choosing the “best” hypotheses for classification may lead to more accurate results.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and further work</title>
      <p>In this paper, we have shown how searching for classification hypotheses in
a formal concept lattice for each test instance individually may yield accurate
results while providing succinct classification rules. The proposed strategy is
computationally demanding but may be used for “small data” problems where
prediction delay is not as important as classification accuracy and
interpretability.</p>
      <p>Further, we plan to interpret random forests as a search for an optimal hypothesis in a concept lattice and to try to compete with this popular classification technique.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Grigorios</given-names>
            <surname>Tsoumakas</surname>
          </string-name>
          , Apostolos Papadopoulos, Weining Qian, Stavros Vologiannidis, Alexander D'yakonov, Antti Puurula, Jesse Read, Jan Svec, and Stanislav Semenov, “
          <article-title>Wise 2014 challenge: Multi-label classification of print media articles to topics,”</article-title>
          <source>in 15th International Conference on Web Information Systems Engineering (WISE</source>
          <year>2014</year>
          ).
          <source>Proceedings Part II. October 12-14</source>
          <year>2014</year>
          , vol.
          <volume>8787</volume>
          of Lecture Notes in Computer Science, pp.
          <fpage>541</fpage>
          -
          <lpage>548</lpage>
          , Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhong</surname>
          </string-name>
          , “
          <article-title>An overview of personal credit scoring: Techniques and future work</article-title>
          ,”
          <source>International Journal of Intelligence Science</source>
          , vol.
          <volume>2</volume>
          , no.
          <issue>4A</issue>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>189</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Ganter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rudolf</given-names>
            <surname>Wille</surname>
          </string-name>
          ,
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          , Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1st edition,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Thomas M.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <source>Machine Learning</source>
          , McGraw-Hill, Inc., New York, NY, USA, 1st edition,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Itamar</given-names>
            <surname>Hata</surname>
          </string-name>
          , Adriano Veloso, and Nivio Ziviani, “
          <article-title>Learning accurate and interpretable classifiers using optimal multi-criteria rules</article-title>
          .,
          <source>” JIDM</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>204</fpage>
          -
          <lpage>219</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Ganter</surname>
          </string-name>
          and Sergei Kuznetsov, “
          <article-title>Pattern Structures and Their Projections,” in Conceptual Structures: Broadening the Base, Harry Delugach</article-title>
          and Gerd Stumme, Eds., vol.
          <volume>2120</volume>
          of Lecture Notes in Computer Science, pp.
          <fpage>129</fpage>
          -
          <lpage>142</lpage>
          . Springer, Berlin/Heidelberg,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Sergei O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          , “
          <article-title>Fitting pattern structures to knowledge discovery in big data,” in Formal Concept Analysis: 11th International Conference</article-title>
          , ICFCA 2013, Dresden, Germany, May 21-24,
          <year>2013</year>
          . Proceedings, Peggy Cellier, Felix Distel, and Bernhard Ganter, Eds., Berlin, Heidelberg,
          <year>2013</year>
          , pp.
          <fpage>254</fpage>
          -
          <lpage>266</lpage>
          , Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Sergei O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          , “
          <article-title>Scalable Knowledge Discovery in Complex Data with Pattern Structures,” in PReMI</article-title>
          , Pradipta Maji, Ashish Ghosh,
          <string-name>
            <given-names>M. Narasimha</given-names>
            <surname>Murty</surname>
          </string-name>
          , Kuntal Ghosh, and Sankar K. Pal, Eds.
          <year>2013</year>
          , vol.
          <volume>8251</volume>
          of Lecture Notes in Computer Science, pp.
          <fpage>30</fpage>
          -
          <lpage>39</lpage>
          , Springer.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Yury</given-names>
            <surname>Kashnitsky</surname>
          </string-name>
          and
          <string-name>
            <surname>Sergei O. Kuznetsov</surname>
          </string-name>
          , “
          <article-title>Lazy associative graph classification</article-title>
          ,”
          <source>in Proceedings of the 4th International Workshop “What can FCA do for Artificial Intelligence?” (FCA4AI 2015), co-located with the International Joint Conference on Artificial Intelligence (IJCAI 2015), Buenos Aires, Argentina, July 25</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. David W. Aha, Ed.,
          <source>Lazy Learning</source>
          , Kluwer Academic Publishers, Norwell, MA, USA,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Jerome H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          , “
          <article-title>Lazy decision trees</article-title>
          ,
          <source>” in Proceedings of the Thirteenth National Conference on Artificial Intelligence -</source>
          Volume
          <volume>1</volume>
          .
          <year>1996</year>
          , AAAI'
          <fpage>96</fpage>
          , pp.
          <fpage>717</fpage>
          -
          <lpage>724</lpage>
          , AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Adriano</given-names>
            <surname>Veloso</surname>
          </string-name>
          , Wagner Meira Jr., and
          <string-name>
            <given-names>Mohammed J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          , “Lazy Associative Classification,”
          <source>in Proceedings of the Sixth International Conference on Data Mining</source>
          , Washington, DC, USA,
          <year>2006</year>
          , ICDM '06, pp.
          <fpage>645</fpage>
          -
          <lpage>654</lpage>
          , IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Radim</given-names>
            <surname>Belohlavek</surname>
          </string-name>
          , Bernard De Baets, Jan Outrata, and Vilem Vychodil, “
          <article-title>Inducing decision trees via concept lattices</article-title>
          ,”
          <source>International Journal of General Systems</source>
          , vol.
          <volume>38</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>455</fpage>
          -
          <lpage>467</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Sergei O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          , “
          <article-title>A fast algorithm for computing all intersections of objects from an arbitrary semilattice</article-title>
          ,”
          <source>Nauchno-Tekhnicheskaya Informatsiya, Seriya 2: Informatsionnye Protsessy i Sistemy</source>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. S. Andrews, “
          <article-title>In-close, a fast algorithm for computing formal concepts</article-title>
          ,
          <source>” in CEUR Workshop Proceedings</source>
          ,
          <year>2009</year>
          , vol.
          <volume>483</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>J. Ross</given-names>
            <surname>Quinlan</surname>
          </string-name>
          ,
          <source>C4.5: Programs for Machine Learning</source>
          , Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Sergei O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mikhail V.</given-names>
            <surname>Samokhin</surname>
          </string-name>
          , “
          <article-title>Learning Closed Sets of Labeled Graphs for Chemical Applications,” in ILP, Stefan Kramer</article-title>
          and Bernhard Pfahringer, Eds.
          <year>2005</year>
          , vol.
          <volume>3625</volume>
          of Lecture Notes in Computer Science, pp.
          <fpage>190</fpage>
          -
          <lpage>208</lpage>
          , Springer.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>