<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Hybrid Classification Approach based on FCA and Emerging Patterns - An application for the classification of biological inhibitors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yasmine Asses</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleksey Buzmakov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Bourquard</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergei O. Kuznetsov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amedeo Napoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LORIA (CNRS - Inria NGE - U. de Lorraine)</institution>
          ,
          <addr-line>Vandoeuvre les Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>211</fpage>
      <lpage>222</lpage>
      <abstract>
        <p>Classification is an important task in data analysis and learning. Classification can be performed using supervised or unsupervised methods. From the unsupervised point of view, Formal Concept Analysis (FCA) can be used for such a task in an efficient and well-founded way. From the supervised point of view, emerging patterns rely on pattern mining and can be used to characterize classes of objects w.r.t. a priori labels. In this paper, we present a hybrid classification method which is based both on supervised and unsupervised aspects. This method relies on FCA for building a concept lattice and then detects the concepts whose extents determines classes of objects sharing the same labels. These classes can then be used as reference classes for classifying unknown objects. This hybrid approach has been used in an experiment in chemistry for classifying inhibitors of the c-Met protein which plays an important role in protein interactions and in the development of cancer.</p>
      </abstract>
      <kwd-group>
        <kwd>Formal Concept Analysis</kwd>
        <kwd>supervised classification</kwd>
        <kwd>unsupervised classification</kwd>
        <kwd>emerging patterns</kwd>
        <kwd>pattern mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In this paper, we present a classification approach based on a combination of
knowledge discovery methods which are all interconnected. This approach has
to guide two processes, classification and prediction, for analyzing the c-Met
receptor protein, a molecule showing an abnormally elevated expression in cancer
disease [1]. Activation of this receptor can be inhibited by different biochemical
compounds (inhibitors). We collected a group of 100 molecules (“complete set
of inhibitors”) which are known to be c-Met inhibitors. Inhibitors act on c-Met
through a “binding pocket” and an associated “binding mode”. We know the
binding modes for 30 inhibitors of the dataset (so called “training set”).
According to the spatial regions involved in the binding pocket, three main binding
modes have been determined: “Type-1”, “DFG-out”, and “C-Helix-out” (the
names are given w.r.t. spatial configuration of proteins). The “Type-1” binding
mode is very mixed, probably meaning that it should be divided into more
specialized modes. Chemists are working on the definition of a fourth binding mode,
close to “Type-1” and termed as “Type-1bis”.
To ensure the best and adapted inhibition, it is important to know the binding
mode of an inhibitor, and this can only be done through chemical experiments,
which are long and expensive. Thus, two main questions arise here:
– Is it possible to classify the complete inhibitor set of 100 molecules according
to the functionality (based on functional groups) and particular
substructures detected in the 30 molecules of the training set?
– Is it possible then to predict the binding mode or “class” of the 70 inhibitors
based on the classification of the complete inhibitors set?</p>
      <p>
        For answering the two questions, we introduce a combined
classification/prediction process involving supervised and unsupervised classification within the
framework of FCA, graph mining and the so-called “Jumping Emerging
Patterns” (JEPs). More precisely, we first want to classify a set of molecules (of
the training set) according to their structure and their functionality (the
functionality determines the behavior of a molecule during reaction and is linked
to special substructures called functional groups). For analyzing the structures
of the molecules in the training set, we consider molecules as graphs and apply
graph mining techniques [2, 3] to extract frequent substructures. Then, these
substructures are used as attributes in a formal context where objects are molecules
of the training set. This formal context is “augmented” in the sense that each
molecule in the training set has a “type” or a “class” according to its binding
mode. A concept lattice is built from the formal concept. Moreover, the class
information is used for characterizing the concepts whose extents include
objects of a single class or binding mode. The intents of these particular concepts
are JEPs. Closed JEPs have already been studied in the framework of FCA
(see [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">4–6</xref>
        ], where they are called JSM-hypotheses). The set of all JEPs forms a
“disjunctive version space” which was related to FCA in [
        <xref ref-type="bibr" rid="ref4">7</xref>
        ].
      </p>
      <p>The last step involves a “hierarchical agglomerative clustering” process. Based
on the knowledge of JEPs and of functional groups, inhibitors are represented as
vectors where components are filled with functional groups and JEPs (55
components where 42 are functional groups and 13 are JEPs). The cosine similarity
is used for building a dendrogram which is used for explaining the “proximity”
of some inhibitors and for predicting the binding mode of inhibitors for which
this information is still unknown.</p>
      <p>This classification process which calls for a variety of knowledge discovery
methods is totally original and is designed for solving a real-world problem. Here,
an original combination of supervised and unsupervised classification works in
relation with graph mining and clustering. This shows also the flexibility of
the FCA process to be combined with other classification methods for giving
actual and substantial results. Experiments are still running but preliminary
results have been reached and show that the approach should be continued and
improved.</p>
      <p>The paper is organized as follows. In Section 2 a motivating example is
introduced. Then Section 3 describes the classification flow. Section 4 introduces the
main definitions on FCA, JEPs and how they are extracted. Section 5 details
(a) Formal Context (b) Molecule Binding Modes
the preparation of the molecular data that are processed with FCA. The
clustering method and its application are following. The main results are discussed
in Section 7 before the conclusion of the paper.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Running Example</title>
      <p>
        Formal Concept Analysis (FCA) is briefly introduced hereafter. FCA is based
on a formal context which is a triple (G, M, I), where G is a set of objects, M
is a set of attributes and I ⊆ G × M is a relation between G and M [
        <xref ref-type="bibr" rid="ref5">8</xref>
        ].
      </p>
      <p>A running example is shown in Table 1. Molecules are objects which are
described by substructures, corresponding to attributes. The selection of these
particular substructures is discussed later.</p>
      <p>Concept A Set of Molecules (Extent) A Set of Substructures (Intent)
H, CAD, OH, P, AAE, F, O=</p>
      <p>H, P, AAE, F
H, CAD, P, F, O=
H, CAD, AAE, F, O=</p>
      <p>CAD, OH, O=
H, CAD, F, O=</p>
      <p>H, P, F
H, AAE, F
CAD, O=</p>
      <p>H, F</p>
      <p>For every set of molecules A it is possible to find the maximal set of
substructures B, included into every molecule from A. This operation is denoted as (·)′
with B = A′. For example, molecules 319 (BMS WO/2005/117867 24) and ZZY
(UCB Celltech azaindole) include the following substructures: H (Halogen),
c-Met inhibitors
Mining Frequent
Substructures</p>
      <p>Searching
Functional Groups
Description of
the Target Set
Molecules in terms
of Substructures</p>
      <p>Description of
the Training
Set Molecules in
terms of
Functional Groups</p>
      <p>Description of
the Target Set
Molecules in
terms of
Functional Groups</p>
      <p>Description of
the Training Set
Molecules in terms
of Substructures</p>
      <p>FCA
Concept Lattice</p>
      <p>Mining JEPs
A Set of
Substructures Characterizing
the Binding Modes</p>
      <p>Clustering of
the Training Set</p>
      <p>Prediction based on Clustering
domain knowledge and to consider a molecule as a set of functional groups that
are involved into interactions. But some other substructures are also involved
into interactions, which are detected as follows:
1. Molecules from a dataset are considered as graphs, where vertexes correspond
to atoms and edges to bonds between atoms.
2. A graph miming method is used to find all frequent subgraphs, i.e. subgraphs
that belong to a significant part of molecules in the dataset.
3. A formal context is built in the following way:
– Molecules are considered as objects.
– Extracted substructures are considered as attributes.
– A molecule m and a substructure s are related iff the molecule m includes
s as a substructure.
4. JEPs (the sets of attributes that characterize only objects of the same class)
are extracted from the formal context.</p>
      <p>In the supervised classification task, the extracted substructures are used
with functional groups to cluster molecules and to predict the binding mode of
molecules in the test set.</p>
    </sec>
    <sec id="sec-3">
      <title>Jumping Emerging Pattens (JEPs)</title>
      <p>
        JEPs were introduced as a means for classification in itemset mining [
        <xref ref-type="bibr" rid="ref10 ref9">12, 13</xref>
        ], but
the underlying idea had appeared and had been studied much earlier, e.g., within
the framework of disjunctive version spaces [
        <xref ref-type="bibr" rid="ref11 ref12">14, 15</xref>
        ] or JSM-hypotheses. Consider
an “augmented context”, i.e. a context (G, M, I) taken with an additional “class
attribute” giving “class information”, i.e. the class of each object in G. For a
concept (A, B) the set of attributes B is a JEP if every object in A is of the
same class. In Table 1, the set of attributes {F, O=} is a JEP because objects
319 and 320 including these attributes are of the same class “DFG-out”.
      </p>
      <p>Since a JEP characterizes a class of objects, it can be used for analyzing this
class and for guiding a clustering method. Usually, the set of attributes associated
with a single object is trivially a JEP, but there are especially interesting JEPs
characterizing a class of objects. The set of JEPs can be partially ordered w.r.t.
the subset relation: if there are two JEPs J1 and J2 such that J1 is a subset
of J2, then J1 is more general, since it describes all the objects described by
J2 and some other objects. For example, the JEP J1 ={H, CAD, F, O=} is more
general than the JEP J2 ={H, CAD, P, F, O=} since J2 describes object 320while
J1 describes objects 320and 319.</p>
      <p>Relying on the JEP definition, the intent of a formal concept is a JEP if
all objects in the concept extent are in the same class. Thus it is possible to
compute the set of concepts for a given context and then to extract the JEPs by
checking the class of objects in the concept extents. Moreover, the most general
JEPs can be selected for further analysis and for clustering.
5</p>
    </sec>
    <sec id="sec-4">
      <title>Graph Mining</title>
      <p>A molecule is a complex structure composed of atoms connected by bonds, that
can be considered as a graph. Vertexes of the molecule graph correspond to
the atoms of the molecule and are labeled with atom names. The edges of the
molecule graph are labeled with types of bonds between the corresponding atoms.
For applying FCA and for finding a set of JEPs, a molecular graph can be
described as as a set of subgraphs. Then, a formal context can be built with G
as a set of molecules, M as a set of subgraphs or substructures and I the relation
meaning that a molecule g has a substructure m. The problem now is to find
“valid” and “interesting” substructures.</p>
      <p>One way to select valid and interesting substructures is to search for frequent
subgraphs –that often appear in molecular graphs– using graph mining. For a
set of graphs G and a frequency threshold Fmin, a graph s is frequent iff s is a
subgraph of at least Fmin graphs from G, i.e. |{g ∈ G | s ⊆ g}| ≥ Fmin.</p>
      <p>For example, considering the set of molecular graphs G in Figure 5 and
Fmin = 3, the subgraphs “N-H” and “O=C” are frequent as they occur in all
molecular graphs while subgraph “C-OH” only occurring in graph (b) (Figure 5b)
and subgraph “F-C” only occurring in graph (c) (Figure 5c) are not frequent.</p>
      <p>For discovering frequent subgraphs, different graph mining algorithms may
be applied [2, 3]. Here we used gSpan and set Fmin = 10 for the dataset of 100
(a) Imatinib
or Gleevec R
(b) K-252a</p>
      <p>(c) CKK
molecular graphs. This frequency threshold is sufficiently low to have a set of
specific subgraphs characterizing every molecule, and it is sufficiently high to
obtain feasible processing time.</p>
      <p>The set of mined subgraphs can be divided into groups, where a group
consist of a set of subgraphs appearing in the same set of molecular graphs. Thus,
the group forms an equivalence class and can be represented by only one
subgraph. Furthermore, the largest subgraphs preserve the sufficient information on
substructures related to binding modes. In the present experiment, around 106
frequent subgraphs were extracted, then divided into 104 groups.</p>
      <p>It can be noticed that if there are two frequent subgraphs g1 and g2 such
that g1 ⊆ g2 then every closed JEP containing the subgraph g2 contains the
subgraph g1. Thus, if a JEP contains g2, there is no need to consider g1.
6</p>
      <p>
        Hierarchical Agglomerative Clustering (HAC)
Here we describe a hierarchical agglomerative clustering process (see [
        <xref ref-type="bibr" rid="ref13">16</xref>
        ]) based
on the extracted JEPs and background knowledge on functional groups. Molecules
are described by vectors having 55 components, including 42 functional groups3
and 13 JEPs. The 13 JEPs are selected as the most representative for the
molecules in the training set. Each attribute of the vector therefore corresponds
either to a chemical functional group or to a substructures of a JEP with value
set to 1 when this chemical function/substructure is present and null
otherwise. The choice of a proper similarity is crucial for ensuring the quality of the
clustering. Here, the cosine similarity was chosen according to the results of
several specialized studies [
        <xref ref-type="bibr" rid="ref14 ref15">17, 18</xref>
        ]. If m1 and m2 are the description vectors of two
molecules, then ((m1, m2) denotes the scalar product of two vectors):
3 The functional groups were extracted thanks to the specialized algorithm
“Checkmol” http://merian.pch.univie.ac.at/ nhaider/cheminf/cmmm.html.
      </p>
      <p>The “centroid” of a cluster of molecules C, denoted by centr(C), is
calculated as follows:
simcos(m1, m2) =
(m1, m2)
|m1| · |m2|
centr(C) =
1 X mi
|C|</p>
      <p>mi∈C
(1)
(2)
(3)
(4)
(5)</p>
      <p>Similarity between two clusters or between a molecule and a cluster is
calculated with the same formula (1) by substituting the cluster C with its centroid
centr(C).</p>
      <p>The HAC clustering is a bottom-up process working as follows. For every
molecule a unique cluster is created. Actually, all these clusters will be
progressively merged until only one unique cluster remains. Considering at some step
the set of remaining clusters C = {C1, C2, .., Ck}, a new cluster Ck+1 is created
by merging the two clusters Ci and Cj maximizing the similarity measure
between them. The new cluster is added to the set of clusters while Ci and Cj are
deleted from C. Finally, the process stops when only one cluster remains, |C| = 1.
(Ci, Cj ) =</p>
      <p>Ck+1 :=</p>
      <p>C :=</p>
      <p>argmax
Ci,Cj∈C,Ci6=Cj
simcos(centr(Ci), centr(Cj ))</p>
      <p>Ci ∪ Cj</p>
      <p>C ∪ {Ck+1} \ {Ci, Cj }</p>
      <p>The result of HAC is shown on a dendrogram (see Figures 3 and 6). Each
“vertex” of the dendrogram corresponds to a merging step of the algorithm.
The number attached to the vertex represents the similarity between the two
clusters at the lower level. The correlation between chemical similarities and
binding modes is discussed below.
7</p>
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <p>After applying graph mining on the set of molecules, a formal context including
30 objects (molecules) and 104 attributes (substructures) was built. The
cardinalitiy of the sets of most general JEPs for the different binding modes are
distributed as follows:
– 35 JEPs for Type-1 binding mode;
– 1 JEP for DFG-out binding mode;
– 1 JEP for C-Helix-out binding mode;
– 3 JEPs for Type-1bis binding mode.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Formalizing hypotheses with concepts</article-title>
          .
          <source>In Ganter</source>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Mineau</surname>
          </string-name>
          , G., eds.: Conceptual Structures: Logical, Linguistic, and
          <string-name>
            <given-names>Computational</given-names>
            <surname>Issues</surname>
          </string-name>
          .
          <source>Volume 1867 of Lecture Notes in Computer Science</source>
          . Springer Berlin / Heidelberg (
          <year>2000</year>
          )
          <fpage>342</fpage>
          -
          <lpage>356</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          5.
          <string-name>
            <surname>Blinova</surname>
            ,
            <given-names>V.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobrynin</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finn</surname>
            ,
            <given-names>V.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pankratova</surname>
            ,
            <given-names>E.S.</given-names>
          </string-name>
          :
          <article-title>Toxicology analysis by means of the JSM-method</article-title>
          .
          <source>Bioinformatics</source>
          <volume>19</volume>
          (
          <issue>10</issue>
          ) (
          <year>2003</year>
          )
          <fpage>1201</fpage>
          -
          <lpage>1207</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samokhin</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          :
          <article-title>Learning closed sets of labeled graphs for chemical applications</article-title>
          .
          <source>In: Proceedings of ILP</source>
          . (
          <year>2005</year>
          )
          <fpage>190</fpage>
          -
          <lpage>208</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Hypotheses and version spaces</article-title>
          . In Ganter, B.,
          <string-name>
            <surname>de Moor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lex</surname>
          </string-name>
          , W., eds.:
          <article-title>Conceptual Structures for Knowledge Creation and Communication</article-title>
          . Volume
          <volume>2746</volume>
          of Lecture Notes in Computer Science. Springer Berlin / Heidelberg (
          <year>2003</year>
          )
          <fpage>83</fpage>
          -
          <lpage>95</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal Concept Analysis: Mathematical Foundations. 1st edn</source>
          . Springer-Verlag New York, Inc., Secaucus, NJ, USA (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Two basic algorithms in concept analysis</article-title>
          . In Kwuida, L.,
          <string-name>
            <surname>Sertkaya</surname>
          </string-name>
          , B., eds.:
          <source>Formal Concept Analysis. Volume 5986 of Lecture Notes in Computer Science</source>
          . Springer Berlin / Heidelberg (
          <year>1984</year>
          )
          <fpage>312</fpage>
          -
          <lpage>340</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.:</given-names>
          </string-name>
          <article-title>A fast algorithm for computing all intersections of objects in a finite semi-lattice</article-title>
          .
          <source>Automatic documentation and Mathematical linguistics</source>
          <volume>27</volume>
          (
          <issue>5</issue>
          ) (
          <year>1993</year>
          )
          <fpage>11</fpage>
          -
          <lpage>21</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          11.
          <string-name>
            <surname>Merwe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kourie</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>AddIntent: a new incremental algorithm for constructing concept lattices</article-title>
          . In Goos, G.,
          <string-name>
            <surname>Hartmanis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leeuwen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eklund</surname>
          </string-name>
          , P., eds.: Concept Lattices. Volume
          <volume>2961</volume>
          . Springer Berlin / Heidelberg (
          <year>2004</year>
          )
          <fpage>372</fpage>
          -
          <lpage>385</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          12.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient mining of emerging patterns: discovering trends and differences</article-title>
          .
          <source>In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '99</source>
          , New York, ACM (
          <year>1999</year>
          )
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          13.
          <string-name>
            <surname>Poezevara</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuissart</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Cr´emilleux, B.:
          <article-title>Extracting and summarizing the frequent emerging graph patterns from a dataset of graphs</article-title>
          .
          <source>Journal of Intelligent Information Systems 37 (July</source>
          <year>2011</year>
          )
          <fpage>333</fpage>
          -
          <lpage>353</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Delaying the choice of bias: A disjunctive version space approach</article-title>
          .
          <source>In: Proceedings of the 13 th International Conference on Machine Learning</source>
          , Morgan Kaufmann (
          <year>1996</year>
          )
          <fpage>444</fpage>
          -
          <lpage>452</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          15.
          <string-name>
            <surname>Nikolaev</surname>
            ,
            <given-names>N.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smirnov</surname>
            ,
            <given-names>E.N.</given-names>
          </string-name>
          :
          <article-title>Stochastically guided disjunctive version space learning</article-title>
          .
          <source>In: Proceedings of the 12th European Conference on Artificial Intelligence</source>
          , John Wiley &amp; Sons, Ltd. (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          16.
          <string-name>
            <surname>Murtagh</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A survey of recent advances in hierarchical clustering algorithms</article-title>
          .
          <source>The Computer Journal</source>
          <volume>26</volume>
          (
          <issue>4</issue>
          ) (
          <year>November 1983</year>
          )
          <fpage>354</fpage>
          -
          <lpage>359</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          17.
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sural</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pramanik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Similarity between euclidean and cosine angle distance for nearest neighbor queries</article-title>
          .
          <source>In: Proceedings of 2004 ACM Symposium on Applied Computing</source>
          , ACM Press (
          <year>2004</year>
          )
          <fpage>1232</fpage>
          -
          <lpage>1237</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yamagishi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martins</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neshich</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beautrait</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maigret</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A fast surface-matching procedure for protein-ligand docking</article-title>
          .
          <source>Journal of Molecular Modeling</source>
          <volume>12</volume>
          (
          <issue>6</issue>
          ) (
          <year>2006</year>
          )
          <fpage>965</fpage>
          -
          <lpage>972</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          19.
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Assaghir</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Embedding tolerance relations in formal concept analysis: an application in information fusion</article-title>
          .
          <source>In: Proceedings of the 19th ACM international conference on Information and knowledge management</source>
          .
          <source>CIKM '10</source>
          , New York, NY, USA, ACM (
          <year>2010</year>
          )
          <fpage>1689</fpage>
          -
          <lpage>1692</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>