A Hybrid Classification Approach based on FCA
  and Emerging Patterns - An application for the
       classification of biological inhibitors

          Yasmine Asses1 , Aleksey Buzmakov1,2 , Thomas Bourquard1 , Sergei O.
                            Kuznetsov2 , and Amedeo Napoli1
      1
              LORIA (CNRS - Inria NGE - U. de Lorraine), Vandoeuvre les Nancy, France
          2
               National Research University Higher School of Economics, Moscow, Russia


               Abstract. Classiﬁcation is an important task in data analysis and learn-
               ing. Classiﬁcation can be performed using supervised or unsupervised
               methods. From the unsupervised point of view, Formal Concept Analysis
               (FCA) can be used for such a task in an eﬃcient and well-founded way.
               From the supervised point of view, emerging patterns rely on pattern
               mining and can be used to characterize classes of objects w.r.t. a priori
               labels. In this paper, we present a hybrid classiﬁcation method which is
               based both on supervised and unsupervised aspects. This method relies
               on FCA for building a concept lattice and then detects the concepts
               whose extents determines classes of objects sharing the same labels.
               These classes can then be used as reference classes for classifying un-
               known objects. This hybrid approach has been used in an experiment in
               chemistry for classifying inhibitors of the c-Met protein which plays an
               important role in protein interactions and in the development of cancer.

               Keywords: Formal Concept Analysis, supervised classiﬁcation, unsu-
               pervised classiﬁcation, emerging patterns, pattern mining


  1       Introduction
  In this paper, we present a classiﬁcation approach based on a combination of
  knowledge discovery methods which are all interconnected. This approach has
  to guide two processes, classiﬁcation and prediction, for analyzing the c-Met re-
  ceptor protein, a molecule showing an abnormally elevated expression in cancer
  disease [1]. Activation of this receptor can be inhibited by diﬀerent biochemical
  compounds (inhibitors). We collected a group of 100 molecules (“complete set
  of inhibitors”) which are known to be c-Met inhibitors. Inhibitors act on c-Met
  through a “binding pocket” and an associated “binding mode”. We know the
  binding modes for 30 inhibitors of the dataset (so called “training set”). Accord-
  ing to the spatial regions involved in the binding pocket, three main binding
  modes have been determined: “Type-1”, “DFG-out”, and “C-Helix-out” (the
  names are given w.r.t. spatial conﬁguration of proteins). The “Type-1” binding
  mode is very mixed, probably meaning that it should be divided into more spe-
  cialized modes. Chemists are working on the deﬁnition of a fourth binding mode,
  close to “Type-1” and termed as “Type-1bis”.

c 2012 by the paper authors. CLA 2012, pp. 211–222. Copying permitted only for
  private and academic purposes. Volume published and copyrighted by its editors.
  Local Proceedings in ISBN 978–84–695–5252–0,
  Universidad de Málaga (Dept. Matemática Aplicada), Spain.
212     Y. Asses, A. Buzmakov, T. Bourquard, S. O. Kuznetsov and A. Napoli


   To ensure the best and adapted inhibition, it is important to know the binding
mode of an inhibitor, and this can only be done through chemical experiments,
which are long and expensive. Thus, two main questions arise here:

 – Is it possible to classify the complete inhibitor set of 100 molecules according
   to the functionality (based on functional groups) and particular substruc-
   tures detected in the 30 molecules of the training set?
 – Is it possible then to predict the binding mode or “class” of the 70 inhibitors
   based on the classiﬁcation of the complete inhibitors set?

    For answering the two questions, we introduce a combined classiﬁcation/pre-
diction process involving supervised and unsupervised classiﬁcation within the
framework of FCA, graph mining and the so-called “Jumping Emerging Pat-
terns” (JEPs). More precisely, we ﬁrst want to classify a set of molecules (of
the training set) according to their structure and their functionality (the func-
tionality determines the behavior of a molecule during reaction and is linked
to special substructures called functional groups). For analyzing the structures
of the molecules in the training set, we consider molecules as graphs and apply
graph mining techniques [2, 3] to extract frequent substructures. Then, these sub-
structures are used as attributes in a formal context where objects are molecules
of the training set. This formal context is “augmented” in the sense that each
molecule in the training set has a “type” or a “class” according to its binding
mode. A concept lattice is built from the formal concept. Moreover, the class
information is used for characterizing the concepts whose extents include ob-
jects of a single class or binding mode. The intents of these particular concepts
are JEPs. Closed JEPs have already been studied in the framework of FCA
(see [4–6], where they are called JSM-hypotheses). The set of all JEPs forms a
“disjunctive version space” which was related to FCA in [7].
    The last step involves a “hierarchical agglomerative clustering” process. Based
on the knowledge of JEPs and of functional groups, inhibitors are represented as
vectors where components are ﬁlled with functional groups and JEPs (55 com-
ponents where 42 are functional groups and 13 are JEPs). The cosine similarity
is used for building a dendrogram which is used for explaining the “proximity”
of some inhibitors and for predicting the binding mode of inhibitors for which
this information is still unknown.
    This classiﬁcation process which calls for a variety of knowledge discovery
methods is totally original and is designed for solving a real-world problem. Here,
an original combination of supervised and unsupervised classiﬁcation works in
relation with graph mining and clustering. This shows also the ﬂexibility of
the FCA process to be combined with other classiﬁcation methods for giving
actual and substantial results. Experiments are still running but preliminary
results have been reached and show that the approach should be continued and
improved.
    The paper is organized as follows. In Section 2 a motivating example is intro-
duced. Then Section 3 describes the classiﬁcation ﬂow. Section 4 introduces the
main deﬁnitions on FCA, JEPs and how they are extracted. Section 5 details
    A Hybrid Classiﬁcation Approach based on FCA and Emerging Patterns        213


                      CAD

                      AAE
                      OH


                      O=
                                         Molecule Binding Mode


                      H

                      P
                      F
                  319 x x  xxx           319           DFG-out
                  320 x x x x x          320           DFG-out
                  L5G x x     x          L5G            Type-1
                  ZZY x   xxx            ZZY            Type-1
                 (a) Formal Context    (b) Molecule Binding Modes

Table 1: Running Example. In 1a, objects (the rows) are molecules; attributes
(the columns) are functional groups. A cross in the cell (i, j) means that the
molecule i includes the functional group j as a substructure. In 1b the last
column designates the ”class” of an object, i.e. the binding mode of the molecule.


the preparation of the molecular data that are processed with FCA. The clus-
tering method and its application are following. The main results are discussed
in Section 7 before the conclusion of the paper.


2   Running Example

Formal Concept Analysis (FCA) is brieﬂy introduced hereafter. FCA is based
on a formal context which is a triple (G, M, I), where G is a set of objects, M
is a set of attributes and I ⊆ G × M is a relation between G and M [8].
    A running example is shown in Table 1. Molecules are objects which are
described by substructures, corresponding to attributes. The selection of these
particular substructures is discussed later.
        Concept A Set of Molecules (Extent) A Set of Substructures (Intent)
          C0                                   H, CAD, OH, P, AAE, F, O=
          C1                ZZY                        H, P, AAE, F
          C2                319                     H, CAD, P, F, O=
          C3                320                    H, CAD, AAE, F, O=
          C4                L5G                        CAD, OH, O=
          C5             319, 320                     H, CAD, F, O=
          C6             320, ZZY                         H, P, F
          C7             319, ZZY                       H, AAE, F
          C8           319, 320, L5G                     CAD, O=
          C9           319, 320, ZZY                       H, F
          C10       319, 320, L5G, ZZY
          Table 2: A set of formal concepts w.r.t context on Table 1a.

   For every set of molecules A it is possible to ﬁnd the maximal set of substruc-
tures B, included into every molecule from A. This operation is denoted as (·)′
with B = A′ . For example, molecules 319 (BMS WO/2005/117867 24) and ZZY
(UCB Celltech azaindole) include the following substructures: H (Halogen),
216     Y. Asses, A. Buzmakov, T. Bourquard, S. O. Kuznetsov and A. Napoli

                                            c-Met inhibitors

                         Mining Frequent                        Searching Func-
                          Substructures                          tional Groups


                                                        Description of         Description of
             Description of        Description of
                                                        the Training          the Target Set
            the Target Set       the Training Set
                                                       Set Molecules in         Molecules in
          Molecules in terms    Molecules in terms
                                                       terms of Func-         terms of Func-
          of Substructures      of Substructures
                                                       tional Groups          tional Groups

                                      FCA


                                 Concept Lattice


                                  Mining JEPs


                                A Set of Substruc-
                               tures Characterizing
                                the Binding Modes


                                              Clustering of
                                            the Training Set


                                     Prediction based on Clustering


                      Fig. 4: Diagram of the Classiﬁcation Flow.


domain knowledge and to consider a molecule as a set of functional groups that
are involved into interactions. But some other substructures are also involved
into interactions, which are detected as follows:

1. Molecules from a dataset are considered as graphs, where vertexes correspond
   to atoms and edges to bonds between atoms.
2. A graph miming method is used to ﬁnd all frequent subgraphs, i.e. subgraphs
   that belong to a signiﬁcant part of molecules in the dataset.
3. A formal context is built in the following way:
      – Molecules are considered as objects.
      – Extracted substructures are considered as attributes.
      – A molecule m and a substructure s are related iﬀ the molecule m includes
        s as a substructure.
4. JEPs (the sets of attributes that characterize only objects of the same class)
   are extracted from the formal context.

   In the supervised classiﬁcation task, the extracted substructures are used
with functional groups to cluster molecules and to predict the binding mode of
molecules in the test set.
    A Hybrid Classiﬁcation Approach based on FCA and Emerging Patterns          217


4   Jumping Emerging Pattens (JEPs)
JEPs were introduced as a means for classiﬁcation in itemset mining [12, 13], but
the underlying idea had appeared and had been studied much earlier, e.g., within
the framework of disjunctive version spaces [14, 15] or JSM-hypotheses. Consider
an “augmented context”, i.e. a context (G, M, I) taken with an additional “class
attribute” giving “class information”, i.e. the class of each object in G. For a
concept (A, B) the set of attributes B is a JEP if every object in A is of the
same class. In Table 1, the set of attributes {F, O=} is a JEP because objects
319 and 320 including these attributes are of the same class “DFG-out”.
    Since a JEP characterizes a class of objects, it can be used for analyzing this
class and for guiding a clustering method. Usually, the set of attributes associated
with a single object is trivially a JEP, but there are especially interesting JEPs
characterizing a class of objects. The set of JEPs can be partially ordered w.r.t.
the subset relation: if there are two JEPs J1 and J2 such that J1 is a subset
of J2 , then J1 is more general, since it describes all the objects described by
J2 and some other objects. For example, the JEP J1 ={H, CAD, F, O=} is more
general than the JEP J2 ={H, CAD, P, F, O=} since J2 describes object 320while
J1 describes objects 320and 319.
    Relying on the JEP deﬁnition, the intent of a formal concept is a JEP if
all objects in the concept extent are in the same class. Thus it is possible to
compute the set of concepts for a given context and then to extract the JEPs by
checking the class of objects in the concept extents. Moreover, the most general
JEPs can be selected for further analysis and for clustering.

5   Graph Mining
A molecule is a complex structure composed of atoms connected by bonds, that
can be considered as a graph. Vertexes of the molecule graph correspond to
the atoms of the molecule and are labeled with atom names. The edges of the
molecule graph are labeled with types of bonds between the corresponding atoms.
For applying FCA and for ﬁnding a set of JEPs, a molecular graph can be
described as as a set of subgraphs. Then, a formal context can be built with G
as a set of molecules, M as a set of subgraphs or substructures and I the relation
meaning that a molecule g has a substructure m. The problem now is to ﬁnd
“valid” and “interesting” substructures.
    One way to select valid and interesting substructures is to search for frequent
subgraphs –that often appear in molecular graphs– using graph mining. For a
set of graphs G and a frequency threshold Fmin , a graph s is frequent iﬀ s is a
subgraph of at least Fmin graphs from G, i.e. |{g ∈ G | s ⊆ g}| ≥ Fmin .
    For example, considering the set of molecular graphs G in Figure 5 and
Fmin = 3, the subgraphs “N-H” and “O=C” are frequent as they occur in all
molecular graphs while subgraph “C-OH” only occurring in graph (b) (Figure 5b)
and subgraph “F-C” only occurring in graph (c) (Figure 5c) are not frequent.
    For discovering frequent subgraphs, diﬀerent graph mining algorithms may
be applied [2, 3]. Here we used gSpan and set Fmin = 10 for the dataset of 100
218      Y. Asses, A. Buzmakov, T. Bourquard, S. O. Kuznetsov and A. Napoli


       (a) Imatinib               (b) K-252a                (c) CKK
       or Gleevec R

                      Fig. 5: Examples of molecules from database.


molecular graphs. This frequency threshold is suﬃciently low to have a set of
speciﬁc subgraphs characterizing every molecule, and it is suﬃciently high to
obtain feasible processing time.
    The set of mined subgraphs can be divided into groups, where a group con-
sist of a set of subgraphs appearing in the same set of molecular graphs. Thus,
the group forms an equivalence class and can be represented by only one sub-
graph. Furthermore, the largest subgraphs preserve the suﬃcient information on
substructures related to binding modes. In the present experiment, around 106
frequent subgraphs were extracted, then divided into 104 groups.
    It can be noticed that if there are two frequent subgraphs g1 and g2 such
that g1 ⊆ g2 then every closed JEP containing the subgraph g2 contains the
subgraph g1 . Thus, if a JEP contains g2 , there is no need to consider g1 .


6     Hierarchical Agglomerative Clustering (HAC)

Here we describe a hierarchical agglomerative clustering process (see [16]) based
on the extracted JEPs and background knowledge on functional groups. Molecules
are described by vectors having 55 components, including 42 functional groups3
and 13 JEPs. The 13 JEPs are selected as the most representative for the
molecules in the training set. Each attribute of the vector therefore corresponds
either to a chemical functional group or to a substructures of a JEP with value
set to 1 when this chemical function/substructure is present and null other-
wise. The choice of a proper similarity is crucial for ensuring the quality of the
clustering. Here, the cosine similarity was chosen according to the results of sev-
eral specialized studies [17, 18]. If m1 and m2 are the description vectors of two
molecules, then ((m1 , m2 ) denotes the scalar product of two vectors):

3
    The functional groups were extracted thanks to the specialized algorithm “Check-
    mol” http://merian.pch.univie.ac.at/ nhaider/cheminf/cmmm.html.
      A Hybrid Classiﬁcation Approach based on FCA and Emerging Patterns          219


                                                        (m1 , m2 )
                            simcos (m1 , m2 ) =                                   (1)
                                                       |m1 | · |m2 |

    The “centroid” of a cluster of molecules C, denoted by centr(C), is calcu-
lated as follows:

                                                    1 X
                                centr(C) =              mi                        (2)
                                                   |C|
                                                       mi ∈C

    Similarity between two clusters or between a molecule and a cluster is calcu-
lated with the same formula (1) by substituting the cluster C with its centroid
centr(C).
    The HAC clustering is a bottom-up process working as follows. For every
molecule a unique cluster is created. Actually, all these clusters will be progres-
sively merged until only one unique cluster remains. Considering at some step
the set of remaining clusters C = {C1 , C2 , .., Ck }, a new cluster Ck+1 is created
by merging the two clusters Ci and Cj maximizing the similarity measure be-
tween them. The new cluster is added to the set of clusters while Ci and Cj are
deleted from C. Finally, the process stops when only one cluster remains, |C| = 1.


             (Ci , Cj ) =      argmax           simcos (centr(Ci ), centr(Cj ))   (3)
                            Ci ,Cj ∈C,Ci 6=Cj

               Ck+1 :=                              Ci ∪ Cj                       (4)
                   C :=                   C ∪ {Ck+1 } \ {Ci , Cj }                (5)

    The result of HAC is shown on a dendrogram (see Figures 3 and 6). Each
“vertex” of the dendrogram corresponds to a merging step of the algorithm.
The number attached to the vertex represents the similarity between the two
clusters at the lower level. The correlation between chemical similarities and
binding modes is discussed below.


7     Results and Discussion

After applying graph mining on the set of molecules, a formal context including
30 objects (molecules) and 104 attributes (substructures) was built. The car-
dinalitiy of the sets of most general JEPs for the diﬀerent binding modes are
distributed as follows:

    – 35 JEPs for Type-1 binding mode;
    – 1 JEP for DFG-out binding mode;
    – 1 JEP for C-Helix-out binding mode;
    – 3 JEPs for Type-1bis binding mode.
222     Y. Asses, A. Buzmakov, T. Bourquard, S. O. Kuznetsov and A. Napoli


 4. Ganter, B., Kuznetsov, S.: Formalizing hypotheses with concepts. In Ganter, B.,
    Mineau, G., eds.: Conceptual Structures: Logical, Linguistic, and Computational
    Issues. Volume 1867 of Lecture Notes in Computer Science. Springer Berlin /
    Heidelberg (2000) 342–356
 5. Blinova, V.G., Dobrynin, D.A., Finn, V.K., Kuznetsov, S.O., Pankratova, E.S.:
    Toxicology analysis by means of the JSM-method. Bioinformatics 19(10) (2003)
    1201–1207
 6. Kuznetsov, S.O., Samokhin, M.V.: Learning closed sets of labeled graphs for chem-
    ical applications. In: Proceedings of ILP. (2005) 190–208
 7. Ganter, B., Kuznetsov, S.: Hypotheses and version spaces. In Ganter, B., de Moor,
    A., Lex, W., eds.: Conceptual Structures for Knowledge Creation and Communi-
    cation. Volume 2746 of Lecture Notes in Computer Science. Springer Berlin /
    Heidelberg (2003) 83–95
 8. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. 1st
    edn. Springer-Verlag New York, Inc., Secaucus, NJ, USA (1997)
 9. Ganter, B.: Two basic algorithms in concept analysis. In Kwuida, L., Sertkaya,
    B., eds.: Formal Concept Analysis. Volume 5986 of Lecture Notes in Computer
    Science. Springer Berlin / Heidelberg (1984) 312–340
10. Kuznetsov, S.O.: A fast algorithm for computing all intersections of objects in a
    ﬁnite semi-lattice. Automatic documentation and Mathematical linguistics 27(5)
    (1993) 11–21
11. Merwe, D., Obiedkov, S., Kourie, D.: AddIntent: a new incremental algorithm for
    constructing concept lattices. In Goos, G., Hartmanis, J., Leeuwen, J., Eklund, P.,
    eds.: Concept Lattices. Volume 2961. Springer Berlin / Heidelberg (2004) 372–385
12. Dong, G., Li, J.: Eﬃcient mining of emerging patterns: discovering trends and
    diﬀerences. In: Proceedings of the 5th ACM SIGKDD International Conference on
    Knowledge Discovery and Data Mining. KDD ’99, New York, ACM (1999) 43–52
13. Poezevara, G., Cuissart, B., Crémilleux, B.: Extracting and summarizing the fre-
    quent emerging graph patterns from a dataset of graphs. Journal of Intelligent
    Information Systems 37 (July 2011) 333–353
14. Sebag, M.: Delaying the choice of bias: A disjunctive version space approach. In:
    Proceedings of the 13 th International Conference on Machine Learning, Morgan
    Kaufmann (1996) 444–452
15. Nikolaev, N.I., Smirnov, E.N.: Stochastically guided disjunctive version space
    learning. In: Proceedings of the 12th European Conference on Artiﬁcial Intelli-
    gence, John Wiley & Sons, Ltd. (1996)
16. Murtagh, F.: A survey of recent advances in hierarchical clustering algorithms.
    The Computer Journal 26(4) (November 1983) 354–359
17. Qian, G., Sural, S., Gu, Y., Pramanik, S.: Similarity between euclidean and co-
    sine angle distance for nearest neighbor queries. In: Proceedings of 2004 ACM
    Symposium on Applied Computing, ACM Press (2004) 1232–1237
18. Yamagishi, M., Martins, N., Neshich, G., Cai, W., Shao, X., Beautrait, A., Maigret,
    B.: A fast surface-matching procedure for protein–ligand docking. Journal of
    Molecular Modeling 12(6) (2006) 965–972
19. Kaytoue, M., Assaghir, Z., Napoli, A., Kuznetsov, S.O.: Embedding tolerance
    relations in formal concept analysis: an application in information fusion. In: Pro-
    ceedings of the 19th ACM international conference on Information and knowledge
    management. CIKM ’10, New York, NY, USA, ACM (2010) 1689–1692