=Paper=
{{Paper
|id=Vol-1430/paper7
|storemode=property
|title=Lazy Associative Graph Classification
|pdfUrl=https://ceur-ws.org/Vol-1430/paper7.pdf
|volume=Vol-1430
|dblpUrl=https://dblp.org/rec/conf/ijcai/KashnitskyK15
}}
==Lazy Associative Graph Classification==
Yury Kashnitsky and Sergei O. Kuznetsov

National Research University Higher School of Economics, Moscow, Russia
{ykashnitsky, skuznetsov}@hse.ru

Abstract. In this paper, we introduce a modification of lazy associative classification that addresses the graph classification problem. To deal with intersections of large graphs, graph intersections are approximated by the sets of all common subgraphs up to a fixed size, similarly to what is done with graphlet kernels. We illustrate the algorithm with a toy example and describe our experiments with a predictive toxicology dataset.

Keywords: graph classification, graphlets, formal concept analysis, pattern structures, lazy associative classification

===1 Introduction===

Classification methods for data given by graphs usually reduce the initial graphs to a numeric representation and then apply standard classification approaches such as SVM [1] or nearest neighbors with graph kernels [2], graph boosting [3], etc. In doing so, one usually constructs numeric attributes corresponding to subgraphs of the initial graphs, or computes graph kernels, which themselves are usually based on the number of common subgraphs of a special type. In this paper, we suggest an approach based on weak classifiers in the form of association rules [4] applied in a "lazy" way: to avoid exponential explosion, not all association rules are computed, but only those relevant to the objects to be classified. Lazy classification is well studied experimentally [5]; here we extend the approach to graphs and propose a uniform theoretical framework (based on pattern structures [6]) that can be applied to arbitrary kinds of descriptions. We show in a series of experiments with data from the Predictive Toxicology Challenge (PTC [7]) that our approach outperforms learning models based on SVM with the graphlet kernel [8] and kNN with a graphlet-based distance.

The rest of the paper is organized as follows. In Section 2, we give the main definitions concerning labeled graphs, pattern structures, and lazy associative classification. In Section 3, we consider an example. In Section 4, we discuss the results of computational experiments on the PTC dataset. In Section 5, we give the conclusion and discuss directions of further research.

===2 Main definitions===

In this section, we give the definitions of the main concepts used in the paper.

====2.1 Labeled graphs and isomorphism====

First, we recall some standard definitions related to labeled graphs, see e.g. [9,10,11].

An undirected graph is a pair G = (V, E). The set V is referred to as the set of nodes of the graph. The set E of edges consists of unordered pairs {v, u} with v, u ∈ V, together with a set of loops E_0 ⊆ V. If E_0 = ∅, then G is called a graph without loops.

A graph H = (V_H, E_H) is called a subgraph of a graph G = (V_G, E_G) if all nodes and edges of H are at the same time nodes and edges of G, i.e. V_H ⊆ V_G and E_H ⊆ E_G. A graph H = (V_H, E_H) is called an induced subgraph of G = (V_G, E_G) if H is a subgraph of G and the edges of H comprise all edges of G with both endpoints in V_H.
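To make the distinction concrete, here is a tiny illustration of ours (not from the paper; it assumes the Python networkx library):

```python
import networkx as nx

g = nx.Graph([(1, 2), (2, 3), (1, 3)])  # a triangle
h = nx.Graph([(1, 2), (2, 3)])          # a path on the same node set

# h is a subgraph of g: V_H is a subset of V_G and E_H of E_G ...
assert set(h.nodes) <= set(g.nodes)
assert all(g.has_edge(u, v) for u, v in h.edges)

# ... but not an induced subgraph: the induced subgraph of g on
# {1, 2, 3} keeps the edge (1, 3) as well.
induced = g.subgraph(h.nodes)
assert set(map(frozenset, induced.edges)) != set(map(frozenset, h.edges))
```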
Given sets of nodes V, node labels L_V, edges E, and edge labels L_E, a labeled graph is defined by a quadruple G = ((V, lv), (E, le)) such that

- lv ⊆ V × L_V is the relation that associates nodes with labels, i.e., lv is a set of pairs (v_i, l_i) such that node v_i has label l_i;
- le ⊆ V × V × L_E is the relation that associates edges with labels, i.e., le is a set of triples (v_i, v_j, l_ij) such that edge (v_i, v_j) has label l_ij.

Example 1. A molecule structure can be represented by a labeled graph. [Figure: the molecular graph with nodes 1 (NH2), 2 (CH3), 3 (C), 4 (C), 5 (OH), 6 (Cl).] Here V = {1, 2, 3, 4, 5, 6}, E = {(1, 3), (2, 3), (3, 4), (4, 5), (4, 6)}, lv = {(1, NH2), (2, CH3), (3, C), (4, C), (5, OH), (6, Cl)}, le = {(1, 3, 1), (2, 3, 1), (3, 4, 2), (4, 5, 1), (4, 6, 1)}, where edge type 1 corresponds to a single bond (e.g. NH2–C) and edge type 2 to a double bond (e.g. C=C).

A labeled graph G1 = ((V1, lv1), (E1, le1)) dominates a labeled graph G2 = ((V2, lv2), (E2, le2)) with a given order ≤ (e.g. natural, lexicographic) on vertex and edge labels, written G2 ≤ G1 (G2 is a subgraph of G1), if there exists an injection ϕ : V2 → V1 that

- respects edges: (v, w) ∈ E2 ⇒ (ϕ(v), ϕ(w)) ∈ E1;
- fits under labels: lv2(v) ≤ lv1(ϕ(v)), and (v, w) ∈ E2 ⇒ le2(v, w) ≤ le1(ϕ(v), ϕ(w)).

Two labeled graphs G1 and G2 are called isomorphic (G1 ≃ G2) if G1 ≤ G2 and G2 ≤ G1.

Example 2. [Figure: two six-node molecular graphs G1 and G2 over the labels CH3, C, OH, NH2, Cl.] G1 ≃ G2, since the injection ϕ : V2 = {1, 2, 3, 4, 5, 6} → V1 = {1, 2, 3, 4, 5, 6} given by ϕ = (6, 5, 4, 3, 1, 2) satisfies the definitions of graph dominance and isomorphism.

An injective function f : V → V′ is called a subgraph isomorphism from G to G′ if there exists a subgraph S ≤ G′ such that f is a graph isomorphism from G to S, i.e. G ≃ S.

Example 3. [Figure: a five-node molecular graph G1 and a six-node molecular graph G2.] G1 is subgraph-isomorphic to G2.

Given labeled graphs G1 and G2, the set

G1 ⊓ G2 = {G | G ≤ G1, G ≤ G2, and no G* with G* ≤ G1, G* ≤ G2 strictly dominates G}

is called the set of maximal common subgraphs of G1 and G2. We also refer to G1 ⊓ G2 as the intersection of graphs G1 and G2, and to ⊓ as the similarity operator defined on graphs.

Example 4. [Figure: the intersection of two labeled molecular graphs, yielding a set of two maximal common subgraphs.]

For sets of graphs G = {G1, ..., Gk} and H = {H1, ..., Hn}, the similarity operator is defined as

G ⊓ H = MAX_≤ {Gi ⊓ Hj | Gi ∈ G, Hj ∈ H}.

Given sets of labeled graphs G1 and G2, we say that G1 is subsumed by G2, written G1 ⊑ G2, if G1 ⊓ G2 = G1.
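The dominance relation of this subsection can be prototyped with networkx (a sketch of ours, not the paper's code; the helper `molecule` is hypothetical). Two simplifications: networkx's matcher tests the induced variant of subgraph isomorphism, and labels are matched for equality rather than by an order ≤, which suffices for molecular graphs where the label order is trivial.

```python
import networkx as nx
from networkx.algorithms import isomorphism

def molecule(node_labels, edges):
    """Build a labeled graph; edges are triples (u, v, bond), bond 1 or 2."""
    g = nx.Graph()
    for v, lab in node_labels.items():
        g.add_node(v, label=lab)
    for u, v, bond in edges:
        g.add_edge(u, v, bond=bond)
    return g

# The labeled graph of Example 1
g1 = molecule({1: 'NH2', 2: 'CH3', 3: 'C', 4: 'C', 5: 'OH', 6: 'Cl'},
              [(1, 3, 1), (2, 3, 1), (3, 4, 2), (4, 5, 1), (4, 6, 1)])
# A candidate pattern: NH2-C=C
h = molecule({1: 'NH2', 2: 'C', 3: 'C'}, [(1, 2, 1), (2, 3, 2)])

gm = isomorphism.GraphMatcher(
    g1, h,
    node_match=isomorphism.categorical_node_match('label', None),
    edge_match=isomorphism.categorical_edge_match('bond', None))
print(gm.subgraph_is_isomorphic())  # True: h is dominated by g1 (h <= g1)
```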
====2.2 Graphlets====

Definition 1. A labeled graph g is called a k-graphlet of a labeled graph G if g is a connected induced subgraph of G with k nodes [12].

Definition 2. A set of labeled graphs G^k is called a k-graphlet representation of a labeled graph G if every g ∈ G^k is a unique (up to isomorphism) k-graphlet of G, i.e. every g ∈ G^k is a k-graphlet of G, and for distinct g1, g2 ∈ G^k one does not have g1 ≤ g2.

Definition 3. The k-graphlet distribution of a labeled graph G is the set {(g_i, n_i)}, where g_i is a k-graphlet of G and n_i is the number of k-graphlets of G isomorphic to g_i.

Example 5. [Figure: two six-membered carbon rings, G1 bearing a CH3 group and hydrogens, G2 bearing two OH groups and hydrogens.] The sets

G1^3 = {C–C=C, C–C–H, C=C–H, C–C–C},
G2^3 = {C–C=C, C–C–H, C=C–H, C–C–O, C=C–O, C–O–H}

are 3-graphlet representations of graphs G1 and G2, respectively (with the benzene rings comprised of carbon atoms C). The 3-graphlet distributions of G1 and G2 are given in Table 1.

Table 1. 3-graphlet distributions of graphs G1 and G2 (benzene rings are comprised of carbon atoms C).

       C–C=C  C–C–H  C=C–H  C–C–O  C=C–O  C–O–H  C–C–C
  G1       7      8      5      0      0      0      1
  G2       6      4      4      2      2      2      0

Graphlets were introduced in biomedicine and are used to compare real cellular networks with their models. It is easy to demonstrate that two networks are different by simply showing a short list of properties in which they differ. It is much harder to show that two networks are similar, as it requires demonstrating their similarity in all of their exponentially many properties [12].

The graphlet distribution serves as a measure of agreement between the local structures of networks and has been shown to express more structural information than other metrics such as centrality, the local clustering coefficient, the degree distribution, etc. In [12], all 30 combinations of graphlets with 2, 3, 4, and 5 nodes were considered (see https://parasol.tamu.edu/dreu2013/OLeary).
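The k-graphlet distribution of Definition 3 can be computed by brute force for the small k used here (a sketch under our assumptions, not the authors' code; graphs carry 'label' and 'bond' attributes as in the previous sketch). Enumerating all k-node subsets is feasible for molecule-sized graphs; the Weisfeiler-Lehman hash serves as an isomorphism key, which is reliable in practice for such tiny labeled graphs, though a true canonical form would be needed for guarantees.

```python
from collections import Counter
from itertools import combinations
import networkx as nx

def graphlet_distribution(g, k):
    """Connected induced k-node subgraphs of g, counted up to isomorphism."""
    counts, reps = Counter(), {}
    for nodes in combinations(g.nodes, k):
        sub = g.subgraph(nodes)              # induced subgraph on k nodes
        if not nx.is_connected(sub):
            continue                         # not a graphlet
        key = nx.weisfeiler_lehman_graph_hash(
            sub, node_attr='label', edge_attr='bond')
        counts[key] += 1                     # one more copy of this class
        reps.setdefault(key, sub)            # remember a representative
    return [(reps[h], n) for h, n in counts.items()]
```

Applied with k = 3 to the ring graphs of Example 5, this should reproduce the rows of Table 1.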
====2.3 Pattern structures====

Pattern structures are a natural extension of the ideas of Formal Concept Analysis [13], [6].

Definition 4. Let G be a set (of objects), let (D, ⊓) be a meet-semilattice (of all possible object descriptions), and let δ : G → D be a mapping from objects to descriptions. The set δ(G) := {δ(g) | g ∈ G} generates a complete subsemilattice (D_δ, ⊓) of (D, ⊓) if every subset X of δ(G) has an infimum ⊓X in (D, ⊓). A pattern structure is a triple (G, D, δ), where D = (D, ⊓), provided that the set δ(G) := {δ(g) | g ∈ G} generates a complete subsemilattice (D_δ, ⊓) [6,11].

Definition 5. Patterns are elements of D. Patterns are naturally ordered by the subsumption relation ⊑: given c, d ∈ D, one has c ⊑ d ⇔ c ⊓ d = c. The operation ⊓ is also called a similarity operation.

A pattern structure (G, D, δ) gives rise to the following derivation operators (·)◇:

A◇ = ⊓_{g ∈ A} δ(g)   for A ⊆ G,
d◇ = {g ∈ G | d ⊑ δ(g)}   for d ∈ D.

Pairs (A, d) satisfying A ⊆ G, d ∈ D, A◇ = d, and d◇ = A are called pattern concepts of (G, D, δ).

Example 6. Let {1, 2, 3} be a set of objects and {G1, G2, G3} the set of their descriptions (i.e., graph representations). [Figure: three molecular graphs G1, G2, G3 built from the fragments NH2, CH3, C, OH, Cl.] D is the set of all sets of labeled graphs, ⊓ is the graph intersection operator, and D = (D, ⊓). The set of objects (graphs) {1, 2, 3}, their "descriptions" (the graphs themselves) {G1, G2, G3} with δ(i) = Gi, i = 1, ..., 3, and the similarity operator ⊓ comprise a pattern structure ({1, 2, 3}, D, δ).

Here {1, 2, 3}◇ = {NH2–C=C}, because NH2–C=C is the only graph subgraph-isomorphic to all three graphs 1, 2, and 3. Likewise, {NH2–C=C}◇ = {1, 2, 3}, because graphs 1, 2, and 3 all subsume the graph NH2–C=C. Similarly, {1, 2}◇ = {CH3–C=C–NH2}, because CH3–C=C–NH2 is subgraph-isomorphic to graphs 1 and 2 but not to graph 3, and {CH3–C=C–NH2}◇ = {1, 2}, because only graphs 1 and 2 subsume the graph CH3–C=C–NH2, while graph 3 does not.

The set of all pattern concepts of this pattern structure consists of the concepts with extents {1, 2, 3}, {1, 2}, {1, 3}, and {2, 3} (whose intents are the corresponding sets of maximal common subgraphs shown in the original figure), together with (1, {G1}), (2, {G2}), (3, {G3}), and (∅, {G1, G2, G3}).

For some pattern structures (e.g., pattern structures on sets of graphs with labeled nodes), even computing subsumption of patterns may be NP-hard. Hence, for practical situations one needs approximation tools that replace the patterns with simpler ones, even if that results in some loss of information.

To this end, we use a contractive, monotone, and idempotent mapping ψ : D → D (a projection) that replaces each pattern d ∈ D by ψ(d), so that the pattern structure (G, D, δ) is replaced by (G, D, ψ ∘ δ). Under some natural algebraic requirements, which hold in particular for all natural projections of the pattern structures we have studied in applications (see [11]), the meet operation ⊓ is preserved: ψ(X ⊓ Y) = ψ(X) ⊓ ψ(Y). This property of a projection allows one to relate premises in the original representation to those approximated by a projection. In this paper, we utilize projections to introduce graphlet-based classification rules.

====2.4 Lazy associative classification====

Consider a binary classification problem with a set of positive examples G+, a set of negative examples G−, a set of test examples G_test, and a pattern structure (G+ ∪ G−, D, δ) defined on the training set.

Definition 6. A pattern h ∈ D is a positive premise iff [11]

h◇ ∩ G− = ∅ and h◇ ∩ G+ ≠ ∅.

A positive premise is a subset of the least general generalization of descriptions of positive examples that is not contained in (does not cover) any negative example. A negative premise is defined similarly.

Various classification schemes using premises are possible; as an example, consider the following simplest scheme from [6]: if the description δ(g) of an undetermined example g contains a positive premise h, i.e., h ⊑ δ(g), then g is classified positively. Negative classifications are defined similarly. If δ(g) contains premises of both signs, or if δ(g) contains no premise at all, then the classification is contradictory or undetermined, respectively, and some probabilistic techniques allowing for a certain tolerance should be applied.

Definition 7. A class association rule (CAR) [5] for a binary classification problem is an association rule of the form h → {+, −}, where h is a positive or negative premise, respectively.

This means that for a binary graph classification problem, for instance, we can mine class association rules of the form {gi} → {+, −}: if a test graph subsumes a subgraph gi that is common only to positive (respectively, negative) training examples, it is classified as positive (respectively, negative). We elaborate this idea in the next subsection. As there might be many such CARs, we may combine them into a single classification rule; for instance, we can count all positive and negative CARs for each test object and classify it with a majority voting procedure. The idea is easily generalized to the multi-label classification problem. The described classification schemes are explored in [5].

Another advantage of the lazy classification framework is that it is easily parallelized. Suppose there are K processors. To classify an unlabeled object, we divide the training set into K separate subsets and, for each subset, compute the intersections of its labeled objects with the object to be classified. Once all unfalsified intersections are found, we proceed to the classification phase, which involves voting based on those intersections. A sketch of this scheme follows.
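The parallel scheme just described admits a direct sketch (ours, not the paper's code; it anticipates Section 2.5 by modeling a description as a frozenset of graphlet isomorphism keys, so that intersection is set intersection and falsification is set inclusion):

```python
from multiprocessing import Pool

def chunk_votes(args):
    """Unfalsified intersections of one training chunk with a test object."""
    test_descr, chunk, opposite = args
    votes = []
    for train_descr in chunk:
        h = test_descr & train_descr          # intersection of descriptions
        # h is falsified if some opposite-class description subsumes it
        if h and not any(h <= d for d in opposite):
            votes.append(h)
    return votes

def lazy_votes(test_descr, same_class, opposite, n_proc=4):
    """Split the training set into n_proc chunks and collect votes."""
    chunks = [same_class[i::n_proc] for i in range(n_proc)]
    with Pool(n_proc) as pool:
        parts = pool.map(chunk_votes,
                         [(test_descr, c, opposite) for c in chunks])
    return [h for part in parts for h in part]
```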
====2.5 Graphlet-based lazy associative classification====

In this subsection, we combine the ideas of pattern structures and their projections, graphlets, and lazy associative classification, and introduce our algorithm. First, we recall the definition of the k-projection, which produces all subgraphs with at most k nodes.

Definition 8. Given a graph pattern structure (G, D, δ), we call

ψ_k(G) = {Hi = ((Vi, lvi), (Ei, lei)) | Hi ≤ G, Hi is connected, |Vi| ≤ k}

a k-projection, defined for graph descriptions G. This operator is indeed a projection, i.e. a contractive, monotone, and idempotent function.

Definition 9. Given a graph pattern structure (G, D, δ), the k-graphlet derivation operator δ_k = ∪_{1 ≤ l ≤ k} ψ_l ∘ δ takes an object g described by a graph G and produces all l-graphlets of G for l = 1, ..., k.

Example 7. For object 1 with "graph description" G1 from Example 5, δ3(1) is the set of all 1-, 2-, and 3-graphlets of graph 1: δ3(1) = {C, H, C–C, C=C, C–H, C–C=C, C–C–H, C=C–H, C–C–C}. To clarify, here δ(1) = {G1} and δ3(1) = ψ3(δ(1)) = ψ3(G1) = {Hi = ((Vi, lvi), (Ei, lei)) | Hi ≤ G1, Hi is connected, |Vi| ≤ 3}.

Definition 10. Given k-graphlet representations G1^k and G2^k of labeled graphs G1 and G2, the intersection G1^k ⊓_k G2^k is called the k-graphlet intersection of G1 and G2. The operator ⊓_k is further called the k-graphlet similarity operator.

Example 8. For graphs 1 and 2 with "graph descriptions" G1 and G2 from Example 5, G1 ⊓3 G2 = {C, H, C–C, C=C, C–H, C–C=C, C–C–H, C=C–H} is the set of all common 1-, 2-, and 3-graphlets of graphs 1 and 2.

The main steps of our algorithm are as follows (a code sketch is given after the list):

1. All k-graphlet intersections of a test example G_test with the positive training examples are computed: h+ = G_test ⊓_k G+.
2. Each intersection h+ is tested for subsumption by negative training examples. If some negative example subsumes h+, this intersection is falsified. Otherwise, h+ gives a vote for positive classification of the test example G_test.
3. The same procedure is carried out for each intersection of G_test with the negative examples.
4. The test example G_test is classified according to the weighted majority rule, where each unfalsified intersection is given a weight equal to its cardinality (the cardinality of the corresponding set of graphlets).
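Here is a compact end-to-end sketch of the four steps (our reconstruction, not the authors' implementation), reusing `graphlet_distribution` from the sketch in Section 2.2. Representing each graph by the set of isomorphism keys of its graphlets turns the k-graphlet intersection into set intersection and subsumption of graphlet representations into set inclusion:

```python
import networkx as nx

def graphlet_keys(g, k):
    """Isomorphism keys of all l-graphlets of g, l = 1..k (Definition 9)."""
    keys = set()
    for l in range(1, k + 1):
        for graphlet, _ in graphlet_distribution(g, l):
            keys.add(nx.weisfeiler_lehman_graph_hash(
                graphlet, node_attr='label', edge_attr='bond'))
    return frozenset(keys)

def vote(test_keys, same, opposite):
    """Total weight of unfalsified intersections (steps 1-3)."""
    total = 0
    for descr in same:
        h = test_keys & descr                  # k-graphlet intersection
        if h and not any(h <= d for d in opposite):
            total += len(h)                    # weight = cardinality
    return total

def classify(test_g, positives, negatives, k=3):
    """Weighted majority voting over unfalsified intersections (step 4)."""
    t = graphlet_keys(test_g, k)
    pos = [graphlet_keys(g, k) for g in positives]
    neg = [graphlet_keys(g, k) for g in negatives]
    s_pos, s_neg = vote(t, pos, neg), vote(t, neg, pos)
    if s_pos == s_neg:
        return None                            # undetermined / refused
    return '+' if s_pos > s_neg else '-'
```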
===3 A toy example===

We illustrate the principle of our method with a toy example. Consider the following training and test sets, comprised of molecular descriptions of toxic (G1–G4) and non-toxic (G5–G7) chemical compounds. The task is to build a discriminative classifier able to determine whether the objects of the test set (G8–G11) are toxic or not. The main steps of the algorithm described in the previous section are briefly illustrated by Tables 2 and 3. First, we build the 3-graphlet intersections of test and training examples (for the purpose of illustration we use only graphlets with 3 nodes). Then, a "+" or "–" sign with the cardinality of the intersection is put in Table 3 if this intersection is not subsumed by any example of the opposite class; otherwise, the counter-example subsuming this intersection is given.

[Figure: positive examples G1–G4, negative examples G5–G7, and test examples G8–G11 are small molecular graphs over the node labels A, B, C, D, E.]

The 3-graphlet intersections of training and test examples are given in Table 2. For instance, graphs G1 and G8 have 4 common 3-graphlets: A–C–B, A–C=C, B–C=C, and C=C–D. In this simple case, we do not differentiate between a single and a double bond (e.g., ACC here stands for A–C=C without ambiguity).

Table 2. All common 3-graphlets of test (G8–G11) and training examples.

       G8                  G9                  G10            G11
  G1   ACB, ACC, BCC, CCD  ACC, BCC, CCD       ACC, CCD       ACB, ACC, BCC, CCD
  G2   ACB, ACC, BCC, CCD  ACC, BCC, CCD       ACC, CCD       ACB, ACC, BCC, CCD
  G3   ACB, ACC, BCC, CCE  ACC, BCC, CCE       ACC, CCE       ACB, ACC, BCC
  G4   ACC, BCC, CCE       ACC, BCC, BCE, CCE  ACC, CCE       ACC, BCC
  G5   ACC, CCD            ACC, ACD, CCD       ACC, ACD, CCD  ACC, ACD, CCD
  G6   ACC, BCC, CCD, CCE  ACC, BCC, CCD, CCE  ACC, CCD, CCE  ACC, BCC, CCD
  G7   BCC, CCD, CCE, DCE  BCC, CCD, CCE       CCD, CCE, CDE  BCC, CCD

Table 3 summarizes the procedure. For instance, a '+4' sign for graphs G1 and G8 means that the set of all common 3-graphlets of G1 and G8 (i.e., A–C–B, A–C=C, B–C=C, and C=C–D), taken as a whole, is not subgraph-isomorphic to any of the negative examples G5–G7. Thus, this intersection "gives a vote" of weight 4 (the cardinality of the mentioned set of graphlets) for positive classification of G8. On the contrary, the set of all common 3-graphlets of G4 and G8 (A–C=C, B–C=C, and C=C–E) is, as a whole, subgraph-isomorphic to the negative example G6; therefore, the intersection of G4 and G8 does not "give a vote" for positive classification of G8.

Table 3. Lazy classification table.

        G1   G2   G3   G4   G5   G6   G7   Score  Class
  G8    +4   +4   +4   G6   G1   –4   –4   4:0    +
  G9    G6   G6   G6   +4   –3   –4   –3   0:6    –
  G10   G5   G5   G6   G6   –3   –3   –3   0:9    –
  G11   +4   +4   +3   G6   –3   G1   G1   8:0    +

Thus, molecules G8 and G11 are classified as toxic, while G9 and G10 are classified as non-toxic.

===4 Experiments===

The proposed algorithm was tested on the 2001 Predictive Toxicology Challenge dataset in comparison with an SVM with the graphlet kernel and a k-Nearest-Neighbor classifier with a graphlet-based Hamming distance. SVM classifiers are considered good benchmarks for the graph classification problem [8]. We implemented a Scikit-learn [14] version of the Support Vector Classifier with the graphlet kernel and graphlets having up to 5 nodes. We also adapted k-Nearest-Neighbors to the graph classification problem by defining a Hamming distance between two graphs: each graphlet contributes 0 to the distance if both graphs contain it and 1 if only one of them does. For instance, for the two graphs of Example 5 with graphlets of up to 3 nodes this distance equals 7: G1 subsumes the graphlet C–C–C, not subsumed by G2, while G2 subsumes the graphlets {O, C–O, O–H, C–C–O, C=C–O, C–O–H}, not subsumed by G1. A sketch of this distance follows.
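This distance admits a one-line sketch under our reading of the description, reusing `graphlet_keys` from the previous sketch:

```python
def graphlet_hamming(g1, g2, k=3):
    """Number of graphlets (up to k nodes) present in exactly one graph."""
    a, b = graphlet_keys(g1, k), graphlet_keys(g2, k)
    return len(a ^ b)  # symmetric difference; 7 for the pair in Example 5
```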
The training set comprises 417 molecular graphs of chemical compounds with an indication of whether the compound is toxic for a particular sex and species group out of four possible groups: {mice, rats} × {male, female}. Thus, 4 separate datasets were built: male rats (MR, 274 examples: 117 toxic, 157 non-toxic), male mice (MM, 266 examples: 94 positive, 172 negative), female rats (FR, 281 examples: 86 positive, 195 negative), and female mice (FM, 279 examples: 108 positive, 171 negative).

We ran 5-fold cross-validation for each group (MR, MM, FR, FM) and compared the average classification metrics over the folds. The results for male rats are presented in Table 4 (we obtained similar results for the other groups). The parameters of the SVM and kNN classifiers were tuned via grid-search cross-validation (http://scikit-learn.org/stable/modules/grid_search.html). The 'K nodes' parameter determines the maximum number of nodes in the graphlet representation of graphs; e.g., when it is equal to 4, all graphs are approximated by their 4-graphlet representations, i.e. by all unique (up to isomorphism) graphlets with up to 4 nodes.

Table 4. Experimental results for the male rats group. "GLAC" stands for "graphlet-based lazy associative classification", "SVM" denotes the Support Vector Machine with graphlet kernel, and "kNN" stands for the k-Nearest-Neighbor classifier with Hamming distance.

  Method  K nodes  Accuracy  Precision  Recall  F-score  Time (sec.)
  GLAC    2        0.36      0.32       0.33    0.32       5.78
  GLAC    3        0.68      0.83       0.68    0.75      17.40
  GLAC    4        0.59      0.57       0.62    0.59      65.72
  GLAC    5        0.55      0.70       0.62    0.66     196.03
  SVM     2        0.45      0.15       0.33    0.21       1.54
  SVM     3        0.52      0.35       0.35    0.35       9.03
  SVM     4        0.41      0.27       0.28    0.28      61.31
  SVM     5        0.36      0.24       0.25    0.24     295.89
  kNN     2        0.45      0.15       0.33    0.21       3.35
  kNN     3        0.34      0.21       0.23    0.22      15.75
  kNN     4        0.48      0.31       0.32    0.31      73.38
  kNN     5        0.45      0.30       0.31    0.30     211.58

As we can observe, graphlet-based lazy associative classification performs reasonably only with at least 3-graphlet descriptions. In the case of 2-graphlet descriptions, the algorithm often refuses to classify test objects, because the 2-graphlet intersections of positive and test objects are falsified by negative objects and vice versa. 3-graphlet descriptions turn out to be optimal for this method, as the model is probably overfitted in the case of 4- and 5-graphlet descriptions.

===5 Conclusion===

In this paper, we have proposed an approach to graph classification based on the combination of graphlets, pattern structures, and lazy classification. The key principle of lazy classification is that one does not have to produce the whole set of classification rules in advance; instead, one generates only those rules needed to classify the current test object. The framework suits objects with complex structure, since the algorithm does not require a training phase.

We carried out a number of experiments in molecule classification within the proposed lazy classification framework, comparing the classification performance of our method with an SVM with the graphlet kernel and kNN with a graphlet-based distance. The reason for this choice is that SVM classifiers are considered good benchmarks for the graph classification problem, while kNN is a well-known lazy classification method. In our experiments, graphlet-based lazy classification, following the same learning curve as the other methods, showed better classification performance than the classical methods on the molecule toxicity prediction problem. In the future, we plan to investigate the overfitting problem for our algorithm, in particular the dependence of the classification metrics on the number of nodes in the considered graphlets. Other types of descriptions and a parallel version of our algorithm are also promising directions of study.

===References===

1. Corinna Cortes and Vladimir Vapnik, "Support-Vector Networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Sept. 1995.
2. S. V. N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, and Karsten M. Borgwardt, "Graph Kernels," J. Mach. Learn. Res., vol. 11, pp. 1201–1242, Aug. 2010.
3. Hiroto Saigo, Sebastian Nowozin, Tadashi Kadowaki, Taku Kudo, and Koji Tsuda, "gBoost: a mathematical programming approach to graph classification and regression," Machine Learning, vol. 75, no. 1, pp. 69–89, 2009.
4. Rakesh Agrawal and Ramakrishnan Srikant, "Fast Algorithms for Mining Association Rules in Large Databases," in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94), San Francisco, CA, USA, 1994, pp. 487–499, Morgan Kaufmann Publishers Inc.
5. Adriano Veloso, Wagner Meira Jr., and Mohammed J. Zaki, "Lazy Associative Classification," in Proceedings of the Sixth International Conference on Data Mining (ICDM '06), Washington, DC, USA, 2006, pp. 645–654, IEEE Computer Society.
6. Bernhard Ganter and Sergei O. Kuznetsov, "Pattern Structures and Their Projections," in Conceptual Structures: Broadening the Base, Harry Delugach and Gerd Stumme, Eds., vol. 2120 of Lecture Notes in Computer Science, pp. 129–142, Springer, Berlin/Heidelberg, 2001.
7. Christoph Helma and Stefan Kramer, "A Survey of the Predictive Toxicology Challenge 2000–2001," Bioinformatics, vol. 19, no. 10, pp. 1179–1182, 2003.
8. Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt, "Efficient graphlet kernels for large graph comparison," Journal of Machine Learning Research - Proceedings Track, vol. 5, pp. 488–495, 2009.
9. Reinhard Diestel, Graph Theory (Graduate Texts in Mathematics), Springer, August 2005.
10. Horst Bunke and Kim Shearer, "A Graph Distance Metric Based on the Maximal Common Subgraph," Pattern Recogn. Lett., vol. 19, no. 3–4, pp. 255–259, Mar. 1998.
11. Sergei O. Kuznetsov, "Scalable Knowledge Discovery in Complex Data with Pattern Structures," in PReMI 2013, Pradipta Maji, Ashish Ghosh, M. Narasimha Murty, Kuntal Ghosh, and Sankar K. Pal, Eds., vol. 8251 of Lecture Notes in Computer Science, pp. 30–39, Springer, 2013.
12. Natasa Przulj, "Biological network comparison using graphlet degree distribution," Bioinformatics, vol. 23, 2007.
13. Bernhard Ganter and Rudolf Wille, Formal Concept Analysis: Mathematical Foundations, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1st edition, 1997.
14. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.