<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Symbolic Learning vs. Graph Kernels: An Experimental Comparison in a Chemical Application</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luc Brun</string-name>
          <email>luc.brun@greyc.ensicaen.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donatello Conte</string-name>
          <email>dconte@unisa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Foggia</string-name>
          <email>pfoggia@unisa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Vento</string-name>
          <email>mvento@unisa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Didier Villemin</string-name>
          <email>didier.villemin@ensicaen.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ingegneria dell'Informazione e di Ingegneria Elettrica, Università di Salerno</institution>
          ,
          <addr-line>Via Ponte Don Melillo, 1 I-84084 Fisciano (SA)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LCMT UMR CNRS 6507 ENSICAEN-Université de Caen Basse-Normandie</institution>
          ,
          <addr-line>14050 Caen</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>31</fpage>
      <lpage>40</lpage>
      <abstract>
        <p>In this paper we present a quantitative comparison between two approaches, Graph Kernels and Symbolic Learning, within a classification scheme. The experimental case study is predictive toxicology evaluation, that is, the inference of the toxic characteristics of chemical compounds from their structure. The results show that the two approaches are comparable in terms of accuracy, but each presents pros and cons, which are discussed in the last part of the paper.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Many scientific areas are populated by applications dealing with structured
information, i.e., complex information which can be seen as made of simpler parts
suitably interconnected. Structured information is widely used in many
application domains, such as software engineering, pattern recognition, chemistry, medicine
and linguistics. Usually structured information is represented by means of graphs
because of their high expressive power.</p>
      <p>Despite their attractiveness in terms of representational power, structural
methods (i.e., methods dealing with structured information) imply complex
procedures both in the recognition and in the learning processes. The difficulty of
defining effective algorithms for this task is so high that the problem is
still considered open.</p>
      <p>In this paper we focus our attention on two different approaches using graphs
for classification tasks. The first one is based on the idea of facing the learning
problem directly in the representation space of the structured data (we will call
it Symbolic Learning). The second is based on the notion of Graph Kernels, which
provide a measure of similarity between graphs corresponding to a metric, thus
making possible the adoption of well-known statistical/neural paradigms. Here
we present several Graph Kernels, based on the notions of bag of patterns and
graph edit distance, that preserve most of the structural information of
graphs.</p>
      <p>The chosen application domain is the Predictive Toxicology Evaluation, that
is the characterization of the toxic nature of chemical compounds from their
chemical structure. Namely, we used data from the Predictive Toxicology
Challenge (http://www.predictive-toxicology.org), a competition based on the prediction of carcinogenic molecules. This
domain is appropriate for the analysis because of the structured nature of
chemical compounds.</p>
      <p>The paper is organized as follows: in Section 2 a brief survey of the two
approaches is presented, while details on the two methods are given in
Section 3; Section 4 reports the results of an experimental case study of both
approaches. Finally, in the last section, conclusions are outlined.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Symbolic Learning can be summarized as follows: the goal of the system is to
build a structural description (a prototype) for each class, so that classification
is carried out by means of a structural matching between the prototypes and the
structured representation of the object to classify.</p>
      <p>
        Papers dealing with this approach can be ascribed to two rather different
categories. The first category collects methods based on the assumption that
the prototypical descriptions are built by interacting, more or less deeply, with
an expert of the domain. In this regard, some authors [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] start from some
ideal prototypes and manually refine, in successive approximations, a model of
the possible deformations. Others [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] assume a deformation model and use
a small training set for tuning the parameters it contains. Despite the
advantage of facing the problem without a training set, or at least with a small
one, this approach is effective only in simple problems. The inadequacy
of human knowledge for finding a set of prototypes truly representative of a given
class significantly increases the risk of errors, especially in domains containing
either large amounts of data or many different classes.
      </p>
      <p>
        The methods ascribed to the second category face the problem in a formal
way ([
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) by considering it as a symbolic machine learning problem, formulated as
follows: “Given a suitably chosen set of input data whose class is known, and
possibly some background domain knowledge, find a set of optimal prototypical
descriptions for each class”.
      </p>
      <p>Graph kernel methods are based on an implicit embedding of graphs within
a high-dimensional vector space. Several graph kernels have been introduced in the
literature, but graph kernels built on the notion of bag of patterns seem particularly
promising. The design of such kernels implies the following choices or heuristics:
1. Selection of a bag of patterns from a graph,</p>
      <sec id="sec-2-1">
        <title>4 http://www.predictive-toxicology.org</title>
        <p>2. Definition of a kernel between bags based on a kernel between patterns,
3. Definition of a kernel between patterns.</p>
        <p>
          Kernels between graphs then correspond to kernels between bags of patterns.
The implicit assumption here is that bags of patterns capture most of the
structural information of graphs. Several kernels have been proposed based on
different types of patterns: Vert [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] proposed to compare the sets of sub-trees of
two graphs; graphlet kernels [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] are based on the number of common
subgraphs of two graphs. However, the comparison of complex patterns requires
significant processing time for a gain in insight that is not always clearly
demonstrated [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. As a consequence, most graph kernels are based on simpler patterns
such as walks [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], trails [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] or paths. Bags of patterns may be finite or infinite.
Infinite bags, such as bags of walks, are computed implicitly using random walks
defined on both graphs being compared [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Bags of patterns may also be
computed explicitly [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] by enumerating all patterns of a given graph. Given two bags
of patterns, the similarity between them is often measured using convolution
kernels [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Alternatively, both bags may be considered as two sets in the feature
space induced by the kernel between patterns; this point of view makes it possible
to measure the similarity between bags using general kernels between sets. Finally, the
similarity between patterns should be designed according both to the features
available on each element of the pattern and to the structural noise which may
corrupt patterns. For walks on labeled graphs, the well-known kernel [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
equal to 1 if both walks are identical and 0 otherwise, encodes the fact that
labels are either equal or not comparable. The construction of a bag of patterns
from a graph removes the connections between elementary patterns within the input
graphs, which are then represented only by their bags. As previously mentioned,
the assumption of bag-of-pattern kernels is that most of the relevant information
about a graph is contained in its patterns and is thus preserved by the
transformation from a graph to a bag. Bunke [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] proposed
several kernels based on the graph edit distance which do not rely on such an
assumption.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methods Description</title>
      <sec id="sec-3-1">
        <title>Symbolic Learning</title>
        <p>
          The Symbolic Learning Method presented in this section (hereafter called SLM)
is inspired by Quinlan’s FOIL algorithm particularizing it for Attributed
Relational Graph (ARG) representation. An extension of this representation has
been presented for describing prototypes of Attributed Relational Graphs; these
graphs, called Generalized Attributed Relational Graphs (GARGs), contain
generalized nodes, edges and attributes. Then, a learning algorithm has been
formulated to build such prototypes by means of a set of operations directly defined
on graphs. The algorithm preserves the generality of the prototypes generated
by the classical machine learning algorithms and moreover, similarly to most
machine learning systems ([
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]), the prototypes obtained by this system are
consistent, i.e. each prototype covers samples of a same class.
        </p>
        <p>An ARG can be defined as a 6-tuple (N, E, AN, AE, aN, aE), where N and
E ⊂ N × N are, respectively, the sets of nodes and edges of the ARG, AN and
AE are the sets of node and edge attributes and, finally, aN and aE are the functions
which associate with each node or edge of the graph the corresponding attributes.</p>
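        <p>As a concrete illustration (not from the paper), the 6-tuple above can be encoded directly; the class and field names below are hypothetical, and the attribute functions aN and aE are stored as plain dictionaries:</p>

```python
from dataclasses import dataclass, field

@dataclass
class ARG:
    # N: set of nodes; E ⊂ N × N: set of edges
    nodes: set
    edges: set
    # aN, aE: functions associating attributes with nodes and edges,
    # represented here as dictionaries
    node_attr: dict = field(default_factory=dict)
    edge_attr: dict = field(default_factory=dict)

# A small molecular fragment (C-C-O): atoms as nodes labeled with their
# atom symbol, bonds as edges labeled with their multiplicity, matching
# the chemical representation used in Section 4.1
g = ARG(
    nodes={0, 1, 2},
    edges={(0, 1), (1, 2)},
    node_attr={0: "C", 1: "C", 2: "O"},
    edge_attr={(0, 1): 1, (1, 2): 1},
)
```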
        <p>A GARG is used for representing a prototype of a set of ARGs. In order
to allow a GARG to match a set of possibly different ARGs, in the SLM the
attribute definition has been extended. First of all, the set of types of node and
edge attributes is extended with the special type Φ, carrying no parameter and
allowed to match any attribute type, ignoring the attribute parameters. We say
that a GARG G∗ = (N∗, E∗, AN∗, AE∗, aN∗, aE∗) covers a sample G, and we use the
notation G∗ ⊲ G (the symbol ⊲ denotes the covering relation), if there is
a mapping μ : N∗ → N such that μ is a monomorphism and the attributes of
the nodes and edges of G∗ are compatible with the corresponding ones
of G. The first condition requires that each primitive and each relation in the
prototype is also present in the sample; note that the converse does
not hold. The second condition constrains the monomorphism to be consistent
with the attributes of the prototype and of the sample, in the sense that the type
of each attribute of the prototype must be either the special type Φ or
the type of the corresponding attribute of the sample. In the latter case, all
the parameters of the attribute, which are actually sets of values, must contain
the value of the corresponding parameter of the sample. The goal of the learning
algorithm can be stated as follows: there is a (possibly infinite) set S∗ of all the
patterns that may occur, partitioned into C different classes S1∗, . . . , SC∗, with
Si∗ ∩ Sj∗ = ∅ for i ≠ j; the algorithm is given a finite subset S ⊂ S∗ (training
set) of labeled patterns (S = S1 ∪ · · · ∪ SC with Si = S ∩ Si∗), from which it tries
to find a sequence of prototype graphs G1∗, G2∗, . . . , Gp∗, each labeled with a class
identifier, such that:
∀G ∈ S∗ ∃i : Gi∗ ⊲ G (completeness of the prototype set)    (1)
∀G ∈ S∗, Gi∗ ⊲ G ⇒ class(G) = class(Gi∗) (consistency of the prototype set)    (2)
where class(G) and class(G∗) refer to the class associated with sample G and
prototype G∗, respectively. Of course, this is an ideal goal, since only a finite
subset of S∗ is available to the algorithm; in practice, the algorithm can only
verify that completeness and consistency hold for the samples in S. On
the other hand, (1) dictates that, in order to get as close as possible to the ideal
case, the generated prototypes should also be able to model samples not found
in S. However, they should not be too general, otherwise (2) will not be satisfied.</p>
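        <p>A brute-force check of the covering relation G∗ ⊲ G can be sketched as follows; this is illustrative only (the paper does not specify its matching procedure), graphs are hypothetical dictionaries with directed edges, prototype attributes are either the wildcard Φ or sets of admissible values, and the exhaustive search over injective mappings is exponential:</p>

```python
from itertools import permutations

PHI = "PHI"  # the wildcard type Φ, matching any attribute

def attr_ok(proto_attr, sample_attr):
    # A prototype attribute is compatible if it is the wildcard Φ, or if it
    # is a set of admissible values containing the sample's value
    return proto_attr == PHI or sample_attr in proto_attr

def covers(proto, sample):
    """Brute-force test of G* ⊲ G: search for a monomorphism mu from the
    prototype's nodes into the sample's nodes that preserves edges and has
    compatible node and edge attributes."""
    p_nodes, s_nodes = list(proto["nodes"]), list(sample["nodes"])
    if len(p_nodes) > len(s_nodes):
        return False
    for image in permutations(s_nodes, len(p_nodes)):
        mu = dict(zip(p_nodes, image))  # injective by construction
        nodes_ok = all(attr_ok(proto["node_attr"][n], sample["node_attr"][mu[n]])
                       for n in p_nodes)
        edges_ok = all((mu[a], mu[b]) in sample["edges"] and
                       attr_ok(proto["edge_attr"][(a, b)],
                               sample["edge_attr"][(mu[a], mu[b])])
                       for (a, b) in proto["edges"])
        if nodes_ok and edges_ok:
            return True
    return False

# A prototype requiring a carbon bonded (multiplicity 1) to any atom,
# and a C-C-O sample: the prototype covers the sample
proto = {"nodes": {0, 1}, "edges": {(0, 1)},
         "node_attr": {0: {"C"}, 1: PHI}, "edge_attr": {(0, 1): {1}}}
sample = {"nodes": {0, 1, 2}, "edges": {(0, 1), (1, 2)},
          "node_attr": {0: "C", 1: "C", 2: "O"},
          "edge_attr": {(0, 1): 1, (1, 2): 1}}
```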
        <p>A description of the learning algorithm follows: the
algorithm starts with an empty list L of prototypes and tries to cover the
training set by successively adding consistent prototypes. When a new prototype is
found, the samples covered by it are eliminated and the process continues on the
remaining samples of the training set. At classification time, a sample is compared sequentially
against the prototypes, in the same order in which they were generated, and
it is attributed to the class of the first prototype that covers it. In this way, each
prototype implicitly entails the condition that the sample is not covered by any
previous prototype. The algorithm fails if no consistent prototype covering the
remaining samples can be found. It is worth noting that the consistency test
in the algorithm actually checks whether the prototype is almost consistent, i.e.
almost all the samples covered by G∗ belong to the same class:</p>
        <p>
          Consistent(G∗) ⇔ max_i (|Si(G∗)| / |S(G∗)|) ≥ θ    (3)
where S(G∗) denotes the set of all the samples of the training set covered by
a prototype G∗, Si(G∗) denotes the samples of class i covered by G∗, and θ
is a threshold close to 1. For further details about the representation and the
algorithm, the reader can refer to [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
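        <p>The set-covering loop and the almost-consistency test of Eq. (3) can be sketched as follows; this is a minimal illustration, not the paper's implementation: samples are hypothetical (graph, class) pairs, and find_prototype stands for the (unspecified here) heuristic search for a candidate prototype, returning it together with the samples it covers:</p>

```python
def consistent(samples_covered, theta=0.9):
    """Almost-consistency test of Eq. (3): the dominant class must account
    for at least a fraction theta of the samples covered by the prototype."""
    if not samples_covered:
        return False
    classes = [c for (_, c) in samples_covered]
    best = max(classes.count(c) for c in set(classes))
    return best / len(samples_covered) >= theta

def learn(training_set, find_prototype, theta=0.9):
    """Covering loop: repeatedly find an almost-consistent prototype, label
    it with the dominant class of the samples it covers, remove those
    samples, and continue on the remainder. `find_prototype` is a
    hypothetical search routine returning (prototype, covered_samples)."""
    remaining, prototypes = list(training_set), []
    while remaining:
        found = find_prototype(remaining)
        if found is None:
            # the algorithm fails when no consistent prototype covers
            # the remaining samples
            raise RuntimeError("no consistent prototype covers the rest")
        proto, covered = found
        if not consistent(covered, theta):
            raise RuntimeError("prototype is not almost-consistent")
        classes = [c for (_, c) in covered]
        label = max(set(classes), key=classes.count)  # dominant class
        prototypes.append((proto, label))
        remaining = [s for s in remaining if s not in covered]
    return prototypes
```

At classification time, a sample would then be matched against the returned prototypes in order, taking the class of the first one that covers it.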
        <p>A heuristic function H is introduced for evaluating how promising a
provisional prototype is. The heuristic function used in the algorithm is given by
the product of Hcompl and Hcons:</p>
        <p>H = Hcompl(S, G∗) · Hcons(S, G∗)    (4)
and it is based on estimates of the consistency and completeness of the prototype. The
use of Hcons drives the algorithm towards consistent prototypes, while the
term Hcompl is introduced in order to favor general prototypes over
prototypes which, albeit consistent, cover only a small number of samples.</p>
        <p>See Figure 1 for an example of a GARG G∗ that covers the graphs G1 and
G2 representing two chemical compounds.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Graph Kernels</title>
        <p>In this section we will introduce two kinds of Graph Kernels: the Bag of trails
Kernel and the Laplacian Kernel.</p>
        <p>Bag of trails Kernel Let us consider a graph G = (V, E), where V denotes the
set of vertices and E ⊂ V × V the set of edges. We define a simple graph as a
graph with no multiple edges between two nodes and no loops (edges linking
a node with itself). We define a walk as an alternating sequence of nodes and
edges, a trail as a walk with distinct edges, and a path as a trail with distinct
nodes. The description of a graph by a bag of walks is penalized by a tottering
effect, since long walks may remain on a few edges, thereby describing only small
parts of a graph. Bags of trails drastically reduce this tottering effect and may capture
non-linear features, such as loops, which are not described by open paths. We
thus use bags of trails in the remainder of this paper. The cardinality of a
bag T is denoted by |T|, and the kernel between trails is denoted Ktrail.</p>
        <p>
          Suard [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] has proposed numerous kernels for bags of paths which can be
easily extended to bags of trails. We propose to extend the mean kernel to bags
of trails. By considering bags of trails as sets and given a trail kernel Ktrail, we can
design a convolution kernel, called the mean kernel, as follows:
        </p>
        <p>Kmean(T1, T2) = (1 / (|T1| |T2|)) Σ_{t∈T1} Σ_{t′∈T2} Ktrail(t, t′),    (5)
where T1 and T2 denote two bags of trails.</p>
        <p>
          Following Haussler [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], this kernel is positive definite on the bag of trails
domain if and only if Ktrail is positive definite on the trail domain.
        </p>
        <p>
          However, the mean kernel compares all the trails of both bags without
considering their relevance within the bags. As a consequence, irrelevant trails may
corrupt the overall value of the kernel through the averaging. Recently,
Dupé et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] proposed to limit this effect by using a relevance measure on
trails. Let T1 and T2 be two bags of trails; the weighted mean kernel between these
two bags is then defined as:
Kweighted(T1, T2) = (1 / (|T1| |T2|)) Σ_{t∈T1} Σ_{t′∈T2} ⟨RT1(t), RT2(t′)⟩^d Ktrail(t, t′),    (6)
where d ∈ R+ weights the importance of the relevance measure and
RT(t) denotes the relevance of trail t according to the bag T. This relevance
measure is defined as the similarity (the kernel) between the input trail and the
mean trail of the bag: RT(t) = (1/|T|) Σ_{t′∈T} Ktrail(t, t′).
        </p>
        <p>The kernel defined by Equation (6) is a convolution kernel and is thus positive
definite if Ktrail is positive definite.</p>
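        <p>The mean kernel of Eq. (5) and the weighted mean kernel of Eq. (6) can be sketched as follows; this is a minimal illustration assuming a generic trail kernel passed as a parameter (the toy ktrail in the test is hypothetical), with the scalar relevances multiplied for the inner product:</p>

```python
def k_mean(T1, T2, ktrail):
    # Eq. (5): average of the trail kernel over all pairs of trails
    total = sum(ktrail(t, u) for t in T1 for u in T2)
    return total / (len(T1) * len(T2))

def relevance(t, T, ktrail):
    # R_T(t): similarity between trail t and the "mean trail" of bag T
    return sum(ktrail(t, u) for u in T) / len(T)

def k_weighted(T1, T2, ktrail, d=1.0):
    # Eq. (6): each pair (t, t') is additionally weighted by the product
    # of its relevances in the two bags, raised to the power d
    total = sum((relevance(t, T1, ktrail) * relevance(u, T2, ktrail)) ** d
                * ktrail(t, u)
                for t in T1 for u in T2)
    return total / (len(T1) * len(T2))
```

With a 0/1 trail kernel, a pair of identical trails that are atypical of their bags contributes less to k_weighted than to k_mean, which is the intended damping of irrelevant trails.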
        <p>
          The above bag-of-trails kernels are based on a trail kernel Ktrail. A kernel
between two trails t = (v1, e1, . . . , vn) and t′ = (v1′, e1′, . . . , vp′) can be derived
from the path kernel proposed by Kashima [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This kernel is defined as 0 if the two
trails do not have the same size, and as follows otherwise:
        </p>
        <p>
          Ktrail(t, t′) = δ(ϕ(v1), ϕ(v1′)) ∏_{i=2}^{|t|} δ(ψ(e_{vi−1 vi}), ψ(e_{v′i−1 v′i})) δ(ϕ(vi), ϕ(vi′)),    (7)
where ϕ(v) and ψ(e) denote respectively the labels of vertex v and edge e, while
δ is equal to 1 if both of its arguments are equal and 0 otherwise. The kernel Ktrail
is positive definite as a tensor product of positive definite kernels.
        </p>
        <p>
          Laplacian Kernel The graph edit distance between two graphs is defined as
the minimal number of vertex and edge removals/additions/relabelings required
to transform one graph into another [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The computational complexity of the
exact computation of the edit distance is exponential in the number of vertices.
Fortunately, Riesen and Bunke [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] proposed a heuristic to efficiently
approximate the edit distance between two graphs.
        </p>
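        <p>The delta-product trail kernel of Eq. (7) above can be sketched as follows; the representation of a trail as a pair of label sequences is an assumption made for this illustration:</p>

```python
def k_trail(t1, t2):
    """Eq. (7): product of Kronecker deltas over aligned node and edge
    labels; 0 if the trails differ in length. A trail is represented here
    as (node_labels, edge_labels), with one more node label than edge
    labels."""
    nodes1, edges1 = t1
    nodes2, edges2 = t2
    if len(nodes1) != len(nodes2):
        return 0.0  # trails of different sizes never match
    if nodes1[0] != nodes2[0]:
        return 0.0  # delta on the first vertex labels
    for i in range(1, len(nodes1)):
        # delta on the i-th edge labels and on the i-th vertex labels
        if edges1[i - 1] != edges2[i - 1] or nodes1[i] != nodes2[i]:
            return 0.0
    return 1.0
```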
        <p>Unfortunately, the edit distance does not define a metric and, as a
consequence, kernels directly based on the edit distance are not valid kernels.</p>
        <p>
          Let us consider a set of input graphs {G1, . . . , Gn} defining our graph test
database. Given a kernel k, the Gram matrix K associated with this database is
the n × n matrix defined by Ki,j = k(Gi, Gj). As noted by Steinke [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], the
inverse of K (or its pseudo-inverse if K is not invertible) may be considered as
a regularization operator on the set of vectors of dimension n. Conversely, the
inverse (or pseudo-inverse) of any positive definite regularization operator may be
considered as a kernel. More precisely, within an interpolation or classification
scheme, the optimal vector f∗ mapping a real value to each graph of {G1, . . . , Gn},
which interpolates a vector of known values y while minimizing the norm induced
by K in Rn, is obtained by solving the following minimization problem:
f∗ = arg min_{f∈Rn} CLoss(f, y, K) + f^t K^{−1} f
where CLoss(·, ·, ·) denotes a given loss function encoding the distance between
the vector f and the vector of known values y.
        </p>
        <p>
          One scheme to design a kernel thus consists in first building a positive definite
regularization operator and then taking its inverse (or pseudo-inverse) to
obtain a kernel. Having an interpolation problem in mind, we would like our
regularization operator to penalize different values of f∗ mapped onto graphs with
a small edit distance. We thus consider the Laplacian operator defined as follows:
given the set of graphs {G1, . . . , Gn}, we first consider the n × n adjacency
matrix Wi,j = e^{−d(Gi,Gj)/σ}, where σ is a tuning variable and d(·, ·) denotes the
edit distance [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The normalized Laplacian of {G1, . . . , Gn} is then defined as
L = I − D^{−1/2} W D^{−1/2}, where D is the diagonal matrix defined by Di,i = Σ_{j=1}^{n} Wi,j.
        </p>
        <p>
          Well-known results from spectral graph theory establish that L is a symmetric,
positive semi-definite matrix whose eigenvalues belong to [0, 2]. Unfortunately,
the Laplacian is also well known for being non-invertible: since the Laplacian
matrix is only positive semi-definite, it cannot be inverted directly.
Moreover, the pseudo-inverse of the Laplacian induces numerical instabilities which
do not lead to a reliable kernel. Therefore, following Smola [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], we rather
regularize the spectrum of L. The regularized version of L, denoted L˜, is defined
as:
        </p>
        <p>
          L˜ = Σ_{i=1}^{n} r(λi) ui ui^t    (8)
where {λi, ui} denotes the eigensystem of L and r(·) is a regularization function.
Many regularization functions may be used; however, we restricted our choice to
the Regularized Laplacian ([
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]): r(λ) = 1 + ηλ, where η is a tuning variable. The
regularized Laplacian L˜ is invertible and its inverse K = L˜−1 is taken as a kernel.
        </p>
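        <p>The whole construction — adjacency weights from edit distances, normalized Laplacian, spectrum regularization with r(λ) = 1 + ηλ, and inversion — can be sketched in a few lines of NumPy; this is an illustrative sketch assuming the matrix of approximate edit distances has already been computed:</p>

```python
import numpy as np

def laplacian_kernel(dist, sigma=1.0, eta=1.0):
    """Sketch of the Laplacian kernel: `dist` is the n x n matrix of
    (approximate) edit distances between the n graphs of the database.
    Builds W, the normalized Laplacian L, regularizes its spectrum with
    r(lambda) = 1 + eta * lambda, and returns K = inverse of L~."""
    W = np.exp(-dist / sigma)                          # W_ij = exp(-d(Gi,Gj)/sigma)
    d = W.sum(axis=1)                                  # degrees D_ii
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    lam, U = np.linalg.eigh(L)                         # L is symmetric
    L_reg = U @ np.diag(1.0 + eta * lam) @ U.T         # Eq. (8) with r(l) = 1 + eta*l
    return np.linalg.inv(L_reg)                        # kernel K
```

Since the eigenvalues of L lie in [0, 2], the regularized eigenvalues 1 + ηλ are at least 1 for η ≥ 0, so L˜ is always invertible and K is positive definite.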
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <p>The application used for the comparison between the two approaches is the
Predictive Toxicology Challenge.</p>
      <sec id="sec-4-1">
        <title>The Application Domain and the Test Sets</title>
        <p>Nowadays, almost every human being in industrialized societies is exposed to
chemical compounds whose long-term effects are not evident. The
carcinogenicity of such compounds, for instance, is often unknown. The US National
Toxicology Program (NTP) 5 conducted several experiments on the effects of many
chemical compounds on mice and rats, but such empirical results are expensive
and very slow to obtain. Hence the need for carcinogenicity models able
to generate reliable toxicity predictions for other compounds. The Predictive
Toxicology Challenge (PTC) 6 was initiated with this aim. It enabled participants
to submit machine learning programs which, from the large training datasets of
the NTP, provide carcinogenicity predictions for the compounds in test sets.
According to the Structure-Activity Relationship (SAR) hypothesis, which assumes that
the activity of a molecule can be entirely determined by its structure, the only
information used for the toxicity prediction of a molecule relates to its chemical
structure.</p>
        <p>Four distinct training and test sets were provided (see Table 1), since the
reactions of mice and rats to carcinogenic compounds are significantly different. In this
paper we consider all the provided datasets using, for both methods, the
following representation: each chemical compound is represented by a graph
G(V, E, α, β), where V represents the set of atoms, E is the set of edges
representing bonds, α is the node labeling function that associates an atom symbol
with each node of the graph, and β is the edge labeling function that associates
a bond multiplicity with each edge of the graph. The separation into
training and test sets was made directly by the PTC committee; therefore no
cross-validation in the training phase is possible.</p>
        <sec id="sec-4-1-1">
          <title>5 National Toxicology Program: http://ntp.niehs.nih.gov/</title>
          <p>6 http://www.predictive-toxicology.org/</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation and Comparison</title>
        <p>In Table 2 we report the comparative results obtained on the FM, MM, FR and MR
datasets by the two different approaches (the classification by Graph Kernels is
performed by means of an SVM classifier).</p>
        <p>The three methods (two of which are Graph Kernel methods) have comparable
results. However, the time required in the training phase by the Symbolic Learning
method is very high.</p>
        <p>We can conclude that in problems with strict time constraints the Graph
Kernels method is very effective, while Symbolic Learning can be used only on
smaller problems, or where time constraints are looser. On the other hand,
Symbolic Learning generates explicit, declarative knowledge about the classes in the
form of GARGs that can be understood with little effort by a human expert:
a direct inspection of such explicit prototypes can be sufficient to validate them,
or possibly to delete badly formed prototypes that could be detrimental
to classification performance. The Graph Kernel approach, instead, is able to
perform a classification, but not to explain the criteria that guided its choice.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Method</th><th>FM</th><th>MM</th><th>FR</th><th>MR</th><th>Elapsed Time</th></tr>
            </thead>
            <tbody>
              <tr><td>Symbolic Learning</td><td>60.4%</td><td>52.1%</td><td>62.0%</td><td>62.0%</td><td>∼ 2000 min</td></tr>
              <tr><td>Bag of trails Kernel</td><td>61.2%</td><td>50.4%</td><td>62.8%</td><td>65.6%</td><td>∼ 2 min</td></tr>
              <tr><td>Laplacian Kernel</td><td>59.2%</td><td>55.2%</td><td>57.7%</td><td>60.9%</td><td>∼ 1 min 30 sec</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>In conclusion, the results confirm that there is no clear way to decide which is
the best method, because both approaches have their own virtues and
deficiencies.</p>
        <p>
          In our view, the two approaches should not be seen as competitors; rather,
their cooperative integration can provide more powerful tools for
knowledge exploitation. A first step toward the integration of the two
methodologies could be their combined use in a multi-expert system [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. As widely
demonstrated in the literature, an accurate selection of the combining criteria makes it possible
to significantly improve on the results obtained by the single experts, by retaining
their strengths while overcoming their weaknesses. We made a preliminary
evaluation in this direction: the results show that the two systems exhibit a certain
degree of orthogonality in their errors, that is, some of the samples misclassified
by one of the algorithms are correctly classified by the other.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we presented a comparison between two approaches, Graph
Kernels and Symbolic Learning, within a classification scheme. Our experiments
indicate that both approaches are comparable in terms of accuracy. Graph
kernel methods require less execution time but do not provide an explanation of
the classification criteria; Symbolic Learning, on the other hand, has the reverse
advantages and drawbacks. The two methods are thus clearly complementary.</p>
      <p>As discussed in the previous section, a future step could be the integration of
the two approaches, in order to improve on the results obtained by each
approach alone.</p>
      <p>Acknowledgments. We would like to express our thanks to Alice Kijewski and
David Lemaresquier for their support in the experimental phase.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bunke</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riesen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A family of novel graph kernels for structural pattern recognition</article-title>
          .
          <source>In: XII Iberoamerican Congress on Pattern Recognition</source>
          . pp.
          <fpage>20</fpage>
          -
          <lpage>31</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dupé</surname>
            ,
            <given-names>F.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Tree covering within a graph kernel framework for shape classification</article-title>
          .
          <source>In: XV ICIAP</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Foggia</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Genna</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vento</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Symbolic vs connectionist learning: an experimental comparison in a structured domain</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering 13(2)</source>
          ,
          <fpage>176</fpage>
          -
          <lpage>195</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Haussler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Convolution kernels on discrete structures</article-title>
          .
          <source>Tech. rep.</source>
          , Department of Computer Science, University of California at Santa Cruz (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Iannello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Percannella</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sansone</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soda</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>On the use of classification reliability for improving performance of the one-per-class decomposition method</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          <volume>68</volume>
          ,
          <fpage>1398</fpage>
          -
          <lpage>1410</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kashima</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsuda</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inokuchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Marginalized kernel between labeled graphs</article-title>
          .
          <source>In: Proc. of the Twentieth International Conference on Machine Learning</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mahé</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vert</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>Graph kernels based on tree patterns for molecules</article-title>
          .
          <source>Machine Learning 75(1)</source>
          ,
          <fpage>3</fpage>
          -
          <lpage>35</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Michalski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Pattern recognition as rule-guided inductive inference</article-title>
          .
          <source>IEEE Transactions on PAMI 2(4)</source>
          ,
          <fpage>349</fpage>
          -
          <lpage>361</lpage>
          (
          <year>1980</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Neuhaus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bunke</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Bridging the Gap Between Graph Edit Distance and Kernel Machines</article-title>
          . World Scientific Publishing Co., Inc., River Edge, NJ, USA (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Nishida</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Shape recognition by integrating structural descriptions and geometrical/statistical transforms</article-title>
          .
          <source>CVIU 64</source>
          ,
          <fpage>248</fpage>
          -
          <lpage>262</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Riesen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bunke</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Approximate graph edit distance computation by means of bipartite graph matching</article-title>
          .
          <source>Image Vision Computing</source>
          <volume>27</volume>
          (
          <issue>7</issue>
          ),
          <fpage>950</fpage>
          -
          <lpage>959</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rocha</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlidis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A shape analysis model with applications to a character recognition system</article-title>
          .
          <source>IEEE Transactions on PAMI 16(4)</source>
          ,
          <fpage>393</fpage>
          -
          <lpage>404</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Shervashidze</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vishwanathan</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petri</surname>
            ,
            <given-names>T.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehlhorn</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgwardt</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          :
          <article-title>Efficient graphlet kernels for large graph comparison</article-title>
          .
          <source>In: Twelfth International Conference on Artificial Intelligence and Statistics</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kondor</surname>
            ,
            <given-names>R.I.</given-names>
          </string-name>
          :
          <article-title>Kernels and regularization on graphs</article-title>
          .
          <source>In: XV Annual Conference on Learning Theory</source>
          . pp.
          <fpage>144</fpage>
          -
          <lpage>158</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Steinke</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Kernels, regularization and differential equations</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>41</volume>
          (
          <issue>11</issue>
          ),
          <fpage>3271</fpage>
          -
          <lpage>3286</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Suard</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rakotomamonjy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bensrhair</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Kernel on bag of paths for measuring similarity of shapes</article-title>
          .
          <source>In: European Symposium on Artificial Neural Networks, Bruges, Belgium (April</source>
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>