<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Symbolic Learning vs. Graph Kernels: An Experimental Comparison in a Chemical Application</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Luc</forename><surname>Brun</surname></persName>
							<email>luc.brun@greyc.ensicaen.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory">GREYC UMR CNRS 6072</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Donatello</forename><surname>Conte</surname></persName>
							<email>dconte@unisa.it</email>
							<affiliation key="aff2">
								<orgName type="department">Dipartimento di Ingegneria dell&apos;Informazione e di Ingegneria Elettrica</orgName>
								<orgName type="institution">Università di Salerno</orgName>
								<address>
									<addrLine>Via Ponte Don Melillo, 1</addrLine>
									<postCode>I-84084</postCode>
									<settlement>Fisciano (SA)</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pasquale</forename><surname>Foggia</surname></persName>
							<email>pfoggia@unisa.it</email>
							<affiliation key="aff2">
								<orgName type="department">Dipartimento di Ingegneria dell&apos;Informazione e di Ingegneria Elettrica</orgName>
								<orgName type="institution">Università di Salerno</orgName>
								<address>
									<addrLine>Via Ponte Don Melillo, 1</addrLine>
									<postCode>I-84084</postCode>
									<settlement>Fisciano (SA)</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mario</forename><surname>Vento</surname></persName>
							<email>mvento@unisa.it</email>
							<affiliation key="aff2">
								<orgName type="department">Dipartimento di Ingegneria dell&apos;Informazione e di Ingegneria Elettrica</orgName>
								<orgName type="institution">Università di Salerno</orgName>
								<address>
									<addrLine>Via Ponte Don Melillo, 1</addrLine>
									<postCode>I-84084</postCode>
									<settlement>Fisciano (SA)</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Didier</forename><surname>Villemin</surname></persName>
							<email>didier.villemin@ensicaen.fr</email>
							<affiliation key="aff1">
								<orgName type="laboratory">LCMT UMR CNRS 6507</orgName>
								<orgName type="institution" key="instit1">ENSICAEN</orgName>
								<orgName type="institution" key="instit2">Université de Caen Basse-Normandie</orgName>
								<address>
									<postCode>14050</postCode>
									<settlement>Caen</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Symbolic Learning vs. Graph Kernels: An Experimental Comparison in a Chemical Application</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">A260CB4B004D741CB8602B8877CB8A4A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T12:11+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we present a quantitative comparison between two approaches, Graph Kernels and Symbolic Learning, within a classification scheme. The experimental case-study is predictive toxicology evaluation, that is, the inference of the toxic characteristics of chemical compounds from their structure. The results demonstrate that the two approaches are comparable in terms of accuracy, but each presents pros and cons, which are discussed in the last part of the paper.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Many scientific areas are populated by applications dealing with structured information, i.e., complex information that can be seen as made of simpler parts suitably interconnected. Structured information is widely used in many application domains, such as software engineering, pattern recognition, chemistry, medicine and linguistics. It is usually represented by means of graphs because of their high expressive power.</p><p>Despite their attractiveness in terms of representational power, structural methods (i.e., methods dealing with structured information) imply complex procedures in both the recognition and the learning processes. Defining effective algorithms for these tasks is so difficult that the problem is still considered open.</p><p>In this paper we focus our attention on two different approaches using graphs for classification tasks. The first one is based on the idea of facing the learning problem directly in the representation space of the structured data (we will call it Symbolic Learning). The second is based on the notion of Graph Kernels, which provide a measure of similarity between graphs corresponding to a metric, thus making possible the adoption of well-known statistical/neural paradigms. Here we present several Graph Kernels, based on the notion of bag of patterns and on the graph edit distance, that preserve most of the structural information of graphs.</p><p>The chosen application domain is Predictive Toxicology Evaluation, that is, the characterization of the toxic nature of chemical compounds from their chemical structure. Namely, we used data from the Predictive Toxicology Challenge, a competition based on the prediction of carcinogenic molecules. 
This domain is appropriate for the analysis because of the structured nature of chemical compounds.</p><p>The paper is organized as follows: section 2 presents a brief survey of the two approaches, while details on the two methods are given in section 3; section 4 reports the results of an experimental case-study of both approaches. Finally, in the last section, conclusions are outlined.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Works</head><p>Symbolic Learning can be summarized as follows: the goal of the system is to build a structural description (a prototype) for each class, so that classification is carried out by means of a structural matching between the prototypes and the structured representation of the object to classify.</p><p>Papers dealing with this approach can be ascribed to two rather different categories. The first category collects methods based on the assumption that the prototypical descriptions are built by interacting, more or less deeply, with an expert of the domain. In this regard, some authors <ref type="bibr" target="#b11">[12]</ref> start from some ideal prototypes and manually refine, in successive approximations, a model of the possible deformations. Others <ref type="bibr" target="#b9">[10]</ref> assume a deformation model and use a small training set for tuning its parameters. Despite the advantage of facing the problem without a training set, or at least with a small one, this approach is effective only in simple problems. The inadequacy of human knowledge for finding a set of prototypes really representative of a given class significantly increases the risk of errors, especially in domains containing either large amounts of data or many different classes.</p><p>The methods ascribed to the second category face the problem in a formal way (<ref type="bibr" target="#b7">[8]</ref>) by considering it as a symbolic machine learning problem, formulated as follows: "Given a suitably chosen set of input data, whose class is known, and possibly some background domain knowledge, find a set of optimal prototypical descriptions for each class".</p><p>Graph kernel methods are based on an implicit embedding of graphs within a vector space of large dimension. Several graph kernels have been introduced in the literature, but graph kernels built on the notion of bag of patterns seem to be very promising. 
The design of such kernels implies the following choices or heuristics: 1. the definition of the bag of patterns associated with a graph, 2. the definition of a kernel between bags based on a kernel between patterns, and 3. the definition of a kernel between patterns. Kernels between graphs then correspond to kernels between bags of patterns. The implicit assumption here is that bags of patterns capture most of the structural information of graphs. Several kernels have been proposed based on different types of patterns: Vert <ref type="bibr" target="#b6">[7]</ref> proposed to compare the sets of sub-trees of two graphs; graphlet kernels <ref type="bibr" target="#b12">[13]</ref> are based on the number of common subgraphs of two graphs. However, the comparison of complex patterns requires substantial processing time for a benefit which is not always clearly demonstrated <ref type="bibr" target="#b6">[7]</ref>. As a consequence, most graph kernels are based on simpler patterns such as walks <ref type="bibr" target="#b5">[6]</ref>, trails <ref type="bibr" target="#b1">[2]</ref> or paths. Bags of patterns may be finite or infinite. Infinite bags, such as bags of walks, are computed implicitly using random walks defined on both graphs being compared <ref type="bibr" target="#b5">[6]</ref>. Bags of patterns may also be computed explicitly <ref type="bibr" target="#b1">[2]</ref> by enumerating all patterns of a given graph. Given two bags of patterns, the similarity between them is often measured using convolution kernels <ref type="bibr" target="#b3">[4]</ref>. Both bags may alternatively be considered as two sets in the feature space induced by the kernel between patterns; this point of view makes it possible to measure the similarity between bags using general kernels between sets. Finally, the similarity between patterns should be designed according both to the features available on each element of the pattern and to the structural noise which may corrupt patterns. 
For walks on labeled graphs, the well-known kernel <ref type="bibr" target="#b5">[6]</ref>, which is equal to 1 if both walks are identical and 0 otherwise, encodes the fact that labels are either equal or not comparable. The construction of a bag of patterns from a graph removes the connections between elementary patterns within the input graphs, which are then represented only by their bags. As previously mentioned, the assumption of bag-of-pattern kernels is that most of the relevant information about a graph is contained in its patterns, and thus that most of the information is preserved by the transformation from a graph to a bag. Bunke <ref type="bibr" target="#b0">[1]</ref> proposed several kernels based on the graph edit distance which do not rely on such an assumption.</p></div>
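To make the notion of a bag of patterns concrete, the following sketch explicitly enumerates the bag of trails (up to a bounded number of edges) of a small labeled graph, in the spirit of the explicit enumeration of [2]. The adjacency-list encoding, the toy molecule fragment and all names are illustrative assumptions, not taken from any of the compared methods.

```python
# Toy labeled graph as adjacency lists: node -> [(neighbor, edge_label)].
# The fragment and its labels are invented for illustration.
graph = {
    "C1": [("C2", "single"), ("O1", "double")],
    "C2": [("C1", "single"), ("N1", "single")],
    "O1": [("C1", "double")],
    "N1": [("C2", "single")],
}

def bag_of_trails(graph, max_edges):
    """Enumerate all trails (walks with distinct edges) using at most
    max_edges edges; trivial single-node trails are included."""
    bag = []

    def extend(trail, used_edges):
        bag.append(tuple(trail))
        if len(used_edges) == max_edges:
            return
        last = trail[-1]
        for nbr, _lbl in graph[last]:
            edge = frozenset((last, nbr))
            if edge not in used_edges:      # a trail never reuses an edge
                extend(trail + [nbr], used_edges | {edge})

    for start in graph:
        extend([start], frozenset())
    return bag

bag = bag_of_trails(graph, max_edges=2)
print(len(bag))  # 14 trails: 4 trivial, 6 with one edge, 4 with two edges
```

Restricting to trails rather than walks is what prevents the tottering effect mentioned below: a walk could oscillate forever over a single bond, while each edge can appear at most once per trail here.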
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Methods Description</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Symbolic Learning</head><p>The Symbolic Learning Method presented in this section (hereafter called SLM) is inspired by Quinlan's FOIL algorithm, particularized for the Attributed Relational Graph (ARG) representation. An extension of this representation has been introduced for describing prototypes of Attributed Relational Graphs; these graphs, called Generalized Attributed Relational Graphs (GARGs), contain generalized nodes, edges and attributes. A learning algorithm has then been formulated to build such prototypes by means of a set of operations directly defined on graphs. The algorithm preserves the generality of the prototypes generated by classical machine learning algorithms and moreover, similarly to most machine learning systems (<ref type="bibr" target="#b7">[8]</ref>), the prototypes obtained by this system are consistent, i.e. each prototype covers samples of a single class.</p><p>An ARG can be defined as a 6-tuple (N, E, A N , A E , a N , a E ), where N and E ⊂ N × N are, respectively, the sets of nodes and edges of the ARG, A N and A E the sets of node and edge attributes and, finally, a N and a E the functions which associate to each node or edge of the graph the corresponding attributes.</p><p>A GARG is used for representing a prototype of a set of ARGs. In order to allow a GARG to match a set of possibly different ARGs, the attribute definition has been extended in the SLM. First of all, the set of types of node and edge attributes is extended with the special type Φ, carrying no parameter and allowed to match any attribute type, ignoring the attribute parameters. We say that a GARG G * = (N * , E * , A N , A E , a N , a E ) covers a sample G, and we use the notation G * ⊲ G (the symbol ⊲ denotes the relation called covering), if there is a mapping µ : N * → N such that µ is a monomorphism and the attributes of the nodes and of the edges of G * are compatible with the corresponding ones of G. 
The first condition requires that each primitive and each relation in the prototype is also present in the sample; note that the converse does not hold. The latter condition constrains the monomorphism to be consistent with the attributes of the prototype and of the sample, in the sense that the type of each attribute of the prototype must be equal either to the special type Φ or to the type of the corresponding attribute of the sample. In the latter case, all the parameters of the attribute, which are actually sets of values, must contain the value of the corresponding parameter of the sample. The goal of the learning algorithm can be stated as follows: there is a (possibly infinite) set S * of all the patterns that may occur, partitioned into C different classes S * 1 , . . . , S * C , with S * i ∩ S * j = ∅ for i ≠ j; the algorithm is given a finite subset S ⊂ S * (training set) of labeled patterns (S = S 1 ∪ • • • ∪ S C with S i = S ∩ S * i ), from which it tries to find a sequence of prototype graphs G * 1 , G * 2 , . . . , G * p , each labeled with a class identifier, such that:</p><formula xml:id="formula_0">∀G ∈ S * ∃i : G * i ⊲ G (completeness of the prototype set) (1) ∀G ∈ S * , G * i ⊲ G ⇒ class(G) = class(G * i ) (consistency of the prototype set) (2)</formula><p>where class(G) and class(G * ) refer to the class associated with sample G and prototype G * , respectively. Of course, this is an ideal goal, since only a finite subset of S * is available to the algorithm; in practice, the algorithm can only demonstrate that completeness and consistency hold for the samples in S. On the other hand, (1) dictates that, in order to get as close as possible to the ideal case, the generated prototypes should be able to model samples not found in S. 
However, they should not be too general, otherwise (2) will not be satisfied.</p><p>A description of the learning algorithm follows: the algorithm starts with an empty list L of prototypes and tries to cover the training set by successively adding consistent prototypes. When a new prototype is found, the samples covered by it are eliminated and the process continues on the remaining samples of the training set. A sample is then compared sequentially against the prototypes, in the same order in which they have been generated, and it is attributed to the class of the first prototype that covers it. In this way, each prototype implicitly entails the condition that the sample is not covered by any previous prototype. The algorithm fails if no consistent prototype covering the remaining samples can be found. It is worth noting that the test of consistency in the algorithm actually checks whether the prototype is almost consistent, i.e. almost all the samples covered by G * belong to the same class:</p><formula xml:id="formula_1">Consistent(G * ) ⇔ max i (|S i (G * )| / |S(G * )|) ≥ θ<label>(3)</label></formula><p>where S(G * ) denotes the set of all the samples of the training set covered by a prototype G * , S i (G * ) the samples of class i covered by G * , and θ is a threshold close to 1. For further details about the representation and the algorithm, the reader can refer to <ref type="bibr" target="#b2">[3]</ref>. A heuristic function H is introduced for evaluating how promising a provisional prototype is. The heuristic function used in the algorithm is the product of H compl and H cons :</p><formula xml:id="formula_2">H = H compl (S, G * ) • H cons (S, G * ) (<label>4</label></formula><formula xml:id="formula_3">)</formula><p>and it is based on an estimation of the consistency and completeness of the prototype. 
The use of H cons drives the algorithm towards consistent prototypes, while the term H compl is introduced in order to favor general prototypes over prototypes which, albeit consistent, cover only a small number of samples. See Figure <ref type="figure" target="#fig_0">1</ref> for an example of a GARG G * that covers the graphs G 1 and G 2 representing two chemical compounds.</p></div>
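The almost-consistency test of equation (3) reduces to a frequency check over the class labels of the training samples covered by a candidate prototype. A minimal sketch, assuming labels are plain strings and using an illustrative default of 0.95 for the threshold θ:

```python
def almost_consistent(covered_labels, theta=0.95):
    """Test of eq. (3): a prototype is (almost) consistent when its
    dominant class accounts for at least a fraction theta of the
    training samples it covers."""
    if not covered_labels:
        return False
    counts = {}
    for label in covered_labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / len(covered_labels) >= theta

# A prototype covering 19 toxic and 1 non-toxic sample passes (19/20 = 0.95);
# one covering 3 toxic and 2 non-toxic samples does not (3/5 = 0.6).
print(almost_consistent(["toxic"] * 19 + ["non-toxic"]))        # True
print(almost_consistent(["toxic"] * 3 + ["non-toxic"] * 2))     # False
```

With θ = 1 this degenerates to the strict consistency requirement of equation (2) restricted to the training set; values slightly below 1 tolerate a few mislabeled or atypical covered samples.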
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Graph Kernels</head><p>In this section we introduce two kinds of Graph Kernels: the Bag of trails Kernel and the Laplacian Kernel.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Bag of trails Kernel</head><p>Let us consider a graph G = (V, E) where V denotes the set of vertices and E ⊂ V × V the set of edges. We define a simple graph as a graph with no multiple edges between two nodes and no loop (an edge linking a node with itself). We define a walk as an alternating sequence of nodes and edges, a trail as a walk with distinct edges and a path as a trail with distinct nodes. The description of a graph by a bag of walks is penalized by a tottering effect, since long walks may remain on a few edges, thereby describing only small parts of a graph. Bags of trails drastically reduce this tottering effect and may capture non-linear features, such as cycles, which are not described by open paths. We thus use bags of trails in the remaining part of this paper. The cardinality of a bag T is denoted by |T |, and the kernel between trails by K trail .</p><p>Suard <ref type="bibr" target="#b15">[16]</ref> has proposed numerous kernels for bags of paths which can easily be extended to bags of trails. We propose to extend the mean kernel to bags of trails. By considering bags of trails as sets and given a trail kernel K trail , we can design a convolution kernel, called the mean kernel, as follows:</p><formula xml:id="formula_4">K mean (T 1 , T 2 ) = (1/|T 1 |)(1/|T 2 |) Σ t∈T 1 Σ t ′ ∈T 2 K trail (t, t ′ ) <label>(5)</label></formula><p>where T 1 and T 2 denote two bags of trails. Following Haussler <ref type="bibr" target="#b3">[4]</ref>, this kernel is positive definite on the bag-of-trails domain if and only if K trail is positive definite on the trail domain.</p><p>However, the mean kernel compares all the trails of both bags without considering their relevance within the bags. As a consequence, irrelevant trails may corrupt the overall value of the kernel through the averaging performed by the mean kernel. Recently, Dupé et al. <ref type="bibr" target="#b1">[2]</ref> proposed to limit this effect by using a relevance measure on trails. Let T 1 and T 2 be two bags of trails; the weighted mean kernel between these two bags is then defined as:</p><formula xml:id="formula_6">K weighted (T 1 , T 2 ) = (1/|T 1 |)(1/|T 2 |) Σ t∈T 1 Σ t ′ ∈T 2 &lt;R T 1 (t), R T 2 (t ′ )&gt;^d K trail (t, t ′ ) <label>(6)</label></formula><p>where d ∈ R + weights the importance of the relevance measure and R T (t) denotes the relevance of trail t according to the bag T . This relevance measure is defined as the similarity (the kernel) between the input trail and the mean trail of the bag:</p><formula xml:id="formula_7">R T (t) = (1/|T |) Σ t ′ ∈T K trail (t, t ′ ).</formula><p>The kernel defined by equation (6) is a convolution kernel and is therefore positive definite if K trail is positive definite.</p><p>The above bag-of-trails kernels are based on a trail kernel K trail . A kernel between two trails t = (v 1 , e 1 , . . . , v n ) and t ′ = (v ′ 1 , e ′ 1 , . . . , v ′ p ) can be derived from the path kernel proposed by <ref type="bibr">Kashima [6]</ref>. This kernel is defined as 0 if the two trails do not have the same length, and as follows otherwise:</p><formula xml:id="formula_8">K trail (t, t ′ ) = δ(ϕ(v 1 ), ϕ(v ′ 1 )) Π i=2..|t| δ(ψ(e v i−1 v i ), ψ(e v ′ i−1 v ′ i )) δ(ϕ(v i ), ϕ(v ′ i )),<label>(7)</label></formula><p>where ϕ(v) and ψ(e) denote respectively the labels of vertex v and edge e, while δ is equal to 1 if both its arguments are equal and 0 otherwise. The kernel K trail is positive definite as a tensor product of positive definite kernels.</p></div>
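A minimal sketch of the trail kernel of equation (7) and the mean kernel of equation (5), under the illustrative assumption that a trail is encoded as a flat tuple alternating node and edge labels:

```python
def delta(a, b):
    """1 if both labels are equal, 0 otherwise (the Dirac kernel)."""
    return 1.0 if a == b else 0.0

def k_trail(t1, t2):
    """Trail kernel of eq. (7): zero for trails of different lengths,
    otherwise a product of label matches along the two trails.
    A trail is a tuple alternating node and edge labels: (v1, e1, ..., vn)."""
    if len(t1) != len(t2):
        return 0.0
    k = 1.0
    for a, b in zip(t1, t2):    # compares node labels and edge labels in turn
        k *= delta(a, b)
        if k == 0.0:
            return 0.0          # early exit: one mismatch kills the product
    return k

def k_mean(bag1, bag2):
    """Mean kernel of eq. (5): average of k_trail over all pairs of trails."""
    total = sum(k_trail(t, u) for t in bag1 for u in bag2)
    return total / (len(bag1) * len(bag2))

# Two toy bags of one-edge trails (labels invented for illustration).
bag_a = [("C", "single", "C"), ("C", "double", "O")]
bag_b = [("C", "single", "C"), ("C", "single", "N")]
print(k_mean(bag_a, bag_b))  # one matching pair out of four: 0.25
```

Because each factor is a Dirac kernel, the product is itself either 0 or 1, which is what makes labels "either equal or not comparable" as noted in section 2.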
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Laplacian Kernel</head><p>The graph edit distance between two graphs is defined as the minimal number of vertex and edge removals, additions and relabelings required to transform one graph into another <ref type="bibr" target="#b8">[9]</ref>. The exact computation of the edit distance has a complexity exponential in the number of vertices. Fortunately, Riesen and Bunke <ref type="bibr" target="#b10">[11]</ref> proposed a heuristic to efficiently approximate the edit distance between two graphs. Unfortunately, the edit distance does not define a metric and, as a consequence, kernels directly based on it do not define valid kernels.</p><p>Let us consider a set of input graphs {G 1 , . . . , G n } defining our graph test database. Given a kernel k, the Gram matrix K associated to this database is an n × n matrix defined by K i,j = k(G i , G j ). As noted by Steinke <ref type="bibr" target="#b14">[15]</ref>, the inverse of K (or its pseudo-inverse if K is not invertible) may be considered as a regularization operator on the set of vectors of dimension n. Conversely, the inverse (or pseudo-inverse) of any positive definite regularization operator may be considered as a kernel. More precisely, within an interpolation or classification scheme, the optimal mapping f * of a real value to each graph of {G 1 , . . . , G n }, which interpolates a vector of known values y while minimizing the norm induced by K in R n , is obtained by solving the following minimization problem:</p><formula xml:id="formula_9">f * = arg min f ∈R n CLoss(f, y, K) + f t K −1 f</formula><p>where CLoss(•, •, •) denotes a given loss function encoding the distance between the vector f and the vector of known values y.</p><p>One scheme to design a kernel thus consists in first building a positive definite regularization operator and then taking its inverse (or pseudo-inverse) to obtain a kernel. 
Having an interpolation problem in mind, we would like our regularization operator to penalize different values of f * mapped onto graphs with a small edit distance. We thus consider the Laplacian operator defined as follows: given the set of graphs {G 1 , . . . , G n }, we first consider the n × n adjacency matrix W i,j = e −d(G i ,G j )/σ , where σ is a tuning variable and d(•, •) denotes the edit distance <ref type="bibr" target="#b10">[11]</ref>. The normalized Laplacian of {G 1 , . . . , G n } is then defined as</p><formula xml:id="formula_10">L = I − D −1/2 W D −1/2</formula><p>where D is a diagonal matrix defined by D i,i = Σ j=1..n W i,j . Well-known results from spectral graph theory establish that L is a symmetric, positive semi-definite matrix whose eigenvalues belong to [0, 2]. Unfortunately, the Laplacian is also well known for being non-invertible: being only positive semi-definite, it cannot be inverted directly. Moreover, the pseudo-inverse of the Laplacian induces numerical instabilities which do not lead to a reliable kernel. Therefore, following Smola <ref type="bibr" target="#b13">[14]</ref>, we rather regularize the spectrum of L. The regularized version of L, denoted L̃, is defined as:</p><formula xml:id="formula_11">L̃ = Σ i=1..n r(λ i ) u i u t i (<label>8</label></formula><formula xml:id="formula_12">)</formula><p>where {λ i , u i } denotes the eigensystem of L and r(•) is a regularization function.</p><p>Many regularization functions may be used; however, we restricted our choice to the Regularized Laplacian (<ref type="bibr" target="#b13">[14]</ref>): r(λ) = 1 + ηλ, where η is a tuning variable. The regularized Laplacian L̃ is invertible and its inverse K = L̃ −1 is taken as a kernel.</p></div>
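The whole construction fits in a few lines of NumPy: adjacency matrix W from pairwise edit distances, normalized Laplacian, spectral regularization with r(λ) = 1 + ηλ, then inversion. In the sketch below the distance matrix and the default values of σ and η are illustrative; an actual application would fill the matrix with (approximate) edit distances as in [11].

```python
import numpy as np

def laplacian_kernel(dist, sigma=1.0, eta=1.0):
    """Kernel from a symmetric pairwise-distance matrix, following the
    construction above. Since r(lambda) = 1 + eta*lambda >= 1 for the
    eigenvalues of the normalized Laplacian, the regularized Laplacian
    is always invertible."""
    W = np.exp(-dist / sigma)                 # W_ij = exp(-d(Gi, Gj) / sigma)
    d_isqrt = 1.0 / np.sqrt(W.sum(axis=1))    # diagonal of D^{-1/2}
    L = np.eye(len(dist)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    lam, U = np.linalg.eigh(L)                # L is symmetric, PSD
    L_reg = (U * (1.0 + eta * lam)) @ U.T     # U diag(r(lambda)) U^T
    return np.linalg.inv(L_reg)               # K = inverse of L~

# Toy symmetric "edit distance" matrix between 3 graphs (values invented).
dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 3.0],
                 [4.0, 3.0, 0.0]])
K = laplacian_kernel(dist)
```

The resulting matrix K can then be handed to any kernel machine, e.g. an SVM with a precomputed kernel, which is how the classification experiments of the next section use graph kernels.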
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experimental Evaluation</head><p>The application used for the comparison between the two approaches is the Predictive Toxicology Challenge.  <ref type="table" target="#tab_1">1</ref>), since the reactions of mice and rats to carcinogenic compounds differ significantly. In this paper we consider all the provided datasets using, for both methods, the following representation: each chemical compound is represented by a graph G(V, E, α, β), where V is the set of atoms, E is the set of edges representing bonds, α is the node labeling function that associates an atom symbol to each node of the graph, and β is the edge labeling function that associates a bond multiplicity to each edge of the graph. The split into training and test sets was made directly by the PTC committee; therefore, no cross-validation in the training phase is allowed. </p></div>
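A minimal sketch of this encoding of a compound as a labeled graph G(V, E, α, β); the atoms and bond multiplicities below are invented for illustration:

```python
# Hypothetical encoding of a small compound as a labeled graph G(V, E, alpha, beta):
# alpha maps nodes to atom symbols, beta maps edges to bond multiplicities.
V = [0, 1, 2]                        # three atoms
E = [(0, 1), (1, 2)]                 # two bonds
alpha = {0: "C", 1: "C", 2: "O"}     # node labeling: atom symbols
beta = {(0, 1): 1, (1, 2): 2}        # edge labeling: single and double bond

def atoms_of(symbol):
    """All nodes of the graph carrying a given atom symbol."""
    return [v for v in V if alpha[v] == symbol]

print(atoms_of("C"))   # [0, 1]
```

This is exactly the information consumed by both approaches: the SLM searches for covering monomorphisms over (V, E, α, β), while the graph kernels compare the node and edge labels along trails or through the edit distance.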
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Evaluation and Comparison</head><p>In Table <ref type="table" target="#tab_2">2</ref> we report the comparative results obtained on the FM, MM, FR and MR datasets by the two different approaches (the classification by Graph Kernels is performed by means of an SVM classifier).</p><p>The three methods (two of them Graph Kernel methods) achieve comparable results. However, the time required in the training phase by the Symbolic Learning method is very high.</p><p>We can conclude that in problems with strict time constraints the Graph Kernel method is very effective, while Symbolic Learning can be used only on smaller problems or where time constraints are looser. On the other hand, Symbolic Learning generates explicit, declarative knowledge about the classes in the form of GARGs that can be understood with little effort by a human expert: a direct inspection of such explicit prototypes can be sufficient to validate them, or possibly to delete badly formed prototypes that could be detrimental to classification performance. The Graph Kernel approach, instead, is able to perform a classification but not to explain the criteria that guided its choice. In conclusion, the results confirm that there is no way to decide which is the best method, because both approaches have their peculiar virtues and deficiencies.</p><p>In our view, the two approaches should not be seen as competitors; instead, only their cooperative integration can provide us with more powerful tools for knowledge exploitation. A first step toward the integration of the two methodologies could be their combined use in a multi-expert system <ref type="bibr" target="#b4">[5]</ref>. As widely demonstrated in the literature, an accurate selection of the combining criteria allows us to significantly improve the results obtained by the single experts, by retaining their strengths while overcoming their weaknesses. 
We made a preliminary evaluation in this direction: the results show that the two systems exhibit a certain degree of orthogonality in their errors: that is, some of the samples misrecognized by one of the algorithms are correctly classified by the other one.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>In this paper we presented a comparison between two approaches, Graph Kernels and Symbolic Learning, within a classification scheme. Our experiments indicate that the two approaches are comparable in terms of accuracy. Graph kernel methods require less execution time but do not provide an explanation of the classification criteria; Symbolic Learning, on the other hand, has the reverse advantages and drawbacks. The two methods are thus clearly complementary.</p><p>A future step, as discussed in the previous section, could be an integration of the two approaches in order to improve the results obtained by each of them separately.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. An example of a GARG covering two graphs. The example is extracted from the application domain used in the experimental phase. The parts of the two graphs covered by G * are shown in bold</figDesc><graphic coords="5,69.70,240.74,346.23,124.02" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>4.1 The Application Domain and the Test Sets. Nowadays, almost every human being in industrialized societies is exposed to chemical compounds whose long-term effects are not evident. The carcinogenicity of such compounds, for instance, is often unknown. The US National Toxicology Program (NTP)5 conducted several experimentations on the effects of many chemical compounds on mice and rats, but such empirical results are expensive and very slow to obtain. Hence the need for carcinogenicity models able to generate reliable toxicity predictions for other compounds. The Predictive Toxicology Challenge (PTC)6 was initiated with this aim: it enabled participants to submit machine learning programs which, from the large training datasets of the NTP, provide carcinogenicity predictions for the compounds in the test sets. According to the Structure Activity Relationship (SAR) hypothesis, which assumes that the activity of a molecule can be entirely determined by its structure, the only information used for the toxicity prediction of a molecule is related to its chemical structure. Four distinct training and test sets were provided (see Table</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>Number of chemical compounds in the sets used for the experimentation. The chemical compounds are separated into 4 sets according to the reactions of: Female Mice (FM), Male Mice (MM), Female Rats (FR) and Male Rats (MR)</figDesc><table><row><cell>Dataset</cell><cell>FM MM FR MR</cell></row><row><cell>Training Set (Cancerogenic)</cell><cell>124 123 140 124</cell></row><row><cell>Training Set (Non-Cancerogenic)</cell><cell>78 67 62 70</cell></row><row><cell>Test Set (Cancerogenic)</cell><cell>43 36 63 55</cell></row><row><cell>Test Set (Non-Cancerogenic)</cell><cell>35 28 35 50</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>Accuracy and elapsed time (of the training phase) for the two methods on the FM, MM, FR and MR datasets</figDesc><table><row><cell>Method</cell><cell>FM</cell><cell>MM</cell><cell>FR</cell><cell>MR</cell><cell>Elapsed Time</cell></row><row><cell>Symbolic Learning</cell><cell>60.4%</cell><cell>52.1%</cell><cell>62.0%</cell><cell>62.0%</cell><cell>∼ 2000 min</cell></row><row><cell>Bag of trails Kernel</cell><cell>61.2%</cell><cell>50.4%</cell><cell>62.8%</cell><cell>65.6%</cell><cell>∼ 2 min</cell></row><row><cell>Laplacian Kernel</cell><cell>59.2%</cell><cell>55.2%</cell><cell>57.7%</cell><cell>60.9%</cell><cell>∼ 1 min 30 sec</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_0">National Toxicology Program: http://ntp.niehs.nih.gov/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_1">http://www.predictive-toxicology.org/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgments. We would like to express our thanks to Alice Kijewski and David Lemaresquier for their support in the experimental phase.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A family of novel graph kernels for structural pattern recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Bunke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Riesen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">XII Iberoamerican Congress on Pattern Recognition</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="20" to="31" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Tree covering within a graph kernel framework for shape classification</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">X</forename><surname>Dupé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Brun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">XV ICIAP</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Symbolic vs connectionist learning: an experimental comparison in a structured domain</title>
		<author>
			<persName><forename type="first">P</forename><surname>Foggia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Genna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vento</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="176" to="195" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Convolution kernels on discrete structures</title>
		<author>
			<persName><forename type="first">D</forename><surname>Haussler</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science, University of California at Santa Cruz</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Tech. rep.</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">On the use of classification reliability for improving performance of the one-per-class decomposition method</title>
		<author>
			<persName><forename type="first">G</forename><surname>Iannello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Percannella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sansone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Soda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data &amp; Knowledge Engineering</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="page" from="1398" to="1410" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Marginalized kernel between labeled graphs</title>
		<author>
			<persName><forename type="first">H</forename><surname>Kashima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tsuda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Inokuchi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Twentieth International Conference on Machine Learning</title>
				<meeting>of the Twentieth International Conference on Machine Learning</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Graph kernels based on tree patterns for molecules</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mahé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Vert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">75</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="3" to="35" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Pattern recognition as rule-guided inductive inference</title>
		<author>
			<persName><forename type="first">R</forename><surname>Michalski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on PAMI</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="349" to="361" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Bridging the Gap Between Graph Edit Distance and Kernel Machines</title>
		<author>
			<persName><forename type="first">M</forename><surname>Neuhaus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bunke</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2007">2007</date>
			<publisher>World Scientific Publishing Co., Inc</publisher>
			<pubPlace>River Edge, NJ, USA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Shape recognition by integrating structural descriptions and geometrical/statistical transforms</title>
		<author>
			<persName><forename type="first">H</forename><surname>Nishida</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CVIU</title>
		<imprint>
			<biblScope unit="volume">64</biblScope>
			<biblScope unit="page" from="248" to="262" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Approximate graph edit distance computation by means of bipartite graph matching</title>
		<author>
			<persName><forename type="first">K</forename><surname>Riesen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bunke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Image Vision Computing</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="950" to="959" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A shape analysis model with applications to a character recognition system</title>
		<author>
			<persName><forename type="first">J</forename><surname>Rocha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pavlidis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on PAMI</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="393" to="404" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Efficient graphlet kernels for large graph comparison</title>
		<author>
			<persName><forename type="first">N</forename><surname>Shervashidze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Vishwanathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Petri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mehlhorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Borgwardt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Twelfth International Conference on Artificial Intelligence and Statistics</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Kernels and regularization on graphs</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Smola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">I</forename><surname>Kondor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">XV Annual Conference on Learning Theory</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="144" to="158" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Kernels, regularization and differential equations</title>
		<author>
			<persName><forename type="first">F</forename><surname>Steinke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schölkopf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="3271" to="3286" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Kernel on bag of paths for measuring similarity of shapes</title>
		<author>
			<persName><forename type="first">F</forename><surname>Suard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rakotomamonjy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bensrhair</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Symposium on Artificial Neural Networks</title>
				<meeting><address><addrLine>Bruges, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007-04">April 2007</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
