Lazy associative graph classification

Yury Kashnitsky and Sergei O. Kuznetsov

National Research University Higher School of Economics, Moscow, Russia
{ykashnitsky, skuznetsov}@hse.ru

Abstract. In this paper, we introduce a modification of lazy associative classification which addresses the graph classification problem. To deal with intersections of large graphs, graph intersections are approximated with all common subgraphs up to a fixed size, similarly to what is done with graphlet kernels. We illustrate the algorithm with a toy example and describe our experiments with a predictive toxicology dataset.

Keywords: graph classification, graphlets, formal concept analysis, pattern structures, lazy associative classification

1 Introduction

Classification methods for data given by graphs usually reduce the initial graphs to numeric representations and then use standard classification approaches, such as SVM [1], nearest neighbors with graph kernels [2], graph boosting [3], etc. By doing so, one usually constructs numeric attributes corresponding to subgraphs of the initial graphs or computes graph kernels, which are typically also based on the number of common subgraphs of a special type. In this paper, we suggest an approach based on weak classifiers in the form of association rules [4] applied in a “lazy” way: to avoid exponential explosion, not all association rules are computed, but only those relevant to the objects to be classified. Lazy classification is well studied experimentally [5]; here we extend the approach to graphs and propose a uniform theoretical framework (based on pattern structures [6]) which can be applied to arbitrary kinds of descriptions. We show in a series of experiments with data from the Predictive Toxicology Challenge (PTC [7]) that our approach outperforms learning models based on SVM with graphlet kernel [8] and kNN with graphlet-based distance.

The rest of the paper is organized as follows. In Section 2, we give the main definitions concerning labeled graphs, pattern structures, and lazy associative classification. In Section 3, we consider a toy example. In Section 4, we discuss the results of computational experiments on the PTC dataset. In Section 5, we draw conclusions and discuss directions of further research.

2 Main definitions

In this section, we give the definitions of the main concepts used in the paper.

2.1 Labeled graphs and isomorphism

First, we recall some standard definitions related to labeled graphs; see, e.g., [9,10,11]. An undirected graph is a pair G = (V, E), where V is referred to as the set of nodes of the graph, and E, a set of unordered pairs {v, u} with v, u ∈ V, is called the set of edges; pairs of the form {v, v} are called loops and form the subset E0 ⊆ E. If E0 = ∅, then G is called a graph without loops. A graph H = (VH, EH) is called a subgraph of a graph G = (VG, EG) if all nodes and edges of H are at the same time nodes and edges of G, i.e., VH ⊆ VG and EH ⊆ EG. A graph H = (VH, EH) is called an induced subgraph of a graph G = (VG, EG) if H is a subgraph of G and the edges of H comprise all edges of G with both endpoints belonging to VH.
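The distinction between a subgraph and an induced subgraph is easy to check programmatically. The following is a minimal sketch using the third-party networkx library, which is our illustrative choice and is not used in the paper itself:

```python
import networkx as nx

# G is a triangle on nodes {1, 2, 3}
G = nx.Graph([(1, 2), (2, 3), (1, 3)])

# A subgraph may drop edges between retained nodes:
# H keeps all three nodes but only two of the edges.
H = nx.Graph([(1, 2), (2, 3)])

# The induced subgraph on {1, 2, 3} must keep ALL edges of G
# with both endpoints in the node set, i.e., the whole triangle.
H_ind = G.subgraph([1, 2, 3])

print(sorted(H.edges()))      # [(1, 2), (2, 3)]         -- subgraph, not induced
print(sorted(H_ind.edges()))  # [(1, 2), (1, 3), (2, 3)] -- induced subgraph
```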
Given sets of nodes V, node labels LV, edges E, and edge labels LE, a labeled graph is defined by a quadruple G = ((V, lv), (E, le)) such that

– lv ⊆ V × LV is the relation that associates nodes with labels, i.e., lv is a set of pairs (vi, li) such that node vi has label li;
– le ⊆ V × V × LE is the relation that associates edges with labels, i.e., le is a set of triples (vi, vj, lij) such that edge (vi, vj) has label lij.

Example 1. A molecule structure can be represented by a labeled graph.

[Figure: the molecular graph defined below, with nodes 1–6 labeled NH2, CH3, C, C, OH, Cl.]

Here V = {1, 2, 3, 4, 5, 6}, E = {(1, 3), (2, 3), (3, 4), (4, 5), (4, 6)}, lv = {(1, NH2), (2, CH3), (3, C), (4, C), (5, OH), (6, Cl)}, le = {(1, 3, 1), (2, 3, 1), (3, 4, 2), (4, 5, 1), (4, 6, 1)}, where edge type 1 corresponds to a single bond (e.g., NH2−C) and edge type 2 to a double bond (e.g., C=C).

A labeled graph G1 = ((V1, lv1), (E1, le1)) dominates a labeled graph G2 = ((V2, lv2), (E2, le2)), written G2 ≤ G1 (G2 is a subgraph of G1), with respect to a given order ≤ (e.g., natural, lexicographic) on node and edge labels, if there exists an injection ϕ : V2 → V1 that

– respects edges: (v, w) ∈ E2 ⇒ (ϕ(v), ϕ(w)) ∈ E1;
– fits under labels: lv2(v) ≤ lv1(ϕ(v)) and (v, w) ∈ E2 ⇒ le2(v, w) ≤ le1(ϕ(v), ϕ(w)).

Two labeled graphs G1 and G2 are called isomorphic (G1 ≃ G2) if G1 ≤ G2 and G2 ≤ G1.

Example 2. [Figure: two six-node molecular graphs G1 and G2 over labels CH3, OH, NH2, Cl, C.] G1 ≃ G2, since the injection ϕ : V2 = {1, …, 6} → V1 = {1, …, 6} mapping the nodes (1, 2, 3, 4, 5, 6) of G2 to the nodes (6, 5, 4, 3, 1, 2) of G1 satisfies the definition of graph dominance in both directions.

An injective function f : V → V′ is called a subgraph isomorphism from G to G′ if there exists a subgraph S ≤ G′ such that f is a graph isomorphism from G to S, i.e., G ≃ S.

Example 3. [Figure: a five-node molecular graph G1 and a six-node molecular graph G2.] G1 is subgraph-isomorphic to G2.

Given labeled graphs G1 and G2, the set G1 ⊓ G2 = {G | G ≤ G1, G2, and there is no common subgraph G∗ ≤ G1, G2 with G∗ > G} is called the set of maximal common subgraphs of graphs G1 and G2. We also refer to G1 ⊓ G2 as the intersection of graphs G1 and G2, and to ⊓ as the similarity operator defined on graphs.

Example 4. [Figure: the intersection of two molecular graphs; the result is a set of two maximal common subgraphs.]

For sets of graphs G = {G1, …, Gk} and H = {H1, …, Hn}, the similarity operator is defined in the following way:

G ⊓ H = MAX≤ {Gi ⊓ Hj | Gi ∈ G, Hj ∈ H}.

Given sets of labeled graphs G1 and G2, we say that a set of graphs G1 is subsumed by a set of graphs G2, or G1 ⊑ G2, if G1 ⊓ G2 = G1.

2.2 Graphlets

Definition 1. A labeled graph g is called a k-graphlet of a labeled graph G if g is a connected induced subgraph of graph G with k nodes [12].

Definition 2. A set of labeled graphs G^k is called a k-graphlet representation of a labeled graph G if every g ∈ G^k is a k-graphlet of G and the graphlets in G^k are unique up to subgraph isomorphism, i.e., for all g1, g2 ∈ G^k with g1 ≠ g2 one does not have g1 ≤ g2.

Definition 3. The k-graphlet distribution of a labeled graph G is the set {(gi, ni)}, where gi is a k-graphlet of G and ni is the number of k-graphlets of G isomorphic to gi.

Example 5. [Figure: two benzene-ring molecules: G1 carries a CH3 group and G2 carries two OH groups; the remaining ring positions are occupied by H.]

G1^3 = {C−C=C, C−C−H, C=C−H, C−C−C} and G2^3 = {C−C=C, C−C−H, C=C−H, C−C−O, C=C−O, C−O−H} are the 3-graphlet representations of graphs G1 and G2, respectively (with benzene rings comprised of carbon atoms C).
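To make Definitions 1–3 concrete, here is a minimal sketch of k-graphlet extraction. It assumes the third-party networkx library, stores node and edge labels in a "label" attribute, and uses exact label matching; the library, the attribute name, and the function name are our illustrative choices, not part of the paper:

```python
import itertools
import networkx as nx
from networkx.algorithms.isomorphism import (categorical_node_match,
                                             categorical_edge_match)

def k_graphlet_representation(G, k):
    """All connected induced k-node subgraphs of G, kept up to
    label-respecting isomorphism (Definitions 1 and 2)."""
    nm = categorical_node_match("label", None)
    em = categorical_edge_match("label", None)
    graphlets = []
    for nodes in itertools.combinations(G.nodes, k):
        H = G.subgraph(nodes)            # induced subgraph on these k nodes
        if not nx.is_connected(H):
            continue                     # not a graphlet: must be connected
        if not any(nx.is_isomorphic(H, g, node_match=nm, edge_match=em)
                   for g in graphlets):
            graphlets.append(H)          # a new graphlet up to isomorphism
    return graphlets

# The molecule of Example 1: single bonds labeled 1, the double bond labeled 2.
G = nx.Graph()
for v, lab in [(1, "NH2"), (2, "CH3"), (3, "C"), (4, "C"), (5, "OH"), (6, "Cl")]:
    G.add_node(v, label=lab)
for u, v, lab in [(1, 3, 1), (2, 3, 1), (3, 4, 2), (4, 5, 1), (4, 6, 1)]:
    G.add_edge(u, v, label=lab)

for g in k_graphlet_representation(G, 3):
    print(sorted(g.nodes), [g.nodes[n]["label"] for n in sorted(g.nodes)])
```

The k-graphlet distribution (Definition 3) can be obtained in the same loop by counting, for each representative, how many connected induced subgraphs are isomorphic to it.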
The 3-graphlet distributions of graphs G1 and G2 are given in Table 1.

Table 1. 3-graphlet distributions of graphs G1 and G2 (benzene rings are comprised of carbon atoms C).

     | C−C=C | C−C−H | C=C−H | C−C−O | C=C−O | C−O−H | C−C−C
  G1 |   7   |   8   |   5   |   0   |   0   |   0   |   1
  G2 |   6   |   4   |   4   |   2   |   2   |   2   |   0

Graphlets were introduced in biomedicine, where they are used to compare real cellular networks with their models. It is easy to demonstrate that two networks are different by simply showing a short list of properties in which they differ; it is much harder to show that two networks are similar, as this requires demonstrating their similarity in all of their exponentially many properties [12]. The graphlet distribution serves as a measure of agreement of local network structure and was shown to express more structural information than other metrics such as centrality, local clustering coefficient, degree distribution, etc. In [12], all 30 graphlets with 2, 3, 4, and 5 nodes were considered (see https://parasol.tamu.edu/dreu2013/OLeary).

2.3 Pattern structures

Pattern structures are a natural extension of the ideas of Formal Concept Analysis [13], [6].

Definition 4. Let G be a set (of objects), let (D, ⊓) be a meet-semilattice (of all possible object descriptions), and let δ : G → D be a mapping between objects and descriptions. The set δ(G) := {δ(g) | g ∈ G} generates a complete subsemilattice (Dδ, ⊓) of (D, ⊓) if every subset X of δ(G) has an infimum ⊓X in (D, ⊓). A pattern structure is a triple (G, D, δ), where D = (D, ⊓), provided that the set δ(G) generates a complete subsemilattice (Dδ, ⊓) [6,11].

Definition 5. Patterns are elements of D. Patterns are naturally ordered by the subsumption relation ⊑: given c, d ∈ D, one has c ⊑ d ⇔ c ⊓ d = c. The operation ⊓ is also called a similarity operation.

A pattern structure (G, D, δ) gives rise to the following derivation operators (·)□:

  A□ = ⊓_{g∈A} δ(g)            for A ⊆ G,
  d□ = {g ∈ G | d ⊑ δ(g)}      for d ∈ D.

Pairs (A, d) satisfying A ⊆ G, d ∈ D, A□ = d, and A = d□ are called pattern concepts of (G, D, δ).

Example 6. Let {1, 2, 3} be a set of objects and {G1, G2, G3} the set of their descriptions (i.e., graph representations):

[Figure: three molecular graphs G1, G2, G3, each built around a C=C core with substituents among NH2, CH3, OH, and Cl.]

Let D be the set of all sets of labeled graphs and ⊓ the graph intersection operator, with D = (D, ⊓). The set of objects (graphs) {1, 2, 3}, their “descriptions” (i.e., the graphs themselves, δ(i) = {Gi}, i = 1, …, 3), and the similarity operator ⊓ comprise a pattern structure ({1, 2, 3}, D, δ).

{1, 2, 3}□ = {NH2−C=C}, because NH2−C=C is the only graph subgraph-isomorphic to all three graphs G1, G2, and G3. Likewise, {NH2−C=C}□ = {1, 2, 3}, because graphs G1, G2, and G3 all subsume the graph NH2−C=C. Similarly, {1, 2}□ = {CH3−C=C−NH2}, because CH3−C=C−NH2 is subgraph-isomorphic to G1 and G2 but not to G3, and {CH3−C=C−NH2}□ = {1, 2}, because only graphs G1 and G2 subsume CH3−C=C−NH2 while G3 does not.

The set of all pattern concepts of this pattern structure is

({1, 2, 3}, {NH2−C=C}), ({1, 2}, {CH3−C=C−NH2}), ({1, 3}, ·), ({2, 3}, ·), (1, {G1}), (2, {G2}), (3, {G3}), (∅, {G1, G2, G3}),

where the intents of the concepts with extents {1, 3} and {2, 3} are the corresponding maximal common subgraphs (shown in the original figure).

For some pattern structures (e.g., pattern structures on sets of graphs with labeled nodes), even computing subsumption of patterns may be NP-hard. Hence, for practical situations one needs approximation tools, which would replace the patterns with simpler ones, even if that results in some loss of information. To this end, we use a contractive, monotone, and idempotent mapping ψ : D → D that replaces each pattern d ∈ D by ψ(d), so that the pattern structure (G, D, δ) is replaced by (G, D, ψ ◦ δ). Under some natural algebraic requirements that hold for all natural projections in the particular pattern structures we studied in applications (see [11]), the meet operation ⊓ is preserved: ψ(X ⊓ Y) = ψ(X) ⊓ ψ(Y). This property of a projection allows one to relate premises in the original representation with those approximated by a projection. In this paper, we utilize projections to introduce graphlet-based classification rules.
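To make the derivation operators (·)□ introduced above concrete, here is a minimal set-based sketch: descriptions are modeled as frozensets of (hashable codes of) graphs, so that ⊓ becomes set intersection and ⊑ becomes ⊆. This is a simplification of the graph case, and the graphlet codes below are our own illustrative stand-ins for the graphs of Example 6:

```python
# Descriptions as frozensets of graph codes; ⊓ is set intersection, ⊑ is ⊆.
delta = {
    1: frozenset({"NH2-C=C", "CH3-C=C-NH2"}),
    2: frozenset({"NH2-C=C", "CH3-C=C-NH2", "C=C-OH"}),
    3: frozenset({"NH2-C=C", "C=C-OH"}),
}
G = set(delta)

def extent_box(A):
    """A□ = ⊓_{g in A} δ(g): the common description of the objects in A.
    (A is assumed non-empty here; the empty extent needs the unit of ⊓.)"""
    return frozenset.intersection(*(delta[g] for g in A))

def intent_box(d):
    """d□ = {g in G | d ⊑ δ(g)}: all objects whose descriptions subsume d."""
    return {g for g in G if d <= delta[g]}

d = extent_box({1, 2})   # frozenset({'NH2-C=C', 'CH3-C=C-NH2'})
print(intent_box(d))     # {1, 2}: with d this forms the pattern concept ({1, 2}, d)
```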
2.4 Lazy associative classification

Consider a binary classification problem with a set of positive examples G+, a set of negative examples G−, a set of test examples Gtest, and a pattern structure (G+ ∪ G−, D, δ) defined on the training set.

Definition 6. A pattern h ∈ D is a positive premise iff [11]

  h□ ∩ G− = ∅   and   h□ ∩ G+ ≠ ∅.

In other words, a positive premise is a pattern subsumed by the least general generalization of the descriptions of some positive examples which is not contained in (does not cover) the description of any negative example. A negative premise is defined similarly.

Various classification schemes using premises are possible; as an example, consider the following simplest scheme from [6]: if the description δ(g) of an undetermined example g contains a positive premise h, i.e., h ⊑ δ(g), then g is classified positively. Negative classifications are defined similarly. If δ(g) contains premises of both signs, or if δ(g) contains no premise at all, then the classification is contradictory or undetermined, respectively, and some probabilistic techniques allowing for a certain tolerance should be applied.

Definition 7. A class association rule (CAR) [5] for a binary classification problem is an association rule of the form h → + or h → −, where h is a positive or a negative premise, respectively.

This definition means that for a binary graph classification problem, for instance, we can mine class association rules of the form {gi} → + or {gi} → −: if a test graph subsumes a subgraph gi that is common only to positive (respectively, negative) training examples, it is classified as positive (respectively, negative). We elaborate this idea in the next subsection. As there might be many such CARs, we can combine them into a single classification rule: for instance, we can count all positive and negative CARs applicable to each test object and classify it by majority voting. Of course, the idea is easily generalized to multi-class problems. The described classification schemes are explored in [5].

Another advantage of the lazy classification framework is that it is easily parallelized (see the sketch below). Suppose there are K processors. To classify an unlabeled object, we divide the training set into K separate subsets; for each subset, we intersect the descriptions of its labeled objects with the description of the object to be classified. After all unfalsified intersections are found, the classification phase proceeds, which involves voting based on those intersections.
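A minimal sketch of this parallelization, again over set-based descriptions (frozensets of graphlet codes, with ⊑ as ⊆). The function names, the K-way chunking scheme, and the use of the standard multiprocessing module are our illustrative choices; pos_descs is assumed to be a list of frozensets:

```python
from multiprocessing import Pool

def positive_votes(args):
    """Votes collected from one chunk of positive descriptions:
    an intersection with the test description is falsified if some
    negative description subsumes it; otherwise it votes with weight |h|."""
    test_desc, pos_chunk, neg_descs = args
    votes = 0
    for d in pos_chunk:
        h = test_desc & d
        if h and not any(h <= nd for nd in neg_descs):
            votes += len(h)
    return votes

def parallel_positive_score(test_desc, pos_descs, neg_descs, k=4):
    chunks = [pos_descs[i::k] for i in range(k)]  # split the training set K ways
    with Pool(k) as pool:
        return sum(pool.map(positive_votes,
                            [(test_desc, c, neg_descs) for c in chunks]))

# On some platforms this must be run under an `if __name__ == "__main__":` guard.
```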
2.5 Graphlet-based lazy associative classification

In this subsection, we combine the ideas of pattern structures and their projections, graphlets, and lazy associative classification, and introduce our algorithm. First, we recall the definition of the k-projection, which produces all subgraphs with at most k nodes.

Definition 8. Given a graph pattern structure (G, D, δ), we call

  ψk(G) = {Hi = ((Vi, lvi), (Ei, lei)) | Hi ≤ G, Hi is connected, |Vi| ≤ k},

defined on graph descriptions G, a k-projection. Obviously, this operator is a projection, i.e., a contractive, monotone, and idempotent function.

Definition 9. Given a graph pattern structure (G, D, δ), the k-graphlet derivation operator δk = ⋃_{1≤l≤k} (ψl ◦ δ) takes an object g described by graph G and produces all l-graphlets of G for l = 1, …, k.

Example 7. For object 1 with graph description G1 from Example 5, δ3(1) is the set of all 1-, 2-, and 3-graphlets of graph G1: δ3(1) = {C, H, C−C, C=C, C−H, C−C=C, C−C−H, C=C−H, C−C−C}. To clarify, here δ(1) = {G1} and δ3(1) = ψ3(δ(1)) = ψ3(G1) = {Hi = ((Vi, lvi), (Ei, lei)) | Hi ≤ G1, Hi is connected, |Vi| ≤ 3}.

Definition 10. Given the k-graphlet representations G1^k and G2^k of labeled graphs G1 and G2, the intersection G1^k ⊓ G2^k is called the k-graphlet intersection of G1 and G2. The resulting operator ⊓k is further called the k-graphlet similarity operator.

Example 8. For graphs 1 and 2 with graph descriptions G1 and G2 from Example 5, G1 ⊓3 G2 = {C, H, C−C, C=C, C−H, C−C=C, C−C−H, C=C−H} is the set of all common 1-, 2-, and 3-graphlets of graphs G1 and G2.

Here are the main steps of our algorithm (a set-based sketch follows the list):

1. All k-graphlet intersections of a test example Gtest with the positive training examples are computed: h+ = Gtest ⊓k G+;
2. Each intersection h+ is tested for subsumption by negative training examples. If some negative example subsumes h+, then this intersection is falsified. Otherwise, h+ gives a vote for positive classification of the test example Gtest;
3. The same procedure is carried out for each intersection of Gtest with the negative examples;
4. The test example Gtest is classified according to the weighted majority rule, where each unfalsified intersection is given a weight equal to its cardinality (the cardinality of the corresponding set of graphs).
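A minimal end-to-end sketch of steps 1–4, under the same simplification as before: every graph has already been replaced by the frozenset of (canonical codes of) its graphlets with up to k nodes, so ⊓k is set intersection and subsumption is ⊆. The canonical codes and the function name are our assumptions; the paper itself works with the graphs directly:

```python
def glac_classify(test_desc, pos_descs, neg_descs):
    """Graphlet-based lazy associative classification, steps 1-4."""
    score_pos = 0
    for d in pos_descs:                  # step 1: intersections with positives
        h = test_desc & d
        # step 2: h is falsified if some negative description subsumes it
        if h and not any(h <= nd for nd in neg_descs):
            score_pos += len(h)          # unfalsified: vote of weight |h|
    score_neg = 0
    for d in neg_descs:                  # step 3: same for the negatives
        h = test_desc & d
        if h and not any(h <= pd for pd in pos_descs):
            score_neg += len(h)
    if score_pos == score_neg:           # step 4: weighted majority rule
        return None                      # undetermined / contradictory
    return "+" if score_pos > score_neg else "-"
```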
3 A toy example

We illustrate the principle of our method with a toy example. Consider the following training and test sets comprised of molecular descriptions of toxic (G1−G4) and non-toxic (G5−G7) chemical compounds. The task is to build a discriminative classifier able to determine whether the objects of the test set (G8−G11) are toxic or not.

[Figure: positive (toxic) training examples G1−G4, negative (non-toxic) training examples G5−G7, and test examples G8−G11; each is a small molecular graph over node labels A−E built around a C=C core.]

The main steps of the algorithm described in the previous section are illustrated with Tables 2 and 3. First, we build the 3-graphlet intersections of test and training examples (we use only graphlets with 3 nodes for the purpose of illustration). Then, a “+” or “−” sign with the cardinality of the intersection is put in Table 3 if the intersection is not subsumed by any example of the opposite class; otherwise, the counter-example subsuming the intersection is given.

The 3-graphlet intersections of training and test examples are given in Table 2. For instance, graphs G1 and G8 have 4 common 3-graphlets: A−C−B, A−C=C, B−C=C, and C=C−D. In this simple case, we do not differentiate between single and double bonds (e.g., ACC here stands for A−C=C without ambiguity).

Table 2. All common 3-graphlets of test (G8−G11) and training examples.

     | G8                 | G9                 | G10           | G11
  G1 | ACB, ACC, BCC, CCD | ACC, BCC, CCD      | ACC, CCD      | ACB, ACC, BCC, CCD
  G2 | ACB, ACC, BCC, CCD | ACC, BCC, CCD      | ACC, CCD      | ACB, ACC, BCC, CCD
  G3 | ACB, ACC, BCC, CCE | ACC, BCC, CCE      | ACC, CCE      | ACB, ACC, BCC
  G4 | ACC, BCC, CCE      | ACC, BCC, BCE, CCE | ACC, CCE      | ACC, BCC
  G5 | ACC, CCD           | ACC, ACD, CCD      | ACC, ACD, CCD | ACC, ACD, CCD
  G6 | ACC, BCC, CCD, CCE | ACC, BCC, CCD, CCE | ACC, CCD, CCE | ACC, BCC, CCD
  G7 | BCC, CCD, CCE, DCE | BCC, CCD, CCE      | CCD, CCE, CDE | BCC, CCD

Table 3 summarizes the procedure. For instance, the “+4” sign for graphs G1 and G8 means that the common 3-graphlets of G1 and G8 (i.e., A−C−B, A−C=C, B−C=C, and C=C−D) are not altogether subgraph-isomorphic to any single negative example G5−G7. Thus, this intersection “gives a vote” of weight 4 (the cardinality of the mentioned set of graphlets) for positive classification of G8. On the contrary, all common 3-graphlets of G4 and G8 (A−C=C, B−C=C, and C=C−E) are altogether subgraph-isomorphic to the negative example G6; therefore, the intersection of G4 and G8 does not give a vote for positive classification of G8. As a result, molecules G8 and G11 are classified as toxic, while G9 and G10 are classified as non-toxic.

Table 3. Lazy classification table (a cell contains either the signed weight of an unfalsified intersection or the counter-example that falsifies it).

      | G1 | G2 | G3 | G4 | G5 | G6 | G7 | Score | Class
  G8  | +4 | +4 | +4 | G6 | G1 | −4 | −4 | 4:0   | +
  G9  | G6 | G6 | G6 | +4 | −3 | −4 | −3 | 0:6   | −
  G10 | G5 | G5 | G6 | G6 | −3 | −3 | −3 | 0:9   | −
  G11 | +4 | +4 | +3 | G6 | −3 | G1 | G1 | 8:0   | +

4 Experiments

The proposed algorithm was tested on the 2001 Predictive Toxicology Challenge dataset in comparison with SVM with graphlet kernel and k-Nearest-Neighbor with a graphlet-based Hamming distance. SVM classifiers are considered to be good benchmarks for the graph classification problem [8]. We implemented a Scikit-learn [14] version of the Support Vector Classifier with a graphlet kernel over graphlets having up to 5 nodes. We also adapted k-Nearest-Neighbor to the graph classification problem by defining a Hamming distance between two graphs: every graphlet occurring in exactly one of the two graphs contributes 1 to the distance, and every common graphlet contributes 0. For instance, for the two graphs from Example 5 and graphlets with up to 3 nodes, this distance is equal to 7: G1 subsumes the graphlet C−C−C not subsumed by G2, while G2 subsumes the graphlets {O, C−O, O−H, C−C−O, C=C−O, C−O−H} not subsumed by G1.
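Over the same set-based graphlet representations used in the earlier sketches, this distance is simply the size of the symmetric difference of the two graphlet sets; a minimal sketch (the function name is ours):

```python
def graphlet_hamming(desc1, desc2):
    """Count graphlets occurring in exactly one of the two graphs:
    each such graphlet contributes 1, each common graphlet contributes 0."""
    return len(desc1 ^ desc2)

# For Example 5 with graphlets of up to 3 nodes: one graphlet (C-C-C) is
# unique to G1 and six are unique to G2, so the distance is 1 + 6 = 7.
```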
The training set is comprised of 417 molecular graphs of chemical compounds, each marked as toxic or non-toxic for a particular sex-and-species group out of four possible groups: {mice, rats} × {male, female}. Thus, 4 separate sets were built: male rats (MR, 274 examples: 117 toxic, 157 non-toxic), male mice (MM, 266 examples: 94 positive, 172 negative), female rats (FR, 281 examples: 86 positive, 195 negative), and female mice (FM, 279 examples: 108 positive, 171 negative).

We ran 5-fold cross-validation for each group (MR, MM, FR, FM) and compared the classification metrics averaged over the folds. The results for the male rats group are presented in Table 4 (we obtained similar results for the other groups). The parameters of the SVM and kNN classifiers were tuned by grid-search cross-validation (see http://scikit-learn.org/stable/modules/grid_search.html). The “K nodes” parameter determines the maximum number of nodes in the graphlet representation of the graphs; i.e., when it is equal to 4, all graphs are approximated with their 4-graphlet representations, that is, with all graphlets (unique up to isomorphism) having up to 4 nodes.

Table 4. Experimental results for the male rats group. “GLAC” stands for graphlet-based lazy associative classification, “SVM” denotes Support Vector Machine with graphlet kernel, and “kNN” stands for k-Nearest-Neighbor with Hamming distance.

  Method | K nodes | Accuracy | Precision | Recall | F-score | Time (sec.)
  GLAC   |    2    |   0.36   |   0.32    |  0.33  |  0.32   |    5.78
         |    3    |   0.68   |   0.83    |  0.68  |  0.75   |   17.40
         |    4    |   0.59   |   0.57    |  0.62  |  0.59   |   65.72
         |    5    |   0.55   |   0.70    |  0.62  |  0.66   |  196.03
  SVM    |    2    |   0.45   |   0.15    |  0.33  |  0.21   |    1.54
         |    3    |   0.52   |   0.35    |  0.35  |  0.35   |    9.03
         |    4    |   0.41   |   0.27    |  0.28  |  0.28   |   61.31
         |    5    |   0.36   |   0.24    |  0.25  |  0.24   |  295.89
  kNN    |    2    |   0.45   |   0.15    |  0.33  |  0.21   |    3.35
         |    3    |   0.34   |   0.21    |  0.23  |  0.22   |   15.75
         |    4    |   0.48   |   0.31    |  0.32  |  0.31   |   73.38
         |    5    |   0.45   |   0.30    |  0.31  |  0.30   |  211.58

As we can observe, graphlet-based lazy associative classification performs reasonably with at least 3-graphlet descriptions. With 2-graphlet descriptions, the algorithm often refuses to classify test objects, because the 2-graphlet intersections of positive and test objects are falsified by negative objects and vice versa. 3-graphlet descriptions turn out to be optimal for this method, as the model is probably overfitted in the case of 4- and 5-graphlet descriptions.

5 Conclusion

In this paper, we have proposed an approach to graph classification based on the combination of graphlets, pattern structures, and lazy classification. The key principle of lazy classification is that one does not have to produce the whole set of classification rules in advance; instead, one generates only those rules needed to classify the current test object. The framework suits objects with complex structure, since the algorithm does not require a training phase.

We have carried out a number of experiments on molecule classification within the proposed lazy classification framework. We compared the classification performance of our method with that of SVM with graphlet kernel and kNN with graphlet-based distance. The reason for this choice is that SVM classifiers are considered to be good benchmarks for the graph classification problem, while kNN is a well-known lazy classification method.

In our experiments, graphlet-based lazy classification, following the same learning curve as the other methods, shows better classification performance than the classical methods on the molecule toxicology prediction problem. Further, we plan to investigate the overfitting problem for our algorithm, in particular the dependence of classification metrics on the number of nodes in the considered graphlets. Other types of descriptions and a parallel version of our algorithm are also promising directions of study.

References

1. Corinna Cortes and Vladimir Vapnik, “Support-Vector Networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, Sept. 1995.
2. S. V. N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, and Karsten M. Borgwardt, “Graph Kernels,” J. Mach. Learn. Res., vol. 11, pp. 1201–1242, Aug. 2010.
3. Hiroto Saigo, Sebastian Nowozin, Tadashi Kadowaki, Taku Kudo, and Koji Tsuda, “gBoost: a mathematical programming approach to graph classification and regression,” Machine Learning, vol. 75, no. 1, pp. 69–89, 2009.
4. Rakesh Agrawal and Ramakrishnan Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94), San Francisco, CA, USA, 1994, pp. 487–499, Morgan Kaufmann Publishers Inc.
5. Adriano Veloso, Wagner Meira Jr., and Mohammed J. Zaki, “Lazy Associative Classification,” in Proceedings of the Sixth International Conference on Data Mining (ICDM ’06), Washington, DC, USA, 2006, pp. 645–654, IEEE Computer Society.
6. Bernhard Ganter and Sergei O. Kuznetsov, “Pattern Structures and Their Projections,” in Conceptual Structures: Broadening the Base, Harry Delugach and Gerd Stumme, Eds., vol. 2120 of Lecture Notes in Computer Science, pp. 129–142, Springer, Berlin/Heidelberg, 2001.
7. Christoph Helma and Stefan Kramer, “A Survey of the Predictive Toxicology Challenge 2000–2001,” Bioinformatics, vol. 19, no. 10, pp. 1179–1182, 2003.
8. Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt, “Efficient graphlet kernels for large graph comparison,” Journal of Machine Learning Research – Proceedings Track, vol. 5, pp. 488–495, 2009.
9. Reinhard Diestel, Graph Theory (Graduate Texts in Mathematics), Springer, August 2005.
10. Horst Bunke and Kim Shearer, “A Graph Distance Metric Based on the Maximal Common Subgraph,” Pattern Recogn. Lett., vol. 19, no. 3–4, pp. 255–259, Mar. 1998.
11. Sergei O. Kuznetsov, “Scalable Knowledge Discovery in Complex Data with Pattern Structures,” in PReMI 2013, Pradipta Maji, Ashish Ghosh, M. Narasimha Murty, Kuntal Ghosh, and Sankar K. Pal, Eds., vol. 8251 of Lecture Notes in Computer Science, pp. 30–39, Springer, 2013.
12. Nataša Pržulj, “Biological network comparison using graphlet degree distribution,” Bioinformatics, vol. 23, no. 2, pp. e177–e183, 2007.
13. Bernhard Ganter and Rudolf Wille, Formal Concept Analysis: Mathematical Foundations, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1st edition, 1997.
14. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.