=Paper= {{Paper |id=Vol-205/paper-4 |storemode=property |title=Graph Retrieval with the Suffix Tree Model |pdfUrl=https://ceur-ws.org/Vol-205/paper4.pdf |volume=Vol-205 }} ==Graph Retrieval with the Suffix Tree Model== https://ceur-ws.org/Vol-205/paper4.pdf
       Graph Retrieval with the Suffix Tree Model
                   Mathias Lux1 and Sven Meyer zu Eissen2 and Michael Granitzer3


Abstract. The paper in hand presents an adoption of the             2. The operationalization of the retrieval functionality.
suffix tree model for the retrieval of labeled graphs. The suffix
tree model encodes path information of graphs in an efficient         The second challenge restricts the flexibility in formulating
way and so reduces the size of the data structures compared         a similarity function: ϕ must not be expensive to evaluate in
to path index based approaches, while offering a better run-        terms of runtime complexity since in our case a user waits
time performance than subgraph isomorphism based meth-              actively for retrieval results.
ods. Within a specific use case we evaluate the correlation of
the developed method to human judgement and compare the             2     RELATED WORK
correlation values to other methods. We show that in our use
case, which is the retrieval of digital photos annotated with       Although maximum common subgraph isomorphism is a nat-
MPEG-7 using the MPEG-7 Semantic Description Scheme,                ural starting point for graph similarity computation (see [3]),
the presented algorithm performs better than other methods.         it cannot be applied to our scenario: First, the question if
                                                                    two graphs G and H contain an isomorphic subgraph whose
                                                                    edge set has more than k ∈ N elements is NP-complete (see
1    INTRODUCTION                                                   [6]). Second, quantifying similarity using ratios of subgraph
Let G = hV, Ei be a graph, where V denotes the node set and         edge set sizes solely may not reflect our problem, since edge
E ⊆ V × V denotes the edge set. Given a query graph Gq and          label matches can be of different importance, depending on
a graph set G, graph retrieval deals with the task to identify      the value of an edge label.
a subset R ⊆ G with the property                                       For this and similar reasons, graph retrieval algorithms are
                                                                    tailored to the requirements of the underlying use case. For
                    ∀G ∈ R : ϕ(Gq , G) ≥ t                          example, Fonseca et al. used graph invariants of trees—in
                                                                    this specific case the eigenvalues of the tree’s and subtree’s
where ϕ : G × G → R denotes a similarity function and t ∈ R
                                                                    adjacency matrix—to identify relevant cliparts represented
is a minimum similarity threshold.
                                                                    as trees, representing adjacency and inclusion of color areas
   The research question how to search similar graphs in a
                                                                    within the cliparts, in a database (see [5],[12]).
database was already prescribed in a work by Simmons in
                                                                       Zong et al. (see [18]) retrieved labeled graphs using an index
1966 (see [13]), in which he matched conceptual graphs. Since
                                                                    in which the labels of paths up to a certain length were stored.
then, different applications areas emerged; they include query-
                                                                    The relevance between a query graph and a graph from the
ing chemical graph databases that store molecular structures,
                                                                    database was computed from a TF*IDF-like similarity mea-
retrieving vector and raster images using characteristics en-
                                                                    sure that was applied to the edge labels.
coded in a graph, and recently, searching in semantically en-
                                                                       Berreti et al. (see [2]) extracted information on neighbour-
riched data in the context of semantic Web applications.
                                                                    ing colour regions from raster images, which was encoded in
   Our application scenario relates to multimedia retrieval
                                                                    directed labeled graphs. To retrieve similar images a graph
with the MPEG-7 standard, where metadata are represented
                                                                    database was queried employing a tailored metric, which
as graphs: A user formulates his or her information need in the
                                                                    proved as slow but highly configurable.
form of a graph, which is then matched against an MPEG-7
graph database G.
   A property of MPEG-7 graphs is that their nodes and edges        2.1    Contribution
are labeled with text, say, for each G ∈ G there exists a func-
                                                                    Text retrieval methods based on the vector space model, es-
tion lE : E → TE as well as lV : V → TV , where TE , TV are
                                                                    pecially those using inverted lists as described in [1], have
term sets. The goal is to retrieve graphs that match both, the
                                                                    been applied to graph retrieval before: A graph’s labels form
query graph’s structure as well as the labels. The challenges
                                                                    a virtual document; likewise, the query graph’s labels are used
in this connection are twofold:
                                                                    to construct a query document. The similarity between these
1. The statement of a similarity function ϕ that reflects the       documents is computed using the vector space model along
   application scenario, and                                        with standard similarity measures like TF*IDF or BM-25.
                                                                       Unlike traditional vector space approaches our proposed
1 University of Technology Graz, Knowledge Management Insti-        method employs the suffix tree model, described in [8]. Its
  tute, Austria, email: mathias.lux@tugraz.at
2  Bauhaus University Weimar, Germany, email: sven.meyer-zu-        advantage is that similarity computations incorporate word
  eissen@medien.uni-weimar.de                                       order within sentences and text fragments. Applied to the out-
3 Know-Center Graz, Austria, email: mgrani@know-center.at
                                                                    lined MPEG-7 retrieval scenario, this property is especially
useful when matching labels in a graph’s paths, yielding to         similarity values while keeping the computational complexity
better similarity values like the respective experiments show.      bounded by a linear function. In this connection, knowledge
                                                                    about suffix trees is necessary prerequisite; some details are
                                                                    summarized in the next section.
3    APPLICATION SCENARIOS
The specification of semantics often follows a graph modeling       4.1    Suffix Trees
approach; the pioneering work of Sowa (see [14]) is one of
many examples. Similarity search in this and related contexts       The ith suffix of a document d = w1 . . . wm is the substring
reduces to graph retrieval.                                         of d that starts with word wi . A suffix tree of d is a labeled
   Currently a trend towards a semantically enriched Web can        tree that contains each suffix of d along a path whose edges
be noted. This movement started with the vision of a semantic       are labeled with the respective words. The construction of a
Web by Berners-Lee (see e.g. foreword in [4]) and resulted in       suffix tree is straightforward: The ith suffix of d is inserted by
the definition of a syntax for semantics, formally defined in an    checking whether some edge emanating from the root node is
ontology language based on the Resource Description Frame-          labeled with wi . If so, this edge is traversed and it is checked
work (RDF), which uses a model based on directed labeled            whether some edge of the successor node is labeled with wi+1 ,
graphs.                                                             and so on. If, in some depth k, a node n without a matching
   Another initiative, aimed at an interoperable standards for      edge is reached, a new node is created and linked to node n
multimedia data, is the Moving Picture Expert Group, in             with an edge labeled with wi+k .
short MPEG. Within their Multimedia Content Description                Figure 1 illustrates a the suffix tree in which the documents
Interface, short name MPEG-7, they defined a way to seman-          d1 =“Boy plays chess” and d2 =“Boy plays bridge too” have
tically describe the contents of multimedia files by intercon-      been inserted.
necting semantic objects (e.g. agents, places, and so on) by
typed semantic relations (see [7] for more details), which again                                         ϕ
results in directed labeled graphs that encode semantics.




                                                                                                                               bri
   All of the above mentioned scenarios model semantics with                                 s




                                                                                                                                dg
                                                                                          lay




                                                                                                                       chess
                                                                                        yp                                              to




                                                                                                            ys




                                                                                                                                 et
directed labeled graphs. While the same edge label can be                             bo                                                   o




                                                                                                         pla




                                                                                                                                   oo
used more than once within a graph, we assume that node
labels are unique within a graph as defined in MPEG-7, RDF                    chess   bridge     chess       bridge
                                                                                       too                    too
and conceptual graphs.

                                                                                                                    "boy plays chess"
4    APPLYING THE SUFFIX TREE                                                                                       "boy plays bridge too"
     MODEL TO GRAPH RETRIEVAL
                                                                    Figure 1. A suffix tree in which the documents d1 =“Boy plays
Information retrieval methods that have been used in the               chess” and d2 =“Boy plays bridge too” have been inserted.
past for graph retrieval have in common that they transform
database graphs Gi ∈ G as well as query graphs Gq to docu-
ments di and dq , respectively, which are then compared using
their vector space model representations in combination with
a related similarity measure like the cosine similarity. Here,      4.2    Path-based Graph Suffix Trees
the documents consist of sentences, which are made up of            Let di denote the document that is associated with Gi , and
node and edge label concatenations from paths in the corre-         likewise, let dq denote the document that is associated with
sponding graphs. This methodology raises two questions:             Gq . Both, di and dq consist of “sentences”, which are concate-
                                                                    nations of path labels from selected paths from Gi and Gq ,
1. Which paths of a graph should be used for the construction
                                                                    following a heuristic mentioned above.
   of di and dq ?
                                                                       A natural similarity measure between di and dq arises when
2. Which retrieval methodology should be chosen for query
                                                                    inserting each suffix from each sentence of di and dq into an
   matching?
                                                                    initially empty suffix tree GS = hVS , ES i. Let Ei ⊆ ES denote
                                                                    the set of the edges that have been traversed when all suffixes
   With respect to point (1), some heuristics have been pro-
                                                                    of di ’s sentences have been inserted into GS , and analogously,
posed. One prominent method is discussed in connection with
                                                                    let Eq ⊆ ES denote the traversed edge set for all sentences’
GraphGrep (see [11]). The paths of a graph are extracted ei-
                                                                    suffixes from dq . The similarity between di and dq can be
ther by identifying all paths in a graph up to a certain length,
                                                                    measured by how many edges Ei and Eq have in common,
e.g. with a depth first or breadth first search starting from
                                                                    e.g. quantified by the Jaccard coefficient:
each vertex (see e.g. [15]), or by identifying frequent substruc-
tures within the graphs (see e.g. [17] or [16]).                                                                 | Ei ∩ E q |
   The focus of our research refers to point (2). Known graph                          ϕS (Gi , Gq ) =
                                                                                                                 | E i ∪ Eq |
retrieval methods that rely on the vector space model disre-
gard term order or include only partial term order informa-            Furthermore in [8] two more weighting schemes using term
tion when using n-grams for indexing. In the following, a sim-      frequency and inverse document frequency of edges, are de-
ilarity measure is presented that tackles the aforementioned        scribed to enhance relevance and precision. For similarity cal-
problem; it compiles full path label order information into the     culation of graphs such a weighting can be applied.
  In addition to the two original weighting schemes a third
                                                                        1,000
scheme relying solely on IDF can be introduced. Stripping
                                                                        0,900
the term frequency from the original weighting formula, a




                                                                                                                          1
                                                                                                                    0, 6
                                                                                                                   0, 2

                                                                                                                       79
                                                                                                                      78
                                                                                                                      78
                                                                                                         0, 9


                                                                                                               3
                                                                                                         0, 3



                                                                                                                   0,
                                                                        0,800




                                                                                                             73


                                                                                                           73
similarity measure can be defined as follows:




                                                                                                           73
                                                                                0, 9




                                                                                                          0,
                                                                                0, 5
                                                                                     4
                                                                                   68
                                                                                  68
                                                                                  68




                                                                                               0, 5
                                                                                               0, 4


                                                                                                     4
                                                                                0,




                                                                                                  65
                                                                                                  65


                                                                                                  65
                                                                        0,700




                                                                                               0,
                                                                        0,600
                             1   X                                                                                            no relations
      ϕidf (Gi , Gq ) =            traversed(e) · IDF (e)               0,500                                                 undirected r.
                          | ES |                                                                                              full r.
                               e∈ES                                     0,400

                                                                       0,300
                                        0    e∈/ E i ∩ Eq
          with traversed (e) =                                          0,200
                                        1    e ∈ E i ∩ Eq
                                                                        0,100

  Here, IDF : E → R is defined to be the inverse docu-                  0,000
                                           n                                    no weighting      TF      TF*IDF      IDF
ment frequency function, IDF (e) = log( S(e)  ), with n being
the total number of documents and S : E → N denoting the
function that delivers the number of distinct documents that      Figure 3.       Evaluation of the Suffix Tree Metric in correlation to
traversed a given edge on insertion into the suffix tree.                                    human judgement


5   EVALUATION                                                    of the evaluation of the suffix tree model based metrics is
                                                                  shown in figure 3. With each weighting scheme three different
Although the presented suffix tree model for graphs can be        strategies for building the tree are evaluated: A first approach
applied to arbitrary graphs with node and edge labels, the        is to build the tree without taking the edge labels into account
evaluation was done within a multimedia retrieval scenario:       (shown as option no relations in figure 3), so only the sequence
Using MPEG-7, the Multimedia Content Description Inter-           of node labels is inserted into the tree. A second approach is
face, multimedia documents can be annotated using graphs          to normalize all relation labels without taking their directions
expressing the semantics of the multimedia document. This         into account (shown as option undirected r. in figure 3). This
particular functionality of MPEG-7 is defined in the Semantic     can be done by ignoring all direction information on edges.
Description Scheme (see [7] for details on MPEG-7).               The third option is to use the full paths including node and
                                                                  edge labels (shown as option full r. in figure 3).
                   Mathias                   Graz
                                                                     As can be seen easily the suffix tree model cannot provide
                  agentOf                   locationOf            an optimal approximation of human judgement with any of
                                                                  the presented weighting schemes. With no weighting schema a
                              Talking                             rounded maximum correlation value of 0.689 can be achieved.
                  patientOf                 patientOf             With the term frequency weighting, which was proposed in
                                                                  the original publications the correlation value even gets worse.
                    Sven                    Michael               The inverse document frequency (IDF) weighting proposed in
                                                                  this publication offers the best correlation with a maximum
   Figure 2. Illustration of an MPEG-7 based annotation           value of 0.791 taking all node and edge information (labels
expressing that Mathias is talking to Sven and Michael in Graz.   and direction) into account.
                                                                     Besides the above introduced suffix tree similarity measure
                                                                  for graphs following similarity measures from text and graph
   Within this scenario two graphs, like the one shown in fig-    retrieval were compared to human judgement:
ure 2, can be compared and a similarity value can be obtained.
Based on the used mechanism for similarity calculation dif-       1. Vector space based on node and edge labels, cosine co-
ferent results are achieved. Our evaluation aims to identify         efficient as similarity measure with following weighting
the most semantic method (in terms of human judgement)               schemes. This metric does not take the structure of the
for similarity calculation of MPEG-7 based annotations.              graph into account, the set of labels is treated as text doc-
   To evaluate the semantics of candidate similarity measure a       ument:
test set of 96 manually annotated digital photos was used. In
essence for all photos a labeled directed graph exists, which      (a) without weighting scheme (Text VS in fig. 4)
describes the semantics of the image by specifying persons,        (b) TF*IDF (Text VS TF*IDF in fig. 4)
time points, locations and events as nodes and interconnect-
                                                                   (c) BM25 (Text VS BM25 in fig. 4, see [9] and [10] for details
ing these nodes by labeled edges, like shown in figure 2. The
                                                                       on BM25)
graphs have a median number of nodes of 5.81, with a medium
number of 5.99 edges. From this test data set 20 photo pairs      2. Vector space with graph paths as terms, cosine coefficient
were identified, which were used to create a questionnaire. The      as similarity measure with following weighting schemes:
participants of the evaluation were asked to rate the pair-        (a) TF*IDF on paths with one arc (VS IDF Triple in fig. 4)
wise similarity of the photos. The averaged similarity from            and full length paths (VS IDF Paths in fig. 4)
the participants answers was correlated to the results of the
                                                                   (b) BM25 on paths with one arc (VS BM25 Triple in fig. 4)
candidate similarity measures.
                                                                       and full length paths (VS BM25 Paths in fig. 4)
   After initial evaluations of 18 and 15 participants a final
evaluation with 112 participants was carried out. The results     3. Maximum common subgraph metric from [3] (MCS in fig.
   4)                                                                                                                                              ACKNOWLEDGEMENTS
4. Error correcting subgraph isomorphism metric from [2]
   with boolean edge label distance functions and two options                                                                                      The Know-Center is funded by the Austrian Competence
   for used node label distance functions:                                                                                                         Center program K plus under the auspices of the Aus-
 (a) Boolean distance function (Berretti (Bool) in fig. 4)                                                                                         trian Ministry of Transport, Innovation and Technology
                                                                                                                                                   (http://www.ffg.at/index.php?cid=95) and by the State of
 (b) Term vector distance function (Berretti (VS) in fig. 4)                                                                                       Styria.

       1,000

       0,900
                                                                                                                                                   REFERENCES
                                         0,788                   0,774                                              0,782        0,786     0,791
       0,800                                         0,754               0,744                          0,740
                                                                                            0,675
                                                                                                                                                    [1] Ricardo A. Baeza-Yates and Berthier Ribeiro-Neto, Modern
       0,700
                                                                                  0,620                                                                 Information Retrieval, Addison-Wesley Longman Publishing
       0,600
                 0,518                                                                                                                                  Co., Inc., 1999.
                               0,489
       0,500
                                                                                                                                                    [2] S. Berretti, A. Del Bimbo, and P. Pala, ‘A graph edit distance
       0,400                                                                                                                                            based on node merging’, in Image and Video Retrieval: Third
       0,300                                                                                                                                            International Conference, CIVR 2004, volume 3115 of LNCS,
       0,200                                                                                                                                            pp. 464–472, Dublin, Ireland, (July 21-23 2004). Springer.
       0,100                                                                                                                                        [3] Horst Bunke and Kim Shearer, ‘A graph distance metric
       0,000
                                                                                                                                                        based on the maximal common subgraph’, Pattern Recogni-
                                                                                                                                                        tion Letters, 19(3-4), 255–259, (1998).
                                                                                 S
                           25



                                          le




                                                                                            l)
                                                                         s
                                                      s




                                                                                                        )




                                                                                                                                           .
                  F




                                                                 le




                                                                                                                                 r.
                                                                                                                  s




                                                                                                                                           r
                                                                        th




                                                                                                       S
                                                   th
                 D




                                                                              C




                                                                                                                    n
                                                                                          oo
                                       ip




                                                              ip




                                                                                                     (V




                                                                                                                                        ll
                          BM




                                                                                                                            ed
                                                                             M




                                                                                                                tio
              *I




                                                                      Pa
                                                Pa
                                    Tr




                                                                                                                                      fu
                                                          Tr




                                                                                        (B




                                                                                                                                                    [4] Dieter Fensel, James A. Hendler, and Henry Lieberman, Spin-
            TF




                                                                                                                            ct
                                                                                                             la
                                                                                                   ti
                      S




                                                                                                                                    F
                                25



                                               25




                                                                     F
                                                          F




                                                                                                 et




                                                                                                                          re
                                                                                                           re
                                                                                       ti




                                                                                                                                  ID
                                                                   ID
                      V




                                                        ID
        S




                                                                                     et



                                                                                               r
                               BM



                                         BM




                                                                                                                        di
       V


                 xt




                                                                                            er


                                                                                                        no
                                                                                    r




                                                                                                                                 T
                                                                 S
                                                    S




                                                                                                                  un




                                                                                                                                                        ning the Semantic Web Bringing the World Wide Web to Its
                                                                                 er
       xt


               Te




                                                                                            B




                                                                                                                                 S
                                                               V
                                                    V
                           S



                                       S




                                                                                                     F
                                                                                 B
     Te




                                                                                                                F
                          V



                                     V




                                                                                                   ID


                                                                                                              ID
                                                                                                  T




                                                                                                                                                        Full Potential, MIT Press, 2005.
                                                                                                            T
                                                                                                 S


                                                                                                           S




                                                                                                                                                    [5] Manuel J. Fonseca, B. Barroso, and Joaquim A. Jorge, ‘Re-
Figure 4.          Evaluation of different distance functions and metrics                                                                               trieving clipart images by content’, in Image and Video Re-
                  using the correlation to human judgement                                                                                              trieval: Third International Conference, CIVR 2004, volume
                                                                                                                                                        3115 of LNCS, pp. 500–507, Dublin, Ireland, (July 21-23
                                                                                                                                                        2004). Springer.
                                                                                                                                                    [6] Michael R. Garey and David S. Johnson, Computers and In-
   The evaluation results in figure 4 show that the suffix tree
                                                                                                                                                        tractability, W.H. Freeman and Company, New York, 1979.
model with proposed inverse document frequency weighting                                                                                            [7] Harald Kosch, Distributed Multimedia Database Technolo-
offers the best correlation to human judgement in the pre-                                                                                              gies, CRC Press, Nov. 2003.
sented domain. However the VS BM25 Triple metric offers a                                                                                           [8] Sven Meyer zu Eissen, Benno Stein, and Martin Potthast,
nearly as high correlation value. The two variants of the error                                                                                         ‘The suffix tree document model revisited’, in Proceedings
                                                                                                                                                        of the I-Know ’05 5th International Conference on Knowl-
correcting subgraph isomorphism metric of [2] do not perform                                                                                            edge Management, pp. 596–603, Graz, Austria, (July 2005).
as good as the other candidates. All evaluated text based sim-                                                                                          J.UCS.
ilarity and distance measures, which do not take the structure                                                                                      [9] S. E. Robertson and S. Walker, ‘Some simple effective approx-
in to account, do not correlate well with human judgement.                                                                                              imations to the 2-poisson model for probabilistic weighted re-
                                                                                                                                                        trieval’, in SIGIR ’94: Proceedings of the 17th annual interna-
                                                                                                                                                        tional ACM SIGIR conference on Research and development
6    CONCLUSION                                                                                                                                         in information retrieval, pp. 232–241, New York, NY, USA,
                                                                                                                                                        (1994). Springer-Verlag New York, Inc.
As can be seen easily from the evaluation similarity measures,                                                                                     [10] Stephen Robertson, Hugo Zaragoza, and Michael Taylor,
which take the structure information of the graphs into ac-                                                                                             ‘Simple bm25 extension to multiple weighted fields’, in CIKM
count, are superior to the tested text retrieval mechanisms,                                                                                            ’04: Proceedings of the thirteenth ACM international confer-
                                                                                                                                                        ence on Information and knowledge management, pp. 42–49,
which use node and edge labels for retrieval. The suffix tree                                                                                           New York, NY, USA, (2004). ACM Press.
method has a slightly better correlation coefficient and there-                                                                                    [11] Dennis Shasha, Jason T. L. Wang, and Rosalba Giugno, ‘Al-
fore reflects human judgement better than the other meth-                                                                                               gorithmics and applications of tree and graph searching’, in
ods. However the difference to the vector space method is                                                                                               PODS ’02: Proceedings of the twenty-first ACM SIGMOD-
marginal, which justifies for example the usage of an path                                                                                              SIGACT-SIGART symposium on Principles of database sys-
                                                                                                                                                        tems, pp. 39–52. ACM Press, (2002).
index for graph retrieval. One possible explanation why the                                                                                        [12] Ali Shokoufandeh, Sven J. Dickinson, K. Siddiqi, and S.W.
triple based VS approach performs that good is that in the                                                                                              Zucker, ‘Indexing using a spectral encoding of topological
inspected domain all node labels are unique within a single                                                                                             structure’, in Conference on Computer Vision and Pattern
graph.                                                                                                                                                  Recognition, IEEE Computer Society, volume 2, pp. 491–497,
                                                                                                                                                        USA, (June 1999).
   The most interesting point is, that methods adapted from                                                                                        [13] R. F. Simmons, ‘Storage and retrieval of aspects of meaning
text retrieval perform better than the evaluated methods de-                                                                                            in directed graph structures’, Commun. ACM, 9(3), 211–215,
veloped for graphs, like MCS and the algorithm of Berretti et                                                                                           (1966).
al. described in [2] on the used test data set. However the num-                                                                                   [14] John F. Sowa, ‘Semantics of conceptual graphs’, in Proceed-
ber of photos in the set is too small for general conclusions,                                                                                          ings of the 17th annual meeting on Association for Computa-
                                                                                                                                                        tional Linguistics, pp. 39–44, Morristown, NJ, USA, (1979).
but as no test data sets for semantic annotations currently ex-                                                                                         Association for Computational Linguistics.
ist, the creation of semantic annotations for multimedia docu-                                                                                     [15] Gabriel Valiente, Algorithms on Trees and Graphs, Springer,
ments is a laborous task and the usefulness of random graphs                                                                                            Berlin, Germany, September 2002.
for evaluation is limited in this domain, an evaluation with a                                                                                     [16] Takashi Washio and Hiroshi Motoda, ‘State of the art of
                                                                                                                                                        graph-based data mining’, SIGKDD Explor. Newsl., 5(1), 59–
bigger data set was out of scope of the project. Nevertheless                                                                                           68, (2003).
the presented evaluation provides a starting point for further                                                                                     [17] Xifeng Yan, Philip S. Yu, and Jiawei Han, ‘Graph indexing:
investigations.                                                                                                                                         a frequent structure-based approach’, in SIGMOD ’04: Pro-
     ceedings of the 2004 ACM SIGMOD international conference
     on Management of data, pp. 335–346. ACM Press, (2004).
[18] Jiwei Zhong, Haiping Zhu, Jianming Li, and Yong Yu, ‘Con-
     ceptual graph matching for semantic search’, in ICCS ’02:
     Proceedings of the 10th International Conference on Concep-
     tual Structures, pp. 92–196, London, UK, (2002). Springer-
     Verlag.