<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploiting Recurrent Retrieval Needs in Querying Heterogeneous and Dynamic Graph Dataspaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Barbara Catania</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco De Fino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanna Guerrini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Genoa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Querying data sources containing heterogeneous and dynamic information is a complex task: high quality answers need to be produced in a limited response time. Dynamic contexts preclude user involvement in interpreting the request and thus solutions should be devised, relying either on additional knowledge about the current execution (e.g., query context or user pro le) or previous executions of the same request [?]. The aim of this discussion paper is to exploit similar requests recurring over time for improving query processing, in terms of result quality and processing time, and to provide a general framework for representing and managing information collected in the executions of (sets of) recurring queries. Our approach relies on the enabling concept of Pro led Graph Query Pattern (PGQP), which represents a set of (previously executed) queries associated with information about their executions. Di erently from apparently similar proposals (materialized views, smart caching), the proposed approach approximately matches queries with pro led patterns with the aim of retrieving various kinds of information, related to past executions of similar queries, and improving the processing of the query at hand.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The last few years have been characterized by the growth of highly heterogeneous
and dynamic data sources, shared across the network. These sources usually
contain both structured and unstructured data, strongly correlated and often highly
dynamic. Such data are often exploited much below their potential due to
difculties in accessing them. Indeed, users can specify their request only vaguely
because the format and the structure of the data encoding the relevant
information is unknown. As an example, consider a smart city explorer, that is, a set of
information published by a municipality to users that may retrieve, e.g.,
information about attractions, points of interests, and shops. In this speci c context,
data may come from di erent, heterogeneous and very dynamic datasets. At
di erent instants of time, also user position or the environment itself (including
data) may be changed.</p>
      <p>Processing complex requests on diverse and dynamic sources requires: (i)
request interpretation; (ii) source selection; (iii) actual processing of the request
on the relevant, and usually big in size, sources. Besides being costly, the overall
process may not guarantee the user is satis ed by the obtained result due to
various problems: the request could be incorrectly interpreted or processed on
inaccurate, incomplete, unreliable data; additionally, processing time could be
inadequate to the query urgency. A common solution to reduce query processing
time is to rely on approximate processing approaches, which provide a faster
answer at the cost of a lower accuracy. On the other hand, user involvement in
the interpretation of the request [?], is not adequate in dynamic contexts, where
urgent requests hamper user interaction. In [?], we claimed that, to cope with
the di culties raised by the heterogeneity and dynamic nature of the considered
environments, user pro les and request contexts, data and processing quality,
and similar requests recurring over time could be exploited.</p>
      <p>In this paper, we focus on similar requests recurring over time for improving
approximate processing of requests, assumed to be already interpreted and
represented according to a graph-based formalism [?]. Recurring retrieval needs have
been considered in query processing for a very long time in terms of materialized
views [?,?]. The idea is to precompute the results of some recurring queries as
materialized views, select some of such views (view selection problem) and re-use
them (view-based query processing) for the execution of new requests.
Unfortunately, the usage of materialized views in the contexts described above su ers
from some problems: (i) views associate a result with a given query, however in
the reference contexts other or additional information could be of interest (e.g.,
the set of used data sources or, under pay-as-you-go integration approaches [6],
query-to-data mappings); (ii) view updates are very frequent in dynamic
environments, reducing the e ciency of the overall system; (iii) view-based query
processing techniques usually rely on a precise semantics while heterogeneity
tends to favour approximation-based approaches. Recurring retrieval needs are
also at the basis of smart caching approaches. Here the main issue concerns
the de nition of suitable cache replacement policies. Some of the proposed
approaches rely on approximate query matching [?,?] but, similarly to materialized
views, cached queries are usually associated with query results.</p>
      <p>The framework we propose aims at representing and managing recurring
queries in order to improve query processing, in terms of result quality and
processing time, getting over the limitations of current materialized views and
smart caching approaches by relying on the enabling concept of Pro led Graph
Query Pattern (PGQP). The idea is to take advantage of prior processing in
order to obtain shortcuts to di erent points in the query processing stack. The
approach we provide generalizes smart caching methods allowing various kinds
of information to be associated with cached queries in order to improve query
processing even further.</p>
      <p>The remainder of the paper is organized as follows. PGQPs are introduced in
Section ??. The main phases of the proposed framework are presented in Sections
?? and ??. Finally, Section ?? concludes by outlining some future developments.</p>
    </sec>
    <sec id="sec-2">
      <title>Pro led Graph Query Patterns</title>
      <p>The framework we propose is based on a graph-based representation of data
spaces and user retrieval needs [?]. In particular, for the sake of simplicity and for
guaranteeing a tractable combined and data complexity (fundamental
requirements for dealing with massive in size graph databases), we assume the data
space is represented in terms of a set of graph data sources, i.e., graphs where
nodes can also be labeled with variables, for representing situations in which the
node identity is not known [?], due to the heterogeneous nature of data and the
possible lack of information. Furthermore, we assume a graph query is speci ed
as a directed, labeled graph with variables belonging to the set of Unions of
Acyclic Conjunctive Queries (UACQ) [?].</p>
      <p>An example of a UACQ query is shown in Figure ??(a). The query
represents the following request: nd the authors of the gurative artworks the user
is watching (i.e., they are located close to her position), together with such
artworks, a biography of the authors, and information about places in Genoa
where the authors are currently exposing their artworks. Since the user request
is related to her current position, it needs to be interpreted and processed before
such position changes. Additionally, data sources, and, in particular, information
about exhibitions are dynamic due to frequent exhibit schedule updates.
For representing and managing (sets of) recurring user requests, the
proposed framework relies on the enabling concept of Pro led Graph Query Pattern
(PGQP), introduced in [?]. Each PGQP corresponds to a graph query,
associated with data or metadata related to its past processing, together with some
values quantifying the accuracy of such association. More formally, a PGQP
can be de ned as a triple &lt; Q; I; v &gt; where Q 2 U ACQ, I is a set of processing
information, generated during the past processing of Q over the data space, and
v 2 [0; 1] is a value which quanti es the accuracy of associating Q with I (we
will detail this concept later).</p>
      <p>Several kinds of information can be associated with PGQPs; when previously
computed results are associated with a graph query, a PGQP becomes a slight
extension of a materialized view. In general, however, any kind of metadata
collected during query processing (e.g., the set of used data sources, query-to-data
mappings under pay-as-you-go integration approaches [?]) can be considered.
Any speci c instantiation of the general de nition makes PGQPs a shortcut to
di erent points of the query processing stack.</p>
      <p>Example 1. Consider an instantiation of the PGQP concept de ned above
targeted to source selection and assume that each PGQP graph query is
associated with the speci c data sources exploited during its past processing. Figure
??(b),(c),(d) shows some examples of PGQPs. Speci cally, PGQP 1 represents
the following request, represented as a UACQ query: nd the authors (?x) which
are exposing at Palazzo Ducale in GE (which is an acronym for Genoa), together
with a biography (?x) of such authors. Such query is associated with sources S1
and S2, meaning that past executions of the PGQP query relied on such sources
for computing the query result. Data source S1 may correspond to a dynamic
data source containing information about art exhibitions of various artists
(leftmost part of updated 1); on the other hand, S2 may correspond to a static data
source S3 (such as dbpedia) containing biographies of artists. Value 0:8 is the
accuracy measure associated with 1 which speci es that the results computed
during past executions relying on data sources S1 and S2 could be considered
80% accurate. An important di erence with respect to the materialized view
approach is thus that the selected PGQP is not associated with query results
(that would have required a frequent recomputation, given the dynamicity of
data sources, e.g., in art exhibits) but with the sources contributing data to the
result. Additionally, even by associating query results to PGQPs, graph 1
interpreted as a view would not have been selected since (referring for instance to
the approach in [?]) the match between \Genoa" and \GE" is approximate and,
thus, graph 1 cannot contribute to compute Q results.</p>
      <p>Figure ?? summarizes the overall PGQP-based framework we envision,
assuming a set of PGQPs is available. It relies on two main phases: (i) the aim of
the rst phase (PGQP-based Query Processing) is to decompose a given query
into a set of sub-queries which could be executed by relying on information
association with PGQPs, execute them, and compute the nal results. For each
sub-query, an accuracy value should also be computed that quanti es how good
is any generated partial result with respect to the original request; (ii) the aim of
the second phase (PGQP Set Management) is to refresh the PGQP set, possibly
taking into account new information generated during query processing.
3</p>
    </sec>
    <sec id="sec-3">
      <title>PGQP-Based Query Processing</title>
      <p>Given a query Q, PGQP-based query processing relies on information associated
with PGQPs to rewrite Q into an equivalent one and to process it improving
evaluation performance and result quality. Six main steps can be identi ed.
1. PGQP-based query decomposition. Given a query Q and a set S of
PGQPs, de ned in terms of graph queries, we rst determine whether it is
possible to rely on S for executing Q. This is possible if Q can be approximated
by a new query, obtained by composing a set of Q subgraphs, which best match
PGQPs, with the portion of Q which cannot be represented in terms of S. More
formally, denoting with i:1 the query represented by the PGQP, we look for a
subgraph Qsub of Q, a PGQP subset S S, and a fusion operator such that
Q can be rewritten in a new approximate query Qa = (fQ i j i 2 S g; Qsub),
where Q i denotes the (sub)graph of Q for which an approximate match with (a
subgraph of) i:1 exists, and Qa satis es some optimality criteria. PGQPs in S
are selected by combining approaches for approximate supergraph matching and
graph view selection. PGQPs i in S are selected according to a ranking value.
Such value could be generated by combining the similarity between Q and i:1,
computed through the similarity function underlying the matching process and
quantifying how close Q and i:1 are, and the PGQP accuracy measure i:3,
according to speci c functions.
2. PGQP-based processing. Each graph query Q i can be executed taking
into account information related to its prior executions, that is, information
associated with the query represented by i in S. To this aim, speci c PGQP-based
processing algorithms should be designed, depending on the information
associated with PGQPs.
3. Accuracy computation. The evaluation of each Q i subquery returns in
general an approximate result, for which an accuracy value should be computed.
This value is in uenced by three main components: (i) the similarity between
Q i and i:1; (ii) the usage of either a precise or an approximate PGQP-based
query processing algorithm; (iii) result cardinality, in case we want to deal also
with the empty or few answers problem [?], since it may impact user
satisfaction. The accuracy values associated with the result obtained by the processing
of each Q i are then used to update the accuracy values associated with i:1.
See Section ?? for additional details.
4. Residual query execution. Query Qsub, which represents the residual part
of Q, has to be executed based on a traditional graph query processing algorithm
(see, e.g. [?]). Processing information is collected during the execution and also,
in this case, an accuracy measure has to be associated with the result. In case a
precise query processing algorithm is used, such measure will be equal to 1.
5. Partial result merging. At the end of the processing, all partial results
derived from the execution of each Q i and from the evaluation of the residual
query Qsub are merged through a fusion operator and the nal result is returned
to the user.</p>
      <p>
        Example 2. Consider again Figure ?? and assume that (Q; 1:1) = (Q; 2:1) =
0:8 &gt; (Q; 3:1) = 0:4. Suppose we want to select just one PGQP and that the
ranking value r( ), associated with each PGQP i, is computed by multiplying
the similarity and the accuracy values. In this case, PGQP 1 is selected for
the PGQP-based processing, since r(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) = 0:64; r(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) = 0:56, and r(
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) = 0:28.
Thus, the decomposition step 1 will return: (i) S = f 1g; (ii) Q 1 de ned as
shown in Figure ??(a); (iii) Qsub de ned as shown in Figure ??(a); (iv) de ned
as the join between the partial results of Q 1 (tuples for variables ?a, ?m, ?b)
and Qsub (tuples for variables ?a, ?p).
      </p>
      <p>Suppose that the evaluation of Q 1 in step 2 generates the result R1: f?a =
\Andy Warhol", ?m = \Palazzo Ducale", ?b = \Bio"g. In step 3, an accuracy
value is computed for the result R1, say 0.9. The evaluation of Qsub in step 4
returns another partial result R2, for example R2: f?a = \Andy Warhol", ?p =
\Shot Marilyn"g. Finally, in step 5, R1 and R2 are joined, generating the nal
result f?a = \Andy Warhol", ?m = \Palazzo Ducale", ?b = \Bio", ?p = \Shot
Marilyn"g .
4</p>
    </sec>
    <sec id="sec-4">
      <title>PGQP Set Update</title>
      <p>The PGQP-based query processing of a query produces various information that
need to be re ected into the PGQP set to keep it up-to-date. More precisely,
we envision two main groups of updates, namely online updates, executed as
side e ects of query processing, and o ine updates, which periodically delete or
refresh PGQPs which are considered unreliable.</p>
      <p>Online Updates. PGQP-based query processing may generate two di erent
types of information: (i) in step 3, an accuracy value is computed for each
subquery executed by relying on a PGQP i; such value can be used to update
the aggregate accuracy value associated with i; suitable aggregation measures
should be de ned to this purpose and are current under investigation; (ii) in
step 5, new processing information I are generated from the processing of the
residual query Qsub, together with an accuracy value v; such information can be
interpreted as a new PGQP = (Qsub; I; v), which should be used for updating
the PGQP set in case Qsub is assumed to be recurrent. Various approaches can
be exploited to detect recurrent queries. A simple and optimistic approach could
be that of assuming any new query is recurrent, then monitoring the usage of
the related PGQP to re ne such decision. If Qsub is recurrent, should be added
to the set and made available for future query processing.</p>
      <p>O ine Updates. Due to dynamicity of the environment, PGQPs, after their
creation, may become unreliable for three main reasons: (i) the query they
represent may be no more considered as recurrent (see above); (ii) the accuracy value
becomes very low, meaning that the usage of the PGQP lead to the generation
of inaccurate results; (iii) the associated information may become imprecise due
to the variability of the data space. For this reason, periodically the PGQP set
should be refreshed. Di erent policies could be provided. As an example, any
PGQP with an accuracy value lower than a given threshold could be
considered unreliable and could be refreshed by executing the PGQP query, using a
standard graph query processing approach: similarly to the execution of a
residual query, new information gathered during query processing, as well as the
computed result accuracy value, could be used to update the PGQP considered
unreliable. Additionally, all PGQPs could be periodically refreshed to take care
of data source dynamicity.</p>
      <p>Example 3. Consider again query Q in Figure ??(a). Based on what we stated
in Example ??, we know that, after the PGQP-based processing of Q 1 , an
accuracy value equal to 0.9 is computed. This value is used to perform an online
update with the aim of updating the accuracy value associated with Q 1 , taking
into account this further PGQP-based execution. For example, the two values
could be multiplied (since they can be interpreted as the probability of two
independent events). Additionally, after the execution of the residual query Qsub,
PGQP &lt; Qsub; fS8; S9g; 0:75 &gt; could be added to the PGQP set, assuming it is
recurring. Now assume that the resulting accuracy value is lower than a given
threshold: this means that the processing information represented inside the
PGQP led to inaccurate query results. In this case, query 1:1 could be
reexecuted and the information collected during query processing used to refresh
1. For example, a new data source S3 could be identi ed as relevant, while a
previously identi ed data source, e.g., S2, may become useless.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>This paper presents a rst step towards the design of query processing
approaches for recurring retrieval needs. Several issues still need to be investigated,
concerning speci c framework instantiations, characterization of speci c
information to be associated with graph queries in PGQPs and related PGQP-based
query processing approaches, and suitable approaches for detecting recurring
queries and refreshing the PGQP set. We are currently working on an
instantiation of the framework tailored to source selection for linked open data. The
idea is to associate each PGQP with information about the data sources used
in the past for query execution (as discussed in the presented examples), thus
providing an alternative approach with respect to index-based source selection
for linked open data [?] in all contexts in which queries are recurring.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaudhuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gionis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Automated ranking of database query results</article-title>
          .
          <source>In CIDR</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barcelo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Querying graph databases</article-title>
          .
          <source>In Proc. of PODS</source>
          , pp.
          <volume>175</volume>
          {
          <issue>188</issue>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Catania</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Fino</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Guerrini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Recurring retrieval needs in diverse and dynamic dataspaces: issues and reference framework</article-title>
          .
          <source>In GraphQ Workshop</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Catania</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guerrini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belussi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandreoli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martoglia</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Penzo</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <article-title>Wearable queries: adapting common retrieval needs to data and users</article-title>
          .
          <source>In DBRank Workshop</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Das</given-names>
            <surname>Sarma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            and
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Bootstrapping Pay-As-You-Go data integration systems</article-title>
          .
          <source>In Proc. of SIGMOD</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Answering pattern queries using views</article-title>
          .
          <source>IEEE Trans. Knowl</source>
          . Data Eng.,
          <volume>28</volume>
          (
          <issue>2</issue>
          ):
          <volume>326</volume>
          {
          <fpage>341</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Goasdoue</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karanasos</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leblay</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Manolescu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <article-title>View selection in semantic web databases</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <volume>97</volume>
          {
          <fpage>108</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig and M. T. O</surname>
          </string-name>
          <article-title>zsu. Linked Data query processing</article-title>
          .
          <source>In Proc. of ICDE</source>
          , pp.
          <volume>1286</volume>
          {
          <issue>1289</issue>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Mass,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Ramanath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Sagiv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            , and
            <surname>Weikum</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>IQ: the case for iterative querying for knowledge</article-title>
          .
          <source>In Proc. of CIDR</source>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>44</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Melnik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>and</article-title>
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <article-title>Similarity ooding: a versatile graph matching algorithm and its application to schema matching</article-title>
          .
          <source>In Proc. of ICDE</source>
          , pp.
          <volume>117</volume>
          {
          <issue>128</issue>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ntarmos</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trianta</surname>
            <given-names>llou</given-names>
          </string-name>
          , P., and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>GraphCache: a caching system for graph queries</article-title>
          .
          <source>In Proc. of EDBT</source>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Papailiou</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsoumakos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karras</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Koziris</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <article-title>Graph-aware, workloadadaptive sparql query caching</article-title>
          .
          <source>In Proc. of SIGMOD</source>
          , pp.
          <fpage>1777</fpage>
          -
          <lpage>1792</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          <article-title>TALE: a tool for approximate large graph matching</article-title>
          .
          <source>In Proc. of ICDE</source>
          , pp.
          <fpage>963</fpage>
          -
          <lpage>972</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Shasha</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Simple fast algorithms for the editing distance between trees and related problems</article-title>
          .
          <source>SIAM Journal on Computing</source>
          ,
          <volume>18</volume>
          (
          <issue>6</issue>
          ):
          <volume>1245</volume>
          {
          <fpage>1262</fpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and Han,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>On graph query optimization in large networks</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>340</fpage>
          -
          <lpage>351</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Data and knowledge discovery</article-title>
          .
          <source>GRDI</source>
          <year>2020</year>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>