<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recurring Retrieval Needs in Diverse and Dynamic Dataspaces: Issues and Reference Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Barbara Catania</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco De Fino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanna Guerrini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Genoa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Processing information requests over heterogeneous dataspaces is very challenging because aimed at guaranteeing user satisfaction with respect to con icting requirements on result quality and response time. In [3], it has been argued that, in dynamic contexts preventing substantial user involvement in interpreting requests, information on similar requests recurring over time can be exploited during query processing. In this paper, referring to a graph-based modeling of dataspaces and requests, we propose a preliminary approach in this direction centered on the enabling concept of Pro led Graph Query Pattern (PGQP) as an aggregation of information on past evaluations of a set of previously executed queries. The information maintained in PGQP is not query results, as in materialized queries, but can include di erent kinds of data and metadata.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Motivations. The last years have been
characterized by a tremendous growth of available
information, coming from information sources that are highly
heterogeneous in terms of structure, semantic
richness, and quality. Sources can indeed contain
unstructured data as well as data with well-de ned structure,
strongly correlated and semantically complex but
relatively static data (e.g., Linked Open Data), highly
dynamic user-generated data. This information is,
however, a resource currently exploited much below
its potential because of the di culties for the users
in accessing it. Users would like to nd answers to
complex requests expressing relationships among the
entities of their interest, but they are able to specify
requests only vaguely because they cannot reasonably
know the format and structure of the data that encode
the information relevant to them.</p>
      <p>As an example, consider a smart city explorer, that
2017, Copyright is with the authors. Published in the Workshop
Proceedings of the EDBT/ICDT 2017 Joint Conference (March 21,
2017, Venice, Italy) on CEUR-WS.org (ISSN 1613-0073).
Distribution of this paper is permitted under the terms of the Creative
Commons license CC-by-nc-nd 4.0
is, a set of smart information services o ered by a
municipality to users that, e.g., retrieves information
about attractions, points of interests, and shops. A
user might, for instance, ask for the authors of the
gurative artworks she is watching (i.e., they are located
close to her position), together with information about
other places where such authors are currently
exposing. In this speci c context, data may come from
different datasets, are heterogeneous and very dynamic,
as the user may quickly change her position or the
environment itself (including data) might be di erent at
di erent instants of time.</p>
      <p>Processing complex requests on diverse and dynamic
sources requires: (i) request interpretation; (ii) source
selection; (iii) actual processing of the request on the
relevant, and usually big in size, sources. The overall
process is costly and, nevertheless, it may not
guarantee user satisfaction on the obtained result since
the request (i) could be incorrectly interpreted, (ii)
could be processed on inaccurate, incomplete,
unreliable data, or (iii) could require a processing time
inadequate to its urgency.</p>
      <p>
        To make processing time more adequate to the
urgency of the request, approximate query processing
approaches can be considered, by providing a fast
answer at the cost of a lower accuracy [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Additionally,
to improve the interpretation of the request, and
related source selection, one possibility is to rely on user
involvement. Novel paradigms for exploratory
userdata interactions, that emphasize user context and
interactivity with the goal of facilitating exploration,
interpretation, retrieval, and assimilation of
information, are being developed. Such solutions, however,
do not seem adequate in dynamic contexts,
characterized by urgent requests and by communication means
hampering user interaction.
      </p>
      <p>
        Other innovative approaches should therefore be
devised, relying on di erent kinds of information for
nding possibly approximate answers to complex
information needs, even vaguely and imprecisely speci ed,
operating on the full spectrum of relevant content in
a dataspace of highly heterogeneous and poorly
controlled sources. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], to overcome di culties related
to heterogeneity and dynamic nature, the exploitation
of additional information, in terms of user pro le and
request context, data and processing quality, similar
requests recurring over time, is suggested.
      </p>
      <p>In this paper, we focus on similar requests recurring
over time for improving approximate processing of
requests that we assume already interpreted and
represented according to a given formalism. Recurring
retrieval needs are very common in dynamic contexts,
such as, for instance, during or after an exceptional
event (environmental emergencies or ash mobbing
initiatives), in the context of users belonging to the
same community or that are in the same place,
possibly at di erent times. The information needs are
widespread among di erent users, because induced by
the event, the interests of the community, and the
place, respectively. We aim at taking advantage of the
experience gained by prior processing in new searches
for limiting interpretation errors and response time,
thus reducing the possibility of producing
unsatisfactory answers.</p>
      <p>
        Recurring retrieval needs have been considered in
query processing for a very long time in terms of
materialized views (see, e.g., [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]) The idea is to
precompute the results of some recurring queries as
materialized views, select some of such views (view selection
problem) and re-use them (view-based query
processing) for the execution of new requests. Unfortunately,
the usage of materialized views in the contexts
described above su ers of some problems: (i) views
associate a result to a given query, however in the
reference contexts we may be interested in associating
other or additional information (e.g., the set of used
data sources or, under pay-as-you-go integration
approaches [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], query-to-data mappings); (ii) view
updates are very frequent in dynamic environments,
reducing the e ciency of the overall system; (iii)
viewbased query processing techniques usually rely on a
precise semantics while heterogeneity tends to favour
approximation-based approaches.
      </p>
      <p>
        Alternative usages of the concept of recurring
retrieval needs in query processing have been recently
provided but, also in this case, precise query
processing is taken into account. For example, in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
common needs are used with the aim of expediting
the processing of graph queries against a database of
graphs, by reducing the number of subgraph
isomorphism tests to be performed.
      </p>
      <p>Contributions. This paper presents the preliminary
results of an on-going work aiming at exploiting
information about similar requests recurring over time
in approximate query processing of requests executed
over diverse and dynamic dataspaces. We refer to a
graph-based representation of dataspaces, interpreted
as a collection of (possibly) schemaless data sources,
and on a graph-based language for modeling user
retrieval needs, expressed as entity relationships queries
over such dataspaces. We propose a framework for
representing and managing (sets of) recurring user
requests, and for exploiting them in approximate query
processing. The proposed framework is based on the
concept of Pro led Graph Query Pattern (PGQP). Like
materialized views, each pro led graph query pattern
corresponds to a kind of index value associated with
information related to the past evaluation of the queries
it represents; however, di erently from materialized
views, such information does not necessarily
correspond to previously computed results, rather it ranges
from query results to any other higher level
information collected during query processing (e.g.,
queryto-data mappings). In specifying the framework, we
identify the need for PGQP-aware approximate query
processing techniques and PGQP management issues
raised by the framework. Each instantiation of the
framework will lead to the de nition of a speci c
PGQPaware query processing approach for a given set of
information associated with PGQPs. Approximated
results can be generated for two main reasons: due
to the dynamic nature of the dataspace, information
associated with PGQPs may change between two
distinct executions of the same query; PGQPs are
selected based on similarity criteria with respect to the
query at hand.</p>
      <p>The paper nally proposes a speci c instantiation
of the framework, in the context of which the various
components and concepts are formalized, and
algorithms for the most relevant tasks are speci ed. In
this instantiation, we formally de ne PGQPs but we
leave to future work the association of PGQPs with
information collected in prior executions.</p>
      <p>Paper organization. The proposed framework is
presented in Section 2 and a speci c instantiation of
the framework is de ned in Section 3. Section 4
concludes by discussing a number of open issues we are
investigating in this work-in-progress.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>PGQP-BASED FRAMEWORK</title>
      <p>
        The framework we propose is based on a
graphbased representation of dataspaces, interpreted as a
collection of (possibly) schemaless data sources, and of
user retrieval needs, expressed as entity relationships
queries over such dataspaces [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Speci cally, we
assume to deal with data sources represented in terms
of partially speci ed graphs, where some information,
associated with nodes, may be unknown [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Similarly,
each request is represented in terms of a graph,
specifying entities and relationships of interest, which may
contain node variables, in order to characterize
information to be retrieved [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As an example, Figure 2
(a) presents a graph query Q corresponding to the
information need \the authors of the gurative artworks
the user is watching (i.e., they are located close to her
position), together with such artworks, a biography of
the authors, and information about places in Genoa
where the authors are currently exposing their
artworks". Notice that, since the user request is related
to her current position, it needs to be interpreted and
processed before such position changes.
      </p>
      <p>
        For representing and managing (sets of) recurring
user requests, the framework relies on the enabling
concept of Pro led Graph Query Pattern (PGQP). Each
PGQP corresponds to a graph, associated with data or
metadata related to the past evaluation of the queries
it represents. Such information does not
necessarily correspond to previously computed results, thus,
PGQPs do not necessarily correspond to materialized
views. Rather, any kind of metadata collected
during query processing, as the set of used data sources,
query-to-data mappings under pay-as-you-go
integration approaches [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], or result quality information, can
be considered. As an example, Figure 2 presents a
graph query Q and two PGQPs 1 and 2.
      </p>
      <p>Independently from the information associated with
PGQPs, the proposed framework aims at addressing
three main problems which can be stated as follows
(see Figure 1 for an overall picture).</p>
      <p>PGQP-based query decomposition. Given a query
Q and a set S of PGQPs, de ned in terms of graph
queries, determine whether it is possible to rely on S
for executing Q. This is possible if Q can be
approximated by a new query, obtained by composing a set
of Q subgraphs, which best match PGQPs, with the
portion of Q which cannot be represented in terms of
S. More formally, we look for a subgraph Qsub of Q,
a PGQP subset S S, and a fusion operator such
that Q can be rewritten in a new approximate query
Qa = (fQ i j i 2 S g; Qsub), where Q i denotes the
(sub)graph of Q for which an approximate match with
(a subgraph of) i exists, and Qa satis es some
optimality criteria. That is, PGQPs in S are selected
according to a similarity function which quanti es
how close the query graph and each PGQP are.</p>
      <p>With reference to Figure 2, various decompositions
of Q are possible based on 1 and 2. Assuming 1
is more similar to Q than 2, the decomposition of Q
can return: (i) S = f 1g; (ii) Qsub de ned as shown
in Figure 2(a); (iii) de ned as the join between the
partial results of Q 1 (tuples for variables ?a, ?m, ?b)
and Qsub (tuples for variables ?a,?b). Notice that, by
interpreting PGQPs as views, traditional view
selection algorithms would not have selected any PGQP,
due to the approximate matches between \Genoa" and
\GE" and \Figurative" and \Figurative Art". Note
that, even if in this example Q 1 totally matches 1,
partial matches are also possible.</p>
      <p>
        PGQP-aware query processing. In order to
optimize Qa evaluation, we can rely on information
associated with PGQPs in S . More precisely, each graph
query Q i can be executed taking into account
information related to its prior execution, that is,
information associated with i in S. To this aim,
speci c PGQP-based processing algorithms should be
designed, depending on the information associated with
PGQPs. On the other hand, query Qsub, which
represents the residual part of Q, has to be executed based
on a traditional graph query processing algorithm (see,
e.g. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]). At the end of the processing, all partial
results are merged through the fusion operator and
the result is returned to the user.
      </p>
      <p>Referring to Figure 2(a), Q 1 is processed using
information associated with 1 while Qsub is processed
by a traditional query processing algorithm.
PGQP set update. For each graph query Q,
executed by the system without taking into account
PGQPs, information related to its processing is
collected (depending on the aim of the approach) and
used, together with Q, for updating the set of
available PGQPs at the end of the processing.</p>
      <p>As an example, Figure 2(d) shows PGQP 1
extended with the part of the query executed by a
traditional query processing algorithm.</p>
      <p>The framework described above can be instantiated
in several ways. As an example, consider an
instantiation targeted to source selection. In this case, the
information associated with PGQP 1 could be the
sources exploited during the past processing of the
queries it represents to retrieve the relevant
information. Thus, for instance, information relevant for 1
may come from a dynamic data source s1
containing information about art exhibitions of various artists
(left part of 1) and a static data source s2 (such as
dbpedia) containing biographies of artists (right part).
An important di erence with respect to the
materialized view approach is thus that the selected PGQP is
not associated with query results (that would have
required a frequent recomputation, given the dynamicity
of data sources, e.g., in art exhibits) but the sources
contributing data to the result. PGQP-based
processing would rely on this information for targeting to
these sources the execution of Q 1 . As an e ect of
the execution of Q, a third data source s3 containing
museum catalogues with artwork positions is
identied, and associated with the updated PGQP (shown
in Figure 2(d)).</p>
    </sec>
    <sec id="sec-3">
      <title>A PRELIMINARY PGQP-AWARE</title>
    </sec>
    <sec id="sec-4">
      <title>PROCESSING ALGORITHM</title>
      <p>In this section, we select suitable models for
dataspaces and user requests, we formalize the concept of
PGQP and we discuss their use and management in a
preliminar instantiation of the framework.
3.1</p>
    </sec>
    <sec id="sec-5">
      <title>Dataspace and User Requests</title>
      <p>
        Data Space Representation. Under the reference
context, data spaces may contain redundant or even
missing information. Furthermore, the heterogeneous
nature of data leads to schemaless representations. To
take care of these features, we assume the data space
is represented in terms of a collection of graph pattern
databases, i.e., graphs where nodes can also be labeled
with variables, for representing situations in which the
node identity is not known [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
      </p>
      <p>Definition 1. Let N be a set of node labels. Let
Vnode be a set of node variables. Let be an arbitrary
( nite or in nite) set of symbols. A Graph Pattern
Database over is a pair = (N; E) where: (i) N
N [Vnode is the nite set of nodes; (ii) E N N
is the set of edges. 2</p>
      <p>The semantics of a graph pattern database
corresponds to all graphs which are instances of the
pattern and it is de ned via homomorphisms. Given a
graph G = (N 0; E0) and a graph pattern database
= (N; E), a homomorphism h : ! G is a mapping
h : N ! N 0 that maps labels and variables used in
to labels used in G such that:
1. h1(n) = n, for every node label n 2 N ; and
2. for every edge (n1; e; n2) 2 E, there is a path
(h(n1); e; h(n2)) in G.</p>
      <p>
        The semantics of a graph pattern w.r.t. the
labeling alphabet is [[ ]] = fG over jG j= g, where
G j= holds if a homomorphism h : ! G exists.
Graph Query Language. In this work, for the sake
of simplicity and for guaranteeing a tractable
combined and data complexity (fundamental requirements
for dealing with massive in size graph databases), we
consider graph queries speci ed as unions of acyclic
conjunctive queries and we denote with UACQ the
resulting language [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Formally, an UACQ query Q
is the union of expressions Q1; :::Qk where each Qh,
1 h k, is an expression of the form
ans(z1; :::; zn)
      </p>
      <p>^ (xi; ai; yi)
1 i m
such that the graph underlying Qh is acyclic, m &gt; 0,
each xi and yi is a node variable or a constant (1
i m), each ai 2 (1 i m), and each zi is some
variable xj or yj (1 i n, 1 j m). The query
Q is Boolean if ans(), i.e. n = 0.</p>
      <p>UACQ queries can always be interpreted as graph
(pattern) queries Q = ( ; z) where is the graph
pattern underlying Q and z = (z1; :::; zn).</p>
      <p>
        The semantics of a UACQ query Q = ( ; z) when
executed over a graph pattern database = (N; E) can
be de ned as the set of answers which are returned
for each instance of the graph pattern database.
According to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we call them certain answers and we
compute them as:
      </p>
      <p>
        certain (Q; ) = fv 2 N nj j= [v=z]g
where, given two graph patterns 1 and 2, 1 j= 2
( 1 implies 2) if [[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]] [[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]] .
3.2
      </p>
    </sec>
    <sec id="sec-6">
      <title>Profiled Graph Query Patterns</title>
      <p>
        A PGQP represents in a compact way a set of graph
queries executed in the past. In order to provide a
formal de nition of PGQPs, we propose to consider union
as aggregate operator of the component queries and to
represent a PGQP as a UACQ query. Since PGQPs
are used for the approximate evaluation of queries, the
issue also arises of determining whether each PGQP
represents, in a reasonably good way, (a portion of) a
graph query Q. If this is the case, becomes a
candidate for the PGQP-aware processing of Q and we say
that Q matches the PGQP. PGQP candidates can be
selected through approximate subgraph matching [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
and the usage of a graph similarity function.
      </p>
      <p>Definition 2. Let Q1 = (N1; E1); Q2 = (N2; E2) 2
U ACQ. An Approximate Subgraph Match between Q1
and Q2 is a bijection mapping : N10 $ N20 , where
Ni0 Ni, i = 1; 2. An Approximate Subgraph
Matching Function is a function m that, for each pair of
Algorithm 1 PGQP-based query decomposition
Algorithm 2 PGQP set update
INPUT</p>
      <p>Q: graph query
S = f 1; 2; :::; ng: set of PGQPs
m: a graph similarity function induced by an
approximate subgraph matching function m
vs: minimum allowed graph similarity
OUTPUT
(S ; Qsub), such that S S and Qsub is a subgraph of
Q
APPROACH
1: Let k 2 S such that m(Q; k) = max1 i n m(Q; i)
2: if m(Q; k) &gt; vs then
3: return (fQ k g; Q Q k )
4: else
5: return (fg; Q)
Q1; Q2 2 U ACQ, returns an approximate subgraph
match from Q1 to Q2. A Graph Similarity Function
is a metric function : U ACQ U ACQ ! [0; 1] such
that, for each Q1; Q2 2 U ACQ, (Q1; Q2) quanti es
the similarity between Q1 and Q2. is induced by an
Approximate Subgraph Matching Function m (denoted
by m) if the value assigned to (Q1; Q2) quanti es the
similarity between subgraphs Q01 and Q02 involved in
the match m(Q1; Q2). 2
PGQPs and PGQP matches can now be de ned.</p>
      <p>Definition 3. A Pro led Graph Query Pattern
is a graph query in UACQ. Let m be a graph
similarity function induced by an approximate subgraph
matching function m. A graph query Q matches a
PGQP , with similarity v 2 (0; 1], if m(Q; ) = v.
Q denotes the (sub)graph of Q involved in the match.
Q is a total match of if Q = Q, it is a partial match
otherwise. 2</p>
      <p>In the following, we assume that function m is
provided by the system and used during PGQP-based
query decomposition and PGQP set update.
3.3</p>
    </sec>
    <sec id="sec-7">
      <title>PGQP-based Query Decomposition</title>
      <p>In the following, we present a preliminary
PGQPbased query decomposition approach, de ned
assuming to: (i) select only one PGQP (thus, jS j = 1); (ii)
merge partial results thought relational operators.</p>
      <p>
        Several approximate graph matching functions have
been proposed so far which can be used as basis for
de ning similarity functions (see, e.g., [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] for some
recent papers and [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for a survey). A relevant
example, due to its exibility, is represented by the
Similarity Flooding function [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Based on the intuition that
similar nodes have similar neighbors, node
similarity values are computed relying on information about
node neighbors through an iterative xpoint
computation. The selection metric, used in Selection
Flooding to identify the best mapping, can be used (after
normalization of values between 0 and 1) as a graph
similarity function. Details about PGQP de nition in
terms of the Similarity Flooding function can be found
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>According to De nition 3, each PGQP divides a
graph query Q into two subqueries, namely Q and
Q Q (computed by removing from Q rst edges in
INPUT</p>
      <p>Q: graph query
S = f 1; 2; :::; ng: set of PGQPs
vc: maximum allowed transformation cost
OUTPUT</p>
      <p>S: an updated set of PGQPs
APPROACH
1: Let h 2 S such that CT (Q; h) = min1 i nCT (Q; i)
2: if (CT (Q; h) vc) then
3: h = T (Q; h)
4: S = S n f hg [ f hg
5: else
6: S = S [ fQg
7: return S
Q and then all remaining non connected nodes
belonging to Q ). This consideration guides us in the
de nition of a preliminary decomposition algorithm
(see Algorithm 1). Given a query Q and a set of
PGQPs S, the PGQP k providing the highest
similarity value with respect to Q is selected and, if such
value is greater than a given threshold (thus, Q and k
are close enough), the binary decomposition described
above is returned. Otherwise, no decomposition is
applied.
3.4</p>
    </sec>
    <sec id="sec-8">
      <title>PGQP Set Update</title>
      <p>In order to update the PGQP store, we rely on a
graph transformation function.</p>
      <p>Definition 4. A Graph Transformation Function
is a function T : U ACQ U ACQ ! U ACQ such
that, for each query Q and PGQP , Q is a match of
T (Q; ), based on m. 2</p>
      <p>
        Transformation functions can be de ned in terms
of a small set of operations, previously de ned in the
context of the so called tree editing problem [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which
at least contains node relabelling, deletion and
insertion. Each transformation function can be de ned
by specifying which operations should be applied to
a given PGQP for each node and edge of the query
Q at hand, in such a way that Q, after the
transformation, matches the resulting PGQP. An example
of graph transformation is the function that, given a
PGQP and a query Q, inserts into all nodes and
edges in Q Q that have not been mapped into
nodes by m.
      </p>
      <p>In order to identify the best transformation, a graph
cost function can be de ned as follows.</p>
      <p>Definition 5. A Graph Cost Function for a
transformation function T is a function CT : U ACQ
U ACQ ! R, which, for each graph query Q and PGQP
, assigns a cost value to T (Q; ). 2
Similarly to the tree editing problem, a graph cost
function quanti es the transformation e ort by
assigning a cost to each applied editing operation. As an
example, given a graph query Q and a PGQP , cost 1
can be assigned to the insertion of each new node and
new edge in ; a cost proportional to their distance
can be assigned to each node n in Q which is mapped
into a node v0 in by m.</p>
      <p>Algorithm 2 presents a simple approach for PGQP
set update based on the notions introduced above. In
particular, it updates the input set of PGQPs by rst
selecting the PGQP leading to the lowest
transformation cost for the input graph query Q. If such cost is
lower than a given threshold, the PGQP set is updated
by replacing the selected PGQP with the PGQP
obtained by the transformation. Otherwise, the PGQP
set is updated by inserting Q as a new PGQP.</p>
      <p>
        Example 1. Consider the graph query Q and the
PGQP set S = f 1; 2g presented in Figure 2.
Consider the graph similarity function m induced by the
Similarity Flooding algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for details),
a minimum similarity threshold vs = 0:7, and a
maximum cost threshold vc = 3.
      </p>
      <p>Q is processed as follows. Algorithm 1 rst
computes the mapping and the similarity between Q and
each i, relying on m. Partial approximate subgraph
matches can be generated from Q to both 1 and 2.
Assume the following graph similarities are computed:
m(Q; 1) = 0:83 and m(Q; 2) = 0:40. PGQP 1 is
selected because it is the most similar to Q. Since
similarity is greater than vs, decomposition is applied and
the pair (fQ 1 g; Q Q 1 ) is returned for processing.</p>
      <p>At the end of the processing, Algorithm 2 is used
for updating the PGQP store. Suppose that CT (Q
Q 1 ; 1) = 2:56 and CT (Q Q 2 ; 2) = 3. PGQP 1
is selected because of the lowest transformation cost.
Since the transformation cost is lower than vc, PGQP
1 is updated by inserting all nodes and edges in Q
Q 1 not mapped into 1 nodes by m. 3</p>
    </sec>
    <sec id="sec-9">
      <title>OPEN ISSUES</title>
      <p>The work reported in this paper represents just the
rst step towards the design of approximate query
processing approaches for recurring retrieval needs.
Several issues still need to be investigated. Some of them
are discussed in what follows.</p>
      <p>
        Framework instantiation. In this paper, we
considered a simple notion of approximate subgraph
matching, de ned by relaxing the notion of subgraph
isomorphism. Other approaches can however be
considered, which allow more exible types of
approximation (see, e.g., [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for a survey). More e ective
query decomposition approaches should also be
designed which increase the number of generated
subqueries, together with speci c optimality criterias;
optimization approaches should be developed which take
into account the number and the shape of the
identied subqueries for selecting the most e cient
PGQPbased execution plan.
      </p>
      <p>Information associated with PGQPs. As already
discussed, several types of information can be
associated with PGQPs. For each type of information,
speci c PGQP-based query processing algorithms should
be devised.</p>
      <p>PGQP management. In this paper, we assumed to
deal with an initial set of PGQPs and we only
preliminarily discuss how to modify it taking into account a
new graph query execution. There are however
several further issues to be taken into account for PGQP
management, e.g., how to decide that a query is
recurrent and should be indeed used for updating the
PGQP set? What about the tradeo between the size
of the PGQP set, the query processing cost, and the
e ectiveness (in terms of result quality) of the overall
query approach?
5.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          .
          <article-title>Querying graph databases</article-title>
          .
          <source>In Proc. of PODS</source>
          , pp.
          <volume>175</volume>
          {
          <fpage>188</fpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Reutter</surname>
          </string-name>
          .
          <article-title>Querying graph patterns</article-title>
          .
          <source>In Proc. of PODS</source>
          , pp.
          <volume>199</volume>
          {
          <fpage>210</fpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Catania</surname>
          </string-name>
          et Al.
          <article-title>Wearable queries: adapting common retrieval needs to data and users</article-title>
          .
          <source>In Proc. DBRank Workshop</source>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Catania</surname>
          </string-name>
          and
          <string-name>
            <given-names>G</given-names>
            <surname>Guerrini</surname>
          </string-name>
          .
          <article-title>Approximate queries with adaptive processing</article-title>
          .
          <source>Advanced Query Processing, Intelligent Systems Reference Library (36)</source>
          , pp.
          <fpage>237</fpage>
          -
          <lpage>269</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Catania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>De Fino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Guerrini</surname>
          </string-name>
          .
          <article-title>Recurring retrieval needs in diverse and dynamic dataspaces: issues and reference framework, extended version</article-title>
          .
          <source>Technical Report</source>
          , University of Genoa, Italy,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>A. Das Sarma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Dong</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          .
          <article-title>Bootstrapping Pay-As-You-Go Data integration systems</article-title>
          .
          <source>In Proc. of SIGMOD. ACM</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Answering pattern queries using views</article-title>
          .
          <source>IEEE Trans. Knowl</source>
          . Data Eng.,
          <volume>28</volume>
          (
          <issue>2</issue>
          ):
          <volume>326</volume>
          {
          <fpage>341</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Goasdoue</surname>
          </string-name>
          et Al.
          <article-title>View selection in semantic web databases</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <volume>97</volume>
          {
          <fpage>108</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          et Al.
          <article-title>Assessing single-pair similarity over graphs by aggregating rst-meeting probabilities</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>42</volume>
          :
          <fpage>107</fpage>
          -
          <lpage>122</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Koutra</surname>
          </string-name>
          et Al.
          <article-title>DeltaCon: Principled Massive-Graph Similarity Function with Attribution</article-title>
          .
          <source>TKDD</source>
          <volume>10</volume>
          (
          <issue>3</issue>
          ):
          <fpage>28</fpage>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Melnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Similarity ooding: A versatile graph matching algorithm and its application to schema matching</article-title>
          .
          <source>In Proc. of ICDE</source>
          , pp.
          <volume>117</volume>
          {
          <fpage>128</fpage>
          . IEEE,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Patel</surname>
          </string-name>
          .
          <article-title>TALE: A Tool for Approximate Large Graph Matching</article-title>
          .
          <source>In Proc. of ICDE</source>
          , pp.
          <fpage>963</fpage>
          -
          <lpage>972</lpage>
          ,
          <year>2008</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ntarmos</surname>
          </string-name>
          , and
          <string-name>
            <surname>P.</surname>
          </string-name>
          <article-title>Trianta llou. Indexing query graphs to speedup graph query processing</article-title>
          .
          <source>In Proc. of EDBT</source>
          , pages
          <volume>41</volume>
          {
          <fpage>52</fpage>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Data and knowledge discovery</article-title>
          .
          <source>GRDI</source>
          <year>2020</year>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Extending graph homomorphism and simulation for real life graph matching</article-title>
          .
          <source>PhD Thesis</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Shasha</surname>
          </string-name>
          .
          <article-title>Simple fast algorithms for the editing distance between trees and related problems</article-title>
          .
          <source>SIAM Journal on Computing</source>
          ,
          <volume>18</volume>
          (
          <issue>6</issue>
          ):
          <volume>1245</volume>
          {
          <fpage>1262</fpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , J. Han.
          <article-title>On graph query optimization in large networks</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>340</fpage>
          -
          <lpage>351</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>