=Paper=
{{Paper
|id=Vol-1810/GraphQ_paper_11
|storemode=property
|title=Recurring Retrieval Needs in Diverse and Dynamic Dataspaces: Issues and Reference Framework
|pdfUrl=https://ceur-ws.org/Vol-1810/GraphQ_paper_11.pdf
|volume=Vol-1810
|authors=Barbara Catania,Francesco De Fino,Giovanna Guerrini
|dblpUrl=https://dblp.org/rec/conf/edbt/CataniaFG17
}}
==Recurring Retrieval Needs in Diverse and Dynamic Dataspaces: Issues and Reference Framework==
Recurring Retrieval Needs in Diverse and Dynamic
Dataspaces: Issues and Reference Framework
Barbara Catania, Francesco De Fino, Giovanna Guerrini
University of Genoa, Italy
{firstname.lastname}@dibris.unige.it
ABSTRACT is, a set of smart information services offered by a
Processing information requests over heterogeneous municipality to users that, e.g., retrieves information
dataspaces is very challenging because aimed at guar- about attractions, points of interests, and shops. A
anteeing user satisfaction with respect to conflicting user might, for instance, ask for the authors of the fig-
requirements on result quality and response time. In urative artworks she is watching (i.e., they are located
[3], it has been argued that, in dynamic contexts pre- close to her position), together with information about
venting substantial user involvement in interpreting other places where such authors are currently expos-
requests, information on similar requests recurring over ing. In this specific context, data may come from dif-
time can be exploited during query processing. ferent datasets, are heterogeneous and very dynamic,
In this paper, referring to a graph-based modeling as the user may quickly change her position or the en-
of dataspaces and requests, we propose a preliminary vironment itself (including data) might be different at
approach in this direction centered on the enabling different instants of time.
concept of Profiled Graph Query Pattern (PGQP) as Processing complex requests on diverse and dynamic
an aggregation of information on past evaluations of sources requires: (i) request interpretation; (ii) source
a set of previously executed queries. The information selection; (iii) actual processing of the request on the
maintained in PGQP is not query results, as in mate- relevant, and usually big in size, sources. The overall
rialized queries, but can include different kinds of data process is costly and, nevertheless, it may not guar-
and metadata. antee user satisfaction on the obtained result since
the request (i) could be incorrectly interpreted, (ii)
could be processed on inaccurate, incomplete, unre-
1. INTRODUCTION liable data, or (iii) could require a processing time
Motivations. The last years have been character- inadequate to its urgency.
ized by a tremendous growth of available informa- To make processing time more adequate to the ur-
tion, coming from information sources that are highly gency of the request, approximate query processing
heterogeneous in terms of structure, semantic rich- approaches can be considered, by providing a fast an-
ness, and quality. Sources can indeed contain unstruc- swer at the cost of a lower accuracy [4]. Additionally,
tured data as well as data with well-defined structure, to improve the interpretation of the request, and re-
strongly correlated and semantically complex but rel- lated source selection, one possibility is to rely on user
atively static data (e.g., Linked Open Data), highly involvement. Novel paradigms for exploratory user-
dynamic user-generated data. This information is, data interactions, that emphasize user context and
however, a resource currently exploited much below interactivity with the goal of facilitating exploration,
its potential because of the difficulties for the users interpretation, retrieval, and assimilation of informa-
in accessing it. Users would like to find answers to tion, are being developed. Such solutions, however,
complex requests expressing relationships among the do not seem adequate in dynamic contexts, character-
entities of their interest, but they are able to specify ized by urgent requests and by communication means
requests only vaguely because they cannot reasonably hampering user interaction.
know the format and structure of the data that encode Other innovative approaches should therefore be de-
the information relevant to them. vised, relying on different kinds of information for find-
As an example, consider a smart city explorer, that ing possibly approximate answers to complex infor-
mation needs, even vaguely and imprecisely specified,
operating on the full spectrum of relevant content in
a dataspace of highly heterogeneous and poorly con-
trolled sources. In [3], to overcome difficulties related
to heterogeneity and dynamic nature, the exploitation
of additional information, in terms of user profile and
2017, Copyright is with the authors. Published in the Workshop
Proceedings of the EDBT/ICDT 2017 Joint Conference (March 21, request context, data and processing quality, similar
2017, Venice, Italy) on CEUR-WS.org (ISSN 1613-0073). Distri- requests recurring over time, is suggested.
bution of this paper is permitted under the terms of the Creative In this paper, we focus on similar requests recurring
Commons license CC-by-nc-nd 4.0
over time for improving approximate processing of re- spond to previously computed results, rather it ranges
quests that we assume already interpreted and rep- from query results to any other higher level informa-
resented according to a given formalism. Recurring tion collected during query processing (e.g., query-
retrieval needs are very common in dynamic contexts, to-data mappings). In specifying the framework, we
such as, for instance, during or after an exceptional identify the need for PGQP-aware approximate query
event (environmental emergencies or flash mobbing processing techniques and PGQP management issues
initiatives), in the context of users belonging to the raised by the framework. Each instantiation of the
same community or that are in the same place, pos- framework will lead to the definition of a specific PGQP-
sibly at different times. The information needs are aware query processing approach for a given set of
widespread among different users, because induced by information associated with PGQPs. Approximated
the event, the interests of the community, and the results can be generated for two main reasons: due
place, respectively. We aim at taking advantage of the to the dynamic nature of the dataspace, information
experience gained by prior processing in new searches associated with PGQPs may change between two dis-
for limiting interpretation errors and response time, tinct executions of the same query; PGQPs are se-
thus reducing the possibility of producing unsatisfac- lected based on similarity criteria with respect to the
tory answers. query at hand.
Recurring retrieval needs have been considered in The paper finally proposes a specific instantiation
query processing for a very long time in terms of ma- of the framework, in the context of which the various
terialized views (see, e.g., [7, 8]) The idea is to precom- components and concepts are formalized, and algo-
pute the results of some recurring queries as materi- rithms for the most relevant tasks are specified. In
alized views, select some of such views (view selection this instantiation, we formally define PGQPs but we
problem) and re-use them (view-based query process- leave to future work the association of PGQPs with
ing) for the execution of new requests. Unfortunately, information collected in prior executions.
the usage of materialized views in the contexts de-
scribed above suffers of some problems: (i) views as- Paper organization. The proposed framework is
sociate a result to a given query, however in the ref- presented in Section 2 and a specific instantiation of
erence contexts we may be interested in associating the framework is defined in Section 3. Section 4 con-
other or additional information (e.g., the set of used cludes by discussing a number of open issues we are
data sources or, under pay-as-you-go integration ap- investigating in this work-in-progress.
proaches [6], query-to-data mappings); (ii) view up-
dates are very frequent in dynamic environments, re-
ducing the efficiency of the overall system; (iii) view- 2. PGQP-BASED FRAMEWORK
based query processing techniques usually rely on a The framework we propose is based on a graph-
precise semantics while heterogeneity tends to favour based representation of dataspaces, interpreted as a
approximation-based approaches. collection of (possibly) schemaless data sources, and of
Alternative usages of the concept of recurring re- user retrieval needs, expressed as entity relationships
trieval needs in query processing have been recently queries over such dataspaces [14]. Specifically, we as-
provided but, also in this case, precise query pro- sume to deal with data sources represented in terms
cessing is taken into account. For example, in [13], of partially specified graphs, where some information,
common needs are used with the aim of expediting associated with nodes, may be unknown [2]. Similarly,
the processing of graph queries against a database of each request is represented in terms of a graph, speci-
graphs, by reducing the number of subgraph isomor- fying entities and relationships of interest, which may
phism tests to be performed. contain node variables, in order to characterize infor-
mation to be retrieved [2]. As an example, Figure 2
Contributions. This paper presents the preliminary (a) presents a graph query Q corresponding to the in-
results of an on-going work aiming at exploiting in- formation need “the authors of the figurative artworks
formation about similar requests recurring over time the user is watching (i.e., they are located close to her
in approximate query processing of requests executed position), together with such artworks, a biography of
over diverse and dynamic dataspaces. We refer to a the authors, and information about places in Genoa
graph-based representation of dataspaces, interpreted where the authors are currently exposing their art-
as a collection of (possibly) schemaless data sources, works”. Notice that, since the user request is related
and on a graph-based language for modeling user re- to her current position, it needs to be interpreted and
trieval needs, expressed as entity relationships queries processed before such position changes.
over such dataspaces. We propose a framework for For representing and managing (sets of) recurring
representing and managing (sets of) recurring user re- user requests, the framework relies on the enabling
quests, and for exploiting them in approximate query concept of Profiled Graph Query Pattern (PGQP). Each
processing. The proposed framework is based on the PGQP corresponds to a graph, associated with data or
concept of Profiled Graph Query Pattern (PGQP). Like metadata related to the past evaluation of the queries
materialized views, each profiled graph query pattern it represents. Such information does not necessar-
corresponds to a kind of index value associated with ily correspond to previously computed results, thus,
information related to the past evaluation of the queries PGQPs do not necessarily correspond to materialized
it represents; however, differently from materialized views. Rather, any kind of metadata collected dur-
views, such information does not necessarily corre- ing query processing, as the set of used data sources,
Figure 1: PGQP-based framework
query-to-data mappings under pay-as-you-go integra- cific PGQP-based processing algorithms should be de-
tion approaches [6], or result quality information, can signed, depending on the information associated with
be considered. As an example, Figure 2 presents a PGQPs. On the other hand, query Qsub , which repre-
graph query Q and two PGQPs φ1 and φ2 . sents the residual part of Q, has to be executed based
Independently from the information associated with on a traditional graph query processing algorithm (see,
PGQPs, the proposed framework aims at addressing e.g. [17]). At the end of the processing, all partial re-
three main problems which can be stated as follows sults are merged through the fusion operator θ and
(see Figure 1 for an overall picture). the result is returned to the user.
Referring to Figure 2(a), Qφ1 is processed using in-
PGQP-based query decomposition. Given a query formation associated with φ1 while Qsub is processed
Q and a set S of PGQPs, defined in terms of graph by a traditional query processing algorithm.
queries, determine whether it is possible to rely on S
for executing Q. This is possible if Q can be approx- PGQP set update. For each graph query Q, exe-
imated by a new query, obtained by composing a set cuted by the system without taking into account
of Q subgraphs, which best match PGQPs, with the PGQPs, information related to its processing is col-
portion of Q which cannot be represented in terms of lected (depending on the aim of the approach) and
S. More formally, we look for a subgraph Qsub of Q, used, together with Q, for updating the set of avail-
a PGQP subset S⊆ ⊆ S, and a fusion operator θ such able PGQPs at the end of the processing.
that Q can be rewritten in a new approximate query As an example, Figure 2(d) shows PGQP φ1 ex-
Qa = θ({Qφi |φi ∈ S⊆ }, Qsub ), where Qφi denotes the tended with the part of the query executed by a tra-
(sub)graph of Q for which an approximate match with ditional query processing algorithm.
(a subgraph of) φi exists, and Qa satisfies some op-
timality criteria. That is, PGQPs in S⊆ are selected The framework described above can be instantiated
according to a similarity function δ which quantifies in several ways. As an example, consider an instan-
how close the query graph and each PGQP are. tiation targeted to source selection. In this case, the
With reference to Figure 2, various decompositions information associated with PGQP φ1 could be the
of Q are possible based on φ1 and φ2 . Assuming φ1 sources exploited during the past processing of the
is more similar to Q than φ2 , the decomposition of Q queries it represents to retrieve the relevant informa-
can return: (i) S⊆ = {φ1 }; (ii) Qsub defined as shown tion. Thus, for instance, information relevant for φ1
in Figure 2(a); (iii) θ defined as the join between the may come from a dynamic data source s1 contain-
partial results of Qφ1 (tuples for variables ?a, ?m, ?b) ing information about art exhibitions of various artists
and Qsub (tuples for variables ?a,?b). Notice that, by (left part of φ1 ) and a static data source s2 (such as
interpreting PGQPs as views, traditional view selec- dbpedia) containing biographies of artists (right part).
tion algorithms would not have selected any PGQP, An important difference with respect to the material-
due to the approximate matches between “Genoa” and ized view approach is thus that the selected PGQP is
“GE” and “Figurative” and “Figurative Art”. Note not associated with query results (that would have re-
that, even if in this example Qφ1 totally matches φ1 , quired a frequent recomputation, given the dynamicity
partial matches are also possible. of data sources, e.g., in art exhibits) but the sources
contributing data to the result. PGQP-based process-
PGQP-aware query processing. In order to opti- ing would rely on this information for targeting to
mize Qa evaluation, we can rely on information asso- these sources the execution of Qφ1 . As an effect of
ciated with PGQPs in S⊆ . More precisely, each graph the execution of Q, a third data source s3 containing
query Qφi can be executed taking into account in- museum catalogues with artwork positions is identi-
formation related to its prior execution, that is, in- fied, and associated with the updated PGQP (shown
formation associated with φi in S. To this aim, spe- in Figure 2(d)).
Figure 2: Example PGQP: (a) graph query Q; (b) PGQP φ1 ; (c) PGQP φ2 ; (d) PGQP φ1 after the update phase
3. A PRELIMINARY PGQP-AWARE conjunctive queries and we denote with UACQ the re-
PROCESSING ALGORITHM sulting language [1, 2]. Formally, an UACQ query Q
is the union of expressions Q1 , ...Qk where each Qh ,
In this section, we select suitable models for datas- 1 ≤ h ≤ k, is an expression of the form
paces and user requests, we formalize the concept of ^
PGQP and we discuss their use and management in a ans(z1 , ..., zn ) ← (xi , ai , yi )
preliminar instantiation of the framework. 1≤i≤m
3.1 Dataspace and User Requests such that the graph underlying Qh is acyclic, m > 0,
Data Space Representation. Under the reference each xi and yi is a node variable or a constant (1 ≤
context, data spaces may contain redundant or even i ≤ m), each ai ∈ Σ (1 ≤ i ≤ m), and each zi is some
missing information. Furthermore, the heterogeneous variable xj or yj (1 ≤ i ≤ n, 1 ≤ j ≤ m). The query
nature of data leads to schemaless representations. To Q is Boolean if ans(), i.e. n = 0.
take care of these features, we assume the data space UACQ queries can always be interpreted as graph
is represented in terms of a collection of graph pattern (pattern) queries Q = (, z) where is the graph pat-
databases, i.e., graphs where nodes can also be labeled tern underlying Q and z = (z1 , ..., zn ).
with variables, for representing situations in which the The semantics of a UACQ query Q = (, z) when ex-
node identity is not known [1]). ecuted over a graph pattern database π = (N, E) can
be defined as the set of answers which are returned
Definition 1. Let N be a set of node labels. Let for each instance of the graph pattern database. Ac-
Vnode be a set of node variables. Let Σ be an arbitrary cording to [2], we call them certain answers and we
(finite or infinite) set of symbols. A Graph Pattern compute them as:
Database over Σ is a pair π = (N, E) where: (i) N ⊆ certainΣ (Q, π) = {v ∈ N n |π |= [v/z]}
N ∪Vnode is the finite set of nodes; (ii) E ⊆ N ×Σ×N
is the set of edges. 2 where, given two graph patterns π1 and π2 , π1 |= π2
(π1 implies π2 ) if [[π1 ]]Σ ⊆ [[π2 ]]Σ .
The semantics of a graph pattern database corre-
sponds to all graphs which are instances of the pat- 3.2 Profiled Graph Query Patterns
tern and it is defined via homomorphisms. Given a A PGQP represents in a compact way a set of graph
graph G = (N 0 , E 0 ) and a graph pattern database queries executed in the past. In order to provide a for-
π = (N, E), a homomorphism h : π → G is a mapping mal definition of PGQPs, we propose to consider union
h : N → N 0 that maps labels and variables used in π as aggregate operator of the component queries and to
to labels used in G such that: represent a PGQP as a UACQ query. Since PGQPs
1. h1 (n) = n, for every node label n ∈ N ; and are used for the approximate evaluation of queries, the
issue also arises of determining whether each PGQP
2. for every edge (n1 , e, n2 ) ∈ E, there is a path represents, in a reasonably good way, (a portion of) a
(h(n1 ), e, h(n2 )) in G. graph query Q. If this is the case, φ becomes a candi-
date for the PGQP-aware processing of Q and we say
The semantics of a graph pattern w.r.t. the label- that Q matches the PGQP. PGQP candidates can be
ing alphabet Σ is [[π]]Σ = {G over Σ|G |= π}, where selected through approximate subgraph matching [12]
G |= π holds if a homomorphism h : π → G exists. and the usage of a graph similarity function.
Graph Query Language. In this work, for the sake Definition 2. Let Q1 = (N1 , E1 ), Q2 = (N2 , E2 ) ∈
of simplicity and for guaranteeing a tractable com- U ACQ. An Approximate Subgraph Match between Q1
bined and data complexity (fundamental requirements and Q2 is a bijection mapping λ : N10 ↔ N20 , where
for dealing with massive in size graph databases), we Ni0 ⊆ Ni , i = 1, 2. An Approximate Subgraph Match-
consider graph queries specified as unions of acyclic ing Function is a function m that, for each pair of
Algorithm 1 PGQP-based query decomposition Algorithm 2 PGQP set update
INPUT INPUT
Q: graph query Q: graph query
S = {φ1 , φ2 , ..., φn }: set of PGQPs S = {φ1 , φ2 , ..., φn }: set of PGQPs
δm : a graph similarity function induced by an approxi- vc : maximum allowed transformation cost
mate subgraph matching function m OUTPUT
vs : minimum allowed graph similarity S: an updated set of PGQPs
OUTPUT APPROACH
(S⊆ , Qsub ), such that S⊆ ⊆ S and Qsub is a subgraph of 1: Let φh ∈ S such that CT (Q, φh ) = min1≤i≤n CT (Q, φi )
Q 2: if (CT (Q, φh ) ≤ vc ) then
APPROACH 3: φh = T (Q, φh )
1: Let φk ∈ S such that δm (Q, φk ) = max1≤i≤n δm (Q, φi ) 4: S = S \ {φh } ∪ {φh }
2: if δm (Q, φk ) > vs then 5: else
3: return ({Qφk }, Q − Qφk ) 6: S = S ∪ {Q}
4: else 7: return S
5: return ({}, Q)
Qφ and then all remaining non connected nodes be-
Q1 , Q2 ∈ U ACQ, returns an approximate subgraph longing to Qφ ). This consideration guides us in the
match from Q1 to Q2 . A Graph Similarity Function definition of a preliminary decomposition algorithm
is a metric function δ : U ACQ × U ACQ → [0, 1] such (see Algorithm 1). Given a query Q and a set of
that, for each Q1 , Q2 ∈ U ACQ, δ(Q1 , Q2 ) quantifies PGQPs S, the PGQP φk providing the highest simi-
the similarity between Q1 and Q2 . δ is induced by an larity value with respect to Q is selected and, if such
Approximate Subgraph Matching Function m (denoted value is greater than a given threshold (thus, Q and φk
by δm ) if the value assigned to (Q1 , Q2 ) quantifies the are close enough), the binary decomposition described
similarity between subgraphs Q01 and Q02 involved in above is returned. Otherwise, no decomposition is ap-
the match m(Q1 , Q2 ). 2 plied.
PGQPs and PGQP matches can now be defined. 3.4 PGQP Set Update
In order to update the PGQP store, we rely on a
Definition 3. A Profiled Graph Query Pattern φ graph transformation function.
is a graph query in UACQ. Let δm be a graph sim-
ilarity function induced by an approximate subgraph Definition 4. A Graph Transformation Function
matching function m. A graph query Q matches a is a function T : U ACQ × U ACQ → U ACQ such
PGQP φ, with similarity v ∈ (0, 1], if δm (Q, φ) = v. that, for each query Q and PGQP φ, Q is a match of
Qφ denotes the (sub)graph of Q involved in the match. T (Q, φ), based on δm . 2
Q is a total match of φ if Qφ = Q, it is a partial match
otherwise. 2 Transformation functions can be defined in terms
of a small set of operations, previously defined in the
In the following, we assume that function δm is pro- context of the so called tree editing problem [16], which
vided by the system and used during PGQP-based at least contains node relabelling, deletion and inser-
query decomposition and PGQP set update. tion. Each transformation function can be defined
by specifying which operations should be applied to
3.3 PGQP-based Query Decomposition a given PGQP for each node and edge of the query
In the following, we present a preliminary PGQP- Q at hand, in such a way that Q, after the trans-
based query decomposition approach, defined assum- formation, matches the resulting PGQP. An example
ing to: (i) select only one PGQP (thus, |S⊆ | = 1); (ii) of graph transformation is the function that, given a
merge partial results thought relational operators. PGQP φ and a query Q, inserts into φ all nodes and
Several approximate graph matching functions have edges in Q − Qφ that have not been mapped into φ
been proposed so far which can be used as basis for nodes by δm .
defining similarity functions (see, e.g., [9, 10] for some In order to identify the best transformation, a graph
recent papers and [15] for a survey). A relevant exam- cost function can be defined as follows.
ple, due to its flexibility, is represented by the Similar-
ity Flooding function [11]. Based on the intuition that Definition 5. A Graph Cost Function for a trans-
similar nodes have similar neighbors, node similar- formation function T is a function CT : U ACQ ×
ity values are computed relying on information about U ACQ → R, which, for each graph query Q and PGQP
node neighbors through an iterative fixpoint compu- φ, assigns a cost value to T (Q, φ). 2
tation. The selection metric, used in Selection Flood-
ing to identify the best mapping, can be used (after Similarly to the tree editing problem, a graph cost
normalization of values between 0 and 1) as a graph function quantifies the transformation effort by assign-
similarity function. Details about PGQP definition in ing a cost to each applied editing operation. As an ex-
terms of the Similarity Flooding function can be found ample, given a graph query Q and a PGQP φ, cost 1
in [5]. can be assigned to the insertion of each new node and
According to Definition 3, each PGQP φ divides a new edge in φ; a cost proportional to their distance
graph query Q into two subqueries, namely Qφ and can be assigned to each node n in Q which is mapped
Q − Qφ (computed by removing from Q first edges in into a node v 0 in φ by δm .
Algorithm 2 presents a simple approach for PGQP current and should be indeed used for updating the
set update based on the notions introduced above. In PGQP set? What about the tradeoff between the size
particular, it updates the input set of PGQPs by first of the PGQP set, the query processing cost, and the
selecting the PGQP leading to the lowest transforma- effectiveness (in terms of result quality) of the overall
tion cost for the input graph query Q. If such cost is query approach?
lower than a given threshold, the PGQP set is updated
by replacing the selected PGQP with the PGQP ob- 5. REFERENCES
tained by the transformation. Otherwise, the PGQP
[1] P. Barceló. Querying graph databases. In Proc.
set is updated by inserting Q as a new PGQP.
of PODS, pp. 175–188. ACM, 2013.
Example 1. Consider the graph query Q and the [2] P. Barceló, L. Libkin, and J. L. Reutter.
PGQP set S = {φ1 , φ2 } presented in Figure 2. Con- Querying graph patterns. In Proc. of PODS, pp.
sider the graph similarity function δm induced by the 199–210. ACM, 2011.
Similarity Flooding algorithm [11] (see [5] for details), [3] B. Catania et Al. Wearable queries: adapting
a minimum similarity threshold vs = 0.7, and a max- common retrieval needs to data and users. In
imum cost threshold vc = 3. Proc. DBRank Workshop. ACM, 2013.
Q is processed as follows. Algorithm 1 first com- [4] B. Catania and G Guerrini. Approximate
putes the mapping and the similarity between Q and queries with adaptive processing. Advanced
each φi , relying on δm . Partial approximate subgraph Query Processing, Intelligent Systems Reference
matches can be generated from Q to both φ1 and φ2 . Library (36), pp. 237-269. Springer, 2013.
Assume the following graph similarities are computed: [5] B. Catania, F. De Fino, and G. Guerrini.
δm (Q, φ1 ) = 0.83 and δm (Q, φ2 ) = 0.40. PGQP φ1 is Recurring retrieval needs in diverse and
selected because it is the most similar to Q. Since sim- dynamic dataspaces: issues and reference
ilarity is greater than vs , decomposition is applied and framework, extended version. Technical Report,
the pair ({Qφ1 }, Q − Qφ1 ) is returned for processing. University of Genoa, Italy, 2016.
At the end of the processing, Algorithm 2 is used [6] A. Das Sarma, X. Dong, and A. Halevy.
for updating the PGQP store. Suppose that CT (Q − Bootstrapping Pay-As-You-Go Data integration
Qφ1 , φ1 ) = 2.56 and CT (Q − Qφ2 , φ2 ) = 3. PGQP φ1 systems.In Proc. of SIGMOD. ACM, 2008.
is selected because of the lowest transformation cost. [7] W. Fan, X. Wang, and Y. Wu. Answering
Since the transformation cost is lower than vc , PGQP pattern queries using views. IEEE Trans.
φ1 is updated by inserting all nodes and edges in Q − Knowl. Data Eng., 28(2):326–341, 2016.
Qφ1 not mapped into φ1 nodes by δm . 3 [8] F. Goasdoué et Al. View selection in semantic
web databases. PVLDB, 5(2): 97–108, 2011.
4. OPEN ISSUES [9] J. He et Al. Assessing single-pair similarity over
The work reported in this paper represents just the graphs by aggregating first-meeting
first step towards the design of approximate query pro- probabilities. Inf. Syst. 42: 107-122, 2014.
cessing approaches for recurring retrieval needs. Sev- [10] D. Koutra et Al. DeltaCon: Principled
eral issues still need to be investigated. Some of them Massive-Graph Similarity Function with
are discussed in what follows. Attribution. TKDD 10(3): 28, 2016
Framework instantiation. In this paper, we consid- [11] S. Melnik, H. Garcia-Molina, and E. Rahm.
ered a simple notion of approximate subgraph match- Similarity flooding: A versatile graph matching
ing, defined by relaxing the notion of subgraph iso- algorithm and its application to schema
morphism. Other approaches can however be con- matching. In Proc. of ICDE, pp. 117–128. IEEE,
sidered, which allow more flexible types of approxi- 2002.
mation (see, e.g., [15] for a survey). More effective [12] Y. Tian and J.M. Patel. TALE: A Tool for
query decomposition approaches should also be de- Approximate Large Graph Matching. In Proc. of
signed which increase the number of generated sub- ICDE, pp. 963-972, 2008
queries, together with specific optimality criterias; op- [13] J. Wang, N. Ntarmos, and P. Triantafillou.
timization approaches should be developed which take Indexing query graphs to speedup graph query
into account the number and the shape of the identi- processing. In Proc. of EDBT, pages 41–52.
fied subqueries for selecting the most efficient PGQP- 2016.
based execution plan.
[14] G. Weikum. Data and knowledge discovery.
Information associated with PGQPs. As already
GRDI 2020, 2011.
discussed, several types of information can be associ-
[15] Y. Wu. Extending graph homomorphism and
ated with PGQPs. For each type of information, spe-
simulation for real life graph matching. PhD
cific PGQP-based query processing algorithms should
Thesis, 2010.
be devised.
PGQP management. In this paper, we assumed to [16] K. Zhang and D. Shasha. Simple fast algorithms
deal with an initial set of PGQPs and we only prelim- for the editing distance between trees and
inarily discuss how to modify it taking into account a related problems. SIAM Journal on Computing,
new graph query execution. There are however sev- 18(6):1245–1262, 1989.
eral further issues to be taken into account for PGQP [17] P. Zhao, J. Han. On graph query optimization
management, e.g., how to decide that a query is re- in large networks. PVLDB, 3(1): 340-351, 2010.