=Paper=
{{Paper
|id=Vol-1189/paper_24
|storemode=property
|title=Querying Incomplete Graphs with Data
|pdfUrl=https://ceur-ws.org/Vol-1189/paper_24.pdf
|volume=Vol-1189
|dblpUrl=https://dblp.org/rec/conf/amw/GheerbrantF14
}}
==Querying Incomplete Graphs with Data==
<pdf width="1500px">https://ceur-ws.org/Vol-1189/paper_24.pdf</pdf>
<pre>
         Querying incomplete graphs with data

                      Gaëlle Fontaine1 and Amélie Gheerbrant2
              1
                  Department of Computer Science, Universidad de Chile
                  2
                   LIAFA (Université Paris Diderot - Paris 7 & CNRS)


1    Introduction

Graph databases underlie several modern applications such as social networks
and the Semantic Web. In those scenarios, integrating and exchanging data is
very common, which leads to proliferation of incomplete graph data. However,
the well developed models of incompleteness of data do not apply to graph data.
This is mainly due to the fact that standard graph query languages concentrate
on graph topology; this requires functionalities beyond the abilities of standard
relational systems. Besides, many graph languages ignore the actual data stored.
    However, recently languages combining data and topology aspects of querying
have been proposed for graph databases. An example is a query Find pairs of
people in a social network connected by professional links restricted to people
of the same age). Formalisms developed to handle such queries include regular
expressions with memory (REM), regular expressions with equalities (REE) [5],
their extensions [1], as well as variants of XPath [4].
    Handling incompleteness by languages dealing with pure graph topology has
been studied in [2]. In this short note, we present preliminary results on dealing
with incompleteness at the levels of both data and topology, using some of the
recently proposed query languages.


2    Preliminaries

Let Σ be a finite alphabet Σ, let N be a countable set of node ids and let D
be an infinite alphabet of data values. A data graph G (over Σ, N and D) is a
tuple (V, E, ρ), where V ⊆ N is a finite set of nodes, E ⊆ V × Σ × V is a set of
Σ-labeled edges, and ρ : V → D assigns a data item to each node. A path π in G
is a sequence v0 a0 v1 a1 v2 · · · vk−1 ak−1 vk such that (vi−1 , ai−1 , vi ) ∈ E, for each
i ≤ k. The data path associated with π is the word ρ(v0 )a0 ρ(v1 )a1 · · · ak−1 ρ(vk ).
    The regular expressions with equality (REE) [5] are defined by the grammar

                        e ::=  | a | e.e | e ∪ e | e+ | | e= | e6= ,

where a ranges over Σ. Given a REE e, L(e) is defined by induction by the
following rules
L() = {d : d ∈ D},                 L(e= ) = {d1 a1 . . . dn1 an−1 dn ∈ L(e) : d1 = dn }
L(a) = {dad0 : d, d0 ∈ D},          L(e6= ) = {d1 a1 . . . dn1 an−1 dn ∈ L(e) : d1 6= dn },
and the standard rules for L(e.e0 ), L(e ∪ e0 ) and L(e+ ). REE expressions without
6= are called positive. The evaluation e(G) of an REE e on a data graph G gives
 pairs of nodes (u, v) such that there is a data path between them that belongs
 to L(e). The combined complexity of REEs is Ptime, while the data complexity
 is in Nlogspace. REEs are a subclass of a more expressive class REM whose
 data complexity remains the same but combined complexity is in PSpace [5].
     Let Vdata be an infinite set of label variables, let Vnode be an infinite set
 of nodes variables and let Vlabel be an infinite set of data variables. A data
 graph pattern (over Σ, N and D) is a data graph over the alphabets Σ ∪ Vlabel ,
 N ∪ Vnode and D ∪ Vdata . That is, a data graph pattern is a data graph with
 constant nodes and node variables, whose edges can be labeled with variables
 and elements of Σ, and in which the data values are taken from D ∪ Vdata .
     Given a data graph pattern G = (V, E, ρ) and a data graph G0 = (V 0 , E 0 , ρ0 ),
 a homomorphism is a triple (h1 , h2 , h3 ) of mappings h1 : V → V 0 , h2 that maps
 label variables in G to letters in Σ, h3 that maps data variables in G to data
 values in D such that

 – h1 (n) = n for all n ∈ N ,
 – for every edge (p, x, p0 ) ∈ E, the edge (h1 (p), h2 (x), h1 (p0 )) belongs to E 0 ,
 – for every pair (p, x) ∈ ρ, the pair (h1 (p), h3 (x)) belongs to ρ0 .

The data graph G0 is an homomorphic image of the data graph pattern G if
V 0 = h1 (V ).
    Given a data graph pattern G and a REE e, the certain answers of e over G
are defined as 3
                 \
                   {e(G0 ) : G0 is an homomorphic image of G}.


3     Evaluation of REEs over incomplete graphs

Theorem 1. The combined and data complexities of finding certain answers of
REEs over incomplete graph databases are coNP-complete. This remains true
for incomplete graphs without label variables.

    Membership in coNP follows from tractability of REEs. Hardness is by re-
duction to Positive 1-in-3 SAT. The proof extends to show that data complexity
of finding certain answers of REMs is coNP-complete, and their combined com-
plexity is PSpace-complete.
    Given a formula φ in CNF with 3 variables in each clause, we design an
incomplete graph data such that an homomorphic images with data in {0, 1}
corresponds to a valuation of the variables in φ. We define a REE e such that e
is false in an homomorphic image G0 iff the data of G0 belong to {0, 1} and the
3
    The reader familiar with incompleteness may notice that we adopted the closed
    world semantics. Given the definition of REE (and also of REM), it is easy to see
    that the open world semantics and the closed world semantics coincide.
valuation associated with G0 makes exactly one variable true in each clause of
φ.
    For positive REE, we can obtain a tractable algorithm by using a so-called
naive evaluation, cf. [3]. That is, we directly evaluate the REE over the incom-
plete data graph, treating the label variables as regular labels and the data
variables as regular data.

Proposition 1. The combined complexity of finding the certain answers of pos-
itive REEs over incomplete graph databases is tractable.

    We can also show, using naive evaluation, that data complexity of certain
answers of positive REMs is tractable. The exact combined complexity though
is open.


4   Future work

Many questions are left open. Instead of REM and REE, we could investigate
incompleteness for other languages designed to query graph databases. Natu-
ral candidates would be versions of XPath for graph databases [4] and register
logic [1]. Another direction of research is to consider a more general notion of
incomplete data graph (as in [2]), where we also allow the labels of the edges to
be regular expressions, as opposed to just label variables.


References
 1. P. Barceló, G. Fontaine and A. Widjaja Lin. Expressive Path Queries on Graphs
    with Data. In LPAR 2013, pages 71–85.
 2. P. Barceló, L. Libkin and J. L. Reutter. Querying graph patterns. In PODS 2011,
    pages 199–200.
 3. T. Imielinski and W. Lipski. Incomplete information in relational databases. J.
    ACM, 31(4):761–791, 1984.
 4. L. Libkin, W. Martens and D. Vrgoc. Querying graph databases with XPath. In
    ICDT 2013, pages 129–140.
 5. L. Libkin and D. Vrgoc. Regular path queries on graphs with data. In ICDT 2012,
    pages 74 – 85.

</pre>