A Formalism for Graph Databases and its Model of
                     Computation

                                Juan Reutter and Tony Tan

                                  University of Edinburgh

       Abstract. Graph databases are directed graphs in which the edges are labeled
       with symbols from a finite alphabet. In this paper we introduce a logic for such
       graphs in which the domain is the set of edges. We compare its expressiveness
       with the standard logic in which the domain the set of vertices. Furthermore,
       we introduce a robust model of computation for such logic, the so called graph
       pebble automata.


1   Introduction
The study of graph structured data has received much attention lately, due to numerous
applications in areas such as biological networks [12, 15], social networks [17], and the
semantic Web [9]. The common database model proposed by such applications is most
commonly denoted as graph databases, in which nodes are objects, and edge labels
define relationships between those objects [2].
    For querying graph structured data, one normally wishes to specify certain types of
paths between nodes. Most common examples of these queries are conjunctive regu-
lar path queries [1, 14, 6, 3]. Those querying formalisms have been thoroughly studied,
and their algorithmic properties are more or less understood. On the other hand, there
has been much less work devoted on other formalisms other than graph reachability
patterns, say, for example, the integrity constraints such as labels with unique names,
typing constraints on nodes, functional dependencies, domain and range of properties.
See, for instance, the survey [2] for more examples of integrity constraints.
    Our intention is to study formalisms for such graph databases which is capable of
expressing these integrity constraints, while at the same time still feature manageable
model checking properties. Obviously such formalisms depend on how the underlying
directed graph of the databases are represented in the first place.
    The standard representation of directed graphs is simply a set of nodes, together
with a binary relation on these nodes to represent the edge among them. The labeling of
the edges is represented as a function from the edges to the finite alphabet of symbols.
We call such representation the vertex representation.
    Another less common way to represent directed graphs is to take the edges as the
domain, together with some well defined binary relations on these edges to indicate how
two edges intersect. The labeling of the edges is represented as a set of unary predicates
on the domain. We call such representation the edge representation.
    In the first part of our paper we propose a vocabulary for the edge representation,
which we call E-vocabulary. We call V-vocabulary the vocabulary for the vertex repre-
sentation. We study the expressive power of E and compare it to V-vocabulary. In this
respect our contributions are the following.
 – The logic that we propose for edge representation is robust, in the sense that for
   each graph database in the vertex representation, there exists a unique (up to iso-
   morphism) graph database in the edge representation that have the same underlying
   directed graph. Vice versa, for each graph database in the edge representation, there
   exists a unique (up to isomorphism) graph database in the vertex representation that
   have the same underlying directed graph.
 – Next, we turn our attention to expressivity of E-vocabulary. For first-order logic
   (FO) we show that it is equivalent to V-vocabulary. On the other hand, for the
   existential monadic second-order logic (∃MSO), as well as monadic second-order
   logic (MSO), the E-vocabulary is more expressive than the V-vocabulary. That is,
   there are ∃MSO and MSO sentences in E-vocabulary that cannot be expressed in
   sentences in V-vocabulary in ∃MSO and MSO logics, respectively.
    In the second part of our paper we introduce a notion of automata for graph databases.
We follows the direction in [19] by defining pebble automata for directed graphs.
    Pebble automata was initially introduced for words over finite alphabet in [8]. Later
it was extended words over infinite alphabets in [16]. Roughly speaking, a k pebble
automaton, in short k-PA, is a finite state automaton equipped with k pebbles. The
pebbles are placed on/lifted from the input word in the stack discipline – first in last
out – and are intended to mark positions in the input word. One pebble can only mark
one position and the most recently placed pebble serves as the head of the automaton.
The automaton moves from one state to another depending on the equality tests among
data values in the positions currently marked by the pebbles, as well as, the equality
tests among the positions of the pebbles.
    Later in [19] the connection between graphs and pebble automata was initially intro-
duced. The main idea in [19] is that a word of even length over an infinite alphabet can
be viewed as a directed graph, hence pebble automata for words over infinite alphabets
can be viewed as a model of computation for directed graphs.
    In this paper we extend this connection to the case of graph databases, i.e., directed
graphs in which edges are labeled with symbols from a finite alphabet Σ. Some of the
results in this paper are the following.
 1. We define the notion of k pebble graph automata, or in short k-PA, for graph
    databases.
 2. Every first-order sentences of quantifier rank k over graph databases can be simu-
    lated by k-PA.
 3. We demonstrate the robustness of pebble automata by showing the equivalence
    between two-way alternating k-PA and one-way deterministic k-PA. This result
    settles a question raised in [16]. It was first spelled in [19] for words over infinite
    alphabet, but no formal proof has been given until now.
    This robustness immediately implies that the class of families of directed graphs
    captured by k-PA is closed under boolean operations.
We also note that almost all results in [19] can be carried over to the case of graph
databases, including the fact that reachability from the source node s to the target node
t can be checked by k-PA if and only if the distance from s to t is less than or equal
to 2k . This fact, together with item (1) above, yields the fact that reachability can be
expressed by first-order sentence of quantifier rank k if and only if the distance between
source and target nodes is less than or equal to 2k . As the proof is non standard, in the
sense that we do not use the standard Ehrenfeucht-Fraı̈ssé approach which is commonly
used in most finite definability results, it is worth to mention that pebble automata can
be a potentially useful tool to prove definability results in first-order logic over graph
databases.

Related work. Closely related to our work is Courcelles work [5], which appears to be
the first ones that suggest including the graph edges as part of the domain. The results
and definitions here do not follow from [5]. The first reason is that the logic introduced
by Courcelle is essentially two sorted logic. That is, the domain consists of two kinds of
elements: the vertices and the edges. Whereas, the logic that we define here has only the
edges as the domain. Thus, the logic is defined with different vocabulary than ours. The
second reason is that it has not been shown that every structure defined in the vocabulary
in [5] is indeed a directed graph. It is not clear at all in the first place why it is true. We
prove in this paper that indeed such is the case.
    Later on in the paper [10] monadic second-order logic was introduced for abstract
matroids, which are extensions of graphs. It was shown in [10] that many results in [5]
also hold in this setting. However, the emphasis in [10] is decidability issue for satisfac-
tion problem. So naturally it only considers the family of matroids with bounded branch
width, the analog of tree width for graphs. While in our paper we are more interested
in a model of computation for graph databases that feature manageable model checking
properties.
    Another work related to ours is the work in [4]. In that paper two models of com-
putation for directed graphs are introduced, the so called V -automata and E-automata.
In brief, given an input directed graph G, a V -automaton marks the vertices of G with
symbols from finite alphabet. The decision to accept G or not depends on this labeling.
E-automata operate in the same manner, except that they mark the edges, instead of
vertices. It is shown in [4] that V -automata are weaker than E-automata.
    These models, the V - and E-automata, are incomparable to our graph pebble au-
tomata. On one side, E-automata are capable of simulating µ-calculus on directed
graphs, but they are not closed under negation. On the other side, our graph pebble au-
tomata are capable of simulating the whole first-order logic on directed graphs, closed
under all boolean operations.

Organization. This paper is organized as follows. In Section 2 we define the vocabular-
ies V and E. Then in Section 3 we define the notion of structural equivalent, the notion
to compare two structures from V and E logics. In Section 4 we compare the expressive
power between V and E logics. We introduce graph pebble automata in Section 5. We
then extend all previous definitions to the labeled edges graphs in Section 6. Finally we
conclude with a future direction for our work in Section 7.

2   Representation for graph databases
Graph databases are usually defined as finite edge-labeled directed graphs [2]. In this
paper, in order to keep the presentations simple, we shall work only with unlabeled
directed graphs. We will explain how to extend these results for the case of labeled
graphs in Section 6.
    In what follows, we state two representations for graph databases. The first is the
standard one, where a directed graph is just a set of vertices equipped with a binary
relation on the vertices. We will denote its vocabulary by V.
    The second one is our proposed representation for directed graphs where the edges
are the domain. We will denote its vocabulary by E.

The vocabulary V. The vocabulary V simply consists of one binary predicate E. We
denote by STRUCT[V] the set of structures of V, which are simply directed graphs. A
V-structure is a structure in STRUCT[V].
    We will usually write G = (V (G), E(G)) for structures in STRUCT[V], where
V (G) = Dom(G) is the domain and E(G) is the binary relation on the elements in
V (G).
    The atomic formula in the logic V is either x = y or E(x, y). The meaning of
E(x, y) is simply (x, y) ∈ E. The first-order logic FO[V] is obtained by closing the
atomic formulas under the Boolean connectives and first-order quantification over V .
The logic MSO[V], which stands for monadic second-order, is obtained by adding
quantification over unary predicates on the domain. If the unary predicates quantifi-
cations are all existential, then we denote it by ∃MSO[V]. A V-sentence is a sentence
using the vocabulary V. A sentence ϕ defines a set of directed graphs via G(ϕ) := {G |
G |= ϕ}.
    For the sake of presentation, we only consider graphs G ∈ STRUCT[V] in which
there is no isolated vertices and there is no self loop.

The vocabulary E. Intuitively, rather than viewing a directed graph G = (V, E) as a set
V of vertices and E a binary relation on V , we take E as the domain and define some
relations among the elements in E.
    Let u and v be two vertices and e be an edge from u to v. What we mean by the head
of e is the vertex v, while the tail of e is the vertex u. Now the vocabulary E consists
of the binary relations HeadHead, HeadTail and TailTail on the directed edges, where
the intentions of each predicate are as follows.
  – TailTail(e1 , e2 ) means that the tails of e1 and e2 are the same.
  – HeadHead(e1 , e2 ) means that the heads of e1 and e2 are the same.
  – HeadTail(e1 , e2 ) means that the head of e1 is the tail of e2 .
    As above, STRUCT[E] denotes the set of all structures of E and an E-structure is
a structure in STRUCT[E]. We assume that the structures in STRUCT[E] satisfy the
following axioms.
E1. Both HeadHead and TailTail are equivalence relations.
E2. If HeadHead(e1 , e2 ) and HeadTail(e1 , e3 ), then HeadTail(e2 , e3 ).
E3. If TailTail(e1 , e2 ) and HeadTail(e3 , e1 ), then HeadTail(e3 , e2 ).
E4. If HeadTail(e1 , e3 ) and HeadTail(e2 , e3 ), then HeadHead(e1 , e2 ).
E5. If HeadTail(e3 , e1 ) and HeadTail(e3 , e2 ), then TailTail(e1 , e2 ).
E6. If HeadHead(e1 , e2 ) and TailTail(e1 , e2 ), then e1 = e2 .
E7. For all e, ¬HeadTail(e, e).
The purpose of axioms E1–E5 are for consistency, that is, the structures in STRUCT[E]
are really graphs in the ordinary sense of graphs as structures in STRUCT[V]. (See
Proposition 2 below.) Axiom E6 does not allow multiple edges, whereas Axiom E7
does not allow self-loop. Axioms E6 and E7 are not essential, but they will be useful
for our convenience in the presentation.
    As usual, FO[E], MSO[E] and ∃MSO[E] denote the classes of first-order, monadic
second-order and existential monadic second-order sentences in the logic E. An E-
sentence is a sentence using the vocabulary E.
    We will usually write E to denote the elements in STRUCT[E] and Dom(E) to
denote the domain of E. A sentence ϕ in E-logic defines a set of E-structures via

                                   G(ϕ) := {E | E |= ϕ}.


3    The equivalence between edge and vertex representations
In this section we will show that both the edge and the vertex representations essentially
denote the same class of objects.

Definition 1. Let G ∈ STRUCT[V] and G ∈ STRUCT[E]. We say that G and E are
structurally equivalent if there exists a 1-1 mapping ξ : E(G) → Dom(E) such that for
all (v1 , v2 ), (v2 , v3 ), (v1 , v3 ) ∈ E(G) and e1 , e2 ∈ Dom(E),
 1. ξ(v1 , v2 ) = e1 and ξ(v2 , v3 ) = e2 if and only if HeadTail(e1 , e2 );
 2. ξ(v1 , v2 ) = e1 and ξ(v1 , v3 ) = e2 if and only if TailTail(e1 , e2 ); and
 3. ξ(v1 , v3 ) = e1 and ξ(v2 , v3 ) = e2 if and only if HeadHead(e1 , e2 ).
The 1-1 mapping ξ is called a (V, E)-isomorphism.

In other words, if G and E are structurally equivalent, then they essentially denote the
same underlying directed graph. The following proposition states that this notion is
robust.

Proposition 1.
(a) Let G be a V-structure and E 1 , E 2 be E-structures. If both E 1 and E 2 are struc-
    turally equivalent to G, then E 1 and E 2 are isomorphic.
(b) Let G1 , G2 be V-structures and E be a E-structure. If both G1 and G2 are equiva-
    lent to E, then G1 and G2 are isomorphic.

    Moreover, the following proposition shows that both edge and vertex representa-
tions are equivalent, in the sense that each graph stored using the standard vertex repre-
sentation can be coded as a graph under the edge representation, and vice versa.

Proposition 2. 1. For every V-structure G, there exists a unique (up to isomorphism)
    E-structure E which is structurally equivalent to G.
 2. For every E-structure E, there exists a unique (up to isomorphism) V-structure G
    structurally equivalent to E.
    We do not state the full proof, but rather give an example of how the edge to vertex
translation works.

Example 1. Let E be an E-structure, where
    – Dom(E) = {e1 , e2 , e3 };
    – HeadHead = {(e1 , e1 ), (e2 , e2 ), (e3 , e3 ), (e1 , e3 ), (e3 , e1 )};
    – TailTail = {(e1 , e1 ), (e2 , e2 ), (e3 , e3 )};
    – HeadTail = {(e1 , e2 ), (e2 , e3 ), (e3 , e2 )}.
The following picture well illustrates the structure of E:
                                                      e2

                                e1        -                     R
                                              I
                                                      e3

We can get a V-structure G = (V (G), E(G)) equivalent to E as follows. Let H be the
equivalent classes of HeadHead and T the equivalent classes of TailTail, i.e. H =
{{e1 , e3 }, {e2 }} and T = {{e1 }, {e2 }, {e3 }}. Then we define G = (V (G), E(G)) as
follows. The set of vertices is V (G) = H × T , and ((H1 , T1 ), (H2 , T2 )) ∈ E(G) if and
only if T1 ∩ H2 6= ∅. It is depicted as follows.


               (∅, {e1 })
                                          -                     R ({e2 }, {e3 })
                             ({e1 , e3 }, {e2 })I


4      Vertex and edge representations and their logics
In this section we will study the relation between the expressive power of logics us-
ing vertex or edge vocabularies. We need the following definition. For a set A ⊆
STRUCT[V], we define EquivE (A) as the set of E-structures which are equivalent
to the structures in A. Formally,

     EquivE (A) = {E ∈ STRUCT[E] | E is structurally equivalent to some G ∈ A}.

Vice versa, for a set B ⊆ STRUCT[E], we define

     EquivV (B) = {G ∈ STRUCT[V] | G is structurally equivalent to some E ∈ B}.

  By Proposition 1, it is immediate that for every sets A ⊆ STRUCT[V] and B ⊆
STRUCT[E],

                   A = EquivV (EquivE (A)) and B = EquivE (EquivV (B))
From this we immediately get that A = EquivV (B) if and only if B = EquivE (A).
    Now we introduce the notion of (V, E)-equivalent, the logical version of Defin-
tion 1.
Definition 2. A V-sentence ϕ and an E-sentence ψ are (V, E)-equivalent if G(ϕ) =
EquivV (G(ψ)), or equivalently, G(ψ) = EquivE (G(ϕ)).
    Using the notion of (V, E)-equivalent, we can now compare the expressive power
between vertex and edge representations. Our first proposition shows that the edge rep-
resentation is as least as expressive as the vertex representation:
Proposition 3. Let L in {FO, ∃MSO, MSO}. Then, for every sentence ϕ ∈ L[V], there
exists a sentence ψ ∈ L[E] such that ϕ and ψ are (V, E)-equivalent
The proof is pretty straightforward, thus omitted.
    The natural question is whether the converse holds, that is, whether for every sen-
tence using the edge representation we can find an equivalent sentence using the vertex
representation. As we show below, it turns out that this is not true even for ∃MSO
sentences, nor if the full power of MSO is allowed.

Theorem 1. 1. There exists a sentence ψ ∈ ∃MSO[E] such that for all sentence ϕ ∈
    ∃MSO[V], ψ and ϕ are not (V, E)-equivalent.
 2. There exists a sentence ψ ∈ MSO[E] such that for all sentence ϕ ∈ MSO[V], ψ
    and ϕ are not (V, E)-equivalent.

Proof. We begin with the ∃MSO case. The idea is to use the fact that (s, t)-reachability
in directed graph is not expressible in ∃MSO[V] (see, for example, [13, Theorem 7.16]).
    For this we need to add two constants s and t to both V- and E-vocabularies, de-
noting the source and target vertices respectively. The interpretation of the constants s
and t in V-structures are the source and the target vertices, while their interpretation
in E-structures are two edges: one whose tail is the source vertex, and the other whose
head is the target vertex.
    We define the following class of V-structures consists of directed graphs in which
there is a path from s to t.
                                                                                       
                                   there are v1 , . . . , vk s.t. v1 = s and vk = t and
     RV = G ∈ STRUCT[V]
                                   for each i = 1, . . . , k − 1, (vi , vi+1 ) ∈ E(G)

    It can be readily seen that the class EquivE (RV ) is expressible in ∃MSO[E] in the
following sentence. There exists a set P such that

 – there is an edge y in P such that TailTail(y, s) holds;
 – there is an edge y in P such that HeadHead(y, t) holds;
 – for every edge y in P where ¬HeadHead(y, t), there is an edge z in P such that
   HeadTail(y, z) holds.

   This immediately implies that ∃MSO[E] is strictly more expressive than ∃MSO[V].
This proves the first case of the theorem.
    The proof for the second case goes along the same lines, this time using the fact that
directed graph hamiltonicity (i.e., whether a graph is hamiltonian) is not expressible in
MSO[V] (see, for example, [13, Corollary 7.24]). On the other hand, directed graph
hamiltonicity can be expressed in the following MSO[E] sentence. There exists a set U
such that

    – every two edges in U are connected (can be expressed as in the proof above); and
    – for every edge x, x is adjacent to some edge y in U (either HeadHead(x, y),
      TailTail(x, y), or HeadTail(x, y) holds);

                                                                                            2

    Next, we compare the two representations for the case of first-order logic. It turns
out that the edge and vertex representations are equivalent if one disallows second-order
quantification. Moreover, we also show that this transformation involves only a slight
increase in quantifier rank.
Proposition 4. For every sentence ψ ∈ FO[E], there exists a sentence ϕ ∈ FO[V] such
that they are (V, E)-equivalent and qr(ϕ) = 2qr(ψ).
    With respect to the vertex to edge transformation, the following is immediate from
the proof of proposition 3
Corollary 1. For every sentence ϕ ∈ FO[V], there exists a sentence ψ ∈ FO[E] such
that ϕ and ψ are (V, E)-equivalent and qr(ψ) = qr(ϕ) + 1.


5     Graph pebble automata

In this section we define pebble automata for directed graphs. It is based on the idea of
pebble automata (PA) for words over infinite alphabet [16]. Let D be a set of infinite
symbols. We assume that the nodes in the directed graphs always come from D.
    Briefly the way graph PA with k pebbles works as follows. If G is a directed
graph,
     and    (a
               1 , b1 ), . . . , (an , bn ) are the edges in E(G), then we feed a sequence w =
 a1        an
 b1  · · · bn into graph k-PA. The pebbles are numbered from 1 to k. The automaton
starts the computation with only pebble k on the sequence w. The pebbles are placed
on/lifted from w in the stack discipline according to the strict order of the pebbles: Peb-
ble i can be placed only when pebbles i + 1, . . . , k are above the sequence w. Each
pebble is intended to mark one position in w and the smallest numbered pebble on w,
or, equivalently the most recently placed pebble, serves as the head of the automaton.
The automaton moves from one state to another depending on whether the edges read
by the pebbles satisfy the HeadHead, TailTail, HeadTail relations.

Definition 3. A two-way alternating graph k-pebble automaton, (in short graph k-PA)
is a system A = hQ, q0 , F, µi, where

    – Q, q0 ∈ Q, U ⊆ Q and F ⊆ Q are a finite set of states, the initial state, the set of
      universal states and the set of final states, respectively; and
    – µ is a finite set of transitions of the form α → β such that
        • α is of the form
                                         (i, P, V00 , V10 , V01 , V11 , q)
          , where i ∈ {1, . . . , k}, P, V00 , V10 , V01 , V11 ⊆ {i + 1, . . . , k}, and
        • β is of the form (q, act), where q ∈ Q and
                     act ∈ {left, right, place-pebble, lift-pebble}.
       Given a sequence of edges w = ab11 · · · abnn , a configuration of A on /w. is a triple
                                                       

  [i, q, θ], where i ∈ {1, . . . , k}, q ∈ Q and θ : {i, i + 1, . . . , k} → {0, 1, . . . , n, n + 1}.
  The function θ defines the position of the pebbles and is called the pebble assignment.
  The symbols in the positions 0 and n + 1 are / and ., respectively.
       The initial configuration is γ0 = [k, q0 , θ0 ], where θ0 (k) = 0 is the initial pebble
  assignment. A configuration [i, q, θ] with q ∈ F is called an accepting configuration.
       A transition (i, P, V00 , V01 , V10 , V11 , p) → β applies to a configuration [j, q, θ], if
  (1) i = j and p = q,
  (2) P = {l > i : θ(l) = θ(i)},
(3.a) V00 = {l > i : aθ(l) = aθ(i) },
(3.b) V10 = {l > i : bθ(l) = aθ(i) },
(3.c) V10 = {l > i : aθ(l) = bθ(i) }, and
(3.d) V11 = {l > i : bθ(l) = bθ(i) }.
  A transition (i, P, V00 , V01 , V10 , V11 , p) → β applies to a configuration [j, q, θ], if con-
  ditions (1)–(3) above hold.
       We define the transition relation `A as follows: [i, q, θ] `A [i0 , q 0 , θ0 ], if there is a
  transition α → (p, act) ∈ µ that applies to [i, q, θ] such that q 0 = p, for all j > i,
  θ0 (j) = θ(j), and
    - if act = left, then i0 = i and θ0 (i) = θ(i) − 1,
    - if act = right, then i0 = i and θ0 (i) = θ(i) + 1,
    - if act = lift-pebble, then i0 = i + 1,
    - if act = place-pebble, then i0 = i − 1, θ0 (i − 1) = 0 and θ0 (i) = θ(i).
  As usual, we denote the reflexive, transitive closure of `A by `∗A .
     The acceptance criteria is based on the notion of leads to acceptance below. For
  every configuration γ = [i, q, θ],
    – if q ∈ F , then γ leads to acceptance;
    – if q ∈ U , then γ leads to acceptance if and only if for all configurations γ 0 such that
       γ ` γ 0 , γ 0 leads to acceptance;
    – if q ∈/ F ∪ U , then γ leads to acceptance if and only if there is at least one configu-
       ration γ 0 such that γ ` γ 0 , and γ 0 leads to acceptance.
  A sequence of edges ab11 · · · abnn is accepted by A, if the initial configuration γ0 leads
                                       

  to acceptance. The language L(A) consists of all sequence of edges accepted by A. Ob-
  viously, the sequence w induces a set of directed edges Gw as explain in the beginning
  of this section.
      We have presented here the notion of alternating graph PA, since it is easier to
  work with for our purposes. However, it is not difficult to define instead the notion
  of deterministic graph PA. The next theorem shows that this choice is without loss of
  generality, as both models are equivalent.
Theorem 2. 1. For each k ≥ 1, two-way non-deterministic graph k-PA and one-way
    deterministic graph k-PA have the same recognition power.
 2. For each k ≥ 1, graph k-PA languages are closed under boolean operation.

     Next, we introduce the relationship between graph PA and First Order logic.

Theorem 3. For every FO E-sentence ψ, there exists a graph k-PA Aψ such that k =
qr(ψ) and L(A) = G(ψ)).

Proof. The proof is an adaptation of similar result in [19]. First, by Theorem 2, PAk is
closed under boolean operations. Let ϕ = Qxk ψ(xk ) where Q ∈ {∀, ∃} and ψ(xk ) is
of quantifier rank k − 1.
    The proof is by straightforward induction on k. A k-PA A iterates pebble k through
all possible positions in the input. On each iteration, the automaton A recursively calls
a (k − 1)-PA A0 that accepts the language L(ψ(xk )), treating the position of pebble k
as the assignment value for xk .

    - If Q = ∀, then A accepts w if and only if A0 accepts on all iterations.
    - If Q = ∃, then A accepts w if and only if A0 accepts on at least one iteration.

                                                                                        2

    Notice that Theorem 3 is optimal in the sense that all k pebbles are needed. More
precisely, it is possible to adapt the proof of [19] to show that for every k ≥ 2 there
exists an FO E-sentence ψ, with k = qr(ψ), and such that L(A) 6= G(ψ)) for every
graph PA A using less than k pebbles.


6     When the edges are labeled with symbols from finite alphabet

In the usual graph databases setting the edges are labeled with symbols from a fixed
finite alphabet. Each symbol can be viewed as a unary predicate on the edges.
     In this section we extend the vocabularies V and E with unary predicates on the
edges, which we called extended V and E vocabularies. We also extend the definition
of graph pebble automata for edges labeled with symbols from a fixed alphabet.
     In the following we let Σ be a fixed finite alphabet.

Extended V logic. The vocabulary for the extended V logic consists of σ for each
σ ∈ Σ, where each σ is a binary predicate on the domain. We denote by V∗ the extended
V logic.
    An extended V-structure is a tuple G = (V, {σ}σ∈Σ ) such that V is the domain of
nodes and the sets {σ}σ∈Σ are disjoints. Intuitively, each relation σ denotes the set of
edges which are labeled with the symbol σ ∈ Σ. Since no edge can be labeled with two
different symbols, the sets {σ}σ∈Σ are disjoint.
Extended E logic. The vocabulary consists of HeadHead, HeadTail, TailTail, {σ}σ∈Σ ,
where each σ ∈ Σ is unary predicate on the domain. We denote by E∗ the extended E
logic.
    An extended E-structure is a tuple E = (U, HeadHead, HeadTail, TailTail, {σ}σ∈Σ ),
where U is the domain of edges, the relations HeadHead, HeadTail, TailTail on U are
defined as before, and each σ ∈ Σ is a unary predicate on U .
    It is straightforward to show that all results on the vocabularies V and E still hold
for the extended logics V∗ and E∗ . In the following we will elaborate this point more
precisely.

Definition 4. Let G be an V∗ structure and E an E∗ structure. We say that G and E are
structurally equivalent if there exists S a 1-1 mapping ξ : E(G) → Dom(E) such that for
all (v1 , v2 ), (v2 , v3 ), (v1 , v3 ) ∈ σ∈Σ σ and e1 , e2 ∈ Dom(E),

 1. for each σ ∈ Σ, (v1 , v2 ) ∈ σ if and only if ξ(v1 , v2 ) ∈ σ;
 2. ξ(v1 , v2 ) = e1 and ξ(v2 , v3 ) = e2 if and only if HeadTail(e1 , e2 );
 3. ξ(v1 , v2 ) = e1 and ξ(v1 , v3 ) = e2 if and only if TailTail(e1 , e2 ); and
 4. ξ(v1 , v3 ) = e1 and ξ(v2 , v3 ) = e2 if and only if HeadHead(e1 , e2 ).

The 1-1 mapping ξ is called a (V∗ , E∗ )-isomorphism.

Theorem 4. 1. Let L in {FO, ∃MSO, MSO}. Then, for every sentence ϕ ∈ L[V∗ ],
    there exists a sentence ψ ∈ L[E∗ ] such that ϕ and ψ are (V∗ , E∗ )-equivalent
 2. There exists a sentence ψ ∈ ∃MSO[E∗ ] such that for all sentence ϕ ∈ ∃MSO[V∗ ],
    ψ and ϕ are not (V∗ , E∗ )-equivalent.
 3. There exists a sentence ψ ∈ MSO[E∗ ] such that for all sentence ϕ ∈ MSO[V∗ ], ψ
    and ϕ are not (V∗ , E∗ )-equivalent.

     Next we define a graph pebble automata with unary predicates on the edges. It is
also prettymuchstraightforward
                                 extension of Definition 3. In this case the input is of
             σ1          σn
the form:  a1  · · ·  an  ∈ Σ × D × D, where σi ∈ Σ is the label of the edge
             b1          bn
(ai , bi ).
     The transitions are of the form: (i, σ, P, V00 , V10 , V01 , V11 , p) → (q, act). It is
straightforward to show that all the results in the previous section can be adapted for
such graph pebble automata. More precisely,

Theorem 5. 1. For PA with unary predicates, for each k ≥ 1, two-way non-deterministic
    graph k-PA and one-way deterministic graph k-PA have the same recognition
    power.
 2. For each k ≥ 1, graph k-PA (with unary predicates) languages are closed under
    boolean operation.
 3. For every FO E∗ -sentence ψ, there exists a graph k-PA Aψ with unary predicates
    such that k = qr(ψ) and L(A) = G(ψ)).
7    Future directions
We would like to apply our logics and graph pebble automata in a more application
oriented settings. Also, it is well known that the emptiness problem for graph pebble
automata is undecidable. One direction that we would like to pursue is to charaterize
a subclass of pebble automata, for which the emptiness problem is decidable. We also
would like to define and study similar logics for matroid and extend the graph pebble
automata for abstract matroid.
Acknowledgments: We thank the anonymous referees for many helpful comments. Partial sup-
port provided by EPSRC grant G049165 and FET-Open Project FoX, grant agreement 233599.

References
1. S. Abiteboul, P. Buneman, D. Suciu. Data on the Web: From Relations to Semistructured Data
   and XML. Morgan Kauffman, 1999.
2. R. Angles, C. Gutiérrez. Survey of graph database models. ACM Comput. Surv. 40(1): (2008).
3. P. Barceló, C. Hurtado, L. Libkin, P. Wood. Expressive languages for path queries over graph-
   structured data. In PODS 2010.
4. D. Berwanger, D. Janin. Automata on Directed Graphs: Edge Versus Vertex Marking. In
   ICGT 2006.
5. B. Courcelle. The expression of graph properties and graph transformations in monadic
   second-order logic. In Handbook of Graph Grammars and Computing by Graph Transfor-
   mations, Volume 1: Foundations, 1997.
6. I. Cruz, A. Mendelzon, P. Wood. A graphical query language supporting recursion. In SIG-
   MOD 1987.
7. R. Fagin, L. J. Stockmeyer, and M. Y. Vardi. On monadic NP vs. monadic co-NP. Info. and
   Comp., 120(1):78–92, 1995.
8. N. Globerman and D. Harel. Complexity results for multi-pebble automata and their logics.
   In ICALP 1994.
9. C. Gutierrez, C. Hurtado, A. Mendelzon. Foundations of semantic web databases. In PODS
   2004.
10. P. Hliněný. Branch-width, parse trees, and monadic second-order logic for matroids. J.
   Comb. Theory, Ser. B 96(3): 325–351 (2006)
11. R. Ladner, R. Lipton and L. Stockmeyer. Alternating Pushdown and Stack Automata. SIAM
   Journal of Comp. 13(1): 135–155, 1984.
12. U. Leser. A query language for biological networks. Bioinformatics 21 (suppl 2) (2005),
   ii33–ii39.
13. L. Libkin. Elements of Finite Model Theory. Springer, 2004.
14. A. O. Mendelzon, P. T. Wood. Finding regular simple paths in graph databases. SIAM J.
   Comput., 24(6):1235–1258, 1995.
15. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon. Network motifs:
   simple building blocks of complex networks. Science 298(5594) (2002), 824–827.
16. F. Neven, T. Schwentick, and V. Vianu. Finite state machines for strings over infinite alpha-
   bets. ACM ToCL, 5(3):403–435, 2004.
17. R. Ronen and O. Shmueli. SoQL: a language for querying and creating data in social net-
   works. In ICDE 2009.
18. T. Schwentick. On Winning Ehrenfeucht Games and Monadic NP. Ann. Pure Appl. Logic,
   79(1), 61–92, 1996.
19. T. Tan. Graph reachability and pebble automata over infinite alphabets. In LICS 2009.
20. G. Turán. On the definability of properties of finite graphs. Discrete Mathematics,
   49(3):291–302, 1984.