A Formalism for Graph Databases and its Model of Computation Juan Reutter and Tony Tan University of Edinburgh Abstract. Graph databases are directed graphs in which the edges are labeled with symbols from a finite alphabet. In this paper we introduce a logic for such graphs in which the domain is the set of edges. We compare its expressiveness with the standard logic in which the domain the set of vertices. Furthermore, we introduce a robust model of computation for such logic, the so called graph pebble automata. 1 Introduction The study of graph structured data has received much attention lately, due to numerous applications in areas such as biological networks [12, 15], social networks [17], and the semantic Web [9]. The common database model proposed by such applications is most commonly denoted as graph databases, in which nodes are objects, and edge labels define relationships between those objects [2]. For querying graph structured data, one normally wishes to specify certain types of paths between nodes. Most common examples of these queries are conjunctive regu- lar path queries [1, 14, 6, 3]. Those querying formalisms have been thoroughly studied, and their algorithmic properties are more or less understood. On the other hand, there has been much less work devoted on other formalisms other than graph reachability patterns, say, for example, the integrity constraints such as labels with unique names, typing constraints on nodes, functional dependencies, domain and range of properties. See, for instance, the survey [2] for more examples of integrity constraints. Our intention is to study formalisms for such graph databases which is capable of expressing these integrity constraints, while at the same time still feature manageable model checking properties. Obviously such formalisms depend on how the underlying directed graph of the databases are represented in the first place. The standard representation of directed graphs is simply a set of nodes, together with a binary relation on these nodes to represent the edge among them. The labeling of the edges is represented as a function from the edges to the finite alphabet of symbols. We call such representation the vertex representation. Another less common way to represent directed graphs is to take the edges as the domain, together with some well defined binary relations on these edges to indicate how two edges intersect. The labeling of the edges is represented as a set of unary predicates on the domain. We call such representation the edge representation. In the first part of our paper we propose a vocabulary for the edge representation, which we call E-vocabulary. We call V-vocabulary the vocabulary for the vertex repre- sentation. We study the expressive power of E and compare it to V-vocabulary. In this respect our contributions are the following. – The logic that we propose for edge representation is robust, in the sense that for each graph database in the vertex representation, there exists a unique (up to iso- morphism) graph database in the edge representation that have the same underlying directed graph. Vice versa, for each graph database in the edge representation, there exists a unique (up to isomorphism) graph database in the vertex representation that have the same underlying directed graph. – Next, we turn our attention to expressivity of E-vocabulary. For first-order logic (FO) we show that it is equivalent to V-vocabulary. On the other hand, for the existential monadic second-order logic (∃MSO), as well as monadic second-order logic (MSO), the E-vocabulary is more expressive than the V-vocabulary. That is, there are ∃MSO and MSO sentences in E-vocabulary that cannot be expressed in sentences in V-vocabulary in ∃MSO and MSO logics, respectively. In the second part of our paper we introduce a notion of automata for graph databases. We follows the direction in [19] by defining pebble automata for directed graphs. Pebble automata was initially introduced for words over finite alphabet in [8]. Later it was extended words over infinite alphabets in [16]. Roughly speaking, a k pebble automaton, in short k-PA, is a finite state automaton equipped with k pebbles. The pebbles are placed on/lifted from the input word in the stack discipline – first in last out – and are intended to mark positions in the input word. One pebble can only mark one position and the most recently placed pebble serves as the head of the automaton. The automaton moves from one state to another depending on the equality tests among data values in the positions currently marked by the pebbles, as well as, the equality tests among the positions of the pebbles. Later in [19] the connection between graphs and pebble automata was initially intro- duced. The main idea in [19] is that a word of even length over an infinite alphabet can be viewed as a directed graph, hence pebble automata for words over infinite alphabets can be viewed as a model of computation for directed graphs. In this paper we extend this connection to the case of graph databases, i.e., directed graphs in which edges are labeled with symbols from a finite alphabet Σ. Some of the results in this paper are the following. 1. We define the notion of k pebble graph automata, or in short k-PA, for graph databases. 2. Every first-order sentences of quantifier rank k over graph databases can be simu- lated by k-PA. 3. We demonstrate the robustness of pebble automata by showing the equivalence between two-way alternating k-PA and one-way deterministic k-PA. This result settles a question raised in [16]. It was first spelled in [19] for words over infinite alphabet, but no formal proof has been given until now. This robustness immediately implies that the class of families of directed graphs captured by k-PA is closed under boolean operations. We also note that almost all results in [19] can be carried over to the case of graph databases, including the fact that reachability from the source node s to the target node t can be checked by k-PA if and only if the distance from s to t is less than or equal to 2k . This fact, together with item (1) above, yields the fact that reachability can be expressed by first-order sentence of quantifier rank k if and only if the distance between source and target nodes is less than or equal to 2k . As the proof is non standard, in the sense that we do not use the standard Ehrenfeucht-Fraı̈ssé approach which is commonly used in most finite definability results, it is worth to mention that pebble automata can be a potentially useful tool to prove definability results in first-order logic over graph databases. Related work. Closely related to our work is Courcelles work [5], which appears to be the first ones that suggest including the graph edges as part of the domain. The results and definitions here do not follow from [5]. The first reason is that the logic introduced by Courcelle is essentially two sorted logic. That is, the domain consists of two kinds of elements: the vertices and the edges. Whereas, the logic that we define here has only the edges as the domain. Thus, the logic is defined with different vocabulary than ours. The second reason is that it has not been shown that every structure defined in the vocabulary in [5] is indeed a directed graph. It is not clear at all in the first place why it is true. We prove in this paper that indeed such is the case. Later on in the paper [10] monadic second-order logic was introduced for abstract matroids, which are extensions of graphs. It was shown in [10] that many results in [5] also hold in this setting. However, the emphasis in [10] is decidability issue for satisfac- tion problem. So naturally it only considers the family of matroids with bounded branch width, the analog of tree width for graphs. While in our paper we are more interested in a model of computation for graph databases that feature manageable model checking properties. Another work related to ours is the work in [4]. In that paper two models of com- putation for directed graphs are introduced, the so called V -automata and E-automata. In brief, given an input directed graph G, a V -automaton marks the vertices of G with symbols from finite alphabet. The decision to accept G or not depends on this labeling. E-automata operate in the same manner, except that they mark the edges, instead of vertices. It is shown in [4] that V -automata are weaker than E-automata. These models, the V - and E-automata, are incomparable to our graph pebble au- tomata. On one side, E-automata are capable of simulating µ-calculus on directed graphs, but they are not closed under negation. On the other side, our graph pebble au- tomata are capable of simulating the whole first-order logic on directed graphs, closed under all boolean operations. Organization. This paper is organized as follows. In Section 2 we define the vocabular- ies V and E. Then in Section 3 we define the notion of structural equivalent, the notion to compare two structures from V and E logics. In Section 4 we compare the expressive power between V and E logics. We introduce graph pebble automata in Section 5. We then extend all previous definitions to the labeled edges graphs in Section 6. Finally we conclude with a future direction for our work in Section 7. 2 Representation for graph databases Graph databases are usually defined as finite edge-labeled directed graphs [2]. In this paper, in order to keep the presentations simple, we shall work only with unlabeled directed graphs. We will explain how to extend these results for the case of labeled graphs in Section 6. In what follows, we state two representations for graph databases. The first is the standard one, where a directed graph is just a set of vertices equipped with a binary relation on the vertices. We will denote its vocabulary by V. The second one is our proposed representation for directed graphs where the edges are the domain. We will denote its vocabulary by E. The vocabulary V. The vocabulary V simply consists of one binary predicate E. We denote by STRUCT[V] the set of structures of V, which are simply directed graphs. A V-structure is a structure in STRUCT[V]. We will usually write G = (V (G), E(G)) for structures in STRUCT[V], where V (G) = Dom(G) is the domain and E(G) is the binary relation on the elements in V (G). The atomic formula in the logic V is either x = y or E(x, y). The meaning of E(x, y) is simply (x, y) ∈ E. The first-order logic FO[V] is obtained by closing the atomic formulas under the Boolean connectives and first-order quantification over V . The logic MSO[V], which stands for monadic second-order, is obtained by adding quantification over unary predicates on the domain. If the unary predicates quantifi- cations are all existential, then we denote it by ∃MSO[V]. A V-sentence is a sentence using the vocabulary V. A sentence ϕ defines a set of directed graphs via G(ϕ) := {G | G |= ϕ}. For the sake of presentation, we only consider graphs G ∈ STRUCT[V] in which there is no isolated vertices and there is no self loop. The vocabulary E. Intuitively, rather than viewing a directed graph G = (V, E) as a set V of vertices and E a binary relation on V , we take E as the domain and define some relations among the elements in E. Let u and v be two vertices and e be an edge from u to v. What we mean by the head of e is the vertex v, while the tail of e is the vertex u. Now the vocabulary E consists of the binary relations HeadHead, HeadTail and TailTail on the directed edges, where the intentions of each predicate are as follows. – TailTail(e1 , e2 ) means that the tails of e1 and e2 are the same. – HeadHead(e1 , e2 ) means that the heads of e1 and e2 are the same. – HeadTail(e1 , e2 ) means that the head of e1 is the tail of e2 . As above, STRUCT[E] denotes the set of all structures of E and an E-structure is a structure in STRUCT[E]. We assume that the structures in STRUCT[E] satisfy the following axioms. E1. Both HeadHead and TailTail are equivalence relations. E2. If HeadHead(e1 , e2 ) and HeadTail(e1 , e3 ), then HeadTail(e2 , e3 ). E3. If TailTail(e1 , e2 ) and HeadTail(e3 , e1 ), then HeadTail(e3 , e2 ). E4. If HeadTail(e1 , e3 ) and HeadTail(e2 , e3 ), then HeadHead(e1 , e2 ). E5. If HeadTail(e3 , e1 ) and HeadTail(e3 , e2 ), then TailTail(e1 , e2 ). E6. If HeadHead(e1 , e2 ) and TailTail(e1 , e2 ), then e1 = e2 . E7. For all e, ¬HeadTail(e, e). The purpose of axioms E1–E5 are for consistency, that is, the structures in STRUCT[E] are really graphs in the ordinary sense of graphs as structures in STRUCT[V]. (See Proposition 2 below.) Axiom E6 does not allow multiple edges, whereas Axiom E7 does not allow self-loop. Axioms E6 and E7 are not essential, but they will be useful for our convenience in the presentation. As usual, FO[E], MSO[E] and ∃MSO[E] denote the classes of first-order, monadic second-order and existential monadic second-order sentences in the logic E. An E- sentence is a sentence using the vocabulary E. We will usually write E to denote the elements in STRUCT[E] and Dom(E) to denote the domain of E. A sentence ϕ in E-logic defines a set of E-structures via G(ϕ) := {E | E |= ϕ}. 3 The equivalence between edge and vertex representations In this section we will show that both the edge and the vertex representations essentially denote the same class of objects. Definition 1. Let G ∈ STRUCT[V] and G ∈ STRUCT[E]. We say that G and E are structurally equivalent if there exists a 1-1 mapping ξ : E(G) → Dom(E) such that for all (v1 , v2 ), (v2 , v3 ), (v1 , v3 ) ∈ E(G) and e1 , e2 ∈ Dom(E), 1. ξ(v1 , v2 ) = e1 and ξ(v2 , v3 ) = e2 if and only if HeadTail(e1 , e2 ); 2. ξ(v1 , v2 ) = e1 and ξ(v1 , v3 ) = e2 if and only if TailTail(e1 , e2 ); and 3. ξ(v1 , v3 ) = e1 and ξ(v2 , v3 ) = e2 if and only if HeadHead(e1 , e2 ). The 1-1 mapping ξ is called a (V, E)-isomorphism. In other words, if G and E are structurally equivalent, then they essentially denote the same underlying directed graph. The following proposition states that this notion is robust. Proposition 1. (a) Let G be a V-structure and E 1 , E 2 be E-structures. If both E 1 and E 2 are struc- turally equivalent to G, then E 1 and E 2 are isomorphic. (b) Let G1 , G2 be V-structures and E be a E-structure. If both G1 and G2 are equiva- lent to E, then G1 and G2 are isomorphic. Moreover, the following proposition shows that both edge and vertex representa- tions are equivalent, in the sense that each graph stored using the standard vertex repre- sentation can be coded as a graph under the edge representation, and vice versa. Proposition 2. 1. For every V-structure G, there exists a unique (up to isomorphism) E-structure E which is structurally equivalent to G. 2. For every E-structure E, there exists a unique (up to isomorphism) V-structure G structurally equivalent to E. We do not state the full proof, but rather give an example of how the edge to vertex translation works. Example 1. Let E be an E-structure, where – Dom(E) = {e1 , e2 , e3 }; – HeadHead = {(e1 , e1 ), (e2 , e2 ), (e3 , e3 ), (e1 , e3 ), (e3 , e1 )}; – TailTail = {(e1 , e1 ), (e2 , e2 ), (e3 , e3 )}; – HeadTail = {(e1 , e2 ), (e2 , e3 ), (e3 , e2 )}. The following picture well illustrates the structure of E: e2 e1 - R I e3 We can get a V-structure G = (V (G), E(G)) equivalent to E as follows. Let H be the equivalent classes of HeadHead and T the equivalent classes of TailTail, i.e. H = {{e1 , e3 }, {e2 }} and T = {{e1 }, {e2 }, {e3 }}. Then we define G = (V (G), E(G)) as follows. The set of vertices is V (G) = H × T , and ((H1 , T1 ), (H2 , T2 )) ∈ E(G) if and only if T1 ∩ H2 6= ∅. It is depicted as follows. (∅, {e1 }) - R ({e2 }, {e3 }) ({e1 , e3 }, {e2 })I 4 Vertex and edge representations and their logics In this section we will study the relation between the expressive power of logics us- ing vertex or edge vocabularies. We need the following definition. For a set A ⊆ STRUCT[V], we define EquivE (A) as the set of E-structures which are equivalent to the structures in A. Formally, EquivE (A) = {E ∈ STRUCT[E] | E is structurally equivalent to some G ∈ A}. Vice versa, for a set B ⊆ STRUCT[E], we define EquivV (B) = {G ∈ STRUCT[V] | G is structurally equivalent to some E ∈ B}. By Proposition 1, it is immediate that for every sets A ⊆ STRUCT[V] and B ⊆ STRUCT[E], A = EquivV (EquivE (A)) and B = EquivE (EquivV (B)) From this we immediately get that A = EquivV (B) if and only if B = EquivE (A). Now we introduce the notion of (V, E)-equivalent, the logical version of Defin- tion 1. Definition 2. A V-sentence ϕ and an E-sentence ψ are (V, E)-equivalent if G(ϕ) = EquivV (G(ψ)), or equivalently, G(ψ) = EquivE (G(ϕ)). Using the notion of (V, E)-equivalent, we can now compare the expressive power between vertex and edge representations. Our first proposition shows that the edge rep- resentation is as least as expressive as the vertex representation: Proposition 3. Let L in {FO, ∃MSO, MSO}. Then, for every sentence ϕ ∈ L[V], there exists a sentence ψ ∈ L[E] such that ϕ and ψ are (V, E)-equivalent The proof is pretty straightforward, thus omitted. The natural question is whether the converse holds, that is, whether for every sen- tence using the edge representation we can find an equivalent sentence using the vertex representation. As we show below, it turns out that this is not true even for ∃MSO sentences, nor if the full power of MSO is allowed. Theorem 1. 1. There exists a sentence ψ ∈ ∃MSO[E] such that for all sentence ϕ ∈ ∃MSO[V], ψ and ϕ are not (V, E)-equivalent. 2. There exists a sentence ψ ∈ MSO[E] such that for all sentence ϕ ∈ MSO[V], ψ and ϕ are not (V, E)-equivalent. Proof. We begin with the ∃MSO case. The idea is to use the fact that (s, t)-reachability in directed graph is not expressible in ∃MSO[V] (see, for example, [13, Theorem 7.16]). For this we need to add two constants s and t to both V- and E-vocabularies, de- noting the source and target vertices respectively. The interpretation of the constants s and t in V-structures are the source and the target vertices, while their interpretation in E-structures are two edges: one whose tail is the source vertex, and the other whose head is the target vertex. We define the following class of V-structures consists of directed graphs in which there is a path from s to t.   there are v1 , . . . , vk s.t. v1 = s and vk = t and RV = G ∈ STRUCT[V] for each i = 1, . . . , k − 1, (vi , vi+1 ) ∈ E(G) It can be readily seen that the class EquivE (RV ) is expressible in ∃MSO[E] in the following sentence. There exists a set P such that – there is an edge y in P such that TailTail(y, s) holds; – there is an edge y in P such that HeadHead(y, t) holds; – for every edge y in P where ¬HeadHead(y, t), there is an edge z in P such that HeadTail(y, z) holds. This immediately implies that ∃MSO[E] is strictly more expressive than ∃MSO[V]. This proves the first case of the theorem. The proof for the second case goes along the same lines, this time using the fact that directed graph hamiltonicity (i.e., whether a graph is hamiltonian) is not expressible in MSO[V] (see, for example, [13, Corollary 7.24]). On the other hand, directed graph hamiltonicity can be expressed in the following MSO[E] sentence. There exists a set U such that – every two edges in U are connected (can be expressed as in the proof above); and – for every edge x, x is adjacent to some edge y in U (either HeadHead(x, y), TailTail(x, y), or HeadTail(x, y) holds); 2 Next, we compare the two representations for the case of first-order logic. It turns out that the edge and vertex representations are equivalent if one disallows second-order quantification. Moreover, we also show that this transformation involves only a slight increase in quantifier rank. Proposition 4. For every sentence ψ ∈ FO[E], there exists a sentence ϕ ∈ FO[V] such that they are (V, E)-equivalent and qr(ϕ) = 2qr(ψ). With respect to the vertex to edge transformation, the following is immediate from the proof of proposition 3 Corollary 1. For every sentence ϕ ∈ FO[V], there exists a sentence ψ ∈ FO[E] such that ϕ and ψ are (V, E)-equivalent and qr(ψ) = qr(ϕ) + 1. 5 Graph pebble automata In this section we define pebble automata for directed graphs. It is based on the idea of pebble automata (PA) for words over infinite alphabet [16]. Let D be a set of infinite symbols. We assume that the nodes in the directed graphs always come from D. Briefly the way graph PA with k pebbles works as follows. If G is a directed graph,  and (a  1 , b1 ), . . . , (an , bn ) are the edges in E(G), then we feed a sequence w = a1 an b1 · · · bn into graph k-PA. The pebbles are numbered from 1 to k. The automaton starts the computation with only pebble k on the sequence w. The pebbles are placed on/lifted from w in the stack discipline according to the strict order of the pebbles: Peb- ble i can be placed only when pebbles i + 1, . . . , k are above the sequence w. Each pebble is intended to mark one position in w and the smallest numbered pebble on w, or, equivalently the most recently placed pebble, serves as the head of the automaton. The automaton moves from one state to another depending on whether the edges read by the pebbles satisfy the HeadHead, TailTail, HeadTail relations. Definition 3. A two-way alternating graph k-pebble automaton, (in short graph k-PA) is a system A = hQ, q0 , F, µi, where – Q, q0 ∈ Q, U ⊆ Q and F ⊆ Q are a finite set of states, the initial state, the set of universal states and the set of final states, respectively; and – µ is a finite set of transitions of the form α → β such that • α is of the form (i, P, V00 , V10 , V01 , V11 , q) , where i ∈ {1, . . . , k}, P, V00 , V10 , V01 , V11 ⊆ {i + 1, . . . , k}, and • β is of the form (q, act), where q ∈ Q and act ∈ {left, right, place-pebble, lift-pebble}. Given a sequence of edges w = ab11 · · · abnn , a configuration of A on /w. is a triple   [i, q, θ], where i ∈ {1, . . . , k}, q ∈ Q and θ : {i, i + 1, . . . , k} → {0, 1, . . . , n, n + 1}. The function θ defines the position of the pebbles and is called the pebble assignment. The symbols in the positions 0 and n + 1 are / and ., respectively. The initial configuration is γ0 = [k, q0 , θ0 ], where θ0 (k) = 0 is the initial pebble assignment. A configuration [i, q, θ] with q ∈ F is called an accepting configuration. A transition (i, P, V00 , V01 , V10 , V11 , p) → β applies to a configuration [j, q, θ], if (1) i = j and p = q, (2) P = {l > i : θ(l) = θ(i)}, (3.a) V00 = {l > i : aθ(l) = aθ(i) }, (3.b) V10 = {l > i : bθ(l) = aθ(i) }, (3.c) V10 = {l > i : aθ(l) = bθ(i) }, and (3.d) V11 = {l > i : bθ(l) = bθ(i) }. A transition (i, P, V00 , V01 , V10 , V11 , p) → β applies to a configuration [j, q, θ], if con- ditions (1)–(3) above hold. We define the transition relation `A as follows: [i, q, θ] `A [i0 , q 0 , θ0 ], if there is a transition α → (p, act) ∈ µ that applies to [i, q, θ] such that q 0 = p, for all j > i, θ0 (j) = θ(j), and - if act = left, then i0 = i and θ0 (i) = θ(i) − 1, - if act = right, then i0 = i and θ0 (i) = θ(i) + 1, - if act = lift-pebble, then i0 = i + 1, - if act = place-pebble, then i0 = i − 1, θ0 (i − 1) = 0 and θ0 (i) = θ(i). As usual, we denote the reflexive, transitive closure of `A by `∗A . The acceptance criteria is based on the notion of leads to acceptance below. For every configuration γ = [i, q, θ], – if q ∈ F , then γ leads to acceptance; – if q ∈ U , then γ leads to acceptance if and only if for all configurations γ 0 such that γ ` γ 0 , γ 0 leads to acceptance; – if q ∈/ F ∪ U , then γ leads to acceptance if and only if there is at least one configu- ration γ 0 such that γ ` γ 0 , and γ 0 leads to acceptance. A sequence of edges ab11 · · · abnn is accepted by A, if the initial configuration γ0 leads   to acceptance. The language L(A) consists of all sequence of edges accepted by A. Ob- viously, the sequence w induces a set of directed edges Gw as explain in the beginning of this section. We have presented here the notion of alternating graph PA, since it is easier to work with for our purposes. However, it is not difficult to define instead the notion of deterministic graph PA. The next theorem shows that this choice is without loss of generality, as both models are equivalent. Theorem 2. 1. For each k ≥ 1, two-way non-deterministic graph k-PA and one-way deterministic graph k-PA have the same recognition power. 2. For each k ≥ 1, graph k-PA languages are closed under boolean operation. Next, we introduce the relationship between graph PA and First Order logic. Theorem 3. For every FO E-sentence ψ, there exists a graph k-PA Aψ such that k = qr(ψ) and L(A) = G(ψ)). Proof. The proof is an adaptation of similar result in [19]. First, by Theorem 2, PAk is closed under boolean operations. Let ϕ = Qxk ψ(xk ) where Q ∈ {∀, ∃} and ψ(xk ) is of quantifier rank k − 1. The proof is by straightforward induction on k. A k-PA A iterates pebble k through all possible positions in the input. On each iteration, the automaton A recursively calls a (k − 1)-PA A0 that accepts the language L(ψ(xk )), treating the position of pebble k as the assignment value for xk . - If Q = ∀, then A accepts w if and only if A0 accepts on all iterations. - If Q = ∃, then A accepts w if and only if A0 accepts on at least one iteration. 2 Notice that Theorem 3 is optimal in the sense that all k pebbles are needed. More precisely, it is possible to adapt the proof of [19] to show that for every k ≥ 2 there exists an FO E-sentence ψ, with k = qr(ψ), and such that L(A) 6= G(ψ)) for every graph PA A using less than k pebbles. 6 When the edges are labeled with symbols from finite alphabet In the usual graph databases setting the edges are labeled with symbols from a fixed finite alphabet. Each symbol can be viewed as a unary predicate on the edges. In this section we extend the vocabularies V and E with unary predicates on the edges, which we called extended V and E vocabularies. We also extend the definition of graph pebble automata for edges labeled with symbols from a fixed alphabet. In the following we let Σ be a fixed finite alphabet. Extended V logic. The vocabulary for the extended V logic consists of σ for each σ ∈ Σ, where each σ is a binary predicate on the domain. We denote by V∗ the extended V logic. An extended V-structure is a tuple G = (V, {σ}σ∈Σ ) such that V is the domain of nodes and the sets {σ}σ∈Σ are disjoints. Intuitively, each relation σ denotes the set of edges which are labeled with the symbol σ ∈ Σ. Since no edge can be labeled with two different symbols, the sets {σ}σ∈Σ are disjoint. Extended E logic. The vocabulary consists of HeadHead, HeadTail, TailTail, {σ}σ∈Σ , where each σ ∈ Σ is unary predicate on the domain. We denote by E∗ the extended E logic. An extended E-structure is a tuple E = (U, HeadHead, HeadTail, TailTail, {σ}σ∈Σ ), where U is the domain of edges, the relations HeadHead, HeadTail, TailTail on U are defined as before, and each σ ∈ Σ is a unary predicate on U . It is straightforward to show that all results on the vocabularies V and E still hold for the extended logics V∗ and E∗ . In the following we will elaborate this point more precisely. Definition 4. Let G be an V∗ structure and E an E∗ structure. We say that G and E are structurally equivalent if there exists S a 1-1 mapping ξ : E(G) → Dom(E) such that for all (v1 , v2 ), (v2 , v3 ), (v1 , v3 ) ∈ σ∈Σ σ and e1 , e2 ∈ Dom(E), 1. for each σ ∈ Σ, (v1 , v2 ) ∈ σ if and only if ξ(v1 , v2 ) ∈ σ; 2. ξ(v1 , v2 ) = e1 and ξ(v2 , v3 ) = e2 if and only if HeadTail(e1 , e2 ); 3. ξ(v1 , v2 ) = e1 and ξ(v1 , v3 ) = e2 if and only if TailTail(e1 , e2 ); and 4. ξ(v1 , v3 ) = e1 and ξ(v2 , v3 ) = e2 if and only if HeadHead(e1 , e2 ). The 1-1 mapping ξ is called a (V∗ , E∗ )-isomorphism. Theorem 4. 1. Let L in {FO, ∃MSO, MSO}. Then, for every sentence ϕ ∈ L[V∗ ], there exists a sentence ψ ∈ L[E∗ ] such that ϕ and ψ are (V∗ , E∗ )-equivalent 2. There exists a sentence ψ ∈ ∃MSO[E∗ ] such that for all sentence ϕ ∈ ∃MSO[V∗ ], ψ and ϕ are not (V∗ , E∗ )-equivalent. 3. There exists a sentence ψ ∈ MSO[E∗ ] such that for all sentence ϕ ∈ MSO[V∗ ], ψ and ϕ are not (V∗ , E∗ )-equivalent. Next we define a graph pebble automata with unary predicates on the edges. It is also prettymuchstraightforward   extension of Definition 3. In this case the input is of σ1 σn the form:  a1  · · ·  an  ∈ Σ × D × D, where σi ∈ Σ is the label of the edge b1 bn (ai , bi ). The transitions are of the form: (i, σ, P, V00 , V10 , V01 , V11 , p) → (q, act). It is straightforward to show that all the results in the previous section can be adapted for such graph pebble automata. More precisely, Theorem 5. 1. For PA with unary predicates, for each k ≥ 1, two-way non-deterministic graph k-PA and one-way deterministic graph k-PA have the same recognition power. 2. For each k ≥ 1, graph k-PA (with unary predicates) languages are closed under boolean operation. 3. For every FO E∗ -sentence ψ, there exists a graph k-PA Aψ with unary predicates such that k = qr(ψ) and L(A) = G(ψ)). 7 Future directions We would like to apply our logics and graph pebble automata in a more application oriented settings. Also, it is well known that the emptiness problem for graph pebble automata is undecidable. One direction that we would like to pursue is to charaterize a subclass of pebble automata, for which the emptiness problem is decidable. We also would like to define and study similar logics for matroid and extend the graph pebble automata for abstract matroid. Acknowledgments: We thank the anonymous referees for many helpful comments. Partial sup- port provided by EPSRC grant G049165 and FET-Open Project FoX, grant agreement 233599. References 1. S. Abiteboul, P. Buneman, D. Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kauffman, 1999. 2. R. Angles, C. Gutiérrez. Survey of graph database models. ACM Comput. Surv. 40(1): (2008). 3. P. Barceló, C. Hurtado, L. Libkin, P. Wood. Expressive languages for path queries over graph- structured data. In PODS 2010. 4. D. Berwanger, D. Janin. Automata on Directed Graphs: Edge Versus Vertex Marking. In ICGT 2006. 5. B. Courcelle. The expression of graph properties and graph transformations in monadic second-order logic. In Handbook of Graph Grammars and Computing by Graph Transfor- mations, Volume 1: Foundations, 1997. 6. I. Cruz, A. Mendelzon, P. Wood. A graphical query language supporting recursion. In SIG- MOD 1987. 7. R. Fagin, L. J. Stockmeyer, and M. Y. Vardi. On monadic NP vs. monadic co-NP. Info. and Comp., 120(1):78–92, 1995. 8. N. Globerman and D. Harel. Complexity results for multi-pebble automata and their logics. In ICALP 1994. 9. C. Gutierrez, C. Hurtado, A. Mendelzon. Foundations of semantic web databases. In PODS 2004. 10. P. Hliněný. Branch-width, parse trees, and monadic second-order logic for matroids. J. Comb. Theory, Ser. B 96(3): 325–351 (2006) 11. R. Ladner, R. Lipton and L. Stockmeyer. Alternating Pushdown and Stack Automata. SIAM Journal of Comp. 13(1): 135–155, 1984. 12. U. Leser. A query language for biological networks. Bioinformatics 21 (suppl 2) (2005), ii33–ii39. 13. L. Libkin. Elements of Finite Model Theory. Springer, 2004. 14. A. O. Mendelzon, P. T. Wood. Finding regular simple paths in graph databases. SIAM J. Comput., 24(6):1235–1258, 1995. 15. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon. Network motifs: simple building blocks of complex networks. Science 298(5594) (2002), 824–827. 16. F. Neven, T. Schwentick, and V. Vianu. Finite state machines for strings over infinite alpha- bets. ACM ToCL, 5(3):403–435, 2004. 17. R. Ronen and O. Shmueli. SoQL: a language for querying and creating data in social net- works. In ICDE 2009. 18. T. Schwentick. On Winning Ehrenfeucht Games and Monadic NP. Ann. Pure Appl. Logic, 79(1), 61–92, 1996. 19. T. Tan. Graph reachability and pebble automata over infinite alphabets. In LICS 2009. 20. G. Turán. On the definability of properties of finite graphs. Discrete Mathematics, 49(3):291–302, 1984.