FO Queries Strongly Distributing over Components in Arbitrary Cardinality Francesco Di Cosmo dicosmo.francesco@gmail.com Abstract. In previous works a coordination-free strategy to compute Datalog with negation queries over databases distributed over many com- putational nodes has been studied, providing a syntactic characterization and proving the undecidability of the problem of deciding whether a query distributes. In view of a recast for FO queries, we report about a work in progress, namely the study of FO queries strongly distribut- ing over components. We prove that the decision problem of establishing whether a FO query strongly distributes over components is undecidable and highlight how some syntactic bonds typical of Datalog with negation, namely safeness, play a crucial role in specifying strongly distributive FO queries. Keywords: Strong distribution · Undecidability · Safeness 1 Introduction In [1] the problem of establishing which Datalog¬ queries distribute over com- ponents has been studied, i.e. to determine those queries q such that, for any database D, the following holds: [ q(D) = q(C) C∈cc(D) Here by cc (D) we denote the set of connected components of D (formally in- troduced in sec. 2.1). Informally, for example, if the database is a set of family trees, then each tree is a component, where its memebers are connected by family relationships. The aforementioned problem arises in the context of looking for a parallelized coordination-free query computation strategy over databases distributed across many computational nodes [2]. Indeed, we can interpret the right-hand term as the result of the following strategy: 1. Facts of the whole database are stored in a scattered fashion over many computational nodes. Hence, for each node there is a local database. 2. During a preliminary phase, nodes can communcate with each other to up- date local databases asking and getting all and only facts about individuals from the local database. In the end, each node will host some connected components of the global database, say one. 3. Each node computes the query over its local database, neglecting coordina- tion with other nodes, getting a set of local answers. 4. All the sets of local answers are merged through the union set operator. This is the result of the computation. By Datalog¬ we refer to a variant of standard Datalog¬ [3] whose queries are expressible through a program P and a goal g, such that: – P is a set of rules like: H←B where H, the head, is an asserted literal, over an intentional vocabulary, satisfying safeness, in the sense that its variables occurs also in B, the body, and B is a conjunction of literals, over the same vocabulary extended with an extensional one, also satisfying safeness, in the sense that every variable occurring in a negated literal in a rule occurs also in an asserted literal in the body of the same rule. – g is a rule without head, like: ⊥←B In both program and goal, neither constants nor equality are allowed. We say that a specification (P, g) is connected if, for each body B in (P, g), the graph (V, E) is connected, where V is the set of variables occurring in B and {x, y} ∈ E iff x and y occurs in the same asserted literal. For example the rule: E 0 (x, w) ← E(x, y) ∧ E(z, w) is not connected, but the following one is it: E 0 (x, w) ← E(x, y) ∧ E(y, w) Two results have been established, namely that: 1. A Datalog¬ query distributes over components iff it is specifiable by a con- nected specification; 2. The problem of deciding whether, given a Datalog¬ query q, q distributes over components is undecidable. These results are proved exploiting recursive specifications. With a view to re- casting these results in absence of recursion, in this paper we report on a work in progress, namely the study of first order (FO) queries distributing over com- ponents.1 Moreover, in this preliminary discussion, we abstract databases with purely relational structures of arbitrary cardinality and consider a stronger ver- sion of distribution over components, named strong distribution over compo- nents, i.e. those queries such that, for any structure S: [ [ q(S) = q(S 0 ) = q(C) S 0 ⊂S C∈cc(S) The questions we want to answer are: 1 We consider FO queries because, except for some details, they have the same ex- pressive power as Datalog¬ without recursion (see Codd’s theorems [3]). 1. Is it possible to characterize FO queries strongly distributing over compo- nents by syntactics means as in [1]? 2. Is the problem of deciding whether a FO query strongly distributes over components decidable? 2 Preliminaries 2.1 Structures and connected components Let L be a relational vocabulary, i.e. a finite non empty set of relation symbols R/n, with positive arity n > 0, and constant symbols c.2 A structure S over L is a not empty set Dom(S), called the domain of S, enriched by interpretations RS ⊂ Dom(S)n , for each relation symbol R/n ∈ L, and cS ∈ Dom(S), for each constant symbol c ∈ L. We say that a structure S 0 is a substructure of S (S is a superstructure of S ), S 0 ⊂ S, iff: 0 1. Dom(S 0 ) ⊂ Dom(S); 0 2. for each relation symbol R/n ∈ L, RS = RS ∩ Dom(S 0 )n ; 0 3. for each constant symbol c ∈ L, cS = cS .3 Given a subset A ⊂ Dom(S) such that, for each constant symbol c ∈ L, cS ∈ A , A generates a substructure A of S, the only one such that Dom(A) = A . The underlying graph of a structure S over L is the graph (V, E), where V = Dom(S) and {a, b} ∈ E iff there is a relation symbol R/n ∈ L and a n-tuple τ such that a, b ∈ τ ∈ RS . A connected component of S is a substruc- ture generated by a connected component of its underlying graph. The set of connected components of S is denoted cc (S). If S admits only one connected component, then we say that S is connected. Directed graphs are simple examples of relational structures. For instance the graph with vertex set V = {v1 , v2 , v3 } and edge set E = {(v1 , v2 ), (v3 , v3 )} can be considered as a structure S over the purely relational language L = {F/2}, where Dom(S) = V and F S = E. A substructure of S could be S 0 with 0 Dom(S 0 ) = {v1 , v3 } and F S = {(v3 , v3 )}, while the connected components of S are the structures ({v1 , v2 }, {(v1 , v2 )}) and ({v3 , v3 }, {(v3 , v3 )}). 2.2 Preservation theorems A FO formula ϕ (x1 , . . . , xn ) over a vocabulary L is preserved under superstruc- tures iff, for any two structures S 0 ⊂ S over L: ∀a1 , . . . , an ∈ Dom(S 0 ) S 0 |= ϕ (a1 , . . . , an ) ⇒ S |= ϕ (a1 , . . . , an ) 2 Note that no function symbols are allowed. 3 Therefore, cS ∈ Dom(S 0 ) holds. whereas, ϕ is preserved under substructures iff the vice versa is valid, i.e.: ∀a1 , . . . , an ∈ Dom(S 0 ) S |= ϕ (a1 , . . . , an ) ⇒ S 0 |= ϕ (a1 , . . . , an ) The following theorems [4] hold also for vocabularies involving function symbols. Theorem 1 (Preservation theorems). 1. A formula ϕ is preserved under superstructures iff it is equivalent to a for- mula in Σ1 , the set of prenex existential formulas. 2. A formula ϕ is preserved under substructures iff it is equivalent to a formula in Π1 , the set of prenex universal formulas. For example, ∃x x = x is a valid Σ1 sentence.4 Since it is valid, it is preserved under substructures and, by preservation theorem, it must be equivalent to a (valid) Π1 sentence, like ∀x x = x. Finally, note that a quantifier-free formula is both Σ1 and Π1 . 2.3 FO queries Let L be a purely relational vocabulary, i.e. a relational vocabulary also without constant symbols. A FO specification over L is a couple (ϕ, Ξ), where ϕ is a FO formula over L and Ξ is a sequence of all elements in Var(ϕ),5 the set of free variables in ϕ. A FO specification (ϕ (x1 , . . . , xn ) , Ξ) over L specify the FO query q(ϕ,Ξ) such that, for any structure S over L: q(S) = {(h(x))x∈Ξ |S |= ϕ (h(x1 ), . . . , h(xn ))} Given a FO query q we say that: 1. q is monotonic iff, for any two structures S 0 ⊂ S: q(S 0 ) ⊂ q(S) 2. q is local iff, for any structure S and a ∈ q(S): ∃C ∈ cc (S) a ∈ q(C) It is straightforward to prove that a FO query strongly distributes (as defined in sec. 1) iff it is both monotonic and local. Hence, we will split the study of strongly distributive queries in the study of monotonicity and locality. Lastly, it will be useful the notion of closure by constants: Definition 1. Given a FO formula ϕ (x1 , . . . , xn ) over a vocabulary L, let L0 be the extension of L with n new constant symbols c1 , . . . , cn . The closure by con- stants ϕc of ϕ is the sentence over L0 obtained replacing in ϕ each free variables xi with the constant ci , for any i ∈ {1, . . . , n}. 4 It is valid because the domain of a structure cannot be empty. 5 Eventually with repetitions. 3 Monotonicity We now characterize monotonic FO queries as those queries specifiable by a Σ1 formula. Proposition 1. Let ϕ (x1 , . . . , xn ) be a FO formula over a purely relational vocabulary L. A FO query q(ϕ,Ξ) is monotonic iff ϕ is preserved under super- structures. Proof. ⇒: Let S 0 ⊂ S and a1 , . . . , an ∈ Dom(S 0 ) such that S 0 |= ϕ (a1 , . . . , an ). Hence, there is at least a valuation h such that h(xi ) = ai for any i ∈ {1, . . . , n}, (h(x))x∈Ξ ∈ q(S 0 ) and, by hypothesis, (h(x))x∈Ξ ∈ q(S). So S |= ϕ (a1 , . . . , an ). ⇐: Similar to previous. Applying the preservation theorem 1, we obtain the following theorem. Theorem 2. A FO query q(ϕ,Ξ) is monotonic iff ϕ (ϕc ) is equivalent to a Σ1 formula (sentence). Therefore, to check whether a FO formula q(ϕ,Ξ) is monotonic amounts to check if the closure by constants ϕc is equivalent to a Σ1 sentence. We now prove that this last check is not algorithmically possible. Definition 2. Let F1 , F2 be two fragments of FO sentences over a vocabulary L. With Eq(F1 , F2 ) we denote the decision problem of establishing whether, given a sentence ϕ1 ∈ F1 , ∃ϕ2 ∈ F2 ϕ1 ↔ ϕ2 Lemma 1. Let F1 , F2 be two fragments of FO over a vocabulary L such that: 1. SAT (F1 ), the decision problem of satisfiability of a sentence in F1 , is unde- cidable; 2. SAT (F2 ) is decidable; 3. F1 contains all contradictory sentences. Then, Eq(F1 , F2 ) is undecidable. Proof. If Eq(F1 , F2 ) were decidable through an algorithm A, then, given a sen- tence ϕ ∈ F1 as input to A, – If the output is negative, then ϕ is not contradictory, namely satisfiable; – If the output is positive, then there is at least one ϕ0 ∈ F2 such that ϕ ↔ ϕ0 is valid and, by completeness of FO, also derivable. Since the set of derivable sentences is recursively enumerable, it is possible to algorithmically build up at least one such ϕ0 . Recalling that SAT (F2 ) is decidable, it is possible to decide if ϕ0 , so ϕ, is satisfiable. In both cases it would be possible to decide whether ϕ is satisfiable and SAT (F1 ) would be decidable, which contradicts the hypothesis 1. Theorem 3. Let L be a relational vocabulary sufficiently expressive, i.e. with at least one relational symbol R/n with n ≥ 2. Let also σ1 and π1 be, respectively, the set of existential sentences and universal sentences over L. Denoting with F O the set of all FO sentences over L, Eq(F O, σ1 ) and Eq(F O, π1 ) are both undecidable. Proof. It is well-known that the Bernays-Schönfinkel-Ramsey fragment (BSR), i.e. the set of FO prenex sentences (without function symbols)6 and with a prefix like ∃∗ ∀∗ , is such that SAT (BSR) is decidable [5]. Clearly, BSR contains both σ1 and π1 and they both contain all contradictory sentences.7 Since L is sufficiently expressive, SAT (F O) is undecidable [6]. Thereby, we can apply the previous lemma and obtain the thesis. We can summarize previous results in the following corollary: Corollary 1. The decision problem of establishing whether, given a FO query q(ϕ,Ξ) over a sufficiently expressive vocabulary, q(ϕ,Ξ) is monotonic is undecid- able. Since any contradictory formula specifies a local FO query,8 the same argument used in lemma 1 can be reused for the decision problem of establishing whether a FO formula ϕ specifies a strongly distributive query, i.e. if ϕ is such that: 1. ϕ specifies a local FO query; 2. ϕc is equivalent to a Σ1 sentence. So, we can conclude also the following corollary: Corollary 2. The decision problem of establishing wheter, given a FO query q(ϕ,Ξ) over a sufficiently expressive vocabulary, q(ϕ,Ξ) strongly distributes is un- decidable. 4 Locality Due to the previous section, we focus only on Σ1 formulas. Moreover, here we consider only quantifier-free disjunctive normal form (DNF) formulas without =.9 However, we report only about two syntactic bonds over conjunctions and 6 Recall that L is a purely relational vocabulary, hence function simbols would not occour anyway. 7 All contradictions are equivalent and ∃x x 6= x and ∀x x 6= x are two of them, the first in σ1 and the second in π1 . 8 Because, for any structures S, q(S) = ∅ holds. 9 Anyway, we are confident that adding equality would not change the core ideas of what follows, but would only require some more technicality, like an equality propagation procedure as the union-find algorithm in [7]. Yet, we do not prove it here. disjunctions necessary to admit locality, referable to safeness of Datalog¬ rules through the process of rectification and unfolding [3].10 We say that a formula ϕ (x1 , . . . , xn ) is local iff it specifies only local queries, i.e. iff, for any structure S and any a1 , . . . , an ∈ Dom(S):11 S |= ϕ (a1 , . . . , an ) ⇒ ∃C ∈ cc (S) C |= ϕ (a1 , . . . , an ) 4.1 Conjunctions Since a contradictory conjunction of literals is local, we will consider only sat- isfiable formulas. First, we focus on negative conjunctions, i.e. where all literals are negated, then, we take into account the remaining. Theorem 4. Let L be a purely relational vocabulary and ϕ (x1 , . . . , xn ) be a satisfiable negative conjunction of literals over L. Then ϕ is local iff n = 1. Proof. Since ϕ is satisfiable, it is not possible that in ϕ occur both an asserted literal and its negation. ⇒: Let S be a structure such that: • Dom(S) = {a1 , . . . , an }, where ai 6= aj if i 6= j; • for any relation symbol R/m ∈ L, S R = ∅.12 Therefore, each a ∈ Dom(S) forms a connected component and, since ϕ is negative: S |= ϕ(a1 , . . . , an ) Since ϕ is local, (a1 , . . . , an ) must lay on a single connected component. This is possible only if n = 1. ⇐: By hypothesis, ϕ is of the form ϕ(x). Then, let S be a structure and a ∈ Dom(S) such that: S |= ϕ(a) Clearly, ∃C ∈ cc (S) such that a ∈ Dom(C) and, by preservation theorem 2, also:13 C |= ϕ(a) By arbitrariness of S and a, ϕ is local. Theorem 5. Let L be a purely relational vocabulary and ϕ (x1 , . . . , xn ) a not negative satisfiable conjunction of literals over L. If ϕ is local, then ϕ is safe, i.e. any variable occurring in a negated literal ocours also in an asserted literal. 10 Rectification and unfolding are those processes that allow to translate Datalog¬ without recursion in FO. 11 Clearly, it follows also that a1 , . . . , an ∈ C. 12 I.e. S can be considered a plain set. 13 Each quantifier-free formula is also a Π1 formula. Proof. V Let ϕ (x1 , . . . , xn ) be a not negative satisfiable conjunction of literals i∈I Li , where x ∈ Var(ϕ) occurs only in negated literals. Say that x is x1 . Since ϕ is not contradictory, then there is a structure S and a1 , . . . , an ∈ Dom(S) such that: S |= ϕ (a1 , . . . , an ). Now consider the structure S 0 , obtained from S adding a new element a to Dom(S). Therefore, a does not take part in the interpretative part of S 0 and, clearly: S 0 |= ϕ (a, a2 , . . . , an ) Since {a} is the domain of a connected component of S 0 , the sequence (a, a2 , . . . , an ) does not lay on a single connected component and so ϕ is not local. 4.2 Disjunctions Theorem W 6. Let L be a purely relational vocabulary and ϕ (x1 , . . . , xn ) a dis- junction i∈I ϕi , where ϕi is a satisfiable Σ1 formula over L for each i ∈ I. If ϕ is local, then ϕ is regular, i.e. all disjuncts share the same set of free variables. W Proof. Let ϕ (x1 , . . . , xn ) be a satisfiable disjunction i∈I ϕi , where ϕi ∈ Σ1 for each i ∈ I. Suppose there are indices i, j ∈ I such that Var(ϕi ) 6= Var(ϕj ), say x ∈ Var(ϕi ) \ Var(ϕj ) and suppose that x is x1 . Since ϕj (y1 , . . . , ym ) is satisfiable, there is a structure S and a1 , . . . , am ∈ Dom(S) such that: S |= ϕj (a1 , . . . , am ) Let h be a valuation such that h(yi ) = ai for each i ∈ {1, . . . , m}. Now, as in the previous proof, consider a structure S 0 , obtained from S adding an element a to Dom(S). Then, the valuation k, obtained from h putting k(x) = a, still satisfies ϕj in S 0 (because x 6∈ Var(ϕj )). So, by semantic of disjunction: S 0 |= ϕ (k(x1 ), . . . , k(xn )) Since {k(x1 )} = {a} is the domain of a connected component of S 0 , (k(x1 ), . . . , k(xn )) does not lay on a single connected component and so ϕ is not local. 5 Conclusions and further work We have proved that, for sufficiently expressive vocabularies, the decision prob- lem of establishing if a FO query strongly distributes over components is unde- cidable. This has been possible through a contrast with the Entscheidungsprob- lem of satisfiability of F O formulas. However, we remark that we considered FO in its full expressive power, ignoring those syntactical bonds stemming from rectification and unfolding, i.e. reflecting Datalog¬ safeness. Would something change if those bonds were considered? In fact, tackling the problem of local- ity, we showed that those bonds, in the form of safe conjunctions and regular disjunctions, are necessary conditions to allow locality of DNF quantifier-free formulas. This preliminary result should be extended to a full classification of FO formulas specifying local FO queries. References 1. Ameloot, T.J., Ketsman, B., Neven, F., Zinn, D.: Datalog queries distributing over components. ACM TOCL 18(1), 1–35 (2017) 2. Ameloot, T.J., Ketsman, B., Neven, F., Zinn, D.: Weaker forms on monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture. ACM TODS 40(4), 1–45 (2016) 3. Ullman, J.D.: Principles of Database and Knowledge-Base Systems, Volume I. Com- puter Science Press, Rockville, Maryland (1989) 4. Chang, C.C., Keisler, H.: Model theory. Elsevier (1990) 5. Ramsey, F.P..: On a problem of formal logic. In: Gessel, I., Rota, G., CLASSIC PA- PERS IN COMBINATORICS 2009, pp. 1–24. Birkhäuser Boston (Springer) (2009) 6. Börger, E., Grädel, E., Gurevich, Y.: The classical decision problem.2nd edn. Springer Verlag (Springer Science & Business Media) (2001) 7. Aho, A.V., Hopcroft, J. E.: The design and analysis of computer algorithms. Addison-Wesley Longman Publishing Company, Inc., Boston, MA, USA (1974)