Toward Recursive View Update Strategies on Relations Van-Dang Tran3,1 , Hiroyuki Kato1,3 and Zhenjiang Hu2,1 1 National Institute of Informatics, Tokyo, Japan 2 Peking University, Beijing, China 3 The Graduate University for Advanced Studies, SOKENDAI, Kanagawa, Japan Abstract Recent work has shown how to use non-recursive Datalog as a programming language for view update strategies in relational databases. In this paper, we extend the idea by considering recursions in the Datalog language for more interesting view definitions (forward transformation) and update strategies (backward transformation), especially over relations that store complex data structures such as trees and graphs. Firstly, we present how recursive bidirectional transformations over such relations can be formulated in Datalog and encapsulated in high-order logic predicates. Secondly, we discuss the validation problem for well-behavedness of the recursive Datalog programs. Keywords Relational, XML, Datalog, Structural Recursion, Logic Programming 1. Introduction Recent work [1] has proposed an approach to use non-recursive Datalog - a well-known unidirectional language - to program view update strategies, i.e., backward transformations, more freely. The well-behavedness of a program is guaranteed by a validation algorithm. This is based on the interesting fact that while there may be many backward transformations for a forward transformation, there is at most one forward transformation for a backward transformation [2]. Therefore, the well-behavedness of a backward transformation can be validated by checking the existence of the corresponding forward transformation. Datalog without recursion is a suitable and friendly language for implementing data trans- formations in relational database management systems where views are commonly defined in relational algebra without recursion. However, in deductive databases where relations are used to represent more complex data such as graphs or tree-like data structures, recursion in Datalog plays a central role. In this paper, we extend the idea proposed in [1] to use recursive Datalog with extensions such as negation in programming more interesting bidirectional transformations over relations that represent complex data structures such as trees and graphs. Manually writing a recursive view update strategy is an expensive task for programmers and it is even more challenging to automatically validate such a program. The fixed-point semantics Bx 2021: 9th International Workshop on Bidirectional Transformations, part of STAF, June 21, 2021 " dangtv@nii.ac.jp (V. Tran); kato@nii.ac.jp (H. Kato); huzj@pku.edu.cn (Z. Hu) © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) e 1 Man loye ager Emp Employee Bob 32 2 8 11 me e Ch Age am Age Expert ild Na N John 3 4 5 9 10 12 10 e e Mary Age Age Nam Nam Bob 32 26 Nil Nil 6 7 Nil Nil 13 14 Mary John Alice 26 10 42 Nil Nil Nil Nil edge Alice parent_id label id 42 1 Employee 2 2 Name 3 3 Bob Nil ... ... ... Figure 1: Storing an XML document in a single relation edge. of Datalog makes the well-behavedness property not expressible in first-order logic. Our key idea to solve the aforementioned challenge is to formulate a Datalog-written view update strategy into two parts: one for recursive patterns and another for inner update strategies. On the one hand, the recursive Datalog rules of the first part are predefined and pre-validated so that their well-behavedness is guaranteed. On the other hand, we allow programmers to manually write the inner update strategies in Datalog without recursion, which can be validated automatically. To combine these non-recursive rules with the predefined recursive ones, we extend Datalog with high-order predicates as in HiLog [3]. Specifically, the predefined recursive Datalog rules can be encapsulated in high-order predicates and called later in non-recursive rules written by users, and vice versa. 2. Recursive Bidirectional Transformations in Datalog 2.1. Shredding tree-structured data into relations For tree-structured data such as XML or JSON documents, we use the Edge shredding approach [4, 5] to store all the edges of the graph that represents the document. For simplicity, we consider unordered XML documents. Figure 1 shows an XML document of employees that is represented by a tree and stored in a single relation edge(parent_id,label,id). Each tuple in edge represents a node of the XML document, where id and parent_id are the ids of the node and its parent node, respectively, and label is the node’s tag name. We consider the value of a node as a tag name of a special child node Nil. For example, a tuple edge(3,‘Bob’,Nil) represents that the value of node 3 is ‘Bob’. Querying over the XML document now can be written in Datalog, a well-known relational data query language. Datalog can be considered as 11 % splitting the view 1 % splitting the source 12 subedge𝑣𝑖𝑒𝑤 (X, L, Y) :− edge𝑣𝑖𝑒𝑤 (X, L, Y), 2 subedge𝑙 (X, L, Y) :− edge(X, L, Y) , 𝑙 L = ‘Employee’, ¬ subedge𝑟 (_, _, X). L = ‘Employee’, ¬ edge(_, _, X) . 13 subedge𝑣𝑖𝑒𝑤 (X, L, Y) :− subedge𝑣𝑖𝑒𝑤 (_, _, X), 3 subedge𝑙 (X, L, Y) :− subedge𝑙 (_, _, X) , 𝑙 𝑙 edge(X, L, Y) . edge𝑣𝑖𝑒𝑤 (X, L, Y). 4 subedge𝑟 (X, L, Y) :− edge(X, L, Y) , 14 subedge𝑣𝑖𝑒𝑤 𝑟 (X, L, Y) :− edge𝑣𝑖𝑒𝑤 (X, L, Y), ¬ subedge𝑙 (X, L, Y). ¬ subedge𝑣𝑖𝑒𝑤 𝑙 (X, L, Y). 5 % two inner forward transformations 15 % two inner backward transformations 6 subedge𝑣𝑖𝑒𝑤 𝑙 (X, L, Y) :− subedge𝑙 (X, L, Y). 16 subedge𝑛𝑒𝑤 𝑙 (X, L, Y) :− subedge𝑣𝑖𝑒𝑤 𝑙 (X, L, Y) 7 subedge𝑣𝑖𝑒𝑤 𝑟 (X, L, Y) :− subedge𝑟 (X, L, Y), 17 subedge𝑟 (X, L, Y) :− subedge𝑣𝑖𝑒𝑤 𝑛𝑒𝑤 𝑟 (X, L, Y). subedge𝑟 (_, _, X) . 18 subedge𝑛𝑒𝑤 𝑟 (X, L, Y) :− subedge𝑟 (X, L, Y), 8 % merging ¬ subedge𝑟 (_, _, X). 9 edge𝑣𝑖𝑒𝑤 (X, L, Y) :− subedge𝑣𝑖𝑒𝑤 (X, L, Y). 19 % merging 𝑙 10 edge𝑣𝑖𝑒𝑤 (X, L, Y) :− subedge𝑣𝑖𝑒𝑤 (X, L, Y). 20 edge𝑛𝑒𝑤 (X, L, Y) :− subedge𝑛𝑒𝑤 𝑙 (X, L, Y). 𝑟 21 edge𝑛𝑒𝑤 (X, L, Y) :− subedge𝑛𝑒𝑤 𝑟 (X, L, Y). Figure 2: Forward (on the left) and backward (on the right) transformations. Prolog without functions. The recursion in Datalog makes it a suitable language for queries over the edge relation since recursive computation is required for tree traversals. Example 2.1. The following recursive Datalog rules extract from the tree in Figure 1 the subtree (the left subtree, stored in relation subedge𝑙 ) that contains only employees: ( r1 ) subedge𝑙 (X, L, Y) :− edge(X, L, Y) , L = ‘Employee’, ¬ edge(_, _, X). ( r2 ) subedge𝑙 (X, L, Y) :− subedge𝑙 (_, _, X), edge(X, L, Y) . Each Datalog rule represents a first-order logic sentence where the conjunction of all predicates in the rule body (right-hand side) implies the predicate in the rule head (left-hand side), and all variables are universally quantified [6]. Computing the relation in the rule head is computing the smallest instance that satisfies all the Datalog rules. For subedge𝑙 in these two rules, the first rule gives subedge𝑙 an initial instance of all outgoing edges edge(X,L,Y) of label ‘Employee’ from the root node X that has no parent node (¬edge(_,_,X)). The second rule recursively adds to subedge𝑙 all the outgoing edges from a node X that is reachable in subedge𝑙 . 2.2. Bidirectional Transformations over the Tree We shall use a tree lens combinator xfork presented in [7] to demonstrate how Datalog can be used to implement such a bidirectional transformation over a tree stored in a relation edge. As in xfork, we first split the original tree in Figure 1 into two subtrees (a left subtree and a right subtree) according to the labels of the outgoing edges from the root. For the left subtree, we apply an identity transformation. For the right subtree we apply a simple bidirectional transformation (similar to the hoist lens in [7]) that removes the outgoing edge from the root edge(1,‘manager’,11) (to consider an expert, who is a manager, as an employee). Finally, the two tree views obtained from the two inner bidirectional transformations are merged into a single tree. Figure 2 shows the forward and backward transformations written in Datalog. The forward direction works as follows. To split the original tree, we use two recursive Datalog rules in Example 2.1 to extract the left subtree subedge𝑙 (the tree of nodes 1 to 10). Then the right subtree subedge𝑟 , which is the remaining part, is the one that has all the edges in edge but not in subedge𝑙 and is computed by a rule in Line 4. Line 6 calculates a view as an identity instance of subedge𝑙 . Line 7 calculates a view by excluding the edge from the root of subedge𝑟 . In this rule, the second predicate of the body makes sure that node X has a parent node, i.e., node X is not the root. Lines 9 and 10 merge the two views into one edge𝑣𝑖𝑒𝑤 by taking the union. If the root nodes of these views have different IDs, we write an additional rule to unify them. In the backward direction, we have similar operations. Lines 12, 13, and 14 split the view edge𝑣𝑖𝑒𝑤 . Line 16 calculates the first new source subedge𝑛𝑒𝑤 𝑙 for subedge𝑣𝑖𝑒𝑤 𝑙 as the same instance. Lines 17 and 18 compute the second new source subedge𝑟 by taking all the edges 𝑛𝑒𝑤 of the view subedge𝑣𝑖𝑒𝑤 𝑟 and the outgoing edge from the root in the original source subedge𝑟 . Predicate ¬subedge𝑟 (_,_,X) in Line 18 is to ensure that node X has no parent node, i.e., X is the root. Lines 20 and 21 merge the two new sources. The forward and backward transformations in Figure 2 can be considered as program tem- plates for writing more bidirectional transformations over the tree. The Datalog rules of splitting and merging operations can be adjusted by changing the constant ‘Employee’ to another label. The two inner bidirectional transformations (Rules 6, 7, 16, 17, and 18) can be arbitrarily im- plemented. Since the splitting and merging operations are fixed, the well-behavedness of the program templates is guaranteed if the two inner bidirectional transformations are well-behaved. Our observation is that an inner bidirectional transformation works on a component that is structurally smaller than the original tree, and thus is easier to manually implement. Meanwhile, writing decomposing operations, e.g., splitting, requires more knowledge about recursion. This suggests a new approach to make bidirectional transformations over tree-structured data programmable in Datalog. The key idea is to predefine a variety of program templates that employ the recursive computation power of Datalog to work on recursive data structures like trees. Programmers can reuse these templates by manually implementing inner bidirectional transformations without recursion. Our example shows a typical program over a recursive data structure with three operations: decomposing (splitting) the tree, then processing each component, and merging the results. By recursively decomposing all components, we obtain structural recursions over the tree. In our program templates, we predefine recursive Datalog rules to deal with these recursion forms. We consider only a fixed number of commonly used recursive patterns that always terminate and are sufficiently expressive in practice. To enhance the reusability of program templates, we shall encapsulate Datalog rules in high-order predicates that can be easily recalled. 2.3. Encapsulating Datalog rules in high-order predicates We can consider subedge𝑙 and subedge𝑟 of the splitting operations (Rules 2, 3, and 4 in Figure 1) as a pair resulting from a split function over a relation edge and a constant ‘Employee’. The pair can be represented by a function {‘l’ ↦→ subedge𝑙 , ‘r’ ↦→ subedge𝑟 }. Therefore, we have: subedge𝑙 = split(edge,‘Employee’)(‘l’) and subedge𝑟 = split(edge, ‘Employee’)(‘r’). Rules 2, 3, and 4 can be parameterized with src and Label as follows: split ( src , Label ) ( ‘ l ’ ) (X, L, Y) :− src (X, L, Y) , L = Label , ¬ src(_, _, X). split ( src , Label ) ( ‘ l ’ ) (X, L, Y) :− split ( src , Label ) ( ‘ l ’ ) (_, _, X), src (X, L, Y) . split ( src , Label ) ( ‘ r ’ ) (X, L, Y) :− src (X, L, Y) , ¬ split( src , Label ) ( ‘ l ’ ) (X, L, Y) . Where split can be considered as a high-order logic predicate as in the HiLog syntax [3]. Similarly, we can have a predicate merge for the merging operation. By encapsulating the two inner backward transformations in two predicates put1 and put2 , the whole backward transformation can also be encapsulated in a single predicate: fork 𝑝𝑢𝑡 (put1 , put2 , Label)(src, view)(X, L, Y) :− merge( put1 (split ( src , Label ) ( ‘ l ’ ) , split (view, Label ) ( ‘ l ’ ) ) , put2 (split ( src , Label ) ( ‘ r ’ ) , split (view, Label ) ( ‘ r ’ ) ) ) (X, L, Y) . More interestingly, the corresponding forward transformations 𝑔𝑒𝑡1 , 𝑔𝑒𝑡2 , and 𝑓 𝑜𝑟𝑘𝑔𝑒𝑡 can be uniquely determined from their backward transformations 𝑝𝑢𝑡1 , 𝑝𝑢𝑡2 , and 𝑓 𝑜𝑟𝑘𝑝𝑢𝑡 , respectively [8, 2]. Therefore, the high-order predicates of 𝑝𝑢𝑡1 , 𝑝𝑢𝑡2 , and 𝑓 𝑜𝑟𝑘𝑝𝑢𝑡 are sufficient to capture the corresponding bidirectional transformations. 3. Validation As presented in Section 2, a Datalog program template consists of predefined Datalog rules and user-written rules. Validating the well-behavedness of bidirectional transformations specified by these rules plays a central role in our approach. While the predefined part can be pre-validated, an automated validator is required for statically checking the inner bidirectional transformation programs, which are arbitrarily written. Although many properties of Datalog rules can be reduced to the satisfiability of Datalog queries, it is well known that the satisfiability problem of a Datalog query is undecidable [9]. Fortunately, the predefined Datalog rules have performed more complicated computations such as recursion so that the user-written inner bidirectional transformations are much simpler, and thus do not require the full syntax of the Datalog language. The more sophisticated the predefined part is, the simpler the user-written part is, and thus it can be expressed in a restricted class of Datalog where the program’s well-behavedness is decidable. Indeed, for example, all the Datalog rules of the inner bidirectional transformations in Figure 2 are in guarded-negation Datalog (GN-Datalog) where the satisfiability problem is decidable [10]. Recursive Datalog rules in program templates are predefined to deal with common forms of structural recursion rather than arbitrary recursions. Therefore, to validate a property of these recursive rules, we can apply structural induction on the data structure to construct a proof for the property. The main challenge to applying structural induction is that the data structure is not explicitly described by an inductive type but shredded into an edge relation. The structure is preserved by constraints on the edge relation, e.g., primary/foreign keys on id and so forth. 4. Conclusion We have presented our ideas to use Datalog as a programming language for bidirectional transformations over tree-structured data. Although this short paper has demonstrated a simple example, we believe Datalog is expressive enough to cover many interesting recursive view update strategies over trees and even some restricted graphs. Acknowledgments This work is partially supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (S) No. 17H06099. References [1] V.-D. Tran, H. Kato, Z. Hu, Programmable view update strategies on relations, PVLDB 13 (2020) 726–739. [2] S. Fischer, Z. Hu, H. Pacheco, The essence of bidirectional programming, Science China Information Sciences 58 (2015) 1–21. [3] W. Chen, M. Kifer, D. S. Warren, HILOG: A foundation for higher-order logic programming, J. Log. Program. 15 (1993) 187–230. [4] D. Florescu, D. Kossmann, Storing and querying XML data using an RDMBS, IEEE Data Eng. Bull. 22 (1999) 27–34. [5] I. Tatarinov, S. Viglas, K. S. Beyer, J. Shanmugasundaram, E. J. Shekita, C. Zhang, Storing and querying ordered XML using a relational database system, in: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, USA, June 3-6, 2002, 2002, pp. 204–215. [6] S. Ceri, G. Gottlob, L. Tanca, What you always wanted to know about datalog (and never dared to ask), TKDE 1 (1989) 146–166. [7] J. N. Foster, M. B. Greenwald, J. T. Moore, B. C. Pierce, A. Schmitt, Combinators for bi-directional tree transformations: A linguistic approach to the view update problem, in: Proceedings of the 32Nd ACM SIGPLAN-SIGACT Symposium on Principles of Program- ming Languages, 2005, pp. 233–246. [8] S. Fischer, Z. Hu, H. Pacheco, A clear picture of lens laws, in: International Conference on Mathematics of Program Construction, 2015, pp. 215–223. [9] O. Shmueli, Equivalence of datalog queries is undecidable, The Journal of Logic Program- ming 15 (1993) 231 – 241. [10] V. Bárány, B. ten Cate, M. Otto, Queries with guarded negation, Proc. VLDB Endow. 5 (2012) 1328–1339.