<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Toward Recursive View Update Strategies on Relations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Van-Dang Tran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hiroyuki Kato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhenjiang Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Peking University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Graduate University for Advanced Studies</institution>
          ,
          <addr-line>SOKENDAI, Kanagawa</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent work has shown how to use non-recursive Datalog as a programming language for view update strategies in relational databases. In this paper, we extend the idea by considering recursions in the Datalog language for more interesting view definitions (forward transformation) and update strategies (backward transformation), especially over relations that store complex data structures such as trees and graphs. Firstly, we present how recursive bidirectional transformations over such relations can be formulated in Datalog and encapsulated in high-order logic predicates. Secondly, we discuss the validation problem for well-behavedness of the recursive Datalog programs.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Relational</kwd>
        <kwd>XML</kwd>
        <kwd>Datalog</kwd>
        <kwd>Structural Recursion</kwd>
        <kwd>Logic Programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recent work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has proposed an approach to use non-recursive Datalog - a well-known
unidirectional language - to program view update strategies, i.e., backward transformations,
more freely. The well-behavedness of a program is guaranteed by a validation algorithm.
This is based on the interesting fact that while there may be many backward transformations
for a forward transformation, there is at most one forward transformation for a backward
transformation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Therefore, the well-behavedness of a backward transformation can be
validated by checking the existence of the corresponding forward transformation.
      </p>
      <p>Datalog without recursion is a suitable and friendly language for implementing data
transformations in relational database management systems where views are commonly defined in
relational algebra without recursion. However, in deductive databases where relations are used
to represent more complex data such as graphs or tree-like data structures, recursion in Datalog
plays a central role.</p>
      <p>
        In this paper, we extend the idea proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to use recursive Datalog with extensions
such as negation in programming more interesting bidirectional transformations over relations
that represent complex data structures such as trees and graphs.
      </p>
      <p>Manually writing a recursive view update strategy is an expensive task for programmers and
it is even more challenging to automatically validate such a program. The fixed-point semantics</p>
      <sec id="sec-1-1">
        <title>Expert</title>
        <p>11
12
me Age
a</p>
        <p>N
13 14
liceA 24
Nil Nil
2 C</p>
        <sec id="sec-1-1-1">
          <title>Name egA hild</title>
          <p>3
B
o
b
Nil
4
3
2
Nil 6
5
me Age
a
N
J
o
h
n
7
1
0
Nil Nil
edge
parent_id
1
2
3
. . .</p>
          <p>Employee
Manager
1</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>Employee</title>
        <p>e8 Ag
am e
N
9
10
M
a
r
y
Nil
2
6
Nil
label</p>
      </sec>
      <sec id="sec-1-3">
        <title>Employee</title>
      </sec>
      <sec id="sec-1-4">
        <title>Name</title>
        <p>Bob
. . .</p>
        <p>id
2
3
Nil
. . .
&lt;Employees&gt;
&lt;Employee&gt;
&lt;Name&gt; Bob &lt;/Name&gt;
&lt;Age&gt; 32 &lt;/Age&gt;
&lt;Child&gt;
&lt;Name&gt; John &lt;/Name&gt;
&lt;Age&gt; 10 &lt;/Age&gt;
&lt;/Child&gt;
&lt;/Employee&gt;
&lt;Employee&gt;
&lt;Name&gt; Mary &lt;/Name&gt;
&lt;Age&gt; 26 &lt;/Age&gt;
&lt;/Employee&gt;
&lt;Manager&gt;
&lt;Expert&gt;
&lt;Name&gt; Alice &lt;/Name&gt;
&lt;Age&gt; 42 &lt;/Age&gt;
&lt;/Expert&gt;
&lt;/Manager&gt;
&lt;/Employees&gt;
of Datalog makes the well-behavedness property not expressible in first-order logic.</p>
        <p>
          Our key idea to solve the aforementioned challenge is to formulate a Datalog-written view
update strategy into two parts: one for recursive patterns and another for inner update strategies.
On the one hand, the recursive Datalog rules of the first part are predefined and pre-validated
so that their well-behavedness is guaranteed. On the other hand, we allow programmers to
manually write the inner update strategies in Datalog without recursion, which can be validated
automatically. To combine these non-recursive rules with the predefined recursive ones, we
extend Datalog with high-order predicates as in HiLog [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Specifically, the predefined recursive
Datalog rules can be encapsulated in high-order predicates and called later in non-recursive
rules written by users, and vice versa.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Recursive Bidirectional Transformations in Datalog</title>
      <p>
        2.1. Shredding tree-structured data into relations
For tree-structured data such as XML or JSON documents, we use the Edge shredding approach
[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] to store all the edges of the graph that represents the document. For simplicity, we consider
unordered XML documents. Figure 1 shows an XML document of employees that is represented
by a tree and stored in a single relation edge(parent_id,label,id). Each tuple in edge
represents a node of the XML document, where id and parent_id are the ids of the node and
its parent node, respectively, and label is the node’s tag name. We consider the value of a
node as a tag name of a special child node Nil. For example, a tuple edge(3,‘Bob’,Nil)
represents that the value of node 3 is ‘Bob’. Querying over the XML document now can be
written in Datalog, a well-known relational data query language. Datalog can be considered as
      </p>
      <p>Prolog without functions. The recursion in Datalog makes it a suitable language for queries
over the edge relation since recursive computation is required for tree traversals.
Example 2.1. The following recursive Datalog rules extract from the tree in Figure 1 the subtree
(the left subtree, stored in relation subedge) that contains only employees:
( r1 )
( r2 )
subedge(X, L, Y) :− edge(X, L, Y) , L = ‘ Employee’, ¬ edge(_, _, X) .</p>
      <p>subedge(X, L, Y) :− subedge(_, _ , X) , edge(X, L, Y) .</p>
      <p>
        Each Datalog rule represents a first-order logic sentence where the conjunction of all predicates
in the rule body (right-hand side) implies the predicate in the rule head (left-hand side), and all
variables are universally quantified [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Computing the relation in the rule head is computing the
smallest instance that satisfies all the Datalog rules. For subedge in these two rules, the first
rule gives subedge an initial instance of all outgoing edges edge(X,L,Y) of label ‘Employee’
from the root node X that has no parent node (¬edge(_,_,X)). The second rule recursively
adds to subedge all the outgoing edges from a node X that is reachable in subedge.
2.2. Bidirectional Transformations over the Tree
We shall use a tree lens combinator xfork presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to demonstrate how Datalog can be
used to implement such a bidirectional transformation over a tree stored in a relation edge.
As in xfork, we first split the original tree in Figure 1 into two subtrees (a left subtree and a
right subtree) according to the labels of the outgoing edges from the root. For the left subtree,
we apply an identity transformation. For the right subtree we apply a simple bidirectional
transformation (similar to the hoist lens in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) that removes the outgoing edge from the root
edge(1,‘manager’,11) (to consider an expert, who is a manager, as an employee). Finally,
the two tree views obtained from the two inner bidirectional transformations are merged into a
single tree. Figure 2 shows the forward and backward transformations written in Datalog.
      </p>
      <p>The forward direction works as follows. To split the original tree, we use two recursive
Datalog rules in Example 2.1 to extract the left subtree subedge (the tree of nodes 1 to 10).
Then the right subtree subedge, which is the remaining part, is the one that has all the edges
in edge but not in subedge and is computed by a rule in Line 4. Line 6 calculates a view as an
identity instance of subedge. Line 7 calculates a view by excluding the edge from the root of
subedge. In this rule, the second predicate of the body makes sure that node X has a parent
node, i.e., node X is not the root. Lines 9 and 10 merge the two views into one edge by
taking the union. If the root nodes of these views have diferent IDs, we write an additional
rule to unify them.</p>
      <p>In the backward direction, we have similar operations. Lines 12, 13, and 14 split the view
edge. Line 16 calculates the first new source subedge for subedge as the same
instance. Lines 17 and 18 compute the second new source subedge by taking all the edges
of the view subedge and the outgoing edge from the root in the original source subedge.
Predicate ¬subedge(_,_,X) in Line 18 is to ensure that node X has no parent node, i.e., X is
the root. Lines 20 and 21 merge the two new sources.</p>
      <p>The forward and backward transformations in Figure 2 can be considered as program
templates for writing more bidirectional transformations over the tree. The Datalog rules of splitting
and merging operations can be adjusted by changing the constant ‘Employee’ to another label.
The two inner bidirectional transformations (Rules 6, 7, 16, 17, and 18) can be arbitrarily
implemented. Since the splitting and merging operations are fixed, the well-behavedness of the
program templates is guaranteed if the two inner bidirectional transformations are well-behaved.</p>
      <p>Our observation is that an inner bidirectional transformation works on a component that is
structurally smaller than the original tree, and thus is easier to manually implement. Meanwhile,
writing decomposing operations, e.g., splitting, requires more knowledge about recursion.
This suggests a new approach to make bidirectional transformations over tree-structured data
programmable in Datalog. The key idea is to predefine a variety of program templates that
employ the recursive computation power of Datalog to work on recursive data structures like
trees. Programmers can reuse these templates by manually implementing inner bidirectional
transformations without recursion.</p>
      <p>
        Our example shows a typical program over a recursive data structure with three operations:
decomposing (splitting) the tree, then processing each component, and merging the results. By
recursively decomposing all components, we obtain structural recursions over the tree. In our
program templates, we predefine recursive Datalog rules to deal with these recursion forms.
We consider only a fixed number of commonly used recursive patterns that always terminate
and are suficiently expressive in practice. To enhance the reusability of program templates, we
shall encapsulate Datalog rules in high-order predicates that can be easily recalled.
2.3. Encapsulating Datalog rules in high-order predicates
We can consider subedge and subedge of the splitting operations (Rules 2, 3, and 4 in Figure 1)
as a pair resulting from a split function over a relation edge and a constant ‘Employee’.
The pair can be represented by a function {‘l’ ↦→ subedge, ‘r’ ↦→ subedge}. Therefore,
we have: subedge = split(edge,‘Employee’)(‘l’) and subedge = split(edge,
‘Employee’)(‘r’). Rules 2, 3, and 4 can be parameterized with src and Label as follows:
split ( src , Label ) ( ‘ l ’ ) (X, L, Y) :− src (X, L, Y) , L = Label , ¬ src(_, _ , X).
split ( src , Label ) ( ‘ l ’ ) (X, L, Y) :− split ( src , Label ) ( ‘ l ’ ) (_ , _ , X), src (X, L, Y) .
split ( src , Label ) ( ‘ r ’ ) (X, L, Y) :− src (X, L, Y) , ¬ split( src , Label ) ( ‘ l ’ ) (X, L, Y) .
Where split can be considered as a high-order logic predicate as in the HiLog syntax [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Similarly, we can have a predicate merge for the merging operation. By encapsulating the
two inner backward transformations in two predicates put1 and put2, the whole backward
transformation can also be encapsulated in a single predicate:
fork(put1, put2, Label)(src, view)(X, L, Y) :− merge(
put1(split ( src , Label ) ( ‘ l ’ ) , split (view, Label ) ( ‘ l ’ ) ) ,
put2(split ( src , Label ) ( ‘ r ’ ) , split (view, Label ) ( ‘ r ’ ) ) ) (X, L, Y) .</p>
      <p>
        More interestingly, the corresponding forward transformations 1, 2, and   can be
uniquely determined from their backward transformations 1, 2, and  , respectively
[
        <xref ref-type="bibr" rid="ref2 ref8">8, 2</xref>
        ]. Therefore, the high-order predicates of 1, 2, and   are suficient to capture
the corresponding bidirectional transformations.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Validation</title>
      <p>As presented in Section 2, a Datalog program template consists of predefined Datalog rules and
user-written rules. Validating the well-behavedness of bidirectional transformations specified by
these rules plays a central role in our approach. While the predefined part can be pre-validated,
an automated validator is required for statically checking the inner bidirectional transformation
programs, which are arbitrarily written.</p>
      <p>
        Although many properties of Datalog rules can be reduced to the satisfiability of Datalog
queries, it is well known that the satisfiability problem of a Datalog query is undecidable [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Fortunately, the predefined Datalog rules have performed more complicated computations such
as recursion so that the user-written inner bidirectional transformations are much simpler,
and thus do not require the full syntax of the Datalog language. The more sophisticated the
predefined part is, the simpler the user-written part is, and thus it can be expressed in a restricted
class of Datalog where the program’s well-behavedness is decidable. Indeed, for example, all
the Datalog rules of the inner bidirectional transformations in Figure 2 are in guarded-negation
Datalog (GN-Datalog) where the satisfiability problem is decidable [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Recursive Datalog rules in program templates are predefined to deal with common forms of
structural recursion rather than arbitrary recursions. Therefore, to validate a property of these
recursive rules, we can apply structural induction on the data structure to construct a proof for
the property. The main challenge to applying structural induction is that the data structure is
not explicitly described by an inductive type but shredded into an edge relation. The structure
is preserved by constraints on the edge relation, e.g., primary/foreign keys on id and so forth.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We have presented our ideas to use Datalog as a programming language for bidirectional
transformations over tree-structured data. Although this short paper has demonstrated a simple
example, we believe Datalog is expressive enough to cover many interesting recursive view
update strategies over trees and even some restricted graphs.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.-D.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <source>Programmable view update strategies on relations, PVLDB</source>
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>726</fpage>
          -
          <lpage>739</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pacheco</surname>
          </string-name>
          ,
          <article-title>The essence of bidirectional programming</article-title>
          ,
          <source>Science China Information Sciences</source>
          <volume>58</volume>
          (
          <year>2015</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Warren</surname>
          </string-name>
          ,
          <article-title>HILOG: A foundation for higher-order logic programming</article-title>
          ,
          <source>J. Log. Program</source>
          .
          <volume>15</volume>
          (
          <year>1993</year>
          )
          <fpage>187</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Florescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          ,
          <article-title>Storing and querying XML data using an RDMBS, IEEE Data Eng</article-title>
          .
          <source>Bull</source>
          .
          <volume>22</volume>
          (
          <year>1999</year>
          )
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Tatarinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Viglas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shanmugasundaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Shekita</surname>
          </string-name>
          ,
          <string-name>
            <surname>C. Zhang,</surname>
          </string-name>
          <article-title>Storing and querying ordered XML using a relational database system</article-title>
          ,
          <source>in: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data</source>
          , Madison, Wisconsin, USA, June 3-6,
          <year>2002</year>
          ,
          <year>2002</year>
          , pp.
          <fpage>204</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          , L. Tanca,
          <article-title>What you always wanted to know about datalog (and never dared to ask)</article-title>
          ,
          <source>TKDE</source>
          <volume>1</volume>
          (
          <year>1989</year>
          )
          <fpage>146</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Greenwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Pierce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <article-title>Combinators for bi-directional tree transformations: A linguistic approach to the view update problem</article-title>
          ,
          <source>in: Proceedings of the 32Nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>246</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pacheco</surname>
          </string-name>
          ,
          <article-title>A clear picture of lens laws</article-title>
          ,
          <source>in: International Conference on Mathematics of Program Construction</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Shmueli</surname>
          </string-name>
          ,
          <article-title>Equivalence of datalog queries is undecidable</article-title>
          ,
          <source>The Journal of Logic Programming</source>
          <volume>15</volume>
          (
          <year>1993</year>
          )
          <fpage>231</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bárány</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ten
            <surname>Cate</surname>
          </string-name>
          , M. Otto,
          <article-title>Queries with guarded negation</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>5</volume>
          (
          <year>2012</year>
          )
          <fpage>1328</fpage>
          -
          <lpage>1339</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>