<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TriAL-QL: Distributed Processing of Navigational Queries?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Przyjaciel-Zablocki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Schatzle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Lange</string-name>
          <email>lange@iig.uni-freiburg.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Freiburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IIG Telematics, University of Freiburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Navigational queries are natural query patterns for RDF data, but yet most existing RDF query languages fail to cover all the varieties inherent to its triple-based model, including SPARQL 1.1 and its derivatives. TriAL* is one of the most expressive approaches but not supported by any existing RDF triplestore. In this paper, we propose TriAL-QL, an easy to write and grasp language for TriAL*, preserving its procedural structure. To demonstrate its feasibility, we provide a proof of concept implementation for Hadoop using Hive as execution layer and give some preliminary experimental results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Introduction
? A substantially extended version will appear in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
TriAL* based on Hadoop. While TriAL* is a neat approach for querying RDF,
its algebraic notation is inappropriate for practical usage. Thus, we propose the
TriAL* Query Language (TriAL-Ql) that keeps the procedural structure
of TriAL* by representing each algebra operation with a SQL-like statement.
This way, even complex navigational queries are easy to grasp and write.
      </p>
      <p>Our major contributions can be summarized as follows: (1) We introduce
TriAL-QL, a query language for TriAL* with an intuitive mapping between
both. (2) Next, we provide an Hadoop-based implementation of TriAL-QL
using Hive. Optimizations include a precomputed 1-hop neighborhood, di erent
evaluation strategies and a carefully chosen storage layout. (3) Finally, we show
some preliminary experiments that demonstrate the scalability and feasibility of
our approach.
2</p>
      <p>
        TriAL-QL: A Procedural Query Language for RDF
The Triple Algebra with Recursion (TriAL*) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is one of the most expressive
RDF query languages with polynomial complexity. In contrast to many other
approaches, TriAL* is a closed and hence compositional language, i.e. the
output is a set of triples rather than graphs or bindings. It works directly with
triples, which allows us to write queries that are not expressible using query
languages based on the standard graph model (e.g. regular path queries and nested
regular expressions ). TriAL* takes the relational algebra as its basis with some
restrictions to guarantee closure. The most important operator is a triple join
between two ternary relations E1 and E2 representing sets of triples, de ned as:
E1 ./i;;j;k E2, where i; j; k 2 fs1; p1; o1; s2; p2; o2g indicate the implicit projection
on three elds to keep the operation closed with s1 referring to the subject of
E1, etc. represents the join conditions whereas is a set of conditions
between objects and data values. To express paths of arbitrary length, recursion
is added with the right (e ./i;;j;k) and lef t (./i;;j;k e) Kleene closure, where
e is a TriAL* expression. Thus, a reachability query that asks for pairs (x; z)
which follow the connection pattern
corresponds to the TriAL* expression (E ./ so11;=ps12; o2 ) with E being a relation name
in a triplestore (cf. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for more details).
      </p>
      <p>TriAL-QL. Next, we introduce the notation of TriAL-QL. The basic idea
is to atten the algebra expressions of TriAL* to a sequence of interrelated
statements. A complete grammar can be found on our project website 1. Table 1
shows the algebra of TriAL* with the corresponding syntax in TriAL-QL.
Each algebra operation is represented by exactly one SQL-like statement.
Accordingly, we can express the previously discussed reachability query by the
following TriAL-QL statement:</p>
      <p>SELECT s1; p1; o2 FROM E ON o1 = s2 USING right
1 http://dbis.informatik.uni-freiburg.de/forschung/projekte/DiPoS/</p>
      <p>
        Next, we consider a more complex reachability problem introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
asking for pairs (x; z) which follow a connection pattern that requires reasoning
capabilities:
      </p>
      <p>TriAL*:
e1 = (E ./ sp11;=os22; o1 )
e2 = (e1 ./ so11;=ps12;; op21=p2 )</p>
      <p>+
TriAL-QL:
e1 = SELECT s1; o2; o1 FROM E</p>
      <p>ON p1 = s2 USING lef t
e2 = SELECT s1; p1; o2 FROM e1</p>
      <p>ON o1 = s2; p1 = p2 USING lef t
The inner expression e1 computes the transitive closure of the predicates from
E, while e2 computes the transitive closure of this resulting relation. Again, we
can see that TriAL-QL stays close to its TriAL* expression illustrating the
strength of a compositional language where the result of the rst statement can
be used as input for the second. This makes TriAL-QL queries easy to write,
understand and modify.</p>
      <p>Further, new operators can be added smoothly, meeting our requirements.
First, we extend the syntax of TriAL-QL with a STORE operation that enables
us to materialize the result of a TriAL* expression e as a new relation in a
triplestore. This way, not only the output but also intermediate results can be
stored for later processing, if desired. Next, as we focus on processing web-scale
RDF data, we have seen the need to introduce more control over the recursion
depth of the right and lef t Kleene operator. Therefore, we introduce a scalar
that replaces the Kleene Star and limits the number of join compositions. In
TriAL-QL this can be formulated within the USING clause by writing, e.g.,
lef t(4) for applying lef t Kleene four times.
3</p>
      <p>
        A Distributed Execution Engine for TriAL-QL
We implemented our execution engine on top of Hadoop using Hive as
intermediate layer rather than working directly with MapReduce. This makes us
independent of any Hadoop changes (e.g. Yarn) while taking advantage of continuously
optimized Hive versions or newer execution engines like Tez that come along
with the recent SQL-on-Hadoop trend. Due to very limited space restrictions,
we give only a brief introduction to our implementation. In short, a TriAL-QL
query is rst parsed to generate an abstract syntax tree and next mapped to
TriAL* which is in turn translated into HiveQL queries. We investigated
different evaluation strategies based on (1) linear and (2) nonlinear recursion as
introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and performed exhaustive experiments with di erent storage
formats (e.g. RCFile, ORC, Parquet) and strategies (e.g. indices, partitions) to
identify best practices for storing RDF data in Hive. Further, a precomputed
1-hop neighborhood reduces the amount of required joins.
      </p>
      <p>
        Experiments. Some preliminary results are illustrated in Figure 1. We used the
Social Intelligence Benchmark (SIB) data generator2 to create social networks
of di erent sizes. The left hand side (a) shows execution times and number of
resulting triples for an exemplary query with three joins and one set operation
using linear evaluation. Both series exhibit an almost linear scaling behavior.
The right hand side (b) compares the linear to the nonlinear evaluation on a
more complex query involving the Kleene Star. In this example, the nonlinear
evaluation is superior to the linear one with increasing data size as it reduces the
amount of required join iterations from 12 (linear) to 8 (nonlinear). Exhaustive
experiments that include more advanced evaluation strategies are needed for
more comprehensive conclusions and are part of ongoing work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, the
rst preliminary results already demonstrate the scalability and feasibility of our
approach.
      </p>
      <p>1;250
0</p>
      <p>Runtimes
Results
0
linear
nonlinear
200 400 600 1000 2000 200 400 600</p>
      <p>Fig. 1. (a) execution times vs. results (linear) (b) linear vs. nonlinear execution
2 http://www.w3.org/wiki/Social_Network_Intelligence_BenchMark</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Afrati</surname>
            ,
            <given-names>F.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borkar</surname>
            ,
            <given-names>V.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carey</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polyzotis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Map-reduce extensions and recursive queries</article-title>
          .
          <source>In: EDBT</source>
          <year>2011</year>
          , Sweden, March
          <volume>21</volume>
          -
          <fpage>24</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Angles</surname>
          </string-name>
          , R.:
          <article-title>A comparison of current graph database models</article-title>
          .
          <source>In: 28th ICDE Workshops</source>
          ,
          <year>2012</year>
          , Arlington,
          <string-name>
            <surname>VA</surname>
          </string-name>
          , USA, April 1-
          <issue>5</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pieris</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Expressive languages for querying the semantic web</article-title>
          .
          <source>In: PODS'14</source>
          ,
          <string-name>
            <surname>Snowbird</surname>
            ,
            <given-names>UT</given-names>
          </string-name>
          , USA, June 22-27,
          <year>2014</year>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Libkin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutter</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vrgoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Trial for RDF: adapting graph query languages for RDF data</article-title>
          .
          <source>In: PODS</source>
          <year>2013</year>
          , New York, NY, USA - June 22 - 27,
          <year>2013</year>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Przyjaciel-Zablocki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Schatzle,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Lausen</surname>
          </string-name>
          , G.:
          <article-title>TriAL-QL: Distributed Processing of Navigational Queries</article-title>
          .
          <source>In: 18th WebDB</source>
          <year>2015</year>
          , Melbourne, Australia (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>