How to bring some MAGIC to SPARQL
                                                 Work in progress Paper
                                                        Christina Ehrlinger
                                                           University Passau
                                                               Germany
                                            Christina.Ehrlinger@uni-passau.de


ABSTRACT                                                              which focus on this like the RDF-3X triple store [7]. Second,
For SPARQL, the query language to retrieve information                we can rewrite the SPARQL query in order to optimize the
from an RDF graph, the optimization of the order of the               order of the triples.
containing triples is a challenging topical issue. In this paper,       The same kind of optimizations took place for SQL. There
we show an approach which passes the information about the            are highly performant database systems which also considers
already found binding for a variable to all other occurrences         the storage of the relations as well as techniques to optimize
of this variable in combination with a cost model in order to         the order of the joins, which have to be done in order to
minimize the execution time. Different to other approaches            execute SQL queries with more than one relation.
our approach as described in this paper can be applied to
a SPARQL query, which not only consists of basic graph                  The aim of this approach is not an optimization, which
pattern, but also contains group graph pattern like FILTER            perfectly fits into a self-designed triple store. It is a generic
or OPTIONAL. Our experiments show the applicability of                way how to reorder SPARQL queries in order to use their
our approach and function as preliminary proof of concept.            full potential according to their execution time.

                                                                         The rest of the paper is organized as follows. Section 2
1.     INTRODUCTION                                                   describes the problem statement for the reordering of the
   The structure of the web changes from linked documents,            triples and states a motivational example. Section 3 pres-
which contain the information, to directly link the informati-        ents our approach for using the sideways information pas-
on using the keyword Linked Open Data. This direct linking            sing first for a query containing a basic graph pattern and
of information generates a huge number of quite big graphs.           furthermore the adaption of the approach in order to use
In order to handle these graphs, the Resource Descripti-              other components besides to a basic graph pattern, for ex-
on Framework RDF 1 is one of the used standard models                 ample FILTER. Section 4 shows the experiments which were
for data interchange on the Web. It uses Uniform Resource             executed so far. Section 5 discusses related work.
Identifiers (URIs) to refer to resources in order to connect
them. The resulting structure is a directed, labeled graph,           2.   PROBLEM STATEMENT
where the edges represent the connection between the re-                 In order to present our approach for reordering a SPAR-
sources. These RDF graphs are generated very easily due to            QL query, we first have a closer look at a simple SPARQL
the absence of a schema. There exist specialized databases            query and discuss the impact of different orderings. Consi-
for storing these RDF graphs, so-called triple stores 2 .             dering the SPARQL query in Listing 1. This query can be
   The SPARQL query language [8] is used to access the                executed against a dataset generated using the Lehigh Uni-
information stored in these RDF graphs. These SPARQL                  versity Benchmark (LUBM) data generator [4]. It consists
queries must be performant in order to find the desired in-           of 5 triples, whereas triple t5 also contains a literal.
formation in these huge graphs. For optimizing the execution
time of a SPARQL query we can consider two different ap-                   SELECT ∗
proaches. First, we can design a triple store, which optimizes             WHERE {
the storing or the usage of indices. There are also approaches                 ?p : t e a c h e r O f ? c .            #t 1
                                                                               ? s : takesCourse ?c .                  #t 2
1
    https://www.w3.org/RDF/                                                    ? s : a d v i s o r ?p .                #t 3
2
    https://www.w3.org/wiki/LargeTripleStores                                  ? s : memberOf ?d .                     #t 4
                                                                               ?d : name ’ Department0 ’ .             #t 5
                                                                           }

                                                                            Listing 1: Example for a SPARQL query


31st GI-Workshop on Foundations of Databases (Grundlagen von Daten-
banken), 11.06.2019 - 14.06.2019, Saarburg, Germany.
Copyright is held by the author/owner(s).
   While executing the query as shown in Listing 1 using            In the following chapter, we present an approach in or-
the common triple store GraphDB 3 , the query has an ave-         der to use the sideways information passing for SPARQL
rage execution time of 4606 ms while executing ten times.         queries.
If we slightly change the order of the triples, we get the
query shown in Listing 2, whereas this query has an avera-        3.     USING SIDEWAYS INFORMATION PAS-
ge execution time of 12283 ms (executed ten times). Even
for this simple query executed against a quite small dataset
                                                                         SING FOR SPARQL QUERIES
of approximately 130 000 triples we see the huge impact of          In order to optimize a SPARQL query regarding the SIP,
reordering the triples of a SPARQL query.                         we can represent the query as a graph to visualize the possi-
                                                                  ble ways to pass the bindings. In Figure 1 we see the resulting
       SELECT ∗                                                   graph for the SPARQL query in Listing 1.
       WHERE {                                                      Every node represents a triple pattern of the query. Two
           ? s : takesCourse ?c .              #t 2               nodes are connected with an undirected edge if they share
           ?p : t e a c h e r O f ? c .        #t 1               a common variable. Conceptually, the representation allows
           ? s : memberOf ?d .                 #t 4               us to determine the best order of the triples by transforming
           ? s : a d v i s o r ?p .            #t 3               the undirected graph in a directed graph using the approach
           ?d : name ’ Department0 ’ .         #t 5               described in this paper, whereas the directed edge between
       }                                                          node t1 and node t2 denotes that t1 is in the ranking order
                                                                  before t2.
Listing 2: Reordered SPARQL query from Listing 1
  We also find examples of reordering which lead to a high                               t2
improvement regarding the execution time. In Listing 3 we
again have the same SPARQL query apart from the ordering                   t1                            t4            t5
of the triples. This query has an average execution time of
325 ms.
                                                                                         t3
       SELECT ∗
       WHERE {
           ?d : name ’ Department0 ’ .         #t 5
           ? s : memberOf ?d .                 #t 4                     Figure 1: Query graph for query in Listing 1
           ? s : a d v i s o r ?p .            #t 3
           ? s : takesCourse ?c .              #t 2                  Another way to visually represent the query is shown in
           ?p : t e a c h e r O f ? c .        #t 1               Figure 2. Here we represent every subject and object posi-
       }                                                          tion as a node, no matter if this is a variable, an IRI or a
                                                                  blank node. For the triple pattern
Listing 3: Reordered SPARQL query from Listing 1
                                                                                       ?s takesCourse ?c
   The mechanism of reordering to get a more efficient query      we get two nodes representing the variables ?s and respec-
is not a new idea. This was done in Datalog [5] while apply-      tively ?c and they are connected with a directed edge, which
ing the Magic Set Transformation [1] to a Datalog program         represents the predicate takesCourse between these two va-
in order to sort the subgoals of a rule and the same idea         riables. The edge is directed from the node representing the
was also used while applying the join order optimization to       subject of the triple to the node representing the object of
a SQL query in a relational database system [11].                 the triple. For the triple ?s takesCourse ?c, the edge is di-
                                                                  rected from the node representing the variable ?s to the node
   Especially for Datalog and SPARQL, the ordering of sub-        representing the variable ?c.
goals or respectively triples has a huge impact on the va-
riables. The term Sideways Information Passing, short SIP,
was used for this in Datalog and it describes how the bin-                ?c    teacherOf        ?p             ’Department0’
dings of variables are passed from one subgoal to another
subgoal. Depending on the order of the subgoals it results in                                                        name
                                                                    takesCourse       advisor
a variety of different SIP possibilities, called SIP strategies
[10]. For our approach, we adopt this term for SPARQL. In
the context of SPARQL, it describes the way the binding of                ?s                  memberOf                ?d
the variables is passed between the triples.

   Consider the query from Listing 3 again. We start with the
triple ?d :name ’Department0’. This generates bindings for             Figure 2: SPARQL graph for query in Listing 1
the variable ?d, which are passed to the second triple ?s
:memberOf ?d. Again this generates bindings for the varia-          This kind of representation describes the pattern we are
ble ?s, which is passed to the triples ?s :advisor ?p and         looking for while executing the query against a dataset.
?s :takesCourse ?c.                                                 The approach described in this paper focuses on how to
                                                                  rewrite the query with a more efficient ordering of the triples.
                                                                  This is done based on the query graph in order to optimize
3
    https://www.ontotext.com/products/graphdb/                    the sideways information passing.
  It consists of the following four steps:                                 a     takesCourse       b    takesCourse     c

  1. Generate query graph
                                                                       teacherOf               takesCourse        takesCourse
  2. Transform undirected graph into directed graph
  3. Determine order of the nodes according to the directi-                                                             e
                                                                           d                     advisor
     on of the edges
  4. Generate optimized SPARQL query                                                                              takesCourse
                                                                       teacherOf     advisor
  According to the different elements of a SPARQL query,
we have a closer look at the rearrangement of a query only                                         g
                                                                            f                                           h
consisting of a basic graph pattern and in a second step, we                       takesCourse
also consider all possible group graph patterns of a SPARQL
query.
                                                                    Figure 3: small sample instance graph as generated
3.1    Basic Graph Pattern Query                                    by LUBM
   Basic graph patterns (BGP) are sets of triple patterns
[8]. The most simple SPARQL query consists of exactly one
triple pattern:                                                     for a node if this node has at least one outgoing edge of
                                                                    this property. In general, the outgoing value for property p
                           ?s ?p ?o                                 determines the average number of awaiting bindings for the
   One example of a query which consists only of a basic            object position while the subject position already has a bin-
graph pattern but more than one triple pattern is shown in          ding, whereas the incoming value determines the average
Listing 1. As described before, in the first step we generate       number of awaiting bindings for the subject position while
the corresponding query graph for the SPARQL query from             the object position already has a binding.
Listing 1 as seen in Figure 1. During the second step, we              We also have to consider the cost model in order to choose
convert the undirected query graph into a directed graph.           the first node, if the query contains only triples of the form
In the following, we describe how this is achieved. Similar to      ?s p ?o, where only the predicate position p is bound and
Datalog, while performing the Sideways Information Passing          not a single literal or IRI is part of the query.
as a preliminary step before the Magic Set Transformation              In Figure 3 we see a small instance graph as an example of
[1], we try to find Literals or IRIs in the query to get a          the generated RDF graphs by LUBM. The shape of the no-
starting point for generating bindings for the variables. In        de represents the type of the node. A rectangle represents a
general, we are looking for the node with the highest num-          student, a triangle represents a professor and a circle repres-
ber of bound positions (subject, predicate, object). In our         ents a course. The outgoing and incoming values resulting
example, the only literal is the name of the department, so         from the small example can be seen in Table 1.
the triple, which contains this literal, will be the first triple
according to the ranking order.                                                 property          outgoing   incoming
   As explained before, this generates a binding for the va-                    advisor           1          2
riable ?d. This binding is passed to all triples, which also                    takesCourse       2          1.5
contain the variable ?d, no matter, if it occurs in the sub-                    teacherOf         2          1
ject or object position. In the query graph, this passing is
visualized by adding an outgoing direction to all undirected        Table 1: Outgoing and incoming values calculated
edges of the current node, which corresponds to the current         based on the instance graph in Figure 3
triple. The search for a node, where the corresponding triple
has the highest number of bound positions and the subse-               For a better understanding, we have a closer look at the
quent adding of directions to edges are repeated until all          calculations for the outgoing and incoming values. Consider
edges are directed.                                                 the property takesCourse. We sum up all edges with the la-
                                                                    bel takesCourse. Node e has three edges, the node b has two
   This approach works fine for SPARQL queries, where the           edges and the node g has one edge with the label takesCour-
corresponding query graph is basically a path. In our exam-         se. Overall we have six edges with this label. To determine
ple query, we can follow the path until node t4. Now more           the outgoing value we divide this sum by the number of no-
than one node is affected by adding the direction of an ed-         des, which have an outgoing edge with this label (in this
ge and we have to decide which of the two nodes t2 and t3           example three nodes). Overall we get an outgoing value of 2
should be processed next. In order to solve this, we have in-       for the property takesCourse. Similar calculations are done
troduced a cost-based model to pick the best next node out          for the incoming value for the property takesCourse. Now
of all possible nodes.                                              we consider all nodes, which have an incoming edge with
   The chosen cost model takes into account the average oc-         this label. Node a has two edges as well as node c. Both
currence of every property but in addition, also considers          nodes f and h have one edge. Again the number of edges
the position of the common variable. For every property p           is divided by the number of nodes in order to calculate the
we determine two different numbers, so-called outgoing and          average amount. Overall we get an incoming value of 1.5 for
incoming value. The naming of these values origins in the           the property takesCourse.
SPARQL graph representation. The outgoing value descri-                So we have two different values for every property. Due to
bes the average amount of outgoing edges of this property           this distinction, we take into account the sideways informa-
tion passing to the subject or rather the object position.               t1             t2
   Using this cost model we can now determine which node
to choose in our example from Figure 1. After handling the
nodes t5 and t4 we have to choose between the nodes t2 and                                            t4             t5
t3. In both nodes the corresponding triple contains the al-
ready bound variable in the subject position, so we compare
the outgoing values of both predicates as seen in Table 1 and            t6             t3
choose the smaller one in order to minimize the intermitted
results. Because advisor has an outgoing value of 1 while
takesCourse has an outgoing value of 2, we choose node t3        Figure 4: Query Graph for query in Listing 4 after
to be the subsequent processed node.                             processing the nodes t5, t4 and t3
   Based on the number of bound positions and if equivocal
based on the presented cost model we are able to transform
the undirected query graph into a directed graph. In the         we can imagine that in general, it can not result in the very
third step, we can use topological sorting for extracting the    best execution time if the filter is placed at an arbitrary
resulting order of the nodes from the directed graph. In the     position. Based on the experiments in section 4 we were able
last step, we map the node to the corresponding triple in        to substantiate the assumption to place the filter after all
order to get the optimized SPARQL query.                         triples, which have at least one common variable with the
                                                                 filter condition, were processed. Based on the query graph,
3.2   Handling FILTERs                                           all edges of the consuming node have to be directed in order
   Until now we have considered SPARQL queries which con-        to be the next node to be processed.
sist of a basic graph pattern or in other words of a set of         Considering our example query graph in Figure 4 we have
triple patterns. On the one hand, these triple patterns gene-    already processed the nodes t5, t4 and t3. In the current
rate bindings for variables. On the other hand, they are also    state we have to choose between node t2, node t1 or the
consuming the binding of variables for the subject position      consuming node t6. As described before, the consuming no-
in order to generate a binding for the object position or vice   de can only be a candidate, if all edges of the consuming
versa. Overall a triple pattern can generate and can consu-      node were turned into incoming edges in order to get the
me bindings for variables. Things are slightly different as it   bindings from all relevant triples. So we have to choose bet-
comes to FILTERs in SPARQL. SPARQL FILTERs restrict              ween t2 and t1. This is done using the approach presented
solutions to those for which the filter expression evaluates     before. In doing so, we first choose t2, afterwards t1. Now
to TRUE [8]. So a FILTER can only consume bindings for           all edges of the consuming node are directed and we can
the variables, but can not generate any bindings. In order to    add the filter to the order of the so far processed triples. If
handle this, we have to slightly adapt our present approach.     there would be any undirected edge in the query graph, we
While generating the query graph, the approach stays the         just continue using our approach. Overall this handling of
same, so for every filter in the query, we add a new node        a filter in a SPARQL query does not mean we always add
to the query graph next to the nodes for every triple. In        the filter at the end of the query. Considering the following
order to distinguish the node for a triple and the node for      filter regarding the variable ?s:
a filter, the node for a filter is drawn with a bold line. For
a better differentiation, we also name the node for the filter         FILTER(?s != http://www.Department0. Uni-
consuming node. The meaning of the edges stays the same.               versity0.edu/GraduateStudent29)
Consider the following SPARQL query, which is the same
SPARQL query as in Listing 1 extended with one FILTER              This filter would be added in the current order after we
expression:                                                      have processed all triples containing the variable ?s.
                                                                   In summary, we have adapted our approach to handle the
SELECT ∗                                                         FILTER construct of a SPARQL query.
WHERE {
    ?p t e a c h e r O f ? c .                     #t 1          3.3    Handling Group Graph Patterns next to
    ? s takesCourse ?c .                           #t 2                 FILTER
    ? s a d v i s o r ?p .                         #t 3             Next to a FILTER, there are many more possibilities to
    ? s memberOf ?d .                              #t 4          write a group graph pattern in SPARQL, for example OP-
    ?d name ’ Department0 ’ .                      #t 5          TIONAL or SERVICE. Also these elements can consume
    FILTER( ? p !=                                               bindings similar to a FILTER, but considering only the trip-
         <h t t p : / /www. Department0 .                        les contained in a group graph pattern, these triples also
          U n i v e r s i t y 0 . edu /                          generate bindings for each other.
          AssistantProfessor4 > )                  #t 6             Consider the SPARQL query in Listing 5 with a BGP of
}                                                                three triples and an OPTIONAL clause with two triples.
                                                                 The idea is to handle those queries on different abstracti-
Listing 4: extended SPARQL query from Listing 1                  on levels. While reordering the triples which are contained
                                                                 directly in the WHERE clause, the group graph pattern is
  The corresponding query graph is shown in Figure 4.            abstracted into one single node. On this basis, we can con-
  While transforming the query graph from an undirected          sider the triple patterns and group graph patterns inside of
graph into a directed graph, we use the same approach as         the current group graph pattern.
discussed before. The only change regards the handling of
these consuming nodes. From a naive standpoint of view,
SELECT ∗                                                          the execution times regarding all permutations of the query
WHERE {                                                           presented in Listing 1 depicted in Figure 6. For this example,
    ? s a d v i s o r ?p .                           #t 1         our approach achieves the best execution time by on average
    ? s memberOf ?d .                                #t 2         of 325 ms. All permutations were executed 10 times in order
    ?d name ’ Department0 ’ .                        #t 3         to reach a good average.
    OPTIONAL{                                        #O
         ? s : takesCourse ?c .               #t 4                                                           20                                             20
         ?p : t e a c h e r O f ? c .         #t 5                                          20
    }


                                                                  numbers of permutations
}
                                                                                            15                    14
     Listing 5: SPARQL query with OPTIONAL
   During the abstraction, the query graph for the first level
contains four nodes. The nodes t1, t2 and t3 represent the                                                                                 10
                                                                                            10           9             9
corresponding triple pattern while the node O represents the                                                                                            8
OPTIONAL clause as shown in Figure 5. Based on this,                                                                                   7
we consider a query graph containing the triples t4 and t5,                                                                        6
                                                                                                                               5
which are inside of the OPTIONAL. The incoming edges                                         5                             4
for the second query graph represent the information about                                         3 3
                                                                                                                                                    2
already bound variables from outside the OPTIONAL as
shown in Figure 5.                                                                                                                              0
                                                                                             0
                                                                                                    00  00  00  00  00   00   00   00
                      t1                     t4                                                  0-5 <20 <40 <60 <80 <100 <120 >150
               t3             O                                                                               execution time in ms
                      t2                     t5
                                                                  Figure 6: Distribution of the execution times in re-
                                                                  lation to the permutations of the query in Listing 1
Figure 5: Query Graphs for the query in Listing 5
for the two different abstraction levels                            This test setup was executed with several different queries
                                                                  but the overall picture is always the same. In Table 2 we see
   Until now we have only considered queries, which have at       a comparison between some of the LUBM queries regarding
most one OPTIONAL group graph pattern. This approach              the best possible execution time for one of the permutati-
is also applicable if a SPARQL query has more than one            ons compared with the execution time achieved while exe-
group graph pattern. If this is the case, we generate a con-      cuting the permutation derived from our approach. In the
suming node for every group graph pattern and use the same        last column, we listed the percentage of permutations of the
approach as described before. We only have to extend the          respective query, which had the same or smaller execution
mechanism to handle several consuming nodes as potential          time than the derived permutation from our approach.
next nodes. In order to get the best possible order, we use
a preference ranking for the different kind of group graph                                            best             achieved
patterns. So for example, a FILTER is ranked above an OP-               LUBM                                                                % of faster
                                                                                                    execution          execution
TIONAL. This approach is quite generic because a SPARQL                 Query                                                              permutations
                                                                                                      time               time
query can also be nested using an arbitrary number of le-                                    1        11 ms              11 ms                     -
vels. Conceptually handling these elements like OPTIONAL                                     2         7 ms              16 ms                  0,03056
should be possible using our presented approach as stated                                    7        15 ms              15 ms                     -
before. As described in section 4 we will need more experi-                                  9         4 ms               7 ms                  0,07778
ments and tests to show the applicability of our approach                                   12         3 ms              39 ms                  0,58334
for queries with group graph patterns like OPTIONAL.                                        13         4 ms               4 ms                     -

4.   EXPERIMENTS                                                  Table 2: Comparism between best possible executi-
   In this section, we present the experiments carried out        on time and achieved execution time using our ap-
to test our approach and to get an idea if this approach is       proach for some LUBM queries
promising. We used the LUBM data generator [4] to generate
a dataset of approximately 130 000 triples. These triples            For LUBM query 2 our approach as presented in this pa-
are stored in GraphDB. One of the first experiments was to        per achieves an execution time of 16 ms, whereas the best
execute all possible permutations of the BGP of a SPARQL          permutation achieves an execution time of 7 ms. Based on
query to get an idea about the different execution times          these two numbers, it seems that our approach does not per-
regarding the different orders of the triples. In addition, the   form well. If we add the information, that this query has 6
query was reordered using our presented approach. For every       triples, so 720 possible permutations and only approximately
query tested so far the reordered query was in the forefront      3% of these permutations have the same or better executi-
of all execution times from the permuted queries. To give an      on time, our approach is in a better position. For some of
idea how the execution times can differ, the distribution of      the queries, as LUBM query 7 with 4 triples, our approach
derived the best possible permutation based on the executi-         dataset showed the impact on the execution time based on
on time. These experiments also showed some examples like           the order of the triples. Also, these experiments functioned
LUBM query 12, where our approach does not determine a              as a preliminary proof of concept for our approach. While
good permutation of the query. These queries will be used           testing the execution time of all possible permutations of the
to improve our cost model.                                          triples, the order of triples, our approaches chooses, was in
  Based on the assumption the approach is using a good              the forefront of the smallest execution times.
heuristic, we have also examined the impact of the filter pla-         As described in the title, this is a work in progress paper.
cement. In order to do this, we used our approach to reorder        Therefore these experiments are not the top of the flagpole.
the BGP of the query, added the filter statement at every           The next steps will include tests with bigger datasets genera-
possible position and compared the execution times again.           ted with the LUBM dataset generator as well as tests using
Also for this test setup, the experiment matches with the           data and test queries from the Berlin SPARQL Benchmark
expectations about placing the filter after all triples which       (BSBM) [2]. While the test queries from LUBM are quite
contain at least one of the variables of the filter. In Table 3     short, the queries tested in BSBM contain more triples and
we see the impact of the filter placement.                          also more complex patterns like FILTERs or OPTIONAL
                                                                    clauses. Another benchmark which provides more complex
              filter placement    execution time                    test queries than LUBM is the SP 2 Bench [9], which we want
                   before t5          289 ms                        to use as a source for test queries.
                    after t5          279 ms                           In order to improve the reordering based on the current
                    after t4          281 ms                        approach we want to take into account semantic information.
                    after t3          277 ms                        This will be part of our future work.
                    after t2          277 ms
                    after t1          274 ms                        7.   REFERENCES
                                                                     [1] C. Beeri and R. Ramakrishnan. On the power of
Table 3: Impact of the FILTER placement for the                          magic. In Proceedings of the Sixth ACM
enlarged SPARQL query in Listing 4                                       SIGACT-SIGMOD-SIGART Symposium on Principles
                                                                         of Database Systems, PODS ’87, pages 269–284, New
   Using our approach for the query as shown in Listing 4 we             York, NY, USA, 1987. ACM.
place the filter after the triple t5, which is also the best po-     [2] C. Bizer and A. Schultz. The berlin sparql benchmark.
sition according to all possible positions as seen in Table 3.           Int. J. Semantic Web Inf. Syst., 5:1–24, 2009.
During our experiments, we have observed the impact of the           [3] A. Gubichev and T. Neumann. Exploiting the query
filter condition itself, but explicitly considering the conditi-         structure for efficient join ordering in sparql queries.
on of the filter will be part of our future research. Overall            In EDBT, pages 439–450. OpenProceedings.org, 2014.
our performed experiments showed the applicability of our            [4] Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark
approach.                                                                for owl knowledge base systems. J. Web Sem.,
                                                                         3(2-3):158–182, 2005.
5.   RELATED WORK                                                    [5] J. W. Lloyd. Foundations of Logic Programming.
    In general, query optimization is a well-established area            Springer-Verlag, Berlin, Heidelberg, 1984.
especially for SQL, but regarding SPARQL queries it is a             [6] M. Meimaris and G. Papastefanatos. Distance-based
quite current topic. In the following, we shortly describe so-           triple reordering for SPARQL query optimization. In
me different approaches, which use different methodologies               33rd IEEE International Conference on Data
than our approach presented here. The Characteristic Set                 Engineering, ICDE 2017, San Diego, CA, USA, April
approach [3] uses dynamic programming based algorithm on                 19-22, 2017, pages 1559–1562. IEEE Computer
a precomputed hierarchical structure, which allows determi-              Society, 2017.
ning the best order of the triples. In [6] they translate a query    [7] T. Neumann and G. Weikum. Rdf-3x: a risc-style
into a multidimensional vector space and perform distance-               engine for rdf. PVLDB, 1(1):647–659, 2008.
based optimization by considering the relative differences           [8] E. Prud’hommeaux and A. Seaborne. SPARQL Query
between the triple patterns. Comparing different selectivity             Language for RDF. W3C Recommendation, 2008.
estimations for a SPARQL query was done in [12], whereby             [9] M. Schmidt, T. Hornung, G. Lausen, and C. Pinkel.
this approach focusses only on BGP queries. It describes dif-            Sp2bench: A SPARQL performance benchmark.
ferent ways to compute heuristics for the optimization but               CoRR, abs/0806.4627, 2008.
does not consider the structure of the triple. Our approach         [10] S. Sippu and E. Soisalon-Soininen. Multiple sip
fills this gap between using statistical data and taking the             strategies and bottom-up adorning in logic query
structure of the query as well as the triple into account.               optimization. In ICDT, 1990.
                                                                    [11] M. Steinbrunn, G. Moerkotte, and A. Kemper.
6.   CONCLUSION AND FUTURE WORK                                          Heuristic and randomized optimization for the join
  In this paper, we have presented our approach for opti-                ordering problem. The VLDB Journal, 6(3):191–208,
mizing a SPARQL query by means of using the sideways                     Aug. 1997.
information passing. We show how to use our approach for            [12] M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and
a SPARQL query only consisting of a basic graph pattern.                 D. Reynolds. Sparql basic graph pattern optimization
Based on this we showed how to adapt the approach in or-                 using selectivity estimation. In Proceedings of the 17th
der to handle queries, which also contain FILTERs or other               International Conference on World Wide Web, WWW
group graph patterns. The experiments based on the LUBM                  ’08, pages 595–604, New York, NY, USA, 2008. ACM.