<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Italian Symposium on Advanced Database Systems, June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Understanding RDF Data Representations in Triplestores*</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matteo Lissandrini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomer Sagi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Torben Bach Pedersen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katja Hose</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalborg University</institution>
          ,
          <country country="DK">Denmark</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>9</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>Because of the flexibility and expressiveness of their model, Knowledge Graphs (KGs) have received increasing interest. These resources are usually represented in RDF and stored in specialized data management systems called triplestores. Yet, while there exists a multitude of such systems, exploiting varying data representation and indexing schemes, it is unclear which of the many design choices are the most efective for a given database and query workload. Thus, first, we introduce a set of 20 access patterns, which we identify within 6 categories, adopted to analyze the needs of a given query workload. Then, we identify a novel three-dimensional design space for RDF data representations built on the dimensions of subdivision, redundancy, and compression of data. This design space maps the trade-ofs between diferent RDF data representations employed to store RDF data within a triplestore. Thus, each of the required access patterns is compared against its compatibility with a given data representation. As we show, this approach allows identifying both the most efective RDF data representation for a given query workload as well as unexplored design solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;RDF Data Management</kwd>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Data Management Systems Design</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge Graphs (KGs) are nowadays already in widespread use because of their ability to
represent entities and their relationships in many domains [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. KGs are usually represented as
RDF data1. RDF models a graph as a set of subject-predicate-object triples, such that nodes serve
as subjects and objects and edge labels as predicates. RDF data is usually stored in specialized
data management systems called triplestores supporting declarative queries written in the
SPARQL language. In parallel to the growing interest in KGs, we have seen the emergence of
many diferent triplestore implementations, either from academic prototypes (e.g., RDF-3X),
community projects (e.g., JENA TDB), or commercial products (e.g., Virtuoso, and GraphDB).
Nonetheless, each system employs its own peculiar set of design choices, each appearing equally
compelling at a first glance. Here, we provide a better understanding of the pros and cons of the
various design choices w.r.t. the needs of a given use case and query workload. We specifically
focus on the existing solutions for storing, indexing, and querying RDF data. Our analysis
complements existing surveys [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], which merely provide a taxonomy of the features ofered by
the systems (e.g., API and data loading facilities) or classify them according to their underlying
technology (e.g., relational versus native graph), but fail short in providing a detailed analysis of
the design space in which their internal architectures reside. At the same time, this study, also
complements existing eforts in improving the performances of these system by optimizing their
query processing techniques [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Hence, we first characterize the feature space for query access
patterns, i.e., the logical operations in a query that, given some input values, establish what kind
of output values must be returned. Then, we uniquely identify the defining characteristics of
existing RDF data representations within a unifying three-dimensional design space across three
dimensions: Subdivision, Compression, and Redundancy (SCR). The design space defines the
dimensions along which any storage system for RDF data must be designed. Thus, our approach
identifies the most suitable data representation for a given access pattern, as well as access
patterns that are currently not optimally supported by the existing representations. We thereby
make the following contributions: (i) a feature space of 20 access patterns within 6 categories
for SPARQL queries (Table 1); (ii) a design space for RDF data representations based on data
subdivsion, redundancy, and compression (Table 2), which allows analyzing the trade-ofs of
each design choice; and finally (iii) a thorough methodology to analyze how diferent design
choices impact system performance when answering access patterns with specific features. In
the following, we first introduce the core concepts regarding the RDF data model (Section 2) and
the features of the SPARQL query language (Section 3). Then, we present our SCR design space
(Section 4) and conclude explaining how it allows deriving a number of important findings
about the data representations adopted by existing systems (Section 5).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries: RDF &amp; SPARQL</title>
      <p>
        The RDF data model revolves around the concepts of RDF triples and RDF graph patterns. An
RDF triple, which corresponds to an edge in the graph, is a factual statement consisting of a
subject (), a predicate (), and an object (). Subjects and objects, which represent nodes, can
be resources identified by International Resource Identifiers (IRI), anonymous nodes identified
by internal IDs (called blank nodes), or literals (which can appear only as objects). Thus, an
RDF graph  is a set of RDF triples. An RDF graph is queried by issuing a SPARQL query whose
answers are computed based on the subgraphs matching the structural and logical constraints
it prescribes. SPARQL queries contain one or more basic graph patterns (BGPs), which are sets
of triples with zero or more of their components replaced with variables from an infinite set
of variables  . Thus, triple patterns are (, , ) triples, where any position may be replaced
by a variable ∈ . We refer to each value in each position as an atom. Moreover, there are
special types of patterns that match pairs of atoms connected through sequences of edges
satisfying a specific set of predicates, e.g., all nodes connected by an arbitrary sequence of edges
with ex:childOf predicates. These patterns are called property paths. They are defined via a
specialized form of expressions called path expressions (similar to regular expressions) and ofer
a succinct way to extend matching of triple patterns to paths of arbitrary length [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Finally,
SELECT ?event ?eventLabel
WHERE {
?event wdt:P31/wdt:P279* wd:Q1190554. ★✪
?event wdt:P585|wdt:P580 ?date. ★
?event rdfs:label ?eventLabel.
      </p>
      <p>FILTER(DATATYPE(?date) = xsd:dateTime).</p>
      <p>FILTER(?date &lt;= 2019-12-31 &amp;&amp; ?date &gt; 2019-12-01).</p>
      <p>FILTER(LANG(?eventLabel) = "en").</p>
      <p>Return Values
Constants PO
Constants P
Range Closed
Range Special
Pivot N-way S≡S
★ 1-hop SàO
★ 1-hop OàS
✪ SàP*àO
}
Figure 1: SPARQL query with annotated access patterns.
solutions of a query are the variables bindings, i.e., the mappings from each variable to a given
node or edge in the graph that satisfies the conditions in the BGP.</p>
      <p>In its simplest form, a SPARQL query has the form “SELECT ⃗ WHERE  ”, with  =
{1, ... , } being a set of triple patterns (as in Figure 1, the annotations are explained in
the next section). Optionally, one or more FILTER clauses further constrain the variables in
 . Let  denote the finite set of variables occurring in  , i.e.,  ⊂ , then ⃗ is the vector
of variables returned by the query such that ⃗ ⊆   . Additional operators such as UNION
or OPTIONAL allow more than one BGP in a query by defining the non-conjunctive semantics
of their combination. Finally, SPARQL queries can also make use of GROUP BY and aggregate
operators. SPARQL queries are declarative and are therefore designed to be decoupled from the
physical data access methods, i.e., the low-level algorithms and data structures for organizing
and accessing data. This decoupling allows specific triplestore implementations to use diferent
data representations and query processing techniques to execute a given query.</p>
    </sec>
    <sec id="sec-3">
      <title>3. SPARQL Access Patterns</title>
      <p>To identify the best internal representation for an RDF dataset and a given SPARQL query, we
decompose the query’s triple patterns into atomic operators and identify how they access a
given data representation. In particular, we identify a set of access patterns that enable the
required translation from the declarative query language to its logical query plan. Thus, an
access pattern is the set of logical operations that, given a set of variable bindings over a graph,
determine what data to access (and optionally to change) and what output to produce. The
feature space of access patterns (summarized in Table 1) is comprised of 6 dimensions, each
containing a set of alternative features, namely: Constants, the presence of constant values as
atoms (instead of variables); Filter, the presence (or absence) of a filter on a range of values;
Traversal, the connection of two nodes by means of one or more subsequent triple patterns;
Pivot, the atom connecting multiple triple patterns in a BGP; Return, the information expected
to be returned; Write, whether and how a change in the content of the graph is needed.
Consider the example in Figure 1, representing a query to retrieve location coordinates of
archaeological sites, and compare it to the access patterns in Table 1. We can recognize, for
instance, the property path wdt:P31/wdt:P279*, of the form p1/p2* o, meaning that it
would match two alternative access patterns: (1) 1-hop traversals over p1 = wdt:P31 reaching
the target node o=wd:Q1190554 directly, and (2) *-hop traversals starting with one edge for
wdt:P31 and reaching the object through a sequence of arbitrarily long paths matching the p2
= wdt:P279 triple pattern. On the other hand, the ?event variable is a 3-way pivot, which
can also be executed as a set of binary ≡  pivot access patterns. Similarly, we can also identify
range filters and the presence of constant values. Each of these access patterns can be efectively
implemented and answered by specific RDF data representations and indexes.</p>
    </sec>
    <sec id="sec-4">
      <title>4. RDF Data Representations in the SCR Space</title>
      <p>To support the data access patterns, triplestores implement a set of diferent data representations
storing all or a subset of the graph . A data representation  to answer the given access
pattern  needs to provide a way to retrieve the values for the bindings of the variables ⃗ from
the information stored in  (or to modify  for write operations). Therefore, given an access
pattern  and a data representation  of , we evaluate the performance of  in providing the
correct instantiations of ⃗ for . Moreover, we also evaluate how efective, in terms of time, it
is to execute  over , i.e., its cost. Answering such a question allows, given two distinct data
representations, to select the most appropriate one, i.e., the one with the lowest cost.</p>
      <p>
        Cost model. Answering a query using a particular data representation incurs a cost expressed
in terms of the time it takes to retrieve all such answers. Several such cost models have
been proposed for RDF [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], but were tied to a simple two-tiered memory hierarchy and
assumed a relational data representation. While such assumptions are no longer applicable,
it still holds across novel (existing or future) machine architectures and memory types that
random seek operations incur a diferent cost than sequential read operations. Additionally, we
identify another set of costs derived from the diference between accessing compressed and
uncompressed data. That is, while the use of compression is employed to minimize memory
requirements and speed up the movement of large result sets, it incurs additional costs in
compression and decompression. Therefore, we follow the recent convention of employing a
a)
      </p>
      <p>?s ?p ?o
&lt;uri1&gt; &lt;p1&gt; &lt;o1&gt;
&lt;uri1&gt; &lt;p2&gt; &lt;o2&gt;
&lt;uri2&gt; &lt;p1&gt; &lt;o3&gt;
&lt;uri2&gt; &lt;p3&gt; &lt;o4&gt;
&lt;uri3&gt; &lt;p1&gt; &lt;o1&gt;
&lt;uri4&gt; &lt;p2&gt; &lt;o5&gt;
. . . . . . . . .
&lt;uri22&gt; &lt;p1&gt; &lt;o14&gt;
&lt;uri22&gt; &lt;p3&gt; &lt;o15&gt;
b)</p>
      <p>
        ?s
&lt;uri1&gt;
&lt;uri2&gt;
&lt;uri3&gt;
&lt;uri4&gt;
. . .
&lt;uri22&gt;
&lt;p1&gt;,&lt;o1&gt; &lt;p2&gt;,&lt;o2&gt;
&lt;p1&gt;,&lt;o3&gt; &lt;p3&gt;,&lt;o4&gt;
&lt;p1&gt;,&lt;o1&gt;
&lt;p2&gt;,&lt;o5&gt;
set of cost constants [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] diferentiating between the cost of random and sequential access and
between the cost of compressed and uncompressed access.
      </p>
      <p>Data Representation Compatibility. To assess if a data representation can eficiently
support a specific access pattern, we define the notion of compatibility. We define this notion
specifically for read operations, under the assumption that a read operation is required before
deletion (to locate the data to be deleted) and before insertion (to locate where to insert the data
and to avoid duplicates) and is therefore required for every type of access pattern. Therefore,
given an access pattern  and a data representation  able to answer , we assume that there
is a sequence of operations specified by an algorithm Γ defined over  that can compute an
answer , such that  = Γ(, ). The number and cost of the operations specified by Γ directly
determine whether  is a representation suitable for eficiently computing the answers to .
When measuring the number of operations required to be executed over , we distinguish
between random seek and sequential operations. In particular, an RDF data representation can
be seek compatible, sequence compatible, or selection compatible.</p>
      <p>For instance, one way to store  is as a list of 3-tuples (one per edge) sorted by subject and
then by predicate and object (Figure 2.a and 2.d) with a clustered B+ tree index over them. In this
representation, the query processing cost would be analogous to that of a relational table with
three attributes (, , ), all part of a primary index. This representation is sequence compatible
with any 1-hop access pattern that binds , both  and , or all three positions. That is, the
algorithm Γ(⟨s1,p1,?o⟩, {B+tree,sorted()}), would first find the first tuple performing ()
steps traversing the B+ tree looking for s1, p1, and then perform a linear scan over the file to
retrieve the remaining tuples. However, the B+ tree is not seek compatible because the time to
ifnd the first result depends on the size of the graph, i.e., it involves (||) seek operations to
reach the first result. A diferent data representation is to employ a key-value data structure
(similar to Figure 2b) and use the pair subject-predicate as the key, and the object as the value.
In this data structure, triples sharing the same  and  will store the list of objects contiguously.
This data structure is both seek and sequence compatible for a traversal that, given  and ,
retrieves all corresponding objects. Nevertheless, this representation is neither seek compatible
nor sequence compatible if the query requires all edges for predicate  regardless of . Thus,
the definitions of cost and compatibility presented above allow analyzing the advantages and
drawbacks of the diferent choices in each dimension of the design space.</p>
      <p>The Design Space Dimensions. When designing data representations for an RDF store, one
can model the design decisions over three axes: Subdivision, Compression, and Redundancy. Each
of these orthogonal axes, whose properties are summarized in Table 2, represents a continuum
along which the data representation adopted by a system can be positioned.</p>
      <p>The Subdivision axis determines how fragmented the data is. At one extreme, all data is
stored contiguously in a single structure; at the other extreme, each edge and node is stored as
a separate object with pointers to its neighbors. In between, there are various data structures
such as B+ trees or hash maps. This axis contains design decisions such as sorting, grouping,
and hashing. Each of these decisions creates an additional subdivision in the data. Increasing
the extent of subdivision, allows us to minimize the number of unneeded data items accessed
to answer an access pattern (resulting in fewer operations that return items not in the result
set). For instance, in a sorted file (Figure 2a), the sorting keys act as the simplest subdivision
by collecting all triples with the same value together. On the other hand, with a hash table
(Figure 2b), given the target IRI, the hash function separates all relevant triples sharing the same
key. Therefore, decisions across the subdivision axis easily determine whether a data structure
is selection compatible, i.e., if it binds all and only the answers within a specific subdivision, but
it also impacts random and sequence compatibility.</p>
      <p>
        The goal of Compression is to minimize the number of bits read to reach the first tuple in the
result set and the number of bits required to read and potentially store the result set for further
processing. The potentially negative impact of compression is, of course, the decompression
required to evaluate a predicate in the access pattern if the access pattern is incompatible or
partially incompatible. For example, consider a compressed data representation tuned for queries
of the form (, ?, ?) and ≡  access patterns (e.g., BitMat [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). Using this representation
to answer an  * /+  pattern (i.e., is  reachable from  through edges labeled with ) would
require a potentially large number of row decompression operations to identify the edges that
satisfy the traversal condition. Moreover, decisions across the compression axis heavily impact
the selection compatibility when data that does not contribute to the answer are compressed
together with data relevant for an access pattern.
      </p>
      <p>The third axis is Redundancy which causes (redundant) copies of the data to be stored in the
system. By adding redundant data representations and indexes, it is possible to define ideal (seek,
sequence, and selection compatible) data representations for each access pattern. However,
this comes at the cost of having to store the same information multiple times. For example, by
holding both an SPO clustered index and a PSO clustered index, each triple is stored twice. Thus,
this hinders the compatibility with write access patterns. Moreover, design decisions including
full/partial replication need to find a trade-of between storage space and eficient support of
query access patterns. Hence, the cost of maintaining multiple representations is three-fold: (i)
Increased latency for delete and insert operations with possibly reduced performance of read
operations as well. (ii) An increase in space requirements, subsequently straining the limited
space in main memory and causing an increase in all read costs. (iii) An increase in query
optimization time because alternative structures to access the data result in a higher number of
query execution plans that have to be considered.</p>
    </sec>
    <sec id="sec-5">
      <title>5. The Capabilities of the SCR Space Analysis</title>
      <p>
        Thanks to the feature space for analyzing query workload access patterns, and the
SubdivisionCompression-Redundancy (SCR) design space of data representations for RDF databases, we
enable the analysis of RDF store design decisions, specifically, which data representations can
efectively support a given workload. Previous workload analyses are centered around those
characteristics that would impact mainly the query optimizer. Instead, we present an analysis
of the access patterns specific to the storage layer . The efect of various choices along the three
axes is summarized in Table 2. We performed an analysis of popular RDF store benchmarks
under these assumptions and identified a number of workload requirements for which systems
rarely provide efective optimizations. Considering the Subdivision dimension, many systems
implement dedicated tables for triples sharing a specific predicate. Alternatively, queries that
match triples with the same subject can usually exploit the clustered representation where
several properties for the same subject (or object) are stored together in a single row [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Furthermore, a constant in  or  can be exploited by a hash index of the form  ↦→. Yet, no
representation exploits subdivision for a pair of constants, e.g.,  ↦→. Overall, we see that
there is a strong incompatibility between filters on literals and the fact that all representations
store literals and special types contiguously. We also perform a similar analysis also across
the Redundancy dimension. For instance, specialized indexes have been proposed to speed
up reachability and multi-hop path expressions (e.g., Sparqling kleene [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). Yet, in existing
systems, data representations supporting the traversal operations required by k-hop and *-hop
access patterns are rarely employed. Our survey also reveals the absence of index Compression
solutions (e.g., [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]). For instance, we do not find any application of efective compression
solutions for secondary representations, such as counting indexes (e.g., sp→#) and structures
like bloom-filters [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] or compact LSM-tries like SuRF [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        Thus, researchers can use the SCR design space to design novel solutions in an informed
manner. Such designs can be achieved either manually or even semi-automatically as proposed
by the recent efort in self-designing data structures [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and self-organizing database systems
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. By cross-referencing the access patterns in a query workload with SCR design options,
future RDF stores will be able to add components of the system to match the expected workload.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research is supported by the European Union H2020 research and innovation program under
the Marie Skłodowska-Curie grant agreement No 838216 and the Poul Due Jensen Foundation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Hose</surname>
          </string-name>
          ,
          <article-title>A design space for rdf data representations</article-title>
          ,
          <source>The VLDB Journal</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>Industry-scale knowledge graphs: Lessons and challenges</article-title>
          ,
          <source>ACM Queue 17</source>
          (
          <year>2019</year>
          )
          <fpage>48</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mottin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <article-title>Knowledge graph exploration systems: are we lost?</article-title>
          ,
          <source>in: CIDR</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brugnara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Velegrakis</surname>
          </string-name>
          ,
          <article-title>Beyond macrobenchmarks: Microbenchmarkbased graph database evaluation</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>12</volume>
          (
          <year>2018</year>
          )
          <fpage>390</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Nitta</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Savnik</surname>
          </string-name>
          ,
          <article-title>Survey of RDF Storage Managers</article-title>
          ,
          <source>in: Proceedings of the International Conference on Advances in Databases, Knowledge, and Data Applications</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>I. Abdelaziz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Harbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Khayyat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kalnis</surname>
          </string-name>
          ,
          <article-title>A survey and experimental comparison of distributed sparql engines for very large rdf data</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>10</volume>
          (
          <year>2017</year>
          )
          <fpage>2049</fpage>
          -
          <lpage>2060</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Rabbani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <article-title>Optimizing SPARQL queries using shape statistics</article-title>
          ,
          <source>in: EDBT</source>
          <year>2021</year>
          ,
          <article-title>OpenProceedings</article-title>
          .org,
          <year>2021</year>
          , pp.
          <fpage>505</fpage>
          -
          <lpage>510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          , SPARQL
          <volume>1</volume>
          .
          <article-title>1 Query Language</article-title>
          .
          <source>W3C Recommendation</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <article-title>A relational algebra for SPARQL</article-title>
          ,
          <source>Technical Report</source>
          , HP Laboratories Bristol, Bristol, UK,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Frasincar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-J.</given-names>
            <surname>Houben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vdovjak</surname>
          </string-name>
          , P. Barna,
          <string-name>
            <surname>RAL:</surname>
          </string-name>
          <article-title>An Algebra for Querying RDF</article-title>
          , WWW
          <volume>7</volume>
          (
          <year>2004</year>
          )
          <fpage>83</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Idreos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zoumpatianos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hentschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Kester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>The data calculator: Data structure design and cost synthesis from first principles and learned cost models</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>535</fpage>
          -
          <lpage>550</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Atre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <article-title>Bitmat: A main-memory bit matrix of RDF triples for conjunctive triple pattern queries</article-title>
          ,
          <source>in: ISWC (Posters &amp; Demonstrations)</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Bornea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dolby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kementsietsidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dantressangle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Udrea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <article-title>Building an eficient RDF store over a relational database</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Bedathur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seufert</surname>
          </string-name>
          ,
          <article-title>Sparqling kleene: fast property paths in RDF-3X</article-title>
          ,
          <source>in: Workshop on Graph Data Management Experiences and Systems</source>
          , GRADES,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pavlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaminsky</surname>
          </string-name>
          , L. Ma, R. Shen,
          <article-title>Reducing the storage overhead of main-memory OLTP databases with hybrid indexes</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ficara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Giordano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Procissi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vitucci</surname>
          </string-name>
          ,
          <article-title>Multilayer compressed counting bloom iflters</article-title>
          ,
          <source>in: Proceedings of the 27th Conference on Computer Communications</source>
          , IEEE,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Leis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaminsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Keeton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pavlo</surname>
          </string-name>
          , Surf:
          <article-title>Practical range query filtering with fast succinct tries</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>323</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pavlo</surname>
          </string-name>
          , G. Angulo,
          <string-name>
            <given-names>J.</given-names>
            <surname>Arulraj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          , L. Ma, P. Menon,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Mowry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perron</surname>
          </string-name>
          , I. Quah,
          <string-name>
            <given-names>S.</given-names>
            <surname>Santurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomasic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Toor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Aken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xian</surname>
          </string-name>
          , T. Zhang,
          <article-title>Self-driving database management systems</article-title>
          ,
          <source>in: CIDR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>