<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>In-Memory Dictionary-Based Indexing of oted RDF Triples</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ruben Taelman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruben Verborgh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>RDF-Star</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Indexing</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IDLab, Department of Electronics and Information Systems, Ghent University - imec Ghent</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>e upcoming RDF 1.2 recommendation is scheduled to introduce the concept of quoted triples, which allows statements to be made about other statements. Since quoted triples enable new forms of data access in SPARQL 1.2, in the form of quoted triple paerns, there is a need for new indexing strategies that can efficiently handle these data access paerns. As such, we explore and evaluate different in-memory indexing approaches for quoted triples. In this paper, we investigate four indexing approaches, and evaluate their performance over an artificial dataset with custom triple paern queries. Our findings show that the so-called indexed quoted triples dictionary vastly outperforms other approaches in terms of query execution time at the cost of increased storage size and ingestion time. Our work shows that indexing quoted triples in a dictionary separate from non-quoted RDF terms achieves good performance, and can be implemented using well-known indexing techniques into existing systems. erefore, we illustrate that the addition of quoted triples into the RDF stack can be achieved in a performant manner.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>triples in their triplestore, and enable queryable access using SPARQL. Even though some approaches
offer reports of their systems passing RDF-star specification tests, and provide high-level documenta‐
tion explaining the concepts of quoted triples for end-users, none of them offer detailed descriptions of
their indexing approach, or query performance evaluations over them. As such, there is an open knowl‐
edge gap on the research question “How to index quoted triples, and what is the impact on ingestion,
storage, and query performance?”.</p>
      <p>
        To fill this gap, we explore four different indexing approaches, implement them in a single system for
fair comparison, and evaluate them in terms of ingestion time, storage size, and query performance. We
focus on in-memory indexing, but approaches are generalizable to mixed disk/memory approaches such
as HDT  [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Like many RDF indexing approaches  [10, 11], we focus on providing indexed access for
triple paern queries, as these form the basis for answering full SPARQL queries.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2.  Related Work</title>
      <p>
        Given the recent introduction of RDF-star and the concept of quoted triples, scientific literature on
making use of it is limited. First, several use cases  [6, 7] have been explored using quoted triples.
Furthermore, a declarative language called RML-star  [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] has been introduced that allows heteroge‐
neous datasources such as CSV and JSON to be mapped into RDF datasets containing quoted triples.
is language is an extension of the RML language  [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and has been implemented in
MorphKGCstar  [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Similarly, RSP-QL*  [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] was introduced as an extension to the RSP-QL model  [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for
RDF Stream Processing to support quoted triples. Finally, two approaches  [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] are identified to trans‐
form RDF datasets containing quoted triples into a property graphs model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        RDF-star is seeing wide adoption among SPARQL implementations, for which a full list of implemen‐
tations that adhere to the RDF-star community group specification can be found in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Unfortunately,
none of these approaches clearly document their storage and indexing approach, which motivates the
need for this article on comparing various indexing techniques.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3.  Background</title>
      <p>
        Indexing is an important and well-understood element of RDF storage systems and SPARQL query
engines, where it provides a trade-off between query execution time, storage space, and ingestion time.
Existing approaches are either based on existing database technologies, such as relational
databases [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] or document stores [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], or provide native support for RDF triples. In the context of this
paper, we focus on the laer. Furthermore, we limit our discussion to the storage of RDF triples without
considering the concept of named graphs, as these can be considered as fourth element in a quad-like
structure, for which straightforward index extensions are possible.
2 / 15
      </p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Indexes For Different Orders</title>
      <p>
        A first important concept in RDF indexing is the storage of triples in different orders [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which is
done by many RDF storage techniques, such as RDF-3X  [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and Hexastore  [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Given that a  triple
consists of a subject (S), predicate (P) and object (O), both systems include six indexes for different triple
component orders (SPO, SOP, OSP, OPS, PSO and POS). e presence of these indexes allows all possible
triple paerns to be executed efficiently. For example, the triple paern query ??O can be answered
most efficiently using the OSP or OPS indexes, while the query S?O could be answered using SOP and OSP.
Next to triple paern access efficiency, these orders also enable more efficient triple paern join pro‐
cessing inside query engines, where the highly efficient sort-merge join could for example be used for
joins between triple paerns if triples are sorted in the same manner. RDF-3X goes a step further, and
also provides six aggregated indexes (SP, SO, PS, PO, OS, and OP), and three one-valued indexes (S, P, and
O). e triples inside each index can be stored in different ways, such as ordered lists (Hexastore) or
B+Trees (RDF-3X). Approaches such as HDT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and OSTRICH [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] go the different direction, and store
fewer indexes (SPO, POS, OSP) to focus purely on the triple paern access efficiency in combination with
a lower storage cost. In the context of this article, we assume tree-like indexing via nested hash maps,
and we refer to the triple component parts of an index as triple component indexes, corresponding to
the levels in the tree. For example, the SPO index would have 3 triple component indexes: S, P, and O.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3.2. Dictionary Encoding</title>
      <p>A second important aspect in RDF indexing is the encoding of RDF terms using dictionaries. e
main purposes of dictionary-encoding are the reduction in storage overhead if RDF terms are reused
across multiple triples inside indexes, and more efficient comparison of RDF terms during query pro‐
cessing. e dictionary itself is a datastructure that maintains a bidirectional mapping of RDF terms to
their encodings. Instead of storing RDF terms directly inside indexes, terms are first encoded into a
more compact datatype, such as an integer, which is then stored inside the index instead. At query time,
non-variable triple paern terms can also be encoded, and queried inside the index. When returning
query results, encoded triples can be decoded using the dictionary.</p>
      <p>Fig.  1 shows an illustration of the typical components of a triplestore. is example store contains
three indexes, with triples stored in SPO, POS, and OSP orders in tree-like structure. ese indexes make
use of a single shared dictionary, which encodes the RDF terms inside all RDF triples stored by the
indexes.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3. Triple Pattern eries</title>
      <p>To simplify discussions involving triple paern queries, we outline a traditional high-level query exe‐
cution approach for triple paern queries in Algorithm  1, Algorithm  2, and Algorithm  3. As shown in
Algorithm 1, the first step involves determining the most suitable index for a given triple paern query.
For example, a ??O query can be answered most efficiently using the OSP index, because O can be select‐
ed in constant time from the root of the tree. e triple paern query is then delegated to the index,
which is executed according to Algorithm 2. In this step, we recursively drill down into the tree-like in‐
dex by iterating over all matching terms of each triple paern component. e algorithm for finding all
matches for a single triple paern component is shown in Algorithm 3, which either returns all terms in
this part of the index if the term is a variable, or returns the term itself if the term is inside the index if
the term is not a variable. We will replace parts of these algorithms when introducing our quoted triples
indexing approaches in Section 5.</p>
      <p>FUNCTION QueryStore(store, tp)</p>
      <p>INPUT:
store: triple store
tp: triple pattern
OUTPUT:</p>
      <p>ts: sequence of triples
idx = most suitable index from store.indexes
ts = QueryIndex(idx, store.dict, tp)
RETURN ts</p>
      <p>Algorithm  1: Pseudocode for executing a triple pattern query over a triplestore containing
one or more indexes.</p>
      <p>FUNCTION QueryIndex(idx, dict, tp)</p>
      <p>INPUT:
idx: triple pattern index
dict: dictionary
tp: triple pattern
OUTPUT:</p>
      <p>ts: sequence of triples
tpe = dict.encode(tp);
ts = [];
FOR subject IN QueryIndexComponent(idx, dict, tpe.subject])
indx_s = idx[subject]
FOR predicate IN QueryIndexComponent(indx_s, dict, tpe.predicate])
indx_p = indx_s[predicate]
FOR object IN QueryIndexComponent(indx_p, dict, tpe.object])
object = indx_p[object]
ts.push(subject, predicate, object)</p>
      <p>END</p>
      <p>END
END
RETURN ts</p>
      <p>Algorithm  2: Pseudocode for executing a triple pattern query over an index, sorted in SPO
order.
FUNCTION QueryIndexComponent(idxc, dict, term)</p>
      <p>INPUT:
idxc: a certain triple component of a triple index
dict: dictionary
term: a term inside a triple pattern
OUTPUT:</p>
      <p>ts: sequence of triples
matchingTerms = []
IF term.type === 'Variable'</p>
      <p>FOR key IN idxc</p>
      <p>matchingTerms.push(dict.decode(key))</p>
      <p>END
ELSE</p>
      <p>IF idxc contains dict.encode(term)</p>
      <p>matchingTerms.push(term)</p>
      <p>END
END
RETURN matchingTerms</p>
      <p>Algorithm  3: Pseudocode for finding all matches of a single triple component inside an
index.</p>
    </sec>
    <sec id="sec-7">
      <title>4.  Use Case</title>
      <p>As example use case to illustrate different indexing approaches, we introduce a fictional dataset con‐
taining statements from different people on the color of violets in Listing 1.
@prefix : &lt;http://example.org/foo#&gt; .
:Alice :says &lt;&lt; :Violets :haveColor :Blue &gt;&gt; .
:Bob :says &lt;&lt; :Violets :haveColor :Yellow &gt;&gt; .</p>
      <p>Listing 1: Turtle snippets containing statements by Alice and Bob on the color of violets.</p>
      <p>To find out what color each person says violets have, we can execute the SPARQL query from
Listing  2. is query contains a variable in the subject position, and a variable inside the quoted triple
paern of the object position. For brevity in the remainder of this article, we refer to the variables in‐
side quoted triple paerns as quoted variables.</p>
      <p>PREFIX : &lt;http://example.org/foo#&gt;
SELECT ?person ?color WHERE {</p>
      <p>?person :says &lt;&lt; :Violets :haveColor ?color &gt;&gt; .
}</p>
      <p>Listing 2: SPARQL query to find what color each person says violets have.</p>
      <p>More advanced datasets may also contain nested quoted triples, in which any term inside a quoted
triple could be another quoted triple. ere is no upper limit of the nesting depth that can occur.</p>
    </sec>
    <sec id="sec-8">
      <title>5.  Indexing Approaches</title>
      <p>In this section, we introduce four approaches for indexing quoted triples, with increasing levels of
complexity. ese approaches build upon the well-established methods of using dictionary encoding
and storing triples in different orders as explained in Section  3. Our approaches only rely on changes
within the dictionary mechanism, while the triple index itself can remain unchanged.
5 / 15</p>
    </sec>
    <sec id="sec-9">
      <title>5.1. Singular Dictionary</title>
      <p>A straightforward way to achieve dictionary encoding of quoted triples, is to include quoted triples
directly inside the dictionary with all other RDF terms. As such, quoted triples are handled in exactly
the same manner as other RDF term types. For dictionaries that map strings to integers, this requires a
mechanism to convert quoted triples into strings. Fig. 2 shows an example of such dictionary contents
based on our use case data.</p>
      <p>Executing triple paern queries is identical to the baseline approach as explained in Section 3, except
for triple paerns containing quoted variables, such as the query ?person :says &lt;&lt;Violets haveColor
?color&gt;&gt;. As this approach has no direct way of matching the ?color variable to quoted triples, we are
required to convert quoted triple paern terms containing variables to variables, and perform a
postprocessing step to only emit those triples that match the quoted triple paern. e pseudo-code of this
algorithm is shown in Algorithm 4.</p>
      <p>e main advantage of this approach lies in its simplicity of implementation. However, we hypothe‐
size two main disadvantages:
6 / 15
1. Storage overhead: oted triples with shared terms lead to a storage overhead, such as the dupli‐
cate storage of :Violets and :haveColor in Fig. 2.
2. Slow quoted triple pattern execution: When executing triple paern queries with quoted vari‐
ables, there is no indexed access to matching quoted triples, which can lead to query performance
issues.</p>
    </sec>
    <sec id="sec-10">
      <title>5.2. oted Triples Dictionary</title>
      <p>In an aempt to cope with the two disadvantages of the singular dictionary approach, we can dedi‐
cate the storage of quoted triples to a separate dictionary, as shown in Fig. 3.</p>
      <p>To execute triple paern queries in this approach, the post-processing step from the singular dictio‐
nary is not needed anymore. Instead, we can hook directly into the internal processing of separate
triple component indexes. Concretely, when finding all matches of a given term inside a triple compo‐
nent index, we check if our term is a quoted triple paern. If so, we perform an inner join between all
quoted triple entries within the quoted triples dictionary, and the terms within the triple component in‐
dex. If the triple component index is index in a hash-like manner, then this inner join can be done effi‐
ciently in a hash join manner. e pseudo-code of this algorithm is shown in Algorithm 5.
FUNCTION QueryIndexComponentQuotedDict(idxc, dict, term)</p>
      <p>INPUT:
idxc: a certain triple component of a triple index
dict: quoted triples dictionary
term: a term inside a triple pattern
IF term.type === 'Quoted' &amp;&amp; term contains variables
matchingTerms = []
FOR quotedTriple IN dict.getAllQuotedTriples()</p>
      <p>IF idxc contains dict.encode(quotedTriple)</p>
      <p>matchingTerms.push(quotedTriple)</p>
      <p>END</p>
      <p>RETURN matchingTerms
ELSE</p>
      <p>RETURN QueryIndexComponent(idxc, dict, term)
END</p>
      <p>Algorithm 5: Pseudocode of the algorithm for finding all matching terms of a certain triple
component inside an index using a quoted triples dictionary. is algorithm is a variant of
QueryIndexComponent from Algorithm 3.</p>
      <p>We hypothesize that this separated quoted triples dictionary will result in a lower storage overhead.
Furthermore, we expect triple paern execution to be faster due to the fact that a quoted triple paern
will only lead to matches with entries in the quoted triples dictionary, as opposed to all possible terms.
7 / 15
However, as shown in Fig. 3, this approach can still lead to redundant storage if terms are shared across
different quoted triples, such as :Violet and :haveColor. Furthermore, the join inside the triple compo‐
nent index using all quoted triples might become too expensive if there are many non-matching quoted
triples.</p>
    </sec>
    <sec id="sec-11">
      <title>5.3. Referential oted Triples Dictionary</title>
      <p>To solve the redundant storage issue within the quoted triples dictionary approach, we extend upon
that approach by not storing quoted triples in full, but by instead encoding the three components of
that quoted triples, and using those encodings as key inside the dictionary. Fig. 4 illustrates an example
of this approach.</p>
      <p>Fig.  4: Plain terms and quoted triples are stored in separate dictionaries, where quoted
triples are recursively encoded using existing dictionary encodings.</p>
      <p>e triple paern query algorithm is identical to the one from Algorithm  5, except for the fact that
dictionary encoding and decoding will require the extra step of encoding and decoding of the three
triple components.</p>
      <p>We hypothesize that this approach will lead to lower storage usage due to the shared encoding of re‐
dundant terms inside quoted triples.</p>
    </sec>
    <sec id="sec-12">
      <title>5.4. Indexed oted Triples Dictionary</title>
      <p>e oted Triples Dictionary approach requires triple component indexes to join with all quoted
triples, which may be costly for selective quoted triple paerns in the presence of many non-matching
quoted triples. To solve this problem, we can modify the storage of our oted Triples Dictionary from
a map-like structure to a tree-like structure, so that triple paern matching can be done more efficiently.
Concretely, this tree-like structure can be implemented similar to a triple index, but it must map to inte‐
ger encodings of quoted triples. Fig. 5 illustrates an example of this approach. is example only makes
use of the SPO order for encodings, but in practise multiple other collation orders may be used.</p>
      <p>Fig.  5: Plain terms and quoted triples are stored in separate dictionaries, where quoted
triples are recursively encoded using existing dictionary encodings, and indexed in a tree
structure based on triple components.</p>
      <p>Executing triple paern queries is very similar to the approach of the oted Triples Dictionary, with
the difference that instead of joining with all quoted triples, we join with only those quoted triples that
match with the current quoted triple paern. e pseudo-code of this algorithm is shown in
Algorithm 6.</p>
      <p>FUNCTION QueryIndexComponentQuotedDictIndex(idxc, dict, term)</p>
      <p>INPUT:
idxc: a certain triple component of a triple index
dict: quoted triples dictionary
term: a term inside a triple pattern
IF term.type === 'Quoted' &amp;&amp; term contains variables
matchingTerms = []
FOR quotedTriple IN dict.getMatchingQuotedTriples(term)</p>
      <p>IF idxc contains dict.encode(quotedTriple)</p>
      <p>matchingTerms.push(quotedTriple)</p>
      <p>END
END</p>
      <p>RETURN matchingTerms
ELSE</p>
      <p>RETURN QueryIndexComponent(idxc, dict, term)
END</p>
      <p>Algorithm 6: Pseudocode of the algorithm for finding all matching terms of a certain triple
component inside an index using an indexed quoted triples dictionary. is algorithm is a
variant of QueryIndexComponent from Algorithm 3.</p>
      <p>We hypothesize that this approach will speed up triple paern query performance due to the higher
selectivity during joins between the quoted triples dictionary and the current component index.</p>
    </sec>
    <sec id="sec-13">
      <title>6.  Evaluation</title>
      <p>In this section, we evaluate the impact of the indexing approaches discussed in Section 5 in terms of
storage size, ingestion time, and query execution time. We start by discussing our implementation of
the approaches, followed by our experimental setup, results, and end with a discussion.
9 / 15</p>
    </sec>
    <sec id="sec-14">
      <title>6.1. Implementation</title>
      <p>To achieve a fair comparison between the different indexing approaches, we have implemented all
approaches in the same programming language (TypeScript/JavaScript). e implementation of these
approaches is open-source, and is available on GitHub at hps://github.com/rubensworks/rdf-stores.js.</p>
    </sec>
    <sec id="sec-15">
      <title>6.2. Experimental Setup</title>
      <p>To measure the performance impact of different quoted triple depths, we create synthetic datasets of
various sizes. Our dataset generator is based on the data model of Section 4 with different people (size /
10) and colors (10), and allows any number of triples to be generated. Furthermore, it allows a depth pa‐
rameter to be specified, which defines the nesting depth of quoted triples in object positions. For in‐
stance, a depth value of 1 generates quoted triples in the form of ?person :says &lt;&lt; :Violets
:haveColor ?color &gt;&gt;, while a depth value of 3 generated quoted triples in the form of ?person :says
&lt;&lt; ?person :says &lt;&lt; ?person :says &lt;&lt; :Violets :haveColor ?color &gt;&gt; &gt;&gt; &gt;&gt;.</p>
      <p>For our experiments, we range the dataset from 1.000 to 1.000.000 triples, with the depth ranging
from 1 to 5. For each combination, we measure the performance of the four indexing approaches in
terms of the following metrics:</p>
      <p>Storage size: e total memory consumption aer ingestion in MB.</p>
      <p>Ingestion time: e duration of ingesting the generated triples in milliseconds.
ery execution time: e total duration of executing all triple paern queries in milliseconds.
ery execution time was measured using 3 categories of queries (examples assume depth 2):
Low selectivity: ery people in the form of: ?person :says &lt;&lt; ?person :says &lt;&lt; :Violets
:haveColor :Red &gt;&gt; &gt;&gt;. Each query produces size / 10 results.</p>
      <p>Medium selectivity: ery colors in the form of: ?person :says &lt;&lt; :Bob :says &lt;&lt; :Violets
:haveColor ?color &gt;&gt; &gt;&gt;. Each query produces 10 results.</p>
      <p>High selectivity: ery colors of specific people in the form of: :Alice :says &lt;&lt; :Bob :says &lt;&lt;
:Violets :haveColor ?color &gt;&gt; &gt;&gt;. Each query produces 1 results.</p>
      <p>e four indexing approaches were configured with three indexes (SPO, POS, OSP), and the indexed
quoted triples dictionary was also configured with these three indexes. All experiments were executed
on a MacBook Pro 13-inch, 2020 with 16GB of RAM and a 2,3 GHz ad-Core Intel Core i7 processor.
Our experimental setup is fully reproducible, and is available together with the raw results at hps://
github.com/rubensworks/experiments-indexing-quoted-triples.</p>
    </sec>
    <sec id="sec-16">
      <title>6.3. Results</title>
      <p>e labels inside each figure map to the indexing approaches as follows:</p>
      <sec id="sec-16-1">
        <title>Singular Dictionary singular oted Triples Dictionary quoted Referential oted Triples Dictionary quoted-ref Indexed oted Triples Dictionary quoted-idx</title>
      </sec>
      <sec id="sec-16-2">
        <title>Subfig. 6.1: Depth 1</title>
      </sec>
      <sec id="sec-16-3">
        <title>Subfig. 6.2: Depth 5 Fig. 6: Storage sizes for the 4 indexing approaches with increasing dataset sizes.</title>
      </sec>
      <sec id="sec-16-4">
        <title>Subfig. 7.1: Depth 1</title>
      </sec>
      <sec id="sec-16-5">
        <title>Subfig. 7.2: Depth 5 Fig. 7: Ingestion times for the 4 indexing approaches with increasing dataset sizes.</title>
        <p>Subfig. 8.1: Depth 1</p>
      </sec>
      <sec id="sec-16-6">
        <title>Subfig. 8.2: Depth 3</title>
      </sec>
      <sec id="sec-16-7">
        <title>Subfig. 8.3: Depth 4</title>
      </sec>
      <sec id="sec-16-8">
        <title>Subfig. 8.4: Depth 5 Fig.  8: ery execution times for the 4 indexing approaches with increasing dataset sizes with low result selectivity.</title>
      </sec>
      <sec id="sec-16-9">
        <title>Subfig. 9.1: Depth 1</title>
      </sec>
      <sec id="sec-16-10">
        <title>Subfig. 9.2: Depth 3</title>
      </sec>
      <sec id="sec-16-11">
        <title>Subfig. 9.3: Depth 4</title>
      </sec>
      <sec id="sec-16-12">
        <title>Subfig. 9.4: Depth 5 Fig.  9: ery execution times for the 4 indexing approaches with increasing dataset sizes with medium result selectivity.</title>
        <p>Subfig. 10.1: Depth 1</p>
        <p>Subfig. 10.2: Depth 3
Subfig. 10.3: Depth 4</p>
        <p>Subfig. 10.4: Depth 5
6.4.1. Storage Size</p>
        <p>e results from Fig.  6 show that in terms of storage size, the singular dictionary and referential
quoted triples dictionary approaches perform the best. e quoted triples dictionary and indexed quot‐
ed triples dictionary approaches on the other hand require significantly more memory. Contrary to
what we hypothesized in Section  5, the storage overhead of the singular dictionary for terms inside
quoted triples is less significant than expected, and the gains from the removal of storage redundancy
with the referential quoted triples dictionary are minimal. is is due to the string interning memory
optimization that is used by Node.js, whereby multiple equal string instances are only stored once in
memory. e indexed quoted triples dictionary approach results in a significantly higher storage size
due to the three indexes that are used to index quoted triples.
6.4.2. Ingestion Time</p>
        <p>As expected, we observe similar results in terms of ingestion time in Fig. 7, where the singular dictio‐
nary and quoted triples dictionary are significantly faster than the referential and indexed quoted
triples dictionary approaches. ese approaches are faster due to their simpler encoding approach,
whereas the referential and indexed quoted triples dictionary approaches require more operations dur‐
ing triple encoding.
13 / 15
6.4.3. ery Execution Time</p>
        <p>e results in Fig.  8, Fig.  9, and Fig.  10 show that on average, the indexed quoted triples dictionary
approach vastly outperforms all other approaches. is is most significant for triple paerns with medi‐
um selectivity due to the fact that this approach has indexes corresponding exactly to these queries,
while the other approaches require iteration over all quoted triples. For triple paerns with low selec‐
tivity, the difference is smaller, but the indexed quoted triples dictionary approach is still faster overall.
e difference for triple paerns with high selectivity is minimal, as the overhead of triple paern dic‐
tionary encoding during query execution when fetching a single result becomes more apparent.</p>
      </sec>
    </sec>
    <sec id="sec-17">
      <title>7.  Conclusions</title>
      <p>In this article, we discussed four approaches for the in-memory indexing of quoted triples, ranging
from very naive to highly optimized for quoted triple paern access.</p>
      <p>Our results show that the indexed quoted triples dictionary approach is on average orders of magni‐
tude faster than other indexing approaches. In the most extreme case, this approach is over 4.000 to
11.000 times faster than other indexing approaches for a dataset size of 1 million with quoted triples at
depth 5. For smaller dataset sizes, lower quoted triple depths, and other types of queries, the difference
becomes smaller. is significant speedup at query time comes with the expected cost of increased stor‐
age size and ingestion time, which is approximately two-fold for a large dataset.</p>
      <p>As such, for use cases where quoted triple paern queries occur at various depths, and an increase in
storage size and ingestion time are acceptable, the indexed quoted triples dictionary approach is highly
beneficial for achieving good query performance. Furthermore, given the fact that this approach stores
quoted triple terms separate from all other terms inside the dictionary, there is no significant overhead
of using such a dictionary by default in triplestores, even if is unknown before ingestion starts if quoted
triples are present in the datasets.</p>
      <p>
        In future work, there is a need to evaluate the performance of different indexing approaches over
real-world data, as we only evaluated performance over artifically generated datasets, which do not
capture real-world properties [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Furthermore, we wish to evaluate the performance of these indexes
within full SPARQL queries by plugging them into a SPARQL query engine [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>In conclusion, the outcomes of this work show that the inclusion of the concept of quoted triples into
the RDF and SPARQL recommendations necessitates changes in triplestores in terms of indexing and
dictionary encoding, but that approaches such as the indexed quoted triples dictionary are able to pro‐
vide very good performance, while not requiring overly complicated additions implementation-wise.
References
14 / 15</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wood</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanthaler</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>RDF 1.1: Concepts and Abstract Syntax</article-title>
          . W3C, hp:// www.w3.org/TR/2014/REC-rdf11
          <string-name>
            <surname>-</surname>
          </string-name>
          concepts-20140225/ (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neubauer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Constructions from dots and lines</article-title>
          .
          <source>arXiv preprint arXiv:1006</source>
          .
          <fpage>2361</fpage>
          . (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomqvist</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cochez</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>d'Amato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melo</surname>
            , G.de, Gutierrez,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirrane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gayo</surname>
            ,
            <given-names>J.E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>S.,</given-names>
          </string-name>
          <article-title>others: Knowledge graphs</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          .
          <volume>54</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Foundations of RDF* and SPARQL*:(An alternative approach to statement-level metadata in RDF)</article-title>
          .
          <source>In: AMW 2017 11th Alberto Mendelzon International Workshop on Foundations of Data Management and the Web</source>
          , Montevideo, Uruguay, June 7-9,
          <year>2017</year>
          . Juan Reuer, Divesh
          <string-name>
            <surname>Srivastava</surname>
          </string-name>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Champin</surname>
          </string-name>
          , P.-A.,
          <string-name>
            <surname>Kellogg</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>RDF-star and SPARQL-star</article-title>
          .
          <source>W3C</source>
          , hps:// www.w3.org/
          <year>2021</year>
          /12/rdf-star.
          <source>html</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Kasenchak</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehnert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loh</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Use case: ontologies and RDF-star for knowledge management</article-title>
          . In: e Semantic Web:
          <article-title>ESWC 2021 Satellite Events: Virtual Event</article-title>
          , June 6-10,
          <year>2021</year>
          ,
          <source>Revised Selected Papers 18</source>
          . pp.
          <fpage>254</fpage>
          -
          <lpage>260</lpage>
          . Springer (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.H.-J.</surname>
          </string-name>
          ,
          <string-name>
            <surname>Käfer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Self-verifying Web Resource Representations Using Solid, RDF-Star and Signed URIs</article-title>
          . In: e Semantic Web:
          <article-title>ESWC 2022 Satellite Events</article-title>
          : Hersonissos, Crete, Greece, May 29-June 2,
          <year>2022</year>
          , Proceedings. pp.
          <fpage>138</fpage>
          -
          <lpage>142</lpage>
          . Springer (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] Group,
          <string-name>
            <surname>R.D.F</surname>
          </string-name>
          .-star W.C.C.:
          <article-title>RDF-star Implementations</article-title>
          . hps://w3c.github.io/rdf-star/implementations.html (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez-Prieto</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutiérrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arias</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Binary RDF Representation for Publication and Exchange (HDT)</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          .
          <volume>19</volume>
          ,
          <fpage>22</fpage>
          -
          <lpage>41</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
          </string-name>
          , G.:
          <article-title>RDF-3X: a RISC-style engine for RDF</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          .
          <volume>1</volume>
          ,
          <fpage>647</fpage>
          -
          <lpage>659</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karras</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Hexastore: sextuple indexing for semantic web data management</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          .
          <volume>1</volume>
          ,
          <fpage>1008</fpage>
          -
          <lpage>1019</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Delva</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arenas-Guerrero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iglesias-Molina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>RML-star: A declarative mapping language for RDF-star generation</article-title>
          .
          <source>In: ISWC2021</source>
          , the International Semantic Web Conference.
          <source>CEUR</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vander</surname>
            <given-names>Sande</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Mannens</surname>
          </string-name>
          , E., Van de Walle, R.:
          <article-title>RML: A generic language for integrated RDF mappings of heterogeneous data</article-title>
          .
          <source>Ldow</source>
          .
          <volume>1184</volume>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Arenas-Guerrero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iglesias-Molina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garijo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Morph-KGCstar: Declarative generation of RDF-star graphs from heterogeneous data. Semantic Web (Under Review)</article-title>
          .
          <source>(</source>
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Keskisärkkä</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomqvist</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lind</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>RSP-QL: Enabling Statement-Level Annotations in RDF Streams</article-title>
          .
          <source>In: Semantic Systems. e Power of AI and Knowledge Graphs: 15th International Conference, SEMANTiCS</source>
          <year>2019</year>
          , Karlsruhe, Germany, September 9-
          <issue>12</issue>
          ,
          <year>2019</year>
          , Proceedings. pp.
          <fpage>140</fpage>
          -
          <lpage>155</lpage>
          . Springer (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Della</given-names>
            <surname>Valle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Calbimonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            ,
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <surname>O.</surname>
          </string-name>
          :
          <article-title>RSP-QL semantics: A unifying query model to explain heterogeneity of RDF stream processing systems</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          .
          <volume>10</volume>
          ,
          <fpage>17</fpage>
          -
          <lpage>44</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Abuoda</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches-extended version</article-title>
          .
          <source>arXiv preprint arXiv:2210</source>
          .
          <fpage>05781</fpage>
          . (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Erling</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikhailov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Virtuoso: RDF support in a native RDBMS</article-title>
          .
          <source>In: Semantic Web Information Management</source>
          . pp.
          <fpage>501</fpage>
          -
          <lpage>519</lpage>
          . Springer (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Wallgrün</surname>
            ,
            <given-names>J.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frommberger</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dylla</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freksa</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>alitative spatial representation and reasoning in the SparQ-toolbox</article-title>
          .
          <source>In: International Conference on Spatial Cognition</source>
          . pp.
          <fpage>39</fpage>
          -
          <lpage>58</lpage>
          . Springer (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Optimized index structures for querying rdf from the web</article-title>
          . In: ird
          <string-name>
            <given-names>Latin</given-names>
            <surname>American Web Congress (LA-WEB</surname>
          </string-name>
          '
          <year>2005</year>
          ). pp.
          <fpage>10</fpage>
          -pp.
          <source>IEEE</source>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Taelman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Vander</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Van Herwegen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Mannens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Verborgh</surname>
          </string-name>
          , R.:
          <article-title>Triple Storage for Random-Access Versioned erying of RDF Archives</article-title>
          .
          <source>Journal of Web Semantics</source>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kementsietsidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Udrea</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Apples and oranges: a comparison of RDF benchmarks and real RDF datasets</article-title>
          .
          <source>In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data</source>
          . pp.
          <fpage>145</fpage>
          -
          <lpage>156</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Taelman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Van Herwegen,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Vander</surname>
          </string-name>
          <string-name>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Verborgh</surname>
          </string-name>
          , R.:
          <article-title>Comunica: a Modular SPARQL ery Engine for the Web</article-title>
          .
          <source>In: Proceedings of the 17th International Semantic Web Conference</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>