<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Versioned Queries over RDF Archives: All You Need is SPARQL?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ignacio Cuevas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidan Hogan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Chile &amp; IMFD</institution>
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We explore solutions for representing archives of versioned RDF data using the SPARQL standard and o -the-shelf engines. We consider six representations of RDF archives based on named graphs, and describe how input queries can be automatically rewritten to return solutions for a particular version, or solutions that change between versions. We evaluate these alternatives over an archive of 8 weekly versions of Wikidata and 146 queries using Virtuoso as the SPARQL engine.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A key aspect of the Web is its dynamic nature, where documents are frequently
updated, deleted and added. Likewise when we speak of the Semantic Web, it is
important to consider that sources may be dynamic and RDF datasets are
subject to change [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. It is in this context that various works have looked at
versioning in the context of RDF/SPARQL [
        <xref ref-type="bibr" rid="ref12 ref23 ref25 ref7 ref8">25,23,7,8,12</xref>
        ], with recent works proposing
RDF archives [
        <xref ref-type="bibr" rid="ref2 ref20 ref3 ref5 ref6">5,3,2,6,20</xref>
        ] that manage RDF graphs and their historical changes,
allowing for querying across di erent versions of the graph. Within these works,
a variety of specialised indexing techniques [
        <xref ref-type="bibr" rid="ref2 ref20 ref3">3,2,20</xref>
        ], query languages [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and
benchmarks [
        <xref ref-type="bibr" rid="ref13 ref6">13,6</xref>
        ] have been proposed, developed and evaluated. While these
represent important advances, many such works propose custom SPARQL
extensions, indexes, engines, etc., creating a barrier for adoption.
      </p>
      <p>
        In fact, versioned queries as proposed in the literature [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] can be supported
using o -the-shelf SPARQL engines with years of development, optimisation,
and deployment. SPARQL named graphs can, for example, be used to track
di erent versions of individual graphs. However, as Fernandez at al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] note,
the approach of using pure SPARQL would \typically render rather ine cient
SPARQL queries ". This raises a question: how ine cient will such queries be? If
a pure SPARQL solution could be found with reasonable performance, existing
SPARQL engines could be used to host and query RDF archives.
      </p>
      <p>
        In this paper, we present preliminary empirical results addressing this
research question. Speci cally we look at six representations of RDF archives
using named graphs and propose query rewriting mechanisms for them. We then
evaluate and compare these representations for an RDF archive of 8 versions of
Wikidata [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Our experiments compare the sizes of the indexes generated, the
time taken for indexing, and the relative costs of query evaluation.
Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Various temporal extensions for RDF have been proposed in literature based
on annotations [
        <xref ref-type="bibr" rid="ref19 ref29 ref9">9,19,29</xref>
        ]. Proposed temporal extensions for SPARQL include
SPARQL [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], T-SPARQL [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], SPARQL-ST [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], SPARQLT [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], etc. Related to
temporality, a number of systems support versioning for RDF, including
SemVersion [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], POI [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], x-RDF-3x [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], R43ples [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Dydra [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Ostrich [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        More recently RDF archives (of historical RDF data) have been gaining
attention. Fernandez et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] survey the theme, discussing the types of queries
that can be run on such archives. Cerdeira-Pena et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Zaniolo et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] and
Taelman et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] propose compressed indexes for RDF archives, while Khurana
and Deshpande [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] propose indexes for historical graph data. Bahri et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
use Apache Spark to manage RDF archives in a distributed setting. Benchmarks
have also been proposed for RDF archives, including the BEnchmark of RDF
ARchives (BEAR) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and the Semantic Publishing Benchmark (SPB) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        The past years have seen many developments for managing and querying
temporal, versioned or historical RDF data. But most of these approaches propose
specialised languages, implementations, etc., creating an obstacle for adoption.
A number of authors note that one can manage and query RDF archives using
vanilla SPARQL, though it may lead to complex or ine cient queries [
        <xref ref-type="bibr" rid="ref23 ref6">23,6</xref>
        ].
Recently SPARQL has been used to host and query the edit history of Wikidata,
but only one representation is explored [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. This paper describes preliminary
experiments to gain insights into the e ciency of o -the-shelf SPARQL engines
for hosting RDF archives using di erent types of representations.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Preliminaries</title>
      <p>
        RDF triples are composed of three sets of terms: IRIs (I), literals (L) and blank
nodes (B). We do not consider blank nodes in this work as they complicate the
detection of changes [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. An RDF triple (s; p; o) 2 I I (I [ L) consists of
a subject s, predicate p and object o. An RDF graph G is a set of triples. An
RDF archive is a tuple of RDF graphs G = (G1; : : : ; Gn).
      </p>
      <p>A triple pattern t := (s; p; o) 2 (I [ V) (I [ V) (I [ L [ V) is an RDF triple
that permits variables from V to appear in any position. We denote by vars(t)
the variables of t. A solution is a partial mapping : V ! I [ L. We denote
by dom( ) the domain of , i.e., the set of variables for which is de ned. We
say that two solutions 1; 2 are compatible, denoted 1 2, if and only if
1(v) = 2(v) for all v 2 dom( 1) \ dom( 2). We denote by (t) the image of
t under , replacing each variable v 2 dom( ) \ vars(t) by (v) in t. We denote
by t(G) := f j (t) G and dom( ) = vars(t)g the evaluation of t over G.</p>
      <p>
        SPARQL queries are based on triple patterns and a number of relational
operators. Similar to Perez et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], we de ne an abstract syntax for a subset
of SPARQL of pertinence to this paper as follows. A triple pattern t is a graph
pattern. Furthermore, if P and Q are graph patterns, and V V is a set of
[P and Q](G) := P (G) no Q(G)
[P union Q](G) := P (G) [ Q(G)
[P minus Q](G) := P (G) n Q(G)
      </p>
      <p>M1 no M2 := f 1 [ 2 j 1 2 M1; 2 2 M2; 1
M1 [ M2 := f j
2 M1 or
2g
2g
variables, then [P and Q], [P union Q] and [P minus Q] are graph patterns. We
de ne the semantics of these graph patterns in Figure 1.1</p>
      <p>
        In this paper, we rely on SPARQL datasets to represent RDF archives. A
SPARQL dataset D := fG; (x1; G1); : : : (xn; Gn)g consists of a default (RDF)
graph G and a set of named graphs of the form (xi; Gi) where xi 2 I, Gi is an
RDF graph, and xi 6= xj for 1 i m, 1 j m, i 6= j. We may represent D
as a set of quads of the form (G f g) [ (G1 fx1g) [ : : : [ (Gn fxng), where
2= I [ L [ V is a special symbol denoting the default graph.2 Di erent named
graphs can be queried using a GRAPH operator, creating quad patterns. A quad
pattern q = (s; p; o; g) 2 (I [ V) (I [ V) (I [ L [ V) (I [ f g) extends a triple
pattern with a fourth element that may be an IRI or *. Its evaluation is analogous
to that of a triple pattern: q(G) := f j (q) G and dom( ) = vars(q)g.
Following the SPARQL standard [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we translate a triple pattern (s; p; o) to a
quad pattern (s; p; o; ) evaluated only on the default graph. The semantics of
the operators de ned in Table 1 then remain unchanged simply allowing P and
Q to now also represent quad patterns, considered to be (named) graph patterns.
      </p>
      <p>SPARQL provides two operators to initialise a SPARQL dataset: FROM and
FROM NAMED. We combine both for brevity into one operator. A graph pattern P
is considered to be a query. Likewise if P is a graph pattern, and M and N are
sets of IRIs, then fromM;N P is also a query. If (xi; Gi) 2 D let D(xi) = Gi;
otherwise if no graph is named xi in D, let D(xi) = fg. The evaluation of
fromM;N P on a dataset D is de ned as fromM;N P (D) := P (DM;N ), where
the query dataset DM;N := ([m2M D(m) f g) [ ([n2N D(n) n) is formed
from a default graph containing the union3 of graphs with a name m 2 M , and
all named graphs in D with a name n 2 N . We will be able to use these operators
to de ne query datasets that capture speci c versions in an RDF archive.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Versioned Data</title>
      <p>
        Our general approach is to represent an RDF archive as a SPARQL dataset
D but there are multiple representations by which this can be achieved, each
with its own strength and weaknesses. We propose six di erent representations
falling into three di erent categories as discussed by various authors (e.g., [
        <xref ref-type="bibr" rid="ref24 ref5">24,5</xref>
        ]):
Independent Copies (IC ), Change-Based (CB ) and Timestamp-Based (TB ).
1 A basic graph pattern ft1; : : : tng can be represented as [[t1 and : : :] and tn].
2 We thus assume a quad store, and disallow empty named graphs.
3 Since we do not allow blank nodes, a union or RDF merge is equivalent.
Independent Copies (IC): A natural representation is to store an RDF Archive
G = (G1; : : : ; Gn) as a SPARQL dataset D = f(x1; G1); : : : ; (xn; Gn)g, where
x1; : : : ; xn are IRIs that identify the version. This will result in relatively simple
(and thus likely e cient) rewritten queries, but can be expected to occupy a lot
of space, particularly where few triples change from version to version.
Change-Based (CB): The core idea of CB representations is to store only triples
that change from a given reference version. Along these lines, in the following
we denote by ij := Gi n Gj the triples in version i not in version j. We consider
four CB representations based on four transformations of G = (G1; : : : ; Gn):
As can be seen for the de nitions of Gi, these transformations are lossless: we
can retrieve any version of the graph from any such transformation. The rst two
transformations start with the earliest version as a base. The rst encodes deltas
always with respect to the rst version. The second encodes deltas with respect to
the previous version. The latter two transformations start with the latest version.
The third encodes deltas with respect to the latest version. The forth encodes
deltas with respect to the subsequent version. Letting H = (H1; : : : ; Hn) such
that Hi = Gn i+1 (1 i n), i.e., such that H \reverses" G, we remark that
G1n = Hn1 and Gnn 1 = Hnn 1. Each such transformation can then be represented
as a SPARQL dataset with 2n 1 named graphs.
      </p>
      <p>In terms of space we expect Gnn 1 and Gnn 1 to be the most e cient as they
always encode deltas from a neighbouring version. However, in terms of query
rewriting, we expect G1n and Gn1 to be more e cient as they do not require a
recursive construction of all intermediate versions towards the base version. In
terms of indexing a new version, we expect G1n followed by Gn 1 to be most
n
1
e cient as they require only computing the most recent deltas; we expect Gn,
followed by Gnn 1, to be much more expensive, requiring a recompute of all deltas.
On the other hand, Gn1 and Gnn 1 should be advantageous for queries over more
recent versions, and in particular over the most recent version (a common case).</p>
      <p>
        These four representations are analogous to di erential backups, incremental
backups, reverse-di erential backups, and reverse-incremental backups.
Timestamp-Based (TB): The intuition of the TB representation is to associate
each triple with the versions in which it is contained. Along these lines, we denote
by G(s; p; o) := fi j (s; p; o) 2 Gi for 1 i ng the versions containing (s; p; o).4
Let N := fN j 9(s; p; o) 2 Gi : G(s; p; o) = N; 1 i ng denote the family of
sets of versions associated with some triple in (s; p; o). We can then represent
the RDF archive with a named graph for each N 2 N . However, the number
of named graphs can reach 2n (or the number of unique triples in G). Another
4 The de nition G : I
(I [ L) ! 2f1;:::;ng [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is analogous to G = (G1; : : : ; Gn).
option is create a named graph for intervals [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. More speci cally, a triple (s; p; o)
is added to an interval [i; j] (for 1 i j n) if and only if (s; p; o) 2 Gk for
i k j and either i = 1 or (s; p; o) 2= Gi 1 and j = n or (s; p; o) 2= Gj+1; in
simpler terms, [i; j] is a maximal contiguous interval of versions in which (s; p; o)
appears (omitting empty intervals). The upper bound is now n(n + 1)=2 named
graphs, but triples may appear in multiple named graphs for di erent intervals.
      </p>
      <p>In general, we expect the space to be similar to Gnn 1 and Gnn 1; in other
words, quite good. However, as the number of versions n grows, O(n2) interval
graphs need to be unioned in the worst-case to materialise a particular version;
CB representations require processing O(1) (in the case of G1n and Gn1 ) or O(n)
(in the case of Gnn 1 and Gnn 1) named graphs to materialise a particular version.
Notation: We denote IC by i; di erential, incremental, reverse-di erential and
reverse-incremental CB by cd+, ci+, cd and ci , resp.; and interval TB by t.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Versioned Queries</title>
      <p>
        Given a SPARQL query Q over an RDF graph, we now describe automatic
rewritings of Q to generate solutions for di erent versions. We rst focus on
rewritings of triple patterns and then generalise. We assume that version
parameters are given via HTTP rather than extending the SPARQL syntax. For
reasons of space we rather present examples in online material [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
5.1
      </p>
      <p>Single-Version Queries
A single version query returns Q(Gv) for a speci ed version v. Our overall
strategy is to use FROM to construct the graph of the version where possible, as is the
case for all versions in i and t; for G1 in cd+, ci+; and for Gn in cd , ci . Otherwise
we rewrite each individual triple pattern appearing in Q in order to ensure that
it generates the same solutions as it would if evaluated over the graph of the
selected version. We now provide more details for each representation
Independent Copies (IC) For i we rewrite Q to fromfvg;fgQ, where v is the
IRI that names the graph for version v
Change-Based (CB) Recalling the observation that forwards and reverse CB
representations are analogous, for brevity we de ne the rewriting for the forwards
direction (cd+, ci+) only; the reverse direction (cd , ci ) follows naturally.</p>
      <p>For the di erential representation, we load the base version and the
positive delta into the default graph and, for each triple pattern, we subtract the
negative delta which is queried using a quad pattern. Thus for cd+ we rewrite
Q to fromf1;v1g;f1vgQ0, where 1, v1, 1v indicate the names of G1, 1v and 1v,
respectively; and Q0 replaces each triple pattern (s; p; o) 2 Q with the named
graph pattern [(s; p; o; ) minus (s; p; o; 1v)].</p>
      <p>Unfortunately the incremental rewriting is more complex. A rst idea would
be to take the union of G1 and all positive deltas 12; : : : ; vv 1 and then
subtract the (union of the) negative deltas 21; : : : ; vv 1; unfortunately this would
overlook triples that were removed from a version 1 &lt; u &lt; v but were added
back in a later version u &lt; u0 v (and were not removed again in a
version u0 &lt; u00 v). Hence a recursive rewriting appears to be necessary. Let
Q1 := fromf1g;fgQ; then Q2 := fromf1g;f12;21gQ01, where Q1 is the result
of replacing each triple pattern (s; p; o) in Q1 by the named graph pattern
P2 := [[(s; p; o; ) union (s; p; o; 21)] minus (s; p; o; 12)]. We can then apply this
rewriting recursively: Qi := fromf1g;f12;:::;(i-1)i;21;:::;i(i-1)gQ0i 1, where Q0i 1
replaces each named graph pattern Pi 1 appearing in Qi 1 with the recursive
pattern Pi := [[Pi 1 union (s; p; o; i(i-1))] minus (s; p; o; (i-1)i)].</p>
      <p>This rewriting leads to complex queries. We thus optimise using additional
features of SPARQL. To sketch the idea, our goal is then to make sure that each
triple pattern only matches triples that appear in a base version and were not
removed, or that appear in a positive delta ij such that 1 &lt; i &lt; j v and
do not appear in a (later) negative delta lk for j k &lt; l v. In practice,
for each delta ij stored as a named graph ji, in o ine processing, we index
meta-data of the form (ji; ver; j; $), and (ji; type; pos; $) in the case that i &lt; j
or (ji; type; neg; $) in the case that j &lt; i ($ 2 I is a reserved name for the
meta-data graph). We can then check the aforementioned condition by using
aggregation to nd the maximum version of a negative delta less than or equals
v in which the triple pattern matches, then ltering the base version or any
positive delta earlier than this maximum version. While the resulting query is
still quite complex, we found it to be more practical than the recursive rewriting.
Timestamp-Based (TB) Let i:j denote the name of the graph for the interval
of versions [i; j]. We rewrite Q to the query fromfIg;fgQ, where I is the set of
IRIs naming intervals in which v is contained; formally: I := fi:j j i v jg.
5.2</p>
      <p>Delta-Version Queries
Given a query Q, a control version u and a target version v, delta-version queries
give solutions in Q(Gv)nQ(Gu). The general strategy for rewriting is to construct
a query [Qv minus Qu] where Qv and Qu are the respective single-version queries.
5.3</p>
      <p>Other SPARQL Features
In the SPARQL algebra, only the evaluation of triple patterns and property paths
(regular expressions that match arbitrary length paths in the graph) directly
accept the graph as input. Hence, given a SPARQL query Q (over a default
graph), if we can individually rewrite each triple pattern and (property) path
pattern of Q to generate solutions for Gv, then the rewritten query Q0 will
generate precisely the solutions for Gv. We previously described this process
for triple patterns. However, in SPARQL we cannot always express a property
path over multiple named graphs in the query dataset. For example, consider
a property path :y+ indicating a path of one or more predicates :y, a positive
integer K 1, and two named graphs (n1; f(c2k 2; :y; c2k 1) j 1 k Kg)
and (n2; f(c2k 1; :y; c2k) j 1 k Kg) such that there is path for :y+ of length
2K (edges) that \alternates" between both named graphs. A GRAPH clause with
a variable would evaluate this path on each graph separately (we cannot bind
the graph variable to two graph names in a single solution). Though we can use
FROM over n1 and n2 to form a default graph for evaluating :a+, we can only
do this once per query. We can rather use a join of 2K GRAPH clauses, but K is
bounded by the data, not the query (nor the number of versions). Thus while we
support property paths for single-version queries on i and t, and delta-version
queries on i, we do not know how to support them in the other cases.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Experiments</title>
      <p>We now perform experiments to address the following three research questions
(Q1) Which of the six representations allow for better compression, more e
cient indexing, and more e cient updates of a new version? (Q2) How do the
query runtimes of compressed representations (CB, TB ) compare with indexing
complete versions (IC )? (Q3) Which representation works best overall?
Setting We address these questions for a Wikidata archive of 8 weekly truthy
versions from 2017-08-09 with 1.506 billion triples, to 2017-09-27 with 1.924
billion triples. The RDF archive consists of 13.477 billion triple{version pairs.
Each week 25{93 million triples are added, while 4{6 million triples are removed.
We take Wikidata's example queries, de ned by users5, and translate
Wikidataspeci c features (e.g., the label service) to standard SPARQL. We further lter
queries that feature federation to other endpoints, property paths (not supported
by all representations), and quali ers (not in truthy versions). The result is a test
set of 146 SPARQL queries. We take Virtuoso as our SPARQL implementation.
Runtimes are averaged over three runs. Query timeouts were set to 5 minutes.
The machine used has 120GB of RAM and a standard SATA hard-disk.
Indexing We rst look at the results of total index sizes for each representation.
In Figure 2 we show the index sizes (GB) for each representation, the time taken
(min.) to bulk load all versions in the representation, and the time taken (min.)
to update a seven-version archive with the eighth version. We see that i has the
largest index sizes, followed by cd+ and cd , then t, and nally ci+ and ci . The
bulk load times correlate with index size, with i (thus) being by far the slowest.
We see a similar trend in version updates, except that cd is far slower than the
other alternatives (even i) as the entire archive must be built from scratch.
Single version queries We apply the rewriting of our 146 SPARQL queries for
each of our six representations in order to retrieve results for version 1, 5 and 8.
We show the results as box-plots with log y-axis in Figure 3 (with the timeout
as the maximum). Median times were generally in the range of 100{1000 ms,
though 1{20 queries timed out in each experiment, a ecting the mean (shown
as a diamond). In terms of mean and median runtimes, i performs the best,
with c+, cd and t also performing competitively across the di erent versions.</p>
      <p>d
5 https://www.wikidata.org/wiki/Wikidata:SPARQL query service/queries/examples
200
)B160 164
G
(e 120
z
iS 80
x
ed 40
n
I 0</p>
      <p>105
)s 104
(m103
e
im102
T101</p>
      <p>100
Conversely, ci+ and ci perform poorly relative to the other options (except ci
in the case of version 8). Contrasting query times with index sizes, we note a
clear time{space trade-o , where the largest index performs best, the smallest
perform worst, and those with intermediate space perform middlingly.
Delta version queries Next we rewrite our 146 SPARQL queries in order to
retrieve delta results between versions 1{2, 4{5 and 7{8 from each of our six
representations. We again show the results as box-plots with log y-axis in
Figure 4 (with the timeout as the maximum). When compared with single version
queries, we see an increase in time, where the median runtimes of even the best
performing representations approach or exceed 1000 ms more often. This time
the best performance is o ered by t, followed by i, ci+ and ci . Conversely, cd+
and cd perform very poorly. Of note is that incremental builds perform better
than di erential builds; we believe that this is due to the ability to cache smaller
graphs, most of which are used to generate results for both versions.
7</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>We now re ect back on our research questions: (Q1) In terms of indexing space
and time, incremental builds with an initial base version (cd+) are best. (Q2) In
general the uncompressed IC representation (i) o ers the best query runtimes,
but interval TB (t) is quite competitive, and even outperforms IC for
deltaversion queries. (Q3) Rather than there being an overall winner, we note a
time{space tradeo , where less compact representations have faster queries and
more compact representations have slower queries. Interval TB (t) arguably
strikes the best balance for space and time, though this may not hold with more
versions, particularly in RDF archives where triples are often added or removed
multiple times, as a quadratic number of intervals may need to be accessed.</p>
      <p>
        For future work, it would be of interest to run experiments for other SPARQL
engines and other RDF archive benchmarks. Also it would be interesting to
run more diverse types of versioned queries, such as delta versions with larger
gaps, queries returning versions as solutions, etc.; a related direction would be
to consider operators from temporal logics [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. There are also open questions
relating to more optimal/concise query rewritings, and support for paths.
      </p>
      <p>
        In conclusion: for those who wish to host RDF archives, with di erent
versions of an RDF graph, is SPARQL all you need? Specialised languages and
systems can o er more features and consume less time and space. But with some
caveats, our results suggest that query rewriting over an o -the-shelf SPARQL
engine can be a solid (easy-to-deploy) option for such scenarios.
Online material Please see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for code and queries.
      </p>
      <p>Acknowledgements This work was funded by Fondecyt Grant No. 1181896 and
ANID Millennium Science Initiative Program ICN17 002.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>J. Anderson</surname>
            and
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Bendiken</surname>
          </string-name>
          .
          <article-title>Transaction-time queries in dydra</article-title>
          .
          <source>In Managing the Evolution and Preservation of the Data Web (MEPDaW)</source>
          , pages
          <fpage>11</fpage>
          {
          <fpage>19</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bahri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Laajimi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. Y.</given-names>
            <surname>Ayadi</surname>
          </string-name>
          .
          <article-title>Distributed RDF Archives Querying with Spark</article-title>
          .
          <source>In European Semantic Web Conference (ESWC)</source>
          , pages
          <fpage>451</fpage>
          {
          <fpage>465</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Cerdeira-Pena</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Farin~a,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Mart</surname>
          </string-name>
          nez-Prieto.
          <article-title>Selfindexing RDF archives</article-title>
          .
          <source>In Data Compression Conference (DCC)</source>
          , pages
          <fpage>526</fpage>
          {
          <fpage>535</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>I. Cuevas. Online</given-names>
            <surname>Material</surname>
          </string-name>
          ,
          <year>2020</year>
          . https://github.com/HunterNacho/sparql-versioning/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          .
          <article-title>Towards E cient Archiving of Dynamic Linked Open Data</article-title>
          .
          <source>In DIACHRON Managing the Evolution and Preservation of the Data Web</source>
          , pages
          <volume>34</volume>
          {
          <fpage>49</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Knuth</surname>
          </string-name>
          .
          <article-title>Evaluating query and storage strategies for RDF archives</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <volume>247</volume>
          {
          <fpage>291</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>F.</given-names>
            <surname>Grandi</surname>
          </string-name>
          . T-SPARQL:
          <article-title>A tsql2-like temporal query language for RDF</article-title>
          .
          <source>In Local Proceedings of the Fourteenth East-European Conference on Advances in Databases and Information Systems</source>
          , pages
          <fpage>21</fpage>
          {
          <fpage>30</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Graube</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hensel</surname>
          </string-name>
          , and
          <string-name>
            <surname>L. Urbas.</surname>
          </string-name>
          <article-title>R43ples: Revisions for Triples - An Approach for Version Control in the Semantic Web</article-title>
          .
          <source>In Linked Data Quality (LDQ)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Hurtado</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Vaisman</surname>
          </string-name>
          .
          <article-title>Introducing time into RDF</article-title>
          .
          <source>IEEE Trans. Knowl</source>
          . Data Eng.,
          <volume>19</volume>
          (
          <issue>2</issue>
          ):
          <volume>207</volume>
          {
          <fpage>218</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          , and E. Prud'hommeaux, editors.
          <source>SPARQL 1.1 Query Language. 21 March</source>
          <year>2013</year>
          . Available at http://www.w3.org/TR/sparql11-query/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. T. Kafer,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdelrahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <article-title>O'Byrne, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          .
          <article-title>Observing linked data dynamics</article-title>
          .
          <source>In The Semantic Web: Semantics and Big Data</source>
          , 10th International Conference, ESWC 2013, Montpellier, France, May
          <volume>26</volume>
          -30,
          <year>2013</year>
          . Proceedings, pages
          <volume>213</volume>
          {
          <fpage>227</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>U.</given-names>
            <surname>Khurana</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          .
          <article-title>Storing and Analyzing Historical Graph Data at Scale</article-title>
          .
          <source>In International Conference on Extending Database Technology (EDBT)</source>
          , pages
          <fpage>65</fpage>
          {
          <fpage>76</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>V.</given-names>
            <surname>Kotsev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Minadakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Papakonstantinou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          <article-title>Fundulaki, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Kiryakov. Benchmarking RDF Query</surname>
          </string-name>
          <article-title>Engines: The LDBC Semantic Publishing Benchmark</article-title>
          .
          <source>In Benchmarking Linked Data (BLINK)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum.</surname>
          </string-name>
          x
          <article-title>-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <volume>256</volume>
          {
          <fpage>263</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>V.</given-names>
            <surname>Papakonstantinou</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Fundulaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Flouris</surname>
          </string-name>
          .
          <article-title>Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning Benchmark</article-title>
          .
          <source>In Scalable Semantic Web Knowledge Base Systems (SSWS)</source>
          , pages
          <fpage>45</fpage>
          {
          <fpage>60</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>J. Perez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Arenas</surname>
            , and
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Semantics and complexity of SPARQL</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <volume>16</volume>
          :1{
          <fpage>16</fpage>
          :
          <fpage>45</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>M. Perry</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Jain</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>SPARQL-ST: extending SPARQL to support spatiotemporal queries</article-title>
          .
          <source>In Geospatial Semantics and the Semantic Web - Foundations</source>
          , Algorithms, and Applications, pages
          <volume>61</volume>
          {
          <fpage>86</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>A.</given-names>
            <surname>Pnueli</surname>
          </string-name>
          .
          <article-title>The temporal logic of programs</article-title>
          .
          <source>In Foundations of Computer Science (FOCS)</source>
          , pages
          <fpage>46</fpage>
          {
          <fpage>57</fpage>
          . IEEE Computer Society,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>A.</given-names>
            <surname>Pugliese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Udrea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Subrahmanian</surname>
          </string-name>
          .
          <article-title>Scaling RDF with time</article-title>
          .
          <source>In World Wide Web Conference (WWW)</source>
          , pages
          <fpage>605</fpage>
          {
          <fpage>614</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Herwegen</surname>
          </string-name>
          , E. Mannens, and
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          .
          <article-title>Triple storage for random-access versioned querying of RDF archives</article-title>
          .
          <source>J. Web Semant</source>
          .,
          <volume>54</volume>
          :4{
          <fpage>28</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Sande</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          . OSTRICH:
          <article-title>Versioned Random-Access Triple Store</article-title>
          .
          <source>In Comp. of The Web Conference</source>
          , pages
          <volume>127</volume>
          {
          <fpage>130</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Tanon</surname>
          </string-name>
          and
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          .
          <article-title>Querying the Edit History of Wikidata</article-title>
          .
          <source>In ESWC Satellite Events</source>
          , pages
          <volume>161</volume>
          {
          <fpage>166</fpage>
          . Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>J.</given-names>
            <surname>Tappolet</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>Applied temporal RDF: e cient temporal querying of RDF data with SPARQL</article-title>
          .
          <source>In Extended Semantic Web Conference (ESWC)</source>
          , pages
          <fpage>308</fpage>
          {
          <fpage>322</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Theoharis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Andreou</surname>
          </string-name>
          .
          <article-title>On Storage Policies for Semantic Web Repositories That Support Versioning</article-title>
          .
          <source>In European Semantic Web Conference (ESWC)</source>
          , volume
          <volume>5021</volume>
          , pages
          <fpage>705</fpage>
          {
          <fpage>719</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25. M.
          <article-title>Volkel and T. Groza. SemVersion: An RDF-based ontology versioning system</article-title>
          .
          <source>In IADIS Conference: WWW/Internet</source>
          , volume
          <year>2006</year>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandecic</surname>
          </string-name>
          and
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Krotzsch. Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>57</volume>
          (
          <issue>10</issue>
          ):
          <volume>78</volume>
          {
          <fpage>85</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>C. Zaniolo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Atzori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
            , and
            <given-names>J. Gu.</given-names>
          </string-name>
          <article-title>User-friendly temporal queries on historical knowledge bases</article-title>
          .
          <source>Inf. Comput.</source>
          ,
          <volume>259</volume>
          (
          <issue>3</issue>
          ):
          <volume>444</volume>
          {
          <fpage>459</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>D.</given-names>
            <surname>Zeginis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Christophides</surname>
          </string-name>
          .
          <article-title>On computing deltas of RDF/S knowledge bases</article-title>
          .
          <source>TWEB</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <volume>14</volume>
          :1{
          <fpage>14</fpage>
          :
          <fpage>36</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Straccia</surname>
          </string-name>
          .
          <article-title>A general framework for representing, reasoning and querying with annotated Semantic Web data</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>11</volume>
          :
          <fpage>72</fpage>
          {
          <fpage>95</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>