=Paper= {{Paper |id=Vol-1824/mepdaw_paper_3 |storemode=property |title=Time Travel Queries in RDF Archives |pdfUrl=https://ceur-ws.org/Vol-1824/mepdaw_paper_3.pdf |volume=Vol-1824 |authors=Melisachew Wudage Chekol,Valeria Fionda,Giuseppe Pirrò |dblpUrl=https://dblp.org/rec/conf/esws/ChekolFP17 }} ==Time Travel Queries in RDF Archives== https://ceur-ws.org/Vol-1824/mepdaw_paper_3.pdf
               Time Travel Queries in RDF Archives

         Melisachew Wudage Chekol1 , Valeria Fionda2 , and Giuseppe Pirrò3
                              1
                                  Data and Web Science Group,
                              University of Mannheim, Germany
                          mel@informatik.uni-mannheim.de
                           2
                             DeMaCS, University of Calabria, Italy
                                 fionda@mat.unical.it
       3
         Institute for High Performance Computing and Networking, ICAR-CNR, Italy
                                   pirro@icar.cnr.it



       Abstract. We research the problem of querying RDF archives. In this setting
       novel data management challenges emerge in terms of support for time-traversing
       structured queries. We formalize an extension of SPARQL, called SPARQ–LTL,
       which incorporates primitives inspired by linear temporal logic. We give formal
       semantics for SPARQ–LTL and devise query rewriting strategies from SPARQ–
       LTL into SPARQL. The usage of SPARQ–LTL allows to gain conciseness and
       readability when expressing complex temporal queries. We implemented our ap-
       proach and evaluated query running time and query succinctness.


1   Introduction

Research in the field of archiving policies of Linked Open Data (LOD) opens up new
opportunities for the traceability of Semantic Web data over time [13]. Several strands
of research (e.g., [11]) have focused on providing primitives to trace whether a dataset
or a particular entity has changed, as these functionalities are not natively supported
by the SPARQL query language. Dealing with RDF archives poses challenges both in
terms of the representation/storage of historical data and the type of query primitives
to be provided. Most existing knowledge bases (e.g., DBpedia) allow to query the lat-
est version of data only, although making available historical data in the form of data
dumps. In this case, the design of a data model to store data versions as well as in-
frastructure for query processing is left open. Other efforts (e.g., [8]) have focused on
efficient indexing strategies for RDF archives.
    Querying historical data is important in many contexts, from analytic tasks where
one needs to understand how knowledge has evolved and updated (e.g., pollution levels
in a city), to generic exploratory research, where one is interested in posing queries like
“Retrieve the annotations of a gene since the discovery of a particular interaction” or
“Find players that are now managing some club they played for”. Typically one can
use plain SPARQL to express such queries. However, besides hindering the readability
of a query this approach is tedious and error-prone (e.g., one needs to be consistent
with variable names). Therefore, a number of proposals came up with extensions of
SPARQL both in terms of language primitives and indexing techniques (see Related
Work). Despite the plethora of approaches for querying temporal RDF data, we believe
that, especially for the case of RDF archives, having a simple approach that neither
require to setup complex processing infrastructures nor to learn complex temporal lan-
guages can be useful. Therefore, the tenet of this paper is to study how to facilitate
querying historical RDF data on existing SPARQL processors.
    We formalize a powerful extension of SPARQL called SPARQ–LTL, which allows
to use a variety of temporal operators inspired by Linear Temporal Logics (LTL) [9]
(e.g., SINCE, NEXT, PREVIOUS). As we will show, SPARQ–LTL allows to write
concise readable temporal queries in a simple way. In particular, in SPARQ–LTL the
relationships between time points are implicit and transparent to the user. To evaluate
SPARQ–LTL queries we devise a translation from SPARQ–LTL to SPARQL that al-
lows to readily use our machineries in an elegant and non-intrusive way on existing
SPARQL processors.
SPARQ–LTL by Example. We now provide some examples of SPARQ–LTL queries.
Example 1. Select footballers who played at least twice for the same club.
SELECT ?person WHERE {
  EVENTUALLY{
    ?person :occupation :football_player .
    ?person :member_of_team ?club .
    NEXT { EVENTUALLY {
      ?person :member_of_team ?club .
    }}
  } }
    The SPARQ–LTL query expressing the example makes usage of the EVENTUALLY
and NEXT temporal operators. EVENTUALLY checks that eventually (along the whole
time line) one has to be a football player, playing for some club (a binding of the variable
?club). After identifying the timepoint (data version) where the first part of the query
has a solution, the usage of NEXT, by moving at the next point in the timeline, checks
again the same condition. As an example, M. Hughes played for Manchester United
both in 1980-86 and 1988-95. Note that the management of timepoints is completely
transparent to the user. We now give another example of SPARQ–LTL along with its
translation into SPARQL. We assume each historical data version (e.g., DBpedia dump)
to be stored in a separate named graph.
Example 2. Find the name of the coach of the Italian national football team after the
sacking of Cesare Prandelli.
              SPARQ–LTL                              Translation into SPARQL
                                    SELECT ?n WHERE {
                                    {GRAPH  {
SELECT ?n WHERE {                   dbp:Italy dbpo:coach dbp:Cesare_Prandelli.
PAST {                              GRAPH  {
dbp:Italy dbpo:coach                dbp:Italy dbpo:coach ?n.
dbp:Cesare_Prandelli.               FILTER(?n != dbp:Cesare_Prandelli)}
NEXT {                              } } UNION ...... UNION{
dbp:Italy dbpo:coach ?n.            GRAPH  {
FILTER                              dbp:Italy dbpo:coach dbp:Cesare_Prandelli.
(?n!=dbp:Cesare_Prandelli)          GRAPH  {
} } }                               dbp:Italy dbpo:coach ?n.
                                    FILTER(?n != dbp:Cesare_Prandelli)
                                    }   } }}
    The previous SPARQ–LTL query makes usage of PAST to find when (in which
version) C. Prandelli was the coach. Then, via the nested NEXT operator executes the
innermost part of the query looking for the Italian football team coach in the subsequent
version. Assuming to have 6 versions of the data it can be noted that PAST makes usage
of UNION queries over each of the 6 versions vi , with i ∈ {1, ..., 6}, (since the starting
version is the current one, i.e., v6 ); then, for each vi , NEXT checks in version vi+1 (via
a FILTER) that the coach changed — actually, note that the evaluation of PAST can
transparently start from the version v5 since version v6 does not have a NEXT version.
The advantage of using SPARQ–LTL can be noted both in terms of succinctness (the
SPARQ–LTL query has ∼140 characters while the complete translation ∼600) and
readability. We formalize the translation in Section 3.3.
Contributions and Outline. We make the following main contributions: (i) a formal-
ization of SPARQ–LTL, a temporal extensions of SPARQL that offers a concise and
readable syntax for temporal queries; (ii) a formal semantics; (iii) a translation from
SPARQ–LTL into SPARQL; (iv) an experimental evaluation along with an analysis of
the succinctness of SPARQ–LTL queries.


2         Preliminaries
This section provides some background about the machineries used in this paper.


2.1        RDF and SPARQL
An RDF triple4 is a tuple of the form hs, p, oi∈I×I×I∪L, where I (IRIs) and L (lit-
erals) are countably infinite sets. An RDF graph G is a set of triples. To query RDF
data, a standard query language, called SPARQL, has been defined. The semantics of
a SPARQL query is defined in terms of solution mappings. A (solution) mapping µ is
a partial function µ: V → I ∪ L. Two mappings, say µ1 and µ2 , are compatible (resp.,
not compatible), denoted by µ1 ∼    = µ2 (resp., µ1 ∼
                                                    6 µ2 ), if µ1 (?v)
                                                     =                = µ2 (?v) holds
(resp., does not hold) for all variables ?v ∈ dom(µ1 ) ∩ dom(µ2 ) . If µ1 ∼ = µ2 then
µ1 ∪ µ2 denotes the mapping obtained by extending µ1 according to µ2 on all vari-
ables in dom(µ2 )\dom(µ1 ). This allows for defining the join, union, difference, and
left outer join operations between two sets of mappings M1 , and M2 as shown below:
                      M1 1 M2 = {µ1 ∪ µ2 | µ1 ∈ M1 , µ2 ∈ M2 and µ1 ∼
                                                                    = µ2 }
                      M1 ∪ M2 = {µ | µ ∈ M1 or µ ∈ M2 }
                      M1 \ M2 = {µ1 ∈ M1 | ∀µ2 ∈ M2 , µ1 6∼ µ2 }
                     M1 ./ M2 = M1 1 M2 ∪ M1 \ M2

The SPARQL semantics uses a function JQKG that evaluates a query Q on a graph G
and gives a multiset (bag) of mappings in the general case.




      4
          We do not consider bnodes.
2.2   Linear Temporal Logic

Linear temporal logic (LTL) is an extension of modal logic to formally specify systems
and reason over them; here, modalities are temporal operators relating events happening
at different time instants over a linearly ordered timeline. Classically, LTL formulas
are interpreted over infinite sequences of states. The LTLf variant [9, 14] considers
formulas interpreted over traces of finite length. Given the fact that we take inspiration
from LTL in the context of dynamic knowledge bases, we will focus on the PLTLf
variant, which is an extension of LTLf with past modalities [25].
    PLTLf formulas are built over a set V of atomic propositional variables by us-
ing the Boolean connectives ∧, ∨, ¬ plus the future temporal operators X (NEXT),
F (EVENTUALLY), G (ALWAYS), U (UNTIL), and the past temporal operators Y
(PREVIOUS), P (PAST), H (ALWAYSPAST), S (SINCE). The meaning of the tem-
poral operators is summarized in Table 1.


PLTLf Operator Meaning
     Xq        q has to hold in the next state
     Fq        q has to hold in some future state
     Gq        q has to hold in the current and all future states
   q1 U q2     q1 has to hold in all states in the future until there is a state in which q2 holds
     Yq        q has to hold in the previous state
     Pq        q has to hold in some past state
     Hq        q has to hold in all past states
   q1 S q2     q1 has to hold starting from the state in the past where q2 holds
                         Table 1. Meaning of LTL temporal operators.




2.3   Archiving Policies

Our main goal in this paper is to devise a temporal extension of SPARQL with particular
emphasis on an easy (and intuitive) usage of temporal operators and their combinations.
Our second goal is to enable querying RDF archives on existing SPARQL processors.
In what follows we provide an overview about archiving policies for historical RDF
data [11].
Independent Versions. The first approach works by keeping independent versions of
the data. In the case of RDF this could be implemented by assuming that each data ver-
sion (e.g., DBpedia dumps) is stored in a separate named graph. Consider the data taken
from DBpedia reported in Fig. 1. A query ran on March 2014 and asking for the coach of
the Italian football team would have returned Cesare Prandelli. The same query ran on
September 2014 would have returned Antonio Conte. Query infrastructures allowing to
query the latest version of the data, miss the flow of updates. As an example it would not
be possible to ask queries like “Find players since C. Prandelli was the coach”. A sim-
ple way to represent historical RDF data using versioning is via RDF quads. An RDF
quad (for simplicity, we omit bnodes) is a tuple of the form hs, p, o, ci ∈ I × I×(I∪L)×
I, where the fourth element of the quad represents the named graph to which the triple
belongs5 . Hence, quads that differ for the fourth element only (e.g., white (uncolored)
triples in Fig. 1), represent the fact that a triple is present in different versions.


    March 2014                     September 2014                             
               dbp:Cesare_Prandelli              dbp:Andrea_Pirlo                dbp:Antonio_Conte
                                                                                                                              dbp:Andrea_Pirlo
                               dbpo:name                                             dbpo:coach                                  .
           dbpo:coach                                                                                             dbpo:name
                                                    dbp:Luca_Antonelli
                                                                                                                                 .
                                               dbpo:name   .                 dbp:Italy_national_football_team
                                                                                                                                 .
            dbp:Italy_national_football_team
                                                           .               dbpo:regionalName                    dbpo:name dbp:Matteo_Darmian
 dbpo:regionalName                                         .
                                     dbpo:name
                                                   dbp:Giampaolo_Pazzini    dbp:UEFA_European_Championship           dbp:UEFA_Euro_2016_qualifying
              dbp:UEFA_European_Championship                                                             dbpo:current
    July 2016                                           update                 addition                deletion
             dbp:Giampiero_Ventura
                                                     dbp:Andrea_Pirlo       DBpedia Update Statistics about People
           dbpo:coach
                                               dbpo:name .
                                                           .
            dbp:Italy_national_football_team
                                                           .
          dbpo:regionalName dbpo:name dbp:Matteo_Darmian

     dbp:UEFA_European_Championship                  dbp:UEFA_Euro_2016
                                      dbpo:current


                                        Fig. 1. An excerpt of evolving data from DBpedia.


Tracking the changes. This approach is based on the computation (and storing) of
updates or differences (deltas) between versions. It requires additional computational
costs for delta propagation which may affect version-focused retrieving operations.
Timestamps. In this case each RDF triple is annotated with its temporal validity. From
a practical point of view, there are two main approaches: (i) compression techniques
e.g. using selfindexes or delta compression in B+Trees (e.g., [17]); (ii) annotate triples
(with time information) only when they are added or deleted.
    In this paper we consider the independent versions model to describe our SPARQ–
LTL language. Nevertheless, SPARQ–LTL can be also used (with minor changes) with
the other archiving models (see Section 6).

3         The SPARQ–LTL Language
We are now ready to introduce SPARQ–LTL, a general language that offers a concise
syntax for the writing of temporal queries. We first discuss SPARQ–LTL syntax, then
its formal semantics6 and finally the translation of SPARQ–LTL queries into SPARQL.

3.1        Syntax
Let V be a set of variables. The syntax of SPARQ–LTL query patterns (QP) is shown
in Fig. 2. The language extends SPARQL with constructs inspired by the temporal op-
erators of Linear Temporal Logic (LTL) as described in Table 1. In SPARQ–LTL the
      5
          For instance, the date in which a version is created
      6
          For the sake of readability we focus on set semantics.
complete name of the temporal constructs can be used, e.g. one can write ALWAYS
instead of G or ALWAYSPAST instead of H.


 QP ::= SPARQL operators
        t = (I ∪ V) × (I ∪ V) × (I ∪ L ∪ V) | QP1 AND QP2 | {QP1 } UNION {QP2 } |
        {QP1 } MINUS {QP2 } | QP1 OPT {QP2 } | QP FILTER {R} | GRAPH I ∪ V {QP} |

         Temporal operators
         X{QP} | F{QP} | G{QP} | {QP1 } U {QP2 } |
         Y{QP} | P{QP} | H{QP} | {QP1 } S {QP2 }

                                Fig. 2. Syntax of SPARQ–LTL.




3.2   Semantics

The formal semantics of SPARQ–LTL is shown in Table 2; rules in the top part define
the semantics of (standard) SPARQL operators (we focus on set semantics for sake of
exposition), while the semantics of the temporal constructs is defined in the bottom part.



               JtKGi    =   {µ | dom(µ) = var(t) and µ(t) ∈ Gi }
      Jq1 AND q2 KGi    =   {µ | µ ∈ Jq1 KGi 1 Jq2 KGi }
   Jq1 UNION q2 KGi     =   {µ | µ ∈ Jq1 KGi ∪ Jq2 KGi }
   Jq1 MINUS q2 KGi     =   {µ | µ ∈ Jq1 KGi \ Jq2 KGi }
      Jq1 OPT q2 KGi    =   {µ | µ ∈ Jq1 KGi ./ Jq2 KGi }
  Jq FILTER {R}KGi      =   {µ | µ ∈ JqKGi and eval(µ, R)=true}
    J GRAPH u {q}K      =   {µ | µ ∈ JqKGu }
                            S          0                   0
  J GRAPH ?v {q}K       =     Gj {µ | µ ∈ JqKGj and µ = µ ∪ {?v → Gj }}
             JX qKGi    =   S Gi+1
                            JqK
             JF qKGi    =     i≤j { dbp:INFT dbpo:name ?p.
SELECT ?p WHERE {               dbp:INFT dbpo:coach ?c2.
  dbp:INFT dbpo:name ?p.            FILTER (?c1!=?c2) } } UNION
  dbp:INFT dbpo:coach ?c1.   {GRAPH  { dbp:INFT dbpo:name ?p.
  PAST{                          dbp:INFT dbpo:coach ?c2.
    dbp:INFT dbpo:name ?p.           FILTER (?c1!=?c2) } } UNION
    dbp:INFT dbpo:coach ?c2. {GRAPH  { dbp:INFT dbpo:name ?p.
    FILTER (?c1!=?c2)            dbp:INFT dbpo:coach ?c2.
  } }                            FILTER (?c1!=?c2)} } UNION
                             {GRAPH  { dbp:INFT dbpo:name ?p.
                                 dbp:INFT dbpo:coach ?c2.
                                 FILTER (?c1!=?c2) } } UNION
                             {GRAPH  { dbp:INFT dbpo:name ?p.
                                 dbp:INFT dbpo:coach ?c2.
                                 FILTER (?c1!=?c2)} } }

    The SPARQL query on the right is automatically generated and can be evaluated on
existing processors. Note that the translation requires to look into all previous versions.
SPARQ-LTL               Rewriting
θ(t = hs, p, oi, i)   =t
θ(q1 AND q2 , i)      = θ(q1 , i) AND θ(q2 , i)
θ(q1 UNION q2 , i) = θ(q1 , i) UNION θ(q2 , i)
θ(q1 MINUS q2 , i) = θ(q1 , i) MINUS θ(q2 , i)
θ(q1 OPT q2 , i)      = θ(q1 , i) OPT θ(q2 , i)
θ(q FILTER {R}, i) = θ(q, i) FILTER (R)
θ( GRAPH v {q}, i) = { GRAPH v {θ(q, v)}}
θ( GRAPH ?v{q}, i)= { GRAPH h0i{θ(q, 0)} BIND(0 AS ?v)} UNION . . . UNION
                        { GRAPH hn-1i{θ(q, n-1)} BIND(n-1 AS ?v)}
θ(X{q}, i)            = GRAPH hi+1i{θ(q, i + 1)}
θ(F{q}, i)            = { GRAPH hii{θ(q, i)}}   UNION         { GRAPH hi+1i{θ(q, i+1)}}            UNION . . .
                        UNION { GRAPH hn-1i {θ(q, n-1) }}
θ(G{q}, i)            = GRAPH hii{θ(q, i)} GRAPH hi+1i{θ(q, i+1)} . . . GRAPH hn-1i{θ(q, n-1)}
θ({q1 }U{q2 }, i)     = { GRAPH hii{θ(q2 , i)}} UNION { GRAPH hi+1i{θ(q2 , i+1)} GRAPH hii{θ(q1 , i)}}
                        UNION . . . UNION { GRAPH hn-1i{θ(q2 , n-1)} GRAPH hii{θ(q1 , i)} . . .
                         GRAPH hn-2i{θ(q1 , n-2)}}
θ(Y{q}, i)            = GRAPH hi-1i{θ(q, i-1)}
θ(P{q}, i)            = { GRAPH hii{θ(q, i)}} UNION     { GRAPH hi-1i{θ(q, i-1)}}   UNION    ...     UNION
                        { GRAPH h0i{θ(q, 0)}}
θ(H{q}, i)            = GRAPH hii{θ(q, i)} GRAPH hi-1i{θ(q, i-1)} . . . GRAPH h0i{θ(q, 0)}
θ({q1 }S{q2 }, i)     = { GRAPH hii{θ(q2 , i)}} UNION { GRAPH hi-1i{θ(q2 , i-1)} GRAPH hii{θ(q1 , i)}}
                        UNION . . . UNION { GRAPH h0i{θ(q2 , 0)} GRAPH h1i{θ(q1 , 1)} . . .
                         GRAPH hii{θ(q1 , i)}}

Table 3. Translating SPARQ-LTL into SPARQL. θ(q, i) is the translation function where q is
a query pattern and i is the i-th version. Each version is stored in a separate named graph.
GRAPHh0i denotes the oldest version whereas GRAPHhn − 1i the newest.


4    Related Work
In databases, there exists a vast body of literature on temporal relational databases for
developing temporal data models and query languages (cf. the survey [30]). The most
prominent query language for temporal databases is TSQL2 [33]. TSQL2 is not a part
of the standard SQL, however, the latest standard (SQL:2011) supports the valid (viz.
application time) and transaction time models [35].
    In the Semantic Web, the introduction of time into RDF has been studied almost one
decade ago [22]. Gutierrez et al. [22] studied fundamental problems of temporal RDF
graphs such as entailment and outlined a query language allowing to express queries
making usage of intervals. Along these lines several other extensions of SPARQL such
as τ -SPARQL [34], T-SPARQL [18], tRDF [32] and RDF-TX [17] have been proposed.
τ -SPARQL extends SPARQL query patterns with two variables ?s and ?e to bind the
start time and end time of temporal RDF triples and express temporal queries. The
evaluation is done by rewriting τ -SPARQL queries into standard SPARQL queries. T-
SPARQL leverages a multi-temporal RDF model where each RDF triple is annotated
with a temporal element that represents a set of temporal intervals. T-SPARQL is based
on TSQL2 (temporal SQL). The tRDF system builds upon the Gutierrez et al. [22] tem-
poral RDF model. tRDF queries are evaluated using an index (viz. tGrin) based on a
strategy that clusters RDF triples using a graphical-temporal distance. RDF-TX [17]
offers both a temporal extension of SPARQL and an indexing system based on the
compressed multiversion B+ tree approach from relational databases. SPARQL-ST is a
query language for spatiotemporal RDF data [31]. It extends SPARQL with spatial and
temporal variables. The temporal variables appear in the fourth position of valid time
temporal triple patterns (i.e., when temporal triples are represented by quads); and thus,
these variables can be mapped into time intervals upon query evaluation. Additionally,
SPARQL-ST proposed a new filter operator called TEMPORAL FILTER which sup-
ports temporal constraints based on Allen’s interval relations [1]. Furthermore, an ex-
tension of SPARQL-ST called stSPARQL, using the valid time model, is studied in [5,
24], which uses linear constraints to query valid time spatiotemporal RDF data (stRDF).
stSPARQL is implemented and integrated into Strabon7 that extends SPARQL with a set
of temporal functions designed based on Allen’s interval algebra. It has also functions
for time interval intersection, union, and so on. While stSPARQL is based on Allen’s
interval algebra, SPARQ-LTL is based on LTL and is able to support stSPARQL’s tem-
poral functions via the FILTER operator. A logic-based approach for representing valid-
ity time in RDF(S) and OWL 2 is reported in [29]. Based on this approach, the authors
extend SPARQL by augmenting basic graph patterns with a number of temporal re-
lations such as during, occurs, at, and so on. No implementation is available for the
proposed query language. Our goal is to enable the querying of historical RDF data
with two tenets: (i) using existing SPARQL processors; (ii) providing a concise, expres-
sive and readable temporal language. Surprisingly, despite the plethora of related work,
none of the existing approaches can fulfill these desiderata. Approaches like tRDF and
RDF-TX have the advantage of using ad-hoc indexing strategies. However, this intro-
duces problems of index construction/maintenance and require non-standard compo-
nents, thus hindering their applicability on existing SPARQL processors. RDF-TX and
tRDF consider a subset of SPARQL (basic graph patterns) and the semantics of the tem-
poral language is only given in terms of SPARQL; we provide semantics for temporal
operators. None of these languages (τ -SPARQL, T-SPARQL, tRDF, and RDF-TX) has
focused on conciseness and readability of temporal queries, leaving to the user the bur-
den to encode in standard SPARQL the temporal parts. On the contrary, SPARQ–LTL
works with an abstract syntax that offers a rich class of (abstract) temporal operators
inspired by Linear Temporal Logic [14]. We also want to mention: (i) proposals like
tOWL [28] that focus on designing extensions to incorporate time; (ii) temporal con-
junctive query answering (e.g., Borgwardt et al.[7]); (iii) translation of SPARQL into
LTL (e.g., Mateescu et al.[26, 19], Artale et al. [3, 2]); (iv) approaches like the DBpedia
Wayback Machine [10] that allow to retrieve data at a certain timestamp (provided by
the user). Our approach differs from (i), (ii), (iii) in the fact that we focus on an exten-
sion of SPARQL to write concise and readable temporal queries, and query processing
on existing SPARQL processors; as for (iii) we proceed in the opposite direction, ex-
pressing LTL operators in SPARQL. Finally (iv), only focus on providing access to an
entity at a given time and do not offer any temporal query language. We point other
relevant literature in the area [11, 23, 20, 27, 4].
   7
       Strabon is a spatiotemporal RDF store http://www.strabon.di.uoa.gr
5       Experimental Evaluation

In the experimental evaluation we measured: (i) query running time when translating a
SPARQ–LTL query; (ii) query succinctness.
Datasets. We used both local and remote datasets. We considered data (person data
and infoboxes) from the latest 7 versions of DBpedia8 and loaded each version in a
separate named-graph by using Blazegraph9 as SPARQL processor. All the experiments
have been performed on an Intel i5 machine with 8GBs RAM. Results reported are the
average of 10 runs. Table 4 reports the size (in millions of triples) of the (local) data
loaded in each version along with the percentage of updated (Upds), added (Adds) and
removed (Dels) triples between versions.

                           Version Size Upds Adds Dels
                             3.6 ∼40M    -     -     -
                             3.7 ∼51M 11.2 % 6.3 % 2.1%
                             3.8 ∼59M 16.5% 6% 2.9%
                             3.9 ∼66M 36.8% 15.5% 3.7%
                            2014 ∼78M 23.2% 14.4% 3.9%
                            2015 ∼84M 11.5% 6.3% 7.3%
                            2016 ∼86M 4.4% 3.2% 1.2%

                Table 4. Statistics about the versions of DBpedia considered.




Running Time. We used a set of 10 SPARQ–LTL queries including from one to four
temporal constructs. In the first experiment, we measured the query time after translat-
ing the queries. Running times are reported in Table 5. As it can be observed the running
time for all queries is in the order of tens of seconds.


                            Query Time (s) Query Time (s)
                             Q1     11      Q6     12
                             Q2     23      Q7     23
                             Q3     11      Q8     32
                             Q4     14      Q9     33
                             Q5     41      Q10    31
                                  Table 5. Running Time.


   We noted that the query time grows almost exponentially with the number of tem-
poral constructs involved. Indeed, each temporal construct requires a certain number of
UNION queries. The cost in terms of time to load all the version was of about 3h (on a
    8
        http://downloads.dbpedia.org
    9
        https://www.blazegraph.com
laptop), with a total storage space for all the versions of about 50 GBs. To give a sense
of the amount of redundancy introduced by this approach, we noticed that the ∼40% of
triples are left untouched along all the versions. Overall, this approach has the advan-
tage of providing an immediate way for querying historical RDF data. The cost that one
has to pay consists in loading different data versions plus data redundancy. The level of
redundancy that can be afforded depends on the dataset; for datasets of small/medium
size like DBpedia this may be bearable.


Query Succinctness. We measured the length, in terms of number of characters, of the
SPARQ–LTL queries considered along with their translations. Fig. 3 shows the result of
this analysis. It is worth to point out that: (i) SPARQ–LTL queries not only are shorter
but also more readable; (ii) the user deals with a lower number of variables.


                                                                       SPARQ-LTL
             4000
                                                                       SPARQL
             3000

             2000

             1000

                 0
                      Q1    Q2    Q3     Q4    Q5    Q6    Q7    Q8     Q9   Q10
                     Fig. 3. Query length for SPARQ–LTL and translations.




6    SPARQ–LTL and other Archiving models

We now give a hint of how SPARQ–LTL can be used with other archiving models.
In particular we focus on timestamps and leave as a future work a complete treatment
of this topic. Timestamps allow to establish the time (or the version) at which a certain
event is true. In this setting, a temporal RDF graph is a graph where triples hs, p, oi in the
graph have a fourth element that represents validity [v1 , v2 ], i.e., (hs, p, oi, [v1 , v2 ]). As
mentioned in Section 2.3, for triples that do not change at all across versions the validity
can be omitted. The syntax of a graph in this setting is usually given by reifying temporal
facts into non-temporal facts [21]. We represent a temporal triple (hs, p, oi, [v1 , v2 ]) in
RDF as shown in Fig. 4.
    The advantage of this approach over the independent versions model is that data are
stored in one graph, although requiring an effort to generate and keep up to date the
validity of triples. In terms of storage space, this approach requires at most (only for
triples that change) four triples instead of one.
                                   s p _:x.
                                   _:x p o.
                                   _:x :validFrom v1.
                                   _:x :validThrough v2.

                            Fig. 4. Triples with the timestamps model.


     The independent versions model can be cast into the timestamp model as follows:
(i) start with the oldest data version (v1 ) and represent each triple as hs, p, oi; (ii) for
each version vi , i > 1, if hs, p, oi (resp., (hs, p, oi, [vj , ])) was in vi−1 and is not in vi
add (hs, p, oi, [v1 , vi−1 ]) (resp., (hs, p, oi, [vj , vi−1 ])); (iii) if hs, p, oi was not in vi−1
but is in vi add (hs, p, oi, [vi , −]).
     In this setting, the evaluation function used to define the semantics of SPARQ–LTL
will not make usage of different (named) graphs (see Table 2) but will make usage
of a single graph and leverage the validity information of triples. As an example, the
operator EVENTUALLY whose semantics according to the independent versions is
                                                    [
                                     JF qKGi =            JqKGj
                                                  i≤j