=Paper=
{{Paper
|id=Vol-1690/paper67
|storemode=property
|title=Gize: A Time Warp in the Web of Data
|pdfUrl=https://ceur-ws.org/Vol-1690/paper67.pdf
|volume=Vol-1690
|authors=Valeria Fionda,Melisachew Wudage Chekol,Giuseppe Pirrò
|dblpUrl=https://dblp.org/rec/conf/semweb/FiondaCP16
}}
==Gize: A Time Warp in the Web of Data==
Gize: A Time Warp in the Web of Data
Valeria Fionda1 , Melisachew Wudage Chekol2 , and Giuseppe Pirrò3
1
University of Calabria, Italy
fionda@mat.unical.it
2
University of Mannheim, Germany
mel@informatik.uni-mannheim.de
3
Institute for High Performance Computing and Networking, ICAR-CNR, Italy
pirro@icar.cnr.it
Abstract. We introduce the Gize framework for querying historical
RDF data. Gize builds upon two main pillars: a lightweight approach to
keep historical data, and an extension of SPARQL called SPARQ–LTL,
which incorporates temporal logic primitives to enable a rich class of
queries. One striking point of Gize is that its features can be readily
made available in existing query processors.
1 Introduction
Querying historical data is of utmost importance in many contexts, from city
monitoring, where one needs to track different aspects (e.g., pollution, popula-
tion) to generic exploratory research, where one is interested in posing queries
like “Retrieve players that are now managing some club they played for” or “Re-
trieve the annotation of a gene since the discovery of a particular interaction”.
The classical approach for querying RDF data (e.g., via SPARQL endpoints)
only considers the latest version. In fact, querying of historical RDF data poses
some challenges. The first concerns the representation and storing of historical
data. Some approaches (e.g., [4, 7]) allow to retrieve data by providing times-
tamps. Other resort to dedicated indexing structures (e.g., [5]) to speed-up query
processing. The second problem concerns what type of querying primitive to pro-
vide. The most common approach is to devise SPARQL extensions that work
with intervals or SPARQL translations into temporal logic. The addition of ad-
hoc components either in terms of data representation, query language or both,
hinders the applicability on existing RDF (query) processing infrastructures.
To cope with these issues we present the Gize4 framework. Gize is built
around two main components. The first is a lightweight approach to store RDF
data, where each version of the data is stored in a separate named graph. The
second component is a powerful extension of SPARQL called SPARQ–LTL. This
language inherits a variety of temporal operators from Linear Temporal Logics
(LTL) [2]. To evaluate SPARQ–LTL on existing SPARQL processors we devise
a translation to standard SPARQL queries.
4
Gize ( ) is the Ethiopian word for time.
Related Work. Approaches like the DBpedia Wayback Machine [4] allow to re-
trieve data at a certain timestamp (provided by the user). Other approaches
(e.g.,[7]) access historical data via (HTTP) content negotiation, typically using
the Memento framework. Yet other approaches (e.g., [5]) introduce timestamps
into RDF triples along with ad-hoc indexing strategies. In Description Logics,
some proposals focus on temporal conjunctive query answering (e.g., [1]); other
efforts (e.g., [6]) have focused on the translation of SPARQL into LTL. Sur-
prisingly, the design of solutions for querying historical RDF data on existing
SPARQL processors is still in its infancy. Gize fills this gap by contributing
an extension of SPARQL, called SPARQ–LTL, that allows for a rich class of
temporal primitives borrowed from LTL (e.g., SINCE, NEXT, PREVIOUS) along
with a translation from SPARQ–LTL queries into standard SPARQL queries.
2 The Gize Framework
Representing Historical RDF Data. The tenet of Gize is to enable querying
of historical RDF data on existing processors. To represent versions, Gize lever-
ages the notion of RDF quad. An RDF quad (for simplicity, we omit bnodes) is
a tuple of the form hs, p, o, ci ∈ I × I × (I ∪ L) × I, where I (IRIs) and L (literals)
are countably infinite sets. The forth element of the quad represents that named
graph to which the triple belongs.
June 2012 March 2014 September 2014
dbp:Cesare_Prandelli dbp:Cesare_Prandelli dbp:Andrea_Pirlo dbp:Antonio_Conte
dbp:Andrea_Pirlo dbp:Andrea_Pirlo
dbpo:coach dbpo:name dbpo:name dbpo:coach
. dbpo:coach .
dbp:Luca_Antonelli dbpo:name
. .
dbp:Italy_national_football_team
. dbpo:name . dbp:Italy_national_football_team
.
dbp:Italy_national_football_team
dbpo:regionalName dbpo:name dbp:Giampaolo_Pazzini . dbpo:regionalName dbpo:name dbp:Matteo_Darmian
dbpo:regionalName .
dbpo:name
dbp:UEFA_European_Championship dbp:Giampaolo_Pazzini dbp:UEFA_European_Championship dbp:UEFA_Euro_2016_qualifying
dbp:UEFA_Euro_2012 dbp:UEFA_European_Championship
dbpo:current dbpo:current
June 2016 July 2016 update addition deletion
dbp:Antonio_Conte dbp:Giampiero_Ventura
dbp:Andrea_Pirlo dbp:Andrea_Pirlo DBpedia Update Statistics about People
dbpo:coach dbpo:name . dbpo:coach
dbpo:name .
. .
dbp:Italy_national_football_team dbp:Italy_national_football_team
. .
dbpo:regionalName dbpo:name dbp:Matteo_Darmian dbpo:regionalName dbpo:name dbp:Matteo_Darmian
dbp:UEFA_European_Championship
dbp:UEFA_European_Championship dbp:UEFA_Euro_2016
dbp:UEFA_Euro_2016
dbpo:current dbpo:current
Fig. 1. An exceprt of evolving data from DBpedia.
Fig. 1 shows the evolution of some data taken from DBpedia. Each of the
5 versions considered is represented by a named graph. From this small data
sample one may notice that Italy has changed 3 coaches from July 2012 (C.
Prandelli) to July 2016 (G. Ventura). Interestingly, the latest coach (G. Ventura)
will start on July 18th. One may also notice that some players like A. Pirlo were
part of the team along the whole period, while some other like G. Pazzini or
M. Darmiani were left out or added, respectively. The figure also shows (bottom
right corner) update statistics about People in DBpedia. Each percentage is
computed wrt the previous version. The availability of historical data allows to
pose queries like “Find all players that played with the highest number of coaches”
or “Find players that played since C. Prandelli was the coach”. Most of existing
approaches either are not able to express such kind of queries or have to resort
to ad-hoc processing infrastructures.
The SPARQ–LTL Language. The syntax of SPARQ–LTL is shown below
while Table 1 provides a description of the temporal operators. Let q be a
SPARQ–LTL query, H = {h1 , h2 , . . . , hm } be the set of versions, and hc be
the current (not necessarily the latest) version of the data.
QP ::= QP1 . QP2 | {QP1 } UNION {QP2 } | {QP1 } MINUS {QP2 } | QP1 OPTIONAL {QP2 } |
| GRAPH I ∪ VQP1 | QP1 FILTER {R} | t = (I ∪ V) × (I ∪ V) × (I ∪ L ∪ V) |
| X{QP} | W{QP} | F{QP} | G{QP} | {QP1 } U {QP2 } |
| Y{QP} | Z{QP} | P{QP} | H{QP} | {QP1 } S {QP2 }
Operator SPARQ–LTL Syntax Meaning
Xq NEXT Evaluate q on version hc+1
Fq EVENTUALLY Evaluate q on all versions hc , ....hm
Gq ALWAYS The evaluation of q must be the same on all versions hc , ....hm
q1 U q2 UNTIL If S2 is the solution of q2 in a version, hk ∈ {hc , ....hm } then
there exists a solution S1 of q1 on hc such that
S1 is compatible with S2 in all versions {hc , ....hk }
Yq PREVIOUS Evaluate q on version hc−1
Pq PAST Evaluate q on all versions h1 , ....hc
Hq ALWAYSPAST The evaluation of q must be the same on all versions h1 , ....hc
q1 S q2 SINCE If S2 is the solution of q2 in a version, hk ∈ {h1 , ....hc } then
there exists a solution S1 of q1 on {hk , ....hc } such that
S1 is compatible with S2 in all versions {hk , ....hc }
Table 1. Meaning of the temporal operators in SPARQ–LTL.
SPARQ–LTL allows to use an additional set of keywords when writing SPARQL
queries. SPARQ–LTL are evaluated by translating the temporal operators via
a (set of) pattern(s) evaluated on named graphs maintaining data versions. We
give some examples by considering the five version of data shown in Fig. 1
(stored in separate named graphs). In what follows, dbp:INFT is a shorthand
for dbp:Italy_national_football_team.
Example 1. Select players who are playing in the Italian national football team
and played at least under a different coach than the current one.
SPARQ–LTL Translation into SPARQL
SELECT ?p WHERE {
dbp:INFT dbpo:name ?p.
dbp:INFT dbpo:coach ?c1.
{GRAPH { dbp:INFT dbpo:name ?p.
SELECT ?p WHERE { dbp:INFT dbpo:coach ?c2.
dbp:INFT dbpo:name ?p. FILTER (?c1!=?c2) } } UNION
dbp:INFT dbpo:coach ?c1. {GRAPH { dbp:INFT dbpo:name ?p.
PAST{ dbp:INFT dbpo:coach ?c2.
dbp:INFT dbpo:name ?p. FILTER (?c1!=?c2) } } UNION
dbp:INFT dbpo:coach ?c2. {GRAPH { dbp:INFT dbpo:name ?p.
FILTER (?c1!=?c2) dbp:INFT dbpo:coach ?c2.
} } FILTER (?c1!=?c2)} } UNION
{GRAPH { dbp:INFT dbpo:name ?p.
dbp:INFT dbpo:coach ?c2.
FILTER (?c1!=?c2) } } UNION
{GRAPH { dbp:INFT dbpo:name ?p.
dbp:INFT dbpo:coach ?c2.
FILTER (?c1!=?c2)} } }
The SPARQL query on the right is automatically generated and can be eval-
uated on existing processors. Note that the translation (because of the semantics
of PAST described in Table 1) requires to look into all versions.
Example 2. Find the name of the coach of the Italian national football team after
the sacking of Cesare Prandelli.
SPARQ–LTL Translation into SPARQL
SELECT ?n WHERE {
SELECT ?n WHERE { {GRAPH {
PAST { dbp:INFT dbpo:coach dbp:CP.
dbp:INFT dbpo:coach dbp:CP. GRAPH
NEXT { {dbp:INFT dbpo:coach ?n. FILTER (?n != dbp:CP )}
dbp:INFT dbpo:coach ?n. } } UNION ...... UNION
FILTER (?n != dbp:CP ) {GRAPH {
} dbp:INFT dbpo:coach dbp:CP.
} GRAPH
} {dbp:INFT dbpo:coach ?n. FILTER (?n != dbp:CP )
} } }}
In the previous query, dbp:CP is a shorthand for dbp:Cesare_Prandelli.
As before, the translation of PAST makes usage of UNION queries over each ver-
sions vi ; then, for each vi , NEXT checks in version vi+1 (via a FILTER) that the
coach changed.
3 Conclusions
We have outlined Gize, which enables to set-up an infrastructure for query-
ing historical RDF data on existing SPARQL processors. Gize adopts a simple
approach to store different versions of the data and a powerful temporal ex-
tension of SPARQL called SPARQ–LTL. As a future work, we are considering
approaches like RDF HDT [3] to improve the storage space consumption.
References
1. S. Borgwardt, M. Lippmann, and V. Thost. Temporalizing Rewritable Query Lan-
guages Over Knowledge Bases. JWS, 33:50–70, 2015.
2. G. D. De Giacomo and M. Y. Vardi. Linear Temporal Logic and Linear Dynamic
Logic on Finite Traces. In IJCAI, 2013.
3. J. D. Fernández, M. A. Martı́nez-Prieto, C. Gutiérrez, A. Polleres, and M. Arias.
Binary RDF Representation for Publication and Exchange. JWS, 19:22–41, 2013.
4. J. D. Fernández, P. Schneider, and J. Umbrich. The DBpedia Wayback Machine.
In SEMANTICS, pages 192–195, 2015.
5. S. Gao, J. Gu, and C. Zaniolo. RDF-TX: A Fast, User-Friendly System for Querying
the History of RDF Knowledge Bases. In EDBT, pages 269–280, 2016.
6. R. Mateescu, S. Meriot, and S. Rampacek. Extending SPARQL with Temporal
Logic. Report, 2009.
7. H. Van de Sompel, R. Sanderson, M. L. Nelson, L. L. Balakireva, H. Shankar, and
S. Ainsworth. An HTTP-based Versioning Mechanism for Linked Data. 2010.