Estimating the Dynamics of SPARQL Query Results Using Binary Classification

Alberto Moya Loustaunau and Aidan Hogan
IMFD; DCC, University of Chile
{amoya,ahogan}@dcc.uchile.cl

Abstract. We address the problem of estimating when the results of an input SPARQL query over dynamic RDF datasets will change. We evaluate a framework that extracts features from the query and/or from past versions of the target dataset and inputs them into binary classifiers to predict whether or not the results for a query will change at a fixed point in the near future. For this evaluation, we create a gold standard based on 23 versions of Wikidata and a curated collection of 221 SPARQL queries. Our results show that the quality of predictions possible using (only) features based on the query structure and lightweight statistics of the predicate dynamics – though capable of beating a random baseline – is not competitive with results obtained using (more costly to derive) knowledge of the complete historical changes in the query results.

Keywords: SPARQL · Linked Data · Dynamics · Wikidata

1 Introduction

Many applications that consume Linked Data (LD) face challenges related to remote changes in the underlying data. Client-side caches can reduce the network traffic between clients and servers, the load on servers, and the average latency of responses. However, since datasets change over time, for caches to be useful, they should be updated when the underlying data that they reflect change; predicting such remote changes, however, is a challenging problem, particularly when data are accessed as the results of queries to a SPARQL endpoint.

Since datasets change over time, long-running applications that cache and repeatedly use query results obtained from an external SPARQL endpoint may resubmit the queries regularly to ensure up-to-dateness. As a result, without further information as to the dynamics of a particular SPARQL query, applications face the choice of either performing frequent query executions that may be redundant and repeatedly return the same results (aiming for stronger consistency at the cost of more frequent querying), or performing infrequent query executions that may lead to stale data being persisted in the application when the underlying sources change (accepting weaker consistency to improve efficiency/scalability). Given the costs for clients and servers of repetitive requests served over the Web and the potential efficiency gains offered by local caches, several weak consistency approaches have been proposed that try to keep the local data of the applications updated at lower cost by predicting changes [15,6,23].

Some works study data dynamics based on the historical evolution of entities [13,7,21], but following such an approach for SPARQL queries is expensive because (1) a SPARQL query may involve potentially many entities; and (2) it is necessary to have the previous complete versions of data to analyse the entities relevant to a query. Many works have explored the dynamics of Linked Data with the intention of finding patterns that allow for characterizing, recognizing, and predicting changes based on analysis of the domains, predicates, and schema [27,13,22,23,11].
Among these works are hybrid approaches developed to return fresher query results with faster response times by decomposing a query into dynamic sub-queries executed remotely, and static sub-queries executed over local caches [29]; however, such an approach only focuses on estimating the dynamics of individual triple patterns, and does not consider, for example, query operators. More generally, coping with changes in remote data still presents a major challenge for applications leveraging dynamic Linked Data.

Given a SPARQL query and a dynamic RDF graph (consisting of multiple historical versions), in this work, we address the problem of predicting whether or not the query's results will change in the next version of the RDF graph. To the best of our knowledge, this is the first work to address this problem. Along these lines, we evaluate a general architecture based on Machine Learning that extracts static features from a query, as well as features from the query and dynamic dataset. These features are fed into a binary classifier to predict whether or not the query results will change at a fixed point in the future, or, more ambitiously, can be fed into a regression model to predict when the query results are likely to change. With respect to the features used, we show that there is a trade-off between those that are easy to compute but offer less accurate predictions (e.g., static query features) versus those that are more costly to compute but offer more accurate predictions (e.g., historical changes in query results). Per this trade-off, which features to use for predicting the dynamics of query results may then depend on the particular application.

In order to better understand this conceptual trade-off – and more generally, to evaluate the quality of predictions made by our framework – we create a novel gold standard based on 23 weekly versions of the Wikidata knowledge graph [33] and 221 user-generated queries; we show this gold standard to have a variety of desirable features, including (most importantly) a balance of queries whose results never change, queries whose results always change, as well as non-trivial queries whose results intermittently change. Using this gold standard, our experiments show that although features based on static characteristics of the query and statistics of changes in the data for individual predicates are more efficient to compute and maintain, they do not offer the same prediction quality as features based on knowledge of historical changes of the input query's results.

Contributions: (1) We evaluate a general architecture for predicting when/if the results of a SPARQL query will change in a future version of a dynamic RDF dataset. (2) We evaluate a number of features to instantiate this architecture based on analysis of the query, analysis of the dynamics of predicates in the data, and analysis of historical changes in the query results. (3) We create a gold standard for these tasks based on 23 consecutive versions of Wikidata and a set of 221 real-world SPARQL queries. (4) We use this gold standard to compare the predictions obtained using different types of features and classifiers.

2 Related Work

A variety of works have addressed the issue of modeling and consuming dynamic Linked Data from a broad range of perspectives [24,26,27,30,13,9,12,8,14,22,11].
One of the major challenges considered is that of keeping cached copies of remote dynamic data – cached for reasons of efficiency and scalability – up-to-date on the consumer side, which we refer to as the synchronization problem.

Some works have addressed the synchronization problem on the publisher side, proposing notification mechanisms that keep registered consumers informed about relevant changes to the data [18,24,26,16,12,10]; although such approaches may facilitate strong consistency – meaning that consumers are kept up-to-date with the remote data on the publisher side – they centralize the burden of synchronization on the publisher, potentially leading to scalability issues.

Conversely, a variety of works have looked at building models of remote data that can help to predict which data are most dynamic and which are most static, indicating which subsets of the data may need to be refreshed from the remote source more often [24,26,27,30,13,9,22,11]; such works consider changes in RDF datasets at differing levels of granularity, including documents [13], domains [13,22,23], predicates [13,23], characteristic sets (a characteristic set is the set of predicate terms used to describe a given subject [20]) [22,11], etc. Features at different levels of granularity can be fed into different predictive models based on Poisson Processes [27], Markov Chains [32], Empirical Distributions [19], Machine Learning classification and regression [23,11], as well as a variety of other heuristics [2,32,15] and metrics [6,15,1]. Such approaches obviate the need for a subscription/notification mechanism. However, to ensure strong consistency in the presence of highly dynamic data, consumers may need to conservatively send a great many refresh requests to the server, which may be even more costly than a subscription/notification mechanism; hence such approaches are better suited for scenarios where weak consistency is more acceptable.

Specifically regarding the dynamics of SPARQL query results, Passant and Mendes [24] proposed sparqlPuSH as a notification framework based on PubSubHubbub (recently standardized as WebSub [10]), aiming for strong consistency. Aiming instead for weak consistency, Umbrich et al. [29,31,28] proposed various methods to obtain knowledge about dynamics for different query patterns, mainly based on predicates. Dehghanzadeh et al. [5] later proposed a method to estimate freshness using cardinality estimation techniques based on predicates and characteristic sets. Combining the notion of subscription-based notifications and predicting dynamics, Knuth et al. [15] propose a middleware to which consumers may subscribe, and which periodically ranks and schedules refreshes for queries according to policies that take into account how likely the results are to be stale, how long ago the query was last refreshed, how many results previously changed, how long the query takes to run, etc.

Novelty: Given a query and a dynamic RDF dataset, we aim to predict whether or not the query's results will have changed at a fixed point in the near future.
Our work thus complements existing works aiming for weak consistency, but (i) generalizes the problem, evaluating a framework that can incorporate statistics on predicate dynamics [29,31,28,5] and historical changes in query results [15] (as proposed in prior works for addressing related but distinct problems relating to dynamic data), as well as novel types of features; (ii) introduces new features based on query operators and statistics; and (iii) creates a novel gold standard based on Wikidata and presents comparative results that indicate the relative predictive power inherent in different types of features.

3 Preliminaries: RDF and SPARQL

RDF is a conceptual data model based on directed graphs that can be used to describe resources on the Web. RDF terms are elements of the set I ∪ B ∪ L composed of IRIs I, literals L, and blank nodes B. A tuple (s, p, o) ∈ (I ∪ B) × I × (I ∪ B ∪ L) is called an RDF triple, where s is called the subject, p the predicate, and o the object. An RDF graph is a set of RDF triples.

SPARQL is the recommended query language to retrieve and manipulate data stored in the RDF format. In this work, we focus on SPARQL SELECT queries; we first define SPARQL 1.0 queries. Let V be a set of variables disjoint from the set of RDF terms. A SPARQL expression is built recursively as follows. (1) A triple pattern t ∈ (I ∪ B ∪ L ∪ V) × (I ∪ V) × (I ∪ L ∪ B ∪ V) is an expression. (2) If Q1 and Q2 are expressions and R is a filter condition, then Q1 FILTER R, Q1 UNION Q2, Q1 OPTIONAL Q2, and Q1 AND Q2 are expressions. Finally, if Q is an expression, V a list of variables and ∆ a boolean value, then SELECT∆ V Q is a SPARQL SELECT query, where V denotes the projected variables, and ∆ the DISTINCT option that, when true, removes duplicate results. The semantics of a SPARQL SELECT query Q is defined in terms of its evaluation over an RDF graph G, denoted Q(G), giving a set of partial mappings from projected variables to the set I ∪ L ∪ B; we refer to Pérez et al. [25] for definitions. (A SPARQL query is in fact evaluated over a set of named graphs; though we consider RDF graphs here, our methods also generalize to the named graphs setting.)

Our method supports SPARQL 1.1 SELECT queries, which allow a variety of additional features. One key feature in this extension is that of property paths, which allows for matching arbitrary-length paths in an RDF graph, potentially returning or matching the endpoints of the path. An IRI p is a path expression; if e, e1 and e2 are path expressions, then ^e (inverse of e), e1/e2 (e1 followed by e2), e1|e2 (e1 or e2), e* (zero or more e), e+ (one or more e), e? (zero or one e), and (e) (parentheses used for precedence) are also path expressions; finally, if p, p1, . . . , pn are IRIs, then !p, !(p1| . . . |pn) and !(p1| . . . |pk|^pk+1| . . . |^pn) for k+1 ≤ n (negated property sets) are path expressions. Thereafter, a path pattern (s, e, o), where e is a path expression, is an expression. Other features supported in SPARQL 1.1 include sub-queries, negation, aggregation, value binding, and so forth; for brevity, we do not introduce definitions for all such features. One topic we do wish to highlight, however – as it relates to the behavior of a query over a dynamic RDF graph – is that of the monotonicity of SPARQL queries [3,4].
We say that a SELECT query Q is monotone if and only if G1 ⊆ G2 implies that Q(G1) ⊆ Q(G2) for RDF graphs G1 and G2; intuitively, as data are extended, the results of a monotone query can only be extended. Monotonic SPARQL features include, for example, joins, unions, paths and filters; on the other hand, non-monotonic SPARQL features include negation and optional [3].

4 Predicting the Dynamics of Query Results

We consider a dynamic RDF graph to be a sequence of n discrete versions of an RDF graph denoted G := (G1, . . . , Gn); in practice, we assume these versions to have regular intervals (e.g., hourly, daily, weekly, etc.). Further given a SPARQL (1.1) SELECT query Q, we would like to have knowledge of the dynamics of its results. The specific problem we consider in this work – which we call One Shot Change (OSC) – accepts G, Q and a positive integer k as input and outputs a boolean value predicting whether or not the results will change from Gn to some future version Gn+k (i.e., Q(Gn) ≠ Q(Gn+k)?). In the case of bag semantics, we define "≠" in terms of bag inequality, meaning that two bags of query results are different if the multiplicity of any result differs. We thus currently focus on predicting if the results of a query will change, rather than predicting when, or to what extent, they will change; these latter problems are left for future work.

Architecture: In Figure 1, we provide an overview of a general architecture for making predictions with respect to the OSC prediction task. A SPARQL query Q and a dynamic RDF graph G are given as input. The Query Feature Extractor extracts static features from the query, which include statistics about the number of elements in the query (e.g., number of triple patterns, variables, etc.) as well as the presence of particular query operators (e.g., UNION, FILTER, recursive path expressions, etc.). The Predicate Feature Extractor takes as input the predicates from the query and statistics about those predicates from the dynamic graph to produce a set of aggregated numerical features about the dynamicity of predicates used in the query. The Result Feature Extractor takes as input the results of the query over past versions of the dynamic graph, from which it produces further features. All such features are passed to a (pre-trained) binary classifier to generate the final OSC prediction. We will now describe in more detail the query, data and result features considered in this work.

[Fig. 1. Proposed architecture for predicting change in query results: the Query Feature Extractor (features f1, . . . , fi), Predicate Feature Extractor (fi+1, . . . , fj) and Result Feature Extractor (fj+1, . . . , fk) produce feature vectors over Q, G and Q(G) that are fed to a classifier producing the OSC prediction.]

Query Features: Our initial set of features is based on static analysis of the input SPARQL query. Figure 2 shows a sample of the evaluated features, as well as – for the purposes of illustration – their values for an example query. The first group of features captures statistics about the query, indicating loosely its size and complexity. One might consider that the higher these values are, in general, the more dynamic we can expect the query to be, since there are more "opportunities" for the query results to be affected by change.
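To make this first group of features and the operator flags concrete, the following is a minimal, regex-based Python sketch of a query feature extractor; the implementation used in this work parses queries with Apache Jena, so the text-based counting heuristics, the operator list and all names below are illustrative assumptions rather than the actual code.

    import re

    # Hypothetical operator list; a full implementation would cover all
    # SPARQL 1.1 operators and solution modifiers.
    OPERATORS = ["FILTER", "LIMIT", "UNION", "OPTIONAL", "GROUP BY",
                 "MINUS", "NOT EXISTS", "ORDER BY"]

    def extract_query_features(query: str) -> dict:
        """Return a dictionary of coarse static features for a SPARQL query."""
        upper = query.upper()
        features = {}
        # Group 1: size statistics (approximate, text-based counts).
        features["n_variables"] = len(set(re.findall(r"[?$]\w+", query)))
        features["n_triple_patterns"] = query.count(" .")  # rough proxy only
        # Group 2: presence/absence of operators and solution modifiers.
        for op in OPERATORS:
            features["has_" + op.replace(" ", "_").lower()] = op in upper
        # Group 3: grouped features.
        features["recursive_path"] = bool(re.search(r"[\w)>][*+]", query))  # e* / e+
        features["negation"] = any(tok in upper
                                   for tok in ("MINUS", "NOT EXISTS", "!BOUND"))
        return features

A parser-based extractor avoids the corner cases that such pattern matching misses (e.g., operator keywords appearing inside literals), but the resulting feature vector has the same shape.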
In some cases, however, this hypothesized correlation between query size/complexity and result dynamics is not direct since, for example, adding more triple patterns may serve to narrow the query down and focus it on a static part of the graph (e.g., when looking for the movies of directors, adding a triple pattern to restrict the results to directors who have died may reduce the likelihood of changes in the results in a future version). Hence it will be of interest to see, experimentally, how these features affect the predictions. The second group indicates the query operators used; we capture the presence or absence of query operators and solution modifiers from SPARQL 1.1. The third group gathers related features into one dimension: in the case of Recursive path, we group queries with path expressions of the form e* or e+; in Negation, we group non-monotonic features that allow for modeling difference (MINUS, NOT EXISTS, and !BOUND, recalling that !BOUND can be combined with OPTIONAL to express negation). These features – though coarse-grained – are straightforward to extract from a query, and offer valuable insights into how the query may behave over a dynamic graph; for example, the Negation feature captures information about the (non-)monotonicity of the query, while we suppose that Recursive paths, which may traverse an arbitrary number of triples in the graph, might be more sensitive to change. Again, such correlations are not without exception and will require empirical study.

[Fig. 2. Example query and a sample of the features extracted from it: a query selecting items that are instances of human, of female gender, born in Wales or in a place transitively :located_in* Wales, with an OPTIONAL Welsh-language ("cy") sitelink, a FILTER(!BOUND(?sitelink)) condition and LIMIT 100; the extracted features include 7 triple patterns, 3 variables, 1 projected variable, 6 predicates, and presence flags for FILTER, LIMIT, UNION, OPTIONAL, GROUP BY, sub-query, recursive path and negation.]

Predicate Features: The next component extracts features that capture information about the dynamics of predicates used in the query (without evaluating the query on the dynamic graph). This component captures how often and how many triples change for a predicate in a time interval, with the idea that – as in previous works [29,31,28,5] – predicates capture rich information about the dynamics of a dataset, where the results of queries with dynamic predicates will be more sensitive to change. Assuming that the number of predicates is relatively low, such statistics can be easier to maintain.

[Fig. 3. Example dynamic RDF graph G = (G1, G2, G3): three versions of a description of :LMessi in which the :name triple stays constant while other triples (e.g., for :children and :played) change across versions.]

[Fig. 4. Changes for G in Figure 3, showing G1 ⊕ G2 (left) and G2 ⊕ G3 (right).]

Formally, given two RDF graphs Gi and Gj, we denote by Gi ⊕ Gj the set of triples (Gi ∪ Gj) − (Gi ∩ Gj), where "−" denotes set difference; in other words, Gi ⊕ Gj denotes the triples in one graph or the other but not both (noting that Gi ⊕ Gj = Gj ⊕ Gi).
Next, given an RDF graph G and an IRI p, let #(G, p) := |{(x, y, z) ∈ G : y = p}| denote the number of triples in G with predicate p. Finally, given a dynamic RDF graph G := (G1, . . . , Gn) and a predicate p, we denote by ∆(G, p) := Σ_{i=1}^{n−1} #(Gi ⊕ Gi+1, p) / #(Gi ∪ Gi+1, p) the normalized sum of the number of triples with the predicate p that changed between pairs of consecutive versions.

Example 1. Consider the example dynamic RDF graph G in Figure 3 (based on real data from Wikidata, with IRIs modified for the purposes of readability). In Figure 4 we show the pairwise changes between each version: G1 ⊕ G2 and G2 ⊕ G3. For a predicate p, the value ∆(G, p) is then the sum of triples with predicate p in the graphs G1 ⊕ G2 and G2 ⊕ G3 divided by the number of triples with predicate p in the graphs G1 ∪ G2 and G2 ∪ G3. Looking at Figure 4, for example, :name does not appear (∆(G, :name) = 0), while :children appears twice in G1 ⊕ G2 and G1 ∪ G2, as well as twice in G2 ⊕ G3 and G2 ∪ G3 (∆(G, :children) = 4/4 = 1), and so forth.

Given a query Q with a set of predicates {p1, . . . , pn}, we may then consider a variety of aggregate functions over ∆(G, p1), . . . , ∆(G, pn), such as max, mean, etc., to compute a final numeric feature for the query, representing a summary of the level of dynamicity of the predicates it contains.

Result Features: The third type of feature we capture indicates how many times the results of the query Q have changed over the past versions of the dynamic graph. While this feature offers rich information for prediction, for a query that has not previously been seen it is costly to compute, since it requires the evaluation of the query over each past version within an interval, which in turn requires maintaining indexes over the full data for a variety of past versions.

5 Gold Standard Dataset

In order to evaluate the effectiveness of predicting changes in SPARQL query results, we require a dynamic RDF graph, with access to various historical versions; preferably this graph contains real-world, large-scale, diverse RDF data, with sufficient changes between versions to provide both positive and negative examples of queries whose results change. Furthermore, we require a set of SPARQL queries that can be answered against this dynamic graph; preferably these queries again should be diverse, representative of real-world user queries, of a variety of shapes and sizes, using a variety of query operators, and with a mix of both dynamic and static results over the dynamic graph. In particular, we choose Wikidata [33] for our experiments, which, we shall argue, meets the aforementioned requirements. We first give details of the dynamic data we collect from Wikidata; thereafter, we discuss the queries we use in our experiments.

RDF Data: We use 23 Wikidata snapshots from 18/04/2017 to 27/09/2017, which were captured (roughly) weekly in the truthy version that contains triples (without qualifiers) that have the best non-deprecated rank for a given property; this data corpus was previously collected and used by González and Hogan [11]. The first version has 1,102,242,331 triples and 3,276 unique predicates, while the final version has 1,924,967,162 triples (+74%) and 3,660 unique predicates (+11%). Figure 5, on the left, shows the growth in triples as time progresses, and, on the right, the numbers of triples added and removed version-to-version.
We see that although many more triples are added than removed, there are some triples removed in each version. We further note peaks in triples added in some versions, which may be due to bulk imports of data. Between versions 11 and 12 there is an almost two-week gap because we were not able to obtain the data for that week, but this causes only the third-highest peak. The dynamic graph considers a total of 32.3 billion triples across 23 versions.

[Fig. 5. Evolution of total triples in Wikidata (left) and number of triples added to and removed from the nth version for the (n+1)th version (right).]

Table 1 shows the ten most dynamic predicates according to the number of added and deleted statements involving that predicate (Total) and the ratio of added and deleted statements divided by the total number of statements for that predicate across all snapshots (Dyn); for the latter, we only include predicates that appear in all snapshots and appear in ≥ 1,000 statements overall. Though the largest number of changes involves the predicate schema:description, its dynamic value is low due to it being a common predicate. On the other hand, the OWL properties have high dynamicity ratios due to taking blank node values; we currently do not consider isomorphism of graphs due to blank nodes.

Table 1. Top-10 dynamic predicates per total (left) and proportional (right) changes

By total changes (Total; Dyn):
 1. schema:description (464,567,354; 0.04)
 2. schema:dateModified (99,330,382; 0.14)
 3. schema:version (99,330,164; 0.14)
 4. schema:name (44,487,864; 0.01)
 5. rdfs:label (44,487,864; 0.01)
 6. skos:prefLabel (44,487,864; 0.01)
 7. wdt:P2093 (author name) (30,671,685; 0.09)
 8. rdf:type (21,808,957; 0.02)
 9. wdt:P31 (instance of) (12,483,471; 0.02)
10. schema:about (10,773,295; 0.02)

By proportional changes (Total; Dyn):
 1. owl:complementOf (156,862; 0.99)
 2. owl:onProperty (156,862; 0.99)
 3. owl:someValuesFrom (86,674; 0.71)
 4. wdt:P2462 (member of the deme) (3,563; 0.62)
 5. wdt:P3383 (film poster) (4,388; 0.58)
 6. wdt:P2331 (Cycling Archives ID) (4,229; 0.33)
 7. wdt:P1112 (deleted) (4,366; 0.20)
 8. wdt:P505 (general manager) (4,959; 0.18)
 9. schema:dateModified (99,330,382; 0.14)
10. schema:version (99,330,164; 0.14)

Queries: The corpus collected by González and Hogan [11] does not offer queries. In order to obtain a set of SPARQL queries that are answerable over Wikidata and with which we could run experiments, we took the user-contributed example queries from the Wikidata Query Service (https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples), consisting of 389 SPARQL 1.1 SELECT queries of varying degrees of complexity, touching upon various domains of data, and with a mix of varying query operators. A total of 96 queries had to be removed from the original set of 389 because they asked for information external to Wikidata (using SERVICE) or asked for qualifiers that are not present in the truthy version. We also eliminated 12 queries that returned bindings to blank nodes, to facilitate comparison between the results.

Table 2. Distribution of triple patterns (C: Constant, V: Variable, P: Path)
VCV 435 (59.43%); VCC 190 (25.96%); VPC 69 (9.43%); VPV 13 (1.78%); VVV 9 (1.23%); CCV 6 (0.82%); VVC 5 (0.68%); CVV 3 (0.41%); CVC 1 (0.14%); CPV 1 (0.14%); CPC 0 (0.00%); CCC 0 (0.00%)

Table 3. Distribution of paths
e* 62 (74.70%); e1/e2 56 (67.47%); e+ 8 (9.64%); e1|e2 6 (7.23%); e? 6 (7.23%); !(e) 0 (0.00%); e 83 (100.00%)
Furthermore, some queries featured non-deterministic elements – such as LIMIT/OFFSET without ordering, SAMPLE, or temporal functions such as TODAY() – that may lead to changes in results not related to changes in the data; we remove 23 queries with SAMPLE and temporal functions, and add an ORDER BY clause to all queries to ensure determinism with LIMIT/OFFSET and, more generally, to facilitate quick comparison of results. We also eliminated 37 queries with empty results for all snapshots. As a result of this filtering process, we end up with a total of 221 non-empty deterministic queries.

In Table 2, we provide statistics on the distribution of different types of triple patterns in the 221 queries before the transformation; of key importance is that 86.20% of the triple patterns have a constant in the predicate position, meaning that they are compatible with a model based on the dynamics of predicates. We also see that 83 triple patterns (11.34%) feature a path expression, where Table 3 provides the distribution of these expressions (remarking that multiple patterns can be used in one expression); of note here is that recursion is commonly found (74.70% use e* while 9.64% use e+), and that we find no negated property paths.

Finally, we look at how the results for these queries change over the 23 versions. Figure 6 shows how many queries had some results change between two consecutive versions, where we see that approximately half of the result sets change in each version. But are these always the same queries that change every time? Figure 7 shows the number of versions in which some result changed for each query; the queries are ordered by the number of changes in their results, where we can see that the results of 44 queries (19.90%) change each time, 17 queries (7.69%) never change, and, more generally, we note a quite uniform distribution of queries in between. In summary, we conclude that our query set has a good balance of queries whose results never change, queries whose results always change, and queries whose results sometimes change.

[Fig. 6. Number of queries with some results changed for each version.]

[Fig. 7. Number of versions in which some results changed for each query.]

Feature extraction: To compute the query features, we use Apache Jena to parse the query, extract the necessary statistics, and determine the presence of the relevant query operators. In order to extract statistics on the dynamics of individual predicates in the dynamic graph, a challenge is scalability, since we work with a total of 32.3 billion triples; hence we (1) sort each version of Wikidata, (2) apply a merge-sort iterator over each pair of consecutive versions (Gi and Gi+1) to detect triples that changed (Gi ⊕ Gi+1), writing a separate file for triples that were deleted (Gi − Gi+1) and added (Gi+1 − Gi), and (3) from these files, compute and sum the number of triples changed for each predicate between each pair of versions. Finally, in order to create the ground truth in terms of which results change between which versions, we index each version in Virtuoso, compute the results for each query against each version and write them to disk; we then compare consecutive pairs of results with a merge-sort.
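To make steps (1)–(3) concrete, the following is a minimal Python sketch of the per-predicate change statistics for one pair of consecutive versions, assuming each version is stored as a lexicographically sorted, duplicate-free N-Triples file; the file handling, the whitespace-based predicate extraction and the function names are illustrative assumptions rather than the exact code used in our experiments. Summing the per-pair ratios returned by delta_contribution over all consecutive pairs of versions yields ∆(G, p) for each predicate, which can then be aggregated (e.g., by the mean) over the predicates used in a query.

    from collections import Counter

    def predicate_of(ntriple_line: str) -> str:
        """Return the predicate (second term) of a line '<s> <p> <o> .'."""
        return ntriple_line.split(" ", 2)[1]

    def diff_versions(path_old: str, path_new: str):
        """Merge two lexicographically sorted N-Triples files and count, per
        predicate, the triples deleted (only in the old version), added (only
        in the new version) and present in the union of both versions."""
        deleted, added, union = Counter(), Counter(), Counter()
        with open(path_old) as f_old, open(path_new) as f_new:
            line_old, line_new = f_old.readline(), f_new.readline()
            while line_old or line_new:
                if line_old and (not line_new or line_old < line_new):
                    p = predicate_of(line_old)      # triple only in old version
                    deleted[p] += 1; union[p] += 1
                    line_old = f_old.readline()
                elif line_new and (not line_old or line_new < line_old):
                    p = predicate_of(line_new)      # triple only in new version
                    added[p] += 1; union[p] += 1
                    line_new = f_new.readline()
                else:                               # identical triple in both
                    union[predicate_of(line_old)] += 1
                    line_old, line_new = f_old.readline(), f_new.readline()
        return deleted, added, union

    def delta_contribution(deleted, added, union):
        """Per-predicate term #(Gi ⊕ Gi+1, p) / #(Gi ∪ Gi+1, p) of ∆(G, p)."""
        return {p: (deleted[p] + added[p]) / union[p] for p in union}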
6 Evaluation

Our experiments evaluate the (relative) quality of One Shot Change (OSC) predictions that can be made based on query features, predicate features, result features, as well as combinations thereof, using a selection of binary classifiers.

Setting: To build our final datasets, we must define a past interval to consider. With 23 versions, we must hold out at least one version to predict, leaving a maximum interval of 22 past versions to use for computing our features. However, the more past versions we consider for a prediction, the fewer examples we can generate; for example, if we consider 22 versions, we will only have one ground-truth label for each query. In the end, we opted to consider intervals of 3, 5, 9 and 17 previous versions; for example, in the case of an interval of three versions, we use (Gi−3, Gi−2, Gi−1) to predict changes in query results for the subsequent version (Gi), which allows us to predict OSC from G4 to G23 inclusive, providing 20 labeled examples per query. We thus have 4,420 labeled examples for intervals of 3, ranging down to 1,326 examples for intervals of 17. No feature explicitly indicates the versions given or the version to be predicted.

In the case of predicate dynamics, as discussed in Section 4, there are potentially many predicates per query, each with its own value for ∆(G, p); to reduce (and fix) dimensionality, we may apply an aggregate function to choose the min, max or mean value over all predicates. In preliminary experiments, we found the mean value to offer the best results, followed by the max value; hence in what follows we adopt the mean value in our experiments.

For OSC prediction, we test four binary classifiers: Decision Trees, Linear SVM, Naive Bayes, and Nearest Neighbors. We split the data to use 80% for training and 20% for testing. To avoid overfitting, we use stratified 5-fold cross-validation. We consider three types of features, as previously described – query features (q), predicate features (p) and result features (r) – considering historical data (only) from the fixed interval. We also consider combinations of these features: qp, qr, pr and qpr.

Results: Table 4 shows the F1-score for the predictions made considering the union of different combinations of the sets of features (q, p, r) and varying the window sizes (3, 5, 9, 17); for reference, we include a baseline that randomly guesses yes/no respecting the observed class distribution. Given the number of configurations presented, we only include F1-scores for reasons of space; however, we remark that the Precision and Recall scores were in general quite balanced.

Table 4. F1-score for tested classifiers considering different sets of features (q, p, r) and window sizes (3, 5, 9, 17)

w = 3
  Classifier          q      p      r      qp     qr     pr     qpr
  Random Baseline     0.499  0.489  0.502  0.489  0.495  0.495  0.516
  Decision Trees      0.497  0.519  0.781  0.497  0.569  0.706  0.600
  Naive Bayes         0.432  0.419  0.758  0.432  0.478  0.759  0.479
  Nearest Neighbors   0.505  0.517  0.776  0.509  0.679  0.781  0.680
  Linear SVM          0.596  0.372  0.758  0.593  0.775  0.762  0.774
  Best                0.596  0.519  0.781  0.593  0.775  0.781  0.774

w = 5
  Classifier          q      p      r      qp     qr     pr     qpr
  Random Baseline     0.480  0.496  0.499  0.505  0.495  0.497  0.499
  Decision Trees      0.485  0.514  0.782  0.494  0.581  0.711  0.603
  Naive Bayes         0.431  0.486  0.782  0.432  0.508  0.797  0.500
  Nearest Neighbors   0.517  0.535  0.795  0.525  0.676  0.790  0.673
  Linear SVM          0.600  0.384  0.784  0.590  0.787  0.789  0.787
  Best                0.600  0.535  0.795  0.590  0.787  0.797  0.787

w = 9
  Classifier          q      p      r      qp     qr     pr     qpr
  Random Baseline     0.497  0.505  0.493  0.483  0.497  0.511  0.515
  Decision Trees      0.456  0.517  0.793  0.480  0.668  0.735  0.651
  Naive Bayes         0.434  0.461  0.797  0.435  0.531  0.790  0.531
  Nearest Neighbors   0.514  0.528  0.787  0.517  0.653  0.800  0.648
  Linear SVM          0.578  0.397  0.797  0.571  0.795  0.800  0.796
  Best                0.578  0.528  0.797  0.571  0.795  0.800  0.796

w = 17
  Classifier          q      p      r      qp     qr     pr     qpr
  Random Baseline     0.486  0.494  0.482  0.512  0.524  0.511  0.490
  Decision Trees      0.511  0.511  0.827  0.469  0.712  0.712  0.710
  Naive Bayes         0.416  0.465  0.826  0.414  0.555  0.810  0.554
  Nearest Neighbors   0.548  0.480  0.826  0.554  0.744  0.820  0.731
  Linear SVM          0.554  0.438  0.826  0.560  0.819  0.831  0.815
  Best                0.554  0.511  0.827  0.560  0.819  0.831  0.815

The best results were obtained using the features from r, where, as can be expected, prediction performs best when knowledge of changes in the historical results of the input query is available. The features from sets q and p – which do not assume the availability of such information – obtain a significantly lower F1-score; the best result including r was with pr (F1 = 0.831) using Linear SVM and window size 17, while the best result without r was with q (F1 = 0.600) using Linear SVM and window size 5.

Furthermore, in Figure 8, we show that the size of the window influences the quality of the results when considering r features (blue), which improve as the window increases. However, there is no clear trend in the results for increasing window sizes when considering models without using r features (red). (As mentioned above, the Precision and Recall results are comparable.)

[Fig. 8. Best results for combinations of features in terms of F1-measure (left), Precision (center) and Recall (right) with increasing window sizes; blue indicates results using features from r (r, qr, pr and qpr), while red indicates results not using features from r (q, p, qp and the Baselines).]
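The evaluation setup can be sketched as follows with scikit-learn, assuming a feature matrix X (one row per labeled query/version example, with columns for the chosen q/p/r features) and a label vector y have already been built; the concrete classifier settings shown are illustrative defaults rather than the exact configuration behind Table 4.

    import numpy as np
    from sklearn.model_selection import (train_test_split, StratifiedKFold,
                                         cross_val_score)
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import LinearSVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import f1_score

    def evaluate_osc_classifiers(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
        """Evaluate the four classifiers plus a random baseline on an 80/20
        split, with stratified 5-fold cross-validation on the training part."""
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=seed)
        classifiers = {
            "Random Baseline": DummyClassifier(strategy="stratified", random_state=seed),
            "Decision Trees": DecisionTreeClassifier(random_state=seed),
            "Naive Bayes": GaussianNB(),
            "Nearest Neighbors": KNeighborsClassifier(),
            "Linear SVM": LinearSVC(),
        }
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        scores = {}
        for name, clf in classifiers.items():
            # F1 estimated by cross-validation on the training split ...
            cv_f1 = cross_val_score(clf, X_train, y_train, cv=cv, scoring="f1").mean()
            # ... and F1 on the held-out 20% test split.
            clf.fit(X_train, y_train)
            scores[name] = {"cv_f1": cv_f1,
                            "test_f1": f1_score(y_test, clf.predict(X_test))}
        return scores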
7 Summary, Conclusions and Future Work

In this paper we evaluate methods to predict whether or not (OSC) the results of an input query will change at a fixed point in the near future. More specifically, we evaluate a framework based on binary classifiers that accept features extracted from the query, from past versions of the data, and/or from the combination of both. Considering this framework, we hypothesize that there is a conceptual trade-off between the cost of computing features and their value for prediction: features extracted from queries alone are the most efficient to extract, not requiring historical data, but are quite coarse-grained for prediction; on the other hand, features based on changes in results over previous versions for an (unseen) input query are often the most costly to acquire, but offer fine-grained information for prediction; finally, features based on the dynamics of predicates offer a balance between the two, allowing historical data to be summarized into succinct statistics, thus offering more fine-grained information than static query features but more coarse-grained information than historical results.

We thus explore this trade-off experimentally using 23 versions and 221 user-contributed queries for Wikidata to form a gold standard dataset. We use this gold standard dataset to evaluate the trade-off identified between different types of features for OSC predictions. Our results show that the features based on historical changes to query results perform best (reaching F1 = 0.831 in the best configuration), whereas considering static query features and predicate dynamics alone is less competitive (reaching F1 = 0.600 in the best configuration) when historical results are not available or are prohibitively costly to compute.
Comparing query and predicate features, in fact, query features performed better, where predicate features alone only barely outperformed a random baseline.

In conclusion, our results show that having knowledge of the historical changes of the results of a query is important for improving the quality of OSC predictions using binary classifiers. However, in many scenarios, such knowledge is often not available or may not be practical to compute. Considering a real-world caching use-case, for example, historical changes in results may be maintained at relatively low cost for queries that are frequently repeated, but for a previously unseen query, computing results for past versions at runtime would incur a prohibitive cost. Hence, we identify an open research question: is it possible to estimate the dynamics of query results without relying on (costly) knowledge about historical changes of query results while staying competitive with the quality of prediction possible when such knowledge is available?

Regarding future work, our gold standard based on Wikidata could be extended to consider more versions spanning a longer period of time and/or more queries; a very large query dataset for Wikidata was recently published [17], which may be of use here. Building gold standards based on other datasets would also help to diversify the evaluation process. Concerning the prediction tasks themselves, one promising direction may be to apply a more fine-grained analysis of the query, considering (for example) the selectivity of particular triple patterns as well as their dynamicity. There may also be better ways to perform such predictions without relying on binary classifiers, but rather using more analytical methods. Finally, it would be interesting to investigate the effectiveness of these techniques in practice, developing caching systems, synchronization schedules, and other applications based on the evaluated predictions.

Material: We make supplementary material (queries, data, results, etc.) available at https://users.dcc.uchile.cl/~amoya/quweda2019/.

Acknowledgements: This work was supported by the Millennium Institute for Foundational Research on Data (IMFD), by Fondecyt Grant No. 1181896 and by CONICYT PFCHA/Doctorado Nacional/2017 - 21171070.

References

1. Akhtar, U., Amin, M.B., Lee, S.: Evaluating scheduling strategies in LOD based application. In: Asia-Pacific Network Operations and Management Symposium, APNOMS. IEEE (2017)
2. Alici, S., Altingövde, I.S., Ozcan, R., Cambazoglu, B.B., Ulusoy, Ö.: Adaptive time-to-live strategies for query result caching in web search engines. In: European Conference on IR Research, ECIR. Springer (2012)
3. Arenas, M., Pérez, J.: Querying semantic web data with SPARQL. In: Principles of Database Systems (PODS). ACM (2011)
4. Arenas, M., Ugarte, M.: Designing a query language for RDF: marrying open and closed worlds. ACM Trans. Database Syst. 42(4) (2017)
5. Dehghanzadeh, S., Parreira, J.X., Karnstedt, M., Umbrich, J., Hauswirth, M., Decker, S.: Optimizing SPARQL query processing on dynamic and static data based on query time/freshness requirements using materialization. In: Joint International Conference, JIST. Springer (2014)
6. Dividino, R.Q., Gottron, T., Scherp, A.: Strategies for efficiently keeping local linked open data caches up-to-date. In: International Semantic Web Conference, ISWC. Springer (2015)
7. Dividino, R.Q., Gottron, T., Scherp, A., Gröner, G.: From changes to dynamics: Dynamics analysis of linked open data sources. In: Extended Semantic Web Conference, PROFILES@ESWC 2014 (2014)
8. Dividino, R.Q., Kramer, A., Gottron, T.: An investigation of HTTP header information for detecting changes of linked open data sources. In: European Semantic Web Conference (ESWC). Springer (2014)
9. Dividino, R.Q., Scherp, A., Gröner, G., Gottron, T.: Change-a-LOD: Does the schema on the linked data cloud change or not? In: Workshop on Consuming Linked Data, COLD. CEUR-WS.org (2013)
10. Genestoux, J., Fitzpatrick, B., Slatkin, B., Atkins, M.: WebSub. W3C Recommendation (Jan 2018), https://www.w3.org/TR/websub/
11. González, L., Hogan, A.: Modelling dynamics in semantic web knowledge graphs with formal concept analysis. In: World Wide Web Conference. ACM (2018)
12. Ibáñez, L.D., Skaf-Molli, H., Molli, P., Corby, O.: Col-graph: Towards writable and scalable linked open data. In: International Semantic Web Conference (ISWC). pp. 325–340. Springer (2014)
13. Käfer, T., Abdelrahman, A., Umbrich, J., O'Byrne, P., Hogan, A.: Observing linked data dynamics. In: Extended Semantic Web Conference, ESWC. Springer (2013)
14. Kjernsmo, K.: A survey of HTTP caching implementations on the open semantic web. In: European Semantic Web Conference, ESWC. Springer (2015)
15. Knuth, M., Hartig, O., Sack, H.: Scheduling refresh queries for keeping results from a SPARQL endpoint up-to-date (short paper). In: Confederated International Conferences: CoopIS, C&TC, and ODBASE. Springer (2016)
16. Mader, C., Martin, M., Stadler, C.: Facilitating the exploration and visualization of linked data. In: Linked Open Data - Creating Knowledge Out of Interlinked Data - Results of the LOD2 Project. Springer (2014)
17. Malyshev, S., Krötzsch, M., González, L., Gonsior, J., Bielefeldt, A.: Getting the most out of Wikidata: Semantic technology usage in Wikipedia's knowledge graph. In: International Semantic Web Conference (ISWC) (2018)
18. Missier, P., Alper, P., Corcho, Ó., Dunlop, I., Goble, C.A.: Requirements and services for metadata management. IEEE Internet Computing (2007)
19. Neumaier, S., Umbrich, J.: Measures for assessing the data freshness in open data portals. In: Open and Big Data, OBD. IEEE Computer Society (2016)
20. Neumann, T., Moerkotte, G.: Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. In: International Conference on Data Engineering, ICDE. IEEE Computer Society (2011)
21. Nishioka, C., Scherp, A.: Temporal patterns and periodicity of entity dynamics in the linked open data cloud. In: Conference on Knowledge Capture, K-CAP. ACM (2015)
22. Nishioka, C., Scherp, A.: Information-theoretic analysis of entity dynamics on the linked open data cloud. In: International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES '16), ESWC. CEUR-WS.org (2016)
23. Nishioka, C., Scherp, A.: Keeping linked open data caches up-to-date by predicting the life-time of RDF triples. In: Conference on Web Intelligence. ACM (2017)
24. Passant, A., Mendes, P.N.: sparqlPuSH: Proactive notification of data updates in RDF stores using PubSubHubbub. In: Workshop on Scripting and Development for the Semantic Web. CEUR-WS.org (2010)
25. Pérez, J., Arenas, M., Gutiérrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. (2009)
26. Tramp, S., Frischmuth, P., Ermilov, T., Auer, S.: Weaving a social data web with semantic pingback. In: Knowledge Engineering and Management by the Masses, EKAW. Springer (2010)
27. Umbrich, J., Hausenblas, M., Hogan, A., Polleres, A., Decker, S.: Towards dataset dynamics: Change frequency of linked open data sources. In: WWW2010 Workshop on Linked Data on the Web, LDOW. CEUR-WS.org (2010)
28. Umbrich, J., Karnstedt, M., Hogan, A., Parreira, J.X.: Freshening up while staying fast: Towards hybrid SPARQL queries. In: Knowledge Engineering and Knowledge Management, EKAW. Springer (2012)
29. Umbrich, J., Karnstedt, M., Hogan, A., Parreira, J.X.: Hybrid SPARQL queries: Fresh vs. fast results. In: International Semantic Web Conference, ISWC. Springer (2012)
30. Umbrich, J., Karnstedt, M., Land, S.: Towards understanding the changing web: Mining the dynamics of linked-data sources and entities. In: LWA 2010 - Lernen, Wissen & Adaptivität, Workshop (2010)
31. Umbrich, J., Karnstedt, M., Parreira, J.X., Polleres, A., Hauswirth, M.: Linked data and live querying for enabling support platforms for web dataspaces. In: International Conference on Data Engineering, ICDE. IEEE (2012)
32. Umbrich, J., Mrzelj, N., Polleres, A.: Towards capturing and preserving changes on the web of data. In: European Semantic Web Conference, ESWC (2015)
33. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)