=Paper= {{Paper |id=Vol-1824/mepdaw_paper_1 |storemode=property |title=Versioned Triple Pattern Fragments: A Low-cost Linked Data Interface Feature for Web Archives |pdfUrl=https://ceur-ws.org/Vol-1824/mepdaw_paper_1.pdf |volume=Vol-1824 |authors=Ruben Taelman,Miel Vander Sande,Ruben Verborgh,Erik Mannens |dblpUrl=https://dblp.org/rec/conf/esws/TaelmanSVM17 }} ==Versioned Triple Pattern Fragments: A Low-cost Linked Data Interface Feature for Web Archives== https://ceur-ws.org/Vol-1824/mepdaw_paper_1.pdf
             Versioned Triple Pattern Fragments:
           A Low-cost Linked Data Interface Feature
                      for Web Archives

                            Ruben Taelman, Miel Vander Sande,
                            Ruben Verborgh, and Erik Mannens

                           Ghent University – imec – IDLab, Belgium
                               {firstname.lastname}@ugent.be




       Abstract. Linked Datasets typically evolve over time because triples can be
       removed from or added to datasets, which results in different dataset versions.
       While most attention is typically given to the latest dataset version, a lot of useful
       information is still present in previous versions and its historical evolution. In order
       to make this historical information queryable at Web scale, a low-cost interface is
       required that provides access to different dataset versions. In this paper, we add a
       versioning feature to the existing Triple Pattern Fragments interface for queries at,
       between and for versions, with an accompanying vocabulary for describing the
       results, metadata and hypermedia controls. This interface feature is an important
       step towards making versioned datasets queryable on the Web, with a
       low publication cost and effort.

       Keywords: Linked Data, versioning, Triple Pattern Fragments, Linked Data
       Fragments, SPARQL


1   Introduction

RDF [3] and SPARQL [8] are, respectively, methods for representing and querying Linked
Data. The RDF data model is atemporal, as RDF itself does not define a mechanism to
annotate information with dates. In practice, however, Linked Datasets can be dynamic
on different levels [10], with dataset-, schema-, and/or instance-level changes. These
changes open up new data analysis possibilities, such as looking up data at certain points
in time, requesting changes over time, or querying for the times at which data was valid.
Such queries are useful, for example, for analyzing concept drift or for managing
ontology evolution.
     While most data publishers currently only provide queryable access to the latest
version of their datasets, they often also make data dumps of previous versions
available, as is the case for DBpedia [1]. A survey on archiving Linked Open Data [16]
shows that there is a need for querying dynamic Linked Datasets at, between, and for
different versions.
     Temporal SPARQL language extensions [7, 11] at query endpoints would enable such
queries, but these endpoints would be at least as complex as SPARQL endpoints, which
makes it costly to offer them publicly [21]. In order to make this version querying
2                            Ruben Taelman et al.

possible at Web scale, a low-cost solution such as Triple Pattern Fragments [21] (TPF)
would be ideal. By itself, however, the TPF interface is, just like the RDF data model, atemporal.
In this paper, we introduce Versioned Triple Pattern Fragments (VTPF), a Web API
feature [20] that provides hypermedia controls for three versioning query types. For this,
we also introduce a new version vocabulary, which is graph-based and can represent
versions, changesets, and versionsets.
    In the next section, we introduce the requirements of the VTPF interface, after which
we present related work in Section 3. In Sections 4 and 5, we introduce the version
vocabulary and the VTPF interface, respectively. Finally, Section 6 discusses the
conclusions and future work.


2   Requirements

In order to enable queries over different dataset versions using the TPF framework, we
identified the following three tasks in previous work [14]:
 1. An extension of the TPF interface for versioning query types.
 2. A storage solution supporting the query atoms.
 3. A TPF client that is able to consume the TPF interface extension.
    In this work, we introduce VTPF as a solution for the first task. We design VTPF as an
interface feature rather than a separate API, in order to maximize interface compatibility
and reusability [20]. This means that the TPF interface remains unchanged, so that regular
TPF clients can continue to access it as any other TPF interface. The VTPF feature
merely provides extra possibilities for VTPF-supporting clients, which other clients can
ignore. At any later stage, new or existing interface features can be added
(such as other query types [17] or metadata [18]) while maintaining compatibility with
TPF and VTPF clients. This multi-dimensional interface extensibility works because the
interface explicitly describes its interaction and functionality through in-band hypermedia
controls [6].
    We list the following requirements for this interface, in order to make it version-aware
and self-descriptive:
 1. API support for triple pattern queries at versions, between versions, and for versions.
 2. HTML and RDF controls for the three versioning query types, allowing humans and
     machines to consume data.
 3. HTML and RDF metadata to provide information about dataset versions, with links to
     other relevant versions.
 4. HTML and RDF forms as hypermedia controls for automatic discoverability of the
     interface.


3   Related Work

Triple Pattern Fragments [21] (TPF) is a Linked Data interface that provides access to
Linked Datasets using triple pattern queries. By restricting access to these simple queries,
the publication cost of datasets using the TPF interface is significantly reduced when
                                                 Versioned Triple Pattern Fragments        3

compared to SPARQL query endpoints [4]. As part of this framework, a client-side SPARQL
engine has been developed that is able to consume data from TPF interfaces.
     The TPF interface is based on the REST principles and makes the API browsable
for machines. This is done by exposing hypermedia controls that provide declarative
instructions to clients on how they can consume data from this interface, similar to
how humans can understand and use visual HTML forms. TPF uses the Hydra Core
Vocabulary [9] to represent these self-descriptive controls, by providing a link template
explaining how triple patterns can be queried, as can be seen in Listing 1.1. Furthermore,
each TPF response contains relevant metadata about the dataset and the data fragment,
including an estimate of the total number of triples matching the triple pattern.
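As an illustration of how a client might fill in such a link template, the sketch below expands the RFC 6570 form-style query expressions used in Listing 1.1 (`{?s,p,o}`) and in the VTPF templates (`{&s,p,o,v}`). It is a minimal Python sketch under stated assumptions, not the TPF client's actual implementation: it assumes term values are already written in Hydra's explicit representation, percent-encodes them, and drops unbound variables. The function name `expand_template` and all example values are hypothetical.

```python
import re
from typing import Dict, Optional
from urllib.parse import quote

# Matches one form-style query ({?...}) or form-style continuation ({&...}) expression.
EXPRESSION = re.compile(r"\{([?&])([^}]*)\}")


def expand_template(template: str, bindings: Dict[str, Optional[str]]) -> str:
    """Expand form-style expressions; unbound variables are left out of the URL."""

    def replace(match: "re.Match[str]") -> str:
        operator, names = match.group(1), match.group(2).split(",")
        pairs = [
            # safe="" percent-encodes reserved characters such as ':' and '/'.
            f"{name}={quote(bindings[name], safe='')}"
            for name in names
            if bindings.get(name) is not None
        ]
        # An expression without any bound variable expands to the empty string.
        return operator + "&".join(pairs) if pairs else ""

    return EXPRESSION.sub(replace, template)


print(expand_template("http://fragments.dbpedia.org/2014{?s,p,o}",
                      {"s": "http://dbpedia.org/resource/Ghent"}))
```

Because unbound variables are simply dropped, the same function yields the plain fragment URL when nothing is bound, and it can fill in the version variable `v` of the VM template without further changes.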
     In previous work [14], we explored the possibilities for adding versioning support to
the TPF interface for different query atoms at the storage, interface, and client level. In this
paper, we focus on the following query atoms [5] related to versioning:
  – Version Materialization (VM) queries data at a single version.
  – Delta Materialization (DM) queries differences between two versions.
  – Version Queries (VQ) annotate query results with the versions for which they are valid.
These three query atoms correspond to realistic versioning queries, and each of them can
be used to evaluate other, more complex versioning queries [5, 14].
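To make the three query atoms concrete, the following Python sketch gives a naive reference semantics over an archive held in memory as one fully materialized set of triples per version. This representation is an assumption for illustration only (it resembles an individual-copies storage strategy and says nothing about efficient storage); a triple pattern is a 3-tuple in which `None` stands for a variable, and the names `vm`, `dm`, and `vq` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# None in a pattern position acts as a variable that matches any term.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]
Archive = List[Set[Triple]]  # archive[i] holds all triples of version i


def matches(triple: Triple, pattern: Pattern) -> bool:
    return all(p is None or p == t for t, p in zip(triple, pattern))


def vm(archive: Archive, version: int, pattern: Pattern) -> Set[Triple]:
    """Version Materialization: matching triples in a single version."""
    return {t for t in archive[version] if matches(t, pattern)}


def dm(archive: Archive, start: int, end: int,
       pattern: Pattern) -> Tuple[Set[Triple], Set[Triple]]:
    """Delta Materialization: (additions, deletions) from version start to version end."""
    before, after = vm(archive, start, pattern), vm(archive, end, pattern)
    return after - before, before - after


def vq(archive: Archive, pattern: Pattern) -> Dict[Triple, List[int]]:
    """Version Query: each matching triple with the versions in which it occurs."""
    annotated: Dict[Triple, List[int]] = {}
    for version, triples in enumerate(archive):
        for t in triples:
            if matches(t, pattern):
                annotated.setdefault(t, []).append(version)
    return annotated
```

In this sketch, DM only compares the two end points, so a triple that is added and removed again somewhere between `start` and `end` does not appear in the result.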
     Two existing TPF extensions add a form of versioning support to the interface. The
TPF Memento extension [19] supports VM queries. It does this based on the Memento
protocol [12], which allows clients to select versions of HTTP resources through content
negotiation. TPF-QS [13] supports VQ queries by annotating triples
with time intervals. Neither approach is generic enough to support VM, DM,
and VQ queries. Furthermore, neither introduces version-oriented hypermedia
controls, which are required for generic version feature discovery.
     While VM queries return a regular list of triples, DM and VQ queries return triples with
annotations. For DM queries, a method for representing deltas between two versions
is required. A theoretical diff ontology [2] was explored for representing RDF graph
deltas, which directly links difference graphs to other graphs. As this ontology is only
theoretical, no practically usable implementation exists. The Talis Changeset vocabulary [15]
uses triple-level changesets that can contain additions and deletions, which requires
reification and therefore has semantic issues. VQ queries return a combined view over
all versions and annotate each triple with a list of versions, which is related to how
TPF-QS [13] compares different strategies for annotating triples with timestamps.


4   Vocabulary

As stated in our requirements, the VTPF interface must expose hypermedia controls
and version metadata in its RDF responses. Furthermore, the three query atoms require
different result structures. For these hypermedia controls, metadata, and result annotations,
we introduce the version vocabulary, inspired by the existing but limited vocabularies
discussed in Section 3. This vocabulary is available at http://w3id.org/version/ontology,
for which we use the ver: prefix.




    Fig. 1: VM query HTML form and results.




    Fig. 2: DM query HTML form and results.




    Fig. 3: VQ query HTML form and results.

[Figure: ver:Version (rdf:subClassOf void:Dataset); ver:ChangeSet, with subclasses ver:ChangeSetAdditions and ver:ChangeSetDeletions; ver:VersionSet. Properties: ver:relatedVersion (subproperties ver:beforeVersion, ver:afterVersion, ver:firstVersion, ver:latestVersion); ver:changeSetStart and ver:changeSetEnd (from ver:ChangeSet to ver:Version); ver:versionSetContains (from ver:VersionSet to ver:Version).]

            Fig. 4: Overview of the classes and properties in the version vocabulary.


     Fig. 4 shows an overview of all available classes and properties in this new ontology,
which consists of the version, changeset, and versionset concepts. The latter two are
graph-based, meaning that triples can be part of a changeset or versionset graph.
     The basis of this ontology is ver:Version, which is a subclass of void:Dataset.
This means that each version of a dataset is a dataset by itself. For indicating the version
of any resource, we introduce the ver:relatedVersion property. This property has
several subproperties for referencing versions before or after a version, and for
referencing the first or latest known version of a dataset.
     A second main class is ver:ChangeSet, which indicates a set of added or removed
triples between two versions. Each changeset must indicate the version range over which
it is defined, using the ver:changeSetStart and ver:changeSetEnd properties. These
properties both reference a ver:Version and are inclusive. Each changeset contains either
a list of added triples or a list of removed triples, and multiple changesets can exist
between two versions. The ver:ChangeSetAdditions and ver:ChangeSetDeletions
classes indicate these two types of changesets, respectively. Since multiple changesets can
exist, additions and deletions are always listed separately.
     Finally, the ver:VersionSet class indicates a collection of triples that all exist in the
same dataset versions. These versions are indicated using the ver:versionSetContains
property, each value of which refers to a single ver:Version.


5   Interface
In this section, we introduce the three query atoms within the VTPF interface. We finish
the section with the introduction of a live VTPF interface.
    In the following subsections, we introduce new HTML and RDF forms for these query
atoms, together with methods for representing their results. These three new forms are
added to the existing TPF interface as an extension. Both the HTML and RDF forms have
toggles for selecting the query atom: in HTML, this is represented as a radio button, and in
RDF, as a versionType parameter. In order to remain backwards-compatible
with clients that only support the regular TPF interface, we keep the original TPF form, but
internally transform it into a VM query that selects triples from the latest dataset version.
    As discussed in previous work [14], each query atom has a certain complexity when it
is evaluated using a certain storage policy. Depending on this storage policy, the publisher
may want to restrict the query atoms that are available through the VTPF interface.
For example, a strategy with individual copies per version may be efficient for VM, but
slow for DM. For the remainder of this paper, we consider a storage solution that is able
to handle the three query atoms efficiently, at least for triple patterns, and we therefore
enable all three query atoms at the interface by default.
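The sketch below shows one way a server could map incoming query parameters onto a query atom, including the backwards-compatible fallback to a VM query on the latest version and a publisher-chosen set of enabled atoms. It is an illustration under assumptions: the `versionType` values and the version parameter `v` are taken from the templates in Listings 1.2, 1.4, and 1.7, whereas the DM range parameters `start` and `end` and the function `resolve_request` are hypothetical names, chosen here so that they cannot clash with the subject parameter `s`.

```python
from typing import Dict, FrozenSet, Optional

# versionType values as they appear in the VTPF link templates.
VM, DM, VQ = "VersionMaterialized", "DeltaMaterialized", "Version"
ALL_ATOMS: FrozenSet[str] = frozenset({VM, DM, VQ})


def resolve_request(
    params: Dict[str, str],
    latest_version: str,
    enabled: FrozenSet[str] = ALL_ATOMS,
) -> Dict[str, Optional[str]]:
    """Turn TPF/VTPF query parameters into a query description for the storage layer."""
    # Requests without versionType come from regular TPF clients:
    # treat them as VM queries against the latest version.
    atom = params.get("versionType", VM)
    if atom not in ALL_ATOMS:
        raise ValueError(f"unknown query atom: {atom}")
    if atom not in enabled:
        raise ValueError(f"query atom disabled by the publisher: {atom}")
    request: Dict[str, Optional[str]] = {
        "atom": atom,
        "s": params.get("s"),
        "p": params.get("p"),
        "o": params.get("o"),
    }
    if atom == VM:
        # Assumption: a VM query without an explicit version targets the latest one.
        request["version"] = params.get("v", latest_version)
    elif atom == DM:
        if "start" not in params or "end" not in params:
            raise ValueError("DM queries need both a start and an end version")
        request["start"], request["end"] = params["start"], params["end"]
    return request
```

A publisher whose storage only handles VM efficiently could, under this sketch, pass `enabled=frozenset({VM})`, so that DM and VQ requests are rejected instead of being evaluated slowly.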


5.1   Version Materialized

In order to enable triple pattern queries against specific dataset versions (VM), we added
a version parameter to both the RDF and HTML forms.
     Fig. 1 shows an example of the HTML form for querying triples within a certain version.
This allows clients to select the version to query from a list of all available versions.
     Listing 1.2 shows an equivalent hypermedia control using the Hydra Core
Vocabulary [9]. It is almost identical to the existing TPF control; only a version
property was added, which clients can fill in. Listing 1.3 shows additional
metadata that is added to the results. This metadata allows clients to determine the version
of the dataset that is currently being queried, indicated using ver:relatedVersion.
Furthermore, the total number of currently available versions is indicated. Finally, links
to other versions are provided, which allow clients to determine the next, previous, first,
and last versions of a dataset, and improve the discoverability of other versions.
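The navigation metadata of Listing 1.3 can be derived from an ordered list of versions, as in the Python sketch below. It assumes a linear version history with versions identified by opaque strings; the class `VersionLinks` and the function `version_links` are hypothetical. The sketch deliberately uses the neutral field names `previous` and `following` instead of committing to a direction for ver:beforeVersion and ver:afterVersion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VersionLinks:
    current: str              # ver:relatedVersion
    count: int                # ver:versionCount
    first: str                # ver:firstVersion
    latest: str               # ver:latestVersion
    previous: Optional[str]   # neighbouring version before the current one, if any
    following: Optional[str]  # neighbouring version after the current one, if any


def version_links(versions: List[str], current: str) -> VersionLinks:
    """Compute navigation links for a VM response (versions ordered oldest first)."""
    index = versions.index(current)  # raises ValueError for unknown versions
    return VersionLinks(
        current=current,
        count=len(versions),
        first=versions[0],
        latest=versions[-1],
        previous=versions[index - 1] if index > 0 else None,
        following=versions[index + 1] if index + 1 < len(versions) else None,
    )
```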


5.2   Delta Materialized

The second main versioned query atom is for querying the differences between two
versions (DM). For this, we add two parameters to the RDF and HTML forms, for
selecting the start and end version, respectively.
    Fig. 2 shows the HTML form for selecting the start and end version. Each triple is now
annotated with a green plus “+” or a red minus “-”. “+” indicates that the triple was added
somewhere between the start and end versions; “-” indicates the opposite, i.e., that the
triple was removed.
    In Listing 1.4, the equivalent hypermedia controls are shown. Listing 1.5 shows
an example of a DM query result. For DM queries, each triple is contained in a
ver:ChangeSetAdditions or ver:ChangeSetDeletions graph, which indicate whether
the triples are additions or deletions, respectively, for the given version range. These two
graphs are annotated with this version range in the metadata graph, as shown in Listing 1.6.
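The sketch below renders a DM result in the shape of Listings 1.5 and 1.6: one blank-node-labelled TriG graph per change type, plus a metadata graph that types each changeset and records its version range. It is a simplified Python illustration, assuming that terms are already serialized in N-Triples syntax and that the `ver:` prefix is declared elsewhere, as in the listings; `graph_block` and `dm_to_trig` are hypothetical helpers, and empty changesets are omitted.

```python
from typing import Dict, List, Set, Tuple

# Terms are assumed to be pre-serialized, e.g. "<http://ex/s>" or '"49.0000000"'.
Triple = Tuple[str, str, str]


def graph_block(name: str, triples: Set[Triple]) -> str:
    """Serialize one TriG graph, sorting triples for stable output."""
    lines = [f"  {s} {p} {o} ." for s, p, o in sorted(triples)]
    return "\n".join([f"{name} {{", *lines, "}"])


def dm_to_trig(before: Set[Triple], after: Set[Triple],
               start: str, end: str, metadata_graph: str) -> str:
    """Render the difference between two versions as changeset graphs plus metadata."""
    changesets: Dict[str, Tuple[str, Set[Triple]]] = {
        "_:changeSetAdditions": ("ver:ChangeSetAdditions", after - before),
        "_:changeSetDeletions": ("ver:ChangeSetDeletions", before - after),
    }
    blocks: List[str] = []
    annotations: Set[Triple] = set()
    for label, (cls, triples) in changesets.items():
        if not triples:
            continue  # do not emit empty changeset graphs
        blocks.append(graph_block(label, triples))
        # Type the changeset and record its inclusive version range.
        annotations |= {
            (label, "a", cls),
            (label, "ver:changeSetStart", start),
            (label, "ver:changeSetEnd", end),
        }
    blocks.append(graph_block(metadata_graph, annotations))
    return "\n".join(blocks)
```

Blank node labels in TriG are scoped to the whole document, so `_:changeSetAdditions` in the metadata graph denotes the same node that names the additions graph.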


5.3   Version Query

Finally, the VQ query atom selects triples over all versions, and annotates each triple with
the versions in which it is present. This query atom requires no additional parameters
in the RDF and HTML forms; it only needs a flag indicating the query atom. The
corresponding RDF form is shown in Listing 1.7.
    The HTML view for this query atom is shown in Fig. 3, in which all triples are annotated
with their list of versions. Listing 1.8 shows an example of RDF results for VQ queries. Each
triple is contained in a versionset graph. These versionset graphs are annotated with
the versions for which they are applicable, as can be seen in Listing 1.9.
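Versionsets can be formed by grouping together all triples that occur in exactly the same set of versions. The Python sketch below does this for an in-memory archive (a mapping from version IRI to the triples matching the queried pattern, an assumption for illustration) and renders the result in the shape of Listings 1.8 and 1.9, with one ver:versionSetContains statement per version. The helpers `group_versionsets` and `vq_to_trig` are hypothetical, terms are assumed to be pre-serialized, and blank node labels are numbered in a deterministic but otherwise arbitrary order.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]
VersionSet = Tuple[FrozenSet[str], Set[Triple]]


def group_versionsets(archive: Dict[str, Set[Triple]]) -> List[VersionSet]:
    """Group triples that occur in exactly the same versions."""
    versions_of: Dict[Triple, Set[str]] = defaultdict(set)
    for version, triples in archive.items():
        for triple in triples:
            versions_of[triple].add(version)
    groups: Dict[FrozenSet[str], Set[Triple]] = defaultdict(set)
    for triple, versions in versions_of.items():
        groups[frozenset(versions)].add(triple)
    # Deterministic order, so that blank node labels are stable across calls.
    return sorted(groups.items(), key=lambda group: sorted(group[0]))


def vq_to_trig(archive: Dict[str, Set[Triple]], metadata_graph: str) -> str:
    """Render a VQ result as versionset graphs plus an annotating metadata graph."""
    blocks: List[str] = []
    annotations: List[str] = []
    for index, (versions, triples) in enumerate(group_versionsets(archive)):
        label = f"_:versionSet{index}"
        body = [f"  {s} {p} {o} ." for s, p, o in sorted(triples)]
        blocks.append("\n".join([f"{label} {{", *body, "}"]))
        annotations.append(f"  {label} a ver:VersionSet .")
        # Each ver:versionSetContains value refers to a single version.
        annotations += [f"  {label} ver:versionSetContains {v} ." for v in sorted(versions)]
    blocks.append("\n".join([f"{metadata_graph} {{", *annotations, "}"]))
    return "\n".join(blocks)
```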

 {
   hydra:search [
    hydra:template "http://fragments.dbpedia.org/2014{?s,p,o}";
    hydra:variableRepresentation hydra:ExplicitRepresentation;
    hydra:mapping [ hydra:variable "s"; hydra:property rdf:subject   ],
                  [ hydra:variable "p"; hydra:property rdf:predicate ],
                  [ hydra:variable "o"; hydra:property rdf:object    ]
  ].
}
               Listing 1.1: TPF query form using the Hydra Core Vocabulary.

 {
   hydra:search [
    hydra:template "http://versioned.linkeddatafragments.org/bear?versionType=VersionMaterialized{&s,p,o,v}";
    hydra:variableRepresentation hydra:ExplicitRepresentation;
    hydra:mapping [ hydra:variable "s"; hydra:property rdf:subject        ],
                  [ hydra:variable "p"; hydra:property rdf:predicate      ],
                  [ hydra:variable "o"; hydra:property rdf:object         ],
                  [ hydra:variable "v"; hydra:property ver:relatedVersion ]
  ].
}
          Listing 1.2: The VM query RDF form for triples in a single version.

 {
  
    a ver:Version;
    ver:relatedVersion ;
    ver:versionCount "9"^^xsd:integer;
    ver:afterVersion ;
    ver:firstVersion ;
    ver:latestVersion 
}
         Listing 1.3: VM query result metadata that contains links to other versions.

 {
   hydra:search [
    hydra:template "http://versioned.linkeddatafragments.org/bear?versionType=DeltaMaterialized{&s,p,o,s,e}";
    hydra:variableRepresentation hydra:ExplicitRepresentation;
    hydra:mapping [ hydra:variable "s"; hydra:property rdf:subject        ],
                  [ hydra:variable "p"; hydra:property rdf:predicate      ],
                  [ hydra:variable "o"; hydra:property rdf:object         ],
                  [ hydra:variable "s"; hydra:property ver:changeSetStart ],
                  [ hydra:variable "e"; hydra:property ver:changeSetEnd   ]
  ].
}
        Listing 1.4: The DM query RDF form for triple differences over two versions.



_:changeSetDeletions {
  ol:node0
    a ;
    a ;
     "mystylequicktip";
     .
}
_:changeSetAdditions {
  ol:node0
     "49.0000000";
     "32.0000000".
}
                      Listing 1.5: DM query RDF results in changesets.




<http://versioned.linkeddatafragments.org/bear#metadata> {
  _:changeSetAdditions a ver:ChangeSetAdditions;
    ver:changeSetStart ;
    ver:changeSetEnd   .

    _:changeSetDeletions a ver:ChangeSetDeletions;
      ver:changeSetStart ;
      ver:changeSetEnd   .

     a ver:Version;
      ver:beforeVersion ;
      ver:firstVersion ;
      ver:latestVersion .

     a ver:Version;
      ver:afterVersion ;
      ver:firstVersion ;
      ver:latestVersion 
}
            Listing 1.6: DM query RDF result metadata with changeset annotations.

 {
   hydra:search [
    hydra:template "http://versioned.linkeddatafragments.org/bear?versionType=Version{&s,p,o}";
    hydra:variableRepresentation hydra:ExplicitRepresentation;
    hydra:mapping [ hydra:variable "s"; hydra:property rdf:subject   ],
                  [ hydra:variable "p"; hydra:property rdf:predicate ],
                  [ hydra:variable "o"; hydra:property rdf:object    ]
  ].
}
              Listing 1.7: The VQ query RDF form for version-annotated triples.

_:versionSet0 {
  ol:node0 a 
     "49.0000000";
     "32.0000000".
}

_:versionSet1 {
  ol:node0 a ;
     "mystylequicktip";
     .
}
                       Listing 1.8: VQ query RDF results in versionsets.

 {
  _:versionSet0 a ver:VersionSet;
    ver:versionSetContains .

    _:versionSet1 a ver:VersionSet;
      ver:versionSetContains .

     a ver:Version;
      ver:beforeVersion ;
      ver:firstVersion ;
      ver:latestVersion .

     a ver:Version;
      ver:afterVersion ;
      ver:firstVersion ;
      ver:latestVersion 
}
            Listing 1.9: VQ query RDF result metadata with versionset annotations.

5.4   Live Demo
As an example usage of the VTPF interface, we set up a live entrypoint at
http://versioned.linkeddatafragments.org/bear. This interface exposes access to the
first ten versions of the RDF archive provided by the BEAR benchmark [5], which consists
of triples compiled from the Dynamic Linked Data Observatory¹. These first ten versions
contain more than 72 million unique triples, with approximately 30 million triples per
version. As backend storage solution, we use OSTRICH², which supports the three query
atoms for triple patterns.


6     Conclusions

In this paper, we introduced an interface that exposes query access to different versions
of a dataset, for triples at, between, and for different versions. This fulfills our
first requirement, API support for these queries. For requirements 2 and 3, we provide
HTML and RDF controls and metadata that allow both humans and machines to understand
the interface and the data. Finally, for requirement 4, we used hypermedia controls to allow
machines to automatically discover how to use these query methods. As VTPF is a minimal
extension to the existing TPF interface, it allows data owners to expose their different dataset
versions on the Web at a low cost. This solves the first task towards making different
versions of RDF datasets queryable on the Web with a low publication cost.
     In future work, we intend to provide solutions for the other tasks [14] towards making
different dataset versions queryable using the TPF framework. We will extend the existing
TPF client so that it can understand and consume data from the VTPF interface
extension for the three query atoms. For this, a variant of the SPARQL query language will
be needed, such as T-SPARQL [7] or SPARQL-ST [11]. Furthermore, the storage solution
that serves as the backend of this VTPF interface will be further developed and investigated.
Finally, this interface, together with the solutions for the other tasks, will be evaluated, so
that the publication and consumption cost of the VTPF approach can be determined.
     An interface like VTPF lowers the barrier for data owners to publish different versions
of their data, which we expect to lead to the availability of more Linked Datasets at different
versions on the Web. This opens up new querying possibilities at, between, and for
versions in domains that require such access, such as the analysis of
concept drift.




 1 http://swse.deri.org/dyldo/
 2 OSTRICH is a work in progress and will be introduced as a triple store with versioning support in
   future work.

References
 1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A nucleus
    for a Web of open data. In: The semantic web, pp. 722–735. Springer (2007)
 2. Berners-Lee, T., Connolly, D.: Delta: an ontology for the distribution of differences between
    RDF graphs. World Wide Web, http://www.w3.org/DesignIssues/Diff 4(3), 4–3 (2004)
 3. Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1: Concepts and abstract syntax. Recommendation,
    W3C (Feb 2014), http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
 4. Feigenbaum, L., Todd Williams, G., Grant Clark, K., Torres, E.: SPARQL 1.1 protocol.
    Recommendation, W3C (Mar 2013), http://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/
 5. Fernández, J.D., Umbrich, J., Polleres, A., Knuth, M.: Evaluating query and storage strategies
    for RDF archives. In: Proceedings of the 12th International Conference on Semantic Systems
    (2016)
 6. Fielding, R.T.: REST APIs must be hypertext-driven (Oct 2008),
    http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
 7. Grandi, F.: T-SPARQL: A TSQL2-like temporal query language for RDF. In: ADBIS (Local
    Proceedings), pp. 21–30. Citeseer (2010)
 8. Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 query language. Recommendation,
    W3C (Mar 2013), http://www.w3.org/TR/2013/REC-sparql11-query-20130321/
 9. Lanthaler, M., Gütl, C.: Hydra: A vocabulary for hypermedia-driven Web APIs. In: Proceedings
    of the 6th Workshop on Linked Data on the Web (May 2013)
10. Meimaris, M., Papastefanatos, G., Viglas, S., Stavrakas, Y., Pateritsas, C., Anagnostopoulos,
    I.: A query language for multi-version data Web archives. Expert Systems 33(4), 383–404
    (2016)
11. Perry, M., Jain, P., Sheth, A.P.: SPARQL-ST: Extending SPARQL to support spatiotemporal
    queries. In: Geospatial semantics and the semantic web, pp. 61–86. Springer (2011)
12. Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L.L., Ainsworth, S., Shankar,
    H.: Memento: Time travel for the Web. arXiv preprint arXiv:0911.1112 (2009)
13. Taelman, R., Verborgh, R., Colpaert, P., Mannens, E.: Continuous client-side query evaluation
    over dynamic Linked Data. In: The Semantic Web: ESWC 2016 Satellite Events (May 2016)
14. Taelman, R., Verborgh, R., Mannens, E.: Exposing RDF archives using Triple Pattern Fragments.
    In: Proceedings of the 20th International Conference on Knowledge Engineering and
    Knowledge Management: Posters and Demos (Nov 2016)
15. Tunnicliffe, S., Davis, I.: Changeset vocabulary (2005), http://vocab.org/changeset/
16. Umbrich, J., Decker, S., Hausenblas, M., Polleres, A., Hogan, A.: Towards dataset dynamics:
    Change frequency of Linked Open Data sources. 3rd International Workshop on Linked Data
    on the Web (LDOW) (2010)
17. Van Herwegen, J., De Vocht, L., Verborgh, R., Mannens, E., Van de Walle, R.: Substring
    filtering for low-cost Linked Data interfaces. In: The Semantic Web – ISWC 2015. Lecture
    Notes in Computer Science, vol. 9366, pp. 128–143. Springer (Oct 2015)
18. Vander Sande, M., Verborgh, R., Van Herwegen, J., Mannens, E., Van de Walle, R.: Opportunistic
    Linked Data querying through approximate membership metadata. In: The Semantic
    Web – ISWC 2015. Lecture Notes in Computer Science, vol. 9366, pp. 92–110. Springer (Oct
    2015)
19. Verborgh, R.: Querying history with Linked Data (2016),
    http://ruben.verborgh.org/blog/2016/06/22/querying-history-with-linked-data/
20. Verborgh, R., Dumontier, M.: A Web API ecosystem through feature-based reuse (2016),
    http://arxiv.org/abs/1609.07108, under submission
21. Verborgh, R., Vander Sande, M., Hartig, O., Van Herwegen, J., De Vocht, L., De Meester,
    B., Haesendonck, G., Colpaert, P.: Triple Pattern Fragments: a low-cost knowledge graph
    interface for the Web. Journal of Web Semantics 37–38 (Mar 2016)