Versioned Triple Pattern Fragments: A Low-cost Linked Data Interface Feature for Web Archives

Ruben Taelman, Miel Vander Sande, Ruben Verborgh, and Erik Mannens

Ghent University – imec – IDLab, Belgium
{firstname.lastname}@ugent.be

Abstract. Linked Datasets typically evolve over time because triples can be removed from or added to datasets, which results in different dataset versions. While most attention is typically given to the latest dataset version, a lot of useful information is still present in previous versions and in the dataset's historical evolution. In order to make this historical information queryable at Web scale, a low-cost interface is required that provides access to different dataset versions. In this paper, we add a versioning feature to the existing Triple Pattern Fragments interface for queries at, between and for versions, with an accompanying vocabulary for describing the results, metadata and hypermedia controls. This interface feature is an important step in the direction of making versioned datasets queryable on the Web, with a low publication cost and effort.

Keywords: Linked Data, versioning, Triple Pattern Fragments, Linked Data Fragments, sparql

1 Introduction

rdf [3] and sparql [8] are methods for respectively representing and querying Linked Data. The rdf data model is atemporal, as rdf itself does not define a mechanism to annotate information with dates. In practice, Linked Datasets can, however, be dynamic on different levels [10], with dataset, schema, and/or instance-level changes. These changes open up new data analysis possibilities, such as looking up data at certain points in time, requesting changes over time, or querying for the times at which data was valid. These querying methods are useful, for example, for analyzing concept drift or managing ontology evolution.
While most data publishers currently only provide queryable access to the latest version of their datasets, data dumps of previous dataset versions are also often made available, as is the case for the DBpedia dataset [1]. A survey on archiving Linked Open Data [16] shows that there is a need for querying over dynamic Linked Datasets at, between and for different versions. Temporal sparql language extensions [7, 11] at query endpoints would enable such queries, but these endpoints would have at least the same complexity as sparql, making it costly to make these endpoints public [21]. In order to make this version querying possible at Web scale, a low-cost solution such as the Triple Pattern Fragments [21] (tpf) interface would be ideal. By itself, the tpf interface is, just like the rdf data model, atemporal.

In this paper, we introduce Versioned Triple Pattern Fragments (vtpf), a Web API feature [20] that provides hypermedia controls for three versioning query types. For this, we also introduce a new version vocabulary, which is graph-based and allows versions, changesets and versionsets to be represented. In the next section, we introduce the requirements of the vtpf interface, after which we present related work in Section 3. In Sections 4 and 5, we respectively introduce the version vocabulary and the vtpf interface. Finally, in Section 6, the conclusions and future work are discussed.

2 Requirements

In order to enable queries over different dataset versions using the tpf framework, we introduced the following three requirements in previous work [14]:

1. An extension of the tpf interface for versioning query types.
2. A storage solution supporting the query atoms.
3. A tpf client that is able to consume the tpf interface extension.

In this work, we introduce vtpf as a solution for the first task. We design vtpf as an interface feature rather than a separate api, in order to maximize interface compatibility and reusability [20].
This means that the tpf interface remains as it is, such that regular tpf clients can continue to access it as any other tpf interface. By adding the vtpf feature, we merely provide extra possibilities for vtpf-supporting clients, which can be ignored by others. At any later stage, new or existing interface features can be added (such as other query types [17] or metadata [18]), while maintaining compatibility with tpf and vtpf clients. This multi-dimensional interface extensibility works because the interface explicitly describes its interaction and functionality through in-band hypermedia controls [6].

We list the following requirements for this interface, in order to make it version-aware and self-descriptive:

1. api support for triple pattern queries at versions, between versions, and for versions.
2. html and rdf controls for the three versioning query types, allowing humans and machines to consume data.
3. html and rdf metadata to provide information about dataset versions, with links to other relevant versions.
4. html and rdf forms as hypermedia controls for automatic discoverability of the interface.

3 Related Work

Triple Pattern Fragments [21] (tpf) is a Linked Data interface that provides access to Linked Datasets using triple pattern queries. By restricting access to these simple queries, the publication cost of datasets using the tpf interface is significantly reduced when compared to sparql query endpoints [4]. As part of this framework, a client-side sparql engine has been developed that is able to consume data from tpf interfaces. The tpf interface is based on the rest principles and makes the api browsable for machines. This is done by exposing hypermedia controls that provide declarative instructions to clients on how they can consume data from this interface, similar to how humans can understand and use visual html forms.
tpf uses the Hydra Core Vocabulary [9] to represent these self-descriptive controls, by providing a link template explaining how triple patterns can be queried, as can be seen in Listing 1.1. Furthermore, each tpf response contains relevant metadata about the dataset and the data fragment, including an estimate of the total number of triples for that triple pattern.

In previous work [14], we explored the possibilities for adding versioning support to the tpf interface for different query atoms on storage, interface and client level. In this paper, we focus on the following query atoms [5] related to versioning:

– Version Materialization (vm) queries data at a single version.
– Delta Materialization (dm) queries differences between two versions.
– Version Queries (vq) annotate query results with versions for which they are valid.

These three query atoms correspond to realistic versioning queries, and each of them can be used to evaluate other, more complex versioning queries [5, 14].

Two tpf extensions exist that add a kind of versioning feature to the interface. The tpf Memento extension [19] supports vm queries. It does this based on the Memento protocol [12], which allows clients to select versions of http resources based on content negotiation. tpf-qs [13] is an approach that supports vq queries by annotating triples with time intervals. Neither of these approaches is generic enough to support vm, dm and vq queries. Furthermore, neither introduces version-oriented hypermedia controls, which are required for generic version feature discovery.

While vm queries return a regular list of triples, dm and vq return triples with annotations. For dm queries, a method for representing deltas between two versions is required. A theoretical diff ontology [2] was explored for representing rdf graph deltas, which directly links difference graphs to other graphs. As this is only a theoretical ontology, no practically usable implementation exists.
The Talis Changeset vocabulary [15] uses triple-level changesets that can contain additions and deletions, which requires reification and therefore has semantic issues. vq queries return a combined view over all versions, and annotate each triple with a list of versions, which is related to how tpf-qs [13] compares different strategies for annotating triples with timestamps.

4 Vocabulary

As stated in our requirements, the vtpf interface must expose hypermedia controls and version metadata in its rdf responses. Furthermore, the three query atoms require different result structures. For these hypermedia controls, metadata and result annotation, we introduce the version vocabulary, inspired by the existing, but limited vocabularies discussed in Section 3. This vocabulary is available at http://w3id.org/version/ontology, for which we use the ver: prefix.

Fig. 1: vm query html form and results.
Fig. 2: dm query html form and results.
Fig. 3: vq query html form and results.

[Figure 4 diagram. Classes: void:Dataset, ver:Version, ver:ChangeSet, ver:ChangeSetAdditions, ver:ChangeSetDeletions, ver:VersionSet. Properties: ver:relatedVersion, ver:beforeVersion, ver:afterVersion, ver:firstVersion, ver:latestVersion, ver:changeSetStart, ver:changeSetEnd, ver:versionSetContains.]
Fig. 4: Overview of the classes and properties in the version vocabulary.

Fig. 4 shows an overview of all available classes and properties in this new ontology, which consists of the version, changeset and versionset concepts. These last two sets are graph-based, meaning that triples can be part of a changeset or versionset graph.

The basis of this ontology is a ver:Version, which is a subclass of a void:Dataset. This means that each version of a dataset is a dataset by itself. For indicating the version of any resource, we introduce the ver:relatedVersion property.
This property has several subproperties, for referencing versions before or after a version, but also for referencing the first or latest known version with relation to a dataset.

A second main class is the ver:ChangeSet, which indicates a set of added or removed triples between two versions. Each changeset must indicate the version range over which it is defined, using the ver:changeSetStart and ver:changeSetEnd properties. These properties both reference a ver:Version and are inclusive. Each changeset contains either a list of added triples or a list of removed triples, and multiple changesets can exist between two versions. The ver:ChangeSetAdditions and ver:ChangeSetDeletions classes respectively indicate these types of changesets. As multiple changesets can exist, additions and deletions are always listed separately.

Finally, the ver:VersionSet class indicates a collection of triples that all exist in the same dataset versions. The versions in which the triples in this versionset exist are defined using the ver:versionSetContains property, which refers to a single ver:Version.

5 Interface

In this section, we introduce the three query atoms within the vtpf interface. We finish the section with the introduction of a live vtpf interface.

In the following subsections, we introduce new html and rdf forms for these query atoms, together with methods for representing their results. These three new forms are added to the existing tpf interface as an extension. Both the html and rdf forms have toggles for selecting the query atom: in html this is represented as a radio button, and in rdf this is done using a versionType parameter. In order to remain backwards-compatible for clients that only support the regular tpf interface, we keep the original tpf form, but internally transform it to a vm query that selects triples against the latest dataset version.
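The backwards-compatible fallback described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual server implementation: the function normalize_request, the VersionedRequest type, and the parameter names version, versionStart and versionEnd are hypothetical, as are the integer version identifiers. Only the versionType parameter, its three values, and the fallback to a vm query at the latest version are taken from this paper.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# The three versionType values used by the forms in this paper.
QUERY_ATOMS = {"VersionMaterialized", "DeltaMaterialized", "Version"}


@dataclass(frozen=True)
class VersionedRequest:
    """Hypothetical normalized view of an incoming request."""
    atom: str
    version: Optional[int] = None  # only used by vm
    start: Optional[int] = None    # only used by dm
    end: Optional[int] = None      # only used by dm


def normalize_request(params: Mapping[str, str], latest_version: int) -> VersionedRequest:
    """Map raw query parameters to a query atom.

    The parameter names 'version', 'versionStart' and 'versionEnd' are
    assumptions made for this sketch.
    """
    atom = params.get("versionType")
    if atom is None:
        # A regular tpf request carries no versionType: treat it as a
        # vm query against the latest dataset version.
        return VersionedRequest("VersionMaterialized", version=latest_version)
    if atom not in QUERY_ATOMS:
        raise ValueError(f"unsupported versionType: {atom}")
    if atom == "VersionMaterialized":
        return VersionedRequest(atom, version=int(params.get("version", latest_version)))
    if atom == "DeltaMaterialized":
        return VersionedRequest(
            atom, start=int(params["versionStart"]), end=int(params["versionEnd"])
        )
    # vq needs no parameter besides the flag itself.
    return VersionedRequest(atom)
```

With this normalization step, a client that only knows the regular tpf form and a vtpf-aware client that explicitly asks for the latest version end up on the same code path.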
As discussed in previous work [14], each query atom has a certain complexity when it is evaluated using a certain storage policy. Depending on this storage policy, the publisher may want to restrict the query atoms that are available through the vtpf interface. For example, a strategy with individual copies per version may be efficient for vm, but slow for dm. For the remainder of this paper, we consider a storage solution that is able to handle the three query atoms efficiently for at least triple patterns, and we therefore enable the three query atoms at the interface by default.

5.1 Version Materialized

In order to enable triple pattern queries against specific dataset versions (vm), we added a version parameter to both the rdf and html forms. Fig. 1 shows an example of the html form for querying triples within a certain version. This allows clients to select a version to query in, given a list of all available versions. Listing 1.2 shows an equivalent hypermedia control for this using the Hydra Core Vocabulary [9]. This is almost equal to the existing tpf control; only a version property was added, which can be filled in by clients.

Listing 1.3 shows additional metadata that is added to the results. This metadata allows clients to determine the version of the dataset that is currently being queried, indicated using ver:relatedVersion. Furthermore, the total number of currently available versions is indicated. Finally, links to other versions are indicated, which allows clients to determine next, previous, first and last versions of datasets, and improves the discoverability of other versions.

5.2 Delta Materialized

The second main versioned query atom is for querying the differences between two versions (dm). For this, we add two parameters to the rdf and html form, for respectively selecting the start and end version. Fig. 2 shows the html form for selecting the start and end version.
Each triple is now annotated with a green “+” or a red “-”. “+” indicates that the triple was added somewhere in between the start and end versions. “-” indicates the opposite: the triple was removed. In Listing 1.4, the equivalent hypermedia controls are shown.

Listing 1.5 shows an example of a dm query result. For dm queries, each triple is contained in a ver:ChangeSetAdditions or ver:ChangeSetDeletions graph, which respectively indicate whether the triples are additions or deletions for the given version range. These two graphs are annotated with this version range in the metadata graph, as shown in Listing 1.6.

5.3 Version Query

Finally, the vq query atom selects triples over all versions, and annotates each triple with the versions in which it is present. This query atom requires no additional parameter in the rdf and html form; it only needs a flag indicating the query atom. The html view for this query atom is shown in Fig. 3, in which all triples are annotated with their list of versions.

Listing 1.8 shows an example of rdf results for vq queries. All triples are annotated with a versionset graph. These versionset graphs are annotated with the versions for which they are applicable, as can be seen in Listing 1.9.

    {
      hydra:search [
        hydra:template "http://fragments.dbpedia.org/2014{?s,p,o}";
        hydra:variableRepresentation hydra:ExplicitRepresentation;
        hydra:mapping [ hydra:variable "s"; hydra:property rdf:subject ],
                      [ hydra:variable "p"; hydra:property rdf:predicate ],
                      [ hydra:variable "o"; hydra:property rdf:object ]
      ].
    }
Listing 1.1: tpf query form using the Hydra Core Vocabulary.
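To make the semantics of the vm, dm and vq query atoms from Sections 5.1 to 5.3 concrete, the following Python sketch evaluates them for a single triple pattern over an in-memory list of full snapshots. It is an illustration under several assumptions: versions are identified by list index, the function names are hypothetical, and the dm function only compares the two endpoint snapshots, which is a simplification of the behaviour described above. It says nothing about how an efficient storage backend would evaluate these queries.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# None in a pattern position acts as a variable.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """True if every bound position of the pattern equals the triple's term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def version_materialized(versions: List[Set[Triple]], pattern: Pattern,
                         version: int) -> Set[Triple]:
    """vm: matching triples in one version."""
    return {t for t in versions[version] if matches(t, pattern)}


def delta_materialized(versions: List[Set[Triple]], pattern: Pattern,
                       start: int, end: int) -> Tuple[Set[Triple], Set[Triple]]:
    """dm: (additions, deletions) between two versions.

    Simplification: only the start and end snapshots are compared.
    """
    before = version_materialized(versions, pattern, start)
    after = version_materialized(versions, pattern, end)
    return after - before, before - after


def version_query(versions: List[Set[Triple]],
                  pattern: Pattern) -> Dict[FrozenSet[int], Set[Triple]]:
    """vq: matching triples grouped by the exact set of versions they occur in.

    Each dictionary key plays the role of one versionset.
    """
    present: Dict[Triple, List[int]] = {}
    for index, snapshot in enumerate(versions):
        for triple in snapshot:
            if matches(triple, pattern):
                present.setdefault(triple, []).append(index)
    versionsets: Dict[FrozenSet[int], Set[Triple]] = {}
    for triple, indexes in present.items():
        versionsets.setdefault(frozenset(indexes), set()).add(triple)
    return versionsets
```

Grouping the vq results by their set of versions mirrors the graph-based versionsets of the vocabulary: triples that occur in exactly the same versions share one annotation instead of each carrying its own list.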
    {
      hydra:search [
        hydra:template "http://versioned.linkeddatafragments.org/bear?versionType=VersionMaterialized{&s,p,o,v}";
        hydra:variableRepresentation hydra:ExplicitRepresentation;
        hydra:mapping [ hydra:variable "s"; hydra:property rdf:subject ],
                      [ hydra:variable "p"; hydra:property rdf:predicate ],
                      [ hydra:variable "o"; hydra:property rdf:object ],
                      [ hydra:variable "v"; hydra:property ver:relatedVersion ]
      ].
    }
Listing 1.2: The vm query rdf form for triples in a single version.

    {
      a ver:Version;
      ver:relatedVersion ;
      ver:versionCount "9"^^xsd:integer;
      ver:afterVersion ;
      ver:firstVersion ;
      ver:latestVersion
    }
Listing 1.3: vm query result metadata that contains links to other versions.

    {
      hydra:search [
        hydra:template "http://versioned.linkeddatafragments.org/bear?versionType=DeltaMaterialized{&s,p,o,s,e}";
        hydra:variableRepresentation hydra:ExplicitRepresentation;
        hydra:mapping [ hydra:variable "s"; hydra:property rdf:subject ],
                      [ hydra:variable "p"; hydra:property rdf:predicate ],
                      [ hydra:variable "o"; hydra:property rdf:object ],
                      [ hydra:variable "s"; hydra:property ver:changeSetStart ],
                      [ hydra:variable "e"; hydra:property ver:changeSetEnd ]
      ].
    }
Listing 1.4: The dm query rdf form for triple differences over two versions.

    _:changeSetDeletions {
      ol:node0 a ;
               a ;
               "mystylequicktip";
               .
    }
    _:changeSetAdditions {
      ol:node0 "49.0000000";
               "32.0000000".
    }
Listing 1.5: dm query rdf results in changesets.

    <http://versioned.linkeddatafragments.org/bear#metadata> {
      _:changeSetAdditions a ver:ChangeSetAdditions;
        ver:changeSetStart ;
        ver:changeSetEnd .
      _:changeSetDeletions a ver:ChangeSetDeletions;
        ver:changeSetStart ;
        ver:changeSetEnd .
      a ver:Version;
        ver:beforeVersion ;
        ver:firstVersion ;
        ver:latestVersion .
      a ver:Version;
        ver:afterVersion ;
        ver:firstVersion ;
        ver:latestVersion
    }
Listing 1.6: dm query rdf result metadata with changeset annotations.
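A client that reads a link template such as the one in Listing 1.2 has to expand it into a concrete request URL. The Python sketch below shows one way this could look, covering only the {?...} and {&...} URI Template operators that appear in the listings. The expand function is a hypothetical helper and not part of any tpf client. It passes all values as plain percent-encoded strings and does not implement the value serialization implied by hydra:ExplicitRepresentation, and the version value 3 is an assumed identifier.

```python
import re
from typing import Mapping, Optional
from urllib.parse import quote

# Template strings copied from Listings 1.1 and 1.2.
TPF_TEMPLATE = "http://fragments.dbpedia.org/2014{?s,p,o}"
VM_TEMPLATE = ("http://versioned.linkeddatafragments.org/bear"
               "?versionType=VersionMaterialized{&s,p,o,v}")


def expand(template: str, values: Mapping[str, Optional[object]]) -> str:
    """Expand {?a,b} and {&a,b} expressions; undefined variables are skipped."""

    def replace(match: "re.Match[str]") -> str:
        operator = match.group(1)          # "?" starts a query, "&" continues one
        names = match.group(2).split(",")
        pairs = [
            f"{name}={quote(str(values[name]), safe='')}"
            for name in names
            if values.get(name) is not None
        ]
        if not pairs:
            return ""
        return operator + "&".join(pairs)

    return re.sub(r"\{([?&])([^}]*)\}", replace, template)


if __name__ == "__main__":
    # All triples with predicate rdf:type in (assumed) version 3.
    print(expand(VM_TEMPLATE, {
        "p": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "v": 3,
    }))
```

Because the client only fills in the variables it finds in the template, the same helper works unchanged for the regular tpf form and for the versioned forms, which is what makes the controls discoverable without hard-coded URLs.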
    {
      hydra:search [
        hydra:template "http://versioned.linkeddatafragments.org/bear?versionType=Version{&s,p,o}";
        hydra:variableRepresentation hydra:ExplicitRepresentation;
        hydra:mapping [ hydra:variable "s"; hydra:property rdf:subject ],
                      [ hydra:variable "p"; hydra:property rdf:predicate ],
                      [ hydra:variable "o"; hydra:property rdf:object ]
      ].
    }
Listing 1.7: The vq query rdf form for version-annotated triples.

    _:versionSet0 {
      ol:node0 a "49.0000000";
               "32.0000000".
    }
    _:versionSet1 {
      ol:node0 a ;
               "mystylequicktip";
               .
    }
Listing 1.8: vq query rdf results in versionsets.

    {
      _:versionSet0 a ver:VersionSet;
        ver:versionSetContains .
      _:versionSet1 a ver:VersionSet;
        ver:versionSetContains .
      a ver:Version;
        ver:beforeVersion ;
        ver:firstVersion ;
        ver:latestVersion .
      a ver:Version;
        ver:afterVersion ;
        ver:firstVersion ;
        ver:latestVersion
    }
Listing 1.9: vq query rdf result metadata with versionset annotations.

5.4 Live Demo

As an example usage of the vtpf interface, we set up a live entrypoint at http://versioned.linkeddatafragments.org/bear. This interface exposes access to the first ten versions of the rdf archive provided by the bear benchmark [5], which consists of triples compiled from the Dynamic Linked Data Observatory¹. These first ten versions contain more than 72 million unique triples, with approximately 30 million triples per version. As backend storage solution we use ostrich², which supports the three query atoms for triple patterns.

6 Conclusions

In this paper, we introduced an interface for exposing query access to different versions of a dataset for triples at, between and for different versions. This fulfills our first api requirement. For requirements 2 and 3, we provide html and rdf controls and metadata for allowing both humans and machines to understand the interface and the data.
Finally, for requirement 4, we used hypermedia controls to allow machines to automatically discover how to use these query methods. As vtpf is a minimal extension to the existing tpf interface, this allows data owners to expose their different dataset versions on the Web at a low cost. This is a solution to the first task for making rdf queryable on the Web with a low publication cost.

In future work, we intend to provide solutions for the other tasks [14] for making queries over different dataset versions possible using the tpf framework. We will extend the existing tpf client so that it is able to understand and consume data from the vtpf interface extension, for the three query atoms. For this, a variant of the sparql query language will be needed, such as t-sparql [7] or sparql-st [11]. Furthermore, our storage solution that is used as a backend to this vtpf interface will be further developed and investigated. Finally, this interface, together with the solutions for the other tasks, will be evaluated, so that the publication and consumption cost with the vtpf approach can be determined.

An interface like vtpf lowers the barrier for data owners to publish different versions of their data, which will lead to the availability of more Linked Datasets at different versions over the Web. This will open up a new world of querying possibilities at, between, and for versions for domains that require such access, such as the analysis of concept drift.

¹ http://swse.deri.org/dyldo/
² ostrich is a work-in-progress and will be introduced as a triple store with versioning support in future work.

References

1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A nucleus for a Web of open data. In: The semantic web, pp. 722–735. Springer (2007)
2. Berners-Lee, T., Connolly, D.: Delta: an ontology for the distribution of differences between rdf graphs. World Wide Web, http://www.w3.org/DesignIssues/Diff 4(3), 4–3 (2004)
3. Cyganiak, R., Wood, D., Lanthaler, M.: rdf 1.1: Concepts and abstract syntax. Recommendation, W3C (Feb 2014), http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
4. Feigenbaum, L., Todd Williams, G., Grant Clark, K., Torres, E.: sparql 1.1 protocol. Rec., W3C (Mar 2013), http://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/
5. Fernández, J.D., Umbrich, J., Polleres, A., Knuth, M.: Evaluating query and storage strategies for rdf archives. In: Proceedings of the 12th International Conference on Semantic Systems (2016)
6. Fielding, R.T.: rest apis must be hypertext-driven (Oct 2008), http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
7. Grandi, F.: t-sparql: A TSQL2-like temporal query language for rdf. In: ADBIS (Local Proceedings). pp. 21–30. Citeseer (2010)
8. Harris, S., Seaborne, A., Prud’hommeaux, E.: sparql 1.1 query language. Recommendation, W3C (Mar 2013), http://www.w3.org/TR/2013/REC-sparql11-query-20130321/
9. Lanthaler, M., Gütl, C.: Hydra: A vocabulary for hypermedia-driven Web apis. In: Proceedings of the 6th Workshop on Linked Data on the Web (May 2013)
10. Meimaris, M., Papastefanatos, G., Viglas, S., Stavrakas, Y., Pateritsas, C., Anagnostopoulos, I.: A query language for multi-version data Web archives. Expert Systems 33(4), 383–404 (2016)
11. Perry, M., Jain, P., Sheth, A.P.: sparql-st: Extending sparql to support spatiotemporal queries. In: Geospatial semantics and the semantic web, pp. 61–86. Springer (2011)
12. Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L.L., Ainsworth, S., Shankar, H.: Memento: Time travel for the Web. arXiv preprint arXiv:0911.1112 (2009)
13. Taelman, R., Verborgh, R., Colpaert, P., Mannens, E.: Continuous client-side query evaluation over dynamic Linked Data. In: The Semantic Web: ESWC 2016 Satellite Events (May 2016)
14. Taelman, R., Verborgh, R., Mannens, E.: Exposing rdf archives using Triple Pattern Fragments. In: Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management: Posters and Demos (Nov 2016)
15. Tunnicliffe, S., Davis, I.: Changeset vocabulary (2005), http://vocab.org/changeset/
16. Umbrich, J., Decker, S., Hausenblas, M., Polleres, A., Hogan, A.: Towards dataset dynamics: Change frequency of Linked Open Data sources. In: 3rd International Workshop on Linked Data on the Web (LDOW) (2010)
17. Van Herwegen, J., De Vocht, L., Verborgh, R., Mannens, E., Van de Walle, R.: Substring filtering for low-cost Linked Data interfaces. In: The Semantic Web – ISWC 2015. Lecture Notes in Computer Science, vol. 9366, pp. 128–143. Springer (Oct 2015)
18. Vander Sande, M., Verborgh, R., Van Herwegen, J., Mannens, E., Van de Walle, R.: Opportunistic Linked Data querying through approximate membership metadata. In: The Semantic Web – ISWC 2015. Lecture Notes in Computer Science, vol. 9366, pp. 92–110. Springer (Oct 2015)
19. Verborgh, R.: Querying history with Linked Data (2016), http://ruben.verborgh.org/blog/2016/06/22/querying-history-with-linked-data/
20. Verborgh, R., Dumontier, M.: A Web API ecosystem through feature-based reuse (2016), http://arxiv.org/abs/1609.07108, under submission
21. Verborgh, R., Vander Sande, M., Hartig, O., Van Herwegen, J., De Vocht, L., De Meester, B., Haesendonck, G., Colpaert, P.: Triple Pattern Fragments: a low-cost knowledge graph interface for the Web. Journal of Web Semantics 37–38 (Mar 2016)