Enabling the Scholarly Discourse of the Future: Versioning RDF Data in the Digital Humanities

Martina Bürgermeister
University of Graz, Graz, Austria

In: Tara Andrews, Franziska Diehr, Thomas Efer, Andreas Kuczera and Joris van Zundert (eds.): Graph Technologies in the Humanities - Proceedings 2020, published at http://ceur-ws.org. Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

The dynamic and collaborative scholarly landscape of the humanities and cultural sciences is in urgent need of reliable knowledge about the origin and genesis of its data. Versioning, with its capability to document data and its modifications, can be used to integrate and provide this critical information. In the following paper, the versioning of RDF data is presented as a method to make research more trustworthy, and as a means of turning the processes through which data is changed over time into objects of study within the digital humanities.

1 Introduction

Scholarly work and the knowledge that it generates are the result of iterative processes. Making these processes transparent is an indispensable part of scholarly communication: it serves as the basis for trust in data, which in turn encourages the use of that data in subsequent research (Borgman, 2010; Zimmerman, 2008).

The social need for trust in data is reflected in the concept of the Semantic Web, where the legitimacy of data plays a fundamental role. There are several ways to render data more trustworthy, one of them being the provision of metadata concerning the origin of data and the modifications it has undergone over time:

    [N]ot only could such provenance metadata be used as selection criteria, their existence may encourage scholars to contribute data to the community, as credit could be assigned more accurately and publicly.
    Provenance metadata also could enable processed data to be 'rolled back' through transformations for the purposes of correcting errors at interim stages or computing different transformations (Borgman, 2010, pp. 131-132).

Documenting data and its modifications and, if necessary, reversing those modifications, is precisely the function that versioning can fulfill. This paper presents versioning as a method to make research in the humanities and cultural sciences more trustworthy by providing researchers with greater knowledge about the origin and genesis of their data.

To date, the various approaches that have emerged over the last 15 years have not produced a universally accepted versioning standard for RDF data. The suitability and usefulness of these approaches for the digital humanities will be evaluated in the final section of the present paper (5), where typical research scenarios will demonstrate the different requirements for versioning mechanisms. First, however, I will address the extent to which versioning can contribute to the trustworthiness of data (2). This will be followed by a discussion of the various approaches to the versioning of RDF data (3), and of the question as to which archiving models are suitable for which application (4).

2 Versioning as a Dimension of Provenance

In 2000, Tim Berners-Lee launched his seminal idea of the Semantic Web Stack (Berners-Lee, 2000) (Figure 1). Significantly, it is not only concerned with the development of technical standards to make information machine-understandable, but also incorporates social needs that arise out of dealing with the Semantic Web, with the layers of 'proof' and 'trust' representing essential preconditions for broad acceptance and use of shared content. Since the debut of Berners-Lee's stack, many of the technologies featured in Figure 1 have become W3C standards (XML, RDF(S), OWL, etc.). When it comes to 'proof' and 'trust,' however, the situation is quite different.
As becomes clear from Berners-Lee's diagram, the task of the proof layer is to explain the conclusions drawn from the logical layer and to make their origin traceable – Sergei Sizov has referred to the former as "the layer of provenance knowledge" (Sizov, 2007, p. 94). It is this knowledge about resources and what has become of them that engenders trust in the Semantic Web.

[Figure 1: Semantic Web Stack with proof and trust layers highlighted]

The World Wide Web Consortium (W3C) is an institution that promotes the development and implementation of internet-related standards. Between September 2009 and December 2010, the W3C sponsored the formation of a Provenance Incubator Group, whose goal was to define the phenomenon of provenance in the context of data management as comprehensively as possible. Their final report presents 14 dimensions of provenance – one of which is versioning, defined as "records of changes to an artifact over time and what entities and processes were associated with those changes" (Gil et al., 2010). Having emphasized that changes must be considered in context, the report goes on to explain that this kind of record is considered a dimension of provenance precisely because

    [d]ealing with evolution and versioning is a critical requirement for a provenance representation. As an artifact evolves over time, its provenance should be augmented in specific ways that reflect the changes made over prior versions and what entities and processes were associated with those changes (Gil et al., 2010).

Recording versions of digital resources to document change over time is an important part of describing provenance. Versioning enables us to refer to intermediate stages in the research process, but it also allows us to trace a 'version history' and to identify and analyze very specific changes between versions. In this respect, versioning functions as a mechanism of the proof layer and helps to foster confidence in digital resources.
3 RDF Versioning Approaches

RDF versioning can be implemented in different ways. Three different approaches are presented in this section, the first of which is reification (3.1), a way of attaching provenance information or change descriptions to individual triples. The concept of named graphs is then discussed as a possible alternative (3.2), before the final subsection addresses the extent to which traditional version control systems are suitable for the versioning of RDF graphs (3.3).

3.1 Reification

In RDF, it is possible to further describe each RDF statement through a built-in vocabulary.[1] This process is called reification, and describes the relationship between an instance of a triple and those resources to which the triple refers: "reification is a method of formally modeling a statement in such a way that it can actually be attached as a property to the new statement" (Powers, 2003, p. 69). A reified statement can also contain information on provenance (who made the statement and when), which strengthens confidence in the statement:

    [T]he key component of reification is the ability to make a statement and have the statement be treated as fact, without any implication that the contents of the statement are themselves facts. This has particular interest when it comes to trust (Powers, 2003, p. 78).

In order to version a dataset via reification, it is necessary to furnish individual triples or a set of triples with information on authors, a version number, or a timestamp. Ideally, the nature of the change itself is also described. For this purpose, separate vocabularies such as the Changeset Vocabulary by Tunnicliffe and Davis (2005-2009) have been developed,[2] which makes it possible to express an exact delta between two versions of a resource by means of two sets of triples (additions and removals).
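The core idea – a delta expressed as a set of removals and a set of additions – can be sketched in a few lines of Python. This is a hypothetical illustration only: triples are modelled as plain tuples, and the resource and property names are invented.

```python
# Hypothetical sketch: a delta between two versions of an RDF graph expressed
# as two triple sets (removals, additions), in the spirit of the Changeset
# Vocabulary. Triples are modelled as plain (subject, predicate, object) tuples.

EX = "http://example.org/"  # invented namespace for illustration

v1 = {
    (EX + "resource", EX + "title", "Original Title"),
    (EX + "resource", EX + "description", "A short description of this resource"),
}
v2 = {
    (EX + "resource", EX + "title", "New Title"),
    (EX + "resource", EX + "description", "A short description of this resource"),
}

def delta(old, new):
    """Return the changeset between two versions as (removals, additions)."""
    return old - new, new - old

removals, additions = delta(v1, v2)
assert removals == {(EX + "resource", EX + "title", "Original Title")}
assert additions == {(EX + "resource", EX + "title", "New Title")}
```

Applying the additions and removing the removals transforms version 1 into version 2, which is exactly the 'rollback' capability described above, run in reverse.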
An example description in RDF/XML, using Dublin Core properties, appears as follows:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/resource">
        <dc:title>Original Title</dc:title>
        <dc:description>A short description of this resource</dc:description>
      </rdf:Description>
    </rdf:RDF>

This example shows a resource with a description and a title. Then the title is modified:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/resource">
        <dc:title>New Title</dc:title>
        <dc:description>A short description of this resource</dc:description>
      </rdf:Description>
    </rdf:RDF>

Describing this kind of modification as a reified statement would look like this:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:cs="http://purl.org/vocab/changeset/schema#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <cs:ChangeSet>
        <cs:subjectOfChange rdf:resource="http://example.org/resource"/>
        <cs:createdDate>2006-01-01T00:00:00Z</cs:createdDate>
        <cs:creatorName>Anne Onymous</cs:creatorName>
        <cs:changeReason>Change of title</cs:changeReason>
        <cs:removal>
          <rdf:Statement>
            <rdf:subject rdf:resource="http://example.org/resource"/>
            <rdf:predicate rdf:resource="http://purl.org/dc/elements/1.1/title"/>
            <rdf:object>Original Title</rdf:object>
          </rdf:Statement>
        </cs:removal>
        <cs:addition>
          <rdf:Statement>
            <rdf:subject rdf:resource="http://example.org/resource"/>
            <rdf:predicate rdf:resource="http://purl.org/dc/elements/1.1/title"/>
            <rdf:object>New Title</rdf:object>
          </rdf:Statement>
        </cs:addition>
      </cs:ChangeSet>
    </rdf:RDF>

While the changes are described precisely by the vocabulary, the change description ends up being much longer than the statement itself. Even if no exact description of changes is given, each reification of triples involves the addition of a whole statement with subject, predicate, and object (Seaborne and Davis, 2010). This syntactical overhead is a major reason why versioning is often implemented in a different way that reduces memory requirements (see section 4).

[1] There is an rdfs:Class called rdf:Statement, with the properties rdf:predicate, rdf:subject and rdf:object, https://www.w3.org/TR/rdf-schema/#ch_reificationvocab.
[2] http://purl.org/vocab/changeset

3.2 Named Graphs

The semantic concept of named graphs goes back to an essay by Carroll et al. (2005). One of their main concerns was to establish a framework that makes resources more trustworthy. When version 1.1 of the RDF standard was introduced in 2014, named graphs were included. Since then, it has been possible to name graphs and extend triple statements with an additional component (IRI):

    An RDF dataset is a collection of RDF graphs. All but one of these graphs have an associated IRI or blank node. They are called named graphs, and the IRI or blank node is called the graph name. The remaining graph does not have an associated IRI, and is called the default graph of the RDF dataset (Schreiber and Raimond, 2014).

[Figure 2: Named graph]

Figure 2 shows a graph (A), which is assigned a name (B).
This named graph then forms part of a new graph (C), which contextualizes the original graph and associates it with other statements. Compared to the mechanism of reification, a named RDF graph is easier to read and much more space efficient, which is precisely why current approaches to RDF versioning tend to use this type of information linkage. If a triple is modified, i.e. added or deleted, it can be provided with an IRI, which in turn can be described by other statements.

3.3 Version Control Systems

Traditional version control systems have their uses, especially in software development. They store and manage text files which, when modified, lead to new versions with a unique identifier, a timestamp, and the author's name. It is always possible to determine who changed what and when: every file in the system has a version history, which allows versions to be compared. This comparison is done using diff algorithms, which detect any change to the file, be it the addition of spaces or the correction of a spelling mistake. Each saved change automatically establishes a new version, allowing the restoration of each individual saved text state.

But how useful is it in practical terms to manage graphs in line form? Serialization techniques allow RDF graphs to be managed as text in line form with traditional version control systems. The best way to accomplish this is to use N-Triples notation, in which the individual triples of an RDF graph are written one per line. Because the N-Triples format employs neither prefixes nor truncated notation, the serialized result of the same RDF graph always looks the same. Sorted N-Triples provide a canonical representation of RDF that is easy to parse and serialize. However, this format has the disadvantage of being verbose and tedious to read.

The following example describes an entry from the Getty Art & Architecture Thesaurus[3] in N-Triples (the numeric identifier is illustrative):

    <http://vocab.getty.edu/aat/300312281> <http://www.w3.org/2000/01/rdf-schema#label> "national libraries (institutions)"@en .
    <http://vocab.getty.edu/aat/300312281> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://vocab.getty.edu/ontology#Concept> .
    <http://vocab.getty.edu/aat/300312281> <http://purl.org/dc/terms/created> "2010-06-10T15:11:49"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
By comparison, the same statements in Turtle appear as follows:

    @prefix aat:  <http://vocab.getty.edu/aat/> .
    @prefix gvp:  <http://vocab.getty.edu/ontology#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    aat:300312281 a gvp:Concept ;
        rdfs:label "national libraries (institutions)"@en ;
        dct:created "2010-06-10T15:11:49"^^xsd:dateTime .

[3] http://www.getty.edu/research/tools/vocabularies/aat/

Especially for small datasets, version management with traditional systems can be a viable choice (Meinhardt et al., 2015), but using them to version RDF graphs comes with several disadvantages:

1. If changes to the graphs are to be tracked, the differences between two graphs must be calculated. The result (delta) is large even for small changes (for example, even a minute change to the URI of the subject entails changes in all other lines that make statements about the subject).

2. As a consequence, the quality criteria which should be fulfilled by a diff algorithm are not met:

    [T]he diff should construct a minimum set of changes to transform one version into the next one. Minimality is important because it captures to some extent the semantics that a human would give when presented with the two versions. It is important also in that more compact deltas provide savings in storage (Cobena et al., 2002).

3. If a change is made to the text that does not lead to a change in the graph – because, for example, a space is inserted before the period at the end of a statement – the delta's dependence on the serialization of the text means that a delta is calculated which indicates that a textual change has occurred.

4. This also means that it is impossible to materialize a past version of a given graph from the delta alone – what is required is the graph in the exact same serialization that was used to create the delta.

4 RDF Archiving

The main challenges in versioning RDF data are the storage of versions, the performance of the archive, and the associated possibilities of information retrieval (Fernandez Garcia et al., 2018; Fernández et al., 2015).
There are three major strategies for archiving versions: 'independent copies' (4.1), where a snapshot or a complete copy is stored; the 'change-based' approach (4.2), where only the changes to the graphs are recorded; and the 'time-based' approach (4.3), which takes the lifetime of triples into account by archiving the validity period for statements. In real-world technical implementations, these three approaches to archiving frequently appear in combination, which helps to bring the requirements of the respective archive situation and the desired retrieval functionalities into line with available resources in terms of performance and storage capacity.

4.1 Independent Copies

An important advantage of the 'independent copies' strategy is the low technical effort required to implement the storage of data record versions. Each version is stored and managed as a new, isolated dataset. The DBpedia project,[4] initiated by the Freie Universität Berlin and Leipzig University in cooperation with OpenLink Software with the goal of extracting data from Wikipedia and making it available as Linked Open Data (LOD), has used this method to archive 18 versions of its entire data pool over the course of the regular updates undertaken between its initial release in 2007 and October 2016.

Wikidata,[5] a project of Wikimedia Germany, also archives independent datasets. The project started in 2012 with the objective of providing data that can be used by any Wikimedia project, including Wikipedia.
Since 2015, data dumps have been made available on the Internet Archive, amounting to a total of 162 Wikidata versions at the time of writing this paper.[6] In order to make these versions queryable and comparable, each version could be stored as a graph in a triple store and provided with additional metadata concerning provenance.[7] This way of dealing with multiple versions is very well suited for playing back entire versions and querying individual versions.[8] However, the disadvantage of this approach is that it is very memory-intensive: the process of archiving each version is accompanied by an increasing number of duplicated triples, because there is a static core of unchanged triples that occurs in every version.

[4] http://wiki.dbpedia.org/
[5] https://www.wikidata.org/w/index.php?title=Wikidata:Main_Page&oldid=1086709037
[6] https://archive.org/details/wikimediadownloads
[7] The user Tpt (https://www.wikidata.org/wiki/Wikidata:History_Query_Service) is working on a tool to query past Wikidata versions. In the future, all deletions in a named graph and all additions in a separate named graph will be available for query. On the discussion page, the author writes the following: "This tool ingests data from the XML revision dumps, so it follows what is done for dumps regarding oversight. I do not know exactly what is done for XML dumps." Tpt (talk) 16:57, 9 April 2019 (UTC)
[8] On query requirements, see Fernandez Garcia et al. (2018) and Fernández et al. (2015).

4.2 Change-Based Approach

The redundancies discussed in the previous subsection do not occur with change-based archiving, and the amount of memory required is kept to a minimum. Only the changes (deltas) are archived, while static elements do not have to be repeated.
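The principle behind change-based archiving can be sketched as follows (a hypothetical illustration in plain Python, not taken from any of the systems discussed): only the base version is stored in full, every later version only as sets of added and deleted triples, and a version is materialized by replaying the deltas.

```python
# Hypothetical sketch of change-based archiving: a full base snapshot plus a
# chain of deltas, each delta a pair (added triples, deleted triples).
# All triple values below are invented placeholders.

base = {("s1", "p", "o1"), ("s2", "p", "o2")}        # full snapshot (version 0)
deltas = [
    ({("s3", "p", "o3")}, set()),                    # version 1: one addition
    ({("s1", "p", "o1b")}, {("s1", "p", "o1")}),     # version 2: a 'modification'
]

def materialize(base, deltas, k):
    """Reconstruct version k by replaying the first k deltas onto the base."""
    graph = set(base)
    for added, deleted in deltas[:k]:
        graph -= deleted
        graph |= added
    return graph

assert materialize(base, deltas, 0) == base
assert ("s1", "p", "o1b") in materialize(base, deltas, 2)
assert ("s1", "p", "o1") not in materialize(base, deltas, 2)
```

The loop over `deltas[:k]` is exactly the computational cost discussed below: the longer the chain between the nearest full snapshot and the requested version, the more replay work a query has to do.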
The delta of each triple consists of the change description and the change relationship (delete or add).[9] Because triples, unlike text, can be localized without reference to lines, a version consists of a set of deleted triples and a set of added triples (Graube et al., 2014).[10]

Many practical features of version control systems can also be found in the implementation of systems adapted to RDF such as R&Wbase, a system proposed by Vander Sande et al. (2013) that is based on the core concept of distributed version control and enables triple read and write access. Here, various functionalities known from version control systems such as committing, merging, branching, and tagging are made available for collaborative work on RDF datasets. Commits, for example, are described using the PROV ontology, with each commit including a timestamp, the previous version, the name of the version just created, a title, and a responsible user. In effect, R&Wbase thus works as a separate versioning layer in a quad store.

In many respects, R43ples, the implementation developed by Graube et al. (2014), is very similar. It does, however, employ an additional triple store which acts as what we could call a 'versioning proxy.' Like the application discussed before, the versions can be queried and updated via SPARQL – Graube et al. (2014) have introduced specific SPARQL keywords for this purpose, namely REVISION, BRANCH, and TAG.

Archives that store triple versions via a change-based approach are designed to record the changes between versions. When querying a specific state of the dataset at a certain point in time, the system's computational effort increases: first, all deltas up to the first entry (or the next full version/snapshot) must be recalculated, followed by the deltas from the reconstructed first version to the desired version. In order to cut down on the amount of processing power required, Im et al.
(2012) introduced the concept of the aggregated delta, which is independent from its predecessor version because it combines all change information in compressed form and as such reduces response time significantly.

[9] Barabucci et al. (2016) define delta as follows: "A delta […] is a tuple of changes (C) and change relations (R) that describes how to transform the source document (S) into the target document (T)" (Barabucci et al., 2016, p. 50).
[10] The deltas are calculated from syntactic changes to the data. Changes in semantics (e.g. when a class is renamed) are much more complex to calculate – these are referred to as high-level deltas (Fiorelli et al., 2017, pp. 147-148). Some version control systems (e.g. Subversion, https://subversion.apache.org/) solely save changes.

4.3 Time-Based Approach

As mentioned above, there are also approaches that add a time component to the previously discussed storage forms (snapshot or delta) with the goal of optimizing system performance for very specific queries (e.g. the query for valid triples at a certain time or interval).

A time-based strategy can be implemented in two ways. One possibility is to assign to each triple, for as long as it exists, meta information in the form of a timestamp as each new version is created. With this approach, however, the system constantly performs write work, since it also assigns a new timestamp to triples that remain unchanged. An alternative is to annotate triples only when they are added or deleted, meaning that a maximum of just two data fields is added to each triple.

This implementation can be found, for example, in X-RDF-3X, a platform developed by Neumann and Weikum (2010) which adds two additional fields to the triple: one timestamp for creation and one for deletion, with the latter having a zero value for valid triple versions. The interval between the created and deleted timestamps represents the lifetime of the triple version.
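The bookkeeping behind this lifetime-interval idea can be sketched as follows. This is a hypothetical, simplified illustration in plain Python; the actual system implements it inside a database engine, and all triple values and timestamps below are invented.

```python
# Hypothetical sketch of time-based annotation in the style described for
# X-RDF-3X: each triple carries a creation timestamp and a deletion
# timestamp (None standing in for the 'zero value' of still-valid triples).

annotated = [
    (("s1", "p", "o1"), 1, 5),      # existed from time 1 until time 5
    (("s1", "p", "o1b"), 5, None),  # created at time 5, still valid
    (("s2", "p", "o2"), 1, None),   # created at time 1, still valid
]

def valid_at(annotated, t):
    """Return all triples whose lifetime interval [created, deleted) covers t."""
    return {triple for triple, created, deleted in annotated
            if created <= t and (deleted is None or t < deleted)}

assert valid_at(annotated, 3) == {("s1", "p", "o1"), ("s2", "p", "o2")}
assert valid_at(annotated, 7) == {("s1", "p", "o1b"), ("s2", "p", "o2")}
```

A point-in-time query thus becomes a simple interval check per triple, with no delta replay at all, which is what makes this layout attractive for validity queries.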
The state of a database at a certain point in time can thus be reconstructed by returning all triples for which the point in time falls within the corresponding lifetime intervals (Neumann and Weikum, 2010, p. 258) – a system which allows a quick representation of what has changed from one version to the next, because a change means that the respective triple must be recorded with a timestamp. This procedure is therefore also change-based, but the delta consists of timestamps as opposed to deleted and added triples in different named graphs.

There are also systems that combine all three of these archiving strategies in order to get the best of all worlds. In the implementation by Meinhardt et al. (2015), each version in the archive exists as a changeset that contains information about modifications at a certain point in time. The changeset contains at least one snapshot. If triples are added, they are stored as a delta to a snapshot with a timestamp; if a triple is deleted, a new snapshot is created, also carrying a timestamp. When a version is to be materialized, the snapshot closest to the requested time is retrieved and the corresponding delta is added. The approach of Taelman et al. (2019) also combines snapshot and delta archiving. In addition, they rely on special indexing methods for enhanced efficiency in "evaluating queries at a certain version, between any two versions, and for versions" (Taelman et al., 2019, p. 4).

5 Use Cases in the Digital Humanities

The models for version storage discussed in the present paper allow for a wide range of query requirements to be met. But what kind of demands regarding the versioning of RDF or Linked Open Data exist within the context of the digital humanities? Fiorelli et al. (2017) define the requirements for an RDF versioning system by distinguishing between user and developer: the user is primarily interested in being able to reference saved versions, whereas the developer wishes to track changes.
In actual scholarly practice, however, the requirement profiles cannot be separated so neatly – in many cases, there is a need for both perspectives. This section presents three likely user scenarios to showcase the importance of versioning in the digital humanities, and to demonstrate how well the RDF versioning approaches discussed so far perform when confronted with real-life challenges. The first scenario addresses the problem of data consistency (5.1), scenario two deals with the issue of collaborative research (5.2), and scenario three is concerned with the specific nature of scholarly discourse in the humanities (5.3).

5.1 Version Reference and Data Consistency

Scenario 1: Occasional Data Updates for a Digital Scholarly Edition (DSE)

    I am the editor of a digital scholarly edition of account books. I modeled my data in RDF and published it. I did a statistical analysis of the data and displayed it to the users. Now I want to add another account book, but the evaluation of my analysis is based on a closed database. How can I make the current state of research accessible and still publish new RDF data?

On the one hand, this scenario describes a basic retrieval task, namely the retrieval of a specific version of the recorded data. On the other, there is also the need to query this past version, for example in order to keep statistical values verifiable. The 'independent copies' approach is perfectly adequate to fulfill these requirements, as each historical version can be dumped into an ordinary RDF store. It is easy to provide full versions in the form of snapshots and to enrich these snapshots with additional metadata – for example, the PROV ontology can be used to create a version catalog or to provide additional provenance data, whereas different graphs (versions) can be queried via SPARQL. A time-based strategy involving the annotation of individual triples with their validity range could also work well.
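The snapshot-plus-catalog idea for this scenario can be sketched as follows (a hypothetical illustration in plain Python; in a real system the snapshots would be named graphs and the metadata PROV statements, and all identifiers and values below are invented):

```python
# Hypothetical sketch of the 'independent copies' strategy for Scenario 1:
# every published state of the edition is archived as a full, immutable
# snapshot together with a small provenance record.

versions = {}  # version id -> (frozen snapshot of triples, provenance metadata)

def publish(version_id, triples, **provenance):
    """Archive a complete copy of the dataset under a citable version id."""
    versions[version_id] = (frozenset(triples), dict(provenance))

publish("v1", {("book1", "hasTotal", "420")},
        created="2019-05-01", creator="editor")
publish("v2", {("book1", "hasTotal", "420"), ("book2", "hasTotal", "77")},
        created="2020-02-01", creator="editor")

# Statistics published for v1 stay verifiable against that exact snapshot,
# even after v2 added a further account book:
snapshot, meta = versions["v1"]
assert ("book2", "hasTotal", "77") not in snapshot
assert meta["created"] == "2019-05-01"
```

Because each snapshot is immutable and individually referenceable, the published analysis can simply cite "v1" while new data continues to appear under later version identifiers.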
Materializing a specific past version is the most computationally demanding task when using an approach that only stores deltas, since all deltas must be re-calculated when requesting a full version. Assuming that the deltas are stored in such a way that they are only connected to the previous delta, the change chain is calculated back to the first complete version and then, in order to be able to materialize a certain later state, the deltas are calculated forward again up to the desired version. In this scenario, combined storage approaches that record both snapshots and deltas can help to keep the computational effort manageable.

5.2 Change Inspection and Evolution Tracking

Scenario 2: Collaborative Ontology Development

    I work for the National Library, and we want to develop a cross-institutional common ontology that is maintained collaboratively. With these collaborative processes, would it not be useful to track the changes and to analyze how and why they occurred?

This scenario focuses on the problem of collaborative editing. There is a need to query the changes that occurred between two or more versions. Here, functionalities provided by version control systems are expected – these involve merging branches, highlighting conflicts, quickly undoing changes, and so on. Yet a second requirement is also formulated, namely the tracking of very specific changes.

Calculating the difference between the versions of the datasets with an 'independent copies' approach requires significant computational resources, given that the calculation of specific deltas takes place at query time. With a time-based approach, however, this task can be accomplished with little effort, provided that the time of adding and deleting the triple is available as meta information for each triple. In either case, the change-based implementations by Vander Sande et al. (2013) and Graube et al.
(2014) discussed in subsection 4.2 are useful because they include practical functionalities of version control systems that can be queried using an extended SPARQL vocabulary. They also offer the possibility to use the 'commit' function to describe the changes in the collaborative process in more detail.

5.3 Qualitative/Quantitative Analysis of Data Evolution

Scenario 3: Analysis of Historical Topic Evolution in Wikidata

    I am a historian, and I want to analyze historical topic development in Wikidata from the beginning of Wikidata until today. I am interested in the evolution of knowledge and its representation. For instance, how are historical events and persons perceived and described? What are the primary topics, and how have certain aspects evolved and changed over time? How can I retrieve this kind of data?

In this scenario, the goal is to discover patterns that arise and develop over time. This results in specific requirements: not only should it be possible to compare past versions with each other, but the system should also allow users to query the changes that have taken place. To query information that is present in several versions, or specific changes that take place between them, is a task that is very difficult to accomplish with a straightforward 'independent copies' approach – it is far easier to query the validity of triples using a time-based solution. However, when it comes to tracking concept changes, the ability to query deltas is required. The approach developed by Meinhardt et al. (2015) makes it possible to search for changes with certain timestamps by using the Memento protocol. The solutions proposed by Taelman et al. (2019) are another promising way to address this retrieval challenge.

6 Conclusion

When versioning data and archiving it for research purposes, its usability for future scholarship is a major concern.
The choice of a specific versioning strategy depends on how often and how much our research data changes and, above all, on which information is to be made retrievable. This paper has illustrated the different requirements for versioning systems with user scenarios that are likely to occur in the context of the digital humanities. As has become clear, there are technically simple solutions to the problem of rendering RDF data and the changes that it is subjected to referenceable and retrievable. System requirements are bound to increase with a higher frequency of changes and mounting demands concerning their traceability, especially in collaborative research. For the collaborative development of graph data, versioning models are required that adopt functionalities from collaborative software development. Additional features permitting detailed analysis of changes over time are also desirable, and will empower scholars to undertake increasingly complex research projects as more and more RDF data is published – and changed – in years to come.

As scholars working in the digital humanities, we are particularly interested in the origin and development of our research data. Versioning can be used to integrate and provide this critical information, whose importance for scholarly practice cannot be overstated. Moreover, versioning mechanisms ensure that our data can itself be used as an object of study. Not only does versioning create trust – it also enables the scholarly discourse of the future.

References

Barabucci, G., Ciancarini, P., Di Iorio, A., and Vitali, F. (2016). Measuring the Quality of Diff Algorithms: A Formalization. Computer Standards & Interfaces, 46:52–65, DOI: 10.1016/j.csi.2015.12.005.

Berners-Lee, T. (2000). Semantic Web on XML. https://www.w3.org/2000/Talks/1206-xml2k-tbl/slide1-0.html.

Borgman, C. L. (2010). Scholarship in the Digital Age. Information, Infrastructure, and the Internet.
MIT Press, Cambridge, MA, DOI: 10.7551/mitpress/7434.001.0001.

Carroll, J., Bizer, C., Hayes, P., and Stickler, P. (2005). Named Graphs, Provenance and Trust. In WWW '05: Proceedings of the 14th International Conference on World Wide Web, pages 613–622. DOI: 10.1145/1060745.1060835.

Cobena, G., Abiteboul, S., and Marian, A. (2002). Detecting Changes in XML Documents. In Proceedings of the 18th International Conference on Data Engineering, pages 41–52. DOI: 10.1109/ICDE.2002.994696.

Fernández, J. D., Umbrich, J., Polleres, A., and Knuth, M. (2015). Towards Efficient Archiving of Dynamic Linked Open Data. In Debattista, J., d'Aquin, M., and Lange, C., editors, Proceedings of the First DIACHRON Workshop on Managing the Evolution and Preservation of the Data Web, volume 1377 of CEUR Workshop Proceedings, pages 34–49. http://ceur-ws.org/Vol-1377/.

Fernandez Garcia, J. D., Umbrich, J., Polleres, A., and Knuth, M. (2018). Evaluating Query and Storage Strategies for RDF Archives. Semantic Web Journal, http://epub.wu.ac.at/6488/.

Fiorelli, M., Pazienza, M. T., Stellato, A., and Turbati, A. (2017). Change Management and Validation for Collaborative Editing of RDF Datasets. International Journal of Metadata, Semantics and Ontologies, 12(2/3):142–154, DOI: 10.1504/IJMSO.2017.090783.

Gil, Y., Cheney, J., Groth, P., et al. (2010). Provenance XG Final Report. Technical report, W3C Incubator Group.

Graube, M., Hensel, S., and Urbas, L. (2014). R43ples: Revisions for Triples – An Approach for Version Control in the Semantic Web. In Proceedings of the 1st Workshop on Linked Data Quality co-located with the 10th International Conference on Semantic Systems, SEMANTiCS.

Im, D.-H., Lee, S.-W., and Kim, H.-J. (2012). A Version Management Framework for RDF Triple Stores. International Journal of Software Engineering and Knowledge Engineering, 22(1):85–106, DOI: 10.1142/S0218194012500040.

Meinhardt, P., Knuth, M., and Sack, H. (2015).
TailR: A Platform for Preserving History on the Web of Data. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS '15, pages 57–64, New York, NY. Association for Computing Machinery, DOI: 10.1145/2814864.2814875.

Neumann, T. and Weikum, G. (2010). X-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases. Proceedings of the VLDB Endowment, 3(1/2):256–263.

Powers, S. (2003). Practical RDF. O'Reilly, Beijing/Cambridge.

Schreiber, G. and Raimond, Y. (2014). RDF 1.1 Primer. Technical report, W3C.

Seaborne, A. and Davis, I. (2010). Supporting Change Propagation in RDF. In Proceedings of the W3C Workshop – RDF Next Steps.

Sizov, S. (2007). What Makes You Think That? The Semantic Web's Proof Layer. IEEE Intelligent Systems, 22(6):94–99, DOI: 10.1109/MIS.2007.120.

Taelman, R., Vander Sande, M., and Van Herwegen, J. (2019). Triple Storage for Random-Access Versioned Querying of RDF Archives. Web Semantics: Science, Services and Agents on the World Wide Web, 54:4–28, DOI: 10.1016/j.websem.2018.08.001.

Tunnicliffe, S. and Davis, I. (2005-2009). Changeset Vocabulary. https://vocab.org/changeset/.

Vander Sande, M., Colpaert, P., Verborgh, R., Coppens, S., et al. (2013). R&Wbase: Git for Triples. In Bizer, C., Heath, T., Berners-Lee, T., Hausenblas, M., and Auer, S., editors, Proceedings of the 6th Workshop on Linked Data on the Web. http://ceur-ws.org/Vol-996/papers/ldow2013-paper-01.pdf.

Zimmerman, A. S. (2008). New Knowledge from Old Data: The Role of Standards in the Sharing and Reuse of Ecological Data. Science, Technology, & Human Values, 33(5):631–652, DOI: 10.1177/0162243907306704.