=Paper=
{{Paper
|id=Vol-1215/paper-03
|storemode=property
|title=R43ples: Revisions for Triples - An Approach for Version Control in the Semantic Web
|pdfUrl=https://ceur-ws.org/Vol-1215/paper-03.pdf
|volume=Vol-1215
|dblpUrl=https://dblp.org/rec/conf/i-semantics/GraubeHU14
}}
==R43ples: Revisions for Triples - An Approach for Version Control in the Semantic Web==
Markus Graube, Stephan Hensel, Leon Urbas

Chair for Process Control Systems Engineering, Technische Universität Dresden, Germany

markus.graube@tu-dresden.de, stephan.hensel@tu-dresden.de, leon.urbas@tu-dresden.de

Copyright is held by the author/owner(s). LDQ 2014, 1st Workshop on Linked Data Quality, Sept. 2, 2014, Leipzig, Germany.

===ABSTRACT===
For most use cases, the Semantic Web provides essential mechanisms to interlink data in a fast and efficient way. However, it is still not widely accepted in industry since some important features are not mature enough. Requirements include easier model transformation and access to dynamic data. One of the most important missing features is version control, which would make it possible to record changes in such a way that they can be rolled back at any time. Current version control systems are not very well integrated into the Semantic Web.

This paper shows a novel way of dealing with version control for Linked Data. It presents R43ples as an approach using named graphs to semantically store the differences between revisions. Furthermore, it allows direct access to and manipulation of revisions with SPARQL. Thus, the access is almost transparent for the clients, which can still use known SPARQL queries enhanced with some additional keywords. A prototypical implementation of the system shows a proof of concept and performance considerations.

===Categories and Subject Descriptors===
H.2.3 [Database Management]: Languages - Query languages; H.3.4 [Information Storage and Retrieval]: Systems and Software - Information networks; H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services

===General Terms===
Linked Data, Versioning, SPARQL, Revision, Query, Named Graphs

===1. INTRODUCTION===
The explosion of the Semantic Web in recent years [9] has provided the opportunity to develop advanced technology enablers to support new inter-organisational collaboration models for the creation of virtual enterprises. The ComVantage project (EU FP7 Integrated Project "Collaborative Manufacturing Network for Competitive Advantage": www.comvantage.eu) explores the capabilities of Linked Data (LD) as a flexible and rapidly unifying way to provide access to the data vaults of all stakeholders of a virtual enterprise by creating a product-centred collaboration space. However, the almost unlimited openness and flexibility of Linked Data may also involve disadvantages. Industrial applications require reliability, security and stability. Thus, they need to keep control over the process of data manipulation. In fact, version control is an essential requirement for them to adopt this technology.

Section 2 of this paper states the need for version control systems in the Semantic Web and provides an overview of related work and our contributions. In section 3, the version control concept of R43ples is presented. Section 4 describes the prototypical implementation. Section 5 evaluates the concept and gives some metrics of the implementation. Section 6 discusses the concept further before the paper is concluded with an outlook on possible enhancements.

===2. BACKGROUND===
====2.1 Linked Data====
Linked Data is a set of best practices for modelling and interconnecting information in a widely accepted semantic way. It is becoming more and more important in the world of Linked Open Data. It uses the Resource Description Framework (RDF) as the base model. RDF handles information as a semantic network of single statements consisting of subject, predicate and object. LD information entities are referenced by URIs. A named graph is a collection of RDF statements grouped together and identified by a URI. Named graphs are a kind of transformation of quads (a triple with a fourth element).

SPARQL (SPARQL Protocol And RDF Query Language) is the dominant query language in the Semantic Web. It uses a graph-based matching mechanism with powerful filter and aggregating functionality and additional support for named graphs. Linked Data might also be a useful technology for industrial environments [6]. This requires controlled write mechanisms to the Linked Data cloud as stated by Berners-Lee and O'Hara [2]. Version control could be one way to achieve such a controlled read-write mechanism.

====2.2 Version Control for Triples====
The major function of version control systems is to record changes in the information model in order to get back to a prior version when needed. Furthermore, version control makes it possible to merge changes of different authors into one common information base. Obviously, this functionality is not only needed for software engineering but for data in general. This includes Linked Data, which has a special demand for version control because of its very open nature and the number of possible contributors to a data set.

Current version control systems are usually either text-based (changes can be localised in lines) or completely binary (no localisation of changes possible). However, Linked Data is graph-based, so the existing systems do not provide the localisation mechanisms that are necessary for merging revisions. Additionally, one can differentiate between distributed systems and central systems. In a central system like Subversion (http://subversion.apache.org/) the whole repository is stored on a central server and the clients have local working copies. In distributed systems like Git (http://git-scm.com/), every client holds the full repository and can re-synchronise with other clients.

====2.3 Related Work====
=====2.3.1 Model Versioning=====
There has been a lot of previous work on versioning of models. For example, Watkins and Nicole [16] started with an ontology for modelling the provenance of documents defining a set of meta information for versioning. Taentzer et al. [12] distinguish between state-based and operation-based versioning systems, which have different mechanisms for conflict detection and handling. However, although versioning of models is a key technique in model driven engineering, it is not supported by a widely accepted concept. Most models described in the literature use entities with identifiers and don't rely on any order in a collection. Thus, they can be easily handled as graphs, which fits the base model in the Semantic Web.

=====2.3.2 Temporal RDF=====
Another interesting approach which allows tracking of information over time in Linked Data is the use of temporal RDF suggested by [7]. However, we think that versioning has the advantage over time labelling that related changes are bundled in a semantic way and not only by the same time stamp. Furthermore, no query language for temporal RDF is available that has good compatibility with SPARQL.

=====2.3.3 Semantic Web Versioning=====
Most authors who handle version control systems for the Semantic Web follow an operation-based approach which relies on specific operations and is thus not well integrated into the current Semantic Web environment. Auer and Herre [1] base their concept on atomic changes to RDF graphs which are annotated in reified statements (http://www.w3.org/TR/rdf-mt/#Reif) of the original data. The approach of Cassidy and Ballantine [3] uses context information in order to store information about patches. The changed triples, on the other hand, are modelled as reified statements. Im et al. [8] use a delta-based approach for versioning RDF triples and introduce an aggregated delta approach which leverages the construction of a version by storing additional deltas not only to the prior version but to all other versions.

Some Semantic Web applications support synchronising between different users, e.g. OntoWiki Mobile [4]. This is close to a version control system. However, this feature is deeply integrated into the specific application and its stack.

The concept of Vander Sande et al. [13], based on [14], for version control seems to meet almost all requirements for the Semantic Web. Unfortunately, only parts of the system are modelled semantically, e.g. other parts may use hash tables to get relations between revisions and difference sets. Furthermore, the distributed nature of Git is not utilised despite the promising title of the article.

====2.4 Contributions====
R43ples offers a completely semantic approach for versioning RDF data sets in named graphs and accessing them via SPARQL. The concept is based partly on the work of Vander Sande et al. [13]. However, our approach has no need for additional languages since we use the SPARQL 1.1 features for updating data. This can be done by adding a few keywords to SPARQL. Furthermore, we propose a model of revision information describing both commits and changes in a purely semantic way using named graphs instead of additional look-up tables. Finally, we provide a performance evaluation of a prototypical implementation.

===3. CONCEPT===
====3.1 Graph Based Version Control====
We use a central repository since no local working copy in a traditional sense can be checked out in the Semantic Web and held on the client. The complete graph could be extremely large and every piece of information is potentially connected with other information spread over the global Linked Data cloud. This also excludes conventional Lock-Modify-Unlock mechanisms, since they would imply that the whole network has to be locked. Thus we use a Copy-Modify-Merge mechanism where clients get their information via SPARQL (copy), work with this in their local memory (modify) and commit their updates to the server via SPARQL again (merge). This makes it possible for users to keep on working with the well-known SPARQL interface while providing fast and flexible revision management.

R43ples handles version control on a graph level and not the instance level. Thus, a specific version of a whole named graph is the unit under version control. It is stored as a revision which can be queried and used as a base for further changes. Unlike in file-based systems (e.g. Subversion or Git) where a revision contains a set of files representing a specific point in time, a revision in R43ples contains only one single named graph.

====3.2 Semantic Revision Model====
=====3.2.1 Data Model of Revisions=====
The whole approach uses semantics in order to avoid hidden meanings which make it hard for other clients to access the information. Thus, revisions are modelled as Linked Data. The data model uses PROV-O [15] as base ontology and is extended by some attributes. The vocabulary is called Revision Management Ontology (RMO). Figure 1 shows an excerpt of a graph revision model, with one commit generating a new revision for a specific named graph (marked in grey). The revisions are linked to the named graph http://test (via the property rmo:revisionOf) and contain a revision number (rmo:revisionNumber) for a simple human-friendly representation. The property prov:wasDerivedFrom connects two revisions and describes the revision graph. The commit between two revisions is modelled as a standard prov:Activity connected via prov:used and prov:generated attributes. It holds meta information about commit time (prov:atTime), commit message (dcterms:title) and the actor committing the changes (prov:wasAssociatedWith).

Figure 1: Data Model of a revision graph with ontology RMO

=====3.2.2 Naming Graphs for Storing Revisions=====
The named graph with the URI of the revisioned graph holds the MASTER revision representing the terminal revision of the default branch in the revision graph. The information about other revisions and their connections and further revisioned graphs is stored in an additional named graph for each revisioned graph, the revision graph. All revision control systems have to provide information about all revisions while keeping the required storage manageable. Since "97,3% of the entire data in each version remains unchanged" [8] it is necessary to compress this data. Delta-based storage is the approach of choice here. According to [10] RDF triples are the smallest unit of change and are thus the basis for calculating the differences as deltas between revisions. The differences between revisions are again a set of triples and can be stored in additional named graphs. Every revision consists of one ADD set and one DELETE set assigned with the properties rmo:deltaAdded and rmo:deltaRemoved. Applying these delta sets to the prior revision will lead to the current revision.

=====3.2.3 Tags and Branches=====
The R43ples approach supports tags as references to specific revisions via the property rmo:references (as shown in figure 2). They are of type rmo:Tag and have a unique name (rmo:tagName) as well as a description (rdfs:description). Similarly, different branches are supported by allowing different successors of one revision via prov:wasDerivedFrom. Each terminal revision of the generated branches is referenced by a rmo:Branch entity. The rmo:Master is a subclass pointing to the default graph. All these references point to copies of a full graph of this revision via the rmo:fullGraph property.

Figure 2: Model of master, branches and tags

The centralised approach of R43ples can easily achieve the necessary uniqueness of the revision numbers. The revision numbers can follow different schemes, for example just ordinals or using a hash. We decided on a more complex naming scheme which indicates the position of a revision in the graph. For the system these are just strings providing a human-friendly identifier without semantic meaning (although the revision number, not shown in figure 2, is also part of the URI, e.g. "3.1-22"). The users need to be able to retrieve the whole revision graph including the numbers of the revisions. With R43ples it is possible to receive this information like any other data via SPARQL queries directly on the revision graph.

====3.3 Dynamic Handling of Revisions====
=====3.3.1 Querying Revisions=====
Information from the MASTER revision is instantly available since the whole data set exists in the specified named graph. It is used when the client does not specify a revision. Therefore, it is likely that it will be accessed very often.

However, other revisions must be generated dynamically as only the delta information is stored between two revisions. With respect to the revision to be generated, all triples of the add set must be added to the previous revision and all triples of the delete set must be removed from the previous revision. R43ples accepts slightly enhanced SPARQL queries which allow the revision number to be added for each specified graph in the SPARQL query. For each named graph <math>g</math> specified in a query, a temporary graph <math>TG_{g,r}</math> is generated for the specified revision <math>r</math> according to equation 1 (<math>g_x</math> = full materialised revision <math>x</math> of graph <math>g</math>):

:<math>TG_{g,r} = g_{\mathit{nearestBranch}} + \sum_{\text{revision } i=r}^{\mathit{nearestBranch}} \left(\mathit{deleteSet}_{g,i} - \mathit{addSet}_{g,i}\right) \qquad (1)</math>

This simple formula can be mapped to a series of SPARQL queries as presented in the pseudo code below. It first creates a temporary graph merging all change sets. Afterwards it rewrites the query so it uses this new temporary graph instead of the specified one. The result of the SPARQL query on that graph is returned after cleaning up the temporary graph.

 def select_query(query):
     for (graph, rev_g) in query.graphs_and_revs():
         # start from a copy of the full graph of the nearest branch
         sparql("COPY GRAPH <full graph of nearest branch> TO GRAPH <graph-rev_g>")
         for rev in graph.path_to_revision(rev_g):
             # undo each revision on the path (equation 1)
             sparql("REMOVE GRAPH <add set of rev> FROM GRAPH <graph-rev_g>")
             sparql("ADD GRAPH <delete set of rev> TO GRAPH <graph-rev_g>")
         # point the query at the temporary graph
         query.replace(graph, "graph-rev_g")
     result = sparql(query)
     for (graph, rev_g) in query.graphs_and_revs():
         sparql("DROP GRAPH <graph-rev_g>")
     return result

When considering the merging of revisions, it does not matter which previous revision is used to generate the merged revision due to the properties of SPARQL. An INSERT statement of an existing triple does not insert it a second time and a DELETE statement of a non-existing triple does not end in an error message. The add set <math>A</math> and delete set <math>D</math> of a revision with the set of triples <math>R_m</math> merged from revisions with sets of triples <math>R_1</math> and <math>R_2</math> must comply with the rules from equations 2 and 3.

:<math>A = (R_m \setminus R_1) \cup (R_m \setminus R_2) \qquad (2)</math>

:<math>D = (R_1 \setminus R_m) \cup (R_2 \setminus R_m) \qquad (3)</math>

=====3.3.2 Updating Revisions=====
Clients update revisions via the established SPARQL UPDATE command. This updates the revision graph with a new revision node which references the new change sets. The changes are reflected both in the new add and delete sets and in the updated full graph. However, updates can only be performed on the terminal revision of a branch.

If a client wants to update a revision which is not referenced by a branch, the commit is rejected. The client has to merge its local changes with the most recent information of the branch. Merging is the application of two different change sets to one entity. If the local merge is possible, the client can recommit these merged changes. The other option is to explicitly create a new branch for the local changes.

The client cannot usually merge if it is unable to reconcile the changes. These conflicts have to be resolved afterwards in order to get a common consolidated data model in the revision control system. Thus, the changes have to be combined or one change has to be selected in preference to the other. This is performed via an additional administrator interface on the server.

====3.4 SPARQL extension for R43ples====
In a SPARQL query it has to be possible to determine the revision of the involved named graphs. Furthermore, update queries should contain information about the author and a commit message. Partly, this information could be embedded into the name of the graph. However, we strongly believe that loading identifiers with semantics would be a violation of the basic principles of Linked Data. Another option is to use new keywords or to specify this information as part of the WHERE clause as triple patterns like ?revision rmo:revisionOf ; rmo:revisionNumber "43". However, the latter has the disadvantage that there is no clear distinction between the specification of revision information and the SPARQL query pattern.

We decided to introduce the additional keyword REVISION to SPARQL to add the necessary semantics. Furthermore, the update mechanisms need some meta information about the commit introduced by the keywords USER and MESSAGE. Finally, the creation of tags and branches is solved by the keywords BRANCH and TAG.

In a SELECT query the user can define the revision number by applying the FROM clause with the keyword REVISION. It can be a number representing a revision, a string representing a branch or tag (e.g. "master") or empty. When it is empty or the keyword REVISION is missing, the MASTER revision will be used as default. An exemplary query is shown in listing 1.

 SELECT ?s ?p ?o
 FROM REVISION "43"
 WHERE {
   ?s ?p ?o .
 }

Listing 1: SELECT query for revision 43 of a graph

Updates (INSERT or DELETE queries) can only be executed on a branch specified by the branch name or the number of a revision referenced by a branch. In INSERT and DELETE queries the performing user must first be defined. Therefore the keyword USER is reserved. After the FROM or the INTO clause, respectively, the keyword REVISION identifies the graph revision following the same approach as in a SELECT query. Furthermore, a commit message can be attached following the keyword MESSAGE as shown in listing 2.

 USER
 INSERT DATA INTO REVISION "MASTER" MESSAGE "Small change"
 { . }

Listing 2: Update query building on top of revision 42

The REVISION parameter is necessary for the SPARQL endpoint to check to which branch revision a client wants to apply its changes. If the client wants to update a revision that is not directly referenced by a branch, the server will reject the commit. Then, the client needs to check if its data model is consistent with the new information from a branch revision. If so, it can resubmit its changes, or it can open a new branch if there is a conflict the client is not able to handle. If the branch revision of the server matches that of the client, the server will accept the change and create a new revision with the information provided. Then, the corresponding branch reference will be forwarded to this new revision.

Listing 3 depicts a SPARQL query for generating a new branch. In the example, a new branch is created with the information from revision 42. The same interface is available for creating a tag using the keyword TAG instead of BRANCH.

 USER
 BRANCH REVISION "42" TO "Feature xyz"

Listing 3: Query for branching from revision 42

The clients are kept aware of the recent MASTER revision in every SPARQL response. The HTTP response header is extended by additional fields which specify the current MASTER revision number and the revision number on which the query was executed for every named graph involved. Table 1 describes the construction of the parameter names. All underlined sub strings are replaced with the current named graph under version control. This information is not needed by the client for querying. Yet it provides the new revision number after a commit and is thus very useful for the client.

{| class="wikitable"
|+ Table 1: HTTP header parameters
! HTTP Parameter !! Description
|-
| <u>graph</u>-revision-number || Revision of graph of last query
|-
| <u>graph</u>-revision-number-of-master || Current MASTER revision number of graph
|}

===4. IMPLEMENTATION===
The concept was implemented as proof of concept and its source code is publicly available via GitHub (https://github.com/plt-tud/r43ples). The prototype is realised as a SPARQL proxy rather than a modification of an existing open-source SPARQL endpoint. The implementation works as a Java application. Jersey (https://jersey.java.net/) is used as RESTful (Representational State Transfer) Web service framework and grizzly (https://grizzly.java.net/) as the web server while Virtuoso (http://virtuoso.openlinksw.com/) acts as triple store and SPARQL endpoint. A live demonstration system is running on http://eatld.et.tu-dresden.de:8890/r43ples/sparql.

Figure 3 shows the system structure. If a client wants to use the revision control features of R43ples it has to send the SPARQL queries to R43ples' SPARQL endpoint instead of the triplestore's endpoint. Furthermore, there is an administrator interface which acts as a test bed for functions that don't yet have a proper REST interface. These functions are controlled by a command line interface and perform complex management of the graphs under version control.

Figure 3: System Structure

R43ples stores no information about the revisions itself but uses a configured triplestore which is accessed by the triple store interface. The communication is based on SPARQL queries. To ensure the integrity of the data, only the SPARQL proxy should have access to the different graphs which it creates. Access rights are defined in the triple store. The clients need to know if the endpoint supports R43ples features in addition to standard SPARQL. Hence, R43ples copies the SPARQL 1.1 Service Description (http://www.w3.org/TR/sparql11-service-description/) of the connected endpoint and adds sd:r43ples as a further sd:Feature.

The implemented proxy SPARQL endpoint can also handle standard SPARQL queries. Of course, this raises the requirement that the revisioned graph shall only be edited by R43ples and its specific queries. Otherwise inconsistencies would be generated. Virtuoso supports such access policies for the SPARQL endpoint, prohibiting write access to the revision graph and all graphs which are related to R43ples.

The generation and update of the version system information is completely implemented with the help of SPARQL queries. R43ples performs a SPARQL update on a temporary copy of the full graph of the specified branch. Afterwards, it retrieves all added triples with the SPARQL query from listing 4, which returns all triples which are in NEW-REV-TEMP but not in LAST-REV. After the same concept has been used for the removed triples, the ADD and DELETE sets are constructed with the help of a SPARQL CONSTRUCT query. Then the new revision information is inserted into the revision graph and the actual full graph is updated with the help of INSERT and DELETE queries.

 CONSTRUCT { ?s ?p ?o }
 WHERE {
   GRAPH <NEW-REV-TEMP> { ?s ?p ?o }
   FILTER NOT EXISTS { GRAPH <LAST-REV> { ?s ?p ?o } }
 }

Listing 4: Get all added triples

The administrator interface offers an additional way of interacting with R43ples for those features which don't have a friendly REST interface yet. Those tasks are currently:

* Put an existing graph under revision management
* Import a new graph under version control
* Generate visualisation of the revision graph (yEd export)
* Set a new MASTER revision
* Merge two revisions

The admin interface currently supports turtle serialisation (http://www.w3.org/2007/02/turtle/primer/) for the export and import of RDF data. The visualisation of the revisions, their connections and branches is done by creating a GraphML file which can be viewed with yEd (http://www.yworks.com/de/products_yed_about.html).

The merging feature is still under construction while we are investigating different approaches for a user-friendly interface.

===5. EVALUATION===
====5.1 Response Time====
An important metric for evaluating the usability of this concept is the response time of the service for R43ples queries in various configurations. Therefore, we have measured the time between the request sent by the client and the response received using Apache jMeter (http://jmeter.apache.org/). We evaluated the operation time of R43ples in a complex setup on a 4 GB RAM system running Virtuoso 7 as SPARQL endpoint connected to R43ples. We generated random data sets with sizes of 100, 1000, 10000 and 100000 triples. Then we created ten revisions for each data set with changes of 10 to 100 triples. Finally, we measured the response time for a simple SPARQL query (querying all triples and limiting them to ten results) dependent on all data sets, all revisions and all different change sizes. The measurement was repeated 20 times to capture random effects such as computing load.

Figure 4 presents some results showing the response time in comparison to variations of the three variables around a specific setup (1000 triples in the data set, going back five commits into the past with 50 triples changing in every commit). The left plot shows that the response time increases linearly with the number of commits plus a constant bias of some milliseconds. The response time also seems to grow almost linearly with the size of the commit, as suggested by the middle plot. Even the size of the data set has a linear influence (note the logarithmic scale in the right plot).

Figure 4: R43ples response time in comparison to the revision path length (left), the change size of the single commits (middle), and the size of the data set (right)

A deeper analysis shows that the structure of the data set is not significant. The overhead for querying a revision that is available as full graph is about 10 ms in comparison to a direct SPARQL query and is thus almost negligible. However, if the revision has to be generated by R43ples, the dominant factors are the overall size of changes to be reversed and the size of the data set. Equation 4 lists a simple linear model which almost exactly reflects these findings (<math>R^2 = 0.98</math>) with the variables <math>T</math> as R43ples response time in milliseconds, <math>S_{DS}</math> as data set size, <math>S_C</math> as change size and <math>P</math> as path length to a full graph revision. Thus, in many applications <math>T</math> would be of order <math>O(S_{DS})</math>.

:<math>T = 100 + 0.06 \cdot S_{DS} + 0.7 \cdot (P \cdot S_C) \qquad (4)</math>

The results make sense since the algorithm has to duplicate the graph and then apply all changes. Both efforts are proportional to the size. As a minor result, R43ples can perform few revisions with big changes in each revision step better than lots of small changes, assuming that the overall number of changed triples is the same. Furthermore, UPDATE query time increases linearly with the size of the committed change set.

====5.2 Storage====
The costs for a new revision <math>S_{\Delta,\mathrm{Revision}}</math> (in additional triples) are almost proportional to the size of changes and independent of the complexity of previous revisions and the revision graph (<math>S_{\Delta,\mathrm{Revision}} = S_C + 12</math>). The additional fixed triples in the revision graph (six for a revision; six for the commit) are negligible. The creation of a branch or a tag copies the full graph besides the addition of a fixed number of triples (<math>S_{\Delta,\mathrm{Tag}} = S_{DS} + 11</math>; <math>S_{\Delta,\mathrm{Branch}} = S_{DS} + 15</math>).

===6. DISCUSSION===
Although the approach presented here solves most of the versioning problems, there are also some drawbacks.

Named graphs are used extensively, mainly for storing differences between revisions. This means that the use of named graphs for other purposes cannot be guaranteed. Those purposes could be structuring of information, access control or additional provenance information. One might ask if we need an additional context attribute as a fifth element explicitly declared for revision control.

The concept is fully transparent for SPARQL clients which are not aware of the R43ples version control system. They can use the prototype as a common SPARQL endpoint without additional features, always working on the master revision. Clients can easily check if an endpoint supports R43ples queries by evaluating the service description of the endpoint.

Clients will usually work on MASTER or other branches in order to get the most recent information. However, there could be situations when clients should continuously work on a specific revision of a graph. Then, this revision of the graph should be tagged in order to store a full copy. A possible solution would be the automatic detection of such frequently used revisions and triggering of tag generation.

Another drawback is the lack of support for blank nodes in the current implementation. One cannot assume that blank nodes from different graphs with the same blank node identifiers are the same. For example, the blank nodes in the change sets are not equal to the ones in the full graph, prohibiting a correct application of the changes when generating an old revision. This can of course be solved by a prior Skolemization which should ideally be performed by the client or could also be achieved by an enhanced version of R43ples before executing a SPARQL query.

Currently, the generation of an uncached revision follows a simple approach, applying all changes from the first successor until reaching the leaf of a branch. However, if there are many tags in the revision graph, it could be more efficient to use another revision path to generate this revision. Thus, one has to solve a shortest-path problem.

Another point of discussion is the way of transferring the necessary additional information. Currently, the R43ples SPARQL server transports the MASTER revision as well as the relevant revisions of all involved graphs in the HTTP header. On the other hand, the R43ples clients transport information about the graph revisions in the HTTP body. An alternative would be to transfer both kinds of information in the HTTP body and thus on the same level. This would need an extension of the SPARQL result model.

The integration of version control into the existing Semantic Web tool environment is not easy. A basic requirement is that these tools don't work on a file basis but on a triplestore with SPARQL interface. Under these circumstances it would be no big problem to exchange the SPARQL interface with the slightly enhanced R43ples interface.

The performance of the prototype limits the application to medium-sized data sets. Queries on data sets with more than a few thousand triples take longer than most users are willing to wait. This can be solved by splitting large data sets into smaller ones and by directly implementing the concept into the SPARQL endpoint, which should improve performance considerably. Another promising approach we are currently working on is the use of enhanced SPARQL rewriting in order to perform the query taking into account the full graph and all change sets in one request. Hence, the generation of the whole graph for the specified revision, which takes a long time for big data sets, is not necessary.

Finally, security is a crucial point for all industrial applications. We rely on the adaptable security mechanisms of existing triple stores and SPARQL endpoints. These should only provide information about the revision tree and the revisioned data sets to authenticated and authorised users. This could be achieved for example by the approach suggested by [11, 5].

===7. CONCLUSIONS===
We have presented a concept for a semantic revision control system for Linked Data which uses the capabilities of SPARQL. The implemented prototype works well for querying cached graphs. The generation of uncached graphs is sufficient for small to medium-sized data sets. The advantage of our approach is that it is completely based on semantics and thus the information about revisions can be retrieved via SPARQL. Furthermore, SPARQL is used as access mechanism with only slight adaptations in order to ensure the semantic use of revision information while keeping the query compatible with standard SPARQL.

However, this concept still needs further research. Our next steps will involve investigating different merging approaches and an intensive consideration of how this concept can be integrated into existing tools.

===8. ACKNOWLEDGEMENTS===
This research was partly funded by the European Commission under grant number 284928 (ComVantage).

===9. REFERENCES===
[1] S. Auer and H. Herre. A versioning and evolution framework for RDF knowledge bases. In Perspectives of Systems Informatics, pages 55–69. Springer, 2007.

[2] T. Berners-Lee and K. O'Hara. The read-write linked data web.

[7] C. Gutierrez, C. Hurtado, and A. Vaisman. Introducing time into RDF. IEEE Transactions on Knowledge and Data Engineering, 19(2):207–218, Feb. 2007.

[8] D.-H. Im, S.-W. Lee, and H.-J. Kim. A version management framework for RDF triple stores. International Journal of Software Engineering and Knowledge Engineering, 22(01):85–106, Feb. 2012.

[9] J. Murdock, C. Buckner, and C. Allen. Containing the semantic explosion. In Proceedings of PhiloWeb, Lyon, 2012.

[10] D. Ognyanov and A. Kiryakov. Tracking changes in RDF(S) repositories. In Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web, pages 373–378. Springer, 2002.

[11] P. Ortiz, O. Lazaro, M. Uriarte, and M. Carnerero. Enhanced multi-domain access-control for secure mobile collaboration through linked data cloud in manufacturing. In Proceedings of IEEE World of Wireless Mobile and Multimedia Networks (WoWMoM) conference 2013, 2013.

[12] G. Taentzer, C. Ermel, P. Langer, and M. Wimmer. A fundamental approach to model versioning based on graph modifications: from theory to implementation. Software & Systems Modeling, pages 1–34, 2012.

[13] M. Vander Sande, P. Colpaert, R. Verborgh, S. Coppens, E. Mannens, and R. Van de Walle. R&Wbase: git for triples. In Proceedings of the 6th Workshop on Linked Data on the Web, 2013.

[14] M. Völkel and T. Groza. SemVersion: an RDF-based ontology versioning system. In Proceedings of the IADIS international conference WWW/Internet, volume 2006, pages 195–202. Citeseer, 2006.

[15] W3C. PROV-O: the PROV ontology, Apr. 2013.

[16] E. R. Watkins and D. A. Nicole. Version control in online software repositories. In Proceedings of the 2005 International Conference on Software Engineering Research and Practice, volume 2, pages 550–556, 2005.
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1987):20120513–20120513, Feb. 2013. [3] S. Cassidy and J. Ballantine. Version control for RDF triple stores. ICSOFT (ISDM/EHST/DC), 7:5–12, 2007. [4] T. Ermilov, N. Heino, S. Tramp, and S. Auer. Ontowiki mobile–knowledge management in your pocket. In The Semantic Web: Research and Applications, page 185–199. Springer, 2011. [5] M. Graube, P. Ortiz, M. Carnerero, O. Lazaro, M. Uriarte, and L. Urbas. Flexibility vs. security in linked enterprise data access control graphs. In Proc. of 9th IEEE Int. Conf. on Information Assurance and Security, 2013. [6] M. Graube, J. Pfeffer, J. Ziegler, and L. Urbas. Linked data as integrating technology for industrial data. International Journal of Distributed Systems and Technologies (IJDST), 3(3):40–52, 2012.
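
Note on the shortest-path problem raised in the Discussion: choosing the cheapest way to generate an uncached revision can be framed as a multi-source shortest-path search over the revision graph. The following Python sketch illustrates this idea and is not part of the R43ples prototype. It rests on several assumptions that the paper does not state: every stored full copy (MASTER, branch leaves, tags) may serve as a starting point, every change set can be applied in either direction, and the cost of applying a change set equals its number of triples. The function name, the revision identifiers, and the sizes are hypothetical.

```python
import heapq


def cheapest_revision_path(change_sets, full_copies, target):
    """Find the cheapest way to rebuild `target` from a stored full copy.

    change_sets: iterable of (revision_a, revision_b, size) tuples, one per
                 change set linking two neighbouring revisions (assumed to be
                 applicable in both directions).
    full_copies: revisions stored as full graphs (MASTER, branches, tags).
    target:      the revision to generate.
    Returns (total_cost, path) or None if the target is unreachable.
    """
    # Undirected adjacency list: revision -> [(neighbour, change size)].
    adjacency = {}
    for rev_a, rev_b, size in change_sets:
        adjacency.setdefault(rev_a, []).append((rev_b, size))
        adjacency.setdefault(rev_b, []).append((rev_a, size))

    # Multi-source Dijkstra: every full copy starts with cost 0.
    best = {rev: 0 for rev in full_copies}
    previous = {}
    queue = [(0, rev) for rev in full_copies]
    heapq.heapify(queue)

    while queue:
        cost, rev = heapq.heappop(queue)
        if cost > best.get(rev, float("inf")):
            continue  # outdated queue entry
        if rev == target:
            # Walk back to the full copy the search started from.
            path = [rev]
            while path[-1] in previous:
                path.append(previous[path[-1]])
            return cost, path[::-1]
        for neighbour, size in adjacency.get(rev, []):
            new_cost = cost + size
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = rev
                heapq.heappush(queue, (new_cost, neighbour))
    return None


# Linear history 1 -> 2 -> 3 -> 4 -> 5; revision 5 is MASTER, revision 2 is tagged.
change_sets = [("1", "2", 10), ("2", "3", 5), ("3", "4", 20), ("4", "5", 8)]
result = cheapest_revision_path(change_sets, {"2", "5"}, "3")
```

In this example, starting from the tag at revision 2 costs 5 triples, whereas walking back from MASTER would cost 28, so the search prefers the tag. A different cost model, for instance one that counts SPARQL requests instead of triples, would only change the edge weights.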