=Paper= {{Paper |id=Vol-1215/paper-03 |storemode=property |title=R43ples: Revisions for Triples - An Approach for Version Control in the Semantic Web |pdfUrl=https://ceur-ws.org/Vol-1215/paper-03.pdf |volume=Vol-1215 |dblpUrl=https://dblp.org/rec/conf/i-semantics/GraubeHU14 }} ==R43ples: Revisions for Triples - An Approach for Version Control in the Semantic Web== https://ceur-ws.org/Vol-1215/paper-03.pdf
                                   R43ples: Revisions for Triples
                        An Approach for Version Control in the Semantic Web
                  Markus Graube                             Stephan Hensel                        Leon Urbas
              Chair for Process Control                 Chair for Process Control           Chair for Process Control
                Systems Engineering                      Systems Engineering,                Systems Engineering,
               Technische Universität                    Technische Universität              Technische Universität
                 Dresden, Germany                         Dresden, Germany                    Dresden, Germany
               markus.graube@tu-                        stephan.hensel@tu-                      leon.urbas@tu-
                  dresden.de                                dresden.de                            dresden.de

ABSTRACT                                                                enablers to support new inter-organisational collaboration
For most use cases, the Semantic Web provides essential                 models for the creation of virtual enterprises. The Com-
mechanisms to interlink data in a fast and efficient way.               Vantage1 project explores the capabilities of Linked Data
However, it is still not widely accepted in industry since some         (LD) as a flexible and rapidly unifying way to provide access
important features are not mature enough. Requirements                  to the data vaults of all stakeholders of a virtual enterprise
include easier model transformation and access to dynamic               by creating a product-centred collaboration space. However,
data. One of the most important missing features is version             the almost unlimited openness and flexibility of Linked Data
control which would make it possible to record changes in               may also involve disadvantages. Industrial applications re-
a way that they can be rolled back at any time. Recent                  quire reliability, security and stability. Thus, they need to
version control systems are not very well integrated into the           keep control over the process of data manipulation. In fact,
Semantic Web.                                                           version control is an essential requirement for them to adopt
                                                                        this technology.
This paper shows a novel way of dealing with version control
for Linked Data. It presents R43ples as an approach using               Section 2 of this paper states the need for version control
named graphs to semantically store the differences between              systems in the Semantic Web and provides an overview of
revisions. Furthermore it allows direct access and manip-               related work and our contributions. In section 3, the ver-
ulation of revisions with SPARQL. Thus, the access is al-               sion control concept of R43ples is presented. Section 4 de-
most transparent for the clients which can still use known              scribes the prototypical implementation. Section 5 evaluates
SPARQL queries enhanced with some additional keywords.                  the concept and gives some metrics of the implementation.
A prototypical implementation of the system shows a proof               Section 6 discusses the concept further before the paper is
of concept and performance considerations.                              concluded with an outlook of possible enhancements.

Categories and Subject Descriptors                                      2. BACKGROUND
H.2.3 [Database Management]: Languages—Query lan-                       2.1 Linked Data
guages; H.3.4 [Information Storage and Retrieval]: Sys-                 Linked Data is a set of best practices for modelling and
tems and Software—Information networks; H.3.5 [Information              interconnecting information in a widely accepted semantic
Storage and Retrieval]: Online Information Services—                    way. It is becoming more and more important in the world
Web-based services                                                      of Linked Open Data. It uses the Resource Description
                                                                        Framework (RDF) as the base model. RDF handles informa-
General Terms                                                           tion as a semantic network of single statements consisting
Linked Data, Versioning, SPARQL, Revision, Query, Named                 of subject, predicate and object. LD information entities
Graphs                                                                  are referenced by URIs. A named graph is a collection of
                                                                        RDF statements grouped together and identified by a URI.
                                                                        Named graphs are a kind of transformation of quads (a triple
1.    INTRODUCTION                                                      with a fourth element). SPARQL (SPARQL Protocol And
The explosion of the Semantic Web in recent years [9] has               RDF Query Language) is the dominant query language in
provided the opportunity to develop advanced technology                 the Semantic Web. It uses a graph-based matching mecha-
                                                                        nism with powerful filter and aggregating functionality and
                                                                        additional support for named graphs. Nevertheless, Linked
                                                                        Data might also be a useful technology for industrial envi-
                                                                        ronments [6]. This requires controlled write mechanisms to
                                                                        the Linked Data cloud as stated by Berners-Lee and O’Hara
                                                                        [2]. Version control could be one way to achieve such a con-
Copyright is held by the author/owner(s).                               trolled read-write mechanism.
LDQ 2014, 1st Workshop on Linked Data Quality Sept. 2, 2014, Leipzig,   1 EU FP7 Integrated Project “Collaborative Manufacturing
Germany.                                                                Network for Competitive Advantage”: www.comvantage.eu
2.2     Version Control for Triples                              statements. Im et al. [8] use a delta-based approach for
The major function of version control systems is to record       versioning RDF triples and introduce an aggregated delta
changes in the information model in order to get back to         approach which leverages the construction of a version by
a prior version when needed. Furthermore, version control        storing additional deltas not only to the prior version but to
makes it possible to merge changes of different authors into     all other versions.
one common information base. Obviously, this functional-
ity is not only needed for software engineering but for data     Some Semantic Web applications support synchronising be-
in general. This includes Linked Data which has a special        tween different users, e.g. OntoWiki Mobile [4]. This is close
demand for version control because of its very open nature       to a version control system. However, this feature is deeply
and the number of possible contributors to a data set.           integrated into the specific application and its stack.

Current version control systems are usually either text-based    The concept of Vander Sande et al. [13], based on [14], for
(changes can be localised in lines) or completely binary (no     version control seems to meet almost all requirements for
localisation of changes possible). However, Linked Data          the Semantic Web. Unfortunately, only parts of the system
is graph-based and thus in this case the existing systems        are modelled semantically, e.g. other parts may use hash
don’t meet the localisation mechanisms which are necessary       tables to get relations between revisions and difference sets.
for merging revisions. Additionally, one can differentiate be-   Furthermore, the distributed nature of Git is not utilised
tween distributed systems and central systems. In a central      despite the promising title of the article.
system like Subversion2 the whole repository is stored on
a central server and the clients have local working copies.      2.4     Contributions
In distributed systems like Git3 , every client holds the full   R43ples offers a completely semantic approach for version-
repository and can re-synchronise with other clients.            ing RDF data sets in named graphs and accessing them via
                                                                 SPARQL. The concept is based partly on the work of Vander
2.3     Related Work                                             Sande et al. [13]. However, our approach has no need for
                                                                 additional languages since we use the SPARQL 1.1 features
2.3.1    Model Versioning                                        for updating data. This can be done by adding a few
There has been a lot of previous work on versioning of mod-      keywords to SPARQL. Furthermore, we propose a model
els. For example, Watkins and Nicole [16] started with an        of revision information describing both commits as well as
ontology for modelling the provenance of documents defin-        changes in a purely semantic way using named graphs in-
ing a set of meta information for versioning. Taentzer et        stead of additional look-up tables. Finally, we provide a
al. [12] distinguish between state-based and operation based     performance evaluation of a prototypical implementation.
versioning systems which have different mechanisms for con-
flict detecting and handling. However, although versioning
of models is a key technique in model driven engineering, it     3. CONCEPT
is not supported by a widely accepted concept. Most models       3.1 Graph Based Version Control
described in the literature use entities with identifiers and    We use a central repository since no local working copy in
don’t rely on any order in a collection. Thus, they can be       a traditional sense can be checked out in the Semantic Web
easily handled as graphs, which fits the base model in the       and held on the client. The complete graph could be ex-
Semantic Web.                                                    tremely large and every piece of information is potentially
                                                                 connected with other information spread over the global
                                                                 Linked Data cloud. This also excludes conventional Lock-
2.3.2    Temporal RDF                                            Modify-Unlock mechanisms. This would imply that the
Another interesting approach which allows tracking of infor-     whole network has to be locked. Thus we use a Copy-
mation over time in Linked Data is the use of temporal RDF       Modify-Merge mechanism where clients get their informa-
suggested by [7]. However, we think that versioning has the      tion via SPARQL (copy), work with this in their local mem-
advantage over time labelling that related changes are bun-      ory (modify) and commit their updates to the server via
dled in a semantic way and not only by the same time stamp.      SPARQL again (merge). This makes it possible for users
Furthermore, there is no query language for temporal RDF         to keep on working with the well-known SPARQL interface
available that has a good compatibility with SPARQL.             while providing fast and flexible revision management.

2.3.3    Semantic Web Versioning                                 R43ples handles version control on a graph level and not the
Most authors who handle version control systems for the Se-      instance level. Thus, a specific version of a whole named
mantic Web follow an operation based approach which relies       graph is the unit under version control. It is stored as a
on specific operations and is thus not well integrated in the    revision which can be queried and used as a base for further
current Semantic Web environment. Auer and Herre [1] base        changes. Unlike in file-based systems (e.g. Subversion or
their concept on atomic changes to RDF graphs which are          Git) where a revision contains a set of files representing a
annotated in reified statements4 of the original data. The       specific point in time, a revision in R43ples contains only
approach of Cassidy and Ballantine [3] uses context infor-       one single named graph.
mation in order to store information about patches. The
changed triples are on the other hand modelled as reified        3.2     Semantic Revision Model
2 http://subversion.apache.org/                                  3.2.1    Data Model of Revisions
3 http://git-scm.com/                                            The whole approach uses semantics in order to avoid hidden
4 http://www.w3.org/TR/rdf-mt/#Reif                              meanings which make it hard for other clients to access the
information. Thus, revisions are modelled as Linked Data.        3.3     Dynamic Handling of Revisions
The data model uses PROV-O [15] as base ontology and is
extended by some attributes. The vocabulary is called Re-        3.3.1    Querying Revisions
vision Management Ontology (RMO). Figure 1 shows an ex-          Information from the MASTER revision is instantly avail-
cerpt of a graph revision model, with one commit generating      able since the whole data set exists in the specified named
a new revision for a specific named graph (marked in grey).      graph. It is used when the client does not specify a revision.
The revisions are linked to the named graph http://test (via     Therefore, it is likely that it will be accessed very often.
the property rmo:revisionOf ) and contain a revision num-
ber (rmo:revisionNumber ) for a simple human friendly rep-       However, other revisions must be generated dynamically as
resentation. The property prov:wasDerivedFrom connects           only the delta information is stored between two revisions.
two revisions and describes the revision graph. The commit       With respect to the revision to be generated, all triples of
between two revisions is modelled as a standard prov:Activity    the add set must be added to the previous revision and
connected via prov:used and prov:generated attributes. It        all triples of the delete set must be removed from the pre-
holds meta information about commit time (prov:atTime),          vious revision. R43ples accepts slightly enhanced SPARQL
commit message (dcterms:title) and the actor committing          queries in which a revision number can be added for each spec-
the changes (prov:wasAssociatedWith).                            ified graph in the SPARQL query. For each named graph g
                                                                 specified in a query, a temporary graph TG_{g,r} is generated
3.2.2    Naming Graphs for Storing Revisions                     for the specified revision r according to equation 1 (g_x = fully
The named graph with the URI of the revisioned graph holds       materialised revision x of graph g):
the MASTER revision representing the terminal revision of
the default branch in the revision graph. The information
about other revisions and their connections and further re-
visioned graphs is stored in an additional named graph for       $TG_{g,r} = g_{\text{nearestBranch}} + \sum_{\text{revision } i=r}^{\text{nearestBranch}} (\text{deleteSet}_{g,i} - \text{addSet}_{g,i})$   (1)
each revisioned graph called . All revi-
sion control systems have to provide information of all re-
visions while keeping the amount of storage manageable. Since “97,3%
of the entire data in each version remains unchanged” [8] it     This simple formula can be mapped to a series of SPARQL
is necessary to compress this data. Delta-based storage is       queries as presented in the pseudo code below. It first cre-
the approach of choice here. According to [10] RDF triples       ates a graph  merging all change sets. Af-
are the smallest unit of change and are thus the basis for       terwards it rewrites the query so it uses this new tempo-
calculating the differences as deltas between revisions. The     rary graph instead of the specified one. The result of the
differences of revisions are again a set of triples and can be   SPARQL query on that graph is returned after cleaning up
stored in additional named graphs. Every revision consists       the temporary graph.
of one ADD set and one DELETE set assigned with the
properties rmo:deltaAdded and rmo:deltaRemoved. Apply-
ing these delta sets to the prior revision will lead to the      def select_query(query):
current revision.                                                    for (graph,rev_g) in query.graphs_and_revs():
                                                                         sparql("COPY GRAPH  \
3.2.3    Tags and Branches                                                      TO GRAPH ")
The R43ples approach supports tags as references to specific             for rev in graph.path_to_revision(rev_g):
revisions via the property rmo:references (as shown in fig-                  sparql("REMOVE GRAPH 
ure 2). They are of type rmo:Tag and have a unique name                              FROM GRAPH ")
(rmo:tagName) as well as a description (rdfs:description).                   sparql("ADD GRAPH 
Similarly, different branches are supported by allowing dif-                         TO GRAPH ")
ferent successors of one revision via prov:wasDerivedFrom. Each          query.replace(graph, "graph -rev_g ")
terminal revision of the generated branches is referenced by         result = sparql(query_string)
a rmo:Branch entity. The rmo:Master is a subclass pointing           for (graph,rev_g) in query.graphs_and_revs():
to the default graph. All these references point to copies of            sparql("DROP GRAPH ")
a full graph of this revision via rmo:fullGraph property.            return result
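
The reconstruction described by equation 1 and the pseudo code above can be illustrated with plain Python sets instead of SPARQL graph operations. The following is a minimal sketch and not the R43ples implementation: triples are tuples, `Revision` and `materialise` are hypothetical names, and it assumes that the deltas to undo are those of all revisions strictly newer than the requested one on the path from the branch head.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """Hypothetical in-memory stand-in for one revision node."""
    number: str
    parent: "Revision | None" = None               # prov:wasDerivedFrom
    add_set: set = field(default_factory=set)      # rmo:deltaAdded
    delete_set: set = field(default_factory=set)   # rmo:deltaRemoved


def materialise(branch_head, full_graph, target_number):
    """Rebuild the triples of `target_number` from the full graph of a branch."""
    graph = set(full_graph)  # work on a copy, like the temporary graph
    rev = branch_head
    while rev is not None and rev.number != target_number:
        # Undo this revision's delta: drop what it added, restore what it deleted.
        graph -= rev.add_set
        graph |= rev.delete_set
        rev = rev.parent
    if rev is None:
        raise KeyError(f"revision {target_number} is not an ancestor of the branch head")
    return graph


# Example history: 0 -> 1 (adds b) -> 2 (adds c, deletes d)
r0 = Revision("0")
r1 = Revision("1", parent=r0, add_set={("s", "p", "b")})
r2 = Revision("2", parent=r1, add_set={("s", "p", "c")}, delete_set={("s", "p", "d")})
head_graph = {("s", "p", "a"), ("s", "p", "b"), ("s", "p", "c")}
revision_0 = materialise(r2, head_graph, "0")
```

Walking backwards from the branch head keeps the MASTER revision cheap to query, at the price of a cost that grows with the number of deltas between the head and the requested revision.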

The centralised approach of R43ples can easily achieve the
necessary uniqueness of the revision numbers. The revision       When considering the merging of revisions, it does not mat-
numbers can follow different schemes, for example just or-       ter which previous revision is used to generate the merged re-
dinals or using a hash. We decided on a more complex             vision due to the properties of SPARQL. An INSERT state-
naming scheme which indicates the position of a revision in      ment of an existing triple does not insert it a second time
the graph. For the system these are just strings for provid-     and a DELETE statement of a non-existing triple does not
ing a human-friendly identifier without semantic meaning         result in an error. The add set A and delete set D of
(although the revision number, not shown in figure 2, is also    a revision with the set of triples R_m merged from revisions
part of the URI, e.g. “3.1-22”). The users need to be able to    with sets of triples R_1 and R_2 must comply with the rules
retrieve the whole revision graph including the numbers of       in equations 2 and 3.
the revisions. With R43ples it is possible to receive this in-
                                                                 $A = (R_m \setminus R_1) \cup (R_m \setminus R_2)$   (2)
formation like any other data via SPARQL queries directly
on the revision graph .                                      $D = (R_1 \setminus R_m) \cup (R_2 \setminus R_m)$   (3)
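
The property behind equations 2 and 3 can be checked with a small sketch that uses Python sets as stand-ins for graphs. The function names are hypothetical, and `apply_delta` only mimics the set semantics of INSERT and DELETE described above; it is not taken from the R43ples code.

```python
def merge_deltas(r1, r2, rm):
    """Return (add_set, delete_set) of a merged revision per equations 2 and 3."""
    add_set = (rm - r1) | (rm - r2)
    delete_set = (r1 - rm) | (r2 - rm)
    return add_set, delete_set


def apply_delta(base, add_set, delete_set):
    """Set semantics: inserting an existing triple or deleting a missing one is a no-op."""
    return (base | add_set) - delete_set


# Two parent revisions and the merged result, with triples abbreviated to strings.
r1 = {"a", "b", "x"}
r2 = {"a", "c", "y"}
rm = {"a", "b", "c"}
add_set, delete_set = merge_deltas(r1, r2, rm)
from_r1 = apply_delta(r1, add_set, delete_set)
from_r2 = apply_delta(r2, add_set, delete_set)
```

Both `from_r1` and `from_r2` equal `rm`, which is why either parent can serve as the starting point when the merged revision is generated.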
                           Figure 1: Data Model of a revision graph with ontology RMO




                                     Figure 2: Model of master, branches and tags


3.3.2    Updating Revisions                                      bined or one change has to be selected in preference to the
Clients update revisions via the established SPARQL UP-          other. This is performed via an additional administrator
DATE command. This updates the revision graph with a             interface on the server.
new revision node which references the new change sets. The
changes are both reflected in the new add and delete sets as
well as in the updated full graph. However, updates can          3.4   SPARQL extension for R43ples
only be performed on the terminal sibling of a branch.           In a SPARQL query it has to be possible to determine the re-
                                                                 vision of the involved named graphs. Furthermore, update
If a client wants to update a revision which is not referenced   queries should contain information about the author and
by a branch, the commit is rejected. The client has to merge     a commit message. Partly, this information could be em-
its local changes with the most recent information of the        bedded into the name of the graph. However, we strongly
branch. Merging is the application of two different change       believe that loading identifiers with semantics would be a
sets to one entity. If the local merge is possible, the client   violation of the basic principles of Linked Data. Another
can recommit these merged changes. The other option is to        option is to add new keywords or to specify this information as
explicitly create a new branch for the local changes.            part of the WHERE clause as triple patterns like ?revision
                                                                 rmo:revisionOf  ; rmo:revisionNumber "43".
The client cannot usually merge if it is unable to reconcile     However, the latter one has the disadvantage that there is
the changes. These conflicts have to be resolved afterwards      no clear distinction between the specification of revision in-
in order to get a common consolidated data model in the          formation and SPARQL query pattern.
revision control system. Thus, the changes have to be com-
                                                                 We decided to introduce the additional keyword REVISION
   SELECT ?s ?p ?o
   FROM  REVISION "43"
   WHERE {
     ?s ?p ?o .
   }

 Listing 1: SELECT query for revision 43 of graph

   USER
   INSERT DATA INTO  REVISION "MASTER" MESSAGE "Small change"
   {   . }

Listing 2: Update query building on top of revision 42

   USER
   BRANCH  REVISION "42" TO "Feature xyz"

Listing 3: Query for branching from revision 42

   HTTP Parameter                    Description
   graph-revision-number             Revision of graph of last query
   graph-revision-number-of-master   Current MASTER revision number of graph

Table 1: HTTP header parameters
                                                                 The clients are kept aware of the recent MASTER revision
                                                                 in every SPARQL response. The HTTP response header is
                                                                 extended by additional fields which specify the current MAS-
to SPARQL to add the necessary semantics. Furthermore,           TER revision number and the revision number on which the
the update mechanisms need some meta information about           query was executed for every named graph involved. Table 1
the commit introduced by the keywords USER and MES-              describes the construction of the parameter names. All un-
SAGE. Finally, the creation of tags and branches is solved       derlined sub strings are replaced with the current named
by the keywords BRANCH and TAG.                                  graph under version control. This information is not needed
                                                                 by the client for querying. Yet it provides the new revision
In a SELECT query the user can define the revision number        number after a commit and is thus very useful for the client.
by applying the FROM clause with the keyword REVISION.
It can be a number representing a revision, a string repre-      4.     IMPLEMENTATION
senting a branch or tag (e.g. “master”) or empty. When it        The concept was implemented as proof of concept and its
is empty or the keyword REVISION is missing, the MAS-            source code is publicly available via GitHub5 . The prototype
TER revision will be used as default. An exemplary query         is realised as a SPARQL proxy rather than a modification of
is shown in listing 1.                                           an existing open-source SPARQL endpoint. The implemen-
                                                                 tation works as a Java application. Jersey6 is used as REST-
Updates (INSERT or DELETE queries) can only be exe-              ful (Representational State Transfer) Web service framework
cuted on a branch specified by the branch name or the num-       and grizzly7 as the web server while Virtuoso8 acts as triple
ber of a revision referenced by a branch. In INSERT and          store and SPARQL endpoint. A live demonstration sys-
DELETE queries the performing user must first be defined.        tem is running on http://eatld.et.tu-dresden.de:8890/
Therefore the keyword USER is reserved. After the FROM           r43ples/sparql.
or the INTO clause, the keyword REVISION iden-
tifies the graph revision following the same approach as in      Figure 3 shows the system structure. If a client wants to
a SELECT query. Furthermore, a commit message can be             use the revision control features of R43ples it has to send
attached following the keyword MESSAGE as shown                  the SPARQL queries to R43ples’ SPARQL endpoint instead
in listing 2.                                                    of the triplestore’s endpoint. Furthermore there is an ad-
                                                                 ministrator interface which acts as a test bed for functions
The REVISION parameter is necessary for the SPARQL               that don’t yet have a proper REST interface. These func-
endpoint to check to which branch revision a client wants to     tions are controlled by a command line interface and perform
apply its changes. If the client wants to update a revision      complex management of the graphs under version control.
that is not directly referenced by a branch, the server will
reject the commit. Then, the client needs to check if its data
model is consistent with the new information from a branch
revision. If so, it can resubmit its changes, or it can open
a new branch if there is a conflict the client is not able to
handle. If the branch revision of the server matches that
of the client, the server will accept the change and create
a new revision with the information provided. Then, the
                                                                                Figure 3: System Structure
corresponding branch reference will be forwarded to this new
revision.
                                                                 R43ples stores no information about the revisions itself but
Listing 3 depicts a SPARQL query for generating a new            5 https://github.com/plt-tud/r43ples
branch. In the example, a new branch is created with the         6 https://jersey.java.net/
information from revision 42. The same interface is avail-       7 https://grizzly.java.net/
able for creating a tag using the keyword TAG instead of         8 http://virtuoso.openlinksw.com/
BRANCH.
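
The acceptance rule for updates described above (a commit is only accepted on a revision referenced by a branch, after which the branch reference is forwarded) can be sketched as follows. This is an illustrative in-memory model under stated assumptions, not the R43ples implementation: `Repository` and `CommitRejected` are hypothetical names, revision numbers come from a plain counter instead of the naming scheme of the paper, and every full graph is kept for simplicity.

```python
class CommitRejected(Exception):
    """Raised when the base revision is not the terminal revision of a branch."""


class Repository:
    """Hypothetical in-memory model of one graph under version control."""

    def __init__(self, triples):
        self.branches = {"master": "0"}          # branch name -> terminal revision
        self.full_graphs = {"0": set(triples)}   # revision number -> full graph
        self.parents = {"0": None}               # revision -> revision it derives from
        self._counter = 0

    def commit(self, base, add_set, delete_set):
        """Apply a change set on top of `base` (branch name or revision number)."""
        # Accept a branch name directly, else look for a branch referencing the number.
        if base in self.branches:
            branch = base
        else:
            branch = next((n for n, rev in self.branches.items() if rev == base), None)
        if branch is None:
            raise CommitRejected(f"revision {base} is not referenced by a branch")
        old = self.branches[branch]
        self._counter += 1
        new = str(self._counter)
        self.full_graphs[new] = (self.full_graphs[old] | set(add_set)) - set(delete_set)
        self.parents[new] = old
        self.branches[branch] = new  # forward the branch reference
        return new


repo = Repository({("s", "p", "a")})
first = repo.commit("master", {("s", "p", "b")}, set())
```

A client whose commit is rejected would fetch the current branch revision, merge locally and resubmit, or open a new branch instead.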
     CONSTRUCT {?s ?p ?o} WHERE {                                  The merging feature is still under construction while we are
       GRAPH  { ?s ?p ?o }                                         investigating different approaches for a user friendly inter-
       FILTER NOT EXISTS { GRAPH  { ?s                             face.
           ?p ?o } }
     }
                                                                   5. EVALUATION
             Listing 4: Get all added triples                      5.1 Response Time
                                                                   An important metric for evaluating the usability of this con-
                                                                   cept is the response time of the service for R43ples queries
                                                                   in various configurations. Therefore, we have measured the
uses a configured triplestore which is accessed by the triple      time between the request sent by the client and the response
store interface. The communication is based on SPARQL              received using Apache jMeter12 . We evaluated the operation
queries. To ensure the integrity of the data, only the SPARQL      time of R43ples in a complex setup on a 4 GB RAM system
proxy should have access to the different graphs which it cre-     running a Virtuoso 7 as SPARQL endpoint connected to
ates. Access rights are defined in the triple store. The clients   R43ples. We generated random data sets with sizes of 100,
need to know if the endpoint supports R43ples features in          1000, 10000 and 100000 triples. Then we created ten revi-
addition to standard SPARQL. Hence, R43ples copies the             sions for each data set with changes of 10 to 100 triples. Fi-
SPARQL 1.1 Service Description9 of the connected endpoint          nally, we measured the response time for a simple SPARQL
and adds sd:r43ples as further sd:Feature.                         query (querying all triples and limiting them to ten results)
                                                                   dependent on all data sets, all revisions and all different
The implemented proxy SPARQL endpoint can also handle              change sizes. The measurement was repeated 20 times to
standard SPARQL queries. Of course, this raises the re-            capture random effects such as computing load.
quirement that the revisioned graph shall be only edited by
R43ples and its specific queries. Otherwise inconsistencies        Figure 4 presents some results showing the response time
would be generated. Virtuoso supports such access policies         in comparison to variations of the three variables around a
for the SPARQL endpoint, prohibiting write access to the           specific setup (1000 triples in the data set, going back five
 graph and all graphs which are related                            commits into the past with 50 triples changing in every com-
to R43ples.
                                                                   mit). The left plot shows that the response time increases
                                                                   linearly with the number of commits plus a constant bias of
The generation and update of the version system informa-
                                                                   some milliseconds. The size of the commit seems also to be
tion is completely implemented with the help of SPARQL
                                                                   almost linear to the response time as suggested by the mid-
queries. R43ples performs a SPARQL update on a tem-
                                                                   dle plot. Even the size of the data set has linear influence
porary copy of the full graph of the specified branch. Af-
                                                                   (note the logarithmic scale in the right plot).
terwards, it retrieves all added triples with the SPARQL
query from listing 4 which returns all triples which are in
                                                                   A deeper analysis shows that the structure of the data set
NEW-REV-TEMP but not in LAST-REV. After the same
                                                                   is not significant. The overhead for querying a revision that
concept was used for the removed triples, the ADD and
                                                                   is available as full graph is about 10 ms in comparison to a
DELETE sets are constructed with the help of a SPARQL
                                                                   direct SPARQL query and is thus almost negligible. How-
CONSTRUCT query. Then the new revision information is
                                                                   ever, if the revision has to be generated by R43ples, the
inserted in  and the actual full graph is
                                                                   dominant factors are the overall size of changes to be re-
updated with the help of INSERT and DELETE queries.
                                                                   versed and the size of the data set. Equation 4 lists a sim-
                                                                   ple linear model which almost exactly reflects these findings
The administrator interface offers an additional way for in-
                                                                   (R2 = 0.98) with the variables T as R43ples response time
teracting with R43ples for those features which don’t have
                                                                   in milliseconds, SDS as data set size, SC as change size and
a friendly REST interface yet. Those tasks are currently:
                                                                   P as path length to a full graph revision. Thus, in many
                                                                   application T would be of order O(SDS ).
     • Put an existing graph under revision management                          T = 100 + 0.06 ∗ SDS + 0.7 ∗ (P ∗ SC )         (4)
     • Import a new graph under version control                    The results makes sense since the algorithm has to duplicate
     • Generate visualisation of the revision graph (yEd ex-       the graph and then apply all changes. Both efforts are pro-
       port)                                                       portional to the size. As minor result R43ples can perform
                                                                   few revisions and big changes in each revision step better
     • Set a new MASTER revision                                   than lots of small changes assuming that the overall num-
                                                                   ber of changed triples is the same. Furthermore, UPDATE
     • Merge two revisions                                         query time increases linearly with the size of the committed
                                                                   change set.
The admin interface currently supports turtle serialisation10
for the export and import of RDF data. The visualisation           5.2      Storage
of the revisions, their connections and branches is done by        The costs for a new revision S∆,Revision (in additional triples)
creating a GraphML file which can be viewed with yEd11 .           are almost proportional to the size of changes and indepen-
9                                                                  dent from the complexity of previous revisions and the re-
   http://www.w3.org/TR/sparql11-service-description/
10                                                                 vision graph (S∆,Revision = SC + 12). The additional fixed
   http://www.w3.org/2007/02/turtle/primer/
11                                                                 12
   http://www.yworks.com/de/products_yed_about.html                     http://jmeter.apache.org/
Figure 4: R43ples response time in comparison to the revision path length (left), the change size of the single
commits (middle), and the size of the data set (right)
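
The linear model of Equation 4 can be explored numerically. The following Python sketch is only an illustration: the function and parameter names are hypothetical and not part of R43ples, the coefficients are copied from Equation 4 and apply only to the measured test system, and the mapping of the two variable terms to "copying the graph" and "applying the changes" is an interpretation of the explanation given in Section 5.1.

```python
def predicted_response_ms(data_set_size: int, change_size: int, path_length: int) -> float:
    """Evaluate the fitted linear model of Equation 4 (result in milliseconds).

    All names are illustrative; only the coefficients come from the paper.
    """
    constant_overhead = 100.0                        # fixed bias in ms
    copy_cost = 0.06 * data_set_size                 # term growing with the data set size S_DS
    replay_cost = 0.7 * (path_length * change_size)  # term growing with P change sets of size S_C
    return constant_overhead + copy_cost + replay_cost


# Reference setup of Figure 4: 1000 triples, five commits back, 50 changed triples per commit.
reference = predicted_response_ms(data_set_size=1000, change_size=50, path_length=5)
print(f"reference setup: about {reference:.0f} ms")

# Same revision path on the largest generated data set (100000 triples).
large = predicted_response_ms(data_set_size=100000, change_size=50, path_length=5)
print(f"large data set: about {large:.0f} ms")
```

For the reference setup the model gives roughly 335 ms. For the data set with 100000 triples it gives roughly 6275 ms, almost all of which comes from the data set term, which matches the remark that the response time is dominated by the data set size in many applications.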


triples in the revision information graph (six for a revision; six for the
commit) are negligible. The creation of a branch or a tag copies the full
graph besides the addition of a fixed number of triples
($S_{\Delta,Tag} = S_{DS} + 11$; $S_{\Delta,Branch} = S_{DS} + 15$).

6.   DISCUSSION
Although the approach presented here solves most of the versioning problems,
there are also some drawbacks.

Named graphs are used extensively, mainly for storing differences between
revisions. This means that the use of named graphs for other purposes cannot
be guaranteed. Those purposes could be structuring of information, access
control or additional provenance information. One might ask if we need an
additional context attribute as a fifth element explicitly declared for
revision control.

The concept is fully transparent for SPARQL clients which are not aware of
the R43ples version control system. They can use the prototype as a common
SPARQL endpoint without additional features, always working on the master
revision. Clients can easily check if an endpoint supports R43ples queries
by evaluating the service description of the endpoint.

Clients will usually work on MASTER or other branches in order to get the
most recent information. However, there could be situations when clients
should continuously work on a specific revision of a graph. Then, this
revision of the graph should be tagged in order to store a full copy. A
possible solution would be the automatic detection of such frequently used
revisions and triggering of tag generation.

Another drawback is the lack of support for blank nodes in the current
implementation. You can’t assume that blank nodes from different graphs with
the same blank node identifiers are the same. For example, the blank nodes
in the change sets are not equal to the ones in the full graph, prohibiting
a correct application of the changes when generating an old revision. This
can of course be solved by a prior Skolemization which should ideally be
performed by the client or could also be achieved by an enhanced version of
R43ples before executing a SPARQL query.

Currently, the generation of an uncached revision follows a simple approach
applying all changes from the first successor until reaching the leaf of a
branch. However, if there are many tags in the revision graph, it could be
more efficient to use another revision path to generate this revision. Thus,
one has to solve a shortest-path problem.

Another point of discussion is the way of transferring the necessary
additional information. Currently, the R43ples SPARQL server transports the
MASTER revision as well as the relevant revisions of all involved graphs in
the HTTP header. On the other hand, the R43ples clients transport
information about the graph revisions in the HTTP body. An alternative would
be to transfer both kinds of information in the HTTP body and thus on the
same level. This would need an extension of the SPARQL result model.

The integration of version control into the existing Semantic Web tool
environment is not easy. A basic requirement is that these tools don’t work
on a file basis but on a triplestore with SPARQL interface. Under these
circumstances it would be no big problem to exchange the SPARQL interface
with the slightly enhanced R43ples interface.

The performance of the prototype limits the application to medium sized data
sets. Queries on data sets with more than a few thousand triples take longer
than most users are willing to wait. This can be solved by splitting large
data sets into smaller ones and by directly implementing the concept into
the SPARQL endpoint which should improve
performance considerably. Another promising approach we are currently
working on is the use of enhanced SPARQL rewriting in order to perform the
query taking into account the full graph and all change sets in one request.
Hence, the generation of the whole graph for the specified revision, which
takes a long time for big data sets, is not necessary.

Finally, security is a crucial point for all industrial applications. We
rely on the adaptable security mechanisms of existing triple stores and
SPARQL endpoints. These should only provide information about the revision
tree and the revisioned data sets to authenticated and authorised users.
This could be achieved for example by the approach suggested by [11, 5].

7.   CONCLUSIONS
We have presented a concept for a semantic revision control system for
Linked Data which uses the capabilities of SPARQL. The implemented prototype
works well for querying cached graphs. The generation of uncached graphs is
sufficient for small to medium sized data sets. The advantage of our
approach is that it is completely based on semantics and thus the
information about revisions can be retrieved via SPARQL. Furthermore, SPARQL
is used as access mechanism with only slight adaptations in order to ensure
the semantic use of revision information while keeping the query compatible
to standard SPARQL.

However, this concept still needs further research. Our next steps will
involve investigating different merging approaches and an intensive
consideration of how this concept can be integrated into existing tools.

8.   ACKNOWLEDGEMENTS
This research was partly funded by the European Commission under grant
number 284928 (ComVantage).

9.   REFERENCES
 [1] S. Auer and H. Herre. A versioning and evolution
     framework for RDF knowledge bases. In Perspectives
     of Systems Informatics, pages 55–69. Springer, 2007.
 [2] T. Berners-Lee and K. O’Hara. The read-write linked
     data web. Philosophical Transactions of the Royal
     Society A: Mathematical, Physical and Engineering
     Sciences, 371(1987):20120513–20120513, Feb. 2013.
 [3] S. Cassidy and J. Ballantine. Version control for RDF
     triple stores. ICSOFT (ISDM/EHST/DC), 7:5–12,
     2007.
 [4] T. Ermilov, N. Heino, S. Tramp, and S. Auer.
     Ontowiki mobile–knowledge management in your
     pocket. In The Semantic Web: Research and
     Applications, pages 185–199. Springer, 2011.
 [5] M. Graube, P. Ortiz, M. Carnerero, O. Lazaro,
     M. Uriarte, and L. Urbas. Flexibility vs. security in
     linked enterprise data access control graphs. In Proc.
     of 9th IEEE Int. Conf. on Information Assurance and
     Security, 2013.
 [6] M. Graube, J. Pfeffer, J. Ziegler, and L. Urbas. Linked
     data as integrating technology for industrial data.
     International Journal of Distributed Systems and
     Technologies (IJDST), 3(3):40–52, 2012.
 [7] C. Gutierrez, C. Hurtado, and A. Vaisman.
     Introducing time into RDF. IEEE Transactions on
     Knowledge and Data Engineering, 19(2):207–218, Feb.
     2007.
 [8] D.-H. Im, S.-W. Lee, and H.-J. Kim. A version
     management framework for RDF triple stores.
     International Journal of Software Engineering and
     Knowledge Engineering, 22(01):85–106, Feb. 2012.
 [9] J. Murdock, C. Buckner, and C. Allen. Containing the
     semantic explosion. In Proceedings of PhiloWeb, Lyon,
     2012.
[10] D. Ognyanov and A. Kiryakov. Tracking changes in
     RDF(S) repositories. In Knowledge Engineering and
     Knowledge Management: Ontologies and the Semantic
     Web, pages 373–378. Springer, 2002.
[11] P. Ortiz, O. Lazaro, M. Uriarte, and M. Carnerero.
     Enhanced multi-domain access-control for secure
     mobile collaboration through linked data cloud in
     manufacturing. In Proceedings of IEEE World of
     Wireless Mobile and Multimedia Networks
     (WoWMoM) conference 2013, 2013.
[12] G. Taentzer, C. Ermel, P. Langer, and M. Wimmer. A
     fundamental approach to model versioning based on
     graph modifications: from theory to implementation.
     Software & Systems Modeling, pages 1–34, 2012.
[13] M. Vander Sande, P. Colpaert, R. Verborgh,
     S. Coppens, E. Mannens, and R. Van de Walle.
     R&Wbase: git for triples. In Proceedings of the 6th
     Workshop on Linked Data on the Web, 2013.
[14] M. Völkel and T. Groza. SemVersion: an RDF-based
     ontology versioning system. In Proceedings of the
     IADIS international conference WWW/Internet,
     volume 2006, pages 195–202. Citeseer, 2006.
[15] W3C. PROV-O: the PROV ontology, Apr. 2013.
[16] E. R. Watkins and D. A. Nicole. Version control in
     online software repositories. In Proceedings of the 2005
     International Conference on Software Engineering
     Research and Practice, volume 2, pages 550–556, 2005.