<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Linked Data Quality Sept.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Approach for Version Control in the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Markus Graube</string-name>
          <email>markus.graube@tu-</email>
          <email>markus.graube@tudresden.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephan Hensel</string-name>
          <email>stephan.hensel@tu-</email>
          <email>stephan.hensel@tudresden.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leon Urbas</string-name>
          <email>leon.urbas@tu-</email>
          <email>leon.urbas@tudresden.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair for Process Control, Systems Engineering, Technische Universität</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Chair for Process Control, Systems Engineering, Technische Universität</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Chair for Process Control, Systems Engineering, Technische Universität</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>2</volume>
      <issue>2014</issue>
      <abstract>
        <p>For most use cases, the Semantic Web provides essential mechanisms to interlink data in a fast and e cient way. However, it is still not widely accepted in industry since some important features are not mature enough. Requirements include easier model transformation and access to dynamic data. One of the most missing important features is version control which would make it possible to record changes in a way that they can be rolled back at any time. Recent version control system are not very well integrated into the Semantic Web.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>This paper shows a novel way of dealing with version control
for Linked Data. It presents R43ples as an approach using
named graphs to semantically store the di erences between
revisions. Furthermore it allows direct access and
manipulation of revisions with SPARQL. Thus, the access is
almost transparent for the clients which can still use known
SPARQL queries enhanced with some additional keywords.
A prototypical implementation of the system shows a proof
of concept and performance considerations.
enablers to support new inter-organisational collaboration
models for the creation of virtual enterprises. The
ComVantage1 project explores the capabilities of Linked Data
(LD) as a exible and rapidly unifying way to provide access
to the data vaults of all stakeholders of a virtual enterprise
by creating a product-centred collaboration space. However,
the almost unlimited openness and exibility of Linked Data
may also involve disadvantages. Industrial applications
require reliability, security and stability. Thus, they need to
keep control over the process of data manipulation. In fact,
version control is an essential requirement for them to adapt
this technology.</p>
      <p>Section 2 of this paper states the need for version control
systems in the Semantic Web and provides an overview of
related work and our contributions. In section 3, the
version control concept of R43ples is presented. Section 4
describes the prototypical implementation. Section 5 evaluates
the concept and gives some metrics of the implementation.
Section 6 discusses the concept further before the paper is
concluded with an outlook of possible enhancements.</p>
    </sec>
    <sec id="sec-2">
      <title>2.2 Version Control for Triples</title>
      <p>The major function of version control systems is to record
changes in the information model in order to get back to
a prior version when needed. Furthermore, version control
makes it possible to merge changes of di erent authors into
one common information base. Obviously, this
functionality is not only needed for software engineering but for data
in general. This includes Linked Data which has a special
demand for version control because of its very open nature
and the number of possible contributors to a data set.
Current version control systems are usually either text-based
(changes can be localised in lines) or completely binary (no
localisation of changes possible). However, Linked Data
is graph-based and thus in this case the existing systems
don't meet the localisation mechanisms which is necessary
for merging revisions. Additionally, one can di erentiate
between distributed systems and central systems. In a central
system like Subversion2 the whole repository is stored on
a central server and the clients have local working copies.
In distributed systems like Git3, every client holds the full
repository and can re-synchronise with other clients.</p>
    </sec>
    <sec id="sec-3">
      <title>2.3 Related Work</title>
      <sec id="sec-3-1">
        <title>2.3.1 Model Versioning</title>
        <p>
          There has been a lot of previous work on versioning of
models. For example, Watkins and Nicole [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] started with an
ontology for modelling the provenance of documents de
ning a set of meta information for versioning. Taentzer et
al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] distinguish between state-based and operation based
versioning systems which have di erent mechanisms for
conict detecting and handling. However, although versioning
of models is a key technique in model driven engineering, it
is not supported by a widely accepted concept. Most models
described in the literature use entities with identi ers and
don't rely on any order in a collection. Thus, they can be
easily handled as graphs, which ts the base model in the
Semantic Web.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.3.2 Temporal RDF</title>
        <p>
          Another interesting approach which allows tracking of
information over time in Linked Data is the use of temporal RDF
suggested by [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. However, we think that versioninghas the
advantage over time labelling that related changes are
bundled in semantic way and not only by the same time stamp.
Furthermore, there is no query for temporal RDF available
that has a good compatibility with SPARQL.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3.3 Semantic Web Versioning</title>
        <p>
          Most authors who handle version control systems for the
Semantic Web follow an operation based approach which relies
on speci c operations and are thus not well integrated in the
current Semantic Web environment. Auer and Herre [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] base
their concept on atomic changes to RDF graphs which are
annotated in rei ed statements4 of the original data. The
approach of Cassidy and Ballantine [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] uses context
information in order to store information about patches. The
changed triples are on the other hand modelled as rei ed
2http://subversion.apache.org/
3http://git-scm.com/
4http://www.w3.org/TR/rdf-mt/#Reif
statements. Im et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] use a delta-based approach for
versioning RDF triples and introduce an aggregated delta
approach which leverages the construction of a version by
storing additional deltas not only to the prior version but to
all other versions.
        </p>
        <p>
          Some Semantic Web applications support synchronising
between di erent users, e.g. OntoWiki Mobile [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. This is close
to a version control system. However, this feature is deeply
integrated into the speci c application and its stack.
The concept of Vander Sande et. al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], based on [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], for
version control seems to meet almost all requirements for
the Semantic Web. Unfortunately, only parts of the system
are modelled semantically, e.g. other parts may use hash
tables to get relations between revisions and di erence sets.
Furthermore, the distributed nature of Git is not utilised
despite of the promising title of the article.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2.4 Contributions</title>
      <p>
        R43ples o ers a completely semantic approach for
versioning RDF data sets in named graphs and accessing them via
SPARQL. The concept is based partly on the work of Vander
Sande et. al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, our approach has no need for
additional languages since we use the SPARQL 1.1 features
for updating data. This can be done with adding a few
keywords to SPARQL. Furthermore, we propose a model
of revision information describing both commits as well as
changes in a purely semantic way using named graphs
instead of additional look-up tables. Finally, we provide a
performance evaluation of a prototypical implementation.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. CONCEPT</title>
    </sec>
    <sec id="sec-6">
      <title>3.1 Graph Based Version Control</title>
      <p>We use a central repository since no local working copy in
a traditional sense can be checked out in the Semantic Web
and held on the client. The complete graph could be
extremly large and every piece of information is potentially
connected with other information spread over the global
Linked Data cloud. This also excludes conventional
LockModify-Unlock mechanisms. This would imply that the
whole network has to be locked. Thus we use a
CopyModify-Merge mechanism where clients get their
information via SPARQL (copy), work with this in their local
memory (modify) and commit their updates to the server via
SPARQL again (merge). This makes it possible for users
to keep on working with the well-known SPARQL interface
while providing fast and exible revisions management.
R43ples handles version control on a graph level and not the
instance level. Thus, a speci c version of a whole named
graph is the unit under version control. It is stored as a
revision which can be queried and used as a base for further
changes. Unlike in le-based systems (e.g. Subversion or
Git) where a revision contains a set of les representing a
speci c point in time, a revision in R43ples contains only
one single named graph.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2 Semantic Revision Model</title>
      <sec id="sec-7-1">
        <title>3.2.1 Data Model of Revisions</title>
        <p>
          The whole approach uses semantics in order to avoid hidden
meanings which makes it hard for other clients to access the
information. Thus, revisions are modelled as Linked Data.
The data model uses PROV-O [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] as base ontology and is
extended by some attributes. The vocabulary is called
Revision Management Ontology (RMO). Figure 1 shows an
excerpt of a graph revision model, with one commit generating
a new revision for a speci c named graph (marked in grey).
The revisions are linked to the named graph http://test (via
the property rmo:revisionOf ) and contain a revision
number (rmo:revisionNumber ) for a simple human friendly
representation. The property prov:wasDerivedFrom connects
two revisions and describes the revision graph. The commit
between two revisions is modelled as standard prov:Activity
connected via prov:used and prov:generated attributes. It
holds meta information about commit time (prov:atTime),
commit message (dcterms:title) and the actor committing
the changes (prov:wasAssociatedWith).
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>3.2.2 Naming Graphs for Storing Revisions</title>
        <p>
          The named graph with the URI of the revisioned graph holds
the MASTER revision representing the terminal revision of
the default branch in the revision graph. The information
about other revisions and their connections and further
revisioned graphs is stored in an additional named graph for
each revisioned graph called &lt;r43ples-revisions&gt;. All
revision control systems have to provide information of all
revisions while handling the number of storage. Since \97,3%
of the entire data in each version remains unchanged" [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] it
is necessary to compress this data. Delta-based storage is
the approach of choice here. According to [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] RDF triples
are the smallest unit of change and are thus the basis for
calculating the di erences as deltas between revisions. The
di erences of revisions are again a set of triples and can be
stored in additional named graphs. Every revision consists
of one ADD set and one DELETE set assigned with the
properties rmo:deltaAdded and rmo:deltaRemoved.
Applying these delta sets to the prior revision will lead to the
current revision.
        </p>
      </sec>
      <sec id="sec-7-3">
        <title>3.2.3 Tags and Branches</title>
        <p>The R43ples approach supports tags as references to speci c
revisions via the property rmo:references (as shown in
gure 2). They are of type rmo:Tag and have a unique name
(rmo:tagName) as well as a description (rdfs:description).
Similarly, di erent branches are supported by allowing
different successors of one revision via prov:isDerivedFrom. Each
terminal revision of the generated branches is referenced by
a rmo:Branch entity. The rmo:Master is a subclass pointing
to the default graph. All these references point to copies of
a full graph of this revision via rmo:fullGraph property.
The centralised approach of R43ples can easily achieve the
necessary uniqueness of the revision numbers. The revision
numbers can follow di erent schemes, for example just
ordinals or using a hash. We decided for a more complex
naming scheme which indicates the position of a revision in
the graph. For the system these are just strings for
providing a human-friendly identi er without semantic meaning
(although the revision number, not shown in gure 2, is also
part of the URI, e.g. \3.1-22"). The users need to be able to
retrieve the whole revision graph including the numbers of
the revisions. With R43ples it is possible to receive this
information like any other data via SPARQL queries directly
on the revision graph &lt;r43ples-revisions&gt;.
3.3</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Dynamic Handling of Revisions</title>
      <sec id="sec-8-1">
        <title>3.3.1 Querying Revisions</title>
        <p>Information from the MASTER revision is instantly
available since the whole data set exists in the speci ed named
graph. It is used when the client does not specify a revision.
Therefore, it is likely that it will be accessed very often.
However, other revisions must be generated dynamically as
only the delta information is stored between two revisions.
With respect to the revision to be generated, all triples of
the add set must be added to the the previous revision and
all triples of the delete set must be removed from the
previous revision. R43ples accepts slightly enhanced SPARQL
queries which allow to add the revision number for each
speci ed graph in the SPARQL query. For each named graph g
speci ed in a query, a temporary graph T Gg; r is generated
for the speci ed revision r according to equation 1 (gx = full
materialised revision x of graph g):
nearestBranch</p>
        <p>X
revision i=r
T Gg;r = gnearestBranch+
(deleteSetg;i addSetg;i)
This simple formula can be mapped to a series of SPARQL
queries as presented in the pseudo code below. It rsts
creates a graph &lt;graph -rev_g &gt; merging all change sets.
Afterwards it rewrites the query so it uses this new
temporary graph instead of the speci ed one. The result of the
SPARQL query on that graph is returned after cleaning up
the temporary graph.
def select_query(query):
for (graph,rev_g) in query.graphs_and_revs():
sparql("COPY GRAPH &lt;graph &gt; \</p>
        <p>TO GRAPH &lt;graph -rev_g &gt;")
for rev in graph.path_to_revision(rev_g):
sparql("REMOVE GRAPH &lt;rev.add_set_graph &gt;</p>
        <p>FROM GRAPH &lt;graph -rev_g &gt;")
sparql("ADD GRAPH &lt;rev.delete_set_graph &gt;</p>
        <p>TO GRAPH &lt;graph -rev_g &gt;")
query.replace(graph, "graph -rev_g ")
result = sparql(query_string)
for (graph,rev_g) in query.graphs_and_revs():
sparql("DROP GRAPH &lt;graph -rev_g &gt;")
return result
When considering the merging of revisions, it does not
matter which previous revision is used to generate the merged
revision due to the properties of SPARQL. An INSERT
statement of an existing triple does not insert it a second time
and a DELETE statement of a non-existing triple does not
end in an error message. The add set A and delete set D of
a revision with the set of triples Rm merged from revision
with sets of triples R1 and R2 must comply with the rules
from equations 2 and 3.</p>
        <p>A = (RmnR1) [ (RmnR2)
D = (R1nRm) [ (R2nRm)
(1)
(2)</p>
      </sec>
      <sec id="sec-8-2">
        <title>3.3.2 Updating Revisions</title>
        <p>Clients update revisions via the established SPARQL
UPDATE command. This updates the revision graph with a
new revision node which references the new change sets. The
changes are both re ected in the new add and delete sets as
well as in the updated full graph. However, updates can
only be performed on the terminal sibling of a branch.
If a client wants to update a revision which is not referenced
by a branch, the commit is rejected. The client has to merge
its local changes with the most recent information of the
branch. Merging is the application of two di erent change
sets to one entity. If the local merge is possible, the client
can recommit these merged changes. The other option is to
explicitly create a new branch for the local changes.
The client cannot usually merge if it is unable to reconcile
the changes. These con icts have to be resolved afterwards
in order to get a common consolidated data model in the
revision control system. Thus, the changes have to be
combined or one change has to be selected in preference to the
other. This is performed via an additional administrator
interface on the server.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>3.4 SPARQL extension for R43ples</title>
      <p>In a SPARQL query it has to be possible to determine the
revision of the involved named graphs. Furthermore, update
queries should contain information about the author and
a commit message. Partly, this information could be
embedded into the name of the graph. However, we strongly
believe that loading identi ers with semantics would be a
violation of the basic principles of Linked Data. Another
option are new keywords or specifying this information as
part of the WHERE clause as triple patterns like ?revision
rmo:revisionOf &lt;sampleGraph&gt; ; rmo:revisionNumber "43".
However, the latter one has the disadvantage that there is
no clear distinction between the speci cation of revision
information and SPARQL query pattern.</p>
      <p>We decided to introduce the additional keyword REVISION
SELECT ? s ?p ? o
FROM &lt;sampleGraph&gt; REVISION "4 3 "
WHERE f</p>
      <p>? s ?p ? o .
to SPARQL to add the necessary semantic. Furthermore,
the update mechanisms need some meta information about
the commit introduced by the keywords USER and
MESSAGE. Finally, the creation of tags and branches is solved
by the keywords BRANCH and TAG.</p>
      <p>In a SELECT query the user can de ne the revision number
by applying the FROM clause with the keyword REVISION.
It can be a number representing a revision, a string
representing a branch or tag (e.g. \master") or empty. When it
is empty or the keyword REVISION is missing, the
MASTER revision will be used as default. An exemplary query
is shown in listing 1.</p>
      <p>Updates (INSERT or DELETE queries) can only be
executed on a branch speci ed by the branch name or the
number of a revision referenced by a branch. In INSERT and
DELETE queries the performing user must rst be de ned.
Therefore the keyword USER is reserved. After the FROM
respectively the INTO clause the keyword REVISION
identi es the graph revision following the same approach as in
a SELECT query. Furthermore, there could be attached a
commit message following the keyword MESSAGE as shown
in listing 2.</p>
      <p>The REVISION parameter is necessary for the SPARQL
endpoint to check to which branch revision a client wants to
apply its changes. If the client wants to update a revision
that is not directly referenced by a branch, the server will
reject the commit. Then, the client needs to check if its data
model is consistent with the new information from a branch
revision. If so, it can resubmit its changes, or it can open
a new branch if there is a con ict the client is not able to
handle. If the branch revision of the server matches that
of the client, the server will accept the change and create
a new revision with the information provided. Then, the
responding branch reference will be forwarded to this new
revision.</p>
      <p>Listing 3 depicts a SPARQL query for generating a new
branch. In the example, a new branch is created with the
information from revision 42. The same interface is
available for creating a tag using the keyword TAG instead of
BRANCH.</p>
      <p>USER &lt;mgraube&gt;
BRANCH &lt;sampleGraph&gt; REVISION "4 2 " TO "</p>
      <p>F e a t u r e xyz "
Listing 3: Query for branching from revision 42</p>
      <sec id="sec-9-1">
        <title>HTTP Parameter</title>
        <p>graph-revision-number
graph-revision-number-of-master</p>
      </sec>
      <sec id="sec-9-2">
        <title>Description</title>
        <p>Revision of graph of
last query
Current MASTER
revision number of
graph
The clients are kept aware of the recent MASTER revision
in every SPARQL response. The HTTP response header is
extended by additional elds which specify the current
MASTER revision number and the revision number on which the
query was executed for every named graph involved. Table 1
describes the construction of the parameter names. All
underlined sub strings are replaced with the current named
graph under version control. This information is not needed
by the client for querying. Yet it provides the new revision
number after a commit and is thus very useful for the client.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>4. IMPLEMENTATION</title>
      <p>The concept was implemented as proof of concept and its
source code is publicly available via GitHub5. The prototype
is realised as a SPARQL proxy rather than a modi cation of
an existing open-source SPARQL endpoint. The
implementation works as a Java application. Jersey6 is used as
RESTful (Representational State Transfer) Web service framework
and grizzly7 as the web server while Virtuoso8 acts as triple
store and SPARQL endpoint. A live demonstration
system is running on http://eatld.et.tu-dresden.de:8890/
r43ples/sparql.</p>
      <p>Figure 3 shows the system structure. If a client wants to
use the revision control features of R43ples it has to send
the SPARQL queries to R43ples' SPARQL endpoint instead
of the triplestore's endpoint. Furthermore there is an
administrator interface which acts as a test bed for functions
that don't yet have a proper REST interface. These
functions are controlled by a command line interface and perform
complex management of the graphs under version control.
R43ples stores no information about the revisions itself but
5https://github.com/plt-tud/r43ples
6https://jersey.java.net/
7https://grizzly.java.net/
8http://virtuoso.openlinksw.com/
CONSTRUCT f? s ?p ? o g WHERE f</p>
      <p>GRAPH &lt;NEW REV TEMP&gt; f ? s ?p ? o g
FILTER NOT EXISTS f GRAPH &lt;LAST REV&gt; f ? s
?p ? o g g
uses a con gured triplestore which is accessed by the triple
store interface. The communication is based on SPARQL
queries. To ensure the integrity of the data, only the SPARQL
proxy should have access to the di erent graphs which it
creates. Access rights are de ned in the triple store. The clients
need to know if the endpoint supports R43ples features in
addition to standard SPARQL. Hence, R43ples copies the
SPARQL 1.1 Service Description9 of the connected endpoint
and adds sd:r43ples as further sd:Feature.</p>
      <p>The implemented proxy SPARQL endpoint can also handle
standard SPARQL queries. Of course, this raises the
requirement that the revisioned graph shall be only edited by
R43ples and its speci c queries. Otherwise inconsistencies
would be generated. Virtuoso supports such access policies
for the SPARQL endpoint, prohibiting write access to the
&lt;r43ples-revisions&gt; graph and all graphs which are related
to R43ples.</p>
      <p>The generation and update of the version system
information is completely implemented with the help of SPARQL
queries. R43ples performs a SPARQL update on a
temporary copy of the full graph of the speci ed branch.
Afterwards, it retrieves all added triples with the SPARQL
query from listing 4 which returns all triples which are in
NEW-REV-TEMP but not in LAST-REV. After the same
concept was used for the removed triples, the ADD and
DELETE sets are constructed with the help of a SPARQL
CONSTRUCT query. Then the new revision information is
inserted in &lt;r43ples-revisions&gt; and the actual full graph is
updated with the help of INSERT and DELETE queries.
The administrator interface o ers an additional way for
interacting with R43ples for those features which don't have
a friendly REST interface yet. Those tasks are currently:
Put an existing graph under revision management
Import a new graph under version control
Generate visualisation of the revision graph (yEd
export)
Set a new MASTER revision</p>
      <p>Merge two revisions
The admin interface currently supports turtle serialisation10
for the export and import of RDF data. The visualisation
of the revisions, their connections and branches is done by
creating a GraphML le which can be viewed with yEd11.
9http://www.w3.org/TR/sparql11-service-description/
10http://www.w3.org/2007/02/turtle/primer/
11http://www.yworks.com/de/products_yed_about.html
The merging feature is still under construction while we are
investigating di erent approaches for a user friendly
interface.</p>
    </sec>
    <sec id="sec-11">
      <title>5. EVALUATION</title>
    </sec>
    <sec id="sec-12">
      <title>5.1 Response Time</title>
      <p>An important metric for evaluating the usability of this
concept is the response time of the service for R43ples queries
in various con gurations. Therefore, we have measured the
time between the request sent by the client and the response
received using Apache jMeter12. We evaluated the operation
time of R43ples in a complex setup on a 4 GB RAM system
running a Virtuoso 7 as SPARQL endpoint connected to
R43ples. We generated random data sets with sizes of 100,
1000, 10000 and 100000 triples. Then we created ten
revisions for each data set with changes of 10 to 100 triples.
Finally, we measured the response time for a simple SPARQL
query (querying all triples and limiting them to ten results)
dependent on all data sets, all revisions and all di erent
change sizes. The measurement was repeated 20 times to
capture random e ects such as computing load.
Figure 4 presents some results showing the response time
in comparison to variations of the three variables around a
speci c setup (1000 triples in the data set, going back ve
commits into the past with 50 triples changing in every
commit). The left plot shows that the response time increases
linearly with the number of commits plus a constant bias of
some milliseconds. The size of the commit seems also to be
almost linear to the response time as suggested by the
middle plot. Even the size of the data set has linear in uence
(note the logarithmic scale in the right plot).</p>
      <p>A deeper analysis shows that the structure of the data set
is not signi cant. The overhead for querying a revision that
is available as full graph is about 10 ms in comparison to a
direct SPARQL query and is thus almost negligible.
However, if the revision has to be generated by R43ples, the
dominant factors are the overall size of changes to be
reversed and the size of the data set. Equation 4 lists a
simple linear model which almost exactly re ects these ndings
(R2 = 0:98) with the variables T as R43ples response time
in milliseconds, SDS as data set size, SC as change size and
P as path length to a full graph revision. Thus, in many
application T would be of order O(SDS).</p>
      <p>T = 100 + 0:06 SDS + 0:7 (P
SC )
(4)
The results makes sense since the algorithm has to duplicate
the graph and then apply all changes. Both e orts are
proportional to the size. As minor result R43ples can perform
few revisions and big changes in each revision step better
than lots of small changes assuming that the overall
number of changed triples is the same. Furthermore, UPDATE
query time increases linearly with the size of the committed
change set.</p>
    </sec>
    <sec id="sec-13">
      <title>5.2 Storage</title>
      <p>The costs for a new revision S ;Revision (in additional triples)
are almost proportional to the size of changes and
independent from the complexity of previous revisions and the
revision graph (S ;Revision = SC + 12). The additional xed
12http://jmeter.apache.org/
triples in &lt;r43ples-revisions&gt; (six for a revision; six for the
commit) are negligible. The creation of a branch or a tag
copies the full graph besides the addition of a xed number
of triples (S ;Tag = SDS + 11; S ;Branch = SDS + 15).</p>
    </sec>
    <sec id="sec-14">
      <title>6. DISCUSSION</title>
      <p>Although the approach presented here solves most of the
versioning problems, there are also some drawbacks.
Named graphs are used extensively, mainly for storing di
erences between revisions. This means that the use of named
graphs for other purposes cannot be guaranteed. Those
purposes could be structuring of information, access control or
additional provenance information. One might ask if we
need an additional context attribute as a fth element
explicitly declared for revision control.</p>
      <p>The concept is fully transparent for SPARQL clients which
are not aware of the R43ples version control system. They
can use the prototype as common SPARQL endpoint
without additional features always working on the master
revision. Clients can easily check if an endpoint supports
R43ples query by evaluating the service description of the
endpoint.</p>
      <p>Clients will usually work on MASTER or other branches in
order to get the most recent information. However, there
could be situations when clients should continuously work
on a speci c revision of a graph. Then, this revision of the
graph should be tagged in order to store a full copy. A
possible solution would be the automatic detection of such
frequently used revisions and triggering of tag generation.
Another drawback is the lack of support for blank nodes in
the current implementation. You can't assume that blank
nodes from di erent graphs with the same blank node
identi ers are the same. For example, the blank nodes in the
change sets are not equal to the ones in the full graph,
prohibiting a correct application of the changes when
generating an old revision. This can of course be solved by a
prior Skolemization which should ideally be performed by
the client or could also be achieved by an enhanced version
of R43ples before executing a SPARQL query.</p>
      <p>Currently, the generation of uncached revision follows a
simple approach applying all changes from the rst successor
until reaching the leaf of a branch. However, if there are
many tags in the revision graph, it could be more e cient
to use another revision path to generate this revision. Thus,
one has to solve a shortest-path-problem.</p>
      <p>Another point of discussion is the way of transferring the
necessary additional information. Currently, the R43ples
SPARQL server transports the MASTER revision as well as
the relevant revisions of all involved graphs in the HTTP
header. On the other hand, the R43ples clients transport
information about the graph revisions in the HTTP body.
An alternative would be to transfer both information in the
HTTP body and thus on the same level. This would need
an extension of the SPARQL result model.</p>
      <p>The integration of version control into the existing Semantic
Web tool environment is not easy. A basic requirement is
that these tools don't work on a le basis but on a triplestore
with SPARQL interface. Under these circumstances it would
be no big problem to exchange the SPARQL interface with
the slightly enhanced R43ples interface.</p>
      <p>
        The performance of the prototype limits the application to
medium sized data sets. Queries on data sets with more
than a few thousand triples take longer than most users
are willing to wait. This can be solved by splitting large
data sets into smaller ones and by directly implementing the
concept into the SPARQL endpoint which should improve
performance considerably. Another promising approach we
are currently working on is the use of enhanced SPARQL
rewriting in order to perform the query taking into account
the full graph and all change sets in one request Hence, the
generation of the whole graph for the speci ed revision is
not necessary which really takes long time for big datasets.
Finally, security is a crucial point for all industrial
applications. We rely on the adaptable security mechanisms of
existing triple stores and SPARQL endpoints. These should
only provide information about the revision tree and the
revisioned data sets to authenticated and authorised users.
This could be achieved for example by the approach
suggested by [
        <xref ref-type="bibr" rid="ref11 ref5">11, 5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-15">
      <title>7. CONCLUSIONS</title>
      <p>We have presented a concept for a semantic revision
control system for Linked Data which uses the capabilities of
SPARQL. The implemented prototype works well for
querying cached graphs. The generation of uncached graphs is
sufcient for small to medium sized data sets. The advantage
of our approach is that it is completely based on semantics
and thus the information about revisions can be retrieved via
SPARQL. Furthermore, SPARQL is used as access
mechanism with only slight adaptations in order to ensure the
semantic use of revision information while keeping the query
compatible to standard SPARQL.</p>
      <p>However, this concept still needs further research. Our next
steps will involve investigating di erent merging approaches
and an intensive consideration of how this concept can be
integrated into existing tools.</p>
    </sec>
    <sec id="sec-16">
      <title>8. ACKNOWLEDGEMENTS</title>
      <p>This research was partly funded by the European
Commission on the grant number 284928 (ComVantage).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Herre</surname>
          </string-name>
          .
          <article-title>A versioning and evolution framework for RDF knowledge bases</article-title>
          .
          <source>In Perspectives of Systems Informatics, page</source>
          <volume>55</volume>
          {
          <fpage>69</fpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee and K. O'Hara.</surname>
          </string-name>
          <article-title>The read-write linked data web</article-title>
          .
          <source>Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences</source>
          ,
          <volume>371</volume>
          (
          <year>1987</year>
          ):
          <volume>20120513</volume>
          {
          <fpage>20120513</fpage>
          ,
          <string-name>
            <surname>Feb</surname>
          </string-name>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cassidy</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ballantine</surname>
          </string-name>
          .
          <article-title>Version control for RDF triple stores</article-title>
          .
          <source>ICSOFT</source>
          (ISDM/EHST/DC),
          <volume>7</volume>
          :5{
          <fpage>12</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ermilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Heino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tramp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>Ontowiki mobile{knowledge management in your pocket</article-title>
          .
          <source>In The Semantic Web: Research and Applications</source>
          , page
          <volume>185</volume>
          {
          <fpage>199</fpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Graube</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carnerero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lazaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Uriarte</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Urbas</surname>
          </string-name>
          .
          <article-title>Flexibility vs. security in linked enterprise data access control graphs</article-title>
          .
          <source>In Proc. of 9th IEEE Int. Conf. on Information Assurance and Security</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Graube</surname>
          </string-name>
          , J. Pfe er, J. Ziegler, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Urbas</surname>
          </string-name>
          .
          <article-title>Linked data as integrating technology for industrial data</article-title>
          .
          <source>International Journal of Distributed Systems and Technologies (IJDST)</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <volume>40</volume>
          {
          <fpage>52</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hurtado</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaisman</surname>
          </string-name>
          .
          <article-title>Introducing time into RDF</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>19</volume>
          (
          <issue>2</issue>
          ):
          <volume>207</volume>
          {
          <fpage>218</fpage>
          ,
          <string-name>
            <surname>Feb</surname>
          </string-name>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.-H.</given-names>
            <surname>Im</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-W.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and H.
          <string-name>
            <surname>-J. Kim</surname>
          </string-name>
          .
          <article-title>A version management framework for RDF triple stores</article-title>
          .
          <source>International Journal of Software Engineering and Knowledge Engineering</source>
          ,
          <volume>22</volume>
          (
          <issue>01</issue>
          ):
          <volume>85</volume>
          {
          <fpage>106</fpage>
          ,
          <string-name>
            <surname>Feb</surname>
          </string-name>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Murdock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Allen</surname>
          </string-name>
          .
          <article-title>Containing the semantic explosion</article-title>
          .
          <source>In Procedings of PhiloWeb</source>
          , Lyon,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ognyanov</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kiryakov</surname>
          </string-name>
          .
          <article-title>Tracking changes in RDF(S) repositories</article-title>
          .
          <article-title>In Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web</article-title>
          , page
          <volume>373</volume>
          {
          <fpage>378</fpage>
          . Springer,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lazaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Uriarte</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Carnerero</surname>
          </string-name>
          .
          <article-title>Enhanced multi-domain access-control for secure mobile collaboration through linked data cloud in manufacturing</article-title>
          .
          <source>In Proceedings of IEEE World of Wireless Mobile and Multimedia Networks (WoWMoM) conference</source>
          <year>2013</year>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Taentzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ermel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Langer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wimmer</surname>
          </string-name>
          .
          <article-title>A fundamental approach to model versioning based on graph modi cations: from theory to implementation</article-title>
          .
          <source>Software &amp; Systems Modeling, page</source>
          <volume>1</volume>
          {
          <fpage>34</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vander Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coppens</surname>
          </string-name>
          , E. Mannens, and
          <string-name>
            <surname>R. Van de Walle. R</surname>
          </string-name>
          &amp;
          <article-title>Wbase: git for triples</article-title>
          .
          <source>In Proceedings of the 6th Workshop on Linked Data on the Web</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vo</surname>
          </string-name>
          <article-title>lkel and T. Groza. SemVersion: an RDF-based ontology versioning system</article-title>
          .
          <source>In Proceedings of the IADIS international conference WWW/Internet</source>
          , volume
          <volume>2006</volume>
          , pages
          <fpage>195</fpage>
          |
          <fpage>202</fpage>
          .
          <string-name>
            <surname>Citeseer</surname>
          </string-name>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <fpage>W3C</fpage>
          .
          <article-title>PROV-O: the PROV ontology</article-title>
          ,
          <source>Apr</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Watkins</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Nicole</surname>
          </string-name>
          .
          <article-title>Version control in online software repositories</article-title>
          .
          <source>In Proceedings of the 2005 International Conference on Software Engineering Research and Practice</source>
          , volume
          <volume>2</volume>
          , page
          <volume>550</volume>
          {
          <fpage>556</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>