<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Provenance Information in the Web of Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olaf Hartig</string-name>
          <email>hartig@informatik.hu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Humboldt-Universit a ̈t zu Berlin Department of Computer Science Berlin</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <volume>20</volume>
      <issue>2009</issue>
      <abstract>
        <p>The openness of the Web and the ease to combine linked data from different sources creates new challenges. Systems that consume linked data must evaluate quality and trustworthiness of the data. A common approach for data quality assessment is the analysis of provenance information. For this reason, this paper discusses provenance of data on the Web and proposes a suitable provenance model. While traditional provenance research usually addresses the creation of data, our provenance model also represents data access, a dimension of provenance that is particularly relevant in the context of Web data. Based on our model we identify options to obtain provenance information and we raise open questions concerning the publication of provenance-related metadata for linked data on the Web.</p>
      </abstract>
      <kwd-group>
        <kwd>Provenance</kwd>
        <kwd>Lineage</kwd>
        <kwd>Web Data</kwd>
        <kwd>Web of Data</kwd>
        <kwd>Linked Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Today, a large amount of RDF data is published on the
Web; large datasets are interlinked; new applications emerge
that utilize this data in novel and innovative ways. An
upcoming challenge that has to be addressed in these
applications is the evaluation of qualities of the data retrieved from
the Web, qualities such as accuracy, timeliness, reliability,
and trustworthiness.</p>
      <p>
        A recent study shows that one of the main factors that
influence the trust of users in Web content is provenance [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
Thus, a common approach for data quality assessment is
the analysis of provenance information. “Information about
provenance constitutes the proof of correctness [...] and [...]
determines the quality and amount of trust [...]” [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]
      </p>
      <p>
        Provenance information about a data item is information
about the history of the item, starting from its creation,
including information about its origins. Tan [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]
distinguishes two granularities of provenance: workflow (or
coarsegrained) provenance and data (or fine-grained) provenance.
Workflow provenance represents “the entire history of the
derivation of the final output of” [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] a workflow. Davidson
et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] provide an overview of provenance in workflow
systems. Data provenance, in contrast, provides a more
detailed view on the derivation of single pieces of data. A
particular area of research on data provenance is provenance in
databases which considers the provenance of query results.
In this context, Buneman et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] distinguish why- and
where-provenance: why-provenance represents the origins
that were involved in calculating a single entry of a query
result; where-provenance refers to the exact locations an
element of a query result entry has been extracted from. Green
et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] additionally introduce how-provenance that, in
contrast to why-provenance, describes how the origins were
involved in the calculation.
      </p>
      <p>
        While a great many approaches exist that represent
provenance [
        <xref ref-type="bibr" rid="ref29 ref30 ref4">4, 29, 30</xref>
        ], none of these explicitly addresses the
characteristics of provenance of data from the Web. Web data
provenance includes the access of data items on the Web,
an information not required in the context of self-contained
systems such as a DBMS or a workflow management system.
      </p>
      <p>In this paper we discuss provenance of Web data. We aim
to provide a base for research on the application of
provenance information to assess qualities of linked data from the
Web. Our main contributions are the following:
• We propose a provenance model that captures both,
information about Web-based data access as well as
information about the creation of data.
• We describe options to obtain provenance information
by accessing metadata on the Web.
• We analyze vocabularies for RDF data that allow to
describe provenance information.
• We identify open questions concerning the publication
of provenance-related metadata for linked data on the
Web.</p>
      <p>This paper is structured as follows. First, Section 2 reviews
related work. In Section 3 we introduce our provenance
model for Web data. A discussion of options to obtain
provenance information is given in Section 4. Finally, Section 5
raises open questions and Section 6 concludes this paper.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Representing and analyzing provenance is a topic of
research since many years [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Simmhan et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] provide
a taxonomy of provenance characteristics. The authors
differentiate between data-oriented approaches and
process-oriented approaches. While data-oriented approaches focus
on data items, process-oriented approaches emphasize
information about the processes that consume and generate the
data. Due to its level of abstraction our provenance model
can be used as a basis for both types of approaches as well
as for hybrid approaches.
      </p>
      <p>
        An approach to model provenance on a more detailed level
is the Open Provenance Model introduced by Moreau et
al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Similar to our model, the authors distinguish three
types of pieces of provenance information: artifacts,
processes, and agents. The Open Provenance Model represents
provenance by graphs. The nodes in these graph represent
the artifacts, processes, and agents. The edges are directed
and they have a predefined semantic depending on the type
of the adjacent nodes. For instance, an edge that connects a
process with an agent means the process was controlled by
the agent. Some edges can be annotated with a use
casespecific role. Due to its more detailed representation the
Open Provenance Model can be used to realize the
description of parts of a provenance graph that complies with our,
more abstract model.
      </p>
      <p>
        Bunemann et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] raise several open questions for data
provenance in the age of the Web. The authors identify three
main issues: i) obtaining provenance information, ii) citing
components of a digital library such as (components of) a
document in another context, and iii) ensuring integrity of
citations under the assumption that cited databases evolve.
We address the first issue by discussing options to obtain
provenance information in Section 4.1.
      </p>
      <p>
        Harth et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] argue for a provenance model for the Web
that includes a “social dimension to associate provenance
with the originator (typically a person) of a given piece of
information.” Given such a model it is possible to embed
provenance-based quality assessments in the social context
of users. We agree to the authors’ request. With our
provenance model we encourage to represent human actors and
their relation to data items.
      </p>
      <p>
        A more technical notion of provenance is represented by
Ding et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] who understand the provenance of RDF
data as the RDF graphs of which parts of an analyzed RDF
graph has been derived from. The authors argue that
tracking complete RDF graphs is too coarse-grained and that a
representation on the level of single RDF statements is
unsuitable, too. For this reason, Ding et al. introduce RDF
molecules as the finest sub-graphs that can be generated by
a lossless decomposition of an RDF graph. Our provenance
model represents data items on an abstract level. Thus,
actual applications may use any level of granularity: RDF
graphs, statements, or RDF molecules.
      </p>
      <p>
        Hausenblas et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] touch another aspect of Web data
provenance. The authors distinguish sources of Web data
based on the way these sources represent RDF data. Sources
may contain RDF data in a non-serialized form (e.g.
inmemory, in-store) or arbitrary data in a serialized form.
Sources with serialized data may be i) RDF model-compliant
and standalone, ii) RDF model-compliant and embedded, or
iii) non-compliant to RDF model. Our provenance model
also differentiates between the data itself and its
representation in a document.
      </p>
      <p>
        Another approach that considers provenance in the
context of the Semantic Web has been developed in the
Inference Web project. Da Silva et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] describe a provenance
infrastructure that supports “the extraction, maintenance
and usage of knowledge provenance related to answers of
web applications and services.” The term knowledge
provenance refers to information about the origin of knowledge
and about the reasoning processes used to produce answers.
In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] the authors present the Proof Markup Language to
describe justifications for results of an answering engine or
a reasoner. A formal definition of justifications for
entailments in OWL ontologies is provided by Horridge et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
These justifications may describe the execution of a specific
kind of data creation processes represented by our
provenance model.
3.
      </p>
    </sec>
    <sec id="sec-3">
      <title>A MODEL OF WEB DATA PROVENANCE</title>
      <p>
        Provenance research in the context of databases [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] or in
the context of workflows [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] usually focuses on the creation
of data items such as query results and data products. In
the majority of cases, these approaches apply a notion of the
sources of a data item that is directly related to the creation
process. To represent the provenance of data from the Web
we need an additional dimension. Provenance information
of Web data must comprise the aspect of publishing and
accessing data on the Web. Questions such as who operates
the service that provides a dataset are equally important as
asking for the entity that created the data. For this reason,
we suggest a provenance model for data from the Web that
includes both dimensions, the creation and the access of
data. In this section we describe our model: we introduce
the basic elements, we present the data creation dimension,
and we describe the representation of data access.
3.1
      </p>
    </sec>
    <sec id="sec-4">
      <title>Basics of the Provenance Model</title>
      <p>
        Provenance information can be used for various purposes.
Possible uses are the estimation of data quality, the tracing
of audit trails of data, the repetition of data derivations, the
determination of liabilities, and the discovery of data [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ].
The main purpose of our provenance model is to support
the assessment of data qualities such as accuracy, reliability,
and timeliness.
      </p>
      <p>We propose to describe the provenance of a specific data
item from the Web (e.g. a specific RDF graph or RDF
statement) by a provenance graph. The nodes of
provenance graphs are provenance element s that represent pieces
of the provenance information about the data, pieces such
as the actual creator of a specific dataset. Our provenance
model identifies different types of provenance elements and
it describes the relationships between these types and, thus,
between the possible provenance elements in a provenance
graph. Since provenance information for a data item may
comprise information about source data a provenance graph
for the data item may contain subgraphs that describe the
provenance of the source data. Thus, our understanding of a
data item does not only include RDF graphs and RDF
statements but it also covers typical source data such as workflow
results and database entries (Table 3 in Appendix B lists
further examples for data items).</p>
      <p>We broadly distinguish three types of provenance elements:
actors, executions, and artifacts. An actor usually performs
the execution of an action or a process which – in most
cases – yields an artifact such as a specific dataset. An
execution may include the use of artifacts which, in turn, might
be the result of another execution. Furthermore, direct
relationships between artifacts as well as between actors may
exist. For instance, a specific company is responsible for its
Web server.</p>
      <p>Our model defines different element types as
specializations of actors, executions, and artifacts. For example, data
creators are a specific kind of actors and RDF graphs are
a specific kind of artifacts. Each provenance element in a
provenance graph corresponds to at least one of these types.
The edges in a provenance graph correspond to the
relationships between element types of the adjacent provenance
elements.</p>
      <p>All provenance elements have attributes that represent
provenance-specific information about the elements. For
instance, a specific data creator may have a name and may
work for a well-known organization. The actual attributes
and their extent depend on the element type and the needs
in the application scenario. For some element types,
however, all conceivable provenance elements will likely have
attributes of the same kind. We call these attributes
universal attributes and associate them with the corresponding
element types in our model.</p>
      <p>Please notice, to cover a broad variety of applications we
designed our provenance model with generality in mind.
Different applications of our model may have different needs
with respect to the amount and granularity of the
represented provenance information. For this reason, the
elements of our model abstract from actual use cases; we do
not suggest a specific implementation of provenance graphs
nor do we prescribe attributes that must be used for specific
provenance elements. However, we give examples for various
provenance element types in Appendix B.
3.2</p>
    </sec>
    <sec id="sec-5">
      <title>Data Creation</title>
      <p>The creation of a data item can be a complex process such
as executing a sophisticated workflow. Thus, provenance
information about a data item may include a comprehensive
description of the execution of data creation processes. On
the other hand, the creation of data can be as
straightforward as performing a simple action such as filling out a Web
form. To represent this simple action as provenance of the
derived data a few details may suffice. Furthermore, what
is a simple action in one situation may be considered as a
complex process in another case.</p>
      <p>
        Our provenance model abstracts from the different notions
of data creation. Figure 1 depicts the relationships of all
element types that cover the data creation dimension. The
central element type is the data creation. Data creations
represent the execution of actions or processes that create new
data items. Thus, in the provenance graph of a specific data
item actual data creations are represented by provenance
elements of the data creation type. All data creations have a
creation time and use a method (cf. the universal attributes
in Figure 1). Examples for data creation methods are the
aforementioned completion of a Web form as well as the
execution of a workflow, of a query, of a transformation, or of
a reasoning process. The existence of further,
provenancerelated attributes of a data creation and the granularity of
these attributes depend on the specific creation method and
on the application of the provenance model. For instance,
the execution of a query could be associated with a
representation of how-provenance [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]; the inference of data could
be represented with a justification [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ][
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>Provenance elements that are associated with a data
creation are the created data item, data creators, source data,
and creation guidelines. Data creator s are actors that
perform the data creation. Our model distinguishes non-human
and human data creators. Non-human creators are data
creating devices such as sensors and data creating services such
as software agents, reasoners, query engines, or workflow
engines. Human data creators, called data creating entities, are
persons, groups, organizations, etc. Data creating entities
may create the data directly – as in the Web form example –
or they are responsible for a non-human data creator that
creates the data. A provenance-relevant attribute of all data
creating entities is their relation to the created data. Further
attributes depend on the actual provenance element and on
the implementation of the provenance model. For example,
a data creating service may implement a specific algorithm
and, usually, has a version number and a developer.</p>
      <p>A data creator often makes use of source data to create
new data. Examples for source data are the content of a
document used for machine learning, the statements in a
knowledge base used to entail a new statement, and the
entries in a database used to answer a query. The
granularity by which a provenance graph represents source data
depends on the use case. Notice, not every data creation
uses source data as indicated by the dashed connection in
Figure 1. However, all source data has provenance which
should be represented as a subgraph in the provenance graph
of the created data. Thus, a provenance graph may
recursively contain subgraphs for source data, for the sources of
source data, and so on.</p>
      <p>Further input artifacts that may be associated with a data
creation are the creation guidelines which guided the
execution of the data creation. Examples for creation guidelines
are transformation rules, mapping definitions, entailment
rules, and database queries.
3.3</p>
    </sec>
    <sec id="sec-6">
      <title>Data Access</title>
      <p>A system that uses Web data must access this data from
a provider on the Web. Information about this process
and about the providers is important for a representation
of provenance that aims to support the assessment of data
qualities. Hence, our provenance model introduces element
types that give attention to the data access dimension.
Figure 2 depicts the main element types and their relationships.</p>
      <p>
        Data published on the Web is embedded in a host
artifact, usually a document. Following the terminology of the
W3C Technical Architecture Group we call this artifact an
information resource [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Each information resource has a
type, e.g., it is an RDF document or an HTML document.
      </p>
      <p>
        A system, the data accessor, retrieves information
resources from a provider. Our provenance model allows a detailed
representation of providers by distinguishing data providing
services, data publishers, and service providers. A data
providing service is a non-human actor – usually a Web service
or a server – that processes data access requests and actually
sends the information resource over the Web.
Provenancerelated attributes of data providing services may be a
description of the software that realizes the service. Notice, a
data providing service in a provenance graph may also be a
data creating service. For instance, a D2R server [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] creates
RDF data from a relational database and provides this data
as linked data in RDF documents, in HTML documents, or
as the result of SPARQL queries.
      </p>
      <p>Data publisher s are persons, groups, or organizations that
use a data providing service to publish data on the Web.
Similar to the services, a data publisher may also be the
data creating entity. The service provider element type
represents entities such as a person, a groups, or an organization
that controls a data providing service. Our model introduces
this type because data publishers may publish their data on
platforms that are provided by a third party, the service
provider. Information about this third party may be
relevant as provenance information. However, a data publisher
may administer its own service and, thus, could be the
service provider itself.</p>
      <p>Our provenance model represents the actual execution of
a data access by an element type called data access.
Provenance information that is common to all provenance
elements of this type is the access time and the access method.
A major access method an HTTP-based resource request
where the URI of the requested resource would be another
provenance-related attribute of the corresponding
provenance elements. Additional attributes of data access provenance
elements may represent the content negotiation [17,
Section 12] and possible redirections [17, Section 10.3] that
happened during the data access. Further examples for access
methods are API-based data access and its specialization,
query-based access. Provenance information in these cases
are the API call with its parameters and the issued query,
respectively. Notice, query-based data access usually is a
data creation too.</p>
      <p>Further provenance information not considered so far is
the availability and validity of digital signatures. Since this
information is important to assess the quality of data our
provenance model contains additional element types, namely
intergrity assurances, digital signatures, public keys and
signers (cf. Figure 3).</p>
      <p>
        An intergrity assurance basically represents the
verification of a digital signature for the signed artifact. The
verification requires the public key of the signer. Intergrity
assurances are associated with information about the result
of the verification. Digital signatures have several properties
that are relevant as provenance information, e.g., the date
of issue and the signature scheme [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The provenance
element for a public key could describe the creation and the
expiration date.
      </p>
    </sec>
    <sec id="sec-7">
      <title>OBTAINING PROVENANCE INFORMA</title>
    </sec>
    <sec id="sec-8">
      <title>TION</title>
      <p>A system that applies our provenance model generates
provenance graphs for data items. To create the provenance
elements of such a graph the system has to collect
different pieces of provenance information automatically. In this
section we discuss options to obtain provenance information
for Web data.</p>
      <p>Some pieces of provenance information can be recorded
by a system; for other pieces the system relies on
metadata provided by third parties. Thus, we distinguish
recordable provenance information and metadata-reliant
provenance information. Basically, recordable provenance
information is information on executions that are performed by
the system itself or that can sufficiently be monitored by
the system. Usually, these executions are data accesses
initiated by the system, signature verifications, and local
data creations. Metadata-reliant provenance information,
in contrast, can not be recorded automatically but requires
the evaluation of metadata that is published on the Web.
Metadata-reliant provenance information comprises
information about executions inaccessible to the system as well
as information about actors and artifacts involved in these
executions. Furthermore, obtaining more exhaustive
provenance information about certain actors involved in accessible
executions may also require metadata. For instance, even if
a system can record information about a self-initiated data
access a proper representation of the involved providers
requires metadata.</p>
      <p>
        Recording provenance information is a fundamental topic
of provenance research. For instance, Bose and Frew [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
study scientific workflow management systems that aim to
track provenance for data products; Horridge et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
present concepts to compute justifications for entailments in
ontologies; Tan [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] discusses different approaches to
propagate and to compute provenance of query results in database
systems. The concepts developed in these contexts can be
adapted to generate recordable provenance information in
the context of Web data. On that account, we focus on
metadata-reliant provenance information in the remainder
of this section. We identify methods to access relevant
metadata on the Web, we analyze vocabularies that allow a
representation of provenance-related metadata, and we study
the existence of such metadata on the Web.
4.1
      </p>
    </sec>
    <sec id="sec-9">
      <title>Accessing Provenance-Related Metadata</title>
      <p>Provenance-relevant metadata is either directly attached
to a data item or its host document or it is available as
additional data on the Web. Examples for attached metadata
are RDF statements about an RDF graph that contains the
statements, author and creation date of blog entries added
to a syndication feed, or information about an image
embedded in the image file. Both, attached metadata and detached
metadata, may be represented in RDF using vocabularies as
described in Section 4.2 or it may be data of another form.
In the following, we present options to discover detached
metadata on the Web.</p>
      <p>
        Accessing data on the Web is often based on HTTP URIs.
Since these URIs are grounded in the Domain Name System
(DNS) it is possible to query a WHOIS [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] service in order
to get provenance information about the accessed data item.
However, the responses of WHOIS services are hardly usable
for automatic evaluation because the WHOIS protocol does
not prescribe a standard form to structure returned data.
      </p>
      <p>
        A source of data about the content provided by a Web
server are sitemaps. A sitemap is an XML document that
informs search engine crawlers about URLs on a website.
The semantic sitemap approach [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] extends these
documents with information about the location of RDF data
and about alternative means to access this data (e.g. data
dumps and SPARQL endpoints). Even if the information in
a semantic sitemap is marginally provenance-related an
important element is the specification of a URI that represents
referenced datasets. Given the provider follows the linked
data principles a look-up of these URIs will yield RDF-based
metadata that may describe provenance information about
the datasets.
      </p>
      <p>
        Using linked data, in any case, is an important approach
to discover provenance-related metadata. As with the URI
of an RDF dataset it is possible to look up any HTTP URI
that represents a piece of provenance (e.g. the URI of a
data item such as a named RDF graph [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] or the URI of an
actor). Moreover, collecting provenance information may
involve following RDF links in order to get more complete
information during the generation of provenance graphs.
      </p>
      <p>
        Another method to discover metadata about Web
resources is POWDER [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], the Protocol for Web Description
Resources. POWDER introduces so called description
resources to describe resources on the Web. These
descriptions are either based on RDF data or on simple keywords
(i.e. tags) and they may contain provenance information.
      </p>
      <p>
        A specific kind of actors represented by our provenance
model are Web services that create or provide data.
Different standards exist to describe Web services [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. These
descriptions may also contain information that are relevant
for provenance graphs.
4.2
      </p>
    </sec>
    <sec id="sec-10">
      <title>Provenance-Related Vocabularies</title>
      <p>Various vocabularies exist that allow to describe
provenance information with RDF data. In the following, we
describe these vocabularies and we relate the classes and
properties defined by these vocabularies to the elements of
our provenance model. Afterwards, we study the presence
of the vocabularies in the Web.</p>
      <p>
        A popular standard to represent general-purpose
metadata are the Dublin Core Metadata Terms [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] which are
available as an RDFS Schema. The following properties
defined by this schema can be associated with a resource to
describe provenance information:
• dcterms:contributor1, dcterms:creator – The
contributor of a resource is defined as “an entity
responsible for making contributions to the resource” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and
the creator is “an entity primarily responsible for
making the resource.” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] Thus, these properties may be
used to obtain information about data creators of a
data item. However, the actual type of the referenced
1We use the following namespace prefixes in this paper:
dcterms: http://purl.org/dc/terms/
dc11: http://purl.org/dc/elements/1.1/
foaf: http://xmlns.com/foaf/0.1/
sioc: http://rdfs.org/sioc/ns#
swp: http://www.w3.org/2004/03/trix/swp-2/
wot: http://xmlns.com/wot/0.1/
iwProv: http://inferenceweb.stanford.edu/2006/06/pml-provenance.owl#
ouzo: http://www.mygrid.org.uk/provenance#
cs: http://purl.org/vocab/changeset/schema#
creators as well as their role in the data creation
process remain unclear because the Dublin Core schema
does not distinguish different types of data creators as
our provenance model does. Analyzing data about the
creator (or about the contributor) may yield further
information (e.g. the type) that can help to derive a
more precise provenance graph.
• dcterms:source – The source of a resource is “a
related resource from which the described resource is
derived.” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] With this property it is possible to create
provenance elements associated as source data with a
data creation element.
• dcterms:created – This property specifies the
creation date of a resource and can be used to set the
creation time attribute associated with the execution
of a data creation.
• dcterms:modified – This property specifies the date
on which a resource has been changed. We propose
to represent the modification of a data item as a data
creation which creates a new, modified version of the
original data item. The creation time attribute
associated with this data creation can be set using the
dcterms:modified property.
• dcterms:publisher – The publisher of a resource is
“an entity responsible for making the resource
available” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] This property may be used to obtain
information about a provider of an information resource
whereas the actual type of the provider (data providing
service, data publisher, or service provider) remains
unclear.
• dcterms:provenance – This property links a resource
to “a statement of any changes in ownership and
custody of the resource since its creation that are
significant for its authenticity, integrity, and
interpretation.” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] Due to the very general definition, it is
difficult to use such a provenance statement during the
creation of a provenance graph.
      </p>
      <p>
        The Friend of a Friend (FOAF) vocabulary [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] provides
classes and properties to describe entities such as persons,
organizations, groups and software agents. FOAF-based
descriptions can be used to obtain basic information about
actors (e.g. names, group membership, email addresses,
identifying online accounts). Furthermore, FOAF contains the
property foaf:maker and its inverse foaf:made to relate the
described entities to resources made by the entities. These
properties can be used to identify the data creator of a data
item. However, using these properties raises the same
questions as for the dcterms:creator property.
      </p>
      <p>
        The Semantically-Interlinked Online Communities (SIOC)
ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] describes information from online communities.
The ontology associates SIOC items such as blog posts,
comments, and e-mail messages to users that are identified
by their online accounts. The following properties describe
provenance-relevant information:
• sioc:has creator, sioc:creator of,
sioc:has modifier, sioc:modifier of – These
properties relate a SIOC item to a user who created it
or who modified it. In contrast to the corresponding
Dublin Core and FOAF properties, representing the
referenced users according to our provenance model is
less ambiguous: creators and modifiers referenced in a
SIOC-based description are data creating entities.
• sioc:has owner, sioc:owner of – These properties
express ownership of SIOC items. This information may
provide an indication of the relation of a data publisher
to provided data and, thus, might be used to set the
corresponding attribute of the entity in a provenance
graph.
• sioc:earlier version, sioc:later version,
sioc:next version, sioc:previous version – These
properties relate different versions of a SIOC item with
each other and could be used to create relationships
between artifacts in a provenance graph.
      </p>
      <p>
        The Semantic Web Publishing Vocabulary (SWP) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
enables the description of information about provision of data.
With SWP it is possible to represent the attitude of a
legal person to an RDF graph. SWP supports two attitudes:
claiming the graph is true and quoting the graph without
a comment on its truth. These commitments towards the
truth can be used to derive a data publisher’s or a data
creating entity’s relation to provided or created artifacts.
Furthermore, the SWP allows to describe digests and digital
signatures of RDF graphs and to represent public keys.
Similarly, the Web Of Trust schema (WOT) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] enables
descriptions that document the use of public key cryptography tools
to sign documents. However, two differences between WOT
and the signature-related part of SWP exist. First, digital
signatures in SWP are represented as RDF data; WOT, in
contrast, refers to signatures that are individually encoded
in dedicated documents. Second, while the digital
signatures described with WOT sign information resources, the
signatures in SWP-based descriptions sign a specific kind
of data item, namely RDF graphs. However, both,
SWPbased descriptions as well as WOT-based descriptions, can
be used to obtain information about public key and digital
signatures in order to represent them in a provenance graph.
      </p>
      <p>
        Further vocabularies that can be used to describe
provenance information of specific types of data items are the
following:
• The Ontology Metadata Vocabulary (OMV) [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]
describes ontologies. OMV includes properties for
creators, contributers, reviewers, and creation and
modification dates.
• The Proof Markup Language [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] describes
justifications for results of an answering engine or an inference
process.
• The Changeset Vocabulary [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] describes changes to
      </p>
      <p>
        RDF-based resource descriptions.
• The Ouzo Provenance Ontology [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] describes the run
of a (scientific) workflow, the processed data, and the
entities responsible for the workflow run.
4.3
      </p>
    </sec>
    <sec id="sec-11">
      <title>Existence of Provenance Metadata</title>
      <p>
        To study the existence of metadata on the Web that uses
the aforementioned vocabularies and their
provenance-relevant properties we utilized two Web data indexes available
on the Web, namely the Web service Ping the Semantic
Web (PTSW) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and the Sindice search engine [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. PTSW
is a service that receives notifications from different
applications that create, update, or discover RDF documents on
the Web; PTSW aggregates these notifications and provides
an up-to-date list of existing RDF documents. Furthermore,
the PTSW website provides different statistics. Currently,
PTSW indexes 1,073,2182 RDF documents. For each of the
vocabularies discussed before Table 1 presents the number
of documents registered at PTSW that use this vocabulary.
As the numbers indicate, FOAF and SIOC are widely used.
The other vocabularies are not at all or they are used in an
insignificant number of documents.
      </p>
      <p>The statistics of PTSW may not be representative
because they heavily depend on the applications that notify
PTSW. For this reason, we utilized the Sindice search
engine for another inquiry. Sindice indexes structured data
from the Web. We queried Sindice for documents that
contain RDF statements with the provenance-relevant
properties of the vocabularies. Table 2 in Appendix A lists the
number of documents in the Sindice index for each
property. According to Sindice, Dublin Core is roughly as often
used as FOAF,a conclusion that cannot be drawn from the
PTSW statistics. Furthermore, in contrast to the PTSW
numbers, the Sindice queries reveal that the legacy Dublin
Core metadata elements are used more widely than the
recommended new metadata terms. Consistent with the
findings discovered in the PTSW statistics, the SWP, the Proof
Markup Language, the Ouzo Provenance Ontology, and the
Changeset Vocabulary are not used at all3. Moreover,
considering that Sindice currently indexes about 48.99 million
documents the numbers for the other vocabularies are not
satisfying either. Thus, we conclude that there is only very
little provenance-related, RDF-based metadata available on
the Web.</p>
    </sec>
    <sec id="sec-12">
      <title>OPEN QUESTIONS</title>
      <p>Our analysis of vocabularies that allow to express
provenance information reveals two problems. First, the
vocabularies are partly unsuitable and lack certain features. Our
review of the vocabularies shows a lack of unambiguousness for
certain properties. In particular, it is difficult to distinguish
between the different types of providers and it is
impossible to express the actual relationships between providers of
different types. The same holds for data creators.
Furthermore, it is impossible to describe the execution of a data
access. These descriptions may be required to document
the access of source data executed by a third party.
2All numbers are from February 7, 2009.
3Occurences of 1 to 3 in Table 2 refer to documents that
specify the corresponding properties.</p>
      <p>To overcome these issues we propose to develop a
vocabulary that enables data publishers to describe the provenance
of the provided data more precisely. This new vocabulary
may refine existing vocabularies. Nonetheless, the
development of this new vocabulary should be based on the
presented provenance model.</p>
      <p>
        The second problem is the general lack of
provenancerelated metadata in the Web of linked data. Reasons might
be the lack of suitable vocabularies, a lack of usable tools
to generate provenance-related metadata, and ignorance or
at least a lack of sensitization. The first two reasons can
be ascribed to technical problems that should be solvable
by the development of the proposed vocabulary and by the
implementation of corresponding tools. The lack of
sensitization is a more general problem that must be addressed
by the linked data community. A possible approach may
evolve based on the recently released Vocabulary of
Interlinked Datasets (voiD) [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] which is a vocabulary to
describe the content of RDF-based datasets and the links
between different datasets. Since voiD enables the discovery
and usage of linked datasets it may raise the awareness of
publishers to provide metadata for their datasets. This
understanding may be used to motivate publishers to provide
provenance information along with their voiD descriptions.
6.
      </p>
    </sec>
    <sec id="sec-13">
      <title>CONCLUSIONS</title>
      <p>In this paper we propose a provenance model for Web data
and we discuss options to obtain provenance information. In
contrast to provenance research in areas such as workflows
and databases the analysis of the Web data provenance must
include information about the access of data in the Web.
Thus, our provenance model includes two dimensions: data
creation and data access. By specifying the relationships
of rather general types of pieces of provenance information
our model describes provenance on an abstract level. This
generality gives applications a choice to refine the model
according to their use case.</p>
      <p>Based on our provenance model we describe options to
obtain provenance information and we analyze vocabularies
to express such information. Our analysis identifies several
open questions. We aim to address these questions in the
future.</p>
      <p>As further future work we will develop concepts to
estimate the trustworthiness of Web data based on our
provenance model. These estimations have to consider subjective
assessments of the elements in a provenance graph (e.g. the
reliability of a data creator) as well as the trustworthiness of
the data that has been used to create the provenance graph.
7.</p>
    </sec>
    <sec id="sec-14">
      <title>ACKNOWLEDGMENT</title>
      <p>We thank Jun Zhao for her valuable feedback on our
provenance model.
8.</p>
    </sec>
    <sec id="sec-15">
      <title>APPENDIX A.</title>
    </sec>
    <sec id="sec-16">
      <title>USAGE OF VOCABULARIES</title>
      <p>This appendix details the result of a Sindice-based search
for documents that contain certain provenance-relevant
properties. Table 2 lists each property together with the number
of documents that contain at least one RDF statement with
the property. The numbers have been recorded on February
7, 2009, by querying the Sindice search engine.</p>
    </sec>
    <sec id="sec-17">
      <title>PROVENANCE ITEM TYPES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          . D2R Server - Publishing
          <source>Relational Databases on the Semantic Web. Poster at the 5th International Semantic Web Conference (ISWC)</source>
          ,
          <year>Nov</year>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>U.</given-names>
            <surname>Bojars</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Breslin. SIOC Core Ontology</surname>
          </string-name>
          <string-name>
            <surname>Specification</surname>
          </string-name>
          ,
          <source>Revision</source>
          <volume>1</volume>
          .
          <fpage>30</fpage>
          .
          <string-name>
            <surname>Online</surname>
          </string-name>
          , Jan.
          <year>2009</year>
          . Retrieved Feb.
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>U.</given-names>
            <surname>Bojars</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giasson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Breslin</surname>
          </string-name>
          .
          <article-title>An Architecture to Discover and Query Decentralized RDF Data</article-title>
          .
          <source>In Proceedings of the 3rd Workshop on Scripting for the Semantic Web</source>
          (SFSW)
          <string-name>
            <surname>at</surname>
            <given-names>ESWC</given-names>
          </string-name>
          ,
          <year>June 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bose</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Frew</surname>
          </string-name>
          .
          <article-title>Lineage retrieval for scientific data processing: A survey</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          , Mar.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          . Web Of Trust RDF Ontology. Online, Feb.
          <year>2004</year>
          . Retrieved Feb.
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. Miller. FOAF Vocabulary</given-names>
            <surname>Specification. Online</surname>
          </string-name>
          , Nov.
          <year>2007</year>
          . Retrieved Feb.
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buneman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Data Provenance: Some Basic Issues</article-title>
          .
          <source>In Proceedings of the 20th Conference on Foundations of Software Technology and Theoretical Computer Science (FST TCS)</source>
          . Springer, Dec.
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buneman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Why and Where: A Characterization of Data Provenance</article-title>
          .
          <source>In Proceedings of the 8th International Conference on Database Theory (ICDT)</source>
          . Springer, Jan.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hayes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Stickler. Named Graphs</surname>
          </string-name>
          ,
          <article-title>Provenance and Trust</article-title>
          .
          <source>In Proceedings of the 14th International World Wide Web Conference (WWW )</source>
          . ACM Press, May
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Stenzhorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Delbru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Tummarello. Semantic Sitemaps</surname>
          </string-name>
          :
          <article-title>Efficient and Flexible Access to Datasets on the Semantic Web</article-title>
          .
          <source>In Proceedings of the 5th European Semantic Web Conference (ESWC)</source>
          . Springer,
          <year>June 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P. P.</given-names>
            da
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Fikes</surname>
          </string-name>
          .
          <article-title>A Proof Markup Language for Semantic Web Services</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>31</volume>
          (
          <issue>4-5</issue>
          ):
          <fpage>381</fpage>
          -
          <lpage>395</lpage>
          ,
          <year>June 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P. P.</given-names>
            da
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>McCool</surname>
          </string-name>
          .
          <article-title>Knowledge Provenance Infrastructure</article-title>
          .
          <source>Data Engineering Bulletin</source>
          ,
          <volume>26</volume>
          (
          <issue>4</issue>
          ):
          <fpage>26</fpage>
          -
          <lpage>32</lpage>
          , Dec.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Daigle. WHOIS Protocol</surname>
          </string-name>
          <article-title>Specification</article-title>
          .
          <source>IETF RFC 3912</source>
          ,
          <string-name>
            <surname>Sept</surname>
          </string-name>
          .
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Boulakia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eyal</surname>
          </string-name>
          , B. Lud¨ascher,
          <string-name>
            <surname>T. M. McPhillips</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bowers</surname>
            ,
            <given-names>M. K.</given-names>
          </string-name>
          <string-name>
            <surname>Anand</surname>
            , and
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Freire</surname>
          </string-name>
          .
          <source>Provenance in Scientific Workflow Systems. IEEE Data Engineering Bulletin</source>
          ,
          <volume>30</volume>
          (
          <issue>4</issue>
          ):
          <fpage>44</fpage>
          -
          <lpage>50</lpage>
          , Dec.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P.</surname>
          </string-name>
          da Silva, and
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness. Tracking RDF Graph</surname>
          </string-name>
          <article-title>Provenance using RDF Molecules</article-title>
          .
          <source>Technical Report TR-CS-05-06</source>
          , UMBC, Apr.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Dublin</given-names>
            <surname>Core Metadata Initiative Usage Board. DCMI Metadata</surname>
          </string-name>
          <article-title>Terms</article-title>
          . Online, Jan.
          <year>2008</year>
          . Retrieved Feb.
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fielding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gettys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mogul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Frystyk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <source>Hypertext Transfer Protocol - HTTP/1.1. IETF RFC</source>
          <year>2068</year>
          , Jan.
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gil</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Artz</surname>
          </string-name>
          .
          <article-title>Towards Content Trust of Web Resources</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>227</fpage>
          -
          <lpage>239</lpage>
          , Dec.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Goldwasser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Micali</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rivest</surname>
          </string-name>
          .
          <string-name>
            <given-names>A Digital</given-names>
            <surname>Signature Scheme Secure Against Adaptive</surname>
          </string-name>
          Chosen-Message Attacks.
          <source>SIAM Journal on Computing</source>
          ,
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>281</fpage>
          -
          <lpage>308</lpage>
          , Apr.
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karvounarakis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tannen</surname>
          </string-name>
          . Provenance Semirings.
          <source>In Proceedings of the 26th Symposium on Principles of Database Systems (PODS)</source>
          . ACM,
          <year>June 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          .
          <article-title>Towards a Social Provenance Model for the Web</article-title>
          .
          <source>In Proceedings of the Workshop on Principles of Provenance, Nov</source>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Slany</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ayers</surname>
          </string-name>
          .
          <article-title>A Performance and Scalability Metric for Virtual RDF Graphs</article-title>
          .
          <source>In Proceedings of the 3rd Workshop on Scripting for the Semantic Web</source>
          (SFSW)
          <string-name>
            <surname>at</surname>
            <given-names>ESWC</given-names>
          </string-name>
          ,
          <year>June 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Horridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Laconic and Precise Justifications in OWL</article-title>
          .
          <source>In Proceedings of the 7th International Semantic Web Conference (ISWC)</source>
          . Springer, Oct.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>I.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Walsh</surname>
          </string-name>
          .
          <source>Architecture of the World Wide Web</source>
          , Volume One.
          <source>W3C Recommendation</source>
          , Online, Dec.
          <year>2004</year>
          . Retrieved Feb.
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>L.</given-names>
            <surname>Moreau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Barga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Simmhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Futrelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McGrath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paulson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bowers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ludaescher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kwasnikowska</surname>
          </string-name>
          , J. Van den Bussche, T. Ellkvist,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freire</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          .
          <article-title>The Open Provenance Model</article-title>
          .
          <source>Technical report, Electronics and Computer Science</source>
          , University of Southampton,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>E.</given-names>
            <surname>Oren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Delbru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Catasta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Stenzhorn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Tummarello.</surname>
          </string-name>
          <article-title>Sindice.com: A Document-oriented Lookup Index for Open Linked Data</article-title>
          .
          <source>International Journal of Metadata, Semantics and Ontologies</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>R.</given-names>
            <surname>Palma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase. OMV Ontology</surname>
          </string-name>
          <article-title>Metadata Vocabulary for the Semantic Web</article-title>
          ,
          <year>v2</year>
          .
          <fpage>4</fpage>
          .
          <string-name>
            <surname>Online</surname>
          </string-name>
          , Jan.
          <year>2008</year>
          . Retrieved Feb.
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>K.</given-names>
            <surname>Scheppe</surname>
          </string-name>
          .
          <article-title>Protocol for Web Description Resources (POWDER): Primer</article-title>
          . W3C Working Draft, Online, Nov.
          <year>2008</year>
          . Retrieved Feb.
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Simmhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plale</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Gannon</surname>
          </string-name>
          .
          <article-title>A Survey of Data Provenance in e-Science</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          , Sept.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Tan</surname>
          </string-name>
          . Provenance in Databases: Past, Current, and Future.
          <source>IEEE Data Engineering Bulletin</source>
          ,
          <volume>30</volume>
          (
          <issue>4</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          , Dec.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tunnicliffe</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. Davis. Changeset</given-names>
            <surname>Vocabulary</surname>
          </string-name>
          . Online, Mar.
          <year>2006</year>
          . Retrieved Feb.
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>K.</given-names>
            <surname>Verma</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>Semantically Annotating a Web Service</article-title>
          .
          <source>IEEE Internet Computing</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):
          <fpage>83</fpage>
          -
          <lpage>85</lpage>
          , Mar.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>A conceptual model for e-science provenance</article-title>
          .
          <source>PhD thesis</source>
          , University of Manchester,
          <year>June 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Alexander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          . Vocabulary of Interlinked Datasets. Online, Jan.
          <year>2009</year>
          .
          <article-title>Retrieved Feb. 7. This appendix lists examples for various provenance element types in the data creation dimension (cf. Table 3) and in the data access dimension (cf</article-title>
          . Table 4).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>