<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ProvDS: Uncertain Provenance Management over Incomplete Linked Data Streams</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qian Liu</string-name>
        </contrib>
        <aff>Open Distributed Systems, TU Berlin, Germany</aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Data processing in distributed environments often spans heterogeneous systems, which creates the need to exchange provenance information such as how and when data was generated, combined, recombined, and processed. Distributed systems involve multiple participants and data sources, which can produce unreliable or erroneous data. In addition, there may be vast amounts of data to deal with, e.g., in fields such as the Internet of Things (IoT) and Smart Cities. Dynamic, stream-based data processing mechanisms are therefore better suited to these environments. Hence, we propose provenance- and recovery-aware data management techniques that take dynamic, incomplete streams as inputs and simultaneously recover the missing data and compute the provenance over the reconstructed streams. Unlike traditional provenance management techniques, which are applied to complete and static data, our research focuses on dynamic and incomplete heterogeneous data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Problem statement</title>
      <p>
        Provenance provides the knowledge of how a piece of data or a query result was
produced. Provenance is especially important in open environments such as the
IoT, where data can be discovered, modified, or mashed up by arbitrary parties.
In fact, the IoT involves many uncoordinated data producers, which generate
vast amounts of data with high velocity (e.g., from sensors and mobile devices).
This variety of data streams yields very heterogeneous and incomplete data. In
this context, it is crucial to establish trust in and transparency of the data
in order to give the end user reliable insights. Linked Data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] provides means to
integrate static heterogeneous data while Linked Stream Data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] extends these
paradigms to data streams. More specifically, Linked Stream Data allows us to
integrate static knowledge with dynamic data from distributed sources.
      </p>
      <p>The outcomes of the research proposed in this paper will be applicable across
a range of domains and will benefit users handling different data analysis tasks
(e.g., prediction, trend analysis), real-time event stream processing, as well
as stream reasoning over Linked Data. Such environments are vulnerable to
the error propagation problem, since errors cannot be traced without any
knowledge of provenance. In these fields, our approach will allow end users to
mitigate the influence of incomplete or erroneous data on their applications.
Our solution will allow users to better grasp how the data and query results
were produced, which is a key element in establishing transparency and data
governance.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        Various types of provenance information have been formalized semantically in
the Open Provenance Model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In the same context, the W3C PROV model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
has been introduced to standardize a recommendation for the exchange of
provenance over the Web. Such provenance models provide a way to describe where
data originated and how it is processed and propagated.
      </p>
      <p>
        On the systems side, Glavic et al. introduced Perm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a
provenance-aware DBMS able to compute, store, and query relational provenance data. Their
work assumed a strict schema, whereas Linked Data Streams are by definition
schema-free.
      </p>
      <p>
        Provenance of Linked Data is often attached to a dataset descriptor that
is typically embedded in a Vocabulary of Interlinked Datasets (VoID) file [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
The HCLS Community Profile<sup>1</sup> describes in detail what metadata
should be provided by linked datasets. The support for provenance is one reason
for the introduction of named graphs in the latest version of RDF [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Other
approaches, such as nanopublications<sup>2</sup>, extensively use named graphs to make
subsets of linked data referable and to describe pieces of data. Provenance
can also be attached to data through annotation-based techniques. These
techniques assign annotations to each of the triples in a dataset and then track these
annotations as they propagate through either the reasoning or the query processing
pipeline. Formally, these annotated data are represented by algebraic structures
such as commutative semirings [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Theoharis et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed a
comprehensive theoretical foundation for tracing provenance in RDF queries. Provenance
in Linked Data has also been used to determine and propagate trust values [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Zimmermann et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed annotating triples with temporal data and
provenance values referring to the source of the triple. The authors used a
standard triple-oriented data model and included temporal and provenance
annotations. A triple takes the form of a statement (Subject, Predicate, Object,
Annotation), i.e., an N-Quad. Such statements could be stored in any triplestore
supporting N-Quads. A similar approach was described by Udrea et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
where the authors extended RDF with temporal, uncertain, and provenance
annotations. The main focus of this work was to develop a theoretical model to
manage such metadata. The authors also proposed a query language
that allows users to query such metadata. Unlike Zimmermann's solution,
Udrea's solution annotates the predicates with provenance information. In the
same context, Nguyen et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] suggested using a singleton property instead
of RDF reification or named graphs to describe provenance.
      </p>
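      <p>As a purely illustrative sketch (not taken from the cited systems), an annotated statement of the form (Subject, Predicate, Object, Annotation) can be written as one N-Quads line in which the annotation occupies the fourth position. The class name AnnotatedTriple and all example.org IRIs are hypothetical and serve only this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedTriple:
    """A triple plus one annotation term (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str
    annotation: str  # assumed to be an IRI naming the provenance context

    def to_nquad(self) -> str:
        # N-Quads line layout: subject predicate object graph-label, then a dot.
        return (
            f"<{self.subject}> <{self.predicate}> "
            f"<{self.obj}> <{self.annotation}> ."
        )


stmt = AnnotatedTriple(
    "http://example.org/driver1",
    "http://example.org/worksFor",
    "http://example.org/company1",
    "http://example.org/prov/source42",
)
line = stmt.to_nquad()
print(line)
```
]]></preformat>
      <p>The sketch covers IRIs only; literals and blank nodes would need their own serialization rules.</p>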
      <p>The implementations of annotated RDF provenance
approaches described above often do not address "provenance tracing", i.e., how a query result was
constructed. Moreover, these implementations have only been applied to small (around
10 million triples), static, and complete datasets; they focus on inferred triples
but do not aim at reporting provenance evolution (e.g., with provenance
polynomials, which are algebraic structures representing how data was combined) for
SPARQL query results.</p>
      <p><sup>1</sup> http://www.w3.org/TR/hcls-dataset/ <sup>2</sup> http://nanopub.org/wordpress/</p>
      <sec id="sec-2-1">
        <p>
          Wylot et al. in their system TripleProv [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ] introduced techniques to store,
trace, and query provenance information in Linked Data. TripleProv returns a
description of the way the results of an RDF query were derived; specifically,
it explains how particular pieces of data were combined to deliver
the results. The system also allows the user to scope the query execution with
provenance information, as the user can input a provenance specification of the
data to be used to derive the results.
        </p>
        <p>None of the approaches described above specifically targets incomplete Linked
Data Streams. These approaches were designed for static data; therefore, they do
not take into account the dynamics of input streams and do not allow users to
execute continuous queries over such streams. Moreover, due to the employed
storage models (multiple indices and provenance annotations), their performance
deteriorates for dynamic data. In addition, all these techniques assume that data
is complete; to the best of our knowledge, there are no methods to deal with
incomplete or reconstructed dynamic Linked Data.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Research questions</title>
      <p>In this research, we will investigate two questions in provenance management:</p>
      <p>(Q1) How can we enable dynamic provenance tracing in the
context of incomplete Linked Data Streams? This question will
investigate means to deliver continuous provenance to the user, i.e., a dynamic
provenance trace of a continuous query that executes over a long period of
time on dynamic data. The returned provenance trace represents how
particular pieces of data were produced and combined to deliver the results. We
will also introduce methods to track the provenance of the recovery process, i.e.,
to provide recovery-aware provenance for continuous queries over multiple
incomplete Linked Data Streams.</p>
      <p>(Q2) How can we derive probabilistic provenance graphs on
recovered data? Building on the tracking of recovery provenance for
different sources of incomplete Linked Data Streams in Q1, this question will
investigate a methodology to derive probabilistic provenance graphs based on
Linked Data Streams before and after recovery. To achieve this goal, we will
use a moving correlation on different sources of Linked Data Streams and
compute their fractional contributions to the recovered data, i.e., the
fractional provenance of the recovered pieces of data. Afterwards, we will
trace back the provenance graphs of the source elements (the elements from
different sources used in the recovery) and use them to reconstruct a
provenance graph of the recovered pieces of data and to assess the probability of the
reconstructed provenance graph (see Section 6 for details).</p>
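      <p>One possible reading of the moving-correlation idea in Q2 is sketched below; it is a minimal illustration under stated assumptions, not the proposed method. We assume that all streams are aligned and equally sampled, that the correlation is the Pearson coefficient over the most recent window, and that the fractional contribution of a source is its absolute correlation normalized over all sources. The function names and the normalization rule are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def fractional_provenance(target, sources, window):
    """Share of each source stream in the recovery of `target`.

    Assumption: the share is the absolute correlation over the last
    `window` values, normalized so that all shares sum to 1.
    """
    recent_target = target[-window:]
    weights = {
        name: abs(pearson(values[-window:], recent_target))
        for name, values in sources.items()
    }
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: weight / total for name, weight in weights.items()}


# Toy data: S3 is the incomplete stream, the others are candidate sources.
s3_history = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "S1": [2.0, 4.0, 6.0, 8.0],   # rises with S3
    "S2": [4.0, 3.0, 2.0, 1.0],   # falls as S3 rises
    "S4": [1.0, 1.0, 1.0, 1.0],   # constant, carries no information
}
shares = fractional_provenance(s3_history, candidates, window=4)
print(shares)
```
]]></preformat>
      <p>In this toy input, S1 and S2 share the contribution equally and S4 contributes nothing; a real system would recompute the shares as the window slides over the streams.</p>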
    </sec>
    <sec id="sec-4">
      <title>Hypotheses</title>
      <p>
        An accurate provenance of query results over incomplete and recovered data in
dynamic distributed environments can be computed at low cost in terms of
memory consumption and query execution time (bounded complexity).
State-of-the-art research [
        <xref ref-type="bibr" rid="ref14 ref16 ref17">16, 17, 14</xref>
        ] for static data shows a necessary performance overhead
of 20%-30%. Our hypothesis is that the same efficiency is possible for the case
of dynamic data.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Approach</title>
      <p>Our main goal is to efficiently handle incomplete and dynamic data. Such data
has to be recovered online, and the lineage of the recovered data, i.e., the full history of what happened
to the data as it went through diverse processes, has to be
provided. We will also include information on the recovery process in the
provenance description of query results, so that users will be able to know
how recovered pieces of data have influenced the results of their query.</p>
      <p>To achieve this goal, we propose the following research contributions to
answer the two research questions (Q):</p>
      <list list-type="bullet">
        <list-item><p>Continuous provenance polynomial for describing provenance in a dynamic setting (Q1)</p></list-item>
        <list-item><p>Online provenance-aware recovery of incomplete data for recovering incomplete data and tracing the provenance of the recovery process (Q1)</p></list-item>
        <list-item><p>Fractional provenance of recovered incomplete Linked Data Streams for discovering the provenance of data recovered with external recovery techniques (Q2)</p></list-item>
      </list>
      <p>
        Continuous provenance polynomial over Linked Data streams. The
goal of this task is to provide a dynamic provenance trace of continuous queries,
i.e., a continuous provenance polynomial. This task builds on TripleProv [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
generalizing provenance polynomials to incomplete Linked Data Streams. We
will target continuous queries executed over such data. In addition, in these
provenance polynomials, we will include information on the recovery process.
      </p>
      <p>The main challenge of this task relates to dealing with the velocity of the
input data. The provenance polynomial has to satisfy two requirements: i) it has
to be computed efficiently in a continuous fashion along with the execution of
the query, and ii) it has to show how the query execution process evolves over
time.</p>
      <p>Figure 1 illustrates a system that consists of a knowledge base with data on
drivers and the companies they work for (KB = {(driver1, worksFor, company1), (driver2,
worksFor, company1), (driver3, worksFor, company2)}). Each driver is equipped
with a device that streams the driver's position (triples v<sub>n</sub> = (driver<sub>n</sub>, position, location<sub>i</sub>)).
S|in|n denotes the input stream for driver n. Each triple in the stream has its own
provenance: device ID, physical state of the device, configuration parameters,
etc. The provenance of the triple is grouped under a provenance annotation g<sub>j</sub>.
We want to notify two truck drivers working for the same company when they
are within a given distance of each other, e.g., 1 km. We also want to be able to trace back
how the notification was generated. To detect spatial locations, we use a query
T<sub>n</sub> = (?driver<sub>n</sub>, detectedAt, ?location<sub>i</sub>). Let Γ<sub>n</sub> be the result of such a query for
the nth driver. The symbol ⋈ denotes a natural join, which we use to obtain information on the proximity
of the two drivers.</p>
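      <p>[Figure 1: query plan over the input streams S|in|1 and S|in|2 (triples v1 to vn annotated with gA to gD) and the knowledge base KB. The queries T1(S), T2(S), and T3(KB) yield Γ1(gB), Γ2(gC), and Γ4(gX); the join ⋈12 yields Γ3(gB⊗gC), and the join ⋈34 yields Γ5((gB⊗gC)⊗gX). The output stream S|out| carries results such as r1 with provenance (gB⊗gC)⊗gX and r3 with provenance (gA⊗gD)⊗gX.]</p>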
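      <p>The derivation shown in Figure 1 can be mimicked with a small sketch, given here only as an illustration under assumptions. Each intermediate result carries a provenance expression, and a natural join multiplies the expressions of the tuples it combines. The Prov class, the variable names d1, d2, and loc, and the use of an equal location label as a stand-in for the proximity test are all hypothetical simplifications.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prov:
    """Provenance expression kept as a readable string (illustrative only)."""
    label: str

    def __mul__(self, other: "Prov") -> "Prov":
        # The product records that both operands were needed for the result.
        return Prov(f"({self.label}⊗{other.label})")


def natural_join(left, right):
    """Join two lists of (bindings, Prov) pairs on their shared variables."""
    joined = []
    for l_bind, l_prov in left:
        for r_bind, r_prov in right:
            shared = l_bind.keys() & r_bind.keys()
            if all(l_bind[k] == r_bind[k] for k in shared):
                joined.append(({**l_bind, **r_bind}, l_prov * r_prov))
    return joined


# Assumed results of T1 and T2 over the two input streams.
gamma1 = [({"d1": "driver1", "loc": "location7"}, Prov("gB"))]
gamma2 = [
    ({"d2": "driver2", "loc": "location7"}, Prov("gC")),
    ({"d2": "driver2", "loc": "location9"}, Prov("gD")),  # other place, no match
]
# Assumed result of T3 over the knowledge base: drivers of the same company.
gamma4 = [({"d1": "driver1", "d2": "driver2", "company": "company1"}, Prov("gX"))]

gamma3 = natural_join(gamma1, gamma2)  # join of the two stream results
gamma5 = natural_join(gamma3, gamma4)  # join with the knowledge base result

for bindings, prov in gamma5:
    print(bindings["d1"], bindings["d2"], prov.label)
```
]]></preformat>
      <p>A continuous version would re-evaluate these joins per window and would also need a sum operator for results that can be derived in more than one way; both are left out of this sketch.</p>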
      <p>Online provenance-aware recovery of incomplete Linked Data Streams.
The goal of this task is to develop techniques that will allow us to trace detailed
information on the recovery process over incomplete Linked Data Streams. The
core challenges related to this task are: i) how to detect the incompleteness of
the input Linked Data Streams, ii) how to handle the dynamics of the input
Linked Data Streams, and iii) how to devise highly efficient serialization
strategies for continuous provenance information. Furthermore, in a heterogeneous
integrated environment, the information on the recovery process has to be
exchanged between multiple participants to support the accurate derivation of the
final provenance.</p>
      <p>Fractional provenance of incomplete Linked Data streams. The goal
of this task is to determine the fractional provenance of recovered pieces of
data, i.e., computing probabilistic information about pieces of data which have
contributed to the recovery process and their impact on the recovered piece of
data.</p>
      <p>The main challenges of this task are dealing with the velocity of the data
as well as the need to exchange information on fractional provenance between
different participants in a Linked Data Streams environment. The computation
of fractional provenance has to be accurate and efficient. In addition, the
information on fractional provenance has to be exchanged in a compact and efficient
way.</p>
      <p>[Figure: streams S1 to S4 pass through a RECOVERY step; the labels r1,3, r2,3, and r4,3 relate S1, S2, and S4 to S3, and the values v1 to v4 are shown with the contributions c1, c2, and c4.]</p>
      <p>We will evaluate our approaches with respect to three aspects: 1) the extra costs of
provenance tracking and tracing, 2) the adaptability to changing sources<fn><p>Data sources in distributed systems are very often unstable, in two respects: 1) some sources feed data for a period of time and then disappear completely (e.g., because of unexpected abnormalities), but may reconnect after a reset; 2) different sources may generate data at different speeds.</p></fn> of Linked Data
Streams, and 3) the accuracy of our query and recovery provenance tracing
mechanisms. We will use open Linked Data collections and open collections of time
series that contain heterogeneous unstructured data. We will integrate these data
collections into our system as static knowledge and stream data. We will also
design a set of workload queries using static and dynamic data. The collections of
integrated Linked Data and time series data that we will employ in our experiments
are:</p>
      <p>DBpedia<sup>4</sup> is a crowdsourced community effort to extract structured
information from Wikipedia and make this information available on the Web.
The dataset consists of 6.9 billion RDF triples extracted from Wikipedia.
This dataset will be used to evaluate our techniques for provenance computation over
incomplete Linked Data Streams.</p>
      <p>The Web Data Commons project<sup>5</sup> extracts structured data from the
Common Crawl, the largest web corpus available to the public. The dataset
contains more than 20 billion RDF triples extracted from the Web. This
dataset will be used to evaluate the scalability of our techniques with big
data collections of different structure.</p>
      <p>LinkedSensorData<sup>6</sup> is an RDF dataset containing expressive descriptions of
20,000 weather stations in the United States. The dataset contains nearly 2
billion RDF triples describing 160 million observations. This dataset will
be used to evaluate the scalability of our methods using real-world sensor
data.</p>
      <p>The Swiss Federal Office for the Environment (FOEN)<sup>7</sup> offers access to
streams of time series that describe weather phenomena (temperature, air
pressure, humidity, precipitation, etc.). The time series contain from 200,000
up to a few million values each. These datasets will be useful for evaluating
how compression scales with the length of the time series.</p>
    </sec>
    <sec id="sec-6">
      <title>Reflections</title>
      <p>Large distributed systems with billions of connected devices generate enormous
amounts of values every minute. These values are produced in multiple ways for
a specific scenario, under different circumstances, with different reputations, etc.
This heterogeneous environment requires assessments of quality, reliability, and
trustworthiness. With Linked Data approaches, dynamic data can be
combined with static knowledge to enable complex data analysis. This combination,
however, introduces incomplete and possibly erroneous data into a knowledge base.
Given the number of elements in the physical infrastructure, failures and uncertainty
cannot be avoided, e.g., sensor failures, power outages, and transmission problems
between sensors and the central server. In such scenarios, we need additional provenance
information describing the data involved in the recovery of the missing elements
and its fractional contribution. This extends the overall requirement of
transparency from simple physical device information to algorithmic methods,
the pieces of data involved in producing a value, and the probabilistic contribution of
correlated elements to the recovery, i.e., the fractional provenance.</p>
      <p>To the best of our knowledge, our research is among the first to address
provenance over incomplete Linked Data Streams.</p>
      <sec id="sec-6-1">
        <p><sup>4</sup> http://wiki.dbpedia.org <sup>5</sup> http://webdatacommons.org <sup>6</sup> http://wiki.knoesis.org/index.php/LinkedSensorData <sup>7</sup> https://www.bafu.admin.ch/bafu/en/home.html</p>
        <p>I would like to express my deep gratitude to my supervisors, Prof. Dr.
Manfred Hauswirth and Dr. Marcin Wylot, for their patient guidance and constructive
suggestions on this research proposal.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Linked Data: Evolving the web into a global data space</article-title>
          .
          <source>Synthesis lectures on the semantic web: theory and technology</source>
          , vol.
          <volume>1</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>136</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Le-Phuoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parreira</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Linked stream data processing</article-title>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Moreau</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Clifford</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Freire</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Futrelle</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Gil</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Groth</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Kwasnikowska</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Miles</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Missier</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Myers</surname>, <given-names>J.</given-names></string-name>
          , et al.:
          <article-title>The Open Provenance Model core specification (v1.1)</article-title>
          .
          <source>Tech. Rep. 6</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Moreau</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missier</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>PROV-DM: The PROV Data Model</article-title>
          . W3C Recommendation (30 April 2013), http://www.w3.org/TR/prov-dm/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Glavic</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Perm: Efficient Provenance Support for Relational Databases</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Zurich (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Alexander</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hausenblas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Describing Linked Datasets with the VoID Vocabulary</source>
          (03 March
          <year>2011</year>
          ), http://www.w3.org/TR/void/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wood</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanthaler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>RDF 1.1 Concepts and Abstract Syntax</source>
          (25 February 2014), http://www.w3.org/TR/rdf11-concepts/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Green</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karvounarakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Provenance semirings</article-title>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Theoharis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fundulaki</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karvounarakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christophides</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>On provenance of queries on semantic web data</article-title>
          .
          <source>IEEE Internet Computing</source>
          <volume>15</volume>
          (
          <issue>1</issue>
          ),
          <fpage>31</fpage>
          –
          <lpage>39</lpage>
          (Jan
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Querying trust in RDF data with tSPARQL</article-title>
          .
          <source>The Semantic Web: Research and Applications</source>
          , pp.
          <fpage>5</fpage>
          –
          <lpage>20</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopes</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Straccia</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>A general framework for representing, reasoning and querying with annotated semantic web data</article-title>
          .
          <source>Web Semant.</source>
          <volume>11</volume>
          ,
          <fpage>72</fpage>
          –
          <lpage>95</lpage>
          (Mar
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Udrea</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recupero</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subrahmanian</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>Annotated RDF</article-title>
          .
          <source>ACM Trans. Comput. Logic</source>
          <volume>11</volume>
          (
          <issue>2</issue>
          ),
          <fpage>10:1</fpage>
          –
          <lpage>10:41</lpage>
          (Jan
          <year>2010</year>
          ), http://doi.acm.org/10.1145/1656242.1656245
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Don't like RDF reification?: making statements about statements using singleton property</article-title>
          .
          <source>In: Proceedings of the 23rd International Conference on World Wide Web</source>
          . pp.
          <fpage>759</fpage>
          –
          <lpage>770</lpage>
          . ACM
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wylot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>TripleProv: Efficient processing of lineage queries in a native RDF store</article-title>
          .
          <source>In: Proceedings of the 23rd international conference on World wide web</source>
          . pp.
          <fpage>455</fpage>
          –
          <lpage>466</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wylot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Executing provenance-enabled queries over web data</article-title>
          .
          <source>In: Proceedings of the 24th International Conference on World Wide Web</source>
          . pp.
          <fpage>1275</fpage>
          –
          <lpage>1285</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Glavic</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alonso</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The Perm Provenance Management System in Action</article-title>
          .
          <source>In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data</source>
          . pp.
          <fpage>1055</fpage>
          –
          <lpage>1058</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Arab</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gawlick</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radhakrishnan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glavic</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A generic provenance middleware for queries, updates</article-title>
          .
          <source>In: 6th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2014)</source>
          .
          <publisher-name>USENIX Association</publisher-name>
          ,
          <publisher-loc>Cologne</publisher-loc>
          (Jun
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>