<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Big Data Provenance: State-Of-The-Art Analysis and Emerging Research Challenges</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Alfredo</forename><surname>Cuzzocrea</surname></persName>
							<email>alfredo.cuzzocrea@dia.units.it</email>
							<affiliation key="aff0">
								<orgName type="department">DIA Department</orgName>
								<orgName type="institution" key="instit1">University of Trieste</orgName>
								<orgName type="institution" key="instit2">ICAR-CNR Italy</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Big Data Provenance: State-Of-The-Art Analysis and Emerging Research Challenges</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">74AFA1B62A3BA25D1170030B39C07B0E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:39+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Big Data Provenance</term>
					<term>Privacy of Big Data</term>
					<term>Big Data Lineage</term>
					<term>Big Data Derivation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper focuses on big data provenance issues, and provides a comprehensive survey of state-of-the-art analysis and emerging research challenges in this scientific field. Big data provenance is currently one of the most relevant problems in big data research, as confirmed by the great deal of attention devoted to this topic by ever larger database and data mining research communities. This contribution aims at representing a milestone on the exciting big data provenance research road.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CCS Concepts</head><p>•Theory of computation → Data provenance;</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>In big data research, privacy and security of big data (e.g., <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b11">12]</ref>) play a major role. Along with these topics, provenance of big data (e.g., <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b3">4]</ref>) is relevant as well. Data provenance concerns the problem of detecting the origin, the creation and the propagation process of data within a data-intensive system. In other words, data provenance consists of the lineage (e.g., <ref type="bibr" target="#b26">[27]</ref>) and derivation (e.g., <ref type="bibr" target="#b21">[22]</ref>) of data and data objects, and it has its conceptual roots in extensive studies performed in the past in the contexts of arts, literary works, manuscripts, sculptures, and so forth. Another concept that is close to "data provenance" is the so-called ownership of data (e.g., <ref type="bibr" target="#b20">[21]</ref>), which refers to the issue of defining and providing information about the rightful owner of data assets, and to the acquisition, use and distribution policy implemented by the data owner. This way, data ownership primarily shapes itself as a data governance process that details an organization's legal ownership of enterprise-wide data. When applied to big data, provenance problems become prohibitive (e.g., <ref type="bibr" target="#b9">[10]</ref>), mostly due to the enormous size of big data. For instance, one of the most successful families of data provenance techniques is that of the so-called annotation-based approaches (e.g., <ref type="bibr" target="#b21">[22]</ref>), which propose modifying the input database queries in order to support data provenance tasks, while requiring access to the entire target data set. 
Obviously, the latter requirement becomes very hard to satisfy when applied to big data repositories. Many other research challenges and open issues still arise in big data provenance research. For instance, advanced concepts like confidentiality of the data provenance process, secure and privacy-preserving big data provenance, flexible big data provenance query tools, and so forth, still need to be deeply investigated.</p><p>Inspired by these considerations, in this paper we provide an overview of relevant research issues and challenges of the above-introduced big data provenance problems, also highlighting possible future efforts within these research directions.</p><p>The remainder of this paper is organized as follows. Section 2 contains a comprehensive analysis of state-of-the-art proposals that focus on big data provenance issues. In Section 3, we identify and report on emerging challenges in big data provenance research, highlighting promising directions. Finally, Section 4 draws the conclusions of our research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">STATE-OF-THE-ART ANALYSIS</head><p>Data provenance is relevant for a wide spectrum of typical enterprise data tasks, such as: (i) data validation (e.g., <ref type="bibr" target="#b6">[7]</ref>); (ii) data debugging (e.g., <ref type="bibr" target="#b19">[20]</ref>); (iii) data auditing (e.g., <ref type="bibr" target="#b25">[26]</ref>); (iv) data quality (e.g., <ref type="bibr" target="#b23">[24]</ref>); (v) data reliability (e.g., <ref type="bibr" target="#b2">[3]</ref>). Application-wise, the provenance problem has typically been addressed in database management systems (e.g., <ref type="bibr" target="#b8">[9]</ref>), but several efforts have also arisen in the contexts of workflow management systems (e.g., <ref type="bibr" target="#b14">[15]</ref>) and distributed systems (e.g., <ref type="bibr" target="#b24">[25]</ref>).</p><p>On the research side proper, several initiatives compose the state of the art. Here, we review some of them.</p><p><ref type="bibr" target="#b10">[11]</ref> describes a framework for modeling and capturing provenance in MapReduce jobs and deriving MapReduce tasks, called Kepler. The approach is distributed in nature, and it exploits the MySQL Cluster distributed database system <ref type="bibr" target="#b1">[2]</ref>. <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b22">23]</ref> propose an extension of Hadoop <ref type="bibr" target="#b0">[1]</ref> called Reduce and Map Provenance (RAMP). It introduces a wrapper-based method that can be easily deployed on top of Hadoop while remaining transparent to it. Tracing of data-intensive processes is supported as well.</p><p><ref type="bibr" target="#b4">[5]</ref> describes an extension of Hadoop for implementing provenance detection in MapReduce jobs, called HadoopProv. The goal of HadoopProv is to minimize the overhead introduced by computing provenance, which is usually a resource-consuming task. 
The proposed system provides flexible tools for querying the resulting big data provenance graph.</p><p>Pig Lipstick <ref type="bibr" target="#b5">[6]</ref> is a hybrid big data provenance system that combines the management of fine-grained dependencies, which are typical of database-oriented provenance systems, with the management of workflow-grained dependencies, which are typical of workflow-oriented provenance systems. The internal model for reasoning on big data provenance is graph-like in nature. <ref type="bibr" target="#b3">[4]</ref> proposes the anatomy and functionalities of a layer-based architecture for supporting big data provenance. In particular, the architecture is composite in nature and it focuses on the collection, querying and visualization of provenance in the specialized context of scientific applications.</p><p><ref type="bibr" target="#b16">[17]</ref> considers the problem of managing fine-grained provenance in Data Stream Management Systems (DSMS). This problem is recognized as particularly hard due to the need to support flexible analysis tools over the computed provenance, such as revision processing or query debugging. With this goal in mind, the paper proposes a novel big data provenance framework based on the concept of operator instrumentation. It consists of modifying the behavior of operators in order to generate and propagate fine-grained provenance through several operators of a query.</p><p>CloudProv, a framework for integrating, modeling and monitoring data provenance in Cloud environments, is presented in <ref type="bibr" target="#b17">[18]</ref>. 
The proposed framework is based on a method that allows us to model collected provenance information so as to continuously acquire and monitor such information for real-time applications, according to a service-oriented paradigm.</p><p>Finally, Oruta, an innovative privacy-preserving public auditing mechanism for supporting data sharing in untrusted Cloud environments, is proposed in <ref type="bibr" target="#b25">[26]</ref>. The proposed mechanism makes use of homomorphic authenticators <ref type="bibr" target="#b7">[8]</ref> that allow the third-party auditor to check the integrity of shared data from a given user group, without imposing the need to access all the data.</p></div>
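<div xmlns="http://www.tei-c.org/ns/1.0"><p>The operator-instrumentation idea reviewed above for <ref type="bibr" target="#b16">[17]</ref> can be illustrated with a minimal Python sketch. This is our own illustration under simplifying assumptions, not the implementation from <ref type="bibr" target="#b16">[17]</ref>: every name in it (Tagged, source, instrumented_map, instrumented_window_sum) is hypothetical, and provenance is reduced to the set of positions of the contributing source tuples.</p>

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Iterator, List


@dataclass(frozen=True)
class Tagged:
    """A stream tuple plus the ids of the source tuples it derives from."""
    value: float
    provenance: FrozenSet[int]


def source(values: Iterable[float]) -> Iterator[Tagged]:
    # Each source tuple is its own provenance: {its position in the stream}.
    for tuple_id, value in enumerate(values):
        yield Tagged(value, frozenset({tuple_id}))


def instrumented_map(fn: Callable[[float], float],
                     stream: Iterable[Tagged]) -> Iterator[Tagged]:
    # A one-to-one operator forwards the provenance of its single input.
    for item in stream:
        yield Tagged(fn(item.value), item.provenance)


def instrumented_window_sum(size: int,
                            stream: Iterable[Tagged]) -> Iterator[Tagged]:
    # A tumbling-window aggregate unions the provenance of the whole window.
    window: List[Tagged] = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            merged = frozenset().union(*(t.provenance for t in window))
            yield Tagged(sum(t.value for t in window), merged)
            window = []


# Hypothetical query: scale every value, then sum tumbling windows of size 2.
results = list(instrumented_window_sum(
    2, instrumented_map(lambda v: v * 10, source([1.0, 2.0, 3.0, 4.0]))))
```

<p>In this sketch, each element of results carries both the aggregated value and the set of source positions that contributed to it, so provenance is propagated through the chain of operators without a separate tracing pass. A real DSMS would additionally have to bound the size of the propagated annotations.</p></div>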
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">EMERGING RESEARCH CHALLENGES</head><p>A significant number of issues and challenges arise in big data provenance research. In the following, we introduce and discuss some notable ones.</p><p>Accessing Big Data. Big data are enormous in size, hence accessing the entire big data set becomes problematic. Accessing data is a strict requirement for data provenance techniques, which makes classical methods unsuitable for the particular context of big data provenance.</p><p>Analyzing Big Data. In order to apply data provenance methods, state-of-the-art techniques need to analyze the target (big) data set. Here, a major problem is represented by the scale of big data, which can grow explosively.</p><p>Scalability Issues. When dealing with big data, one of the most problematic drawbacks is represented by scalability, as highlighted before. This again occurs with provenance of big data, as provenance techniques are multi-step in nature and they need to access and process target data repeatedly. This poses relevant issues, as big data are typically large-scale and growing in size.</p><p>Information Sharing. Data provenance methods very often require sharing information among the actors that perform the same data provenance task. This is not easy when dealing with big data, as such data are typically distributed over large-scale network environments; hence, information sharing introduces relevant research challenges as well as technological drawbacks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Minimum Computational Overhead Requirement</head><p>Data provenance techniques may be data-intensive and resource-consuming. This imposes the need for devising and implementing techniques that introduce a minimum computational overhead, in order to avoid impacting the performance of the target system, e.g., a workflow management system.</p><p>Query Optimization Issues. Data provenance techniques need to access and query data in order to determine their provenance, even in an interactive manner. This application requirement introduces severe drawbacks when these techniques run over big data, as querying big data is still a crucial open problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Transformation Issues</head><p>During data provenance tasks, data sources need to be transformed among different data formats. Provenance tracing must be introduced accordingly, in order to track all the different transformations that occurred. This topic is a first-class one in the family of big data provenance research issues, and it also has several points in common with the data exchange research area.</p><p>When Computing Provenance? There exist two alternatives for computing provenance. One computes provenance only when that provenance is required (this is called the lazy provenance model). The other computes provenance every time data are transformed (this is called the eager provenance model). Both models have pros and cons. They also imply different computational overheads. Choosing between them is still an open problem to be considered in future efforts.</p><p>Data Modeling Support for Provenance. When data sources are processed to detect their provenance, several transformations must be applied, as mentioned above. This also implies the need to devise ad-hoc data models for supporting provenance, as data sources may be significantly different. In this case, semantic techniques seem promising.</p><p>Heterogeneity of Data Source Models. Data provenance techniques usually run over heterogeneous data sources, hence they need to cope with heterogeneous data models as well. Therefore, heterogeneity of data sources is a big challenge for such techniques, as data sources expose different formats, (data) types, and schemas.</p><p>User Annotation Support for Provenance. The data provenance process is usually enriched by user annotation, according to which domain experts annotate data in order to enhance the effectiveness of this process. 
As a consequence, data provenance tools need to introduce ad-hoc software modules capable of supporting user annotation over big data.</p><p>Secure and Privacy-Preserving Provenance. Provenance can represent a security and privacy breach for target data sources. Therefore, a relevant issue for future efforts is represented by the need for secure and privacy-preserving big data provenance techniques. Possible solutions are those based on accepting a compromise between security and privacy of data sources on one side, and provenance of data sources on the other side.</p><p>Flexible Provenance Query Tools. Provenance needs to be used not only to detect the lineage and the derivation of data and data objects, but also as an enabling methodology for flexible query tools that support next-generation cybersecurity systems, where users may be interested in tracking records generated by a particular person in a specific research lab, or in assessing the confidentiality of tracked records, i.e., understanding who, beyond the authorized people, may have looked at these tracked records.</p><p>Provenance Visualization Tools. Visualization tools are extremely important for big data provenance techniques, as provenance detection is an interactive process that typically requires intelligent tools for visualizing actual results and supporting next-step decisions. This will be a relevant research challenge in future years.</p></div>
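<div xmlns="http://www.tei-c.org/ns/1.0"><p>The trade-off between the two models for computing provenance discussed above, the one that computes provenance on demand and the one that computes it at every transformation, can be sketched in Python as follows. The sketch is only an illustration under our own assumptions: EagerStore, LazyStore and run are hypothetical names, data items are single numbers, and every transformation has exactly one input.</p>

```python
from typing import Callable, Dict, List, Tuple

# Log entry: (output key, input key, name of the applied transformation).
Step = Tuple[str, str, str]


class EagerStore:
    """Eager model: full lineage is materialized at every transformation."""

    def __init__(self) -> None:
        self.data: Dict[str, float] = {}
        self.lineage: Dict[str, List[str]] = {}

    def load(self, key: str, value: float) -> None:
        self.data[key] = value
        self.lineage[key] = [key]

    def transform(self, out: str, src: str, name: str,
                  fn: Callable[[float], float]) -> None:
        self.data[out] = fn(self.data[src])
        # Cost paid now: copy and extend the lineage of the input.
        self.lineage[out] = self.lineage[src] + [f"{name}:{out}"]

    def provenance(self, key: str) -> List[str]:
        return self.lineage[key]


class LazyStore:
    """Lazy model: only a compact log is kept; lineage is rebuilt on demand."""

    def __init__(self) -> None:
        self.data: Dict[str, float] = {}
        self.log: Dict[str, Step] = {}

    def load(self, key: str, value: float) -> None:
        self.data[key] = value

    def transform(self, out: str, src: str, name: str,
                  fn: Callable[[float], float]) -> None:
        self.data[out] = fn(self.data[src])
        self.log[out] = (out, src, name)

    def provenance(self, key: str) -> List[str]:
        # Cost paid at query time: walk the log back to the source.
        chain: List[str] = []
        while key in self.log:
            out, src, name = self.log[key]
            chain.append(f"{name}:{out}")
            key = src
        return [key] + chain[::-1]


def run(store) -> List[str]:
    # The same hypothetical two-step pipeline is replayed on either store.
    store.load("raw", 2.0)
    store.transform("scaled", "raw", "scale", lambda v: v * 10)
    store.transform("shifted", "scaled", "shift", lambda v: v + 1)
    return store.provenance("shifted")


eager_result = run(EagerStore())
lazy_result = run(LazyStore())
```

<p>Both stores answer the same provenance query with the same lineage; what differs is when the cost is paid. The eager store does extra work and uses extra space at every transformation, while the lazy store defers that work to query time, which matches the different computational overheads noted above.</p></div>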
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">CONCLUSIONS</head><p>This paper has provided a comprehensive survey of state-of-the-art analysis and emerging research challenges in the context of big data provenance research. We have highlighted benefits and limitations of the most relevant proposals, and we have described possible research directions on the exciting big data provenance research road.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>© 2016,</head><label>2016</label><figDesc>Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0 details an organization's legal ownership of enterprise-wide data.</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="http://wiki.apache.org/hadoop" />
		<title level="m">Apache Hadoop</title>
				<imprint>
			<date type="published" when="2015-01-15">2015-01-15</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="https://www.mysql.com/products/cluster/" />
		<title level="m">MySQL Cluster CGE</title>
				<imprint>
			<date type="published" when="2015-01-15">2015-01-15</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Assessing data reliability in an information system</title>
		<author>
			<persName><forename type="first">N</forename><surname>Agmon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ahituv</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Management Information Systems</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="34" to="44" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A layer based architecture for provenance in big data</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Imran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Seay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Walker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Big Data, Big Data 2014</title>
				<meeting><address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-10-27">October 27-30, 2014</date>
			<biblScope unit="page" from="1" to="7" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Hadoopprov: Towards provenance as a first class citizen in mapreduce</title>
		<author>
			<persName><forename type="first">S</forename><surname>Akoush</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sohan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hopper</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">5th Workshop on the Theory and Practice of Provenance, TaPP&apos;13</title>
				<meeting><address><addrLine>Lombard, IL, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">April 2-3, 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Putting lipstick on pig: Enabling database-style workflow provenance</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Amsterdamer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Deutch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Milo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Stoyanovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tannen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="346" to="357" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Roomba: Automatic validation, correction and generation of dataset metadata</title>
		<author>
			<persName><forename type="first">A</forename><surname>Assaf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Senart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015</title>
				<meeting>the 24th International Conference on World Wide Web Companion, WWW 2015<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">May 18-22, 2015</date>
			<biblScope unit="page" from="159" to="162" />
		</imprint>
	</monogr>
	<note>Companion</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Provable data possession at untrusted stores</title>
		<author>
			<persName><forename type="first">G</forename><surname>Ateniese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Burns</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Curtmola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Herring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kissner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">N J</forename><surname>Peterson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">X</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2007 ACM Conference on Computer and Communications Security, CCS 2007</title>
				<meeting>the 2007 ACM Conference on Computer and Communications Security, CCS 2007<address><addrLine>Alexandria, Virginia, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">October 28-31, 2007</date>
			<biblScope unit="page" from="598" to="609" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Provenance management in curated databases</title>
		<author>
			<persName><forename type="first">P</forename><surname>Buneman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cheney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM SIGMOD International Conference on Management of Data</title>
				<meeting>the ACM SIGMOD International Conference on Management of Data<address><addrLine>Chicago, Illinois, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006">June 27-29, 2006</date>
			<biblScope unit="page" from="539" to="550" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Milieu: Lightweight and configurable big data provenance for science</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Cheah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Canon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Plale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ramakrishnan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Congress on Big Data, BigData Congress 2013</title>
				<imprint>
			<date type="published" when="2013-06-27">June 27-July 2, 2013</date>
			<biblScope unit="page" from="46" to="53" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Provenance for mapreduce-based data-intensive workflows</title>
		<author>
			<persName><forename type="first">D</forename><surname>Crawl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Altintas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, co-located with</title>
				<meeting>the 6th Workshop on Workflows in Support of Large-Scale Science, co-located with<address><addrLine>SC11, Seattle, WA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011-11-14">November 14, 2011</date>
			<biblScope unit="page" from="21" to="30" />
		</imprint>
	</monogr>
	<note>WORKS&apos;11</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Privacy and security of big data: Current challenges and future research perspectives</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cuzzocrea</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First International Workshop on Privacy and Secuirty of Big Data, PSBD@CIKM 2014</title>
				<meeting>the First International Workshop on Privacy and Secuirty of Big Data, PSBD@CIKM 2014<address><addrLine>Shanghai, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-11-07">November 7, 2014</date>
			<biblScope unit="page" from="45" to="47" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A robust sampling-based framework for privacy preserving OLAP</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cuzzocrea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Russo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Saccà</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data Warehousing and Knowledge Discovery, 10th International Conference, DaWaK 2008</title>
				<meeting><address><addrLine>Turin, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">September 2-5, 2008</date>
			<biblScope unit="page" from="97" to="114" />
		</imprint>
	</monogr>
	<note>Proceedings</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Balancing accuracy and privacy of OLAP aggregations on data cubes</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cuzzocrea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Saccà</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">DOLAP 2010, ACM 13th International Workshop on Data Warehousing and OLAP</title>
				<meeting><address><addrLine>Toronto, Ontario, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010-10-30">October 30, 2010</date>
			<biblScope unit="page" from="93" to="98" />
		</imprint>
	</monogr>
	<note>Proceedings</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Provenance and scientific workflows: challenges and opportunities</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Freire</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008</title>
				<meeting>the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008<address><addrLine>Vancouver, BC, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">June 10-12, 2008</date>
			<biblScope unit="page" from="1345" to="1350" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Ariadne: managing fine-grained provenance on data streams</title>
		<author>
			<persName><forename type="first">B</forename><surname>Glavic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Esmaili</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tatbul</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 7th ACM International Conference on Distributed Event-Based Systems, DEBS &apos;13</title>
				<meeting><address><addrLine>Arlington, TX, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013-07-03">June 29-July 3, 2013</date>
			<biblScope unit="page" from="39" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Efficient stream provenance via operator instrumentation</title>
		<author>
			<persName><forename type="first">B</forename><surname>Glavic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Esmaili</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tatbul</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Internet Techn</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">26</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Provenance as a service: A data-centric approach for real-time monitoring</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hammad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Congress on Big Data</title>
				<meeting><address><addrLine>Anchorage, AK, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-06-27">June 27-July 2, 2014</date>
			<biblScope unit="page" from="258" to="265" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Provenance for generalized map and reduce workflows</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ikeda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Widom</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Fifth Biennial Conference on Innovative Data Systems Research</title>
				<meeting><address><addrLine>Asilomar, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">January 9-12, 2011</date>
			<biblScope unit="page" from="273" to="283" />
		</imprint>
	</monogr>
	<note>Online Proceedings</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">NLP data cleansing based on linguistic ontology constraints</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kontokostas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brümmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ioannidis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: Trends and Challenges - 11th International Conference, ESWC 2014</title>
				<meeting><address><addrLine>Anissaras, Crete, Greece</addrLine></address></meeting>
		<imprint>
			<publisher>Proceedings</publisher>
			<date type="published" when="2014">May 25-29, 2014</date>
			<biblScope unit="page" from="224" to="239" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Accountable proof of ownership for data using timing element in cloud services</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mizan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Rahman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Haque</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hasan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on High Performance Computing &amp; Simulation, HPCS 2013</title>
				<meeting><address><addrLine>Helsinki, Finland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">July 1-5, 2013</date>
			<biblScope unit="page" from="57" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Transparent provenance derivation for user decisions</title>
		<author>
			<persName><forename type="first">I</forename><surname>Nunes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Miles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Luck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J P</forename><surname>De Lucena</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Provenance and Annotation of Data and Processes - 4th International Provenance and Annotation Workshop, IPAW 2012</title>
				<meeting><address><addrLine>Santa Barbara, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">June 19-21, 2012</date>
			<biblScope unit="page" from="111" to="125" />
		</imprint>
	</monogr>
	<note>Revised Selected Papers</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">RAMP: A system for capturing and tracing provenance in mapreduce workflows</title>
		<author>
			<persName><forename type="first">H</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ikeda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Widom</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="1351" to="1354" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Information quality assessment</title>
		<author>
			<persName><forename type="first">L</forename><surname>Pipino</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Encyclopedia of Database Systems</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1512" to="1515" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Security and data accountability in distributed systems: A provenance survey</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K L</forename><surname>Ko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Holmes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">10th IEEE International Conference on High Performance Computing and Communications &amp; 2013 IEEE International Conference on Embedded and Ubiquitous Computing, HPCC/EUC 2013</title>
				<meeting><address><addrLine>Zhangjiajie, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">November 13-15, 2013</date>
			<biblScope unit="page" from="1571" to="1578" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Oruta: Privacy-preserving public auditing for shared data in the cloud</title>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE T. Cloud Computing</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="43" to="56" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Subzero: A fine-grained lineage system for scientific databases</title>
		<author>
			<persName><forename type="first">E</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stonebraker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">29th IEEE International Conference on Data Engineering, ICDE 2013</title>
				<meeting><address><addrLine>Brisbane, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">April 8-12, 2013</date>
			<biblScope unit="page" from="865" to="876" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
