<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Repairing Provenance Policy Violations by Inventing Non-Functional Nodes</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Saumen</forename><surname>Dey</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Computer Science</orgName>
								<orgName type="institution">University of California</orgName>
								<address>
									<settlement>Davis</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Daniel</forename><surname>Zinn</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Genome Center</orgName>
								<orgName type="institution">University of California</orgName>
								<address>
									<settlement>Davis</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bertram</forename><surname>Ludäscher</surname></persName>
						</author>
						<title level="a" type="main">Repairing Provenance Policy Violations by Inventing Non-Functional Nodes</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B4AD5C7E7A635BB1B345C6E64F056C43</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In scientific collaborations, provenance is increasingly used to explain, debug, reproduce, and determine the validity and quality of data products.</p><p>In such environments, it can be infeasible or undesirable to publish the complete provenance of all the final output data products. We have developed PROPUB, a system that allows users to publish a customized version of their data provenance, based on a set of publication and customization requests, while observing certain provenance publication policies, expressed as logic integrity constraints. The user's customization requests may violate one or more integrity constraints. In previous work, we removed additional parts of the provenance graph (i.e., not directly requested by the user) to repair policy violations. In this paper, we present an alternative approach which ensures that all relevant nodes are retained in the provenance graph. The key idea is to introduce new (non-functional) nodes that are used to represent lineage dependencies, without revealing information that the user wants to protect. With this new approach, a user may now explore different provenance publication strategies, and choose the most appropriate one before publishing sensitive provenance data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In the emerging paradigm of collaborative, data-intensive science, sharing data products even prior to publication is desirable <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Yet, without a proper scientific publication associated with openly published data, its validity and accuracy might be questionable. This is problematic in an open environment, where published data by one scientist is used by another scientist as input for further data analyses. In such an environment, data provenance (the lineage and processing history of data) can help to ensure data quality <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>. It is thus desirable to publish data products together with their provenance.</p><p>In many cases, however, provenance data can be sensitive and may contain private information or intellectual property that should not be revealed <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b4">5]</ref>. Consequently, a balancing act (Figure <ref type="figure">1</ref>) is necessary between (i) the desire to publish provenance data so that collaborators can understand and rely on the shared data products, and (ii) the need to protect sensitive information, e.g., due to privacy concerns or intellectual property issues.</p><p>We view provenance as a bipartite, directed, acyclic graph, capturing which data nodes were consumed and produced, respectively, by invocation (i.e., computation) nodes. Our model thus corresponds to the Open Provenance Model (OPM) which captures the dependencies between data artifacts and invocations <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Fig. <ref type="figure">1</ref>. The balancing act between provenance publishing and privacy and relevancy concerns. In collaborative settings, scientists publish provenance for an improved understanding of the result data. With increasing privacy concerns, collaborators have to choose the right balance between providing sufficient provenance data and protecting sensitive information.</p><p>To sanitize provenance graphs, a scientist can remove sensitive data nodes or invocation nodes from the provenance graph. Alternatively, she can abstract a set of sensitive nodes by grouping them into a single, abstract node. Such updates may violate some of the integrity constraints of the provenance graph <ref type="bibr" target="#b10">[11]</ref>. For example, grouping multiple nodes into one abstraction node may introduce new dependencies that were absent in the initial provenance graph. Hiding nodes may also make some nodes in the final graph appear independent of each other even though they are dependent in the initial graph. Thus, one can no longer trust that the published provenance data is "correct" (e.g., there are no false dependencies) or "complete" (e.g., there are no false independencies). Therefore, we propose a system that allows a publisher to provide a high-level specification of which parts of the provenance graph are to be published and which parts are to be sanitized, while guaranteeing that certain provenance publication constraints are observed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Motivating Example</head><p>Figure <ref type="figure" target="#fig_1">2</ref>(a) shows the provenance graph (PG) taken from the First Provenance Challenge <ref type="bibr" target="#b11">[12]</ref>. Data nodes are depicted as circles and invocation nodes (representing computations) as boxes; dependencies among them are shown as directed edges. These edges capture the lineage of data and thus are typically drawn from right (newer nodes) to left (older nodes). For example, d16 was generated by an invocation s2, and was in turn used by invocation c2, denoted, respectively, by s2 ← d16 (a gen_by edge) and d16 ← c2 (a used edge).</p><p>Let us assume that the user wants to publish data products d18 and d19 along with their lineage data. She then issues the publication requests shown in Figure <ref type="figure" target="#fig_1">2</ref>(a). A recursive query retrieves all data and invocation nodes upstream from d18 and d19, and we get a modified provenance graph (PG′) as shown in Figure <ref type="figure" target="#fig_1">2</ref>(b). Note that the lineage of d20 up to s3 is not relevant for d18 and d19 and hence is not included in PG′. Further assume that before publishing PG′, the user also requests a set of customizations as shown in Figure <ref type="figure" target="#fig_1">2</ref>(b): to anonymize data nodes {d11, d12}, to abstract nodes {m1, d14, s1}, and to hide {c1, d18, c2}.</p><p>Figure <ref type="figure" target="#fig_2">3</ref> shows the provenance graph we get after applying all the customization requests. We see that this provenance graph violates three provenance policies: there is a cycle between d13 and g1, a type error for the edge from s2 to g1 (the graph should be bipartite), and there is no dependency between d19 and d16, violating, respectively, the No-Cyclic Dependency (NCD), No-Type Error (NTE) and No-False Independence (NFI) policies. On the other hand, the provenance policies No-Write Conflict (NWC) and No-False Dependence (NFD) are not violated by these customization requests.</p><p>Outline and Contributions. In Section 3, we first describe the provenance model, user requests, provenance policies, and logical architecture of PROPUB. This overall framework was proposed recently in <ref type="bibr" target="#b10">[11]</ref>. In Section 4 we present our main contribution, i.e., a new way to repair policy violations, not by removing additional nodes (as in our prior work), but by introducing new (non-functional) nodes that represent the original lineage dependencies, without revealing information that the user wants to protect. We describe in detail how policy violations are repaired such that all relevant nodes are retained in the final provenance graph. Related work is discussed in Section 5, and Section 6 presents some concluding remarks and suggestions for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Provenance Publisher (PROPUB)</head><p>In our recent work, we developed the system PROPUB <ref type="bibr" target="#b10">[11]</ref>, which uses a declarative approach to publish customized policy-aware provenance. PROPUB accepts the initial provenance graph and two types of input specifications: (i) User Requests: the publication and customization requests, and (ii) Provenance Policies: the integrity constraints to be observed. PROPUB then applies all user requests on the initial provenance graph and checks for policy violations. In case there is a violation, it applies repairs and generates the customized provenance graph.</p><figure type="table"><head>Table 1. Provenance Model for PROPUB</head><table><row><cell>Relation Name</cell><cell>Description</cell></row><row><cell>used(I, D)</cell><cell>An edge specifying that the invocation I used the data artifact D.</cell></row><row><cell>gen_by(D, I)</cell><cell>An edge indicating that the data artifact D was generated by invocation I.</cell></row><row><cell>actor(I, A)</cell><cell>An invocation node I, which was executed by actor A.</cell></row><row><cell>data(D, R)</cell><cell>A data artifact node D, whose value can be retrieved using the reference R.</cell></row><row><cell>dep(X, Y)</cell><cell>An auxiliary relation, defined as dep = used ∪ gen_by, specifying that node X depends on node Y, irrespective of the node types.</cell></row></table></figure><figure type="table"><head>Table 2. User requests for lineage publication and customization</head><table><row><cell>User Request</cell><cell>Description</cell></row><row><cell>ur:lineage(D)</cell><cell>Selects the complete lineage for the data artifact D</cell></row><row><cell>ur:anonymize(N)</cell><cell>Erases the actor/process identity or the data reference from the node N</cell></row><row><cell>ur:hide(N)</cell><cell>Removes the invocation or data node N</cell></row><row><cell>ur:abstract(N, G)</cell><cell>Collapses all nodes N to the abstract group G</cell></row><row><cell>ur:retain(N)</cell><cell>Keeps the node N in the customized provenance</cell></row></table></figure><p>Provenance Model. The provenance model used in PROPUB is based on OPM, the Open Provenance Model <ref type="bibr" target="#b12">[13]</ref>, and our earlier work <ref type="bibr" target="#b13">[14]</ref>. We use the schema shown in Table <ref type="table">1</ref>.</p><p>User Requests. The user requests supported by the PROPUB framework are summarized in Table <ref type="table">2</ref>. The PROPUB system expects user requests to be asserted as relational facts that can then be used by a Datalog rule engine. A user request can be a publication request or a customization request. A customization request asks either to remove a node or an edge, or to keep it in the final graph.</p><p>Provenance Policies. The provenance graph supported by PROPUB is a bipartite directed acyclic graph. Also, an invocation can read many data artifacts, but a data artifact is written by exactly one invocation. We developed three provenance policies to verify whether these structural properties are satisfied in the provenance graph PG ∆u, which we obtain after applying all the customization requests to PG′. PROPUB has two more provenance policies to ensure the correctness and completeness of information. These provenance policies are briefly defined in Table <ref type="table" target="#tab_1">3</ref>.</p></div>
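<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relational schema of Table 1 can be illustrated with a small sketch. It is written in Python rather than in PROPUB's Datalog and is only an illustration under stated assumptions: the facts for s2, d16 and c2 are taken from the motivating example, while the function names dep_relation and depends_directly_on are hypothetical helpers introduced here.</p><p>
```python
# Hypothetical encoding of the used and gen_by relations as sets of pairs.
# used(I, D): invocation I used data artifact D.
used = {("c2", "d16")}
# gen_by(D, I): data artifact D was generated by invocation I.
gen_by = {("d16", "s2")}


def dep_relation(used_edges, gen_by_edges):
    """dep = used ∪ gen_by: X depends on Y, irrespective of node types."""
    return set(used_edges) | set(gen_by_edges)


def depends_directly_on(node, dep):
    """All nodes Y such that dep(node, Y) holds."""
    return {y for x, y in dep if x == node}


dep = dep_relation(used, gen_by)
print(sorted(dep))
print(depends_directly_on("c2", dep))
```
</p><p>With these two facts, dep contains the pairs (c2, d16) and (d16, s2), so c2 depends directly on d16 only.</p></div>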
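<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><table><row><cell>Provenance Policy</cell><cell>Description</cell></row><row><cell>No-Write Conflict (NWC)</cell><cell>A data artifact can be written by only one invocation.</cell></row><row><cell>No-Cyclic Dependency (NCD)</cell><cell>There is no cycle between any two nodes X and Y.</cell></row><row><cell>No</cell><cell></cell></row></table></figure>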
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4. Integrity constraint relations used to detect policy violations</head><p>We use a set of integrity constraints (ICs) to check whether the provenance policies defined in Table <ref type="table" target="#tab_1">3</ref> are satisfied. Table <ref type="table">4</ref> lists the "witness relations" that are defined by rules (not shown) and which are used to detect particular IC violations.<ref type="foot">3</ref></p></div>
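<div xmlns="http://www.tei-c.org/ns/1.0"><p>Since the rules behind the witness relations are not shown, the following Python sketch only illustrates what such checks could compute for the three structural policies. The function names are hypothetical, and the example edges are an assumed reading of the violations reported for Figure 3 (a cycle between d13 and g1, and an edge from s2 to g1 between two invocations); they are not PROPUB's actual rules or data.</p><p>
```python
def write_conflicts(gen_by):
    """Pairs (X, Y) of distinct invocations that generated the same data node."""
    return {(x, y) for d1, x in gen_by for d2, y in gen_by
            if d1 == d2 and x != y}


def type_errors(dep, data_nodes, invocation_nodes):
    """Edges (X, Y) connecting two nodes of the same type."""
    return {(x, y) for x, y in dep
            if (x in data_nodes and y in data_nodes)
            or (x in invocation_nodes and y in invocation_nodes)}


def has_cycle(dep):
    """True if some node is reachable from itself along dep edges."""
    successors = {}
    for x, y in dep:
        successors.setdefault(x, set()).add(y)
    for start in successors:
        seen, stack = set(), [start]
        while stack:
            for nxt in successors.get(stack.pop(), ()):
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return False


# Assumed toy edges mirroring the violations described for Figure 3.
example_dep = {("d13", "g1"), ("g1", "d13"), ("s2", "g1")}
example_data = {"d13"}
example_invocations = {"g1", "s2"}
print(type_errors(example_dep, example_data, example_invocations))
print(has_cycle(example_dep))
```
</p><p>On this toy input the type check reports the edge (s2, g1) and the cycle check succeeds, matching the NTE and NCD violations named in the text.</p></div>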
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Logical Architecture</head><p>The logical architecture of the PROPUB system is shown in Figure <ref type="figure" target="#fig_4">4</ref>. The user submits a set of publication and customization requests U0. The module Direct-Conflict-Detection detects direct conflicts among the given user requests. For example, a hide and a retain request on the same node is an obvious conflict. The user needs to update her original requests until all direct conflicts are resolved, resulting in a conflict-free set of user requests U. The Lineage-Selection module computes the subgraph PG′, which contains all to-be-published data items (specified using the 'lineage' predicate) together with their complete provenance.</p><p>The Request-Policy-Evaluation module calculates the updates (∆u, in the form of insertions and deletions) needed to apply all the user requests from U to PG′. It applies ∆u to PG′ and obtains a customized provenance graph PG ∆u. Then it checks whether all the selected provenance policies (PP) are observed by evaluating the respective integrity constraints. In case some of the policies are violated, this module calculates the updates (∆p, in the form of insertions and deletions) needed to repair the violations. In a final conflict resolution step using the module Implied-Conflict-Detection-Resolution, the system detects all implied conflicts by comparing ∆u and ∆p. In case an implied conflict is detected, it selects another subset from the given U and PP following the user preferences. These steps are repeated until there are no more policy violations. It then applies ∆p to PG ∆u to get the customized provenance graph (CG) ready to be published.</p><note place="foot" n="3">For example, we can detect whether a data node is created by different invocations X and Y and record this as ic:wc(X, Y).</note></div>
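<div xmlns="http://www.tei-c.org/ns/1.0"><p>The first two modules of the architecture can be sketched in a few lines. This is a minimal Python illustration, assuming that requests are given as sets of node names and that dep is a set of (X, Y) pairs meaning "X depends on Y". The function names and the toy graph are hypothetical and do not reproduce PROPUB's Datalog implementation.</p><p>
```python
def direct_conflicts(hide, retain):
    """Nodes that carry both a hide and a retain request."""
    return set(hide).intersection(retain)


def lineage_selection(targets, dep):
    """Select the targets plus everything upstream of them.

    Returns the selected nodes and the dep edges among them.
    """
    selected, stack = set(targets), list(targets)
    while stack:
        current = stack.pop()
        for x, y in dep:
            if x == current and y not in selected:
                selected.add(y)
                stack.append(y)
    edges = {(x, y) for x, y in dep if x in selected and y in selected}
    return selected, edges


# Toy graph: d3 depends on i2, which used d2, which was generated by i1, ...
toy_dep = {("d3", "i2"), ("i2", "d2"), ("d2", "i1"), ("i1", "d1"),
           ("d9", "i9")}
nodes, edges = lineage_selection({"d3"}, toy_dep)
print(sorted(nodes))
print(direct_conflicts({"d2", "i1"}, {"i1"}))
```
</p><p>For the toy graph, selecting the lineage of d3 keeps d1, d2, d3, i1 and i2 and drops the unrelated edge from d9 to i9, in the same way that the lineage of d20 is dropped in the motivating example.</p></div>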
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Repairing Policy Violations</head><p>If we apply the customization user requests on PG′, we get an intermediate provenance graph PG ∆u, as illustrated in Figure <ref type="figure" target="#fig_1">2</ref>(a), 2(b) and 3. However, PG ∆u may violate one or more provenance policies. If PG ∆u violates a structural policy (NWC, NCD, or NTE), it is no longer a proper provenance graph. If it violates a non-structural policy (NFI or NFD), PG ∆u may contain incorrect information or may become incomplete. In either case, the user will not be able to publish PG ∆u. To resolve this issue, we apply the customization requests U on PG′ in a strategic way such that the result conforms to all the provenance policies.</p><p>Our strategy is primarily based on two ideas: (i) inventing non-functional nodes, and (ii) converting user requests into other forms of user requests.</p><p>Inventing Non-functional Nodes. In case PG ∆u has a structural violation, PROPUB resolves the violation by adding a new non-functional node. A non-functional node is added to maintain the structure of the provenance graph. A non-functional node in the final customized graph may represent a single data or invocation node, or a set of data and invocation nodes. No mapping is maintained between the non-functional node and the nodes it replaced. Also, it does not carry any URL. Thus, no one will be able to reach the value of a data artifact or the source code of an actor from a non-functional node. PROPUB invents the minimum number of non-functional nodes needed to resolve a policy violation.</p><p>PROPUB uses the same strategy to resolve NFD policy violations. The fix for a violation of this policy is complex and may require more than one non-functional node to be added. In spite of this complexity, PROPUB resolves the violation using the minimum number of non-functional nodes.</p><p>Converting User Requests. A publisher can use ur:hide requests to hide individual nodes or parts of the structure of the provenance graph. When we apply these user requests, all the selected nodes and their associated edges are removed from the provenance graph PG′, and a set of independencies may be created, which violates the NFI policy. We could use the strategy of inventing non-functional nodes discussed above and replace each selected node by a non-functional node to resolve this policy violation. However, this approach keeps the structure of the original provenance graph in the final provenance graph. Instead, PROPUB converts these ur:hide user requests into an equivalent set of ur:abstract user requests so that all the selected nodes are removed and no dependencies are unintentionally removed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Repairing Structural Policy Violations</head><p>No-Type Error. This policy is violated in case there is a direct dependency between two nodes of the same type (i.e., a dependency between two data artifacts or a dependency between two invocations). PROPUB invents a non-functional invocation node in case the policy violation is between two data artifacts, as shown in Fig. <ref type="figure" target="#fig_6">5</ref>. In a similar way, PROPUB invents a non-functional data node in case the policy violation is between two invocation nodes. We use the rules shown below to create the non-functional nodes and fix the violations of this policy.</p><p>del_dep(X,Y) :- ic:te(X,Y).<lb/>add_data(f(X,Y),T) :- ic:te(X,Y), d_actor(X,_), T='ic:te'.<lb/>add_actor(f(X,Y),T) :- ic:te(X,Y), d_data(X,_), T='ic:te'.<lb/>add_dep(X,f(X,Y)) :- ic:te(X,Y).<lb/>add_dep(f(X,Y),Y) :- ic:te(X,Y).</p><p>No-Write Conflict. This policy is violated in case there are N (N ≥ 2) gen_by edges for a data node. To resolve this violation, PROPUB removes the incorrect gen_by edges for the violated data node and keeps only the one gen_by edge that is present in PG. However, this may violate the NFI policy, as it removes dependencies for N − 1 invocation nodes.</p><p>To get around this side effect, PROPUB invents N − 1 non-functional data nodes and creates N − 1 gen_by edges, as shown in Fig. <ref type="figure">6</ref>. Lastly, it copies all the used edges for the violated data node over to all N − 1 non-functional data nodes. A corresponding set of rules is used to create the non-functional nodes and fix the violations of this policy.</p><p>No-Cyclic Dependency. This policy is violated in case a node is reachable from itself. In Fig. <ref type="figure">7</ref>, there is a cycle between an invocation node and a data node. To fix this violation, PROPUB invents a non-functional invocation node and creates a used edge between the data node and the non-functional invocation node. Then it removes all the gen_by edges from the invocation node (except the one with the data node with which it has the cycle) and copies them over to the non-functional invocation node. In a similar way, PROPUB resolves this violation between two invocation nodes.</p></div>
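<div xmlns="http://www.tei-c.org/ns/1.0"><p>The No-Type Error repair can be traced on a single edge with the following Python sketch. It is a hedged illustration, not PROPUB's implementation: the tuple ("f", X, Y) stands in for the invented node f(X,Y) of the rules, the function name repair_type_errors is hypothetical, and the example edge from s2 to g1 is the type error reported for Figure 3.</p><p>
```python
def repair_type_errors(dep, violations, data_nodes, invocation_nodes):
    """Replace each offending edge X -> Y by X -> f(X,Y) -> Y."""
    dep = set(dep)
    data_nodes = set(data_nodes)
    invocation_nodes = set(invocation_nodes)
    for x, y in violations:
        fresh = ("f", x, y)              # stands in for the term f(X,Y)
        dep.discard((x, y))              # remove the ill-typed edge
        if x in data_nodes:
            invocation_nodes.add(fresh)  # data-to-data: invent an invocation
        else:
            data_nodes.add(fresh)        # invocation-to-invocation: invent data
        dep.add((x, fresh))
        dep.add((fresh, y))
    return dep, data_nodes, invocation_nodes


repaired_dep, repaired_data, repaired_invocations = repair_type_errors(
    dep={("s2", "g1")},
    violations={("s2", "g1")},
    data_nodes=set(),
    invocation_nodes={"s2", "g1"},
)
print(sorted(repaired_dep, key=str))
```
</p><p>After the repair, s2 depends on the invented data node and the invented data node depends on g1, so the graph is bipartite again while the dependency of s2 on g1 is preserved transitively.</p></div>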
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Repairing No-False Independence (NFI) Policy Violations</head><p>This policy is violated in case two nodes are not dependent in PG ∆u even though they are in PG′. This may occur when ur:hide user requests are applied on PG′, as shown in Fig. <ref type="figure" target="#fig_10">8</ref>. One way to resolve this violation is to insert, between any two nodes in PG ∆u, the direct dependencies that are present in PG′ but missing in PG ∆u. However, this process may add too many edges, and the graph may become unreadable. One optimization is to exploit transitive dependencies to reduce the total number of new edges needed, but this may be computationally intensive. PROPUB uses a different strategy to fix this violation. The following rules are used to transform the ur:hide requests into an equivalent set of ur:abstract requests:</p><p>hide_connected(X,Y) :- ur:hide(X), ur:hide(Y), dep(X,Y).<lb/>hide_connected(X,X) :- ur:hide(X).<lb/>hide_connected(X,Y) :- hide_connected(Y,X).<lb/>hide_connected(X,Y) :- hide_connected(X,Z), hide_connected(Z,Y).<lb/>smaller(X) :- hide_connected(X,Y), Y &lt; X.<lb/>minimum(X) :- ur:hide(X), not(smaller(X)).<lb/>abstract_hide(X,G) :- hide_connected(X,G), minimum(G).</p><p>The customization request ur:abstract removes nodes from PG′ but does not violate the NFI policy. To avoid NFI policy violations, PROPUB therefore transforms the ur:hide user requests into an equivalent set of ur:abstract user requests. These are applied to PG′ in the same way as the ur:abstract requests issued by the user.</p></div>
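<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of these rules is to group hidden nodes that are connected to each other and to name each group after one representative. A minimal Python sketch of that idea follows; it assumes that the smallest identifier of a group serves as its representative, the function name hide_to_abstract is hypothetical, and the two example edges are invented for illustration rather than taken from Figure 2.</p><p>
```python
def hide_to_abstract(hide, dep):
    """Map each hidden node to the representative of its connected group."""
    hide = set(hide)
    neighbours = {n: set() for n in hide}
    for x, y in dep:
        if x in hide and y in hide:
            neighbours[x].add(y)   # connectivity is treated as symmetric
            neighbours[y].add(x)
    group_of = {}
    # Visiting nodes in sorted order makes the first node reached in each
    # group its smallest member, which then serves as the group name.
    for start in sorted(hide):
        if start in group_of:
            continue
        group_of[start] = start
        stack = [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in group_of:
                    group_of[nxt] = start
                    stack.append(nxt)
    return group_of


# Hidden nodes from the motivating example; the edges are assumed.
groups = hide_to_abstract({"c1", "d18", "c2"},
                          {("d18", "c1"), ("c2", "d16")})
print(groups)
```
</p><p>Under the assumed edges, c1 and d18 fall into one abstract group named c1, while c2 forms a group of its own because d16 is not hidden.</p></div>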
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Repairing No-False Dependence (NFD) Policy Violations</head><p>This policy is violated in case two nodes are dependent in PG ∆u even though they are not in PG′. This may occur when ur:abstract user requests are applied on PG′, as shown in Fig. <ref type="figure">9</ref>. In Fig. <ref type="figure">9</ref>(a) we have a partial provenance graph showing the ur:abstract requests and the nodes with direct dependencies on one or more nodes selected to be abstracted. This figure shows that in PG′ the data artifact '1' depends on data artifacts 'a' and 'b'. In a similar way, the data node '2' depends on invocation nodes 'b' and 'c', and so on. Now, if we apply these ur:abstract requests by collapsing all the selected nodes into a single abstracted node, then in PG ∆u the data artifact '1' becomes dependent on nodes 'a', 'b', 'c', 'd', and 'e', thus making PG ∆u incorrect, as shown in Fig. <ref type="figure">9</ref>.</p><p>Fig. <ref type="figure">9</ref>. (a) The ur:abstract user requests; (b) the dependencies created by applying these ur:abstract user requests. In (a) we show the boundary of one set of ur:abstract user requests and the nodes with a direct dependency on one or more nodes selected to be abstracted. After these ur:abstract user requests are applied on PG′, we get a new set of dependencies as shown in (b).</p><p>To avoid such policy violations, PROPUB takes a systematic three-stage approach to applying the ur:abstract user requests. Instead of collapsing the selected nodes into one abstracted node, it invents a number of non-functional data and invocation nodes to maintain the dependencies between any two nodes in PG ∆u as they are in PG′. This systematic approach ensures that the minimum number of non-functional nodes is invented. In the first stage, PROPUB builds two sets, in and out. The set in contains the data nodes that are used by some of the invocation nodes selected to be abstracted, and the invocation nodes that generated some of the data nodes selected to be abstracted. The set out contains the data nodes that are generated by one of the invocation nodes selected to be abstracted, and the invocation nodes that used some of the data nodes selected to be abstracted. PROPUB also calculates the dependencies of each node in the set out on the nodes of the set in. Next, PROPUB creates non-functional data nodes for the invocation nodes from the sets in and out. One non-functional data node is created for exactly one invocation node from the set in, connected through a gen_by edge. One non-functional data node is created for more than one invocation node from the set out, connected through used edges, in case these invocations have the same dependencies on the set in. At this stage, a non-functional data node is connected either to a node from the set in or to one or many nodes from the set out. For example, invocation nodes '4' and '5' depend on invocation nodes 'b' and 'c', so PROPUB creates only one non-functional data node and two used edges. This is shown in Fig. <ref type="figure">10(a)</ref>.</p><p>In the second stage, PROPUB calculates the list of dependencies of all nodes from the set out on the nodes from the set in. It creates one non-functional invocation node for each unique dependency list, and it creates gen_by edges for the nodes from the set out that have the same dependency list. Then it creates used edges to connect each of these non-functional invocation nodes to the corresponding nodes in the set in. In case an edge needs to be created with an invocation from either in or out, the edge is connected to the respective non-functional data node created in the first stage. The outcome is shown in Fig. <ref type="figure">10(b)</ref>.</p><p>In the final stage, PROPUB combines nodes where possible. For example, in Fig. <ref type="figure">10(b)</ref> the path from node '6' to node 'e' has three consecutive non-functional nodes with no other dependencies. These three nodes can be replaced by only one non-functional data node. The result is shown in Fig. <ref type="figure">10(c)</ref>. Finally, PROPUB removes all the nodes selected to be abstracted and their associated edges from PG′.</p></div>
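<div xmlns="http://www.tei-c.org/ns/1.0"><p>The bookkeeping behind the first two stages (the sets in and out, and the grouping of out nodes by identical dependency lists) can be sketched as follows. This Python fragment is only an assumed reading of the description above: it does not create the non-functional nodes themselves, all function names are hypothetical, and the toy graph is invented rather than copied from Fig. 9.</p><p>
```python
def boundary_sets(group, dep):
    """Nodes outside the group that it depends on (in) or that depend on it (out)."""
    group = set(group)
    in_set = {y for x, y in dep if x in group and y not in group}
    out_set = {x for x, y in dep if y in group and x not in group}
    return in_set, out_set


def dependencies_through_group(node, group, dep):
    """The in nodes that `node` reaches via paths running through the group."""
    group = set(group)
    reached, seen = set(), set()
    stack = [y for x, y in dep if x == node and y in group]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for x, y in dep:
            if x == current:
                if y in group:
                    stack.append(y)
                else:
                    reached.add(y)
    return frozenset(reached)


def group_by_dependencies(group, dep):
    """Out nodes sharing the same dependency list end up in one bucket."""
    _, out_set = boundary_sets(group, dep)
    buckets = {}
    for node in out_set:
        key = dependencies_through_group(node, group, dep)
        buckets.setdefault(key, set()).add(node)
    return buckets


# Toy graph: g1 and g2 are the nodes selected to be abstracted.
toy_dep = {("1", "g1"), ("g1", "a"), ("g1", "b"),
           ("2", "g2"), ("3", "g2"), ("g2", "b"), ("g2", "c")}
buckets = group_by_dependencies({"g1", "g2"}, toy_dep)
print(buckets)
```
</p><p>In the toy graph, node '1' depends only on 'a' and 'b', whereas '2' and '3' share the dependency list 'b' and 'c'. Collapsing g1 and g2 into one node would make '1' depend on 'c' as well, which is exactly the kind of false dependency the NFD policy forbids; keeping one bucket per distinct dependency list avoids it.</p></div>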
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Algorithm</head><p>The algorithm in Fig. <ref type="figure" target="#fig_12">11</ref> finds the customized provenance graph, if one exists. In this approach, we add non-functional nodes to repair policy violations. In Figure <ref type="figure" target="#fig_2">3</ref> we show that PG ∆u has a structural policy violation between nodes g1 and s2. Using this approach, we introduce a non-functional data node d such that d is dependent on g1, and s2 is dependent on d. Next, to fix the cycle between d13 and g1, we introduce the non-functional invocation node g2 and create a dependency (gen_by) edge from d15 to g2. We then obtain the final CG shown in Figure <ref type="figure">12</ref>. Note that we are now able to keep all the relevant nodes in CG.</p><p>Fig. <ref type="figure">12</ref>. Customized Provenance Graph after repairing all policy violations</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Related Work</head><p>In <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>, it has been observed that provenance can be used, e.g., to interpret results, diagnose errors, fix bugs, improve reproducibility, and generally to build trust in the final data products and the underlying processes. In addition, provenance can be used to enhance exploratory processes <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17]</ref>, and techniques have been developed to deal with provenance efficiently <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref>. In many cases, provenance carries sensitive information, which can cause privacy concerns related to data, actors, or workflow specifications. By studying provenance, one can capture the functionality of an actor (module), i.e., be able to guess the output of the actor given a set of inputs, or the execution flow of a workflow <ref type="bibr" target="#b7">[8]</ref>.</p><p>The security view approach <ref type="bibr" target="#b4">[5]</ref> limits the provenance available to a user by providing a partial view of the workflow through a role-based access control mechanism, and by defining a set of access permissions on actors, channels, and input/output ports as specified by the workflow owner at design time. The ZOOM*UserViews approach <ref type="bibr" target="#b19">[20]</ref> allows users to define a partial, zoomed-out view of a workflow, based on a user-defined distinction between relevant and irrelevant actors. Provenance information is restricted by the definition of that partial view of the workflow.</p><p>In our recent work <ref type="bibr" target="#b10">[11]</ref>, we developed PROPUB, which uses a declarative approach to publish customized policy-aware provenance. In this paper, we developed a new way to repair policy violations, not by removing additional nodes (as in <ref type="bibr" target="#b10">[11]</ref>), but by introducing new (non-functional) nodes that represent the original lineage dependencies, without revealing information that the user wants to protect. We described in detail how policy violations are repaired such that all relevant nodes are retained in the final provenance graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>We discussed the need for provenance in scientific collaboration. Provenance data helps to build trust in published results and data. However, provenance can also contain sensitive data and/or too much irrelevant detail. Thus, scientists should be able to "customize" provenance data before sharing it.</p><p>Our current PROPUB system is based on the Open Provenance Model (OPM). We plan to extend PROPUB to support model extensions, e.g., structured data, in particular nested collections <ref type="bibr" target="#b18">[19]</ref>. Furthermore, PROPUB currently suggests only one specific modified graph for a given set of user requests U and provenance policies PP. In future work, we plan to investigate how to extend this approach to rank alternative solutions, thus further supporting scientists in finding the desired balance between revealing provenance information and preserving privacy when sharing data with collaborators.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Figure 2(a) shows the provenance graph (PG) taken from the First Provenance Challenge <ref type="bibr" target="#b11">[12]</ref>. Data nodes are depicted as circles and invocation nodes (representing computations) as boxes; dependencies among them are shown as directed edges. These edges capture the lineage of data and thus are typically drawn from right (newer nodes) to left (older nodes). For example, d16 was generated by invocation s2 and was in turn used by invocation c2, denoted by s2 ←(gen_by)− d16 and d16 ←(used)− c2, respectively. Let us assume that the user wants to publish data products d18 and d19 along with their lineage. She then issues the publication requests shown in Figure 2(a). A recursive query retrieves all data and invocation nodes upstream from d18 and d19, yielding a modified provenance graph (PG′), shown in Figure 2(b). 
Note that the lineage of d20 up to s3 is not relevant for d18 and d19 and hence is not included in PG′. Further assume that, before publishing PG′, the user also requests a set of customizations, as shown in Figure 2(b). Figure 3 shows the provenance graph obtained after applying all customization requests. We see that this provenance graph violates three provenance policies: There</figDesc></figure>
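<div xmlns="http://www.tei-c.org/ns/1.0"><p>The recursive upstream query mentioned above can be illustrated with a small sketch. It is not the PROPUB implementation: the function name upstream, the dictionary encoding of edges, and the toy graph are assumptions made for illustration, and the toy graph is not the First Provenance Challenge graph.</p><p>
```python
from collections import deque


def upstream(edges, targets):
    """Return the targets plus every node they transitively depend on.

    `edges` maps a node to the nodes it directly depends on, i.e. edges
    point from newer nodes to older nodes, as in a lineage graph.
    """
    seen = set(targets)
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for parent in edges.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical toy lineage: three independent-looking branches, two of
# which share the input d10.
edges = {
    "d18": ["c1"], "c1": ["d15"], "d15": ["s1"], "s1": ["d10"],
    "d19": ["c2"], "c2": ["d16"], "d16": ["s2"], "s2": ["d10"],
    "d20": ["c3"], "c3": ["d17"], "d17": ["s3"], "s3": ["d11"],
}

# Publishing d18 and d19 keeps only their upstream lineage.
relevant = upstream(edges, ["d18", "d19"])
```
</p><p>In this toy graph, the branch that ends in d20 is never visited, which mirrors the observation that lineage irrelevant to the published data products is left out of the modified graph.</p></div>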
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. (a) User requests to publish the provenance of {d18, d19}; and (b) customization requests</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Provenance graph after applying all user requests. Provenance policies No-Type Error (NTE), No-Cyclic Dependency (NCD) and No-False Independence (NFI) are violated, while No-Write Conflict (NWC) and No-False Dependence (NFD) are satisfied.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>A provenance (or lineage) graph is an acyclic graph PG = (V, E), where the nodes V = D ∪ I represent either data items D or actor invocations I. The graph PG is bipartite, i.e., the edges E = E_use ∪ E_gby are either used edges E_use ⊆ I × D or generated-by edges E_gby ⊆ D × I. Here, a used edge (i, d) ∈ E means that invocation i has read d as part of its input, while a generated-by edge (d, i) ∈ E means that d was output data written by invocation i.</figDesc></figure>
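<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of this definition, the sketch below checks the two structural properties it states: every edge connects an invocation and a data item in the right direction, and the graph has no cycle. The function names and the encoding of edges as Python sets of pairs are assumptions of this sketch, not part of the paper.</p><p>
```python
def is_bipartite_pg(data, invocations, used, gen_by):
    """Check that used edges are (i, d) pairs and gen_by edges are (d, i) pairs."""
    used_ok = all(i in invocations and d in data for i, d in used)
    gen_ok = all(d in data and i in invocations for d, i in gen_by)
    return used_ok and gen_ok


def is_acyclic(nodes, edges):
    """Kahn-style test: the graph is acyclic iff all nodes can be peeled off."""
    succ = {n: set() for n in nodes}
    indeg = {n: 0 for n in nodes}
    for src, dst in edges:
        if dst not in succ[src]:
            succ[src].add(dst)
            indeg[dst] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return removed == len(nodes)


# The example from the text: d16 was generated by s2 and used by c2.
D = {"d16"}
I = {"s2", "c2"}
used = {("c2", "d16")}
gen_by = {("d16", "s2")}

ok_types = is_bipartite_pg(D, I, used, gen_by)
ok_acyclic = is_acyclic(D | I, used | gen_by)
```
</p><p>Both checks succeed on this fragment. A direct data-to-data edge would fail the first check, and an extra edge from s2 back to d16 would fail the second.</p></div>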
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. PROPUB Architecture</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. (a) direct dependency between data nodes causing a type error (NTE violation); (b) PROPUB resolves this by inventing a non-functional invocation node.</figDesc></figure>
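<div xmlns="http://www.tei-c.org/ns/1.0"><p>The repair sketched in Fig. 5 can be approximated as follows. This is a minimal sketch under stated assumptions, not the PROPUB rules: it handles only direct dependencies between two data nodes, and the tuple ("nf", src, dst) used to name the invented invocation is a hypothetical convention chosen here.</p><p>
```python
def repair_type_errors(data, invocations, edges):
    """Replace each data-to-data edge by a path through an invented invocation.

    `edges` is a set of (newer, older) dependency pairs. An edge whose two
    endpoints are both data nodes is a type error; it is replaced by
    newer -gen_by-> inv and inv -used-> older, where inv is a new
    non-functional invocation node.
    """
    data = set(data)
    invocations = set(invocations)
    repaired = set()
    for src, dst in edges:
        if src in data and dst in data:
            inv = ("nf", src, dst)      # hypothetical name for the invented node
            invocations.add(inv)
            repaired.add((src, inv))    # src was generated by inv
            repaired.add((inv, dst))    # inv used dst
        else:
            repaired.add((src, dst))
    return invocations, repaired


# d2 depends directly on d1 (type error); d1 was generated by invocation a.
new_invocations, new_edges = repair_type_errors(
    {"d1", "d2"}, {"a"}, {("d2", "d1"), ("d1", "a")}
)
```
</p><p>The dependency of d2 on d1 is preserved as a two-edge path, so no lineage is lost, while the invented node carries no information about the computation it stands in for.</p></div>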
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Fig. 6 .Fig. 7 .</head><label>67</label><figDesc>Fig. 6. In (a), a data node has two gen_by edges, which violates the No-Write Conflict (NWC) policy; PROPUB resolves this by inventing a non-functional data node, as shown in (b).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head></head><label></label><figDesc>del_data(D) :- ic:wc(D).
del_dep(D,I) :- ic:wc(D), d_gen_by(D,I).
add_data(f(D,I),T) :- ic:wc(D), d_gen_by(D,I), T='ic:wc'.
add_dep(f(D,I),I) :- ic:wc(D), d_gen_by(D,I).
add_dep(I,f(D,I1)) :- ic:wc(D), d_gen_by(D,I1), d_used(I,D).</figDesc></figure>
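<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of the rules above can be traced with a small procedural sketch. It is one reading of the rules, not the PROPUB implementation: a data node with more than one writer is treated as a write conflict, the conflicting node is dropped, one copy per writer is invented, and every reader of the original node is connected to every copy. The tuple ("f", d, w) stands in for the term f(D,I); the function name and set encoding are assumptions.</p><p>
```python
def repair_write_conflicts(gen_by, used):
    """gen_by is a set of (d, i) pairs; used is a set of (i, d) pairs."""
    writers = {}
    for d, i in gen_by:
        writers.setdefault(d, set()).add(i)
    # A data node written by more than one invocation is a write conflict.
    conflicts = {d for d, ws in writers.items() if len(ws) > 1}

    # del_data / del_dep: drop the conflicting node and its edges.
    new_gen_by = {(d, i) for d, i in gen_by if d not in conflicts}
    new_used = {(i, d) for i, d in used if d not in conflicts}

    for d in conflicts:
        for w in writers[d]:
            copy = ("f", d, w)             # add_data(f(D,I), ...)
            new_gen_by.add((copy, w))      # add_dep(f(D,I), I)
            for reader, d2 in used:
                if d2 == d:
                    new_used.add((reader, copy))   # add_dep(I, f(D,I1))
    return new_gen_by, new_used


# d is written by both i1 and i2 and read by r; e is unaffected.
fixed_gen_by, fixed_used = repair_write_conflicts(
    {("d", "i1"), ("d", "i2"), ("e", "i3")},
    {("r", "d"), ("i1", "e")},
)
```
</p><p>After the repair, each invented copy has exactly one writer, and the reader r still depends on both i1 and i2 through the copies.</p></div>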
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Fig. 8 .</head><label>8</label><figDesc>Fig. 8. PG and the ur:hide user requests are shown in (a). In (b), some dependencies between nodes in PG∆u are removed. PROPUB resolves this in two steps: (i) it transforms the ur:hide requests into equivalent ur:abstract requests, and (ii) it applies these ur:abstract requests to PG, yielding the customized graph shown in (c).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Fig. 10 .</head><label>10</label><figDesc>Fig. 10. Repairing No-False Dependence Policy Violations</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head>Fig. 11 .</head><label>11</label><figDesc>Fig. 11. Computing CG using the Inventing Non-Functional Nodes approach</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3 .</head><label>3</label><figDesc>Provenance Policies</figDesc><table><row><cell>No-Type Error (NTE)</cell><cell>Two nodes with a direct dependency are of different types.</cell></row><row><cell>No-False Dependence (NFD)</cell><cell>Two nodes are dependent in PG∆u only if they are dependent in PG′.</cell></row><row><cell>No-False Independence (NFI)</cell><cell>Two nodes are independent in PG∆u only if they are independent in PG′.</cell></row><row><cell>Constraint</cell><cell>Description</cell></row><row><cell>ic:wc(X, Y)</cell><cell>Write conflict: invocations X and Y are creating the same data node.</cell></row><row><cell>ic:cd(X, Y)</cell><cell>Cyclic dependency between nodes X and Y.</cell></row><row><cell>ic:te(X, Y)</cell><cell>Type error: nodes X and Y are connected via used or gen_by edges, but don't have the corresponding node types.</cell></row><row><cell>ic:fd(X, Y)</cell><cell>False dependency: node Y depends on X in PG∆u, but not in PG′.</cell></row><row><cell>ic:fi(X, Y)</cell><cell>False independence: node Y depends on X in PG′, but not in PG∆u.</cell></row></table></figure>
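<div xmlns="http://www.tei-c.org/ns/1.0"><p>The false dependency and false independence constraints both compare transitive dependencies between the graph before and after customization. The sketch below shows one way such a comparison could be computed. It is illustrative only: the function names, the restriction to a set of kept nodes, and the encoding of edges as (newer, older) pairs are assumptions, not the constraint definitions used in PROPUB.</p><p>
```python
def depends(edges):
    """Return all pairs (x, y) such that y transitively depends on x."""
    succ = {}
    for newer, older in edges:
        succ.setdefault(newer, set()).add(older)
    pairs = set()
    for start in succ:
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            for m in succ.get(n, ()):
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        pairs.update((x, start) for x in seen)
    return pairs


def compare_dependencies(original, customized, kept):
    """Return (false dependencies, false independencies) among kept nodes."""
    dep_o, dep_c = depends(original), depends(customized)
    fd = {(x, y) for x, y in dep_c - dep_o if x in kept and y in kept}
    fi = {(x, y) for x, y in dep_o - dep_c if x in kept and y in kept}
    return fd, fi


# Hiding invocation i2 cuts d3 off from everything it depended on.
original = {("d3", "i2"), ("i2", "d2"), ("d2", "i1"), ("i1", "d1")}
customized = {("d2", "i1"), ("i1", "d1")}
fd, fi = compare_dependencies(original, customized, {"d1", "d2", "d3", "i1"})
```
</p><p>Here no false dependency is introduced, but d3 no longer depends on d2, i1, or d1, so three false independencies are reported. These are the kind of violation that inventing a non-functional node in place of the hidden one is meant to repair.</p></div>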
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>Repairing No-False Dependence Policy Violations</figDesc><table><row><cell>Algorithm: CALCULATECUSTOMPG</cell></row><row><cell>INPUT: provenance graph PG, user requests U and provenance policies PP</cell></row><row><cell>OUTPUT: customized provenance graph CG</cell></row><row><cell>1. Test for Direct Conflicts // as explained in Section 3.1</cell></row><row><cell>2. IF there are Direct Conflicts THEN</cell></row><row><cell>3.     RETURN false // User can resubmit after changing U</cell></row><row><cell>4. ELSE</cell></row><row><cell>5.     Compute PG′ // as explained in Section 3.1</cell></row><row><cell>6.     Transform ur:hide user requests into ur:abstract user requests // as explained in Section 4.2</cell></row><row><cell>7.     Apply all ur:abstract user requests on PG′ to get PG∆u // as explained in Section 4.3</cell></row><row><cell>8.     Resolve NCC violations on PG∆u // as explained in Section 4.1</cell></row><row><cell>9.     Resolve NWC violations on modified PG∆u // as explained in Section 4.1</cell></row><row><cell>10.    Resolve NFT violations on modified PG∆u // as explained in Section 4.1</cell></row><row><cell>11.    CG = PG∆u // Final customized provenance graph</cell></row><row><cell>12.    RETURN CG</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">Nature: Special Issue on Data Sharing</title>
		<imprint>
			<biblScope unit="volume">461</biblScope>
			<date type="published" when="2009-09">September 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Linking multiple workflow provenance traces for interoperable collaborative science</title>
		<author>
			<persName><forename type="first">P</forename><surname>Missier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ludäscher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sarkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Shrestha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Altintas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Goble</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workflows in Support of Large-Scale Science (WORKS), 2010 5th Workshop on</title>
				<imprint>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Lineage retrieval for scientific data processing: a survey</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Frew</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="28" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A survey of data provenance in e-science</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Simmhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Plale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gannon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM SIGMOD Record</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="31" to="36" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Scientific workflow provenance querying with security views</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chebotko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Fotouhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WAIM&apos;08. The Ninth International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="349" to="356" />
		</imprint>
	</monogr>
	<note>Web-Age Information Management</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Provenance for Computational Tasks: A Survey</title>
		<author>
			<persName><forename type="first">J</forename><surname>Freire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Koop</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Silva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computing in Science and Engineering</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="11" to="21" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Privacy issues in scientific workflow provenance</title>
		<author>
			<persName><forename type="first">S</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khanna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Cohen-Boulakia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science</title>
				<meeting>the 1st International Workshop on Workflow Approaches to New Data-centric Science</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Preserving Module Privacy in Workflow Provenance</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khanna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Panigrahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roy</surname></persName>
		</author>
		<idno>CoRR abs/1005.5543</idno>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Moreau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Clifford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Freire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Futrelle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Groth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kwasnikowska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Miles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Missier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Myers</surname></persName>
		</author>
		<title level="m">The open provenance model core specification</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
	<note>Future Generation Computer Systems</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Provenance browser: Displaying and querying scientific workflow provenance graphs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ludäscher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data Engineering (ICDE), 2010 IEEE 26th International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1201" to="1204" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">PROPUB: Towards a Declarative Approach for Publishing Customized, Policy-Aware Provenance</title>
		<author>
			<persName><forename type="first">S</forename><surname>Dey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zinn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ludäscher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Scientific and Statistical Database Management Conference</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note>to appear</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Special issue: The first provenance challenge</title>
		<author>
			<persName><forename type="first">L</forename><surname>Moreau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ludäscher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Altintas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Barga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Callahan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Clifford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Cohen-Boulakia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Concurrency and Computation: Practice and Experience</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="409" to="418" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Moreau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Clifford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Freire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Groth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Futrelle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kwasnikowska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Miles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Missier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Myers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Simmhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stephan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Van den Bussche</surname></persName>
		</author>
		<ptr target="http://openprovenance.org/" />
		<title level="m">OPM: The Open Provenance Model Core Specification</title>
				<imprint>
			<date type="published" when="2009-12">December 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Exploring scientific workflow provenance using hybrid queries over nested data and lineage graphs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>McPhillips</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ludäscher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Scientific and Statistical Database Management</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="237" to="254" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Provenance and scientific workflows: challenges and opportunities</title>
		<author>
			<persName><forename type="first">S</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Freire</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD Conference</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1345" to="1350" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Managing rapidly-evolving scientific workflows</title>
		<author>
			<persName><forename type="first">J</forename><surname>Freire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Callahan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Scheidegger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Vo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Provenance and Annotation of Data</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="10" to="18" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Provenance for visualizations: Reproducibility and beyond</title>
		<author>
			<persName><forename type="first">C</forename><surname>Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Freire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Callahan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computing in Science &amp; Engineering</title>
		<imprint>
			<biblScope unit="page" from="82" to="89" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Efficient Lineage Tracking For Scientific Workflows</title>
		<author>
			<persName><forename type="first">T</forename><surname>Heinis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Alonso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2008 ACM SIGMOD conference</title>
				<meeting>the 2008 ACM SIGMOD conference</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1007" to="1018" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Techniques for efficiently querying scientific workflow provenance graphs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ludäscher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Conference on Extending Database Technology</title>
				<meeting>the 13th International Conference on Extending Database Technology</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="287" to="298" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Zoom* userviews: Querying relevant provenance in workflow systems</title>
		<author>
			<persName><forename type="first">O</forename><surname>Biton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Cohen-Boulakia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Davidson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 33rd international conference on Very large data bases</title>
				<meeting>the 33rd international conference on Very large data bases</meeting>
		<imprint>
			<publisher>VLDB Endowment</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="1366" to="1369" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
