<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Saffron: A Data Value Assessment Tool for Quantifying the Value of Data Assets</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Judie</forename><surname>Attard</surname></persName>
							<email>judie.attard@adaptcentre.ie</email>
							<affiliation key="aff0">
								<orgName type="department">ADAPT Centre</orgName>
								<orgName type="institution">Trinity College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jeremy</forename><surname>Debattista</surname></persName>
							<email>debattij@scss.tcd.ie</email>
							<affiliation key="aff1">
								<orgName type="department">School of Computer Science and Statistics</orgName>
								<orgName type="institution">Trinity College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rob</forename><surname>Brennan</surname></persName>
							<email>rob.brennan@dcu.ie</email>
							<affiliation key="aff2">
								<orgName type="department" key="dep1">ADAPT Centre</orgName>
								<orgName type="department" key="dep2">School of Computing</orgName>
								<orgName type="institution">Dublin City University</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Saffron: A Data Value Assessment Tool for Quantifying the Value of Data Assets</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8CA171083CED91EFCAD7A1833B7739ED</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:06+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Data value</term>
					<term>Data governance</term>
					<term>Data value monitoring</term>
					<term>Data value assessment</term>
					<term>Linked Data</term>
					<term>Explainability</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Data has become an indispensable commodity and is the basis for many products and services. It has become increasingly important to understand the value of this data in order to exploit it and reap its full benefits. Yet many businesses and entities simply hoard data without understanding its true potential. We present Saffron, a Data Value Assessment Tool that enables the quantification of the value of data assets based on a number of different data value dimensions. Based on the Data Value Vocabulary (DaVe), Saffron enables the extensible representation of the calculated value of data assets, whilst also catering for the subjective and contextual nature of data value. The tool exploits semantic technologies to provide traceable explanations of the calculated data value. Saffron therefore provides the first step towards the efficient and effective exploitation of data assets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>"Data is the new oil" is a claim supported by many. Even though data and oil differ in many ways as resources, such as in their renewability and their effect on the environment, one cannot deny the similarities in their usage and utility potential, as well as in their nature as indispensable commodities in today's society. We increasingly rely on data or data-based products and services, particularly now that the use of big data is so prevalent and successful decision-making requires the effective contextual exploitation of information.</p><p>Whether one agrees with the above-mentioned claim or not, it is undeniable that data is, to different extents, valuable. But what exactly is meant by data value? Numerous publications explore this term in various domains. Whilst the existing definitions of value are somewhat similar, there is currently no consensus on the definition of "data value", or on its representation. Moreover, it is inherently challenging to measure the value of data due to the subjective and contextual nature of value. In fact, to the best of our knowledge, there currently exists no tool or framework that quantifies the value of data based on various data value dimensions (aspects that characterise data value, e.g. quality, cost, usage). The literature contains some approaches towards measuring one or two of these dimensions, such as <ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref><ref type="bibr" target="#b3">[4]</ref>; however, these cannot be deemed appropriate solutions for quantifying data value since they do not cater for the highly heterogeneous nature of data value. While it is evident that the use of data has become a vital part of our everyday lives, only a few understand the usefulness of measuring the value of data. 
In fact, many businesses are hoarding data without actually exploiting it or understanding its potential.</p><p>To address this gap, our goal in this paper is to tackle the quantification of data value. This quantification is essential to the efficient and effective exploitation of data. We therefore propose Saffron, our Data Value Assessment Tool: a customisable semantic-based tool that considers a number of data value dimensions to provide a comprehensive and context-aware data value quantification. Saffron connects to data governance centres to extract relevant metadata, uplifts it to a data value knowledge graph, and presents analyses and semantically driven, traceable explanations of the calculated data value.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Saffron: The Data Value Assessment Tool</head><p>Our motivation for Saffron is to enable the optimisation of data value chains based on the quantification of data value. The tool therefore provides the capability of monitoring data assets as used within an enterprise, and uses the relevant metadata to calculate the value of the assets. Considering the lack of consensus on what characterises data value, we designed Saffron to be extensible, and to calculate data value based on a number of different data value dimensions and the relevant metric groups and metrics as defined in <ref type="bibr" target="#b0">[1]</ref>. We also take into consideration insight and feedback given by relevant stakeholders.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> shows the architecture of Saffron. The tool enables its users to connect to one or more data governance centres through APIs. These centres include any methods used by an entity to manage its data and the relevant metadata. Saffron is therefore able to extract the metadata on data assets as required.</p><p>In the Semantic Data Management component, Saffron uses the Data Value Vocabulary<ref type="foot" target="#foot_0">4</ref> (DaVe) to construct a knowledge graph containing information such as the name of the data asset, its description, and other metadata required to calculate the implemented metrics. We refer to the latter as data asset readings.</p><p>As a proof of concept, we implemented four dimensions to characterise data value, namely Infrastructure, Usage, Data, and Quality. For each of these dimensions we implemented a number of metrics, for a total of eight metrics over the four dimensions. Table <ref type="table" target="#tab_0">1</ref> provides an overview based on the hierarchy used in the DaVe vocabulary. Each of these metrics requires one or more data asset readings. 
For example, for the Created By metric, we require the ID of the person who created the data asset. These readings are then used within the respective formulas of each metric to calculate the metric value. The results are added to the data asset knowledge graph and persisted to a triple store. To quantify the data value of a data asset, we take into consideration the metric values calculated as described above, as well as any Metric Settings and Dimension Weights specified by the user through the Saffron Dashboard. The metric settings are 'assumptions' required to cater for the subjective nature of data value. For example, one might consider an older data asset to be more valuable, but the opposite might also hold true. Therefore, these settings are used to tailor the overall data value calculation to the specific use context. Similarly, the dimension weights are used to cater for the contextual nature of data value, where one dimension might be considered relevant in one context, but less so in another. For example, the usage dimension would be considered less important than the quality dimension (particularly a timeliness metric) for weather forecast data. 
It is important to note that the metric calculations are not affected by the dimension weights, and are therefore objective.</p><p>Through the Saffron Dashboard the user is able to access a number of interactive visualisations, including: (1) The overall data value of a project (consisting of a number of data assets); (2) The data value for specific assets, including a breakdown of the dimension values; (3) The metric values for specific assets; (4) The historic metric values for specific assets as they changed over time; and (5) The current settings of the project's dimension weights.</p><p>In the Saffron Dashboard the user is also able to view an explanation of how the data value was calculated. This explanation is generated within the Semantic Data Management component, where asserted knowledge about the data asset (from the knowledge graph) and the user-set weights are coupled with the terminology concepts about data value as defined in the DaVe vocabulary. This enables us to present the user with a concise explanation of why and how Saffron provided the given result as the data value of a data asset.</p></div>
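<div xmlns="http://www.tei-c.org/ns/1.0"><p>The calculation described above can be illustrated with a minimal Python sketch. This is not Saffron's implementation, and the aggregation formulas are not stated in this paper, so the sketch rests on assumptions: metric values are normalised to the range 0 to 1, a dimension value is the unweighted mean of its metric values, and the overall data value is the weighted average of the dimension values. The names MetricValue, age_score, dimension_values and data_value, as well as the example numbers, are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricValue:
    """One calculated metric value, assumed to be normalised to [0, 1]."""
    dimension: str
    metric: str
    value: float


def age_score(age_days, horizon_days, older_is_more_valuable):
    """Hypothetical metric setting: the user decides whether age adds value."""
    freshness = max(0.0, 1.0 - age_days / horizon_days)
    return 1.0 - freshness if older_is_more_valuable else freshness


def dimension_values(metrics):
    """Assumed aggregation: unweighted mean of the metric values per dimension."""
    grouped = {}
    for m in metrics:
        grouped.setdefault(m.dimension, []).append(m.value)
    return {dim: sum(vals) / len(vals) for dim, vals in grouped.items()}


def data_value(metrics, weights):
    """Weighted average of dimension values, plus a plain-text trace.

    The weights only enter here, so the metric values themselves stay
    independent of the user's weighting.
    """
    dims = dimension_values(metrics)
    total_weight = sum(weights.get(d, 0.0) for d in dims)
    if total_weight == 0:
        raise ValueError("at least one dimension needs a positive weight")
    value = sum(dims[d] * weights.get(d, 0.0) for d in dims) / total_weight
    trace = [
        f"{d}: value={dims[d]:.2f}, weight={weights.get(d, 0.0):.2f}"
        for d in sorted(dims)
    ]
    return value, trace


# Illustrative readings only; the numbers are made up for this sketch.
readings = [
    MetricValue("Usage", "Created By", 1.0),
    MetricValue("Quality", "Completeness", 0.8),
    MetricValue("Quality", "Accuracy", 0.6),
]
# A context in which quality matters more than usage.
weights = {"Usage": 1.0, "Quality": 3.0}
value, trace = data_value(readings, weights)

# The same asset age scored under two opposite metric settings.
newer_is_better = age_score(30, 120, older_is_more_valuable=False)
older_is_better = age_score(30, 120, older_is_more_valuable=True)
```
]]></code></p>
<p>Changing the weights changes only the overall value returned by data_value; the per-dimension values from dimension_values stay the same, which mirrors the statement that metric calculations are not affected by the dimension weights. The returned trace is a plain-text stand-in for the semantic explanation that Saffron derives from its knowledge graph.</p></div>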
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Conclusion</head><p>In this paper we presented Saffron, a Data Value Assessment Tool: the first tool that enables users to quantify the value of their data assets based on a number of dimensions. The tool is extensible and caters for the subjectivity and context dependence of data valuation through the use of weights and settings. Whilst still a proof of concept with a limited number of implemented dimensions and metrics, the Saffron tool is already being validated and evaluated with stakeholders. Saffron is a concrete step towards quantifying the value of data assets and enabling their effective and efficient exploitation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Architecture of Saffron: The Data Value Assessment Tool</figDesc><graphic coords="3,136.49,115.84,342.37,79.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Implemented Dimensions, Metric Groups, and Metrics</figDesc><table><row><cell>Dimension</cell><cell>Metric Group</cell><cell>Metric</cell></row><row><cell>Usage</cell><cell>User Data</cell><cell>Created By; Class of User; Last Modification Date; Created On</cell></row><row><cell>Quality</cell><cell>Intrinsic</cell><cell>Completeness; Accuracy</cell></row><row><cell>Data</cell><cell>Extrinsic</cell><cell>Trust</cell></row><row><cell>Infrastructure</cell><cell>Data</cell><cell>Data Management</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_0">http://theme-e.adaptcentre.ie/dave/</note>
		</body>
		<back>

			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This research has received funding from the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106), co-funded by the European Regional Development Fund and the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 713567.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A semantic data value vocabulary supporting data value assessment and measurement integration</title>
		<author>
			<persName><forename type="first">J</forename><surname>Attard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Brennan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 2: ICEIS</title>
				<meeting>the 20th International Conference on Enterprise Information Systems - Volume 2: ICEIS</meeting>
		<imprint>
			<publisher>INSTICC, SciTePress</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="133" to="144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Modeling the Information-value Decay of Medical Problems for Problem List Maintenance</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Klann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Schadow</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st ACM International Health Informatics Symposium</title>
				<meeting>the 1st ACM International Health Informatics Symposium<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="371" to="375" />
		</imprint>
	</monogr>
	<note>IHI &apos;10</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Semantic Impact Graphs for Information Valuation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Al Saffar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L</forename><surname>Heileman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth ACM Symposium on Document Engineering</title>
				<meeting>the Eighth ACM Symposium on Document Engineering<address><addrLine>New York, NY, USA; Sao Paulo, Brazil</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="209" to="212" />
		</imprint>
	</monogr>
	<note>DocEng &apos;08</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Information Valuation for Information Lifecycle Management</title>
		<author>
			<persName><forename type="first">Ying</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Second International Conference on Autonomic Computing (ICAC&apos;05)</title>
				<imprint>
			<date type="published" when="2005-06">Jun 2005</date>
			<biblScope unit="page" from="135" to="146" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
