<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Saffron: A Data Value Assessment Tool for Quantifying the Value of Data Assets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>ADAPT Centre</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Trinity College Dublin</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ireland judie.attard@adaptcentre.ie</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre, School of Computing, Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science and Statistics, Trinity College Dublin</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>tent Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106), co-funded by the European Regional Development Fund and the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No.</institution>
          <addr-line>713567</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1864</year>
      </pub-date>
      <abstract>
        <p>Data has become an indispensable commodity and is the basis for many products and services. It has become increasingly important to understand the value of this data in order to exploit it and reap its full benefits. Yet many businesses and entities are simply hoarding data without understanding its true potential. We present Saffron, a Data Value Assessment Tool that enables the quantification of the value of data assets based on a number of different data value dimensions. Based on the Data Value Vocabulary (DaVe), Saffron enables the extensible representation of the calculated value of data assets, whilst also catering for the subjective and contextual nature of data value. The tool exploits semantic technologies in order to provide traceable explanations of the calculated data value. Saffron therefore provides the first step towards the efficient and effective exploitation of data assets.</p>
      </abstract>
      <kwd-group>
        <kwd>Data value</kwd>
        <kwd>Data governance</kwd>
        <kwd>Data value monitoring</kwd>
        <kwd>Data value assessment</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>"Data is the new oil" is a claim supported by many. Even though data and oil
differ in many ways as resources, such as in their renewability and their effect
on the environment, one cannot deny the similarities in their usage and utility
potential, as well as in their nature as indispensable commodities in today's
society. We increasingly rely on data or data-based products and services,
particularly now that the use of big data is so prevalent and successful
decision-making requires the effective contextual exploitation of
information.</p>
      <p>Whether one agrees with the above-mentioned claim or not, it is undeniable
that data is, to different extents, valuable. But what exactly is meant by data
value? Numerous publications in the literature explore this term in various domains.
Whilst the existing definitions of value might be somewhat similar, there is
currently no consensus on the definition of "data value", or on its representation.
Moreover, it is inherently challenging to measure the value of data due to the
subjective and contextual nature of value. In fact, to the best of our knowledge,
there currently exists no tool or framework that quantifies the value of data
based on various data value dimensions (aspects that characterise data value, e.g.
quality, cost, usage). The literature contains some approaches towards measuring
one or two of these dimensions, such as [2–4]. However, these cannot be deemed
appropriate solutions for quantifying data value, since they do not cater for the
highly heterogeneous nature of data value. While it is evident that the use of data
has become a vital part of our everyday lives, only a few understand
the usefulness of measuring the value of data. In fact, many businesses are
hoarding data without actually exploiting it or understanding its potential.</p>
      <p>To address this gap in the topic of data value, our goal in this paper
is to tackle the quantification of data value. This quantification is essential to the
efficient and effective exploitation of data. We therefore propose our Data Value
Assessment Tool Saffron, a customisable semantic-based tool that considers a
number of data value dimensions to provide a comprehensive and context-aware
data value quantification. Saffron connects to data governance centres to extract
relevant metadata, uplifts it to a data value knowledge graph, and presents
analysis and semantics-driven traceable explanations of the calculated data value.</p>
    </sec>
    <sec id="sec-2">
      <title>Saffron: The Data Value Assessment Tool</title>
      <p>Our motivation for Saffron is to enable the optimisation of data value chains
based on the quantification of data value. The tool therefore provides the
capability of monitoring data assets as used within an enterprise, and uses the
relevant metadata to calculate the value of the assets. Considering the lack of
consensus on what characterises data value, we designed Saffron to be
extensible, and to calculate data value based on a number of different data value
dimensions and the relevant metric groups and metrics as defined in [1]. We also
take into consideration insight and feedback given by relevant stakeholders.</p>
      <p>Figure 1 shows the architecture of the Saffron tool. The tool
enables its users to connect to one or more data governance centres through
APIs. These centres include any methods used by an entity to manage its data
and the relevant metadata. Saffron is therefore able to extract the metadata on
data assets as required.</p>
      <p>In the Semantic Data Management component, Saffron uses the Data Value
Vocabulary (DaVe, http://theme-e.adaptcentre.ie/dave/) to construct a knowledge
graph containing information such as the name of the data asset, its description,
and other metadata required to calculate the implemented metrics. We refer to the
latter as data asset readings.</p>
      <p>As a proof of concept, we implemented four different dimensions to
characterise data value, namely Infrastructure, Usage, Data, and Quality. For
each of these dimensions we implemented a number of metrics, for a total of eight
metrics over the four dimensions. Table 1 provides an overview based on the
hierarchy used in the DaVe vocabulary. Each of these metrics requires one or
more data asset readings. For example, the Created By metric requires the
ID of the person who created the data asset. These readings are then used within
the respective formula of each metric to calculate the metric value. The results
are added to the data asset knowledge graph and persisted to a triple store.</p>
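      <p>The step from data asset readings to metric values can be sketched as follows. This is a minimal illustration and not Saffron's implementation: the names <monospace>Metric</monospace> and <monospace>compute_metrics</monospace>, the reading keys, and both example formulas are hypothetical, since the actual metric formulas are not reproduced in this paper. Each metric declares the readings it needs and is only evaluated when all of them are present.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Metric:
    """A metric declares the readings it needs and a formula over them."""
    name: str
    required_readings: Tuple[str, ...]
    formula: Callable[[Dict[str, object]], float]


def compute_metrics(readings, metrics):
    """Return {metric name: value} for every metric whose readings are all present."""
    results = {}
    for metric in metrics:
        if all(key in readings for key in metric.required_readings):
            results[metric.name] = metric.formula(readings)
    return results


# Hypothetical metrics; both formulas are placeholders chosen for illustration.
METRICS = [
    Metric(
        name="created_by",
        required_readings=("creator_id",),
        # Assumed scoring rule: 1.0 if the creator is known, 0.0 otherwise.
        formula=lambda r: 1.0 if r["creator_id"] else 0.0,
    ),
    Metric(
        name="access_rate",
        required_readings=("access_count", "days_observed"),
        # Average number of accesses per observed day.
        formula=lambda r: r["access_count"] / r["days_observed"],
    ),
]

# Hypothetical readings extracted for one data asset.
readings = {"creator_id": "user-17", "access_count": 30, "days_observed": 10}
metric_values = compute_metrics(readings, METRICS)
print(metric_values)
```
]]></preformat>
      <p>Keeping the required readings next to each formula means a new metric can be registered without touching the evaluation loop, which is one way to obtain the kind of extensibility described above.</p>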
      <p>For the quantification of the data value of a data asset, we take into
consideration the metric values calculated as described above, as well as any Metric
Settings and Dimension Weights specified by the user through the Saffron
Dashboard. The metric settings are 'assumptions' required to cater for the subjective
nature of data value. For example, one might consider an older data asset to be
more valuable, but the opposite might also hold true. These settings
are therefore used to tailor the overall data value calculation to the
specific use context. Similarly, the dimension weights are used to cater for the
contextual nature of data value, where one dimension might be considered
relevant in one context, but less so in another. For example, the usage dimension
would be considered less important than the quality dimension (particularly a
timeliness metric) for weather forecast data. It is important to note that the
metric calculations are not affected by the dimension weights, and are therefore
objective.</p>
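      <p>As an illustration of how metric settings and dimension weights could enter the calculation while the metric values themselves stay untouched, consider the following sketch. It assumes that metric values are normalised to [0, 1], that a dimension value is the mean of its metric values, and that the overall data value is the weighted mean of the dimension values. None of these choices is stated in this paper, and the function names, metric names, and the assignment of metrics to dimensions are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def apply_setting(value, higher_is_better):
    """Orient a normalised metric value in [0, 1] according to a metric setting."""
    return value if higher_is_better else 1.0 - value


def dimension_values(metric_values, metric_dimension, settings):
    """Average the oriented metric values of each dimension (assumed aggregation)."""
    grouped = {}
    for name, value in metric_values.items():
        oriented = apply_setting(value, settings.get(name, True))
        grouped.setdefault(metric_dimension[name], []).append(oriented)
    return {dim: sum(vals) / len(vals) for dim, vals in grouped.items()}


def data_value(dim_values, weights):
    """Weighted mean of the dimension values (assumed aggregation)."""
    total_weight = sum(weights[dim] for dim in dim_values)
    if total_weight == 0:
        raise ValueError("at least one dimension weight must be positive")
    return sum(dim_values[dim] * weights[dim] for dim in dim_values) / total_weight


# Objective, normalised metric values for one hypothetical data asset.
metric_values = {"age": 0.8, "timeliness": 0.9, "access_rate": 0.2}
# Assumed assignment of the hypothetical metrics to dimensions.
metric_dimension = {"age": "Data", "timeliness": "Quality", "access_rate": "Usage"}

# Metric setting: in this context an older asset is considered LESS valuable.
settings = {"age": False}

# Dimension weights for weather forecast data: quality matters more than usage.
weights = {"Data": 1.0, "Quality": 3.0, "Usage": 1.0}

dims = dimension_values(metric_values, metric_dimension, settings)
overall = data_value(dims, weights)
print(dims, round(overall, 3))
```
]]></preformat>
      <p>In this sketch the settings and weights are applied only during aggregation, so changing them alters the overall data value but never the stored metric values.</p>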
      <p>Through the Saffron Dashboard the user is able to access a number of
interactive visualisations, including: (1) the overall data value of a project
(consisting of a number of data assets); (2) the data value for specific assets,
including a breakdown of the dimension values; (3) the metric values for specific
assets; (4) the historic metric values for specific assets as they changed over
time; and (5) the current dimension weight settings of the project.</p>
      <p>In the Saffron Dashboard the user is also able to view an explanation of
how the data value was calculated. This explanation is generated within the
Semantic Data Management component, where asserted knowledge about the
data asset (from the knowledge graph) and the user-set weights are coupled with
the terminology concepts about data value as defined in the DaVe vocabulary.
This enables us to present the user with a concise explanation of why and how
Saffron provided the given result as the data value of a data asset.</p>
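      <p>A traceable explanation of this kind can be approximated by recording, for each dimension, the value and the weight that produced its share of the final result. The sketch below is an illustration under stated assumptions only: it uses a weighted mean as the aggregation and plain strings instead of the DaVe-based knowledge graph, and the function <monospace>explain</monospace> and the asset name are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def explain(asset_name, dim_values, weights):
    """Return the overall value and one explanation line per dimension."""
    total_weight = sum(weights[dim] for dim in dim_values)
    if total_weight == 0:
        raise ValueError("at least one dimension weight must be positive")
    lines = []
    overall = 0.0
    for dim in sorted(dim_values):
        # Share of the final value contributed by this dimension.
        share = dim_values[dim] * weights[dim] / total_weight
        overall += share
        lines.append(
            f"{dim}: value {dim_values[dim]:.2f} x weight {weights[dim]:.1f}"
            f" / {total_weight:.1f} = {share:.3f}"
        )
    lines.append(f"Data value of '{asset_name}': {overall:.3f}")
    return overall, lines


# Hypothetical dimension values and user-set weights for one asset.
overall, lines = explain(
    "weather-forecast",
    {"Quality": 0.9, "Usage": 0.2},
    {"Quality": 3.0, "Usage": 1.0},
)
print("\n".join(lines))
```
]]></preformat>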
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>In this paper we presented Saffron, a Data Value Assessment Tool and, to the
best of our knowledge, the first tool that enables users to quantify the value of
their data assets based on a number of dimensions. The tool is extensible and
caters for the subjectivity and context dependence of data valuation through the
use of weights and settings. Whilst still a proof of concept with a limited number
of implemented dimensions and metrics, the Saffron tool is already being validated
and evaluated with stakeholders. Saffron is a concrete step towards quantifying the
value of data assets and enabling their effective and efficient exploitation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Attard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brennan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A semantic data value vocabulary supporting data value assessment and measurement integration</article-title>
          .
          <source>In: Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 2: ICEIS</source>
          . pp.
          <fpage>133</fpage>
          –
          <lpage>144</lpage>
          . INSTICC, SciTePress (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Klann</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schadow</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Modeling the Information-value Decay of Medical Problems for Problem List Maintenance</article-title>
          .
          <source>In: Proceedings of the 1st ACM International Health Informatics Symposium</source>
          . pp.
          <fpage>371</fpage>
          –
          <lpage>375</lpage>
          . IHI '10, ACM, New York, NY, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>al Saffar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heileman</surname>
            ,
            <given-names>G.L.</given-names>
          </string-name>
          :
          <article-title>Semantic Impact Graphs for Information Valuation</article-title>
          .
          <source>In: Proceedings of the Eighth ACM Symposium on Document Engineering</source>
          . pp.
          <fpage>209</fpage>
          –
          <lpage>212</lpage>
          . DocEng '08, ACM, New York, NY, USA (
          <year>2008</year>
          ), event place: Sao Paulo, Brazil
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Information Valuation for Information Lifecycle Management</article-title>
          .
          <source>In: Second International Conference on Autonomic Computing (ICAC'05)</source>
          . pp.
          <fpage>135</fpage>
          –
          <lpage>146</lpage>
          (Jun
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>