<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Need for Data Sharing Agreements in Data Management</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Southampton</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>There is an evidently growing legal, cultural and technological need for tools and models that allow users to express their own intentions and consent over the usage of their personal data and information. Service providers and institutions that manage personal data rely on specifying monolithic "Terms and Conditions" written in natural language and enforced in an ad-hoc manner, by presenting users with top-down, coarse-grained, opt-in/out options. We advocate the need for users to describe their personal contract of data usage in a formal, machine-processable language. Semantic Web technologies can have a central role in this approach by providing the formal tools and languages required. Expressing data sharing intentions, consent and data usage agreements in a technical way enables the development of algorithms that automatically respect a user's policy. This helps organisations increase technological capabilities, abide by legal requirements, and avoid ad-hoc processes, thus saving engineering resources.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>The need for machine processable data sharing agreements</title>
      <p>In this paper we advocate the need for machine-processable data sharing
agreements, which we believe will prove particularly valuable in the health data
management domain.</p>
      <p>
        Traditional data privacy approaches such as differential privacy [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], k-anonymity [11],
or l-diversity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] provide top-down mechanisms to specify global, authoritative,
coarse-grained privacy policies. These problems, as studied so far in the Web, A.I.,
Databases, and Knowledge Representation communities, have focused
on ensuring the confidentiality of individuals' identities while releasing data, by
masking, suppressing, altering or adding noise to data in order to achieve
de-identification. On the other hand, access control approaches, including the most
common role-based [10] and attribute-based [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] approaches, aim to maintain
confidentiality of private information by completely disallowing access to certain
data, based on the roles or attributes of users and on purposes.
      </p>
      <p>All these technologies consider protection against a non-trusted party and,
as such, they aim to directly limit access to data, or hide the identity of the
individual, rather than provide a contract of access that can also be used for
later or indirect use by a relatively trusted party.<fn><p>Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p></fn></p>
      <p>The idea of such access contracts has received limited study in the context
of privacy languages for the World Wide Web, the most notable example being the
now-deprecated P3P language [13]. P3P allowed website owners to specify
coarse-grained policies, using predefined options, regarding usage of the clients' data;
web clients/browsers would also specify their preferences in a similar way and
automatically accept or reject visiting a page depending on a match between the
policies. Such coarse-grained predefined options, and the fact that these languages
mostly provide "accept/reject" policies and do not offer flexibility for partial access,
have led to an almost complete abandonment of machine-processable efforts to specify access
and usage contracts, leaving data owners and service providers to rely exclusively
on legal, natural language "Terms of Use" agreements.</p>
      <p>
        We argue for developing the theory, algorithms and implementations
for expressing, supporting and managing fine-grained intentions, or contracts, of
data access and data usage, as well as personal user consent. To achieve that,
we believe that systems need to pursue the following objectives:
        <list list-type="order">
          <list-item>
            <p>Create a bottom-up setting where individuals and organisations can create
data sharing policies backed by a formal, machine-processable technical
language. This is in contrast to the classic service-client data sharing model,
where clients commit their data to a service provider after being presented
with coarse-grained "Terms and Conditions" written in natural language,
for which the clients get only a few simple opt-in/opt-out options.</p>
          </list-item>
          <list-item>
            <p>Enable richer and more dynamic expressions of access and usage policies.
Current access control systems are based on a predefined set of static rules that
prohibit access to certain fields of a data repository; these access control
rules cannot express sharing contracts such as a policy that prohibits a data
item from being shared depending on its history of sharing, or on the amount of
data already given to the data requester.</p>
          </list-item>
          <list-item>
            <p>Develop algorithms that allow data requesters to obtain the maximal
result set of a query that still abides by the contracts of the data owners.</p>
          </list-item>
          <list-item>
            <p>Support the goal of data auditing [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], which is to determine whether private
information was disclosed in answering queries, as well as the goal of accountability
[15], which aims to understand the responsibilities of different parties in data
processing. Data sharing agreements would be central in data auditing and
accountability scenarios, since they specify exactly how to use (or not to use) a
particular dataset.</p>
          </list-item>
        </list>
      </p>
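      <p>To make objectives 2 and 3 more concrete, the following is a minimal Python sketch, not an implementation proposed in this paper. It assumes a hypothetical quota-style contract in which each data owner limits how many of their items a given requester may hold, so that the decision depends on the history of sharing. All names (SharingLog, quota_contract, answer_query, the sample records) are illustrative assumptions. For such per-owner quotas, a single greedy pass returns a result set to which no further candidate item could be added without violating a contract; richer contract languages would need more careful algorithms.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SharingLog:
    """Hypothetical history of releases as (requester, owner, item_id) triples."""
    entries: List[Tuple[str, str, str]] = field(default_factory=list)

    def shared_count(self, requester: str, owner: str) -> int:
        # How many of this owner's items the requester has already received.
        return sum(1 for r, o, _ in self.entries if r == requester and o == owner)

    def record(self, requester: str, owner: str, item_id: str) -> None:
        self.entries.append((requester, owner, item_id))


# A contract decides, given the item, the requester and the sharing
# history, whether one more release is acceptable to the data owner.
Contract = Callable[[dict, str, SharingLog], bool]


def quota_contract(max_items: int) -> Contract:
    """Allow a release only while the requester holds fewer than max_items."""
    def allows(item: dict, requester: str, log: SharingLog) -> bool:
        return log.shared_count(requester, item["owner"]) < max_items
    return allows


def answer_query(candidates: List[dict], requester: str,
                 contracts: Dict[str, Contract], log: SharingLog) -> List[dict]:
    """Greedily release every candidate that its owner's contract still allows."""
    released = []
    for item in candidates:
        contract = contracts.get(item["owner"])
        # No contract on file is treated as no consent: withhold the item.
        if contract is not None and contract(item, requester, log):
            log.record(requester, item["owner"], item["id"])
            released.append(item)
    return released


# Illustrative data: these records and party names are made up.
items = [
    {"id": "r1", "owner": "alice", "field": "blood_pressure"},
    {"id": "r2", "owner": "alice", "field": "heart_rate"},
    {"id": "r3", "owner": "alice", "field": "diagnosis"},
    {"id": "r4", "owner": "bob", "field": "heart_rate"},
]
contracts = {"alice": quota_contract(2), "bob": quota_contract(1)}
log = SharingLog()

first = answer_query(items, "lab-x", contracts, log)
second = answer_query(items, "lab-x", contracts, log)
print([i["id"] for i in first])   # r3 is withheld: alice's quota of 2 is used up
print([i["id"] for i in second])  # nothing more may be released to lab-x
```
]]></preformat>
      <p>The same static query yields different answers on the first and second call, which is exactly the kind of behaviour that a fixed set of role-based or attribute-based rules cannot express. The log kept by this sketch is also the kind of record that the auditing and accountability goals of objective 4 would rely on.</p>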
    </sec>
    <sec id="sec-2">
      <title>The Role of the Semantic Web</title>
      <p>We believe that Semantic Web technologies are inherently suited to
providing the common, shared vocabularies for data sharing intentions and
agreements, together with the algorithmic machinery needed to process
these agreements.</p>
      <p>
        Building on more general schemas such as FOAF and schema.org, which
can be used to describe persons and personal data, or DICOM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for modeling
healthcare and medical imaging metadata, there are several ongoing approaches
that use knowledge graphs to express aspects of data sharing agreements. The Open
Digital Rights Language (ODRL) [12] is a policy expression language that
models content, services, actions, prohibitions, and obligations. PROV-O [14] models
provenance information generated in different systems and under different contexts.
More interestingly, the SPECIAL usage policy language [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides a vocabulary for expressing
consent together with data processing workflows that take such consent into
account, while in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] the authors develop an ontology that models privacy policies
described in actual medical research data sharing agreements.
      </p>
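      <p>As an illustration of how such a vocabulary might encode an agreement, the sketch below builds a small policy as a JSON-LD-style Python dictionary using terms from the ODRL Information Model (Agreement, permission, prohibition, constraint, and the purpose left operand). The example.org identifiers, the simplified string value for rightOperand, and the is_permitted function are assumptions made for this example; the function is a toy check, not an ODRL evaluator, and it understands only purpose-equality constraints.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json

# Hypothetical identifiers, used only for this illustration.
DATA = "http://example.org/data/patient-42/imaging"
PATIENT = "http://example.org/party/patient-42"
LAB = "http://example.org/party/research-lab"
RESEARCH = "http://example.org/purpose/medical-research"
MARKETING = "http://example.org/purpose/marketing"

# The patient lets the lab use the imaging data for medical research,
# and prohibits the lab from distributing it further.
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Agreement",
    "uid": "http://example.org/policy/1",
    "permission": [{
        "target": DATA,
        "assigner": PATIENT,
        "assignee": LAB,
        "action": "use",
        "constraint": [{
            "leftOperand": "purpose",
            "operator": "eq",
            "rightOperand": RESEARCH,  # simplified: a plain string value
        }],
    }],
    "prohibition": [{
        "target": DATA,
        "assigner": PATIENT,
        "assignee": LAB,
        "action": "distribute",
    }],
}


def is_permitted(policy, assignee, target, action, purpose):
    """Toy check: a matching prohibition wins; otherwise some matching
    permission must have all of its purpose constraints satisfied."""
    def matches(rule):
        return (rule["assignee"] == assignee
                and rule["target"] == target
                and rule["action"] == action)

    if any(matches(rule) for rule in policy.get("prohibition", [])):
        return False
    for rule in policy.get("permission", []):
        if not matches(rule):
            continue
        if all(c["leftOperand"] == "purpose"
               and c["operator"] == "eq"
               and c["rightOperand"] == purpose
               for c in rule.get("constraint", [])):
            return True
    # Anything not explicitly permitted is denied.
    return False


serialized = json.dumps(policy, indent=2)  # JSON text that could be exchanged
print(is_permitted(policy, LAB, DATA, "use", RESEARCH))
print(is_permitted(policy, LAB, DATA, "use", MARKETING))
print(is_permitted(policy, LAB, DATA, "distribute", RESEARCH))
```
]]></preformat>
      <p>Under these assumptions, only the first request is allowed. Denying everything that is not explicitly permitted is a design choice of this sketch rather than something the vocabularies above prescribe.</p>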
      <p>
        These knowledge graphs are important steps towards a vision where users or
parties encode their preferences and intentions of data usage in a
machine-processable way and data processing algorithms automatically respect these preferences.
In order to achieve this, the developed vocabularies have to be backed by the
development of generic and reusable algorithms, possibly borrowing from
data integration [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or ontology based query answering [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
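      </p>
      <ref-list>
        <ref id="ref10">
          <mixed-citation>10. Sandhu, R.S., Coyne, E.J., Feinstein, H.L., Youman, C.E.: Role-based access control models. Computer 29(2), 38–47 (1996)</mixed-citation>
        </ref>
        <ref id="ref11">
          <mixed-citation>11. Sweeney, L.: k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10(05), 557–570 (2002)</mixed-citation>
        </ref>
        <ref id="ref12">
          <mixed-citation>12. W3C: ODRL information model 2.2, https://www.w3.org/TR/odrl-model/</mixed-citation>
        </ref>
        <ref id="ref13">
          <mixed-citation>13. W3C: The platform for privacy preferences 1.0, https://www.w3.org/TR/P3P/</mixed-citation>
        </ref>
        <ref id="ref14">
          <mixed-citation>14. W3C: PROV-O: The PROV ontology, https://www.w3.org/TR/prov-o/</mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>15. Weitzner, D.J., Abelson, H., Berners-Lee, T., Feigenbaum, J., Hendler, J., Sussman, G.J.: Information accountability. Comm. of the ACM 51(6), 82 (2008)</mixed-citation>
        </ref>
      </ref-list>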
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Digital imaging and communications in medicine</article-title>
          , https://www.dicomstandard.org/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>The SPECIAL usage policy language</article-title>
          , https://aic.ai.wu.ac.at/qadlod/policyLanguage/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bayardo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faloutsos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiernan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rantzau</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Auditing compliance with a hippocratic database</article-title>
          .
          <source>In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30</source>
          . pp.
          <fpage>516</fpage>
          –
          <lpage>527</lpage>
          . VLDB '04 (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dwork</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Differential privacy: A survey of results</article-title>
          . In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.)
          <source>Theory and Applications of Models of Computation</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>19</lpage>
          . Springer Berlin Heidelberg (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pandey</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahai</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waters</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Attribute-based encryption for fine-grained access control of encrypted data</article-title>
          .
          <source>In: Proceedings of the 13th ACM conference on Computer and communications security</source>
          . pp.
          <fpage>89</fpage>
          –
          <lpage>98</lpage>
          . ACM (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Konstantinidis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambite</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>Scalable query rewriting: a graph-based approach</article-title>
          .
          <source>In: Proceedings of the ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <fpage>97</fpage>
          –
          <lpage>108</lpage>
          . Athens, Greece (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>DSAP: Data sharing agreement privacy ontology</article-title>
          . In:
          <source>Semantic Web Applications and Tools for Healthcare and Life Sciences</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Machanavajjhala</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venkitasubramaniam</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kifer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehrke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>l-diversity: Privacy beyond k-anonymity</article-title>
          .
          <source>In: ICDE</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Perez-Urbina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodríguez-Díaz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grove</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konstantinidis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Evaluation of query rewriting approaches for OWL 2</article-title>
          . In:
          <source>Joint Workshop on Scalable and High-Performance Semantic Web Systems</source>
          (SSWS+HPCSW
          <year>2012</year>
          ). p.
          <fpage>32</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>