<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Source Information Disclosure in Ontology-based Data Integration (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Benedikt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Egor V. Kostylev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This extended abstract is based on the paper “Source Information Disclosure in Ontology-based Data Integration”, published in the proceedings of AAAI-2017.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Ontology-based data integration systems allow users to access data sitting in
multiple sources by means of queries over a global schema described by an ontology. User
queries are formulated against the vocabulary of the ontology, and the relationships
between the data sources and the ontology terms are specified declaratively by mappings.</p>
      <p>In practice, data sources often contain sensitive information that data owners want to
keep inaccessible to users. In the setting of ontology-based data integration, the risks of
unauthorized information disclosure quickly become apparent since the information
exposed to users depends on a complex combination of schema reconciliation, reasoning
over the ontology, and access to data in the sources via the mappings.</p>
      <p>In this paper, we formalize and study the privacy requirements that a data integration
system should satisfy before it is made available to users for querying, as well as the
computational complexity of checking whether such requirements are fulfilled.</p>
      <p>Our logical framework for information disclosure builds on work in the database
community. In line with existing approaches, we assume that sensitive information is
represented by a query (the policy) over the source schema, and also that schema-level
information (ontology, mappings, source schemas, and policy specification) is publicly
available. In contrast, the actual data is only made available as a result of user queries
over the ontology. Disclosure in our framework occurs when users are able to uncover
an answer to the policy query by querying the system and exploiting the availability
of schema-level information. We consider disclosure for a particular dataset, and also
whether a schema admits a dataset on which disclosure occurs.</p>
      <p>We provide lower and upper complexity bounds for disclosure analysis, in the process
introducing a number of techniques for analyzing logical privacy issues in ontology-based
data integration. In our analysis, we consider different ontology, mapping, and policy
languages. In all cases, we put special emphasis on the results most relevant to standard
OBDA, where the ontology is expressed in DL-Lite<sub>R</sub> and the mappings are global-as-view (GAV).</p>
      <p>Our results have implications for related work. In particular, they imply new lower
bounds for the instance-based determinacy problem in databases, which is at the core
of data pricing: the problem of automatically assigning a fair price to a chunk of data
given the prices of a set of views.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>