<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Preserving Confidentiality in Ontologies: Can we develop secure ontologies?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erika Guetti Suca</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Supervisor: Flávio Soares Corrêa da Silva</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Mathematics and Statistics - University of São Paulo</institution>
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many semantic web applications require selective sharing of ontologies due to copyright, confidentiality, business, and security concerns, among others. Our motivation is to prevent sensitive information from being inferred and improperly extracted from the connected knowledge bases represented as ontologies, and to support the design of knowledge bases that withstand possible types of attacks. We give a brief description of a software tool we are currently building, and we propose heuristics based on properties of ontologies and their planned use to identify and apply confidentiality preservation techniques that minimize the risk of breaches. Finally, we discuss several open problems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Motivating Examples</title>
      <p>We classify two levels of information ownership: (1) full access to data, where data are processed before release, and (2) partial access to data, i.e., distributed data access. Our examples are related to healthcare, but we can easily imagine similar needs in many other scenarios. The following example refers to the first category.</p>
      <p>
        The hospital wants to release Table 1 for the purpose of classification analysis on the class attribute, Transfuse, which has two values, YES and NO, indicating whether or not the patient has received a blood transfusion. Without loss of generality, we assume that the only sensitive value in Surgery is Transgender. Table 2 shows the data after anonymization using the LKC-privacy model [
        <xref ref-type="bibr" rid="ref4">Fung et al. 2010</xref>
        ]. The general intuition of LKC-privacy is to ensure that every combination of values in QID with maximum length L is shared by at least K records, and that the confidence of inferring any sensitive value in S is not greater than C, where L, K, and C are thresholds and S is a set of sensitive values specified by the data holder (the hospital). The output in Table 2 is a secure view because it preserves the confidentiality of sensitive data (in this case the identity of patients): the records are generalized into equivalence groups, so that each group contains at least K records with respect to some QID attributes. Hence, the sensitive values in each QID group are diversified enough to disorient confident inferences [
        <xref ref-type="bibr" rid="ref7">Mohammed et al. 2009</xref>
        ].
      </p>
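      <p>The LKC-privacy check described above can be sketched as follows. This is a minimal illustration, not the algorithm of Fung et al.; the toy records and attribute names (Job, Sex, Surgery) are hypothetical, not taken from Table 1:</p>

```python
# Hypothetical sketch of an LKC-privacy check on a toy table.
from itertools import combinations

def satisfies_lkc(records, qid, L, K, C, sensitive_attr, sensitive_values):
    """Check that every QID combination of length at most L is shared by at
    least K records, and that the confidence of inferring any sensitive value
    within such a group is at most C."""
    for length in range(1, L + 1):
        for attrs in combinations(qid, length):
            groups = {}
            for r in records:
                key = tuple(r[a] for a in attrs)
                groups.setdefault(key, []).append(r)
            for group in groups.values():
                if len(group) < K:
                    return False  # group too small: identity at risk
                for s in sensitive_values:
                    conf = sum(r[sensitive_attr] == s for r in group) / len(group)
                    if conf > C:
                        return False  # sensitive value inferable too confidently
    return True

records = [
    {"Job": "Professional", "Sex": "M", "Surgery": "Plastic"},
    {"Job": "Professional", "Sex": "M", "Surgery": "Transgender"},
    {"Job": "Professional", "Sex": "M", "Surgery": "Plastic"},
]
print(satisfies_lkc(records, ["Job", "Sex"], L=2, K=2, C=0.5,
                    sensitive_attr="Surgery", sensitive_values={"Transgender"}))
# → True: every group has at least 2 records and confidence 1/3 ≤ 0.5
```

      <p>Lowering C below 1/3 or raising K above the group size makes the same table fail the check, which is exactly what forces further generalization of the QID attributes.</p>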
      <p>
        We need to establish a balance between retaining context and protecting participants, aiming to preserve the data that is of interest to the end user. The relevant techniques are based on Statistical Disclosure Control (SDC) and are known as Privacy Preserving Data Publishing (PPDP); they eliminate privacy threats while preserving useful information for data analysis. There are several anonymization models, depending on the requirements of the data publication [
        <xref ref-type="bibr" rid="ref4">Fung et al. 2010</xref>
        ].
      </p>
      <p>
        The following examples are related to partial access to data. Suppose that a system has generated an answer to a user that preserves confidential information. However, some works [
        <xref ref-type="bibr" rid="ref2">Bonatti et al. 2014</xref>
        ,
        <xref ref-type="bibr" rid="ref1">Bonatti et al. 2015</xref>
        ] showed that it is still possible to break confidentiality once a smart user exploits the possibilities illustrated in Examples 2 and 3.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Example 2: Attacks using User’s Background Knowledge</title>
      <p>Generally, only one part of the domain is modeled. The user may exploit various sources of background knowledge and meta-knowledge to reconstruct the hidden part of the knowledge base. The additional knowledge, for instance from a public ontology, could be used to infer secrets and breach confidentiality. We define Cn(KB) as the set of logical consequences of a knowledge base KB. In effect, the condition Cn(KB) ∩ S = ∅ is not sufficient to protect confidentiality. Suppose that there is one secret S = {OncologyPatient(Bob)} and</p>
      <p>KB = {SocialSecurityNumber(Bob, 12345), OncologyPatient(user123), SocialSecurityNumber(user123, 12345)}.</p>
      <p>
        KB does not entail OncologyPatient(Bob), so KB is a secure view. However, it is common knowledge that a social security number uniquely identifies one person; the attacker can therefore infer that Bob = user123 and consequently disclose the secret. From a probabilistic perspective, releasing the view should not change the attacker's probability distribution over the possible answers to a sensitive query that represents a set of secrets, even though preserving informative answers is also important [
        <xref ref-type="bibr" rid="ref3">Cuenca Grau and Horrocks 2008</xref>
        ].
      </p>
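      <p>The attack of Example 2 can be illustrated with a small sketch. The tuple-based encoding of assertions and the equality-propagation step are our own simplification of the reasoning an attacker would perform:</p>

```python
# The published view does not entail OncologyPatient(Bob), but background
# knowledge that SocialSecurityNumber uniquely identifies a person lets an
# attacker equate individuals and disclose the secret.
kb = {
    ("SocialSecurityNumber", "Bob", "12345"),
    ("OncologyPatient", "user123", None),
    ("SocialSecurityNumber", "user123", "12345"),
}

def infer_equalities(kb):
    """Background knowledge: a shared SSN implies the same person."""
    ssn, equal = {}, set()
    for pred, subj, obj in kb:
        if pred == "SocialSecurityNumber":
            if obj in ssn and ssn[obj] != subj:
                equal.add(frozenset({ssn[obj], subj}))
            ssn[obj] = subj
    return equal

def entails_secret(kb, secret):
    """Check whether the secret follows, directly or via inferred equality."""
    pred, individual = secret
    known = {s for p, s, _ in kb if p == pred}
    if individual in known:
        return True
    for pair in infer_equalities(kb):
        if individual in pair and known & pair:
            return True  # secret disclosed via Bob = user123
    return False

print(entails_secret(kb, ("OncologyPatient", "Bob")))  # → True: the view leaks
```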
    </sec>
    <sec id="sec-4">
      <title>Example 3: Attacks to Complete Knowledge</title>
      <p>Suppose the attacker has complete knowledge about a certain set of axioms. Then the attacker may be able to reconstruct some secrets from the “I don't know” answers of a secure view KBu. In particular, the system constantly answers “I do not know” to any query over a secret. Consider a hospital's knowledge base that defines a concept Patient and a role Patient_of that describes which patient belongs to which hospital department. The KB consists of assertions of the form Patient(X) and Patient_of(X, Y), where we assume that each patient belongs to exactly one department i, with 1 ≤ i ≤ n. A user u is authorized to see all assertions except those with Y = n, because n is a special department, dedicated to highly dangerous diseases. So S (the set of secrets) consists of all assertions Patient_of(X, n), i.e., the members of n are all the patients that are treated for a special disease.</p>
      <p>Based on the knowledge that for each patient e, KB contains exactly one assertion Patient_of(e, i), which is assumed to be known by the attacker, a smart attack can easily identify all the members of n.</p>
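      <p>A minimal sketch of this elimination attack follows; the department assignments and the value of n are hypothetical:</p>

```python
# Each patient belongs to exactly one of n departments; assertions
# Patient_of(p, n) are censored ("unknown"). An attacker who knows the
# "exactly one department" axiom recovers the members of n by elimination.
n = 3  # the protected department (hypothetical numbering)
full_kb = {"alice": 1, "bob": 3, "carol": 2, "dave": 3}  # Patient_of

def secure_view_answer(patient, dept):
    """The system refuses to answer for the secret department n."""
    if full_kb[patient] == n:
        return "unknown"
    return "yes" if full_kb[patient] == dept else "no"

# Attack: any patient not confirmed in departments 1..n-1 must be in n.
leaked = {
    p for p in full_kb
    if all(secure_view_answer(p, d) != "yes" for d in range(1, n))
}
print(sorted(leaked))  # → ['bob', 'dave']: the hidden membership is recovered
```

      <p>Note that the attack uses only the authorized interface plus the completeness axiom, exactly as in the scenario above.</p>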
    </sec>
    <sec id="sec-5">
      <title>3. Related Work</title>
      <p>
        The work of [
        <xref ref-type="bibr" rid="ref6">Grau and Motik 2014</xref>
        ] proposes partitioning the knowledge base into a visible part KBv and a hidden part KBh containing the secrets to be protected. KBh is accessible only through a limited query interface. Their objective is to publish sensitive information in KBh without exposing it in any way, while preserving all the consequences the users could derive by reasoning over KBv ∪ KBh. This methodology works under the assumption that the signatures of KBv and KBh are disjoint, i.e., it does not consider protecting the axioms that are implied by a combination of KBv and KBh. Their work establishes lower bounds on the size and the number of queries that an import-by-query algorithm may need to ask an oracle in order to solve a reasoning task. Regarding reuse, their model is flexible and useful for KB designers to ensure selective access to parts of KBs. Nevertheless, they do not discuss how to select the hidden part given a set of target secrets; moreover, the user's background knowledge that should be taken into account is not analyzed.
      </p>
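      <p>The disjoint-signature assumption can be illustrated with a small check. The axioms, names, and tuple representation below are hypothetical, not taken from the import-by-query formalization:</p>

```python
# The visible and hidden parts must not share predicate or individual names;
# otherwise axioms combining them could leak hidden content.
def signature(axioms):
    """Collect all predicate and individual names occurring in the axioms."""
    sig = set()
    for pred, *args in axioms:
        sig.add(pred)
        sig.update(args)
    return sig

kb_visible = [("Patient", "alice"), ("Treats", "dr_x", "alice")]
kb_hidden = [("OncologyPatient", "p1"), ("SpecialDept", "p1")]

shared = signature(kb_visible) & signature(kb_hidden)
print("disjoint signatures" if not shared else f"overlap: {shared}")
# → disjoint signatures
```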
      <p>
        A probabilistic perspective is introduced by [
        <xref ref-type="bibr" rid="ref3">Cuenca Grau and Horrocks 2008</xref>
        ]. Enlarging the public view should not change the probability distribution P over the possible answers to a sensitive query that can represent a set of secrets. Their confidentiality condition allows P to be replaced with a different P′ after enlarging the public view. Taking a closer look, however, P does not really consider the user's a priori knowledge about the KB.
      </p>
      <p>
        The work of [
        <xref ref-type="bibr" rid="ref5">Grau and Kostylev 2016</xref>
        ] focuses on data publishing and anonymization, but not on access control. They lay theoretical foundations for Privacy-Preserving Data Publishing (PPDP) in the context of Linked Data, formalizing anonymization in terms of suppressor functions, and they work out the computational complexity of the decision problems underlying the policy compliance, safety, and optimality requirements. Their policy compliance ensures that sensitive information remains protected when the anonymized data is considered in isolation. Safety can be ensured by replacing all occurrences of sensitive data with blank nodes. As a consequence, the published linked datasets are protected against disclosure of sensitive information while remaining practically useful. This framework does not yet accept OWL 2 ontologies, and the authors expect that the introduction of OWL 2 will lead to significant technical challenges, especially in combination with closed-world semantics.
      </p>
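      <p>A suppressor function in this spirit can be sketched as follows; the triples and the blank-node naming scheme are illustrative assumptions of ours, not the authors' formalization:</p>

```python
# Replace sensitive terms in an RDF-like triple set with blank nodes,
# reusing the same blank node for repeated occurrences of the same term.
import itertools

def suppress(triples, sensitive):
    """Map each sensitive term to a fresh, consistently reused blank node."""
    blank = (f"_:b{i}" for i in itertools.count())
    mapping = {}
    out = []
    for triple in triples:
        out.append(tuple(
            mapping.setdefault(t, next(blank)) if t in sensitive else t
            for t in triple
        ))
    return out

triples = [("bob", "hasDiagnosis", "cancer"), ("bob", "worksAt", "acme")]
print(suppress(triples, {"bob"}))
```

      <p>Reusing one blank node per suppressed term preserves the links between triples, which is precisely why safety must still be analyzed: the anonymized graph remains useful but may remain linkable.</p>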
      <p>
        Lastly, the works [
        <xref ref-type="bibr" rid="ref2">Bonatti et al. 2014</xref>
        ,
        <xref ref-type="bibr" rid="ref1">Bonatti et al. 2015</xref>
        ] introduce a stronger confidentiality model that takes both object-level and meta-level background knowledge into consideration and defines a method for computing secure knowledge views. They illustrate attacks using object-level background knowledge, complete knowledge, and signatures, including them in the formalization of the user's prior knowledge. Their methods are inspired by Controlled Query Evaluation (CQE) based on lies and/or refusals; technically, they use lies because rejected queries are not explicitly marked. Once the user requests access to information by means of a query, the policy requirements are enforced. Their model proposes a safe approximation of background meta-knowledge and checks that its answers to user queries do not entail any secret. However, they apply the data filtering without a context analysis that helps to distinguish which data are relevant or interesting to the end user.
      </p>
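      <p>A minimal sketch of CQE based on lies, under the simplifying assumptions that queries are ground facts, entailment is set membership, and the facts and secrets are hypothetical:</p>

```python
# A query whose true answer would disclose a secret is answered falsely;
# since refusals are never marked, a lie is indistinguishable from an
# honest negative answer.
kb = {("OncologyPatient", "bob"), ("Patient", "alice"), ("Patient", "bob")}
secrets = {("OncologyPatient", "bob")}

def cqe_answer(query):
    """Answer truthfully unless a positive answer would disclose a secret."""
    true_answer = query in kb
    if true_answer and query in secrets:
        return False  # lie: looks exactly like an honest "no"
    return true_answer

print(cqe_answer(("Patient", "bob")))          # → True: harmless fact
print(cqe_answer(("OncologyPatient", "bob")))  # → False: a protecting lie
```

      <p>A full CQE mechanism must also guard against the meta-level inferences of Examples 2 and 3, which this sketch deliberately omits.</p>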
    </sec>
    <sec id="sec-6">
      <title>4. The Proposal</title>
      <p>
        We summarize the confidentiality problem as follows: given an insecure ontology, how do we create secure views for the answers to user queries that preserve confidentiality? We consider that the secure view of the ontology must not entail a secret. We also assume that every data publishing scenario has its own assumptions and requirements on the data holder, the data recipients, and the purpose of the published data. Our framework implements a conceptual structure, a Simple Confidentiality Model (SCM), based on the works [
        <xref ref-type="bibr" rid="ref3">Cuenca Grau and Horrocks 2008</xref>
        ,
        <xref ref-type="bibr" rid="ref1">Bonatti et al. 2015</xref>
        ]. The SCM considers the user's background knowledge when creating a secure view, so it is not vulnerable to the attacks illustrated in Examples 2 and 3. We show in Figure 1 the general components of our framework, which we are currently building.
      </p>
      <p>The framework considers two ways of interacting with a user: (1) Configuring KB features: the user sets up inherent characteristics of a knowledge base (i.e., the user's background knowledge and, for each user, a finite set of secrets that should not be disclosed). Characteristics concerning the objective of information utility, such as policy representations and hidden features (whether or not the semantics of the data must be preserved), are also essential. (2) User queries: if a query is compliant with the preservation of confidentiality, the system shows a secure view; otherwise, it shows a justification explaining why the query is not compliant with the policies.</p>
      <p>Figure 1. General components of the framework: inputs (the ontology with its evolution features, the secrets to hide per user, hidden features such as utility and domain type, and user queries), processing components (a policy reasoner that selects the confidentiality strategy, evaluation heuristics for the alteration and creation of secure views, the OWL API, and the KB of ontologies), and outputs (a performance evaluation report, secure views to users, results for compliant queries, and justifications for non-compliant queries).</p>
    </sec>
    <sec id="sec-7">
      <title>5. Evaluation plan</title>
      <p>We propose heuristics based on properties of knowledge bases and their planned use to identify and apply confidentiality preservation techniques that minimize the risk of breaches. Our intention is to systematize, as far as possible, the performance evaluation of heuristics connected to confidentiality preservation techniques. We propose the following:
1. OWL 2 Profiles: The W3C distinguishes three OWL 2 profiles (OWL 2 QL, OWL 2 RL, and OWL 2 EL). Reasoning over ontologies in each of these profiles is performed using a set of rules that varies with each profile, with some overlap among the different rule sets. The essential differences between the OWL 2 profiles lie in their restrictions on inverse properties and on universal and existential quantifiers. We want to understand whether the differences between the OWL 2 profiles influence the creation of a secure view.
2. Reasoning Strategies: Forward chaining, backward chaining, or a hybrid strategy.</p>
      <p>We want to understand how the reasoning strategies of ontology reasoners influence the creation of a secure view.
3. Ontology Development Strategies: There are two main ways of approaching ontology development, top-down and bottom-up. Ontology construction cannot be considered in isolation, without relation to its use. We want to understand whether the strategy of ontology development impacts the creation of a secure view. Our intention is to work in the healthcare domain.</p>
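      <p>As an illustration of the first reasoning strategy listed above, a toy forward-chaining materialization over ground facts; the rules and fact names are hypothetical:</p>

```python
# Forward chaining: apply rules of the form (premises, conclusion)
# repeatedly until no new facts are derived (fixpoint).
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (frozenset({"OncologyPatient(bob)"}), "Patient(bob)"),
    (frozenset({"Patient(bob)"}), "Person(bob)"),
]
print(sorted(forward_chain({"OncologyPatient(bob)"}, rules)))
# → ['OncologyPatient(bob)', 'Patient(bob)', 'Person(bob)']
```

      <p>Because forward chaining materializes all consequences eagerly while backward chaining derives them on demand, the two strategies expose different surfaces to the secure-view construction, which is what we intend to measure.</p>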
    </sec>
    <sec id="sec-8">
      <title>6. Work Plan</title>
      <p>We present the properties and general requirements for a software tool for preventing
access to sensitive information. Our immediate future work is to develop our proposal
in case studies related to healthcare. The doctoral defense is expected to be in February
2018. Some specific anticipated outcomes of our work include:</p>
      <p>Systematization and comparative analysis of proposed heuristics.</p>
      <p>Based on these heuristics, identifying the characteristics of ontologies that allow confidentiality to be preserved, or the properties that make this non-viable.</p>
      <p>Creating scenarios in which ontology developers have restricted access to parts of the ontology owned by others, with the objective of improving the computational complexity.</p>
      <p>Identifying other kinds of attacks and proposing privacy models that ensure confidentiality while offering support for conflict resolution: semantic inconsistencies and ambiguities.</p>
      <p>A future direction of our work is to consider other forms of distortion of the knowledge base to ensure confidentiality. For example, we can explore not only removing elements of the knowledge base, but also adding new elements.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bonatti</surname>
            ,
            <given-names>P. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrova</surname>
            ,
            <given-names>I. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sauro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Optimized construction of secure knowledge-base views</article-title>
          . In
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , editors,
          <source>Description Logics</source>
          , volume
          <volume>1350</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bonatti</surname>
            ,
            <given-names>P. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sauro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Petrova</surname>
            ,
            <given-names>I. M.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>A mechanism for ontology confidentiality</article-title>
          .
          <source>In Proceedings of the 29th Italian Conference on Computational Logic</source>
          , Torino, Italy, June 16-18,
          <year>2014</year>
          ., pages
          <fpage>147</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Privacy-preserving query answering in logic-based information systems</article-title>
          .
          <source>In Proceedings of the 2008 Conference on ECAI 2008: 18th European Conference on Artificial Intelligence</source>
          , pages
          <fpage>40</fpage>
          -
          <lpage>44</lpage>
          , Amsterdam, The Netherlands. IOS Press.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Fung</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>A. W.-C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P. S.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques</article-title>
          . Chapman &amp; Hall/CRC, 1st edition.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kostylev</surname>
            ,
            <given-names>E. V.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Logical foundations of privacy-preserving publishing of linked data</article-title>
          .
          <source>In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17</source>
          ,
          <year>2016</year>
          , Phoenix, Arizona, USA., pages
          <fpage>943</fpage>
          -
          <lpage>949</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Reasoning over ontologies with hidden content: The import-by-query approach</article-title>
          . CoRR, abs/1401.5853.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Mohammed</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fung</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hung</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.-K.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Anonymizing healthcare data: A case study on the blood transfusion service</article-title>
          .
          <source>In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>1285</fpage>
          -
          <lpage>1294</lpage>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>