<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Logical Approach for Preserving Confidentiality in Shared Knowledge Bases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erika Guetti Suca</string-name>
          <email>eguettig@ime.usp.br</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Flávio Soares Corrêa da Silva</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Mathematics and Statistics - University of São Paulo</institution>
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <fpage>132</fpage>
      <lpage>137</lpage>
      <abstract>
        <p>The control of interconnection mechanisms in shared knowledge bases is important to ensure that sensitive information is not extracted in an inappropriate way from connected bases. Our goal is to propose a logical model to specify mechanisms for query control, reasoning and evolution of knowledge bases and their corresponding ontologies, ensuring the confidentiality of information whenever appropriate. We describe the techniques currently in development to address this problem and show their capabilities and limitations. Finally, we introduce the requirements for a software tool to allow a designer of knowledge bases to define sensitive data elements and implement mechanisms to ensure their confidentiality.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>We explain the expected outputs of our work in Section 5, and finally in Section 6 we present a discussion and our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Confidentiality Problem</title>
      <p>We approach the confidentiality problem from two perspectives: inference control
strategies and the inference modeling problem. Inference control strategies are classified by the
time at which inference is checked: (1) During knowledge base design time – the main advantage is that it
is usually fast, since it only considers the database schema and the corresponding
constraints without the actual instances; however, the evolution of data is generally not covered by
the model. (2) During query-processing time – it provides maximal data availability,
because all disclosed data can be evaluated to verify the existence of an inference channel.
However, it is usually more expensive and time-consuming than the design-time approach
and affects system usage.</p>
      <p>In the present work we explore in greater detail the second approach.</p>
      <p>In the inference modeling problem we aim at the implementation of two main
approaches: (1) Logic-based – policy requirements are enforced when the user requests
access to information by means of a query. In the field of ontologies this technique is
called Controlled Query Evaluation (CQE) [Bonatti et al. 2015, Eldora et al. 2011]. The
main advantages are the clear formalization and decidability results, as well as the
independence of the application domain. However, logic-based systems have high
complexity, making them expensive for large Web applications. (2) Anonymization-based – it
refers to privacy-preserving data publishing (PPDP), assuming that sensitive data must be
retained for data analysis. The main attack and privacy models come from database
techniques based on Statistical Disclosure Control (SDC) [Fung et al. 2010]. In knowledge
bases, anonymization has been applied mainly in the medical area [Grau and Kostylev
2016, Domingo-Ferrer et al. 2013]. This model has a smaller complexity than
logic-based models, but it works on specific domains, with little formalization of the techniques.</p>
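      <p>As a concrete illustration of query-time inference control, the CQE idea above can be sketched as a censor that refuses any answer which, combined with previously disclosed answers, would entail a secret. This is only a minimal sketch: entailment is reduced to membership over atomic facts, and all identifiers are illustrative.</p>

```python
# Minimal sketch of a Controlled Query Evaluation (CQE) censor.
# Entailment is simplified to membership over atomic facts; a real
# system would call a DL or Datalog reasoner instead.

def entails(facts, statement):
    """Toy entailment: a statement follows if it is among the facts."""
    return statement in facts

def censor(kb, secrets, history, query):
    """Refuse an answer if disclosing it, together with everything
    already disclosed, would let the user derive a secret."""
    answer = query if entails(kb, query) else None
    if answer is None:
        return "unknown"
    disclosed = history | {answer}
    if any(entails(disclosed, s) for s in secrets):
        return "refused"            # inference channel detected
    history.add(answer)
    return answer

kb = {"takes(jane, drugX)", "covered(drugX)"}
secrets = {"takes(jane, drugX)"}
history = set()
print(censor(kb, secrets, history, "covered(drugX)"))      # covered(drugX)
print(censor(kb, secrets, history, "takes(jane, drugX)"))  # refused
```

The design-time approach would instead analyze the schema once; the censor above pays its cost on every query, which is the availability/performance trade-off described in the text.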
    </sec>
    <sec id="sec-3">
      <title>3. Motivating Examples</title>
      <p>We propose two examples to motivate the need to preserve confidentiality, but we can
easily imagine similar needs in many other scenarios. The two examples considered here
relate to healthcare.</p>
    </sec>
    <sec id="sec-4">
      <title>Example 1: Anonymizing Healthcare Data</title>
      <p>Consider the raw patient data in Table 1, where each record represents a surgery case with
the patient-specific information. Job, Sex, and Age are quasi-identifying (QID) attributes:
in combination, these attributes can uniquely identify an individual. The hospital wants to release Table
1 for the purpose of classification analysis on the class attribute, Transfuse, which has
two values, YES and NO, indicating whether or not the patient has received a blood
transfusion. Without loss of generality, we assume that the only sensitive value in Surgery
is Transgender. Table 2 shows the data after anonymization using the LKC-privacy
model [Fung et al. 2010] and after processing to generalize the records into
equivalence groups, so that each group contains at least k records with respect to some
QID attributes. The general intuition of LKC-privacy is to ensure that every combination
of QID values with maximum length L is shared by at least K records, and that the
confidence of inferring any sensitive value in S is not greater than C, where L, K, and C
are thresholds and S is a set of sensitive values specified by the data holder (the
hospital). In this way, the sensitive values in each qid group are diversified enough to disorient
confident inferences [Mohammed et al. 2009]. There are several anonymization models,
depending on the requirements of the data publication.</p>
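      <p>The LKC-privacy condition described above can be checked mechanically. The following sketch, with illustrative attribute names and records loosely modeled on the example, tests whether every QID combination of length at most L is shared by at least K records and whether the confidence of inferring a sensitive Surgery value stays at or below C.</p>

```python
from itertools import combinations
from collections import Counter

# Sketch of an LKC-privacy check on a flat table. `records` is a list
# of dicts, `qid` the quasi-identifier attributes, `sensitive` the set
# S of sensitive Surgery values, and L, K, C the thresholds from the
# text. All attribute names and data values are illustrative.

def satisfies_lkc(records, qid, sensitive, L, K, C):
    for size in range(1, L + 1):
        for attrs in combinations(qid, size):
            groups = Counter(tuple(r[a] for a in attrs) for r in records)
            for combo, count in groups.items():
                if count < K:          # combination shared by too few records
                    return False
                hits = sum(1 for r in records
                           if tuple(r[a] for a in attrs) == combo
                           and r["Surgery"] in sensitive)
                if hits / count > C:   # confident inference of a secret
                    return False
    return True

records = [
    {"Job": "Nurse", "Sex": "F", "Surgery": "Plastic"},
    {"Job": "Nurse", "Sex": "F", "Surgery": "Transgender"},
    {"Job": "Nurse", "Sex": "F", "Surgery": "Urology"},
    {"Job": "Nurse", "Sex": "F", "Surgery": "Plastic"},
]
print(satisfies_lkc(records, ["Job", "Sex"], {"Transgender"}, L=2, K=2, C=0.5))
# True: every QID combination appears >= 2 times and the confidence of
# inferring Transgender is 1/4 <= 0.5
```

An anonymization algorithm would generalize or suppress QID values until this predicate holds; the check itself is what the privacy model guarantees.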
    </sec>
    <sec id="sec-5">
      <title>Example 2: Protecting Confidentiality Across Several Institutions</title>
      <p>The citizen Jane needs to take a certain preventive medicine for breast cancer. Suppose
Jane does not want her physician or the pharmacy to supply the details of the prescription
to her health insurance company because she does not want to risk an increase in her
health insurance premium on the basis of the fact that medicine she has been prescribed
is intended for use by women who are believed to have a high risk of developing breast
cancer. In such a setting, in order for Jane to be reimbursed by her insurance company,
the pharmacy needs to be able to certify to the insurance company, through a trusted third
party, that Jane has indeed incurred a medical expense that is covered by her insurance
policy [Bao et al. 2007]. Put simply, consider KP, the Pharmacy Knowledge Base, and
KI, the Insurance Company Knowledge Base, and let SJ be Jane's secrets; the requirement
is that KP ∪ KI ⊭ SJ.</p>
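      <p>The requirement that the combined knowledge bases do not entail Jane's secrets can be tested mechanically. The sketch below, with illustrative facts and a toy forward-chaining closure in place of a real reasoner, checks that the reimbursement is derivable from the union of the two bases while no secret in SJ is.</p>

```python
# Toy check of the condition Cn(KP ∪ KI) ∩ SJ = ∅ for Example 2.
# Knowledge is a set of atomic facts plus Horn rules (premises,
# conclusion); Cn is computed by naive forward chaining. All fact
# and rule identifiers are illustrative.

def closure(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

KP = {"dispensed(jane, rx42)", "price(rx42, 100)"}   # pharmacy KB
KI = {"covered(rx42)"}                               # insurance company KB
rules = [({"dispensed(jane, rx42)", "covered(rx42)"},
          "reimburse(jane, rx42)")]
SJ = {"prescribed_for(rx42, breast_cancer)"}         # Jane's secrets

Cn = closure(KP | KI, rules)
print("reimburse(jane, rx42)" in Cn)   # True: reimbursement is certified
print(bool(Cn & SJ))                   # False: no secret is derivable
```

In the real scenario the trusted third party plays the role of `closure` here: it derives the reimbursement claim without the prescription's purpose ever entering the insurer's knowledge base.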
    </sec>
    <sec id="sec-6">
      <title>4. Proposal</title>
      <p>We have developed a simple formal confidentiality model M adapted from the main attack
and privacy models found in the literature [Cuenca Grau and Horrocks 2008, Bonatti et al.
2015]. We consider a single knowledge base as the union of several knowledge bases, and
the set of logical consequences of a knowledge base K will be denoted by Cn(K).
We consider that the problem of confidentiality involves two sub-problems, namely (1) the
secure publishing of data based on query processing and (2) the secure evolution of data.
Definition 1 Let M be a Simple Confidentiality Model (SCM) as follows:
KB is a knowledge base.</p>
      <p>U is a set of users of KB. Different users have access to different views of KB.
For all u ∈ U:
– Su is a finite set of secrecies that should not be disclosed to the user u.
– The queries from a user u are answered using a view of the knowledge
base KBu ⊆ KB. KBu is a secure view if Cn(KBu) ∩ Su = ∅.
– A view KBu is maximal secure if it is secure and there exists no K′ such
that KBu ⊂ K′ ⊆ KB and Cn(K′) ∩ Su = ∅.
– BKu is the set of statements that constitutes the background knowledge of
user u.
f is a filtering function that maps each u ∈ U to a view V ⊆ Cn(KBu).</p>
      <p>Responsibility for the publication of data is given by the function f of Definition
1, f implements a confidentiality strategy.</p>
      <p>Definition 2 Secure Publishing: f is secure if for all u ∈ U and s ∈ Su, there exists
K ⊆ KB, such that K is the answer to a query Q of user u:
f(K, u) generates secure views KBu using a specific confidentiality preservation
algorithm, and
s ∉ Cn(K ∪ BKu).</p>
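      <p>A naive filtering function f in the sense of Definitions 1 and 2 can be sketched as a greedy construction of a secure view: each statement is kept only if the resulting view, together with the user's background knowledge, still discloses no secret. In this sketch Cn is simplified to the set of atomic facts itself, and all identifiers are illustrative; a real implementation would close the candidate view under a reasoner before testing it against Su.</p>

```python
# Greedy sketch of a filtering function f per Definitions 1 and 2.
# KB, Su and BKu are sets of atomic statements; Cn is simplified to
# the identity on sets of facts. All statement names are illustrative.

def f(KB, Su, BKu):
    view = set()
    for statement in sorted(KB):        # deterministic iteration order
        candidate = view | {statement}
        # keep the statement only if Cn(candidate ∪ BKu) ∩ Su = ∅
        if not (candidate | BKu) & Su:
            view = candidate
    return view

KB = {"age(p1, 30)", "job(p1, nurse)", "surgery(p1, transgender)"}
Su = {"surgery(p1, transgender)"}
BKu = set()
print(f(KB, Su, BKu))   # the view keeps everything except the secret
```

Because statements are added greedily, the resulting view is secure but only maximal with respect to the chosen iteration order; computing a truly maximal secure view is where the optimized constructions of [Bonatti et al. 2015] come in.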
      <p>Definition 3 Secure Evolution: Given KB = (S, D), where S is the knowledge base schema
and D represents the data sets, the evolution from KB = (S, D) to KB′ = (S′, D′) is secure
w.r.t. Q and a secure view V if the confidentiality of KB = (S, D) entails that KB′ = (S′, D′)
has a secure view V′, assuming that V′ was generated using the same definitions as view
V. We can distinguish two types of evolution: when the evolution happens in S, or when the
change occurs in D.</p>
      <p>Suppose a new axiom φ is added to the schema, so that S′ = S ∪ {φ}. Then KB′ = (S′, D)
does not break confidentiality as long as S′ does not introduce any correlation with any
secrecies of KB.</p>
      <p>Confidentiality is independent of evolution in D whenever that evolution is not related to any
secret query.</p>
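      <p>The secure-evolution check of Definition 3 can be sketched as follows: after the knowledge base evolves, the view is regenerated with the same view definition and tested for security. Cn is again simplified to the set of facts itself, and all names are illustrative.</p>

```python
# Sketch of the secure-evolution check of Definition 3: regenerate
# the view with the same view definition after the change and verify
# that it is still secure. All fact names are illustrative.

def secure(view, Su):
    """Cn simplified to the set itself: secure iff Cn(view) ∩ Su = ∅."""
    return not (view & Su)

def evolve_and_check(KB, Su, new_axiom, make_view):
    KB2 = KB | {new_axiom}     # KB' = KB ∪ {new_axiom}
    V2 = make_view(KB2)        # V' generated with the same definition as V
    return secure(V2, Su)

# A view definition that filters out anything marked secret.
make_view = lambda kb: {s for s in kb if "secret" not in s}

KB = {"fact_a", "fact_b"}
Su = {"secret_x"}
print(evolve_and_check(KB, Su, "fact_c", make_view))    # harmless evolution
print(evolve_and_check(KB, Su, "secret_x", make_view))  # view filters the secret
```

Both calls succeed here because the view definition itself screens out secrets; an evolution that introduced a correlation the view definition does not know about is exactly the case the definition is meant to rule out.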
    </sec>
    <sec id="sec-7">
      <title>5. Expected Outcomes and Results</title>
      <p>We are working on the properties and general requirements for a software tool for the
performance evaluation of confidentiality preservation techniques and heuristic strategies
for the assurance of confidentiality, assisting the knowledge engineer to correct the design of
the knowledge base by identifying axioms that lead to the disclosure of a secret. Beyond
the improvement of our model, some specific expected outcomes of our work include:
– Systematization of heuristic evaluation of the types of confidentiality
preservation techniques.</p>
      <p>– Development of methods that allow the users to judge the correctness of the data, which
needs to support flexible conflict resolution.
– Systematization of techniques to evaluate the completeness and correctness of our
results.</p>
      <p>– Implementation of a software tool for ontologies to ensure the confidentiality of
selected pieces of information. The tool can be included as a plugin of Protégé,
allowing the assessment of the strengths and weaknesses of the implemented
techniques on the ontologies provided as user input.</p>
      <p>– Comparative analysis of the indicators provided by the metrics of the studied
heuristic techniques.</p>
      <p>As an initial case study, we will develop the examples presented in this paper.</p>
      <p>Each strategy is directly related to the purpose of the secure views of a KB. One
of our goals in this project is to identify heuristics, based on properties of knowledge
bases and their planned use, to select and apply confidentiality preservation techniques
that minimize the risk of breaches. At present, some heuristics can be sketched as follows:
Data anonymization should be applied when:
– The goal is to preserve the semantics of the data while omitting personal data,
e.g. to establish a balance between retaining context and protecting
participants.
– The purpose is to study the properties of a data set without allowing the
identification of a particular individual.
– It is important not to interfere with the usefulness of the original
knowledge base.
– It is important to preserve the original data, and reversibility of the securing
process is a requirement.</p>
      <p>Creating secure views following logical approaches should be applied when:
– The availability of secure views is not restricted to the preservation of a specific set of data.
– It is totally independent of the application domain.
– It can be adjusted to a specific technique.</p>
      <p>In Figure 1 we present a generic description of the software tool we plan to develop
in this project. The tool will be released as open source on GitHub.</p>
    </sec>
    <sec id="sec-8">
      <title>6. Conclusions</title>
      <p>In this paper we present the following items:</p>
      <p>We formally define the problem of generating a knowledge base that preserves
confidentiality from an insecure knowledge base.</p>
      <p>We present the properties and general requirements for a software tool proposed
for the performance evaluation of confidentiality preservation techniques and for
building heuristic strategies to guarantee confidentiality in specific cases.
A future direction of our work is to consider other forms of distortion of the
knowledge base to ensure confidentiality. For example, we can explore not only removing
elements of the knowledge base, but also adding new elements.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bao et al. 2007]
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slutzki</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Honavar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Privacy-preserving reasoning on the semantic web</article-title>
          . pages
          <fpage>791</fpage>
          -
          <lpage>797</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Bonatti et al. 2015]
          <string-name>
            <surname>Bonatti</surname>
            ,
            <given-names>P. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrova</surname>
            ,
            <given-names>I. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sauro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Optimized construction of secure knowledge-base views</article-title>
          . In Calvanese, D. and Konev, B., editors,
          <source>Description Logics</source>
          , volume
          <volume>1350</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Cuenca Grau and Horrocks 2008]
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Privacy-preserving query answering in logic-based information systems</article-title>
          .
          <source>In Proceedings of the 2008 Conference on ECAI 2008: 18th European Conference on Artificial Intelligence</source>
          , pages
          <fpage>40</fpage>
          -
          <lpage>44</lpage>
          , Amsterdam, The Netherlands. IOS Press.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Domingo-Ferrer et al. 2013]
          <string-name>
            <surname>Domingo-Ferrer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sánchez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rufian-Torrell</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Anonymization of nominal data based on semantic marginality</article-title>
          .
          <source>Information Sciences</source>
          ,
          <volume>242</volume>
          :
          <fpage>35</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Eldora et al. 2011] Eldora,
          <string-name>
            <surname>Knechtel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Peñaloza</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Correcting access restrictions to a consequence more flexibly</article-title>
          . In Rosati, R.,
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zakharyaschev</surname>
          </string-name>
          , M., editors,
          <source>Description Logics</source>
          , volume
          <volume>745</volume>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Fung et al. 2010]
          <string-name>
            <surname>Fung</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
          </string-name>
          , A. W.-C., and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P. S.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques</article-title>
          . Chapman &amp; Hall/CRC, 1st edition.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Grau and Kostylev 2016]
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kostylev</surname>
            ,
            <given-names>E. V.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Logical foundations of privacy-preserving publishing of linked data</article-title>
          .
          <source>In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17</source>
          ,
          <year>2016</year>
          , Phoenix, Arizona, USA., pages
          <fpage>943</fpage>
          -
          <lpage>949</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Maia and Whitehead 2014]
          <string-name>
            <surname>Maia</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Whitehead</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Pesquisa global sobre crimes econômicos</article-title>
          .
          <source>Technical report, PWC Global.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Mohammed et al. 2009]
          <string-name>
            <surname>Mohammed</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fung</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hung</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.-K.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Anonymizing healthcare data: A case study on the blood transfusion service</article-title>
          .
          <source>In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>1285</fpage>
          -
          <lpage>1294</lpage>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>