=Paper= {{Paper |id=Vol-2211/paper-07 |storemode=property |title=Towards Privacy-Preserving Ontology Publishing |pdfUrl=https://ceur-ws.org/Vol-2211/paper-07.pdf |volume=Vol-2211 |authors=Franz Baader,Adrian Nuradiansyah |dblpUrl=https://dblp.org/rec/conf/dlog/BaaderN18 }} ==Towards Privacy-Preserving Ontology Publishing== https://ceur-ws.org/Vol-2211/paper-07.pdf
Towards Privacy-Preserving Ontology Publishing

                       Franz Baader, Adrian Nuradiansyah?
                        firstname.lastname@tu-dresden.de

                      Theoretical Computer Science, TU Dresden




        Abstract. We make a first step towards adapting the approach of Cuenca
        Grau and Kostylev for privacy-preserving publishing of linked data to De-
        scription Logic ontologies. We consider the case where both the knowl-
        edge about individuals and the privacy policies are expressed using EL
        concepts. We introduce the notions of compliance of a concept with a
        policy and of safety of a concept for a policy, and show how optimal
        compliant (safe) generalizations of a given EL concept can be computed.



1     Introduction

When publishing information about individuals, one needs to ensure that cer-
tain privacy constraints are fulfilled. These constraints are encoded as privacy
policies, and before publishing the information one needs to check whether the
information is compliant with these policies [6]. We illustrate this setting using
an example from [6]: when publishing information about hospitals, doctors, and
patients, the policy may require that one should not be able to find out who
are the cancer patients. In case the information to be published is not policy
compliant, it first needs to be modified in a minimal way to make it compliant.
However, compliance per se is not enough if a possible attacker can also obtain
relevant information from other sources, which together with the published in-
formation might violate the privacy policy. Safety requires that the combination
of the published information with any other compliant information is again com-
pliant [6]. More information on privacy-preserving data publishing can be found
in the survey [7].
    In [6], this problem was investigated in a setting where the information to
be published is given as a relational dataset with (labeled) null values, and the
policy is given by a conjunctive query. In order to make a given dataset compliant
or safe, one is basically allowed to replace constants (or null values) by new null
values. The paper investigates the complexity of deciding compliance (Is a given
modification of a dataset policy compliant?), safety (Is a given modification of
a dataset safe w.r.t. a policy?), and optimality (Is a given modification of a
dataset safe w.r.t. a policy and changes the dataset in a minimal way?). The
obtained complexity results depend on whether combined or data complexity is
considered, and whether closed- or open-world semantics are used. The paper
?
    Funded by DFG within the Research Training Group 1907 “RoSI”
does not consider the case where the information in the dataset is augmented
by ontological knowledge.
    In the present paper, we make a first step towards handling ontologies in
this context, but consider a quite restricted setting, where information about an
individual is given by a concept of the inexpressive Description Logic (DL) EL.
Basically, this is the setting where the ontology consists of an ABox containing
only concept assertions of the form C(a) for possibly complex concepts C, but
no role assertion. In [8], such an ABox was called an instance store. In addition,
we assume that there is no TBox, i.e., all the information about the individual
a is given by the concept C.1 A policy is then given by an instance query, i.e.,
by an EL concept D. A concept C (giving information about some individual a)
is compliant with this policy, if it is not subsumed by D, i.e., if C(a) does not
imply D(a). In our example, the policy could be formalized as the EL concept

            D = Patient u ∃seen_by.(Doctor u ∃works_in.Oncology),

which says that one should not be able to find out who are the patients that are
seen by a doctor that works for the oncology department. The concept

    C = Patient u Male u ∃seen_by.(Doctor u Female u ∃works_in.Oncology)

is not compliant with the policy D since C v D. The concept

        C 0 = Male u ∃seen_by.(Doctor u Female u ∃works_in.Oncology)

is a compliant generalization of C, i.e., C v C 0 and C 0 6v D. However, it is not
safe since C 0 u Patient v D, i.e., if the attacker already knows that a is a patient
then together with C 0 (a) the hidden information D is revealed. In contrast,

           C 00 = Male u ∃seen_by.(Doctor u Female u ∃works_in.>),

is a safe generalization of C, though it is less obvious to see this. This concept
is, however, not optimal since more information than necessary is removed. In
fact, the concept

           C 000 = Male u ∃seen_by.(Doctor u Female u ∃works_in.>)
                          ∃seen_by.(Female u ∃works_in.Oncology)

is a safe generalization of C that is more specific than C 00 , i.e. C v C 000 @ C 00 .
     In this paper, we will show how to compute optimal compliant and optimal
safe generalizations of EL concepts C with EL policies, but instead of only one
policy concept we allow for a finite set of EL concepts as policy, where a concept
C 0 is compliant with the policy {D1 , . . . , Dp } iff it is compliant with each element
of this set, i.e., C 6v Di holds for all i = 1, . . . , p. But first, we need to introduce
the DL EL and the notions of compliance, safety, etc. in a more formal way.
1
    Since EL concepts are closed under conjunction, we can assume that the ABox
    contains only one assertion for a.
2    Preliminaries
A wide range of DLs of different expressive power haven been investigated in
the literature [2]. Here, we only introduce the DL EL, for which reasoning is
tractable [3,5,1]. Let NC and NR be mutually disjoint sets of concept and role
names, respectively. Then EL concepts over these names are constructed from
concept names using the constructors top concept (>), conjunction (C u D),
and existential restriction (∃r.C). The size of an EL concept C is the number of
occurrences of > as well as concept and role names in C, and its role depth is
the maximal nesting of existential restrictions.
     The semantics of EL is defined through interpretations I = (∆I , ·I ), where
∆ is a non-empty set, called the domain, and ·I is the interpretation function,
  I

which maps every A ∈ NC to a set AI ⊆ ∆I and every r ∈ NR to a binary
relation rI ⊆ ∆I × ∆I . This function ·I is extended to arbitrary EL concepts
by setting >I := ∆I , (C u D)I := C I ∩ DI , and (∃r.C)I := {δ ∈ ∆I | ∃η ∈
C I .(δ, η) ∈ rI }.
     The EL concept C is subsumed by the EL concept D (written C v D) if
C I ⊆ DI holds for all interpretations I. Strict subsumption (written C @ D)
holds if C v D and D 6v C, and we say that C is equivalent to D (written
C ≡ D) if C v D and D v C. Subsumption between EL concepts can be decided
in polynomial time [3]. The following recursive characterization of subsumption
in EL was shown in [4].

Proposition 1. Let

                   C = A1 u . . . u Ak u ∃r1 .C1 u . . . u ∃rm .Cm , and
                   D = B1 u . . . u B` u ∃s1 .D1 u . . . u ∃sn .Dn ,

where A1 , . . . , Ak , B1 , . . . , B` ∈ NC and r1 , . . . , rm , s1 , . . . , sn ∈ NR . Then we
have C v D iff
 – {B1 , . . . , B` } ⊆ {A1 , . . . , Ak }, and
 – for all i ∈ {1, . . . , n} there is j ∈ {1, . . . , m} such that si = rj and Cj v Di .

    We are now ready to define the important notions regarding privacy-preserving
publishing of ontological information that will be investigated in this paper. As
mentioned in the introduction, policies are finite sets of EL concepts. We assume
in the following, that the concepts occurring in the policy are not equivalent to
top since otherwise there would not be compliant concepts.

Definition 1. A policy is a finite set P = {D1 , . . . , Dp } of EL concepts such
that > 6≡ Di for i = 1, . . . , p. Given an EL concept C and a policy P =
{D1 , . . . , Dp }, the EL concept C 0 is
 – compliant with P if C 0 6v Di holds for all i = 1, . . . , p;
 – safe for P if C 0 u C 00 is compliant with P for all EL concepts C 00 that are
   compliant with P;
 – a P-compliant generalization of C if C v C 0 and C 0 is compliant with P;
 – an optimal P-compliant generalization of C if it is a P-compliant general-
   ization of C and there is no P-compliant generalization C 00 of C such that
   C 00 @ C 0 ;
 – a P-safe generalization of C if C v C 0 and C 0 is safe for P;
 – an optimal P-safe generalization of C if it is a P-safe generalization of C
   and there is no P-safe generalization C 00 of C such that C 00 @ C 0 .

It is easy to see that safety implies compliance since the top concept is always
compliant: if C 0 is safe for P, then > u C 0 ≡ C 0 is compliant.


3   Characterizing compliance

In this section, we characterize the concepts that are compliant with a given
policy P, and use this to develop an algorithm that computes all optimal P-
compliant generalizations of a given EL concept C.
    But first, we need to introduce some more notations. We call an EL concept
an atom if it is a concept name or an existential restriction. Given an EL concept
C, we denote the set of atoms occurring in its top-level conjunction with con(C).
For example, if C = A u ∃r.(B u ∃s.A), then con(C) = {A, ∃r.(B u ∃s.A)}. As a
special case of Proposition 1, subsumption between atoms can be characterized
as follows. If E, F are atoms, then E v F iff

 – E = F ∈ NC or
 – E, F are existential restrictions of the form E = ∃r.E 0 , F = ∃r.F 0 such that
   E0 v F 0.

Definition 2. Let S, T be sets of atoms. Then we say that S covers T if for
every F ∈ T there is E ∈ S such that E v F .

With this notation, Proposition 1 can be reformulated as follows: C v D iff
con(C) covers con(D). The following (polynomial-time decidable) characteriza-
tion of compliance is thus an immediate consequence of Proposition 1.

Proposition 2. The EL concept C 0 is compliant with the policy P = {D1 , . . . ,
Dp } iff con(C 0 ) does not cover con(Di ) for any i = 1, . . . , p, i.e., for every
i = 1, . . . , p, at least one of the following two properties holds:

 – there is a concept name A ∈ con(Di ) such that A 6∈ con(C 0 ); or
 – there is an existential restriction ∃r.D ∈ con(Di ) such that C 6v D for all
   existential restrictions of the form ∃r.C ∈ con(C 0 ).

Now assume that we are given an EL concept C and a policy P = {D1 , . . . , Dp },
and we want to construct a P-compliant generalization C 0 of C. For C 0 to satisfy
the condition of Proposition 2, there needs to exist for every i = 1, . . . , p an
element of con(Di ) that is not covered by any element of con(C 0 ). In case con(C)
contains elements covering such an atom, we need to remove or generalize them
appropriately.
Definition 3. We say that H ⊆ con(D1 ) ∪ . . . ∪ con(Dp ) is a hitting set of
con(D1 ), . . . , con(Dp ) if H ∩ con(Di ) 6= ∅ for every i = 1, . . . , p. This hitting set
is minimal if there is no other hitting set strictly contained in it.
    Basically, the idea is now to choose a hitting set H of con(D1 ), . . . , con(Dp )
and use H to guide the construction of a compliant generalization of C. In order
to make this generalization as specific as possible, we use minimal hitting sets.
In case the policy contains concepts Di with which C is already compliant (i.e.,
C 6v Di holds), nothing needs to be done w.r.t. these concepts. This is why, in
the following definition, con(Di ) does not take part in the construction of the
hitting set if C 6v Di .
Definition 4. Let C be an EL-concept and P = {D1 , . . . , Dp } a policy. The set
SCG(C, P) of specific compliant generalizations of C w.r.t. P consists of the
concepts that can be constructed from C as follows:
 – If C is compliant with P, then SCG(C, P) = {C}.
 – Otherwise, choose a minimal hitting set H of con(Di1 ), . . . , con(Diq ) where
   i1 , . . . , iq are exactly the indices for which C v Di . Note that q ≥ 1 since we
   are in the case where C is not compliant with P. In addition, according to
   our definition of a policy, none of the concepts Di is equivalent to >, and
   thus the sets con(Dij ) are non-empty. Consequently, at least one minimal
   hitting set exists. Each minimal hitting set yields a concept in SCG(C, P)
   by removing or modifying atoms in the top-level conjunction of C in the
   following way:
     • For every concept name A ∈ con(C), remove A from the top-level con-
          junction of C if A ∈ H;
     • For every existential restriction ∃ri .Ci ∈ con(C), consider the set

                     Pi := {G | there is ∃ri .G ∈ H such that Ci v G}.

          ∗ If Pi = ∅ then leave ∃ri .Ci as it is.
          ∗ If > ∈ Pi , then remove ∃ri .Ci . d
          ∗ Otherwise, replace ∃ri .Ci with F ∈SCG(Ci ,Pi ) ∃ri .F.

    We will show below that every element of SCG(C, P) is a compliant gener-
alization of C, and that all optimal compliant generalizations of C belong to
SCG(C, P). However, SCG(C, P) may also contain compliant generalizations of
C that are not optimal, as illustrated by the following example.
Example 1. Let C = ∃r.(A1 u A2 u A3 u A4 ) and P = {D1 , D2 }, where

              D1 = ∃r.A1 u ∃r.(A2 u A3 ) and D2 = ∃r.A2 u ∃r.A4 .

We have C v D1 and C v D2 , and thus C is not compliant with P. Consequently,
the elements of SCG(C, P) are obtained by considering the minimal hitting sets
of {∃r.A1 , ∃r.(A2 u A3 )} and {∃r.A2 , ∃r.A4 }.
    If we take the minimal hitting set H = {∃r.(A2 u A3 ), ∃r.A2 } and consider
the only existential restriction in con(C), the corresponding set Pi consists of
A2 uA3 and A2 . It is easy to see that SCG(A1 uA2 uA3 uA4 , Pi ) = {A1 uA3 uA4 }
since the only minimal hitting set of {A1 , A2 } and {A2 } is {A2 }. Thus, we obtain
C 0 := ∃r.(A1 u A3 u A4 ) as an element of SCG(C, P).
     However, if we take the minimal hitting set H0 = {∃r.A1 , ∃r.A2 } instead,
then the set Pi0 corresponding to the only existential restriction in con(C) is
{A1 , A2 }. Consequently, in this case SCG(A1 u A2 u A3 u A4 , Pi0 ) = {A3 u A4 }
since the only minimal hitting set of {A1 } and {A2 } is {A1 , A2 }. This yields
C 00 := ∃r.(A3 u A4 ) as another element of SCG(C, P). Since C 0 @ C 00 , the
element C 00 cannot be optimal.

   Next, we show that the elements of SCG(C, P) are compliant generalizations
of C.

Proposition 3. Let C be an EL-concept and P = {D1 , . . . , Dp } a policy. If
C 0 ∈ SCG(C, P), then C 0 is a P-compliant generalization of C.

Proof. In case C is already compliant with P, then C = C 0 and we are done.
Thus, assume that C is not compliant with P. We show that C 0 is a compliant
generalization of C by induction on the role depth of C.
     First, we show that C 0 is a generalization of C, i.e., C v C 0 . This is an
easy consequence of the fact that, when constructing C 0 from C, atoms from the
top-level conjunction of C are left unchanged, are removed, or are replaced by a
conjunction of more general atoms. The only non-trivialdcase is where we replace
an existential restriction ∃ri .Ci with the conjunction F ∈SCG(Ci ,Pi ) ∃ri .F . By
induction,
d               we know that Ci v F for all F ∈ SCG(Ci , Pi ), and thus ∃ri .Ci v
  F ∈SCG(Ci ,Pi ) ∃ri .F .
     Second, we show that C 0 is compliant with P, i.e., C 0 6v Di holds for i =
1, . . . , p. For the indices i with C 6v Di , we clearly also have C 0 6v Di since C v
C 0 . Now, consider one of the remaining indices ij ∈ {i1 , . . . , iq }, where i1 , . . . , iq
are exactly the indices for which C v Di . The concept C 0 was constructed by
taking some minimal hitting set H of con(Di1 ), . . . , con(Diq ). If the element in
H hitting con(Dij ) is a concept name, then this concept name does not occur
in con(C 0 ), and thus C 0 6v Dij . Thus, assume that it is an existential restriction
∃ri .G. But then each existential restriction ∃ri .Ci in con(C) with Ci v G is
either removed or replaced by a conjunction of existential restrictions ∃ri .F such
that (by induction) F 6v G. In addition, other existential restrictions are either
removed or generalized. This clearly implies C 0 6v Dij since ∃ri .G in con(Dij ) is
not covered by any element of con(C 0 ).                                                     t
                                                                                             u

   The next lemma states that every compliant generalization of C subsumes
some element of SCG(C, P).

Lemma 1. Let C be an EL-concept and P = {D1 , . . . , Dp } a policy. If C 00
is a P-compliant generalization of C, then there is C 0 ∈ SCG(C, P) such that
C 0 v C 00 .
Proof. If C is compliant with P, then we have C ∈ SCG(C, P) and C v C 00
since C 00 is a generalization of C. Thus, assume that C is not compliant with P,
and let i1 , . . . , iq be exactly the indices for which C v Di .
     Now, let ij be such an index. We have C v C 00 6v Dij and C v Dij .
Since C 00 6v Dij , there is an element Ej ∈ con(Dij ) that is not covered by
any element of con(C 00 ). Obviously, H 00 := {E1 , . . . , Eq } is a hitting set of
con(Di1 ), . . . , con(Diq ). Thus, there is a minimal hitting set H of con(Di1 ), . . . ,
con(Diq ) such that H ⊆ H 00 . Let C 0 be the element of SCG(C, P) that was con-
structed using this hitting set H. We claim that C 0 v C 00 . For this, it is sufficient
to show that con(C 0 ) covers con(C 00 ).
     First, consider a concept name A ∈ con(C 00 ). Since C v C 00 , we also have
A ∈ con(C). If A 6∈ H 00 , then A 6∈ H, and thus A is not removed in the
construction of C 0 . Consequently, A ∈ con(C 0 ) covers A ∈ con(C 00 ). If A ∈ H 00 ,
then A is not covered by any element of con(C 00 ) according to our definition of
H 00 , which contradicts our assumption that A ∈ con(C 00 ).
     Second, consider an existential restriction ∃ri .E ∈ con(C 00 ). Since C v C 00 ,
there is an existential restriction ∃ri .Ci in con(C) such that Ci v E. If this
restriction is not removed or generalized when constructing C 0 , then we are
done since this restriction then belongs to con(C 0 ) and covers ∃ri .E. Otherwise,
Pi = {G | there is ∃ri .G ∈ H such that Ci v G} is non-empty.
     If > ∈ Pi , then ∃ri .> ∈ H ⊆ H 00 . However, then ∃ri .E ∈ con(C 00 ) covers an
element of H 00 , which is a contradiction.                       d
     Consequently, > 6∈ Pi , and thus ∃ri .Ci is replaced with F ∈SCG(Ci ,Pi ) ∃ri .F
when constructing C 0 from C. According to our definition of H 00 and the fact that
H ⊆ H 00 , none of the existential restrictions ∃ri .G considered in the definition
of Pi is covered by ∃ri .E ∈ con(C 00 ). This implies that E is a Pi -compliant
generalization of Ci . By induction (on the role depth) we can thus assume that
there is an F ∈ SCG(Ci , Pi ) such that F v E. This shows that ∃ri .E ∈ con(C 00 )
is covered by ∃ri .F ∈ con(C 0 ).                                                      t
                                                                                       u

   As an easy consequence of this lemma, we obtain that all optimal compliant
generalizations of C must belong to SCG(C, P).

Proposition 4. Let C be an EL-concept and P = {D1 , . . . , Dp } a policy. If C 00
is an optimal P-compliant generalization of C, then C 00 ∈ SCG(C, P) (up to
equivalence of concepts).

Proof. Let C 00 be an optimal P-compliant generalization of C. By Lemma 1,
there is an element C 0 ∈ SCG(C, P) such that C 0 v C 00 . In addition, by Propo-
sition 3, C 0 is a P-compliant generalization of C. Thus, optimality of C 00 implies
C 00 ≡ C 0 .                                                                      t
                                                                                  u

   We are now ready to formulate and prove the main result of this section.

Theorem 1. Let C be an EL-concept and P = {D1 , . . . , Dp } a policy. Then
the set of all optimal P-compliant generalizations of C can be computed in time
exponential in the size of C and D1 , . . . , Dp .
Proof. It is sufficient to show that the set SCG(C, P) can be computed in ex-
ponential time. In fact, given SCG(C, P), we can compute the set of all optimal
P-compliant generalizations of C by removing elements that are not minimal
w.r.t. subsumption, which requires at most exponentially many subsumption
tests. Each subsumption test takes at most exponential time since subsumption
in EL is in P , and the elements of SCG(C, P) have at most exponential size, as
shown below.
    We show by induction on the role depth that SCG(C, P) consists of at most
exponentially many elements of at most exponential size. The at most exponen-
tial cardinality of SCG(C, P) is an immediate consequence of the fact that there
are at most exponentially many hitting sets of con(Di1 ), . . . , con(Diq ), and each
yields exactly one element of SCG(C, P) (see Definition 4). Regarding the size
of these elements, note that we may assume by induction that an existential
restriction may be replaced by a conjunction of at most exponentially many ex-
istential restrictions, where each is of at most exponential size. The overall size
of the concept description obtained this way is thus also of at most exponential
size. Given this, it is easy to see that the computation of these elements also
takes at most exponential time.                                                     t
                                                                                    u

   The following example shows that the exponential upper bounds can indeed
by reached.

Example 2. Let C = P1 u Q1 u . . . u Pn u Qn and P = {Pi u Qi | 1 ≤ i ≤ n}. Then
SCG(C, P) contains 2n elements since the sets {P1 , Q1 }, . . . , {Pn , Qn } obviously
have exponentially many hitting sets. To be more precise,

         SCG(C, P) = {X1 u . . . u Xn | Xi ∈ {Pi , Qi } for i = 1, . . . , n}.

This example can easily be modified to enforce an element of exponential size.
Consider  b = ∃r.C and P
          C              b = {∃r.(Pi u Qi ) | 1 ≤ i ≤ n}. Then SCG(C,    b P)
                                                                           b =
 d
{ F ∈SCG(C,P) ∃r.F }. We leave it to the reader to further modify the example in
order to obtain exponentially many elements of exponential size.


4    Characterizing safety
Before we can characterize safety, we need to remove redundant elements from
P. We say that Di ∈ P is redundant if there is a different element Dj ∈ P such
that Di v Dj . The following lemma is easy to prove.

Lemma 2. Let P be a policy and assume that Di ∈ P is redundant. Then the
following holds for all EL concepts C, C 0 :
 – C 0 is compliant with P iff C 0 is compliant with P \ {Di };
 – C is safe for P iff C is safe for P \ {Di }.

    This lemma shows that we can assume without loss of generality that our
policies do not contain redundant concepts. However, elements of Di of P may
also contain redundant atoms. In fact, if E, F are different atoms in con(Di ) such
that E v F , then the concept obtained from Di by removing F from its top-level
conjunction is equivalent to Di . By iteratively removing such redundant atoms
from the top-level conjunction of Di we obtain (in polynomial time) a concept
Di0 equivalent to Di such that the elements of con(Di0 ) are incomparable w.r.t.
subsumption. We call a policy redundancy-free if it does not contain redundant
elements and every element is normalized in this sense.
Proposition 5. Let P = {D1 , . . . , Dp } be a redundancy-free policy. The EL
concept C 0 is safe for P iff there is no pair of atoms (E, F ) such that E ∈
con(C 0 ), F ∈ con(D1 ) ∪ . . . ∪ con(Dp ), and E v F .
Proof. First, assume that C 0 is not safe for P, i.e., there is an EL concept C 00
that is compliant with P, but for which C 0 u C 00 is not compliant with P. The
latter implies that there is Di ∈ P such that C 0 u C 00 v Di , which is equivalent
to saying that con(C 0 ) ∪ con(C 00 ) covers con(Di ). On the other hand, we know
that con(C 00 ) does not cover con(Di ) since C 00 is compliant with P. Thus, there
is an element F ∈ con(Di ) that is covered by an element E of con(C 0 ). This
yields (E, F ) such that E ∈ con(C 0 ), F ∈ con(D1 ) ∪ . . . ∪ con(Dp ), and E v F .
    Conversely, assume that there is a pair of atoms (E, F ) such that E ∈
con(C 0 ), F ∈ con(Di ), and E v F . Let C 00 be the concept obtained from Di by
removing F from the top-level conjunction of Di . Then we clearly have Di v C 00 .
In addition, since Di is normalized, we also have C 00 6v Di . Consider Dj ∈ P
different from Di , and assume that C 00 v Dj . But then Di v C 00 v Dj con-
tradicts our assumption that P does not contain redundant elements. Thus, we
have shown that C 00 is compliant with P. In addition, con(C 0 ) ∪ con(C 00 ) covers
con(Di ). In fact, the elements of con(Di ) \ {F } belong to con(C 00 ), and thus
cover themselves. In addition, F is covered by E ∈ con(C 0 ). Thus C 0 u C 00 v Di ,
which shows that C 0 is not safe for P.                                           t
                                                                                  u
    Clearly, the necessary and sufficient condition for safety stated in this propo-
sition can be decided in polynomial time. If needed, the policy can first be made
redundancy-free, which can also be done in polynomial time.
Corollary 1. Safety of an EL concept for an EL policy is in P .
    We now consider the problem of computing optimal P-safe generalizations
of a given EL concept C. First note that, up to equivalence, there can be only
one optimal P-safe generalization of C. This is an immediate consequence of the
fact that the conjunction of safe concepts is again safe, which in turn is an easy
consequence of Proposition 5.
Lemma 3. Let C10 , C20 be two EL concepts that are P-safe generalizations of C,
where P is redundancy-free. Then C10 u C20 is also a P-safe generalization of C.
    Thus there cannot be non-equivalent optimal P-safe generalizations of a
given EL concept C since their conjunction would then be more specific, contra-
dicting their optimality. This property is independent of whether the policy is
redundancy-free or not since turning a policy into one that is redundancy-free
preserves the set of concepts that are compliant with (safe for) the policy.
Proposition 6. If C10 , C20 are optimal P-safe generalizations of the EL concept
C, then C10 ≡ C20 .

   The following theorem shows how an optimal safe generalization of C can be
constructed.

Theorem 2. Let C be an EL concept and P = {D1 , . . . , Dp } a redundancy-free
policy. We construct the concept C 0 from C by removing or modifying atoms in
the top-level conjunction of C in the following way:

 – For every concept name A ∈ con(C), remove A from the top-level conjunc-
   tion of C if A ∈ con(D1 ) ∪ . . . ∪ con(Dp );
 – For every existential restriction ∃ri .Ci ∈ con(C), consider the set of concepts

     Pi := {G | there is ∃ri .G ∈ con(D1 ) ∪ . . . ∪ con(Dp ) such that Ci v G}.

     • If Pi = ∅ then leave ∃ri .Ci as it is.
     • If > ∈ Pi , then remove ∃ri .Ci .d
     • Otherwise, replace ∃ri .Ci with F ∈OCG(Ci ,Pi ) ∃ri .F, where OCG(Ci , Pi )
       is the set of all optimal Pi -compliant generalizations of Ci .

Then C 0 is an optimal P-safe generalization of C.

Proof. Obviously C v C 0 since, when constructing C 0 from C, atoms from the
top-level conjunction of C are left unchanged, are removed, or are replaced by a
conjunction of more general atoms.
    To show that C 0 is safe for P, we must show that the condition of Proposi-
tion 5 holds. Thus assume that it is violated, i.e., there is a pair of atoms (E, F )
such that E ∈ con(C 0 ), F ∈ con(D1 ) ∪ . . . ∪ con(Dp ), and E v F .

 – First, we consider the case where E = A is a concept name. Then E v F
   implies that F = A, and thus A is a concept name occurring in con(D1 ) ∪
   . . . ∪ con(Dp ). However, all such concept names have been removed from
   the top-level conjunction of C when constructing C 0 . This contradicts our
   assumption that E = A belongs to con(C 0 ).
 – Second, assume that E is an existential restriction E = ∃ri .E 0 . Then F
   is of the form F = ∃ri .G0 and E 0 v G0 . In addition, there is an exis-
   tential restriction ∃ri .Ci ∈ con(C) from which E = ∃ri .E 0 was derived.
   By construction, Ci v E 0 . In the construction of C 0 , we consider the set
   Pi := {G | there is ∃ri .G ∈ con(D1 ) ∪ . . . ∪ con(Dp ) such that Ci v G}.
   Since Ci v E 0 v G0 , this set is non-empty, and since ∃ri .E 0 is derived from
   ∃ri .Ci , it does not contain >. Consequently, we have E 0 ∈ OCG(Ci , Pi ).
   However, G0 ∈ Pi then implies that E 0 6v G0 , which yields the desired con-
   tradiction.

   It remains to show that C 0 is optimal. Thus assume that C 00 is a P-safe
generalization of C. It is sufficient to show that C 0 v C 00 , i.e., that con(C 0 )
covers con(C 00 ).
 – Assume that A ∈ con(C 00 ) is a concept name. Then C v C 00 implies that
   A ∈ con(C). In addition, since C 00 is safe for P, Proposition 5 implies that
   A 6∈ con(D1 ) ∪ . . . ∪ con(Dp ). Thus, A is not removed in the construction of
   C 0 , which yields A ∈ con(C 0 ).
 – Second, consider an existential restriction ∃ri .E ∈ con(C 00 ). Since C v C 00 ,
   there is an existential restriction ∃ri .Ci in con(C) such that Ci v E. If this
   restriction is not removed or generalized when constructing C 0 , then we are
   done since this restriction then belongs to con(C 0 ) and covers ∃ri .E. Other-
   wise, Pi = {G | there is ∃ri .G ∈ con(D1 )∪. . . ∪con(Dp ) such that Ci v G}
   is non-empty.
   If > ∈ Pi , then ∃ri .> ∈ con(D1 ) ∪ . . . ∪ con(Dp ). However, then ∃ri .E ∈
   con(C 00 ) covers an element of con(D1 ) ∪ . . . ∪ con(Dp ), which is a contradic-
   tion to our assumption that C 00 is safe for P.              d
   Consequently, > 6∈ Pi , and thus ∃ri .Ci is replaced with F ∈OCG(Ci ,Pi ) ∃ri .F
   when constructing C 0 from C. Since C 00 is safe for P, none of the exis-
   tential restrictions ∃ri .G considered in the definition of Pi is covered by
   ∃ri .E ∈ con(C 00 ). This implies that E is a Pi -compliant generalization of Ci .
   Consequently, there is an F ∈ OCG(Ci , Pi ) such that F v E. This shows
   that ∃ri .E ∈ con(C 00 ) is covered by ∃ri .F ∈ con(C 0 ).                      t
                                                                                   u

   Since, by Theorem 1, OCG(Ci , Pi ) can be computed in exponential time, the
construction described in Theorem 2 can also be performed in exponential time.

Corollary 2. Let C be an EL concept and P = {D1 , . . . , Dp } a redundancy-
free policy. Then an optimal P-safe generalization of C can be computed in
exponential time.

Example 2 can easily be modified to provide an example that shows that this
exponential bound can actually not be improved since there are cases where the
safe generalization is of exponential size.


5   Conclusion
We have introduced the notions of compliance with and safety for a policy in
the simple setting where both the knowledge about individuals and the policy
are given by EL concepts. In this setting, we were able to characterize compliant
(safe) generalization of a given concept w.r.t. a policy, and have used these char-
acterizations to obtain algorithms for computing these generalizations. These
algorithms need exponential time, which is optimal since the generalizations
may be of exponential size.
    In the future, we intend to extend this work in two directions. On the one
hand, we will consider EL concepts w.r.t. a background ontology. On the other
hand, we will consider a setting where the ABox contains not just concept as-
sertions, but also role assertions. In the latter case, one can use not just general-
ization of concepts, but also renaming of individuals as operations for achieving
compliance (safety). Finally, of course, these two extensions should be combined.
References
1. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proceedings of
   the Nineteenth International Joint Conference on Artificial Intelligence IJCAI-05,
   Edinburgh, UK, 2005. Morgan-Kaufmann Publishers.
2. F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, ed-
   itors. The Description Logic Handbook: Theory, Implementation, and Applications.
   Cambridge University Press, New York, NY, USA, 2003.
3. F. Baader, R. Küsters, and R. Molitor. Computing least common subsumers in
   description logics with existential restrictions. In Proc. of the 16th Int. Joint Conf.
   on Artificial Intelligence (IJCAI’99), pages 96–101, 1999.
4. F. Baader and B. Morawska. Unification in the description logic EL. Logical Methods
   in Computer Science, 6(3), 2010.
5. S. Brandt. Polynomial time reasoning in a description logic with existential restric-
   tions, GCI axioms, and—what else? In R. L. de Mántaras and L. Saitta, editors,
   Proc. of the 16th Eur. Conf. on Artificial Intelligence (ECAI 2004), pages 298–302,
   2004.
6. B. Cuenca Grau and E. V. Kostylev. Logical foundations of privacy-preserving
   publishing of linked data. In Proceedings of the Thirtieth AAAI Conference on
   Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA., pages 943–
   949, 2016.
7. B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing:
   A survey of recent developments. ACM Comput. Surv., 42(4):14:1–14:53, 2010.
8. I. Horrocks, L. Li, D. Turi, and S. Bechhofer. The instance store: DL reasoning with
   large numbers of individuals. In Proceedings of the 2004 International Workshop on
   Description Logics (DL2004), Whistler, British Columbia, Canada, June 6-8, 2004,
   2004.