Privacy-Preserving Ontology Publishing: The Case of Quantified ABoxes w.r.t. a Static Cycle-Restricted EL TBox

Franz Baader¹, Patrick Koopmann¹, Francesco Kriegel¹, Adrian Nuradiansyah¹, and Rafael Peñaloza²

¹ Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany
firstname.lastname@tu-dresden.de
² University of Milano-Bicocca, Milano, Italy
rafael.penaloza@unimib.it

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. We review our recent work on how to compute optimal repairs, optimal compliant anonymizations, and optimal safe anonymizations of ABoxes containing possibly anonymized individuals. The results can be used both to remove erroneous consequences from a knowledge base and to hide secret information before publication of the knowledge base, while keeping as much as possible of the original information.

Keywords: Repair · Compliance · Safety · Privacy-preserving ontology publishing · Optimality · Complexity · Practical algorithm

1 Introduction

In contrast to most of the work in description logic (DL), which is about how to derive consequences of a DL knowledge base (KB) efficiently, this paper is about how to get rid of consequences. The reason for this wish can, on the one hand, be that a certain consequence is incorrect, and thus one wants to repair the KB to get rid of this error. On the other hand, one may want to remove a consequence since it is considered to be private information that is not supposed to be made public. In both cases, the new KB should not introduce new consequences (i.e., it should be entailed by the original one), and it should be optimal in the sense that a minimal amount of consequences is removed (i.e., it should be as close as possible to the original one w.r.t. the entailment relation).

Though both scenarios share the wish to remove consequences, there are some differences. On the technical side, in the context of repairs one usually considers a single consequence or a finite set of consequences of the form C(a), i.e., one wants to get rid of instance relationships for specific individuals [3].¹ The resulting KB is then called a repair of the original one. In the context of privacy, one usually considers a policy P, consisting of one or a finite number of concepts, and wants to get rid of all consequences of the form C(a) for C ∈ P and a a named individual [8, 9, 17, 18]. The resulting KB is then said to be a P-compliant anonymization of the original one. Instead of changing the data before publishing it, one could also provide only restricted access through queries, whose answers are monitored by a "censor", which may decide not to give an answer or even lie if needed to satisfy the policy. There has been quite some work in this direction in the database and the DL community [12, 14–16, 23], but this approach is not the topic of the present paper.

¹ We restrict the attention here to derived instance relationships, though repairs for subsumption relationships have also been considered in the literature [7].

On the intentional side, achieving compliance is not always sufficient to guarantee privacy [9, 17, 18]. In fact, an attacker may already have some knowledge, which does not imply the secret, but which together with a published compliant anonymization may be used to derive the secret information.
Thus, in the context of privacy, one is interested in computing anonymizations that are safe in the sense that, even if extended with an arbitrary compliant KB, they do not imply C(a) for C ∈ P and a a named individual.

In the general setting of a DL KB consisting of a TBox and an ABox, optimal repairs (optimal compliant/safe anonymizations) need not exist [3, 7]. There are two ways to overcome this problem. On the one hand, one can weaken the notion of optimality and restrict the attention to repairs (anonymizations) that can be obtained from the original KB by applying certain repair (anonymization) steps. This approach is, e.g., followed in [17, 18] in the setting of privacy and in [7, 19, 22, 29] for the repair scenario. Classical repair approaches that completely remove axioms rather than just weakening them also fall under this category [11, 21, 24, 25, 27, 28]. On the other hand, one can stick with the quest for optimality, and restrict the considered KBs such that optimality can be achieved. Our first work in this direction [6, 10] considered compliance and safety in the very restricted setting of an EL instance store [20], i.e., where there is no TBox and the ABox does not contain role assertions. In the first paper, the attacker's knowledge is considered to be a set of EL concept assertions (an EL instance store), whereas in the second also other DLs are used to represent the attacker's knowledge. In [8] we extended the results of [6] to ABoxes with role assertions (and still no TBox), but restricted the attention to compliance for EL policies. In [9] we investigated safety in the same setting, but had to restrict the policies to ones consisting of a single EL concept (singleton policies). Finally, in [3] we extended the results of [8] in two directions, but formulated the new results in the (more general) context of repairs rather than compliance. On the one hand, we added a TBox, which must however be cycle-restricted. On the other hand, we developed a more practical algorithm for computing optimal repairs.

This paper summarizes the results obtained in our previous publications [3, 8, 9], but presents them uniformly in the setting of privacy. In addition, it extends the results of [9] by developing a more practical algorithm for computing optimal safe anonymizations in the setting without a TBox. Finally, we show that using TBoxes one can reduce safety for general policies to safety for singleton policies. Since characterizing safety for general policies is an open problem, this shows that extending our results for safety to the case with TBox is a non-trivial problem. Proofs of our new results can be found in the technical report [5].

2 Preliminaries

We use concepts of the lightweight DL EL both to define TBoxes and to formulate which consequences are unwanted. The data are represented in the form of quantified ABoxes, which are atomic ABoxes (i.e., ones not containing assertions for complex concepts) in which some individual names are assumed to be anonymized. While such anonymous individuals do not belong to the standard DL repertoire, they are actually available in OWL. Also, such ABoxes have already been used in previous work on privacy-preserving ontology publishing [17, 18]. Finally, note that concept and role assertions involving anonymous individuals can be used to express concept assertions for complex concepts.

EL concepts and TBoxes. We assume basic knowledge about DLs [2].
Specifically, we consider the DL EL, defined over a fixed signature Σ, which is the disjoint union of the countably infinite sets Σ_O, Σ_C, and Σ_R of object names, concept names, and role names. EL concepts are built using the concept constructors ⊤, ⊓, and ∃. We treat conjunctions as sets, that is, they do not contain duplicates and the order is irrelevant. EL TBoxes, in the following just called TBoxes, are defined as usual as sets of concept inclusions (CIs) C ⊑ D. We use the notation C ⊑T D (alternatively T ⊨ C ⊑ D) to denote that C ⊑ D holds in all models of T. A TBox is called cycle-restricted if there is no non-empty sequence of role names r1, ..., rk and no EL concept C such that C ⊑T ∃r1. ··· ∃rk. C. Cycle-restrictedness of a given TBox can be decided in polynomial time [1].

An atom is of the form A or ∃r.C, where A ∈ Σ_C, r ∈ Σ_R, and C is a concept. Every EL concept C is a conjunction of atoms (with ⊤ as the empty conjunction), called the top-level conjunction of C. We denote the set of atoms occurring in this top-level conjunction by Conj(C). Given a TBox T and a set C of concepts, we use Sub(T, C) to denote the set of concepts occurring in T and C (as elements or subconcepts), Atoms(T, C) to denote the set of atoms occurring in T and C, and similarly for Sub(C) and Atoms(C) for the concepts and atoms occurring in C. Given two sets of EL concepts K and L, we say that K is covered by L (written K ≤ L) if, for every C ∈ K, there is D ∈ L s.t. C ⊑∅ D.

Quantified ABoxes. We use a generalisation of ABoxes called quantified ABoxes (qABoxes) to adequately represent anonymous individuals as in OWL and nulls common in database systems, which play a central role in anonymization [18]. To illustrate, consider the ABox {r(a, b), A(a), B(b)}, and assume we want to hide the fact that b is an instance of B. Quantified ABoxes allow us to achieve this in a better way than by just deleting the fact B(b), namely by additionally adding an anonymous copy of b, resulting in the quantified ABox ∃{x}. {r(a, b), A(a), r(a, x), B(x)}, for which a is still an instance of ∃r.B. In fact, this qABox is equivalent to the ABox {r(a, b), (A ⊓ ∃r.B)(a)}, which uses a concept assertion involving the complex concept A ⊓ ∃r.B. Essentially, qABoxes are syntactic variants of conjunctive queries.

Formally, a qABox is of the form ∃X.A, where X is a finite subset of Σ_O, the elements of which are called variables, and A is the matrix, a finite set of concept assertions A(u) where u ∈ Σ_O and A ∈ Σ_C, and of role assertions r(u, v) where u, v ∈ Σ_O and r ∈ Σ_R. Without loss of generality, we assume different qABoxes to use disjoint sets of variables. A non-variable object name in ∃X.A is called an individual name, and the set of all these names is denoted as Σ_I(∃X.A). We further set Σ_O(∃X.A) := Σ_I(∃X.A) ∪ X. Traditional DL ABoxes are qABoxes where X = ∅; we then write A instead of ∃∅.A. The matrix A of a qABox ∃X.A is such a traditional ABox. An interpretation I is a model of a qABox ∃X.A if there is an interpretation J such that Δ^I = Δ^J, the interpretation functions ·^I and ·^J coincide on Σ \ X, and u^J ∈ A^J for each A(u) ∈ A as well as (u^J, v^J) ∈ r^J for each r(u, v) ∈ A. Let T be a TBox and ∃X.A, ∃Y.B two qABoxes. We write ∃X.A ⊨T ∃Y.B to express that every model of T and ∃X.A is also a model of ∃Y.B, in which case we say ∃Y.B is entailed by ∃X.A w.r.t. T.
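Since ⊑∅ and the cover relation recur in all constructions below, it may help to see them spelled out concretely. The following Python sketch is our own illustration (not part of the original paper; all identifiers are hypothetical): it represents an EL concept as the set of atoms in its top-level conjunction and decides C ⊑∅ D by the usual structural subsumption test for EL without a TBox, then lifts this test to the cover relation between sets of concepts.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Exists:
    """An existential restriction ∃role.filler."""
    role: str
    filler: "Concept"

@dataclass(frozen=True)
class Concept:
    """An EL concept, given by the set of atoms in its top-level conjunction;
    ⊤ is the empty conjunction."""
    atoms: FrozenSet[object]  # concept names (str) and Exists atoms

TOP = Concept(frozenset())

def subsumed_by(c: Concept, d: Concept) -> bool:
    """Decide C ⊑∅ D by structural subsumption: every atom of D must be matched
    by an atom of C (the same concept name, or an ∃r.C' whose filler is
    subsumed by the filler of the corresponding atom of D)."""
    for atom_d in d.atoms:
        if isinstance(atom_d, str):
            if atom_d not in c.atoms:
                return False
        elif not any(isinstance(a, Exists) and a.role == atom_d.role
                     and subsumed_by(a.filler, atom_d.filler)
                     for a in c.atoms):
            return False
    return True

def covered(k, l) -> bool:
    """The cover relation K ≤ L: every concept in K is ⊑∅-subsumed by some concept in L."""
    return all(any(subsumed_by(c, d) for d in l) for c in k)

# Example: Comedian ⊓ ∃spouse.Comedian ⊑∅ ∃spouse.⊤, and the set
# {Comedian ⊓ ∃spouse.Comedian} is covered by {Comedian}.
comedian = Concept(frozenset({"Comedian"}))
both = Concept(frozenset({"Comedian", Exists("spouse", comedian)}))
assert subsumed_by(both, Concept(frozenset({Exists("spouse", TOP)})))
assert covered({both}, {comedian})
```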
Entailment of traditional ABoxes from a qABox can be decided in polynomial time, while entailment between qABoxes is NP-complete.

3 Computing Optimal Compliant Anonymizations

A policy is a finite set of EL concepts. Intuitively, a policy says that one should not be able to derive that any of the individuals of a qABox belongs to a concept in the policy. To make a given qABox compliant to a policy, we compute an anonymization of it, which is a compliant qABox entailed by it. Intuitively, such an anonymization is optimal if it does not remove more information than necessary.

Definition 1. Let T be a TBox, P be a policy, and ∃X.A, ∃Y.B be qABoxes.
1. ∃X.A is compliant with P w.r.t. T if, for each a ∈ Σ_I(∃X.A) and C ∈ P, ∃X.A ⊭T C(a);
2. ∃Y.B is a P-compliant anonymization of ∃X.A w.r.t. T if ∃X.A ⊨T ∃Y.B and ∃Y.B is compliant with P w.r.t. T;
3. ∃Y.B is an optimal P-compliant anonymization of ∃X.A w.r.t. T if additionally ∃Z.C ⊨T ∃Y.B implies ∃Y.B ⊨T ∃Z.C for every P-compliant anonymization ∃Z.C of ∃X.A w.r.t. T.

Since, in EL, entailment of concept assertions (viewed as singleton ABoxes) is in P, we can decide compliance in polynomial time. More interesting is the question of how to compute a (preferably optimal) anonymization for a given qABox. This problem is investigated in [8] for the case without TBox, and in [3] for the case with TBoxes. These works also consider a weaker version of entailment, called IQ-entailment, for the case where we are only interested in instance queries, and [3] considers a generalisation of anonymizations called ABox repairs, where instead of a policy, a set of assertions is given that should not be entailed. For brevity, we focus here on the version of anonymizations defined above.

To guarantee existence of optimal anonymizations, we restrict ourselves to cycle-restricted TBoxes. An example where the TBox is not cycle-restricted and where no optimal repairs exist is as follows. Consider the traditional ABox {A(a)}, the TBox {A ⊑ ∃r.A, ∃r.A ⊑ A}, and the policy {A}. Intuitively, an optimal anonymization would have to entail any qABox of the form ∃{x0, ..., xn}. {r(a, x0), r(xi, xi+1) | 0 ≤ i ≤ n − 1} for n ≥ 0, which is not possible for a qABox entailed by {A(a)} w.r.t. T. A formal proof that there is no optimal compliant anonymization in this case can be found in [7]. As shown in [3], this problem can be avoided by considering IQ-entailment, which we do not discuss here.

Next, we present a class of anonymizations called canonical anonymizations, which cover all optimal anonymizations. They are given by a rather elegant direct definition, but may be hard to compute in practice. We then present an optimized approach that computes smaller representations of them.

3.1 Canonical Compliant Anonymizations

If the TBox T is cycle-restricted, it is possible to compute (in exponential time) the saturation of a given qABox ∃X.A, i.e., a qABox sat_T(∃X.A) such that for every qABox ∃Y.B, ∃X.A ⊨T ∃Y.B iff sat_T(∃X.A) ⊨∅ ∃Y.B. The saturation integrates into the qABox all relevant information that can be inferred using the TBox, so that entailments can be decided without use of the TBox (see [3] for how to compute sat_T(∃X.A)). In our approach, we first compute the saturation, and then perform the actual anonymization based on repair types and compliance seed functions. For convenience, we fix in the following the TBox T, the policy P, and the qABox ∃X.A given as input, and abbreviate Σ_I(∃X.A) as Σ_I.
A repair type specifies for a given object which entailments are to be removed by the anonymization.

Definition 2. Let ∃Y.B := sat_T(∃X.A) and u ∈ Σ_O(∃Y.B). A repair type for u is a subset K of Atoms(P, T) that satisfies the following:
1. B ⊨∅ C(u) for each atom C ∈ K;
2. if C, D are distinct atoms in K, then C ⋢∅ D;
3. K is premise-saturated w.r.t. T, i.e., for all C ∈ Sub(P, T) s.t. B ⊨∅ C(u) and C ⊑T D for some D ∈ K, there is E ∈ K such that C ⊑∅ E.

Condition 1 makes sure the concepts in the repair type are indeed entailed for the given individual. Condition 2 avoids redundancies, and Condition 3 ensures that removing the corresponding assertions is effective also in presence of the TBox. The compliance seed function now assigns to every named individual a repair type based on the given policy.

Definition 3. A compliance seed function is a function s that maps each individual name b ∈ Σ_I to a repair type s(b) for b such that, if C ∈ P and sat_T(∃X.A) ⊨∅ C(b), then there is D ∈ s(b) such that C ⊑∅ D.

Each compliance seed function induces a compliant anonymization defined next. Intuitively, for concept names A ∈ s(a), we simply remove the concept assertion A(a) from A. For atoms of the form ∃r.C ∈ s(a), we need to modify the role successors of a such that ∃r.C(a) is no longer entailed. To avoid losing more information than necessary, we do not just remove assertions from the objects in A, but also create copies of objects by introducing new variables, which are based on the set of repair types for each object name.

Definition 4. Given a compliance seed function s, we define the canonical compliant anonymization ca_T(∃X.A, s) induced by s as the qABox ∃Y.B where:
1. The set Y consists of the variables y_{u,K} s.t. u is an object name in sat_T(∃X.A) and K is a repair type for u, except for the case where u is an individual name and K = s(u). In the latter case, we keep the individual name u, but use y_{u,s(u)} as a synonym for u in the definition of B below.
2. The matrix B consists of the following assertions:
(a) A(y_{u,K}) if A(u) occurs in sat_T(∃X.A) and A ∉ K, and
(b) r(y_{u,K}, y_{v,L}) if r(u, v) occurs in sat_T(∃X.A) and for each ∃r.C ∈ K s.t. the matrix of sat_T(∃X.A) entails C(v), there exists D ∈ L s.t. C ⊑∅ D.

Every qABox ca_T(∃X.A, s) induced by a seed function s is a compliant anonymization of ∃X.A, but it need not be optimal. However, every optimal compliant anonymization is induced (up to equivalence) by some seed function. Thus, we can compute all optimal compliant anonymizations (modulo equivalence) by computing all canonical compliant anonymizations and then removing the non-optimal ones. The latter requires testing entailment between quantified ABoxes.

Theorem 5 ([3]). There is a deterministic, exponential time algorithm with access to an NP oracle that computes the set of all optimal compliant anonymizations of ∃X.A for P w.r.t. T.
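To make Definition 4 more concrete, the following Python sketch (again our own illustration rather than code from [3]; it reuses Concept, Exists, subsumed_by, and covered from the earlier listing) shows how condition 2(b) can be checked when the saturated matrix is given as a set conc of pairs (A, u) for concept assertions and a set roles of triples (r, u, v) for role assertions: a role assertion between two copies is kept exactly if the target's repair type covers every existential obligation of the source's repair type that actually holds at the target.

```python
def instance_of(conc, roles, u, c: Concept) -> bool:
    """Matrix entailment of C(u) without a TBox: u must carry every concept name
    of C and, for every ∃r.D in C, have some r-successor that is an instance
    of D (the homomorphism characterization of instance checking in EL)."""
    for atom in c.atoms:
        if isinstance(atom, str):
            if (atom, u) not in conc:
                return False
        elif not any(r == atom.role and s == u
                     and instance_of(conc, roles, t, atom.filler)
                     for (r, s, t) in roles):
            return False
    return True

def atom_as_concept(atom) -> Concept:
    """View a single atom (a concept name or an ∃r.C) as an EL concept."""
    return Concept(frozenset({atom}))

def covers(concepts, atoms) -> bool:
    """Check that a set of atoms covers a set of concepts: every concept is
    ⊑∅-subsumed by some atom."""
    return covered(concepts, {atom_as_concept(a) for a in atoms})

def succ_obligations(conc, roles, K, r, u):
    """Succ(K, r, u): the fillers C of atoms ∃r.C in the repair type K such that
    the saturated matrix entails C(u)."""
    return {a.filler for a in K
            if isinstance(a, Exists) and a.role == r
            and instance_of(conc, roles, u, a.filler)}

def keep_role_assertion(conc, roles, K, r, v, L) -> bool:
    """Condition 2(b) of Definition 4: keep r(y_{u,K}, y_{v,L}) iff the repair
    type L covers Succ(K, r, v)."""
    return covers(succ_obligations(conc, roles, K, r, v), L)
```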
3.2 Optimality Using Minimal Seed Functions

The NP oracle in Theorem 5 is needed for the NP-complete entailment test, which is applied to exponentially large qABoxes. If it is sufficient to compute some, rather than all, optimal compliant anonymizations, we can dispense with the NP oracle and instead utilize a (polynomial time decidable) partial order on seed functions [8]. For two compliance seed functions s and t, we say that s is covered by t (written s ≤ t) if s(a) is covered by t(a) for every a ∈ Σ_I, i.e., for every C in s(a) there is D in t(a) s.t. C ⊑∅ D.

Proposition 6 ([8]). If ca_T(∃X.A, s) ⊨T ca_T(∃X.A, t) for two compliance seed functions s and t, then s ≤ t.

This was shown in [8] for the case without a TBox, but the proof can easily be extended to the case considered here. The proposition implies that each minimal seed function induces an optimal anonymization. Since there is always at least one minimal seed function and since ≤ can be decided in polynomial time, we can draw the following conclusion.

Theorem 7 ([8]). A non-empty set of optimal compliant anonymizations of ∃X.A for P w.r.t. T can be computed in exponential time.

3.3 Smaller Optimal Compliant Anonymizations

Since the number of variables introduced in a canonical compliant anonymization is always exponential in the size of the TBox and the policy,² computing even one of them in practice quickly becomes infeasible. The exponential blow-up is in general not avoidable, already for the very limited case without TBox and where the qABox corresponds to an EL instance store [6]. However, in many practical cases, we can compute a compliant anonymization that is significantly smaller than the canonical compliant anonymization, but logically equivalent to it [3, 4]. The idea is to avoid introducing unnecessary variables by starting with the individual names and unmodified single copies of all object names, and then incrementally determining which variables of the canonical anonymization need to be included, where in each step we only look at the immediate role-successors of each object name and the requirements expressed in the associated repair type.

² However, canonical anonymizations can be computed in polynomial time w.r.t. data complexity, i.e., if only the size of the qABox counts (TBox and policy fixed).

To be more precise, let s be a compliance seed function and ∃Y.B := ca_T(∃X.A, s). According to Definition 4, we have r(y_{t,K}, y_{u,L}) ∈ B iff sat_T(∃X.A) contains the role assertion r(t, u) and the repair type L covers Succ(K, r, u) := { C | ∃r.C ∈ K and the matrix of sat_T(∃X.A) entails C(u) }. Our procedure produces a sequence Y_0, Y_1, ..., Y_m of subsets Y_i of Y such that ∃Y.B is equivalent to ∃Y_m.B_m, where B_m is the subset of B that uses only objects from Σ_I ∪ Y_m. We start with the set Y_0 := { y_{t,∅} | t is an object name occurring in sat_T(∃X.A) }. The subsequent sets are obtained by exhaustively applying the following rule:

Compliant Anonymization Rule. If (i) y_{t,K}, y_{u,L} ∈ Σ_I ∪ Y_i, (ii) r(t, u) occurs in sat_T(∃X.A), (iii) L does not cover Succ(K, r, u), (iv) there is a covering-minimal repair type M for u that covers L ∪ Succ(K, r, u), and (v) y_{u,M} ∉ Σ_I ∪ Y_i, then set Y_{i+1} := Y_i ∪ {y_{u,M}}.

Since each rule application adds a variable, the exhaustive application of the Compliant Anonymization Rule must terminate after finitely many steps with a set Y_m ⊆ Y of variables. We call ∃Y_m.B_m the optimized compliant anonymization of ∃X.A w.r.t. T induced by the seed function s.
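The exhaustive application of the Compliant Anonymization Rule is a plain fixpoint computation over the copies introduced so far. The sketch below (our own illustration, building on the two previous listings) makes this loop explicit; the helper min_repair_types(u, S), which enumerates the covering-minimal repair types for u that cover a given set S of concepts, is assumed rather than spelled out, since computing it amounts to checking the conditions of Definition 2.

```python
def optimized_copies(individuals, objects, conc, roles, seed, min_repair_types):
    """Fixpoint loop behind the Compliant Anonymization Rule (sketch).

    individuals/objects: the individual names and all object names of sat_T(∃X.A);
    conc, roles: its saturated matrix, encoded as in the previous listing;
    seed[a]: the repair type s(a) assigned by the compliance seed function;
    min_repair_types(u, S): assumed helper yielding the covering-minimal repair
    types for u that cover the set S of concepts (cf. Definition 2).

    Returns pairs (u, K) standing for the copies in Σ_I ∪ Y_m; an individual
    name a is represented here by the pair (a, s(a))."""
    copies = {(a, frozenset(seed[a])) for a in individuals}
    copies |= {(t, frozenset()) for t in objects}              # the set Y_0
    changed = True
    while changed:
        changed = False
        for (t, K) in list(copies):
            for (r, t2, u) in roles:
                if t2 != t:                                    # condition (ii): r(t, u) in the saturation
                    continue
                obligations = succ_obligations(conc, roles, K, r, u)
                for (u2, L) in list(copies):
                    if u2 != u or covers(obligations, L):      # conditions (i) and (iii)
                        continue
                    to_cover = {atom_as_concept(a) for a in L} | obligations
                    for M in min_repair_types(u, to_cover):    # condition (iv)
                        if (u, frozenset(M)) not in copies:    # condition (v)
                            copies.add((u, frozenset(M)))
                            changed = True
    return copies
```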
Theorem 8 ([3]). For each compliance seed function s, the optimized compliant anonymization induced by s is equivalent to ca_T(∃X.A, s).

To compute B_m we do not need to compute the larger matrix B first. Instead, we directly apply the definition of the matrix (Definition 4) to the object names in Σ_I ∪ Y_m. Experiments with an implementation³ of this procedure (for the more general case of ABox repairs) indicate that applying this optimized procedure reduces the size of the computed compliant anonymizations considerably [3, 4].

³ https://github.com/de-tu-dresden-inf-lat/abox-repairs-wrt-static-tbox

Example 9. To illustrate both kinds of anonymizations, consider an empty TBox, the policy P := {P} for P := ∃relative.(Comedian ⊓ ∃spouse.Comedian), and the qABox ∃X.A := ∃{x}. {relative(ben, x), Comedian(x), spouse(x, jerry), Comedian(jerry)}. As seed function, we select s s.t. s(ben) = {P} and s(jerry) = ∅. Fig. 1 depicts both the canonical and the optimized compliant anonymization.

Fig. 1: Canonical anonymization (all nodes) and the subset that is the optimized anonymization (non-shadowed nodes). Gray nodes denote instances of Comedian.

4 Safety of Quantified ABoxes, Mainly Without TBox

To guarantee privacy, policy compliance is not always sufficient since an attacker may have additional knowledge that, by itself, does not reveal the secret, but which, together with the compliant information to be published, would violate the privacy policy. This is captured by the notion of safety: a qABox ∃X.A is safe for a given policy P if for every P-compliant ∃Y.B, the union ∃(X ∪ Y). (A ∪ B) is compliant with P as well.

This definition is based on the assumption that the additional knowledge possessed by the attacker is also in the form of a qABox (the qABox ∃Y.B in the formal definition). Since we do not know which additional knowledge the attacker has, we need to consider all possible compliant qABoxes ∃Y.B. Non-compliant qABoxes ∃Y.B need not be considered here: in fact, it is useless trying to hide the secret information from an attacker that already knows it. For instance, the canonical compliant anonymization shown in Figure 1 is not safe since one could add the compliant qABox ∃{y}. {relative(ben, y), Comedian(y), spouse(y, jerry)}. In the resulting qABox, Ben is an instance of the policy concept P.

In [9], we give a characterization for safety of qABoxes for singleton policies,⁴ which are of the form {P} for an EL concept P. Specifically, safety for {P} is violated if (1) A(a) ∈ A for some individual name a and A ∈ Atoms(P), or (2) r(a, u) ∈ A and ∃r.D ∈ Atoms(P) such that a part of the concept D can be found in ∃X.A at the specific object u; in both cases we can construct attacking compliant qABoxes as certificates for non-safety. The second condition is captured by the notion of partial homomorphisms (cf. Definition 3.6 in [9]). Intuitively, a partial homomorphism from a concept D to a qABox is "almost" a homomorphism,⁵ but it only maps those nodes of the syntax tree of D that lie between the root and a "cut". Figure 2 shows an example: the "cut" is depicted as the green line. These two conditions are not only necessary but also sufficient for safety.

⁴ Characterizing safety for general policies is an open problem.
⁵ Homomorphisms come into play since they characterize the instance problem in EL.

Fig. 2: A partial homomorphism from the concept on the left to the quantified ABox on the right at variable x, depicted by the dashed arrows.

Proposition 10 ([9]).
∃X.A is safe for {P} iff, for each individual name a, the following holds: (1) if A ∈ Atoms({P}), then A(a) ∉ A, and (2) if r(a, u) ∈ A and ∃r.D ∈ Atoms({P}), then there is no partial homomorphism from D to ∃X.A at u.

Since the existence of a partial homomorphism can be decided in polynomial time [9], we obtain the following complexity result.

Theorem 11. Safety of qABoxes w.r.t. singleton EL policies is in P.

4.1 Canonical Safe Anonymizations

If a qABox turns out not to be safe, we again want to compute an anonymization that is safe and that preserves as much information from the original qABox as possible. We say that a qABox ∃Y.B is a {P}-safe anonymization of ∃X.A if ∃X.A ⊨ ∃Y.B and ∃Y.B is safe for {P}. Such an anonymization is optimal if there is no {P}-safe anonymization ∃Z.C of ∃X.A that lies strictly between ∃X.A and ∃Y.B w.r.t. the entailment order. In [9], we presented an approach for computing a unique optimal safe anonymization in exponential time. The approach computes a qABox called the canonical safe anonymization that entails each {P}-safe anonymization of ∃X.A.

Definition 12. The canonical safe anonymization sa(∃X.A, {P}) of ∃X.A w.r.t. {P} is defined as the qABox ∃Y.B such that
1. the set Y consists of the variables y_{t,K} where t is an object name occurring in ∃X.A and K is a subset of Atoms({P}) that does not contain ⊑∅-comparable atoms, and
2. the matrix B consists of the following assertions:
(a) A(y_{t,K}) if A(t) occurs in A and A ∉ K,
(b) r(y_{t,K}, y_{u,L}) provided r(t, u) ∈ A and, for each ∃r.C ∈ K, there is D ∈ L with C ⊑∅ D,
(c) r(y_{t,K}, b) if r(t, b) occurs in A and there is no ∃r.C ∈ K.

In these conditions, the first object name y_{t,K} may also stand for an individual name a, which is then treated like the variable y_{a,Max(Atoms({P}))}, where Max(K) collects the subsumption-maximal elements of K modulo equivalence. As in the case of compliance, the canonical safe anonymizations introduce an exponential number of copies for each object in the input, which may make a computation infeasible in practice.

4.2 Making It Smaller Again

Similar to the case of compliant anonymizations, we can reduce the number of variables in the safe anonymization by creating copies only when needed. According to Definition 12, r(y_{t,K}, y_{u,L}) ∈ B iff r(t, u) ∈ A and L covers Succ(K, r) := {C | ∃r.C ∈ K}. To compute the optimized safe anonymization, we again produce a sequence Y_0, ..., Y_m of subsets of Y. We start with the set Y_0 := {y_{t,∅} | t ∈ Σ_O(∃X.A)} and apply the following two rules exhaustively.

Safe Anonymization Rule 1. If (i) y_{t,K}, y_{u,L} ∈ Y_i, (ii) r(t, u) ∈ A, (iii) L does not cover Succ(K, r), (iv) M is a cover-minimal set of atoms covering L ∪ Succ(K, r), but (v) y_{u,M} ∉ Y_i, then set Y_{i+1} := Y_i ∪ {y_{u,M}}.

Safe Anonymization Rule 2. If (i) a ∈ Σ_I and y_{u,L} ∈ Y_i, (ii) r(a, u) ∈ A, (iii) L does not cover Succ(Max(Atoms({P})), r), (iv) M is a cover-minimal set of atoms covering L ∪ Succ(Max(Atoms({P})), r), but (v) y_{u,M} ∉ Y_i, then set Y_{i+1} := Y_i ∪ {y_{u,M}}.

After generating the set of variables, we construct the matrix of the optimized safe anonymization based on Definition 12.
Fig. 3: Optimized safe anonymization of the example ABox. Gray nodes denote instances of Comedian.

Definition 13. Let Y_m ⊆ Y be the set of all variables obtained by exhaustive applications of Safe Anonymization Rule 1 and Rule 2. The optimized {P}-safe anonymization of ∃X.A is the qABox ∃Y_m.B_m, where B_m contains all assertions in the matrix of sa(∃X.A, {P}) involving only object names in Σ_I ∪ Y_m.

Example 14. For the policy and qABox in Example 9, the canonical safe anonymization would contain 24 variables, while the optimized safe anonymization is much smaller.⁶ Applying Safe Anonymization Rule 2 to the pair b and y_{x,∅} exhaustively, we obtain the variables y_{x,{C}} and y_{x,{∃s.C}}, and then applying Safe Anonymization Rule 1 to the pair y_{x,{∃s.C}} and y_{j,∅} generates y_{j,{C}}. On the resulting set of objects, no rule is applicable, and our procedure terminates. Thus, the optimized safe anonymization contains only 8 objects in total. Using the matrix construction in Definition 12, we obtain the optimal safe anonymization ∃Y_m.B_m whose matrix is depicted in Figure 3.

⁶ To save space and increase legibility, we abbreviate names by their first letters.

B_m is a subset of the matrix of sa(∃X.A, {P}), which implies that the former is entailed by the latter. It immediately follows that ∃Y_m.B_m is a {P}-safe anonymization of ∃X.A. We can also show the other direction.

Proposition 15. The optimized {P}-safe anonymization of ∃X.A entails sa(∃X.A, {P}).

We thus obtain the following theorem, which shows that we can work with the smaller anonymization.

Theorem 16. Given a qABox ∃X.A and a singleton policy {P}, the optimized {P}-safe anonymization ∃Y_m.B_m and sa(∃X.A, {P}) are equivalent.

4.3 Static EL TBoxes and General Policies

So far, our methods for testing for and achieving safety can only deal with singleton policies without a TBox. Safety w.r.t. a TBox is defined as follows: the qABox ∃X.A is safe for P w.r.t. T if for each quantified ABox ∃Y.B that is compliant with P w.r.t. T, the union ∃X.A ∪ ∃Y.B is also compliant with P w.r.t. T. Interestingly, TBoxes can be used to express general policies by means of singleton policies.

Proposition 17. Consider a quantified ABox ∃X.A, an EL TBox T, and a policy P. Further let A be a fresh concept name not occurring in ∃X.A, in T, or in P, and define the extended TBox T_P := T ∪ { P ⊑ A | P ∈ P }. Then ∃X.A is safe for P w.r.t. T iff ∃X.A is safe for {A} w.r.t. T_P.

By setting T := ∅ in this proposition, we see that safety for an arbitrary policy P (but without TBox) can be reduced to safety for the singleton policy {A} w.r.t. a non-empty cycle-restricted TBox. As shown in [9], such a reduction cannot exist without a TBox. Until now, we do not have a characterization of safety akin to Proposition 10 for non-singleton policies without TBox. The proposition shows that dealing with (cycle-restricted) TBoxes, even for singleton policies, is at least as hard as dealing with general policies.

Nevertheless, by using ideas from [17, 18], we can find a coNP decision procedure for safety for a general policy w.r.t. an EL TBox. This complexity result extends the one given in [9] (Proposition 3.16) for the case without a TBox, and at the same time corrects a typo in the formulation of that proposition.

Proposition 18. The safety problem for general policies w.r.t. static EL TBoxes is in coNP.
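The reduction in Proposition 17 is purely syntactic and straightforward to mechanize. A minimal sketch in the notation of the earlier listings (a TBox is taken to be a set of pairs (C, D) standing for the CIs C ⊑ D; the fresh concept name is a placeholder chosen by the caller):

```python
def reduce_to_singleton_policy(tbox, policy, fresh_name):
    """Proposition 17 (sketch): ∃X.A is safe for the policy P w.r.t. T iff it is
    safe for the singleton policy {fresh_name} w.r.t. the extended TBox
    T_P = T ∪ { P ⊑ fresh_name | P ∈ P }.  A TBox is encoded here as a set of
    pairs (C, D) standing for C ⊑ D; the caller must choose fresh_name so that
    it occurs neither in the qABox nor in tbox nor in policy."""
    marker = Concept(frozenset({fresh_name}))
    extended_tbox = set(tbox) | {(p, marker) for p in policy}
    return extended_tbox, {marker}
```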
5 Conclusions

The work reviewed in this paper shows that, under some restrictions, optimality can indeed be achieved when computing repairs as well as compliant and safe anonymizations. What remains open is the question of how to deal with general policies and/or cycle-restricted TBoxes in the context of safety. For general TBoxes, optimality is not always achievable, but one can of course ask whether the existence of an optimal repair or an optimal compliant/safe anonymization is decidable, and whether one can then compute such an optimal ABox if it exists. Using conjunctive queries rather than EL concepts is also an interesting topic for future research.

Classical repairs (which are based on removing axioms) have been used to define inconsistency-tolerant semantics. Basically, instead of replacing an inconsistent ABox by one of its repairs, one reasons w.r.t. all optimal classical repairs in a certain well-defined way [13, 26]. It would be interesting to see what happens if optimal classical repairs are replaced with optimal repairs (in the sense introduced in the present paper) in such inconsistency-tolerant semantics.

Acknowledgements

This work was funded by DFG in project number 430150274 and in TRR 248 (cpec, grant 389792660).

References

1. Baader, F., Borgwardt, S., Morawska, B.: Extending unification in EL towards general TBoxes. In: Proc. of the 13th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR 2012). pp. 568–572. AAAI Press/The MIT Press (2012)
2. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description Logic. Cambridge University Press (2017)
3. Baader, F., Koopmann, P., Kriegel, F., Nuradiansyah, A.: Computing optimal repairs of quantified ABoxes w.r.t. static EL TBoxes. In: Platzer, A., Sutcliffe, G. (eds.) Proceedings of the 28th International Conference on Automated Deduction (CADE-28). Lecture Notes in Computer Science, vol. 12699, pp. 309–326 (2021)
4. Baader, F., Koopmann, P., Kriegel, F., Nuradiansyah, A.: Computing optimal repairs of quantified ABoxes w.r.t. static EL TBoxes (extended version). LTCS-Report 21-01, Chair of Automata Theory, Institute of Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany (2021), https://lat.inf.tu-dresden.de/research/reports/2021/BaKoKrNu-LTCS-21-01.pdf
5. Baader, F., Koopmann, P., Kriegel, F., Nuradiansyah, A., Peñaloza, R.: Privacy-Preserving Ontology Publishing: The Case of Quantified ABoxes w.r.t. a Static Cycle-Restricted EL TBox (Extended Version). LTCS-Report 21-04, Chair of Automata Theory, Institute of Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany (2021), https://lat.inf.tu-dresden.de/research/reports/2021/BaKoKrNuPe-LTCS-21-04.pdf
6. Baader, F., Kriegel, F., Nuradiansyah, A.: Privacy-preserving ontology publishing for EL instance stores. In: Calimeri, F., Leone, N., Manna, M. (eds.) Logics in Artificial Intelligence - 16th European Conference, JELIA 2019, Rende, Italy, May 7-11, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11468, pp. 323–338. Springer (2019)
7. Baader, F., Kriegel, F., Nuradiansyah, A., Peñaloza, R.: Making repairs in description logics more gentle. In: Thielscher, M., Toni, F., Wolter, F. (eds.) Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR 2018, Tempe, Arizona, 30 October - 2 November 2018. pp. 319–328. AAAI Press (2018)
8. Baader, F., Kriegel, F., Nuradiansyah, A., Peñaloza, R.: Computing compliant anonymisations of quantified ABoxes w.r.t. EL policies. In: Pan, J.Z., Tamma, V.A.M., d'Amato, C., Janowicz, K., Fu, B., Polleres, A., Seneviratne, O., Kagal, L. (eds.) The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference, Athens, Greece, November 2-6, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12506, pp. 3–20. Springer (2020)
9. Baader, F., Kriegel, F., Nuradiansyah, A., Peñaloza, R.: Safety of quantified ABoxes w.r.t. singleton EL policies. In: Hung, C., Hong, J., Bechini, A., Song, E. (eds.) SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing. pp. 863–872. ACM (2021)
10. Baader, F., Nuradiansyah, A.: Mixing description logics in privacy-preserving ontology publishing. In: Benzmüller, C., Stuckenschmidt, H. (eds.) KI 2019: Advances in Artificial Intelligence - 42nd German Conference on AI, Kassel, Germany, September 23-26, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11793, pp. 87–100. Springer (2019)
11. Baader, F., Suntisrivaraporn, B.: Debugging SNOMED CT using axiom pinpointing in the description logic EL+. In: Proceedings of the International Conference on Representing and Sharing Knowledge Using SNOMED (KR-MED'08). Phoenix, Arizona (2008)
12. Benedikt, M., Grau, B.C., Kostylev, E.V.: Source information disclosure in ontology-based data integration. In: Singh, S.P., Markovitch, S. (eds.) Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. pp. 1056–1062. AAAI Press (2017)
13. Bienvenu, M., Bourgaux, C., Goasdoué, F.: Computing and explaining query answers over inconsistent DL-Lite knowledge bases. J. Artif. Intell. Res. 64, 563–644 (2019)
14. Biskup, J., Bonatti, P.A.: Controlled query evaluation for enforcing confidentiality in complete information systems. Int. J. Inf. Sec. 3(1), 14–27 (2004)
15. Bonatti, P.A., Sauro, L.: A confidentiality model for ontologies. In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N.F., Welty, C., Janowicz, K. (eds.) The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part I. Lecture Notes in Computer Science, vol. 8218, pp. 17–32. Springer (2013)
16. Grau, B.C., Kharlamov, E., Kostylev, E.V., Zheleznyakov, D.: Controlled query evaluation for datalog and OWL 2 profile ontologies. In: Yang, Q., Wooldridge, M.J. (eds.) Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015). pp. 2883–2889. AAAI Press (2015)
17. Grau, B.C., Kostylev, E.V.: Logical foundations of privacy-preserving publishing of linked data. In: Schuurmans, D., Wellman, M.P. (eds.) Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA. pp. 943–949. AAAI Press (2016)
18. Grau, B.C., Kostylev, E.V.: Logical foundations of linked data anonymisation. J. Artif. Intell. Res. 64, 253–314 (2019)
19. Horridge, M., Parsia, B., Sattler, U.: Laconic and precise justifications in OWL. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T.W., Thirunarayan, K. (eds.) The Semantic Web - ISWC 2008, 7th International Semantic Web Conference, ISWC 2008, Karlsruhe, Germany, October 26-30, 2008. Proceedings. Lecture Notes in Computer Science, vol. 5318, pp. 323–338. Springer (2008)
20. Horrocks, I., Li, L., Turi, D., Bechhofer, S.: The instance store: DL reasoning with large numbers of individuals. In: Haarslev, V., Möller, R. (eds.) Proceedings of the 2004 International Workshop on Description Logics (DL2004), Whistler, British Columbia, Canada, June 6-8, 2004. CEUR Workshop Proceedings, vol. 104. CEUR-WS.org (2004)
21. Kalyanpur, A., Parsia, B., Horridge, M., Sirin, E.: Finding all justifications of OWL DL entailments. In: Proc. of ISWC'07. Lecture Notes in Computer Science, vol. 4825, pp. 267–280. Springer-Verlag (2007)
22. Lam, J.S.C., Sleeman, D.H., Pan, J.Z., Vasconcelos, W.W.: A fine-grained approach to resolving unsatisfiable ontologies. J. Data Semant. 10, 62–95 (2008)
23. Lembo, D., Rosati, R., Savo, D.F.: Revisiting controlled query evaluation in description logics. In: Kraus, S. (ed.) Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019). pp. 1786–1792. ijcai.org (2019)
24. Meyer, T., Lee, K., Booth, R., Pan, J.Z.: Finding maximally satisfiable terminologies for the description logic ALC. In: Proc. of the 21st Nat. Conf. on Artificial Intelligence (AAAI 2006). AAAI Press/The MIT Press (2006)
25. Parsia, B., Sirin, E., Kalyanpur, A.: Debugging OWL ontologies. In: Ellis, A., Hagino, T. (eds.) Proc. of the 14th International Conference on World Wide Web (WWW'05). pp. 633–640. ACM (2005)
26. Rosati, R.: On the complexity of dealing with inconsistency in description logic ontologies. In: Walsh, T. (ed.) Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011). pp. 1057–1062. IJCAI/AAAI (2011)
27. Schlobach, S., Cornet, R.: Non-standard reasoning services for the debugging of description logic terminologies. In: Gottlob, G., Walsh, T. (eds.) Proc. of the 18th Int. Joint Conf. on Artificial Intelligence (IJCAI 2003). pp. 355–362. Morgan Kaufmann, Los Altos, Acapulco, Mexico (2003)
28. Schlobach, S., Huang, Z., Cornet, R., Harmelen, F.: Debugging incoherent terminologies. J. Automated Reasoning 39(3), 317–349 (2007)
29. Troquard, N., Confalonieri, R., Galliani, P., Peñaloza, R., Porello, D., Kutz, O.: Repairing ontologies via axiom weakening. In: McIlraith, S.A., Weinberger, K.Q. (eds.) Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). pp. 1981–1988. AAAI Press (2018)