-

Preliminary Results on the Identity Problem in Description Logic Ontologies

Franz Baader

Daniel Borchmann

0 0 Theoretical Computer Science , TU Dresden

The work in this paper is motivated by a privacy scenario in which the identity of certain persons (represented as anonymous individuals) should be hidden. We assume that factual information about known individuals (i.e., individuals whose identity is known) and anonymous individuals is stored in an ABox and general background information is expressed in a TBox, where both the TBox and the ABox are publicly accessible. The identity problem then asks whether one can deduce from the TBox and the ABox that a given anonymous individual is equal to a known one. Since this would reveal the identity of the anonymous individual, such a situation needs to be avoided. We first observe that not all Description Logics (DLs) are able to derive any such equalities between individuals, and thus the identity problem is trivial in these DLs. We then consider DLs with nominals, number restrictions, or function dependencies, in which the identity problem is non-trivial. We show that in these DLs the identity problem has the same complexity as the instance problem. Finally, we consider an extended scenario in which users with different rôles can access different parts of the TBox and ABox, and we want to check whether, by a sequence of rôle changes and queries asked in each rôle, one can deduce the identity of an anonymous individual.

In order to illustrate the privacy scenario sketched in the abstract, assume that you are asked to perform a survey regarding the satisfaction of employees with the management of a company. Since the boss of the company is known not to respond well to criticism, the employees insist that you perform the survey such that the identity of persons voicing criticism cannot be deduced by the boss. Thus, you let the employees use a pseudonym when answering the survey. However, the survey does ask some personal data from the participants, and you are concerned that the boss can use the provided answers, in combination with the employee database and general knowledge about how things work in the company, to deduce that a certain pseudonym corresponds to a specific employee. For example, assume that in the survey the anonymous individual x states that she is female and has expertise in logic and privacy. The boss knows that all ? Funded by DFG within the Research Training Group 1907 “RoSI”. employees with expertise logic belong to the formal verification task force and all employees with expertise privacy belong to the security task force. In addition, the employee database contains the information that the members of the first task force are John, Linda, Paul, Pattie and of the second Jim, John, Linda, Pamela. Since Linda is the only female employee belonging to both task forces, the boss can deduce that Linda hides behind the pseudonym x. The question is now whether you can use an automated system to check whether such a breach of privacy can occur in your survey.

The purpose of this paper is to show that ontology reasoners can in principle be used for this purpose. We assume that both the information provided in the survey and the employee database are represented in a DL ABox A, where the employees from the database are represented as known individuals in A and the pseudonyms used in the survey are represented as anonymous individuals in A. Background information (such as disjointness of the concepts Male and Female, or the connection between expertise and task forces) are represented in a DL TBox T . In order to detect a breach of privacy, we then need to check whether the ontology O consisting of T and A implies an identity between some anonymous individual x and a known individual a. We call the underlying reasoning task the identity problem for O, x, and a.

In Section 2 we formally introduce the identity problem and show that, for a large class of DLs, this problem is trivial in the sense that no identities between distinct individuals can be deduced from consistent ontologies formulated in these DLs. Not surprisingly, this class consists of the DLs that are fragments of first-order logic without equality. In Section 3, we introduce three DLs for which the identity problem is non-trivial, i.e., the DL ALCO [ 10 ], where nominals allow us to derive identities; ALCQ [ 8 ], where number restrictions allow us to derive identities; and CF Dnc [ 17 ], where functional dependencies allow us to derive identities. In Section 4 we show that the identity problem can be reduced in polynomial time to the instance problem, and that for the three DLs mentioned above this actually yields an optimal procedure w.r.t. worst-case complexity. Section 5 considers the identity problem in the context of rôle-based access control [ 9 ] to ontologies. Basically, we assume that a user rôle r^ is associated with access to a subset Or^ of the ontology.1 While having rôle r^, the user can access Or^ through queries, and can then store the result in a view Vr^. In a setting where rôles can dynamically change, the user may have collected (and stored) a sequence of views for different rôles. The question is then whether it is possible to derive the identity of an anonymous individual with a known one using these views. We will show that answering this question can be reduced to the identity problem investigated in the previous sections.

Similar privacy scenarios have been considered for databases [ 2 ], but also in the context of ontology-based data access [ 5,13,4 ]. In particular, [ 13 ] introduces a setting with sub-ontologies and views that is similar to what we consider in Section 5. However, the main difference between these works and ours is that we 1 To distinguish user rôles from DL roles, we write them with “ô” and also denote specific such rôles with letters with a hat. concentrate on hiding the identity of an anonymous individual with a known one. In contrast, the other works are trying to hide properties of known individuals, i.e., the membership of an individual (or a tuple of individuals) in the answers to certain queries. 2

The Identity Problem

We assume that the reader is familiar with the basic notions of Description Logics, as e.g. introduced in [ 1 ]. We denote the set of concept names by NC , the set of role names by NR, and the set of individual names by NI . TBoxes and ABoxes as well as their models are assumed to be defined in the standard way. Note, however, that we do not make the unique name assumption (UNA) for individual names. An ontology is of the form O = (T ; A) where T is a TBox and A is an ABox. The identity problem asks whether two individuals are equal w.r.t. a given ontology. Since anything (also identities) follows from an inconsistent ontology, we consider this problem only for the case where the ontology is consistent.

Definition 1. Let a; b 2 NI be distinct individual names and O a consistent : ontology. Then a is equal to b w.r.t. O (denoted by O j= a = b) iff aI = bI for : all models I of O. The identity problem for O; a; b asks whether O j= a = b.

Not all DLs are able to derive equality of individuals. We call those that can DLs with equality power.

Definition 2. L is a description logic without equality power if there is no consistent ontology O formulated in L and two distinct individual names a; b 2 : NI such that O j= a = b. Otherwise we say that L has equality power.

It is well-known (see, e.g., Chapter 6 in [ 1 ]) that many DLs can be translated into first-order predicate logic (FOL). Basically, concept names and role names are translated into unary and binary predicates, respectively, and complex concept descriptions are translated into FOL formulas with one free variable. Individual names are translated into constant symbols and TBoxes and ABoxes into closed formulas. For the translation of some DLs, FOL without equality is sufficient whereas for others equality is needed.

Theorem 1. If the DL L can be translated into FOL without equality, then it is a DL without equality power.

Proof. Let O = (T ; A) be a consistent ontology o=f La =a:ndb. aA; cbco2rdNinIg bteo towuor distinct individual names. We must show that O 6j assumption on L, there is an FOL formula not containing the equality symbol that is equivalent to O. Consequently, it is sufficient to show that 6j= a = b according to the semantics of FOL, where the equality symbol = is interpreted as equality. Since O is consistent, the formula is satisfiable.

Using well-known approaches and results regarding FOL, we can transform into a formula 0 in Skolem form containing additional function symbols such that (i) is satisfiable iff 0 is satisfiable, and (ii) any model of 0 is a model of . Thus, 0 is satisfiable and since it is in Skolem form it has a Herbrand model IH . Since 0 does not contain equality, distinct terms (and thus in particular distinct constants) are interpreted by distinct elements in IH . Finally, we know that IH is also a model of , which shows that there is a model of in which a and b are not interpreted by the same domain element. This proves 6j= a = b. tu

As a consequence of this theorem, we conclude that the basic DL ALC and its fragments, but also more expressive DLs such as SRI, do not have equality power, and thus the identity problem is trivial for these DLs. 3

Three DLs with Equality Power

In this section, we introduce three DLs that are able to derive equalities between individuals, and for which thus the identity problem is non-trivial. The first two DLs are ALCO, which extends ALC by nominals, and ALCQ, which extends ALC by qualified number restrictions. Since these DLs are standard, we refer the reader to [ 1 ] for the definition of their syntax and semantics. The third DL, called CF Dnc [ 17 ], derives its equality power from so-called functional dependencies. We leave it to the reader to verify that expressing these logics in FOL really requires the equality symbol.

Since the DL CF Dnc is less standard and not introduced in [ 1 ], we describe it in more detail. Instead of roles, this logic uses attributes, which are interpreted as total functions. We use the symbol NA to denote the set of all attributes, replacing the set NR. Concept descriptions C; D of CF Dnc are defined using the following syntax rules:

C; D ::= A j :A j C u D j 8Pf:C j A : Pf1; : : : ; Pfk ! Pf; where A 2 NC , k 1, and the path functions Pf; Pfi are words in NA with the convention that the empty word is denoted by id. A concept descriptions of the form A : Pf1; : : : ; Pfk ! Pf is called a path functional dependency (PFD). In CF Dnc there is an additional restriction on PFDs to ensure that reasoning in this logic is polynomial: for any PFD of the form above there is an i; 1 i k such that 1. Pf is a prefix of Pfi, or 2. Pf = Pf0f for f 2 NA and Pf0 is a prefix of Pfi.

Note that PFDs whose right-hand side Pf has length 1 trivially satisfy this restriction.

The interpretation of attributes as total functions is extended to path functions by using composition of functions and interpreting id as the identity function. The semantics of atomic negation (:A) and conjunction (C u D) is defined in the usual way. For the constructors involving path functions, it is defined as follows: (8Pf:C)I := fd 2

I j PfI (d) 2 CI g; (A : Pf1; : : : ; Pfk ! Pf)I := 0 1 i k 1

I j 8e 2 AI : @ ^ PfiI (d) = PfiI (e)A ) PfI (d) = PfI (e)g A TBox T in CF Dnc consists of a finite set of inclusion dependencies A v C, and an ABox A consists of a finite set of concept assertions A(a) and path function assertions Pf1(a) = Pf2(b), where A 2 NC , C is a CF Dnc concept description, a; b 2 NI , and Pfi 2 NA.

Theorem 2. The DLs ALCO, ALCQ, and CF Dnc have equality power. This theorem is an immediate consequence of the following three examples, which each shows for the respective DL that it can derive equality between different individuals.

Example 1. Here we formulate the example from the introduction in the DL ALCO. Let O = (T ; A) where

T := f9expertise:fLOGICg v VerTF; 9expertise:fPRIVACYg v SecTF; VerTF fJOHN; LINDA; PAUL; PATTIEg;

SecTF fJIM; JOHN; LINDA; PAMELAg; Female v :Maleg; A := fFemale(x); expertise(x; LOGIC); expertise(x; PRIVACY);

Female(LINDA); Female(PATTIE); Female(PAMELA); Male(JOHN); Male(JIM); Male(PAUL)g:

: It is easy to see that O j= x = LINDA since x’s expertise implies that she belongs to both the verification and the security task force, but the only female employee belonging to both is Linda.

For the sake of brevity, we use abstract examples to show that ALCQ and CF Dnc have equality power. It would, however, be easy to provide intuitive examples also for these two DLs.

Example 2. Consider the ALCQ ontology O = (T ; A) where

T := fA v 6 1 r:Bg and A := fA(a); r(a; b); r(a; x); B(a); B(x)g: : Obviously, we have O j= x = b.

Example 3. Consider the CF Dnc ontology O = (T ; A) where

T := fA v A : f ! idg and A := fA(a); f (a) = b; A(x); f (x) = bg: Since both x and a belong to A and have the same value b for the path function f , the path functional dependency in T implies that they must be equal, i.e., we : have O j= x = a.

The Complexity of the Identity Problem

In this section, we first show that the identity problem can be polynomially reduced to the instance problem for all DLs with equality power. Note that the instance problem is one of the basic inference problems for DLs, and thus instance checking facilities are available in most DL reasoners.

Lemma 1. Let L be a DL with equality power, O = (T ; A) an L ontology and a; b two distinct individual names. If B is a concept name not occurring in O, then we have

O j= a = b iff (T ; A [ fB(a)g) j= B(b): Proof. The direction from left to right is trivial. We show the other direction by contraposition. Thus, assume that O 6j= a =: b. Let I be a model of O such that aI 6= bI . Let I0 be the interpretation that coincides with I on all role names, individual names, and concept names different from B. For B we define BI0 := faI g. Since B does not occur in O, the interpretation I0 is still a model of T and A, and it satisfies B(a) by our definition of BI0 . However, it does not satisfy B(b) since bI0 = bI 6= aI does not belong to BI0 . tu This lemma shows that the identity problem is at most as complex as the instance problem for all DLs with equality power that allow instance assertions for concept names in the ABox. Since the instance problem is polynomial for CF Dnc [ 17 ], this implies that also the identity problem is polynomial for this DL. In [ 17 ] it is mentioned that P-hardness of the consistency problem for CF Dnc ontologies is an easy consequence of P-hardness of satisfiability of propositional Horn formulas [ 3 ]. It is easy to see that the same is true also for the identity problem. Theorem 3. The identity problem is P-complete for CF Dnc ontologies.

For ALCO and ALCQ, the instance problem is ExpTime-complete [ 10,16 ]. Thus, we obtain exponential-time upper bounds for the identity problem in these DLs. To show that these upper bounds are optimal, we prove that there are polynomial-time reductions of the instance problem in ALC to the identity problem in these logics. In fact, the instance problem is already ExpTime-hard for the common sub-logic ALC of ALCO and ALCQ [ 11 ].

Lemma 2. Let L 2 fALCO; ALCQg, O be an ALC ontology, C an ALC concept description, and a an individual name. Then we can construct in polynomial time an L ontology O0 and individuals a0; b0 such that

O j= C(a) iff O0 j= a0 =: b0: Proof. Let O = (T ; A). We consider the two DLs separately. 1.) L = ALCO:

We define O0 := (T [ fC v 8r:fb0gg; A [ fr(a; a0); r(a; b0)g), where a0; b0 are distinct individual names and r is a role name such that a0; b0; r do not occur in O. The direction from left to right is again trivial. The other direction is shown by contraposition. Let I be a model of O such that aI 62 CI . We can assume without loss of generality that the domain of I contains at least two distinct elements d1 6= d2.2 We construct an interpretation I0 that coincides with I on all concept, role, and individual names occurring in O, and thus is also a model of O. In addition, I0 interprets r as rI0 := f(aI ; d1); (aI ; d2)g and the new individual names as a0I0 := d1 and b0I0 := d2. By construction, I0 satisfies the assertional part of O0. To see that it also satisfies the GCI C v 8r:fb0g, note that aI = aI0 is the only element of I0 that has successors w.r.t. the role r. Since it does not belong to CI = CI0 , the elements of CI0 trivially satisfy the value restriction 8r:fb0g. Thus, I0 is a model of O0 in which the individuals a0; b0 are interpreted by different elements, which : shows O0 6j= a0 = b0. 2.) L = ALCQ:

We define O0 := (T [ fC v 6 1 r:>g; A [ fr(a; a0); r(a; b0)g), where a0; b0 are distinct new individuals and r is a new role name not occurring in O. The direction from left to right is again trivial. To show the other direction, assume that I is a model of O such that aI 62 CI . Again, we assume without loss of generality that the domain of I contains at least two distinct elements d1 6= d2. We construct an interpretation I0 in the same way as in case 1. above. Also, the argument why I0 is a model of O0 in which a0; b0 are interpreted by different elements is identical to the one above. tu

As an easy consequence of Lemma 1 and Lemma 2 we obtain the exact complexity of the identity problem in ALCO and ALCQ. In fact, Lemma 1 yields ExpTime upper bounds. To show that Lemma 2 indeed yields ExpTime lower bounds, we need to take into account the fact that we have defined the identity problem with only consistent ontologies as possible input. However, it is easy to see that ExpTime-hardness of the instance problem in ALC also holds if we consider the instance problem only for consistent ALC ontologies O. In addition, if O is a consistent ALC ontology, then so are the ontologies O0 constructed from it in the proof of Lemma 1.

Theorem 4. The identity problem is ExpTime-complete for ALCO and ALCQ ontologies.

For the three DLs with equality power considered in this paper, the identity problem has the same complexity as the instance problem. A natural question to ask is whether this is always the case. A simple example shows that the answer to this question is negative. In fact, let ALC= be the DL ALC, with the only difference that ALC= ABoxes may contain equality assertions a =: b between individual names. It is easy to see that the identity problem in this DL is nontrivial, but it can be solved in polynomial time. In fact, to check whether a 2 This follows from the fact that models of ALC ontologies are closed under disjoint union. Note that this assumption could not be made without loss of generality for ALCO ontologies. For example, O = (f> v fagg; ;) has only models of size 1. : consistent ALC= ontology implies an equality a = b, we only need to construct the reflexive, transitive, and symmetric closure of the explicitly stated equalities. However, since ALC is a sub-logic of ALC=, the instance problem in this DL is ExpTime-hard (and it is easy to show that it is also in ExpTime).

One may also wonder whether the complexity of the instance problem can be transferred to the identity problem also for DLs where the instance problem has a higher complexity than ExpTime. For example, the DL ALCOIQ, which extends both ALCO and ALCQ and additionally allows the use of inverse roles, has a coNExpTime-complete instance problem [ 15 ]. Since it contains ALCO, it has equality power and can force models to have cardinality 1. Lemma 1 implies that the identity problem in ALCOIQ is in coNExpTime. Regarding hardness, the reductions employed in the proof of Lemma 2 can in principle both be used since the constructors employed in them are available in ALCOIQ. However, Lemma 2 uses an ALC ontology O in the reduction, which yields only an ExpTime lower bound. Simply using an ALCOIQ ontology instead does not work since the proof depends on the fact that O has models refuting the instance relation of cardinality at least 2. However, by looking at the coNExpTime-hardness proof for the instance problem in ALCOIQ, it is easy to see that the following modified instance problem is also coNExpTime-hard in ALCOIQ: is a an instance of C in all models of O of cardinality 2. Thus, one can without loss of generality restrict the attention to models of cardinality 2 when reducing the instance problem for ALCOIQ to the identity problem for this logic.

Theorem 5. The identity problem is coNExpTime-complete for ALCOIQ ontologies. 5

The View-based Identity Problem

In this section, we will adapt the approach of [ 13,12 ] for view-based information hiding such that it can formalize the rôle-based access control scenario sketched in the introduction. We assume that ontologies are written using some DL L with equality power.

To define what kind of information is to be hidden, we divide the set of individual names into the disjoint sets NAI and NKI consisting of anonymous and known individuals, respectively. As before, we do not make the unique name assumption for these individuals. Given an anonymous individual x 2 NAI and an ontology O, we define the identity of x w.r.t. O as : idn(x; O) := fb 2 NKI j O j= x = bg: Note that b; b0 2 idn(x; O) implies that O j= b0 =: b. Thus, if the cardinality of idn(x; O) is greater 1, this does not mean that x is equal to one of these individuals, but rather that it is equal to all of them (and thus that all of them are equal). We say that x is a hidden if idn(x; O) = ;.

In the rôle-based access control scenario we assume that there is a “large” input ontology OI that is always consistent, but users can only see a part of it depending on which rôle they currently have. More formally, we assume that there is a finite set of user rôles R, and that playing the rôle r^ 2 R gives access to a subset Or^ OI of the input ontology. Here “access” does not mean that a user with rôle r^ can download the ontology Or^. Instead, the users can ask queries to Or^, where a subsumption query is of the form C v D for concept descriptions C; D and a retrieval query is of the form C for concept descriptions C or r for role names r.

Definition 3. Let OI be the input ontology, Or^ OI the ontology accessible by users with rôle r^ 2 R, and q be a query. The answer to q w.r.t. r^, denoted by ans(q; r^), is defined as follows: – ans(q; r^) := ftrueg, if q = C v D and Or^ j= C v D, – ans(q; r^) := ;, if q = C v D and Or^ 6j= C v D, – ans(q; r^) := fa 2 NI j Or^ j= C(a)g, if q = C, – ans(q; r^) := f(a; b) 2 NI NI j Or^ j= r(a; b)g, if q = r.

Since Or^ OI , positive answers to queries, i.e., ans(C v D; r^) = ftrueg, a 2 ans(C; r^), or (a; b) 2 ans(r; r^), imply that this subsumption, instance, or role relationship also holds in OI . In contrast, negative answers do not tell us anything about what holds in OI since the inclusion may be strict. Answers to queries w.r.t. rôle r^ can be stored in a view.

Definition 4. A view is a total function V : dom(V ) ! 2NI [2NI NI [fftruegg where the view definition dom(V ) is a finite set of queries and V (q) is a finite set for all q 2 dom(V ). This view is a view for r^ 2 R (written r^ j= V ) if V (q) = ans(q; r^) holds for all q 2 dom(V ).

In a setting where user rôles can dynamically change, a user may successively play rôles r^1; r^2; : : : ; r^k, in each rôle r^i generating (and storing) a view Vr^i for r^i by asking queries. The question is now whether these views can be used to find out the identity of a given anonymous individual x 2 NAI . Assume that the user wants to know whether there is a b 2 NKI such that b 2 idn(x; OI ). However, the user cannot access OI as a whole, all she knows is that the positive answers to the queries in the views Vr^i are justified by subsets of OI . Consequently, instead of one (unknown) ontology OI , the user needs to consider all possible ontologies, i.e., all ontologies that are compatible with the positive answers in the views. Definition 5. The ontology P is a possible ontology for the sequence of views Vr^1 ; : : : ; Vr^k if P is consistent and compatible with all positive answers in these views, where P is compatible with – Vr^i (C v D) = ftrueg if P j= C v D, – a 2 Vr^i (C) if P j= C(a), and (a; b) 2 Vr^i (r) if P j= r(a; b).

We denote the set of all possible ontologies for Vr^1 ; : : : ; Vr^k with Poss(Vr^1 ; : : : ; Vr^k ). The certain identity of x w.r.t. Vr^1 ; : : : ; Vr^k is defined as cert _idn(x; Vr^1 ; : : : ; Vr^k ) :=

\ P2Poss(Vr^1 ;:::;Vr^k ) idn(x; P): We say that x is hidden w.r.t. Vr^1 ; : : : ; Vr^k if cert _idn(x; Vr^1 ; : : : ; Vr^k ) = ;.

Since OI 2 Poss(Vr^1 ; : : : ; Vr^k ), we know that b 2 cert _idn(x; Vr^1 ; : : : ; Vr^k ) implies that b 2 idn(x; OI ). Thus, if cert _idn(x; Vr^1 ; : : : ; Vr^k ) 6= ;, the identity of x in OI is no longer hidden. Conversely, if cert _idn(x; Vr^1 ; : : : ; =Vr^kx) ==: ;b. , then for all b 2 NKI there is a P 2 Poss(Vr^1 ; : : : ; Vr^k ) such that P 6j Since, according to the information available to the user, OI could be this P, : she cannot conclude for any b 2 NKI that OI j= x = b. This shows that cert _idn(x; Vr^1 ; : : : ; Vr^k ) = ; indeed corresponds to the fact that the views Vr^1 ; : : : ; Vr^k do not disclose the identity of x.

Since the set Poss(Vr^1 ; : : : ; Vr^k ) consists of infinitely many ontologies, the definition of cert _idn(x; Vr^1 ; : : : ; Vr^k ) does not directly yield an approach for computing this set. We will now show that we can reduce this computation to the identity problem for the canonical ontology of Vr^1 ; : : : ; Vr^k . Basically, this ontology consists of the GCIs, concept assertions, and role assertions obtained from the positive answers in the views.

Definition 6. The canonical ontology C(Vr^1 ; : : : ; Vr^k ) of Vr^1 ; : : : ; Vr^k is defined as C(Vr^1 ; : : : ; Vr^k ) := (T ; A) where

T := fC v D j Vr^i (C v D) = ftrueg for some i; 1 A := fC(a) j a 2 Vr^i (C) for some i; 1 i kg [ fr(a; b) j (a; b) 2 Vr^i (r) for some i; 1 i kg: i kg Since C(Vr^1 ; : : : ; Vr^k ) consists of all positive answers in the views Vr^1 ; : : : ; Vr^k , it clearly implies them, and thus C(Vr^1 ; : : : ; Vr^k ) 2 Poss(Vr^1 ; : : : ; Vr^k ). Conversely, every ontology P 2 Poss(Vr^1 ; : : : ; Vr^k ) implies all these positive answers, and thus all the GCIs, concept assertions, and role assertions in C(Vr^1 ; : : : ; Vr^k ). This implies that every consequence of C(Vr^1 ; : : : ; Vr^k ) is also a consequence of P. Theorem 6. Given views Vr^1 ; : : : ; Vr^k and an anonymous individual x 2 NAI , we have cert_idn(x; Vr^1 ; : : : ; Vr^k ) = idn(x; C(Vr^1 ; : : : ; Vr^k )): : Proof. First assume that b 2 cert_idn(x; Vr^1 ; : : : ; Vr^k ). Then we have P j= x = b for all P 2 Poss(Vr^1 ; : : : ; V:r^k ). Since C(Vr^1 ; : : : ; Vr^k ) 2 Poss(Vr^1 ; : : : ; Vr^k ), this yields C(Vr^1 ; : : : ; Vr^k ) j= x = b, and thus b 2 idn(x; C(Vr^1 ; : : : ; Vr^k )).

Conversely, assume b 2 idn(x; C(Vr^1 ; : : : ; Vr^k )), and thus C(Vr^1 ; : : : ; Vr^k ) j= x =: b. We must show that, for all P 2 Poss(Vr^1 ; : : : ; Vr^k ), we have P j= x = : b. This is an immediate consequence of the fact that all the consequences of C(Vr^1 ; : : : ; Vr^k ) are also consequences of P. tu

This theorem shows that, to check whether x is hidden w.r.t. Vr^1 ; : : : ; Vr^k , it is sufficient to compute idn(x; C(Vr^1 ; : : : ; Vr^k )). In case the employed ontology language L allows for unrestricted GCIs, concept assertions, and role assertions, the set idn(x; C(Vr^1 ; : : : ; Vr^k )) can clearly be computed using an algorithm that solves the identity problem for L ontologies a polynomial number of times. Note that this applies to the DLs ALCO, ALCQ, and ALCOIQ considered in the previous sections, but not to CF Dnc since there GCIs and concept assertions need to satisfy certain restrictions.

Corollary 1. For L 2 fALCO; ALCQg we can check in exponential time whether an anonymous individual x is hidden w.r.t. views Vr^1 ; : : : ; Vr^k . For L = ALCOIQ, this problem can be solved in NExpTime.

The ExpTime upper bound for ALCO and ALCQ is obvious. For ALCOIQ, one considers all the (polynomially many) known individuals a1; : : : ; ap. Using a NExpTime procedure for the complement of the identity problem, one then checks whether x is not identical to a1. The non-successful paths of this nondeterministic computation stop with failure whereas the successful ones continue with the same test for a2, etc. It is easy to see that this yields the desired NExpTime procedure. In fact, any path of this procedure has only exponential length, and a successful path indicates that inequality with x holds for all known individuals. 6

Conclusions and Future Work

In this paper, we have provided some initial definitions and results regarding the identity problem in DL ontologies, i.e., the question whether the ontology implies that a given anonymous individual is equal to a known individual. We have also considered a more involved rôle-based access control scenario where users can access parts of the ontology depending on their rôle. In a setting where users can change rôles dynamically, the question is then whether, by changing rôles and asking queries in these rôles, the user can find out the identity of an anonymous individual although this may not be possible for a single rôle. We have shown how to use the identity problem to address this question.

Until now, we have only investigated how to find out whether the identity of an anonymous individual is disclosed in a certain situation. We have not considered what to do when this is the case. One possibility would be to additionally anonymize the available information, e.g., by replacing some of the known individuals in assertions by new anonymous ones, similar to what is done in [ 6 ].

Another direction for future research could be to look at k-anonymity [ 14 ] rather than identity. In principle, our identity problem is concerned with 1anonymity, i.e., we want to avoid that one can deduce from the given information that an anonymous individual belongs to a singleton set consisting of only one known individual. In many applications, one also wants to ensure that the set of known individuals to which the anonymous one is known to belong has a large enough cardinality, i.e., one > k. Of course, in this setting additional anonymization (as mentioned above) is also relevant in cases where k-anonymity is not given.

Finally, we intend to consider cases where the information about known and anonymous individuals holds only with a certain probability, e.g., using ontologies with subjective probability as introduced in [ 7 ]. In this setting, equality can also only be derived with a certain probability.

Baader ,

Calvanese ,

D. L.

McGuinness ,

Nardi , and

P. F.

Patel- Schneider, editors. The Description Logic Handbook: Theory , Implementation, and Applications . Cambridge University Press, New York, NY, USA, 2003 .

Biskup and

P. A.

Bonatti . Controlled query evaluation for enforcing confidentiality in complete information systems . Int. J. Inf. Sec. , 3 ( 1 ): 14 - 27 , 2004 .

Cook and

Nguyen . Logical Foundations of Proof Complexity . Cambridge University Press, 1st edition, 2010 .

B. C.

Grau . Privacy in ontology-based information systems: A pending matter . Semantic Web , 1 : 137 - 141 , 2010 .

B. C.

Grau and

Horrocks . Privacy-preserving query answering in logic-based information systems . In In Proc. of the 18th European Conference on Artificial Intelligence , pages 40 - 44 , 2008 .

B. C.

Grau and

E. V.

Kostylev . Logical foundations of privacy-preserving publishing of linked data . In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence , pages 943 - 949 . AAAI Press, 2016 .

Gutiérrez-Basulto ,

J. C.

Jung ,

Lutz , and

Schröder . Probabilistic description logics for subjective uncertainty . J. Artif. Intell. Res. (JAIR) , 58 : 1 - 66 , 2017 .

Hollunder and

Baader . Qualifying number restrictions in concept languages . In Proc. of the 2nd Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR'91) , pages 335 - 346 , 1991 .

R. S.

Sandhu ,

E. J.

Coyne ,

H. L.

Feinstein , and

C. E.

Youman . Role-based access control models . Computer , 29 ( 2 ): 38 - 47 , Feb. 1996 .

10.

Schaerf . Reasoning with individuals in concept languages . Data Knowl. Eng. , 13 ( 2 ): 141 - 176 , 1994 .

11.

Schild . A correspondence theory for terminological logics: Preliminary report . In Proc. of the 12th Int. Joint Conf. on Artificial Intelligence (IJCAI'91) , pages 466 - 471 , 1991 .

12.

Stouppa and

Studer . A formal model of data privacy. In I. Virbitskaite and A . Voronkov, editors, Proceedings of Perspectives of System Informatics , volume 4378 of Lecture Notes in Computer Science, pages 401 - 411 . Springer, 2007 .

13.

Stouppa and

Studer . Data privacy for ALC knowledge bases . In Proc. of the International Symposium on Logical Foundations of Computer Science , volume 5407 of Lecture Notes in Computer Science, pages 409 - 421 . Springer, 2009 .

14.

Sweeney . K-anonymity: A model for protecting privacy . Int. J. Uncertain. Fuzziness Knowl.-Based Syst. , 10 ( 5 ): 557 - 570 , 2002 .

15.

Tobies . The complexity of reasoning with cardinality restrictions and nominals in expressive description logics . J. of Artificial Intelligence Research , 12 : 199 - 217 , 2000 .

16.

Tobies . Complexity results and practical algorithms for logics in knowledge representation. CoRR, cs . LO/0106031 , 2001 . PhD thesis , RWTH Aachen.

17.

Toman and

G. E.

Weddell . Conjunctive query answering in CF Dnc : A PTIME description logic with functional constraints and disjointness . In Proc. of the 26th Australasian Joint Conference on Advances in Artificial Intelligence , volume 8272 of Lecture Notes in Computer Science, pages 350 - 361 . Springer, 2013 .