Armstrong Relations for Ontology Design and Evaluation Henriette Harmse1 , Katarina Britz2 , and Aurona Gerber1 1 CSIR Meraka Institute and Department of Informatics, University of Pretoria, South Africa 2 CSIR CAIR, Information Science Department, Stellenbosch University, South Africa Abstract. A challenge in ontology design is to ensure that the ontol- ogy accurately represents the application domain and application con- straints. Motivating scenarios provide the motivation for the representa- tion choices made during design, and competency questions are subse- quently used to evaluate the design. In this paper we show how the notion of Armstrong relations from relational database theory can be used to generate motivating scenarios and competency questions for uniqueness constraints and not null constraints. 1 Introduction In this paper3 our focus is on supporting an ontology engineer in designing an ontology that captures the relevant constraints of the application domain based on the input of a domain expert. The problem is that the expert might have omitted to mention constraints that are of importance. The question now is, how can the ontology engineer help the expert to determine all relevant constraints? In an attempt to answer this, we translate Armstrong relations from re- lational database theory to ontologies, which is the main contribution of this paper. In relational database theory Armstrong relations are used to generate “perfect test data” for a set of constraints. “Perfect test data” satisfies all con- straints deemed relevant and violates all constraints considered irrelevant. For arbitrary constraints Armstrong relations are undecidable [12], and we therefore limit ourselves to uniqueness constraints and not null constraints. Adopting the ontology design approach of Grüninger, et al.[9], we consider each row in an an Armstrong relation as a potential motivating scenario since it satisfies each meaningful constraint. The Armstrong relation will also violate every irrelevant constraint and thus the Armstrong relation is a motivating sce- nario for irrelevant constraints that could be used by a ontology engineer to verify with an expert that the constraints that are violated, are indeed irrele- vant. Once expert and ontology engineer agree on which constraints should be 3 This work was partially funded by the National Research Foundation of South Africa under Grant No. 85482. met and which not, these constraints and motivating scenarios can be used to evaluate the design of the ontology through formal competency questions (CQs). The paper is structured as follows: Section 2 presents preliminaries and Sec- tion 3 develops the algorithm for generating an Armstrong ABox, which we relate to ontology design and evaluation in Section 4. We conclude with Section 5. 2 Preliminaries 2.1 Ontology Design and Evaluation The ontology design and evaluation approach of Grüninger, et al. [9], aims to engineer an ontology based on first-order logic that matches a set of business requirements expressed as a number of motivating scenarios. Each motivating scenario gives rise to informal competency questions (CQ) which the ontology should answer. Based on the informal CQs a terminology can be constructed and the informal CQs can be translated into formal CQs. The last step in the approach is to define the conditions under which the competency questions are complete and to prove completeness. Competency questions are defined by Grüninger, et al. [9] as follows: Definition 1. Let Tontology be a terminology with related axioms, Tground a set of ground literals and Q a first-order sentence expressed in terms of Tontology . A formal competency question then has one of two forms: Entailment CQ Determine whether Tontology ∪ Tground  Q Consistency CQ Determine whether Tontology ∪ Tground 2 ¬Q 2.2 Armstrong Relations Armstrong relations are the underlying notion to Armstrong ABoxes which we present in Section 3. In this section we explain the benefit of using Armstrong relations in relational database theory with some examples. A challenge in database design is to determine the constraints that are rel- evant and the constraints that are irrelevant for a set of business requirements. This task is challenging due to the mismatch in expertise: typically domain ex- perts have in-depth domain knowledge but limited modelling knowledge whilst ontology engineers have modelling experience without domain knowledge. Test data are therefore often used to validate, communicate and consolidate the con- ceptual model [12]. In relational database theory perfect test data can be generated using Arm- strong relations. The generated test data are said to be perfect since an Arm- strong relation for a table schema is a set of data that satisfy all constraints that are perceived to be relevant and violates all constraints that are perceived to be irrelevant [12]4 . As an example, consider the table schema Schedule used by 4 The formal definition of an Armstrong relation can be found in the Link, et. al. paper, which is similar to that of an Armstrong ABox defined in Definition 8. 2 Link, et al. [12] (Figure 1), which defines a schedule of courses taught by lectur- ers at given times in given rooms. Note that the primary key over C ID and Time implies two constraints, namely that (1) the combination of C ID and Time is unique and that (2) neither C ID nor Time is allowed to be null. An Armstrong relation for these 2 constraints is given in the table in Figure 1. Note that ni is used to represent a null marker or “no information”. Fig. 1. Table schema Schedule with related Armstrong relation The table is an Armstrong relation for the table schema in Figure 1 because for all rows the combination of C ID and Time is unique and neither of these columns contain any null values. Hence, it agrees with the uniqueness and not null constraints that are relevant and it also violates all uniqueness and not null constraints that are irrelevant. As an example it violates the uniqueness constraint for C ID in rows 1 and 2. An Armstrong relation is constructed with only a certain set of constraints in mind. The Armstrong relation of Figure 1 only considers uniqueness and not null constraints. Moreover, Armstrong relations are undecidable for arbitrary constraints [2], but under certain conditions Armstrong relations do exist for cardinality constraints, uniqueness constraints, functional dependencies, multi- valued dependencies and not null constraints [8, 12]. In this paper we focus on uniqueness- and not null constraints. Based on the example data in Figure 1, a domain expert subsequently realizes that having a lecturer teach two courses at the same time in the same room does not make sense. They decide to revise the table schema to that of Figure 2, which presents the related Armstrong relation as well. Fig. 2. Redesigned table schema Schedule with related Armstrong relation 2.3 Important Description Logic Notions In this paper we will use OWL 2 [11, 13], which we assume readers are familiar with and therefore we only highlight the DL notions that are of relevance to this paper, namely n-ary relations and uniqueness constraints. n-ary Relations In general DLs only consider unary (concepts) and binary re- lations (roles) and hence n-ary relations cannot be expressed directly [1]. Variants of the DL DLR are the exception to the rule and explicitly include constructors 3 in their syntax for expressing n-ary relations [4]. However, n-ary relations can still be expressed in DLs that do not cater for n-ary relations through the notion of reification [1]. Uniqueness Constraints The requirement for expressing uniqueness is a need that has been expressed often by users of ontologies. This lead to various imple- mentations of identification constraints in DLs with different expressivity and different behaviour [3, 5, 6, 13]. However, the underlying assumption on which these implementations agree is that if two individuals x and y participate in some inverse functional role r, then it follows that x = y. We will use Easy Keys as defined in Parsia, et. al. [13]. 3 Generating an Armstrong ABox In this section we adapt the definitions and algorithm of Link, et al. [12], which are applicable to relational database theory to ontologies. The translation has to consider differences between DLs and relational database theory, in particular the open world/closed world assumption. However, in constructing an Armstrong ABox we will restrict ourselves to known individuals. The translation will be presented by first providing an informal discussion and/or example based on the table schemas and instances in Section 2.2. This approach is adopted so that the parallels between relational database theory and ontologies are clear, which will help to clarify why the Armstrong relation notion can be applied to ontologies. These informal explanations are followed by the formal DL related definitions and examples. 3.1 Ontology Structure for representing an Armstrong Relation Intuitively a table schema maps to a TBox and a table instance to an ABox. Hence, an Armstrong relation for a table schema Schema will translate to an ontology OSchema where OSchema = TSchema ∪ ASchema . The table schema of Figure 2 describes three different aspects of the table schema Schedule namely: 1. The structure of the table schema Schedule, which consists of columns C ID, Time, L Name and Room, where each column is associated with a countably in- finite domain, which in this case is respectively CHAR[5], VARCHAR, CHAR[15] and VARCHAR. 2. The primary key over C ID and Time, which states that for these columns null values are not allowed, which we will refer to as null-free here. 3. The uniqueness constraint which is specified over C ID and Time and the uniqueness constraint which is specified over Time, L Name and Room. The Armstrong algorithm of Link, et al. [12] depends on the ability to discern these different aspects of a table schema. We therefore partition TSchema such S nf Σ S that TSchema = TSchema ∪TSchema ∪TSchema , where TSchema defines the structural nf Σ aspects, TSchema the null-free aspects and TSchema the uniqueness constraints that apply to the Schema representation in DLs. 4 3.2 Characterization of n-ary Relations The definitions presented in this section establishes the terminology that will be used to give a compact characterization of the DL representation of a ta- ble schema. The structure of a table schema consisting of n columns can be represented via reification, which we formalize in Definition 2 [1]. Definition 2. An n-ary relation Rn over components R1 , . . . , Rn can be ex- pressed as the reified concept CRn using n roles r1 , . . . , rn such that CRn v ≤ 1r1 .> u . . . u ≤ 1rn .> > v ∀r1 .R1 u . . . u > v ∀rn .Rn CRn hasKey (r1 , . . . , rn ) where the key constraint ensures that no 2 individuals of CRn agrees on their participation in roles r1 , . . . , rn . For convenience we denote the DL axioms that represent the n-ary relation Rn as TRSn . S Example 1. The corresponding TBox TSched axioms for the table schema of Fig- ure 2 are: S TSched = {CSched v≤ 1rC ID .>u ≤ 1rT ime .>u ≤ 1rL N ame .>u ≤ 1rRoom .>, > v ∀rC ID .Rchar[5] , > v ∀rT ime .Rvarchar , > v ∀rL N ame .Rchar[15] , > v ∀rRoom .Rvarchar , CSched hasKey (rC ID , rT ime , rL N ame , rRoom )} D D D where Rchar[5] , Rvarchar , Rchar[15] ⊆ 4D and 4D is the domain of all data types. For a given table schema various table instances can exist. The table instance of Figure 2 is one possible table instance of the related table schema Schedule. When for a given row certain column values are not null, the row is said to be total for those columns. In the table instance of Figure 1, row 1 is total for columns C ID, Time, L Name and Room, while row 4 is total only for columns C ID, Time and Room. When all rows for a table instance are total for a set of columns, the table instance is said to be total for those columns. Hence, the table instance is total for columns C ID and Time. A table instance is represented in DLs as an ABox, denoted by ARn , corre- sponding to TBox TRn . For a given TRn any number of A0Rn , A1Rn , . . . can exist, where each AiRn represents a different table instance for a given table schema. The next definition defines ARn and related notions. Definition 3. An ABox ARn corresponding to TRn consists of a set of asser- tions CRn (cj ), j ≥ 1. A known individual c of CRn is said to be X-total, X ⊆ {r1 , . . . , rn }, if there are known individuals hi , such that Ri (hi ) and ri (c, hi ), for any ri ∈ X. ARn is said to be X-total if for all known individuals cj , j ≥ 1 such that CRn (cj ) ∈ ARn we have that cj is X-total. Example 2. The corresponding ASched is given for TSched in Table 1 where c1 , c2 , c3 and c4 (for which the assertions are not explicitly shown) are individuals of CSched . Table 1 is the ABox representation of the table instance of Figure 2. Note 5 that no assertions are added for null markers since this represents the semantics of “no information” faithfully. Individual c1 is {rC ID , rT ime , rL N ame , rRoom }- total and ASched is {rC ID , rT ime }-total. The table schema of Figure 2 states that columns C ID and Time together represent the primary key of Schedule. One implication of this is that columns C ID and Time represent a null-free subschema of Schedule and hence the table instance of Figure 2 is {C ID,Time}-total. Table 1. Corresponding ABox, ASched , for the table instance of Figure 2 C ID Time L Name Room rC ID (c1 , “11301”) rT ime (c1 , “M on, 10am”) rL N ame (c1 , “Church”) rRoom (c1 , “Red”) rC ID (c2 , “11301”) rT ime (c2 , “T ue, 02pm”) rL N ame (c2 , “Church”) rRoom (c2 , “Red”) rC ID (c3 , “78200”) rT ime (c3 , “M on, 10am”) rL N ame (c3 , “Church”) rC ID (c4 , “99120”) rT ime (c4 , “M on, 10am”) rRoom (c4 , “Red”) Definition 4. Let X ⊆ {r1 , . . . , rn } be a subset of the components of Rn that are null-free in the application domain, we define TRnf n such that it consists of the axioms CRn v ∃ri .>, for any ri ∈ X. If TRnf n is non-empty we say TRn is null-free for X, which we abbreviate by nf (TRn , X). When ARn is also X-total and therefore ARn satisfies nf (TRn , X), we say that ARn is null-free for X, which we denote by ARn nf (TRn , X). nf Example 3. When {CSched v ∃rC ID .>, CSched v ∃rC T ime .>} ⊆ TSched , we say that the Schedule relation is null-free for rC ID and rT ime , which we denote by nf (TSched , {rC ID , rT ime }). From Table 1 we determine that ASched is null-free for rC ID and rT ime since ASched is {rC ID , rT ime }-total. Hence, we have that ASched nf (TSched , {rC ID , rT ime }). A cardinality constraint (CC) over columns of a table schema constrains how many times a combination of values can be duplicated for those columns in a table instance. This is expressed as card(columns) ≤ b. As an example, the table instance of Figure 2 satisfies card(L N ame, Room) ≤ 2. Definition 5. A cardinality constraint (CC) over TRSn is a statement card(TRn , X) ≤ b where X ⊆ {r1 , . . . , rn } and b is a positive integer. The CC card(TRn , X) ≤ b is satisfied by ARn , which we denote by ARn card(TRn , X) ≤ b, if and only if for each rk ∈ X there is a named individual hk such that ]{cj |cj is X-total, CRn (cj ) ∈ ARn and for each ri ∈ X, ∃hi such that Ri (hi ) ∈ ARn , ri (cj , hi ) ∈ ARn } ≤ b. In the special case where b = 1 for a CC, the CC is called a uniqueness con- straint (UC) and is represented by u(TRn , X), which if satisfied by ARn is de- noted by ARn u(TRn , X) 5 . Example 4. For Table 1 ASched card(TSched , {rL N ame , rRoom }) ≤ 2. 5 We explicitly ignore non-standard CCs and UCs, that is where X = ∅. 6 3.3 C-Armstrong ABoxes Recall from Section 2.2 that an Armstrong relation satisfies all constraints that are relevant and violates all constraints that are irrelevant. I.e., the table schema of Figure 2 states that there is a UC over columns C ID and Time. Hence, there is no row in the Armstrong relation of Figure 2 that violates this constraint. Definition 6. Let Σ be a set of constraints (i.e. UCs) over TRSn . We say that ARn satisfies Σ, denoted by ARn Σ, if ARn satisfies every element of Σ. If for some σ ∈ Σ, ARn does not satisfy σ we say ARn violates σ and write ARn 1 σ. By extension we have that ARn also violates Σ, which we denote by ARn 1 Σ. For convenience we represent the axioms that represent Σ as TRΣn . Example 5. For ASched of Table 1 we have that ASched Σ where Σ = {u(rC ID , rT ime ), u(rT ime , rL N ame , rRoom )}. However, for Σ = {u(rC ID ), u(rL N ame , rRoom )} we have that ASched 1 Σ. Figure 2 states that there is a UC over columns C ID and Time, which implies the following UCs: (1) the UC over C ID, Time and L Name, (2) the UC over C ID, Time and Room, and (3) the UC over C ID, Time, L Name and Room. Definition 7. Assume TRSn is given with nf (TRn , X), X ⊆ {r1 , . . . , rn }, with Σ ∪ {ϕ} a finite set of constraints over TRn . We say that Σ implies ϕ in the presence of nf (TRn , X), denoted by Σ, nf (TRn , X)  ϕ, if every X-total ARn that satisfies Σ also satisfies ϕ. If Σ does not imply ϕ in the presence of nf (TRn , X) it is denoted by Σ, nf (TRn , X) 2 ϕ. We denote the semantic ∗ closure of Σ and nf (TRn , X) by Σnf (TRn ,X) , which we define as ∗ Σnf (TRn ,X) = {ϕ|Σ, nf (TRn , X)  ϕ} S Example 6. For TSched , nf (TSched , {rC ID , rT ime }) and Σ = {u(rC ID , rT ime ), u(rT ime , rL N ame , rRoom )} we have that ∗ Σnf (TSched ,{rC ID ,rT ime }) = {u(rC ID , rT ime ), u(rT ime , rL N ame , rRoom ), u(rC ID , rT ime , rL N ame ), u(rC ID , rT ime , rRoom ), u(rC ID , rT ime , rL N ame , rRoom )} Given the previous definitions it is now possible to formalize the notion of an Armstrong ABox for an n-ary relation Rn . Definition 8. Let C denote a class of constraints and let Σ be a finite set of elements from C. Assume nf (TRn , X), X ⊆ S holds where S = {r1 , . . . , rn }. We say ARn is C-Armstrong for Σ and nf (TRn , X) if and only if 1. ARn satisfies Σ, which we write as ARn Σ, ∗ 2. ARn violates every ϕ ∈ / Σnf (TRn ,X) , which we write as ARn 1 ϕ for each ∗ ϕ∈ / Σnf (TRn ,X) , 3. ARn is X-total, which is written as ARn nf (TRn , X), and 4. for any H ⊆ S − X, ARn is not H-total, which we write as ARn 1 nf (TRn , H). 7 S Example 7. For TSched , nf (TSchedtheoremstar , {rC ID , rT ime }) and Σ = {u(rC ID , rT ime ), u(rT ime , rL N ame , rRoom )}, we can confirm that ASched of Table 1 is C-Armstrong, where C represents the class of UCs, by verifying that: 1. ASched satisfies Σ, ∗ 2. ASched violates every ϕ ∈ / Σnf (TSched ,{rC ID ,rT ime }) , 3. ASched is {rC ID , rT ime }-total and 4. ASched is not {rL N ame }-total or {rRoom }-total. 3.4 Anti-keys and Strong Agreement Sets Calculating an Armstrong relation for the class of UCs depends on the ability to determine those columns for which no UC is implied in order for them to be violated. Rather than violating all UCs not implied, it is sufficient to only violate those that are maximal with the property that they are not implied by Σ in the presence of the null-free subschema. Evidence for this intuition can be seen in Figure 2. The combination of C ID, L Name and Room as a potential UC is violated by rows 1 and 2, which implies that no subset of these columns can be a UC. In relational database theory the combination of Time, L Name and Room is referred to as an anti-key [12]. An anti-key satisfies the following 3 conditions: (1) anti-keys are columns of a table schema, (2) anti-keys are non-empty sets of columns of a table schema for which no UC hold, and (3) anti-keys are maximal w.r.t. condition (2). Definition 9. Let Σ be a set of UCs, S = {r1 , . . . , rn }, X ⊆ S and assume nf (TRn , X) holds for TRSn . For Y ⊆ S, we denote that Y is an anti-key by a(Y ). The set Σ −1 of all anti-keys is defined as Σ −1 = {a(Y )|Y ⊆ S and Σ, nf (TRn , X) 2 u(TRn , Y ) and for any H ⊆ (S − Y )[Σ, nf (TRn , X)  u(TRn , Y ∪ H)]} Example 8. The anti-keys for TSched with nf (TRn , {rC ID , rT ime }) and UCs Σ = {u(rC ID , rT ime ), u(rT ime , rL N ame , rRoom )} is given by Σ −1 = {a(rC ID , rL N ame , rRoom ), a(rT ime , rL N ame ), a(rT ime , rRoom )}. Anti-keys can be calculated using hypergraphs where the columns of the table schema and the sets of columns representing UCs are respectively the vertexes and the edges [12]. Since only minimal UCs are used, the hypergraph is a simple hypergraph. We illustrate the calculation of anti-keys through an example. The minimal transversal [7] of H is given by T r(H) = {{Time}, {C ID, Room}, {C ID, L Name}}. Anti-keys are maximal such that they are not UCs. Since each element of the transversal represent columns that form part of minimal UCs, the set difference between the set of columns representing the table schema and an element of the transversal delivers an anti-key. I.e., the set difference between the columns of Schedule and {Time}, the first element of the transversal, delivers the anti-key over C ID, L Name and Room. 8 Algorithm 1 Algorithm for calculating an Armstrong ABox Input: TBox TRSn , S = {r1 , . . . , rn }, Σ a set of UCs and nf (TRn , X), X ⊆ S Output: Armstrong ABox ARn for Σ and nf (TRn , X) where hri ,jri , i = {1, . . . , n}, jri ≥ 1, are individuals such that Ri (hri ,jri ) A1: ARn = ARn ∪ {CRn (c1 ), ri (c1 , hri ,1 )}, i = {1, . . . , n} A2: Compute Σ −1 using hypergraph methods A3: k = 2, jri = 2, i = {1, . . . , n}; A4: for all Y such that a(Y ) ∈ Σ −1 do ARn = ARn ∪ {CRn (ck ), ri (ck , hri ,jri )}, i = {1, . . . , n} where  hri ,1 , if ri ∈ Y  hri ,jri = hri ,jri , jri = jri + 1, if ri ∈ X − Y ; k = k + 1;  do nothing, else  A5: Let Z ⊆ S be such that ARn is Z-total if Z − X 6= ∅ then ARn = ARn ∪ {CRn (ck ), ri (ck , hri ,jri )}, i = {1, . . . , n} where ( do nothing, if ri ∈ X − Z hri ,jri = ; return ARn hri ,jri , else else return ARn Lemma 1. Define the hypergraph H for TRSn , S = {r1 , . . . , rn }, with Σ a set of minimal UCs, with nf (TRn , X), X ⊆ S, as H = (V, E) where V = S and E = {Z|u(Z) ∈ Σ}. Then the set of anti-keys is defined as Σ −1 = {a(S − W )|W ∈ T r(H)} where T r(H) denotes the minimal transversal of the hypergraph H. S Example 9. For TSched and Σ = {u(rC ID , rT ime ), u(rT ime , rL N ame , rRoom )} we create the hypergraph H = (V, E) where V = {rC ID , rT ime , rL N ame , rRoom } and E = {{rC ID , rT ime }, {rT ime , rL N ame , rRoom }}. We then have that T r(H) = {{rT ime }, {rC ID , rRoom }, {rC ID , rL N ame }} from which follows that Σ −1 = {a(rC ID , rL N ame , rRoom ), a(rT ime , rL N ame ), a(rT ime , rRoom )}. A notion related to that of anti-keys is strong agreement sets. Two rows of a table instance is said to strongly agree on a set of columns if for those columns the two rows agree on their total values. For the table instance in Figure 2 row 1 and 2 strongly agree on columns C ID, L Name and Room. Definition 10. Given ARn we say two individuals c1 and c2 of concept CRn strongly agree on the roles X, X ⊆ S, where S = {r1 , . . . , rn }, if they are X-total and the fillers of each ri ∈ X are filled by the same individuals. The strong agreement set of c1 and c2 , denoted by ag s (c1 , c2 ), is given by ag s (c1 , c2 ) = {ri |ri -total , CRn (c1 ) ∈ ARn , CRn (c2 ) ∈ ARn , and ∃yi , zi such that Ri (yi ) ∈ ARn , Ri (zi ) ∈ ARn , ri (c1 , yi ) ∈ ARn , ri (c2 , zi ) ∈ ARn and yi = zi for any ri ∈ X}. 9 The strong agreement set of ARn is given by ag s (ARn ) which we define as ag s (ARn ) = {ag s (c1 , c2 )|CRn (c1 ) ∧ CRn (c2 ) ∧ c1 6= c2 } Example 10. The strong agreement set of ARn of Table 1 is given by ag s (ASched ) = {{rC ID , rL N ame , rRoom }, {rT ime , rL N ame }, {rT ime , rRoom }, {rL N ame }, {rRoom }, {rT ime }} 3.5 Algorithm for generating an Armstrong ABox Anti-keys and strong agreement sets are useful to characterize an Armstrong relation for the class of UCs and a null-free subschema. A table instance is Armstrong for the class of UCs and a null-free subschema if and only if the following conditions hold: 1. Each anti-key is an element of a strong agreement set. 2. There is no uniqueness constraint such that the set of columns representing a uniqueness constraint is a subset of a strong agreement set. 3. The set of columns over which the table instance is total is the same as the null-free subschema. Theorem 1. For TRSn let Σ be a set of UCs and assume nf (TRn , X), X ⊆ S where S = {r1 , . . . , rn } holds. An ABox ARn for TRSn , Σ and nf (TRn , X) is an Armstrong ABox if and only if the following conditions hold: 1. ∀a(Y ) ∈ Σ −1 [Y ∈ ag s (ARn )], 2. (∀u(TRn , Y ) ∈ Σ)(∀Z ∈ ag s (ARn ))[(Y * Z)], 3. Let Z ⊆ S be such that ARn is Z-total, then Z = X [10, 12]. Example 11. Examples 9 and 10 calculated Σ −1 and ag s (ASched ) for TSched with nf (TRn , {rC ID , rT ime }) and Σ = {u(rC ID , rT ime ), u(rT ime , rL N ame , rRoom )}. It therefore follows from Theorem 1 that ARn (Table 1) is an Armstrong ABox. Table 2. ASched generated using Algorithm 1 C ID Time L Name Room rC ID (c1 , hrC ID ,1 ) rT ime (c1 , hrT ime ,1 ) rL N ame (c1 , hrL N ame ,1 ) rRoom (c1 , hrRoom ,1 ) rC ID (c2 , hrC ID ,1 ) rT ime (c2 , hrT ime ,2 ) rL N ame (c2 , hrL N ame ,1 ) rRoom (c2 , hrRoom ,1 ) rC ID (c3 , hrC ID ,2 ) rT ime (c3 , hrT ime ,1 ) rL N ame (c3 , hrL N ame ,1 ) rC ID (c4 , hrC ID ,3 ) rT ime (c4 , hrT ime ,1 ) rRoom (c4 , hrRoom ,1 ) Algorithm 1 calculates an Armstrong ABox ARn for TRSn , S = {r1 , . . . , rn }, with nf (TRn , X) for X ⊆ S and Σ is a set of UCs, using the following steps: (A1) generates an arbitrary first tuple. (A2) calculates the anti-keys. (A3) initializes indexes k and jri , where k keeps track of the number of individuals representing tuples and indexes jri keep track of the number of different individuals used as role fillers for each of the roles ri . (A4) generates for each anti-key an individual ck with related assertions where hri ,1 is re-used when ri forms part of an anti-key. (A5) generates the equivalent of ni for any ri that is not null-free. 10 S Example 12. For TSched , Σ = {u(rC ID , rT ime ), u(rT ime , rL N ame , rRoom )} and nf (T , {rC ID , rT ime }) we generate ASched (Table 2) using Algorithm 1, assuming CRn (c1 ), . . . , CRn (c4 ) are implied. A mapping of each hri ,jri delivers Table 1. Theorem 2 follows from Lemma 1 and Theorem 1 [12]. Theorem 2. On input (TRSn , Σ, nf (TRn , X)) Algorithm 1 generates an Arm- strong ABox ARn over TRSn that is Armstrong for Σ and nf (TRn , X). The preceding definitions motivate the following definition: Definition 11. Given TRSn with Σ a set of UCs and nf (TRn , X), X ⊆ S where S = {r1 , . . . , rn }, we define the corresponding TBox TRn as TRn = TRSn ∪ TRΣn ∪ TRnf Σ nf n where TRn consists of the axioms representing Σ and TRn consists of the axioms representing nf (TRn , X). The corresponding Armstrong ontology ORn is given by ORn = TRn ∪ ARn where ARn is an Armstrong ABox for TRn . 4 Ontology Design and Evaluation using Armstrong ABoxes In this section we discuss how an Armstrong ABox can be used to generate motivating scenarios and CQs. Each row in Table 1 represents a scenario that supports the current design decisions for the ontology since it satisfies all relevant constraints. Each row in Table 1 is thus a motivating scenario. In order to ensure no latent UCs are missed, the complete table also needs to be considered. To communicate the current understanding of the relevant and irrelevant constraints, an ontology engineer can use an informal notation such as the table instance in Figure 2. In this way an Armstrong ABox can be used to systematically generate scenarios which can help an expert to validate the design. The design of the ontology can be evaluated through generated CQs. TRn rep- resent the axioms that have to be satisfied in every model. Hence, it follows that for every α ∈ TRn we have that TRn  α, which is referred to as an entailment CQ in Section 2.1. Each row of Table 1 must be consistent for ORn , which we can ex- press as ORn 2 ¬Q where for all i = {1, . . . n} Q = CRn (cj ) ∧ R(hij ) ∧ ri (cj , hij ), which is referred to as a consistency CQ in Section 2.1. Note that this last CQ represents a means with which the motivating scenarios can be formally evalu- ated since, as stated earlier, ARn consists of motivating scenarios. 5 Conclusion We presented a preliminary theoretical framework for applying Armstrong re- lations to ontology design within the context of the ontology design approach of Grüninger, et al. [9], where motivating scenarios inform design decisions and CQs are used to evaluate the ontology design. In future research we will investi- gate the applicability of Armstrong ABoxes to other DL constraints and aim to show how this framework can be used to revise existing populated ontologies. 11 References 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The Description Logic Handbook. Cambridge University Press (2007), 2nd edition 2. Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the Structure of Armstrong Re- lations for Functional Dependencies. J. ACM 31(1), 30–46 (1984) 3. Borgida, A., Weddell, G.E.: Adding Uniqueness Constraints to Description Logics (Preliminary Report). In: Bry, F., Ramakrishnan, R., Ramamohanarao, K. (eds.) DOOD. Lecture Notes in Computer Science, vol. 1341, pp. 85–102. Springer (1997) 4. Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., Rosati, R.: Description Logic Framework for Information Integration. In: Principles of Knowledge Repre- sentation and Reasoning. pp. 2–13. Citeseer (1998) 5. Calvanese, D., De Giacomo, G., Lenzerini, M.: Identification Constraints and Func- tional Dependencies in Description Logics. In: Proceedings International Joint Con- ference on Artificial Intelligence. vol. 17, pp. 155–160. Citeseer (2001) 6. Calvanese, D., Fischl, W., Pichler, R., Sallinger, E., Simkus, M.: Capturing rela- tional schemas and functional dependencies in RDFS. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence. (2014) 7. Eiter, T., Gottlob, G.: Identifying the Minimal Transversals of a Hypergraph and Related Problems. In: SIAM Journal on Computing. pp. 1278–1304. Society for Industrial and Applied Mathematics (1995) 8. Fagin, R.: Armstrong Databases. Tech. rep. (May 1982) 9. Grüninger, M., Fox, M.: Methodology for the Design and Evaluation of Ontologies. In: Proceedings of International Joint Conference on Artificial Intelligence, Work- shop on Basic Ontological Issues in Knowledge Sharing, April 13, 1995 (1995) 10. Hartmann, S., Leck, U., Link, S.: On Codd Families of Keys over Incomplete Re- lations. Comput. J. 54(7), 1166–1180 (2011) 11. I. Horrocks, O. Kutz, and U. Sattler. The even more irresistible SROIQ. In P. Doherty, J. Mylopoulos, and C. A. Welty, editors, Proceedings of the 10th In- ternational Conference on Principles of Knowledge Representation and Reasoning, pages 57–67. AAAI Press, 2006. 12. Link, S.: Armstrong Databases: Validation, Communication and Consolidation of Conceptual Models with Perfect Test Data. Proceedings of the Eighth Asia-Pacific Conference on Conceptual Modelling (2012) 13. Parsia, B., Sattler, U., Schneider, T.: Easy Keys for OWL. In: Dolbear, C., Rutten- berg, A., Sattler, U. (eds.) Proceedings of the 5th Workshop on OWL: Experiences and Directions. CEUR Workshop Proceedings, vol. 432. CEUR Workshop Pro- ceedings (2008) 12