Introducing Customised Datatypes and Datatype Predicates into OWL(∗)

Jeff Z. Pan and Ian Horrocks
School of Computer Science, University of Manchester, UK

Abstract. Although OWL is rather expressive, it has a serious limitation with respect to datatypes: it does not support customised datatypes. It has been pointed out that many potential users will not adopt OWL unless this limitation is overcome, and the W3C Semantic Web Best Practices and Deployment Working Group has set up a task force to address this issue. This paper provides a solution by presenting two decidable datatype extensions of OWL DL, namely OWL-Eu and OWL-E. OWL-Eu provides a minimal extension of OWL DL to support customised datatypes, while OWL-E extends OWL DL with both customised datatypes and customised datatype predicates.

1 Introduction

The OWL Web Ontology Language [1] is a W3C recommendation for expressing ontologies in the Semantic Web. Datatype support [7, 8] is one of the key features that OWL is expected to provide, and it has prompted extensive discussions on the RDF-Logic mailing list [10] and on the Semantic Web Best Practices mailing list [12]. Although OWL adds considerable expressive power to the Semantic Web, the OWL datatype formalism (or simply OWL datatyping) is much too weak for many applications; in particular, OWL datatyping does not provide a general framework for customised datatypes, such as XML Schema derived datatypes. It has been pointed out that many potential users will not adopt OWL unless this limitation is overcome [11], as it is often necessary to enable users to define their own datatypes and datatype predicates for their ontologies and applications.

One of the best-known type systems is W3C XML Schema Part 2 [2], which provides facilities that allow users to define customised datatypes, for example by imposing restrictions on the value spaces of existing datatypes.

Example 1.
Customised datatypes are useful for capturing the intended meaning of some vocabulary in ontologies. For example, users might want to use the customised datatype ‘atLeast18’ in the following definition of the class ‘Adult’:

Class(Adult complete Person restriction(age allValuesFrom(atLeast18))),

which says that an Adult is a Person whose age is at least 18. The datatype constraint ‘at least 18’ can be defined as an XML Schema user-defined datatype in which the facet ‘minInclusive’ is used to restrict the value space of atLeast18 (a customised datatype) to a subset of the value space of integer (an XML Schema built-in datatype).

(∗) This work is partially supported by the FP6 Network of Excellence EU project Knowledge Web (IST-2004-507842).

User-defined datatypes such as this one cannot, however, be used in OWL datatyping, which only supports some built-in XML Schema datatypes and enumerated datatypes, i.e., datatypes defined by explicitly listing their instances. OWL datatyping does not support XML Schema customised datatypes for two reasons: (i) XML Schema does not provide a standard way to access a user-defined datatype; (ii) OWL DL does not provide a mechanism to guarantee the computability of the kinds of customised datatypes it supports.

This paper provides a solution to this problem by presenting two decidable datatype extensions of OWL DL, namely OWL-Eu and OWL-E. OWL-Eu provides a minimal extension of OWL DL to support customised datatypes, while OWL-E extends OWL DL with both customised datatypes and customised datatype predicates.

The rest of the paper is organised as follows. Section 2 further discusses the motivations for introducing customised datatypes and datatype predicates. Section 3 extends OWL datatyping to unary datatype groups, which enable the use of customised datatypes.
Sections 4 and 5 present the OWL-Eu and OWL-E languages, respectively; the latter is based on datatype groups, which generalise unary datatype groups. Section 6 concludes the paper and suggests some future work.

2 Motivations

Allowing users to define their own vocabulary is one of the most useful features that ontologies offer over other approaches to providing semantics in the Semantic Web, such as the Dublin Core. In the Dublin Core standard, the meaning of the set of 15 information properties is described in English text. The main drawback of the Dublin Core is its inflexibility: it is impossible to ‘predefine’ information properties for all sorts of applications.

Ontologies, however, are more flexible in that users can define their own vocabulary based on existing vocabularies. Ontology languages usually provide a set of class constructors so that users can build class expressions based on, for example, existing class names. The intended meaning of the vocabulary can therefore be captured by the axioms in the ontologies. Let us revisit Example 1 and consider the intended meaning of the Adult class. According to its definition, an Adult is a Person who is at least 18 years old. In this way, with the help of ontologies, programs can also understand the meaning of customised vocabulary.

Although OWL DL provides a set of expressive class constructors to build customised classes, it does not provide enough expressive power to support, for example, XML Schema customised datatypes. Example 1 has already shown that customised datatypes are necessary to capture the intended meaning of Adult. In what follows, we give some more examples to illustrate the usefulness of customised datatypes and datatype predicates in various Semantic Web and ontology applications.

Example 2.
Semantic Web Service: Matchmaking. Matchmaking is a process that takes a service requirement and a group of service advertisements as input, and returns all the advertisements that may potentially satisfy the requirement. In a computer sales ontology, a service requirement may ask for a PC with memory size greater than 512 Mb, unit price less than 700 pounds and delivery date earlier than 15/03/2004. Here ‘greater than 512’, ‘less than 700’ and ‘earlier than 15/03/2004’ are customised datatypes of the base datatypes integer, integer and date, respectively.

Example 3. Electronic Commerce: A ‘No Shipping Fee’ Rule. Electronic shops may need to classify items according to their sizes, and to reason that an item for which the sum of height, length and width is no greater than 15 cm belongs to a class in their ontology called ‘small-items’. They can then have a rule saying that no shipping costs are charged for ‘small-items’. Accordingly, the billing system will charge no shipping fees for any instance of the ‘small-items’ class. Here ‘no greater than 15’ is a customised datatype, ‘sum’ is a datatype predicate, and ‘sum no greater than 15’ is a customised datatype predicate.

3 Unary Datatype Groups

OWL datatyping is defined on the basis of datatype maps [9]. A datatype map is a partial mapping from supported datatype URIrefs to datatypes. In this section, we introduce unary datatype groups, which extend OWL datatyping with a hierarchy of supported datatypes.

Definition 1. A unary datatype group $G$ is a triple $(M_d, B, \mathrm{dom})$, where $M_d$ is the datatype map of $G$, $B$ is the set of primitive base datatype URI references in $G$ and dom is the declared domain function. We call $S$ the set of supported datatype URI references of $G$, i.e., for each $u \in S$, $M_d(u)$ is defined; we require $B \subseteq S$. We assume that there exist unary datatype URI references rdfs:Literal, owlx:DatatypeBottom $\notin S$.
The declared domain function dom has the following properties: for each $u \in S$, if $u \in B$, then $\mathrm{dom}(u) = u$; otherwise, $\mathrm{dom}(u) = v$, where $v \in B$. □

Definition 1 ensures that all the primitive base datatype URIrefs of $G$ are supported ($B \subseteq S$) and that each supported datatype URIref is related to a primitive base datatype URIref through the declared domain function dom.

Example 4. $G_1 = (M_{d1}, B_1, \mathrm{dom}_1)$ is a unary datatype group, where
– $M_{d1}$ = {xsd:integer ↦ integer, xsd:string ↦ string, xsd:nonNegativeInteger ↦ $\geq_0$, xsdx:integerLessThanN ↦ […]

can be represented by the following disjunctive expression:

or(and(xsd:nonNegativeInteger, xsdx:integerLessThan100000), oneOf(“low”^^xsd:string, “medium”^^xsd:string, “expensive”^^xsd:string)).

Note that “low”^^xsd:string is a typed literal, which represents a value of the xsd:string datatype. By contrast, “low” is a plain literal, which carries no datatype information. ♦

We now define the interpretation of a unary datatype group.

Abstract Syntax | DL Syntax | Semantics
a datatype URIref $u$ | $u$ | $u^D$
oneOf($l_1, \ldots, l_n$) | $\{l_1, \ldots, l_n\}$ | $\{l_1^D\} \cup \ldots \cup \{l_n^D\}$
not($u$) | $\overline{u}$ | $(\mathrm{dom}(u))^D \setminus u^D$ if $u \in S \setminus B$; $\Delta_D \setminus u^D$ otherwise
and($E_1, \ldots, E_n$) | $E_1 \wedge \ldots \wedge E_n$ | $E_1^D \cap \ldots \cap E_n^D$
or($E_1, \ldots, E_n$) | $E_1 \vee \ldots \vee E_n$ | $E_1^D \cup \ldots \cup E_n^D$
Table 1. Syntax and semantics of datatype expressions (OWL-Eu data ranges)

Definition 3. A datatype interpretation $I_D$ of a unary datatype group $G = (M_d, B, \mathrm{dom})$ is a pair $(\Delta_D, \cdot^D)$, where $\Delta_D$ (the datatype domain) is a non-empty set and $\cdot^D$ is a datatype interpretation function, which has to satisfy the following conditions:
1. $(\text{rdfs:Literal})^D = \Delta_D$ and $(\text{owlx:DatatypeBottom})^D = \emptyset$;
2. for each plain literal $l$, $l^D = l \in PL$ and $PL \subseteq \Delta_D$ ($PL$ is the value space for plain literals);
3. for any two primitive base datatype URIrefs $u_1, u_2 \in B$: $u_1^D \cap u_2^D = \emptyset$;
4. for each supported datatype URIref $u \in S$, where $d = M_d(u)$:
(a) $u^D = V(d) \subseteq \Delta_D$, $L(u) \subseteq L(\mathrm{dom}(u))$ and $L2V(u) \subseteq L2V(\mathrm{dom}(u))$;
(b) if $s \in L(d)$, then (“s”^^u)$^D = L2V(d)(s)$; otherwise, (“s”^^u)$^D$ is not defined;
5. for each $u \notin S$, $u^D \subseteq \Delta_D$, and (“v”^^u)$^D \in u^D$.

Moreover, we extend $\cdot^D$ to $G$ unary datatype expressions as shown in Table 5 (page 8). Let $E$ be a $G$ unary datatype expression; the negation of $E$ is of the form $\neg E$, which is interpreted as $\Delta_D \setminus E^D$. □

Next, we introduce the kind of basic reasoning mechanism required for a unary datatype group.

Definition 4. Let $\mathcal{V}$ be a set of variables, $G = (M_d, B, \mathrm{dom})$ a unary datatype group and $u \in B$ a primitive base datatype URIref. A datatype conjunction of $u$ is of the form

$$C = \bigwedge_{j=1}^{k} u_j(v_j) \;\wedge\; \bigwedge_{i=1}^{l} \neq_i\!\big(v_1^{(i)}, v_2^{(i)}\big), \qquad (1)$$

where the $v_j$ are variables from $\mathcal{V}$, $v_1^{(i)}, v_2^{(i)}$ are variables in $\bigwedge_{j=1}^{k} u_j(v_j)$, the $u_j$ are datatype URI references from $S$ such that $\mathrm{dom}(u_j) = u$, and the $\neq_i$ are the inequality predicates for the primitive base datatypes $M_d(\mathrm{dom}(u_i))$, where the $u_i$ appear in $\bigwedge_{j=1}^{k} u_j(v_j)$. A datatype conjunction $C$ is called satisfiable iff there exist an interpretation $(\Delta_D, \cdot^D)$ of $G$ and a function $\delta$ mapping the variables in $C$ to data values in $\Delta_D$ such that $\delta(v_j) \in u_j^D$ (for all $1 \le j \le k$), $\{\delta(v_1^{(i)}), \delta(v_2^{(i)})\} \subseteq u_i^D$ and $\delta(v_1^{(i)}) \neq \delta(v_2^{(i)})$ (for all $1 \le i \le l$). Such a function $\delta$ is called a solution for $C$ w.r.t. $(\Delta_D, \cdot^D)$. □

We end this section by stating the conditions that computable unary datatype groups must satisfy.

Definition 5. A unary datatype group $G$ is conforming iff
1. for any $u \in S \setminus B$, there exists $u' \in S \setminus B$ such that ${u'}^D = \overline{u}^D$, and
2. for each primitive base datatype in $G$, the satisfiability problem for finite datatype conjunctions of the form (1) is decidable. □

4 OWL-Eu

In this section, we present a small extension of OWL DL, namely OWL-Eu. The underpinning DL of OWL-Eu is $\mathcal{SHOIN}(\mathcal{G}_1)$, i.e., the $\mathcal{SHOIN}$ DL combined with a unary datatype group $G$ (the subscript 1 stands for unary).
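Definition 5 requires that, for each primitive base datatype, satisfiability of datatype conjunctions of the form (1) be decidable, and Theorem 1 below depends on that requirement. The following Python sketch illustrates one such decision procedure for a hypothetical fragment of a unary datatype group in which every supported datatype has the primitive base xsd:integer. It is an illustration under stated assumptions, not the paper's own procedure: each supported datatype is modelled as an integer interval (`IntRange`), the URIrefs xsdx:integerGreaterThan1 and xsdx:integerLessThanOrEqualTo0 are invented for the example, and `VALUE_SPACE`, `DOM` and `solve` are our own names. For each variable it intersects the value spaces of its atoms, keeps at most k candidate values (k being the number of variables, which suffices for any set of inequalities), and then backtracks over the inequality constraints.

```python
from dataclasses import dataclass
from itertools import count, islice
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IntRange:
    """Value space of an integer-derived datatype as an inclusive interval.

    None means "unbounded on that side"; this encoding is an assumption
    of the sketch, not part of the paper.
    """
    lo: Optional[int] = None
    hi: Optional[int] = None

    def meet(self, other: "IntRange") -> "IntRange":
        # Intersection of two value spaces (two unary atoms on one variable).
        lo = max((b for b in (self.lo, other.lo) if b is not None), default=None)
        hi = min((b for b in (self.hi, other.hi) if b is not None), default=None)
        return IntRange(lo, hi)

    def contains(self, x: int) -> bool:
        return (self.lo is None or x >= self.lo) and (self.hi is None or x <= self.hi)

    def sample(self, k: int) -> List[int]:
        # Up to k distinct members. With k variables this is enough: a variable
        # with k candidates always has one that differs from its <= k-1 neighbours.
        if self.lo is not None:
            candidates = count(self.lo)
        elif self.hi is not None:
            candidates = count(self.hi, -1)
        else:
            candidates = count(0)
        return [x for x in islice(candidates, k) if self.contains(x)]


# Hypothetical fragment in the spirit of G1 (Example 4); all bases are xsd:integer.
VALUE_SPACE: Dict[str, IntRange] = {
    "xsd:integer": IntRange(),
    "xsd:nonNegativeInteger": IntRange(0, None),
    "xsdx:integerLessThan100000": IntRange(None, 99999),
    "xsdx:integerGreaterThan1": IntRange(2, None),        # hypothetical: >_1
    "xsdx:integerLessThanOrEqualTo0": IntRange(None, 0),  # hypothetical: <=_0
}
DOM: Dict[str, str] = {u: "xsd:integer" for u in VALUE_SPACE}


def solve(atoms: List[Tuple[str, str]],
          inequalities: List[Tuple[str, str]]) -> Optional[Dict[str, int]]:
    """Decide a datatype conjunction of form (1) over one integer base.

    atoms are pairs (u_j, v_j); inequalities are pairs (v1, v2) standing for
    the inequality predicate. Returns a solution delta, or None if unsatisfiable.
    """
    if len({DOM[u] for u, _ in atoms}) > 1:
        raise ValueError("all u_j must share one primitive base datatype")
    domains: Dict[str, IntRange] = {}
    for u, v in atoms:
        domains[v] = domains.get(v, IntRange()).meet(VALUE_SPACE[u])
    neighbours: Dict[str, set] = {v: set() for v in domains}
    for a, b in inequalities:
        if a not in domains or b not in domains:
            raise ValueError("inequality over a variable with no datatype atom")
        if a == b:
            return None  # v != v can never hold
        neighbours[a].add(b)
        neighbours[b].add(a)
    k = len(domains)
    candidates = {v: d.sample(k) for v, d in domains.items()}
    order = sorted(candidates, key=lambda v: len(candidates[v]))

    def search(i: int, delta: Dict[str, int]) -> Optional[Dict[str, int]]:
        # Plain backtracking over the finite candidate lists.
        if i == len(order):
            return dict(delta)
        v = order[i]
        for x in candidates[v]:
            if all(delta.get(n) != x for n in neighbours[v]):
                delta[v] = x
                found = search(i + 1, delta)
                if found is not None:
                    return found
                del delta[v]
        return None

    return search(0, {})


if __name__ == "__main__":
    print(solve([("xsdx:integerGreaterThan1", "v"),
                 ("xsdx:integerLessThanOrEqualTo0", "v")], []))  # None
    print(solve([("xsd:nonNegativeInteger", "x"),
                 ("xsd:nonNegativeInteger", "y")], [("x", "y")]))  # {'x': 0, 'y': 1}
```

The first call is the unsatisfiable conjunction $>_1(v) \wedge \leq_0(v)$ used in the proof sketch of Theorem 1. Plain backtracking is exponential in the worst case; the sketch only aims to make the decidability requirement concrete for this simple integer fragment.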
Specifically, OWL-Eu only extends OWL data ranges (i.e., enumerated datatypes as well as some built-in XML Schema datatypes) to OWL-Eu data ranges, defined as follows.

Definition 6. An OWL-Eu data range is a $G$ unary datatype expression. The abstract (as well as DL) syntax and the model-theoretic semantics of OWL-Eu data ranges are presented in Table 5 (page 8). □

The consequence of this extension is that customised datatypes, represented by OWL-Eu data ranges, can be used in datatype exists restrictions ($\exists T.u$) and datatype value restrictions ($\forall T.u$), where $T$ is a datatype property and $u$ is an OWL-Eu data range. Hence, this extension of OWL DL is only as large as is necessary to support customised datatypes.

Example 6. PCs with memory size greater than or equal to 512 Mb and with price less than 700 pounds can be represented by the following OWL-Eu concept description in DL syntax (cf. Table 5 on page 8):

$$\mathit{PC} \sqcap \exists \mathit{memorySizeInMb}.\overline{<_{512}} \sqcap \exists \mathit{priceInPound}.{<_{700}},$$

where $\overline{<_{512}}$ is a relativised negated expression and $<_{700}$ is a supported datatype in $G_1$. ♦

It turns out that OWL-Eu (i.e., the $\mathcal{SHOIN}(\mathcal{G}_1)$ DL) is decidable.

Theorem 1. The $\mathcal{SHOIN}(\mathcal{G}_1)$-concept satisfiability problem w.r.t. a knowledge base is decidable if the combined unary datatype group is conforming.

Proof (sketch). We show the decidability of $\mathcal{SHOIN}(\mathcal{G}_1)$-concept satisfiability w.r.t. TBoxes and RBoxes by reducing it to $\mathcal{SHOIN}$-concept satisfiability w.r.t. TBoxes and RBoxes. The basic idea behind the reduction is that we can replace each datatype group-based concept $C$ in $\mathcal{T}$ with a new atomic primitive concept $A_C$ in $\mathcal{T}'$. We then decide the satisfiability of all possible conjunctions of datatype group-based concepts (and their negations) in $\mathcal{T}$, of which there are only finitely many, and whenever a conjunction $C_1 \sqcap \ldots \sqcap C_n$ is unsatisfiable, we add the axiom $A_{C_1} \sqcap \ldots \sqcap A_{C_n} \sqsubseteq \bot$ to $\mathcal{T}'$. For example, the unary datatype group-based concepts $\exists T.{>_1}$ and $\forall T.{\leq_0}$ occurring in $\mathcal{T}$ would be replaced with $A_{\exists T.>_1}$ and $A_{\forall T.\leq_0}$ in $\mathcal{T}'$, and $A_{\exists T.>_1} \sqcap A_{\forall T.\leq_0} \sqsubseteq \bot$ would be added to $\mathcal{T}'$, because $\exists T.{>_1} \sqcap \forall T.{\leq_0}$ is unsatisfiable (i.e., there is no solution for the predicate conjunction ${>_1}(v) \wedge {\leq_0}(v)$). □

5 OWL-E: A Step Further

In this section, we present a further extension of OWL-Eu, called OWL-E, which supports not only customised datatypes but also customised datatype predicates.

A datatype predicate (or simply predicate) $p$ is characterised by an arity $a(p)$, or a minimum arity $a_{\min}(p)$ if $p$ can have multiple arities, and a predicate extension (or simply extension) $E(p)$. The notion of a predicate map can be defined in the obvious way. For example, $=_{int}$ is a (binary) predicate with arity $a(=_{int}) = 2$ and extension $E(=_{int}) = \{\langle i_1, i_2\rangle \in V(integer)^2 \mid i_1 = i_2\}$, where $V(integer)$ is the value space of the datatype integer.

Now we can generalise unary datatype groups through the definition of datatype groups. In fact, datatypes and datatype predicates can be unified in datatype groups. Roughly speaking, a datatype group is a group of built-in predicate URIrefs ‘wrapped’ around a set of primitive datatype URIrefs. A datatype group $G$ is a tuple $(M_p, B, \mathrm{dom})$, where $M_p$ is the predicate map of $G$, $B$ is the set of primitive datatype URI references in $G$ and dom is the declared domain function. We call $S$ the set of built-in predicate URI references of $G$, i.e., for each $u \in S$, $M_p(u)$ is defined; we require $B \subseteq S$. The declared domain function dom has the following properties: for each $u \in S$,

$$\mathrm{dom}(u) = \begin{cases} u & \text{if } u \in B,\\ (v_1, \ldots, v_n), \text{ where } v_1, \ldots, v_n \in B & \text{if } u \in S \setminus B \text{ and } a(M_p(u)) = n,\\ \{(\underbrace{v, \ldots, v}_{i \text{ times}}) \mid i \geq n\}, \text{ where } v \in B & \text{if } u \in S \setminus B \text{ and } a_{\min}(M_p(u)) = n. \end{cases}$$

Example 7. $G_2 = (M_{p2}, B_2, \mathrm{dom}_2)$ is a datatype group, where
– $M_{p2}$ = {xsd:integer ↦ integer, xsd:string ↦ string, xsd:integerGreaterThanOrEqualToN ↦ $\geq_N$, xsdx:integerLessThanN ↦ […]

expressive predicate at-least restriction | $\geq m\, T_1, \ldots, T_n.E$ | $\{x \in \Delta^{\mathcal{I}} \mid \sharp\{\langle t_1, \ldots, t_n\rangle \mid \langle x, t_i\rangle \in T_i^{\mathcal{I}} \text{ (for all } 1 \le i \le n\text{)} \wedge \langle t_1, \ldots, t_n\rangle \in E^D\} \geq m\}$
expressive predicate at-most restriction | $\leq m\, T_1, \ldots, T_n.E$ | $\{x \in \Delta^{\mathcal{I}} \mid \sharp\{\langle t_1, \ldots, t_n\rangle \mid \langle x, t_i\rangle \in T_i^{\mathcal{I}} \text{ (for all } 1 \le i \le n\text{)} \wedge \langle t_1, \ldots, t_n\rangle \in E^D\} \leq m\}$
Table 3. New class constructors in OWL-E

where $T_s, T_h, T_l, T_w$ are concrete roles representing “sum in cm”, “height in cm”, “length in cm” and “width in cm”, respectively, and $(+_{int} \wedge [\leq_{15}, integer, integer, integer])$ is a conjunctive datatype expression representing the customised predicate “sum no greater than 15”.¹ ♦

Like OWL-Eu, OWL-E (i.e., the $\mathcal{SHOIQ}(G)$ DL) is also a decidable extension of OWL DL.

Theorem 2. The $\mathcal{SHOIN}(G)$- and $\mathcal{SHOIQ}(G)$-concept satisfiability and subsumption problems w.r.t. TBoxes and RBoxes are decidable.

According to Tobies [13, Lemma 5.3], if $\mathcal{L}$ is a DL that provides the nominal constructor, knowledge base satisfiability can be polynomially reduced to satisfiability of TBoxes and RBoxes. Hence, we obtain the following theorem.

Theorem 3. The knowledge base satisfiability problems of $\mathcal{SHOIN}(G)$ and $\mathcal{SHOIQ}(G)$ are decidable.

6 Conclusion

In this paper, we have proposed OWL-Eu and OWL-E, two decidable extensions of OWL DL that support customised datatypes and customised datatype predicates. OWL-Eu provides a general framework for integrating OWL DL with customised datatypes, such as XML Schema non-list simple types. OWL-E further extends OWL-Eu to support customised datatype predicates.

We have implemented a prototype extension of the FaCT [5] DL system, called FaCT-DG, to support TBox reasoning in both OWL-Eu and OWL-E (without nominals). As for future work, we are planning to extend the DIG 1.1 interface [3] to support OWL-Eu, and to implement a Protégé [6] plug-in to support XML Schema non-list simple types, i.e.,
users should be able to define and/or import customised XML Schema non-list simple types based on a set of supported datatypes, and to exploit our prototype through the extended DIG interface. Furthermore, we plan to extend the FaCT++ DL reasoner [4] to support the full OWL-Eu and OWL-E ontology languages.

¹ To save space, we use predicates instead of predicate URIrefs here.

Bibliography

[1] Sean Bechhofer, Frank van Harmelen, James Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein, editors. OWL Web Ontology Language Reference. Feb. 2004. http://www.w3.org/TR/owl-ref/.
[2] Paul V. Biron and Ashok Malhotra. Extensible Markup Language (XML) Schema Part 2: Datatypes. W3C Recommendation 02 May 2001, World Wide Web Consortium, 2001. http://www.w3.org/TR/xmlschema-2/.
[3] DIG. SourceForge DIG Interface Project, 2004. http://sourceforge.net/projects/dig/.
[4] FaCT++, 2003. http://owl.man.ac.uk/factplusplus/.
[5] Ian Horrocks. Using an Expressive Description Logic: FaCT or Fiction? In Proc. of KR’98, pages 636–647, 1998.
[6] Holger Knublauch, Ray W. Fergerson, Natalya Fridman Noy, and Mark A. Musen. The Protégé OWL Plugin: An Open Development Environment for Semantic Web Applications. In Proc. of the International Semantic Web Conference, pages 229–243, 2004.
[7] Jeff Z. Pan and Ian Horrocks. Extending Datatype Support in Web Ontology Reasoning. In Proc. of the 2002 Int. Conference on Ontologies, Databases and Applications of SEmantics (ODBASE 2002), Oct. 2002.
[8] Jeff Z. Pan and Ian Horrocks. Web Ontology Reasoning with Datatype Groups. In Proc. of the 2003 International Semantic Web Conference (ISWC 2003), pages 47–63, 2003.
[9] Peter F. Patel-Schneider, Patrick Hayes, and Ian Horrocks. OWL Web Ontology Language Semantics and Abstract Syntax. W3C Recommendation, Feb. 2004. http://www.w3.org/TR/2004/REC-owl-semantics-20040210/.
[10] RDF-Logic Mailing List.
W3C mailing list, since 2001. http://lists.w3.org/archives/public/www-rdf-logic/.
[11] Alan Rector. Re: [UNITS, OEP] FAQ: Constraints on data values range. Discussion in [12], Apr. 2004. http://lists.w3.org/Archives/Public/public-swbp-wg/2004Apr/0216.html.
[12] Semantic Web Best Practices and Deployment Working Group Mailing List. W3C mailing list, since 2004. http://lists.w3.org/archives/public/public-swbp-wg/.
[13] Stephan Tobies. Complexity Results and Practical Algorithms for Logics in Knowledge Representation. PhD thesis, Rheinisch-Westfälischen Technischen Hochschule Aachen, 2001. http://lat.inf.tu-dresden.de/research/phd/Tobies-PhD-2001.pdf.