Terminology-Based Patterns for Natural Language Definitions in Ontologies

Dagmar Gromann
Vienna University of Economics and Business, Austria
dgromann@wu.ac.at

Abstract. Natural language content in ontologies is crucial to any human interaction with them, but it is scarcely available. Terminology science centers on best practices for domain-specific natural languages. Hence, ontologies can benefit from the systematic approach of terminology to natural language definitions. This paper proposes an Annotation Ontology Design Pattern, named Natural Language Definition ODP, that provides natural language definitions for ontology classes. For this purpose, a (semi-)automated method for implementing this pattern, which combines ontology verbalization and information extraction, is investigated and exemplified in the domain of finance.

Keywords: Annotation ODPs, Natural Language Definition, Terminology, Automatic Extraction of ODPs, Domain-Specific ODP Application

1 Introduction

A growing number of application scenarios for Semantic Web (SW) ontologies renders reusable, high-quality solutions to their design increasingly important. For this purpose, Ontology Design Patterns (ODPs) define a formal methodology for various aspects of ontological design, ranging from Logical to Presentation ODPs [1]. Presentation ODPs seek to increase the usability and readability of ontologies from a user's perspective, which is vital to multilingual scenarios [2] and to interactions with domain experts and users [3]; they are divided into Annotation and Naming ODPs. Annotation ODPs provide best practices for homogeneous natural language (NL) expressions (rdfs:label) and definitions (rdfs:comment), while Naming ODPs focus on naming conventions [1]. The general lack of an operational approach to NL definition authoring, together with its time-intensive nature, has led to scarce and frequently inconsistent NL definitions in ontologies.
Thus, this paper investigates their (semi-)automated generation by means of the proposed Annotation ODP, the Natural Language Definition ODP, based on established methods from terminology science.

Ever since its advent, terminology science has recognized the need for a systematic approach to NL definitions. Concept-centered terminologies as defined by ISO 704:2009 and ISO 1087:2000 consist of sets of terminology concepts in specialized domains. NL definitions are required to form these terminology concepts and their interrelations. In contrast, ontology concepts are defined formally by means of logics. Combining the formal ontological definition with the terminological method of NL definition authoring results in a multidimensional approach, which is formalized as the proposed Annotation ODP. The ISO 704 method of combining the denomination of the superordinate concept with one or more characteristics that delimit the concept to be defined from its related concepts can be (semi-)automated, and it is the foundation of the proposed pattern. Given domain-specific, axiomatized ontology elements with a minimum of NL coverage in labels or fragment identifiers, the superordinate concept's denomination can be identified and applied by using the subsumption hierarchy. For the non-trivial task of obtaining characteristics, three mutually complementary approaches are proposed: ontology verbalization, utilization of existing NL content, and information extraction. An example is provided by applying the pattern to the partially available NL content of Fadyart's Finance Ontology¹.

2 Natural Language Definition ODP

The objective of the NL Definition ODP is to define an ontology concept in natural language(s). Ontologies represent knowledge by formalizing vocabularies of terms as well as their interrelations, and they define their meaning formally. Terminologies mostly rely on NL characteristics to establish NL definitions for concepts and to interrelate concepts designated by terms, appellations, or symbols.
The two most important types of definitions specified by ISO 704 are extensional definitions, which list the instances of a concept, and intensional definitions. The latter combine the superordinate concept with manually identified delimiting characteristic(s) for generically related concepts. The intensional approach offers the most explicit, consistent, and precise method of definition formation. It is intended to provide the minimum of information human users need to differentiate one terminology concept from another. To facilitate its automation, the basic textual description of ISO 704 has been adopted and formalized in the NL Definition ODP introduced in Definition 1 and illustrated in Example 1.

The pattern defines the NL definition of an entry term, which corresponds to the label of the ontology class. The singular form of the term is preferred, unless the term is only available in the plural, e.g. liabilities. The pattern utilizes the label or fragment identifier of the superordinate concept, which for the experiment herein is restricted to noun phrases (NPs). Thereby, the definition obtains a context and implicitly inherits the characteristics of the superclass. The NP is connected to the characteristics by a finite set of relative pronouns, verbs, and, where applicable, verbalized object properties. The same elements and a coordinating conjunction are needed to string together several characteristics.

Obtaining the characteristic(s) relies on a three-tiered, mutually complementary approach of ontology verbalization, utilization of existing NL content, and information extraction from structured Web resources. All three help to specify the relative pronoun and the linking verb to be used for the concept to be defined.

¹ http://fadyart.com/, version 3.04
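The template structure described above (superordinate NP, relative pronoun, one or more characteristics joined by a coordinating conjunction) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the author's implementation: the function names are hypothetical, the indefinite article is chosen by a crude first-letter heuristic, and several characteristics are simply joined with commas and a final "and".

```python
from typing import List


def article_for(np: str) -> str:
    # Crude heuristic: judge by the first letter, not by pronunciation
    return "An" if np[0].lower() in "aeiou" else "A"


def assemble_definition(entry_term: str, np: str, pronoun: str,
                        characteristics: List[str],
                        plural_only: bool = False) -> str:
    """Build 'Entry Term: [A/An] NP pronoun characteristic(s)' (hypothetical helper)."""
    if not np:
        raise ValueError("the superordinate NP must not be empty")
    if not characteristics:
        raise ValueError("at least one delimiting characteristic is required")
    # Plural-only NPs (e.g. "accounts payable") take no indefinite article
    if plural_only:
        head = np[0].upper() + np[1:]
    else:
        head = f"{article_for(np)} {np}"
    # Several characteristics are strung together with a coordinating conjunction
    if len(characteristics) == 1:
        body = characteristics[0]
    else:
        body = ", ".join(characteristics[:-1]) + " and " + characteristics[-1]
    return f"{entry_term}: {head} {pronoun} {body}"
```

For instance, `assemble_definition("Card", "payment instrument", "that", ["has as card type", "has as card data"])` returns "Card: A payment instrument that has as card type and has as card data". Where a definition repeats the pronoun before a later characteristic ("... and that is a client of ..."), this sketch leaves it to the caller to include the pronoun in the characteristic text.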
Definition 1: NL Definition ODP
Entry Term: [A/An] NP [which/that/who/whose] [(can) be/include/belong to/classify as OR <verbalized object property>] <characteristic> [(<characteristic>,)* and <characteristic>]

Example 1: NL Definition of Concept Card (Fadyart Finance Ontology)
Card: [A] payment instrument [that] [has as card type] and [has as card data]

Ontology verbalization refers to the translation of ontology concepts, relations, and axioms into (controlled) natural languages, such as Attempto Controlled English [4]. In contrast to controlled natural language approaches, the objective herein is to use verbalized ontology elements to identify the appropriate verb and relative pronoun linking the definition's characteristics. For this purpose, verbalization patterns have been identified, a selection of which is provided below.

P1 - ObjectUnionOf: [a/an OR ObjectMinCardinality] [(NP,)* or] NP(class)
P2 - ObjectMinCardinality: at least <n>
P3 - ObjectSomeValuesFrom: NP(domain) [that [a/an] NP(range(s))]
P4 - ObjectProperty with "has" is split into two parts: NP(domain) has as <remainder of the property label> [a/an] NP(range)

Should the label of the object property already contain the label of the range concept, the concept label is not reiterated in the NL definition, e.g. hasManager pointing to Manager. The above list is not exhaustive, and its implementation requires NLP methods, e.g. tokenization. The application of these patterns to characteristic formation is exemplified in the next section.

In the next step, the existing rdfs:comment of the ontology class is linked to the NP and, where applicable, to the verbalized content by means of a coordinating conjunction and the identified relative pronoun, which, if not available in the comment, can be obtained from Wiktionary. If no NL content is available, re-using existing structured Web resources, such as DBpedia, has been considered. The tentative information extraction process herein relies on string matching and an immediate subsumption to top DBpedia ontology concepts (e.g. Organization, Resource).
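Patterns P1, P2, and P4, including the rule against reiterating a range label that the property label already contains, can be sketched as follows. This minimal Python sketch assumes camelCase identifiers and uses a regular expression as a stand-in for a real NLP tokenizer; all names are hypothetical, and the handling of edge cases (e.g. the serial comma, small number words) is an assumption rather than part of the proposed pattern.

```python
import re
from typing import List, Optional

# Hypothetical number words for small minimum cardinalities (P2)
NUMBER_WORDS = {1: "one", 2: "two", 3: "three"}


def to_tokens(identifier: str) -> List[str]:
    """Split a camelCase identifier into lowercase tokens (stand-in tokenizer)."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)]


def verbalize_union(nps: List[str]) -> str:
    """P1: join union members with commas and 'or' (serial comma for 3+)."""
    if len(nps) == 1:
        return nps[0]
    if len(nps) == 2:
        return f"{nps[0]} or {nps[1]}"
    return ", ".join(nps[:-1]) + ", or " + nps[-1]


def determiner(first_np: str, min_cardinality: Optional[int]) -> str:
    """P2 if a minimum cardinality is given, otherwise an indefinite article."""
    if min_cardinality is not None:
        return "at least " + NUMBER_WORDS.get(min_cardinality, str(min_cardinality))
    return "an" if first_np[0] in "aeiou" else "a"


def verbalize_has_property(prop: str, range_classes: List[str],
                           min_cardinality: Optional[int] = None) -> str:
    """P4: 'hasX' with range R becomes 'has as x [a/an | at least n] r'."""
    tokens = to_tokens(prop)
    if len(tokens) < 2 or tokens[0] != "has":
        raise ValueError("P4 applies only to properties of the form hasX")
    if not range_classes:
        raise ValueError("at least one range class is required")
    remainder = " ".join(tokens[1:])
    range_nps = [" ".join(to_tokens(c)) for c in range_classes]
    # Do not reiterate a range label already contained in the property label
    if min_cardinality is None and len(range_nps) == 1 and range_nps[0] in remainder:
        return f"has as {remainder}"
    return (f"has as {remainder} {determiner(range_nps[0], min_cardinality)} "
            f"{verbalize_union(range_nps)}")
```

With the union range of Example 2 and an assumed minimum cardinality of 1, `verbalize_has_property("hasClientPortfolioBeneficialOwnerOfIncome", ["PartyHolder", "PartyLegalRepresentative", "PartyUsufructuary"], 1)` yields "has as client portfolio beneficial owner of income at least one party holder, party legal representative, or party usufructuary", while `verbalize_has_property("hasManager", ["Manager"])` yields just "has as manager".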
Relying solely on DBpedia information for NL definitions might result in quality issues. For instance, circular definitions are frequent on DBpedia, i.e., a term is defined by itself or by a second term that refers back to the first term. For example, Debtor is defined as "Debtor owes a debt to someone ...". Applying the proposed pattern ensures the proper context for the concept, i.e., the superordinate concept, while DBpedia information provides useful additional details.

Due to its systematic nature, the described pattern enables a consistent formation of NL definitions, which strongly enhances the human readability of the ontologies it is applied to. The proposed pattern is illustrated for English and requires only minor adaptations for other natural languages that are syntactically similar to English, provided that lexical resources are available.

3 Example Application

An OWL ontology serves as the input to the intended system design, exemplified here with Fadyart's Finance Ontology in English. By means of the OWL API, the ontology can be parsed, the subclass relations and object properties identified, and an annotation property added. Starting from ⊤ (owl:Thing), the subsumption hierarchy is traversed to the first concept not directly subsumed by it. If its superordinate concept has no label, the superordinate's fragment identifier is tokenized (e.g. using Stanford CoreNLP) and represents the NP of Definition 1. In Example 2, the class ClientPortfolio is a subclass of AccountsPayable. To ensure the correct grammatical number and relative pronoun, the superclass term is queried in Wiktionary, e.g. via the Java-based Wiktionary Library². Here, the query returns the plural only and the relative pronoun that for Accounts Payable³. Subsequently, tokenization and verbalization pattern P4, defined in the previous section, are applied to the object property of Example 2. Its range consists of a union of three classes, which is verbalized using pattern P1.
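The traversal and NP extraction step can be approximated in Python as follows. Instead of the OWL API and Stanford CoreNLP named above, this sketch assumes a plain dictionary that maps each class to a single direct superclass (a tree, no multiple inheritance) and splits camelCase fragment identifiers with a regular expression. It reads "the first concept not directly subsumed by ⊤" as "skip the top-level classes, which only serve as superordinates", which is one possible interpretation; all names are hypothetical.

```python
import re
from collections import deque
from typing import Dict, List, Tuple

THING = "owl:Thing"


def fragment_to_np(iri: str) -> str:
    """Turn the fragment identifier of an IRI (or a bare name) into a lowercase NP."""
    fragment = iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
    tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", fragment)
    return " ".join(tokens).lower()


def superordinate_nps(subclass_of: Dict[str, str],
                      labels: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each class below the top level with the NP of its direct superclass.

    subclass_of maps every class to its single direct superclass; top-level
    classes map to THING. Labels take precedence over fragment identifiers.
    """
    children: Dict[str, List[str]] = {}
    for sub, sup in subclass_of.items():
        children.setdefault(sup, []).append(sub)
    pairs: List[Tuple[str, str]] = []
    # Breadth-first from the classes directly below owl:Thing; these are only
    # used as superordinates, since their own superordinate would be owl:Thing
    queue = deque(sorted(children.get(THING, [])))
    while queue:
        cls = queue.popleft()
        sup_np = labels.get(cls) or fragment_to_np(cls)
        for child in sorted(children.get(cls, [])):
            pairs.append((child, sup_np))
            queue.append(child)
    return pairs
```

For the hierarchy of Example 2, assuming for illustration that ShortTermLiabilities sits directly below owl:Thing, ClientPortfolio is paired with the label "Accounts payable", while AccountsPayable, whose superclass has no label here, is paired with the fragment-derived NP "short term liabilities".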
Finally, the existing comment is added to the definition obtained so far. By means of Wiktionary, "clients" in the existing rdfs:comment is identified as a countable noun, so its singular form can be combined with the obtained definition using a coordinating conjunction and the relative pronoun identified above. The derived definition can then be added to the concept ClientPortfolio as rdfs:comment.

² http://code.google.com/p/jwktl/
³ Money that is owed ...

Example 2: Class ClientPortfolio in Manchester Syntax

Original input in OWL:

  Ontology: ...
  ObjectProperty: hasClientPortfolioBeneficialOwnerOfIncome
      SubPropertyOf: hasAccountDomain
      Domain: ClientPortfolio
      Range: PartyHolder or PartyLegalRepresentative or PartyUsufructuary
  Class: AccountsPayable
      Annotations: rdfs:label "Accounts payable"@en
      SubClassOf: ShortTermLiabilities
  Class: ClientPortfolio
      Annotations: rdfs:label "Client accounts"@en,
          rdfs:comment "The clients of the financial institution for who's account the securities handling operations are performed."^^xsd:string
      SubClassOf: AccountsPayable

Resulting definition:

  Client Account: Accounts payable that has as client portfolio beneficial owner of income at least one party holder, party legal representative, or party usufructuary and that is a client of the financial institution for who's account the securities handling operations are performed.

Several object properties and large unions make it necessary for domain experts to decide which verbalization defines the concept most adequately. Additionally, comments are at times used for supplementary information rather than for NL definitions of concepts, which is why in some cases the comments cannot be re-used in the definition formation process.

4 Related Work

Glosses for ontology concepts re-use existing lexical resources, e.g. WordNet, to provide ontology engineers with various linguistic descriptions to choose from for a specific concept [5]. Approaches grounding existing ontologies in lexical and
linguistic descriptions (e.g. lemon⁴) either re-use glosses for NL descriptions or derive meaning by pointing to the semantic object in the ontology. Ontology verbalization utilizes formalized knowledge in ontologies to derive NL descriptions. For instance, the SWAT project⁵ facilitates the understandability of verbalized entailments by providing individual inference steps in English [6]. Instead of providing an external meta-model or ontology engineering support, the proposed pattern seeks to re-use existing resources and verbalization patterns in order to provide NL definitions for existing domain-specific ontology classes. As a standard-based approach, it reflects established best practices and accepted semiotic theories.

5 Discussion and Future Work

This paper proposes an NL Definition ODP on the basis of definition formation methods from terminology science. After defining the pattern, a (semi-)automated approach to obtaining NL definitions by means of ontology verbalization, utilization of existing NL comments, and information extraction has been exemplified in the financial domain. In terms of future work, the degree to which the pattern can be generalized to other domains will be tested. As regards information extraction, a thorough disambiguation process will be considered. Furthermore, the pattern's formalization for submission to the ontology design pattern repository is planned.

References

1. Presutti, V., Blomqvist, E., Daga, E., Gangemi, A.: Pattern-Based Ontology Design. In Suárez-Figueroa, M.C., Gómez-Pérez, A., Motta, E., Gangemi, A., eds.: Ontology Engineering in a Networked World. Volume 12. Springer (2012) 35–64
2. Cimiano, P., Buitelaar, P., McCrae, J., Sintek, M.: LexInfo: A Declarative Model for the Lexicon-Ontology Interface. Web Semantics: Science, Services and Agents on the World Wide Web 9(1) (2011) 29–51
3.
Damljanovic, D., Agatonovic, M., Cunningham, H.: Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-Based Lookup through the User Interaction. In Sure, Y., Domingue, J., eds.: The Semantic Web: Research and Applications. Volume 19. Springer (2010) 106–120
4. Kaljurand, K., Kuhn, T.: A Multilingual Semantic Wiki Based on Attempto Controlled English and Grammatical Framework. In Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S., eds.: The Semantic Web: Semantics and Big Data. Volume 17. Springer (2013) 427–441
5. Jarrar, M.: Position Paper: Towards the Notion of Gloss, and the Adoption of Linguistic Resources in Formal Ontology Engineering. In: Proceedings of the 15th International Conference on World Wide Web, ACM (2006) 497–503
6. Nguyen, T.A.T., Power, R., Piwek, P., Williams, S.: Predicting the Understandability of OWL Inferences. In Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S., eds.: The Semantic Web: Semantics and Big Data. Volume 17. Springer (2013) 109–123

⁴ http://lemon-model.net
⁵ http://swatproject.org