=Paper=
{{Paper
|id=Vol-1309/paper3
|storemode=property
|title=Textual and logical definitions in ontologies
|pdfUrl=https://ceur-ws.org/Vol-1309/paper3.pdf
|volume=Vol-1309
}}
==Textual and logical definitions in ontologies==
* Selja Seppälä, Department of Philosophy, University at Buffalo, Buffalo, NY, 14260, USA. Email: seljamar@buffalo.edu
* Yonatan Schreiber, CUBRC, 4455 Genesee St., Buffalo, NY, 14225, USA. Email: yonatan.schreiber@cubrc.org
* Alan Ruttenberg, School of Dental Medicine, University at Buffalo, Buffalo, NY, 14215, USA. Email: alanruttenberg@gmail.com

Abstract: We discuss the structure and functions of definitions and axioms in ontologies from the perspective of a terminologist and a logician respectively. By working through a few examples of the correspondence between parts of the textual definitions and the axioms, we show how to compare and contrast each and how each perspective reveals areas for improvement. Having established a correspondence between the textual and logical parts of ontology term definitions, we discuss the possibility of developing tools that help developers improve their ontologies. Such tools could be used to check both the textual definitions against the asserted axioms and vice versa. In addition, we propose a few other ways of checking the contents of textual definitions.

Keywords: textual definitions, natural language definitions, logical definitions, OWL axioms, checking definition contents, problems in definitions, functions of definitions in ontologies, recommendations for definitions in ontologies, ontologies, terminology

===I. Introduction===
Ontologies have on the one hand axioms that form parts of the logical definition of terms, and on the other hand natural language definitions and other documentation of those terms. However, the ontological world does not seem to have a theory of what the functions of textual as opposed to logical definitions are. As a result, authoring practices vary widely. There are nevertheless correspondences (to a certain extent) between phrases in the textual parts and the logical parts. We can use an expectation of correspondences between the textual and logical parts to build tools that help developers improve their ontologies and provide guidelines for identifying issues in axioms and definitions. Aspects we can exploit are:
* Leverage logic to help establish correspondences between the textual definition and the axioms.
* Leverage principles of organizing terminological entities (definitions, notes, ...) to characterize the logical parts.
* Measure some part of the quality of an ontology in terms of these correspondences.

Thus, it may be feasible to bring automated methods used in the terminological world to bear on both establishing the correspondences and identifying quality issues in the textual part that could be mapped to quality issues in the logical form.

In this communication, we show examples of varying definition practices in ontologies to support our first thesis and describe issues in definition practices. We discuss the structure and functions of definitions and logical parts from the perspective of a terminologist and a logician respectively. By working through a few examples of the correspondence between parts of the textual definitions and logical parts, we show how to compare and contrast each and how each perspective reveals areas for improvement.

We suggest that it is possible to write tools that analyze textual definitions with the goal of offering places for improvement. We discuss how such tools could be leveraged to check the contents of both textual and logical definitions for terms in ontologies. Our recommendations could also contribute to supplementing the specifications of the OBO Foundry principles on textual definitions.<sup>1</sup>

===II. Textual Definitions===
In an ontology, a textual definition is, ideally, a short sentence found as the object of an annotation property designated for that purpose. This kind of natural language definition is also found in specialized terminological dictionaries. The account we give in the present communication is thus based on the more developed account of terminological definitions in [1], [2].

A good definition conveys the intended meaning of an ontology term (we will come back to this later) by describing the type of thing to which the term refers. For example, the Cell Type Ontology (CL) contains the following definition for the term ''leukocyte'':
: (a) An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue.

This example shows that the term ''leukocyte'' refers to those things that are of the type ''achromatic cell'' and that are distinguished from other achromatic cells in virtue of being: of the myeloid or lymphoid lineages; capable of ameboid movement; found in blood or other tissue.

As we can see, a definition normally states the type of thing to which the instances of the defined term belong, and distinguishes these instances from the type and from other things falling under the same type by listing one or more of the characteristics of the instances of the term.

The first part, the head of the definition, is called the ''genus''; a distinguishing part is called a ''differentia''. Thus, a definition has a structure where each part is related to the defined term's instances by some type of relation:
* In the classical Aristotelian form, the genus (implicitly) expresses an ''is a'' relation, as in example (a) above, which we read as: a leukocyte ''is an'' achromatic cell.
* The differentia may express any kind of relation relevant for describing and distinguishing the kinds of things to which the defined term refers. In example (a) above, the relations expressed in the definition of ''leukocyte'' are respectively ''develops from'' (of the myeloid or lymphoid lineages), ''capable of'' (capable of ameboid movement), and ''located in'' (found in blood or other tissue).

A textual definition also has a logical form that derives from the relationship between its intension (that which is said about the referent) and its extension (the set of instances that fall under the intension). We can distinguish three main logical forms:<sup>2</sup>
; Classical definition
: A definition where the intension holds for all instances of the type that is defined, as in ''Every instance of X is a Y and all instances of X Z...'' In this case, the characteristics expressed by Y and Z are necessary and, in the ideal case, they are jointly sufficient for including all instances of X and distinguishing them from other instances of Y. The ideal case corresponds to the Aristotelian definition by necessary and sufficient conditions. A standard example of classical definition is that of ''triangle'': ''A rectilinear figure that has three sides.'' (All triangles are rectilinear and have three sides.)
; Typical or prototypical definition
: A definition where the intension holds for most of the instances of the type that is defined, as in ''Every instance of X is a Y and most instances of X Z...'' An example of prototypical definition for a ''swan'' would be ''An aquatic bird with a long neck, usually having white plumage.'' (Most swans are white.)
; Instance definition
: A definition where the intension holds for only a single instance, as in ''X is the only Y that Z...'' These correspond to proper definite descriptions. This kind of definition would apply, for example, to ontologies that include what may be considered as proper names, such as the ''Large Hadron Collider (LHC)'' in an ontology of nuclear physics. In this case, the relevant kind of differentiae would probably inform us about the geographical location of the LHC and specify that it is (or was until some point in time) "the world's largest and most powerful particle accelerator."<sup>3</sup> The definition could be even more specific and tell us about the length of the ring and the number of magnets that compose it.

Normally, ontologies contain classical definitions because their function is to disambiguate terms. This is not to say that the other forms cannot appear in the textual definitions, but this would not be ideal with respect to the function they are meant to fulfill in this context; without necessary and sufficient conditions it becomes possible to interpret terms in a manner that deviates from their intended use.

Indeed, the main function of textual definitions in ontologies is to specify the intended meaning of the ontology terms in order to avoid ambiguities and errors when, for example, annotating biomedical research texts or importing terms into other ontologies. Of course, this is also the function of the axioms, as we will see in the next section. However, the latter can be somewhat obscure to non-ontologists who may need more detailed and explicit information about the term and its referent.

Therefore, there is a cognitive advantage in including textual definitions in ontologies. As argued in [1, section 1.3], dictionary-type definitions are meant to adjust users' lexical competence [3] by modifying (or confirming) their knowledge about the use of terms. In ontologies, definitions allow users to make their use of a term converge toward that of the rest of the users of the ontology. Both the genus and the differentia contribute to the cognitive adjustment: the genus is meant to provide a sort of cognitive anchor by stating a term that should be familiar to the user of the definition; the differentiae are meant to tell the user how the defined thing differs from the thing that is expected to be already known.

===III. Axioms in Ontologies===
Axioms in ontologies restrict the intended meaning of a term by asserting necessary conditions for its use. They thus function in a manner analogous to the necessary conditions previously discussed under ''Classical definition'' in section II. In OWL, it is rarely possible to provide sufficient conditions, so axioms do not on their own constitute full definitions. We distinguish three primary functions of ontology axioms: disambiguation, taxonomic schematization, and fact-modeling.

The function of axioms in the disambiguation of terms is analogous to the function that textual definitions play in disambiguation. Every axiom represents a necessary condition for entities in the term's extension. Axioms thus help to determine the extension of a term by restricting it to those entities meeting the asserted condition. Each additional axiom restricts the extension further, though it is usually not possible to restrict the term to only its intended extension by providing conditions that are jointly sufficient. The most common type of axiom asserts an ''is a'' relation that relates the defined term to a parent class by means of the subClassOf relation. For the most part, the relatum of such an axiom should correspond directly to the genus in the textual definition.

We call the second function we identify 'taxonomic schematization'. When employed in this capacity, an axiom asserted for a class provides a schema or template for the axioms of any subclasses. This provides, in our view, robust, principled taxonomic relations between parent, child, and sibling classes. A class's axioms are inherited by all of its subclasses. This makes it possible to use axioms to suggest differentiae for its child classes, in other words to use these axioms as templates for the axioms of the subclasses. This can be done by asserting a relational axiom for the parent class relating it to some other kind of entity (e.g. by writing an axiom for a class X asserting that any X is 'part of some Y'). For every subclass of this related kind, a subclass of the parent can then be distinguished. For example, the axioms specifying the term ''infection'' in the Infectious Disease Ontology (IDO) can be used to generate the subclass axioms of its child terms, such as ''amebiasis'' (see the axiom under ''SubClass Of (Anonymous Ancestor)'' in Figure 1; see also the discussion of this example in section IV-C below).

Fig. 1. Correspondences in the parts of the textual definition and the axioms of the IDO term ''amebiasis''.

Lastly, we distinguish a fact-modeling function of axioms. An ontology can be considered a specification of a controlled vocabulary for expressing facts in a given domain. Such a vocabulary is much sparser than the vocabulary that would be used to express these facts in natural language, that is, there is a one-many correspondence between ontology terms and words in domain-relevant portions of natural language. This means that the syntax for expressing facts (i.e., assertions between instances) using ontology terms necessarily diverges from the syntax used for expressing the same facts in natural language. The RDF-schema regularizes this syntax substantially, but it is still generally the case that RDF syntax plus the list of terms in the ontology underdetermine how any given fact should be translated from natural language into an expression using the ontology's controlled vocabulary. An important function of axioms in ontologies is to provide a schematic suggestion of how this should be done. Thus, axioms complement textual definitions in contributing cognitively towards regularizing users' employment of terms. For example, the axiom 'is about some document' in one of the axioms specifying the term ''abstract'' in the Information Artifact Ontology (IAO) tells us that the relation expressed by the verb ''to summarize'' in natural language is expressed at the logical level by the ''is about'' relation that is part of the controlled vocabulary of the ontology (see annotations in blue in Figure 2).

Fig. 2. Correspondences in the parts of the textual definition and the axioms of the IAO term ''abstract''.

===IV. Correspondences Between Textual and Logical Definitions===
As we have seen, axioms and textual definitions have overlapping and complementary functions. Hereafter, we examine how they contribute to conveying the intended meaning of terms. We compare and discuss some examples in the biomedical domain to show how these different forms relate. The examples will show what kinds of issues or inconsistencies can be identified by these comparisons; they reveal at least five types of correspondences. We also give some recommendations as to how to improve both the textual definitions and the related axioms. For the sake of readability, we will illustrate the cases with screenshots of the ontology editor Protégé.

====A. General recommendation====
Based on the identified functions for textual definitions and axioms, we make the following general recommendation: textual definitions should contain content analogous to what is expressed in the axioms, i.e., descriptive content that motivates the logical axioms. The expressions used in natural language may however be more idiomatic than the ontology vocabulary (e.g., the expression ''inheres in'' is not very natural). Any complementary information that is deemed useful for understanding the intended meaning of the term but which cannot be included in the axioms should be systematically asserted using other annotation properties.

====B. Exact correspondence====
Figure 3 shows that the parts of the textual definition of ''dead-end host'' in IDO correspond exactly to the logical definition by necessary and sufficient conditions. The only difference is in the natural language expression (''bearing'') used for the ''has role'' ontological relation, perhaps to avoid the seemingly redundant use of 'role' twice. Here, the logical part is useful to fix the intended meaning of the natural language expression.

Fig. 3. Correspondences in the parts of the textual definition and the axioms of the IDO term ''dead-end host''.

====C. Structural correspondence but more specific content in textual definitions than in axioms====
Figure 1 shows that both differentiae of the textual definition of the IDO term ''amebiasis'' contain information of the type expressed in the subclass axioms inherited from the parent class ''infection'' (see annotations in blue). However, the content conveyed by the parts of the textual definition of ''amebiasis'' is more specific than the properties and classes expressed in the axiom; they are subproperties of the relations and subclasses of the relata in the axiom.

If these inherited parts are relevant for distinguishing all the subclasses, then all textual definitions at that subclass level should include that kind of information with the specific content that actually distinguishes each entity at that level. If the comparison reveals a match of logical and textual parts at the level of inherited logical parts, this might be a sign that the entity lacks an available subclass axiom. If this is the case, the textual definition can be used as a basis for creating the missing axioms.

We thus recommend that more specific axioms be added whenever the ontology has the resources to include them, i.e., if the terms are defined elsewhere in the ontology. For example, the axioms specifying the IDO term ''antiseptic role'' in Figure 4 could be completed as follows:

 subClassOf
   realized_by only has_participant
   some (anatomical entity and part_of some organism)

====D. Incomplete textual definitions====
Figure 2 shows that the axiom specifying the term ''abstract'' in the IAO contains the information 'document part', which is absent from the textual definition.

We recommend that the textual definition be completed with this information.

====E. Missing axioms====
Figure 4 shows that the last part of the textual definition of the IDO term ''antiseptic role'' does not correspond to any logical part (see annotations in green). However, this more specific differentia serves to distinguish the defined term from (1) ''antimicrobial disposition'', which has the same subclass axiom (in blue), and (2) the sibling term ''disinfectant role'', which is specified by exactly the same axioms. It would therefore be useful to have an axiom that allows these three terms to be logically distinguished.

Here again, we recommend that the axiom be added whenever the ontology has the resources to include the missing axiom.

====F. Redundant parts of axioms or definitions====
Logical parts may contain axioms specifying other terms. Figure 4 shows that part of the axioms specified for ''antiseptic role'' in IDO correspond to:
* the subclass axioms specifying the term 'antimicrobial', the 'material entity' (see annotations in red);
* the subclass axioms specifying the term 'antimicrobial disposition' (see annotations in blue).

This should not be a problem at the logical level, since the inferences that are made based on the logical expressions end up being the same.

We recommend nevertheless that the axiom be simplified by using the terms that are specified by those axioms. In this example, the first part of the axiom

 (inheres_in some
   ('material entity'
    and (has_disposition
         some 'antimicrobial disposition')))

can be replaced by the following simpler expression:

 inheres_in some 'antimicrobial'

Fig. 4. Correspondences in the parts of the textual definition and the axioms of the IDO term ''antiseptic role''. Annotations in the figure read:
* This part of the definition does not correspond to any logical part.
* Why not replace this with 'antimicrobial' (the 'material entity')? In a textual definition, this amounts to defining another term inside the definition of the defined term.
* inheres_in some 'antimicrobial': this shorter formulation says the same thing.
* This is the definition ('Equivalent To') of 'antimicrobial disposition'. It seems redundant to repeat it in here as it is imported by that relatum in the previous logical part.
* From these axioms, it seems that 'antiseptic role' and 'antimicrobial disposition' are used synonymously.

In a textual definition, this amounts to defining another term within the definition of the defined term, as can be seen in the first differentia of the example (in red), which contains the definition of ''antimicrobial''. This lacks conciseness and is generally considered bad practice (see for example [4, 28]). It unnecessarily overloads the contents of the definition: imagine if each term of a definition was replaced by its definition. More importantly, the reader might not recognize that it is the definition of another term and fail to link the defined term with that other one.

We thus recommend that whenever a textual definition contains the definition of another term from the same ontology or an imported ontology, this sub-definition be replaced by the corresponding term. In this example, the differentia ''borne by a material entity in virtue of the fact that it has an antimicrobial disposition'' should be replaced by ''borne by an antimicrobial''. If the reader does not know the term used in the definition, she can (in principle) look it up in the ontology. A system of hyperlinks should also be provided for easier access, as is done in electronic dictionaries and in the axioms.

===V. Using the Correspondences to Help in Definition Checking===
In ontologies that use semi-automated systems to create the logical and the corresponding textual definitions, such as TermGenie<sup>4</sup>, both definition forms are expected to be reasonably consistent. However, when definitions are hand-crafted or imported from other sources, such as other ontologies or, for example, from Wikipedia, various kinds of errors or inconsistencies can creep in, as discussed above. Identifying these problems manually is less rigorous if no guidelines are provided. To increase reliability of definition-content checking, we propose a method that could be implemented in a computer program to assist ontology editors/curators in carrying out this task in a systematic way. This method can also be used as a guide to manual identification of issues in definitions.

The method consists of the following steps:
# Determine whether any of the terms from either the ontology that is being checked or the imported ontologies appear in the textual definitions.
# Get the taxonomic hierarchy of the matched terms to the top level.
# Determine whether any of the terms in this hierarchy corresponds to one of the relata in the axioms.
# If no correspondence is found between terms in the textual definition and terms in the axioms, look for a correspondence between the relations expressed in the differentiae of the textual definition and the object properties in the axioms. This can also be done by taking into account the hierarchy of object properties (if available).
# If matches are found, tag the corresponding part of the textual definition with the corresponding relation–relatum pair (the tagging could supplement the textual definitions with hyperlinks to the entries of the terms and relations used in the definition).
# If mismatches of this kind are identified, manually correct, modify or complete either the textual definition or the axioms, or both according to the recommendations put forward in this paper.

The proposed method may raise some implementation challenges. For example, the first and fourth steps require natural language processing (NLP) methods to correctly identify existing terms and relations in the textual definition. This involves using methods to find inexact matches, for example, plural forms of terms and partial matches, as when only the head of a complex term is used. Matching ontology relations to natural language expressions can also be challenging, as there can be several ways to express a single ontological relation. A solution for relation identification that also involves NLP methods would be analyzing large amounts of definitions in which each part is matched to the corresponding ontological relation to identify the different corresponding expressions. This solution might reveal domain-specific expressions for the more general ontological relations.

===VI. Other Useful Ways of Checking the Contents of Textual Definitions===
In ontologies, definitions should include only necessary conditions that have the classical all-some form. Thus, they should avoid:
* Particularizing expressions such as ''for example, especially, in particular, i.e., such as, ...'', and punctuation signs such as parentheses and colons. Sometimes, differentiae may contain hidden examples that should also be avoided, as in the definition of ''leukocyte'' above which states ''found in blood or other tissue''. Here, the specification ''blood'' is superfluous since it is embedded in a conjunction of which the other conjunct is its superclass.
* Overly generalizing expressions such as ''etc., in general, normally, ...'', and disjunctions, as these are linguistic markers of conditions that are not necessary.

Although particularizing and generalizing expressions can be useful for a better understanding of the term (as in example (a) above), these kinds of information should be asserted using other annotation properties.

Furthermore, textual definitions should not contain definitions of other terms, as in the definition of ''antiseptic role'' examined above (Figure 4). Thus, they should avoid:
* Punctuation signs such as parentheses and colons, which are also a sign of new definitions.
* Expressions introducing new information such as ''i.e., that is, ...''

The content-related issues presented in this section can be automatically checked with a simple rule-based program that uses, for example, lexico-syntactic patterns. This kind of program can also be used for checking the conformity of the surface form of the definitions to the editorial line of the ontology (if any) [5].

In addition to these ontology-specific recommendations, terminological manuals and guidelines state a number of other general principles and recommendations relating to definition writing [4], [6]–[8].<sup>5</sup>

===VII. Conclusion===
In this communication, we showed through examples that the defining practices in the ontology world lack systematic principles and theory. To fill this gap, we presented some background on textual definitions and axioms in ontologies from the terminologist's and logician's viewpoint, emphasizing their overlapping and complementary functions.

Based on a discussion of various kinds of correspondences between the parts of textual definitions and axioms, we put forward two primary recommendations to improve the contents of both textual definitions and axioms:
* Textual definitions and axioms should, whenever possible, represent the same content. As we hope our examples have indicated, it is frequently possible to do this with the resources of the ontology.
* Neither textual definitions nor axioms should include content that defines another term in the same ontology.

Finally, we proposed an implementable procedure to help systematize content-checking of textual and logical definitions in ontologies.

===VIII. Acknowledgment===
Work on this paper was supported by the Swiss National Science Foundation (SNSF).

===Footnotes===
# http://obofoundry.org/wiki/index.php/FP_006_textual_definitions
# X, traditionally called the ''definiendum'', stands for the defined term's referent; Y for the genus; Z for a differentia; Y and Z together for the definition itself, traditionally called the ''definiens''.
# http://home.web.cern.ch/topics/large-hadron-collider
# TermGenie is used for creating definitions in the Gene Ontology (GO) (http://go.termgenie.org).
# For a list of (terminology) manuals that contain definition writing principles and recommendations, as well as other writings on definitions, see https://sites.google.com/site/definitionsportal/literature.

===References===
* [1] S. Seppälä, "Contraintes sur la sélection des informations dans les définitions terminographiques: vers des modèles relationnels génériques pertinents," Ph.D. dissertation, Département de traitement informatique multilingue (TIM), Faculté de traduction et d'interprétation, Université de Genève, 2012. [Online]. Available: http://archive-ouverte.unige.ch/unige:21874
* [2] S. Seppälä, "An ontological framework for modeling the contents of definitions," Terminology, forthcoming.
* [3] D. Marconi, Lexical Competence. The MIT Press, 1997.
* [4] ISO 704:2009(E), Terminology work – Principles and methods (ISO 704:2009), 3rd ed. Geneva: ISO, 2009.
* [5] S. Seppälä, "Semi-automatic checking of terminographic definitions," in International Workshop on Terminology design: quality criteria and evaluation methods (TermEval), LREC 2006, Genoa, Italy, 2006, pp. 22–27.
* [6] S. Pavel and D. Nolet, Handbook of Terminology. Canada: Public Works and Government Services - Translation Bureau, 2001.
* [7] (2012) The pavel terminology tutorial. [Online]. Available: http://www.bt-tb.tpsgc-pwgsc.gc.ca/btb.php?lang=eng&cont=308
* [8] R. Vézina, X. Darras, J. Bédard, and M. Lapointe-Giguère, La rédaction de définitions terminologiques, ser. Version abrégée et adaptée par Jean Bédard et Xavier Darras. Montréal: Office québecois de la langue française, 2009.
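
Section VI states that the content-related markers it lists could be checked with a simple rule-based program using lexico-syntactic patterns. The following is a minimal sketch of such a checker, not the authors' implementation. The function name `check_definition`, the category labels, and the helper `_phrase_pattern` are hypothetical; the marker lists contain only the examples named in section VI and are not exhaustive. Detection is purely lexical, so every standalone "or" is reported as a possible disjunction.

```python
import re

# Marker lists copied from the examples named in section VI; they are
# illustrative only, not an exhaustive inventory.
PARTICULARIZING = ["for example", "especially", "in particular", "i.e.", "such as"]
GENERALIZING = ["etc.", "in general", "normally"]
NEW_INFORMATION = ["i.e.", "that is"]


def _phrase_pattern(phrases):
    # Lookarounds are used instead of \b because markers such as "i.e."
    # end in a non-word character.
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


# Hypothetical category labels mapped to the pattern that detects them.
RULES = {
    "particularizing expression": _phrase_pattern(PARTICULARIZING),
    "generalizing expression": _phrase_pattern(GENERALIZING),
    "new-information expression": _phrase_pattern(NEW_INFORMATION),
    "parenthesis or colon": re.compile(r"[():]"),
    "disjunction": re.compile(r"(?<!\w)or(?!\w)", re.IGNORECASE),
}


def check_definition(text):
    """Return a sorted list of (category, matched text) pairs found in a definition."""
    findings = set()
    for category, pattern in RULES.items():
        for match in pattern.finditer(text):
            findings.add((category, match.group(0).lower()))
    return sorted(findings)


# Definition (a) of leukocyte, as quoted in section II.
LEUKOCYTE = ("An achromatic cell of the myeloid or lymphoid lineages capable of "
             "ameboid movement, found in blood or other tissue.")

print(check_definition(LEUKOCYTE))  # [('disjunction', 'or')]
```

A flagged marker is only a prompt for the curator: as section VI notes, the flagged information may still be worth keeping in another annotation property.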
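
Steps 1 to 3 of the checking method in section V (find ontology terms in the textual definition, climb their hierarchy, compare with the relata of the axioms) can be sketched as follows. This is a rough illustration under strong assumptions, not the procedure as the authors would implement it: the ontology is reduced to a hypothetical `parent_of` dictionary with a single parent per term, the axioms are reduced to a set of relatum labels, and term matching only tolerates a plural "s" or "es" rather than using real NLP. The function names and the toy data are invented for illustration and are not taken from CL, IDO or IAO.

```python
import re


def find_terms(definition, labels):
    """Step 1: return the ontology labels that occur in the definition.

    Matching is case-insensitive and tolerates a plural "s" or "es";
    partial matches and heads of complex terms are out of scope here.
    """
    found = []
    for label in labels:
        pattern = r"(?<!\w)" + re.escape(label) + r"(?:e?s)?(?!\w)"
        if re.search(pattern, definition, re.IGNORECASE):
            found.append(label)
    return found


def ancestors(term, parent_of):
    """Step 2: walk the subclass hierarchy from a term up to the top level."""
    chain = []
    # The second condition stops the walk if the hierarchy contains a cycle.
    while term in parent_of and parent_of[term] not in chain:
        term = parent_of[term]
        chain.append(term)
    return chain


def match_relata(definition, parent_of, axiom_relata):
    """Step 3: map each matched term to the axiom relatum it corresponds to.

    A term corresponds to a relatum if it is that relatum or one of its
    subclasses; terms without any correspondence map to None.
    """
    labels = set(parent_of) | set(parent_of.values())
    result = {}
    for term in find_terms(definition, sorted(labels)):
        lineage = [term] + ancestors(term, parent_of)
        result[term] = next((t for t in lineage if t in axiom_relata), None)
    return result


# Toy data, invented for illustration only.
PARENT_OF = {
    "blood": "tissue",
    "tissue": "anatomical entity",
    "organism": "material entity",
}
AXIOM_RELATA = {"anatomical entity"}
DEFINITION = "A cell found in blood or other tissues of an organism."

print(match_relata(DEFINITION, PARENT_OF, AXIOM_RELATA))
```

Terms that map to `None` are the candidates for step 4 (matching relations instead of relata) or for the manual review of step 6.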