Facets, Tiers and Gems: Ontology Patterns for Hypernormalisation

Phillip Lord 1∗ and Robert Stevens 2
1 School of Computing Science, Newcastle University, Newcastle-upon-Tyne
2 School of Computer Science, University of Manchester, Manchester
∗ To whom correspondence should be addressed: phillip.lord@newcastle.ac.uk

ABSTRACT
There are many methodologies and techniques for easing the task of ontology building. Here we describe the intersection of two of these: ontology normalisation and fully programmatic ontology development. The first of these describes a standardized organisation for an ontology, with singly inherited self-standing entities, and a number of small taxonomies of refining entities. The former are described and defined in terms of the latter and used to manage the polyhierarchy of the self-standing entities. Fully programmatic development is a technique where an ontology is developed using a domain-specific language within a programming language, meaning that as well as defining ontological entities, it is possible to add arbitrary patterns or new syntax within the same environment. We describe how new patterns can be used to enable a new style of ontology development that we call hypernormalisation.

1 INTRODUCTION
Building ontologies is a difficult and time-consuming business for a number of reasons. From an abstract point of view, knowledge about the domain can be difficult to gather, to understand and to represent ontologically. More immediately, ontologies, especially those with a complex representation, can be taxing to describe and define consistently, and to update, expand or change when that representation needs to change.

There have been numerous attempts to simplify and clarify this process, including the development of methodologies such as OntoClean, which defines a set of meta-properties that can inform ontological modelling (Guarino and Welty, 2002), and upper ontologies such as DOLCE or BFO (Grenon et al., 2004), which provide a pre-made upper classification.

Another approach that can leverage both of these techniques is ontology normalisation (Rector, 2002). Originally intended as a mechanism for “untangling” existing hierarchies or classifications being reused as the basis for an ontology, it also has significant use as a pattern for building ontologies de novo.

Broadly, a normalised ontology is defined using a skeleton that is a strict tree (i.e. not an acyclic graph) of concepts differentiated using an inheritance (i.e. not a partonomy) relationship. These are further split into: a set of self-standing entities in which children are disjoint from each other, but do not cover the parent; and partitioning or refining concepts that form closed, covering and disjoint hierarchies. Building an ontology in this way allows the ontology developer to exploit the reasoner to build a polyhierarchy by using classes that define the self-standing entity in terms of the refining partitions. Polyhierarchies are difficult to build manually, as human ontology developers, no matter how good their domain knowledge, find it hard to ensure all possible parents of an entity are taken into account. The normalisation approach uses defined classes and reasoning to remove this chore. Creating the tree of self-standing entities, however, still remains a task for the developer. The normalisation approach can significantly increase the robustness and reduce the work of manual maintenance (Wroe et al., 2003). In this latter form, ontological normalisation has been widely, if implicitly, used.

[Figure 1 legend: Top Level, Self-Standing Entities, Refining Type. Node labels: Body Substance, Person Role, Value Types, Protein, Steroid, Organic Ion, Care Role, Patient Role, Age Type, Sex, Doctor Role, Nurse Role, Adult, Child, Male, Female.]
Fig. 1. A normalised ontology slightly modified from Rector (2002). The graph does not necessarily reflect subsumption, see text for details.

While the term “ontology normalisation” has been borrowed, somewhat metaphorically, from database engineering, the process of building ontologies using a set of standard design patterns has a rather more direct relationship to the software engineering equivalent. By reusing a standard set of patterns, it is possible to build an ontology both rapidly and consistently. This has manifested itself in a number of different ways, with a number of different tools, such as TermGenie (Dietze et al., 2014) or Populous (Jupp et al., 2011), which can generate ontologies according to a pattern.

We have previously described a fully programmatic methodology for ontology development (Lord, 2013), using the Tawny-OWL environment. This is built around the programming language Clojure and enables the ontology to take advantage of all the features of a programming language and its environment, including unit testing (Warrender and Lord, 2015), build, evaluation and, of course, pattern-driven development by simple use of functions (Warrender and Lord, 2013). With respect to patterns, this environment has several advantages. First, and unlike tools such as Populous and OPPL (Egana Aranguren et al., 2009), patterns are developed in the same environment and syntax as simple ontology concepts; it is, therefore, as easy to define a pattern as it is to define a class. Second, being based on Clojure, a language which is homoiconic and has very little syntax of its own, it is possible to build arbitrary syntactic constructions to represent patterns in a way that is both convenient and attractive to the developer.

In this paper, we describe an extension of the normalisation technique that we call hypernormalisation. This technique is typified by the (near or complete) absence of asserted hierarchy among the self-standing entities. We describe how this allows construction of an exemplar ontology of amino-acids (Stevens and Lord, 2012). We then move on to describe recent developments
in the Tawny-OWL environment, including the definition of two new design patterns, the tier and the facet, and one syntactic abstraction, the gem, which can be used to enable hypernormalised ontology development. Finally, we discuss the application of this approach to other ontologies.

2 HYPERNORMALISATION AND AMINO ACIDS
Normalisation is a methodology that aims to disentangle an ontological structure, in the process managing the maintainability, utility and expressivity of the ontology generated. To achieve this, the ontology is split into two main hierarchies: self-standing entities and refining types; see Figure 1 for an example. The self-standing hierarchy contains entities with a central hierarchy or skeleton. In this part of the ontology, we would expect the hierarchy to contain levels that are not exhaustive; that is, the children do not cover the parents, and parents are not closed to new children. This is contrasted with the refining hierarchy, which consists of classes that are exhaustive; in many cases, children will be non-overlapping and, therefore, disjoint. This is not to say that the refining type hierarchies are necessarily complete: in Figure 1, for example, the representation of Sex is too simple for many medical uses, but might be sufficient for a customer relations system. In general, the self-standing entities will be defined in terms of the refining types, while polyhierarchical relationships between the self-standing entities will be determined through use of a reasoner.

This form of ontology development is quite different from an upper ontology and agnostic to the choice of upper ontology or none. Rector (2002) suggests only that self-standing entities and refining types should be “made clear by some mechanism”; in OWL, this could be an upper ontological term or an annotation.

We next introduce the amino-acid ontology, used here as an exemplar, which defines the biological amino-acids in terms of the physicochemical properties most relevant to their biological role. It is a structurally interesting ontology because it is normalised, with a clear and clean separation between the self-standing entities and the five refining concepts. It is rather more than this, though; the self-standing entities are split into only three sets: the amino-acids themselves (e.g. Alanine); a (very large) set of defined classes describing the refined types of amino-acid (e.g. Small Neutral Amino Acid); and, finally, the single class Amino Acid. Or, stated alternatively, it contains no skeleton hierarchy at all, and all relationships between the self-standing classes are arrived at through reasoning. This is particularly relevant for the amino acid ontology as it contains over 500 defined classes, with subsumption relationships to the amino acids and between themselves. Maintaining this form of ontology by hand would be impractical.

[Figure 2 legend: Top Level, Self-Standing Entities, Refining Type, Defined Class. Node labels: Amino Acid; Alanine, Arginine, 18 more; Size, Charge, Hydro., Polarity, SideChain; Small AA, S. Neutral AA, S. N. Aliphatic AA, 500 more; Tiny, Small, Large.]
Fig. 2. A hypernormalised ontology representing the amino-acids using the same terminology as Figure 1. Some labels have been abbreviated.

We call this style of ontology development hypernormalised. We believe that it is a natural extension of normalisation. Rector notes, for example, that the choice of aspect to form the skeleton is “to some degree arbitrary”, but that the aspects chosen should be rigid (after OntoClean (Guarino and Welty, 2002)) and pragmatically stable (i.e. unlikely to change during the evolution of the ontology). Both of these are, however, true for all the refining concepts in the amino-acids. In short, not only is the choice of skeleton arbitrary, it is actually unnecessary and brings no further utility to the ontology than that which can be achieved by use of reasoning.

We note that the distinction between normalisation and hypernormalisation is not absolute, but one of degree; we are simply describing the tendency toward an ontology with a flat asserted hierarchy.

Having introduced the notion of a hypernormalised ontology, we next consider a set of new patterns in Tawny-OWL that enable this style of ontology development.

3 PATTERNISING AND TAWNY-OWL
The Tawny-OWL environment (Lord, 2013) and its ability to support patterns (Warrender and Lord, 2013) have been described elsewhere in detail; here, we provide a quick overview, so that the rest of the paper is clear. Tawny-OWL is implemented as a DSL (domain-specific language) in Clojure, which is a Lisp-like language implemented in Java and running on the Java virtual machine. Tawny-OWL itself wraps the OWL API (Horridge and Bechhofer, 2011); this is the same library that underpins Protégé, and from it, Tawny-OWL gains much of its functionality. Simple sections of the ontology can be generated using a syntax based on a “lispified” version of Manchester OWL Notation; for example, the following code:

    (defclass A
      :super B)

This declares a new class A that has the pre-existing class B as a superclass [1], which in Manchester OWL notation would be expressed as:

    Class: o:A
        SubClassOf:
            o:B

[1] See Lord (2014) for an explanation of why :super is used rather than :subclass.

This code is entirely valid Clojure and can be evaluated in any Clojure environment, such as CIDER/Emacs or Cursive/IntelliJ. It is also possible to define new patterns; for example, the following pattern definition:

    (defn some-only [property & clazzes]
      (list (some property clazzes)
            (only property (or clazzes))))

defines the some-only pattern, which generates a set of existential restrictions and one universal restriction with the union of the existential fillers as its filler; this implements the ontological closure pattern. This is a function definition in Clojure terms: defn introduces the function, property & clazzes is the argument list, some, only and or are functions provided by Tawny-OWL, and list returns, prosaically, a list [2]. Critically, it is possible to define this pattern in the same environment, or side-by-side in the same file, as a simple class definition; with Tawny-OWL it is as easy to define and use a new pattern as it is to define a class. Ontologies such as the Karyotype ontology make extensive use of this facility, moving freely between ontology and pattern definitions, as well as literal data structures, utility functions and unit tests (Warrender and Lord, 2013).

[2] The function shown here is a slightly simplified version of one provided in Tawny-OWL.

Tawny-OWL is now a mature and used software product; the first alpha release of Tawny-OWL was in Nov 2012 and the first full release in Nov 2013, followed by four point releases to 2016. This paper describes mostly the upcoming v2.0 release, although some of the features described were available in earlier versions.

4 THE VALUE PARTITION
A common pattern for building a normalised ontology is called the value partition. This pattern (Rector, 2005) addresses the problem of the ontological modelling of a continuous range. For example, in modelling the amino-acids, we can consider the concept of Size; this could be described directly using the molecular weight of the amino-acid. However, for the purpose of the amino-acids, it is both easy and general practice to split size into three categories: tiny, small and large. In Tawny-OWL, this can be achieved straightforwardly using the defpartition function [3].

    (defpartition Size
      [Tiny Small Large]
      :domain AminoAcid
      :super PhysioChemicalProperty)

[3] For those with knowledge of Lisp, this is actually a macro; the main implementation is in the value-partition function. Tawny-OWL provides support for implementing syntactic macros whose function is simply to allow the use of bare symbols. For those without knowledge of Lisp, the distinction is not important!

Axiomatically, this expands into: a class Size; three subclasses, Tiny, Small and Large; and a property hasSize. The property is functional, and has a range of Size and a domain of AminoAcid. Expanded, this would be expressed as follows [4]:

    Class: o:Large
        SubClassOf:
            o:Size

    Class: o:Size
        EquivalentTo:
            o:Large or o:Small or o:Tiny
        SubClassOf:
            o:PhysioChemicalProperty

    Class: o:Small
        SubClassOf:
            o:Size

    Class: o:Tiny
        SubClassOf:
            o:Size

    DisjointClasses:
        o:Large,o:Small,o:Tiny

[4] Tawny-OWL also adds annotations, which have been elided.

The subclasses are disjoint and cover the parent. Following the terminology from Rector (2002), the value partition is useful for defining partitioning or refining concepts.

5 THE TIER
The value partition is a pattern aimed at a specific purpose: segmenting a continuous range. In practice, though, we have found that the axiomatization of this pattern is more generally useful. For example, considering the amino-acid ontology, it is natural to model the chemistry of the side-chain as such:

    (defpartition SideChainStructure
      [Aromatic Aliphatic]
      :domain AminoAcid
      :super PhysioChemicalProperty)

While this is intuitive, ontologically, SideChainStructure is actually of a very different form from Size, as it does not reflect a spectrum. Either the side-chain contains a benzene ring, making it aromatic, or it does not. This form of partition was also noted in Rector (2002), which includes the classes Male and Female; these do not form a spectrum, at least in this simplified representation.

We introduce here, therefore, the more general notion of the tier: a small set of concepts in a one-deep hierarchy. The tier function supports a range of options:

    (deftier Charge
      [Positive Neutral Negative]
      :domain AminoAcid
      :super PhysioChemicalProperty
      :suffix true)

The use of :suffix true causes a simple change to the naming of the entities: Positive will become PositiveCharge, which would be expanded as follows:

    Class: o:PositiveCharge
        SubClassOf:
            o:Charge

Other names are modified equivalently. By default, this will manifest when referring to the class in the Tawny-OWL environment, in the IRI of the concept when serialized as OWL, and in the value of an annotation on the concepts [5]. In addition to naming, it is also possible to make optional whether the subclasses are disjoint, whether they cover the parent, whether the property is functional, and whether the property is created at all.

[5] The duplication between the annotation and the IRI fragment is there because of IRI schemes such as numeric-style OBO IDs; annotations have been elided for brevity.

The tier is a more general pattern than the value partition; in fact, in the current version of Tawny-OWL, the latter is defined in terms of the former.

6 THE FACET
Both the value partition and tier introduce a new object property named after the tier, and with a range limited to the classes defined within the tier. The converse is also true; where we use one of the tier classes, such as PositiveCharge, it is most likely that we wish to use it with the hasCharge property defined as part of the tier. Taken together, we describe the combination of classes and a property as a facet. Facets are a well-known technique, first proposed in a library classification (the Colon Classification (Ranganathan, 1933), named after the use of “:” as a separator). They are now commonplace, as seen with the facetted browsers used by many websites for navigation of complex product catalogues.

Tawny-OWL provides explicit support for facets, allowing the association of a property and a set of classes, as demonstrated by the following code:

    (as-facet
      hasCharge
      Positive Neutral Negative)

The practical implication of this is that we can now use the facet function to return an existential restriction by providing just a class. We can express this programmatically; for example, we might use the assert macro provided by Clojure.

    (assert
      (= (some hasCharge Positive)
         (facet Positive)))

By itself, this ability is only slightly more succinct. However, when used with multiple facetted classes, the advantages become considerably clearer, as can be shown by the following assertion.

    (assert
      (= (list (some hasCharge Neutral)
               (some hasHydrophobicity Hydrophobic)
               (some hasPolarity NonPolar)
               (some hasSideChainStructure Aliphatic)
               (some hasSize Tiny))
         (facet Neutral Hydrophobic NonPolar Aliphatic Tiny)))

In addition to succinctness, this pattern also reduces the risk of errors; a class such as Tiny will always be used with its correct property. Without the use of facets, the ontology developer must achieve this by hand. It would also be possible to detect the error using reasoning, although this will only succeed if appropriate range and disjoint restrictions are in the ontology. The defpartition and deftier functions, of course, both add these range and disjoint restrictions and declare their classes as facets of their properties.

7 THE GEM
Finally, we define the gem, which provides a syntactic abstraction for a class composed entirely or mainly from facets. Following the terminology from Rector (2002), this abstraction would be useful mostly for self-standing concepts. For example, we could define the amino acid alanine using the following defgem statement.

    (defgem Alanine
      :comment "An amino acid with a single methyl group as a side-chain."
      :facet Neutral Hydrophobic NonPolar Aliphatic Tiny)

The other amino-acids can likewise be defined as a series of gems. In fact, the amino acids are so regular, all having the same five facets, that we use a further syntactic abstraction specific to the amino-acid ontology: a form of pattern that we describe as localized (Warrender, 2015). The gem, by contrast, represents generalised syntax useful for developing any ontology.

8 ON ANNOTATION
We have previously discussed the relationship between a design methodology such as normalisation and the use of an upper ontology. The Tawny-OWL patterns described here are all orthogonal and agnostic to the choice of an upper ontology or to none. They do not place their entities in any particular part of the class hierarchy nor define classes outside of those required for the domain ontology, although they could be easily extended to do so should the ontology developer require.

However, we agree with Rector (2002) that the use of patterns should be “made clear” and be explicit within the ontology. For this reason, all of the patterns described here also make use of annotations, using annotation properties defined in Tawny-OWL's own internal annotation ontology. For example, all entities generated as a result of a pattern such as deftier are explicitly annotated as such. This means that the use of these patterns is (informally) explicit in the OWL serialization. Tawny-OWL actually uses these annotations internally, for example, to enable the facet functionality by providing a relationship between the classes and the appropriate object property. This is strictly an implementation detail and could have been achieved without annotations; however, we believe that it shows the value of having this knowledge explicit in OWL.

9 DISCUSSION
In this paper, we describe how we have used Tawny-OWL to provide higher-level patterns which can be applied to ontology development. The patterns provide both functionality and syntactic abstraction over the underlying OWL implementation. In the process, they enable the easy and accurate construction of ontologies.

More specifically, we demonstrate two new patterns: the tier and the facet. The tier is an extension of the existing value partition pattern and can be used for the generation of many small hierarchies that can be used as refining properties. The facet borrows from the library sciences notion of a facetted classification, and is used to associate a set of classes with a specific set of values. This form of classification is very common on the web; the majority of web stores, for example, offer facetted browsing, often with the facets changing for different subsections of the catalogue.

Taken together, these two patterns enable a new form of ontology development, hypernormalisation, which is an extreme form of normalisation. In this form of normalisation, we do away with the creation of a tree of self-standing entities and instead rely on the reasoner to build all the hierarchy. As well as making the ontologist’s task easier, it makes the characteristic that would have been used to create the tree of self-standing entities explicit in the form of a refining characteristic. Here, we have described the application of this methodology to the exemplar amino-acid ontology. Of course, it is dangerous to extrapolate to generality from an exemplar, but we have also started to apply hypernormalisation to ontologies of other, more real, domains including clouds (in the meteorological sense), cell lines and a reworking of the Gene Ontology. The tier has been made generic; it does not require, for example, that all refining types are closed (i.e. all possibilities are known in advance) nor disjoint.

Clearly, not all forms of ontology will naturally be represented in a hypernormalised form. For example, the Karyotype ontology (Warrender and Lord, 2013) is far from this form; here, we define the self-standing concepts and then use reasoning over a set of defined classes which effectively operate as facets (Warrender and Lord, 2015). However, the popularity of facetted browsers shows that it is possible to use this form of classification in many areas. We believe that the introduction of the concept of hypernormalisation and the implementation of it in Tawny-OWL could have significant implications for the future development of ontologies.

REFERENCES
Dietze, H., Berardini, T. Z., Foulger, R. E., Hill, D. P., Lomax, J., Osumi-Sutherland, D., Roncaglia, P., and Mungall, C. J. (2014). TermGenie - a web application for pattern-based ontology class generation. Journal of Biomedical Semantics, 5(1), 48.
Egana Aranguren, M., Stevens, R., and Antezana, E. (2009). Transforming the axiomisation of ontologies: The ontology pre-processor language. Nature Precedings.
Grenon, P., Smith, B., and Goldberg, L. (2004). Biodynamic ontology: applying BFO in the biomedical domain. Stud Health Technol Inform, 102, 20–38.
Guarino, N. and Welty, C. (2002). Evaluating ontological decisions with OntoClean. Commun. ACM, 45(2), 61–65.
Horridge, M. and Bechhofer, S. (2011). The OWL API: A Java API for OWL Ontologies. Semantic Web Journal, 2.
Jupp, S., Horridge, M., Iannone, L., Klein, J., Owen, S., Schanstra, J., Wolstencroft, K., and Stevens, R. (2011). Populous: a tool for building OWL ontologies from templates. BMC Bioinformatics, 13(Suppl 1), S5.
Lord, P. (2013). The Semantic Web takes Wing: Programming Ontologies with Tawny-OWL. OWLED 2013.
Lord, P. (2014). Manchester syntax is a bit backward. http://www.russet.org.uk/blog/2985.
Ranganathan, S. (1933). Colon Classification.
Rector, A. (2005). Representing specified values in OWL: “value partitions” and “value sets”. W3C Working Group Note.
Rector, A. L. (2002). Normalisation of ontology implementations: Towards modularity, re-use, and maintainability. Proceedings Workshop on Ontologies for Multiagent Systems (OMAS) in conjunction with European Knowledge Acquisition Workshops. Siguenza, Spain.
Stevens, R. and Lord, P. (2012). Semantic publishing of knowledge about amino acids. http://ceur-ws.org/Vol-903/paper-06.pdf.
Warrender, J. (2015). The Consistent Representation of Scientific Knowledge: Investigations into the Ontology of Karyotypes and Mitochondria. Ph.D. thesis, School of Computing Science, Newcastle University.
Warrender, J. and Lord, P. (2013). A pattern-driven approach to biomedical ontology engineering. SWAT4LS 2013.
Warrender, J. D. and Lord, P. (2013). The Karyotype Ontology: a computational representation for human cytogenetic patterns. Bio-Ontologies 2013.
Warrender, J. D. and Lord, P. (2015). How, What and Why to test an ontology.
Wroe, C., Stevens, R., Goble, C., and Ashburner, M. (2003). A methodology to migrate the Gene Ontology to a description logic environment using DAML+OIL. Pacific Symposium on Biocomputing.
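
SUPPLEMENTARY SKETCH: FACETS AND GEMS AS PLAIN DATA
The facet lookup of Section 6 and the gem of Section 7 can be illustrated without Tawny-OWL at all. The following is a minimal sketch in plain Clojure, not the Tawny-OWL implementation: keywords stand in for OWL classes and properties, vectors of the form [:some property class] stand in for existential restrictions, and the names tiers, facet-property, facet-restrictions and gem are hypothetical, chosen only for this illustration. It assumes that each tier class belongs to exactly one property, as is the case when classes are registered through a tier.

```clojure
;; Plain-Clojure illustration only; nothing here is Tawny-OWL API.

(def tiers
  "Hypothetical tier table: property -> the classes defined in its tier."
  {:hasCharge             [:Positive :Neutral :Negative]
   :hasSize               [:Tiny :Small :Large]
   :hasSideChainStructure [:Aromatic :Aliphatic]})

(def facet-property
  "Inverted index: tier class -> the property it is a facet of."
  (into {} (for [[property classes] tiers
                 cls classes]
             [cls property])))

(defn facet-restrictions
  "Returns one [:some property class] vector per class, looking the
  property up from the class. Throws if a class is not a known facet."
  [& classes]
  (mapv (fn [cls]
          (if-let [property (facet-property cls)]
            [:some property cls]
            (throw (ex-info "Not a facet class" {:class cls}))))
        classes))

(defn gem
  "Builds a class description whose superclasses are all facet restrictions."
  [entity & classes]
  {:entity entity
   :super  (apply facet-restrictions classes)})
```

With these definitions, (gem :Alanine :Neutral :Aliphatic :Tiny) returns a map whose :super value is [[:some :hasCharge :Neutral] [:some :hasSideChainStructure :Aliphatic] [:some :hasSize :Tiny]]. Because the property is always derived from the class, a class such as :Tiny cannot be paired with the wrong property, and an unregistered class fails immediately rather than producing a silently incorrect restriction.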