=Paper=
{{Paper
|id=Vol-1515/demo7
|storemode=property
|title=Highly literate ontologies
|pdfUrl=https://ceur-ws.org/Vol-1515/demo7.pdf
|volume=Vol-1515
|dblpUrl=https://dblp.org/rec/conf/icbo/LordW15
}}
==Highly literate ontologies==
Phillip Lord∗ and Jennifer D. Warrender

School of Computing Science, Newcastle University, Newcastle-upon-Tyne, UK

∗ To whom correspondence should be addressed: phillip.lord@newcastle.ac.uk

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

===ABSTRACT===
There is still a lot of discussion about exactly what ontologies should represent, but what is generally agreed is that they formalise and relate to some relatively complex areas of knowledge. While ontology environments allow rich descriptions of the relationship between the entities inside the ontology (because this is what an ontology is), they often do not provide the same rich environment to describe the knowledge that they represent.

OWL does, for instance, support annotations, which allow an ontology developer to add comments to many parts of the ontology. But these comments do not contain markup, sectioning or any of the standard facilities authors use when writing documents.

Our solution to this builds on Tawny-OWL, our highly programmatic environment for ontology development. This provides a rich environment, which allows abstraction, automation and extension, while still being entirely textual. As a result, it is possible to integrate this form of ontology with similar textual environments for documentation such as LaTeX or AsciiDoc. We call the result a literate ontology, in reference to literate programming. The result can be "tangled" to produce either a document or an ontology.

However, manipulating mixed-syntax formats is difficult. Generally, the text editor supports either the literate form or the programmatic (ontology) form best. To address this, we have developed what we call "lenticular views": essentially, the source code can be presented in either an ontology-centric or a document-centric view. Either form can be changed, giving the author a powerful and unique environment for creating literate ontologies or, alternatively, semantic documents where the ontology formalises the document. We demonstrate this with our literate amino-acid ontology, which is also a part of the developing manual for Tawny-OWL.

===1 INTRODUCTION===
Ontologies have been used extensively to describe many parts of biology. Ontologies have two key features which make their usage attractive. First, they provide a mechanism for standardising and sharing the terms used in descriptions, making comparison easier and, secondly, they provide a computationally amenable semantics to these descriptions, making it possible to draw conclusions about the relationships between descriptions even when they share no terms in common.

Despite these advantages, the oldest and most common form of description in biology is free text, or a semi-structured representation through the use of a standardised fill-in form. Free text has numerous advantages compared to ontologies: it is richly expressive, is widely supported by tooling, and while the form of language used in science ("Bad English" (Wood et al., 2001)) may not be easy to use, understand or learn, it is widely taught and most scientists are familiar with it.

The two forms of description have largely been used independently. Ontology terms are sometimes used in semi-structured formats such as a UniProt record, or minimum information documents. While these use ontologies in some parts of the document, in general, ontology terms and the free text are in different parts of the record. In this paper, we show how we can integrate ontological and textual knowledge in a single authoring environment and describe how we are applying this to describing amino acids.

===2 DEVELOPING KNOWLEDGE===
First, we ask the question: why is it difficult to relate ontological and textual descriptions? One possible explanation is that the two forms have very different "development environments". The main documentation environment used within science is Word, followed by LaTeX, common in more mathematical environments. More recently, there has also been interest in various light-weight markup languages, such as markdown, and their associated tool-chains.

Ontology development environments also come in many different forms. Early versions of the Gene Ontology, for instance, used a bespoke text file format and a text editor, an approach rather similar to the light-weight markup languages of today. This had the significant advantage of a low technological barrier to entry. More modern environments provide a much more graphical interface. These generally provide a much richer way of interacting with an ontology.

While these environments add a lot of value, they do not necessarily integrate well with text. Both Protégé and OBO-Edit have a class-centric view and are biased toward showing the various logical entities in the ontology, as opposed to the textual aspects. Indeed, this bias is shown even at the level of OWL. For example, annotations on an entity (or rather an axiom) are a set rather than a list, while ordering is generally considered to be essential for most documents.

With this divergence of development environments, it seems hard to understand how we could square the circle of combining text and ontology development. Next, we describe the amino-acid ontology and how the novel development methodology we used for this ontology allows us to achieve this.

===3 TAWNY-OWL===
Tawny-OWL (Lord, 2013) provides a fully programmatic environment for development. Simple ontological statements can be written with a syntax inspired by Manchester OWL notation (Horridge and Patel-Schneider, 2012); repetitive statements can be built automatically by writing functions which encapsulate and abstract over the simpler statements, a process we call "patternisation" (Warrender and Lord, 2013). In this way, we have managed to combine the advantages of text-based environments for editing ontologies, i.e. the use of a standard editing environment and integration with version control, while maintaining (and in some ways surpassing) the power of tools like Protégé.

Tawny-OWL can be used to generate any ontology, but we demonstrate it here with the amino-acid ontology: a highly patternised ontology with over 430 classes generated from one pattern. We consider next the implications that this has for the ability to integrate ontological and textual descriptions.

===4 LITERATE ONTOLOGY===
As Tawny-OWL is based on a full programming language, it supports a feature which at first seems quite inconsequential: comments. As with almost every programming language, it is possible to add free, unstructured text to the same source code that defines the ontology. While opinions vary on the role of comments in programmatic code, perhaps the most extreme is that of literate programming (Knuth, 1984), which suggests that code should be usable both as a program capable of execution and as a document capable of reading, and that neither view should have primacy.

Literate programming can be difficult, however, partly because the editing environment offers few facilities for it: fundamentally, supporting mixed-syntax text in a tool is a difficult task. Our solution uses a multi-view approach to editing, which allows the author to see her source code in either a document-centric or an ontology-centric view. We call this approach lenticular text, named after lenticular printing, which produces images that change depending on your angle of viewing. This is an entirely novel solution to literate programming as it effectively performs the tangling operation for the author as they type. A representation of the two views is shown in Figures 1 and 2. The two views, it should be noted, contain the same text but are syntactically different, such that the document-centric view is entirely valid LaTeX code, while the ontology-centric view is valid Tawny-OWL code.

<pre>
First, to explain the domain. Proteins are polymers
made up from amino-acid monomers. They consist of a
central carbon atom, attached to a carboxyl group (the
‘‘acid’’ amino) and amine group (the ‘‘amino’’ group)
a hydrogen and an R group. The R group defines the
different amino acids. The different R groups have
different physical or chemical properties, such as
their degree of hydrophobicity. We call these different
characteristics |RefiningFeatures|.

\begin{tawny}
(defclass AminoAcid)
(defclass RefiningFeature)
(defclass PhysicoChemicalProperty :super RefiningFeature)
\end{tawny}
</pre>
Fig. 1. The document-centric view

<pre>
;; First, to explain the domain. Proteins are polymers
;; made up from amino-acid monomers. They consist of a
;; central carbon atom, attached to a carboxyl group (the
;; ‘‘acid’’ amino) and amine group (the ‘‘amino’’ group)
;; a hydrogen and an R group. The R group defines the
;; different amino acids. The different R groups have
;; different physical or chemical properties, such as
;; their degree of hydrophobicity. We call these different
;; characteristics |RefiningFeatures|.

;; \begin{tawny}
(defclass AminoAcid)
(defclass RefiningFeature)
(defclass PhysicoChemicalProperty :super RefiningFeature)
;; \end{tawny}
</pre>
Fig. 2. The ontology-centric view

We have now implemented lenticular text for the editor Emacs (https://www.gnu.org/software/emacs/), in a package called "lentic" (https://github.com/phillord/lentic). A key feature of this implementation is that both views exist simultaneously in Emacs, and provide all the features of the appropriate development environment; for example, "tab-completion" works in both the document-centric view (completing LaTeX macros) and in the ontology-centric view (completing ontology identifiers). We can launch a compilation of the document-centric view (producing a PDF), or evaluate our ontology, perhaps reasoning over it, in the ontology-centric view. Therefore, we have achieved a key aim of literate programming: neither view holds primacy and the author can edit either.

===5 DISCUSSION===
In this paper, we have described our methodology for integration of text and ontological statements at authoring time, using lenticular text to enable literate ontology development. Indeed, we have fully documented the whole of the amino-acid ontology in literate form (https://github.com/phillord/tawny-tutorial).

The combination of Tawny-OWL and lenticular text is an extremely rich environment. We are aware, however, that it is a specialist environment. To make full use of Tawny-OWL, the author needs to use a Clojure-based development environment, document authoring in LaTeX, and the lentic package, which is Emacs-based. In reality, though, the tools are not tightly coupled: we have alternatives beyond LaTeX, Emacs, or even Tawny-OWL. At the same time, one output form of a literate ontology is a readable PDF document, something far more familiar to biologists or medics than Protégé or any ontology development environment.

===ACKNOWLEDGEMENTS===
This work was supported by Newcastle University.

===REFERENCES===
Horridge, M. and Patel-Schneider, P. F. (2012). OWL 2 Web Ontology Language Manchester Syntax (Second Edition). Technical report.

Knuth, D. E. (1984). Literate programming. The Computer Journal, 27, 97–111.

Lord, P. (2013). The Semantic Web takes Wing: Programming Ontologies with Tawny-OWL. OWLED 2013.

Warrender, J. D. and Lord, P. (2013). A pattern-driven approach to biomedical ontology engineering. SWAT4LS 2013.

Wood, A., Flowerdew, J., and Peacock, M. (2001). International scientific English: The language of research scientists around the world. Research Perspectives on English for Academic Purposes, pages 71–83.
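
The relationship between the document-centric and ontology-centric views described in Section 4 can be illustrated with a small sketch. This is not the lentic implementation (lentic is an Emacs package and keeps two live buffers in step as the author types); it is a whole-text conversion written in Python purely for illustration. The function names <code>to_ontology_view</code> and <code>to_document_view</code> are hypothetical, and the sketch assumes that code is delimited by <code>\begin{tawny}</code> and <code>\end{tawny}</code> lines and that everything else is commented with <code>;; </code>, as the two figures suggest.

```python
# Illustrative sketch only: names and structure are assumptions, not lentic's API.
BEGIN = r"\begin{tawny}"   # opens an ontology block in the document view
END = r"\end{tawny}"       # closes an ontology block in the document view
COMMENT = ";; "            # Clojure line-comment prefix used in the ontology view


def to_ontology_view(document_view: str) -> str:
    """Comment out prose and delimiter lines; leave ontology code untouched."""
    lines = []
    in_code = False
    for line in document_view.splitlines():
        if in_code and line.strip() == END:
            in_code = False  # the closing delimiter itself is commented out
        lines.append(line if in_code else COMMENT + line)
        if not in_code and line.strip() == BEGIN:
            in_code = True   # lines after the opening delimiter are code
    return "\n".join(lines)


def to_document_view(ontology_view: str) -> str:
    """Inverse transformation: strip the comment prefix outside code blocks."""
    lines = []
    in_code = False
    for line in ontology_view.splitlines():
        # body is the line without its comment prefix, or None if uncommented
        body = line[len(COMMENT):] if line.startswith(COMMENT) else None
        if in_code and body is not None and body.strip() == END:
            in_code = False
        if in_code:
            lines.append(line)
        else:
            lines.append(body if body is not None else line)
            if body is not None and body.strip() == BEGIN:
                in_code = True
    return "\n".join(lines)
```

Because each function undoes the other, text written in one view can be recovered exactly from the other, which is the property that lets neither view hold primacy. A real implementation would also need to apply such a transformation incrementally on each edit rather than over the whole text.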