=Paper= {{Paper |id=Vol-1515/demo7 |storemode=property |title=Highly literate ontologies |pdfUrl=https://ceur-ws.org/Vol-1515/demo7.pdf |volume=Vol-1515 |dblpUrl=https://dblp.org/rec/conf/icbo/LordW15 }} ==Highly literate ontologies== https://ceur-ws.org/Vol-1515/demo7.pdf
Highly Literate Ontologies
Phillip Lord∗ and Jennifer D. Warrender
School of Computing Science, Newcastle University, Newcastle-upon-Tyne, UK




ABSTRACT

There is still a lot of discussion about exactly what ontologies should represent, but it is generally agreed that they formalise and relate to relatively complex areas of knowledge. While ontology environments allow rich descriptions of the relationships between the entities inside the ontology (because this is what an ontology is), they often do not provide an equally rich environment for describing the knowledge that they represent.

OWL does, for instance, support annotations, which allow an ontology developer to add comments to many parts of the ontology. But these comments do not contain markup, sectioning or any of the standard facilities that authors use when writing documents.

Our solution builds on Tawny-OWL, our highly programmatic environment for ontology development. This provides a rich environment which allows abstraction, automation and extension, while still being entirely textual. As a result, it is possible to integrate this form of ontology with similar textual environments for documentation, such as LaTeX or AsciiDoc. We call the result a literate ontology, in reference to literate programming. The result can be “tangled” to produce either a document or an ontology.

However, manipulating mixed-syntax formats is difficult. Generally, a text editor supports either the literate form or the programmatic (ontology) form well, but not both. To address this, we have developed what we call “lenticular views”: essentially, the source code can be presented either in an ontology-centric or in a document-centric view. Either form can be changed, giving the author a powerful and unique environment for creating literate ontologies or, alternatively, semantic documents in which the ontology formalises the document. We demonstrate this with our literate amino-acid ontology, which is also part of the developing manual for Tawny-OWL.

1 INTRODUCTION

Ontologies have been used extensively to describe many parts of biology. Ontologies have two key features which make their usage attractive. First, they provide a mechanism for standardising and sharing the terms used in descriptions, making comparison easier; second, they provide computationally amenable semantics for these descriptions, making it possible to draw conclusions about the relationships between descriptions even when they share no terms in common.

Despite these advantages, the oldest and most common form of description in biology is free text, or a semi-structured representation through the use of a standardised fill-in form. Free text has numerous advantages compared to ontologies: it is richly expressive and widely supported by tooling, and while the form of language used in science (“Bad English” (Wood et al., 2001)) may not be easy to use, understand or learn, it is widely taught and most scientists are familiar with it.

The two forms of description have largely been used independently. Ontology terms are sometimes used in semi-structured formats, such as a UniProt record or minimum information documents. While these use ontologies in some parts of the document, in general the ontology terms and the free text are in different parts of the record. In this paper, we show how we can integrate ontological and textual knowledge in a single authoring environment and describe how we are applying this to describing amino acids.

2 DEVELOPING KNOWLEDGE

First, we ask why it is difficult to relate ontological and textual descriptions. One possible explanation is that the two forms have very different “development environments”. The main documentation environment used within science is Word, followed by LaTeX, which is common in more mathematical fields. More recently, there has also been interest in various lightweight markup languages, such as Markdown, and their associated tool-chains.

Ontology development environments also come in many different forms. Early versions of the Gene Ontology, for instance, used a bespoke text file format and a text editor, an approach rather similar to the lightweight markup languages of today. This had the significant advantage of a low technological barrier to entry. More modern environments provide a much more graphical interface, which generally offers a much richer way of interacting with an ontology.

While these environments add a lot of value, they do not necessarily integrate well with text. Both Protégé and OBO-Edit have a class-centric view and are biased toward showing the various logical entities in the ontology, as opposed to the textual aspects. Indeed, this bias is shown even at the level of OWL. For example, the annotations on an entity (or rather an axiom) are a set rather than a list, while ordering is generally considered to be essential for most documents.

With this divergence of development environments, it seems hard to understand how we could square the circle of combining text and ontology development. Next, we describe the amino-acid ontology and how the novel development methodology we used for this ontology allows us to achieve this.

3 TAWNY-OWL

Tawny-OWL (Lord, 2013) provides a fully programmatic environment for ontology development. Simple ontological statements can be written with a syntax inspired by Manchester OWL notation (Horridge and Patel-Schneider, 2012); repetitive statements can be built automatically by writing functions which encapsulate and abstract over the simpler statements, a process we call “patternisation” (Warrender and Lord, 2013).
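To make patternisation concrete, the sketch below shows the idea in miniature: one function call expands into several related class frames. It is written in Python and emits Manchester-syntax text purely for illustration; in Tawny-OWL such patterns are ordinary Clojure functions that create OWL entities directly. The function `value_partition`, its parameters and the Hydrophobicity example are hypothetical and are not part of the Tawny-OWL API or of the amino-acid ontology.

```python
from typing import List


def value_partition(partition: str, values: List[str],
                    parent: str = "PhysicoChemicalProperty") -> List[str]:
    """Expand a hypothetical value-partition pattern into Manchester-syntax frames.

    One call yields: the partition class, placed under `parent` and defined
    as the union of its values; one subclass per value; and a disjointness
    axiom so that nothing can take two values of the same partition.
    """
    if len(values) < 2:
        raise ValueError("a value partition needs at least two values")
    # Partition class: subclass of `parent`, equivalent to the union of its values.
    frames = [f"Class: {partition} SubClassOf: {parent} "
              f"EquivalentTo: {' or '.join(values)}"]
    # One subclass per value.
    frames += [f"Class: {value} SubClassOf: {partition}" for value in values]
    # The values are mutually exclusive.
    frames.append("DisjointClasses: " + ", ".join(values))
    return frames


if __name__ == "__main__":
    # A single pattern call stands in for four hand-written frames.
    for frame in value_partition("Hydrophobicity", ["Hydrophobic", "Hydrophilic"]):
        print(frame)
```

Because the pattern is just a function, applying it to a further property costs one call rather than another set of hand-written axioms, which is the kind of saving patternisation is meant to provide.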
In this way, we have managed to combine the advantages of text-based environments for editing ontologies, i.e. the use of a standard editing environment and integration with version control, while maintaining (and in some ways surpassing) the power of tools like Protégé.

Tawny-OWL can be used to generate any ontology, but we demonstrate it here with the amino-acid ontology: a highly patternised ontology with over 430 classes generated from one pattern. We consider next the implications that this has for the ability to integrate ontological and textual descriptions.

4 LITERATE ONTOLOGY

As Tawny-OWL is based on a full programming language, it supports a feature which at first seems quite inconsequential: comments. As with almost every programming language, it is possible to add free, unstructured text to the same source code that defines the ontology. While opinions vary on the role of comments in programmatic code, perhaps the most extreme is that of literate programming (Knuth, 1984), which suggests that code should be usable both as a program capable of execution and as a document capable of being read, and that neither view should have primacy.

Literate programming can be difficult, however, partly because the editing environment offers few facilities for it: fundamentally, supporting mixed-syntax text in a tool is a difficult task. Our solution uses a multi-view approach to editing, which allows the author to see her source code in either a document-centric or an ontology-centric view. We call this approach lenticular text, named after lenticular printing, which produces images that change depending on the angle of viewing. This is an entirely novel solution to literate programming, as it effectively performs the tangling operation for the author as she types. A representation of the two views is shown in Figures 1 and 2. It should be noted that the two views contain the same text but are syntactically different: the document-centric view is entirely valid LaTeX code, while the ontology-centric view is valid Tawny-OWL code.

    First, to explain the domain. Proteins are polymers
    made up from amino-acid monomers. They consist of a
    central carbon atom, attached to a carboxyl group (the
    ‘‘acid’’ group), an amine group (the ‘‘amino’’ group),
    a hydrogen and an R group. The R group defines the
    different amino acids. The different R groups have
    different physical or chemical properties, such as
    their degree of hydrophobicity. We call these different
    characteristics |RefiningFeatures|.

    \begin{tawny}
    (defclass AminoAcid)

    (defclass RefiningFeature)
    (defclass PhysicoChemicalProperty :super RefiningFeature)
    \end{tawny}

Fig. 1. The document-centric view

    ;; First, to explain the domain. Proteins are polymers
    ;; made up from amino-acid monomers. They consist of a
    ;; central carbon atom, attached to a carboxyl group (the
    ;; ‘‘acid’’ group), an amine group (the ‘‘amino’’ group),
    ;; a hydrogen and an R group. The R group defines the
    ;; different amino acids. The different R groups have
    ;; different physical or chemical properties, such as
    ;; their degree of hydrophobicity. We call these different
    ;; characteristics |RefiningFeatures|.

    ;; \begin{tawny}
    (defclass AminoAcid)

    (defclass RefiningFeature)
    (defclass PhysicoChemicalProperty :super RefiningFeature)
    ;; \end{tawny}

Fig. 2. The ontology-centric view

We have now implemented lenticular text for the editor Emacs¹, in a package called “lentic”². A key feature of this implementation is that both views exist simultaneously in Emacs and provide all the features of the appropriate development environment; for example, “tab-completion” works both in the document-centric view (completing LaTeX macros) and in the ontology-centric view (completing ontology identifiers). We can launch a compilation of the document-centric view (producing a PDF), or evaluate our ontology, perhaps reasoning over it, in the ontology-centric view. Therefore, we have achieved a key aim of literate programming: neither view holds primacy and the author can edit either.

5 DISCUSSION

In this paper, we have described our methodology for integrating text and ontological statements at authoring time, using lenticular text to enable literate ontology development. Indeed, we have fully documented the whole of the amino-acid ontology in literate form³.

The combination of Tawny-OWL and lenticular text is an extremely rich environment. We are aware, however, that it is a specialist environment. To make full use of Tawny-OWL, the author needs to use a Clojure-based development environment, document authoring in LaTeX, and the lentic package, which is Emacs-based. In reality, though, the tools are not tightly coupled: alternatives exist to LaTeX, Emacs, or even Tawny-OWL. At the same time, one output form of a literate ontology is a readable PDF document, something far more familiar to biologists or medics than Protégé or any other ontology development environment.

ACKNOWLEDGEMENTS

This work was supported by Newcastle University.

REFERENCES

Horridge, M. and Patel-Schneider, P. F. (2012). OWL 2 Web Ontology Language Manchester syntax (second edition). Technical report.
Knuth, D. E. (1984). Literate programming. The Computer Journal, 27, 97–111.
Lord, P. (2013). The Semantic Web takes Wing: Programming Ontologies with Tawny-OWL. OWLED 2013.
Warrender, J. D. and Lord, P. (2013). A pattern-driven approach to biomedical ontology engineering. SWAT4LS 2013.
Wood, A., Flowerdew, J., and Peacock, M. (2001). International scientific English: The language of research scientists around the world. Research Perspectives on English for Academic Purposes, pages 71–83.

∗ To whom correspondence should be addressed: phillip.lord@newcastle.ac.uk
¹ https://www.gnu.org/software/emacs/
² https://github.com/phillord/lentic
³ https://github.com/phillord/tawny-tutorial

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.
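Section 4 describes lenticular text in prose, and Figures 1 and 2 suggest what the mapping between the two views looks like: in the ontology-centric view, every prose line and the \begin{tawny}/\end{tawny} delimiters are prefixed with the Clojure comment marker “;;”, code between the delimiters is left untouched, and blank lines stay blank. The Python sketch below reconstructs that mapping from the figures alone, as a whole-text conversion. It is not lentic's implementation (lentic is an Emacs package that keeps both views live while the author types), and the function and constant names are hypothetical.

```python
COMMENT = ";; "              # Clojure line-comment prefix, as seen in Fig. 2
BEGIN = r"\begin{tawny}"     # code delimiters, as seen in Figs. 1 and 2
END = r"\end{tawny}"


def to_ontology_view(document: str) -> str:
    """Document-centric (LaTeX) view -> ontology-centric (Clojure) view.

    Code between the tawny delimiters is copied unchanged; prose and the
    delimiters themselves are commented out; blank lines stay blank.
    """
    out, in_code = [], False
    for line in document.split("\n"):
        stripped = line.strip()
        if in_code and stripped != END:
            out.append(line)              # code passes through unchanged
        elif stripped == "":
            out.append(line)              # blank prose lines stay blank
        else:
            out.append(COMMENT + line)    # prose and delimiters become comments
        if stripped == BEGIN:
            in_code = True
        elif stripped == END:
            in_code = False
    return "\n".join(out)


def to_document_view(ontology: str) -> str:
    """Ontology-centric view -> document-centric view (inverse of the above)."""
    out, in_code = [], False
    for line in ontology.split("\n"):
        body = line[len(COMMENT):] if line.startswith(COMMENT) else line
        stripped = body.strip()
        if in_code and stripped != END:
            out.append(line)              # code lines, Clojure comments included, are kept
        else:
            out.append(body)              # uncomment prose and delimiters
        if stripped == BEGIN:
            in_code = True
        elif stripped == END:
            in_code = False
    return "\n".join(out)


if __name__ == "__main__":
    demo = "\n".join([
        "The R group defines the different amino acids.",
        "",
        BEGIN,
        "(defclass AminoAcid)",
        END,
    ])
    print(to_ontology_view(demo))
    # Round trip: ontology-centric view back to the original document.
    assert to_document_view(to_ontology_view(demo)) == demo
```

For text like Figure 1, converting to the ontology-centric view and back returns the original, so in this sketch an edit made in either view can be carried over to the other. Edge cases (for instance a Clojure comment inside a code block that reads “;; \end{tawny}”) are not handled, and a practical tool would update only the changed region rather than re-converting the whole text.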


