User and Developer Interaction with Editable and Readable Ontologies

Aisha Blfgeh 1,2,∗ and Phillip Lord 1
1 School of Computing Science, Newcastle University, UK
2 Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
∗ To whom correspondence should be addressed: a.blfgeh1@newcastle.ac.uk or abelfaqeeh@kau.edu.sa

ABSTRACT

The process of building ontologies is a difficult task that involves collaboration between ontology developers and domain experts and requires an ongoing interaction between them. This collaboration is made more difficult because they tend to use different tool sets, which can hamper this interaction. In this paper, we propose to decrease this distance between domain experts and ontology developers by creating more readable forms of ontologies, and further to enable editing in normal office environments.

Building on a programmatic ontology development environment, such as Tawny-OWL, we are now able to generate these readable and editable forms from the raw ontological source and its embedded comments. We have implemented this translation to HTML for reading; this environment provides rich hyperlinking as well as active features such as hiding the source code in favour of comments. We are now working on translation to a Word document that also enables editing. Taken together, this should provide a significant new route for collaboration between the ontologist and the domain specialist.

1 INTRODUCTION

Ontologies are widespread in the field of biology and biomedicine, as they facilitate the management of knowledge and the integration of information, as in the Semantic Web (Bermejo, 2007). Additionally, biological data are not only heterogeneous but also require complex domain knowledge to be dealt with (Stevens et al., 2000). Therefore, ontologies are useful models for representing this complex and potentially changing knowledge. They are also widely used in biomedicine, examples being the GO (Gene Ontology) (Ashburner et al., 2000) and SNOMED (Systematized Nomenclature of Medicine) (IHTSDO, 2016).

However, building an ontology is a challenging task due to the use of languages with a sophisticated formalism (such as OWL), especially when combined with a complex domain such as biology or medicine. Normally, ontologies are built as a collaboration between domain specialists, who have the knowledge of the domain, and ontology developers, who know how to structure and represent the knowledge; they have to work together to construct a robust and accurate ontology. Often, community involvement during the process of building ontologies, using meetings, focus groups and the like, is very important (Mankovskii et al., 2009), as in GO, where biological community involvement is important for successful uptake (Bada et al., 2004). In addition, Bult et al. (2011) state that the development of the Protein Ontology requires a wider range of involvement, including other users and developers of the associated ontologies (such as GO), to ensure a consistent architecture of the ontology.

Biologists represent, manipulate and share their data in a wide variety of tools such as Microsoft Excel spreadsheets and Word documents. Unfortunately, these environments are far removed from the formal structured representation of the ontology development environments with which ontologists work to build ontologies. As a result of this difference in tools, it is unclear how we can bridge the gap between the two groups; doing so would facilitate the interaction between domain specialists and ontologists and make it more convenient for both sides to read and/or manipulate the ontology.

Ontology development environments are designed to produce a formal structured representation of any domain. They do so either through GUI software such as Protégé (http://protege.stanford.edu/) or through a textual programmatic environment such as Tawny-OWL (Lord, 2013). The next section describes these tools in more detail.

2 BUILDING ONTOLOGIES

There are various tools for constructing and developing ontologies, with a variety of user interfaces and environments. The most popular is Protégé, an open-source tool that provides a user interface to develop and construct ontologies of any domain. It has been widely used for developing ontologies due to the variety of plug-ins and frameworks (Noy et al., 2003). Protégé provides an easy interface for editing, visualisation and validation of ontologies, and is also a useful tool for managing large ontologies (Horridge et al., 2011).

Conversely, Tawny-OWL is a textual interface for developing ontologies in a fully programmatic manner (Warrender, 2015). This provides a convenient and readable syntax which can be edited directly using an IDE or text editor; in this style of ontology development, the ontologist ceases to manipulate an OWL representation directly, and instead develops the ontology as programmatic source code. In contrast to developing ontologies in OWL, the ontologist can introduce new abstractions and syntax as they choose, whether for general use or specifically for a single ontology. An OWL version of the ontology can then be generated as required. Tawny-OWL has been implemented in Clojure, which is a dialect of Lisp and runs on the Java Virtual Machine (Lord, 2013). Like Protégé, it wraps the OWL-API (Mankovskii et al., 2009), which performs much of the actual work, including interaction with reasoners, serialisation and so forth.

Recently, we have developed the tolAPC ontology using a new document-centric approach by including an Excel spreadsheet directly in the development pipeline. The spreadsheet contains all the knowledge for the ontology and has been created and maintained by a biologist. Meanwhile, we design the ontology patterns using Tawny-OWL, then generate the axioms by extracting data from the spreadsheet using Clojure. Thus, the spreadsheet is a part of the Tawny-OWL source code; it can be freely updated and the ontology regenerated when needed. Hence, it remains a part of the ontology development process (Blfgeh et al., 2016).

In this approach, the Excel spreadsheet is developed entirely by biologists; this has a significant advantage because it is a tool which they are familiar with and find convenient. However, we cannot ensure that the programmatic transformation of the values in the spreadsheet into the final ontology conforms with the domain specialists' understanding without the biologists reading and interacting with source code. Therefore, we next discuss the possibilities for making this ontological source more readable by the specialists.

3 MULTILINGUAL ONTOLOGIES

The first and most obvious mechanism for increasing ontology readability is to enable users to read and write the ontology using their native language. Internationalisation technologies are widespread and enable support for multiple languages for applications with a graphical user interface.

We next consider how we can enable support for multiple languages in a textual user interface such as Tawny-OWL, giving the ontologist the ability to use their own native language for all parts of the development process.

The first option is to use the polyglot library. This part of the system mimics a fairly standard technique for internationalisation of programmatic code: the ontology is developed with a set of programmatic labels, which are then referenced in a language, or locale, bundle with an appropriate translation. In the case of Tawny-OWL, this translation appears as rdfs:label annotations on the ontology entities (classes, properties etc). This overall process is shown in Figure 1, which places Italian and Arabic language translations onto the pizza ontology.

Fig. 1: Polyglot library in Tawny-OWL.

While this may enable internationalisation for users of the ontology, it does not change the English-centric editing environment. We would wish, instead, to internationalise the entire source code of the ontology. This will make the entire ontology more comprehensible and readable for all developers who communicate in Italian and/or Arabic. This is fully supported by a full conversion of the environment using the multilingual feature of Tawny-OWL, as in Figure 2, which shows the English, Italian and Arabic versions of the pizza ontology (Lord, 2012). The last of these uses a right-to-left script, and we can use the IDE to change the direction in which Tawny-OWL code is rendered. This demonstrates the capability of Tawny-OWL to adapt to any language. The next language to be implemented will be French.

Fig. 2: Multilingual Pizza ontology: (a) English Pizza Ontology, (b) Italian Pizza Ontology, (c) Arabic Pizza Ontology.

These multilingual environments are advantageous because users can read and comprehend the ontology in their own language. This still leaves us in a programming environment, which is unlikely to be familiar or comfortable to most domain users. Moreover, the ontology lacks a narrative structure, which means that it cannot be read in a literate fashion. We consider how to enable this in the next section.

4 LITERATE ONTOLOGIES

The term literate programming was coined by Knuth (1992); in this paradigm, the program is treated as a piece of literature rather than a program. The main idea is to insert text along with code, so that the program is also its own documentation. The intention here is that the program should become easier to understand and, conversely, that the documentation is less likely to become out-of-date, as it is maintained in the same place.

As Tawny-OWL is a fully programmatic environment, we can add comments freely, along with any additional mark-up that we wish. This enables us to produce different representations of the ontology.

We have previously discussed two examples of literate ontologies: the first is the Amino Acid Ontology, taken from a previous ICBO2015 tutorial about Tawny-OWL (http://homepages.cs.ncl.ac.uk/phillip.lord/take-wing/take_wing.html), while the second is a version of the Karyotype ontology (Lord and Warrender, 2015). In both cases, they have been produced from the Tawny-OWL source code, with markup in the comments being interpreted by a markup processing tool. Figure 3 shows a snippet from the literate Amino Acid ontology as a webpage. The result appears as a normal web page, with syntax highlighting for the source code.

Fig. 3: Literate Amino Acid ontology in HTML representation.

Literate ontologies can be represented in different forms, using various techniques for converting the markup text into different formats; webpages, for example. Representing the ontology as an HTML webpage gives us the ability to navigate and browse the documentation either in order (section by section) or with a navigation facility (jumping between sections). It is also possible to hide or expose the "source" sections, leaving the reader to see just the documentation as appropriate. From the developer's perspective, while the reader may still not be able to see the axiomatization in this way, the comments that they have checked are embedded directly next to the code which is an interpretation of them.

It is useful to enable specialists to read and navigate through the ontology and its documentation. However, HTML offers no editing facilities to modify and update the ontology. Therefore, rather than using only HTML, we have also investigated the possibility of turning the whole ontology into a Word document, an environment which can also be modified, changed or updated. Biologists and domain specialists are then placed in an environment in which they can freely provide feedback on an existing ontology simply by interacting with Word documents.

5 DISCUSSION

In this paper, we have described our approach to the translation of ontologies into a form that domain users can interact with more naturally.

We have shown that it is possible to translate a textual environment like Tawny-OWL into another human language, or indeed a different script, including right-to-left text. To our knowledge, this is the first ontology editing environment with such textual and syntactic flexibility. Although the multilingual ontologies approach is less relevant for scientific ontologies, it is already applied in some form to terminology. For example, Tawny-OWL uses the keywords some and only rather than the existential and universal notations, which implies acceptance of alternative language in ontology development.

Further than this, however, we also translate the ontological source code into alternative visualisations such as HTML and Word documents, which map directly back to the source but which can differ from it: for instance, by enabling hyperlinks, adding section links and hiding source code in favour of commentary. Especially with a Word document, this should enable a novel mechanism for interacting with an ontology: users can see and edit comments, with change tracking switched on, and use this as a mechanism for feeding back to the ontology developer.

This approach, of course, only enables us to visualise ontologies developed using Tawny-OWL. While a migration path is provided (Warrender, 2015), a wholesale switch to Tawny-OWL is not effort-free. We note, however, that many ontologies are developed partly in Protégé and partly using OWL generated from other sources; a secondary migration path would be to use Tawny-OWL for these sections.

We still need to evaluate this kind of interaction rigorously. For this, we are proposing a focus group test in which specialist participants will read the ontology document, give their opinion of it, and say whether they would update any terminology according to their expertise.

We are not proposing that Word documents will be used directly by domain specialists for editing ontologies. We expect that an ontologist will be involved in incorporating the changes suggested by the domain user; in this sense, we are using a Word document as an intermediate representation (Rector et al., 2001). Our hope, however, is that the reviewing features of Word will enable us to provide a rich environment to support the ontologist in this process. Taken together, these should provide us with a significantly enhanced process for knowledge capture, ontology development and refinement compared with the process that we currently have.

ACKNOWLEDGEMENTS

Thanks to Newcastle University for supporting this research. Also, thanks to King Abdulaziz University, Jeddah, Saudi Arabia for funding and supporting the study.

REFERENCES

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., and Sherlock, G. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25.

Bada, M., Stevens, R., Goble, C., Gil, Y., Ashburner, M., Blake, J., Cherry, M., Harris, M., and Lewis, S. (2004). A Short Study on the Success of Gene Ontology. Web Semantics: Science, Services and Agents on the World Wide Web, 1(2), 235–240.

Bermejo, J. (2007). A Simplified Guide to Create an Ontology. ASLab.org. http://tierra.aslab.upm.es/documents/controlled/ASLAB-R-2007-004.pdf.

Blfgeh, A., Warrender, J. D., Hilkens, C. M. U., and Lord, P. (2016). A document-centric approach for developing the tolAPC ontology. In F. Loebe, M. Boeker, H. Herre, L. Jansen, and D. Schober, editors, Proceedings of the 7th Workshop on Ontologies and Data in Life Sciences, ODLS 2016, organized by the GI Workgroup Ontologies in Biomedicine and Life Sciences (OBML), Halle (Saale), Germany, September 29-30, 2016, volume 1692 of CEUR Workshop Proceedings, pages 1–6. CEUR-WS.org. http://ceur-ws.org/Vol-1692/paperB.pdf.

Bult, C. J., Drabkin, H. J., Evsikov, A., Natale, D., Arighi, C., Roberts, N., Ruttenberg, A., D'Eustachio, P., Smith, B., Blake, J. A., and Wu, C. (2011). The representation of protein complexes in the protein ontology (PRO). BMC Bioinformatics, 12(1), 371.

Horridge, M., Knublauch, H., Rector, A., Stevens, R., Wroe, C., Jupp, S., Moulton, G., Drummond, N., and Brandt, S. (2011). A Practical Guide To Building OWL Ontologies Using Protégé 4 and CO-ODE Tools Edition 1.3. http://dio.freelabs.net/downloads/ProtegeOWLTutorialP4_v1_3.pdf.

IHTSDO (2016). International health terminology standards development organisation.

Knuth, D. (1992). Literate Programming. Center for the Study of Language and Information Publication Lecture Notes. Cambridge University Press.

Lord, P. (2012). Tawny-owl pizza. https://github.com/phillord/tawny-pizza.

Lord, P. (2013). The Semantic Web takes Wing: Programming Ontologies with Tawny-OWL. CoRR, abs/1303.0. http://arxiv.org/abs/1303.0213.

Lord, P. and Warrender, J. D. (2015). A highly literate approach to ontology building. abs/1512.04250.

Mankovskii, S., Gogolla, M., Urban, S. D., Dietrich, S. W., Urban, S. D., Dietrich, S. W., Yang, M.-H., Dobbie, G., Ling, T. W., Halpin, T., Kemme, B., Schweikardt, N., Abelló, A., Romero, O., Jimenez-Peris, R., Stevens, R., Lord, P., Gruber, T., Leenheer, P. D., Gal, A., Bechhofer, S., Paton, N. W., Li, C., Buchmann, A., Hardavellas, N., Pandis, I., Liu, B., Shapiro, M., Bellatreche, L., Gray, P. M. D., Aalst, W. M. P., Palmer, N., Palmer, N., Risch, T., Galuba, W., Girdzijauskas, S., and Bechhofer, S. (2009). OWL: Web Ontology Language. In Encyclopedia of Database Systems, pages 2008–2009. Springer US, Boston, MA. http://www.springerlink.com/index/10.1007/978-0-387-39940-9_1073.

Noy, N. F., Crubézy, M., Fergerson, R. W., Knublauch, H., Tu, S. W., Vendetti, J., and Musen, M. A. (2003). Protégé-2000: An Open-Source Ontology-Development and Knowledge-Acquisition Environment. AMIA Annu Symp Proc, 953, 953. http://protege.stanford.edu.

Rector, A. L., Wroe, C., Rogers, J., and Roberts, A. (2001). Untangling taxonomies and relationships: Personal and practical problems in loosely coupled development of large ontologies. In Proceedings of the 1st International Conference on Knowledge Capture, K-CAP '01, pages 139–146, New York, NY, USA. ACM.

Stevens, R., Goble, C. A., and Bechhofer, S. (2000). Ontology-based knowledge representation for bioinformatics. Briefings in Bioinformatics, 1(4), 398–414.

Warrender, J. D. (2015). The Consistent Representation of Scientific Knowledge: Investigations into the Ontology of Karyotypes and Mitochondria. Ph.D. thesis, Newcastle University. https://theses.ncl.ac.uk/dspace/bitstream/10443/2910/1/Warrender,J2015.pdf.