Ontologies Come of Age with the iKUP Browser Simon Jupp1 , Julie Klein2 , Panagiotis Moulos2 , Joost Schanstra2 , and Robert Stevens1 1 School of Computer Science, The University of Manchester, UK 2 Institut National de la Santé et de la Recherche Médicale, Toulouse, France first.last@manchester.ac.uk The iKUP browser, a web-based interface to the kidney and urinary path- way knowledge base (KUPKB) [4], shows ontologies coming of age by enabling biologists to ask questions of integrated data that form hypotheses that are be- ing tested in the laboratory. That is, semantically annotated data have been delivered to the target users in a form that they can use to change how they undertake their job. iKUP uses a browsing approach to query the KUP data as its users are usually not bioinformaticians that will design and use sophisti- cated scripts or workflows, and they are almost certainly not users familiar with semantic web technologies. The KUPKB contains data from high-throughput ex- periments on the kidney and the urinary system. The experimental data is richly interconnected to other biological data to form a single integrated repository for querying and exploration. The KUPKB uses multiple biomedical ontologies that act as a controlled vocabulary for standardised annotation of the datasets. These ontologies’ semantics are used to ask queries that return intelligent answers that form part of the biologists’ hypothesis generation process. By reducing the data to common representation languages like the Web Ontology Language (OWL) and the Resource Description Framework (RDF), we have shown semantic web technologies offering novel opportunities for data analysis. The iKUP browser supports renal biologists in finding data integrated from many disparate data sources, providing a simple interface to survey a large set of the KUP domains ’omics experiments simultaneously. At present, biologists must gather and integrate many data sets by hand and this integration is vital as genes, proteins, and small molecules have to be co-ordinated across many ’omic levels through investigations reported by many people. By hand, this kind of integration and querying is long, tedious and error prone. Users can use the iKUP browser to search for a molecule (mRNA, miRNA, Protein) or list of molecules. The query box exploits the label and synonym tags for molecules to guide the user with their search via a dynamic suggestion box that pops up as users type. Once users confirm the molecules they are looking for are present in the KUPKB, the application performs a SPARQL query based on these search terms to generate the results. The results show known information about each molecule from the range of datasets in the KUPKB. For each result, the user can see where anatomically the molecule is active and under what conditions. The metadata for each experiment is captured using ontologies. Datasets are collected in spreadsheet based templates that have ontologies embedded inside them. By using a spreadsheet template we provide a lightweight and familiar mechanism for the community of renal biologist to submit data. By embedding II the ontologies inside the spreadsheets, we are able to collect ontological annota- tions from the users by stealth. These semantic spreadsheets are created using RightField [7] which provides a mechanism to embed ontology based restrictions on values inside the spreadsheet. These consistent and precise metadata help to integrate the different experiments and are presented to the user on querying the KUPKB via iKUP. Every search result is accompanied by a navigable polyhier- archy that is generated from the ontological metadata. This hierarchy provides a faceted browser that exploits the semantics of the ontology to perform intel- ligent filtering of the data. For example, filtering the results on experiments on the glomerulus of the kidney will include in the results any experiments where the sample is glomerular or any part of the glomerulus, thus spanning from gross anatomy to the cellular level. The KUPKB is public and open-access, and biologists are encouraged to sub- mit new datasets to the KUPKB. So far, over 160 experiments have been sourced from the literature and public databases. All biological identifiers are normalised to a common identification scheme based on de facto URIs sourced from linked data resources, such as Bio2RDF [1] and OBO ontologies. To provide deeper querying and more flexibility, the knowledge base is also accessible through a SPARQL endpoint. The iKUP application is online at http://www.kupkb.org and the SPARQL endpoint is at http://www.e-lico.eu/kupkb/sparql. Many of the datasets and ontologies are sourced from public repositories, however, a spe- cific KUP Ontology (KUPO) was developed and is available from http://www.e- lico.eu/kupo. The source code for the KUPKB website is open source and avail- able from http://code.google.com/p/kupkb-dev. Both the KUPO and the KUPKB were built by engaging our non-Semantic Web savvy users in the building pro- cess using tools such as Populous [3] and RightField. Using such tools enables iKUP’s users to engage with the building process, delivering the annotated data for the KUPKB and extensions to the KUPO without having to have knowledge of OWL, RDF, SPARQL and related tools. We have taken this approach in an attempt to make the KUPKB sustainable and scalable. The KUPKB was designed to fulfil specific requirements from members of the KUP community. These requirements include the integration of experimen- tal datasets with existing biological databases about genes, proteins, miRNAs, metabolites and diseases. This type of data integration provides a constant chal- lenge for bioinformaticians. One goal of the KUPKB was to show how many of these challenges can be overcome when data are aligned to a common represen- tation and reuse is maximised. To do this we have reused portions of Bio2RDF and made the KUPO as an application ontology, reusing and extending many OBO ontologies [6]. The KUPKB utilises the state of the art in semantic web technology wher- ever possible. At its core is the RDF triple store that stores the data. After an evaluation we decided to use a Sesame framework [2] backed with a BigOWLIM storage and inference layer [5]. The iKUP web application is built using the Google Web Toolkit (GWT). Whilst the primary search on the iKUP website is powered by SPARQL, we also exploit the ontologies using the Java based OWL III API to classify results and handle the faceted browsing. We load all the ontolo- gies into the OWL API and use the OWL-DL reasoner HermiT for classification. This provides us with the necessary inference to generate the hierarchy and drive the dynamic filtering of results on the web site. We can develop the whole appli- cation using a single programming language as Java interfaces are provided for Sesame, the OWL API and GWT. The importance of having such APIs in place can not be underestimated. The iKUP browser is a demonstration that semantic technologies are sufficiently matured so that they can deliver competitive data integration solutions. The KUPKB and iKUP show ontologies coming of age by fulfilling some of their promise. The KUPKB has used ontologies to provide a common semantic framework for a broad range of previously semantically heterogeneous data. The use of Semantic Web technologies provides the means to integrate and query these data. The key to the coming of age is the iKUP user interface; without a simple means to access these integrated data, our biologist users would not and could not use the KUPKB; ontologies come of age when they deliver meaningful use to their intended users. This has now happened with the KUPKB, with biologists testing hypotheses generated via the iKUP in laboratories. Though the KUPKB is relatively small, it does what semantic technologies are supposed to do and show what is possible with biology’s rich resource of data once issues of heterogeneity are taken away and the means of delivery to its users is taken into account. Acknowledgments:This work is funded by the e-LICO project—EU/FP7/ICT- 2007.4.4. References 1. Franois Belleau, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, and Jean Morissette. Bio2rdf: Towards a mashup to build bioinformatics knowledge systems. Journal of Biomedical Informatics, 41(5):706 – 716, 2008. Semantic Mashup of Biomedical Data. 2. Jeen Broekstra, Arjohn Kampman, and Frank van Harmelen. Sesame: A generic architecture for storing and querying rdf and rdf schema. In Ian Horrocks and James Hendler, editors, The Semantic Web - ISWC 2002, volume 2342 of Lecture Notes in Computer Science, pages 54–68. Springer Berlin / Heidelberg, 2002. 3. Simon Jupp, Matthew Horridge, Luigi Iannone, Julie Klein, Stuart Owen, Joost Schanstra, Robert Stevens, and Katy Wolstencroft. Populous: A tool for populating ontology templates. CoRR, abs/1012.1745, 2010. 4. Simon Jupp, Julie Klein, Joost Schanstra, and Robert Stevens. Developing a kidney and urinary pathway knowledge base. Journal of Biomedical Semantics, 2(Suppl 2):S7, 2011. 5. Atanas Kiryakov, Damyan Ognyanov, and Dimitar Manov. Owlim - a pragmatic semantic repository for owl. In Mike Dean, Yuanbo Guo, Woochun Jun, Roland Kaschek, Shonali Krishnaswamy, Zhengxiang Pan, and Quan Z. Sheng, editors, WISE Workshops, volume 3807 of Lecture Notes in Computer Science, pages 182– 192. Springer, 2005. IV 6. Smith, Barry, Ashburner, Michael, Rosse, Cornelius, Bard, Jonathan, Bug, William, Ceusters, Werner, Goldberg, Louis J., Eilbeck, Karen, Ireland, Amelia, Mungall, Christopher J., Leontis, Neocles, Rocca-Serra, Philippe, Ruttenberg, Alan, San- sone, Susanna-Assunta, Scheuermann, Richard H., Shah, Nigam, Whetzel, Patricia L., and Lewis, Suzanna. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25(11):1251–1255, November 2007. 7. Katy Wolstencroft, Stuart Owen, Matthew Horridge, Olga Krebs, Wolfgang Mueller, Jacky L. Snoep, Franco du Preez, and Carole Goble. Rightfield: embedding ontology annotation in spreadsheets. Bioinformatics, 27(14):2021–2022, 2011.