=Paper=
{{Paper
|id=Vol-1320/paper_31
|storemode=property
|title=Querying Life Science Ontologies with SemFacet
|pdfUrl=https://ceur-ws.org/Vol-1320/paper_31.pdf
|volume=Vol-1320
|dblpUrl=https://dblp.org/rec/conf/swat4ls/GrauKMZZ14
}}
==Querying Life Science Ontologies with SemFacet==
Bernardo Cuenca Grau, Evgeny Kharlamov, Šarūnas Marciuška, Dmitriy Zheleznyakov, and Yujiao Zhou

Department of Computer Science, University of Oxford

first.middle.lastname@cs.ox.ac.uk

===Abstract===
Faceted search is the de facto query paradigm in e-commerce and has recently been adapted for the Semantic Web. In this demonstration we present our faceted search system SemFacet and show how it can enhance access to RDF and OWL 2 datasets and OWL 2 ontologies in the domain of life sciences. SemFacet combines keyword and faceted search and is based on a solid theory; in particular, it employs novel ontology projection techniques to enable faceted navigation for OWL 2. SemFacet relies on PAGOdA and HermiT for logical reasoning and on JRDFox and Sesame for storing and querying RDF triples.

===1 Introduction===
In the last decade numerous RDF datasets and OWL ontologies in the life sciences domain have become available [4–6]. Accessing the required information, however, remains a challenging task for end users and often requires proficiency in SPARQL. To make data and ontological knowledge more accessible to humans, numerous query formulation, data exploration, and browsing tools have been developed. Many such interfaces have been tailored for specific life science datasets [6, 7]. More generic systems typically rely on controlled natural language [8, 9], diagrammatic query constructors [10, 11], or exploratory search [12].

Faceted search is the de facto query paradigm in e-commerce applications [13]. A facet typically consists of a property (e.g., ‘gender’ or ‘occupation’ when querying documents about people) and a set of possible string values (e.g., ‘female’ or ‘research’), and documents in the collection are annotated with property-value pairs. During faceted search, users iteratively select facet values and the documents annotated according to the selection are returned as the search result.
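
The classical faceted search loop described above can be illustrated with a minimal sketch. It is not taken from SemFacet; the document collection, the function names `faceted_search` and `available_values`, and the choice to accept any selected value within a facet are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy collection: every document is annotated with property-value pairs.
DOCUMENTS: Dict[str, Dict[str, str]] = {
    "doc1": {"gender": "female", "occupation": "research"},
    "doc2": {"gender": "male", "occupation": "research"},
    "doc3": {"gender": "female", "occupation": "teaching"},
}


def faceted_search(documents: Dict[str, Dict[str, str]],
                   selection: Dict[str, Set[str]]) -> List[str]:
    """Return the ids of documents that match every selected facet.

    Assumption: within a single facet, a document matches if its value is
    any one of the selected values; different facets must all match.
    """
    results = []
    for doc_id, annotations in documents.items():
        if all(annotations.get(prop) in values
               for prop, values in selection.items()):
            results.append(doc_id)
    return sorted(results)


def available_values(documents: Dict[str, Dict[str, str]],
                     doc_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the facet values that occur in the given documents.

    An interface could use this to offer only selections that still
    lead to a non-empty result.
    """
    values: Dict[str, Set[str]] = {}
    for doc_id in doc_ids:
        for prop, value in documents[doc_id].items():
            values.setdefault(prop, set()).add(value)
    return values


if __name__ == "__main__":
    # The user first selects gender = female, then narrows by occupation.
    step1 = faceted_search(DOCUMENTS, {"gender": {"female"}})
    print(step1, available_values(DOCUMENTS, step1))
    step2 = faceted_search(DOCUMENTS, {"gender": {"female"},
                                       "occupation": {"research"}})
    print(step2)
```

Each call corresponds to one iteration of the loop: the selection grows, the result set shrinks, and the remaining facet values are recomputed from the current results.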
Several authors have proposed faceted search for querying RDF, and a number of systems have been developed [14–18]. Existing systems, however, have been designed for plain RDF data, and do not take into account ontological axioms other than subsumption statements between atomic classes and properties [19, 20], with reasoning playing little or no role. In stark contrast to other domains, life sciences applications tend to require a great deal of the expressive power available in OWL 2; in particular, data often involves complex class or property assertions (e.g., see FlyBase [6]) and ontologies largely consist of complex axioms which encapsulate highly valuable information for faceted search. As a result, existing faceted search systems are not well-suited for typical life sciences applications.

Work supported by the Royal Society, the EPSRC projects Score!, Exoda, and MaSI3, and the FP7 project OPTIQUE [1–3] under the grant agreement 318338.

In [21] we developed a faceted search approach for RDF data enhanced with OWL 2 ontologies. Our solution is based on a solid theoretical framework and it addresses many of the limitations of existing techniques. To put our ideas into practice we developed SemFacet [21, 22]: a faceted search system that relies on state-of-the-art triple stores and OWL 2 reasoners to generate and update faceted query interfaces, as well as for computing search results. For demonstration purposes our platform integrates JRDFox [23] and Sesame [24] as RDF triple stores, as well as PAGOdA [25] and HermiT [26] as fully-fledged OWL 2 reasoners. Our system is fully generic and can be used to query arbitrary data and ontologies. In this demonstration we will show how SemFacet can be used to access several datasets and ontologies from the domain of life sciences and illustrate the main advantages of our approach over existing techniques designed for plain RDF.
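
As a rough illustration of how selections in a faceted interface can become a query for a triple store, the sketch below builds a SPARQL query string that uses only conjunction and `UNION`. It is a sketch under stated assumptions and not SemFacet's actual compilation: the function name `compile_selection`, the `ex:` prefix and all IRIs are hypothetical, facet values are treated as plain RDF terms, a "conjunctive" facet requires every selected value, and a "disjunctive" facet requires at least one.

```python
from typing import Dict, List, Tuple

# Prefix declarations; the ex: namespace is a made-up example namespace.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX ex: <http://example.org/>\n"
)


def compile_selection(focus_class: str,
                      selection: Dict[str, Tuple[str, List[str]]]) -> str:
    """Build a SPARQL query from facet selections.

    `selection` maps a facet (a property) to a pair (mode, values), where
    mode is "conjunctive" (all values must hold) or "disjunctive" (at
    least one value must hold). This interpretation is an assumption
    made for this sketch.
    """
    patterns = [f"?x rdf:type {focus_class} ."]
    for prop, (mode, values) in selection.items():
        if mode == "conjunctive":
            # One triple pattern per value; all must match.
            patterns.extend(f"?x {prop} {value} ." for value in values)
        elif mode == "disjunctive":
            # One UNION branch per value; any branch may match.
            branches = [f"{{ ?x {prop} {value} . }}" for value in values]
            patterns.append(" UNION ".join(branches))
        else:
            raise ValueError(f"unknown facet mode: {mode}")
    body = "\n  ".join(patterns)
    return f"{PREFIXES}SELECT DISTINCT ?x WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    query = compile_selection(
        "ex:Neuron",
        {
            "ex:developsFrom": ("disjunctive", ["ex:cellA", "ex:cellB"]),
            "ex:partOf": ("conjunctive", ["ex:adultBrain"]),
        },
    )
    print(query)
```

The resulting string could be sent to any SPARQL endpoint; whether implicit facts are returned depends on the reasoning support of the store that evaluates it.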
===2 The SemFacet System===
SemFacet [27] combines keyword search and faceted navigation to query arbitrary ontology-enhanced RDF datasets. Our system offers the following main functionality.

– Keyword search. Search in SemFacet typically starts with a set of keywords, which are matched against the annotations in the ontology and data.

– Faceted interface generation and update. SemFacet implements dedicated infrastructure for automatically generating a faceted interface from the result of a keyword search as well as for updating an interface in response to users’ actions. A distinguishing aspect of our algorithms for interface generation and update is that they are ‘guided’ by both explicit and implicit information in the ontology and data (see [21] for details).

– Query answering. User selections of facet values in an interface are compiled into SPARQL queries, which are then evaluated against the ontology and data using a reasoner. Our system allows for both disjunctive facets (i.e., those where multiple value selections are interpreted disjunctively) and conjunctive facets. Thus, the SPARQL graph patterns relevant to our approach can be captured by the AND-UNION fragment of SPARQL 1.1. The current version of SemFacet integrates the following reasoners: Sesame [24] (a widely used system for RDF(S) reasoning), JRDFox [23] (a parallel in-memory RDF triple store supporting sound and complete reasoning for OWL 2 RL), HermiT [26] (a standard fully-fledged OWL 2 reasoner), and PAGOdA [25] (a pay-as-you-go reasoner for OWL 2 that combines JRDFox and HermiT for increased efficiency).

– Refocusing. SemFacet provides functionality for changing the focus of the search from one type of object to another.
For instance, if the system is displaying as search results neurons that develop from cells, where “develops from” is a facet name and “cell” is a facet value, we can refocus the search and display as search results the particular cells that are related to the selected neurons.

– Customisation. Our system is generic and highly customisable for different datasets and applications. Users can upload arbitrary ontologies and datasets, select the reasoner to be exploited for faceted navigation and query answering, customise the kinds of annotations relevant for keyword search, select which facets should be interpreted disjunctively or conjunctively as well as which facets should be excluded from the search process, or select which properties are relevant for image thumbnails and snippets (if any).

Fig. 1. Left: screenshot of SemFacet over FlyBase OWL 2 data. Right: architecture of SemFacet (GUI: keyword-based search, faceted interface, query answers as snippets; backend: KBS engine, facet generator, query converter, snippet generator; triple store: data and inverted index; reasoning: RDFox, PAGOdA, HermiT, Sesame).

On the left-hand side of Figure 1 we can see a screenshot of SemFacet with a search over the Adult Brain Anatomy dataset [4]. The navigation map in the interface enables refocusing, the “filter by” section displays the relevant facet names and values, and search results (i.e., query answers) are displayed on the rightmost part of the interface. The general architecture of SemFacet including its main software components is summarised on the right-hand side of Figure 1.

===3 Demonstration Scenarios===
During the demonstration we will show how to explore and query OWL 2 life science datasets and ontologies with SemFacet.
To this end, we will preconfigure the system for several test cases, including fragments of FlyBase [6], SNOMED CT [5], as well as a selection of Bio2RDF [4] datasets. In all cases the input for the search will be a dataset and an ontology. We will demonstrate the following variants of our algorithms for interface generation and update.

– Data driven, where only the data is exploited for interface generation and update. This configuration simulates existing approaches to faceted search over RDF.

– Ontology driven, where only the axioms in the ontology are considered. In this configuration, facet names and values in an interface reflect semantic relationships between entities in the input ontology.

– Both data and ontology driven, where both the data and ontology are exploited in interface generation and update. This is the default configuration of SemFacet, and the aim here is to show how reasoning and ontologies can improve data driven faceted interfaces and allow for enhanced data exploration.

Besides querying preconfigured scenarios, the demo attendees will be able to try SemFacet end-to-end. This requires loading a dataset and an ontology, customising the system parameters, and querying the uploaded ontology and data with the selected parameters. For the end-to-end test of SemFacet the demo attendees will be able to use either the datasets and ontologies from the preconfigured scenarios or ones they provide themselves (of reasonable size), e.g., by downloading them from the Web.

===4 References===
[1] E. Kharlamov, M. Giese, E. Jiménez-Ruiz, et al. Optique 1.0: Semantic Access to Big Data: The Case of Norwegian Petroleum Directorate’s FactPages. In: ISWC (Posters & Demos). 2013.

[2] E. Kharlamov, E. Jiménez-Ruiz, D. Zheleznyakov, et al. Optique: Towards OBDA Systems for Industry. In: ESWC (Satellite Events). 2013.

[3] E. Kharlamov, N. Solomakhina, O. Ozcep, et al. How Semantic Technologies can Enhance Data Access at Siemens Energy. In: ISWC. 2014.

[4] F. Belleau, M. Nolin, N. Tourigny, et al. Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. In: Journal of Biomedical Informatics 41.5 (2008).

[5] SNOMED CT. www.ihtsdo.org/snomed-ct.

[6] FlyBase. http://flybase.org/.

[7] N. Milyaev, D. Osumi-Sutherland, S. Reeve, et al. The Virtual Fly Brain browser and query interface. In: Bioinformatics 28.3 (2012).

[8] E. Franconi, P. Guagliardo, M. Trevisan, and S. Tessaris. Quelo: an Ontology-Driven Query Interface. In: DL. 2011.

[9] A. Bernstein, E. Kaufmann, A. Göhring, and C. Kiefer. Querying Ontologies: A Controlled English Interface for End-Users. In: ISWC. 2005.

[10] D. Calvanese, C. M. Keet, W. Nutt, et al. Web-based graphical querying of databases through an ontology: the Wonder system. In: SAC. 2010.

[11] A. Soylu, M. G. Skjæveland, M. Giese, et al. A Preliminary Approach on Ontology-Based Visual Query Formulation for Big Data. In: MTSR. 2013.

[12] S. Ferré and A. Hermann. Semantic Search: Reconciling Expressive Querying and Exploratory Search. In: ISWC. 2011.

[13] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.

[14] P. Fafalios and Y. Tzitzikas. X-ENS: Semantic Enrichment of Web Search Results at Real-Time. In: SIGIR. 2013.

[15] R. Hahn, C. Bizer, C. Sahnwaldt, et al. Faceted Wikipedia Search. In: BIS. 2010.

[16] D. F. Huynh and D. R. Karger. Parallax and Companion: Set-based Browsing for the Data Web. 2013.

[17] P. Heim, J. Ziegler, and S. Lohmann. gFacet: A Browser for the Web of Data. In: IMC-SSW. 2008.

[18] G. Kobilarov and I. Dickinson. Humboldt: Exploring Linked Data. In: LDOW. 2008.

[19] M. Hildebrand, J. van Ossenbruggen, and L. Hardman. /facet: A Browser for Heterogeneous Semantic Web Repositories. In: ISWC. 2006.

[20] E. Oren, R. Delbru, and S. Decker. Extending Faceted Navigation for RDF Data. In: ISWC. 2006.

[21] M. Arenas, B. C. Grau, E. Kharlamov, et al. Faceted Search over Ontology-Enhanced RDF Data. In: CIKM. 2014.

[22] M. Arenas, B. C. Grau, E. Kharlamov, et al. SemFacet: semantic faceted search over yago. In: WWW, Companion Volume. 2014.

[23] RDFox. www.cs.ox.ac.uk/isg/tools/RDFox/.

[24] Sesame. http://www.openrdf.org/.

[25] Y. Zhou, Y. Nenov, B. C. Grau, and I. Horrocks. Complete Query Answering over Horn Ontologies Using a Triple Store. In: ISWC. 2013.

[26] B. Glimm, I. Horrocks, B. Motik, et al. HermiT: An OWL 2 Reasoner. In: Journal of Automated Reasoning 53.3 (2014).

[27] SemFacet. http://www.cs.ox.ac.uk/isg/tools/SemFacet/.