FAIRDOM approach for semantic interoperability of systems biology data and models

Olga Krebs1*, Katy Wolstencroft3, Natalie Stanford2, Norman Morrison2, Martin Golebiewski1, Rostyk Kuzyakiv4, Stuart Owen2, Quyen Nguyen1, Jacky Snoep2, Wolfgang Mueller1, and Carole Goble2

1 Heidelberg Institute for Theoretical Studies, Germany
2 School of Computer Science, University of Manchester, UK
3 Leiden Institute of Advanced Computer Science, Leiden, NL
4 University of Zurich, Switzerland

ABSTRACT

Motivation: The ability to collect and interlink heterogeneous data and model collections is essential in systems biology. Effective data exchange and comparison requires sufficient data annotation. This is particularly apparent in systems biology, where data heterogeneity means that multiple community metadata standards are required for the annotation of a whole investigation, including data, models and protocols.

Results: FAIRDOM (http://fair-dom.org/) is an initiative to enable the systems biology community to produce and publish FAIR Data, Operating procedures and Models. It allows research assets to be aggregated, interlinked and shared in the context of the systems biology investigations that produced them. Here we present the FAIRDOM strategy in the context of semantic data integration, and how it supports the whole life cycle of data collection, annotation, sharing and reuse of systems biology data and resources.

Availability: https://fairdomhub.org

* Contact: olga.krebs@h-its.org

1 INTRODUCTION

Data integration is an essential part of systems biology. Scientists need to combine different sources of information in order to model biological systems, and relate those models to available experimental data for validation. Metadata is an important aspect of data management and data sharing. Annotating experimental results with a consistent set of information allows for easier discovery of relevant data as well as enabling others to potentially reuse it. Metadata ranges from simple descriptions about when an experiment was done to more detailed descriptions of where biological samples originated, how they were prepared, and what the experimental conditions were at the time of the experiment. Currently, only a small fraction of the data and models produced during systems biology investigations are deposited for reuse by the community, and only a smaller fraction of that data is standards compliant, semantically enriched content.

FAIRDOM project is a joint action of ERA-Net ERASysAPP (https://www.erasysapp.eu/) and European Research Infrastructure ISBE (http://project.isbe.eu/) to establish a data and model management service facility for systems biology. Its prime mission is to support researchers, students, trainers, funders and publishers by enabling systems biology projects to make their Data, Operating procedures and Models, Findable, Accessible, Interoperable and Reusable (FAIR). FAIRDOM builds on the outcomes of the successful SysMO-DB and SyBIT data management projects, uniting their tool and database development as well as their experience serving large systems biology projects. FAIRDOMHub is a web-based platform comprising two main components: SEEK (http://seek4science.org) as a web-based front-end cataloguing and metadata platform and openBIS as a back-end LIMS for scalable local data collection and processing (https://sis.id.ethz.ch/software/openbis.html). Here we present the semantic data integration in SEEK, and how it supports the whole life cycle of data collection, annotation, sharing, and reuse of systems biology data and resources.

2 APPROACH

The SEEK [1] is based on the ISA infrastructure (Investigations, Studies and Assays), a standard format for describing how individual experiments (assays) are aggregated into wider studies and investigations [2]. The Just Enough Results Model (JERM) describes the interrelations between assets and the metadata fields required to describe them. For example, for each dataset uploaded to SEEK, the JERM describes what type of experiment it was, what was measured, and what the values in the dataset mean. The JERM captures the core elements of MIBBI metadata, allowing users to comply with these standards as well as capturing the information required for linking in SEEK. The JERM Ontology (available from the BioPortal, http://bioportal.bioontology.org/ontologies/1488) is an application ontology designed to describe the relationships between items in SEEK (for example, data, models, experiment descriptions, samples, protocols, standard operating procedures and publications); and to enable these relationships to be expressed with formal semantics. It is based on the idea of the Minimal Information Models (https://www.biosharing.org), which have been collected under the umbrella of MIBBI (Minimum Information for Biological and Biomedical Investigations).

3 METHODS

The majority of laboratory scientists use spreadsheets for the daily management and manipulation of data, so the RightField semantic spreadsheet application [3] (also part of this work) is used to embed semantic annotation into the data. Individual cells, columns, or rows in spreadsheets can be restricted to particular ranges of allowed classes or instances from chosen ontologies. By embedding the JERM metadata model in a spreadsheet format, and enabling the use of JERM (and other) vocabulary terms for annotation, the process of standardized semantic data collection can become part of the existing data management activities in the laboratory.
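
The idea of restricting spreadsheet columns to allowed ontology terms can be illustrated with a minimal Python sketch. It is not RightField's mechanism (RightField embeds the restrictions in the spreadsheet itself, at data entry); it only checks an exported table after the fact. The column names, the term lists in ALLOWED_TERMS and the function find_violations are hypothetical and are not taken from the JERM ontology.

```python
import csv
import io

# Hypothetical allowed terms per column. A real template would draw these
# from a chosen ontology (for example JERM), not from a hard-coded dict.
ALLOWED_TERMS = {
    "assay_type": {"transcriptomics", "metabolomics", "proteomics"},
    "unit": {"mM", "uM", "g/L"},
}


def find_violations(csv_text, allowed=ALLOWED_TERMS):
    """Return (row_number, column, value) for each cell outside its allowed set."""
    violations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 of the sheet is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column, terms in allowed.items():
            value = (row.get(column) or "").strip()
            if value not in terms:
                violations.append((row_number, column, value))
    return violations


# Illustrative data only: two cells use terms outside the allowed sets.
SAMPLE = """sample,assay_type,unit
s1,metabolomics,mM
s2,metabolics,mM
s3,proteomics,mol
"""

if __name__ == "__main__":
    for violation in find_violations(SAMPLE):
        print(violation)
```

Checking against a fixed vocabulary in this way catches misspelled or free-text annotations before upload, which is the same benefit the restricted cells provide at the point of data entry.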
Bioinformaticians with experience in ontologies and data annotation can prepare RightField-enabled spreadsheets with embedded ontology term selection support for distribution across the consortium. JERM-compliant spreadsheet templates have been developed for a wide range of experimental data types; the collection is available from http://docs.seek4science.org/help/templates.html.

By embedding semantic technologies into familiar data management tools, the SEEK enables semantic annotation of new data and the generation and querying of Linked Data-compliant datasets, whilst hiding the complexities of ontologies and metadata from its users. Underlying semantic web resources additionally extract and serve SEEK metadata in RDF (Resource Description Framework). RDF enables rich semantic queries, both within SEEK and between related resources in the web of Linked Open Data.

ACKNOWLEDGEMENT

This work was funded by the BBSRC (BBG0102181, BB/I004637/1, BB/M013189/1), and by the BMBF grants 0315749, 20315781 and 031A525. We would like to thank the FAIRDOM PALS and users for their valuable feedback, testing and comments.

REFERENCES

1. Wolstencroft et al. (2015) SEEK: a systems biology data and model management platform. BMC Systems Biology, 9, 33. DOI:10.1186/s12918-015-0174-y
2. Rocca-Serra, P., Brandizi, M., Maguire, E., Sklyar, N., Taylor, C., Begley, K., Field, D., Harris, S., Hide, W., Hofmann, O. et al. (2010) ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics, 26, 2354-2356.
3. Wolstencroft, K., Owen, S., Horridge, M., Krebs, O., Mueller, W., Snoep, J.L., du Preez, F. and Goble, C. (2011) RightField: embedding ontology annotation in spreadsheets. Bioinformatics, 27, 2021-2022.