=Paper= {{Paper |id=Vol-1692/abstractI |storemode=property |title=FAIRDOM approach for semantic interoperability of systems biology data and models |pdfUrl=https://ceur-ws.org/Vol-1692/abstractI.pdf |volume=Vol-1692 |authors=Olga Krebs,Katy Wolstencroft,Natalie J. Stanford,Norman Morrison,Martin Golebiewski,Rostyk Kuzyakiv,Stuart Owen,Quyen Nguyen,Jacky L. Snoep,Wolfgang Mueller,Carole A. Goble |dblpUrl=https://dblp.org/rec/conf/odls/KrebsWSMGKONSMG16 }} ==FAIRDOM approach for semantic interoperability of systems biology data and models== https://ceur-ws.org/Vol-1692/abstractI.pdf
FAIRDOM approach for semantic interoperability of systems
biology data and models
Olga Krebs1*, Katy Wolstencroft3, Natalie Stanford2, Norman Morrison2, Martin
Golebiewski1, Rostyk Kuzyakiv4, Stuart Owen2, Quyen Nguyen1, Jacky Snoep2,
Wolfgang Mueller1, and Carole Goble2
1 Heidelberg Institute for Theoretical Studies, Germany
2 School of Computer Science, University of Manchester, UK
3 Leiden Institute of Advanced Computer Science, Leiden, NL
4 University of Zurich, Switzerland




ABSTRACT
Motivation: The ability to collect and interlink heterogeneous data
and model collections is essential in systems biology. Effective data
exchange and comparison require sufficient data annotation. This is
particularly apparent in systems biology, where data heterogeneity
means that multiple community metadata standards are required for the
annotation of a whole investigation, including data, models and
protocols.
Results: FAIRDOM (http://fair-dom.org/) is an initiative to enable
the systems biology community to produce and publish FAIR Data,
Operating procedures and Models. It allows research assets to be
aggregated, interlinked and shared in the context of the systems
biology investigations that produced them. Here we present the
FAIRDOM strategy in the context of semantic data integration, and how
it supports the whole life cycle of data collection, annotation,
sharing and reuse of systems biology data and resources.
Availability: https://fairdomhub.org
* Contact: olga.krebs@h-its.org

1    INTRODUCTION
Data integration is an essential part of systems biology. Scientists
need to combine different sources of information in order to model
biological systems, and relate those models to available experimental
data for validation. Metadata is an important aspect of data
management and data sharing. Annotating experimental results with a
consistent set of information allows for easier discovery of relevant
data as well as enabling others to potentially reuse it. Metadata
ranges from simple descriptions about when an experiment was done to
more detailed descriptions of where biological samples originated,
how they were prepared, and what the experimental conditions were at
the time of the experiment. Currently, only a small fraction of the
data and models produced during systems biology investigations are
deposited for reuse by the community, and an even smaller fraction of
that data is standards-compliant, semantically enriched content.
The FAIRDOM project is a joint action of ERA-Net ERASysAPP
(https://www.erasysapp.eu/) and the European Research Infrastructure
ISBE (http://project.isbe.eu/) to establish a data and model
management service facility for systems biology. Its prime mission is
to support researchers, students, trainers, funders and publishers by
enabling systems biology projects to make their Data, Operating
procedures and Models Findable, Accessible, Interoperable and
Reusable (FAIR). FAIRDOM builds on the outcomes of the successful
SysMO-DB and SyBIT data management projects, uniting their tool and
database development as well as their experience serving large
systems biology projects. FAIRDOMHub is a web-based platform
comprising two main components: SEEK (http://seek4science.org) as a
web-based front-end cataloguing and metadata platform and openBIS as
a back-end LIMS for scalable local data collection and processing
(https://sis.id.ethz.ch/software/openbis.html). Here we present the
semantic data integration in SEEK, and how it supports the whole life
cycle of data collection, annotation, sharing, and reuse of systems
biology data and resources.

2    APPROACH
The SEEK [1] is based on the ISA infrastructure (Investigations,
Studies and Assays), a standard format for describing how individual
experiments (assays) are aggregated into wider studies and
investigations [2]. The Just Enough Results Model (JERM) describes
the interrelations between assets and the metadata fields required to
describe them. For example, for each dataset uploaded to SEEK, the
JERM describes what type of experiment it was, what was measured, and
what the values in the dataset mean. The JERM captures the core
elements of MIBBI metadata, allowing users to comply with these
standards as well as capturing the information required for linking
in SEEK. The JERM Ontology (available from the BioPortal,
http://bioportal.bioontology.org/ontologies/1488) is an application
ontology designed to describe the relationships between items in SEEK
(for example, data, models, experiment descriptions, samples,
protocols, standard operating procedures and publications), and to
enable these relationships to be expressed with formal semantics. It
is based on the idea of the Minimal Information Models
(https://www.biosharing.org), which have been collected under the
umbrella of MIBBI (Minimum Information for Biological and Biomedical
Investigations).

3    METHODS
The majority of laboratory scientists use spreadsheets for the daily
management and manipulation of data, so the RightField semantic
spreadsheet application [3] (also part of this work) is used to embed
semantic annotation into the data. Individual cells, columns, or rows
in spreadsheets can be restricted to particular ranges of allowed
classes or instances from chosen ontologies. By embedding the JERM
metadata model in a spreadsheet format, and enabling the use of
JERM (and other) vocabulary terms for annotation, the process of
standardized semantic data collection can become part of the existing
data management activities in the laboratory. Bioinformaticians,
with experience in ontologies and data annotation, can prepare
RightField-enabled spreadsheets with embedded ontology term
selection support for distribution across the consortium.
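
The idea of restricting spreadsheet cells to allowed ontology terms can be
illustrated with a minimal sketch. This is not RightField's own interface;
the column names, the term lists and the validate_rows helper are all
hypothetical and chosen only to show the principle of checking annotated
cells against a controlled vocabulary.

```python
from dataclasses import dataclass

# Hypothetical vocabularies: these term lists are illustrative only and
# are not taken from the JERM ontology or from any RightField template.
ALLOWED_TERMS = {
    "assay_type": {"transcriptomics", "proteomics", "metabolomics"},
    "unit": {"mM", "uM", "nM"},
}


@dataclass
class CellError:
    row: int      # 1-based row number in the sheet
    column: str   # name of the restricted column
    value: str    # the disallowed value that was found


def validate_rows(rows, allowed=ALLOWED_TERMS):
    """Return a CellError for every restricted cell holding a disallowed term."""
    errors = []
    for index, row in enumerate(rows, start=1):
        for column, terms in allowed.items():
            value = row.get(column, "")
            if value not in terms:
                errors.append(CellError(index, column, value))
    return errors


# Two example rows; the second uses a free-text assay type.
rows = [
    {"assay_type": "proteomics", "unit": "mM", "value": "0.5"},
    {"assay_type": "protein stuff", "unit": "mM", "value": "0.7"},
]
errors = validate_rows(rows)
for error in errors:
    print(f"row {error.row}: {error.column!r} has disallowed value {error.value!r}")
```

In this sketch, unrestricted columns such as "value" are ignored, which
mirrors the point that only selected cells, columns, or rows carry
vocabulary restrictions.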
JERM-compliant spreadsheet templates have been developed for a
wide range of experimental data types; the collection is available
from http://docs.seek4science.org/help/templates.html.
By embedding semantic technologies into familiar data management
tools, the SEEK enables semantic annotation of new data and the
generation and querying of Linked Data-compliant datasets, whilst
hiding the complexities of ontologies and metadata from its users.
Underlying semantic web resources additionally extract and serve
SEEK metadata in RDF (Resource Description Framework). RDF enables
rich semantic queries, both within SEEK and between related
resources in the web of Linked Open Data.
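
As a rough illustration of why triple-based metadata supports such queries,
the sketch below stores a few subject-predicate-object statements and
follows the ISA-style links from a study to its assays and their assets.
It is plain Python standing in for an RDF store and a SPARQL query; every
identifier (the "ex:" and "jerm:" names) is a hypothetical placeholder, not
a real SEEK URI or JERM ontology term.

```python
# Hypothetical identifiers: the "ex:" and "jerm:" names below are
# placeholders, not real SEEK URIs or JERM ontology terms.
TRIPLES = [
    ("ex:investigation1", "jerm:hasStudy", "ex:study1"),
    ("ex:study1", "jerm:hasAssay", "ex:assay1"),
    ("ex:study1", "jerm:hasAssay", "ex:assay2"),
    ("ex:assay1", "jerm:hasData", "ex:datafile1"),
    ("ex:assay2", "jerm:hasModel", "ex:model1"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Yield triples matching the pattern; None acts as a wildcard."""
    for s, p, o in triples:
        if subject in (None, s) and predicate in (None, p) and obj in (None, o):
            yield (s, p, o)


def assets_of_study(triples, study):
    """Follow study -> assay -> asset links, like a two-step graph query."""
    assets = []
    for _, _, assay in match(triples, subject=study, predicate="jerm:hasAssay"):
        for _, predicate, asset in match(triples, subject=assay):
            assets.append((assay, predicate, asset))
    return assets


results = assets_of_study(TRIPLES, "ex:study1")
for row in results:
    print(row)
```

Because every statement has the same three-part shape, the same pattern
matcher can traverse links to data, models, or any other asset type without
schema changes, which is the property that makes cross-resource queries
over Linked Data practical.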



ACKNOWLEDGEMENT
This work was funded by the BBSRC (BBG0102181,
BB/I004637/1, BB/M013189/1), and by the BMBF grants 0315749,
20315781 and 031A525. We would like to thank the FAIRDOM
PALS and users for their valuable feedback, testing and comments.



REFERENCES
1. Wolstencroft et al. (2015) SEEK: a systems biology data and
   model management platform. BMC Systems Biology, 9, 33.
   DOI:10.1186/s12918-015-0174-y
2. Rocca-Serra, P., Brandizi, M., Maguire, E., Sklyar, N., Taylor, C.,
   Begley, K., Field, D., Harris, S., Hide, W., Hofmann, O. et al.
   (2010) ISA software suite: supporting standards-compliant
   experimental annotation and enabling curation at the community level.
   Bioinformatics, 26, 2354-2356.
3. Wolstencroft, K., Owen, S., Horridge, M., Krebs, O., Mueller, W.,
   Snoep, J.L., du Preez, F. and Goble, C. (2011) RightField:
   embedding ontology annotation in spreadsheets. Bioinformatics, 27,
   2021-2022.



