=Paper=
{{Paper
|id=Vol-2849/paper-19
|storemode=property
|title=Implementing FAIR Principles in InterMine
|pdfUrl=https://ceur-ws.org/Vol-2849/paper-19.pdf
|volume=Vol-2849
|dblpUrl=https://dblp.org/rec/conf/swat4ls/ButanoCCHLRSYM19
}}
==None==
Implementing FAIR Principles in InterMine

Daniela Butano, Justin Clark-Casey, Sergio Contrino, Josh Heimbach, Rachel Lyne, Kevin Herald Reierskog, Julie Sullivan, Yo Yehudi, and Gos Micklem

Department of Genetics, University of Cambridge, Cambridge, United Kingdom

Abstract. InterMine is an established platform for integrating and accessing life sciences data, providing a web interface and RESTful web services. To make the data integrated in the different InterMine deployments even more Findable, Accessible, Interoperable and Reusable, we have been improving InterMine's adherence to the FAIR principles by adopting concepts such as persistent URIs, standards for embedding data descriptions in web pages, ontologies for describing data, and data licences.

===1 Introduction===

InterMine [1] is a platform for integrating and accessing life sciences data, providing flexible querying through a web interface as well as RESTful web services [2]. Whilst InterMine comes with a core data model for common biological entities, different deployments can extend this model to publish any type of data. First released in 2006, InterMine already supports some FAIR principles through features such as search and structured query functionality, web services, and cross-references to other InterMine instances and resources. Here we describe how we are improving InterMine's adherence to the FAIR principles [3].

===2 Persistent URIs===

InterMine already has unique URLs that identify the report pages for biological entities, but these are based on internal InterMine IDs that change at every database build. To achieve data findability and accessibility, we have generated new navigable URLs based on the InterMine class names combined with the local IDs provided by the data resource providers. For example, in FlyMine, the URL of the report page for the protein with UniProt accession Q9V4E1 will be https://www.flymine.org/flymine/protein:Q9V4E1.
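As an illustration of the URL scheme just described, the following minimal Python sketch builds a report-page URL from a class name and a provider-supplied local ID. The helper name permanent_url, the lower-casing of the class name and the percent-encoding of the local ID are assumptions made for this example, not a description of the InterMine implementation; only the FlyMine base URL and the accession Q9V4E1 are taken from the text.

```python
from urllib.parse import quote


def permanent_url(base_url: str, class_name: str, local_id: str) -> str:
    """Build a report-page URL of the form <base>/<class name>:<local ID>.

    Hypothetical helper: lower-casing the class name and percent-encoding
    the local ID are assumptions of this sketch, not documented behaviour.
    """
    if not class_name or not local_id:
        raise ValueError("class name and local ID must both be non-empty")
    # Drop any trailing slash so the joined URL has exactly one separator.
    base = base_url.rstrip("/")
    # quote() leaves letters, digits and "_.-~" untouched; safe="" also
    # encodes "/" so a local ID cannot introduce extra path segments.
    return f"{base}/{class_name.lower()}:{quote(local_id, safe='')}"


if __name__ == "__main__":
    print(permanent_url("https://www.flymine.org/flymine", "Protein", "Q9V4E1"))
```

Run as a script, this prints the FlyMine example URL given above. Because the URL depends only on the class name and the provider's ID, it stays the same across database builds, unlike a URL built from an internal InterMine ID.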
By adding the InterMine database instance to third-party resolvers such as Identifiers.org [4], we can generate persistent URIs with the pattern http://identifiers.org/{mine unique namespace}/{class name}:{local ID}.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===3 Describing data with ontologies===

The InterMine system is based on a core data model, described in an XML file that defines classes (the entities in the model) and the relationships between them. InterMine already applied terms from the Sequence Ontology [5] to its data model automatically; to improve data interoperability and reusability, we have added more ontologies to the core data model and given InterMine instance administrators the ability to apply any other ontology that describes their data model extension. The ontologies applied are available in the data model and will be used in the generation of RDF.

===4 Marking up web pages===

To improve findability, we have added structured data in JSON-LD format to InterMine web pages, using the Bioschemas.org [6] DataCatalog profile on the home page and the DataSet profile on the DataSet report pages. The Bioschemas.org types Gene and Protein are still at the development stage.

===5 Publishing Data Licences===

To address one of the many aspects of data reusability, InterMine has updated its model, adding the attribute licence to record the licences that govern the integrated data sets. As a data integrator, we must propagate the licences provided for the underlying data by displaying them in the data set report pages and in query results. At the moment only a minority of data sets have a licence. We will propagate the licence information when generating RDF.

===References===

1. Smith RN, et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics. 28(23):3163-5 (2012) https://doi.org/10.1093/bioinformatics/bts577
2. Kalderimis A, et al. InterMine: extensive web services for modern biology. Nucleic Acids Res. 42(Web Server issue):W468-72 (2014)
3. Wilkinson MD, Dumontier M, Aalbersberg IJJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 3:160018 (2016) https://doi.org/10.1038/sdata.2016.18
4. Wimalaratne SM, Juty N, Kunze J, et al. Uniform resolution of compact identifiers for biomedical data. Sci Data. 5:180029 (2018) https://doi.org/10.1038/sdata.2018.29
5. Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology. 6:R44 (2005)
6. Gray AJG, Goble CA, Jimenez R. Bioschemas: From Potato Salad to Protein Annotation. In: International Semantic Web Conference (Posters, Demos & Industry Tracks) (2017)
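As a supplementary illustration of the markup described in Section 4, the following minimal Python sketch serialises a DataCatalog description as JSON-LD and wraps it in a script element for embedding in a web page. The helper names, the selection of schema.org properties and the example data set names are assumptions for illustration only; the properties actually required by the Bioschemas.org profiles are not restated here. Carrying the licence of Section 5 in the markup through the schema.org license property is likewise an assumption of this sketch.

```python
import json


def data_catalog_jsonld(name, url, datasets):
    """Return a schema.org DataCatalog description as a plain dict.

    `datasets` is a list of (name, licence URL or None) pairs; the values
    passed in are illustrative, not real InterMine metadata.
    """
    entries = []
    for ds_name, licence in datasets:
        # schema.org spells the type "Dataset"; InterMine's class is DataSet.
        entry = {"@type": "Dataset", "name": ds_name}
        if licence:  # only a minority of data sets currently carry one
            entry["license"] = licence
        entries.append(entry)
    return {
        "@context": "https://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        "dataset": entries,
    }


def as_script_tag(doc):
    """Wrap a JSON-LD document in the script element used for embedding."""
    # Escape "</" so the JSON payload cannot end the script element early.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    catalog = data_catalog_jsonld(
        "FlyMine",
        "https://www.flymine.org/flymine",
        [
            ("Example data set A", "https://creativecommons.org/licenses/by/4.0/"),
            ("Example data set B", None),  # no licence known, so none is emitted
        ],
    )
    print(as_script_tag(catalog))
```

Omitting the license key when no licence is known, instead of emitting an empty value, keeps the markup from asserting anything about data sets whose terms of use have not been stated.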