=Paper= {{Paper |id=Vol-2849/paper-19 |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2849/paper-19.pdf |volume=Vol-2849 |dblpUrl=https://dblp.org/rec/conf/swat4ls/ButanoCCHLRSYM19 }} ==None== https://ceur-ws.org/Vol-2849/paper-19.pdf
                  Implementing FAIR Principles in InterMine

               Daniela Butano1 , Justin Clark-Casey1 , Sergio Contrino1 , Josh Heimbach1 ,
              Rachel Lyne1 , Kevin Herald Reierskog1 , Julie Sullivan1 , Yo Yehudi1 , and Gos
                                               Micklem1

                 Department of Genetics, University of Cambridge, Cambridge, United Kingdom




                      Abstract. InterMine is an established platform for integrating and accessing
                      life sciences data, providing a web interface and RESTful web services.
                      To make the data integrated in the different InterMine deployments
                      even more Findable, Accessible, Interoperable and Reusable, we
                      have been improving InterMine's adherence to the FAIR principles by adopting
                      concepts such as persistent URIs, standards for embedding data
                      descriptions into web pages, describing data with ontologies, and data
                      licences.



             1     Introduction

             InterMine [1] is a platform to integrate and access life sciences data, providing
             flexible querying through a web interface as well as RESTful web services [2].
             Whilst InterMine comes with a core data model for common biological entities,
             different deployments can extend these components to publish any type of data.
             InterMine is an established platform, first released in 2006, and already supports
             some FAIR principles through features such as search and structured queries, web
             services, and cross-references to other InterMine instances and resources. Here we
             describe how we are improving InterMine's adherence to the FAIR principles [3].


             2     Persistent URIs

             InterMine already has unique URLs to identify the report pages for biological
             entities, but these are based on internal InterMine IDs that change at every
             database build. To improve data findability and accessibility, we have generated
             new navigable URLs based on the InterMine class names combined with
             local IDs provided by the data resource providers. For example, in FlyMine, the
             URL of the report page for the protein with UniProt accession Q9V4E1 will
             be https://www.flymine.org/flymine/protein:Q9V4E1.
             By adding the InterMine database instance to third-party resolvers, such as
             Identifiers.org [4], we can generate persistent URIs with the pattern:
             http://identifiers.org/<a mine unique namespace>/<class name>:<local ID>.
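The URL scheme described in this section can be illustrated with a minimal Python sketch. The function names, the lower-casing of the class name (inferred only from the FlyMine example above) and the namespace value "flymine" are assumptions made for illustration; they are not taken from the InterMine code base or from an actual Identifiers.org registration.

```python
from urllib.parse import quote


def report_url(base_url: str, class_name: str, local_id: str) -> str:
    """Build a navigable report-page URL from a class name and a local ID.

    Assumption: the class name is lower-cased, as in the FlyMine example.
    """
    return f"{base_url.rstrip('/')}/{class_name.lower()}:{quote(local_id, safe='')}"


def resolver_uri(namespace: str, class_name: str, local_id: str) -> str:
    """Build a URI following the resolver pattern namespace/class name:local ID.

    Assumption: `namespace` is whatever prefix the mine has registered
    with the resolver; the value used below is hypothetical.
    """
    return (
        f"http://identifiers.org/{quote(namespace, safe='')}/"
        f"{class_name.lower()}:{quote(local_id, safe='')}"
    )


if __name__ == "__main__":
    print(report_url("https://www.flymine.org/flymine", "Protein", "Q9V4E1"))
    print(resolver_uri("flymine", "Protein", "Q9V4E1"))  # hypothetical namespace
```

Because both URLs are derived only from the class name and the provider's local ID, they do not change when the database is rebuilt, unlike URLs based on internal InterMine IDs.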




Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
3    Describing data with ontologies
The InterMine system is based on a core data model, described in an XML
file which defines classes (the entities in the model) and the relationships
between them. InterMine already applied terms from the Sequence
Ontology [5] to its data model automatically but, to improve data interoperability
and reusability, we have added more ontologies to its core data model and given
InterMine instance administrators the ability to apply any other ontology
that describes their data model extension. The ontology terms applied are available in
the data model and will be used in the generation of RDF.
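As a rough illustration of how ontology terms attached to model classes could be read back, the following Python sketch parses a small XML fragment. The fragment, the attribute name "term", the class "LabNote" and the example.org URI are assumptions made for this sketch and may not match the actual InterMine model file.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical model fragment: each class may carry an ontology term URI.
MODEL_XML = """
<model name="genomic">
  <class name="Gene" term="http://purl.obolibrary.org/obo/SO_0000704"/>
  <class name="LabNote" term="http://example.org/ontology/LabNote"/>
  <class name="Comment"/>
</model>
"""


def ontology_terms(model_xml: str) -> Dict[str, str]:
    """Map each class name to its ontology term, skipping unannotated classes."""
    root = ET.fromstring(model_xml.strip())
    return {
        cls.get("name"): cls.get("term")
        for cls in root.iter("class")
        if cls.get("term")
    }


if __name__ == "__main__":
    for name, term in ontology_terms(MODEL_XML).items():
        print(f"{name} -> {term}")
```

A mapping of this kind is the sort of information an RDF generator would need in order to type each exported entity with an ontology class.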

4    Marking up web pages
To improve findability, we have added structured data in JSON-LD
format to InterMine web pages, using the Bioschemas.org [6] DataCatalog
profile on the home page and the DataSet profile on the report page for a DataSet. The
Bioschemas.org types Gene and Protein are at the development stage.
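The following Python sketch shows, under stated assumptions, what embedding a DataCatalog description as JSON-LD could look like. The helper names, the description text and the choice of properties are illustrative only; the properties are a small subset and not the complete Bioschemas profile, and this is not InterMine's actual markup code.

```python
import json
from typing import Any, Dict


def data_catalog_jsonld(name: str, url: str, description: str) -> Dict[str, Any]:
    """Return a minimal schema.org DataCatalog description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        "description": description,
    }


def as_script_tag(doc: Dict[str, Any]) -> str:
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    # Escape "</" so a string value cannot close the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    doc = data_catalog_jsonld(
        "FlyMine",
        "https://www.flymine.org/flymine",
        "An example description of the data catalogue.",  # placeholder text
    )
    print(as_script_tag(doc))
```

Search engines and other crawlers that understand schema.org markup can read such a block from the page without parsing the human-readable HTML.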

5    Publishing Data Licences
To address one of the many aspects of data reusability, InterMine has
updated its model, adding the attribute licence to record the licences that
govern the data sets that have been integrated. As a data integrator, we must
propagate the licences provided for the underlying data by displaying them in
the data set report pages and in query results. At the moment, only a minority
of data sets have a licence. We will also propagate the licence information when
generating RDF.
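A minimal Python sketch of licence propagation is given below. The DataSet record, the function name, the data set names and the fallback text are hypothetical and serve only to illustrate the idea of carrying the licence attribute through to query results; they do not reflect InterMine's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class DataSet:
    """Hypothetical record for an integrated data set."""
    name: str
    licence: Optional[str] = None  # many data sets currently have none


def licence_summary(
    result_data_sets: Iterable[str], catalogue: Dict[str, DataSet]
) -> Dict[str, str]:
    """Report the licence for every data set that contributed to a query result."""
    summary: Dict[str, str] = {}
    for name in result_data_sets:
        data_set = catalogue.get(name)
        licence = data_set.licence if data_set else None
        summary[name] = licence or "licence not specified"
    return summary


if __name__ == "__main__":
    catalogue = {
        "Example proteins": DataSet(
            "Example proteins", "https://creativecommons.org/licenses/by/4.0/"
        ),
        "Example pathways": DataSet("Example pathways"),
    }
    print(licence_summary(["Example proteins", "Example pathways"], catalogue))
```

Reporting an explicit "licence not specified" value, rather than omitting the data set, makes it visible to the user that the reuse conditions are unknown.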

References
1. Smith RN, et al. InterMine: a flexible data warehouse system for the integration
   and analysis of heterogeneous biological data. Bioinformatics. 28(23):3163-5 (2012)
   https://doi.org/10.1093/bioinformatics/bts577
2. Kalderimis A, et al. InterMine: extensive web services for modern biology. Nucleic
   Acids Res. 42(Web Server issue):W468-72 (2014)
3. Wilkinson MD, Dumontier M, Aalbersberg IJJ, et al. The FAIR Guiding Princi-
   ples for scientific data management and stewardship. Sci Data. 3: 160018 (2016)
   https://doi.org/10.1038/sdata.2016.18
4. Sarala M. Wimalaratne, Nick Juty, John Kunze et al. Uniform resolution
   of compact identifiers for biomedical data. Scientific Data. 5: 180029 (2018).
   https://doi.org/10.1038/sdata.2018.29
5. Eilbeck K., Lewis S.E., Mungall C.J., Yandell M., Stein L., Durbin R., Ashburner M.
   The Sequence Ontology: A tool for the unification of genome annotations. Genome
   Biology 6:R44 (2005)
6. Gray, A.J.G., Goble, C.A. and Jimenez, R., 2017. Bioschemas: From Potato Salad
   to Protein Annotation. In International Semantic Web Conference (Posters, Demos
   & Industry Tracks).