=Paper= {{Paper |id=Vol-2042/paper33 |storemode=property |title=Bioschemas: schema.org for the Life Sciences |pdfUrl=https://ceur-ws.org/Vol-2042/paper33.pdf |volume=Vol-2042 |authors=Leyla Jael García Castro,Olga X. Giraldo,Alexander Garcia,Michel Dumontier,Bioschemas Community |dblpUrl=https://dblp.org/rec/conf/swat4ls/CastroGCDC17 }} ==Bioschemas: schema.org for the Life Sciences== https://ceur-ws.org/Vol-2042/paper33.pdf
            Bioschemas: schema.org for the Life Sciences

Leyla Garcia1[0000-0003-3986-0510], Olga Giraldo2[0000-0003-2978-8922] , Alexander Garcia2[0000-
   0003-1238-2539]
                   , Michel Dumontier3[0000-0003-4727-9435] and Bioschemas Community
1 European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Ge-

                                  nome Campus, CB10 1SD, UK.
                                    ljgarcia@ebi.ac.uk
    2 Universidad Politécnica de Madrid, Campus de Montegancedo, 28660 Boadilla del Monte,

                                           Madrid, Spain
                     ogiraldo@fi.upm.es, agarcia@gmail.com
    3 University of Maastricht , Minderbroedersberg 4-6, 6211 LK Maastricht, The Netherlands

                   michel.dumontier@maastrichtuniversity.nl



         Abstract. Websites are commonly used to expose data to end users, enabling
         search, filter, and download capabilities making it easier for users to find, organ-
         ize and obtain data relevant to their own interests. With the continuous growth of
         data in the Life Sciences domain, it becomes difficult for users to easily find in-
         formation required for their research on one single website. Search engines
         should make it easier for researchers to search and retrieve collated information
         from multiple sites so they can better decide where to go next. Schema.org is a
         collaborative project providing schemas for semantically structuring data in web
         pages. By adding semantic mark-up it becomes easier to determine whether a
         web page refers to a book or a movie. It also facilitates summarizing information
         in a fashion similar to infoboxes used in Wikipedia. Bioschemas is a community
         effort aiming to extend schema.org to support mark-up for Life Sciences web-
         sites. Here we present an overview of the main types used and proposed by Bio-
         schemas in order to support such mark up. Availability: http://bioschemas.org/


         Keywords: Semantic mark-up, structured data, data discoverability.


1        Bioschemas

Bioschemas is a community initiative aiming to extend schema.org in order to improve
data discoverability and interoperability in Life Sciences. Bioschemas reuses some ex-
isting types such as DataCatalog and Dataset, adds new properties to others such as
CreativeWork, and proposes new types such as BioChemEntity, DataRecord and Lab-
Protocol. Editions and additions are expected to be included in schema.org during
2018. In addition to types and properties, Bioschemas also provides guidelines regard-
ing cardinality –one or many, marginality –minimum, recommended or optional, and
usage of controlled vocabularies for those properties considered more relevant for Life
2


Sciences data. Specifications and guidelines are available at http://bio-
schemas.org/specifications. An overview of the main types involved in Bioschemas is
presented in Fig. 1.




                    Fig. 1. Bioschemas types and properties at a glance.

   BioChemEntity acts as a flexible and extensible wrapper, easy to customize. For such
customizations, a.k.a. profiles, Bioschemas provides guidelines on the (i) minimum and
recommended data to be delivered, (ii) expected cardinality, and (iii) third-party ontol-
ogy terms useful to model the data. For instance, a protein profile advises as minimum
one unique identifier while as recommended transcribed genes, organisms and associ-
ated diseases. On top of it, the protein profile recommends a well-known ontology class
or controlled vocabulary type such as http://purl.obolibrary.org/obo/PR_000000001 to
represent the protein type, as well as object properties or predicates such as http://se-
manticscience.org/resource/SIO_010081 to link to transcribed genes, schema:isCon-
tainedIn to link to organisms, and http://semanticscience.org/resource/SIO_000001 to
link to associated diseases. The property schema:mainEntityOfPage is used to link the
entity to its corresponding schema:DataRecord in a schema:Dataset, while
schema:sameAs is used to link to other pages describing this entity, and schema:url is
used to link to its official webpage. Following the specifications, Bioschemas will con-
tinue with adoption by some key resources in Life Sciences and development of tools
for validation and data extraction.