<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bioschemas: schema.org for the Life Sciences</article-title>
      </title-group>
      <contrib-group>
        <aff>
          <institution>University of Maastricht</institution>
          ,
          <addr-line>Minderbroedersberg, LK Maastricht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>European Molecular Biology Laboratory - European Bioinformatics Institute</institution>
          ,
          <addr-line>Wellcome Genome Campus, CB10 1SD</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>Campus de Montegancedo, 28660 Boadilla del Monte, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Websites are commonly used to expose data to end users, offering search, filter and download capabilities that make it easier for users to find, organize and obtain data relevant to their interests. With the continuous growth of data in the Life Sciences domain, it becomes difficult for users to find all the information required for their research on a single website. Search engines should make it easier for researchers to search and retrieve collated information from multiple sites so they can better decide where to go next. Schema.org is a collaborative project providing schemas for semantically structuring data in web pages. Semantic mark-up makes it easier to determine, for example, whether a web page refers to a book or a movie. It also facilitates summarizing information in a fashion similar to the infoboxes used in Wikipedia. Bioschemas is a community effort aiming to extend schema.org to support mark-up for Life Sciences websites. Here we present an overview of the main types used and proposed by Bioschemas to support such mark-up. Availability: http://bioschemas.org/</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic mark-up</kwd>
        <kwd>structured data</kwd>
        <kwd>data discoverability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Bioschemas is a community initiative aiming to extend schema.org in order to improve
data discoverability and interoperability in the Life Sciences. Bioschemas reuses some
existing types such as DataCatalog and Dataset, adds new properties to others such as
CreativeWork, and proposes new types such as BioChemEntity, DataRecord and
LabProtocol. These changes and additions are expected to be included in schema.org during
2018. In addition to types and properties, Bioschemas also provides guidelines
regarding cardinality (one or many), marginality (minimum, recommended or optional), and
the use of controlled vocabularies for the properties considered most relevant to Life
Sciences data. Specifications and guidelines are available at
http://bioschemas.org/specifications. An overview of the main types involved in Bioschemas is
presented in Fig. 1.</p>
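      <p>As a minimal sketch of how cardinality and marginality guidelines of this kind could be
encoded and checked, consider the Python fragment below. It is an illustration only: the names
PropertyRule, check and EXAMPLE_RULES, as well as the two example rules, are hypothetical and are
not taken from any Bioschemas specification.</p>
      <preformat>
```python
from dataclasses import dataclass
from enum import Enum


class Marginality(Enum):
    MINIMUM = "minimum"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


class Cardinality(Enum):
    ONE = "one"
    MANY = "many"


@dataclass(frozen=True)
class PropertyRule:
    """One guideline entry: a property name plus its marginality and cardinality."""
    name: str
    marginality: Marginality
    cardinality: Cardinality


def check(markup, rules):
    """Return a list of human-readable problems found in a mark-up dictionary."""
    problems = []
    for rule in rules:
        value = markup.get(rule.name)
        if value is None:
            # Only minimum properties are reported when absent.
            if rule.marginality is Marginality.MINIMUM:
                problems.append(f"missing minimum property: {rule.name}")
            continue
        # A property with cardinality "one" must not carry a list of values.
        if rule.cardinality is Cardinality.ONE and isinstance(value, list):
            problems.append(f"expected a single value for: {rule.name}")
    return problems


# Hypothetical rules, invented for illustration; not an actual Bioschemas profile.
EXAMPLE_RULES = [
    PropertyRule("identifier", Marginality.MINIMUM, Cardinality.ONE),
    PropertyRule("sameAs", Marginality.OPTIONAL, Cardinality.MANY),
]
```
</preformat>
      <p>Under these assumed rules, calling check on an empty dictionary reports the missing
identifier, whereas a dictionary carrying a single identifier passes without problems.</p>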
      <p>BioChemEntity acts as a flexible, extensible wrapper that is easy to customize. For such
customizations, known as profiles, Bioschemas provides guidelines on (i) the minimum and
recommended data to be delivered, (ii) the expected cardinality, and (iii) third-party
ontology terms useful for modelling the data. For instance, the protein profile requires
one unique identifier as a minimum and recommends transcribed genes, organisms and
associated diseases. In addition, the protein profile recommends a well-known ontology class
or controlled vocabulary type, such as http://purl.obolibrary.org/obo/PR_000000001, to
represent the protein type, as well as object properties or predicates such as
http://semanticscience.org/resource/SIO_010081 to link to transcribed genes,
schema:isContainedIn to link to organisms, and http://semanticscience.org/resource/SIO_000001 to
link to associated diseases. The property schema:mainEntityOfPage is used to link the
entity to its corresponding schema:DataRecord in a schema:Dataset, while
schema:sameAs is used to link to other pages describing this entity, and schema:url is
used to link to its official webpage. With the specifications in place, Bioschemas will
next focus on adoption by key resources in the Life Sciences and on the development of tools
for validation and data extraction.</p>
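      <p>The protein example can be sketched as a small Python function that assembles a JSON-LD
document from the terms quoted above. This is a rough illustration under several assumptions and
not the official Bioschemas serialization: the function names protein_markup and to_jsonld are
hypothetical, the use of additionalType to carry the ontology class is an assumption, and the
third-party predicates are written as full IRI keys simply because that is valid JSON-LD.</p>
      <preformat>
```python
import json

# IRIs quoted in the text; their use as shown here is an illustrative assumption.
PROTEIN_TYPE = "http://purl.obolibrary.org/obo/PR_000000001"
TRANSCRIBED_GENE = "http://semanticscience.org/resource/SIO_010081"
ASSOCIATED_DISEASE = "http://semanticscience.org/resource/SIO_000001"


def protein_markup(identifier, record_url, gene_url=None,
                   organism_url=None, disease_url=None):
    """Build a JSON-LD dictionary for one protein (hypothetical helper)."""
    doc = {
        "@context": "http://schema.org",
        # BioChemEntity and DataRecord are types proposed by Bioschemas.
        "@type": "BioChemEntity",
        # Assumption: the ontology class is attached through additionalType.
        "additionalType": PROTEIN_TYPE,
        # Minimum property: one unique identifier.
        "identifier": identifier,
        "mainEntityOfPage": {"@type": "DataRecord", "@id": record_url},
    }
    # Recommended links are added only when a value is supplied.
    if gene_url is not None:
        doc[TRANSCRIBED_GENE] = {"@id": gene_url}
    if organism_url is not None:
        doc["isContainedIn"] = {"@id": organism_url}
    if disease_url is not None:
        doc[ASSOCIATED_DISEASE] = {"@id": disease_url}
    return doc


def to_jsonld(doc):
    """Serialize the dictionary as an indented JSON-LD string."""
    return json.dumps(doc, indent=2)
```
</preformat>
      <p>The resulting string could be embedded in a web page inside a script element of type
application/ld+json, which is the usual way of publishing schema.org mark-up as JSON-LD.</p>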
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>