<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bio2RDF Release 3: A Larger Connected Network of Linked Data for the Life Sciences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michel Dumontier</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alison Callahan</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose Cruz-Toledo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Ansell</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincent Emonet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>François Belleau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnaud Droit</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Molecular Medicine, CHUQ Research Center, Laval University</institution>
          ,
          <addr-line>QC</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IO Informatics</institution>
          ,
          <addr-line>Berkeley, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Microsoft QUT eResearch Centre, Queensland University of Technology</institution>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Stanford Center for Biomedical Informatics Research, Stanford University</institution>
          ,
          <addr-line>CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <abstract>
        <p>Bio2RDF is an open-source project to generate and provide Linked Data for the Life Sciences. Here, we report on a third coordinated release of ~11 billion triples across 30 biomedical databases and datasets, representing a 10-fold increase in the number of triples since Bio2RDF Release 2 (Jan 2013). New clinically relevant datasets have been added. New features in this release include improved data quality, typing of every URI, extended dataset statistics, tighter integration, and a refactored linked data platform. Bio2RDF data are available via REST services, SPARQL endpoints, and downloadable files.</p>
      </abstract>
      <kwd-group>
        <kwd>linked open data</kwd>
        <kwd>semantic web</kwd>
        <kwd>RDF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Bio2RDF is an open-source project to transform the vast collections of
heterogeneously formatted biomedical data into Linked Data [1], [2]. PHP scripts hosted
on GitHub convert data obtained from downloadable files or APIs (e.g. flat files,
tab-delimited files, XML, JSON) into RDF. The scripts follow a basic convention that
specifies the syntax of HTTP identifiers for i) source-identified data items, ii)
script-generated data items, and iii) the vocabulary used to describe the dataset contents
[1]. They use the Life Science Registry (http://tinyurl.com/lsregistry), a comprehensive
list of over 2200 biomedical databases, datasets and terminologies, to obtain a canonical
dataset name (prefix), which is used to formulate both a Bio2RDF URI of the form
http://bio2rdf.org/{prefix}:{identifier} and an identifiers.org URI. Each data item is
annotated with provenance, including the URL of the files from which it was generated.
Bio2RDF types and relations have been mapped to the Semanticscience Integrated Ontology
(SIO) [3], thereby enabling queries to be formulated using a single terminology [4].
Bio2RDF has been used in a wide variety of biomedical research, including understanding
HIV-based interactions [5] and drug discovery [6].</p>
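      <p>As a minimal sketch of the URI convention described above, the following Python functions assemble a Bio2RDF URI and a parallel identifiers.org URI from a prefix and an identifier. The function names, the lower-casing of the prefix, and the identifiers.org layout (http://identifiers.org/{prefix}/{identifier}) are assumptions made for illustration; they are not taken from the Bio2RDF scripts, which are written in PHP.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from urllib.parse import quote

BIO2RDF_BASE = "http://bio2rdf.org/"
# Assumption: identifiers.org URIs take the form {base}{prefix}/{identifier}.
IDENTIFIERS_ORG_BASE = "http://identifiers.org/"


def _check(prefix: str, identifier: str) -> None:
    """Reject empty components, which cannot form a usable URI."""
    if not prefix or not identifier:
        raise ValueError("prefix and identifier must be non-empty")


def bio2rdf_uri(prefix: str, identifier: str) -> str:
    """Follow the pattern http://bio2rdf.org/{prefix}:{identifier}."""
    _check(prefix, identifier)
    # Assumption: the canonical registry prefix is lower case.
    return f"{BIO2RDF_BASE}{prefix.lower()}:{quote(identifier, safe=':')}"


def identifiers_org_uri(prefix: str, identifier: str) -> str:
    """Build the parallel identifiers.org URI (assumed layout)."""
    _check(prefix, identifier)
    return f"{IDENTIFIERS_ORG_BASE}{prefix.lower()}/{quote(identifier, safe=':')}"


if __name__ == "__main__":
    # "drugbank" and "DB00001" are illustrative values only.
    print(bio2rdf_uri("drugbank", "DB00001"))
    print(identifiers_org_uri("drugbank", "DB00001"))
```
]]></preformat>
      <p>Keeping the prefix lookup separate from URI assembly means that a change to the registry's canonical names would not require touching the functions that build identifiers.</p>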
      <p>Here, we report an update to the Bio2RDF network, termed Bio2RDF Release 3,
and compare the results to Bio2RDF Release 2.</p>
    </sec>
    <sec id="sec-2">
      <title>Bio2RDF Release 3</title>
      <p>We redeveloped the Bio2RDF linked data platform with Talend ESB, a graphical Java
code generator based on the Eclipse framework, to provide three basic services (describe,
search, links) that query the target SPARQL endpoint. The REST services now return RDF
triples or quads based on content negotiation or RESTful URIs of the form
http://bio2rdf.org/[prefix]/[service]/[format]/[searchterm]. The describe service
returns statements with the searchterm as an identifier in the subject position. The
links service returns triples with the searchterm as an identifier in the object position.
Finally, the search service returns triples containing matched literals. Descriptions of
datasets and available services are stored in a new SPARQL endpoint
(http://dataset.bio2rdf.org/sparql), from which the web application retrieves them.</p>
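      <p>The RESTful URI form and the subject/object semantics of the describe and links services can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's Talend-generated code: the function names are hypothetical, the mapping of a searchterm to a http://bio2rdf.org/{prefix}:{identifier} resource is assumed, the CONSTRUCT queries are one plausible way to express the described behaviour, and the format value used in the usage note is a placeholder.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from urllib.parse import quote

BASE = "http://bio2rdf.org"
SERVICES = ("describe", "search", "links")


def rest_url(prefix: str, service: str, fmt: str, searchterm: str) -> str:
    """Assemble http://bio2rdf.org/[prefix]/[service]/[format]/[searchterm]."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    parts = (prefix, service, fmt, searchterm)
    # Percent-encode each path segment so reserved characters stay literal.
    return BASE + "/" + "/".join(quote(part, safe="") for part in parts)


def resource_uri(prefix: str, searchterm: str) -> str:
    """Assumption: the searchterm resolves to a {prefix}:{identifier} URI."""
    return f"{BASE}/{prefix}:{quote(searchterm, safe=':')}"


def describe_query(prefix: str, searchterm: str) -> str:
    """SPARQL sketch: triples with the resource in the subject position."""
    uri = resource_uri(prefix, searchterm)
    return f"CONSTRUCT {{ <{uri}> ?p ?o }} WHERE {{ <{uri}> ?p ?o }}"


def links_query(prefix: str, searchterm: str) -> str:
    """SPARQL sketch: triples with the resource in the object position."""
    uri = resource_uri(prefix, searchterm)
    return f"CONSTRUCT {{ ?s ?p <{uri}> }} WHERE {{ ?s ?p <{uri}> }}"


if __name__ == "__main__":
    # "drugbank", "json" and "DB00001" are illustrative values only.
    print(rest_url("drugbank", "describe", "json", "DB00001"))
    print(describe_query("drugbank", "DB00001"))
    print(links_query("drugbank", "DB00001"))
```
]]></preformat>
      <p>For example, rest_url("drugbank", "describe", "json", "DB00001") yields http://bio2rdf.org/drugbank/describe/json/DB00001; whether the deployed platform accepts "json" as a format value is not stated in the text.</p>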
    </sec>
    <sec id="sec-3">
      <title>Availability</title>
      <p>Bio2RDF is accessible from http://bio2rdf.org. Bio2RDF scripts, mappings, and
the web application are available from GitHub (https://github.com/bio2rdf). A list of the
datasets, detailed statistics, and downloadable content (RDF files, VoID descriptions,
statistics, Virtuoso database) are available at
http://download.bio2rdf.org/current/release.html, and descriptions of Bio2RDF datasets
and file locations are also available from datahub.io.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Callahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cruz-Toledo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ansell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          , “
          <article-title>Bio2RDF Release 2: Improved coverage, interoperability and provenance of life science linked data</article-title>
          ,” in
          <source>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</source>
          ,
          <year>2013</year>
          , vol.
          <volume>7882</volume>
          LNCS, pp.
          <fpage>200</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Belleau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Nolin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tourigny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rigault</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Morissette</surname>
          </string-name>
          , “
          <article-title>Bio2RDF: Towards a mashup to build bioinformatics knowledge systems</article-title>
          ,”
          <source>J. Biomed. Inform.</source>
          , vol.
          <volume>41</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>706</fpage>
          -
          <lpage>716</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          et al., “
          <article-title>The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery</article-title>
          ,”
          <source>J. Biomed. Semantics</source>
          , vol.
          <volume>5</volume>
          , p.
          <fpage>14</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Callahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cruz-Toledo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          , “
          <article-title>Ontology-Based Querying with Bio2RDF's Linked Open Data</article-title>
          ,”
          <source>J. Biomed. Semantics</source>
          , vol.
          <volume>4</volume>
          ,
          <issue>Suppl 1</issue>
          , p.
          <fpage>S1</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Nolin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Belleau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Corbeil</surname>
          </string-name>
          , “
          <article-title>Building an HIV data mashup using Bio2RDF</article-title>
          ,”
          <source>Brief. Bioinform.</source>
          , vol.
          <volume>13</volume>
          , pp.
          <fpage>98</fpage>
          -
          <lpage>106</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Wild</surname>
          </string-name>
          , “
          <article-title>Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data</article-title>
          ,”
          <source>BMC Bioinformatics</source>
          , vol.
          <volume>11</volume>
          , p.
          <fpage>255</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>