<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AgroLD: a Knowledge Graph for the Plant Sciences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bill Happi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentin Guignon</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Ruiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Larmande</string-name>
          <email>pierre.larmande@ird.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bertrand Pitollat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ndomassi Tando</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yann Pomie</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Knowledge Graph, Linked Data, FAIR data, Plant Sciences, Bioinformatics</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AGAP, CIRAD, INRAE, Univ. Montpellier</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Bioversity International</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>DIADE, IRD, Univ. Montpellier, CIRAD</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform</institution>
          ,
          <addr-line>Bioversity, CIRAD, INRAE, IRD</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent advances in high-throughput technologies have revolutionized analysis in the plant sciences. However, there is an urgent need to integrate and assimilate complementary information effectively in order to understand biological systems in their entirety. We have developed AgroLD, a knowledge graph that exploits Semantic Web technologies to integrate data of interest to the plant science community (e.g., on rice, wheat and Arabidopsis) and thereby facilitate the formulation and validation of new scientific hypotheses. AgroLD contains around 900M triples created by annotating and integrating more than 100 datasets from 15 data sources. Our objective is to offer a domain-specific knowledge platform for answering complex biological and plant science questions, for instance about the involvement of genes in plant disease resistance or adaptive responses to climate change. In this demo, we present results that currently focus on genomics, genetics and trait associations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. The AgroLD Knowledge Graph</title>
      <p>AgroLD is being built incrementally to span many aspects of plant molecular interactions. The current
phase covers information on genes, proteins, predictions of homologous genes, metabolic
pathways, plant trait associations and genetic studies. At this stage, we have integrated data
from several resources such as Ensembl Plants, UniProtKB and Gene Ontology Annotation. The
choice of these sources was guided by the biological community: they are widely used
and therefore strengthen users' confidence in the data. We have also integrated resources developed
by the local SouthGreen platform 1 such as TropGeneDB, a tropical plant genetics database, Rice
Genome Hub, a rice genomics database, GreenPhylDB, a comparative genomics database for
tropical plants, OryzaTagLine, a rice phenotype database and SniPlay, a rice genomic variation
database. These resources bring together experimental data produced by research groups in
Montpellier and the South of France. The online documentation provides an overview of the
integrated data sources 2.</p>
      <p>CEUR Workshop Proceedings (http://ceur-ws.org), ISSN 1613-0073.</p>
      <p>The conceptual framework of AgroLD is based on well-established ontologies in the plant field
such as Gene Ontology, Plant Ontology and Plant Trait Ontology. Furthermore, we developed
a dedicated schema 3 that creates links between the imported ontologies and introduces new
classes and properties. The online documentation gives the complete list of the ontologies used.
The majority of these ontologies are hosted by the OBO Foundry project.</p>
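      <p>To make the role of the ontologies concrete, the following minimal Python sketch shows how a single entity could be typed with a schema class and linked to an ontology class, serialised as N-Triples. It is an illustration under stated assumptions, not the AgroLD pipeline: the namespace http://example.org/agrold-sketch/, the class Gene, the property has_annotation and the gene identifier GENE0001 are all hypothetical. Only the OBO Foundry PURL pattern for ontology terms is taken from common practice.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch: serialise a semantic annotation as N-Triples.
# Every IRI under http://example.org/ is a hypothetical placeholder,
# not an identifier used by AgroLD.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OBO = "http://purl.obolibrary.org/obo/"       # OBO Foundry PURL prefix
EX = "http://example.org/agrold-sketch/"      # hypothetical namespace


def triple(subject: str, predicate: str, obj: str) -> str:
    """Return one N-Triples line whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def annotate_gene(gene_id: str, ontology_term: str) -> list[str]:
    """Type a gene with a schema class and link it to an ontology class."""
    gene = EX + "gene/" + gene_id
    # OBO PURLs use an underscore where the term ID has a colon,
    # e.g. GO:0006952 -> GO_0006952.
    term = OBO + ontology_term.replace(":", "_")
    return [
        triple(gene, RDF_TYPE, EX + "schema/Gene"),
        triple(gene, EX + "schema/has_annotation", term),
    ]


if __name__ == "__main__":
    for line in annotate_gene("GENE0001", "GO:0006952"):
        print(line)
```
]]></preformat>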
    </sec>
    <sec id="sec-2">
      <title>2. Statistics</title>
      <p>As of today, AgroLD contains more than 900 million triples resulting from the integration of
roughly 100 datasets gathered in 33 named graphs. Table 1 gives an overview of available
resources and tools. All datasets are available in Zenodo under the Creative Commons
Attribution 4.0 International license (CC-BY 4.0). Each resource can contain several datasets, for
instance one dataset per species or per data type. Combining all imported ontologies and datasets,
the AgroLD graph gathers 383 classes and 793 properties. In the pipelines developed
to lift the datasets, we also focused on connecting our datasets with others. The
rdfs:seeAlso property accounts for almost 80 million outbound links, making the AgroLD
graph well linked to other datasets in the Linked Open Data (LOD) cloud. In addition, we paid attention to increasing
the number of semantic annotations with imported ontologies, which increased the number
of links between datasets and made the overall graph denser. We created more than 14 million
semantic links connecting entities to ontology classes. Finally, our data linking strategy allowed
us to create around 160,000 owl:sameAs links between entities.</p>
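      <p>Link statistics of the kind reported above can be obtained by tallying triples per predicate. The Python sketch below does this on a toy in-memory list of triples and also builds the equivalent SPARQL 1.1 counting query as a string. It is only a sketch: the triples, the namespace http://example.org/agrold-sketch/ and the property has_annotation are made up for illustration, and no endpoint is contacted because the text does not give an endpoint address.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/agrold-sketch/"   # hypothetical namespace

# Toy (subject, predicate, object) triples; every value is made up.
TRIPLES = [
    (EX + "gene/G1", RDFS_SEE_ALSO, "http://example.org/external/G1"),
    (EX + "gene/G2", RDFS_SEE_ALSO, "http://example.org/external/G2"),
    (EX + "gene/G1", OWL_SAME_AS, EX + "locus/G1"),
    (EX + "gene/G1", EX + "schema/has_annotation",
     "http://purl.obolibrary.org/obo/GO_0006952"),
]


def count_links(triples):
    """Count triples per predicate IRI."""
    return Counter(predicate for _, predicate, _ in triples)


def count_query(predicate_iri: str) -> str:
    """Build a SPARQL 1.1 query counting the triples that use one predicate."""
    return (
        "SELECT (COUNT(*) AS ?links) "
        f"WHERE {{ ?s <{predicate_iri}> ?o }}"
    )


if __name__ == "__main__":
    counts = count_links(TRIPLES)
    print("rdfs:seeAlso links:", counts[RDFS_SEE_ALSO])
    print("owl:sameAs links:", counts[OWL_SAME_AS])
    print(count_query(OWL_SAME_AS))
```
]]></preformat>
      <p>On a graph of this size the counting would be delegated to the triple store through a query such as the one built by count_query, rather than performed in memory.</p>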
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>