<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>COPO - Linked Open Infrastructure for Plant Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>F Shaw</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A Etuk</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A Gonzalez-Beltran</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P Rocca-Serra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D Johnson</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P Kersey</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R Bastow</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S Sansone</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V Schneider</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R Davey</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>European Bioinformatics Institute</institution>
          ,
          <addr-line>Cambridge</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oxford e-Research Centre, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Genome Analysis Centre</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Warwick</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Collaborative Open Plant Omics (COPO) is a brokering service between plant scientists and public repositories, enabling management, aggregation and publication of research outputs. COPO consolidates access to services and disparate information sources via web interfaces and Application Programming Interfaces (APIs). Users will deposit and view open access data, as well as seamlessly pull such data into analysis environments. Subsequent accessions and associated metadata will be tracked in COPO, thus creating a provenance trail from data to publication.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In plant science, high-throughput "-omics" technologies have resulted in more
and larger datasets. Researchers are realizing the benefits of data sharing to
promote their work and to accelerate discovery in science based on aggregated
data. Many funding bodies and journals now require that data be made publicly
available. Despite the opportunities that data sharing offers for recognition and
reuse, many scientists still do not use public repositories, choosing instead to
store data in private infrastructure. This may be due to unfamiliarity with
services and technology, a lack of standards and common metadata, or a lack of
funding to support archiving. The large number and size of datasets make them
difficult to store, let alone download, making cloud-based analysis tools
essential. However, submission formats to public repositories are heterogeneous, often
requiring manual authoring of complex markup documents, taking scientists out
of their fields of expertise.</p>
      <p>COPO aims to streamline the process of data deposition to public repositories
and data journals, by hiding much of the complexity of metadata capture and
data management from the end-user. The ISA (Investigation/Study/Assay)
infrastructure (www.isa-tools.org) provides the interoperability between metadata
formats required for deposition to repositories. Logical groupings of artifacts (e.g.
experimental metadata and results, PDFs, raw data, contextual supplementary
information) relating to a body of work are stored in COPO "collections" and
represented by common open standards, which are publicly searchable. Bundles
of data objects can be deposited directly into public repositories (such as the
European Nucleotide Archive, Figshare and F1000) through COPO interfaces.</p>
    </sec>
    <sec id="sec-2">
      <title>Metadata Management</title>
      <p>The ISA model enables experimental metadata attribution and management of
metadata formats, where scientific metadata comprises information about
investigators, objectives, hypotheses, publications, subjects, experimental design,
experimental workflow, and assays and related experimental data. ISA
metadata is represented in ISA-JSON and integrated within a broader set of
metadata, COPO-JSON, that also encompasses infrastructural information relating
to the platform itself. Both JSON implementations can be extended to JSON-LD
linked data schemas. All JSON metadata fragments are stored in a MongoDB
document-based database. Where required, ISA converters translate
between representations of the same metadata, e.g. ISATab to/from ISA-JSON.
Public repository formats are expressed as ISA configurations, which are
mapped to a COPO-JSON user interface (UI) model that powers the COPO UI
itself. In this way, we can quickly adapt to new repositories or to changes
in existing repository schemas, all the way from data representation to UI design.</p>
    </sec>
    <sec id="sec-3">
      <title>Platform in Development</title>
      <p>The COPO framework is being built using Python, Django, MongoDB,
JSON-LD, ISATools, jQuery and Bootstrap technologies. A single sign-on (SSO)
mechanism, provided via ORCiD, allows COPO to track service integration and rich
user profile data. Anonymous users are able to search the COPO index for
research artifacts. Deposition functionality is available to authenticated users only.
The complexity of deposition services is hidden from end users, who simply fill
out clean, intuitive web forms and story-driven wizards that use the semantic-level
metadata to make inferences about what a user is submitting, subsequently
making suggestions based on previous submissions.</p>
      <p>So far we have developed initial EMBL-EBI repository deposition support
(European Nucleotide Archive (ENA), MetaboLights) facilitated by
Aspera-powered data transfer and ISA API integration. Figshare deposition of secondary
research artifacts (PDFs, images, figures, supplementary data, etc.) is also
supported.</p>
    </sec>
    <sec id="sec-4">
      <title>Future Work</title>
      <p>The large network of linked metadata that COPO will gather allows semantic
meaning to be attached to research artifacts. Semantic inferences can then be
made over artifacts, providing a richer search experience than text-based
search alone and enabling researchers to quickly find and use well-described, publicly
available datasets linked by an interconnected network of metadata. The provision
of visualization for graphs of linked metadata will aid discovery of useful
connections between datasets, investigations and protocols. Support for more
repositories and open publishing platforms is planned, as well as integration with
cloud-based analysis services such as Galaxy and iPlant.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>