<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Flexible Scientific Data Management for Plant Phenomics Research</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Ansell</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Furbank</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kutila Gunasekera</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jianming Guo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Benn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gareth Williams</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xavier Sirault</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CSIRO IM&amp;T Advanced Scientific Computing and Research Data Services</institution>
          ,
          <addr-line>Melbourne</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CSIRO Plant Industry, High Resolution Plant Phenomics Centre</institution>
          ,
          <addr-line>Canberra</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>eResearch Group, School of Information Technology and Electronic Engineering, University of Queensland</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we expand on the design and implementation of the Phenomics Ontology Driven Data repository [1] (PODD) with respect to the capture, storage and retrieval of data and metadata generated at the High Resolution Plant Phenomics Centre (Canberra, Australia). PODD is a schema-driven Semantic Web database which uses the Resource Description Framework (RDF) model to store semi-structured information. RDF allows PODD to process information about a range of phenomics experiments without needing to define a universal schema for all of the different structures. To illustrate the process, exemplar datasets were generated using a medium throughput, high resolution, three-dimensional digitisation system purposely built for studying plant structure and function simultaneously under specific environmental conditions. The High Performance Compute (HPC), storage and data collection publication aspects of the workflow and their realisation in CSIRO infrastructure are also discussed along with their relationship to PODD.</p>
      </abstract>
      <kwd-group>
        <kwd>eResearch</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>RDF</kwd>
        <kwd>OWL</kwd>
        <kwd>Data collection citation</kwd>
        <kwd>BagIt</kwd>
        <kwd>Data Access Portal</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Since the genomics era, biology has become a data-driven science. Advances
in robotics, automation and imaging, in combination with high performance
computing have permitted the rapid production of large and complex
biological datasets. Currently, high volumes of heterogeneous image data, physiological
and morphological measurements are being acquired by a range of new
phenotyping platforms located in purpose-built phenomics centres across the world.
These large datasets of phenotypic characteristics, such as growth rate, plant
architecture, photosynthetic performance and yield, must be stored and correlated
with genotypes. These characteristics provide evidence of genetic variation in natural
and derived genetic populations (e.g. germplasm collections, association genetic
panels, recombinant inbred lines). They also enable a deeper understanding of
the dynamic relationship between phenotype, genotype and environment, which
is needed to keep delivering the productivity increases required to
feed the world.</p>
      <p>The vast array of phenotypic data collected from a variety of phenomics
platforms must be combined with metadata explaining how the raw data was
collected. This combination of raw data and metadata is then delivered to a
range of analysis pipelines, which transform the raw data into aggregated
multiphase datasets, each phase representing a new aggregation or inference from the
original raw data. This reduction process converts the raw multi-dimensional
data into information which is conceptually interpretable by a human being, i.e.
new knowledge. The additional metadata describing the steps taken are recorded
to give context to the data.</p>
      <p>To make sense of this large amount of information, sophisticated storage,
archiving, searching and analysis capabilities are required. To date, solutions to
this problem have been developed mainly by private companies, and no suitable
solution exists in the public domain. The lack of systems to manage linked
metadata, and of controlled vocabularies to describe plant growth and experimental
conditions, has severely hampered the sharing of plant phenomics data, the comparison
of results between laboratories and the capacity to carry out meta-analysis of
existing data sets.</p>
      <p>
        Thus, to support publicly-funded phenomics activities in Australia, the
Phenomics Ontology Driven Data repository (PODD) has been developed as a
repository for data produced by the variety of plant imaging and phenotyping
platforms available at the High Resolution Plant Phenomics Centre, as well as for
recording the contextual metadata associated with plant genotypes, treatments
and environmental conditions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In this paper, we describe the workflow management that the High Resolution
Plant Phenomics Centre (HRPPC) has implemented for keeping track of its
phenomics data, metadata and experimental processes. This complex challenge
was addressed by building a multi-disciplinary group of information technology
experts and embedding users of phenomics technologies into it. The result of
the approach is a state-of-the-art computational and data mining environment,
optimised for data access, data discovery and data sharing, which also provides
the flexibility for linking genomic information through the use of RDF triples. In
this context, we also describe the role of the CSIRO Data Access Portal (DAP) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
to annotate and store raw and processed datasets. DAP also provides long term
secure storage for data collections and the ability to search for, control access
to, and cite them via Digital Object Identifiers. PODD manages the mapping
of collections located in DAP to PODD projects, providing for the storage of
large images and documents unsuited to RDF databases. Figure 1 shows the
relationship between components and key data flows.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Phenomics Ontology Driven Data repository</title>
      <sec id="sec-2-1">
        <title>Semantic science for phenomics data management</title>
        <p>Scientists have focused on including semantics into datasets, typically using the
foundations of RDF and OWL, from two main directions. Some focus on
defining ontologies based on hierarchies of scientific concepts and properties, while
others have focused on mapping complex scientific datasets to RDF using syntax
transformations without initially defining the semantic meaning of the results. In
reality, most efforts fall somewhere in the middle, with ontological annotations
attached to some data points while other nearby data points are syntactically
represented using RDF, without links to ontologies of scientific concepts.</p>
        <p>
          Increasingly, however, providers of scientific datasets are focusing on
enhancing their datasets using curated scientific concepts from ontologies. For example,
scientists have used the Gene Ontology [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to link well known concepts to
represent common elements across genomics datasets, while the Plant Ontology [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
allows the description of plant based datasets.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Redesign of the Phenomics Ontology Driven Data repository</title>
        <p>
          The PODD repository relies on semantic web technologies to manage phenomics
data and metadata. Although both ontologies and mappings are essential, in
PODD it was necessary to build the system with a relaxed ontological
vocabulary. This enables scientists to sparsely populate their datasets and sparsely
link to community defined upper ontologies as necessary. This allows scientists
to continue to maintain projects containing curated scientific concepts alongside
raw experimental data. The PODD repository was redesigned based on an
evaluation of the original software [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] that found it was not able to scale sufficiently
to suit the HRPPC needs due to design and implementation deficiencies. The
major design differences to the software implemented by [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] are that projects
are no longer the only supported top object type, and projects are not stored in
multiple parts, as that approach was not able to scale as was originally
hypothesised.
        </p>
        <p>A PODD project in PlantScan™ contains top level branches describing the
various parts of a scientific project. These include a branch for raw data, along
with separate branches for results, analysis, and publications related to the
project. In the case of raw data, the semantics are not necessarily clear and
are not easily defined by the automated platforms collecting the data. The
scientist may later semantically link the data with results, conclusions, and external
ontologies. For example, a scientist may annotate the data objects representing
images of a plant with a link to a trait that is defined in the Plant Ontology.
They may also annotate the image with a link to a trait that is defined inside of
the project, such as when the trait is novel and not represented in a community
ontology.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Semantic validation</title>
        <p>
          PODD validates scientific project descriptions using independently configurable
constraints based on OWL (Web Ontology Language) ontologies. Although PODD
currently supports only OWL for constraint verification, it could easily be
extended to use other systems such as N3, RDFS, SPARQL, or
SPIN as rule languages [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>OWL is used to determine both whether projects are internally consistent,
with all objects having an explicit RDF type, and whether they are consistent
with the ontologies that they import. For example, any OWL object property
that has been defined to link from image acquisition runs to images defines the
provenance of an image.</p>
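        <p>The internal-consistency requirement described above can be illustrated with a small sketch. It is not PODD's OWL-based validator: it is a plain Python set comparison over (subject, predicate, object) tuples, and the <monospace>example.org</monospace> URIs, the property names and the <monospace>untyped_subjects</monospace> helper are hypothetical names introduced only for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Illustrative only: a set-based check, not OWL reasoning.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/podd/"  # hypothetical namespace, not a real PODD URI


def untyped_subjects(triples):
    """Return, sorted, every subject that has no explicit rdf:type statement."""
    subjects = {s for s, _, _ in triples}
    typed = {s for s, p, _ in triples if p == RDF_TYPE}
    return sorted(subjects - typed)


# A tiny hypothetical project description: run1 is typed, image1 is not.
project = [
    (EX + "run1", RDF_TYPE, EX + "ImageAcquisitionRun"),
    (EX + "run1", EX + "hasImage", EX + "image1"),
    (EX + "image1", EX + "hasFileName", "img_000.png"),
]

print(untyped_subjects(project))
```
]]></preformat>
        <p>In this sketch the image object is reported because it appears as a subject without a type statement; adding one such statement would make the check pass.</p>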
        <p>General scientific properties and phenotype specific properties are defined
in optional extension ontologies as illustrated in Figure 2. These are used by
scientists to annotate their projects with concepts specific to their field, without
requiring other scientists using the same PODD installation to use phenotype
properties to annotate their projects.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>CSIRO Data Access Portal</title>
      <p>
        CSIRO’s Research Data Service (RDS) has developed the Data Access Portal
(DAP), an open source web application that enables research data to be
discovered, managed and shared [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Researchers can describe a data collection, deposit data, choose a license,
and add attribution details. Access to a collection’s description and/or data
can be restricted to CSIRO or a set of individuals (within CSIRO or partner
organisations) or it can be made public, becoming searchable by anyone via the
Internet. In the case where a collection and its data are public, a Digital Object
Identifier (DOI) is issued and can be used to formally cite the collection in a
publication.</p>
    </sec>
    <sec id="sec-4">
      <title>The PlantScan™ digitisation platform</title>
      <p>
        BagIt is defined by an Internet Engineering Task Force (IETF) document as a
“hierarchical file packaging format for storage and transfer of arbitrary digital
content”[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A payload manifest details content and MD5 or SHA hashes for
content integrity verification. Data file related metadata can be stored in
predefined files as key-value pairs.
      </p>
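      <p>The bag layout described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration of the BagIt structure (a <monospace>bagit.txt</monospace> declaration, a <monospace>data/</monospace> payload directory, an MD5 payload manifest and a <monospace>bag-info.txt</monospace> file of key-value pairs), not the tooling used at CSIRO; the function names and the metadata labels used with it are assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, payload, info):
    """Write a minimal bag: declaration, data/ payload, MD5 manifest, bag-info.txt.

    payload maps relative file names to bytes; info maps metadata labels to values.
    Both the function name and its arguments are hypothetical, for illustration.
    """
    bag_dir = Path(bag_dir)
    (bag_dir / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_name, content in sorted(payload.items()):
        target = bag_dir / "data" / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # One manifest line per payload file: checksum, then path inside the bag.
        manifest_lines.append(f"{hashlib.md5(content).hexdigest()}  data/{rel_name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    (bag_dir / "bag-info.txt").write_text(
        "".join(f"{label}: {value}\n" for label, value in info.items()),
        encoding="utf-8",
    )


def verify_bag(bag_dir):
    """Return True if every payload file still matches its manifest checksum."""
    bag_dir = Path(bag_dir)
    manifest = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.md5((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True
```
]]></preformat>
      <p>A call such as <monospace>make_bag("batch-001", {"PLANT0001/rgb/img_000.png": image_bytes}, {"Batch-Number": "001"})</monospace> followed by <monospace>verify_bag("batch-001")</monospace> would create and then check one bag; the barcode, file name and label shown here are placeholders.</p>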
      <p>
        For PlantScan™, file-level metadata includes plant barcodes, batch numbers,
and plant type, although the BagIt specification does not mandate a particular
archiving strategy, with the focus being upon the directory structure, special
files, and integrity checking. BagIt-conforming tools [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] were assessed and
where necessary, improvements were implemented and tested to ensure that the
tools were fit for purpose in the CSIRO Advanced Scientific Computing (ASC)
HPC environment.
      </p>
      <sec id="sec-4-1">
        <title>Bag preparation for a DAP collection</title>
        <p>
          CSIRO ASC shared facilities [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] are used to process the raw PlantScan™ data
to derive data products (meshes). Raw data and meshes are collected using the
BagIt format [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and stored in the ASC archival system. ASC High Performance
Compute (HPC) hosts (systems with high processor count and large memory)
are used to create and verify bags more rapidly than would be
possible on conventional computer systems. CSIRO’s HRPPC makes use of DAP
to store collections of PlantScan™ raw images and processed mesh data as bags.
Currently, one bag is equivalent to a single batch scanned on the PlantScan™
local software system, which usually means the same kind of plant with different
genotypes scanned under one experiment configuration profile.
        </p>
        <p>Raw data from PlantScan™ local storage (HRPPC-Store) and data
processed on HPC hosts are transferred to ASC bulk storage, where image and mesh
files are organised in folders by batch, then barcode number, then subfolders for
each image file type, including RGB images, IR images, and LiDAR (Light
Detection and Ranging) sensor data, and their related meshes. Bag creation is carried
out via an allocated ASC HPC job. The metadata required for a DAP
publication is created and the bag is transferred to the DAP staging area via SFTP (SSH
File Transfer Protocol). After publication of the DAP collection, the data from
PlantScan™ for the given project becomes discoverable via DAP. In addition,
experiment reports, published papers, and sensor configurations can be
made accessible via a DAP collection’s “related materials” links, other metadata
fields, or within the collection’s data (e.g. bag).</p>
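        <p>The folder hierarchy described above (batch, then barcode, then image file type) can be sketched as a small path-building helper. The folder names <monospace>rgb</monospace>, <monospace>ir</monospace> and <monospace>lidar</monospace>, the root path and the <monospace>storage_path</monospace> function are assumptions made for illustration; the actual names used on ASC bulk storage are not given here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from pathlib import PurePosixPath

# Hypothetical subfolder names, one per image file type mentioned in the text.
IMAGE_TYPES = ("rgb", "ir", "lidar")


def storage_path(root, batch, barcode, image_type, filename):
    """Build root/batch/barcode/image_type/filename for one image or mesh file."""
    if image_type not in IMAGE_TYPES:
        raise ValueError(f"unknown image type: {image_type!r}")
    return PurePosixPath(root) / batch / barcode / image_type / filename


print(storage_path("/bulk", "batch-042", "PLANT0001", "rgb", "img_000.png"))
```
]]></preformat>
        <p>Keeping the batch as the outermost folder matches the choice of one bag per batch, so a whole batch directory can be packaged without further selection.</p>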
      </sec>
      <sec id="sec-4-2">
        <title>Heterogeneous data streams</title>
        <p>
          PlantScan™ is a medium throughput, high resolution phenotyping platform
which brings together a number of imaging sensors (light detection and ranging,
far-infrared imaging, and multi-wavelength imaging) to non-invasively measure
plant growth and function using in silico approaches. Raw data is captured with
its contextual information (e.g. system configuration, time of acquisition, batch
number and project) and is stored in a purpose-built database as the data is
being generated. The various data streams are collated and used to produce a full
3D representation of each plant with overlaid spectral information. The metadata
collected during image acquisition are necessary inputs for the computer vision
techniques which are used to create the 3D representation of the plant. The 3D
meshes are then automatically segmented in order to semantically identify the
different parts of the plants [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. A longitudinal 3D matching pipeline for plant
mesh parts is then used to evaluate temporal changes at the whole plant and/or
organ level.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Metadata</title>
        <p>
          Each acquisition on PlantScan™ includes metadata (in addition to the raw data
streams), such as plant genus and species, project and experiment metadata, a
Globally Unique Identifier (GUID) for each image, imaging angle,
environmental temperature of the imaging chamber, location of optical and colour
calibration datasets for each acquisition run, and LiDAR calibration files. The
metadata associated with each acquisition is automatically generated when
setting up the configuration on the platform. This information is essential for
validating and processing the raw image data, and for the post-processing phases.
Digitisation systems such as PlantScan™ generate huge amounts of data
including raw image data, registration metadata, sensor configurations and plant
metadata. For example, PlantScan™ generates around 500 GB of raw image
data, representing in excess of 200,000 database records, per day. Sufficient
storage space (usually at remote locations) and fast network transfer rates are thus
necessary to facilitate data movement for processing using high performance
computers (HPC). Because an RDF database structure is not suitable for
handling large data sets of images, it is necessary to package the raw information
into elementary units with permanent addresses which can be retrieved using
PODD. The CSIRO DAP [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and ASC storage and compute facilities [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] are key
resources used by PlantScan™ to process and store bulk data.
        </p>
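        <p>The daily volumes quoted above imply a rough per-record size and a minimum sustained transfer rate. The following back-of-the-envelope sketch assumes decimal gigabytes (10<sup>9</sup> bytes) and that one day's data must be moved within one day; both are assumptions made here, and the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
BYTES_PER_GB = 10**9             # assumption: decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60


def mean_record_size_mb(gb_per_day, records_per_day):
    """Mean raw-data size per database record, in megabytes (10**6 bytes)."""
    return gb_per_day * BYTES_PER_GB / records_per_day / 10**6


def sustained_rate_mbit_s(gb_per_day):
    """Average network rate needed to move one day's data within one day."""
    return gb_per_day * BYTES_PER_GB * 8 / SECONDS_PER_DAY / 10**6


print(mean_record_size_mb(500, 200_000))      # 2.5
print(round(sustained_rate_mbit_s(500), 1))   # 46.3
```
]]></preformat>
        <p>Because the record count is given as "in excess of" 200,000, the 2.5 MB figure is an upper bound on the mean record size, and the transfer rate is an average that ignores bursts and protocol overhead.</p>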
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Semantic integration</title>
      <p>The PODD ontology enables plant phenomics researchers to link from mesh
results to the raw data that they were generated from. It also allows researchers to
link from both mesh results and their recorded conclusions to shared phenomics
ontologies which describe specific features of the plants. When used together,
this enables scientists to trace the provenance of their results and conclusions
based on well known concepts in phenomics ontologies.</p>
      <p>Subsets of phenomics ontologies such as the Plant Ontology and the Crop
Ontology were mapped into PODD by adding OWL constraints. These constraints
enable PODD to verify that the use of classes and properties from these
ontologies was consistent with the PODD ontology. For example, the Crop Ontology
contains a class defining soil as “Sandy Loam”, giving it the identifier “0000104”.
This was mapped into PODD to define a particular soil sample as being Sandy
Loam using the triple: <monospace>poddSampleSandyLoamSoil a cropOntology:0000104</monospace>.</p>
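      <p>In Turtle syntax, <monospace>a</monospace> is shorthand for the <monospace>rdf:type</monospace> predicate, so the triple above states that the soil sample is an instance of the Crop Ontology class. The sketch below writes the same statement out as a fully expanded N-Triples line; the two <monospace>example.org</monospace> namespaces and the <monospace>n_triple</monospace> helper are placeholders assumed for illustration, not the real PODD or Crop Ontology URIs.</p>
      <preformat preformat-type="code"><![CDATA[
```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PODD = "http://example.org/podd/"            # hypothetical namespace
CROP = "http://example.org/cropOntology/"    # hypothetical namespace


def n_triple(subject, predicate, obj):
    """Serialise one triple of three IRIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = n_triple(PODD + "poddSampleSandyLoamSoil", RDF_TYPE, CROP + "0000104")
print(line)
```
]]></preformat>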
    </sec>
    <sec id="sec-6">
      <title>Semantic publication</title>
      <p>PODD provides a secure mechanism for publishing both human and machine
readable descriptions of scientific experiments. It utilises the well-known DOI
mechanism for publishing raw data files using DAP, and uses HTTP URIs to
publish experiments using the PODD web interface.</p>
      <p>
        Scientific journals increasingly require the data and provenance for articles
to be available in a machine readable format. The DOI registrar that DAP uses,
DataCite [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], was set up to provide unique identifiers for data items that can be
attached to publications, which in turn may have their own DOIs.
      </p>
      <p>
        By providing machine readable descriptions of scientific experiments,
including semantic references to shared ontologies where possible, PODD enables the
output from PlantScan™ to be interpreted and extended by others. The use of
PODD URIs in other RDF documents enables scientists to extend the initial
work using the Linked Data paradigm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>This paper described how the Phenomics Ontology Driven Data repository
integrates with the PlantScan™ platform and CSIRO Data Access Portal to manage
the complex workflows at the High Resolution Plant Phenomics Centre. This
workflow keeps track of phenomics data, metadata and experimental processes
and also provides a secure mechanism to share and publish scientific experiments
in both human and machine readable formats.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davies</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>PODD: An ontology-driven data repository for collaborative phenomics research</article-title>
          . In
          <string-name>
            <surname>Chowdhury</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , eds.:
          <source>The Role of Digital Libraries in a Time of Global Change</source>
          . Volume
          <volume>6102</volume>
          of Lecture Notes in Computer Science. Springer Berlin Heidelberg (
          <year>2010</year>
          )
          <fpage>179</fpage>
          -
          <lpage>188</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <collab>CSIRO IM&amp;T</collab>
          :
          <article-title>CSIRO data access portal</article-title>
          . http://data.csiro.au
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ashburner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ball</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blake</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Botstein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherry</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolinski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwight</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eppig</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hill</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Issel-Tarver</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasarskis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matese</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ringwald</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherlock</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Gene ontology: tool for the unification of biology. the gene ontology consortium</article-title>
          .
          <source>Nature Genet</source>
          .
          <volume>25</volume>
          (
          <year>2000</year>
          )
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Avraham</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tung</surname>
            ,
            <given-names>C.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ilic</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaiswal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kellogg</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCouch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pujar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rhee</surname>
            ,
            <given-names>S.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachs</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaeffer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zapata</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ware</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The plant ontology database: a community resource for plant structure and developmental stages controlled vocabulary and annotations</article-title>
          .
          <source>Nucleic Acids Research 36(suppl 1)</source>
          (
          <year>2008</year>
          )
          <fpage>D449</fpage>
          -
          <lpage>D454</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fürber</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hepp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Using SPARQL and SPIN for data quality management on the semantic web</article-title>
          . In
          <string-name>
            <surname>Abramowicz</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tolksdorf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , eds.:
          <source>Business Information Systems</source>
          . Volume
          <volume>47</volume>
          of Lecture Notes in Business Information Processing
          . Springer Berlin Heidelberg (
          <year>2010</year>
          )
          <fpage>35</fpage>
          -
          <lpage>46</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kunze</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Littman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madden</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The BagIt file packaging format (v0.97)</article-title>
          (April 15,
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Summers</surname>
          </string-name>
          , E.:
          <article-title>Bagit python software</article-title>
          . https://github.com/edsu/bagit
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Library of Congress:
          <article-title>Bagit java software</article-title>
          . http://sourceforge.net/projects/locxferutils/files/loc-bagger/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <collab>CSIRO IM&amp;T</collab>
          :
          <article-title>CSIRO advanced scientific computing</article-title>
          . https://wiki.csiro.au/display/ASC
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Paproki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirault</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berry</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furbank</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fripp</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A novel mesh processing based technique for 3d plant analysis</article-title>
          .
          <source>BMC Plant Biology</source>
          <volume>12</volume>
          (
          <issue>1</issue>
          ) (
          <year>2012</year>
          )
          <fpage>63</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Brase</surname>
          </string-name>
          , J.:
          <article-title>Datacite - a global registration agency for research data</article-title>
          .
          In:
          <source>Cooperation and Promotion of Information Resources in Science and Technology, 2009. COINFO '09. Fourth International Conference on</source>
          . (
          <year>2009</year>
          )
          <fpage>257</fpage>
          -
          <lpage>261</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Berners-Lee</surname>
          </string-name>
          , T. http://www.w3.org/DesignIssues/LinkedData.html (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>