<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Ontologies to Drive the Creation of High-Quality Metadata in CEDAR</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafael S. Gonçalves</string-name>
          <email>rafael.goncalves@stanford.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Csongor I. Nyulas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Martínez-Romero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin J. O'Connor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Graybeal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark A. Musen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Stanford Center for Biomedical Informatics Research, Stanford University</institution>
          ,
          <addr-line>Stanford, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Center for Expanded Data Annotation and Retrieval (CEDAR) developed a suite of tools, the CEDAR Workbench, that allows users to build metadata templates, using ontologies to annotate template fields and to constrain the options available to metadata authors for specific fields; to fill in those templates with metadata; to upload data and their metadata to online repositories; and to perform searches over the metadata stored in CEDAR's metadata repository. The CEDAR Workbench is released under a BSD 2-Clause open-source license and is freely available at https://metadatacenter.org.</p>
      </abstract>
      <kwd-group>
        <kwd>Metadata</kwd>
        <kwd>metadata authoring</kwd>
        <kwd>metadata repository</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We present the CEDAR [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] software to produce high-quality, structured,
standards-based metadata. The software we have developed, the CEDAR Workbench [2], is a
suite of Web-based tools and APIs that lets users build highly modular
metadata acquisition forms (templates) that can be annotated with ontology terms, and
whose fields can be constrained using terms or branches of terms from ontologies.
Rather than having a single monolithic template, CEDAR allows users to recursively
construct templates from existing, more granular templates. CEDAR template
designers can share templates with individuals or groups—the metadata authors, who
fill in the metadata templates, validate field entries, and submit the metadata to online
repositories. The metadata produced using CEDAR templates are, by design, adherent
to the FAIR data principles [3]. Our goal is ultimately to provide scientists with a
robust, end-to-end software solution to author and to manage high-quality FAIR
metadata about scientific experiments.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The CEDAR Workbench</title>
      <p>The CEDAR Workbench is an open-source Web-based platform for the acquisition,
storage, search, and reuse of metadata templates and metadata instances. At the core
of the CEDAR technology lies a lightweight, standards-based model [4] designed to
provide a common format for describing templates and metadata. All CEDAR
resources are represented as JSON-LD documents that conform to our model, which is
specified by a JSON Schema. These resources can be viewed and retrieved as RDF
documents. Fig. 1 shows an overview of CEDAR.</p>
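      <p>To make the representation concrete, the following is a minimal sketch of a metadata instance expressed as a JSON-LD document, with a toy structural check standing in for validation against a JSON Schema. The field name, the example.org IRIs, and the helper functions are hypothetical and chosen for illustration; they are not taken from the CEDAR model or its schema.</p>
      <preformat preformat-type="code">
```python
import json

# Hypothetical JSON-LD metadata instance. The field name "tissue" and all
# example.org IRIs are illustrative assumptions, not CEDAR output.
instance = {
    "@context": {"tissue": "https://example.org/properties/tissue"},
    "@id": "https://example.org/instances/1",
    "tissue": {
        "@id": "https://example.org/ontology/Liver",
        "rdfs:label": "liver",
    },
}

# Toy stand-in for a schema: the keys every instance is expected to carry.
REQUIRED_KEYS = ("@context", "@id")


def missing_keys(doc, required=REQUIRED_KEYS):
    """Return the required keys that are absent from a JSON-LD document."""
    return [key for key in required if key not in doc]


def unmapped_fields(doc):
    """Return plain fields that the @context does not map to a property IRI."""
    context = doc.get("@context", {})
    return [k for k in doc if not k.startswith("@") and k not in context]


if __name__ == "__main__":
    print(json.dumps(instance, indent=2))
    print("missing keys:", missing_keys(instance))
    print("unmapped fields:", unmapped_fields(instance))
```
      </preformat>
      <p>In this sketch the @context is what ties a human-readable field name to an ontology property, while the structural check plays the role that a full JSON Schema validator would play in practice.</p>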
      <p>[Fig. 1. Overview of the CEDAR Workbench. Figure labels: Explore Metadata (Resource Manager); Design Template (Template Designer, drawing on more than 500 biomedical ontologies); Create Metadata (Metadata Editor, Intelligent Authoring); Validate Metadata (Metadata Validator, Schema Validator, External Validator); Upload Metadata (Metadata Uploader); CEDAR Metadata Repository. Users shown: template authors (e.g., standards committees) and scientists.]</p>
      <sec id="sec-2-6">
        <title>Main Components</title>
        <p>The following are the main components of the CEDAR Workbench software.</p>
        <p>Resource Manager. Template authors and scientists who use the CEDAR
Workbench are initially presented with the Resource Manager tool. The Resource
Manager allows users to create and store resources in the CEDAR Metadata Repository;
to organize templates and metadata into folders; and to search for these resources. From
the Resource Manager, users can define groups composed of their team members for
purposes of collaboration. CEDAR users can share resources (with read or write
permissions) with other users, with groups, or with the general community.</p>
        <p>Template Designer. Template authors can build metadata templates using the
Template Designer. In the Template Designer, users piece together fields of various
types (e.g., text, checkbox, and multiple choice) to form templates. Possible field
values can be constrained to terms from ontologies using an interactive lookup service
linked to NCBO’s BioPortal [5]. With the BioPortal lookup service (Fig. 2), users can
interactively create new ontology terms (which can be mapped to terms in other
ontologies) and value sets at template design-time for their annotation purposes. The
metadata templates and their fields can be annotated using properties from ontologies.</p>
        <p>Metadata Editor. Scientists generate metadata instances by filling in metadata
templates using the Metadata Editor. This tool builds a metadata-acquisition form
interface from template specifications built in the Template Designer. We implemented
a computer-assisted value recommender [6] in the Metadata Editor that provides
context-sensitive suggestions for field values during metadata submission. The value
recommender learns associations between field values in previous metadata entries
using rule mining, and ranks their applicability to specific fields. The goal of the value
recommender is to ease the burden of authoring high-quality metadata. Metadata
generated through CEDAR templates can be submitted to external repositories, such as
the NCBI BioSample [7] and SRA [8] repositories, or the ImmPort repository for
immunology-related datasets [9].</p>
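        <p>The idea of ranking candidate values from earlier entries can be sketched as follows. This is a deliberately simplified illustration of association-rule confidence, not the recommender described in [6]: the function name, the record layout, and the sample values are all assumptions made for the example.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def recommend(records, context, target_field):
    """Rank values of target_field by rule confidence.

    Each candidate rule reads: context implies (target_field = value).
    Confidence is the share of records matching the context that also
    carry that value.
    """
    matching = [
        r for r in records
        if target_field in r and all(r.get(f) == v for f, v in context.items())
    ]
    if not matching:
        return []
    counts = Counter(r[target_field] for r in matching)
    total = len(matching)
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(value, count / total) for value, count in ranked]


# Hypothetical earlier metadata entries (made-up values, for illustration only).
previous = [
    {"disease": "hepatitis", "tissue": "liver"},
    {"disease": "hepatitis", "tissue": "liver"},
    {"disease": "hepatitis", "tissue": "blood"},
    {"disease": "asthma", "tissue": "lung"},
]

if __name__ == "__main__":
    for value, confidence in recommend(previous, {"disease": "hepatitis"}, "tissue"):
        print(value, round(confidence, 2))
```
        </preformat>
        <p>Here the already-filled fields act as the context, so the suggestions for a field change as the author fills in other parts of the form.</p>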
        <p>The CEDAR Workbench can be used through the Web-based components described
above, or through the CEDAR API, a collection of REST-based services that provide
comprehensive access to the CEDAR ecosystem. The API allows creating, reading,
updating, and deleting CEDAR resources programmatically. With this API, users can
also export templates or metadata to other repositories or applications. All our software
is distributed and versioned on GitHub, at https://github.com/metadatacenter.</p>
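        <p>As a rough sketch of what programmatic create, read, update, and delete access to a REST-based service can look like, the example below builds (but does not send) HTTP requests using the conventional mapping of these operations to HTTP methods. The base URL, the path layout, and the function name are placeholders assumed for illustration; they are not the documented CEDAR endpoints, and authentication is omitted.</p>
        <preformat preformat-type="code">
```python
import json
import urllib.request

# Placeholder base URL and path layout: assumptions for illustration only,
# not the documented CEDAR endpoints.
BASE_URL = "https://api.example.org"

# Conventional mapping of CRUD operations to HTTP methods in REST services.
CRUD_METHODS = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation, resource_type, resource_id=None, body=None):
    """Build (without sending) an HTTP request for one CRUD operation."""
    url = f"{BASE_URL}/{resource_type}"
    if resource_id is not None:
        url = f"{url}/{resource_id}"
    # Serialize the body as JSON only when one is supplied.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=CRUD_METHODS[operation])
    request.add_header("Accept", "application/json")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


if __name__ == "__main__":
    req = build_request("update", "templates", "abc", {"title": "Study"})
    print(req.get_method(), req.full_url)
```
        </preformat>
        <p>Building the request separately from sending it keeps the sketch runnable offline; a real client would pass the request to urllib.request.urlopen with the credentials the service requires.</p>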
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Summary</title>
      <p>The CEDAR Workbench provides a comprehensive solution for authoring, validating,
searching, and (re)using metadata. The goal behind CEDAR is to significantly improve
the way scientists work with metadata, and the quality and interoperability of the
metadata that they create. We meet this goal by equipping the community with a
collaborative platform to build standards-based metadata templates that use ontologies as
sources for standard terms, and to author and submit high-quality metadata to online
repositories. CEDAR’s metadata repository gives scientists a means to search for and
to use metadata templates developed by the community, and to build new ones from
scratch or based on existing templates. CEDAR allows its users to submit their
metadata to external repositories, such as NCBI databases. We are working to allow
our users to submit metadata to an increasing number of external repositories.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>CEDAR is supported by NIAID grant U54 AI117925 through funds provided by the
trans-NIH Big Data to Knowledge (BD2K) initiative. The NCBO BioPortal has been
supported by the NIH Common Fund under grant U54HG004028.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          et al., “
          <article-title>The center for expanded data annotation and retrieval</article-title>
          ,”
          <source>J. Am. Med. Informatics Assoc.</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>1148</fpage>
          -
          <lpage>52</lpage>
          , Jun.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          et al., “
          <article-title>The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments</article-title>
          ,”
          <source>in Proc. of International Semantic Web Conference (ISWC)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          et al., “
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,”
          <source>Sci. Data</source>
          , vol.
          <volume>3</volume>
          , p.
          <fpage>160018</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Musen</surname>
          </string-name>
          , “
          <article-title>An Open Repository Model for Acquiring Knowledge About Scientific Experiments</article-title>
          ,”
          <source>in Proc. of International Conference on Knowledge Engineering and Knowledge Management (EKAW)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          et al.,
          “
          <article-title>BioPortal: ontologies and integrated data resources at the click of a mouse</article-title>
          ,”
          <source>Nucleic Acids Res.</source>
          , vol.
          <volume>37</volume>
          , pp.
          <fpage>W170</fpage>
          -
          <lpage>W173</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Martínez-Romero</surname>
          </string-name>
          et al.,
          “
          <article-title>Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations</article-title>
          ,”
          <source>in Proc. of AMIA Annual Symposium</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Barrett</surname>
          </string-name>
          et al.,
          “
          <article-title>BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata</article-title>
          ,”
          <source>Nucleic Acids Res.</source>
          , vol.
          <volume>40</volume>
          , pp.
          <fpage>D57</fpage>
          -
          <lpage>D63</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Leinonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sugawara</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Shumway</surname>
          </string-name>
          , “
          <article-title>The Sequence Read Archive</article-title>
          ,”
          <source>Nucleic Acids Res.</source>
          , vol.
          <volume>39</volume>
          , no.
          <issue>Database</issue>
          , pp.
          <fpage>D19</fpage>
          -
          <lpage>D21</lpage>
          , Jan.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          et al.,
          “
          <article-title>ImmPort: disseminating data to the public for the future of immunology</article-title>
          ,”
          <source>Immunol. Res.</source>
          , vol.
          <volume>58</volume>
          , no.
          <issue>2-3</issue>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>