<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology challenges for the stem cell community: towards integrative data mining in the Stemformatics atlas</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chris Pacheco Rivera</string-name>
          <email>chris.pacheco@unimelb.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rowland Mosbergen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Othmar Korn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tyrone Chen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isha Nagpal</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christine A. Wells</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Australian Institute for Bioengineering and Nanotechnology, The University of Queensland</institution>
          ,
          <addr-line>Building 75 Cnr College Rd &amp; Cooper Rd, Brisbane, QLD 4072</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, MDHS, The University of Melbourne</institution>
          ,
          <addr-line>30 Royal Parade Parkville, Melbourne, VIC 3010</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Walter and Eliza Hall Research Institute</institution>
          ,
          <addr-line>1G Royal Parade, Parkville, Melbourne, VIC 3010</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Stemformatics (www.stemformatics.org) is a web-based pocket dictionary
aimed at stem cell biologists with limited experience in bioinformatics. It
holds a growing collection of manually curated, high-quality public stem
cell datasets and allows easy visualisation and comparison of gene expression
profiles across platforms and laboratory sources in mouse and human.
Stemformatics hosts &gt;344 public datasets, with &gt;7060 human and
&gt;1853 mouse samples.</p>
      <p>We have a large set of curated data, primarily transcriptome data from
microarray and RNAseq, as well as unconventional “omics” platforms such
as ChIPSeq, miRNA, proteomics, and metabolomics. Stem cell
metadata fall into two broad categories: (1) descriptions of endogenous
stem cells, isolated using cell surface proteins and characterised by their
originating tissue or developmental stage; and (2) descriptions of in vitro
derived cells, including a variety of reprogrammed cells and cells produced
by directed differentiation protocols aimed at recapitulating a specific
class of cell.</p>
      <p>Here, we review the challenges of adapting ontology standards to fit a stem
cell framework and of implementing them in Stemformatics. Our aim is to develop
a stem cell ontology that can describe different cell types and provide
information about their biological background. Stemformatics has started to
standardise specific naming conventions to differentiate several cell types.
A dictionary of stem cell types, and its integration into existing
ontology resources, is planned for the near future.</p>
      <p>Annotation of sample metadata is a difficult task when it involves the
description of synthetic cells whose provenance is hard to capture using
existing anatomical ontologies. Induced pluripotent stem cells do not have a
developmental equivalent, because they are artificially transformed from
mature cell types, such as those obtained from a skin or blood biopsy. Equally
problematic is the description of samples in intermediate states
(mid-reprogramming or mid-differentiation), as these include cell states that
have not been defined before and do not have a developmental or anatomical
equivalent. Our ontologies must capture information about the source,
manipulation, and characterisation of the starting materials, as well as any
transformation to a new cell type in the laboratory.</p>
      <p>Stemformatics hosts a large amount of primary data, which leads to
challenges in data aggregation and downstream analysis if sample
annotations are not well standardised. Dealing with several related cell types
and cell lines increases the complexity of this problem. Historically, we have
had several annotators with different backgrounds who have inadvertently
introduced inconsistencies because of the lack of a standardised ontology, the
rapid pace of change in the field, and a lack of appropriate resources to
cross-check new samples against existing ontologies.</p>
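      <p>The cross-checking problem described above can be illustrated with a minimal
sketch, assuming free-text cell type labels are matched against a small
controlled vocabulary. The vocabulary entries and the names
CELL_TYPE_SYNONYMS, normalise_cell_type and audit are hypothetical and chosen
for illustration only; they are not the Stemformatics dictionary or its
implementation.</p>
      <preformat>
```python
from typing import Dict, List, Optional, Tuple

# Hypothetical controlled vocabulary: canonical label -> accepted synonyms.
# Illustrative entries only, not the Stemformatics dictionary.
CELL_TYPE_SYNONYMS = {
    "induced pluripotent stem cell": {"ipsc", "ips cell"},
    "mesenchymal stromal cell": {"msc", "mesenchymal stem cell"},
    "embryonic stem cell": {"esc", "es cell"},
}


def _clean(label: str) -> str:
    """Lower-case, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


# Reverse lookup built once: cleaned synonym (or canonical label) -> canonical label.
_LOOKUP: Dict[str, str] = {}
for canonical, synonyms in CELL_TYPE_SYNONYMS.items():
    _LOOKUP[_clean(canonical)] = canonical
    for synonym in synonyms:
        _LOOKUP[_clean(synonym)] = canonical


def normalise_cell_type(label: str) -> Optional[str]:
    """Return the canonical label, or None if the label is not recognised."""
    return _LOOKUP.get(_clean(label))


def audit(labels: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split labels into recognised (label -> canonical) and unrecognised."""
    mapped: Dict[str, str] = {}
    unknown: List[str] = []
    for label in labels:
        canonical = normalise_cell_type(label)
        if canonical is None:
            unknown.append(label)  # left for a curator to review
        else:
            mapped[label] = canonical
    return mapped, unknown
```
      </preformat>
      <p>In this sketch, labels that cannot be matched are returned separately rather
than guessed, so that intermediate or previously undefined cell states are
flagged for manual curation instead of being forced into an existing term.</p>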
      <p>Large-scale gene expression profiling approaches are used by the stem cell
community for bench-marking cell types, defining stem
cell states, and characterising molecular networks, including predictions of
cell-cell and molecular relationships. Deep mining of Stemformatics datasets
has facilitated the identification of novel cell types, resolved questions
about phenotype similarities between stromal subpopulations, and identified
genes involved in the maintenance of pluripotency and in differentiation to
embryonic lineages.</p>
      <p>Stemformatics facilitates data visualisation, including interactive graphs
such as Yugene, in which the ranking of all samples can be visualised for a
single gene. Furthermore, the Rohart Mesenchymal Stromal Cell (MSC) test is an
example of using well-curated data and metadata to create an algorithm that
classifies stem cells behaving like MSCs.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>