<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using SNOMED-CT For Translational Genomics Data Integration</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Pediatrics, Stanford University School of Medicine</institution>
          ,
          <addr-line>Stanford, CA</addr-line>
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lucile Packard Children's Hospital</institution>
          ,
          <addr-line>Palo Alto, CA</addr-line>
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Stanford Center for Biomedical Informatics Research, Department of Medicine</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2008</year>
      </pub-date>
      <fpage>91</fpage>
      <lpage>96</lpage>
      <abstract>
        <p>As industrial, governmental, and academic agencies place increasing emphasis on translational research, biomedical researchers are now faced with entirely new challenges with regard to both biomedical data integration and knowledge discovery. There is now both a strong need and a tremendous opportunity to apply translational bioinformatics to address the fundamental challenges in integrating the vast bodies of -omics and clinical data. Here we report on our preliminary work in utilizing SNOMED-CT as both a tool for translational data discovery and a major component in a framework for the large-scale integration of gene expression microarray data and clinical laboratory data. Annotations from microarray experiments in NCBI GEO were mapped to SNOMED-CT terms using UMLS, and these mappings were joined to clinical laboratory data using ICD9CM to SNOMED-CT mappings within UMLS. We find that microarray experiments characterizing 211 distinct diseases can be mapped to clinical laboratory data measurements for 13,452 distinct patients. We maintain that this work represents critical first steps in providing a foundation for large-scale translational data integration, and underlines the important role that controlled clinical terminologies, such as SNOMED-CT, can play in addressing such problems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Our ability to generate high-quality biomolecular data
has advanced at a considerably faster rate than our
ability to investigate the data generated. This
imbalance, driven primarily by rapid advances in
high-throughput biological data acquisition
technologies and plummeting per-experiment costs,
has created an entire spectrum of informatics
challenges that are, in many instances, as intangible
and complex as the fundamental biological questions
that these technologies were designed to address. As
a consequence, our ability to formulate and
investigate important biological and medical
questions is currently limited by our ability to
manage and integrate the profusion of biomedical
data.</p>
      <p>Problems in data integration are moving towards the
forefront of biomedical research, driven foremost by
the sheer diversity of measurement technologies now
available, and the tremendous volumes of such
measurements finding their way into the public
domain. The situation is further complicated by the
fact that the majority of the public biomolecular data
is annotated using unstructured free-text, making it
difficult to discern the various biological and medical
contexts of the data in an automated fashion. In
previous work we demonstrated the feasibility of
using controlled terminologies and straightforward
text-mining techniques to elucidate clinical,
environmental, and phenotypic contexts from
free-text annotations associated with public microarray
data<sup>1, 2</sup>. The establishment of experimental context is
critical to linking genes to environment, phenotype,
and ultimately medicine.</p>
      <p>While most major types of biomolecular data can be
found in the public domain, it is traditionally difficult
for researchers to gain access to clinical data. This is
unfortunate as the data generated on a daily basis by
hospitals and clinicians is perhaps the richest source
of phenotypic biomarker data currently available.
Fortunately, modern Electronic Health Record (EHR)
systems such as the Stanford Translational Research
Integrated Database Environment (STRIDE)<sup>3</sup> and the
University of Virginia Health System Clinical Data
Repository (CDR)<sup>4</sup> grant institutional researchers
access to large volumes of de-identified, quantitative
clinical data in digital form. In recent work, we
demonstrated the utility of applying bioinformatics
methods to quantitative clinical data to draw new
inferences about disease severity<sup>5</sup> and to elucidate
novel biomarkers<sup>6</sup>.</p>
      <p>Genome-wide association studies have revealed that
for many complex diseases, the pathogenesis of the
disease may be facilitated by relatively minor
changes across a large number of genes interacting
through as yet poorly understood mechanisms<sup>7</sup>.
These findings have therefore highlighted the
importance of linking biomolecular data with
phenotypic quantifications in order to uncover the
full complexity of disease etiology. Recent work in
integrating these two data types has offered new
insights into disease etiology and pathology with
direct clinical implications. Segal and colleagues
correlated imaging traits from computed tomography
(CT) images of liver cancers with gene expression
data to reconstruct global expression signatures in
cancer tumors that are linked to diagnosis, prognosis
and treatment<sup>8</sup>. A number of studies have
demonstrated the utility of patient microarrays in
identifying gene expression patterns linked to disease
diagnosis<sup>9</sup>, subtypes<sup>10, 11</sup>, outcome<sup>12</sup>, and treatment<sup>13, 14</sup>.
As significant as the aforementioned findings are,
their underlying methods are limited by the fact that,
in all instances, they require that the biomolecular
and clinical data be derived from the same patient.
Given the current high costs and logistical
complexities involved in acquiring patient data in a
clinical setting, it would be prohibitively expensive to
scale the same approaches to address the broad
spectrum of human disease. Furthermore, such an
approach implicitly eschews the great wealth of
public biomolecular data readily available.</p>
      <p>A major problem in integrating clinical and
biomolecular data derived from disparate sources is
to identify attributes by which they can be
appropriately joined. This task is complicated by the
fact that the majority of biomolecular data is
annotated around the concepts of genes and gene
products, whereas clinical data is centered on the
concept of a patient. We find one concept shared
among both clinical data and vast amounts of
biomolecular data, and that is the concept of a
disease. Therefore it is possible to integrate
anonymous biomolecular data characterizing an
aspect of a particular disease state with quantitative
clinical data derived from patients being treated for
the same disease.</p>
      <p>Central to this approach is the need for a
comprehensive controlled disease terminology
through which the biomedical and clinical data is
joined in a systematic fashion. In general, we would
want this disease terminology to maximize three
primary criteria: coverage, defined by the number of
unique disease terms defined; expressiveness, which
is the richness of relationships between disease terms;
and resolution, which is the level of detail offered by
the terminology structure. A deficiency in any of
these could negatively impact the amount and
diversity of data that could be integrated, and
potentially limit the types of analyses that can be
performed on the data downstream. There are a
number of well-established disease terminologies in
active use that satisfy the above criteria to varying
degrees. Chief among these are the International
Classification of Diseases (ICD), Medical Subject
Headings (MeSH), and the Systematized Nomenclature
of Medicine-Clinical Terms (SNOMED-CT). Each of
these is suited for data integration, yet each
presents particular advantages and drawbacks.</p>
      <p>The ICD terminology, evolved from a lineage that
spans more than 100 years, is the most widely
utilized disease terminology, with widespread
adoption among a large number of major healthcare
providers, the U.S. Federal Government, as well as
the World Health Organization. Consequently, the
majority of clinical data is codified using ICD codes.
Unfortunately, the ICD is poorly suited for data
integration, as the approximately 14,000 unique terms
it codifies form a small vocabulary compared to other
terminologies. Furthermore, the ICD is more a
compendium of diagnosis and procedure codes than a
structured terminology, as it lacks any significant
hierarchical or relational structure.</p>
      <p>MeSH, which is used primarily for the purpose of
indexing publications, is only slightly larger than
ICD in terms of size with more than 22,000 unique
terms. However, the design of MeSH is much more
structured and diverse compared to ICD. MeSH
terms are arranged into a hierarchy of 14 distinct
top-level categories that organize terms by Anatomy,
Disease, Chemicals and Drugs, and Geography,
among others. MeSH also contains a set of
qualifier terms that can be used to narrow the
specificity of a descriptor term (e.g.
"Measles/epidemiology"). While MeSH possesses
many of the attributes desirable for translational data
integration, its attributes are modest in comparison to
those of SNOMED-CT.</p>
      <p>SNOMED-CT was born from a medical terminology
lineage that traces back more than 75 years, and is
currently in use by pathologists worldwide to perform
precise classifications of human disease<sup>15, 16</sup>. With
more than 340,000 unique biomedical concepts
organized into 19 relational hierarchies linked by
more than 1.3 million relationships, it is by far the
most expansive and expressive disease terminology
in existence. The sheer number of concepts coupled
with the rich relational architecture in SNOMED-CT
offers attributes superior to other disease
terminologies. For example, SNOMED-CT
establishes that a clear cell carcinoma of the kidney is
both a malignant tumor of the kidney and a malignant
tumor of the retroperitoneum. The ICD version 9
(ICD-9) simply asserts that a malignant neoplasm of
the kidney is a malignant neoplasm of the
genitourinary organs, which is a much coarser
designation. We therefore assert that SNOMED-CT is
currently the best-suited terminology for integrating
biomolecular and clinical data by disease.</p>
      <p>In this study we investigate the feasibility of using
SNOMED-CT to integrate gene expression data from
a public microarray repository with de-identified
clinical laboratory data obtained from a hospital EHR
system by disease. We propose that SNOMED-CT is
well suited for this approach as it is the largest
disease vocabulary currently available. We evaluate
the effectiveness of this approach based on the extent
of data successfully joined.</p>
    </sec>
    <sec id="sec-2">
      <title>METHODS</title>
      <p>A high-level representation of the data integration
approach is detailed in figure 1. The microarray
experiment data was obtained from the NCBI GEO
FTP site (downloaded 11/27/2007) and was parsed
into a relational structure and stored in a MySQL
database. The de-identified clinical laboratory data
was obtained from the Lucile Packard Children’s
Hospital via STRIDE as delimited text files. UMLS
release 2007AA was used as the vocabulary source.
The integration steps were performed as follows.</p>
    </sec>
    <sec id="sec-3">
      <title>Mapping microarray experiments to diseases</title>
      <p>Clinically relevant microarray data was identified
using a previously described method<sup>17</sup>. In brief, we
queried the NCBI Gene Expression Omnibus (GEO)<sup>18</sup>
to obtain all GEO DataSet experiments with
associated PubMed identifiers. For each PubMed
identifier we obtained the associated MeSH headings
using NCBI eUtils. Each MeSH heading was
mapped to a UMLS Concept Unique Identifier (CUI)
using the MRCONSO table. Using the MRSTY table, we
obtained the semantic type identifiers (TUIs) for the
mapped CUIs. If any MeSH term had a semantic type among
Injury or Poisoning (T037), Pathologic Function
(T046), Disease or Syndrome (T047), Mental or
Behavioral Dysfunction (T048), Experimental Model
of Disease (T050), or Neoplastic Process (T191),
the associated experiment was determined to be
disease-associated and therefore clinically relevant.
This resulted in the identification of 737
disease-associated experiments.</p>
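      <p>The semantic type filter described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the two dictionaries stand in for lookups against MRCONSO and MRSTY, and the function name, CUI placeholders, and GDS labels are hypothetical.</p>
      <preformat><![CDATA[
```python
# UMLS semantic type identifiers (TUIs) that the text lists as disease-related.
DISEASE_TUIS = {"T037", "T046", "T047", "T048", "T050", "T191"}


def is_disease_associated(mesh_headings, mesh_to_cui, cui_to_tuis):
    """Return True if any MeSH heading maps to a CUI carrying a disease TUI."""
    for heading in mesh_headings:
        cui = mesh_to_cui.get(heading)  # stands in for an MRCONSO lookup
        if cui is None:
            continue
        # stands in for an MRSTY lookup; a CUI may carry several TUIs
        if cui_to_tuis.get(cui, set()) & DISEASE_TUIS:
            return True
    return False


# Toy data. The CUIs and GDS labels are placeholders, not real identifiers.
mesh_to_cui = {"Asthma": "C_ASTHMA", "Lung": "C_LUNG"}
cui_to_tuis = {"C_ASTHMA": {"T047"}, "C_LUNG": {"T023"}}  # T023 is an anatomy type
experiments = {"GDS_A": ["Asthma", "Lung"], "GDS_B": ["Lung"]}

relevant = sorted(
    gds for gds, headings in experiments.items()
    if is_disease_associated(headings, mesh_to_cui, cui_to_tuis)
)
print(relevant)
```
]]></preformat>
      <p>In this toy input only the experiment annotated with a disease-typed heading is retained; an experiment annotated solely with anatomy terms is dropped.</p>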
      <p>The disease-associated experiments were then examined
with a second previously described text-mining
technique that uses GEO DataSet (GDS) subset
annotations to identify when a disease state is being
compared to a normal control state<sup>2</sup>. GDS are
higher-level representations of microarray experiments in
which samples are organized into biologically
informative collections known as subsets. The
subsets are representative of the experimental axis
under examination (figure 2). The free-text annotations
associated with the GDS subsets were mapped, where
possible, to SNOMED-CT disease terms using
UMLS concepts. These mappings were subsequently
reviewed manually for accuracy, and erroneous
codifications were corrected.</p>
    </sec>
    <sec id="sec-4">
      <title>Mapping patient laboratory data to diseases</title>
      <p>Clinical laboratory data for pediatric patients from
the Lucile Packard Children’s Hospital was obtained
digitally from the STRIDE system. All of the
laboratory measurements were received pre-encoded
with ICD-9 codes. These ICD-9 codes were mapped
to SNOMED-CT codes by first querying UMLS to
find the CUI associated with each ICD-9
code. We then used the inter-terminology
mappings provided by the UMLS
MRMAP table to translate the ICD-9 codes into
SNOMED-CT concepts using the associated CUIs.</p>
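      <p>The two-step translation from ICD-9 codes to SNOMED-CT concepts through shared CUIs can be sketched as below. The dictionaries are hypothetical stand-ins for the UMLS lookups, and every identifier in the toy data is a placeholder rather than a real code; the actual table layout used in the study is not shown in the text.</p>
      <preformat><![CDATA[
```python
def icd9_to_snomed(icd9_code, icd9_to_cuis, cui_to_snomed):
    """Collect the SNOMED-CT ids reachable from one ICD-9 code via its CUIs."""
    snomed_ids = set()
    for cui in icd9_to_cuis.get(icd9_code, ()):
        snomed_ids.update(cui_to_snomed.get(cui, ()))
    return snomed_ids


# Toy lookups (assumed shapes): ICD-9 code -> CUIs, CUI -> SNOMED-CT ids.
icd9_to_cuis = {"ICD_A": ["CUI_1"], "ICD_B": ["CUI_2", "CUI_3"]}
cui_to_snomed = {"CUI_1": ["SCT_10"], "CUI_2": ["SCT_20", "SCT_21"]}

print(sorted(icd9_to_snomed("ICD_A", icd9_to_cuis, cui_to_snomed)))
print(sorted(icd9_to_snomed("ICD_B", icd9_to_cuis, cui_to_snomed)))
print(sorted(icd9_to_snomed("ICD_MISSING", icd9_to_cuis, cui_to_snomed)))
```
]]></preformat>
      <p>Returning a set makes it explicit that one diagnosis code may resolve to zero, one, or several SNOMED-CT concepts.</p>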
    </sec>
    <sec id="sec-5">
      <title>Joining the microarray and patient lab data by disease</title>
      <p>The GDS subsets mapped to SNOMED-CT disease CUIs
were joined with the clinical laboratory data on the
UMLS CUIs obtained by mapping the ICD-9 codes to
SNOMED-CT terms with the UMLS MRMAP table. Of the
238 unique disease concepts mapped to the microarray
data, 211 (approximately 89%) were mapped to
quantitative clinical laboratory data for at
least one patient.</p>
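      <p>The join on disease CUIs amounts to an inner join between subsets and patients. The sketch below illustrates this with plain dictionaries; the function name, subset labels, patient labels, and CUIs are hypothetical, and the coverage figure is computed only for the toy input.</p>
      <preformat><![CDATA[
```python
def join_by_cui(subset_to_cui, patient_to_cuis):
    """Inner-join GDS subsets and patients on a shared disease CUI."""
    # Invert the patient table so each CUI points at its patients.
    cui_to_patients = {}
    for patient, cuis in patient_to_cuis.items():
        for cui in cuis:
            cui_to_patients.setdefault(cui, set()).add(patient)

    joined = {}
    for subset, cui in subset_to_cui.items():
        patients = cui_to_patients.get(cui)
        if patients:  # keep only subsets with at least one matching patient
            joined[subset] = (cui, sorted(patients))
    return joined


# Toy data with placeholder identifiers.
subset_to_cui = {"GDS_A/disease": "C_ASTHMA", "GDS_B/disease": "C_PARKINSON"}
patient_to_cuis = {"P1": {"C_ASTHMA"}, "P2": {"C_ASTHMA", "C_OBESITY"}}

joined = join_by_cui(subset_to_cui, patient_to_cuis)
mapped_concepts = {cui for cui, _ in joined.values()}
coverage = len(mapped_concepts) / len(set(subset_to_cui.values()))
print(joined)
print(coverage)
```
]]></preformat>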
    </sec>
    <sec id="sec-6">
      <title>RESULTS</title>
      <p>Using automated methods, we were able to identify 737
GDS microarray experiments in NCBI GEO related
to human disease. The GDS subsets were
examined for terms related to UMLS concepts
linked to a SNOMED-CT disease term,
resulting in the identification of 238 unique human
disease concepts. In total, 29,451 microarray samples
were codified with SNOMED-CT disease identifiers.
Note, however, that the method was restricted to include
only those GDS for which both a disease subset and a
normal control subset could be identified. This restriction
ensures that a disease vs. normal vector of change can
be extracted from the data to establish a baseline
disease expression signature for downstream
analysis.</p>
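      <p>One simple way to picture a disease vs. normal vector of change is a per-gene difference of mean expression between the two subsets. The sketch below assumes that definition, which the text does not specify; the function name, gene labels, and values are hypothetical.</p>
      <preformat><![CDATA[
```python
from statistics import mean


def change_vector(disease_samples, control_samples):
    """Per-gene mean expression in disease samples minus mean in controls."""
    genes = disease_samples[0].keys()
    return {
        gene: mean(s[gene] for s in disease_samples)
        - mean(s[gene] for s in control_samples)
        for gene in genes
    }


# Toy expression values (assumed to be on a log scale); labels are placeholders.
disease = [{"GENE_A": 8.0, "GENE_B": 5.0}, {"GENE_A": 10.0, "GENE_B": 5.0}]
control = [{"GENE_A": 6.0, "GENE_B": 5.0}, {"GENE_A": 6.0, "GENE_B": 7.0}]

vector = change_vector(disease, control)
print(vector)
```
]]></preformat>
      <p>Without a control subset the subtraction has no reference point, which is why the restriction to disease and normal pairs matters for this kind of signature.</p>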
    </sec>
    <sec id="sec-10">
      <p>We retrieved quantitative clinical laboratory data
representing diagnostic biomarkers for 49,414
patients across 9,997 distinct diagnosis codes. These
codes mapped to 20,049 distinct UMLS CUIs. It is
interesting to note that mapping ICD-9 to UMLS
yields roughly twice as many UMLS concepts as ICD-9
terms. This likely results from the fact
that ICD-9 is generally a more high-level
terminology; terms related to rare
genetic disorders, for example, may be
represented by a single ICD-9 code, whereas UMLS may
allow for more fine-grained attribution of specific
rare genetic disorders.</p>
      <p>In joining the ICD-9 disease codes from the clinical
laboratory data to the microarray data using
SNOMED-CT disease codes, we find that 211 of the
unique disease concepts annotating the microarray
data can be mapped to clinical laboratory data. In
total, clinical laboratory data for 13,452 patients was
mapped to SNOMED-CT disease codes that were
used to annotate the microarray GDS experiments.
Table 1 shows the top diseases by the number of
patients mapped.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Top diseases by the number of patients mapped.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Disease</th><th>SNOMED Terms</th><th>ICD9CM Terms</th><th>Ind</th></tr>
          </thead>
          <tbody>
            <tr><td>Allergic asthma</td><td>1</td><td>1</td><td>2240</td></tr>
            <tr><td>Asthma</td><td>1</td><td>1</td><td>2240</td></tr>
            <tr><td>Allergic asthma NEC</td><td>1</td><td>1</td><td>2240</td></tr>
            <tr><td>Esophageal Reflux</td><td>1</td><td>1</td><td>1895</td></tr>
            <tr><td>H. pylori infection</td><td>1</td><td>1</td><td>1322</td></tr>
            <tr><td>Colitis</td><td>1</td><td>2</td><td>1299</td></tr>
            <tr><td>Primary Hypertension</td><td>1</td><td>1</td><td>1017</td></tr>
            <tr><td>Hypertension</td><td>1</td><td>1</td><td>1017</td></tr>
            <tr><td>Obesity</td><td>2</td><td>1</td><td>1010</td></tr>
            <tr><td>Type 1 diabetes</td><td>1</td><td>1</td><td>843</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-14">
      <p>As evident from the data listed in table 1, there are
cases in which distinct SNOMED-CT terms will map
to the same ICD-9 term. To explore the ambiguities
of mapping terms between SNOMED-CT and
ICD-9 using CUIs, we investigated the overall
pattern of the mapping cardinalities. Table 2 shows
cases in which a single UMLS CUI maps to multiple
SNOMED-CT terms. This could indicate that there
is some degree of ambiguity in the SNOMED-CT to
ICD-9 UMLS mappings, and perhaps a dampening of
SNOMED-CT term resolution when using UMLS
concepts.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Cases in which a single UMLS CUI maps to multiple SNOMED-CT terms.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Disease</th><th>SNOMED Terms</th><th>ICD9CM Terms</th><th>Ind</th></tr>
          </thead>
          <tbody>
            <tr><td>Follicular lymphoma</td><td>4</td><td>3</td><td>136</td></tr>
            <tr><td>Hamman-Rich syndrome</td><td>4</td><td>2</td><td>18</td></tr>
            <tr><td>Mycobacterial infection</td><td>3</td><td>2</td><td>26</td></tr>
            <tr><td>Mixed hyperlipidemia</td><td>3</td><td>2</td><td>90</td></tr>
            <tr><td>Hepatoma</td><td>3</td><td>2</td><td>67</td></tr>
            <tr><td>Fetal alcohol syndrome</td><td>3</td><td>1</td><td>10</td></tr>
            <tr><td>Diabetic nephropathy</td><td>3</td><td>2</td><td>30</td></tr>
            <tr><td>Megakaryocytic leukemia</td><td>2</td><td>2</td><td>125</td></tr>
            <tr><td>Acute monocytic leukemia</td><td>2</td><td>1</td><td>7</td></tr>
            <tr><td>Status epilepticus</td><td>2</td><td>1</td><td>84</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>To better understand the influence of UMLS CUI
definitions with regard to source identifier
consolidation, we calculated summary statistics for
several terminologies within UMLS and restricted the
results to CUIs representing a disease. The summary
statistics are listed in table 3.</p>
      <table-wrap id="tab3">
        <label>Table 3</label>
        <caption>
          <p>Summary statistics for disease CUIs by source vocabulary (SNOMED-CT, ICD-9-CM, NCI, MeSH).</p>
        </caption>
      </table-wrap>
    </sec>
    <sec id="sec-15">
      <title>DISCUSSION</title>
      <sec id="sec-15-1">
        <p>The profusion of large public data repositories of
genome-scale measures, coupled with the pressing
imperative to translate such data into medicine, has
precipitated the need to develop informatics tools and
techniques for integrating disparate forms of
biomolecular and clinical data. The purpose of this
investigation was to explore the feasibility of using
SNOMED-CT for such integrative efforts. We
assessed the feasibility of SNOMED-CT as a
translational joining factor by using it to integrate
anonymous gene expression data from a public
microarray repository with de-identified clinical
laboratory data by disease.</p>
        <p>We find that SNOMED-CT is effective as a disease
terminology for integrating these biomolecular and
clinical data types. The cases in which
microarray data could not be mapped to clinical
laboratory data largely reflect the fact that only
pediatric data was used. The unmapped terms
include Parkinson’s disease,
macular degeneration, Alzheimer’s disease and other
diseases not generally found in children. Other failed
mappings represent relatively rare disorders, such as
yersiniosis and luteoma. Better mappings might be
obtained by leveraging the relational structure of
UMLS to map terms that are parents or children
of the disease terms.</p>
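        <p>The idea of falling back on related terms can be illustrated with a breadth-first search up a parent relation. This is only a sketch of one possible strategy, covering parents but not children; the parent table is a hypothetical stand-in for UMLS relationship data, and all CUIs are placeholders.</p>
        <preformat><![CDATA[
```python
from collections import deque


def nearest_mapped_ancestor(cui, parents, mapped_cuis, max_depth=3):
    """Return (ancestor, depth) for the closest CUI that has clinical data.

    Returns None when nothing is found within max_depth parent steps.
    """
    queue = deque([(cui, 0)])
    seen = {cui}
    while queue:
        current, depth = queue.popleft()
        if current in mapped_cuis:
            return current, depth
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None


# Toy hierarchy with placeholder CUIs; a concept may have several parents.
parents = {
    "C_CLEAR_CELL_KIDNEY": ["C_MALIG_KIDNEY", "C_MALIG_RETRO"],
    "C_MALIG_KIDNEY": ["C_NEOPLASM"],
}
mapped_cuis = {"C_MALIG_KIDNEY"}

print(nearest_mapped_ancestor("C_CLEAR_CELL_KIDNEY", parents, mapped_cuis))
print(nearest_mapped_ancestor("C_UNRELATED", parents, mapped_cuis))
```
]]></preformat>
        <p>Reporting the depth alongside the match lets a downstream analysis discount joins that relied on a more general ancestor.</p>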
        <p>The many-to-many and many-to-one SNOMED-CT
to ICD-9 mappings using UMLS CUIs do present an
interesting problem. These could lead to ambiguities
in the mappings such that a highly specific disease
variant is mapped to a more generalized disease
category. This could have a negative impact on the
downstream utilization of the integrated data. The
data in table 3 suggests that large source vocabularies
like SNOMED-CT have been constrained and
compressed by the smaller vocabularies within
UMLS to the degree that original source vocabulary
resolution is lost. This may suggest an alternative
strategy in which the biomolecular samples are
labeled only with SNOMED-CT identifiers and the
translation between SNOMED-CT and ICD-9 is
performed outside of UMLS CUI constraints.</p>
        <p>There are several caveats in the interpretation of the
results. First, the data sets were not generalized
in that the clinical laboratory data only represented
pediatric patients and the microarray experiments
were limited to those in which a disease and a normal
control distinction was evident. Furthermore, this
study focused only on SNOMED-CT and did
not apply the same techniques to the alternative
disease terminologies mentioned to offer any
quantitative comparison. Although the investigation
revealed that SNOMED-CT was capable of joining
the two data types, it offers no statistical
characterization of the joining to assess its overall
quality and reliability. We also
acknowledge that the text mining aspects of this
approach are prone to errors, such as miscodings of
the data.</p>
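        <p>The mapping cardinalities discussed above can be tallied from a list of SNOMED-CT and ICD-9 pairs that share a CUI. The sketch below labels each pair by how many partners each side has; the function name and all identifiers are hypothetical placeholders.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Label each (snomed_id, icd9_code) pair as one/many-to-one/many."""
    snomed_to_icd = defaultdict(set)
    icd_to_snomed = defaultdict(set)
    for snomed_id, icd9_code in pairs:
        snomed_to_icd[snomed_id].add(icd9_code)
        icd_to_snomed[icd9_code].add(snomed_id)

    labels = {}
    for snomed_id, icd9_code in pairs:
        # Left side: how many SNOMED-CT terms share this ICD-9 code.
        left = "many" if len(icd_to_snomed[icd9_code]) > 1 else "one"
        # Right side: how many ICD-9 codes this SNOMED-CT term reaches.
        right = "many" if len(snomed_to_icd[snomed_id]) > 1 else "one"
        labels[(snomed_id, icd9_code)] = f"{left}-to-{right}"
    return labels


# Toy pairs with placeholder identifiers.
pairs = [("S1", "I1"), ("S2", "I1"), ("S3", "I2"), ("S4", "I3"), ("S4", "I4")]
labels = classify_cardinality(pairs)
for pair, label in labels.items():
    print(pair, label)
```
]]></preformat>
        <p>Pairs labeled many-to-one are the ones where a specific SNOMED-CT variant risks being collapsed into a broader ICD-9 category.</p>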
        <p>The results demonstrate that current and future
translational data integration endeavors can leverage
existing clinical terminologies, such as
SNOMED-CT, to integrate clinical and biomolecular data types
and shift valuable efforts to downstream discovery.
Furthermore, this study provides support for the
continued development and use of SNOMED-CT for
translational data integration, and brings to light the
importance of inter-terminology mapping resources
such as UMLS. As demonstrated by our own work,
and the work of others, the straightforward act of
integrating data from the molecular and clinical
worlds can have profound and direct impact on
human health.</p>
        <p>Although our initial work focused on the integration
of microarray data and patient lab data specifically,
we are now working to expand the application of the
underlying system to integrate additional data types.
In order to integrate new forms of biomolecular data
into our current framework we must develop
improved text-mining methods to map the underlying
experimental data to SNOMED-CT identifiers. From
the clinical perspective we will continue to integrate
new data obtained from the STRIDE system and look
to incorporate additional clinical data types as well.</p>
        <p>We must also develop methods to test and improve
the reliability of the clinical data, as hospital workers
will inevitably miscode a small percentage of the
data. We must also account for the fact that the
application of clinical codes is subject to a number of
non-scientific influences, such as hospital billing
policies, insurance companies, and pharmaceutical
regulations. Any future work in this area should also
entail the development of statistical metrics to
evaluate the joining terminology, such that a
principled decision can be made to identify the most
appropriate terminology for a particular integration
scenario.</p>
      </sec>
    </sec>
    <sec id="sec-16">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This work was supported in part by the Lucile
Packard Foundation for Children’s Health, National
Library of Medicine (K22 LM008261), National
Institute of General Medical Sciences (R01
GM079719), National Human Genome Research
Institute (P50 HG003389), Howard Hughes Medical
Institute, and the Pharmaceutical Research and
Manufacturers of America Foundation. The authors
would also like to thank Alex Skrenchuck for High
Performance Computing support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Butte</surname>
            <given-names>AJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohane</surname>
            <given-names>IS</given-names>
          </string-name>
          .
          <article-title>Creation and implications of a phenome-genome network</article-title>
          .
          <source>Nature biotechnology</source>
          . 2006 Jan;
          <volume>24</volume>
          (
          <issue>1</issue>
          ):
          <fpage>55</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dudley</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butte</surname>
            <given-names>AJ</given-names>
          </string-name>
          .
          <article-title>Enabling Integrative Genomic Analysis of High-Impact Human Diseases Through Text Mining</article-title>
          .
          <source>Pacific Symposium on Biocomputing</source>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. STRIDE. [http://stride.stanford.edu/STRIDE/]</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. CDR. [https://cdr.virginia.edu/]</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chen</surname>
            <given-names>DP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            <given-names>SC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantinou</surname>
            <given-names>PS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferris</surname>
            <given-names>TA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            <given-names>HJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butte</surname>
            <given-names>AJ</given-names>
          </string-name>
          .
          <article-title>Clinical Arrays of Laboratory Measures, or "Clinarrays", Built from an Electronic Health Record Enable Disease Subtyping by Severity</article-title>
          .
          <source>AMIA Annual Symposium Proceedings</source>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chen</surname>
            <given-names>DP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            <given-names>SC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantinou</surname>
            <given-names>PS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferris</surname>
            <given-names>TA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            <given-names>HJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butte</surname>
            <given-names>AJ</given-names>
          </string-name>
          .
          <article-title>Novel Integration of Hospital Electronic Medical Records and Gene Expression Measurements to Identify Genetic Markers of Maturation</article-title>
          .
          <source>Pacific Symposium on Biocomputing</source>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pickrell</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clerget-Darpoux</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bourgain</surname>
            <given-names>C</given-names>
          </string-name>
          .
          <article-title>Power of genome-wide association studies in the presence of interacting loci</article-title>
          .
          <source>Genetic epidemiology</source>
          . 2007 Nov;
          <volume>31</volume>
          (
          <issue>7</issue>
          ):
          <fpage>748</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Segal</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirlin</surname>
            <given-names>CB</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ooi</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adler</surname>
            <given-names>AS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>X</given-names>
          </string-name>
          , et al.
          <article-title>Decoding global gene expression programs in liver cancer by noninvasive imaging</article-title>
          .
          <source>Nature biotechnology</source>
          . 2007 Jun;
          <volume>25</volume>
          (
          <issue>6</issue>
          ):
          <fpage>675</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ramaswamy</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tamayo</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rifkin</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeang</surname>
            <given-names>CH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angelo</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <article-title>Multiclass cancer diagnosis using tumor gene expression signatures</article-title>
          .
          <source>Proceedings of the National Academy of Sciences of the United States of America</source>
          . 2001 Dec 18;
          <volume>98</volume>
          (
          <issue>26</issue>
          ):
          <fpage>15149</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pandita</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zielenska</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorner</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bayani</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godbout</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greenberg</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <article-title>Application of comparative genomic hybridization, spectral karyotyping, and microarray analysis in the identification of subtype-specific patterns of genomic changes in rhabdomyosarcoma</article-title>
          .
          <source>Neoplasia</source>
          (New York, NY).
          <year>1999</year>
          Aug;
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <fpage>262</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lapointe</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Higgins</surname>
            <given-names>JP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van de Rijn</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bair</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montgomery</surname>
            <given-names>K</given-names>
          </string-name>
          , et al.
          <article-title>Gene expression profiling identifies clinically relevant subtypes of prostate cancer</article-title>
          .
          <source>Proceedings of the National Academy of Sciences of the United States of America</source>
          . 2004 Jan 20;
          <volume>101</volume>
          (
          <issue>3</issue>
          ):
          <fpage>811</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Chen</surname>
            <given-names>HY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            <given-names>SL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>CH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            <given-names>GC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>CY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            <given-names>A</given-names>
          </string-name>
          , et al.
          <article-title>A five-gene signature and clinical outcome in non-small-cell lung cancer</article-title>
          .
          <source>The New England journal of medicine</source>
          . 2007 Jan 4;
          <volume>356</volume>
          (
          <issue>1</issue>
          ):
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Potti</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dressman</surname>
            <given-names>HK</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bild</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            <given-names>RF</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sayer</surname>
            <given-names>R</given-names>
          </string-name>
          , et al.
          <article-title>Genomic signatures to guide the use of chemotherapeutics</article-title>
          .
          <source>Nature medicine</source>
          . 2006 Nov;
          <volume>12</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1294</fpage>
          -
          <lpage>300</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Komatsu</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hiyama</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanimoto</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yunokawa</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Otani</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohtaki</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <article-title>Prediction of individual response to platinum/paclitaxel combination using novel marker genes in ovarian cancers</article-title>
          .
          <source>Molecular cancer therapeutics</source>
          . 2006 Mar;
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>767</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. SNOMED Intl. [http://www.snomed.org]</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Chute</surname>
            <given-names>CG</given-names>
          </string-name>
          .
          <article-title>Clinical classification and terminology: some history and current observations</article-title>
          .
          <source>J Am Med Inform Assoc</source>
          . 2000 May-Jun;
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>298</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Butte</surname>
            <given-names>AJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>R</given-names>
          </string-name>
          .
          <article-title>Finding disease-related genomic experiments within an international repository: first steps in translational bioinformatics</article-title>
          .
          <source>AMIA Annual Symposium proceedings / AMIA Symposium</source>
          .
          <year>2006</year>
          :
          <fpage>106</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Barrett</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzek</surname>
            <given-names>TO</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troup</surname>
            <given-names>DB</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilhite</surname>
            <given-names>SE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngau</surname>
            <given-names>WC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ledoux</surname>
            <given-names>P</given-names>
          </string-name>
          , et al.
          <article-title>NCBI GEO: mining millions of expression profiles--database and tools</article-title>
          .
          <source>Nucleic acids research</source>
          . 2005 Jan 1;
          <volume>33</volume>
          (Database issue):
          <fpage>D562</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>