<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology-based Normalization for Disease-Lab Test Relation Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yaoyun Zhang</string-name>
          <email>Yaoyun.Zhang@uth.tmc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jingqi Wang</string-name>
          <email>Jingqi.Wang@uth.tmc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cui Tao</string-name>
          <email>Cui.Tao@uth.tmc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hua Xu</string-name>
          <email>Hua.Xu@uth.tmc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>87</fpage>
      <lpage>89</lpage>
      <abstract>
        <p>This poster describes our preliminary work on ontology-based normalization for diseases and lab tests, as a fundamental step toward disease-lab test relation extraction. Multiple ontologies are leveraged for this aim. Specifically, diseases and lab tests are first extracted and mapped to Concept Unique Identifiers (CUIs) of the Unified Medical Language System (UMLS) by MetaMap. Codes from the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9CM) are then employed to further normalize diseases, while the Logical Observation Identifiers Names and Codes (LOINC) are used to normalize lab tests.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology-based normalization</kwd>
        <kwd>normalization</kwd>
        <kwd>lab test normalization</kwd>
        <kwd>relation extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Disease-lab test relation extraction plays an important role
in various medical applications such as clinical decision-support
systems and phenotype information extraction. However,
mentions of diseases and lab tests in text contain diverse
nonstandard variations. Those variants need to be normalized into
standard codes first to facilitate more universal computational
applications. This poster describes our preliminary work on
ontology-based normalization for diseases and lab tests, as a
fundamental step toward disease-lab test relation extraction.</p>
    </sec>
    <sec id="sec-2">
      <p>Three existing standard ontologies, UMLS [1], ICD-9CM [3], and LOINC [4], are leveraged for this aim.</p>
    </sec>
    <sec id="sec-3">
      <title>II. ONTOLOGY OVERVIEW</title>
    </sec>
    <sec id="sec-4">
      <title>Overview</title>
      <sec id="sec-4-1">
        <title>A. UMLS</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <p>UMLS [1] is a thesaurus that reorganizes many controlled
vocabularies in the biomedical sciences. It provides a mapping
structure among various vocabularies and serves as a
comprehensive ontology of biomedical concepts. For each
concept in UMLS, a synonym list consisting of terms from
multiple vocabularies is collected. For example, “Diabetes
Mellitus” and “dm” are synonyms of the same CUI, C0011849.</p>
    </sec>
    <sec id="sec-6">
      <p>Various ontological relations between CUIs are defined, such as “isa”, “broader”, and “sibling”.</p>
    </sec>
    <sec id="sec-7">
      <title>B. ICD-9CM</title>
    </sec>
    <sec id="sec-8">
      <p>
        The International Statistical Classification of Diseases and
Related Health Problems (ICD) is the international "standard
diagnostic tool for epidemiology, health management and
clinical purposes" [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The ICD provides a hierarchical system
of diagnostic codes for classifying diseases. Major categories
are designed to include a set of similar diseases as
subcategories. For example, “(050.0) Variola major” is a
subcategory of “(050) Smallpox”. Health conditions can be
mapped to the corresponding generic categories or to more specific
subcategories.
      </p>
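      <p>In ICD-9CM diagnostic codes, the hierarchy is reflected in the code string itself, so the parent categories of a code can be recovered by trimming digits. The following is a minimal Python sketch, assuming codes of the form shown above (a category, optionally followed by a decimal point and further digits). The function name icd9cm_ancestors is hypothetical and is not part of any ICD tooling.</p>
      <preformat>
```python
from typing import List


def icd9cm_ancestors(code: str) -> List[str]:
    """List the broader codes of an ICD-9CM code, most specific first.

    Assumption: the hierarchy follows the code prefix, so 250.01 sits
    under 250.0, which sits under the category 250.
    """
    category, dot, detail = code.partition(".")
    ancestors = []
    # Drop one trailing digit of the decimal part at a time.
    for end in range(len(detail) - 1, 0, -1):
        ancestors.append(category + "." + detail[:end])
    # A code with a decimal part always has its bare category as an ancestor.
    if dot:
        ancestors.append(category)
    return ancestors


print(icd9cm_ancestors("050.0"))
```
      </preformat>
      <p>Under these assumptions, the call for “050.0” returns the single parent “050”, which matches the Smallpox example, and a bare category such as “050” has no ancestors.</p>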
      <sec id="sec-8-1">
        <title>C. LOINC</title>
      </sec>
    </sec>
    <sec id="sec-9">
      <p>The Logical Observation Identifiers Names and Codes
(LOINC) is the only publicly available universal standard for
laboratory test codes and names [4, 5].</p>
    </sec>
    <sec id="sec-10">
      <p>The current version of the LOINC code set (released in June 2014) contains 73,889 terms for lab tests, measurements, and clinical observations. Lab tests are organized hierarchically into 14 top classes, including “Microbiology” and “Blood Bank”.</p>
    </sec>
    <sec id="sec-11">
      <title>III. NORMALIZATION METHOD</title>
    </sec>
    <sec id="sec-12">
      <p>
        The original mention of a disease or lab test is first recognized
and mapped to UMLS concepts by MetaMap [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-13">
      <p>If ICD-9CM or LOINC is among the sources of terms for the mapped
UMLS concept, the mention can be normalized directly to the
corresponding ICD-9CM or LOINC code and name.</p>
    </sec>
    <sec id="sec-14">
      <p>Otherwise, for a disease, if SNOMED CT is one source of terms
for the UMLS concept, the corresponding SNOMED CT
concept can be mapped to an ICD-9CM code by the rule-based
mapping provided by NIH [6]. For a lab test, the RELMA
software is employed to map the lab test to a LOINC code and
name [7]. Fig. 1 and Fig. 2 illustrate the workflows of disease
normalization and lab test normalization, respectively.</p>
    </sec>
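    <sec>
      <p>The two-stage lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the dictionaries stand in for MetaMap/UMLS output and for the SNOMED CT to ICD-9CM map, every identifier containing DEMO is a placeholder, the source labels (ICD9CM, SNOMEDCT, LNC) are chosen for illustration, and RELMA is represented by a caller-supplied function because it is a separate program.</p>
      <preformat>
```python
from typing import Callable, Dict, Optional

# Toy stand-ins for the real resources; all entries are illustrative only.
# CUI -> {source vocabulary label -> code in that vocabulary}.
CUI_SOURCES: Dict[str, Dict[str, str]] = {
    "C0011849": {"ICD9CM": "250"},             # Diabetes Mellitus (CUI from the text)
    "CUI_DEMO_A": {"SNOMEDCT": "SCT_DEMO_A"},  # placeholder disease without an ICD-9CM term
    "CUI_DEMO_B": {"LNC": "LOINC_DEMO_B"},     # placeholder lab test with a LOINC term
    "CUI_DEMO_C": {},                          # placeholder lab test without a LOINC term
}

# Placeholder for the rule-based SNOMED CT to ICD-9CM mapping.
SNOMED_TO_ICD9CM: Dict[str, str] = {"SCT_DEMO_A": "ICD_DEMO_A"}


def normalize_disease(cui: str) -> Optional[str]:
    """Return an ICD-9CM code for a disease CUI, or None if no route exists."""
    sources = CUI_SOURCES.get(cui, {})
    if "ICD9CM" in sources:
        # Direct case: ICD-9CM is one of the concept's source vocabularies.
        return sources["ICD9CM"]
    snomed_id = sources.get("SNOMEDCT")
    if snomed_id is not None:
        # Fallback: go through the SNOMED CT to ICD-9CM map.
        return SNOMED_TO_ICD9CM.get(snomed_id)
    return None


def normalize_lab_test(
    cui: str,
    mention: str,
    relma_lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a LOINC code for a lab test, falling back to a RELMA-style lookup."""
    sources = CUI_SOURCES.get(cui, {})
    if "LNC" in sources:
        # Direct case: LOINC is one of the concept's source vocabularies.
        return sources["LNC"]
    # Fallback: hand the original mention to the external mapper.
    return relma_lookup(mention)
```
      </preformat>
      <p>Passing the fallback mapper as a function keeps the sketch independent of how RELMA is actually invoked, which the text does not specify.</p>
    </sec>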
    <sec id="sec-15">
      <p>Multiple variations of diseases and lab tests are normalized
into standard codes following the workflows in Fig. 1 and Fig. 2.</p>
    </sec>
    <sec id="sec-16">
      <p>Fig. 3 and Fig. 4 show examples of disease normalization and lab test normalization, respectively. The original mentions are first mapped to UMLS concepts, and then to ICD-9CM and LOINC codes.</p>
      <p>[1] Bodenreider O. The unified medical language system (UMLS):
integrating biomedical terminology. Nucleic Acids Research
2004;32(suppl 1):D267-D70.</p>
      <p>[6] http://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd9cm_reimburse.html</p>
      <p>[7] http://loinc.org/downloads/relma</p>
      <p>The School of Biomedical Informatics | The University of Texas Health Science Center at Houston</p>
      <sec id="sec-16-1">
        <title>Introduction</title>
        <p>Disease-lab test relation extraction plays an important role in various medical applications such
as clinical decision-support systems and phenotype information extraction. However,
mentions of diseases and lab tests in text have diverse non-standard variations. Those
variants need to be normalized into standard codes first to facilitate more universal
computational applications. This poster describes our preliminary work on ontology-based
normalization of diseases and lab tests, as a fundamental step toward disease-lab test
relation extraction.</p>
        <p>
          Ontology Overview
•  UMLS [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]: Re-organized many controlled vocabularies in the biomedical sciences; a
comprehensive ontology of biomedical concepts. E.g., “Diabetes Mellitus” and “dm” are
synonyms of the same concept.
•  ICD-9CM [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]: A hierarchical system of diagnostic codes for classifying diseases. E.g.,
“(050.0) Variola major” is a subcategory of “(050) Smallpox”.
•  LOINC [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]: The only publicly available universal standard for laboratory test codes and
names. E.g., the test of “blood culture” is under the general category “Microbiology”.
        </p>
      </sec>
      <sec id="sec-16-2">
        <title>Method</title>
      </sec>
      <sec id="sec-16-3">
        <title>Results</title>
        <p>•  Precision: 1) General concepts of diseases and lab tests are of limited practical value. E.g., in
relations between “Heart Diseases” and lab tests, “Heart Diseases” includes “coronary
artery disease”, “arrhythmias”, and “congenital heart defects”. 2) Some lab tests are not normalized
correctly to LOINC by RELMA. E.g., “acanthocyte count” -&gt; 565-2: COLONY
[COUNT]:NUM:PT:XXX:ORD:VC.
•  Coverage: Variants of diseases and lab tests are not always recognized. E.g., “blood film” refers to
“blood smear”.</p>
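        <p>A LOINC name such as the one in the precision example is a colon-separated string. The following minimal Python sketch splits such a name into the standard LOINC axes (component, property, time, system, scale, method) so that the component can be compared with the original mention. The function name parse_loinc_name and the dictionary keys are hypothetical, and the sketch assumes that the method part may be omitted.</p>
        <preformat>
```python
from typing import Dict

# Axis labels chosen for this sketch, in the order used by LOINC names.
LOINC_AXES = ("component", "property", "time", "system", "scale", "method")


def parse_loinc_name(name: str) -> Dict[str, str]:
    """Split a colon-separated LOINC name into its named parts."""
    parts = [part.strip() for part in name.split(":")]
    if len(parts) == len(LOINC_AXES) - 1:
        # Assumption: a five-part name simply omits the method.
        parts.append("")
    if len(parts) != len(LOINC_AXES):
        raise ValueError("expected 5 or 6 colon-separated parts, got %d" % len(parts))
    return dict(zip(LOINC_AXES, parts))


parsed = parse_loinc_name("COLONY [COUNT]:NUM:PT:XXX:ORD:VC")
print(parsed["component"])
```
        </preformat>
        <p>Here the parsed component, “COLONY [COUNT]”, shares only the word “count” with the mention “acanthocyte count”, which is the kind of mismatch a precision check on the normalization output could flag.</p>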
      </sec>
      <sec id="sec-16-4">
        <title>Conclusion</title>
        <p>This poster presents the preliminary results of our ontology-based normalization of
diseases and lab tests. In the next stage, machine learning methods will be employed for
disease and lab test recognition. General concepts of diseases and lab tests need to be
filtered out. The precision of lab test normalization to LOINC also needs to be further improved.</p>
        <p>Figure 1 Diagrams of Disease and Lab Test Normalization</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.  Bodenreider O.
          <article-title>The unified medical language system (UMLS): integrating biomedical terminology</article-title>
          .
          <source>Nucleic acids research</source>
          <year>2004</year>
          ;
          <volume>32</volume>
          (
          <issue>suppl 1</issue>
          ):
          <fpage>D267</fpage>
          -
          <lpage>D70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.  National Center for Health Statistics.
          <article-title>ICD-9-CM: International Classification of Diseases 9th Revision Clinical Modification</article-title>
          .
          <source>US Department of Health and Human Services</source>
          , Public Health Service,
          <source>Health Care Financing Administration</source>
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>McDonald</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huff</surname>
            <given-names>SM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suico</surname>
            <given-names>JG</given-names>
          </string-name>
          , et al.
          <article-title>LOINC, a universal standard for identifying laboratory observations: a 5-year update</article-title>
          .
          <source>Clinical chemistry</source>
          <year>2003</year>
          ;
          <volume>49</volume>
          (
          <issue>4</issue>
          ):
          <fpage>624</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>