<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Content Based Recovery on a Radiological Reports Database</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fábio Alexandrini</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariana Kessler Bortoluzzi</string-name>
          <email>2kesslerb@uni-trier.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aldo von Wangenheim</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The present effort focuses on developing a method for assisting the representation of radiological reports, written in simplified natural language, in a standardized structure suited to content-based retrieval, such as DICOM Structured Report. Sample reports were collected and have been analyzed. The work is still in progress, but an intermediate representation has already been reached and is being evaluated by physicians to attest the accuracy of the results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Most health care institutions have a precious legacy of clinical reports written in
natural language or in a simplified grammatical structure. Unfortunately, content-based
retrieval of information from these reports is inefficient due to the peculiarities of
natural language. This prevents institutions from sharing clinical records without
wasting precious time and resources.</p>
      <p>
        The present research effort focuses on the development of methods based on
knowledge about normal findings in radiological examinations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the
Systematized Nomenclature of Medicine - SNOMED [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with the objective of
translating thoracic radiological reports into a representation that is more suitable for
content-based retrieval and that can, in further work, be rendered into reports compliant with
internationally accepted standards such as DICOM Structured Report [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A set of
radiological reports, provided by a Brazilian and a German health care institution, was
used as the source of samples for the experiment.
Natural language processing involves the development of intelligent computer
systems that deal with problems in microworlds, that is, limited application domains,
and is characterized by the search for the most appropriate technique to solve each sort of
problem. It seeks to develop systems capable of solving complex problems composed
of distinct tasks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Some subfields of information retrieval rely on a training
corpus of documents that have been classified as either relevant or non-relevant to a
particular situation, as in text categorization, which attempts to assign documents to two or
more pre-defined categories [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>In the same way, there must be an appropriate technique to solve each task,
so several tools from AI and other areas of knowledge are combined in a single
intelligent system that coordinates the efforts to solve the tasks.</p>
      <p>Clinical reports, especially in radiology, contain information concerning a patient’s
medical condition. However, a large share of this information is unstructured
free text, which makes it more difficult to search, analyze,
summarize and present.</p>
      <p>
        Previous studies have shown the potential benefits of structured medical data for
medical practice, research and teaching. The structure of the information can be used
to help organize and improve the presentation of the medical record [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Expert systems can use the structured information in clinical reports for decision
support [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For research and teaching, structured clinical reports can
greatly improve the recall and precision of information retrieval tasks. Only
structured data are accessible to the advanced causal, spatial, temporal and evolutionary
database models being developed in the fields of computing and medical
informatics [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Other systems seek to enlarge the semantic content of ontologies to guide
knowledge discovery in databases, and also analyze the types of variables that the user
checks [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Systems that evaluate and compare terminology models
for integrating diagnosis concepts into clinical terminologies such as SNOMED
have had success with the categorical structures of the European Committee for
Standardization (ECS) and the ISO reference terminology model [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
There are also problems concerning information quality, which is one of the main factors
in the success or failure of proposed applications and methodologies.
      </p>
      <p>
        One fact that deserves to be singled out is the lack of work directed at
applications in the Portuguese language, given that most terminologies are
available mainly in English, with a few versions in other languages such as German, French
and Spanish. Portuguese is the eighth most spoken language on the planet and the third
among Western languages, after English and Spanish [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>The DICOM SR Standard and the SNOMED Nomenclature</title>
      <p>SNOMED is a widely accepted terminology and infrastructure designed to enable the
sharing of health care knowledge across clinical specialties and sites of care. It
contains preferred medical terms and concepts, consisting of more than 144,000
terms and term codes divided into eleven branches, including: Topography; Morphology; Function;
Living Organisms; Chemicals, Drugs, and Biological Products; Physical Agents,
Activities and Forces; Social Context; Diseases/Diagnoses; Procedures; and General
Linkage Modifiers. SNOMED is available in several languages, including German.
Unfortunately, a translation of the SNOMED terms into Portuguese is not yet
available. Thus, in order to make this experiment possible, the basic terms for
thoracic radiological exams were translated by collaborating Brazilian physicians.</p>
      <p>
        The DICOM SR standard sets out rules that define how structured documents that
contain health information should be composed, stored and transmitted. These documents make
use of a controlled terminology, which enhances the results of content-based retrieval
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <p>The sample thoracic reports, 315 of them in German and 7719 in Portuguese, were
analyzed, and a common structure was identified. Each report begins with a heading
containing identification information, such as the physician, the patient and the
institution in which the report was issued, followed by a description of the
techniques used to perform the exam, such as the type of procedure. The main part of the
report is the body, in which the conclusions drawn from the exam can be found. The body
is the focus of the effort to interpret, retrieve and codify content.</p>
      <p>
        The language used by physicians to describe findings in radiological examinations
in both institutions cannot really be considered natural language. It is rather a set
of short sentences, normally without verbs, that escape conventional linguistic
rules. These sentences often either affirm or deny the existence of a malformation
or alteration in an anatomical structure, or the presence of foreign bodies or illnesses,
adding observations regarding the location, shape or appearance of the anatomical
structure. For each type of radiological examination report there are
specific anatomical structures of interest [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>The subject of most sentences found in thoracic radiological reports is an anatomical
structure such as the mediastinum, lung or trachea. The subject is often
followed by adjectives stating the position or the part of the organ that was observed,
for instance surface or right. Qualifiers describing morphological information, such as
normal or reduced, follow. These may be followed by information regarding diseases,
for example the existence of emphysema, and by modifiers indicating the degree of
severity of the disease. For the analysis and structuring process, natural
language processing techniques were used (lexical, syntactic, semantic and discourse analysis),
combined with information from the specialist physicians. Medical
language differs greatly from everyday language: the language of
clinical reports normally consists of affirmative or negative phrases with few or no verbs.</p>
    </sec>
    <sec id="sec-4">
      <title>Structuring Approach</title>
      <p>
        The sample radiological reports, originally stored in a Dataflex database and in Word
documents, were converted to a plain ASCII text representation in order to discard
formatting information. Afterwards, the free GNU Aspell tool [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], available in
several languages, was used to perform a lexical (spelling) check.
      </p>
      <p>For the natural language processing, the text file is split into sentences
according to punctuation and end-of-line characters. The sentences are then split into words for
the lexical analysis. Using a previously developed dictionary, the words are classified by
their grammatical class, such as noun, adjective and others. The sentences and
words are the input for the syntactic and semantic analysis. Each word has a function
in the sentence, but we first search for the terms for anatomical structures, diseases and
morphology, which are then matched to the appropriate SNOMED codes.</p>
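      <p>The segmentation and lexical classification steps described above can be sketched as follows. This is only a minimal illustration in Python under stated assumptions, not the implementation used in this work: the lexicon entries, the function names and the sample report are hypothetical, and the codes are placeholders rather than real SNOMED codes.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical mini-lexicon: word -> (grammatical class, SNOMED branch, code).
# The codes are made-up placeholders, NOT real SNOMED codes.
LEXICON = {
    "lung": ("noun", "Topography", "T-PLACEHOLDER-LUNG"),
    "trachea": ("noun", "Topography", "T-PLACEHOLDER-TRACHEA"),
    "emphysema": ("noun", "Diseases/Diagnoses", "D-PLACEHOLDER-EMPHYSEMA"),
    "right": ("adjective", "General Linkage Modifiers", "G-PLACEHOLDER-RIGHT"),
    "normal": ("adjective", None, None),
}


def split_sentences(text):
    """Split a report on sentence punctuation and line breaks."""
    parts = re.split(r"[.;!?\n]+", text)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence):
    """Return the lowercase words of a sentence (letters only, accents kept)."""
    return re.findall(r"[^\W\d_]+", sentence.lower())


def classify(sentence):
    """Return (word, class, branch, code) per word; unknown words get None."""
    unknown = (None, None, None)
    return [(word,) + LEXICON.get(word, unknown) for word in tokenize(sentence)]


report = "Right lung normal.\nTrachea normal; no emphysema"
for sentence in split_sentences(report):
    print(sentence, classify(sentence))
```
]]></preformat>
      <p>In this sketch, words that are missing from the lexicon are kept with empty fields instead of being dropped, so that later analysis steps can still see them.</p>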
      <p>After the first classification tests, a large number of adjectives derived from
nouns were found. Because these adjectives refer directly to anatomical structures, they are
highly important and need special treatment (see Table 1). For
each sentence without a matching anatomical SNOMED term, discourse analysis is
used: the algorithm searches for an anatomical term in the previous sentences. When no
term is found, the code of the anatomical term from the clinical report’s title is used.
Initially, specific rules were used for each type of exam, but because of the
complexity and difficulty of writing a great number of rules, the similarities and
differences among the medical reports and the sentence
structures they contain were studied computationally. Despite the diversity of the sentences, both in the number
of words and in word order or construction, the most important information to
be extracted is the anatomical region (where?), the diseases or alterations (what?),
and their qualifications (how?). Additionally, the position within the
anatomical region and general modifiers can be used, such as right, left, bilateral, surface,
superficial and others.</p>
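      <p>The fallback used for sentences that contain no anatomical term can be sketched as follows. This is a minimal Python illustration under assumptions that are not stated in this work: the term list and function names are hypothetical, and the sketch assumes that the most recent preceding sentence with an anatomical term is the one that is reused.</p>
      <preformat><![CDATA[
```python
# Hypothetical list of anatomical terms; a real system would look them up
# in the Topography branch of SNOMED instead.
ANATOMY = {"lung", "pleura", "trachea", "mediastinum", "thorax"}


def find_anatomy(words):
    """Return the first anatomical term in a list of words, or None."""
    for word in words:
        if word in ANATOMY:
            return word
    return None


def resolve_anatomy(sentences, title):
    """Assign an anatomical term to every sentence of a report.

    sentences: list of word lists, in report order.
    title: word list of the report title, used as the last resort.
    """
    title_term = find_anatomy(title)
    last_seen = None
    resolved = []
    for words in sentences:
        term = find_anatomy(words)
        if term is None:
            # Fall back to a previous sentence, then to the report title.
            term = last_seen if last_seen is not None else title_term
        else:
            last_seen = term
        resolved.append(term)
    return resolved


sentences = [["no", "alterations"], ["lung", "normal"], ["without", "emphysema"]]
print(resolve_anatomy(sentences, ["thorax", "radiograph"]))
```
]]></preformat>
      <p>A sentence stays unresolved (None) in this sketch only when neither the preceding sentences nor the title contain a known anatomical term.</p>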
      <p>The terms for anatomical regions are searched in the “Topography” branch of
SNOMED. If a word regarding position accompanies the anatomical region, it
can also be converted to one of the terms in the “General Linkage Modifiers” branch
of the SNOMED hierarchy, and the diseases or alterations (what?) can be converted to terms
in the categories “Morphology”, “Diseases/Diagnoses” or “Function”. In this part of the
process we had the support of the teachers and resident doctors in General
Surgery at HRAV (Hospital Regional Alto Vale).</p>
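      <p>The correspondence between the extracted items and the SNOMED branches in which they are searched can be written down as a small lookup table. The Python sketch below is illustrative only; the slot names (where, position, what) and the helper functions are assumptions introduced for this example.</p>
      <preformat><![CDATA[
```python
# Which SNOMED branches are searched for each extracted item.
# Slot names are hypothetical labels chosen for this sketch.
SLOT_BRANCHES = {
    "where": {"Topography"},
    "position": {"General Linkage Modifiers"},
    "what": {"Morphology", "Diseases/Diagnoses", "Function"},
}


def is_allowed(slot, branch):
    """Return True if terms for this slot may come from this branch."""
    return branch in SLOT_BRANCHES.get(slot, set())


def slot_for_branch(branch):
    """Return the slot that a branch feeds, or None if it feeds none."""
    for slot, branches in SLOT_BRANCHES.items():
        if branch in branches:
            return slot
    return None


print(slot_for_branch("Morphology"))
print(is_allowed("where", "Function"))
```
]]></preformat>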
      <p>Associated with these are the terms of qualification (how?), such as
normal, intact, anatomic, etc. They can also be combined with negations such as without,
absent, etc., which indicate the absence of problems, or with terms of intensity such as minimum,
light, acute, chronic, etc. In the following example of analysis, the first original
sentence of the chart contains three anatomical regions and one qualification, which
generates three sentences according to the where and how items, in a relation of N
anatomies to 1 qualifier; the word “bronchi” also applies to “lobar”.</p>
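      <p>The generation of one sentence per pair of anatomical region and qualifier can be sketched as a Cartesian product. The Python example below is a hedged illustration: the function name and the sample terms are hypothetical and are not taken from the sample reports.</p>
      <preformat><![CDATA[
```python
from itertools import product


def expand(anatomies, qualifiers):
    """Return one (anatomy, qualifier) pair for every combination."""
    return list(product(anatomies, qualifiers))


# N anatomies to 1 qualifier: three regions share a single qualifier.
print(expand(["trachea", "main bronchi", "lobar bronchi"], ["normal"]))

# 1 anatomy to N qualifiers: one region carries several qualifiers.
print(expand(["pleura"], ["smooth", "thin"]))
```
]]></preformat>
      <p>The same function covers the 1 to 1 case, in which a single pair is returned.</p>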
      <p>The second sentence shows a relation of 1 to 1, and the last sentence shows
a relation of 1 anatomy to N qualifiers. It also contains the derived adjective pleural,
which needs to be converted to the noun Pleura plus the adjective
surface, which can be found respectively in the “Topography” and “General Linkage
Modifiers” branches of SNOMED. For each anatomical SNOMED term and adjective
in a phrase, a new sentence must be created; this process is shown in
Table 2.</p>
      <p>The language used by physicians to describe findings in radiological examinations in
both collaborating hospitals is a simplified, restricted one. Nevertheless, much
research remains to be done before a reliable representation of such reports in a standard
such as DICOM SR can be obtained.</p>
      <p>The objective of this work is not to produce software capable of performing a
complete automatic translation of clinical reports written in natural language into a
standardized form, but to develop an approach that facilitates the process of structuring
documents, so that they are better suited for content-based retrieval.</p>
      <p>Although it is currently being tested using reports written in Portuguese and
German, the same method, slightly adapted, is expected to work with other
languages. The results reached so far are currently being evaluated
by the physicians, who will attest their validity and suggest better approaches.</p>
      <p>The authors thank the physicians of the Surgical Residency of the Upper Itajaí
Valley Regional Hospital in Brazil for the translation of radiological SNOMED
terms into Portuguese and for the anonymized sample reports provided for the analysis.
The Buddenbrock Blasinger und Benz Radiological Clinic in Mainz, Germany,
contributed anonymized sample reports. We also thank the German
Academic Exchange Service (DAAD) for scholarship number A/03/42304, granted
to one of the authors.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Möller</surname>
          </string-name>
          , Torsten B.:
          <article-title>Normal Findings in Radiology</article-title>
          . Georg Thieme Verlag,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. College of American Pathologists: SNOMED - Systematized Nomenclature of Medicine.
          <source>College of American Pathologists</source>
          ,
          <year>1994</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. NEMA:
          <source>Digital Imaging and Communications in Medicine (DICOM): Version 3.0</source>
          ,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norvig</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Artificial Intelligence - A Modern Approach</source>
          . Pearson Education Inc.,
          <year>1995</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <source>Foundations of Statistical Natural Language Processing</source>
          . 6th ed. MIT Press, Massachusetts,
          <year>2003</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Langlotz</surname>
            ,
            <given-names>Curtis P.</given-names>
          </string-name>
          ,
          <article-title>Automatic Structuring of Radiology Reports: Harbinger of a Second Information Revolution in Radiology</article-title>
          .
          <source>Radiology</source>
          ,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hripcsak</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          et al.:
          <article-title>Use of Natural Language Processing to Translate Clinical Information from a Database of 889,921 Chest Radiographic Reports</article-title>
          .
          <source>Radiology</source>
          ,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ricky</surname>
            ,
            <given-names>K. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stephen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>R. M. J.</given-names>
          </string-name>
          :
          <article-title>Automatic Structuring of Radiology Free-Text Reports</article-title>
          .
          <source>Radiology</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Shortliffe</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hubbard</surname>
            ,
            <given-names>S.M.:</given-names>
          </string-name>
          <article-title>Information systems in oncology</article-title>
          . In:
          <string-name>
            <surname>De Vita</surname>
            <given-names>VT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellman</surname>
            <given-names>S</given-names>
          </string-name>
          , Rosenberg S, eds.
          <source>Cancer: principles and practice of oncology</source>
          . Philadelphia, Pa: Lippincott,
          <year>1989</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Aberle</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          , et al.:
          <article-title>Integrated multimedia timeline of medical images and data for thoracic oncology patients</article-title>
          .
          <source>RadioGraphics</source>
          ,
          <year>1996</year>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lyman</surname>
            , M,
            <given-names>S. N.</given-names>
          </string-name>
          , et al.:
          <article-title>The application of natural-language processing to healthcare quality assessment</article-title>
          .
          <source>Med Decis Making</source>
          ,
          <year>1991</year>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>P. J. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warmington</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Data quality probes-exploiting and improving the quality of electronic patient record data and patient care</article-title>
          .
          <source>International Journal of Medical Informatics</source>
          ,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sager</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.:
          <article-title>Medical language processing: applications to patient data representation and automatic encoding</article-title>
          .
          <source>Methods Inf Med</source>
          ,
          <volume>34</volume>
          :
          <fpage>140</fpage>
          -
          <lpage>146</lpage>
          ,
          <year>1995</year>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>A graph-grammar approach to represent causal, temporal and other contexts in an oncological patient record</article-title>
          .
          <source>Methods Inf Med</source>
          ,
          <volume>35</volume>
          :
          <fpage>127</fpage>
          -
          <lpage>141</lpage>
          ,
          <year>1996</year>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Abidi</surname>
          </string-name>
          . R.,
          <string-name>
            <surname>Manickan</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Extracting Case Structures from XML-Based Electronic Patient Records: A Knowledge Engineering Solution to Augment Case Based Reasoning Systems</article-title>
          .
          <source>International Journal of Medical Informatics</source>
          ,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Jackson</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Natural Language Processing for Online Applications</source>
          . J. Benjamins Publishing Co., Philadelphia
          ,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Phillips</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buchanan</surname>
            ,
            <given-names>B.G.</given-names>
          </string-name>
          :
          <article-title>Ontology-guided knowledge discovery in databases</article-title>
          .
          <source>International Conf. Knowledge Capture Victoria, Canada</source>
          ,
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Bakken</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warren</surname>
            ,
            <given-names>J. J.:</given-names>
          </string-name>
          <article-title>An evaluation of the usefulness of two terminology models for integrating nursing diagnosis concepts into SNOMED Clinical Terms ®</article-title>
          ,
          <source>International Journal of Medical Informatics</source>
          ,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Medeiros</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>A Língua Portuguesa</source>
          . [Online] Available: http://www.linguaportuguesa.ufrn.br,
          <year>2004</year>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Clunie</surname>
          </string-name>
          , David A.:
          <article-title>DICOM Structured Reporting</article-title>
          .
          <source>PixelMed Publishing</source>
          ,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Atkinson</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : GNU Aspell. Available online URL: http://aspell.net/,
          <year>2004</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>