<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Dutch Treat for Healthcare Terminology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ronald Cornet</string-name>
          <email>r.cornet@amc.uva.nl</email>
          <email>ronald.cornet@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Academic Medical Center - University of Amsterdam, Department of Medical Informatics</institution>
          ,
          <addr-line>P.O. Box 22700, 1100 DE Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Linköping University, Department of Biomedical Engineering, University Hospital</institution>
          ,
          <addr-line>S-581 85 Linköping</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Structured and encoded information is important to maximize the meaningful (re)use of the Electronic Health Record (EHR). SNOMED CT is generally regarded as the preferred terminology system for encoding, but manual encoding (i.e., fully structured data entry) has been shown to have issues with data quality and usability. Therefore, automated SNOMED CT encoding of free-text clinical narratives needs to be explored, which involves both post-hoc processing of as yet unstructured records and ad-hoc processing of text as it is entered into a record. Processing requires thesauri and tools suited to the clinical language being used. This poses a problem, as tools are to a large extent language dependent, and thesauri may not be available for the language of interest. Therefore, we have created an inventory of components for processing Dutch natural language, enabling the encoding of Dutch text as structured SNOMED CT output. This inventory distinguishes language-independent and language-dependent components, according to the pipeline depicted in Figure 1.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>[Figure 1: pipeline from clinical free text to SNOMED CT encoded data.]</p>
      <p>Table 1 below summarizes the available tools for processing Dutch natural language.</p>
    </sec>
    <sec id="sec-2">
      <title>TermTreffer<sup>2</sup></title>
    </sec>
    <sec id="sec-3">
      <title>Alpino<sup>3</sup></title>
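      <p>[Table 1: tools for processing Dutch natural language (Lucene<sup>1</sup>, TermTreffer<sup>2</sup>, Alpino<sup>3</sup>) and the components considered: Dutch stemmer and analyzer; morphological analyzer; stopwords; named entity recognizer; negation finder; noun phrase finder; multiword recognizer. Cells are marked "+" or "(int)".]</p>
      <p>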
Second, we investigated the possibility of creating a concept mapper (to map Dutch terms to concepts in SNOMED CT) based on the UMLS. To this end, we assessed the extent to which concepts in the CORE subset, which consists of 5965 SNOMED CT concepts useful for documenting reasons for encounter (RoE), have a Dutch term in one of the source vocabularies of the UMLS. A total of 4236 concepts have a direct translation to Dutch, which is 71% of the concepts in the CORE subset. Furthermore, we attempted to map a set of 3930 free-text reasons for encounter, using Dutch translations of SNOMED CT in the UMLS. Table 2 shows the extent to which RoEs could be mapped in full, partially, or not at all, to a SNOMED CT concept.</p>
      <p><sup>1</sup> http://lucene.apache.org/</p>
      <p><sup>2</sup> http://www.inl.nl/tst-centrale/nl/over-de-tst-centrale/projecten/termtreffer</p>
      <p><sup>3</sup> http://www.let.rug.nl/~vannoord/alp/Alpino/</p>
      <table-wrap id="tab-2">
        <label>Table 2</label>
        <caption>
          <p>Extent to which free-text reasons for encounter could be mapped to SNOMED CT concepts.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Match type</th><th># of matches to SNOMED CT concepts</th><th>Percentage</th></tr>
          </thead>
          <tbody>
            <tr><td>Full match</td><td>79</td><td>2.0%</td></tr>
            <tr><td>Partial match</td><td>2927</td><td>74.5%</td></tr>
            <tr><td>No match</td><td>924</td><td>23.5%</td></tr>
            <tr><td>Total</td><td>3930</td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>One solution for such a concept mapper would be the complete translation of SNOMED CT, as has been undertaken for Danish, Spanish, and Swedish, but this involves significant effort and resources. However, the increasing interest in mapping coding systems used in the Netherlands to SNOMED CT provides an opportunity to collect Dutch entry terms for SNOMED CT concepts as a by-product of the mapping process.</p>
      <p>A variety of Dutch coding systems are currently being mapped or are planned to be mapped. These systems cover the following domains:</p>
      <list list-type="bullet">
        <list-item><p>Diagnoses, based on the diagnosis lists of two University Medical Centers</p></list-item>
        <list-item><p>Procedures, based on the list maintained by Stichting CBV (a national organization)</p></list-item>
        <list-item><p>Optometry, based on a SNOMED CT subset defined by optometrists</p></list-item>
        <list-item><p>Pathology, based on the PALGA thesaurus (a national thesaurus based on SNOMED II)</p></list-item>
      </list>
      <p>The mapping-based approach to creating Dutch entry terms for SNOMED CT has two advantages. First, the parts of SNOMED CT that have the most value for Dutch users are addressed first. Second, the terms provided are those with which users are already familiar.</p>
      <p>Once a significant part of the mappings has been completed, an evaluation is needed to ensure that using the Dutch terms to process clinical narratives yields sufficiently high precision and recall, i.e., that the SNOMED CT concepts to which the narratives are matched are maximally complete and correct.</p>
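      <p>The precision and recall evaluation mentioned above can be illustrated with a minimal sketch. It assumes that, for a given narrative, a manually curated set of SNOMED CT concepts (a gold standard) is compared with the set returned by an automated mapper. The function name and the concept identifiers are hypothetical placeholders and are not taken from this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of concept identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted.intersection(gold))
    # Precision: share of returned concepts that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of expected concepts that were returned.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical placeholder identifiers, for illustration only.
gold = {"C1", "C2", "C3", "C4"}
predicted = {"C1", "C2", "C5"}
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```
]]></preformat>
      <p>In this sketch, precision reflects correctness (how many returned concepts are right) and recall reflects completeness (how many expected concepts were found), which correspond to the two qualities named in the text.</p>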
    </sec>
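    <p>The classification of free-text reasons for encounter into full, partial, and no match can be sketched as follows. The source does not specify the matching rules, so this sketch makes its own assumptions: a full match means the normalized text equals a Dutch entry term, and a partial match means at least one entry term occurs as a whole-word phrase within the text. The lexicon, the concept identifiers, and the function names are hypothetical.</p>
    <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Hypothetical lexicon: Dutch entry term -> placeholder concept identifier.
LEXICON = {
    "hoofdpijn": "C-001",
    "koorts": "C-002",
    "pijn op de borst": "C-003",
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def classify(roe, lexicon):
    """Return (kind, concept ids), where kind is 'full', 'partial' or 'none'."""
    text = normalize(roe)
    if text in lexicon:
        return "full", [lexicon[text]]
    # Pad with spaces so that terms only match on whole-word boundaries.
    padded = f" {text} "
    hits = [cid for term, cid in lexicon.items() if f" {term} " in padded]
    return ("partial", hits) if hits else ("none", [])


def summarize(roes, lexicon):
    """Count match kinds and express each as a percentage of all inputs."""
    counts = Counter(classify(roe, lexicon)[0] for roe in roes)
    total = len(roes)
    return {
        kind: (counts[kind], 100.0 * counts[kind] / total if total else 0.0)
        for kind in ("full", "partial", "none")
    }


roes = ["Hoofdpijn", "koorts sinds gisteren", "Pijn op de borst en koorts", "moe"]
summary = summarize(roes, LEXICON)
print(summary)
```
]]></preformat>
    <p>A real mapper would also need to handle punctuation, spelling variation, and inflection, for which the language-dependent components listed in Table 1 are candidates.</p>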
  </body>
  <back>
    <ref-list />
  </back>
</article>