=Paper=
{{Paper
|id=Vol-1179/CLEF2013wn-CLEFER-Cornet2013
|storemode=property
|title=A Dutch Treat for Healthcare Terminology
|pdfUrl=https://ceur-ws.org/Vol-1179/CLEF2013wn-CLEFER-Cornet2013.pdf
|volume=Vol-1179
|dblpUrl=https://dblp.org/rec/conf/clef/Cornet13
}}
==A Dutch Treat for Healthcare Terminology==
Ronald Cornet 1,2

1 Academic Medical Center - University of Amsterdam, Department of Medical Informatics, P.O. Box 22700, 1100 DE Amsterdam, The Netherlands

2 Linköping University, Department of Biomedical Engineering, University Hospital, S-581 85 Linköping, Sweden

r.cornet@amc.uva.nl, ronald.cornet@liu.se

Structured and encoded information is important for maximizing the meaningful (re)use of the Electronic Health Record (EHR). SNOMED CT is generally regarded as the preferred terminology system for encoding, but it has been shown that manual encoding (i.e., fully structured data entry) has issues with data quality and usability. Therefore, automated SNOMED CT encoding of free-text clinical narratives needs to be explored, which involves both post-hoc processing of as yet unstructured records and ad-hoc processing of text as it is entered into a record.

Processing requires thesauri and tools that are suited to the clinical language being used. This poses a problem, as tools are to a large extent language dependent, and thesauri may not be available for the language of interest. Therefore, we have created an inventory of components for processing Dutch natural language, enabling the encoding of Dutch text as structured SNOMED CT output. This inventory distinguishes language-independent and language-dependent components, according to the pipeline depicted in Figure 1.

Figure 1. Components of an NLP pipeline (input: clinical free text; output: SNOMED CT encoded data). Language-independent components are in italic font.

Table 1 below summarizes the available tools for processing Dutch natural language.

Table 1. Overview of tools suitable for processing Dutch natural language. (int): the tool provides the functionality only internally, i.e., the results cannot be retrieved; +: the tool offers the functionality.
Columns, grouped in the original under "Language dependent" and "Language independent": Sentence Splitter, Tokenizer, Morphological Analyzer, POS Tagger, Parser, Noun phrase finder, Concept mapper. The entries for each tool are listed in column order, separated by " / "; the flattened source does not preserve which column each entry belongs to.

* Apache Lucene [1]: + / Dutch stemmer and analyzer
* TermTreffer [2]: + / Morphological Analyzer; Stopwords; Named entity recognizer; Negation finder / + / Multi-word recognizer
* Alpino [3]: (int) / + / (int) / (int) / +

[1] http://lucene.apache.org/

[2] http://www.inl.nl/tst-centrale/nl/over-de-tst-centrale/projecten/termtreffer

[3] http://www.let.rug.nl/~vannoord/alp/Alpino/

Second, we investigated the possibilities of creating a concept mapper (to map Dutch terms to concepts in SNOMED CT) based on the UMLS. To this end, we assessed the extent to which concepts in the CORE subset, which consists of 5965 SNOMED CT concepts useful for documenting reasons for encounter (RoE), have a Dutch term in one of the source vocabularies of the UMLS. A total of 4236 concepts have a direct translation to Dutch, which is 71% of the concepts in the CORE subset.

Furthermore, we attempted to map a set of 3930 free-text reasons for encounter, using Dutch translations of SNOMED CT in the UMLS. Table 2 shows the extent to which RoEs could be mapped in full, partially, or not at all, to a SNOMED CT concept.

Table 2. Mappings from RoEs to SNOMED CT concepts in numbers and percentages.

{| class="wikitable"
! Match type !! # of matches to SNOMED CT concepts !! Percentage
|-
| Full match || 79 || 2.0%
|-
| Partial match || 2927 || 74.5%
|-
| Non match || 924 || 23.5%
|-
| Total || 3930 || -
|}

One solution for such a concept mapper would be the complete translation of SNOMED CT, as has been undertaken for Danish, Spanish, and Swedish, but this involves significant effort and resources. However, the increasing interest in mapping coding systems used in the Netherlands to SNOMED CT provides an opportunity to collect Dutch entry terms for SNOMED CT concepts as a derivative of the mapping process. A variety of Dutch coding systems are currently either being mapped or planned to be mapped.
These systems cover the following domains:

* Diagnoses, based on the diagnosis lists of two University Medical Centers
* Procedures, based on the list maintained by Stichting CBV (a national organization)
* Optometry, based on a SNOMED CT subset defined by optometrists
* Pathology, based on the PALGA thesaurus (a national thesaurus based on SNOMED II)

The mapping-based approach for creating Dutch entry terms for SNOMED CT provides a number of advantages. First, the parts of SNOMED CT that have the most value for Dutch users are addressed first. Second, the terms provided are those with which the users are already familiar.

Once a significant part of the mappings has been completed, an evaluation is needed to ensure that using the Dutch terms for processing clinical narratives results in sufficiently high precision and recall, i.e., that the SNOMED CT concepts to which the narratives are matched are maximally complete and correct.
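
As an illustration of how such an evaluation might be set up, the following minimal Python sketch classifies a free-text entry as a full, partial, or non match against a small term lexicon, and computes precision and recall of the matched concepts against a reference set. Everything in it is an assumption made for illustration: the lexicon entries, the placeholder concept identifiers (C001 etc., which are not SNOMED CT codes), the names classify_match and precision_recall, and the token-based matching rule itself, which is not the procedure used in this paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical Dutch entry terms mapped to placeholder concept identifiers.
# These are NOT real SNOMED CT codes; they only illustrate the data shape.
DEMO_LEXICON: Dict[str, str] = {
    "hoofdpijn": "C001",         # headache
    "koorts": "C002",            # fever
    "pijn op de borst": "C003",  # chest pain
}


def classify_match(text: str, lexicon: Dict[str, str]) -> Tuple[str, Set[str]]:
    """Classify a free-text entry as a 'full', 'partial', or 'non' match.

    Assumed rule (not taken from the paper): a full match means the whole
    normalized text equals a lexicon term; a partial match means at least
    one lexicon term occurs as a contiguous token sequence in the text.
    """
    tokens = text.lower().split()
    normalized = " ".join(tokens)
    if normalized in lexicon:
        return "full", {lexicon[normalized]}

    found: Set[str] = set()
    for term, concept in lexicon.items():
        term_tokens = term.split()
        n = len(term_tokens)
        # Slide a window of n tokens over the text and compare.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term_tokens:
                found.add(concept)
                break
    return ("partial", found) if found else ("non", set())


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision: share of predicted concepts that are correct.
    Recall: share of reference (gold) concepts that were found."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

In this sketch, recall corresponds to the completeness of the matched concepts and precision to their correctness. A real evaluation would need a manually coded reference set and a more robust matcher (e.g., one that handles stemming and compound words).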