=Paper=
{{Paper
|id=Vol-1650/smbm16Weissenborn
|storemode=property
|title=A Light-weight & Robust System for Clinical Concept Disambiguation
|pdfUrl=https://ceur-ws.org/Vol-1650/smbm16Weissenborn.pdf
|volume=Vol-1650
|authors=Dirk Weissenborn,Roland Roller,Feiyu Xu,Hans Uszkoreit,Enrique Garcia Perez
|dblpUrl=https://dblp.org/rec/conf/smbm/WeissenbornRXUP16
}}
==A Light-weight & Robust System for Clinical Concept Disambiguation==
Dirk Weissenborn, Roland Roller, Feiyu Xu and Hans Uszkoreit
Language Technology Lab, DFKI
Alt-Moabit 91c, Berlin, Germany
{dirk.weissenborn, roland.roller, feiyu, uszkoreit}@dfki.de
Enrique Garcia Perez
SAP Innovation Center
Konrad-Zuse-Ring 10, Potsdam, Germany
enrique.garcia.perez@sap.com
Abstract

This paper presents a system for the normalization of concept mentions in clinical narratives. We evaluate and compare it against a popular, open-source solution that is frequently used for natural language processing of clinical text. The evaluation is based on a manually annotated dataset of 72 discharge summaries taken from the i2b2 corpus. Besides the demonstration and evaluation of our system we provide an in-depth corpus analysis that guided the development of the system. Our focus lies on the task of concept disambiguation, for which we combine two unsupervised approaches that are easy to implement and computationally inexpensive. We show that some ambiguities can only be resolved by adapting to annotation guidelines and preferences, which we solve via the introduction of heuristics. Finally, we present an online demo that gives insights into the individual parts of the normalization pipeline.

1 Introduction

Recognizing and disambiguating clinical concepts plays a central role in many information extraction tasks within the clinical domain. It requires the identification of concept mentions in clinical narratives and the disambiguation of their respective surface strings (normalization). In recent years, many tasks have focused on the normalization of clinical concepts, such as the i2b2 challenge (Uzuner et al., 2011), ShARe/CLEF (Pradhan et al., 2013) and SemEval (Elhadad et al., 2015). Traditionally, disambiguation systems rely on supervised (Martinez and Baldwin, 2011), semi-supervised (Preiss and Stevenson, 2013) or unsupervised (Agirre et al., 2010) methods. Each of those techniques has its advantages; however, as seen in different disambiguation tasks, simple methods (and their combination) can achieve very good results, such as the generation of rules and heuristics from the training data (Afzal et al., 2015), the usage of similarity measures (Pathak et al., 2015) or the inclusion of Information Content (Leal et al., 2015).

In this work we develop a light-weight solution to the problem of clinical concept normalization that is easy to implement, does not require expensive computations and is therefore particularly suited for industrial application. The approach is mainly unsupervised and does not require large amounts of training data. In particular, the disambiguation is based on a densest-subgraph algorithm, to ensure contextual compatibility among the normalized concepts, and on the string similarity between the surface string and the preferred labels of a respective concept. We achieve very good performance with this setup on a manually annotated dataset. A web application was developed for demonstration purposes and to debug the normalization pipeline[1].

2 Clinical Concept Normalization

The concept normalization task requires a well-defined target vocabulary. A useful resource is the Unified Medical Language System (UMLS), which defines biomedical concepts with various names, spellings and abbreviations. Concepts within UMLS are defined by so-called concept unique identifiers (CUI) that represent concepts across different biomedical vocabularies, such as NCI, NDF-RT or RxNorm. However, natural language is highly variable and surface strings can have different meanings depending on the context.

[1] http://clinical-ta.dfki.de
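As a toy illustration of this context dependence (the identifiers below are invented placeholders, not real UMLS CUIs), a dictionary can map one surface string to several candidate concepts:

```python
# Toy illustration of ambiguous surface strings. The identifiers are
# invented placeholders, NOT real UMLS CUIs.
CANDIDATES = {
    "cold": ["CUI_COMMON_COLD", "CUI_COLD_SENSATION"],
    "discharge": ["CUI_BODY_SUBSTANCE_DISCHARGE", "CUI_PATIENT_DISCHARGE"],
    "tylenol": ["CUI_TYLENOL"],
}

def lookup(surface_string):
    """Return all candidate concepts for a surface string (case-insensitive)."""
    return CANDIDATES.get(surface_string.lower(), [])

print(lookup("Cold"))     # two candidates -> ambiguous
print(lookup("Tylenol"))  # one candidate -> not ambiguous
```

Resolving which of the candidates is meant in a given context is exactly the disambiguation task addressed below.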
Concept-Type (Source)         #Annotations
Symptoms (NCI)                1434
Disease (NCI)                 1370
Medication (RxNorm)           1190
Diagnostic Procedure (NCI)     647
Therapeutic Procedure (NCI)    644
Anatomy (NCI)                  593
Laboratory Tests (NCI)         458

Table 1: Concept annotations by type in our dataset.

Class                         % of mentions
ambiguous                     18
ambiguous given type          12
not ambiguous                 49
no candidates                 28
correct candidate not found   33

Table 2: Ambiguity classes and their relative frequency in the dataset. ambiguous - mentions with more than one candidate including the correct one; ambiguous given type - subset of ambiguous that remains ambiguous after removing candidates of the wrong type; not ambiguous - only one, correct candidate.

The task of normalizing surface strings to unique concepts of a given vocabulary such as UMLS can be subdivided into three partial tasks: Mention Recognition, Candidate Search and Disambiguation. Given an input text, the mention recognition subtask identifies text spans that are potential mentions of a medical concept. Subsequently, the candidate search is responsible for finding candidate concepts for the surface strings of each mention. Finally, the disambiguation step selects the candidate that fits best into the mention's context, i.e., it resolves the ambiguity among its candidates. This work focuses on the disambiguation task.

2.1 Data

In our experiments we used a part of the i2b2[2] corpus (Uzuner et al., 2011) that was manually re-annotated[3]. It consists of 72 discharge summaries. Overall, the dataset contains 6336 annotations. Table 1 lists annotation types and their corresponding number of annotations. The corpus was split into 2 distinct subsets, each covering half of the documents. The first set was used for system development and the second half for testing.

We also analyzed the ambiguities within the corpus based on our candidate search (§3.2). Table 2 lists different ambiguity classes and their fraction in the dataset. It shows that ambiguity arises in only 18% of mentions. Candidate search fails, i.e., the correct candidate is not found, in about a third of all cases; for most of those cases no candidate is found at all. This shows that the currently employed dictionary lookup has to be refined. However, this work addresses the problem of disambiguation. Thus, only 18% of all cases are non-trivial and useful for evaluating disambiguation.

3 System Architecture

3.1 Mention Recognition

Because of the focus on disambiguation, our demo system employs a simple approach to mention recognition. Given a tokenized input document, all word n-grams up to a predefined n are extracted. This guarantees high recall. In the subsequent candidate search step we eliminate all extracted mentions for which no candidates are found.

3.2 Candidate Search

We find concept candidates for each recognized mention via a string lookup in a given dictionary. The dictionary maps surface strings to concepts. Those were extracted from a predefined subset of vocabularies in the UMLS, namely RxNorm for medications and NCI for anatomical concepts, diseases, therapeutic procedures, diagnostic procedures, laboratory tests and symptoms. The surface strings of the dictionary were expanded by including additional lexical variations.

[2] https://www.i2b2.org/
[3] Note, the re-annotation took place within an industrial use case and was not carried out by one of the authors. The data and the dictionaries we used were already given.
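The n-gram extraction of §3.1 combined with the dictionary lookup of §3.2 can be sketched as follows; the miniature dictionary and its concept identifiers are invented for illustration and stand in for the UMLS-derived dictionary described above:

```python
from typing import Dict, List, Tuple

def extract_mentions(tokens: List[str], max_n: int,
                     dictionary: Dict[str, List[str]]) -> List[Tuple[int, int, List[str]]]:
    """Extract all word n-grams up to length max_n and keep only those
    for which the dictionary lookup yields at least one candidate concept.
    Returns (start, end, candidates) triples over token indices."""
    mentions = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_n, len(tokens)) + 1):
            surface = " ".join(tokens[start:end]).lower()
            candidates = dictionary.get(surface, [])
            if candidates:  # eliminate mentions without candidates
                mentions.append((start, end, candidates))
    return mentions

# Invented miniature dictionary, standing in for the UMLS-derived one.
DICT = {
    "left foot": ["C_STRUCTURE_OF_LEFT_FOOT", "C_ENTIRE_LEFT_FOOT"],
    "foot": ["C_FOOT"],
}
tokens = "pain in the left foot".split()
print(extract_mentions(tokens, max_n=3, dictionary=DICT))
# [(3, 5, ['C_STRUCTURE_OF_LEFT_FOOT', 'C_ENTIRE_LEFT_FOOT']), (4, 5, ['C_FOOT'])]
```

Note that overlapping mentions (here "left foot" and "foot") are both retained at this stage; high recall is the goal, and later steps handle the ambiguity.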
3.3 Disambiguation

The most crucial part of the concept normalization pipeline is the concept disambiguation. Given a set of candidates for each recognized mention, it selects the concept which fits best to the mention of interest. The disambiguation is guided by two algorithms that are explained in the following.

String-Edit-Distance  Each concept in UMLS may include a set of synonyms containing a range of variations and spellings. Not all of those string variations are likely to represent a concept in free text. However, a small subset of strings is indicated as preferred labels for a concept. In a corpus analysis, we found that many ambiguities can be resolved by selecting the candidate concept whose preferred labels contain a close match with the mention string. We further found that preferred labels of distinct UMLS concepts are usually mutually exclusive. Thus, we employ a string-edit-distance (ED) algorithm, namely the Levenshtein distance, between the preferred labels $L_c$ of all candidates $c^m_i$ and the mention string $x^m$. We use the minimum of those distances to define the ED-score of a candidate concept:

$$s_{ed}(c^m_i) = \max_{l \in L_{c^m_i}} \frac{1}{\mathrm{distance}(x^m, l) + 1}$$

Densest-Subgraph  We employ a densest-subgraph algorithm similar to Moro et al. (2014) or Weissenborn et al. (2015) to account for the context of a mention. First we construct a graph that consists of all candidates $c^m_i$ for all mentions $m$ of a document. These are the vertices of the graph. We connect candidate concepts from different mentions with each other whenever they co-occurred together at least once in MEDLINE, a repository of abstracts from biomedical publications. This information is annually summarized by the National Institutes of Health (NIH)[4]. Given the concept graph $G = (V, E)$ of a document, we iteratively select a mention with the most remaining candidates and remove its least connected candidate until each mention has at most a predefined number of candidates left[5]. Given the pruned graph $G^* = (V^*, E^*)$ we score each remaining candidate by the product of its number of connections to other mention candidates and the number of other mentions that have at least one connected candidate concept:

$$s^u_{ds}(c^m_i) = \big|\{c^{m'}_j \mid (c^m_i, c^{m'}_j) \in E^*\}\big| \cdot \big|\{m' \mid \exists j : (c^m_i, c^{m'}_j) \in E^*\}\big|$$

$$s_{ds}(c^m_i) = \frac{s^u_{ds}(c^m_i)}{\sum_j s^u_{ds}(c^m_j)}$$

We tried different combinations of both scores and found the disambiguation via $s_{ds}$ with a fallback to $s_{ed}$ to work best, i.e., we always select the candidate with the highest $s_{ds}$ for each mention and apply $s_{ed}$ in case there is more than one candidate with the same score.

3.4 Rule-based disambiguation

A problem of unsupervised disambiguation is the inability to learn corpus-specific patterns which depend on annotation guidelines and the personal perspective of the annotators themselves. Based on our observations, the following set of simple rules is defined and used to support both disambiguation techniques:

Active Substance: If the given mention is a tradename (e.g., Tylenol), in most cases its active substance (e.g., Acetaminophen) is annotated. Therefore we map all concepts that refer to a tradename to their active substance. This information is taken from the UMLS Metathesaurus relation has-tradename.

Structure of: If a mention 'M' (e.g., 'left foot') includes two candidate concepts, one containing the preferred label 'structure of M' and the other 'entire M', the second concept is removed from the list of candidates.

Abbreviation validation: Abbreviations tend to be highly ambiguous (Kim et al., 2011) and are difficult to disambiguate. However, in many cases those candidates are selected whose preferred labels fit the mentioned abbreviation. To address this issue, abbreviations are first identified using the UMLS Lexical Tools. Next, candidates whose preferred labels are not valid long forms of a mentioned abbreviation are removed during pre-processing. Valid long forms of abbreviations have to fulfill the following criterion: the first letter of the abbreviation must match the first letter of the text, and the remainder of the abbreviation, i.e., the abbreviation without its first letter, must be an abbreviation for either the remaining text or the remaining words, excluding the first.

[4] https://mbr.nlm.nih.gov/MRCOC.shtml
[5] We use 5 in our system, which performs slightly better than or equal to other configurations.
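The two scores of §3.3 can be sketched as follows. This is a minimal re-implementation under assumptions, not the paper's own code: `cooccurs` is a stand-in predicate for the MEDLINE co-occurrence data, and all function and variable names are our own.

```python
from typing import Callable, Dict, List

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def s_ed(mention: str, preferred_labels: List[str]) -> float:
    """ED-score: 1 / (minimal edit distance to a preferred label + 1)."""
    return max(1.0 / (levenshtein(mention, label) + 1) for label in preferred_labels)

def s_ds(cands: Dict[str, List[str]],
         cooccurs: Callable[[str, str], bool], k: int = 5) -> Dict[str, Dict[str, float]]:
    """Densest-subgraph scores. cands maps mention id -> candidate concepts;
    cooccurs(a, b) is a stand-in for co-occurrence of two concepts in MEDLINE."""
    cands = {m: list(cs) for m, cs in cands.items()}

    def degree(c, m):  # connections of c to candidates of other mentions
        return sum(cooccurs(c, c2) for m2, cs in cands.items() if m2 != m for c2 in cs)

    # Pruning: repeatedly take the mention with the most remaining candidates
    # and drop its least connected candidate, until all have at most k left.
    while True:
        m = max(cands, key=lambda x: len(cands[x]))
        if len(cands[m]) <= k:
            break
        cands[m].remove(min(cands[m], key=lambda c: degree(c, m)))

    # Scoring: (#connected candidates of other mentions) * (#connected
    # mentions), normalized per mention.
    scores = {}
    for m, cs in cands.items():
        raw = {}
        for c in cs:
            linked_mentions = sum(any(cooccurs(c, c2) for c2 in cs2)
                                  for m2, cs2 in cands.items() if m2 != m)
            raw[c] = degree(c, m) * linked_mentions
        total = sum(raw.values())
        scores[m] = {c: raw[c] / total if total else 0.0 for c in cs}
    return scores

# Tiny invented example: concepts "A" and "C" co-occur, "B" is unconnected.
pairs = {frozenset(("A", "C"))}
print(s_ds({"m1": ["A", "B"], "m2": ["C"]}, lambda a, b: frozenset((a, b)) in pairs))
# {'m1': {'A': 1.0, 'B': 0.0}, 'm2': {'C': 1.0}}
```

Disambiguation then picks, per mention, the candidate with the highest `s_ds`, falling back to `s_ed` on ties, as described above.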
4 Online Demo

The web interface of the online demo[6] is based on the BRAT NLP tool[7] to visualize the implemented candidate search and disambiguation. Figure 1 presents the output of our demo after processing a clinical narrative. The upper part 'Candidate Search' displays the text including mentions with their respective concept candidates. Different colors indicate different types of concepts. In the given example, red refers to anatomy, green to symptom, pink to disease and turquoise to laboratory test. Moving the mouse cursor over a candidate mention, the GUI shows the vocabulary origin and its concept unique identifier.

Figure 1: Annotations comprising candidate and disambiguated view.

System   Pre-processing   P      R      F1
ED       Gold-standard    0.850  0.592  0.698
DS       Gold-standard    0.850  0.592  0.698
DS+ED    Gold-standard    0.857  0.597  0.703
ED       cTAKES           0.777  0.522  0.624
DS       cTAKES           0.766  0.514  0.615
DS+ED    cTAKES           0.780  0.524  0.627
cTAKES   cTAKES           0.743  0.499  0.597

Table 3: Normalization results in Precision (P), Recall (R) and F1-score (F1) for all mentions in the testset.

System   Pre-processing   #Mentions   P
ED       Gold-standard    502         0.751
DS       Gold-standard    502         0.751
DS+ED    Gold-standard    502         0.781
ED       cTAKES           270         0.730
DS       cTAKES           270         0.659
DS+ED    cTAKES           270         0.767
cTAKES   cTAKES           270         0.481

Table 4: Precision (P) for all non-trivial mentions in the testset, i.e., mentions with at least 2 candidates containing the correct one.

5 Experiments

5.1 Setup

We evaluated our system on the test part of the dataset with different configurations. More specifically, we compare the performance of the individual disambiguation algorithms, namely string-edit-distance (ED) and densest-subgraph (DS), and their combination, as well as a widely used reference system called cTAKES[8] (Savova et al., 2010) in combination with the disambiguation component YTEX (Garla et al., 2011). We make use of a gold-standard mention recognizer that extracts only annotated mentions in the experiments. When comparing to cTAKES, we make use of its internal mention extraction and candidate search in combination with our disambiguation to guarantee a fair comparison. Additionally, our post-processing heuristics were applied to the output of both our system and cTAKES.

5.2 Results

Table 3 shows the results on the entire testset. We achieve a high precision of over 85%, which we attribute to the performance of disambiguation. Our system also performs better than cTAKES[9] with the same pre-processing (mention recognition and candidate search). The main problem in general lies in the low recall, which is mainly due to failing candidate search. This is also a major concern for future work.

As mentioned in §2.1, only a fraction of mentions can be considered non-trivial with respect to the disambiguation. Table 4 shows the performance of our system and cTAKES for all non-trivial mentions. The observations are similar to the previous results. We can see that the precision of our system is quite robust and much better than the performance of cTAKES.

6 Conclusion

We presented a light-weight disambiguation system for the normalization of clinical concept mentions. The system is mainly unsupervised and utilizes string similarity metrics as well as information from concept co-occurrences. We demonstrated its robustness with respect to disambiguation and compared it to cTAKES, a popular open-source system for clinical NLP. In addition, we give examples where our unsupervised approach fails because of annotation guidelines and preferences. This problem is solved by the introduction of simple heuristics. Finally, our system can be accessed via a web application.

[6] http://clinical-ta.dfki.de
[7] http://brat.nlplab.org/
[8] https://ctakes.apache.org/
[9] Standard configuration for YTEX disambiguation.
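The precision, recall and F1 values reported in Tables 3 and 4 can be computed from sets of predicted and gold (mention span, concept) pairs; a minimal sketch with invented example pairs:

```python
def precision_recall_f1(predicted: set, gold: set):
    """Micro precision/recall/F1 over (mention span, concept) pairs."""
    tp = len(predicted & gold)  # exact span + concept matches
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Invented example: one correct pair, one wrong concept, one missed mention.
pred = {((0, 2), "C1"), ((5, 6), "C2")}
gold = {((0, 2), "C1"), ((5, 6), "C3"), ((8, 9), "C4")}
print(precision_recall_f1(pred, gold))  # precision=0.5, recall≈0.33, F1≈0.4
```

Under this scheme a mention counts as correct only if both its span and its disambiguated concept match the gold annotation, which is why failing candidate search depresses recall.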
Acknowledgements

This research was partially supported by SAP, the German Federal Ministry of Economics and Energy (BMWi) through the project MACSS (01MD16011F), and by the German Federal Ministry of Education and Research (BMBF) through the project BBDC (01IS14013E).

References

Zubair Afzal, Saber A. Akhondi, Herman van Haagen, Erik M. van Mulligen, and Jan A. Kors. 2015. Biomedical Concept Recognition in French Text Using Automatic Translation of English Terms. In Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum, Toulouse, France, September 8-11, 2015.

Eneko Agirre, Aitor Soroa, and Mark Stevenson. 2010. Graph-based Word Sense Disambiguation of biomedical documents. Bioinformatics, 26(22):2889–2896.

Noémie Elhadad, Sameer Pradhan, Sharon Gorman, Suresh Manandhar, Wendy Chapman, and Guergana Savova. 2015. SemEval-2015 Task 14: Analysis of Clinical Text. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 303–310, Denver, Colorado, June. Association for Computational Linguistics.

Vijay Garla, Vincent Lo Re, Zachariah Dorey-Stein, Farah Kidwai, Matthew Scotch, Julie Womack, Amy Justice, and Cynthia Brandt. 2011. The Yale cTAKES extensions for document classification: architecture and application. Journal of the American Medical Informatics Association, 18(5):614–620.

Youngjun Kim, John Hurdle, and Stéphane M Meystre. 2011. Using UMLS lexical resources to disambiguate abbreviations in clinical text. AMIA Symposium, 2011:715–722.

André Leal, Bruno Martins, and Francisco Couto. 2015. ULisboa: Recognition and Normalization of Medical Concepts. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 406–411. Association for Computational Linguistics.

David Martinez and Timothy Baldwin. 2011. Word sense disambiguation for event trigger word detection in biomedicine. BMC Bioinformatics, 12(2):1–8.

Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity linking meets word sense disambiguation: a unified approach. Transactions of the Association for Computational Linguistics, 2:231–244.

Parth Pathak, Pinal Patel, Vishal Panchal, Sagar Soni, Kinjal Dani, Amrish Patel, and Narayan Choudhary. 2015. ezDI: A Supervised NLP System for Clinical Narrative Analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 412–416. Association for Computational Linguistics.

Sameer Pradhan, Noémie Elhadad, Brett R. South, David Martínez, Lee M. Christensen, Amy Vogel, Hanna Suominen, Wendy W. Chapman, and Guergana K. Savova. 2013. Task 1: ShARe/CLEF eHealth Evaluation Lab 2013. In Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013.

Judita Preiss and Mark Stevenson. 2013. DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples. In Proceedings of the 2013 NAACL HLT Demonstration Session, pages 1–4, Atlanta, Georgia, June. Association for Computational Linguistics.

Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, and Christopher G Chute. 2010. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association, 17(5):507–513.

Özlem Uzuner, Brett R South, Shuying Shen, and Scott L DuVall. 2011. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association, 18(5):552–556.

Dirk Weissenborn, Leonhard Hennig, Feiyu Xu, and Hans Uszkoreit. 2015. Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities. In Proceedings of ACL-IJCNLP, Beijing, China, pages 596–605.