=Paper=
{{Paper
|id=Vol-1276/MedIR-SIGIR2014-03
|storemode=property
|title=Retrieving Attitudes: Sentiment Analysis from Clinical Narratives
|pdfUrl=https://ceur-ws.org/Vol-1276/MedIR-SIGIR2014-03.pdf
|volume=Vol-1276
|dblpUrl=https://dblp.org/rec/conf/sigir/DengSD14
}}
==Retrieving Attitudes: Sentiment Analysis from Clinical Narratives==
Yihan Deng, ICCAS, University of Leipzig, Semmelweissstr. 14, Leipzig, Germany
Matthaeus Stoehr, ENT Clinic, University Hospital Leipzig, Liebigstr. 10, Leipzig, Germany
Kerstin Denecke, ICCAS, University of Leipzig, Semmelweissstr. 14, Leipzig, Germany
{name.surname}@iccas.de
MedIR July 11, 2014, Gold Coast, Australia. Copyright is held by the author/owner(s).
ABSTRACT
Physicians and nurses express their judgments and observations about a patient’s health status in clinical narratives. Thus, their judgments are explicitly or implicitly included in patient records. To get an impression of the current health situation of a patient or of changes in the status, analysis and retrieval of this subjective content is crucial. In this paper, we approach this question as a sentiment analysis problem and analyze the feasibility of assessing these judgments in clinical text by means of general sentiment analysis methods. Specifically, the word usage in clinical narratives and in a general text corpus is compared. The linguistic characteristics of judgments in clinical narratives are collected. In addition, the requirements for sentiment analysis and retrieval from clinical narratives are derived.
Categories and Subject Descriptors
H.3 [INFORMATION STORAGE AND RETRIEVAL]: Content Analysis and Indexing
Keywords
Clinical text mining, Sentiment analysis
1. INTRODUCTION
Sentiment analysis deals with determining the sentiment with respect to a specific topic expressed in natural language text. So far, the development of sentiment analysis methods has concentrated on processing very opinionated, subjective texts such as customer reviews [3, 4]. Clearly, sentiment in clinical documents differs from sentiment in user-generated content or other text types. With the term sentiment we refer to information on the health status, on the outcome of a medical treatment, on the change or seriousness of a symptom (e.g. serious pain), or on the certainty of an observation. The work presented in this paper aims at a more complete view on the facets of sentiment in clinical texts. With the development of the principles of evidence-based medicine [6] and digital patient modeling [1], the observations and judgments expressed in clinical narratives will play a crucial role in the clinical decision process.
Consider the following scenario: During the daily ward round, a physician makes observations with respect to the health status of a patient (e.g. symptoms improved). The patient describes their personal experience of the symptoms, such as the degree of pain. All this information reflects the individual health status and is documented in clinical notes. Retrieving, analyzing and aggregating this information over time can support treatment decisions and allows a physician to quickly get an overview of the health status. Another application example is retrieving attitudes from clinical documents, which can support assessing the outcome of treatments. In this way, labor-intensive user studies for treatment or medication evaluation can be facilitated.
In recent years, effective algorithms for processing clinical narratives, in particular for named entity recognition and relation extraction [2], have been developed. Based on recognized entities and the relations between them, sentiments expressed in medical narratives can now be analyzed to offer a higher-level text understanding. Further, a corresponding retrieval of judgments or sentiments can be realized. However, sentiments, opinions and intentions expressed in clinical narratives have not been well exploited yet. In this paper, we start analyzing the sentiment expressions used in clinical texts through a linguistic comparison with a non-medical, subjective text corpus.
Conventional methods for sentiment analysis have been developed for processing subjective on-line documents such as weblogs and forums. In this paper, our goal is to analyze the applicability of such methods to sentiment analysis in clinical narratives. We will identify necessary extensions of existing methods and derive the requirements for sentiment analysis in clinical narratives. To this end, we will first compare two types of medical narratives (radiology reports and nurse letters) with a weblog data set. The lexical and linguistic differences will be presented. Afterwards, we will apply a general subjectivity lexicon to medical narratives using dictionary-based methods. Sources of errors of this simple sentiment recognition approach will be discussed. The following research questions will be addressed:
1. In comparison with user-generated content, which lexical characteristics do clinical narratives have?
2. What characterizes sentiments in clinical narratives?
3. Can existing methods for sentiment analysis be applied? Which adaptations are necessary?
2. SENTIMENT ANALYSIS IN THE MEDICAL DOMAIN
To the best of our knowledge, few works have considered sentiment analysis in medical texts. Xia et al. [9] indicated that sentiments are topic-related. Their approach to sentiment analysis starts with a standard topic classifier based on topic labels. In the second step, special classifiers are initialized to detect the polarity for each topic. The multi-step classification method achieved an improvement of nearly 10% in F1 measure in comparison with the single-step approach. Niu et al. consider sentiment analysis in biomedical literature [5]. They exploit a supervised method to classify the polarity at sentence level. Linguistic features such as uni-grams, bi-grams and negations are employed. The medical terms are merely replaced by their semantic category. The category information and context information are derived from the Unified Medical Language System (UMLS¹). The combination of linguistic features and domain-specific knowledge has improved the accuracy of the algorithm.
In summary, existing methods for sentiment analysis in the medical domain focus on processing biomedical literature and patient-generated text. Clinical text, which is used to record the activities and judgments of health care workers, has not yet been analyzed. Moreover, the existing approaches and definitions of sentiment in the medical domain are derived from general sentiment analysis for Web 2.0 media. Clinical context and medical knowledge have not been used thoroughly, apart from some category meta data derived from the UMLS [7, 5]. We expect that, due to different expressions and the more objective way of writing in clinical narratives, conventional sentiment analysis methods need to be adapted to cope with the clinical context. We will concentrate on that particular text material.
3. METHODOLOGY
3.1 Text Material
In order to analyze the differences between the language in clinical narratives and general texts from the Internet, 200 nurse letters and 200 radiology reports from the “MIMIC II Database²” have been chosen as corpus. These documents form the domain-specific data source in our assessment. For comparison, we additionally consider 200 technical interviews downloaded from the website Slashdot³. We have chosen that particular data set since it belongs to the category of user-generated, subjective content. Given the technical topics, however, we expect a certain similarity to clinical narratives, mainly in terms of objectivity.
Nurse Letter: A nurse letter is part of a patient record and is written by nurses on duty. Its content reflects the situation of the patient and the feedback to the ongoing treatment. It is written in a relatively subjective manner. Acronyms and typos appear very often in nurse letters.
Radiology Report: A radiological report is mainly used to inform the treating physicians about the findings of a radiological examination. It usually starts with a medical history, which is followed by a description of the region of interest and the questions for the examination. The texts contain many judgments and observations made during the examination.
Slashdot Interviews: Slashdot is a technology-related weblog which covers different technical topics. The users express their opinions on certain topics. We chose the technical interviews as benchmark instead of movie or product reviews, since technical interviews also contain a relatively large amount of terminology.
3.2 Linguistic and Sentiment Analysis of the Data Sets
Evidently, the three text sources differ in terms of terminology usage and content. The interview corpus is typical user-generated content. We expect that this corpus will contain a relatively large amount of sentiment terms and subjective expressions, while the clinical narratives are written in a more objective way. Fewer opinionated terms and more clinical terminology are expected. However, the question is whether the terminology and word usage are really distributed as expected. To what extent do the corpora differ with respect to linguistic characteristics? Recalling our initial research questions, we need to answer whether existing sentiment lexicons can provide the basis for analyzing judgments and sentiments in clinical narratives. In order to address these questions, an extraction pipeline has been built to obtain parts of speech and sentiment terms from the texts and to determine their occurrence frequency. The Penn Tree POS-tagger⁴ and the SL sentiment lexicon [8] (which contains 8,221 single-term subjective expressions) have been exploited for this purpose. Punctuation, numbers and stop words were also extracted and their proportions were calculated.
After analyzing the linguistic composition of the data sets, we want to study the applicability of a dictionary-based sentiment analysis approach to clinical narratives. Potential limitations of the approach when applied to medical narratives will be identified. For this purpose, we have created an experiment pipeline in KNIME⁵. Two dictionary taggers were applied to recognize positive and negative terms in the text, respectively. A voting algorithm is applied to calculate the polarity for each document. It is based on the number of positive and negative occurrences and handles negations. Although it is only a simple approach, it is a direct method to evaluate the compatibility between the subjectivity lexicon and clinical narratives. The SL sentiment lexicon from Wilson et al. [8] is used by the dictionary taggers. It comprises a large number of adjectives and adverbs, but also nouns and verbs expressing sentiments. For evaluation purposes, each document in the three corpora was annotated with an overall polarity by one physician from our university hospital.
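
The dictionary-based voting approach described in Section 3.2 can be illustrated with a minimal sketch. It is not the KNIME pipeline used in this work: the word lists below are small hypothetical stand-ins for the SL lexicon [8], the rule that a negation word directly before a hit flips its polarity is an assumed simplification of the negation handling, and treating ties as neutral is likewise an assumption.

```python
import re

# Toy lexicon entries chosen for illustration only; the paper uses the
# SL subjectivity lexicon [8], which is not reproduced here.
POSITIVE = {"well", "successfully", "satisfactory", "improved", "stable"}
NEGATIVE = {"pain", "worse", "failed", "severe", "infection"}
NEGATIONS = {"no", "not", "without", "denies"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_polarity(text):
    """Vote over lexicon hits; a negation directly before a hit flips it."""
    tokens = tokenize(text)
    positive = negative = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            hit = 1
        elif token in NEGATIVE:
            hit = -1
        else:
            continue
        # Assumed negation rule: only the immediately preceding token counts.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            hit = -hit
        if hit > 0:
            positive += 1
        else:
            negative += 1
    if positive > negative:
        return "good"
    if negative > positive:
        return "bad"
    return "neutral"  # assumed tie-breaking: equal votes count as neutral


print(document_polarity("Patient slept well, tube placed successfully."))  # good
print(document_polarity("Pt reports severe pain, wound looks worse."))     # bad
print(document_polarity("No pain overnight."))                             # good
```

A document without any lexicon hit is labelled neutral under this scheme, so the neutral class depends entirely on what the positive and negative taggers fail to find.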
¹ http://www.nlm.nih.gov/research/umls/, accessed 20.04.2014
² http://www.physionet.org/, accessed 20.04.2014
³ http://slashdot.org, accessed 20.04.2014
⁴ http://www.cis.upenn.edu/~treebank/, accessed 20.04.2014
⁵ http://www.knime.org/, accessed 20.04.2014
4. RESULTS AND DISCUSSION
4.1 Results of the Linguistic Analysis
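
The category proportions illustrated in Figure 1 could be computed along the following lines. This is only a sketch under stated assumptions: the stop word and sentiment lists are small hypothetical examples rather than the resources used in this work, the regular-expression tokenizer is an assumption, and part-of-speech tagging is left out, so nouns, pronouns, adjectives and adverbs all fall into the "other" category here.

```python
import re
from collections import Counter

# Small illustrative word lists; the stop word list used in the paper and
# the SL lexicon [8] are not reproduced here.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to", "with"}
SENTIMENT_TERMS = {"well", "pain", "good", "severe"}


def categorize(token):
    """Assign a token to exactly one coarse category."""
    lowered = token.lower()
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return "number"
    if re.fullmatch(r"[^\w\s]", token):
        return "punctuation"
    if lowered in SENTIMENT_TERMS:
        return "sentiment"
    if lowered in STOP_WORDS:
        return "stop_word"
    return "other"


def category_proportions(text):
    """Return the share of tokens that falls into each category."""
    # Numbers (with optional decimals), words, and single punctuation marks.
    tokens = re.findall(r"\d+(?:\.\d+)?|\w+|[^\w\s]", text)
    counts = Counter(categorize(t) for t in tokens)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


print(category_proportions("Pt given 2.5 mg of morphine, pain is severe."))
```

Checking sentiment terms before stop words means a token found in both lists is counted as a sentiment term; the order of the checks is a design choice of this sketch.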
Figure 1: Result of the Linguistic Analysis
Figure 1 illustrates the proportions of punctuation, numbers, stop words, nouns, pronouns, adjectives, adverbs and sentiment terms. Parts of speech of terms that matched the sentiment lexicon were not considered. The results partially confirmed our expectations.
Sentiment Terms: According to the results, the interview corpus contains the highest proportion of sentiment terms with 8%, while the radiology reports contain 4% and the nurse letters 6% sentiment terms. These results confirm our expectation that nurse letters are written more subjectively than radiology reports, but are still more objective than the interviews. The sentiments expressed in nurse letters are normally implicit and appear in the description of the patient’s health status or in the social records about the patients’ visitors. Opinionated terms and expressions such as suspicion, negation, approval or recommendations can be found in radiology reports, mainly in the conclusion section or impression part at the end of the report.
Number: Numbers are one of the most important elements in clinical reports, where they are mainly used to represent the dose of medications, the size of a tumor, the frequency of a treatment, etc. In our clinical data sets, numbers comprise between 2% and 3% of the words or characters. In contrast, almost no numbers occur in the interviews, since the discussions in weblogs are more likely to use simple, colloquial vocabulary to present personal attitudes and preferences.
Stop Word: The nurse letters and radiology reports contain 13% and 17% stop words, respectively. In contrast, the percentage of stop words in the interview corpus is clearly higher at 32%, which shows that the clinical documents are written in a concise way, focusing on facts.
Nouns and Pronouns: It is noteworthy that the percentage of nouns in radiology reports (31%) and nurse letters (33%) is clearly higher than the percentage of nouns in interviews (21%), while the percentage of pronouns in the interviews (4%) is notably higher than in the radiology reports (0%) and nurse letters (1%). The reason is that medical facts are described in clinical narratives using nouns from medical terminologies (e.g. names of diseases, symptoms, medications). In contrast, the interviews contain more subjective terms and use a large number of first-person expressions to express the ideas and opinions of individuals.
Adjective and Adverb: Another interesting finding is that the clinical narratives contain a substantial amount of adjectives (6-8% of the terms) that are not included in the SL sentiment lexicon. In contrast, all adjectives in the interview corpus matched the sentiment lexicon. The additional adjectives in clinical narratives are mainly related to body locations, such as “left” side, “right” side, “vertical”, “dorsal”, “cervical”. They express neither emotion nor attitude but anatomical concepts and relative locations in the body.
In summary, the nurse letters show a higher linguistic similarity to the technical interviews than the radiology reports do. They are to a certain extent more subjectively written than radiology reports. The large amount of medical terms (nouns, adjectives) describes the status of a patient. These terms reflect the attitudes of physicians. Thus, implicit clinical events may influence the polarity outcome of a clinical report as well. Consequently, implicit clinical events and evidence are expected to be relevant for understanding and interpreting the status of the patient.
4.2 Results of the Sentiment Analysis
Types             Accuracy (overall)  F1 (Bad)  F1 (Neutral)  F1 (Good)
Interviews        0.696               0.754     0.367         0.735
Nurse Letter      0.420               0.437     0.216         0.503
Radiology Report  0.446               0.297     0.080         0.559
Table 1: Sentiment Analysis Results: Accuracy and F1 measure for three text types
The automatically retrieved polarities of the texts were compared with the manual expert annotation. The overall accuracy and F1 measure for the three text types are shown in Table 1. Accuracy is the proportion of true results in the population. The sentiment analysis of interviews leads to an acceptable accuracy of 69.6%. The results for nurse letters and radiology reports reach accuracies of only 42.0% and 44.6%, respectively. This shows that existing methods need to be adapted when processing these texts and that sentiment in them is expressed differently. Furthermore, for the clinical texts the F1 measure for positive texts (F1 good) is clearly higher than for negative texts (F1 bad). A manual assessment showed that positive sentiments or outcomes are described in an explicit way, e.g. by phrases such as “patient slept well”, “the treatment has a satisfactory result” or “the tube has been placed successfully”. For negative clinical events, nurses and physicians were more
likely to express the status of the patient in a careful and cautious manner, e.g. by phrases such as “some situation cannot be excluded or need further pathological investigation”. The radiology reports are more likely to exclude or confirm the occurrence of certain clinical events than to give a final diagnosis. In addition, the recognition of neutral situations is difficult, since the judgment of a neutral outcome depends on the recognition of positive and negative terms. However, neutral clinical outcomes in the real world are probably not expressed objectively. A surgical result may show only a moderate effect, yet it may be described as an insignificant outcome in nurse letters or might even prompt some negative feedback. During the annotation, our physician tended to assign positive and negative judgments to the reports rather than neutral ones, since the determination of “neutral” needs more context and reference, which is not easy to obtain without knowing the complete patient history.
5. CONCLUSION AND FUTURE WORK
In this paper, we have studied the linguistic characteristics of clinical narratives compared to a web data set and analyzed the feasibility of a simple sentiment analysis approach on clinical narratives. The results provide important insights for understanding sentiment in clinical narratives and for developing corresponding analysis methods. The three initial research questions raised in Section 1 can be answered.
1. The linguistic analysis showed that clinical narratives contain a moderate amount of sentiment terms. In contrast to the web data set, more numbers, medical terms (nouns) and location-related adjectives are used, and fewer stop words and pronouns are included. This composition and word usage reflects the objectivity and preciseness of the clinical writing style.
2. By analyzing the clinical documents, we learned more about the nature of sentiment in clinical narratives. Sentiment can concern the general health status of a patient, the outcome of a treatment or of a specific medical condition, or the uncertainty of an observation. Good and bad, or positive and negative, are manifested in status changes, e.g. an improvement or worsening of a certain medical or physical condition, or the success or failure of a treatment. Sentiment can be seen as the health status of a patient: The patient’s health status can be good, bad or normal at some point in time, expressed either implicitly or explicitly. By analyzing that health status over time, improvements or worsening in the status can be recognized. An implicit description of a health status concerns the mentioning of critical symptoms (e.g. serious pain, extreme weight loss, high blood pressure). An explicit description of the health status is reflected through phrases such as “the patient recovered well” or “normal”. Sentiment in clinical texts can be the outcome of a treatment or the impact of a specific medical condition, i.e. whether the condition improved or worsened, which allows conclusions to be drawn on the effect or outcome of a treatment (positive/negative outcome). The phrase “blood sugar decreased” could express a positive or negative change depending on the previous state. A decrease of blood pressure can be good when it was too high before. This also shows that for interpreting the detected sentiment, the context needs to be considered. Further, sentiment can be seen as the presence of, change in or certainty of a medical condition, i.e. a medical condition can exist, improve, worsen, be certain or uncertain. The treatment outcome can be positive, negative (e.g. surgery was successful or failed) or neutral, or a treatment can have no outcome.
3. A simple method for sentiment analysis is not well suited to analyzing sentiment in clinical narratives. Sentiment in clinical texts differs significantly from sentiment in general texts. In particular, implicit sentiments need to be detected. An adapted annotation scheme should be defined with the help of physicians. New features for sentiment analysis need to be collected for capturing these subjective sentiments.
In the short term, we will develop a sentiment lexicon specific to the medical domain. It will define a scheme for analyzing and retrieving implicit sentiments and attitudes expressed in clinical texts. The kind and degree of influence of a symptom on the health status will be defined. This lexicon or ontology will be exploited for developing a more comprehensive sentiment analysis algorithm.
6. REFERENCES
[1] K. Denecke. Model-based Decision Support: Requirements and Future for its Application in Surgery. Biomedical Engineering, 58(1), 2013.
[2] C. Friedman, T. C. Rindflesch, and M. Corn. Natural language processing: State of the art and prospects for significant progress, a workshop sponsored by the national library of medicine. Journal of Biomedical Informatics, 46(5):765–773, 2013.
[3] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proc. Tenth ACM SIGKDD, KDD ’04, pages 168–177, New York, NY, USA, 2004. ACM.
[4] B. Liu. Sentiment Analysis and Opinion Mining. Synthesis Lectures on HLT. Morgan & Claypool Publishers, 2012.
[5] Y. Niu, X. Zhu, J. Li, and G. Hirst. Analysis of polarity information in medical text. In Proc. of the AMIA 2005 Annual Symposium, pages 570–574, 2005.
[6] D. L. Sackett, W. M. Rosenberg, J. Gray, R. B. Haynes, and W. S. Richardson. Evidence based medicine: What it is and what it isn’t. BMJ, 312(7023):71–72, 1996.
[7] A. Sarker, D. Molla, and C. Paris. Outcome polarity identification of medical papers. In Proceedings of the Australasian Language Technology Association Workshop 2011, pages 105–114, Canberra, Australia, December 2011.
[8] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proc. of HLT ’05, pages 347–354, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics.
[9] L. Xia, A. L. Gentile, J. Munro, and J. Iria. Improving patient opinion mining through multi-step classification. In TSD, pages 70–76, 2009.