Retrieving Attitudes: Sentiment Analysis from Clinical Narratives

Yihan Deng, ICCAS, University of Leipzig, Semmelweissstr. 14, Leipzig, Germany
Matthaeus Stoehr, ENT Clinic, University Hospital Leipzig, Liebigstr. 10, Leipzig, Germany
Kerstin Denecke, ICCAS, University of Leipzig, Semmelweissstr. 14, Leipzig, Germany

                                                 {name.surname}@iccas.de

ABSTRACT
Physicians and nurses express their judgments and observations
regarding a patient's health status in clinical narratives. Thus, their
judgments are explicitly or implicitly included in patient records. To
get an impression of the current health situation of a patient or of
changes in their status, analyzing and retrieving this subjective
content is crucial. In this paper, we approach this question as a
sentiment analysis problem and analyze the feasibility of assessing
these judgments in clinical text by means of general sentiment analysis
methods. Specifically, we compare the word usage in clinical narratives
and in a general text corpus and collect the linguistic characteristics
of judgments in clinical narratives. Finally, we derive the requirements
for sentiment analysis and retrieval from clinical narratives.

Categories and Subject Descriptors
H.3 [INFORMATION STORAGE AND RETRIEVAL]: Content Analysis and Indexing

Keywords
Clinical text mining, Sentiment analysis

MedIR July 11, 2014, Gold Coast, Australia
Copyright is held by the author/owner(s).

1. INTRODUCTION
Sentiment analysis deals with determining the sentiment expressed in
natural language text with respect to a specific topic. So far, the
development of sentiment analysis methods has concentrated on processing
highly opinionated, subjective texts such as customer reviews [3, 4].
Clearly, sentiment in clinical documents differs from sentiment in
user-generated content or other text types. By the term sentiment we
refer to information on the health status, on the outcome of a medical
treatment, on the change or seriousness of a symptom (e.g. serious pain)
or on the certainty of an observation. The work presented in this paper
aims at a more complete view of the facets of sentiment in clinical
texts. With the development of the principles of evidence-based
medicine [6] and digital patient modeling [1], the observations and
judgments expressed in clinical narratives will play a crucial role in
the clinical decision process.

Consider the following scenario: during the daily ward round, a
physician makes observations with respect to the health status of a
patient (e.g. symptoms improved). The patient describes his personal
experience of the symptoms, such as the degree of pain. All this
information reflects the individual health status and is documented in
clinical notes. Retrieving, analyzing and aggregating this information
over time can support treatment decisions and allows a physician to
quickly get an overview of the health status. Another application is
retrieving attitudes from clinical documents to support the assessment
of treatment outcomes. In this way, labor-intensive user studies for
treatment or medication evaluation can be facilitated.

In recent years, effective algorithms for processing clinical
narratives have been developed, in particular for named entity
recognition and relation extraction [2]. Based on recognized entities
and the relations between them, sentiments expressed in medical
narratives can now be analyzed to offer an upper-level text
understanding. Further, a corresponding retrieval of judgments or
sentiments can be realized. However, sentiments, opinions and intentions
expressed in clinical narratives have not been well exploited yet. In
this paper, we start analyzing the sentiment expressions used in
clinical texts through a linguistic comparison with a non-medical,
subjective text corpus.

Conventional methods for sentiment analysis have been developed for
processing subjective online documents such as weblogs and forums. In
this paper, our goal is to analyze the applicability of such methods
for sentiment analysis in clinical narratives. We will identify
necessary extensions of existing methods and derive the requirements
for sentiment analysis in clinical narratives. To this end, we will
first compare two types of medical narratives (radiology reports and
nurse letters) with a weblog data set and present the lexical and
linguistic differences. Afterwards, we will apply a general
subjectivity lexicon to medical narratives using dictionary-based
methods and discuss the sources of errors of this simple sentiment
recognition approach. The following research questions will be
addressed:

  1. In comparison with user-generated content, which lexical
     characteristics do clinical narratives have?
  2. What characterizes sentiments in clinical narratives?

  3. Can existing methods for sentiment analysis be applied? Which
     adaptations are necessary?

2. SENTIMENT ANALYSIS IN THE MEDICAL DOMAIN
To the best of our knowledge, little work has considered sentiment
analysis in medical texts. Xia et al. [9] have indicated that
sentiments are topic-related. Their approach to sentiment analysis
starts with a standard topic classifier based on topic labels. In the
second step, a dedicated classifier is initialized for each topic to
detect its polarity. This multi-step classification method achieved an
improvement of nearly 10% in F1 measure compared with the single-step
approach. Niu et al. consider sentiment analysis in biomedical
literature [5]. They use a supervised method to classify polarity at
sentence level, employing linguistic features such as unigrams, bigrams
and negations. Medical terms are replaced by their semantic category.
The category information and context information are derived from the
Unified Medical Language System (UMLS¹). The combination of linguistic
features and domain-specific knowledge improved the accuracy of the
algorithm.

In summary, existing methods for sentiment analysis in the medical
domain focus on processing biomedical literature and patient-generated
text. Clinical text, which records the activities and judgments of
health care workers, has not yet been analyzed. Moreover, the existing
approaches and definitions of sentiment in the medical domain are
derived from general sentiment analysis for Web 2.0 media. Clinical
context and medical knowledge have not been used thoroughly, apart from
some category metadata derived from the UMLS [7, 5]. We expect that,
due to different expressions and the more objective writing style of
clinical narratives, conventional sentiment analysis methods need to be
adapted to cope with the clinical context. We will concentrate on that
particular text material.

3. METHODOLOGY

3.1 Text Material
In order to analyze the differences between the language in clinical
narratives and general texts from the Internet, 200 nurse letters and
200 radiology reports from the "MIMIC II Database²" have been chosen as
corpus. These documents form the domain-specific data source in our
assessment. For comparison, we additionally consider 200 technical
interviews downloaded from the website Slashdot³. We have chosen that
particular dataset since it belongs to the category of user-generated,
subjective content. Given the technical topics, however, we expect a
certain similarity to clinical narratives, mainly a comparable degree
of objectivity.

Nurse Letter: A nurse letter is part of a patient record and is written
by nurses on duty. Its content reflects the situation of the patient
and the feedback on the ongoing treatment. It is written in a
relatively subjective manner. Acronyms and typos appear very often in
nurse letters.

Radiology Report: A radiology report is mainly used to inform the
treating physicians about the findings of a radiological examination.
It usually starts with a medical history, followed by a description of
the region of interest and the questions for the examination. The texts
contain many judgments and observations made during the examination.

Slashdot Interviews: Slashdot is a technology-related weblog covering
different technical topics, on which users express their opinions. We
chose the technical interviews as benchmark instead of movie or product
reviews, since technical interviews also contain a relatively large
amount of terminology.

3.2 Linguistic and Sentiment Analysis of the Data Sets
The three text sources evidently differ in terms of terminology usage
and content. The interview corpus is typical user-generated content. We
expect this corpus to contain a relatively large amount of sentiment
terms and subjective expressions, while the clinical narratives are
written in a more objective way, with fewer opinionated terms and more
clinical terminology. However, the question is whether terminology and
word usage are really distributed as expected. To what extent do the
corpora differ with respect to linguistic characteristics? Recalling
our initial research questions, we need to answer whether existing
sentiment lexicons can provide the basis for analyzing judgments and
sentiments in clinical narratives. To address these questions, we built
an extraction pipeline that obtains parts of speech and sentiment terms
from the texts and determines their occurrence frequency. The Penn
Treebank POS tagger⁴ and the SL sentiment lexicon [8] (which contains
8,221 single-term subjective expressions) have been used for this
purpose. Punctuation, numbers and stop words were also extracted and
their proportions were calculated.

After analyzing the linguistic composition of the data sets, we want to
study the applicability of a dictionary-based sentiment analysis
approach to clinical narratives and identify potential limitations of
the approach when applied to medical narratives. For this purpose, we
have created an experiment pipeline in KNIME⁵. Two dictionary taggers
were applied to recognize positive and negative terms in the text,
respectively. A voting algorithm then calculates the polarity of each
document from the number of positive and negative occurrences; it also
handles negations. Although it is only a simple approach, it is a
direct way to evaluate the compatibility between the subjectivity
lexicon and clinical narratives. The dictionary tagger uses the SL
sentiment lexicon from Wilson et al. [8]. This lexicon comprises many
adjectives and adverbs, but also nouns and verbs expressing sentiments.
For evaluation purposes, one physician from our university hospital
annotated each document of the three corpora with an overall polarity
at document level.
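The document-level voting step described in Section 3.2 can be illustrated with a minimal Python sketch. It is not the authors' KNIME workflow, and several parts of it are assumptions:

- The small POSITIVE and NEGATIVE sets are hypothetical stand-ins for the SL lexicon [8].
- The paper does not specify how negations or ties are handled. The negation cues, the three-token negation scope and the rule "tie means neutral" are therefore assumed.
- Accuracy and per-class F1 against a manual annotation are added to mirror the evaluation.

```python
import re

# Hypothetical stand-ins for the SL sentiment lexicon [8].
POSITIVE = {"well", "successfully", "satisfactory", "improved", "good"}
NEGATIVE = {"pain", "serious", "worse", "failed", "poor"}
# Assumed negation handling: a cue flips the polarity of the next 3 tokens.
NEGATIONS = {"no", "not", "never", "without", "denies"}
NEGATION_WINDOW = 3


def tokenize(text: str) -> list[str]:
    """Lower-cased alphabetic tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def document_polarity(text: str) -> str:
    """Vote over lexicon matches: 'good', 'bad', or 'neutral' on a tie."""
    positive = negative = 0
    scope = 0  # tokens remaining in the current negation scope
    for token in tokenize(text):
        if token in NEGATIONS:
            scope = NEGATION_WINDOW
            continue
        negated = scope > 0
        scope = max(scope - 1, 0)
        if token in POSITIVE:
            polarity = 1
        elif token in NEGATIVE:
            polarity = -1
        else:
            continue
        if negated:
            polarity = -polarity  # e.g. "no pain" counts as a positive vote
        if polarity > 0:
            positive += 1
        else:
            negative += 1
    if positive > negative:
        return "good"
    if negative > positive:
        return "bad"
    return "neutral"


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Share of documents whose predicted label equals the annotation."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def f1(gold: list[str], predicted: list[str], label: str) -> float:
    """F1 measure for one polarity class ('bad', 'neutral' or 'good')."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = [
        "Patient slept well, no pain reported.",
        "Serious pain overnight, condition worse.",
        "Blood sugar decreased.",
    ]
    gold = ["good", "bad", "good"]  # invented toy labels
    predicted = [document_polarity(d) for d in docs]
    print(predicted)  # ['good', 'bad', 'neutral']
    print(accuracy(gold, predicted))
    print({c: f1(gold, predicted, c) for c in ("bad", "neutral", "good")})
```

The third toy document shows the kind of limitation this paper investigates. An implicit clinical event such as a change in blood sugar contains no lexicon term, so the vote falls back to neutral.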
¹ http://www.nlm.nih.gov/research/umls/, accessed 20.04.2014
² http://www.physionet.org/, accessed 20.04.2014
³ http://slashdot.org, accessed 20.04.2014
⁴ http://www.cis.upenn.edu/~treebank/, accessed 20.04.2014
⁵ http://www.knime.org/, accessed 20.04.2014

4. RESULTS AND DISCUSSION

4.1 Results of the Linguistic Analysis
                                       Figure 1: Result of the Linguistic Analysis
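The proportions in Figure 1 come from the extraction pipeline described in Section 3.2. A minimal Python sketch of that counting step follows. It rests on these assumptions:

- Tokens arrive already tagged with Penn Treebank tags.
- STOP_WORDS and SENTIMENT_TERMS are small hypothetical stand-ins for the real stop-word list and the SL lexicon.
- Stop words are counted independently of part of speech.
- As the paper states for Figure 1, terms that match the sentiment lexicon are not counted in any part-of-speech category.

```python
from collections import Counter

# Hypothetical stand-ins for the stop-word list and the SL lexicon [8].
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "was", "to", "in", "i", "it"}
SENTIMENT_TERMS = {"well", "pain", "good", "worse", "serious"}

# Penn Treebank tag prefixes mapped to the part-of-speech categories.
POS_PREFIXES = [
    ("NN", "noun"),       # NN, NNS, NNP, NNPS
    ("PRP", "pronoun"),   # PRP, PRP$
    ("WP", "pronoun"),    # WP, WP$
    ("JJ", "adjective"),  # JJ, JJR, JJS
    ("RB", "adverb"),     # RB, RBR, RBS
]


def category_proportions(tagged: list[tuple[str, str]]) -> dict[str, float]:
    """Return each category's share of all (token, Penn tag) pairs."""
    if not tagged:
        return {}
    counts = Counter()
    for token, tag in tagged:
        word = token.lower()
        if not any(ch.isalnum() for ch in token):
            counts["punctuation"] += 1
            continue
        if tag == "CD":  # cardinal numbers
            counts["number"] += 1
            continue
        if word in STOP_WORDS:
            counts["stop word"] += 1  # counted independently of POS
        if word in SENTIMENT_TERMS:
            # Lexicon matches are not assigned a part-of-speech category.
            counts["sentiment term"] += 1
            continue
        for prefix, category in POS_PREFIXES:
            if tag.startswith(prefix):
                counts[category] += 1
                break
    return {category: n / len(tagged) for category, n in counts.items()}


if __name__ == "__main__":
    sample = [("Patient", "NN"), ("slept", "VBD"), ("well", "RB"), (",", ","),
              ("pain", "NN"), ("score", "NN"), ("2", "CD"), (".", ".")]
    print(category_proportions(sample))
    # {'noun': 0.25, 'sentiment term': 0.25, 'punctuation': 0.25, 'number': 0.125}
```

Applied per corpus, the returned shares correspond to the bars compared in Figure 1. Verbs and other tags fall outside the listed categories, so the shares need not sum to one.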


In Figure 1, the proportions of punctuation, numbers, stop words, nouns,
pronouns, adjectives and adverbs, as well as of sentiment terms, are
illustrated. Terms that matched the sentiment lexicon were not counted
in the part-of-speech categories. The results partially confirmed our
expectation.

Sentiment Terms: According to the results, the interview corpus
contains the highest proportion of sentiment terms with 8%, while the
radiology reports contain 4% and the nurse letters 6% sentiment terms.
These results confirm our observation that nurse letters are written
more subjectively than radiology reports, but they are still more
objective than the interviews. The sentiments expressed in nurse
letters are normally implicit and appear in the description of the
patient's health status or in the social records about the patient's
visitors. Opinionated terms and expressions such as suspicion,
negation, approval or recommendations appear in radiology reports
mainly in the conclusion section or impression part at the end of the
report.

Numbers: Numbers are among the most important elements in clinical
reports, where they mainly represent the dose of medications, the size
of a tumor or the frequency of a treatment. In our clinical data sets,
numbers comprise between 2% and 3% of the words or characters. In
contrast, almost no numbers occur in the interviews, since discussions
in weblogs are more likely to use simple, colloquial vocabulary to
present personal attitudes and preferences.

Stop Words: The nurse letters and radiology reports contain 13% and 17%
stop words, respectively. In contrast, the percentage of stop words in
the interview corpus is considerably higher at 32%, which shows that
the clinical documents are written in a concise way, focusing on facts.

Nouns and Pronouns: Notably, the percentage of nouns in radiology
reports (31%) and nurse letters (33%) is clearly higher than in the
interviews (21%), while the percentage of pronouns in the interviews
(4%) is notably higher than in the radiology reports (0%) and nurse
letters (1%). The reason is that clinical narratives describe medical
facts using nouns from medical terminologies (e.g. names of diseases,
symptoms, medications). In contrast, the interviews contain more
subjective terms and many first-person expressions conveying the ideas
and opinions of individuals.

Adjectives and Adverbs: Another interesting finding is that the
clinical narratives contain a substantial amount of adjectives (6-8% of
the terms) that are not included in the SL sentiment lexicon. In
contrast, all adjectives in the interview corpus matched the sentiment
lexicon. The additional adjectives in clinical narratives are mainly
related to body locations, such as "left" side, "right" side,
"vertical", "dorsal" and "cervical". They express neither emotion nor
attitude but anatomical concepts and relative locations in the body.

In summary, the nurse letters show a higher linguistic similarity to
the technical interviews than the radiology reports do; to a certain
extent they are written more subjectively than radiology reports. The
large number of medical terms (nouns, adjectives) describes the status
of a patient and thereby reflects the attitudes of physicians. Thus,
implicit clinical events may influence the polarity of a clinical
report as well. Consequently, implicit clinical events and evidence are
expected to be relevant for understanding and interpreting the status
of the patient.

4.2 Results of the Sentiment Analysis
The automatically determined polarity of each text was compared to the
manual annotation. The overall accuracy and the F1 measure per polarity
class for the three text types are shown in Table 1.

Types              Accuracy (overall)   F1 (Bad)   F1 (Neutral)   F1 (Good)
Interviews         0.696                0.754      0.367          0.735
Nurse Letter       0.420                0.437      0.216          0.503
Radiology Report   0.446                0.297      0.080          0.559

Table 1: Sentiment Analysis Results: Accuracy and F1 measure for three
text types

Accuracy is the proportion of documents whose automatically assigned
polarity agrees with the manual annotation. The sentiment analysis of
the interviews reaches an acceptable accuracy of 69.6%, whereas the
results for nurse letters and radiology reports only reach accuracies
of 42.0% and 44.6%, respectively. This shows that sentiment in these
clinical texts is expressed differently and that existing methods need
to be adapted to process them. Furthermore, for the clinical texts the
F1 measure for positive texts (F1 Good) is higher than for negative
texts (F1 Bad), markedly so for radiology reports (0.559 vs. 0.297). A
manual assessment showed that positive sentiments or outcomes are
described explicitly, e.g. by phrases such as "patient slept well, the
treatment has a satisfactory result" or "the tube has been placed
successfully". For negative clinical events, the nurses and physicians
were more
likely to express the status of the patient in a careful and cautious
manner, e.g. by phrases such as "some situation cannot be excluded or
need further pathological investigation". The radiology reports tend to
exclude or confirm the occurrence of certain clinical events rather
than give a final diagnosis. In addition, recognizing neutral
situations is difficult, since the judgment of a neutral outcome
depends on the recognition of positive and negative terms. However,
neutral clinical outcomes in the real world are probably not expressed
objectively. A surgical result may show only a moderate effect, yet it
may be described as an insignificant outcome in nurse letters or even
produce some negative feedback. During the annotation, our physician
tended to assign positive or negative judgments to the reports rather
than neutral ones. Determining "neutral" requires more context and
reference, which are hard to obtain without knowing the complete
patient history.

5. CONCLUSION AND FUTURE WORK
In this paper, we have studied the linguistic characteristics of
clinical narratives compared to a web data set and analyzed the
feasibility of a simple sentiment analysis approach on clinical
narratives. The results provide important insights for understanding
sentiment in clinical narratives and for developing corresponding
analysis methods. The three research questions raised in Section 1 can
be answered as follows.

  1. The linguistic analysis showed that clinical narratives contain a
     moderate amount of sentiment terms. In contrast to the web data
     set, they use more numbers, medical terms (nouns) and
     location-related adjectives, and fewer stop words and pronouns.
     This composition and word usage reflect the objectivity and
     precision of the clinical writing style.

  2. By analyzing the clinical documents, we learned more about the
     nature of sentiment in clinical narratives. Sentiment can concern
     the general health status of a patient, the outcome of a treatment
     or of a specific medical condition, or the uncertainty of an
     observation. Good and bad, or positive and negative, manifest
     themselves in status changes, e.g. an improvement or worsening of
     a certain medical or physical condition, or the success or failure
     of a treatment. Sentiment can be seen as the health status of a
     patient: at a given point in time, the patient's health status can
     be good, bad or normal, expressed either implicitly or explicitly.
     By analyzing that health status over time, improvement or
     worsening can be recognized. An implicit description of a health
     status consists in mentioning critical symptoms (e.g. serious pain,
     extreme weight loss, high blood pressure). An explicit description
     of the health status is reflected in phrases such as "the patient
     recovered well" or "normal". Sentiment in clinical texts can also
     be the outcome of a treatment or the impact of a specific medical
     condition, i.e. whether the condition improved or worsened. This
     allows conclusions on the effect or outcome of a treatment
     (positive/negative outcome). The phrase "blood sugar decreased"
     could express a positive or negative change depending on the
     previous state, just as a decrease of blood pressure can be good
     when it was too high before. This also shows that interpreting the
     detected sentiment requires considering its context. Further,
     sentiment can be seen as the presence, change or certainty of a
     medical condition, i.e. a medical condition can exist, improve,
     worsen, be certain or uncertain. The treatment outcome can be
     positive or negative (e.g. surgery was successful or failed) or
     neutral, or a treatment can have no outcome.

  3. A simple method for sentiment analysis is not well suited to
     analyze sentiment in clinical narratives. Sentiment in clinical
     texts differs substantially from sentiment in general texts. In
     particular, implicit sentiments need to be detected. An adapted
     annotation scheme should be defined with the help of physicians,
     and new features for sentiment analysis need to be collected to
     capture these subjective sentiments.

In the short term, we will develop a sentiment lexicon specific to the
medical domain. It will define a scheme for analyzing and retrieving
implicit sentiments and attitudes expressed in clinical texts,
including the kind and degree of influence of a symptom on the health
status. This lexicon or ontology will be used to develop a more
comprehensive sentiment analysis algorithm.

6. REFERENCES
[1] K. Denecke. Model-based Decision Support: Requirements and Future
    for its Application in Surgery. Biomedical Engineering, 58(1),
    2013.
[2] C. Friedman, T. C. Rindflesch, and M. Corn. Natural language
    processing: State of the art and prospects for significant
    progress, a workshop sponsored by the National Library of Medicine.
    Journal of Biomedical Informatics, 46(5):765–773, 2013.
[3] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proc.
    Tenth ACM SIGKDD, KDD '04, pages 168–177, New York, NY, USA, 2004.
    ACM.
[4] B. Liu. Sentiment Analysis and Opinion Mining. Synthesis Lectures
    on HLT. Morgan & Claypool Publishers, 2012.
[5] Y. Niu, X. Zhu, J. Li, and G. Hirst. Analysis of polarity
    information in medical text. In Proc. of the AMIA 2005 Annual
    Symposium, pages 570–574, 2005.
[6] D. L. Sackett, W. M. Rosenberg, J. Gray, R. B. Haynes, and W. S.
    Richardson. Evidence based medicine: What it is and what it isn't.
    BMJ, 312(7023):71–72, 1996.
[7] A. Sarker, D. Molla, and C. Paris. Outcome polarity identification
    of medical papers. In Proceedings of the Australasian Language
    Technology Association Workshop 2011, pages 105–114, Canberra,
    Australia, December 2011.
[8] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual
    polarity in phrase-level sentiment analysis. In Proc. of HLT '05,
    pages 347–354, Stroudsburg, PA, USA, 2005. Association for
    Computational Linguistics.
[9] L. Xia, A. L. Gentile, J. Munro, and J. Iria. Improving patient
    opinion mining through multi-step classification. In TSD, pages
    70–76, 2009.