Retrieving Attitudes: Sentiment Analysis from Clinical Narratives

Yihan Deng
ICCAS, University of Leipzig
Semmelweissstr. 14, Leipzig, Germany

Matthaeus Stoehr
ENT Clinic, University Hospital Leipzig
Liebigstr. 10, Leipzig, Germany

Kerstin Denecke
ICCAS, University of Leipzig
Semmelweissstr. 14, Leipzig, Germany

{name.surname}@iccas.de

MedIR, July 11, 2014, Gold Coast, Australia. Copyright is held by the author/owner(s).

ABSTRACT

Physicians and nurses express their judgments and observations on a patient's health status in clinical narratives. Thus, their judgments are explicitly or implicitly included in patient records. To get an impression of the current health situation of a patient or of changes in the status, analysis and retrieval of this subjective content is crucial. In this paper, we approach this question as a sentiment analysis problem and analyze the feasibility of assessing these judgments in clinical text by means of general sentiment analysis methods. Specifically, the word usage in clinical narratives and in a general text corpus is compared. The linguistic characteristics of judgments in clinical narratives are collected. In addition, the requirements for sentiment analysis and retrieval from clinical narratives are derived.

Categories and Subject Descriptors

H.3 [INFORMATION STORAGE AND RETRIEVAL]: Content Analysis and Indexing

Keywords

Clinical text mining, Sentiment analysis

1. INTRODUCTION

Sentiment analysis deals with determining the sentiment with respect to a specific topic expressed in natural language text. So far, the development of sentiment analysis methods has concentrated on processing very opinionated, subjective texts such as customer reviews [3, 4]. Clearly, sentiment in clinical documents differs from sentiment in user-generated content or other text types. With the term sentiment we refer to information on the health status, on the outcome of a medical treatment, on the change or seriousness of a symptom (e.g. serious pain), or on the certainty of an observation. The work presented in this paper intends to give a more complete view of the facets of sentiment in clinical texts. With the development of the principles of evidence-based medicine [6] and digital patient modeling [1], the observations and judgments expressed in clinical narratives will play a crucial role in the clinical decision process.

Consider the following scenario: During the daily ward round, a physician makes observations with respect to the health status of a patient (e.g. symptoms improved). The patient describes his or her personal experience of the symptoms, such as the degree of pain. All this information reflects the individual health status and is documented in clinical notes. Retrieving, analyzing and aggregating this information over time can support treatment decisions and allows a physician to quickly get an overview of the health status.

Another application example is retrieving attitudes from clinical documents, which can support assessing the outcome of treatments. In this way, labor-intensive user studies for treatment or medication evaluation can be facilitated.

In recent years, effective algorithms for processing clinical narratives have been developed, in particular for named entity recognition and relation extraction [2]. Based on recognized entities and the relations between them, sentiments expressed in medical narratives can now be analyzed to offer a higher-level text understanding. Further, a corresponding retrieval of judgments or sentiments can be realized. However, sentiments, opinions and intentions expressed in clinical narratives have not been well exploited yet. In this paper, we start analyzing the sentiment expressions used in clinical texts through a linguistic comparison with a non-medical, subjective text corpus.

Conventional methods for sentiment analysis have been developed for processing subjective on-line documents such as weblogs and forums. In this paper, our goal is to analyze the applicability of such methods to sentiment analysis in clinical narratives. We will identify necessary extensions of existing methods and derive the requirements for sentiment analysis in clinical narratives. To this end, we will first compare two types of medical narratives (radiology reports and nurse letters) with a weblog data set. The lexical and linguistic differences will be presented. Afterwards, we will apply a general subjectivity lexicon to medical narratives using dictionary-based methods. Sources of errors of this simple sentiment recognition approach will be discussed. The following research questions will be addressed:

1. In comparison with user-generated content, which lexical characteristics do clinical narratives have?

2. What characterizes sentiments in clinical narratives?

3. Can existing methods for sentiment analysis be applied? Which adaptations are necessary?

2. SENTIMENT ANALYSIS IN THE MEDICAL DOMAIN

To the best of our knowledge, little work has considered sentiment analysis in medical texts. Xia et al. [9] have indicated that sentiments are topic-related. Their approach to sentiment analysis starts with a standard topic classifier based on topic labels. In the second step, special classifiers are initialized to detect the polarity for each topic. The multi-step classification method achieved an improvement of nearly 10% in the F1 measure in comparison with the single-step approach. Niu et al. consider sentiment analysis in biomedical literature [5]. They exploit a supervised method to classify the polarity at sentence level. Linguistic features such as uni-grams, bi-grams and negations are employed. The medical terms are merely replaced by their semantic category. The category information and context information are derived from the Unified Medical Language System (UMLS; http://www.nlm.nih.gov/research/umls/, accessed 20.04.2014). The combination of linguistic features and domain-specific knowledge has improved the accuracy of the algorithm.

In summary, existing methods for sentiment analysis in the medical domain focus on processing biomedical literature and patient-generated text. Clinical text, which is used to record the activities and judgments of health care workers, has not yet been analyzed. Moreover, the existing approaches and definitions of sentiment in the medical domain are derived from general sentiment analysis for Web 2.0 media. Clinical context and medical knowledge have not been used thoroughly, apart from some category metadata derived from the UMLS [7, 5]. We expect that, due to different expressions and the more objective way of writing in clinical narratives, the conventional sentiment analysis methods need to be adapted to cope with the clinical context. We will concentrate on that particular text material.

3. METHODOLOGY

3.1 Text Material

In order to analyze the differences between the language in clinical narratives and general texts from the Internet, 200 nurse letters and 200 radiology reports from the MIMIC II Database (http://www.physionet.org/, accessed 20.04.2014) have been chosen as corpus. These documents form the domain-specific data source in our assessment. For comparison, we additionally consider 200 technical interviews downloaded from the website Slashdot (http://slashdot.org, accessed 20.04.2014). We have chosen that particular data set since it belongs to the category of user-generated, subjective content. Given the technical topics, however, we expect a certain similarity to clinical narratives, mainly in terms of objectivity.

Nurse Letter: A nurse letter is part of a patient record and is written by nurses on duty. Its content reflects the situation of the patient and the feedback to the ongoing treatment. It is written in a relatively subjective manner. Acronyms and typos appear very often in nurse letters.

Radiology Report: A radiology report is mainly used to inform the treating physicians about the findings of a radiological examination. It usually starts with a medical history, which is followed by a description of the region of interest and the questions for the examination. The texts contain many judgments and observations made during the examination.

Slashdot Interviews: Slashdot is a technology-related weblog which covers different technical topics. The users express their opinions on certain topics. We chose the technical interviews as benchmark instead of movie or product reviews, since technical interviews also contain a relatively large amount of terminology.

3.2 Linguistic and Sentiment Analysis of the Data Sets

The three text sources clearly differ in terms of terminology usage and content. The interview corpus is typical user-generated content. We expect that this corpus will contain a relatively large amount of sentiment terms and subjective expressions, while the clinical narratives are written in a more objective way, with fewer opinionated terms and more clinical terminology. However, the question is whether the terminology and word usage are really distributed as expected. To what extent do the corpora differ with respect to linguistic characteristics? Recalling our initial research questions, we need to answer whether existing sentiment lexicons can provide the basis for analyzing judgments and sentiments in clinical narratives. In order to address these questions, an extraction pipeline has been built to obtain parts of speech and sentiment terms from the texts and to determine their occurrence frequency. The Penn Treebank POS tagger (http://www.cis.upenn.edu/~treebank/, accessed 20.04.2014) and the SL sentiment lexicon [8], which contains 8,221 single-term subjective expressions, have been used for this purpose. Punctuation, numbers and stop words were also extracted and their proportions were calculated.

After analyzing the linguistic composition of the data sets, we want to study the applicability of a dictionary-based sentiment analysis approach to clinical narratives. Potential limitations of the approach when applied to medical narratives will be identified. For this purpose, we have created an experiment pipeline in KNIME (http://www.knime.org/, accessed 20.04.2014). Two dictionary taggers were applied to recognize positive and negative terms in the text, respectively. A voting algorithm is applied to calculate the polarity of each document. It is based on the number of positive and negative occurrences and handles negations. Although it is only a simple approach, it is a direct method to evaluate the compatibility between the subjectivity lexicon and clinical narratives. The SL sentiment lexicon from Wilson et al. [8] is used by the dictionary taggers. It comprises a large number of adjectives and adverbs, but also nouns and verbs expressing sentiments. For evaluation purposes, each document of the three corpora was annotated with an overall polarity by one physician from our university hospital.

4. RESULTS AND DISCUSSION

4.1 Results of the Linguistic Analysis

[Figure 1: Result of the Linguistic Analysis]

Figure 1 illustrates the proportions of punctuation, numbers, stop words, nouns, pronouns, adjectives, adverbs and sentiment terms. The parts of speech of terms that matched the sentiment lexicon have not been considered. The result has partially confirmed our expectation.

Sentiment Terms: According to the results, the interview corpus contains the highest proportion of sentiment terms with 8%, while the radiology reports contain 4% and the nurse letters 6% sentiment terms. These results confirm our observation that nurse letters are written more subjectively than radiology reports, but are still more objective than the interviews. The sentiments expressed in nurse letters are normally implicit and appear with the description of the patient's health status or in the social records on the patient's visitors. Opinionated terms and expressions such as suspicion, negation, approval or recommendations can be found in radiology reports mainly in the conclusion section or impression part at the end of the report.

Number: Numbers are one of the most important elements in clinical reports, where they are mainly used to represent the dose of medications, the size of a tumor, the frequency of a treatment, etc. In our clinical data sets, numbers comprise between 2% and 3% of the words or characters. In contrast, almost no numbers occur in the interviews, since the discussions in weblogs are more likely to use simple, colloquial vocabulary to present personal attitudes and preferences.

Stop Word: The nurse letters and radiology reports contain 13% and 17% stop words, respectively. In contrast, the percentage of stop words in the interview corpus is considerably higher at 32%, which shows that the clinical documents are written in a concise way, focusing on facts.

Nouns and Pronouns: Notably, the percentage of nouns in radiology reports (31%) and nurse letters (33%) is clearly higher than the percentage of nouns in interviews (21%), while the percentage of pronouns in the interviews (4%) is notably higher than in the radiology reports (0%) and nurse letters (1%). The reason is that medical facts are described in clinical narratives using nouns from medical terminologies (e.g. names of diseases, symptoms, medications). In contrast, the interviews contain more subjective terms and use a large number of first-person expressions to express the ideas and opinions of individuals.

Adjective and Adverb: Another interesting finding is that the clinical narratives contain a substantial amount of adjectives (6-8% of the terms) that are not included in the SL sentiment lexicon. In contrast, all adjectives in the interview corpus matched the sentiment lexicon. The additional adjectives in clinical narratives are mainly related to body locations, such as "left" side, "right" side, "vertical", "dorsal", "cervical". They express neither emotion nor attitude, but anatomical concepts and relative locations in the body.

In summary, the nurse letters show a relatively higher linguistic similarity to the technical interviews than the radiology reports do. They are to a certain extent more subjectively written than radiology reports. The large number of medical terms (nouns, adjectives) describes the status of a patient. These terms reflect the attitudes of physicians. Thus, implicit clinical events may influence the polarity outcome of a clinical report as well. Consequently, implicit clinical events and evidence are expected to be relevant for understanding and interpreting the status of the patient.

4.2 Results of the Sentiment Analysis

The automatically retrieved polarities of the texts were compared to the manual annotation. The overall accuracy and the F1 measure for the three text types are shown in Table 1. Accuracy is the proportion of correctly classified documents among all documents.

Table 1: Sentiment Analysis Results: Accuracy and F1 measure for three text types

Type              Accuracy (overall)  F1 (Bad)  F1 (Neutral)  F1 (Good)
Interviews        0.696               0.754     0.367         0.735
Nurse Letter      0.420               0.437     0.216         0.503
Radiology Report  0.446               0.297     0.080         0.559

The sentiment analysis of the interviews leads to an acceptable accuracy of 69.6%. The results for nurse letters and radiology reports reach accuracies of only 42.0% and 44.6%, respectively. This shows that existing methods need to be adapted when processing these texts and that sentiment is expressed differently in them. Furthermore, for the clinical texts the F1 measure for positive texts (F1 Good) is clearly higher than for negative texts (F1 Bad). A manual assessment showed that positive sentiments or outcomes are described in an explicit way, e.g. by phrases such as "patient slept well, the treatment has a satisfactory result" or "the tube has been placed successfully". For negative clinical events, nurses and physicians were more likely to express the status of the patient in a careful and cautious manner, e.g. by phrases such as "some situation cannot be excluded or need further pathological investigation". The radiology reports are more likely to exclude or confirm the occurrence of certain clinical events than to give a final diagnosis.

In addition, the recognition of neutral situations is difficult, since the judgment of a neutral outcome depends on the recognition of positive and negative terms. However, neutral clinical outcomes in the real world are probably not objectively expressed. A surgical result may show only a moderate effect, yet be described as an insignificant outcome in nurse letters or even produce some negative feedback. During the annotation, our physician tended to assign positive and negative judgments to the reports rather than neutral ones, since the determination of "neutral" needs more context and reference, which is not easy to obtain without knowing the complete patient history.

5. CONCLUSION AND FUTURE WORK

In this paper, we have studied the linguistic characteristics of clinical narratives compared to a web data set and analyzed the feasibility of a simple sentiment analysis approach on clinical narratives. The results provide important insights for understanding sentiment in clinical narratives and for developing corresponding analysis methods. The three research questions raised in Section 1 can be answered as follows.

1. The linguistic analysis showed that clinical narratives contain a moderate amount of sentiment terms. In contrast to the web data set, more numbers, medical terms (nouns) and location-related adjectives are used, and fewer stop words and fewer pronouns are included. This composition and word usage reflects the objectivity and preciseness of the clinical writing style.

2. By analyzing the clinical documents, we learned more about the nature of sentiment in clinical narratives. Sentiment can concern the general health status of a patient, the outcome of a treatment or of a specific medical condition, or the uncertainty of an observation. Good and bad, or positive and negative, are manifested in status changes, e.g. an improvement or worsening of a certain medical or physical condition, or the success or failure of a treatment. Sentiment can be seen as the health status of a patient: the patient's health status can be good, bad or normal at some point in time, expressed either implicitly or explicitly. By analyzing that health status over time, improvements or worsening in the status can be recognized. An implicit description of a health status concerns the mentioning of critical symptoms (e.g. serious pain, extreme weight loss, high blood pressure). An explicit description of the health status is reflected through phrases such as "the patient recovered well" or "normal". Sentiment in clinical texts can also be the outcome of a treatment or the impact of a specific medical condition, i.e. whether the condition improved or worsened, which allows conclusions to be drawn on the effect or outcome of a treatment (positive/negative outcome). The phrase "blood sugar decreased" could express a positive or negative change depending on the previous state. A decrease of blood pressure can be good when it was too high before. This also shows that the context needs to be considered when interpreting the detected sentiment. Further, sentiment can be seen as the presence of, change in or certainty of a medical condition, i.e. a medical condition can exist, improve, worsen, be certain or uncertain. The treatment outcome can be positive or negative (e.g. surgery was successful or failed) or neutral, or a treatment can have no outcome.

3. A simple method for sentiment analysis is not well suited to analyzing sentiment in clinical narratives. Sentiment in clinical texts differs significantly from sentiment in general texts. In particular, implicit sentiments need to be detected. An adapted annotation scheme should be defined with the help of physicians. New features for sentiment analysis need to be collected for capturing these subjective sentiments.

In the short term, we will develop a sentiment lexicon specific to the medical domain. It will define a scheme for analyzing and retrieving implicit sentiments and attitudes expressed in clinical texts. The kind and degree of influence of a symptom on the health status will be defined. This lexicon or ontology will be exploited for developing a more comprehensive sentiment analysis algorithm.

6. REFERENCES

[1] K. Denecke. Model-based Decision Support: Requirements and Future for its Application in Surgery. Biomedical Engineering, 58(1), 2013.
[2] C. Friedman, T. C. Rindflesch, and M. Corn. Natural language processing: State of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine. Journal of Biomedical Informatics, 46(5):765–773, 2013.
[3] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proc. Tenth ACM SIGKDD, KDD '04, pages 168–177, New York, NY, USA, 2004. ACM.
[4] B. Liu. Sentiment Analysis and Opinion Mining. Synthesis Lectures on HLT. Morgan & Claypool Publishers, 2012.
[5] Y. Niu, X. Zhu, J. Li, and G. Hirst. Analysis of polarity information in medical text. In Proc. of the AMIA 2005 Annual Symposium, pages 570–574, 2005.
[6] D. L. Sackett, W. M. Rosenberg, J. Gray, R. B. Haynes, and W. S. Richardson. Evidence based medicine: What it is and what it isn't. BMJ, 312(7023):71–72, 1996.
[7] A. Sarker, D. Molla, and C. Paris. Outcome polarity identification of medical papers. In Proceedings of the Australasian Language Technology Association Workshop 2011, pages 105–114, Canberra, Australia, December 2011.
[8] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proc. of HLT '05, pages 347–354, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics.
[9] L. Xia, A. L. Gentile, J. Munro, and J. Iria. Improving patient opinion mining through multi-step classification. In TSD, pages 70–76, 2009.
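
To make the dictionary-based voting approach of Section 3.2 and the measures reported in Table 1 more concrete, a minimal Python sketch follows. It is an illustration under stated assumptions, not the KNIME pipeline used in the experiments: the small word lists merely stand in for the SL sentiment lexicon [8], and the negation rule (a negation word among the two preceding tokens flips a term's polarity), the tie-breaking to "neutral", and all function names are hypothetical choices made for this example.

```python
import re

# Hypothetical toy word lists; the experiments used the SL lexicon [8] instead.
POSITIVE = {"well", "successfully", "satisfactory", "improved", "stable"}
NEGATIVE = {"pain", "worse", "failed", "severe", "infection"}
NEGATIONS = {"no", "not", "without", "denies"}
NEGATION_WINDOW = 2  # assumption: look at the two tokens before a sentiment term


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vote_polarity(text):
    """Count positive and negative lexicon hits and vote on a document label."""
    tokens = tokenize(text)
    pos = neg = 0
    for i, tok in enumerate(tokens):
        if tok not in POSITIVE and tok not in NEGATIVE:
            continue
        is_positive = tok in POSITIVE
        # Assumed negation handling: a preceding negation flips the polarity.
        window = tokens[max(0, i - NEGATION_WINDOW):i]
        if any(w in NEGATIONS for w in window):
            is_positive = not is_positive
        if is_positive:
            pos += 1
        else:
            neg += 1
    if pos > neg:
        return "good"
    if neg > pos:
        return "bad"
    return "neutral"  # ties and documents without lexicon hits


def accuracy(gold, pred):
    """Proportion of documents whose predicted label equals the annotation."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_for_label(gold, pred, label):
    """F1 measure for one class, computed from precision and recall."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = [
        "The tube has been placed successfully.",
        "Patient reports severe pain.",
        "No pain overnight, patient slept well.",
    ]
    for doc in docs:
        print(vote_polarity(doc), "-", doc)
```

Even this toy version shows the limitation discussed in Section 4.2: a document without any lexicon hit, or with cautious wording such as "cannot be excluded", falls back to "neutral" or is decided by unrelated terms, because the vote only sees explicit sentiment words.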