Analysis of hospital reviews through sentiment analysis: An approach to aid patients in the times of COVID-19 pandemic

Ankita Bansal, Manoj Maurya, Niranjan Kumar, and Siddharth Tomar
Department of Information and Technology, Netaji Subhas University of Technology, Dwarka, New Delhi, India

Abstract. The COVID-19 pandemic has overstressed our healthcare system. This has affected the lives of both COVID and non-COVID patients in the worst possible way. Patients are facing difficulty in getting proper medical care in time, because hospitals are already under stress and the general population lacks proper information. This lack of information is amplified by the environment of fear and panic during the pandemic. We aim to use sentiment analysis of hospital reviews to provide relevant and important information about the operating conditions and current status of hospitals to the general public. Sentiment analysis is applied to patients' reviews to quantify the direction and/or magnitude of their emotive content. Patient comments are segregated into different sections, and analysis is done on these sections to quantify the positive or negative aspects of the reviews. This allows us to give each hospital an overall rating based on key parameters, which helps people understand the hospital's current condition. The results established a strong relationship between the online reviews and the overall recommendation percentage of the hospitals. This information provides great value to patients by allowing them to compare and select the best option. The information is more reliable and robust due to its dynamic nature.

Keywords: Sentiment analysis, Natural language processing, lemmatization, web scraping, COVID-19 outbreak

ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, Delhi, India
EMAIL: ankita.bansal06@gmail.com (A. Bansal); manojm.it.17@nsit.net.in (M. Maurya); niranjan.it.17@nsit.net.in (N. Kumar); siddhartht.it.17@nsit.net.in (S. Tomar)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

The end of the year 2019 saw the spread of the COVID-19 coronavirus in China, which infected a large number of people all over the country [1]. China was soon able to control the outbreak, but COVID-19 spread to other countries.

At present, many countries are able to control the further spread of COVID-19, while a few countries are still struggling to adopt efficient and effective measures. The study by Bartik et al. (2020) shows that the pandemic has led to a massive dislocation of small businesses [4]. In contrast, a few sectors have benefited from the pandemic, the healthcare industry being one of them.

In the difficult times of the COVID-19 pandemic, our healthcare system has been continuously operating above its capacity and is in a stressed situation [5]. This has affected not only healthcare workers but also patients who are in immediate need of medical care. Patients have faced great difficulty in getting access to hospitals. The primary reason is the overcrowding of hospitals, but another significant reason, one that can be resolved, is the lack of information about hospitals among the general populace. This problem has serious consequences not only for COVID-19 patients but also for non-COVID patients, who are required to take extra precautions in this pandemic as their current health puts them at higher risk [6]. Non-COVID patients are also finding serious difficulties in getting treatment due to a lack of proper information about hospitals and details regarding their current status [6]. To get precise information about hospitals, such as the availability of beds, the availability of ventilators, and any other such data, people generally follow the traditional question-and-answer process, in which they ask friends, acquaintances, and other people who might have used the facilities of the particular hospital or may know about it. Nowadays, the more popular method is to read reviews of the hospitals posted by patients and other stakeholders on various blogs and social networking websites, which are easily accessible on the internet [7]. Patients share their opinions, suggestions, and other thoughts, which may be either favorable or unfavorable, on various review sites. However, these reviews are largely unstructured and contain sarcasm, slang, etc. Hence, it becomes necessary to understand these reviews properly to derive meaningful insights from them, which is the basic idea behind sentiment analysis [8]. Sentiment analysis is a domain that classifies reviews, comments, or opinions into two basic and intrinsic emotional indicators, namely positive and negative [9]. It involves understanding the emotional essence of a text and evaluating the nature of the opinion it expresses [10]. This nature can be at the extremes of good and bad, or just neutral [11]. Sentiment analysis has been used on reviews of movies, hotels, restaurants, etc. to help customers choose according to their requirements, but there is little work on applying sentiment analysis to hospital reviews, so it is an area to explore. The manual way of filtering out reviews associated with a hospital's status and condition is not scalable and also has reliability issues. To automate the process of categorizing the sentiment of reviews or posts, we analyze the text and then perform natural language processing along with various computational techniques [12].

In this paper, the authors have developed a methodology to extract reviews on 184 hospitals from mouthshut.com and apply sentiment analysis to the reviews to gain meaningful insights. In other words, the authors aim to draw conclusions from the patients' reviews collected from review-specific sites. This work addresses the task of classifying given reviews as positive, negative, or neutral. The task is complex due to the inherent complexity of natural language constructs, as there are many ways to indicate positive and negative views in natural language.

The contributions of this paper are:
1. Development of a general approach towards sentiment analysis for any product or service based on online reviews.
2. Collection of 14121 reviews of 184 hospitals from mouthshut.com.
3. Establishment of a relationship between the patients' reviews of a hospital and the overall recommendation for the hospital.
4. Classification of a given review as positive, negative, or neutral.
5. Provision of an overall rating (recommendation percentage) for each hospital so that customers/patients can draw a comparison amongst them.

The task of sentiment analysis of hospital reviews mainly includes the following steps in sequence: preprocessing [13], feature extraction followed by the selection of the relevant features [14], classification [15], and finally the result analysis. Preprocessing includes removing any errors present in the review, which is very important for proper and accurate classification of the text. Feature extraction identifies the features that are essential, and these features are then stored in the feature array. Different types of feature extraction methodologies exist, and in this paper the authors have used machine learning-based feature extraction methods. Finally, the analysis involves the overall calculation of the percentage for the hospital. This percentage is based on the polarity of all the reviews of the particular hospital.
From this study, the authors found that the sentiment analysis of online reviews can provide reliable information on hospitals. This information can be very useful for patients, as it helps them compare and choose what is best for them. The analysis result also helps hospital administration improve the current services according to the needs of the patients. Sentiment analysis is a very useful tool for gaining insight into patients' opinions. This can also be generalized to consumers. A large number of companies, including those in the health sector, are using this tool to provide a better service to their customers. After analysis of the reviews, we can classify the consumer's emotions about the product or service in question [16,17,18].

The paper is organized in six sections. The next section discusses some works already done in the field of sentiment analysis along with other works related to the analysis of hospital reviews. Following this, Section 3 describes the collection and structuring of the dataset of hospitals and their reviews. Section 4 discusses the methodology of the work. The results are presented in Section 5. The final section discusses the practical implications of the work, summarizes the paper, and outlines future work.

2. Related Work

Sentiment analysis is popularly used for providing ratings to movies, hotels, and restaurants based on the reviews provided by customers on different review websites, blogs, online groups, etc. For example, many approaches use a tree kernel-based model to classify the polarity of tweets into three classes, namely positive, negative, and neutral. This can also be used for gauging public sentiment regarding any particular event, news, etc. The random forest method is often used to attain very high accuracy in predicting the overall reception and popularity of books from their reviews. In addition, a few studies have used sentiment analysis in the field of healthcare. For example, the authors of [8] demonstrated the use of sentiment analysis for analyzing a person's online posts, tweets, remarks, etc. regarding their experiences in a hospital or any healthcare-related institution. This new approach might be a better alternative to traditional methods, such as surveys, that were previously used to measure customer satisfaction and feedback.

From this survey, we appreciated the importance of our proposed work, which is based on the idea of using sentiment analysis on hospital reviews to collect reliable and dynamic data on the working conditions of hospitals.

The gathered results will be very beneficial for the general public, who are facing a serious lack of reliable and dynamic information on the condition of hospitals. The current pandemic situation has amplified the problems faced by the public. Reliable information will help both COVID and non-COVID patients. It will also reduce the stress on our healthcare system, which might face problems because of false information among the public. This work opens up a new dimension of possibilities, where sentiment analysis can not only be used as feedback by the healthcare system but also provide the public with reliable and correct information on hospitals and other such infrastructure. The dynamic nature of the information ensures future reliability and also makes our results more robust.

3. Data collection and preprocessing

Data collection is done by web scraping, a method used to fetch large volumes of data from multiple websites. Web scraping automates the process of data gathering, and data can be gathered in multiple formats. We extracted the hospital review data from India's largest review website, mouthshut.com. Python was used for data extraction as it has efficient tools for web scraping.

Web scraping is done using the BeautifulSoup library and the Requests library of Python. We extracted the review data from the website and stored it in a JSON file. Thus, the process of data collection can be summarized in three main steps: (1) parsing of HTML web pages, (2) extraction of required data, and (3) storing of data. A total of 14121 reviews of 184 hospitals are collected from a well-known and established review site. After gathering the data, we need to clean and reorganize it, as the collected data is not well structured and needs some processing before it becomes ready to use. We organized the cleaned raw data into key-value pairs, where the key attribute is the hospital's name and the value attribute holds the reviews of that hospital.

In this study, preprocessing of the data is done by (1) removing HTML tags and URLs and (2) correcting spelling errors. Reviews may contain bold, underlined, or italic words to emphasize the meaning of some words or sentences. Different HTML tags are used for this purpose: <b> for bold, <u> for underline, and <i> for italics. However, while analyzing the reviews, such emphasis is of no use, as it does not provide any useful information about the sentiment; thus, these tags are removed. Similarly, punctuation marks, special characters, white spaces, stop words, etc. are also removed during preprocessing. In addition, there may be some spelling errors in a review, which can cause deviation from the correct analysis. TextBlob is a Python library for processing textual data. We used the correct() method of the TextBlob library to attempt spelling correction. After removing all the unnecessary HTML tags and correcting the spelling mistakes, the whole review is separated into two parts: the review title and the review body.
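The cleaning and key-value organization described in this section can be illustrated with a minimal sketch. It uses only the Python standard library, so the tag removal is a regular-expression stand-in rather than BeautifulSoup, and spelling correction, stop-word removal, and punctuation removal are left out. The function names clean_review and group_by_hospital and the title/body record layout are assumptions made for illustration, not the authors' code.

```python
import html
import json
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")                 # any HTML tag, e.g. <b> or </i>
URL_RE = re.compile(r"(?:https?://|www\.)\S+")  # bare URLs in the review text


def clean_review(raw):
    """Strip HTML tags and URLs, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    return " ".join(text.split())


def group_by_hospital(records):
    """Build the key-value structure: hospital name -> list of cleaned reviews.

    `records` is an iterable of (hospital, title, body) tuples; this layout
    is hypothetical and only mirrors the title/body split described above.
    """
    grouped = defaultdict(list)
    for hospital, title, body in records:
        grouped[hospital].append(
            {"title": clean_review(title), "body": clean_review(body)}
        )
    return dict(grouped)


def to_json(grouped):
    """Serialize the grouped reviews as a JSON document."""
    return json.dumps(grouped, indent=2, ensure_ascii=False)
```

Calling to_json(group_by_hospital(records)) on scraped records would give one JSON object whose keys are hospital names, which matches the storage format described above.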
The review title gives the gist of the whole review and hence can be used to obtain the sentiment of the whole review. So, to analyze the sentiment of a review, we first find the polarity of the review title, and if and only if that polarity is neutral do we process the review body to find the polarity.

4. Methodology

Sentiment analysis also involves understanding the different emotions conveyed by the patient. These emotions can be any of the following feelings: anger, gloom, joy, confidence, shock, pity, panic, and expectation. In the following section, we explain in detail the algorithm used for polarity calculation and for assigning the positive, negative, and neutral ratings to each hospital.

The coding has been done in Python using the NLTK (Natural Language Toolkit) module, which is used for natural language processing. Figure 1 represents the main processing that is applied to the preprocessed data. It involves the polarity calculation to find the sentiment of the review. The output is organized and stored in a database that is analyzed for the final results and conclusion.

Fig. 1. The proposed methodology of the work

4.1 Steps used to conduct the review analysis

The proposed algorithm is represented by the following steps:

Step 1: Break the review into sentences, i.e., tokenization.
In our methodology, sentiment analysis is done on each sentence of the review, and therefore we need to break the review into sentences. We apply the Punkt sentence tokenizer to break each review into sentences as follows:

nltk.tokenize.punkt.PunktSentenceTokenizer()

Step 2: Analyzing the negation sentences
For sentiment analysis, the algorithm must know the structure of the sentence. For example, the sentence can be complex and may include comparison, contradiction, negation, or irony. Negation can be of a localized nature, it can be long-range, or it can be the negation of the subject. To analyze these sentences, we mark the words whose meaning has changed due to the tone of the sentence. For example, if the sentence is "The hospital is not good", SentiWordNet will give it a positive polarity due to the presence of "good" in the sentence. After marking the negation, the sentence becomes "The hospital is not good_NEG", and SentiWordNet will now give it a negative polarity due to the presence of "good_NEG". The interpretation of the word changes for SentiWordNet once it is marked as negated. Negation analysis is done as follows:

nltk.sentiment.util.mark_negation(sentence)

For example, consider the following original sentences and their negation analysis.

Original sentence ⇒ Sentence after negation analysis ("polarity")

not good ⇒ not good_NEG ("negative")
does not look very good ⇒ does not look_NEG very_NEG good_NEG ("negative")
no one thinks that it's good ⇒ no one_NEG thinks_NEG that_NEG it's_NEG good_NEG ("negative")

Step 3: Tagging words by their syntactical nature
Part-of-speech (POS) tagging involves tagging each word in a corpus with a congruous part-of-speech tag based on its grammatical definition. We attached a POS tag, such as adjective, noun, adverb, or verb, to each word in the sentence. Noun phrases generally correspond to product features, adjectives refer to opinions, and adverbs are generally used as modifiers to represent the degree of expressiveness of opinions.

Step 4: Lemmatization
Lemmatization is a text normalization technique in natural language processing that is used to prepare text and words for further processing. It uses a vocabulary and morphological analysis of words to remove inflectional endings only and to return the base form of a word, which is called a lemma. For example:
"playing" -> Lemmatization -> "play"
"plays" -> Lemmatization -> "play"
"played" -> Lemmatization -> "play"

Step 5: Repeat steps 6-7 until each sentence of the review is iterated.

Step 6: Maintain three counters: the first counter stores the positive reviews count, the second counter stores the negative reviews count, and the third counter stores the neutral reviews count.

Step 7: Polarity calculation using SentiWordNet
The sentiment analyzer uses words, their meanings, alternative words, the polarity of each word, and the association intensity level of each word in a sentence. The polarity of a sentence is usually based on the meaning of its words. However, negation (for negative sentences only) changes the meaning of the words and reverses the polarity of the sentence. Now check the orientation of the review title using SentiWordNet. If the orientation is positive or negative, update the respective counter. If the orientation is neutral, check the orientation of the review body and update the respective counter.

Step 8: Calculating the final results
The polarity is calculated for every review, and a final polarity value for each hospital is computed by summing the polarity values of all the reviews of that hospital. Then, using this data, we calculate the recommendation percentage of every hospital according to its reviews. The recommendation percentage is calculated by dividing the number of positive reviews by the total number of reviews and then multiplying by 100.
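The control flow of negation marking, the title-first polarity check, the three counters, and the recommendation percentage can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' implementation: the toy LEXICON stands in for SentiWordNet scores, the local mark_negation is a simplified imitation of the scoping behavior shown in the negation examples (it is not NLTK's function), and POS tagging and lemmatization are omitted.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; the paper scores words with SentiWordNet instead.
LEXICON = {"good": 1, "clean": 1, "helpful": 1, "bad": -1, "rude": -1, "dirty": -1}
NEGATIONS = {"not", "no", "never", "none", "nothing"}
CLAUSE_END = re.compile(r"^[.:;!?]$")


def mark_negation(tokens):
    """Append _NEG to each token after a negation word, up to clause punctuation."""
    marked, in_scope = [], False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            in_scope = False
            marked.append(tok)
        elif in_scope:
            marked.append(tok + "_NEG")
        else:
            marked.append(tok)
            if tok.lower() in NEGATIONS or tok.lower().endswith("n't"):
                in_scope = True
    return marked


def polarity(text):
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    tokens = re.findall(r"[A-Za-z']+|[.:;!?]", text.lower())
    score = 0
    for tok in mark_negation(tokens):
        negated = tok.endswith("_NEG")
        word = tok[:-4] if negated else tok
        value = LEXICON.get(word, 0)
        score += -value if negated else value  # a negated word flips its sign
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def review_polarity(title, body):
    """Use the title's polarity; fall back to the body only if the title is neutral."""
    title_polarity = polarity(title)
    return title_polarity if title_polarity != "neutral" else polarity(body)


def recommendation_percentage(reviews):
    """Count review polarities and return (counts, positive share in percent)."""
    counts = Counter(review_polarity(title, body) for title, body in reviews)
    total = sum(counts.values())
    percentage = 100.0 * counts["positive"] / total if total else 0.0
    return counts, percentage
```

Because the title decides whenever it is not neutral, a review titled "Good hospital" counts as positive in this sketch even if its body contains complaints, which mirrors the rule stated in Step 7.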
5. Results analysis

Here we evaluate and analyze the results. The results are presented in Table 1 in terms of the number of positive ratings, the number of negative ratings, the number of neutral ratings, and the recommendation percentage. The number of positive ratings is the number of good ratings given by users in their reviews, and the number of negative ratings is the number of poor ratings given by users in their reviews. Finally, the recommendation percentage of each hospital is reported, based on the polarity of all of its reviews. The recommendation percentage is calculated by dividing the number of positive reviews by the total number of reviews and then multiplying by 100.

As discussed, we collected 14121 reviews of 184 hospitals from mouthshut.com, a well-known and established review site. Due to space constraints, it is not feasible to present the result values of all 184 hospitals. Therefore, over all the reviews of the 184 hospitals, we report the overall positive, negative, and neutral ratings, which are 11427, 2545, and 149 respectively. To show the results for individual hospitals, we selected a few hospitals out of the total of 184. To select them, we divided the hospitals into three categories and present the results only for hospitals that fall in these categories. Hospitals are categorized based on their recommendation percentages as follows:

Category 1: Hospitals with a recommendation percentage between 80 and 100. This category represents the 'Best' class, which includes the most recommended hospitals.
Category 2: Hospitals with a recommendation percentage between 70 and 79. This category represents the 'Average' class.
Category 3: Hospitals with a recommendation percentage between 0 and 69. This category represents the 'Poor' class, which includes the least recommended hospitals.

The hospitals are divided into three categories according to the scatter plot shown in Figure 2, which represents the overall distribution of the hospitals over the recommendation percentage. The trend line can further be used as a benchmark against which other hospitals' recommendation percentages can be tested. This benchmark also allows us to further categorize hospitals based on their relative recommendation percentage, which can also be used to generate a relative ranking of the hospitals in the dataset. Since the ranking is general, it can also be applied to newly included data to expand the scope of the dataset and increase the reliability of the methodology.

Fig. 2. Scatter plot indicating the overall distribution of hospitals over the recommendation percentage

Table 1. Recommendation percentage of the selected hospitals

S. No. | Hospital Name | Number of +ve ratings | Number of -ve ratings | Number of neutral ratings | Recommendation %

Class 1: Hospitals with the highest recommendation percentage
1 | C T Hospital - Thergaon - Pune | 11 | 1 | 0 | 91.66
2 | Mody General Hospital - Kadodara - Surat | 11 | 0 | 0 | 100.00
3 | Alpha Hospital & Research Centre - Mela Anupannady - Madurai | 76 | 8 | 0 | 90.47
4 | Jayam Hospital - Chokikulam - Madurai | 52 | 6 | 0 | 89.65

Class 2: Hospitals with an average recommendation percentage
5 | Baava Medicals - Madurai Main - Madurai | 57 | 13 | 1 | 80.28
6 | Shyama Heart Care Centre - Chargawan - Gorakhpur | 76 | 22 | 0 | 77.55
7 | Shukla Hospital - Betiahata - Gorakhpur | 58 | 19 | 1 | 74.35

Class 3: Hospitals with the least recommendation percentage
8 | Padaav: Speciality Ayurvedic Treatment Centre - Dehradun | 1 | 4 | 0 | 20.00
9 | Fortis Hospital - Kangra | 2 | 3 | 0 | 40.00
10 | Columbia Asia Hospital - Malleshwaran - Bangalore | 91 | 104 | 5 | 45.5
11 | Park Hospital - Gurgaon | 18 | 21 | 0 | 46.15
12 | Iswarya Fertility Test Tube Baby and Research Centre - Coimbatore | 11 | 11 | 0 | 50.00

The results are also represented in the form of a pie chart in Figure 3. The pie chart is used to represent the sentiment distribution of the reviews. The percentage of positive reviews is also in line with the average recommendation percentage of the hospitals.
The chart indicates that neutral reviews are only a very small fraction, which coincides with the general nature of opinions and reviews in the online community and with commonly observed patterns in the distribution of sentiment in opinions of the general populace. Both the scatter plot (for the distribution of hospitals over the recommendation percentage) and the pie chart (for the distribution of review sentiment) validated the integrity and reliability of the dataset used in this paper to demonstrate the methodology for sentiment analysis of hospital reviews by patients.

Fig. 3. Pie chart indicating the overall distribution of reviews by sentiment

6. Conclusion & Future Work

Sentiment analysis is an ever-expanding field, and it needs a lot of work to mature as a domain of study. This paper has emphasized the importance of review analysis, which has a multitude of benefits for both patients and the healthcare sector itself. It can help patients by providing relevant and reliable data for both COVID and non-COVID patients. It also helps in revealing the weak links of the overall system that need immediate improvement. These improvements will not only help patients but also allow hospitals to increase their operational ability in stressed pandemic situations. The general method involves four main stages, namely data collection, preprocessing, sentiment analysis of reviews, and finally analysis for the recommendation rating of the hospitals. The data collection is done through web scraping using Python, and preprocessing involves the removal of non-essential components of the collected data. Then, using sentiment analysis, we find the sentiment of each review, which can be positive, negative, or neutral. In the final step, we calculate the recommendation rating of the hospital by finding the percentage of positive ratings out of the total number of reviews.

Fig. 2 gives us a benchmark to compare the hospitals based on the patients' perceived quality of service and personal experience. This benchmark can also be used to compare any other hospital. The reliability of the ratings is ensured by the analysis of a large dataset and its dynamic nature. This result can also provide hospital administration with patients' valuable feedback, which will allow them to improve the quality of service at the hospital.

The approach discussed in this paper also has implications for other sectors, especially the service sector, where the quality of services and its perception is the most important metric for the company involved. This paper established a strong relationship between online reviews and the overall recommendation percentage, and showed how the analysis of reviews has applications for both the consumers and the company providing the services. For the provider company, the analyzed data can provide important insight for upgrading the standard of the facility. The approach may be further extended to analyze the different aspects of the service or product in question. This may be achieved with a slight modification of the methodology and the same dataset, provided that the dataset has diverse reviews covering the different aspects of the product or service. This can give us more detailed insight into different aspects or attributes of the product.

Considering the healthcare industry as an example, we can modify the approach and try to find aspect-based ratings for different attributes of the hospital. Different aspects related to hospitals, such as infrastructure, food quality, economic expense, etc., can also be categorized and rated. One drawback is that this method will need a more diverse dataset, or an extra dataset for augmentation.

References

[1] L. Li, Z. Yang, Z. Dang, C. Meng, J. Huang, H. Meng, D. Wang, G. Chen, J. Zhang, H. Peng, Y. Shao, Propagation analysis and prediction of the COVID-19. Infectious Disease Modelling, Vol. 5, pp. 282–292 (2020).

[2] D. Gursoy, C. Chi, Effects of COVID-19 pandemic on hospitality industry: a review of the current situations and a research agenda. Journal of Hospitality Marketing & Management, 29:5, pp. 527–529 (2020).

[3] A. Hoisington, "5 insights about how the COVID-19 pandemic will affect hotels" (2020), available at: https://www.hotelmanagement.net/own/roundup-5-insights-about-how-COVID-19-pandemic-will-affect-hotels

[4] A. W. Bartik, M. Bertrand, Z. B. Cullen, E. L. Glaeser, M. Luca, C. T. Stanton, How are small businesses adjusting to COVID-19? Early evidence from a survey (No. w26989). National Bureau of Economic Research (2020).

[5] D. Kavadi, R. Patan, M. Ramachandran, A. Gandomi, Partial derivative Nonlinear Global Pandemic Machine Learning prediction of COVID 19. In: Chaos, Solitons & Fractals. An interdisciplinary journal of nonlinear science; ISSN: 0960-0779 (2020).

[6] S. F. Ardabili, A. Mosavi, P. Ghamisi, F. Ferdinand, A. R. Varkonyi-Koczy, U. Reuter, T. Rabczuk, P. M. Atkinson, COVID-19 Outbreak Prediction with Machine Learning. This research is supported within the project of "Support of research and development activities of the J. Selye University in the field of Digital Slovakia and creative industry" of the Research and Innovation Operational Programme (ITMS code: NFP313010T504) co-funded by the European Regional Development Fund (2020).

[7] H. Kumar, B. Harish, H. Darshan, Sentiment Analysis on IMDb Movie Reviews Using Hybrid Feature Extraction Method. In: International Journal of Interactive Multimedia and Artificial Intelligence, InPress(InPress):1 (2018).

[8] A. Ortigosa, J. Martín, R. Carro, Sentiment analysis in Facebook and its e-learning application. Computers in Human Behavior, Vol. 31, pp. 527–541 (2014).

[9] P. Bhatia, R. Nath, Using Sentiment Analysis In Patient Satisfaction: A Survey. In: Advances in Mathematics: Scientific Journal 9, no. 6, pp. 3803–3812. ISSN: 1857-8365 (printed); 1857-8438 (electronic) (2020).

[10] B. Pang, L. Lee, A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics, pp. 271 (2004).

[11] A. Tripathy, A. Agarwal, S. Rath, Classification of Sentimental Reviews Using Machine Learning Techniques. Procedia Computer Science, Vol. 57, pp. 821–829 (2015).

[12] Hussein DME-DM, A survey on sentiment analysis challenges. J King Saud Univ Eng Sci 30(4): 330–338.

[13] I. Latha, G. Varma, A. Govardhan, Preprocessing the informal text for efficient sentiment analysis. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) 1, no. 2, pp. 58–61 (2012).

[14] A. Manek, P. Shenoy, M. Mohan, K. Venugopal, Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World Wide Web, Vol. 20, no. 2, pp. 135–154 (2017).

[15] A. Kennedy, D. Inkpen, Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, Vol. 22, no. 2, pp. 110–125 (2006).

[16] C. Smithikrai, Effectiveness of teaching with movies to promote positive characteristics and behaviors. Procedia - Social and Behavioral Sciences, Vol. 217, pp. 522–530 (2016).

[17] T. Shrutiand, M. Choudhary, Feature-Based Opinion Mining on Movie Review. International Journal of Advanced Engineering Research and Science, Vol. 3, pp. 77–81 (2016).

[18] M. Hu, B. Liu, Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 168–177 (2004).