Predicting Elderly Patient Behaviour in Rural Healthcare Using Machine Learning Prince Appiah1, Thierry Oscar Edoh 2 [0000-0002-7390-3396] and Jules Degila3[0000-0003-4688-9178] 1 University of Education Winneba-Kumasi, Ghana Princeappiah35@gmail.com 2 RFW-Universität Bonn, Bonn, Germany oscar.edoh@gmail.com 3 Institute of Mathematics & Physical Science, Porto-Novo, Benin Jules.degila@imsp-uac.org Abstract. The digitization of modern health care data in a rural community has produced a vast amount of patient data stored in health care record systems. Together with the rise of computing power this data could produce effective insight through advanced analysis of this data and include it in medical applications for use in daily operations. This is the case in which structured, semi-structured and unstructured dataset from emergency room admissions is used for machine learning, in order to develop models that predict the possibility of an elderly patient returning to an emergency room within 96 hours. Logistic regression was the selected algorithm since it commonly used in the healthcare data set. The results from the model had a recall of 73% and a precision of 78%. This paper discusses the implementation of such a model in daily operations with a new approach to cost benefits. In other instances, the study is a proof of the concept of predictive modeling in a health care context in rural communities. Keywords: Machine Learning, Rural healthcare, unstructured dataset, Logistic regression 1. Introduction The invention of the computer and even long before that, people are trying to predict or interpret different outcomes from data. This also applies to the health care sector notwithstanding this sector is extremely sensitive to errors because of varied reasons [1]. Rural healthcare centers are exponentially increasing the amount of data. Which has become opportune for machine learning [2]. Data collected in a rural hospital containing vitals, lab results, metadata, etc. can be combined into individual records for every patient. Through machine learning techniques we are able to make use of all this data. Using these algorithms we can predict various outcomes and form recommendations to support medical professionals or do predictions regarding the patient’s health. For example, algorithms can help assign medicines and treatments to patients, they can support medical professionals in making diagnoses and present new origins of certain diseases, the possibilities are endless. This knowledge can also be used to act proactively on different issues. It is, for example, possible to set computerized alerts when specific thresholds are exceeded to prevent unwanted consequences. However, despite the volume of data stored, the potential of using the data for strategic and operational decisions through the means of data analysis is, especially in rural health care, rarely acknowledge. It is obvious that, with this aforementioned data of patients, rural healthcare centers are not using data mining techniques to predict the behavior of elderly patients. The purpose of this study is to apply machine learning to rural health care data, in order to predict elderly patient behavior, providing a basis for medical decisions or risk stratification. The study focus on the prominent property of emergency patients. The fact that ten percent of elderly patients sent from emergency room return within 96 hours. This will help elderly patents to be identified before they return, decisions concerning their care could be taken in Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IREHI-2019: International Conference on rural and elderly health Informatics, Dakar, Sénégal, December 04-06, 2019 order to reduce the risk of them returning. And also, more importantly, improve patient safety and cut costs, and gain new knowledge about what is causing the elderly patient to return. 2. Problem Statement Ever since the introduction of information technology in rural health care, the amount of data generated has increased exponentially. However, despite the volume of data stored, the potential of using the data for strategic and operational decisions through the means of data analysis is, especially in rural health care, rarely acknowledge. It is obvious that, with this aforementioned data of patients, rural healthcare centers are not using data mining techniques to predict the behavior of elderly patients. 3. Related Work Machine learning technique Logistic regression is often used when predicting outcomes especially in combination with clinical trials. For instance, Baan [3] uses logistic regression to identify undiagnosed diabetes. Also [4] used a random forest model together with data collected from publicly available health register. Using only readily available health record data in electronic health registers, Razavian et al. [5] present a retrospective study predicting type 2 diabetes, [6] makes a prediction of risk of hospitalization or death; and Kontio et al. [7] predict patient acuity. These studies are good examples of performing data analysis in a health care setting. But using machine learning to perform data analysis in rural communities is very limited. Furthermore, when it comes to the prediction of elderly patient behavior involving patient’s no study has been done concerning rural areas. An example, however, [8]who predicts patient’s decisions about end-of-life care, by looking at the characteristics of patient’s physicians. This is important in relation to this study, as the decision to revisit an emergency room is indeed taken by the patient. According to Hillestad [9], enhanced health data record systems could facilitate predictive modeling and thus decrease cost due to the insights of these predictions. However, these cost benefits are not assessed in the other studies as well as the setting in rural communities, mentioned in this section. This study, however, does indeed try to approach the cost benefits of implementing predictive models in daily decision making in rural healthcare. 4. Background 4.1 Electronic Health Record’s (HER’s) HERs is a real-time digital version of a patient’s clinical information that makes it possible to provide dynamic clinical patient information captured in structured records which can be consulted at every moment by medical professionals [10]. Electronic health records have introduced many merits for handling modern healthcare-related data. Its first merit is that healthcare professionals have improved access to the entire medical history of a patient. Electronic health records enable faster data retrieval and facilitate reporting of key healthcare quality indicators to the organizations, and also improve public health surveillance by immediate reporting of disease outbreaks. The electronic health records and the internet together help provide access to millions of health-related medical information critical for patient life [2]. Many rural hospitals and clinics are now using EHRs to keep records of their patents. 4.2 Machine Learning According to Baan et. al., [3], “machine learning is about making computers modify or adapt their actions so that these actions get more accurate, where accuracy is measured by how well the chosen actions reflect the correct ones”. There two different types of learning supervised and unsupervised learning. Supervised learning is the most common approach to machine learning. The data provided to train the machine learning algorithm is well labeled with the correct outcome. Based on that information, the algorithm generalizes to respond correctly to all possible inputs. Examples of supervised models are k-nearest neighbors, regression model, Bayesian network and support vector machine. Unsupervised learning is a contrast to the supervised learning method. The algorithm is trained on data that is not labeled, due to that we don’t know the outcome of. It mainly uses for finding patterns and detecting relevant information in data to form clusters [10]. An example of unsupervised learning is adaptive resonance theory and self- organizing map. This study used a supervised machine learning model known as logistic regression. 4.3 Logistic Regression Logistic regression is a generalized linear model, sharing some similarities with linear regression, with the exception of the predicted value is binary (0, 1). In [11] stated that “logistic regression uses a logistic function to map the outcome of a linear model to 0 or 1, hence the name Logistic regression “. Fig. 1. Logistic Regression Model The formula (1) for a simple linear regression model, (tries to minimize error in linear function) 𝑦(𝑥) = 𝛽 + 𝛽 𝑥 (1) Where y is the response variable, β0 is the intercept, β1 is the coefficient and x is the input variable. In the case of a logistic regression model, the linear regression model is mapped to the logistic function from; 𝑦(𝑥) = ( ) (2) But when more features (variables) are introduced the from change to; 𝑦(𝑥) = ( ) (3) Where w is the vector weights applied to each variable contained in the vector x. It can be trained on one or many independent continuous features. 5. Methods and Data Source 5.1 Study Data The data was collected from a rural hospital, from the analysis unit. Emergency admissions data was selected as the cohort, with the number of visits counted and previous diagnoses. The data collected were stored in a table and prepared by dichotomizing the categorical features, added as columns and assigned a 1 or a 0. The data was regarded as possible to have an effect on whether a patient would return or not was extracted, which resulted in Table 1. 5.2 Study Design This study was in the form of an exploratory case study described by [12]. The data used is archival and quantitative. It is done in order to generate new ideas and insights, despite its exploratory. The case outline follows 3 steps. These are; (1) data collection and preparation, (2) model training and testing with adjustable hyper-parameters and (3) result of the study was evaluated including implementation analysis and conclusion. 5.3 Study Methods The study choice of data mining technique for the study was implemented using Jupiter Notebook. The healthcare data collected from the rural hospital was saved to .csv extension for easy loading into Jupiter Notebook. SciKit Learn was imported to deployed Logistic Regression using Python codes. All the required analysis was done using Jupiter Notebook. 6. Implementation The first stage is pre-processing the data. Selection was made during the pre-processing of the data to get only patients above the age of 60. Data in this phase partition into training and testing. In the next step, we applied the logistic regression model on the training dataset in order to build and train the model. With Jupiter Notebook, SciKit Learn was used for the analysis which contains Logistic regression. The training data set consists of 9 features, shown in Table 1. Table 1. Feature Description Feature Type Range Age Continuous N ≥ 60 Sex Categorical (f /m) Emergency time Continuous …….. Emergency cause Categorical -------- Mode of arrival Categorical -------- Admission hour Categorical (0,23) Admission day Categorical (1,7) Admission month Categorical (1,12) Next 96 hours Target (Binary) (0,1) The logistic regression model has been optimized on Area under the curve (AUC) over a hyper-parameters using stratified 5-fold cross-validation. Table 2. Model Settings Summary Classifier Dataset Hyper-parameter Set (Threshold) Optimization Logistic Basic λ = (10 to the power of) 0.1, 0.2, 0.3, Area Under Regression 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 Curve 7. Experiment Results This section describes the results discovered during the study. The accuracy of the model performance was high, 76%. This was expected since more elderly patients are likely to return within 96 hours. Since accuracy thus not convey much information on model performance, we look interested in precision and recall. The recall of the model was 73%, this interpreted that more than 50% of the elderly patients returning are identified within 96 hours. Precision range from 50% to 80%. A precision of 78% means that of the elderly patients predicted to return, 50% of them actually did. The precision of 50% was considered high as the data collected is very unbalanced. Regardless of the confusion matrix, a 50% precision means that a predicted positive outcome will be wrong 50% of the time. Figure 2, shows the scores of the calculated by extracting a small sub-set dataset from the training data. Thus they are just an indication of how the scores are distributed across the different hyper-parameters. Confusion Matrix: Threshold 0.5 Actual Actual Positive Negative Predicted 64 (TP) 18 (FP) Positive Predicted 23 (FN) 34(TN ) Negative Where: True Positive (TP), True Negative (TN) False Positive (FP), False Negative (FN) Calculation of precision and Recall: 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = = = 0.78 (4) 𝑅𝑒𝑐𝑎𝑙𝑙 = = = 0.73 (5) Fig. 2. ROC Curve for Different Hyper-parameters. From the ROC curve, at a threshold of 1.0, we classify no elderly patients is returning within 96 hours and hence have a recall and precision of 0.0. As the threshold decreases, the recall increases because we identify more elderly patients returning to the emergency unit. However, as our recall increases, our precision decreases because, in addition to increasing the true positives, we increase the false positives. At a threshold of 0.0, our recall is perfect, we could find all patients would return within 96 hours. 8. Conclusion and Future Work The study found that age affects the result positively, that is patients over 70 years are more likely is it for him/her to return within 96 hours. An intuitive notation is that older patients with heart diseases and sickle cell disease need to be treated with more caution. From the study, the following features have a good impact on elderly patient return, i.e., Admission last 12 months, the number of previous primary diagnoses and emergency cause. Emergence time has the smallest impact on all the features. After predicting that an elderly patient will return to the hospital within 96 hours, an action would be taken to prevent that from happening. This will lead to an extra, unknown cost above the normal visit cost. The study opens up some different paths in relation to future work. Firstly other behavior could be studied in rural communities EHR. Secondly, deploy different algorithms or machine learning techniques to come out with the best one instead of using only one model for predictions. Lastly more patient’s features could be used for model training and testing. References [1] A. A. Fuss, “The Prevention Of Depression : A Machine Learning Approach The Prevention Of Depression : A Machine Learning Approach,” 2019. [2] J. Kallio and M. Juhola, “Support Vector Machine and Deep Learning in Medical Application,” 2017. [3] C. A. Baan et al., “Performance of a predictive model to identify undiagnosed diabetes in a health care setting,” Diabetes Care, vol. 22, no. 2, pp. 213–219, 1999. [4] M. Khalilia, S. Chakraborty, and M. Popescu, “Predicting disease risks from highly imbalanced data using random forest,” BMC Med. Inform. Decis. Mak., vol. 11, no. 1, 2017. [5] N. Razavian, S. Blecker, A. M. Schmidt, A. Smith-Mclallen, S. Nigam, and D. Sontag, “Population-level prediction of type 2 diabetes from claims data and analysis of risk factors,” Big Data, vol. 3, no. 4, pp. 277– 287, 2017. [6] D. Z. Louis et al., “Predicting risk of hospitalisation or death: A retrospective population-based analysis,” BMJ Open, vol. 4, no. 9, pp. 1–8, 2017. [7] E. Kontio et al., “Predicting patient acuity from electronic patient records,” J. Biomed. Inform., vol. 51, pp. 35–40, 2017. [8] eljko Ivezic, “Statistics: A Practical Python Guide for the Analysis of Survey Data,” 2019. [9] R. Hillestad et al., “Can electronic medical record systems transform health care? Potential health benefits, savings, and costs,” Health Aff., vol. 24, no. 5, pp. 1103–1117, 2017. [10] D. Boonen, “The impact of bias on the predictive value of EHR driven machine learning models Dries Boonen,” 2019. [11] M. W. Attia, T. Zaoutis, J. D. Klein, and F. A. Meier, “Performance of a predictive model for streptococcal pharyngitis in children,” Arch. Pediatr. Adolesc. Med., vol. 155, no. 6, pp. 687–691, 2018. [12] P. Runeson and M. Höst, “Tutorial: Case studies in software engineering,” in Lecture Notes in Business Information Processing, 2018.