=Paper=
{{Paper
|id=Vol-2216/healthRecSys18_paper_6
|storemode=property
|title=A Hybrid Health Journey Recommender System Using Electronic Medical Records
|pdfUrl=https://ceur-ws.org/Vol-2216/healthRecSys18_paper_6.pdf
|volume=Vol-2216
|authors=Soheil Jamshidi,Mohamad Ali Torkamani,Jynelle Mellen,Malhar Jhaveri,Penny Pan,James Chung,Hakan Kardes
|dblpUrl=https://dblp.org/rec/conf/recsys/JamshidiTMJPCK18
}}
==A Hybrid Health Journey Recommender System Using Electronic Medical Records==
Soheil Jamshidi, Ali Torkamani, Jynelle Mellen, Malhar Jhaveri, Penny Pan, James Chung, Hakan Kardes

Cambia Health Solutions, Portland, OR, 97201

{firstName}.{lastName}@cambiahealth.com

===ABSTRACT===
We present a recommender system aimed at improving the healthcare experience of consumers. Our model provides actionable insights to cohorts or individuals, based on their collective and personal health-related data. The actionable insights are delivered through digital interventions to help prevent adverse events for the consumer. By proposing timely and personalized suggestions, we will improve consumer health outcomes and prevent complications, which would also result in cost savings. Our recommendation system employs an ensembling technique, where at its core, we have a Bayesian network that uses administrative claims data but could be extended to use Electronic Health Records (EHR) data for learning the structure of the interwoven health graph (conditions, medications, procedures, and more). This method allows for predicting the probability of various outcomes conditioned on the consumers’ evidential health data. We also couple our ensemble method with a shallow random forest model to further refine the personalized recommendations after receiving the consumer’s feedback. The experimental results show that our system significantly improves the precision-recall metrics of several intervention targets compared to a random baseline.

===CCS CONCEPTS===
• Applied computing → Consumer health; Health care information systems; • Information systems → Recommender systems;

===KEYWORDS===
Health Journey; Health Recommender System; Hybrid Recommender System

ACM Reference Format: Soheil Jamshidi, Ali Torkamani, Jynelle Mellen, Malhar Jhaveri, Penny Pan, James Chung, Hakan Kardes. 2018. A Hybrid Health Journey Recommender System using Electronic Medical Records. In Proceedings of the Third International Workshop on Health Recommender Systems co-located with Twelfth ACM Conference on Recommender Systems (HealthRecSys’18), Vancouver, BC, Canada, October 6, 2018, 6 pages.

HealthRecSys’18, October 6, 2018, Vancouver, BC, Canada. © 2018 Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

===1 INTRODUCTION===
Availability of clinical data in the form of Electronic Health Records (EHR) has dramatically increased over the past decade. In addition to EHRs, there are large volumes of administrative payment data that offer insights to researchers. However, administrative data and EHRs are scattered among numerous entities and sources, such as health plans, laboratories, providers, hospitals, chart notes, and more. In addition to the disparate sources, the breadth, depth, linkage, and scale of the data lead to further complexity. For end users (consumers), this can make interpretation difficult due to information overload or a lack of information, as well as term inconsistency. The increasing need to leverage health records led to the emergence of Health Recommender Systems (HRS) [4, 13]. Such recommender systems can target medical experts or patients and play a vital role in improving an individual’s health by providing insightful recommendations. These systems are primarily created to handle ambiguous diagnosis situations that arise from the varied decisions of providers [13] for certain diseases. In that case, recommenders created by medical specialists provide insight for a tailored diagnosis procedure for patients. To this aim, Machine Learning (ML) methods act as enablers. A large number of studies have applied a wide range of machine learning techniques, such as decision trees, multi-layer perceptrons (MLP), and support vector machines (SVM), to a variety of diseases, including dementia, kidney disease, and heart disease, to name a few [1–3, 10, 14–16, 18].

In this paper, we present a consumer-focused recommender system that will give individuals suggestions based on their collective health-related data (EHR and claims). To accomplish this, we leverage an ensemble algorithm, where a Bayesian network (BN) is combined with a random forest (RF). Probabilistic Graphical Models such as BNs are known to tolerate data uncertainty (noise, ambiguity, and missing values) in a consistent and mathematically correct way [25] in the inference phase, and the RF facilitates refining the personalized recommendations after receiving the consumer’s feedback. Such a system learns the conditional probability table using a maximum entropy or belief propagation approach on large datasets derived from over a million records with thousands of diagnoses and findings, and over a hundred variables. Medical and pharmacy claims in combination with laboratory results, nurse notes, and consumer data form a substantially powerful aggregated dataset that can be used to train a model that offers actionable recommendations. This recommendation system can serve a variety of health-related applications (cost predictions and engagement modeling, for example) or can be presented as a product to the market.

The structure of the remaining paper is as follows. Section 2 introduces the features of the data. Section 3 demonstrates the methods used for learning and inference. Section 4 explains the technical details of our graphical model. We discuss the experimental results in Section 5. Then, in Section 6, we review the existing techniques and studies. We conclude the paper with a discussion and an outline of the directions for further research in Section 7.

===2 NATURE OF DATA===
The dataset used in our recommender system is largely derived from claim data, with over 1.5 million claims and over a hundred distinct attributes over a 6-year period.

Available attributes can be grouped as numerical and categorical attributes. For example, consumer gender has two states, male and female. Figure 1 illustrates that more than 55% of our target group are women. We divided Age into twelve bins as follows: <1, 1-4, 5-12, 13-17, 18-25, from 26 to 85 in 10-year bins, and ≥86. It was confirmed by domain experts that patients in these age ranges generally develop similar conditions. The distribution of the number of consumers per age bin is shown in Figure 2. The claim records cover several types of data sources, including dental, medical, and hospital. As shown in Figure 3, 65% of the claims are medical claims.

Figure 1: Gender Distribution (categories: Male, Female).

Figure 2: Age Group Distribution (age bin versus percentage).

Figure 3: Claim Types Distribution (categories: Dental, Hospital, Medical).

We term the observed data the manifests. The manifests are computed as aggregations based on meaningful categories. For each manifest, there is an integer count ≥ 0 that signifies the number of times a person had the event in a quarter. Currently, our system relies on only 4 major categories: drug classification, diagnosis classification, provider specialty, and service category. Finally, since each patient is associated with many records (claims), we aggregated the records of each patient by calendar quarter (suggested by domain experts as the most relevant time frame to capture related health events). However, our system can be used for different time granularities.

===3 METHODOLOGY===
We rely on a hybrid approach leveraging Probabilistic Graphical Models (PGM), Random Forest (RF), and Collaborative Filtering (CF) techniques to obtain a vector of recommendations, and we combine the results using an ensembler. This way, we benefit from the power of PGMs in capturing the propagation of effects, CF in considering similar situations, and RFs in targeting tailored recommendations. Our proposed framework is illustrated in Figure 4, where data from different sources is fed into the analysis block in which we train and use our models. The output is a list of recommendations delivered via a mobile application. Based on the feedback we gather from the users (consumers, providers, etc.) on the quality of the recommendations, we can optimize the weights of the ensembler. In this study, we focus only on the green boxes, PGM and RF; the rest will be addressed in our future work.

Figure 4: Proposed framework for our recommender system.

PGMs can handle large datasets in a computationally tractable manner. We leverage a Bayesian network on a Directed Acyclic Graph (DAG). Before learning the conditional probabilities of the network, we found it useful to transform the data so that each patient observation corresponds to the manifests exhibited in a quarter, which is long enough to cover a sequence of symptoms and lets us focus on a less noisy sequence of healthcare events. We populate the data for the immediately previous quarter and the next quarter for every quarter in our dataset, and hence there is a triple of observations (or rows) for any given patient.

====3.1 Structure learning====
Given the transformed data, we can translate the questions into the following prediction/inference problem. Consider a patient with a total of <math>N</math> features (manifests), including his/her medical tests, drugs, health events, and other manifests related to a period, the previous period, and the next period. If we observe <math>x</math> features out of these <math>N</math> features, can we predict the values (or the probability distributions) of the remaining <math>N - x</math> features based on the available historical data? Using this data as input, we learn the structure and create the model. The steps involved are as follows:

First, to derive a joint probability distribution table, we transformed the input matrix to a discrete form with 0, 1 states. The input data had the number of times a manifest was observed for a patient in a given quarter. If this value was non-zero, we replaced it with a 1. The matrix now represents whether the manifest occurred at least once in the period.

Then, we convert the matrix to have manifests observed in a quarter along with manifests observed in the next and previous quarters. This transformation provided us with the data structure with which we could predict or infer manifests of a quarter given those from another quarter.

We consider the data as a matrix <math>A</math>. Then, we find <math>A^T \times A</math>, where <math>A^T</math> represents the transpose of <math>A</math>. The resulting symmetric matrix <math>B</math> has the number of joint occurrences of manifests across all patients. The diagonals in this matrix represent the number of occurrences of the manifest for all of the patients. Dividing the row values by the diagonal value in the row resulted in the conditional probability of the column manifest given the row manifest, <math>\frac{P(M_R, M_C)}{P(M_R)}</math>, where <math>M_R</math> is the manifest in the row and <math>M_C</math> is the manifest in the column, resulting in <math>P(M_C \mid M_R)</math>. As expected, all the diagonals reduce to 1, and the resulting matrix is no longer symmetric.

To determine the significant relationships and to discard those that were not as significant, we set a threshold of 0.05 (5%) for the conditional probability. If a conditional probability was greater than 5%, we retained it. This choice of threshold was arbitrary, and as an improvement, we should consult with our domain experts to verify the structure and adjust accordingly. Each relationship with a conditional probability above the threshold represents a directed edge in a graph, with the arrow going from the row manifest to the column manifest. Then, we remove cyclic relationships (including the diagonal entries, because a Bayesian model does not allow loops) using the networkX Python library. The cycle-detection function detects cycles on a first-come basis and removes the last encountered edge once a cycle is detected. The remaining edges form a DAG structure.

The fit function estimated the Conditional Probability Distribution (CPD) for each variable based on the given data and the parameter estimation approach we use. In our case, we use Bayesian parameter estimation because it considers the probability distribution representing our prior knowledge (how likely we are to believe in the different choices of parameters) and the support of the data (because confidence increases with more data). Moreover, our prior distribution is not uniform, which is another reason to use Bayesian parameter estimation. Since our aim is to predict the values of an unknown manifest, we fit the model with the training data set. At this step, the given data in graphical form was ready for performing various types of reasoning.

====3.2 Hybrid Scoring====
As shown in Figure 4, we use multiple models to obtain the final recommendation. One approach is to apply weighted ensemble methods to obtain better predictive performance than could be obtained from each of these models independently. The final recommendations can be used for a wide range of applications. Here, we focus on a mobile app that provides health-care benefit or educational recommendations. For example, if a high probability of hypertension is predicted, then the app would recommend that the person visit his/her Primary Care Physician (PCP).

====3.3 Feedback loop====
Feedback is a valuable asset for personalizing the recommendations as well as making better recommendations to similar people. The feedback loop can directly contribute to updating the weight vector of the ensemble method, as well as to hyperparameter tuning of the individual models. In fact, we need an online learning algorithm to incorporate the feedback into the re-training phase. However, not every feedback or data observation has the same weight/quality. Therefore, we need to consider a context-aware algorithm to pay more attention to the more important segments of information. Attention modeling will be done as a part of our future work.

===4 TECHNICAL DETAILS===
In this section, we briefly review the technical aspects of our Bayesian network and how parameter learning methods are used to estimate the conditional probabilities for the given set of claims and predict the state or occurrence of a new set of claims.

====4.1 Bayesian Network====
A Bayesian network (BN) is a probabilistic graphical model (PGM) that represents a set of random variables (nodes), say <math>X_1, \ldots, X_n</math>, and their conditional dependencies (edges corresponding to the direct influence of one node on another) and independencies, say <math>X_1 \perp\!\!\!\perp X_3 \mid X_6</math>, using a directed acyclic graph (DAG). By surfacing these independencies, we can reduce the number of values that need to be stored in order to represent the joint probability distribution, which makes the representation more compact.

For our purpose, we use two layers of inference: structure and parameter learning. By leveraging structure inference, we create the skeleton using conditional probabilities and domain expert input, which captures the dependencies between the variables. The second layer utilizes the dependencies and historical data to estimate the conditional probability distributions of the individual variables.

In parameter learning, there are two main methods:
* Maximum likelihood estimation
* Bayesian estimation

We use Bayesian estimation over maximum likelihood estimation (MLE) because MLE implicitly assumes a uniform prior distribution, which might lead us to wrong conclusions about the likelihood of a variable <math>\theta_{X_i}</math>, and because it cannot adjust the likelihood based on whether the sample is biased or not. Also, MLE does not update the confidence of <math>\theta_{X_i}</math> with the change in the size of the data (450,000 out of 1,000,000 vs 45 out of 100). Thus, in Bayesian estimation, we use the prior knowledge about <math>\theta</math> with its probability distribution. This distribution represents how likely we believe the different choices of parameters to be. Therefore, we can create a joint distribution, which captures the assumption over the parameters <math>\theta</math> and the data we are to observe. Each new data point gives us more information about <math>\theta</math> and hence the probability of the next occurrence. Hence the posterior distribution for Bayesian estimation is:

:<math>\Pr(\theta \mid x[1], \ldots, x[M]) = \frac{\Pr(x[1], \ldots, x[M] \mid \theta)\,\Pr(\theta)}{\Pr(x[1], \ldots, x[M])} \qquad (1)</math>

====4.2 Inference====
Finding the conditional probability distribution (CPD) over some variables, <math>\Pr(Y \mid E = e)</math>, is the same as inferring from a model. Therefore, predicting values for a new data point is the same as finding the conditional probability of the unknown variables, given the observed values of other variables. The CPDs can be computed from the joint probability distribution of the variables by marginalizing and reducing them over variables and states.

In addition, we are interested in finding the state of a set of variables given another set of variables. It is simply an inference query over the model, and the state having the higher probability would be the prediction by the model. However, computing the joint probability distribution will give us an exponentially large table, which the probabilistic graphical model helps to avoid.
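
As a minimal sketch of the reduce-and-marginalize query described in this subsection, the following Python snippet computes Pr(C | A = 1) by brute force on a hypothetical three-variable binary chain A → B → C. It builds the full joint table, keeps only the rows consistent with the evidence, sums out the remaining variable, and normalizes. The CPD values and the names <code>build_joint</code> and <code>query</code> are illustrative assumptions and are not taken from the model or data of this paper.

```python
from itertools import product

# Hypothetical CPDs for a binary chain A -> B -> C (illustrative numbers only).
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}    # P_B_GIVEN_A[a][b]
P_C_GIVEN_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # P_C_GIVEN_B[b][c]


def build_joint():
    """Full joint table Pr(A, B, C); it has 2**n rows for n binary variables."""
    return {
        (a, b, c): P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]
        for a, b, c in product((0, 1), repeat=3)
    }


def query(joint, target, evidence):
    """Pr(target | evidence) from the joint table.

    Reduce: keep rows that agree with the evidence {variable index: state}.
    Marginalize: sum the kept rows per state of the target variable.
    Normalize: divide by the probability of the evidence.
    """
    scores = {0: 0.0, 1: 0.0}
    for assignment, p in joint.items():
        if all(assignment[i] == v for i, v in evidence.items()):
            scores[assignment[target]] += p
    total = sum(scores.values())
    return {state: p / total for state, p in scores.items()}


joint = build_joint()
# Pr(C = 1 | A = 1) = 0.4 * 0.2 + 0.6 * 0.75 = 0.53
posterior = query(joint, target=2, evidence={0: 1})
# The state with the higher probability is the model's prediction.
prediction = max(posterior, key=posterior.get)
```

The joint table in this toy case has only 8 rows, but it doubles with every additional binary variable, which is why this brute-force route does not scale to a network with hundreds of manifests.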
There are two algorithms we can use for inference:
* Variable elimination
* Belief propagation

We use variable elimination over belief propagation (BP) since the former is suitable for a very large network, as it is not memory-expensive. In addition, we discard the generated intermediate factors, and hence it is more flexible than BP. In variable elimination, consider the model <math>A \rightarrow B \rightarrow C \rightarrow D</math>, for which we try to find <math>\Pr(D)</math>:

:<math>\Pr(D) = \sum_{a}\sum_{b}\sum_{c} \Pr(a)\,\Pr(b \mid a)\,\Pr(c \mid b)\,\Pr(D \mid c) \qquad (2)</math>

In variable elimination, we can sum over parts of the product instead of over the complete product. Hence, Eq. (2) becomes:

:<math>\Pr(D) = \sum_{a}\sum_{b}\sum_{c} \Pr(a)\,\Pr(b \mid a)\,\Pr(c \mid b)\,\Pr(D \mid c) = \sum_{c} \Pr(D \mid c) \sum_{b} \Pr(c \mid b) \sum_{a} \Pr(a)\,\Pr(b \mid a) \qquad (3)</math>

This method helps to significantly reduce the computation required to compute the probabilities. Hence variable elimination is much more efficient for calculating probability distributions than normalizing and marginalizing the joint probability distribution.

Our prediction function uses a maximum a posteriori probability to find the states of variables corresponding to the maximum probability in the joint distribution. This is useful when we want to predict the state of variables in our model. Moreover, we introduce another operation on factors called maximization. A maximum a posteriori query is essentially a way to predict the state of variables, given the state of other variables. Thus, using the trained model, we try to predict the states of variables for new data points. To design the models, we need to create conditional probability distributions or factors, add them to the base model, create an inference object, and then do maximum a posteriori queries over it for new data points to predict variable states.

===5 EXPERIMENTAL RESULTS===
While our un-targeted Bayesian-network-based recommender system can be leveraged to address a wide range of questions, in some cases a targeted model (such as a random forest) can be more beneficial. In our proposed framework depicted in Figure 4, we have included both components. To assess the abilities of our recommender system, we focus on recommendations of vision and dental benefits, train the random forest model on the same dataset and set of features on which we trained our probabilistic graphical model, and discuss the results in this section.

We trained the RF model using the default parameters except for the following: n_estimators=30, max_depth=350, random_state=0, and min_samples_leaf=2. Before fitting the data to the model, we split the data into random train and test sets in an 80:20 ratio (a widely recommended split ratio). The test set does not contain the columns/manifests that we are interested in predicting. The shape of the data used was 2,010 cases with around 1.5K features. We used a subset of the data set because of the constraint in computational power. Figure 5 and Figure 6 illustrate the precision-recall curves for dental and vision benefits, respectively. In each figure, the expected result from a random guess, based on the frequency of the positive class, that is 27.38% (4.66%) for dental (vision) benefit, is also depicted.

Figure 5: ROC measure of Random Forest on Dental benefits (plot title: PR-ROC for Dental benefits; axes: Recall versus Precision; series: Model Scores, Random Guess).

Figure 6: ROC measure of Random Forest on Vision benefits (plot title: PR-ROC for Vision benefits; axes: Recall versus Precision; series: Model Scores, Random Guess).

Using the trained Bayesian network, we predicted the states of all the missing columns/features of the test set. Prediction is done by belief propagation, where we find the most probable state of the unknown manifests/features given the states of the other manifests/features using CPDs. Fig 7 shows the correlation of manifests related to a specific target (Diabetes mellitus without complication, in the next quarter) and their pairwise correlations considering two consecutive quarters. As expected, having Diabetes mellitus without complication and taking antidiabetics in the previous quarter have a strong correlation with having it in the next quarter. Interestingly, the correlations between hypertension, disorders of lipid, and high glucose in blood are also captured by our model.

Figure 7: Feature Correlation related to Diabetes mellitus without complication (heat map on a 0.0 to 1.0 scale over the features DM.w.o.C._NXT, DM.w.o.C., ANTIDIABETICS, METFORMIN HCL, Essential hypertension, GLUCOSE BLOOD, DIAGNOSTIC PRODUCTS, Disorders of lipid, Other aftercare, DME SUPPLIER, DME, Med. Devices, General Medicine, Laboratory - Outpatient, INSULIN GLARGINE).

The probabilities gathered from our Bayesian network can be represented as a network. In Fig 8, the most probable manifests in the previous quarter that can be used to predict whether the consumer will visit an optometrist are depicted as a network where edge thickness shows the prevalence of manifests related to a specific target. As shown, those who had more medical interactions (surgery, medicines, office visits, etc.) are more likely to have an optometry event in their medical journey.

Figure 8: Optometry targets (network with nodes FAMILY PRACTICE, HOSPITAL, LaboratoryOutpatient, Surgery, Office Visits, ANTIHYPERTENSIVES, OPTOMETRY, INTERNAL MEDICINE, General Medicine, M.D., HospitalOutpatient, Diagnostic, Essential hypertension, Medical, BETA BLOCKERS).

===6 EXISTING TECHNOLOGIES===
The health-care domain has specific characteristics and requirements that cannot be addressed by general-purpose or commercial recommender systems that are available in other domains (such as the ones used by Netflix or Amazon) [5]. Recommender systems for clinical activities have no specific task, and the task mainly depends on the item that is recommended. All of the possible items are expected to be recommended. A rating system does not exist, and most clinical behaviours are binary (having a symptom or not). By comparison, in general recommender systems with specific and well-defined tasks, a subset of items can be recommended, a rating system exists due to subjective desire, and behavior is not binary, as a customer may refuse to buy a product without this indicating that she does not like it. However, similar to other domains, machine learning methods have a clear advantage over manual inspection of data. The health domain is volatile and dynamic; machine learning methods can tolerate changes in medical codes because retraining is easier than manual pattern recognition, are robust against temporal changes of patterns when a sliding-window approach is used for learning, and can offer a personalized experience per hospital or patient while at the same time keeping a general view of the whole system.

To design a framework/system, it is necessary to know the target users. Two main end users can be considered for healthcare recommender systems. Wiesner and Pfeifer [26] suggest that such systems can target health professionals (doctors and/or nurses) to help them gather additional information on a special case, or can identify patients as end users and deliver health-related content to them, such as lifestyle change recommendations [8] through changing their sleeping, eating, and exercising routines, improving patient safety [17], and lowering health risks by informing them about interactions between different drugs. Policy makers are another target in this domain.

To design such a system, there are several guidelines. Valdez et al. propose a 3-step process [23] to design a recommender system: 1) understanding the domain, 2) evaluation, and 3) inception. In the evaluation step, the importance of user-centered criteria and ethical implications (trust, value, security, long-term efficiency, individual freedom, and risk) in addition to accuracy metrics is discussed. Schafer et al. discuss the recent challenges that were tackled by researchers and how to proceed toward a health-aware recommender system. Personalization, the balance between persuasion and empowerment, and user trust and satisfaction are the main issues that captured researchers’ attention [19]. They group the challenges for future studies into 3 groups: patient, recommender system, and evaluation challenges. User modeling and profiling, and data integration and cleaning from multiple sources, are the main patient-related challenges. On the recommender system side, the challenges are personalized and accurate recommendation along with step-by-step implementation of recommendations using “expert-in-the-loop” interactions. On the evaluation side, the accuracy, real-life performance, ethical, and privacy considerations are discussed.

There have been a number of prior efforts in this domain. One of the well-known existing applications is Promedas [13], a patient-specific clinical diagnostic decision support system that uses a probabilistic graphical model built with the help of medical specialists. As discussed earlier, it helps in recommending a diagnosis specific to an individual when there is ambiguity among physicians without rationalization. Probabilistic methods, and especially Bayesian networks, have been used in a wide range of domains. For example, Huang [9] used a Bayesian network in response to two issues in the tourism domain: first, the absence of travel history for a single user, which is needed for content-based activity estimation, and second, the absence of similarity between users and other users.

Since our system predicts multiple targets simultaneously, for each person, certain combinations of the outputs will be more likely than the other combinations. To address this fact, we adopted a structured-prediction-based approach that uses collective classification at its core by considering the associativity of targets as nodes in a graph [20, 24]. This model captures dependencies that would not be considered otherwise [7, 21]. At the same time, the consumers’ feedback plays a vital role in fine-tuning and improving the performance of recommender systems. However, incorporating the users’ feedback is challenging: the system can also be exposed to bias due to personal taste or motivations, as was shown in other domains such as Amazon or application markets [12, 27]. Therefore, we incorporated robust approaches similar to Torkamani et al. and Fakhraei et al. [6, 12, 21, 22]. We also need to tackle the computational complexity imposed on the system if the goal is to update the system’s state on the fly in real time. For this, we can consider using updating schemes that have been used in similar domains to reduce the updating costs [11].

Given all the efforts that have been made in this domain, our model uses a large and rich dataset, relies on a mathematically correct and proven basis, and is able to address a wide range of different questions about the health journey of consumers, which is beneficial for consumers, providers, and payers over time.

===7 CONCLUSION===
Our proposed recommender system provides personalized, timely, and actionable health-care insights for consumers. We make relevant suggestions by predicting the probabilities of various health events. By using users’ feedback from their interactions within the mobile application, we enable additional personalized suggestions. This is accomplished through an ensemble algorithm, where a Bayesian network is combined with a random forest. In the future, we can improve this framework in several ways. First, by including data from other sources, such as lab results or nurse notes, we can expand the feature set to provide a more complete view of the consumer. This view would improve the precision-recall metrics of the predictions, as well as shed a brighter light on how to increase the effectiveness of the recommendations. Second, while a calendar quarter is currently the feature extraction and prediction time unit, we can change this within the probability model to predict the timing of a health event (as an additional random variable). We could also expand the model to capture a longer period of health care history to identify missing values over time, which would lead to the discovery of long-term influences, such as chronic ailments. Finally, for both better interpretability and overall improvement of the recommender system, we are working on a context-aware attention modeling algorithm to identify, invigorate, and use the most relevant features extracted from the health data and received feedback.

===REFERENCES===
* [1] Pragati Agrawal and Amit kumar Dewangan. 2015. A brief survey on the techniques used for the diagnosis of diabetes-mellitus. Int. Res. J. of Eng. and Tech. IRJET 2 (2015), 1039–1043.
* [2] PK Anooj. 2012. Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules. Journal of King Saud University-Computer and Information Sciences 24, 1 (2012), 27–40.
* [3] Pushkaraj R Bhandari, Sapna P Yadav, Shyam A Mote, Devika P Rankhambe, UG Scholar, and Pune APCOER. 2016. Predictive system for medical diagnosis with expertise analysis. International Journal of Engineering Science 4652 (2016).
* [4] Amy Compton-Phillips. [n. d.]. Care Redesign - What Data Can Really Do for Health Care. http://join.catalyst.nejm.org/hubfs/Insights%20Council%20Monthly%20-%20Files/Insights%20Council%20March%202017%20Report%20What%20Data%20Can%20Really%20Do%20for%20Health%20Care.pdf
* [5] Lian Duan, W Nick Street, and E Xu. 2011. Healthcare information systems: data mining methods in the creation of a clinical recommender system. Enterprise Information Systems 5, 2 (2011), 169–181.
* [6] Shobeir Fakhraei, James Foulds, Madhusudana Shashanka, and Lise Getoor. 2015. Collective spammer detection in evolving multi-relational social networks. In Proceedings of the 21th acm sigkdd international conference on knowledge discovery and data mining. ACM, 1769–1778.
* [7] Shobeir Fakhraei, Bert Huang, Louiqa Raschid, and Lise Getoor. 2014. Network-based drug-target interaction prediction with probabilistic soft logic. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 11, 5 (2014), 775–787.
* [8] Robert G Farrell, Catalina M Danis, Sreeram Ramakrishnan, and Wendy A Kellogg. 2012. Intrapersonal retrospective recommendation: lifestyle change recommendations using stable patterns of personal behavior. In Proceedings of the First International Workshop on Recommendation Technologies for Lifestyle Change (LIFESTYLE 2012), Dublin, Ireland. Citeseer, 24.
* [9] Yuxia Huang and Ling Bian. 2009. A Bayesian network and analytic hierarchy process based personalized recommendations for tourist attractions over the Internet. Expert Systems with Applications 36, 1 (2009), 933–943.
* [10] Aiswarya Iyer, S Jeyalatha, and Ronak Sumbaly. 2015. Diagnosis of diabetes using classification mining techniques. arXiv preprint arXiv:1502.03774 (2015).
* [11] Soheil Jamshidi and Mahmoud Reza Hashemi. 2012. An efficient data enrichment scheme for fraud detection using social network analysis. In Telecommunications (IST), 2012 Sixth International Symposium on. IEEE, 1082–1087.
* [12] Soheil Jamshidi, Reza Rejaie, and Jun Li. 2018. Trojan Horses in Amazons Castle: Understanding the Incentivized Online Reviews. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM ’18). ACM.
* [13] Bert Kappen, Wim Wiegerinck, Ender Akay, Jan Neijt, and André van Beek. 2003. Promedas: A clinical diagnostic decision support system. In Proceedings of the 15th Belgian-Dutch Conference on Artificial Intelligence. 23–24.
* [14] Nguyen Cong Long, Phayung Meesad, and Herwig Unger. 2015. A highly accurate firefly based algorithm for heart disease prediction. Expert Systems with Applications 42, 21 (2015), 8221–8231.
* [15] Joao Maroco, Dina Silva, Ana Rodrigues, Manuela Guerreiro, Isabel Santana, and Alexandre de Mendonça. 2011. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC research notes 4, 1 (2011), 299.
* [16] Amrita Naik and Lilavati Samant. 2016. Correlation review of classification algorithm using data mining tool: WEKA, Rapidminer, Tanagra, Orange and Knime. Procedia Computer Science 85 (2016), 662–668.
* [17] Haggai Roitman, Yossi Messika, Yevgenia Tsimerman, and Yonatan Maman. 2010. Increasing patient safety using explanation-driven personalized content recommendation. In Proceedings of the 1st ACM International Health Informatics Symposium. ACM, 430–434.
* [18] Kanak Saxena, Richa Sharma, et al. 2015. Efficient heart disease prediction system using decision tree. In Computing, Communication & Automation (ICCCA), 2015 International Conference on. IEEE, 72–77.
* [19] Hanna Schafer, Santiago Hors-Fraile, Raghav Pavan Karumur, Andre Calero Valdez, Alan Said, Helma Torkamaan, Tom Ulmer, and Christoph Trattner. 2017. Towards health (aware) recommender systems. In Proceedings of the 2017 international conference on digital health. ACM, 157–161.
* [20] Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. 2005. Learning structured prediction models: A large margin approach. In Proceedings of the 22nd international conference on Machine learning. ACM, 896–903.
* [21] MohamadAli Torkamani and Daniel Lowd. 2013. Convex adversarial collective classification. In International Conference on Machine Learning. 642–650.
* [22] Mohamad Ali Torkamani and Daniel Lowd. 2014. On robustness and regularization of structural support vector machines. In International Conference on Machine Learning. 577–585.
* [23] André Calero Valdez, Martina Ziefle, Katrien Verbert, Alexander Felfernig, and Andreas Holzinger. 2016. Recommender systems for health informatics: State-of-the-art and future perspectives. In Machine Learning for Health Informatics. Springer, 391–414.
* [24] David Weiss and Benjamin Taskar. 2010. Structured prediction cascades. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 916–923.
* [25] WAJJ Wiegerinck and Tom Heskes. 2001. Probability assessment with maximum entropy in Bayesian networks. (2001).
* [26] Martin Wiesner and Daniel Pfeifer. 2014. Health recommender systems: concepts, requirements, technical basics and challenges. International journal of environmental research and public health 11, 3 (2014), 2580–2607.
* [27] Zhen Xie and Sencun Zhu. 2015. AppWatcher: Unveiling the underground market of trading mobile app reviews. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks. ACM, 10.