=Paper= {{Paper |id=Vol-2216/healthRecSys18_paper_6 |storemode=property |title=A Hybrid Health Journey Recommender System Using Electronic Medical Records |pdfUrl=https://ceur-ws.org/Vol-2216/healthRecSys18_paper_6.pdf |volume=Vol-2216 |authors=Soheil Jamshidi,Mohamad Ali Torkamani,Jynelle Mellen,Malhar Jhaveri,Penny Pan,James Chung,Hakan Kardes |dblpUrl=https://dblp.org/rec/conf/recsys/JamshidiTMJPCK18 }} ==A Hybrid Health Journey Recommender System Using Electronic Medical Records== https://ceur-ws.org/Vol-2216/healthRecSys18_paper_6.pdf

A Hybrid Health Journey Recommender System
using Electronic Medical Records
Soheil Jamshidi, Ali Torkamani, Jynelle Mellen,
Malhar Jhaveri, Penny Pan, James Chung, Hakan Kardes
Cambia Health Solutions
Portland, OR, 97201
{firstName}.{lastName}@cambiahealth.com
ABSTRACT
We present a recommender system aimed at improving the healthcare experience of consumers. Our model provides actionable insights to cohorts or individuals, based on their collective and personal health-related data. The actionable insights are delivered through digital interventions to help prevent adverse events for the consumer. By proposing timely and personalized suggestions, we will improve consumer health outcomes and prevent complications, which would also result in cost-savings. Our recommendation system employs an ensembling technique, where at its core, we have a Bayesian network that uses administrative claims data but could be extended to use Electronic Health Records (EHR) data for learning the structure of the interwoven health graph (conditions, medications, procedures, and more). This method allows for predicting the probability of various outcomes conditioned on the consumers’ evidential health data. We also couple our ensemble method with a shallow random forest model to further refine the personalized recommendations after receiving the consumer’s feedback. The experimental results show that our system significantly improves the precision-recall metrics of several intervention targets compared to a random baseline.

CCS CONCEPTS
• Applied computing → Consumer health; Health care information systems; • Information systems → Recommender systems;

KEYWORDS
Health Journey; Health Recommender System; Hybrid Recommender System

ACM Reference Format:
Soheil Jamshidi, Ali Torkamani, Jynelle Mellen, Malhar Jhaveri, Penny Pan, James Chung, Hakan Kardes. 2018. A Hybrid Health Journey Recommender System, using Electronic Medical Records. In Proceedings of the Third International Workshop on Health Recommender Systems co-located with Twelfth ACM Conference on Recommender Systems (HealthRecSys’18), Vancouver, BC, Canada, October 6, 2018, 6 pages.

HealthRecSys’18, October 6, 2018, Vancouver, BC, Canada
© 2018 Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
The availability of clinical data in the form of Electronic Health Records (EHR) has dramatically increased over the past decade. In addition to EHRs, there are large volumes of administrative payment data that offer insights to researchers. However, administrative data and EHRs are scattered among numerous entities and sources, such as health plans, laboratories, providers, hospitals, chart notes, and more. In addition to the disparate sources, the breadth, depth, linkage, and scale of the data lead to further complexity. For end-users (consumers), this can make interpretation difficult due to information overload or a lack of information, as well as term inconsistency. The increasing need to leverage health records led to the emergence of Health Recommender Systems (HRS) [4, 13]. Such recommender systems can target medical experts or patients and play a vital role in improving an individual’s health by providing insightful recommendations. These systems are primarily created to handle ambiguous diagnosis situations that arise from the varied decisions of providers [13] for certain diseases. In that case, recommenders created by medical specialists provide insight for a tailored diagnosis procedure for patients. To this aim, Machine Learning (ML) methods act as enablers. A large number of studies have applied a wide range of machine learning techniques, such as decision trees, multi-layer perceptron (MLP), and support vector machine (SVM), to a variety of diseases, including dementia, kidney, and heart diseases, to name a few [1–3, 10, 14–16, 18].
In this paper, we present a consumer-focused recommender system that will give individuals suggestions based on their collective health-related data (EHR and claims). To accomplish this, we leverage an ensemble algorithm, where a Bayesian network (BN) is combined with a random forest (RF). Probabilistic Graphical Models such as BN are known to tolerate data uncertainty (noise, ambiguity, and missing values) in a consistent and mathematically correct way [25] in the inference phase, and RF facilitates refining the personalized recommendations after receiving the consumer’s feedback. Such a system learns the conditional probability table using a maximum entropy or belief propagation approach on large datasets derived from over a million records with thousands of diagnoses and findings, and over a hundred variables. Medical and pharmacy claims in combination with laboratory results, nurse notes, and consumer data form a substantially powerful aggregated dataset that can be used to train a model that offers actionable recommendations. This recommendation system can serve a variety of health-related applications (cost predictions and engagement modeling, for example) or can be presented as a product to the market.
The structure of the remaining paper is as follows. Section 2 introduces the features of the data. Section 3 demonstrates the methods used for learning and inference. Section 4 explains the technical details of our graphical model. We discuss the experimental results in Section 5. Then, in Section 6 we review the existing techniques

Figure 1: Gender Distribution (bars: Male, Female).
Figure 2: Age Group Distribution (age bins from <1 to >=86 against percentage (%)).
Figure 3: Claim Types Distribution (bars: Dental, Hospital, Medical).

and studies. We conclude the paper with a discussion and an outline of the directions for further research in Section 7.

2 NATURE OF DATA
The dataset used in our recommender system is largely derived from claim data, with over 1.5 million claims and over a hundred distinct attributes over a 6-year period.
Available attributes can be grouped as numerical and categorical attributes. For example, consumer gender has two states, male and female. Figure 1 illustrates that more than 55% of our target group are women. We divided Age into twelve bins as follows: <1, 1-4, 5-12, 13-17, 18-25, from 26 to 85 in 10-year bins, and ≥86. It was confirmed by domain experts that patients in these age ranges generally develop similar conditions. The distribution of the number of consumers per age bin is shown in Figure 2. The claim records cover 4 types of data sources: dental, medical, hospital. As shown in Figure 3, 65% of the claims are medical claims. We term the observed data the manifests. The manifests are computed as aggregations based on meaningful categories. For each manifest, there is an integer count ≥ 0 that signifies the number of times a person had the event in a quarter. Currently, our system relies on only 4 major categories: drug classification, diagnosis classification, provider specialty, and service category. Finally, since each patient is associated with many records (claims), we aggregated the records of each patient by calendar quarter (suggested by domain experts as the most relevant time frame to capture related health events). However, our system can be used for different time granularities.

3 METHODOLOGY
We rely on a hybrid approach leveraging Probabilistic Graphical Models (PGM), Random Forest (RF), and Collaborative Filtering (CF) techniques to obtain a vector of recommendations, and we combine the results using an ensembler. This way, we benefit from the power of PGMs in capturing the propagation of effects, of CF in considering similar situations, and of RFs in targeting tailored recommendations. Our proposed framework is illustrated in Figure 4, where data from different sources is fed into the analysis block in which we train and use our models. The output is a list of recommendations delivered via a mobile application. Based on the feedback we gather from the users (consumers, providers, etc.) on the quality of the recommendations, we can optimize the weights of the ensembler. In this study, we focus only on the green boxes, PGM and RF; the rest will be addressed in our future work.
PGMs can handle large datasets in a computationally tractable manner. We leverage a Bayesian network on a Directed Acyclic Graph (DAG). Before learning the conditional probabilities of the network, we found it useful to transform the data so that each patient observation corresponds to the manifests exhibited in a quarter, which is long enough to cover a sequence of symptoms and lets us focus on a less noisy sequence of healthcare events. We populate the data for the immediately previous quarter and the next quarter for every quarter in our dataset, and hence there are three times as many observations (or rows) for any given patient.

Figure 4: Proposed framework for our recommender system

3.1 Structure learning
Given the transformed data, we can translate the questions to the following prediction/inference problem. Consider a patient with a total of N features (manifests), including his/her medical tests, drugs, health events, and other manifests related to a period, the previous period, and the next period. If we observe x features out of these N features, can we predict the values (or the probability distributions) of the remaining N − x features based on the available historical data? Using this data as input, we learn the structure and create the model. The steps involved are as follows:
First, to derive a joint probability distribution table, we transformed the input matrix to a discrete form with 0, 1 states. The input data had the number of times a manifest was observed for a patient in a given quarter. If this value was non-zero, we replaced it with a 1. The matrix now represents whether the manifest occurred at least once in the period.
Then, we convert the matrix to have manifests observed in a quarter along with manifests observed in the next and previous quarters. This transformation provided us with the data structure with which we could predict or infer manifests of a quarter given those from another quarter.
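The two transformation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nested-dictionary input layout, the use of consecutive integers as quarter indices, the function names `binarize` and `window_rows`, and the `_PRV` suffix are all hypothetical (the `_NXT` suffix mirrors the label used in Figure 7). It also takes one possible reading of the windowing step, namely one row per patient and quarter that carries the previous, current, and next quarter's manifests as columns.

```python
from typing import Dict, List

# Hypothetical toy layout: counts[patient][quarter][manifest] = occurrences.
Counts = Dict[str, Dict[int, Dict[str, int]]]


def binarize(counts: Counts) -> Counts:
    """Replace every non-zero count with 1 (the manifest occurred at least once)."""
    return {
        patient: {
            quarter: {m: int(c > 0) for m, c in manifests.items()}
            for quarter, manifests in quarters.items()
        }
        for patient, quarters in counts.items()
    }


def window_rows(binary: Counts, manifests: List[str]) -> List[Dict[str, object]]:
    """Build one row per (patient, quarter) with the manifests of the previous,
    current, and next quarter. A quarter without data counts as all zeros."""
    rows = []
    for patient, quarters in binary.items():
        for q in sorted(quarters):
            row: Dict[str, object] = {"patient": patient, "quarter": q}
            for suffix, offset in (("_PRV", -1), ("", 0), ("_NXT", 1)):
                observed = quarters.get(q + offset, {})
                for m in manifests:
                    row[m + suffix] = observed.get(m, 0)
            rows.append(row)
    return rows


# Toy usage with made-up counts for a single patient over two quarters.
counts = {
    "p1": {
        1: {"ANTIDIABETICS": 3, "OPTOMETRY": 0},
        2: {"ANTIDIABETICS": 0, "OPTOMETRY": 2},
    }
}
rows = window_rows(binarize(counts), ["ANTIDIABETICS", "OPTOMETRY"])
for row in rows:
    print(row)
```

Each resulting row can then serve as one observation for the structure and parameter learning steps, with the `_NXT` columns acting as prediction targets.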

We consider the data as a matrix $A$. Then, we compute $B = A^T A$, where $A^T$ represents the transpose of $A$. The resulting symmetric matrix $B$ holds the number of joint occurrences of manifests across all patients. The diagonal entries of this matrix represent the number of occurrences of each manifest over all patients. Dividing the values in a row by the diagonal value of that row yields the conditional probability of the column manifest given the row manifest,
$$P(M_C \mid M_R) = \frac{P(M_R, M_C)}{P(M_R)},$$
where $M_R$ is the manifest in the row and $M_C$ is the manifest in the column. As expected, all the diagonal entries reduce to 1, and the resulting matrix is no longer symmetric.
To determine the significant relationships and to discard those that were not as significant, we set a threshold of 0.05 (5%) for the conditional probability. If a conditional probability was greater than 5%, we retained it. This choice of threshold was arbitrary, and as an improvement, we should consult with our domain experts to verify the structure and adjust accordingly. Each relationship with a conditional probability above the threshold represents a directed edge in a graph, with the arrow going from the row manifest to the column manifest. Then, we remove cyclic relationships (including the diagonal entries, because a Bayesian model does not allow loops) using the NetworkX Python library. The cycle-detection function finds cycles on a first-come basis, and we remove the last encountered edge once a cycle is detected. The remaining edges form a DAG structure.
The fit function estimated the Conditional Probability Distribution (CPD) for each variable based on the given data and the parameter estimation approach we use. In our case, we use Bayesian parameter estimation because it considers the probability distribution representing our prior knowledge (how likely we are to believe in the different choices of parameters) and the support of the data (because confidence increases with more data). Moreover, our prior distribution is not uniform, which is a further reason to use Bayesian parameter estimation. Since our aim is to predict the values of an unknown manifest, we fit the model with the training data set. At this step, the given data in graphical form was ready for performing various types of reasoning.

3.2 Hybrid Scoring
As shown in Figure 4, we use multiple models to obtain the final recommendation. One approach is to apply weighted ensemble methods to obtain better predictive performance than could be obtained from each of these models independently. The final recommendations can be used for a wide range of applications. Here, we focus on a mobile app that provides health-care benefit or educational recommendations. For example, if a high probability of hypertension is predicted, then the app would recommend that the person visit his/her Primary Care Physician (PCP).

3.3 Feedback loop
Feedback is a valuable asset for personalizing the recommendations as well as making better recommendations to similar people. The feedback loop can directly contribute to updating the weight vector of the ensemble method, as well as to hyperparameter tuning of the individual models. In fact, we need an online learning algorithm to incorporate the feedback into the re-training phase. However, not every feedback or data observation has the same weight/quality. Therefore, we need to consider a context-aware algorithm to pay more attention to the more important segments of information. Attention modeling will be done as a part of our future work.

4 TECHNICAL DETAILS
In this section, we briefly review the technical aspects of our Bayesian network and how parameter learning methods are used to estimate the conditional probabilities for the given set of claims and predict the state or occurrence of a new set of claims.

4.1 Bayesian Network
A Bayesian network (BN) is a probabilistic graphical model (PGM) that represents a set of random variables (nodes), say $X_1, \ldots, X_n$, and their conditional dependencies (edges corresponding to the direct influence of one node on another) using a directed acyclic graph (DAG). The graph also encodes conditional independencies, say $X_1 \perp X_3 \mid X_6$. By surfacing these independencies, we can reduce the number of values that need to be stored in order to represent the joint probability distribution, which makes the representation more compact.
For our purpose, we use two layers of inference: structure and parameter learning. By leveraging structure inference, we create the skeleton using conditional probabilities and domain expert input, which captures the dependencies between the variables. The second layer utilizes the dependencies and historical data to estimate the conditional probability distributions of the individual variables.
In parameter learning, there are two main methods:
• Maximum likelihood estimation
• Bayesian estimation
We use Bayesian estimation over maximum likelihood estimation (MLE) because MLE considers a uniform prior distribution, and this might lead us to wrong conclusions about the likelihood of a parameter $\theta_{X_i}$, since it does not adjust the likelihood based on whether the sample is biased or not. Also, MLE does not update the confidence of $\theta_{X_i}$ with the change in the size of the data (450,000 out of 1,000,000 vs 45 out of 100, which both give an estimate of 0.45). Thus, in Bayesian estimation, we use the prior knowledge about $\theta$ with its probability distribution. This distribution represents how likely we believe the different choices of parameters to be. Therefore, we can create a joint distribution, which captures the assumption over the parameters $\theta$ and the data we are to observe. Each new data point gives us more information about $\theta$ and hence about the probability of the next occurrence. Hence, the posterior distribution for Bayesian estimation is:
$$\Pr(\theta \mid x[1], \ldots, x[M]) = \frac{\Pr(x[1], \ldots, x[M] \mid \theta)\,\Pr(\theta)}{\Pr(x[1], \ldots, x[M])} \tag{1}$$

4.2 Inference
Finding the conditional probability distribution (CPD) over some variables, $\Pr(Y \mid E = e)$, is the same as inferring from a model. Therefore, predicting values for a new data point is the same as finding the conditional probability of the unknown variables, given the observed values of other variables. The CPDs can be computed from the joint probability distribution of the variables, by marginalizing and reducing them over variables and states.
In addition, we are interested in finding the state of a set of variables given another set of variables. This is simply an inference query over the model, and the state having the higher probability is the prediction by the model. However, computing the joint probability

distribution will give us an exponentially large table, which the probabilistic graphical model helps to avoid. There are two algorithms we can use for inference:
• Variable elimination
• Belief propagation
We use variable elimination over belief propagation (BP) since the former is suitable for a very large network, as it is not memory-expensive. In addition, we discard the generated intermediate factors, and hence it is more flexible than BP. In variable elimination, consider the model A → B → C → D, for which we try to find $\Pr(D)$:
$$\Pr(D) = \sum_{a,b,c} \Pr(a)\,\Pr(b \mid a)\,\Pr(c \mid b)\,\Pr(D \mid c) \tag{2}$$
In variable elimination, we can sum over parts of the product instead of over the complete product. Hence, Eq. (2) becomes:
$$\Pr(D) = \sum_{a}\sum_{b}\sum_{c} \Pr(a)\,\Pr(b \mid a)\,\Pr(c \mid b)\,\Pr(D \mid c) = \sum_{c} \Pr(D \mid c) \sum_{b} \Pr(c \mid b) \sum_{a} \Pr(a)\,\Pr(b \mid a) \tag{3}$$
This method helps to significantly reduce the computation required to compute the probabilities. Hence, variable elimination is much more efficient for calculating probability distributions than normalizing and marginalizing the joint probability distribution.
Our prediction function uses a maximum a posteriori probability to find the states of variables corresponding to the maximum probability in the joint distribution. This is useful when we want to predict the state of variables in our model. Moreover, we introduce another operation on factors called maximization. A maximum a posteriori query is essentially a way to predict the state of variables, given the state of other variables. Thus, using the trained model, we try to predict the states of variables for new data points. To design the models, we need to create conditional probability distributions or factors, add them to the base model, create an inference object, and then run maximum a posteriori queries over it for new data points to predict variable states.

5 EXPERIMENTAL RESULTS
While our un-targeted Bayesian network based recommender system can be leveraged to address a wide range of questions, in some cases a targeted model (such as a random forest) can be more beneficial. In our proposed framework depicted in Figure 4, we have included both components. To assess the abilities of our recommender system, we focus on recommendations of vision and dental benefits, train the random forest model on the same dataset and set of features on which we trained our probabilistic graphical model, and discuss the results in this section.
We trained the RF model using the default parameters except for the following: n_estimators=30, max_depth=350, random_state=0, and min_samples_leaf=2. Before fitting the data to the model, we split the data into random train and test sets in an 80:20 ratio (a widely recommended split ratio). The test set does not contain the columns/manifests that we are interested in predicting. The shape of the data used was 2,010 cases with around 1.5K features. We used a subset of the data set because of the constraint in computational power. Figure 5 and Figure 6 illustrate the Precision-Recall ROC for dental and vision benefits, respectively. In each figure, the expected result from a random guess, based on the frequency of the positive class, which is 27.38% (4.66%) for the dental (vision) benefit, is also depicted.

Figure 5: ROC measure of Random Forest on Dental benefits (plot title: PR-ROC for Dental benefits; axes: Recall, Precision; series: Model Scores, Random Guess).
Figure 6: ROC measure of Random Forest on Vision benefits (plot title: PR-ROC for Vision benefits; axes: Recall, Precision; series: Model Scores, Random Guess).

Using the trained Bayesian network, we predicted the states of all the missing columns/features of the test set. Prediction is done by belief propagation, where we find the most probable state of the unknown manifests/features given the states of the other manifests/features using CPDs. Fig 7 shows the correlation of manifests related to a specific target (Diabetes mellitus without complication, in the next quarter) and their pairwise correlations considering two consecutive quarters. As expected, having Diabetes mellitus without complication and taking antidiabetics in the previous quarter have a strong correlation with having it in the next quarter. Interestingly, the correlations between hypertension, disorders of lipid, and high glucose in blood are also captured by our model.
The probabilities gathered from our Bayesian network can be represented as a network. In Fig 8, the most probable manifests in the previous quarter that can be used to predict whether the consumer will visit an optometrist are depicted as a network where edge

thickness shows the prevalence of manifests related to a specific target. As shown, those who had more medical interactions (surgery, medicines, office visits, etc.) are more likely to have an optometry event in their medical journey.

Figure 7: Feature Correlation related to Diabetes mellitus without complication (correlation heatmap, color scale 0.0 to 1.0, over the manifests DM.w.o.C._NXT, DM.w.o.C., ANTIDIABETICS, METFORMIN HCL, Essential hypertension, GLUCOSE BLOOD, DIAGNOSTIC PRODUCTS, Disorders of lipid, Other aftercare, DME SUPPLIER, DME, Med. Devices, General Medicine, Laboratory - Outpatient, INSULIN GLARGINE).
Figure 8: Optometry targets (network of manifests connected to OPTOMETRY, such as Office Visits, Surgery, ANTIHYPERTENSIVES, BETA BLOCKERS, and Essential hypertension).

6 EXISTING TECHNOLOGIES
The health-care domain has specific characteristics and requirements that cannot be addressed by general purpose or commercial recommender systems that are available in other domains (such as the ones used in Netflix or Amazon) [5]. Recommender systems for clinical activities have no specific task; the task mainly depends on the item that is recommended. All of the possible items are expected to be recommended. A rating system does not exist, and most clinical behaviours are binary (having a symptom or not). In comparison, general recommender systems have specific and well-defined tasks, a subset of items can be recommended, a rating system exists due to subjective desire, and behavior is not binary, as a customer may refuse to buy a product without this indicating that she does not like it. However, similar to other domains, machine learning methods have a clear advantage over manual inspection of data. The health domain is volatile and dynamic; machine learning methods can tolerate changes in medical codes because retraining is easier than manual pattern recognition, are robust against temporal changes of patterns (using a sliding window approach for learning), and can offer a personalized experience per hospital or patient while at the same time keeping a general view of the whole system.
To design a framework/system, it is necessary to know the target users. Two main end users can be considered for healthcare recommender systems. Wiesner and Pfeifer [26] suggest that such systems can target health professionals (doctors and/or nurses) to help them gather additional information on a special case, or can identify patients as end users and deliver health-related content to them, such as lifestyle change recommendations [8] that change their sleeping, eating, and exercising routines, improving patient safety [17], and lowering health risks by informing them about interactions between different drugs. Policy makers are another target in this domain.
To design such a system, there are several guidelines. Valdez et al. propose a 3-step process [23] to design a recommender system: 1) understanding the domain, 2) evaluation, and 3) inception. In the evaluation step, the importance of user-centered criteria and ethical implications (trust, value, security, long-term efficiency, individual freedom, and risk) in addition to accuracy metrics is discussed. Schafer et al. discuss the recent challenges that were tackled by researchers and how to proceed toward a health-aware recommender system. Personalization, the balance between persuasion and empowerment, and user trust and satisfaction are the main issues that captured researchers’ attention [19]. They group the challenges for future studies into 3 groups: patient, recommender system, and evaluation challenges. User modeling and profiling, and data integration and cleaning from multiple sources, are the main patient-related challenges. On the recommender system side, the challenges are personalized and accurate recommendation along with step-by-step implementation of recommendations using “expert-in-the-loop” interactions. On the evaluation side, accuracy, real-life performance, and ethical and privacy considerations are discussed.
There have been a number of prior efforts in this domain. One well-known existing application is Promedas [13], a patient-specific clinical diagnostic decision support system that uses a probabilistic graphical model built with the help of medical specialists. As discussed earlier, such systems help in recommending a diagnosis specific to an individual when there is an ambiguity among physicians without rationalization. Probabilistic methods, and especially Bayesian networks, have been used in a wide range of domains. For example, Huang [9] used a Bayesian network in response to two issues in the tourism domain: the first is the absence of a travel history for a single user on which to base a content-based activity estimation, and the second is the absence of similarity between users.
Since our system predicts multiple targets simultaneously, for each person, certain combinations of the outputs will be more likely than the other combinations. To address this fact, we adopted a structured prediction based approach that uses collective classification at its core by considering the associativity of targets as nodes in a graph [20, 24]. This model captures dependencies that would not

be considered otherwise [7, 21]. At the same time, the consumers’ feedback plays a vital role in fine-tuning and improving the performance of the recommender systems. However, incorporating the users’ feedback is challenging, and the system can also be exposed to bias due to personal taste or motivations, as was shown in other domains such as Amazon or application markets [12, 27]. Therefore, we incorporated robust approaches similar to Torkamani et al. and Fakhraei et al. [6, 12, 21, 22]. We also need to tackle the computational complexity imposed on the system if the goal is to update the system’s state on the fly in real time. For this, we can consider updating schemes that have been used in similar domains to reduce the updating costs [11].
Given all the efforts that have been made in this domain, our model uses a large and rich dataset, relies on a mathematically correct and proven basis, and is able to address a wide range of different questions about the health journey of consumers, beneficial for consumers, providers, and payers over time.

7 CONCLUSION
Our proposed recommender system provides personalized, timely and actionable health-care insights for consumers. We make relevant suggestions by predicting the probabilities of various health events. By using users’ feedback from their interactions within the mobile application, we enable additional personalized suggestions. This is accomplished through an ensemble algorithm, where a Bayesian network is combined with a random forest. In the future, we can improve this framework in several ways. First, by including data from other sources, such as lab results or nurse notes, we can expand the feature set to provide a more complete view of the consumer. This view would improve the precision-recall metrics of the predictions, as well as shed a brighter light on how to increase the effectiveness of the recommendations. Second, while a calendar quarter is currently the feature extraction and prediction time unit, we can change this within the probability model to predict the timing of a health event (as an additional random variable). We could also expand the model to capture a longer period of health care history to identify missing values over time, which would lead to the discovery of long-term influences, such as chronic ailments. Finally, for both

[6] Shobeir Fakhraei, James Foulds, Madhusudana Shashanka, and Lise Getoor. 2015. Collective spammer detection in evolving multi-relational social networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1769–1778.
[7] Shobeir Fakhraei, Bert Huang, Louiqa Raschid, and Lise Getoor. 2014. Network-based drug-target interaction prediction with probabilistic soft logic. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 11, 5 (2014), 775–787.
[8] Robert G Farrell, Catalina M Danis, Sreeram Ramakrishnan, and Wendy A Kellogg. 2012. Intrapersonal retrospective recommendation: lifestyle change recommendations using stable patterns of personal behavior. In Proceedings of the First International Workshop on Recommendation Technologies for Lifestyle Change (LIFESTYLE 2012), Dublin, Ireland. Citeseer, 24.
[9] Yuxia Huang and Ling Bian. 2009. A Bayesian network and analytic hierarchy process based personalized recommendations for tourist attractions over the Internet. Expert Systems with Applications 36, 1 (2009), 933–943.
[10] Aiswarya Iyer, S Jeyalatha, and Ronak Sumbaly. 2015. Diagnosis of diabetes using classification mining techniques. arXiv preprint arXiv:1502.03774 (2015).
[11] Soheil Jamshidi and Mahmoud Reza Hashemi. 2012. An efficient data enrichment scheme for fraud detection using social network analysis. In Telecommunications (IST), 2012 Sixth International Symposium on. IEEE, 1082–1087.
[12] Soheil Jamshidi, Reza Rejaie, and Jun Li. 2018. Trojan Horses in Amazon’s Castle: Understanding the Incentivized Online Reviews. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM ’18). ACM.
[13] Bert Kappen, Wim Wiegerinck, Ender Akay, Jan Neijt, and André van Beek. 2003. Promedas: A clinical diagnostic decision support system. In Proceedings of the 15th Belgian-Dutch Conference on Artificial Intelligence. 23–24.
[14] Nguyen Cong Long, Phayung Meesad, and Herwig Unger. 2015. A highly accurate firefly based algorithm for heart disease prediction. Expert Systems with Applications 42, 21 (2015), 8221–8231.
[15] Joao Maroco, Dina Silva, Ana Rodrigues, Manuela Guerreiro, Isabel Santana, and Alexandre de Mendonça. 2011. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC Research Notes 4, 1 (2011), 299.
[16] Amrita Naik and Lilavati Samant. 2016. Correlation review of classification algorithm using data mining tool: WEKA, Rapidminer, Tanagra, Orange and Knime. Procedia Computer Science 85 (2016), 662–668.
[17] Haggai Roitman, Yossi Messika, Yevgenia Tsimerman, and Yonatan Maman. 2010. Increasing patient safety using explanation-driven personalized content recommendation. In Proceedings of the 1st ACM International Health Informatics Symposium. ACM, 430–434.
[18] Kanak Saxena, Richa Sharma, et al. 2015. Efficient heart disease prediction system using decision tree. In Computing, Communication & Automation (ICCCA), 2015 International Conference on. IEEE, 72–77.
[19] Hanna Schafer, Santiago Hors-Fraile, Raghav Pavan Karumur, Andre Calero Valdez, Alan Said, Helma Torkamaan, Tom Ulmer, and Christoph Trattner. 2017. Towards health (aware) recommender systems. In Proceedings of the 2017 International Conference on Digital Health. ACM, 157–161.
[20] Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. 2005.
Learning structured prediction models: A large margin approach. In Proceedings
better interpretability and overall improvement of the recommender of the 22nd international conference on Machine learning. ACM, 896–903.
system, we are working on a context-aware attention modeling al- [21] MohamadAli Torkamani and Daniel Lowd. 2013. Convex adversarial collective
gorithm to identify, invigorate, and use the most relevant features classification. In International Conference on Machine Learning. 642–650.
[22] Mohamad Ali Torkamani and Daniel Lowd. 2014. On robustness and regular-
extracted from the health data and received feedback. ization of structural support vector machines. In International Conference on
Machine Learning. 577–585.
[23] Andr’e Calero Valdez, Martina Ziefle, Katrien Verbert, Alexander Felfernig, and
REFERENCES Andreas Holzinger. 2016. Recommender systems for health informatics: State-
[1] Pragati Agrawal and Amit kumar Dewangan. 2015. A brief survey on the tech- of-the-art and future perspectives. In Machine Learning for Health Informatics.
niques used for the diagnosis of diabetes-mellitus. Int. Res. J. of Eng. and Tech. Springer, 391–414.
IRJET 2 (2015), 1039–1043. [24] David Weiss and Benjamin Taskar. 2010. Structured prediction cascades. In
[2] PK Anooj. 2012. Clinical decision support system: Risk level prediction of heart Proceedings of the Thirteenth International Conference on Artificial Intelligence
disease using weighted fuzzy rules. Journal of King Saud University-Computer and Statistics. 916–923.
and Information Sciences 24, 1 (2012), 27–40. [25] WAJJ Wiegerinck and Tom Heskes. 2001. Probability assessment with maximum
[3] Pushkaraj R Bhandari, Sapna P Yadav, Shyam A Mote, Devika P Rankhambe, UG entropy in Bayesian networks. (2001).
Scholar, and Pune APCOER. 2016. Predictive system for medical diagnosis with [26] Martin Wiesner and Daniel Pfeifer. 2014. Health recommender systems: con-
expertise analysis. International Journal of Engineering Science 4652 (2016). cepts, requirements, technical basics and challenges. International journal of
[4] Amy Compton-Phillips. [n. d.]. Care Redesign - What Data Can Re- environmental research and public health 11, 3 (2014), 2580–2607.
ally Do for Health Care. http://join.catalyst.nejm.org/hubfs/Insights% [27] Zhen Xie and Sencun Zhu. 2015. AppWatcher: Unveiling the underground market
20Council%20Monthly%20-%20Files/Insights%20Council%20March% of trading mobile app reviews. In Proceedings of the 8th ACM Conference on
202017%20Report%20What%20Data%20Can%20Really%20Do%20for% Security & Privacy in Wireless and Mobile Networks. ACM, 10.
20Health%20Care.pdf
[5] Lian Duan, W Nick Street, and E Xu. 2011. Healthcare information systems: data
mining methods in the creation of a clinical recommender system. Enterprise
Information Systems 5, 2 (2011), 169–181.