=Paper= {{Paper |id=Vol-2216/healthRecSys18_paper_6 |storemode=property |title=A Hybrid Health Journey Recommender System Using Electronic Medical Records |pdfUrl=https://ceur-ws.org/Vol-2216/healthRecSys18_paper_6.pdf |volume=Vol-2216 |authors=Soheil Jamshidi,Mohamad Ali Torkamani,Jynelle Mellen,Malhar Jhaveri,Penny Pan,James Chung,Hakan Kardes |dblpUrl=https://dblp.org/rec/conf/recsys/JamshidiTMJPCK18 }} ==A Hybrid Health Journey Recommender System Using Electronic Medical Records== https://ceur-ws.org/Vol-2216/healthRecSys18_paper_6.pdf
                    A Hybrid Health Journey Recommender System
                          using Electronic Medical Records
                                            Soheil Jamshidi, Ali Torkamani, Jynelle Mellen,
                                         Malhar Jhaveri, Penny Pan, James Chung, Hakan Kardes
                                                                  Cambia Health Solutions
                                                                     Portland, OR, 97201
                                                         {firstName}.{lastName}@cambiahealth.com
ABSTRACT
We present a recommender system aimed at improving the healthcare experience of consumers.
Our model provides actionable insights to cohorts or individuals, based on their collective
and personal health-related data. The actionable insights are delivered through digital
interventions to help prevent adverse events for the consumer. By proposing timely and
personalized suggestions, we will improve consumer health outcomes and prevent complications,
which would also result in cost savings. Our recommendation system employs an ensembling
technique with a Bayesian network at its core. The network uses administrative claims data,
but could be extended to use Electronic Health Records (EHR) data, for learning the structure
of the interwoven health graph (conditions, medications, procedures, and more). This method
allows for predicting the probability of various outcomes conditioned on the consumers’
evidential health data. We also couple our ensemble method with a shallow random forest model
to further refine the personalized recommendations after receiving the consumer’s feedback.
The experimental results show that our system significantly improves the precision-recall
metrics of several intervention targets compared to a random baseline.

CCS CONCEPTS
• Applied computing → Consumer health; Health care information systems; • Information
systems → Recommender systems;

KEYWORDS
Health Journey; Health Recommender System; Hybrid Recommender System

ACM Reference Format:
Soheil Jamshidi, Ali Torkamani, Jynelle Mellen, Malhar Jhaveri, Penny Pan, James Chung,
Hakan Kardes. 2018. A Hybrid Health Journey Recommender System using Electronic Medical
Records. In Proceedings of the Third International Workshop on Health Recommender Systems
co-located with Twelfth ACM Conference on Recommender Systems (HealthRecSys’18), Vancouver,
BC, Canada, October 6, 2018, 6 pages.

HealthRecSys’18, October 6, 2018, Vancouver, BC, Canada
© 2018 Copyright for the individual papers remains with the authors. Copying permitted
for private and academic purposes. This volume is published and copyrighted by its editors.

1     INTRODUCTION
The availability of clinical data in the form of Electronic Health Records (EHR) has
dramatically increased over the past decade. In addition to EHRs, there are large volumes of
administrative payment data that offer insights to researchers. However, administrative data
and EHRs are scattered among numerous entities and sources, such as health plans,
laboratories, providers, hospitals, chart notes, and more. In addition to the disparate
sources, the breadth, depth, linkage, and scale of the data lead to further complexity. For
end-users (consumers), this can make interpretation difficult due to information overload or
a lack of information, as well as term inconsistency. The increasing need to leverage health
records has led to the emergence of Health Recommender Systems (HRS) [4, 13]. Such
recommender systems can target medical experts or patients and play a vital role in improving
an individual’s health by providing insightful recommendations. These systems are primarily
created to handle ambiguous diagnosis situations that arise from the varied decisions of
providers [13] for certain diseases. In that case, recommenders created by medical
specialists provide insight for tailored diagnosis procedures for patients. To this aim,
Machine Learning (ML) methods act as enablers. There have been a large number of studies on a
wide range of machine learning techniques, such as decision trees, multi-layer perceptron
(MLP), and support vector machine (SVM), that have focused on a variety of diseases, including
dementia, kidney, and heart diseases [1–3, 10, 14–16, 18].
   In this paper, we present a consumer-focused recommender system that will give individuals
suggestions based on their collective health-related data (EHR and claims). To accomplish
this, we leverage an ensemble algorithm, where a Bayesian network (BN) is combined with a
random forest (RF). Probabilistic Graphical Models such as BNs are known to tolerate data
uncertainty (noise, ambiguity, and missing values) in a consistent and mathematically correct
way [25] in the inference phase, and RF facilitates refining the personalized recommendations
after receiving the consumer’s feedback. Such a system learns the conditional probability
table using a maximum entropy or belief propagation approach on large datasets derived from
over a million records with thousands of diagnoses and findings, and over a hundred
variables. Medical and pharmacy claims in combination with laboratory results, nurse notes,
and consumer data form a powerful aggregated dataset that can be used to train a model that
offers actionable recommendations. This recommendation system can serve a variety of
health-related applications (cost predictions and engagement modeling, for example) or can be
presented as a product to the market.
   The structure of the remaining paper is as follows. Section 2 introduces the features of
the data. Section 3 demonstrates the methods used for learning and inference. Section 4
explains the technical details of our graphical model. We discuss the experimental results in
Section 5. Then, in Section 6 we review the existing techniques
and studies. We conclude the paper with a discussion and an outline of the directions for
further research in Section 7.

2     NATURE OF DATA
The dataset used in our recommender system is largely derived from claim data, with over
1.5 million claims and over a hundred distinct attributes over a 6-year period.
   Available attributes can be grouped as numerical and categorical attributes. For example,
consumer gender has two states, male and female. Figure 1 illustrates that more than 55% of
our target group are women. We divided Age into twelve bins as follows: <1, 1-4, 5-12, 13-17,
18-25, from 26 to 85 in 10-year bins, and ≥86. It was confirmed by domain experts that
patients in these age ranges generally develop similar conditions. The distribution of the
number of consumers per age bin is shown in Figure 2. The claim records cover 4 types of data
sources: dental, medical, hospital. As shown in Figure 3, 65% of the claims are related to
medical claims. We term the observed data the manifests. The manifests are computed as
aggregations based on meaningful categories. For each manifest, there is an integer count ≥ 0
that signifies the number of times a person had the event in a quarter. Currently, our system
relies on only 4 major categories: drug classification, diagnosis classification, provider
specialty, and service category. Finally, since each patient is associated with many records
(claims), we aggregated the records of each patient by calendar quarter (suggested by domain
experts as the most relevant time frame to capture related health events). However, our
system can be used for different time granularities.

Figure 1: Gender Distribution (categories: Male, Female)
Figure 2: Age Group Distribution (y-axis: age bin, from <1 to >=86; x-axis: percentage (%))
Figure 3: Claim Types Distribution (categories: Dental, Hospital, Medical)

3     METHODOLOGY
We rely on a hybrid approach leveraging Probabilistic Graphical Models (PGM), Random Forest
(RF), and Collaborative Filtering (CF) techniques to obtain a vector of recommendations and
combine the results using an ensembler. This way, we benefit from the power of PGMs in
capturing the propagation of effects, of CF in considering similar situations, and of RFs in
targeting tailored recommendations. Our proposed framework is illustrated in Figure 4, where
data from different sources is fed into an analysis block in which we train and use our
models. The output is a list of recommendations delivered via a mobile application. Based on
the feedback we gather from the users (consumers, providers, etc.) on the quality of the
recommendations, we can optimize the weights of the ensembler. In this study, we focus only on
the green boxes, PGM and RF; the rest will be addressed in our future work. PGMs can handle
large datasets in a computationally tractable manner. We leverage a Bayesian network on a
Directed Acyclic Graph (DAG). Before learning the conditional probabilities of the network,
we found it useful to transform the data so that each patient observation corresponds to the
manifests exhibited in a quarter, which is long enough to cover a sequence of symptoms and
lets us focus on a less noisy sequence of healthcare events. For every quarter in our dataset,
we also populate the data for the immediately previous quarter and the next quarter, and hence
there is a triple of observations (or rows) for any given patient.

Figure 4: Proposed framework for our recommender system

3.1   Structure learning
Given the transformed data, we can translate the questions into the following
prediction/inference problem: consider a patient with a total of N features (manifests),
including his/her medical tests, drugs, health events, and other manifests related to a
period, the previous period, and the next period. If we observe x features out of these N
features, can we predict the values (or the probability distributions) of the remaining N − x
features based on the available historical data? Using this data as input, we learn the
structure and create the model. The steps involved are as follows:
   First, to derive a joint probability distribution table, we transformed the input matrix to
a discrete form with 0, 1 states. The input data had the number of times a manifest was
observed for a patient in a given quarter. If this value was non-zero, we replaced it with a
1. The matrix now represents whether the manifest occurred at least once in the period.
   Then, we converted the matrix to have manifests observed in a quarter along with manifests
observed in the next and previous quarters. This transformation provided us with the data
structure with which we could predict or infer manifests of a quarter given those from
another quarter.
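
As a minimal sketch of the two transformation steps just described (binarization, then widening each row with the neighbouring quarters), the following uses only the Python standard library. The toy table, the consecutive integer quarter index, the helper names, and the `_PRV` suffix are illustrative assumptions; the `_NXT` suffix mirrors the `DM.w.o.C._NXT` label that appears in the paper's correlation figure.

```python
# Hypothetical toy input: counts[(patient, quarter_index)] maps each manifest to the
# number of times it was observed; quarter indices are consecutive integers.
counts = {
    ("p1", 0): {"ANTIDIABETICS": 0, "GENERAL_MEDICINE": 1},
    ("p1", 1): {"ANTIDIABETICS": 2, "GENERAL_MEDICINE": 0},
    ("p1", 2): {"ANTIDIABETICS": 1, "GENERAL_MEDICINE": 3},
    ("p2", 0): {"ANTIDIABETICS": 0, "GENERAL_MEDICINE": 2},
}


def binarize(counts):
    """Step 1: replace every non-zero count with 1 (occurred at least once)."""
    return {
        key: {manifest: int(n > 0) for manifest, n in row.items()}
        for key, row in counts.items()
    }


def add_neighbour_quarters(binary):
    """Step 2: widen each row with the same patient's previous and next quarter."""
    wide = {}
    for (patient, quarter), row in binary.items():
        new_row = dict(row)
        for offset, suffix in ((-1, "_PRV"), (1, "_NXT")):
            neighbour = binary.get((patient, quarter + offset))
            for manifest in row:
                # None marks a neighbouring quarter that is absent from the data.
                new_row[manifest + suffix] = (
                    None if neighbour is None else neighbour.get(manifest, 0)
                )
        wide[(patient, quarter)] = new_row
    return wide


wide = add_neighbour_quarters(binarize(counts))
print(wide[("p1", 1)])
```

Each widened row then holds the 0/1 state of every manifest for the current, previous, and next quarter, which is the shape needed to ask "given what was observed in one quarter, what is likely in another". How missing neighbouring quarters are treated (here: `None`) is a design choice the paper does not specify.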

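The per-quarter manifest counts that Section 3.1 starts from come from the aggregation described in Section 2, where each patient's claims are grouped by calendar quarter. A minimal sketch of that aggregation follows, assuming hypothetical claim tuples of (patient, service date, manifest category); the field layout and function names are illustrative and not the authors' schema.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical claim records: (patient_id, service_date, manifest category).
claims = [
    ("p1", date(2017, 1, 15), "ANTIDIABETICS"),
    ("p1", date(2017, 3, 2), "ANTIDIABETICS"),
    ("p1", date(2017, 3, 2), "General Medicine"),
    ("p1", date(2017, 4, 20), "General Medicine"),
    ("p2", date(2017, 2, 9), "Essential hypertension"),
]


def calendar_quarter(d):
    """Map a date to a (year, quarter) pair, with quarter in 1..4."""
    return d.year, (d.month - 1) // 3 + 1


def manifest_counts(claims):
    """Count how often each manifest occurs per patient and calendar quarter."""
    counts = defaultdict(Counter)
    for patient, service_date, manifest in claims:
        counts[(patient, calendar_quarter(service_date))][manifest] += 1
    return counts


counts = manifest_counts(claims)
for key in sorted(counts):
    print(key, dict(counts[key]))
```

Swapping `calendar_quarter` for a monthly or yearly key is all that would be needed to try the other time granularities the paper mentions.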

   We consider the data as a matrix A and compute B = A^T × A, where A^T is the transpose of
A. The resulting symmetric matrix B holds the number of joint occurrences of manifests across
all patients. The diagonal entries of this matrix are the numbers of occurrences of each
manifest over all patients. Dividing the values in each row by that row's diagonal value
yields the conditional probability of the column manifest given the row manifest,
P(MC | MR) = P(MR, MC) / P(MR), where MR is the manifest in the row and MC is the manifest in
the column. As expected, all the diagonal entries reduce to 1, and the resulting matrix is no
longer symmetric.
   To determine the significant relationships and to discard those that were not as
significant, we set a threshold of 0.05 (5%) for the conditional probability: if a conditional
probability was greater than 5%, we retained it. This choice of threshold was arbitrary, and
as an improvement, we should consult with our domain experts to verify the structure and
adjust accordingly. Each relationship with a conditional probability above the threshold
represents a directed edge in a graph, with the arrow going from the row manifest to the
column manifest. Then, we remove cyclic relationships (including the diagonal entries, because
a Bayesian model does not allow loops) using the NetworkX Python library. The cycle-removal
step detects cycles on a first-come basis and removes the last encountered edge once a cycle
is detected. The remaining edges form a DAG structure.
   The fit function estimated the Conditional Probability Distribution (CPD) for each variable
based on the given data and the parameter estimation approach we use. In our case, we use
Bayesian parameter estimation because it considers both the probability distribution
representing our prior knowledge (how likely we are to believe in the different choices of
parameters) and the support of the data (because confidence increases with more data).
Moreover, our prior distribution is not uniform, which is another reason to use Bayesian
parameter estimation. Since our aim is to predict the values of an unknown manifest, we fit
the model with the training data set. At this step, the given data in graphical form was ready
for performing various types of reasoning.

3.2    Hybrid Scoring
As shown in Figure 4, we use multiple models to obtain the final recommendation. One approach
is to apply weighted ensemble methods to obtain better predictive performance than could be
obtained from each of these models independently. The final recommendations can be used for a
wide range of applications. Here, we focus on a Mobile App that provides health-care benefit
or educational recommendations. For example, if a high probability of hypertension is
predicted, then the App would recommend that the person visits his/her Primary Care Physician
(PCP).

3.3    Feedback loop
Feedback is a valuable asset for personalizing the recommendations as well as making better
recommendations to similar people. The feedback loop can directly contribute to updating the
weight vector of the ensemble method, as well as hyperparameter tuning of the individual
models. In fact, we need an online learning algorithm to incorporate the feedback into the
re-training phase. However, not every feedback or data observation has the same
weight/quality. Therefore, we need to consider a context-aware algorithm to pay more attention
to the more important segments of information. Attention modeling will be done as a part of
our future work.

4     TECHNICAL DETAILS
In this section, we briefly review the technical aspects of our Bayesian network and how
parameter learning methods are used to estimate the conditional probabilities for the given
set of claims and predict the state or occurrence of a new set of claims.

4.1    Bayesian Network
A Bayesian network (BN) is a probabilistic graphical model (PGM) that represents a set of
random variables (nodes), say X_1, ..., X_n, and their conditional dependencies (edges
corresponding to the direct influence of one node on another) using a directed acyclic graph
(DAG). The graph also encodes conditional independencies, say X_1 ⊥⊥ X_3 | X_6. By surfacing
these independencies, we can reduce the number of values that need to be stored in order to
represent the joint probability distribution, which makes the representation more compact.
   For our purpose, we use two layers of inference: structure and parameter learning. By
leveraging structure inference, we create the skeleton using conditional probabilities and
domain expert input, which captures the dependencies between the variables. The second layer
utilizes the dependencies and historical data to estimate the conditional probability
distributions of the individual variables.
   In parameter learning, there are two main methods:
      • Maximum likelihood estimation
      • Bayesian estimation
We use Bayesian estimation over maximum likelihood estimation (MLE) because MLE considers a
uniform prior distribution, which might lead us to wrong conclusions about the likelihood of a
parameter θ_{X_i}, whereas a prior lets us adjust the likelihood based on whether the sample is
biased or not. Also, MLE does not update the confidence of θ_{X_i} with the change in the size
of the data (450,000 out of 1,000,000 vs 45 out of 100). Thus, in Bayesian estimation, we use
the prior knowledge about θ with its probability distribution. This distribution represents
how likely we believe the different choices of parameters to be. Therefore, we can create a
joint distribution that captures our assumptions about the parameters θ and the data we are to
observe. Each new data point gives us more information about θ and hence about the probability
of the next occurrence. Hence the posterior distribution for Bayesian estimation is:

$$\Pr(\theta \mid x[1], \ldots, x[M]) = \frac{\Pr(x[1], \ldots, x[M] \mid \theta)\,\Pr(\theta)}{\Pr(x[1], \ldots, x[M])} \tag{1}$$

4.2    Inference
Finding the conditional probability distribution (CPD) over some variables, Pr(Y | E = e), is
the same as inferring from a model. Therefore, predicting values for a new data point is the
same as finding the conditional probability of the unknown variables, given the observed
values of other variables. The CPDs can be computed from the joint probability distribution of
the variables, by marginalizing and reducing them over variables and states.
   In addition, we are interested in finding the state of a set of variables given another set
of variables. This is simply an inference query over the model, and the state with the highest
probability is the prediction of the model. However, computing the joint probability
distribution would give us an exponentially large table, which the probabilistic graphical
model helps us avoid. There are two algorithms we can use for inference:
     • Variable elimination
     • Belief propagation
We use variable elimination over belief propagation since the former is suitable for a very
large network as it is not memory-expensive. In addition, we discard the generated
intermediate factors and hence it is more flexible than BP. In variable elimination, consider
the model A → B → C → D, for which we try to find Pr(D):

$$\Pr(D) = \sum_{a}\sum_{b}\sum_{c} \Pr(a)\,\Pr(b \mid a)\,\Pr(c \mid b)\,\Pr(D \mid c) \tag{2}$$

In variable elimination, we can sum over parts of the product instead of over the complete
product. Hence, Eq. (2) becomes:

$$\Pr(D) = \sum_{a}\sum_{b}\sum_{c} \Pr(a)\,\Pr(b \mid a)\,\Pr(c \mid b)\,\Pr(D \mid c) = \sum_{c} \Pr(D \mid c) \sum_{b} \Pr(c \mid b) \sum_{a} \Pr(a)\,\Pr(b \mid a) \tag{3}$$

This method helps to significantly reduce the computation required to compute the
probabilities. Hence variable elimination is much more efficient for calculating probability
distributions than normalizing and marginalizing the joint probability distribution.
   Our prediction function uses a maximum a posteriori probability to find the states of
variables corresponding to the maximum probability in the joint distribution. This is useful
when we want to predict the state of variables in our model. Moreover, we introduce another
operation on factors called maximization. A maximum a posteriori query is essentially a way to
predict the state of variables, given the state of other variables. Thus, using the trained
model, we try to predict the states of variables for new data points. To design the models, we
need to create conditional probability distributions or factors, add them to the base model,
create an inference object, and then run maximum a posteriori queries over it for new data
points to predict variable states.

5   EXPERIMENTAL RESULTS
While our un-targeted Bayesian network based recommender system can be leveraged to address a
wide range of questions, in some cases a targeted model (such as a random forest) can be more
beneficial. In our proposed framework depicted in Figure 4, we have included both components.
To assess the abilities of our recommender system, we focus on recommendations of vision and
dental benefits, train the random forest model on the same dataset and set of features on
which we trained our probabilistic graphical model, and discuss the results in this section.
   We trained the RF model using the default parameters except for the following:
n_estimators=30, max_depth=350, random_state=0, and min_samples_leaf=2. Before fitting the
data to the model, we split the data randomly into train and test sets in an 80:20 ratio (a
widely recommended split ratio). The test set does not contain the columns/manifests that we
are interested in predicting. The data used comprised 2,010 cases with around 1.5K features.
We used a subset of the data set because of constraints on computational power. Figure 5 and
Figure 6 illustrate the Precision-Recall ROC for dental and vision benefits, respectively. In
each figure, the expected result from a random guess, based on the frequency of the positive
class, which is 27.38% for the dental benefit and 4.66% for the vision benefit, is also
depicted.

Figure 5: ROC measure of Random Forest on Dental benefits (panel title: PR-ROC for Dental
benefits; x-axis: Recall; y-axis: Precision; series: Model Scores, Random Guess)
Figure 6: ROC measure of Random Forest on Vision benefits (panel title: PR-ROC for Vision
benefits; x-axis: Recall; y-axis: Precision; series: Model Scores, Random Guess)

   Using the trained Bayesian network, we predicted the states of all the missing
columns/features of the test set. Prediction is done by belief propagation, where we find the
most probable state of the unknown manifests/features given the states of the other
manifests/features using CPDs. Fig 7 shows the correlation of manifests related to a specific
target (Diabetes mellitus without complication, in the next quarter) and their pairwise
correlations considering two consecutive quarters. As expected, having Diabetes mellitus
without complication and taking antidiabetics in the previous quarter have a strong
correlation with having it in the next quarter. Interestingly, the correlations between
hypertension, disorders of lipid, and high glucose in blood are also captured by our model.
   The probabilities gathered from our Bayesian network can be represented as a network. In
Fig 8 the most probable manifests in the previous quarter that can be used to predict whether
the consumer will visit an optometrist are depicted as a network where edge
A Hybrid Health Journey Recommender System                                      HealthRecSys’18, October 6, 2018, Vancouver, BC, Canada


thickness shows the prevalence of manifests related to a specific target. As shown, those who had more medical interactions (surgery, medicines, office visits, etc.) are more likely to have an optometry event in their medical journey.

Figure 7: Feature correlation related to Diabetes mellitus without complication
[Heatmap with a color scale from 0.0 to 1.0. Both axes list the same manifests: DM.w.o.C._NXT, DM.w.o.C., ANTIDIABETICS, METFORMIN HCL, Essential hypertension, GLUCOSE BLOOD, DIAGNOSTIC PRODUCTS, Disorders of lipid, Other aftercare, DME SUPPLIER, DME, Med. Devices, General Medicine, Laboratory - Outpatient, INSULIN GLARGINE.]

Figure 8: Optometry targets
[Network graph. Node labels: FAMILY PRACTICE, HOSPITAL, LaboratoryOutpatient, Surgery, Office Visits, ANTIHYPERTENSIVES, OPTOMETRY, INTERNAL MEDICINE, General Medicine, M.D., HospitalOutpatient, Diagnostic, Essential hypertension, Medical, BETA BLOCKERS.]

6   EXISTING TECHNOLOGIES
The health-care domain has specific characteristics and requirements that cannot be addressed by the general-purpose or commercial recommender systems available in other domains (such as those used by Netflix or Amazon) [5]. Recommender systems for clinical activities have no single specific task; the task mainly depends on the item that is recommended. All possible items are expected to be recommended. No rating system exists, and most clinical behaviors are binary (a symptom is present or not). By contrast, general recommender systems have specific, well-defined tasks, can recommend a subset of items, and have a rating system that reflects subjective preference. Behavior there is not binary, as a customer may decline to buy a product without disliking it. However, as in other domains, machine learning methods have a clear advantage over manual inspection of data. The health domain is volatile and dynamic. Machine learning methods can tolerate changes in medical codes because retraining is easier than manual pattern recognition, they are robust against temporal changes in patterns when a sliding-window approach is used for learning, and they can offer a personalized experience per hospital or patient while keeping a general view of the whole system.

To design a framework or system, it is necessary to know the target users. Two main groups of end users can be considered for healthcare recommender systems. Wiesner and Pfeifer [26] suggest that such systems can target health professionals (doctors and/or nurses) to help them gather additional information on a special case, or can treat patients as the end users and deliver health-related content to them. Examples include lifestyle change recommendations [8] that alter sleeping, eating, and exercise routines, as well as improving patient safety [17] and lowering health risks by informing patients about interactions between different drugs. Policy makers are another target in this domain.

There are several guidelines for designing such a system. Valdez et al. propose a 3-step process [23] for designing a recommender system: 1) understanding the domain, 2) evaluation, and 3) inception. In the evaluation step, they discuss the importance of user-centered criteria and ethical implications (trust, value, security, long-term efficiency, individual freedom, and risk) in addition to accuracy metrics.

Schafer et al. discuss the recent challenges tackled by researchers and how to proceed toward a health-aware recommender system. Personalization, the balance between persuasion and empowerment, and user trust and satisfaction are the main issues that have captured researchers’ attention [19]. They group the challenges for future studies into three categories: patient, recommender system, and evaluation challenges. User modeling and profiling, together with data integration and cleaning from multiple sources, are the main patient-related challenges. On the recommender system side, the challenges are personalized and accurate recommendations along with step-by-step implementation of recommendations using “expert-in-the-loop” interactions. On the evaluation side, accuracy, real-life performance, and ethical and privacy considerations are discussed.

There have been a number of prior efforts in this domain. One well-known existing application is Promedas [13], a patient-specific clinical diagnostic decision support system that uses a probabilistic graphical model built with the help of medical specialists. As discussed earlier, such systems help recommend a diagnosis specific to an individual when there is ambiguity among physicians without rationalization. Probabilistic methods, and especially Bayesian networks, have been used in a wide range of domains. For example, Huang and Bian [9] used a Bayesian network to address two issues in the tourism domain: first, the absence of a travel history for a single user, which content-based activity estimation requires, and second, the absence of similarity between a user and other users.

Since our system predicts multiple targets simultaneously, certain combinations of outputs will be more likely than others for each person. To address this, we adopted a structured-prediction-based approach that uses collective classification at its core by considering the associativity of targets as nodes in a graph [20, 24]. This model captures dependencies that would not
be considered otherwise [7, 21]. At the same time, consumers’ feedback plays a vital role in fine-tuning and improving the performance of recommender systems. However, incorporating users’ feedback is challenging, and the system can also be exposed to bias due to personal taste or motivations, as was shown in other domains such as Amazon or application markets [12, 27]. Therefore, we incorporated robust approaches similar to those of Torkamani et al. and Fakhraei et al. [6, 12, 21, 22]. We also need to tackle the computational complexity this imposes on the system if the goal is to update the system’s state on the fly in real time. For this, we can consider updating schemes that have been used in similar domains to reduce updating costs [11].

Given all the efforts in this domain, our model uses a large and rich dataset, relies on a mathematically sound and proven basis, and is able to address a wide range of questions about the health journey of consumers, which benefits consumers, providers, and payers over time.

7    CONCLUSION
Our proposed recommender system provides personalized, timely, and actionable health-care insights for consumers. We make relevant suggestions by predicting the probabilities of various health events. By using users’ feedback from their interactions within the mobile application, we enable additional personalized suggestions. This is accomplished through an ensemble algorithm in which a Bayesian network is combined with a random forest. In the future, we can improve this framework in several ways. First, by including data from other sources, such as lab results or nurse notes, we can expand the feature set to provide a more complete view of the consumer. This view would improve the precision-recall metrics of the predictions and shed more light on how to increase the effectiveness of the recommendations. Second, while a calendar quarter is currently the time unit for feature extraction and prediction, we can change this within the probability model to predict the timing of a health event (as an additional random variable). We could also expand the model to capture a longer period of health care history to identify missing values over time, which would lead to the discovery of long-term influences, such as chronic ailments. Finally, for both better interpretability and overall improvement of the recommender system, we are working on a context-aware attention modeling algorithm to identify, invigorate, and use the most relevant features extracted from the health data and received feedback.

REFERENCES
 [1] Pragati Agrawal and Amit kumar Dewangan. 2015. A brief survey on the techniques used for the diagnosis of diabetes-mellitus. Int. Res. J. of Eng. and Tech. IRJET 2 (2015), 1039–1043.
 [2] PK Anooj. 2012. Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules. Journal of King Saud University-Computer and Information Sciences 24, 1 (2012), 27–40.
 [3] Pushkaraj R Bhandari, Sapna P Yadav, Shyam A Mote, Devika P Rankhambe, UG Scholar, and Pune APCOER. 2016. Predictive system for medical diagnosis with expertise analysis. International Journal of Engineering Science 4652 (2016).
 [4] Amy Compton-Phillips. [n. d.]. Care Redesign - What Data Can Really Do for Health Care. http://join.catalyst.nejm.org/hubfs/Insights%20Council%20Monthly%20-%20Files/Insights%20Council%20March%202017%20Report%20What%20Data%20Can%20Really%20Do%20for%20Health%20Care.pdf
 [5] Lian Duan, W Nick Street, and E Xu. 2011. Healthcare information systems: data mining methods in the creation of a clinical recommender system. Enterprise Information Systems 5, 2 (2011), 169–181.
 [6] Shobeir Fakhraei, James Foulds, Madhusudana Shashanka, and Lise Getoor. 2015. Collective spammer detection in evolving multi-relational social networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1769–1778.
 [7] Shobeir Fakhraei, Bert Huang, Louiqa Raschid, and Lise Getoor. 2014. Network-based drug-target interaction prediction with probabilistic soft logic. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 11, 5 (2014), 775–787.
 [8] Robert G Farrell, Catalina M Danis, Sreeram Ramakrishnan, and Wendy A Kellogg. 2012. Intrapersonal retrospective recommendation: lifestyle change recommendations using stable patterns of personal behavior. In Proceedings of the First International Workshop on Recommendation Technologies for Lifestyle Change (LIFESTYLE 2012), Dublin, Ireland. Citeseer, 24.
 [9] Yuxia Huang and Ling Bian. 2009. A Bayesian network and analytic hierarchy process based personalized recommendations for tourist attractions over the Internet. Expert Systems with Applications 36, 1 (2009), 933–943.
[10] Aiswarya Iyer, S Jeyalatha, and Ronak Sumbaly. 2015. Diagnosis of diabetes using classification mining techniques. arXiv preprint arXiv:1502.03774 (2015).
[11] Soheil Jamshidi and Mahmoud Reza Hashemi. 2012. An efficient data enrichment scheme for fraud detection using social network analysis. In Telecommunications (IST), 2012 Sixth International Symposium on. IEEE, 1082–1087.
[12] Soheil Jamshidi, Reza Rejaie, and Jun Li. 2018. Trojan Horses in Amazon’s Castle: Understanding the Incentivized Online Reviews. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM ’18). ACM.
[13] Bert Kappen, Wim Wiegerinck, Ender Akay, Jan Neijt, and André van Beek. 2003. Promedas: A clinical diagnostic decision support system. In Proceedings of the 15th Belgian-Dutch Conference on Artificial Intelligence. 23–24.
[14] Nguyen Cong Long, Phayung Meesad, and Herwig Unger. 2015. A highly accurate firefly based algorithm for heart disease prediction. Expert Systems with Applications 42, 21 (2015), 8221–8231.
[15] Joao Maroco, Dina Silva, Ana Rodrigues, Manuela Guerreiro, Isabel Santana, and Alexandre de Mendonça. 2011. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC research notes 4, 1 (2011), 299.
[16] Amrita Naik and Lilavati Samant. 2016. Correlation review of classification algorithm using data mining tool: WEKA, Rapidminer, Tanagra, Orange and Knime. Procedia Computer Science 85 (2016), 662–668.
[17] Haggai Roitman, Yossi Messika, Yevgenia Tsimerman, and Yonatan Maman. 2010. Increasing patient safety using explanation-driven personalized content recommendation. In Proceedings of the 1st ACM International Health Informatics Symposium. ACM, 430–434.
[18] Kanak Saxena, Richa Sharma, et al. 2015. Efficient heart disease prediction system using decision tree. In Computing, Communication & Automation (ICCCA), 2015 International Conference on. IEEE, 72–77.
[19] Hanna Schafer, Santiago Hors-Fraile, Raghav Pavan Karumur, Andre Calero Valdez, Alan Said, Helma Torkamaan, Tom Ulmer, and Christoph Trattner. 2017. Towards health (aware) recommender systems. In Proceedings of the 2017 international conference on digital health. ACM, 157–161.
[20] Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. 2005. Learning structured prediction models: A large margin approach. In Proceedings of the 22nd international conference on Machine learning. ACM, 896–903.
[21] MohamadAli Torkamani and Daniel Lowd. 2013. Convex adversarial collective classification. In International Conference on Machine Learning. 642–650.
[22] Mohamad Ali Torkamani and Daniel Lowd. 2014. On robustness and regularization of structural support vector machines. In International Conference on Machine Learning. 577–585.
[23] André Calero Valdez, Martina Ziefle, Katrien Verbert, Alexander Felfernig, and Andreas Holzinger. 2016. Recommender systems for health informatics: State-of-the-art and future perspectives. In Machine Learning for Health Informatics. Springer, 391–414.
[24] David Weiss and Benjamin Taskar. 2010. Structured prediction cascades. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 916–923.
[25] WAJJ Wiegerinck and Tom Heskes. 2001. Probability assessment with maximum entropy in Bayesian networks. (2001).
[26] Martin Wiesner and Daniel Pfeifer. 2014. Health recommender systems: concepts, requirements, technical basics and challenges. International journal of environmental research and public health 11, 3 (2014), 2580–2607.
[27] Zhen Xie and Sencun Zhu. 2015. AppWatcher: Unveiling the underground market of trading mobile app reviews. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks. ACM, 10.
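
For illustration only, the ensemble step summarized in Section 7 (a Bayesian network combined with a random forest) could be sketched as below. The paper does not state how the two models' outputs are merged, so the weighted average used here is an assumption, as are the function names, the event names, and the probability values, all of which are hypothetical.

```python
from typing import Dict, List, Tuple


def combine_probabilities(
    bn_probs: Dict[str, float],
    rf_probs: Dict[str, float],
    bn_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of per-event probabilities from two models.

    ASSUMPTION: a simple soft-voting rule; the paper does not specify
    how the Bayesian network and random forest outputs are combined.
    """
    if not 0.0 <= bn_weight <= 1.0:
        raise ValueError("bn_weight must lie in [0, 1]")
    if bn_probs.keys() != rf_probs.keys():
        raise ValueError("both models must score the same health events")
    return {
        event: bn_weight * bn_probs[event] + (1.0 - bn_weight) * rf_probs[event]
        for event in bn_probs
    }


def rank_events(probs: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k events sorted by descending combined probability."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical next-quarter probabilities for one consumer (not from the paper).
bn = {"optometry_visit": 0.70, "dental_visit": 0.40, "diabetes_followup": 0.10}
rf = {"optometry_visit": 0.50, "dental_visit": 0.60, "diabetes_followup": 0.30}

combined = combine_probabilities(bn, rf, bn_weight=0.5)
ranked = rank_events(combined, top_k=2)
for event, prob in ranked:
    print(f"{event}: {prob:.2f}")
```

The weight `bn_weight` would normally be tuned on held-out data; equal weighting is used here only to keep the example small.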