Engagement Scoring for Care-gap Intervention Optimization
                                    MohamadAli Torkamani, Malhar Jhaveri, Jynelle Mellen
                                 Michael Brown-Hayes, James Chung, Penny Pan, Hakan Kardes
                                                       {FirstName}.{LastName}@CambiaHealth.com
                                                                Cambia Health Solutions
                                                                    Portland, OR, 97201
ABSTRACT
Timely preventive health screenings can be crucial for the early detection of serious disease or complications from chronic conditions, such as diabetes. Influencing the right people to obtain recommended screenings, at the right time, can result in significantly improved health outcomes. Screenings that have not been attained are called care-gaps, and performing the screening is called closing the care-gap. The spectrum of individuals managing their own health care ranges from minimal engagement (no screenings in this case) to high engagement (timely screenings). Among those who do not obtain timely screenings, we have identified two types: individuals who will respond to outreach and those who will not. Therefore, our focus becomes identifying the right people (those who need to close a gap and are likely to respond) at the right time. This approach maximizes the effectiveness and impact of outreach. Our recommendation model generates a ranking in which the individuals most likely to close their care-gaps after intervention are ranked first. Our method shows successful results in detecting patients who need a prompt, and our experimental results show that by using this recommendation model, we can increase the number of closed gaps.

CCS CONCEPTS
• Applied computing → Consumer health; Health care information systems; • Information systems → Recommender systems;

KEYWORDS
Recommendation Impact, Care-gap Closure, Right Time Intervention

ACM Reference Format:
MohamadAli Torkamani, Malhar Jhaveri, Jynelle Mellen, Michael Brown-Hayes, James Chung, Penny Pan, Hakan Kardes. 2018. Engagement Scoring for Care-gap Intervention Optimization. In Proceedings of the Third International Workshop on Health Recommender Systems co-located with Twelfth ACM Conference on Recommender Systems (HealthRecSys’18), Vancouver, BC, Canada, October 6, 2018, 5 pages.

HealthRecSys’18, October 6, 2018, Vancouver, BC, Canada
© 2018 Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 GAPS IN CARE
Based on condition, age, and sex, the US Preventive Services Task Force (USPSTF) and Centers for Disease Control and Prevention (CDC) publish guidelines for how Americans can best manage their health by performing needed preventive screenings. Care-gaps result from obstacles that prevent patients and physicians from implementing care recommendations. Some barriers include misunderstanding of guidelines, lack of awareness, lack of proper transportation to clinics and hospitals, and fear of procedures such as colonoscopy.

For breast cancer screenings, the USPSTF recommends that women 50-74 years of age receive one mammogram every 27 months. For colorectal cancer screenings, the USPSTF recommends that individuals 45-75 years of age receive either one fecal occult blood test (FOBT) every year, one flexible sigmoidoscopy every five years, or one colonoscopy every ten years [2, 6].

There are also specific guidelines to help people with diabetes manage their care. The CDC encourages individuals with diabetes to receive, every year, at least one hemoglobin A1c (HbA1c) test to understand their average blood glucose levels, at least one dilated eye exam for early detection of retinal changes, and at least one nephropathy test to check kidney function [5].

The National Committee for Quality Assurance (NCQA) sets guidelines and evaluates the performance of every health plan against them [1]. The Healthcare Effectiveness Data and Information Set (HEDIS) measures the performance of health plans based in part on care-gap closure rates for Breast Cancer Screening (BCS), Colorectal Cancer Screening (COL), and Comprehensive Diabetes Care (DIAB).

To improve HEDIS performance, health plans employ clinical staff to develop intervention plans. These plans contact patients with outreach intended to encourage them to close their care-gaps. The intervention is supported by algorithms that identify members who should have received at least one of these screenings but have not done so.

The number of people covered by a health plan who are also eligible for the above-mentioned screenings can be quite large, and comprehensive outreach campaigns can take several months to complete. During such a campaign, everyone who is eligible for screening receives a telephone call, except those who have opted out through the “Do Not Call” list. If several outreach attempts fail to reach a person, a letter outlining the relevant information is mailed to their address.

In general, some portion of the population is not engaged in their own health. They may be resistant to interventions and may ignore phone calls and letters. This sub-population is less likely to close their care-gaps even if they are contacted several times through various channels. Some people might request that their names be added to the Do Not Call list. On the other hand, a complementary portion of the population is highly engaged in their health. They do not require interventions at all and will close their care-gaps on their own. Finally, there is a group of people who will likely close their


care-gaps, but only after being contacted. One of the challenges of the healthcare system is to prioritize resource allocation to the patients who are at higher risk and are more likely to be impacted positively.

To address this problem, we have designed a machine learning recommender system that ranks the more impactable care-gaps higher for each person (i.e., the ones that are more likely to be closed). Some screenings, such as colonoscopy and the diabetic eye exam, are harder to close. Our model can be used to frame a personalized message for each member based on their engagement score and the difficulty of closing their open gaps. If a person is likely to close at least one care-gap, the system recommends this person to the care advocates for intervention.

2 MANIFEST SPACE
To create a reliable feature vector, we used both direct feature extraction from the data and a collaborative filtering method for data imputation. We call the data we observe in the claims records manifests.

2.1 Data
We used four categories of manifests as data sources to create our features: pharmaceutical claims, specialties of the providers that patients have been visiting, diagnoses made, and the final services performed.

The data used to create the model is derived from claims data with approximately 30 million rows and over 3 million individuals. It contains information such as gender (male, female, and unknown) and dates of birth and death (if applicable). Based on each patient's age, we categorized people into twelve age groups (≤1, 1-4, 5-12, 13-17, 18-25, 26-35, 36-45, 46-55, 56-65, 66-75, 76-85, and ≥86). The data is aggregated to the industry-standard calendar quarters from 2012 to 2017.

The features include everything that a member could claim from a payer within the applicable period (medical rehab, surgeries, treatment for health disorders, prescriptions, etc.). A person may appear in the data set in multiple year/quarters.

To create an extended feature vector, we constructed a bag-of-words representation for the presence of every possible value that a manifest could have. For example, we used several therapeutic groupers for pharmaceutical data. We used the number of times the patient had a prescription for a specific medication in one calendar quarter as the corresponding feature for that drug. We used the same count-based representation for all other features.

The feature vector was also augmented with patients' demographics in the calendar quarter of interest. These included age, gender, and several features from United States census data based on their home neighborhood.

We hand-crafted some features that we expect to indicate health engagement. In particular, we constructed several medication-adherence features based on how promptly patients refill their recommended prescriptions.

2.2 Smoothing and Missing Value Imputation
A problem with claims data is that a missing value does not necessarily mean that the patient has not had a manifest. Besides noise and human error, missing values can be caused by the complicated structure of the healthcare system in the United States. For example, a value may be absent from a patient's manifest because of multiple coverages or a lack of eligibility for specific periods of time. To deal with this problem, we assume that similar patients require similar types of care. Also, many of the features co-occur. For example, many diagnoses are comorbidities that patients have at the same time, and certain drugs are always prescribed for specific ailments. As a result, the table of our features for all patients at all times should be approximately a low-rank matrix. We use a low-rank matrix completion approach both to fill in missing values and to smooth the features and remove noise [4].

Our approach is similar to the Robust PCA method of Candès et al. [3]. Let $X_{ij}$ be the observed value of feature $j$ for the $i$th observation (i.e., a patient in a given year and quarter). We learn a low-dimensional approximation $M$ of the full matrix $X$ by solving the following nuclear norm minimization convex program:

$$
\begin{aligned}
\underset{M}{\text{minimize}} \quad & \|M\|_* + \lambda \sum_{i,j} |X_{ij} - M_{ij}| \\
\text{subject to} \quad & \sum_{i,j} (X_{ij} - M_{ij})^2 \le \epsilon
\end{aligned}
$$

$\|M\|_*$ is the nuclear norm of the smoothed data, which encourages $M$ to be low-rank. $\sum_{i,j} |X_{ij} - M_{ij}|$ is the $\ell_1$ norm of the distance between the observations and their approximations. It tolerates some outlier values while pushing $M$ to be as close as possible to the observed values of $X$. From a statistical point of view, this term assumes a sparse set of outliers that can be modeled with a Laplacian distribution. The constraint $\sum_{i,j} (X_{ij} - M_{ij})^2 \le \epsilon$ keeps $M$ close to $X$ in general. More specifically, it limits the effect of Gaussian additive noise, such as variation in the number of prescriptions that different doctors write for the same medication for similar people. The hyperparameters $\lambda$ and $\epsilon$ are tuned by cross-validation on held-out samples after model training, as explained in the next section.

3 ENGAGEMENT MODEL
We designed an ensemble model that predicts the likelihood of closing care-gaps after a phone call. For people who are more likely to close their care-gap after a call, we recommend a higher outreach priority, because the data and model show that we can impact them.

3.1 Predictive modeling of care-gap closure using experimental data
Members with open care-gaps can be grouped into three groups (Figure 1):
(1) Members who will close their care-gaps by themselves.
(2) Those who will respond to an outreach by closing their care-gaps.
(3) Those who are not engaged with their healthcare and will not close their care-gaps, even after outreach.
With the experimental dataset described in the protocol below, we will be able to estimate the following at an individual-member level:


(a) Probability of closing care-gaps without outreach.
(b) Using groups (1) and (2), the increased likelihood of closing care-gaps after outreach. In other words, we will be able to measure the value that contacting a person adds and how much the likelihood of care-gap closure increases as a result.
(c) Probability of not closing care-gaps with outreach.

[Figure 1 diagram: a timeline (Time axis) marking the first and second contacts, with segments "Closed on their own", "Closed after one contact", "Not closed after one contact", "Closed after two contacts", and "Not closed after two contacts".]
Figure 1: Segmentation of the population based on their behavior in closing their care gaps before being contacted, or after one and two interventions.

Figure 3: A: More likely to close by themselves or after a small nudge; B: Likely to close after intervention; C: Calling them does not change their behavior, and they are not likely to close their gaps; D: Score-based prioritization; E: Heuristic (random) prioritization. Note that prioritizing the top 100 consumers with the proposed model yields a 288% improvement.
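Quantities (a) through (c) can be estimated by comparing closure rates between the randomized arms. The following is a minimal sketch under stated assumptions, not the authors' implementation. It estimates the quantities within each engagement-score decile (the randomization stratum described in Section 3.2) rather than per individual. The `Member` record and the `estimate_segments` function are hypothetical. If we additionally assume that outreach never lowers a member's chance of closing a gap (monotonicity), then (a), (b), and (c) are the shares of groups (1), (2), and (3), and they sum to 1 in each decile.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical per-member record from the randomized study."""
    decile: int    # engagement-score decile used as the randomization stratum
    treated: bool  # True: intervention group, False: control group
    closed: bool   # True if the care-gap closed within the follow-up window


def estimate_segments(members):
    """Estimate quantities (a), (b), and (c) within each score decile.

    Assumes monotonicity (outreach never lowers the chance of closing), in
    which case the three estimates are the shares of groups (1), (2), (3).
    """
    # decile -> [n_treated, closed_treated, n_control, closed_control]
    tallies = defaultdict(lambda: [0, 0, 0, 0])
    for m in members:
        t = tallies[m.decile]
        offset = 0 if m.treated else 2
        t[offset] += 1
        t[offset + 1] += int(m.closed)

    estimates = {}
    for decile in sorted(tallies):
        n_t, c_t, n_c, c_c = tallies[decile]
        if n_t == 0 or n_c == 0:
            continue  # both arms are needed to estimate the uplift
        p_control = c_c / n_c  # closure rate without outreach
        p_treated = c_t / n_t  # closure rate with outreach
        estimates[decile] = {
            "a_close_without_outreach": p_control,
            "b_uplift_from_outreach": p_treated - p_control,
            "c_not_close_with_outreach": 1.0 - p_treated,
        }
    return estimates
```

A negative value for (b) in some decile would indicate either sampling noise in a small stratum or a violation of the monotonicity assumption.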

3.2 Implementation of the Score
The inputs to our model are the smoothed and imputed feature vectors, together with gold-standard targets from interventions in previous years. After feature imputation, we use a hybrid ensemble of a random forest and a support vector regression model to compute the probability of being impacted by an intervention.

The output scores from the model were used to prioritize the member list for closing care-gaps in the Medicare and Commercial lines of business.

The experiment was a randomized controlled study of Medicare members with an open care-gap for at least one of three measures: annual wellness visit, colorectal cancer screening, and breast cancer screening. Randomization was done at the member level, stratified by engagement model score; i.e., samples were randomly selected from each decile of the score distribution. The study design is shown in Figure 2. The interventions were performed for 10,045 unique members.

[Figure 2 diagram: Eligible Members Identified for Outreach (N=11,158) → stratified randomization based on engagement score* → Intervention Group (N=10,045, 90%) and Control Group (N=1,113, 10%).]
Figure 2: Randomized study design. *The engagement score is the output of the data science model that predicts who is likely to close the gaps (a higher score is better).

4 EXPERIMENTAL RESULTS
While the model can identify the likelihood of closing care-gaps, we cannot calculate the likelihood that outreach impacts care-gap closure directly from the data. An experimental study was therefore designed to jointly measure the performance of our model and the effectiveness of interventions.

Out of 10,045 members, 9,768 could be contacted. The distribution of measures for outreached members is shown in Figure 4. Colorectal screening is the most common, followed by wellness and breast cancer screenings.

Medicare     Closed after intervention    Not closed after intervention    p-value
measure      N          Avg. score        N          Avg. score            (*: statistically significant)
Wellness     1,184      0.58              3,783      0.50                  <0.01*
COL            797      0.60              4,572      0.52                  <0.01*
BCS            461      0.58              1,716      0.51                  <0.01*
Table 1: Effectiveness of the engagement model for the Medicare line of business

Medicare     Closed after intervention    Not closed after intervention    p-value
measure      N          Avg. score        N          Avg. score            (*: statistically significant)
Wellness       228      0.59                885      0.51                  <0.01*
COL            140      0.58                973      0.52                  <0.01*
BCS            163      0.58                524      0.52                  <0.01*
Table 2: Effectiveness of the engagement model for the control group only. Although the control group received no interventions, the same waiting period for care-gap closure was used for both the intervention and control groups.

Table 1 presents the effectiveness of the engagement model. Here, effectiveness is measured by comparing the average engagement


score of members who closed the gaps with that of members who did not close them, for each eligible measure. For all three measures, members who closed the respective gaps had a significantly higher engagement score.

In Table 1, we show the effectiveness of the model within the whole population, i.e., the control and intervention groups combined. To study the performance of the model itself, we should also investigate how members of the control group behaved regarding their open care-gaps in the period following the intervention campaign. To do so, we repeated the Table 1 analysis separately for the control and intervention groups (Tables 2 and 3). As Table 2 shows, the model successfully identified the people who are engaged in their health and closed their care-gaps without being contacted during this campaign. Table 3 is consistent with Tables 1 and 2 and shows that the model performed similarly well for the intervention group.

Medicare     Closed after intervention    Not closed after intervention    p-value
measure      N          Avg. score        N          Avg. score            (*: statistically significant)
Wellness       956      0.57              2,898      0.50                  <0.01*
COL            657      0.60              3,599      0.52                  <0.01*
BCS            298      0.57              1,192      0.51                  <0.01*
Table 3: Effectiveness of the engagement model for the intervention group only

[Figure 4 diagram, titled "Carenet Outcome Population": Intervention Group (N=10,045) → Outreached (N=9,768) and Not outreached (N=277). Measures among outreached members: Wellness Screening (N=7,796), BCS Screening (N=3,082), COL Screening (N=8,500). Total number of measures: 19,378, i.e., 1.98 measures/member.]
Figure 4: Intervention Outcome Population

Table 4 shows the effectiveness of interventions in closing gaps for three measures. There is a statistically significant positive lift for the wellness and colorectal measures (absolute differences of +4.3% and +2.8%; relative lifts of +20.97% and +22.22%). Lift is the difference in gap-closure percentage between the intervention and control groups. The lift for the breast cancer screening measure is negative but not statistically significant. The negative lift is partly attributable to two factors. First, BCS is a gender-specific measure. Second, during randomization, control-group members were selected at the member level rather than at the measure level.

The distribution of engagement scores for members who closed versus did not close the wellness gaps is shown in Figure 5. Patients who closed the wellness measure had a significantly higher engagement score. Patterns similar to those in Figure 5 were observed for the breast and colorectal cancer screening measures. This verifies that our recommender system has successfully selected impactable people for outreach.

[Figure 5 plot: "Distribution of Engagement Scores", Percent vs. Engagement Score, by wellness gap status after intervention (1: Gap Closed, 0: Gap Open).]
Figure 5: The distribution of scores of members who closed their wellness gaps vs. the members who did not. We observe a significant difference between the two cohorts' predicted scores.

5 DATA ETHICS
In building our data science models and algorithms, we evaluate and test for accuracy, precision, and fairness to account for potential biases. We also ensure that the privacy of individuals is respected. It is critically important to build models that primarily benefit our customers. Through both traditional and digital customer interfaces, these models should also create, deepen, and maintain consumers' trust.

One of the data ethics considerations for the care-gaps model was the size of the control and experimental groups. A control group typically makes up 30% to 50% of the population, but we opted for 10%. This 10% was also drawn at random within score strata so that no specific population would accidentally accumulate in the control group. This way, our outreach could reach as many individuals as possible more quickly. It still provided enough data for statistically significant conclusions about the effectiveness of both the interventions and the model in one study. Our model does not exclude anyone from being contacted. Instead, it prioritizes individuals by impactability at the right time. Individuals in the control group were not excluded from other campaigns outside the study period.

Other ethical considerations were related to how we apply the scoring within the model. For example, part of the scoring includes the number of gaps an individual has and their severity. We rank people with more open gaps higher, regardless of their engagement scores. To understand these details and translate them into a meaningful and ethical model, we collaborated closely with key subject matter experts within the organization.
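The stratified 10% control allocation described above draws controls at random within engagement-score strata. Section 3.2 specifies these strata as deciles. The following is a minimal sketch under assumptions, not the study's actual procedure. The function name `stratified_control_split`, the dictionary of hypothetical member ids, the equal-size decile cut on score rank, and the rounding of each stratum's control count are all illustrative choices.

```python
import random


def stratified_control_split(scores, control_fraction=0.10, n_strata=10, seed=0):
    """Assign members to intervention/control at random within score strata.

    scores: dict mapping a (hypothetical) member id to its engagement score.
    Returns (intervention_ids, control_ids) as two disjoint sets.
    """
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    # Rank members by score and cut the ranking into equal-size strata (deciles).
    ranked = sorted(scores, key=lambda member_id: scores[member_id])
    n = len(ranked)
    strata = [[] for _ in range(n_strata)]
    for rank, member_id in enumerate(ranked):
        strata[rank * n_strata // n].append(member_id)

    intervention, control = set(), set()
    for stratum in strata:
        k = round(len(stratum) * control_fraction)  # controls drawn from this stratum
        chosen = set(rng.sample(stratum, k))
        control |= chosen
        intervention |= set(stratum) - chosen
    return intervention, control
```

Sampling within each stratum keeps the control group's score distribution close to that of the intervention group. This is the property the paragraph above relies on to avoid an accidental accumulation of a specific population in the control group.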


                                      Intervention group                  Control group
Medicare      Total subjects   N contacted  N closed  Closed (%)   N control  N closed  Closed (%)   Lift% (p-value; *: statistically significant)
measure       in the study
Wellness          4,967           3,854        956      24.80%       1,113       228      20.50%     Abs.: +4.3% (<0.01*), Rel.: +20.97%
COL               5,369           4,256        657      15.40%       1,113       140      12.60%     Abs.: +2.8% (<0.01*), Rel.: +22.22%
BCS               2,177           1,490        298      20.00%         687       163      23.70%     Abs.: -3.70% (>0.1), Rel.: -15.61%

Table 4: Effectiveness of interventions. Interventions improve care-gap closure for the wellness and colorectal measures by
4.3 and 2.8 percentage points, respectively, and both differences are statistically significant. For the breast cancer screening
measure the lift is negative, but the difference is not statistically significant. (Abs.: absolute change in closure rate; Rel.:
absolute change normalized by the control baseline.)
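The lifts and the significance column can be recomputed from the raw counts in Table 4. The paper does not name the statistical test it used; the sketch below assumes a pooled one-sided two-proportion z-test (alternative hypothesis: the intervention closure rate exceeds the control rate). The names `ArmCounts`, `lift`, and `one_sided_z_test` are hypothetical helpers introduced only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmCounts:
    n: int       # members in the arm (Ncontacted or Ncontrol)
    closed: int  # members in the arm who closed the gap (Nclosed)


def closure_rate(arm: ArmCounts) -> float:
    return arm.closed / arm.n


def lift(treat: ArmCounts, control: ArmCounts) -> tuple[float, float]:
    """Absolute lift (difference of closure rates) and relative lift
    (absolute lift divided by the control closure rate)."""
    p_t, p_c = closure_rate(treat), closure_rate(control)
    abs_lift = p_t - p_c
    return abs_lift, abs_lift / p_c


def one_sided_z_test(treat: ArmCounts, control: ArmCounts) -> tuple[float, float]:
    """Assumed test: pooled two-proportion z-test of H1: p_treat > p_control.
    Returns (z, p_value)."""
    # Pooled closure rate, used for the standard error under H0: p_treat == p_control.
    p_pool = (treat.closed + control.closed) / (treat.n + control.n)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / treat.n + 1 / control.n))
    z = (closure_rate(treat) - closure_rate(control)) / se
    # Upper-tail probability P(Z >= z) of the standard normal.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Raw counts copied from Table 4: (intervention arm, control arm).
TABLE4 = {
    "Wellness": (ArmCounts(3854, 956), ArmCounts(1113, 228)),
    "COL": (ArmCounts(4256, 657), ArmCounts(1113, 140)),
    "BCS": (ArmCounts(1490, 298), ArmCounts(687, 163)),
}

results = {}
for measure, (treat, control) in TABLE4.items():
    abs_lift, rel_lift = lift(treat, control)
    z, p = one_sided_z_test(treat, control)
    results[measure] = (abs_lift, rel_lift, z, p)
    print(f"{measure:8s} abs={abs_lift:+.2%} rel={rel_lift:+.2%} z={z:+.2f} p={p:.4f}")
```

Under this assumed test, Wellness and COL fall below 0.01 and BCS lands far above 0.1, matching the table's brackets; a two-sided version of the same test would give roughly 0.017 for COL and 0.048 for BCS instead. Working from raw counts also gives slightly different lifts than Table 4 (about +2.86 points and +22.7% for COL), because the table's lifts match the differences of the closure rates after rounding.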


6   CONCLUSION
Care-gap closure is not only financially important for the healthcare system; it
also directly improves patients' well-being by identifying conditions at early
stages.
   Our proposed recommendation system produces an ordered prioritization of the
patients who are more likely to be affected by phone interventions. The results
show that, in practice, people with high engagement scores are more likely to
close their care-gaps after outreach. Our recommender system can prioritize
outreach and differentiate who is likely to close their gaps after an
intervention.
   This model can be extended in several directions. For example, if we can
afford to contact many people, we can use the system to personalize messages
during outreach. People who are less engaged in their healthcare might be
motivated by more incentives and encouragement, whereas highly engaged people
might be willing to receive more information about other health-related
activities.
   We can also use this system to contact first the people who need more time to
close their care-gaps, and to reach the people who will respond faster afterward.
   It is easy to add other measures to this system. The model is built on
healthcare industry standards (e.g., ICD-10, CPT codes, therapeutic classes), so
it can be applied to a broader population. The engagement score can also serve as
a proxy for general healthcare engagement in marketing applications.

REFERENCES
[1] Nancy D Beaulieu and Arnold M Epstein. 2002. National Committee on Quality
    Assurance health-plan accreditation: predictors, correlates of performance, and
    market impact. Medical Care (2002), 325–337.
[2] Ned Calonge, Diana B Petitti, Thomas G DeWitt, Allen J Dietrich, Kimberly D
    Gregory, Russell Harris, George Isham, Michael L LeFevre, Roseanne M Leipzig,
    and Carol Loveland-Cherry. 2008. Screening for colorectal cancer: US Preventive
    Services Task Force recommendation statement. Annals of Internal Medicine 149,
    9 (2008), 627–637.
[3] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. 2011. Robust principal
    component analysis? Journal of the ACM (JACM) 58, 3 (2011), 11.
[4] Emmanuel J Candès and Benjamin Recht. 2009. Exact matrix completion via
    convex optimization. Foundations of Computational Mathematics 9, 6 (2009), 717.
[5] Sarah Stark Casagrande, Catherine C Cowie, and Judith E Fradkin. 2013. Utility of
    the US Preventive Services Task Force criteria for diabetes screening. American
    Journal of Preventive Medicine 45, 2 (2013), 167–174.
[6] Laura Davisson. 2016. USPSTF breast cancer screening guidelines. West Virginia
    Medical Journal 112, 6 (2016), 29–32.