Engagement Scoring for Care-gap Intervention Optimization

MohamadAli Torkamani, Malhar Jhaveri, Jynelle Mellen, Michael Brown-Hayes, James Chung, Penny Pan, Hakan Kardes
{FirstName}.{LastName}@CambiaHealth.com
Cambia Health Solutions, Portland, OR, 97201

ABSTRACT
Timely preventative health screenings can be crucial for the early detection of serious disease or complications from chronic conditions, such as diabetes. Influencing the right people to obtain recommended screenings, at the right time, can result in significantly improved health outcomes. These screenings, if not attained, are called care-gaps, while performing the screening is called a closed care-gap. The spectrum of individuals managing their own health care ranges from minimal engagement (no screenings in this case) to high engagement (timely screenings). For those who do not obtain timely screenings, we have identified two types: individuals who will respond to outreach and those who will not. Therefore, our focus becomes identifying the right people (those who need help to close a gap and have a likelihood of responding) at the right time. This approach will maximize the effectiveness and impact of outreaches. Our recommendation model generates a ranking in which the individuals who are most likely to close their care-gaps after intervention are ranked first. Our method shows successful results in detecting patients who need a prompt, and our experimental results show that by using this recommendation model, we can increase the number of closed gaps.

CCS CONCEPTS
• Applied computing → Consumer health; Health care information systems; • Information systems → Recommender systems;

KEYWORDS
Recommendation Impact, Care-gap Closure, Right Time Intervention

ACM Reference Format:
MohamadAli Torkamani, Malhar Jhaveri, Jynelle Mellen, Michael Brown-Hayes, James Chung, Penny Pan, Hakan Kardes. 2018. Engagement Scoring for Care-gap Intervention Optimization. In Proceedings of the Third International Workshop on Health Recommender Systems co-located with Twelfth ACM Conference on Recommender Systems (HealthRecSys'18), Vancouver, BC, Canada, October 6, 2018, 5 pages.

HealthRecSys'18, October 6, 2018, Vancouver, BC, Canada
© 2018 Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 GAPS IN CARE
Based on condition, age, and sex, the US Preventive Services Task Force (USPSTF) and Centers for Disease Control and Prevention (CDC) publish guidelines for how Americans can best manage their health by performing needed preventive screenings. Care-gaps are the result of obstacles preventing patients and physicians from implementing care recommendations. Some barriers include misunderstanding of guidelines, lack of awareness, lack of proper transportation to clinics and hospitals, fear of procedures like colonoscopy, etc.

For breast cancer screenings, the USPSTF recommends that women 50-74 years of age receive one mammogram every 27 months. For colorectal cancer screenings, the USPSTF recommends that individuals 45-75 years of age receive either one fecal occult blood test (FOBT) every year, one flexible sigmoidoscopy every five years, or one colonoscopy every ten years [2, 6].

There are also specific guidelines for people with diabetes to help manage their care. The CDC encourages individuals with diabetes to annually receive at least one hemoglobin A1c (HbA1c) test to understand their average blood glucose levels, at least one dilated eye exam for early detection of retinal changes, and at least one nephropathy test to check kidney function [5].

The National Committee for Quality Assurance (NCQA) sets guidelines and evaluates the performance of every health plan based on those guidelines [1]. The Healthcare Effectiveness Data and Information Set (HEDIS) measures the performance of health plans based in part on care-gap closure rates of Breast Cancer Screening (BCS), Colorectal Cancer Screening (COL), and Comprehensive Diabetes Care (DIAB).

To improve HEDIS performance, health plans employ clinical staff to develop intervention plans for contacting patients with outreach intended to encourage them to close their care-gaps. Supporting this intervention are algorithms designed to identify members who should have received at least one of these screenings but have not done so.

The number of people covered by a health plan who are also eligible for the above-mentioned screenings can be quite large, and comprehensive outreach campaigns could take several months to complete. During such a campaign, everyone who is eligible for screening will receive a telephone call (except those who have opted out via the "Do Not Call" list). If several outreach attempts are unable to contact a person, then a letter outlining the relevant information is mailed to their address.

In general, some portion of the population is not engaged in their own health. They may be resistant to interventions and may ignore phone calls and letters. This sub-population is less likely to close their care-gaps even if they are contacted several times via various channels. Some people might request that their names be added to the Do Not Call list. On the other hand, a complementary portion of the population is highly engaged in their health. They do not require interventions at all. They will close their care-gaps on their own. Finally, there is a group of people who will likely close their care-gaps, but only after being contacted.
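As a minimal sketch of how the guideline rules quoted in this section could be turned into an automated check for open care-gaps, the Python below flags BCS, COL, and DIAB gaps for one member. The `Member` record, the service keys (such as `"mammogram"` and `"fobt"`), and the reading of "annually" as a 12-month look-back are illustrative assumptions; only the age ranges and screening intervals come from the text, and real HEDIS logic involves further eligibility and exclusion rules that are not modeled here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


def age_on(birth: date, today: date) -> int:
    """Completed years of age on `today`."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def months_since(event: date, today: date) -> int:
    """Completed calendar months between `event` and `today`."""
    return (today.year - event.year) * 12 + (today.month - event.month) - (today.day < event.day)


@dataclass
class Member:
    """Hypothetical member record; field names are illustrative only."""
    sex: str                      # "F", "M", or "U" (unknown)
    birth: date
    has_diabetes: bool = False
    last_done: Dict[str, date] = field(default_factory=dict)  # latest date per service


def done_within(m: Member, service: str, months: int, today: date) -> bool:
    """True if the service was performed within the look-back window."""
    d = m.last_done.get(service)
    return d is not None and months_since(d, today) <= months


def open_care_gaps(m: Member, today: date) -> List[str]:
    """Return the measures (BCS, COL, DIAB) that are open for this member."""
    age = age_on(m.birth, today)
    gaps: List[str] = []
    # Breast cancer screening: women 50-74, one mammogram every 27 months.
    if m.sex == "F" and 50 <= age <= 74 and not done_within(m, "mammogram", 27, today):
        gaps.append("BCS")
    # Colorectal screening: ages 45-75, any one of three tests within its interval.
    if 45 <= age <= 75 and not (
        done_within(m, "fobt", 12, today)
        or done_within(m, "sigmoidoscopy", 60, today)
        or done_within(m, "colonoscopy", 120, today)
    ):
        gaps.append("COL")
    # Diabetes care: all three annual tests are expected (12 months is an assumption).
    if m.has_diabetes and not all(
        done_within(m, s, 12, today) for s in ("hba1c", "eye_exam", "nephropathy")
    ):
        gaps.append("DIAB")
    return gaps
```

For example, a 58-year-old woman whose last mammogram was 41 months ago and whose last colonoscopy was about six years ago would be flagged for BCS only under these rules.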
One of the challenges of the healthcare system is to prioritize resource allocation to the patients who are at higher risk and are more likely to be impacted positively.

To address this problem, we have designed a machine learning recommender system that ranks the more impactable care-gaps (i.e., the ones that are more likely to be closed) higher for each person. Some of the screenings, such as colonoscopy and the eye exam for diabetes, are harder to close. Our model can be used to frame a personalized message for each member based on their engagement score and the degree of difficulty in closing their open gaps. If a person is likely to close at least one care-gap, the system recommends this person to the care advocates for intervention.

2 MANIFEST SPACE
To create a reliable feature vector, we have used both direct feature extraction from the data and a collaborative filtering method for data imputation. We term the data we observe in the claims record manifests.

2.1 Data
We have used four manifest categories of data sources to create our features: pharmaceutical claims, specialties of the providers that patients have been visiting, diagnoses made, and the final services performed.

Data used for creating the model is derived from claims data with approximately 30 million rows and over 3 million individuals, containing information such as their gender (male, female and unknown) and dates of birth and death (if applicable). Based on a patient's age, we categorized people into twelve age groups (≤1, 1-4, 5-12, 13-17, 18-25, 26-35, 36-45, 46-55, 56-65, 66-75, 76-85 and ≥86). The data is aggregated to the industry standard quarters from year 2012 to 2017.

The features include everything that a member could claim from a payer (medical rehab, surgeries, treatment for health disorders, prescriptions, etc.) within the applicable period. Also, in the data set, each person might be present in multiple year/quarters.

To create an extended feature vector, we constructed a bag-of-words representation for the presence of every possible value that a manifest could have. For example, we used several therapeutic groupers for pharmaceutical data, and we used the number of times that the patient had a prescription for a specific medication in one calendar quarter as the corresponding feature for that drug. We used the same count-based representation for all other features.

The feature vector was also augmented with patients' demographics in the calendar quarter of interest. This included their age, gender, and several features from the United States census data based on their home neighborhood.

We hand-crafted some features that we expect to indicate health engagement. In particular, we constructed several features for medication adherence based on how timely patients are in refilling their recommended prescriptions.

2.2 Smoothing and Missing Value Imputation
A problem with claim data is that missing values do not necessarily mean that the patients have not had a manifest. Besides noise and human error, the missing values could be caused by the complicated structure of the healthcare system in the United States. For example, a value not being present in a patient's manifest could be due to their multiple coverages or lack of eligibility for specific periods of time. To deal with this problem, we assume that similar patients require similar types of care. Also, many of the features co-occur. For example, many of the diagnoses are comorbidities that patients have at the same time, or certain drugs are always prescribed for specific ailments. As a result, the table of our features for all the patients at all times should form a low-rank matrix. We use a low-rank matrix completion approach both for filling the missing values and for smoothing the features and removing the noise [4].

Our approach is similar to the Robust PCA method by Candès et al. [3]. Let $X_{ij}$ be the observed value for feature $j$ for the $i$th observation (i.e., patient, year and quarter). We learn a low-dimensional approximation $M$ of the full matrix $X$ by solving the following nuclear norm minimization convex program:

$$
\begin{aligned}
\underset{M}{\text{minimize}} \quad & \|M\|_* + \lambda \sum_{ij} |X_{ij} - M_{ij}| \\
\text{subject to} \quad & \sum_{ij} (X_{ij} - M_{ij})^2 \le \epsilon
\end{aligned}
$$

$\|M\|_*$ is the nuclear norm of the smoothed data, which encourages $M$ to be low-rank. $\sum_{ij} |X_{ij} - M_{ij}|$ is the $\ell_1$ norm of the distance between the observations and their approximations; it allows the existence of some outlier values, while it pushes matrix $M$ to be as close as possible to the observed values of matrix $X$. From a statistical point of view, this term assumes that there is a sparse set of outliers that can be modeled using a Laplacian distribution. The constraint $\sum_{ij} (X_{ij} - M_{ij})^2 \le \epsilon$ encourages the closeness of $M$ and $X$ in general, and in particular it limits the effects of Gaussian additive noise, such as variations in the number of prescriptions of the same medication for similar people by different doctors. The hyperparameters $\lambda$ and $\epsilon$ are tuned by cross-validation using held-out samples after the model training explained in the next section.

3 ENGAGEMENT MODEL
We designed an ensemble model that predicts the likelihood of closing care gaps after a phone call. For the people who are more likely to close their care-gap after a call, we recommend a higher priority for outreach, because the data and model show that we can impact them.

3.1 Predictive modeling of care-gap closure using experimental data
Members with open care-gaps can be grouped into three groups (Figure 1).
(1) Members who will close their care-gaps by themselves.
(2) Those who will respond to an outreach by closing their care-gaps.
(3) Those who are not engaged with their healthcare and will not close their care-gaps, even after outreach.
With the experimental dataset described in the protocol below, we will be able to estimate the following at an individual-member level.
(a) Probability of closing care-gaps without outreach.
(b) Using 1 and 2, we can also estimate the increased likelihood of closing care gaps after outreach. In other words, we will be able to measure the value that contacting a person adds, and how the likelihood of care gap closure increases accordingly.
(c) Probability of not closing care-gaps with outreach.

[Figure 1 diagram: a timeline ("Time") marked with the first and second contact, branching into: closed on their own; closed after one contact; not closed after one contact; closed after two contacts; not closed after two contacts.]
Figure 1: Segmentation of the population based on their behavior in closing their care gaps before being contacted, or after one and two interventions.

3.2 Implementation of the Score
The input to our model is the smoothed and imputed feature vectors, as well as gold standard targets from interventions in the previous years. After feature imputation, we use a hybrid ensemble method consisting of a random forest and a support vector regression model for computing the probability of being impacted by an intervention. The output scores from the model were used to prioritize the member list for closing care-gaps for the Medicare and Commercial lines of business.

Figure 3: A: More likely to close by themselves or after a small nudge. B: Likely to close after intervention. C: Calling them does not change their behaviour, and they are not likely to close their gaps. D: Score-based prioritization. E: Heuristic (random) prioritization. Note that prioritization of the top 100 consumers will gain 288% improvement using the proposed model.

The experimental study was designed as a randomized controlled study of Medicare members with an open care-gap for at least one measure among annual wellness visit, colorectal cancer screening, and breast cancer screening. Randomization was done at a member level using stratification by engagement model score, i.e., samples were randomly selected from each decile of the score distribution. Figure 2 shows the study design. The interventions were performed for 10,045 unique members.

[Figure 2 diagram: eligible members identified for outreach (N=11,158) are split by stratified randomization based on engagement score* into an intervention group (N=10,045, 90%) and a control group (N=1,113, 10%).]
Figure 2: Randomized study design. *Engagement score is the output of the data science model that predicts who is likely to close their gaps (a higher score is better).

4 EXPERIMENTAL RESULTS
While the model can identify the likelihood of closing care-gaps, we are unable to calculate the likelihood of impacting care-gap closures with outreach directly from data. To jointly measure the performance of our model as well as the effectiveness of interventions, an experimental study was designed.

Out of 10,045 members, 9,768 members could be contacted. The distribution of measures for outreached members is shown in Figure 4 (highest for colorectal screening, followed by wellness and breast cancer screenings).

Table 1 presents the effectiveness of the engagement model. Here the effectiveness is measured by comparing the average engagement score between members who closed their gaps and members who did not, for the respective eligible measures. For all three measures, members who closed the respective gaps had a significantly higher engagement score.

| Medicare measure | Closed after intervention: N | Closed after intervention: Avg. score | Not closed after intervention: N | Not closed after intervention: Avg. score | p-value (*: statistically significant) |
|---|---|---|---|---|---|
| Wellness | 1,184 | 0.58 | 3,783 | 0.5 | <0.01* |
| COL | 797 | 0.6 | 4,572 | 0.52 | <0.01* |
| BCS | 461 | 0.58 | 1,716 | 0.51 | <0.01* |

Table 1: Effectiveness of the engagement model for the Medicare line of business

Also, the distribution of the engagement score for members who closed versus those who did not close the wellness gap is shown in Figure 5. The patients who closed the wellness measure had a significantly higher engagement score. Similar patterns were observed for the breast and colorectal cancer screening measures (Figure 5). This verifies that our recommender system has successfully selected the impactable people for outreach.

[Figure 5 plot: histogram titled "Distribution of Engagement Scores"; x-axis: Engagement Score; y-axis: Percent; one panel per wellness gap status after intervention (1: gap closed, 0: gap open).]
Figure 5: The distribution of scores of members who closed their wellness gaps vs. the members who didn't close. We observe a significant difference between the two cohorts' predicted scores.

In Table 1, we show the effectiveness of the model within the whole population, i.e., both the control and intervention groups combined. To study the performance of the model itself, we should also investigate how the members of the control group behaved regarding their open care-gaps in the period following the intervention campaign. To do so, we performed analyses similar to the process for Table 1, and we analyzed the control and intervention groups separately in Tables 2 and 3.

| Medicare measure | Closed after intervention: N | Closed after intervention: Avg. score | Not closed after intervention: N | Not closed after intervention: Avg. score | p-value (*: statistically significant) |
|---|---|---|---|---|---|
| Wellness | 228 | 0.59 | 885 | 0.51 | <0.01* |
| COL | 140 | 0.58 | 973 | 0.52 | <0.01* |
| BCS | 163 | 0.58 | 524 | 0.52 | <0.01* |

Table 2: Effectiveness of the engagement model for the control group only. Note that while there have been no interventions for the control group, the time interval of waiting for care-gaps to be closed has been consistently selected for both the intervention and the control groups.

| Medicare measure | Closed after intervention: N | Closed after intervention: Avg. score | Not closed after intervention: N | Not closed after intervention: Avg. score | p-value (*: statistically significant) |
|---|---|---|---|---|---|
| Wellness | 956 | 0.57 | 2,898 | 0.50 | <0.01* |
| COL | 657 | 0.60 | 3,599 | 0.52 | <0.01* |
| BCS | 298 | 0.57 | 1,192 | 0.51 | <0.01* |

Table 3: Effectiveness of the engagement model for the intervention group only
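The text does not state which statistical test produced the p-values for the score comparisons. As one hedged illustration of how the average engagement score of members who closed a gap can be compared with that of members who did not, the sketch below runs a one-sided permutation test on the difference in means. The function name, the choice of a permutation test, and the score lists in the usage note are all assumptions made for illustration; none of them come from the study.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def score_gap_permutation_test(
    closed: Sequence[float],
    not_closed: Sequence[float],
    n_perm: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed mean difference, one-sided permutation p-value).

    The null hypothesis is that the engagement score is unrelated to
    gap closure, so group labels can be shuffled freely.
    """
    rng = random.Random(seed)
    observed = mean(closed) - mean(not_closed)
    pooled = list(closed) + list(not_closed)
    k = len(closed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Difference in means under a random relabeling of members.
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

With synthetic scores such as `closed = [0.62, 0.55, 0.71, 0.58, 0.66, 0.60, 0.57, 0.64]` and `not_closed = [0.48, 0.52, 0.45, 0.55, 0.50, 0.47, 0.53, 0.49, 0.51, 0.46]`, the two groups barely overlap, so the returned p-value is small. On real data the same call would be made once per measure and per group (whole population, control, intervention).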
As the results in Table 2 state, the model has been able to successfully identify the people who are engaged in their health and have closed their care-gaps without being contacted during this campaign. Table 3 is also aligned with Tables 1 and 2, and it shows that the model has performed similarly well for the intervention group.

[Figure 4 diagram, titled "Carenet Outcome Population": intervention group (N=10,045), split into outreached (N=9,768) and not outreached (N=277); measures among outreached members: Wellness Screening (N=7,796), BCS Screening (N=3,082), COL Screening (N=8,500); total number of measures: 19,378, i.e., 1.98 measures/member.]
Figure 4: Intervention Outcome Population

Table 4 shows the effectiveness of interventions in closing gaps for three measures. There is a statistically significant positive lift for the wellness and colorectal measures (absolute difference of +4.3 and +2.8 percentage points, and relative lift of +20.97% and +22.22%). Lift is the gap-closure percentage difference between the intervention and control groups. Lift for the breast cancer screening measure is negative but not statistically significant. The negative lift is partly because the measure is gender specific and, during the randomization process, members in the control group were selected at a member level and not at a measure level.

5 DATA ETHICS
In building our data science models and algorithms, we evaluate and test for accuracy, precision and fairness to account for potential biases. We also ensure that the privacy of individuals is respected. It is critically important to build models that are primarily beneficial for our customers. Both through traditional and digital customer interfaces, these models should also create, deepen and maintain consumers' trust.

One of the data ethics considerations within the care gaps model was the size of the control group relative to the experimental group. Specifically, while the typical ratio would be 30% to 50% for a control group, we opted for 10%. This 10% was also randomly chosen from stratified scores to confidently avoid an accidental accumulation of a specific population in the control group. This way, our efforts would more rapidly include as many individuals as possible, but also provide enough data for statistically significant conclusions to assess the effectiveness of interventions and the model in one study. It is important to note that our model does not exclude individuals from being contacted. Instead, it prioritizes based on impact-ability at the right time. Individuals in the control group were not excluded from other campaigns outside the study period.

Other ethical considerations were related to how we apply the scoring within the model. For example, part of the scoring includes the number of gaps an individual has and their severity. We prioritize people with more open gaps higher, regardless of their engagement scores. To best understand these details and how to translate them into a meaningful and ethical model, we have closely collaborated with key subject matter experts within the organization.

| Medicare measure | Total number of subjects in the study | Intervention group: N contacted | Intervention group: N closed | Intervention group: closed (%) | Control group: N control | Control group: N closed | Control group: closed (%) | Lift, Abs. | Lift, Rel. | p-value (*: statistically significant) |
|---|---|---|---|---|---|---|---|---|---|---|
| Wellness Measure | 4,967 | 3,854 | 956 | 24.80% | 1,113 | 228 | 20.50% | +4.3% | +20.97% | <0.01* |
| COL Measure | 5,369 | 4,256 | 657 | 15.40% | 1,113 | 140 | 12.60% | +2.8% | +22.22% | <0.01* |
| BCS Measure | 2,177 | 1,490 | 298 | 20.00% | 687 | 163 | 23.70% | -3.70% | -15.61% | >0.1 |

Table 4: Effectiveness of interventions: We observe that interventions improve care-gap closure for wellness and colorectal measures by 4.3% and 2.8% with statistical significance. However, the impact of interventions for breast cancer screening is negative, although the difference is not statistically significant. (Abs.: absolute change, Rel.: absolute change normalized by control baseline.)
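The lift columns of Table 4 can be recomputed from the reported counts. The sketch below returns the absolute lift, the relative lift, and a p-value for each measure. The test itself is an assumption: the text does not name one, so a one-sided pooled two-proportion z-test (alternative: the intervention closure rate exceeds the control rate) is used here, and the function name `lift_with_pvalue` is hypothetical.

```python
from math import erf, sqrt
from typing import Tuple


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def lift_with_pvalue(
    n_int: int, closed_int: int, n_ctl: int, closed_ctl: int
) -> Tuple[float, float, float]:
    """Return (absolute lift, relative lift, one-sided p-value).

    Absolute lift is the difference in closure rates; relative lift
    normalizes it by the control baseline, as in the Table 4 caption.
    """
    p_int = closed_int / n_int
    p_ctl = closed_ctl / n_ctl
    abs_lift = p_int - p_ctl
    rel_lift = abs_lift / p_ctl
    # Pooled standard error under the null of equal closure rates.
    pooled = (closed_int + closed_ctl) / (n_int + n_ctl)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_int + 1.0 / n_ctl))
    z = abs_lift / se
    p_value = 1.0 - normal_cdf(z)  # one-sided: intervention > control
    return abs_lift, rel_lift, p_value


# Counts taken from Table 4: (N contacted, N closed, N control, N closed).
TABLE_4 = {
    "Wellness": (3854, 956, 1113, 228),
    "COL": (4256, 657, 1113, 140),
    "BCS": (1490, 298, 687, 163),
}

for measure, counts in TABLE_4.items():
    abs_lift, rel_lift, p = lift_with_pvalue(*counts)
    print(f"{measure}: abs={abs_lift:+.2%} rel={rel_lift:+.2%} p={p:.4f}")
```

Under this assumed test, the p-values fall on the same side of the thresholds reported in Table 4 (below 0.01 for wellness and COL, above 0.1 for BCS). The relative lifts in the table match the ratio of the rounded percentages (for example, 4.3/20.5 ≈ 20.97%), so values recomputed from the raw counts differ slightly in the last digits.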
6 CONCLUSION
Care-gap closure is not only financially important for the healthcare system, but it also directly helps patients' well-being by identifying conditions at early stages.

Our proposed recommendation system generates an ordered prioritization of the patients who are more likely to be impacted by phone interventions. The results show that in practice, people with high engagement scores are more likely to close their care-gaps after being contacted. Our recommender system can prioritize the outreach and can differentiate who is likely to close the gaps after an intervention.

This model can be extended in several directions. For example, if we can afford to contact many people, we can use the system for personalized messages during the outreach. People who are not much engaged in their healthcare might be motivated by more incentives and encouragement, but highly engaged people might be willing to have more information about other health-related activities.

We can also use this system to first contact people who need more time to close their care-gaps, and reach out afterward to people who will respond faster.

It is easy to add other measures to this system. The model is built on healthcare industry standards (e.g., ICD-10, CPT codes, therapeutic classes). Therefore, it can be used by a broader population. The engagement score can also be used as a proxy for general healthcare engagement for marketing applications.

REFERENCES
[1] Nancy D Beaulieu and Arnold M Epstein. 2002. National Committee on Quality Assurance health-plan accreditation: predictors, correlates of performance, and market impact. Medical Care (2002), 325–337.
[2] Ned Calonge, Diana B Petitti, Thomas G DeWitt, Allen J Dietrich, Kimberly D Gregory, Russell Harris, George Isham, Michael L LeFevre, Roseanne M Leipzig, and Carol Loveland-Cherry. 2008. Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. Annals of Internal Medicine 149, 9 (2008), 627–637.
[3] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. 2011. Robust principal component analysis? Journal of the ACM (JACM) 58, 3 (2011), 11.
[4] Emmanuel J Candès and Benjamin Recht. 2009. Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9, 6 (2009), 717.
[5] Sarah Stark Casagrande, Catherine C Cowie, and Judith E Fradkin. 2013. Utility of the US Preventive Services Task Force criteria for diabetes screening. American Journal of Preventive Medicine 45, 2 (2013), 167–174.
[6] Laura Davisson. 2016. USPSTF breast cancer screening guidelines. West Virginia Medical Journal 112, 6 (2016), 29–32.