Engaging Learners in an Enterprise L&K System

Wesley M. Gifford, Ashish Jagmohan, John Ambrose, Sue Rodeman, Yi-Min Chee, Anshul Sheopuri, Shota Aki
IBM Research, Yorktown Heights, NY, USA
Skillsoft, Nashua, NH, USA

ABSTRACT
We describe a system being designed for a leading provider of enterprise learning solutions to improve engagement among learners. The system consists of an engagement timing component, which estimates a learner's level of engagement and likely preferred interaction times, and a recommendation component, which generates personalized content recommendations. We summarize early results from a recently initiated pilot deployment.

1. INTRODUCTION
The problem of interest is improving learner engagement in an enterprise learning system by using consumption data captured by the learning platform. The existing platform records each content launch, tracking the user and content IDs and the launch time and duration, among other data. The platform also defines an expert-curated hierarchy of content, in which assets are grouped into a forest of asset folders on the basis of subject matter. We have developed an engagement system consisting of two major components: 1) an engagement timing component, responsible for estimating both a learner's level of engagement and their preference to interact on certain days and at certain times; and 2) a recommendation component that generates personalized recommendations for each learner based on historical learner activity. In an initial email-based pilot, these components have demonstrated significant improvements in user response compared to industry benchmarks.

Copyright is held by the author/owner(s). RecSys 2014 Poster Proceedings, October 6–10, 2014, Foster City, Silicon Valley, USA.
2. ENGAGEMENT TIMING
The goal of the engagement system is to improve the level of engagement of its learners. The engagement timing component helps the system time its actions properly, based on each learner's current level of engagement. We observed that learners often exhibit "bursty" or self-exciting behavior: a learner's interactions frequently occur in clusters. We model these interactions as arrivals from a stochastic process that captures the temporal dependencies in learner behavior, which a homogeneous Poisson process cannot do.

For users with sufficient interaction histories, we consider a Hawkes process, which can be viewed as a counting process whose time-varying intensity function has a specific structure that captures temporal dependencies. The original structure considered by Hawkes [Hawkes 1971] is

$$\lambda(t) = \mu + \int_{-\infty}^{t} g(t-u;\theta)\, dN(u),$$

where N(u) is an appropriate point process. Hawkes specifically considered the case

$$g(t;\theta) = \sum_{i=1}^{P} \alpha_i e^{-\beta_i t}, \quad t > 0, \qquad \theta = \{\alpha_i, \beta_i\}_{i=1}^{P}.$$

This form states that the intensity at the current time consists of decayed contributions from prior events. If only a short time has elapsed since the learner's last action, the intensity is elevated by these recent events, capturing the fact that the learner is more likely to reengage. If a long time has elapsed since the last action, the process behaves more like a homogeneous Poisson process with rate µ until the next action. Similar models have been applied to stock market trades and earthquake aftershocks.

The system estimates the level of engagement of each learner by considering their reengagement probability in a time window, given their prior interaction history. For learners with sufficient history, the parameters of the model above are first determined by maximum likelihood estimation, using numerical maximization of the likelihood, whose expressions are available in [Ozaki 1979]. The probability that a particular learner reengages in the next s days, given their prior interaction history, is then the probability that the underlying stochastic process has at least one arrival in that period (details omitted for brevity).
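To make the exponential-kernel model concrete, the following Python sketch evaluates the intensity, the log-likelihood that would be maximized numerically, and the probability of reengaging within s days. It is a minimal illustration under our own assumptions, not the authors' implementation: event times are in days, all function names are hypothetical, and the reengagement probability uses the standard point-process identity P(no arrival in (T, T + s]) = exp(-Λ), where Λ is the integral of λ over (T, T + s] computed from the history up to T alone.

```python
import math
from typing import Sequence

# Hypothetical helpers; times are assumed to be measured in days.


def intensity(t: float, events: Sequence[float], mu: float,
              alphas: Sequence[float], betas: Sequence[float]) -> float:
    """mu + sum over events t_j < t of sum_i alpha_i * exp(-beta_i * (t - t_j))."""
    excitation = sum(
        a * math.exp(-b * (t - tj))
        for tj in events if tj < t
        for a, b in zip(alphas, betas)
    )
    return mu + excitation


def integrated_intensity(t0: float, t1: float, events: Sequence[float], mu: float,
                         alphas: Sequence[float], betas: Sequence[float]) -> float:
    """Integral of the intensity over (t0, t1], assuming no arrivals besides `events`."""
    total = mu * (t1 - t0)
    for tj in events:
        if tj >= t1:
            continue  # events at or after t1 cannot excite the window
        start = max(t0, tj)  # an event inside the window only excites what follows it
        for a, b in zip(alphas, betas):
            total += (a / b) * (math.exp(-b * (start - tj)) - math.exp(-b * (t1 - tj)))
    return total


def log_likelihood(events: Sequence[float], horizon: float, mu: float,
                   alphas: Sequence[float], betas: Sequence[float]) -> float:
    """sum_k log lambda(t_k) minus the integral of lambda over [0, horizon]."""
    log_terms = sum(math.log(intensity(tk, events, mu, alphas, betas)) for tk in events)
    return log_terms - integrated_intensity(0.0, horizon, events, mu, alphas, betas)


def reengagement_probability(events: Sequence[float], now: float, s: float, mu: float,
                             alphas: Sequence[float], betas: Sequence[float]) -> float:
    """P(at least one arrival in (now, now + s]) given the history `events` up to `now`."""
    return 1.0 - math.exp(-integrated_intensity(now, now + s, events, mu, alphas, betas))


# Example: a learner with a recent burst of activity, P = 1 kernel term.
history = [1.0, 1.2, 1.3, 9.5]
p_week = reengagement_probability(history, now=10.0, s=7.0, mu=0.05, alphas=[0.5], betas=[1.0])
```

In practice the parameters would be fit by minimizing the negative of `log_likelihood` with a numerical optimizer (for example `scipy.optimize.minimize` with positivity bounds on µ, α_i and β_i) before calling `reengagement_probability`. The direct double loop is O(n²) in the number of events, which is acceptable for per-learner histories; the recursive form of the exponential kernel would make it linear.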
We ob- This problem is solved by using Bayesian estimation with served that learners often exhibit “bursty” or self-excitation a Dirichlet prior that incorporates the aggregate preferences behavior, where a learner’s interactions frequently occur in of the entire population. Results based on a test across clusters. We model these interactions as arrivals from a multiple customers are shown in Figure 1. The preferences stochastic process that captures the temporal dependencies estimated in this test use an exponential weighting scheme in learner behavior; the typical homogeneous Poisson pro- (with parameter γ) to place more weight on recent activ- cess is not capable of doing so. ity. The plot indicates that the perfromance saturates for γ For users with sufficient interaction histories, we consider greater than 24 months. For this value of γ the estimated a Hawkes’ process, which can be viewed as a counting pro- distribution significantly outperforms a naive model. cess whose time-varying intensity function adheres to a spe- 1 Copyright is held by the author/owner(s). For learners with fewer prior interactions, one promising RecSys 2014 Poster Proceedings, October 6–10, 2014, Foster City, Silicon strategy is to aggregate their inter-arrival times and fit an Valley, USA. aggregate model. 2.0 Validation performance (baseline: best popular recommender) Best popular recommender 1.8 Best single recommender Random-forest hybrid Normalized performance 1.6 1.4 1.2 1.0 0.8 5 6 7 8 9 10 Number of recommendations Figure 2: Performance comparison of recom- menders. Metric: number of trained recommenda- Figure 1: Performance of engagement time prefer- tions consumed in the validation set. ence estimates relative to a naive estimate (uniform) for the case of 4 bins per day. We created visualizations to help learners understand why they were receiving specific recommendations. One type of visualization shows the relative strength of each recommen- 3. 
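3. RECOMMENDATION ENGINE
The recommendation engine seeks to improve learner engagement by generating personalized recommendations for which the learner is likely to have high preference and, hence, a high likelihood of consumption.

The recommendation engine uses a blended ensemble [Koren 2009], in which several baseline recommenders are combined to yield a final set of recommendations. We use three groups of baseline recommenders, each containing multiple individual recommenders. The first group consists of popularity-based recommenders, which measure popularity with several metrics, including temporal recency and launch and duration information. The second group consists of content-based recommenders, which leverage the learner's historical consumption of certain asset types, as determined by the expert-curated hierarchy, to generate new recommendations. These include recommenders based on generative Bayesian models of the learner's type preferences and recommenders based on tf-idf-style metrics over the content hierarchy. The third group consists of collaborative-filtering recommenders, which leverage the implicit feedback [Hu et al. 2008] manifested in each user's historical asset consumption; individual recommenders in this group include some based on matrix factorization and others based on separate user-user and asset-asset filtering.

The outputs of this ensemble are combined to generate a final set of recommendations (typically 5 to 10) for each learner. Activity data was split temporally into training and validation sets. We used several metrics to quantify recommender quality, including metrics based on discounted cumulative gain and precision, and predictive metrics quantifying the number of trained recommendations that were consumed in the validation set; the metrics yielded largely consistent results. We tried multiple blending techniques, including gradient-boosted decision trees and random forests; random forests yielded the best performance. Figure 2 shows, for one enterprise, a comparison of the blended recommender to the best popular recommender as the number of recommendations varies, with performance normalized to that of the best popular recommender. Note that this comparison is aggregated over all learners, including a significant number with no prior historical activity, for whom the popular recommender is best.

[Figure 2 plot: "Validation performance (baseline: best popular recommender)"; normalized performance versus number of recommendations (5 to 10) for the best popular recommender, the best single recommender, and the random-forest hybrid.]
Figure 2: Performance comparison of recommenders. Metric: number of trained recommendations consumed in the validation set.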
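The temporal split and the "trained recommendations consumed in the validation set" metric used for Figure 2 can be sketched as follows. This is our illustration, not the authors' code: the (learner, asset, launch time) event tuple, the function names, the choice to skip assets a learner has already launched, and the use of raw launch counts as the popularity score are all assumptions. The paper's popularity recommenders also use recency and duration, and its blended model is a random forest over the baseline recommenders, which is not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event schema: (learner_id, asset_id, launch_time in days).
Event = Tuple[str, str, float]


def temporal_split(events: Iterable[Event], cutoff: float) -> Tuple[List[Event], List[Event]]:
    """Launches before `cutoff` go to training, the rest to validation."""
    train: List[Event] = []
    valid: List[Event] = []
    for event in events:
        (train if event[2] < cutoff else valid).append(event)
    return train, valid


def popularity_recommender(train: List[Event], learners: Iterable[str],
                           k: int) -> Dict[str, List[str]]:
    """Top-k assets by training launch count, skipping assets the learner already launched."""
    counts = Counter(asset for _, asset, _ in train)
    seen: Dict[str, Set[str]] = defaultdict(set)
    for learner, asset, _ in train:
        seen[learner].add(asset)
    ranked = [asset for asset, _ in counts.most_common()]  # ties keep first-seen order
    return {u: [a for a in ranked if a not in seen[u]][:k] for u in learners}


def consumed_hits(recs: Dict[str, List[str]], valid: List[Event]) -> int:
    """Count recommended (learner, asset) pairs that the learner launched during validation."""
    consumed = {(learner, asset) for learner, asset, _ in valid}
    return sum((u, a) in consumed for u, items in recs.items() for a in items)


def normalized_performance(model_recs: Dict[str, List[str]],
                           baseline_recs: Dict[str, List[str]],
                           valid: List[Event]) -> float:
    """Hits of a model divided by hits of a baseline (assumes the baseline has at least one hit)."""
    return consumed_hits(model_recs, valid) / consumed_hits(baseline_recs, valid)


# Example with toy data: u4 has no training history, so only popularity can serve them.
events = [("u1", "a", 1.0), ("u2", "a", 2.0), ("u2", "b", 3.0), ("u3", "c", 4.0),
          ("u1", "b", 11.0), ("u3", "a", 12.0), ("u4", "a", 13.0)]
train, valid = temporal_split(events, cutoff=10.0)
learners = sorted({learner for learner, _, _ in events})
popular = popularity_recommender(train, learners, k=1)
```

Computing `normalized_performance` for a blended recommender against the best popular recommender, for k from 5 to 10, yields curves of the kind plotted in Figure 2. Including learners such as `u4`, who have no training history, in the denominator is what makes the popularity baseline hard to beat in aggregate.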
We created visualizations to help learners understand why they were receiving specific recommendations. One type of visualization shows the relative strength of each recommendation along each of the three broad recommender groups described above. Another set of visualizations compares the strength of the recommendations along a single dimension, using their relative ranks from a specific recommender group. Anecdotal evidence indicates that users found these visualizations useful in determining which of the recommendations might be of interest.

4. PRELIMINARY EVALUATION
Pilot deployments of the described engagement solution have recently been initiated. Learners receive emails, at engagement times determined as in Section 2, containing personalized recommendations generated as described in Section 3. Some preliminary quantitative indications of the solution's efficacy have been gleaned from initial email interaction metrics. After the first set of emails was sent to all participants, we tracked the click-through rate and the click-to-open rate (i.e., among participants who opened the email, the fraction who clicked on one of the recommendations). The overall click-through rate was 5.6%, and the click-to-open rate was 31.6%. We compared these metrics to industry benchmarks for email campaigns in the education industry, as reported in [Silverpop 2014]: both rates are about twice the corresponding industry medians of 2.8% and 14.3%. These metrics give some preliminary confirmation of the promise of the proposed engagement approach.
5. REFERENCES
[Hawkes 1971] Alan G. Hawkes. 1971. Point spectra of some mutually exciting point processes. JRSS Series B (1971), 438–443.
[Hu et al. 2008] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In Proc. ICDM 2008. 263–272.
[Koren 2009] Yehuda Koren. 2009. The BellKor Solution to the Netflix Grand Prize. (2009).
[Ozaki 1979] T. Ozaki. 1979. Maximum likelihood estimation of Hawkes' self-exciting point processes. Ann. Inst. Stat. Math. 31, 1 (1979), 145–155.
[Silverpop 2014] Silverpop. 2014. 2014 Silverpop email marketing metrics benchmark study. (2014).