Engaging Learners in an Enterprise L&K System

Wesley M. Gifford, Ashish Jagmohan, Yi-Min Chee, Anshul Sheopuri
IBM Research
Yorktown Heights, NY, USA

John Ambrose, Sue Rodeman, Shota Aki
Skillsoft
Nashua, NH, USA


Copyright is held by the author/owner(s).
RecSys 2014 Poster Proceedings, October 6–10, 2014, Foster City, Silicon Valley, USA.

ABSTRACT
We describe a system being designed for a leading provider of enterprise learning solutions, to improve engagement among learners. The system consists of an engagement timing component which estimates a learner’s level of engagement and likely preferred interaction times, and a recommendation component which generates personalized content recommendations. We summarize early results from a recently initiated pilot deployment.

1. INTRODUCTION
   The problem of interest is improving learner engagement in an enterprise learning system by utilizing consumption data captured by the learning platform. The existing learning platform records each content launch, tracking user and content ID, launch time, and duration, among other data. The platform also defines an expert-curated hierarchy of content, wherein assets are grouped into a forest of asset-folders on the basis of subject matter. We have developed an engagement system consisting of two major components: 1) an engagement timing component that is responsible for estimating both a learner’s level of engagement and their preference to interact at certain days and times; and 2) a recommendation component that generates personalized recommendations for each learner, based on historical learner activity. In an initial email-based pilot, these components have demonstrated significant improvements in user response compared to industry benchmarks.

2. ENGAGEMENT TIMING
   The goal of the engagement system is to improve the level of engagement of its learners. The engagement timing component helps the system with proper timing of actions, based on each learner’s current level of engagement. We observed that learners often exhibit “bursty” or self-exciting behavior, where a learner’s interactions frequently occur in clusters. We model these interactions as arrivals from a stochastic process that captures the temporal dependencies in learner behavior; the typical homogeneous Poisson process is not capable of doing so.
   For users with sufficient interaction histories, we consider a Hawkes process, which can be viewed as a counting process whose time-varying intensity function adheres to a specific structure that enables capturing of temporal dependencies. The original structure considered by Hawkes was [Hawkes 1971]: $\lambda(t) = \mu + \int_{-\infty}^{t} g(t-u;\theta)\,dN(u)$, where $N(u)$ is an appropriate point process. Hawkes specifically considered the case where $g(t) = \sum_{i=1}^{P} \alpha_i \exp\{-\beta_i t\}$, $t > 0$. This function states that the intensity at the current time consists of decayed contributions from prior events. If only a short period of time has elapsed since the learner’s last action, the intensity function is impacted by these recent events and hence captures the fact that the learner is more likely to reengage. If a long period has elapsed since the last action, the process behaves more like a homogeneous Poisson process with rate $\mu$ until the next action. Similar models have been used to model stock market trades and earthquake aftershocks.
   The system estimates the level of engagement for each learner by considering their reengagement probability in a time window given their prior interaction history. For learners with sufficient history, the parameters of the model above can first be determined using maximum likelihood estimation. This was done using numerical maximization of the likelihood, whose expressions are available in [Ozaki 1979].¹ Then, the probability that a particular learner reengages in the next $s$ days, given their prior interaction history, is equivalent to the probability of the event that there is at least one arrival in the time period of interest from the underlying stochastic process (details omitted for brevity).
   ¹ For learners with fewer prior interactions, one promising strategy is to aggregate their inter-arrival times and fit an aggregate model.
   In addition to knowing a learner’s engagement level, it is also important to know the best time of day and day of week to contact individual learners. This is derived from a learner’s prior interactions under the assumption that prior interaction times are indicative of preferred interaction times. Each day of the week is divided into $n$ uniform-duration bins, giving a multinomial distribution with a total of $7n$ categories. In many cases, estimation of the category probabilities suffers from sparsity due to limited interactions. This problem is addressed by using Bayesian estimation with a Dirichlet prior that incorporates the aggregate preferences of the entire population. Results based on a test across multiple customers are shown in Figure 1. The preferences estimated in this test use an exponential weighting scheme (with parameter $\gamma$) to place more weight on recent activity. The plot indicates that the performance saturates for $\gamma$ greater than 24 months. For this value of $\gamma$ the estimated distribution significantly outperforms a naive model.
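As an illustration of the Hawkes model in Section 2, the following is a minimal sketch of the exponential-kernel intensity and of one standard way to compute the probability of at least one arrival in the next s days. The paper omits its own derivation, so the closed form used here (one minus the exponential of the integrated intensity, evaluated as if no new event occurs inside the window) is an assumption rather than the authors' stated method. The function names, the example history, and the parameter values are hypothetical; in practice the parameters would come from the maximum likelihood fit.

```python
import math


def hawkes_intensity(t, event_times, mu, alphas, betas):
    """Intensity mu + sum over past events t_j of g(t - t_j), where
    g(x) = sum_i alphas[i] * exp(-betas[i] * x) for x > 0."""
    excitation = 0.0
    for t_j in event_times:
        if t_j < t:  # only events strictly before t contribute
            excitation += sum(
                a * math.exp(-b * (t - t_j)) for a, b in zip(alphas, betas)
            )
    return mu + excitation


def reengagement_probability(t, s, event_times, mu, alphas, betas):
    """P(at least one arrival in (t, t + s]) given the events up to time t.

    Assumed form: 1 - exp(-integral of the intensity over (t, t + s]),
    with the intensity driven only by the events observed up to t.
    """
    integral = mu * s  # contribution of the baseline rate
    for a, b in zip(alphas, betas):
        decayed = sum(math.exp(-b * (t - t_j)) for t_j in event_times if t_j <= t)
        integral += (a / b) * decayed * (1.0 - math.exp(-b * s))
    return 1.0 - math.exp(-integral)


# Hypothetical learner: launch times in days, illustrative parameters only.
history = [0.0, 0.5, 0.7, 9.0]
params = dict(mu=0.05, alphas=[0.8], betas=[1.0])
p_soon = reengagement_probability(9.5, 7.0, history, **params)   # shortly after a launch
p_late = reengagement_probability(40.0, 7.0, history, **params)  # after a long silence
print(f"7-day reengagement probability: soon={p_soon:.3f}, late={p_late:.3f}")
```

In this sketch, a learner queried shortly after a burst of launches gets a higher reengagement probability than the same learner queried after a long silence, where the value approaches that of a homogeneous Poisson process with rate mu.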
Figure 1: Performance of engagement time preference estimates relative to a naive estimate (uniform) for the case of 4 bins per day.

3. RECOMMENDATION ENGINE
   The recommendation engine seeks to improve learner engagement by generating personalized recommendations for which the learner will likely have high preference and, hence, high consumption likelihood.
   The recommendation engine utilizes a blended ensemble [Koren 2009], wherein several baseline recommenders are combined to yield a final set of recommendations. We use three groups of baseline recommenders, each group containing multiple individual recommenders. The first group consists of popularity-based recommenders which use several metrics to measure popularity, including temporal recency, and launch and duration information. The second group consists of content-based recommenders, wherein the learner’s historical consumption of certain asset-types, as determined by the expert-curated hierarchy, is leveraged to generate new recommendations. These include recommenders based on generative Bayesian models of the learner’s type-preferences, and recommenders based on TF-IDF-type metrics over the content hierarchy. The third group consists of collaborative-filtering recommenders, which leverage the implicit feedback information [Hu et al. 2008] manifested in each user’s historical asset consumption activity; individual recommenders include some based on matrix factorization, and others based on separate user-user and asset-asset filtering.
   The outputs of the recommender ensemble described above are combined to generate a final set of recommendations (typically 5-10) for each learner. Activity data was temporally split into training and validation data sets. We used several metrics to quantify recommender goodness, including metrics based on discounted cumulative gain and precision, and predictive metrics quantifying the number of trained recommendations which were consumed in the validation set. The metrics yielded largely consistent results. We tried multiple blending techniques including gradient-boosted decision trees and random forests; random forests were found to yield the best performance. Figure 2 shows, for one enterprise, a comparison of the blended recommender to the best popular recommender, as the number of recommendations varies. The performance is normalized to that of the best popular recommender. Note that this comparison is aggregated over all learners, including a significant number with no prior historical activity, for whom the popular recommender is best.

[Figure 2 plot: "Validation performance (baseline: best popular recommender)". x-axis: number of recommendations (5 to 10); y-axis: normalized performance (0.8 to 2.0); series: best popular recommender, best single recommender, random-forest hybrid.]
Figure 2: Performance comparison of recommenders. Metric: number of trained recommendations consumed in the validation set.

   We created visualizations to help learners understand why they were receiving specific recommendations. One type of visualization shows the relative strength of each recommendation along each of the three broad recommender groups described above. Another set of visualizations compares the strength of the recommendations along a single dimension using their relative ranks from a specific recommender group. Anecdotal evidence indicates that users found these visualizations to be useful in helping to determine which of the recommendations might be of interest.

4. PRELIMINARY EVALUATION
   Pilot deployments of the described engagement solution have been recently initiated. Learners receive emails at engagement times determined as in Section 2, containing personalized recommendations as described in Section 3. Some preliminary quantitative indications of the efficacy of the solution have been gleaned by examining initial email interaction metrics. After the first set of emails was sent to all participants, the click-through rate and the click-to-open rate (i.e., the fraction of participants who clicked on one of the recommendations after opening the email) were tracked. The overall click-through rate was 5.6%, while the click-to-open rate was 31.6%. These metrics were compared to industry benchmarks for email campaigns in the education industry, as reported in [Silverpop 2014]. The comparison shows that both rates are significantly higher than the industry median (2.8% and 14.3% respectively). These metrics give some preliminary confirmation of the promise of the proposed engagement approach.

5. REFERENCES
[Hawkes 1971] Alan G Hawkes. 1971. Point spectra of some mutually exciting point processes. JRSS. Series B (1971), 438–443.
[Hu et al. 2008] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In Proc. ICDM 2008. 263–272.
[Koren 2009] Yehuda Koren. 2009. The BellKor Solution to the Netflix Grand Prize. (2009).
[Ozaki 1979] T Ozaki. 1979. Maximum likelihood estimation of Hawkes’ self-exciting point processes. Ann. Inst. Stat. Math. 31, 1 (1979), 145–155.
[Silverpop 2014] Silverpop. 2014. 2014 Silverpop email marketing metrics benchmark study. (2014).
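
As an illustration of the validation metrics named in Section 3, the following is a minimal sketch of precision at k, a binary-relevance discounted cumulative gain at k, and the count of recommendations consumed in the validation set. The paper does not specify the exact variants it used, so these textbook definitions are an assumption, and the function names and asset IDs are hypothetical.

```python
import math


def precision_at_k(recommended, consumed, k):
    """Fraction of the top-k recommended assets that appear in the consumed set."""
    hits = sum(1 for asset in recommended[:k] if asset in consumed)
    return hits / k


def dcg_at_k(recommended, consumed, k):
    """Binary-relevance DCG: a hit at 1-based rank r contributes 1 / log2(r + 1)."""
    return sum(
        1.0 / math.log2(rank + 1)
        for rank, asset in enumerate(recommended[:k], start=1)
        if asset in consumed
    )


def consumed_count(recommended, consumed, k):
    """Number of distinct top-k recommendations consumed in the validation period."""
    return len(set(recommended[:k]) & set(consumed))


# Hypothetical ranked recommendations (from training data) and validation launches.
recommended = ["a17", "a03", "a42", "a08", "a25"]
consumed = {"a03", "a25", "a99"}

precision = precision_at_k(recommended, consumed, 5)
dcg = dcg_at_k(recommended, consumed, 5)
count = consumed_count(recommended, consumed, 5)
print(precision, dcg, count)
```

A normalized comparison in the style of Figure 2 could then be formed by dividing a recommender's aggregate consumed count by that of the best popular recommender; that normalization step is likewise an assumption about how the plotted values were produced.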