Debiasing Few-Shot Recommendation in Mobile Games

Lele Cao (caolele@gmail.com), Sahar Asadi (sahar.asadi@king.com), Matteo Biasielli (matteo.biasielli@king.com), Michael Sjöberg (michael.sjoberg@king.com)
AI R&D, King Digital Entertainment, Activision Blizzard Group, Stockholm, Sweden
ABSTRACT

Mobile gaming has become increasingly popular due to the growing usage of smartphones in day-to-day life. In recent years, this advancement has led to an interest in the application of in-game recommendation systems. However, in-game recommendation is more challenging than common recommendation scenarios, such as e-commerce, for a number of reasons: (1) the player behavior and context change at a fast pace, (2) only a few items (few-shot) can be exposed, and (3) with an existing hand-crafted heuristic recommendation, performing randomized explorations to collect data is not a business choice preferred by game stakeholders. To that end, we propose an end-to-end model called DFSNet (Debiasing Few-Shot Network) that enables training an in-game recommender on an imbalanced dataset that is biased by the existing heuristic policy. We experimentally evaluate the performance of DFSNet both offline on a validation dataset and online in a real-time serving environment, illustrating the correctness and effectiveness of the trained model.

CCS CONCEPTS

• Information systems → Recommender systems; • Computing methodologies → Neural networks.

KEYWORDS

In-game recommendation, debiasing, mobile game, feedback loop, few-shot recommendation, A/B test

Reference Format:
Lele Cao, Sahar Asadi, Matteo Biasielli, and Michael Sjöberg. 2020. Debiasing Few-Shot Recommendation in Mobile Games. In 3rd Workshop on Online Recommender Systems and User Modeling (ORSUM 2020), in conjunction with the 14th ACM Conference on Recommender Systems, September 25th, 2020, Virtual Event, Brazil.

ORSUM@ACM RecSys 2020, September 25th, 2020, Virtual Event, Brazil
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION

As smartphones expand the gaming market [21], mobile gaming has become a significant segment of the video game industry. Although recommendation systems such as [12] and [25] are widely adopted in e-commerce, their integration with mobile games is a relatively new area of research. Previous works have mostly focused on recommending game titles to potential players (e.g., [2], [13], [14], [23], and [24]). A few recent works have also explored in-game recommendation [3, 5, 8, 19]; however, to the best of our knowledge, large-scale and real-time recommendation of in-game items has not reached maturity in industrial scenarios. One of the common business models in modern mobile games is free-to-play, where the game can be played free of charge and monetization occurs through micro-transactions of additional content and in-game items [1]. Therefore, in-game content is continuously added to the game, which may easily overwhelm the players, causing an increase in churn probability. In-game item recommendation systems help to alleviate this problem by ranking items and selecting the ones that are more relevant to players in order to improve player engagement.

In-game recommender systems utilize user interaction data that describes the historical behavior and current context of individual players to expose each player to the right item at the right time. However, despite a few in-game recommendation trials [3, 5, 8, 19], evaluated mostly in an offline and batch fashion, there have not been many successful industrial applications of online in-game recommendation systems. This is mainly attributed to three unique requirements from mobile games:

(1) The recommendation is often calculated on remote servers and delivered to game clients in near-real-time with low latency (e.g., within the range of 100 milliseconds). Because of the fast-evolving game dynamics, the behavior of players and their context keep changing quickly; consequently, the recommendations (calculated from behavior and context data) become outdated easily. As a result, the optimal solution should continuously perform recommendation calculation and always deliver up-to-date predictions when item exposure is triggered. That is why offline batch recommendation might only provide a suboptimal average policy.

(2) In mobile games, the items to purchase or to play are usually carefully crafted by game designers. To avoid distracting the players with an overloaded small mobile screen, only a minimal subset (e.g., as small as one to three items, hence termed few-shot) of those items is displayed at each exposure occasion. Therefore, the players' experience and behavior will be more sensitive to recommendations than in e-commerce applications, where a large number of items can be displayed at a time, leading to a stronger direct feedback loop [18].
Figure 1: Examples of scenarios where item recommendation can be applied in CCS (1a) and CCSS (1b): (a) LiveOps (dynamic content); (b) daily gifts.

Figure 2: Illustration of collecting features and label for one player and one exposure trigger.
(3) The carefully designed in-game items are often exposed to players following a pre-defined heuristic policy that contains a set of hard-coded rules regulating the particular item(s) to be exposed to player group(s) with certain attributes (e.g., IF a player has won more than z games in a day, THEN show item A instead of item B). Recommendation models largely fall into two main categories: Supervised Learning [10] and Reinforcement Learning [7], both of which work only when item exposure can be randomly explored. However, the existing heuristic policy heavily biases the experience of the players and hence the dataset, which makes it extremely difficult to train an unbiased model directly. Collecting randomized data is often not trivial: in many cases, stakeholders prefer to continue working with reasonably good heuristics, which might not be optimal but avoid potential business risks caused by randomization.

Our literature survey (up to the date this paper was written) shows that none of the related works [3, 5, 8, 19] managed to simultaneously address the three aforementioned challenges. The contributions of this paper are threefold: (1) we propose a Debiasing Few-Shot Network (DFSNet) that enables training an in-game item recommender merely using heavily biased and imbalanced data, (2) we discuss an approach to benchmark the trained DFSNet offline, and (3) we put the model live to recommend items in real-time, and demonstrate how to monitor, evaluate, interpret, and iterate on DFSNet in a controlled A/B test framework.

2 THE PROPOSED APPROACH

There are many scenarios where in-game recommendations could be applied. In Figure 1, we show a couple of examples from two of the King1 games: Candy Crush Soda Saga (CCSS) and Candy Crush Saga (CCS). We notice that some occasions allow only one item (a.k.a. one-shot) to be exposed at a time, such as Figure 1b, while others (e.g., Figure 1a) can display a few more (a.k.a. few-shot) items. Items can have no values specified, as shown in these two examples, or have values attached. To simplify the introduction of our method and experiments, we use the one-shot setup where only one item k with value v_k can be recommended upon each trigger of an exposure opportunity. We will show that our approach can be easily applied to scenarios with few-shot exposure and items with no values. The overall optimization objective is to maximize the expected value of the potentially clicked items. In this section, we present a walk-through of our debiasing few-shot recommendation approach.

2.1 Features and Label

Each sample in the dataset corresponds to a complete item exposure event triggered at time t for a player. As illustrated in Figure 2, we calculate the player features, noted as x ∈ R^D, using historical data of the last N days before the time t. The D-dimensional features fall into two categories: behavioral (e.g., the total number of game rounds played) and contextual (e.g., the latest inventory status). In addition, at time t, the exposed item k (following a heuristic policy) is recorded. Within the time window that item k is exposed, we log whether the player eventually clicks on it or not, which is treated as a binary label y ∈ {0, 1}^2. The raw dataset is extremely biased due to the presence of the existing heuristic policy, and it is imbalanced concerning the label and the distribution of exposed item types.

1 https://king.com
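To make this data layout concrete, the following minimal sketch (Python/NumPy; the field and function names are ours and purely illustrative, not from the production pipeline) assembles one such sample from its three ingredients:

import numpy as np

D = 48   # feature dimensionality used later in Section 3
K = 5    # number of items in the running example

def make_sample(features, exposed_item, clicked):
    # features: length-D vector x aggregated over the last N days before time t
    # exposed_item: index k of the item shown by the heuristic policy at time t
    # clicked: True if the player clicked item k within its exposure window
    x = np.asarray(features, dtype=np.float32).reshape(D)
    y = np.array([0.0, 1.0] if clicked else [1.0, 0.0], dtype=np.float32)  # one-hot click label
    return {"x": x, "k": int(exposed_item), "y": y}

sample = make_sample(np.random.rand(D), exposed_item=2, clicked=False)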
Figure 3: The architecture of DFSNet (best viewed in color). In the example shown here, we set K, the number of items, to 5. (As depicted in the figure, the per-item preference predictor DNNs use layer widths 1024-512-256-2, 512-256-128-2, 256-128-64-2, 64-32-16-2, and 64-32-16-2, while the per-item confidence predictor DNNs use 1024-512-256-2, 768-384-192-2, 512-256-128-2, 128-64-32-2, and 96-48-24-2; each branch is preceded by a sample balancer, and the outputs feed a meta ranking module with exploration.)
2.2 The End-To-End Model: DFSNet

In this section, we propose to train a debiasing few-shot network, DFSNet, to perform few-shot in-game recommendation using only the heavily biased and imbalanced dataset. The goal of DFSNet is to rank K items, where the k-th item has a value v_k, for a player (represented by a D-dimensional feature vector x ∈ R^D), in order to maximize the expected click value. As shown in Figure 3, DFSNet consists of three modules: preference predictors, confidence predictors, and meta ranking. The training is conducted in a mini-batch fashion; the input is a matrix X = {x^(m)}_{m=1}^{M} ∈ R^{M×D}, where M is the number of samples in each mini-batch and D is the number of features in each sample. For the sake of conciseness, we use the general terms x and X to denote any sample and mini-batch, respectively.

2.2.1 Preference Predictors. For a player x, the preference predictor module (cf. the red dashed bounding box in Figure 3) predicts the probability ŷ_k that player x will click item k if this item is exposed. During training, the mini-batch X is first divided into K subsets (noted as X_1, ..., X_K), so that the k-th subset X_k ∈ R^{M_k×D} only contains the M_k players that were exposed to the k-th item. As a result, each item k has its own architectural branch, which sequentially propagates X_k through a sample balancer and a preference predictor, and eventually yields the click/non-click probability Ŷ_k ∈ R^{M_k×2}.

Since the number of clicked items usually represents a small fraction of the entire exposed item set, there are far more negative samples (y=[1,0]) than positive ones (y=[0,1]) in X_k. In many recommendation methods such as [16, 20], positive and negative samples are manually balanced by random sampling, and the rich information embodied by negative samples is lost. We propose a minority subsampling technique (cf. sample balancers in Figure 3) to automatically balance X_k during training. We split X_k into two sets X_k^+ and X_k^−, where X_k^+ contains all M_k^+ positive samples, X_k^− contains the M_k^− negative samples, and M_k^+ ≪ M_k^−. We randomly pick (without replacement) max(min(M_k^+, M_k^−), 1) samples from X_k^− and put them in a set X̃_k^−. We construct the balanced mini-batch subset X̃_k by

X̃_k = X_k^+ ∪ X̃_k^−,  where X̃_k ∈ R^{[max(min(M_k^+, M_k^−), 1) + M_k^+] × D}.    (1)

This minority subsampling balancer is conceptually similar to the negative sampling in [22], which enforces each mini-batch to contain only one positive sample; our approach, however, results in a far more balanced mini-batch. Similarly to negative sampling, in minority subsampling, X̃_k must contain at least one sample of the minority class.
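A minimal NumPy sketch of this per-branch balancing step is given below; the function name and array layout are purely illustrative, assuming the one-hot click labels described in Section 2.1:

import numpy as np

def balance_branch(X_k, y_k, rng=np.random.default_rng(0)):
    # Minority subsampling for one item branch, cf. Equation (1):
    # keep all positives plus max(min(M_k_pos, M_k_neg), 1) negatives
    # sampled without replacement.
    pos = np.where(y_k[:, 1] == 1)[0]          # indices of positive (clicked) samples
    neg = np.where(y_k[:, 1] == 0)[0]          # indices of negative samples
    n_neg = max(min(len(pos), len(neg)), 1)    # how many negatives to keep
    neg_kept = rng.choice(neg, size=min(n_neg, len(neg)), replace=False)
    keep = np.concatenate([pos, neg_kept])
    return X_k[keep], y_k[keep]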
The output of the sample balancer in the k-th branch (i.e., X̃_k in Figure 3) is then fed to a preference predictor implemented with a 4-layer Deep Neural Network (DNN) binary classifier. The ELU (Exponential Linear Unit) activation function [9] is applied to all hidden layers except the last one, which is a softmax layer with two neurons. Dropout could be applied to avoid overfitting, yet we choose to empirically scale the first three layers of the k-th DNN proportionally (from a base architecture 32-16-8) to the exposure ratio of the corresponding item: M_k / Σ_{k=1}^{K} M_k. The loss to optimize the preference predictor module, L_p, is formulated as

L_p = (1 / 2K) Σ_{k=1}^{K} [ (1 / M̃_k) Σ_{m=1}^{M̃_k} ‖ −y_k^{(m)} ∗ log ŷ_k^{(m)} ‖_1 ],    (2)

where "∗" represents element-wise multiplication, M̃_k = max(min(M_k^+, M_k^−), 1) + M_k^+ is the number of samples in X̃_k, the notation y_k^{(m)} is the label (one-hot encoded vector) of the m-th sample in X̃_k, and ŷ_k^{(m)} is the predicted probability vector for the same sample.
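The sketch below illustrates one such branch in TensorFlow/Keras. The exact width-scaling rule is not spelled out beyond "proportional to the exposure ratio", so the scale constant here is our assumption, chosen so that a heavily exposed item yields the 1024-512-256-2 branch seen in Figure 3:

import tensorflow as tf

def preference_branch(exposure_ratio, base=(32, 16, 8), scale=160):
    # Three ELU hidden layers scaled from the base architecture 32-16-8 in
    # proportion to the item's exposure ratio M_k / sum_k M_k ('scale' is an
    # illustrative constant, not the exact rule from the paper), followed by
    # a 2-neuron softmax output (non-click / click).
    widths = [max(8, int(w * exposure_ratio * scale)) for w in base]
    layers = [tf.keras.layers.Dense(w, activation="elu") for w in widths]
    layers.append(tf.keras.layers.Dense(2, activation="softmax"))
    return tf.keras.Sequential(layers)

# Example: a branch for an item receiving roughly 20% of all exposures,
# which here comes out as hidden widths 1024-512-256.
branch = preference_branch(exposure_ratio=0.2)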
2.2.2 Confidence Predictors. To explicitly model the bias from the pre-dominant heuristic, we introduce a confidence predictor module (cf. the green dashed bounding box in Figure 3) to DFSNet. The confidence predictors estimate the probability c_k that player x has recently been exposed to item k. Thus, ĉ_k can be treated as an approximation of the confidence we have in the predicted click probability ŷ_k. Similar to the preference predictors, this module also employs K branches (for K items), each of which has a sample balancer and a DNN binary classifier.

The mini-batch input X ∈ R^{M×d} is fed into the sample balancer of each branch indiscriminately. To prevent the confidence predictors from simply memorizing the heuristic rule and losing generalization capability, it is important to remove the features (if any) that are used in the heuristic policy; hence X's second dimension d may be smaller than the original dimension D. The sample balancer in this module first divides X into two subsets X_k ∈ R^{M_k×d} and X_¬k ∈ R^{(M−M_k)×d}, where X_k only contains the M_k players exposed to item k, and X_¬k has the remaining M − M_k samples. Due to the pre-existing heuristic policy, the item exposure was not randomized, making the sizes of X_k and X_¬k imbalanced. To that end, we need a sample balancer for each branch to produce a balanced mini-batch X̄_k using

X̄_k = X_¬k ∪ X′_k ∈ R^{[max(M − M_k, 1) + (M − M_k)] × d},   if M_k ≥ M − M_k
X̄_k = X_k ∪ X′_¬k ∈ R^{[max(M_k, 1) + M_k] × d},             if M_k < M − M_k        (3)

where X′_k and X′_¬k are obtained via minority subsampling (without replacement); specifically, the former contains max(M − M_k, 1) samples randomly selected from X_k, and the latter contains max(M_k, 1) randomly picked samples from X_¬k. M̄_k denotes the number of samples in X̄_k, hence X̄_k ∈ R^{M̄_k×d}.

X̄_k is then fed to a confidence predictor implemented in the same way as in the preference predictors, except that each DNN is scaled proportionally to the factor M̄_k / Σ_{k=1}^{K} M̄_k. The loss L_c to optimize this module has a similar form to Equation (2):

L_c = (1 / 2K) Σ_{k=1}^{K} [ (1 / M̄_k) Σ_{m=1}^{M̄_k} ‖ −c_k^{(m)} ∗ log ĉ_k^{(m)} ‖_1 ],    (4)

where c_k^{(m)} ∈ {0, 1}^2 is the constructed confidence label specifying whether the m-th player/sample in X̄_k actually saw item k or not, and ĉ_k^{(m)} is the predicted confidence probability vector for the same sample. The preference and confidence predictors are jointly optimized with a total loss L = L_p + L_c.
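Restated in code, both losses reduce to a per-branch cross-entropy averaged with the 1/(2K) factor of Equations (2) and (4). The sketch below (TensorFlow; function names are ours) assumes the per-branch targets and predictions are provided as lists of tensors:

import tensorflow as tf

def branch_loss(targets, preds):
    # Mean over the balanced mini-batch of || -y * log(y_hat) ||_1,
    # i.e., a categorical cross-entropy as in Equations (2) and (4).
    ce = tf.reduce_sum(-targets * tf.math.log(preds + 1e-8), axis=-1)
    return tf.reduce_mean(ce)

def total_loss(pref_targets, pref_preds, conf_targets, conf_preds, K):
    # L = Lp + Lc, each averaged over the K item branches with a 1/(2K) factor.
    lp = tf.add_n([branch_loss(y, p) for y, p in zip(pref_targets, pref_preds)]) / (2.0 * K)
    lc = tf.add_n([branch_loss(c, p) for c, p in zip(conf_targets, conf_preds)]) / (2.0 * K)
    return lp + lc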
2.2.3 Meta Ranking. During model serving/prediction, the sample balancers are omitted, meaning that the input X ∈ R^{M×D} (representing M players) is directly fed to the DNNs in all branches, in order to simultaneously generate real-valued preference (Ŷ ∈ [0, 1]^{M×K×2}) and confidence (Ĉ ∈ [0, 1]^{M×K×2}) predictions. The second values in the last dimension of Ŷ are the click probabilities, while those of Ĉ are the confidence levels expressed as probabilities. To simplify the discussion that follows, we will use ŷ_k and ĉ_k to denote, respectively, the predicted click probability and confidence level of item k for an individual player x. The meta ranking module (cf. the right-most box in Figure 3) ranks items by calculating a propensity score R_k for each item using three factors: ŷ_k, ĉ_k, and v_k; the term v_k is the value of item k, which is usually predefined. We propose a piece-wise formula for computing R_k:

R_k = v_k / min(v),                 if (ĉ_k ≥ 1/2) ∧ (ŷ_k ≥ 1/2)
R_k = ĉ_k · ŷ_k · v_k / max(v),     otherwise                                        (5)

where the functions min(v) and max(v) respectively return the minimum and maximum element of the vector v = [v_1, ..., v_K]. Generally speaking, R_k is obtained by calibrating ŷ_k with ĉ_k and v_k, so that random exploration data is not mandatory (at least initially). To the best of our knowledge, only [17] discussed the possibility of removing item position bias using an adversarial network, yet our approach manages to deal with a much stronger item exposure bias using a more explainable strategy; and explainability is valued highly in industrial environments [11]. If v_k is not available (Figures 1a and 1b), we can adapt Equation (5) to

R_k = ŷ_k,            if (ĉ_k ≥ 1/2) ∧ (ŷ_k ≥ 1/2)
R_k = ĉ_k · ŷ_k,      otherwise                                                      (6)

We can conveniently assume that the K items are already sorted by their values v, hence the propensity scores R = [R_1, ..., R_K] are also sorted accordingly. An overly drastic change of item exposure (e.g., a player who used to see item 1 according to the heuristic suddenly gets item K from a newly deployed recommender system) may undermine the player experience and the game ecosystem. To avoid that situation, it is good practice to enforce a heuristic deviation threshold (noted as k_s ∈ {1, ..., K − 1}) in the online production environment. Specifically, we mask R_k with

R̃_k = 0,      if |k_h − k| > k_s
R̃_k = R_k,    otherwise                                                              (7)

where k_h is the item from the pre-existing heuristic policy. With R̃ = [R̃_1, ..., R̃_K], both one-shot and few-shot in-game recommendation are possible. When recommending items based on R̃, we can sometimes choose to apply ϵ-greedy exploration to slowly accumulate more diversified data for follow-up model iterations.
3 EXPERIMENTATION AND EVALUATION

We apply DFSNet to a real-time item recommendation scenario for the CCSS game. There is a total of five items (K=5) in this scenario, yet only one item k can be shown on the mobile screen when the player triggers the exposure event. The item k has a value v_k. Items are sorted by value in ascending order, i.e., v_1 < v_2 < v_3 < v_4 < v_5. If a player clicks on the exposed item k, a value v_k is added to the game ecosystem; and we choose to maximize the value of the clicked item. The details of the concrete use case and items are considered sensitive proprietary information and are therefore anonymized in this paper.

As illustrated in Figure 4, the raw dataset is collected (cf. Section 2.1) using a Flink2-based stateful streaming platform [6]. The collected dataset contains approximately 22 million samples, each of which has D = 48 features. We apply different transformations (e.g., min-max, z-score, and logarithmic) to numerical features and perform either one-hot encoding or embedding of categorical features. The dataset pre-processing and model development are carried out on a machine learning platform developed by King.

2 https://flink.apache.org
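For illustration, these per-feature transformations could be sketched as follows (column names and the mapping of transforms to columns are hypothetical):

import numpy as np

def min_max(col):            # scale a numerical column to [0, 1]
    lo, hi = col.min(), col.max()
    return (col - lo) / (hi - lo + 1e-9)

def z_score(col):            # standardize a numerical column
    return (col - col.mean()) / (col.std() + 1e-9)

def log_transform(col):      # compress heavy-tailed counters
    return np.log1p(col)

def one_hot(col, num_categories):   # encode a categorical column
    return np.eye(num_categories)[col]

# Hypothetical usage on two columns of the raw feature table
rounds_played = np.array([3, 120, 0, 45], dtype=float)
device_type = np.array([0, 2, 1, 2])
features = np.column_stack([log_transform(rounds_played), one_hot(device_type, 3)])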
Figure 4: High-level system topology of offline model development and online model serving.
We will present both offline (training and evaluation) and online (serving and monitoring) evaluation of DFSNet in the following sections. DFSNet is implemented in Tensorflow3; the preference and confidence DNNs are scaled as depicted in Figure 3. The training is carried out with the Adam optimizer [15], using 70,000 steps and a mini-batch size of 2,048. The learning rate is initialized to 5 × 10^−3, and then it exponentially decays to 2 × 10^−6. During serving, we set k_s = 2 in Equation (7) to obtain one single item to recommend.

3 https://www.tensorflow.org
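A sketch of this optimization setup is shown below; the paper states only the initial and final learning rates, so the single exponential-decay schedule over the full 70,000 steps is our assumption:

import tensorflow as tf

TOTAL_STEPS = 70_000
INITIAL_LR, FINAL_LR = 5e-3, 2e-6

# Exponential decay from 5e-3 down to roughly 2e-6 over the full training run
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=INITIAL_LR,
    decay_steps=TOTAL_STEPS,
    decay_rate=FINAL_LR / INITIAL_LR,   # one full decay across 70k steps
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)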
3.1 Offline Performance: Model Training and Validation

To perform offline model evaluation, we create a validation dataset (noted as U = {x^(u)}_{u=1}^{U}) by randomly selecting 1% of the data from the raw dataset (thus U ≈ 0.22 million), and use the rest for training.

3.1.1 Generalization evolvement during training. We evaluate the performance of the current model on the validation dataset during the training. Since the datasets are highly imbalanced, accuracy is not an informative metric to monitor during training. We also find that recall and precision compete with each other (showing no clear trend) during the training, and are hence not ideal for monitoring the training performance. AUC-ROC (Area Under the Curve of the Receiver Operating Characteristic), on the other hand, is a stable metric that reliably tells how well the model is capable of distinguishing between classes; therefore, the evolution of per-item AUC-ROCs (see Figure 5) indicates how the generalization ability of the model improves during the training process. At the end of the training, we also measure the recall and precision for each item, which are visualized as red bars in Figure 12.

3.1.2 Policy change quantization: heuristic vs. DFSNet. We use the trained DFSNet to obtain predictions on the validation dataset. We first measure the overall change of the item exposure distribution. The results are reported in Figure 6a. In our experiment, we observed no significant change in item allocation for players due to the strong confidence constraint imposed, yet there is a slight shift towards the higher-valued items. The ratio of players that see a different item (than with the heuristic) is about 7.4%. To decompose the policy change, we illustrate, in Figure 6b, a Policy Transition Matrix (PTM), where each cell at position (i, j) indicates the ratio of players who were supposed to get item j according to the heuristic policy but are now exposed to item i according to DFSNet. It can be seen that the diagonal holds the majority of the unchanged exposures, and each row largely follows a truncated normal distribution.
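A small sketch of how such a PTM can be computed from the two per-player item assignments (the function name and toy inputs are ours):

import numpy as np

def policy_transition_matrix(heuristic_items, dfsnet_items, K=5):
    # Cell (i, j): ratio of players who would get item j under the heuristic
    # but are exposed to item i by DFSNet; each row is normalized to sum to 1.
    counts = np.zeros((K, K))
    for j, i in zip(heuristic_items, dfsnet_items):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)

# Hypothetical usage with per-player item indices from both policies
ptm = policy_transition_matrix(np.array([0, 0, 1, 2]), np.array([0, 1, 1, 2]))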
3.1.3 Distribution of preference and confidence predictions. For each sample in the validation dataset, DFSNet produces ten probabilities: five click probabilities (ŷ_1 to ŷ_5) and five confidence probabilities (ĉ_1 to ĉ_5). Figure 7 visualizes ŷ_k and ĉ_k jointly to answer four questions:

(1) Does ŷ_k reflect the low click ratio of item k? The five red area plots on the diagonal are the distributions of ŷ_k, all of which show that clicking tends to be a rare event.

(2) Does ĉ_k match the exposure ratio of item k? The five green bar plots on the diagonal represent the distributions of ĉ_k; the majority of exposures come from item 1, which coincides with the heuristic item exposure distribution in Figure 6a.

(3) Does ŷ_k show general item preference? The lower triangular portion has pair-wise scatter plots of click probabilities. Each data point in the plot for items i and j has a coordinate of (ŷ_i, ŷ_j); thus, if the point is below the line ŷ_i = ŷ_j, the corresponding player prefers item i over j, and vice versa. To examine the general trend, we fit linear models (red straight lines through the origin) for the pair-wise plots. We observe that, on average, players prefer items with lower values.

(4) Can DFSNet be confident about multiple items for the same player? The upper triangular portion in Figure 7 contains pair-wise scatter plots of confidence probabilities ĉ_k. Every point in the plot for items i and j is located at (ĉ_i, ĉ_j). Intuitively, implied by Equation (5), the points (representing players) in the green shaded areas are likely eligible for more than one item.

3.1.4 Best-effort estimation of recall, precision, and uplifts. On the offline validation dataset U ∈ R^{U×D}, it is impossible to measure the "quality" of a recommendation that is different from what was actually exposed; hence, a sub-optimal solution is to create a subset (from U) containing only the players for whom both DFSNet and the heuristic policy recommended the same item. We use U′ ∈ R^{U′×D} (U′ < U) to denote that subset. On that subset, we measure per-item recall and precision for the preference predictors (cf. the red bars in Figure 12).
Figure 5: Training performance of confidence predictors (a-f) and preference predictors (g-l). (a-e), (g-k): the AUC-ROC measured on the validation dataset for each item during training; (f), (l): the item ROCs on the validation dataset at the end of the training.

Figure 6: Comparison of the heuristic and DFSNet policies on the validation dataset: (6a) overall item exposure distribution and (6b) policy transition matrix (PTM). Each row in the PTM adds up to 1.0.
To provide uplift baselines of average click rate (#_clicked_items / #_items) and average click value (total_value_of_clicked_items / #_clicked_items), we calculate both metrics for both the heuristic and DFSNet policies. The results are presented in Table 1. The offline uplifts will then be compared with the ones obtained during online model serving (cf. Section 3.2.3).

3.2 Online Performance: Real-Time Serving and Monitoring

After the DFSNet model is trained and validated in an offline environment, it is deployed in a Tensorflow Serving4 cluster. As illustrated in Figure 4, a prediction client (sharing the entire feature collection logic described in Section 2.1) is also deployed on the streaming cluster. To validate the online performance of the DFSNet model, we run an A/B test on a small fraction of players on CCSS. For each game player in the test group, the prediction client makes a request to the DFSNet prediction service (in real-time) as soon as any pre-defined triggering event emerges. In the life cycle of a real-time recommendation system, it is often required to iterate on the model serving periodically (cf. Figure 8 for an example of two serving iterations) to incorporate bug fixes or new models trained on more recent data.

During online serving, we track several metrics (aggregated into temporal windows of 5 minutes) to monitor the key system performance; some examples include model response time, model exceptions, and model raw output distribution. The definition of those system metrics remains the same for different recommendation models. These metrics are indicators of the system health, and therefore, they play a critical role in the validity of the model. To monitor model performance, we log all features, predictions, and labels in a BigQuery5 database, and visualize them in a dashboard that is updated on an hourly basis.

4 https://www.tensorflow.org/tfx/guide/serving
5 https://cloud.google.com/bigquery
                   Figure 7: The visualization of preference (red plots) and confidence (green plots) predictions.

Table 1: The estimated click rate and value obtained from the validation dataset. For the DFSNet policy, if item k (with a predicted click probability ŷ_k) is ranked highest and ŷ_k > 0.5, we assume the player will click the recommended item k, generating a value v_k.

Metrics (obtained on validation dataset)    Heuristic Policy    DFSNet Policy    Uplift (%)
Average click rate                          0.1022              0.1312           28%
Average click value                         1.52                1.92             26%
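The estimates in Table 1 can be reproduced with the following sketch (NumPy; array names are ours), which encodes the stated assumption that the top-ranked item is clicked whenever its predicted click probability exceeds 0.5:

import numpy as np

def offline_estimates(y_hat, ranked_first, values, clicked_heuristic, value_heuristic):
    # y_hat            : (U, K) predicted click probabilities on the validation set
    # ranked_first     : (U,) index of the top-ranked item per player under DFSNet
    # values           : (K,) item values
    # clicked_heuristic: (U,) booleans, observed clicks under the heuristic policy
    # value_heuristic  : (U,) observed click values under the heuristic (0 if no click)
    u = np.arange(len(ranked_first))
    assumed_click = y_hat[u, ranked_first] > 0.5          # assumption from Table 1
    dfsnet_rate = assumed_click.mean()
    dfsnet_value = values[ranked_first][assumed_click].mean() if assumed_click.any() else 0.0
    heuristic_rate = np.mean(clicked_heuristic)
    heuristic_value = value_heuristic[clicked_heuristic].mean() if clicked_heuristic.any() else 0.0
    return {"click_rate": (heuristic_rate, dfsnet_rate),
            "click_value": (heuristic_value, dfsnet_value)}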


We will hereafter emphasize our online evaluation on several key perspectives, all of which are adapted from the monitoring dashboard.

3.2.1 Heuristic deviation trend. The foremost questions to answer about model performance are twofold: (1) what the scale of the model impact is, and (2) how this impact evolves along the timeline. To answer these two questions, we illustrate the overall heuristic deviation trend of two adjacent model serving iterations in Figure 8, where the red curve shows the ratio of players (in the DFSNet A/B test group) for which DFSNet and the heuristic policy recommended different items. As reported in Section 3.1.2, this ratio is approximately 7.4% (represented by a straight green line) when measured on the offline validation dataset. So, the expectation is that the ratio of impact should reach around 7.4%; this trend can be clearly seen in each model serving iteration. However, some input features need individual players to respond to certain game components, which takes about three days in the use case discussed here; and items are served using the heuristic policy to players that still have incomplete features. As a result, for each model serving iteration, the ratio always starts from a fairly low point before reaching 7.4%. Furthermore, between two subsequent iterations, the ratio drops for three days (in this use case) while features are rebuilt, before picking up the ascending trend again. We believe that daily monitoring of the heuristic deviation trend helps track the scale of the model impact effectively.
Figure 8: The ratio of players that see different items from the heuristic policy: a span of 27 days containing two model serving iterations.
items are served using the heuristic policy to players that still have                         expectation. In practice, it is acceptable to have a few red
incomplete features. As a result, for each model serving iteration,                            boxes as long as most of the densely populated cells satisfy
the ratio always starts from a fairly low point before reaching 7.4%.                          the expectation.
Furthermore, between two subsequent iterations, the ratio drops,                           (d) The PTM in Figure 9d principally serves the same purpose
in this use case, for three days while rebuilding features before                              as Figure 9c except that it computes the average click value
picking up the ascending trend again. We believe that the daily                                of item i instead.
monitoring of the heuristic deviation trend helps tracking the scale                       Based on the PTM analysis, we expect to see an improvement in
of the model impact effectively.                                                        the user engagement with the examined game feature compared
3.2.2 Player Transition Matrix (PTM). To gain better insight into the underlying changes contributing to the overall model impact, we break down the impact analysis further using PTMs. Here, a PTM is a 2D matrix that assigns players to grid cells according to how their experience changed from the default heuristic policy to the model policy. Figure 9 summarizes the results over a period of 14 days starting from Day11 (cf. Figure 8) and illustrates four different PTMs (a-d) that enable model impact decomposition from four different perspectives:
    (a) Grid cell (i, j) denotes the number of players that are now exposed to item i but would have originally got item j. The gray-scale background of Figure 9a is also used in (c) and (d).
    (b) Calculated by dividing each number in (a) by the sum of the corresponding row. It should have the highest ratio values on the diagonal, as obtained in the offline evaluation and reported in Figure 6b. Because the presence of incomplete features leads to a policy fallback (to the heuristic, as mentioned in Section 3.2.1), the results presented in Figure 9b are more conservative than those shown in Figure 6b.
    (c) To inspect how the moved players impact the click probability on the PTM, we calculate the click percentage (of item i) for the player cohort in each grid cell, and then overlay the percentage values on top of Figure 9a's gray-scale background, resulting in Figure 9c. The blue values in the diagonal cells are the click percentages for the control group. Each off-diagonal cell (i, j) with i ≠ j contains the players that are moved from cell (j, j) to cell (i, i). Intuitively, we expect the model to guarantee that the click percentage in cell (i, j), i ≠ j, is larger than the percentage values in either cell (i, i) or (j, j); we use red boxes to highlight the cells that fail to satisfy that expectation.
    (d) Analogous to (c), Figure 9d overlays the average click value of item i for the players who would have seen item j according to the heuristic, which allows comparing the monetary impact of the model policy to using the heuristic solution.
A minimal construction sketch of these four matrices follows the caption of Figure 9.
   Further analysis of the PTM can help us better understand user behaviors. In our analysis, we used an eight-dimensional space to describe player behavior. Figure 10 shows different user behavioral patterns (in the form of radar charts) on top of the grayscale background from Figure 9a. The KPI calculation and the actual values are considered sensitive proprietary data and are therefore removed from the charts.

3.2.3 Uplifts of click ratio, count, and value. The previous sections presented a drill-down process of analyzing the model behavior in comparison with the heuristic policy. We now zoom out and compare the item click dynamics with the control group. We focus on the accumulated 14-day uplift of three metrics: click count, click ratio, and click value (these metrics are defined in Section 3.1.4). The online uplift is computed by subtracting the metrics (normalized by the population size of the A/B test groups) of the control group from those of the DFSNet group. Hence, uplift can take both positive and negative values. All uplift values are considered sensitive proprietary data and are therefore scaled.
   Figure 11a shows that the DFSNet group is losing click counts on items 1 and 4 while gaining more click counts on the other items; the players moved away from those buckets are therefore mostly item clickers, and they bring more absolute click counts to items 2, 3, and 5. However, the click ratio of item 5 is reversed in Figure 11b, which is a consequence of the lower click ratio in the player cohort moved to item 5. Figure 11c shows the uplift of accumulated click value for each item. We observe that the loss of click value (from items 1 and 4) is compensated by the increased click value of items 2, 3, and 5, leading to a net positive value uplift (approximately +0.71% over the control group). As a result, the offline uplift estimations (Table 1) are overly optimistic compared to the uplift measured online.
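   For concreteness, the per-item uplift of an accumulated metric (click count or click value) could be computed as in the following sketch; the function and the dictionary-based aggregates are illustrative assumptions rather than our production pipeline. The click ratio, being already a ratio within each group, would be differenced directly without population normalization.

    from typing import Dict

    def per_item_uplift(dfsnet_metric: Dict[int, float],
                        control_metric: Dict[int, float],
                        dfsnet_population: int,
                        control_population: int) -> Dict[int, float]:
        """Uplift of one accumulated metric (e.g., click count or click
        value) per item: the DFSNet group's metric minus the control
        group's, each normalized by its A/B test population size.
        Positive values mean the DFSNet group gained on that item."""
        items = set(dfsnet_metric) | set(control_metric)
        return {
            item: dfsnet_metric.get(item, 0.0) / dfsnet_population
                  - control_metric.get(item, 0.0) / control_population
            for item in items
        }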




(a) Number of players “moved” from j to i.    (b) Ratio version of (a): each row sums to 1.

(c) Click percentage of item i for players who should see item j according to the heuristic.    (d) Average click value of item i for players who should see item j according to the heuristic.

Figure 9: The player transition matrices: each grid cell at coordinate (i, j) indicates (9a-9b) the volume, (9c) the average click ratio, and (9d) the average click value for players who would have been exposed to item j according to the heuristic policy and got recommended item i from DFSNet. The blue numbers along the diagonal in 9c and 9d denote the corresponding statistics obtained from the control group in the A/B test.
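   To make the construction of these four matrices concrete, the sketch below shows one way they could be assembled from the exposure logs of the DFSNet group; the log schema (heuristic item, model item, click flag, click value), the toy rows, and all names are illustrative assumptions, not the exact production implementation.

    import numpy as np

    # Illustrative per-exposure log of the DFSNet group (dummy rows):
    # (item the heuristic would have shown, item the model showed,
    #  whether the player clicked, value attributed to the click).
    logs = [
        (0, 0, 1, 0.9),
        (0, 2, 0, 0.0),
        (3, 1, 1, 1.4),
    ]
    n_items = 5  # number of exposable item types

    counts = np.zeros((n_items, n_items))  # Figure 9a: players "moved" from j to i
    clicks = np.zeros((n_items, n_items))  # numerator for Figure 9c
    values = np.zeros((n_items, n_items))  # numerator for Figure 9d

    for heur_item, model_item, clicked, click_value in logs:
        # Cell (i, j): players exposed to item i by DFSNet who would have
        # received item j from the heuristic policy.
        counts[model_item, heur_item] += 1
        clicks[model_item, heur_item] += clicked
        values[model_item, heur_item] += click_value

    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.nan_to_num(counts / counts.sum(axis=1, keepdims=True))  # Figure 9b
        click_pct = np.nan_to_num(clicks / counts)   # Figure 9c: cohort click percentage
        avg_value = np.nan_to_num(values / counts)   # Figure 9d: cohort average click value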


3.2.4 Iterating on the DFSNet model. The dataset used to train DFSNet is highly biased due to the pre-existing heuristic rules. Our approach achieves debiasing by incorporating confidence predictors, resulting in a mild impact of less than 8% (cf. Figure 8). Nonetheless, that impact continuously changes the players' experiences, which gradually shifts the input feature distributions. This creates a direct feedback loop that gradually compromises the generalization and discriminative capability of the model being served. It is a form of analysis debt [18], in which it becomes increasingly difficult to predict the behavior of a given model before it is released. Iterating the model periodically using more recently collected data can reduce the intensity of the feedback loop. However, we need a metric to determine when to train a new model. Accuracy is not an option, since the logged data is heavily imbalanced (click events are rare) and we care more about correctly predicting the click events. In practice, AUC-ROC (cf. Section 3.1) and response distribution charts [4] can also be used to monitor the feedback loop, yet they are not as sensitive as precision and recall. We propose to monitor the precision and recall of the preference predictors (i.e., the green bars in Figure 12) to identify the "right" moment for model iteration; a minimal sketch of such a monitoring rule follows the caption of Figure 12.
   In Figure 12, the red bars represent the precision and recall estimated using a subset of the validation dataset (as explained in Section 3.1.4), while the green bars in Figures 12a and 12b are, respectively, the precision and recall calculated 14 days after the model got deployed (a snapshot on Day24 in Figure 8). We observe that online precision and recall initially reach much higher values than the offline reference; hence, we argue that the offline evaluation tends to underestimate the true values of precision and recall. Figures 12c and 12d reflect the situation four days later. The trend is clear: in four days, both precision and recall have declined significantly; when the majority of green bars fall below the red bars, it is probably time to retrain or fine-tune DFSNet using fresher data.

4 CONCLUSION AND PERSPECTIVES

In-game recommendation aims to provide more relevant items to each player. In-game recommendation use cases usually allow exposing only a few items at a time; thus, a change in the choice of items can have a large impact on game dynamics, leading to a short feedback loop. In addition, player preferences change quickly as the game dynamics and player context evolve. As a result, the model gets outdated sooner in real-time prediction. In-game item exposures are mostly dominated by hand-crafted heuristics, which heavily bias the data, and randomized exploration to train an unbiased recommendation model is usually not favored by stakeholders. We propose DFSNet, which enables training an unbiased few-shot recommender using only the biased and imbalanced data.
   During training, AUC-ROC is a stable indicator of the model's generalization ability. We demonstrate several ways to estimate the model performance offline on a validation dataset. We also evaluate the online DFSNet performance in an A/B test. We start by monitoring the overall model impact by looking at the heuristic deviation trend.




Figure 10: The overlay of player behavioral features (in an eight-dimensional space visualized as radar charts) over the player transition matrix; each player dimension is calculated over a period of the past N days and then rescaled to the range [0, 1].
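   The per-dimension rescaling mentioned in the caption could, for instance, be a plain min-max normalization over the player population; the snippet below is a sketch under that assumption, not necessarily the exact transformation used.

    import numpy as np

    def rescale_behavior_features(features: np.ndarray) -> np.ndarray:
        """Min-max rescale each behavioral dimension (column) of a
        players-by-dimensions matrix to [0, 1] for the radar charts."""
        lo = features.min(axis=0, keepdims=True)
        hi = features.max(axis=0, keepdims=True)
        span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant dimensions
        return (features - lo) / span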




            (a) Uplift of click counts.                              (b) Uplift of click ratio.                      (c) Uplift of click value.

Figure 11: The accumulated scaled uplift (DFSNet over the control A/B test group) of (11a) click count, (11b) click ratio, and (11c) click value per exposed item type.


Then, we further decompose the model impact using PTMs. We carried out data analysis to understand user behaviors and discern the key factors causing players to be exposed to a different item than the heuristic recommendation. This work proposes a solution to the problem of biased and imbalanced data in the domain of in-game recommender systems. We suggest offline and proxy metrics as a way to estimate the model's online performance. We discuss and showcase the challenges of an online solution in an A/B test. The comparison between the control and DFSNet test groups shows a net +0.71% uplift of click value, which is less optimistic than the best-effort offline estimation.




(a) Precision of click prediction up to Day24.    (b) Recall of click prediction up to Day24.




(c) Precision of click prediction up to Day28.    (d) Recall of click prediction up to Day28.

  Figure 12: The online precision and recall of DFSNet preference predictions compared to the offline best-effort estimates.
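   Building on Section 3.2.4, the comparison visualized in Figure 12 can be distilled into a simple retraining trigger. The function below is a minimal sketch of such a rule, assuming per-item online and offline precision/recall estimates are available; the threshold and all names are illustrative assumptions, not the exact criterion we use.

    from typing import Dict

    def should_retrain(online: Dict[int, Dict[str, float]],
                       offline: Dict[int, Dict[str, float]],
                       tolerated_fraction: float = 0.5) -> bool:
        """Signal retraining when, for more than `tolerated_fraction` of the
        items, both online precision and recall (green bars in Figure 12)
        have fallen below the offline best-effort estimates (red bars)."""
        items = online.keys() & offline.keys()
        degraded = [
            item for item in items
            if online[item]["precision"] < offline[item]["precision"]
            and online[item]["recall"] < offline[item]["recall"]
        ]
        return len(degraded) > tolerated_fraction * len(items)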


We show that continuous comparison of offline and online precision/recall can help determine the appropriate time to retrain the model. Further analysis is required before putting the proposed solution live. In the scenario presented in this paper, we chose the click-through rate as one of the evaluation metrics, as it is widely adopted in e-commerce. However, this metric might not be a good proxy for the business metrics of an in-game recommender system. Future work will explore the choice of metrics that constitute a better proxy of the model's online performance.
   In addition, future work includes (1) designing long-term labels that better approximate the business targets, (2) explicitly modeling the interactions between different in-game features to eliminate the implicit feedback loop, and (3) replacing model iteration with online reinforcement learning approaches.

REFERENCES
 [1] Kati Alha, Elina Koskinen, Janne Paavilainen, Juho Hamari, and Jani Kinnunen. 2014. Free-to-play games: Professionals' perspectives. In DiGRA Nordic: Proceedings of the 2014 International DiGRA Nordic Conference. Proceedings of Nordic DiGRA 11, 1–14.
 [2] S. M. Anwar, T. Shahzad, Z. Sattar, R. Khan, and M. Majid. 2017. A game recommender system using collaborative filtering (GAMBIT). In 2017 14th International Bhurban Conference on Applied Sciences and Technology (IBCAST). 328–332.
 [3] Vladimir Araujo, Felipe Rios, and Denis Parra. 2019. Data mining for item recommendation in MOBA games. In Proceedings of the 13th ACM Conference on Recommender Systems. Association for Computing Machinery, New York, NY, USA, 393–397.
 [4] Lucas Bernardi, Themistoklis Mavridis, and Pablo Estevez. 2019. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.Com. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD '19). Association for Computing Machinery, New York, NY, USA, 1743–1751. https://doi.org/10.1145/3292500.3330744
 [5] P. Bertens, A. Guitart, P. P. Chen, and A. Perianez. 2018. A Machine-Learning Item Recommendation System for Video Games. In 2018 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, Maastricht, The Netherlands, 1–4.
 [6] Paris Carbone, Stephan Ewen, Gyula Fóra, Seif Haridi, Stefan Richter, and Kostas Tzoumas. 2017. State management in Apache Flink®: consistent stateful distributed stream processing. Proceedings of the VLDB Endowment 10, 12 (2017), 1718–1729.
 [7] Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H. Chi. 2019. Top-K Off-Policy Correction for a REINFORCE Recommender System. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC, Australia) (WSDM '19). Association for Computing Machinery, New York, NY, USA, 456–464. https://doi.org/10.1145/3289600.3290999
 [8] Zhengxing Chen, Christopher Amato, Truong-Huy D. Nguyen, Seth Cooper, Yizhou Sun, and Magy Seif El-Nasr. 2018. Q-DeckRec: A fast deck recommendation system for collectible card games. In 2018 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, Maastricht, The Netherlands, 1–8.
 [9] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. 2015. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv:1511.07289 [cs.LG]
[10] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (Boston, Massachusetts, USA) (RecSys '16). Association for Computing Machinery, New York, NY, USA, 191–198. https://doi.org/10.1145/2959100.2959190
[11] Krishna Gade, Sahin Cem Geyik, Krishnaram Kenthapadi, Varun Mithal, and Ankur Taly. 2019. Explainable AI in industry. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, NY, USA, 3203–3204.
[12] Mihajlo Grbovic and Haibin Cheng. 2018. Real-time personalization using embeddings for search ranking at Airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Association for Computing Machinery, New York, NY, USA, 311–320.
[13] Rama Hannula, Aapo Nikkilä, and Kostas Stefanidis. 2019. GameRecs: Video Games Group Recommendations. In Welzer T. et al. (Eds.), New Trends in Databases and Information Systems, ADBIS. Springer, Cham, Switzerland.
[14] JaeWon Kim, JeongA Wi, SooJin Jang, and YoungBin Kim. 2020. Sequential Recommendations on Board-Game Platforms. Symmetry 12, 2 (2020), 210.
[15] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, Yoshua Bengio and Yann LeCun (Eds.). San Diego, CA, USA, 13 pages.
[16] Jianxun Lian, Fuzheng Zhang, Xing Xie, and Guangzhong Sun. 2018. Towards better representation learning for personalized news recommendation: a multi-channel deep fusion approach. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. International Joint Conferences on Artificial Intelligence Organization, 3805–3811. https://doi.org/10.24963/ijcai.2018/529
[17] John Moore, Joel Pfeiffer, Kai Wei, Rishabh Iyer, Denis Charles, Ran Gilad-Bachrach, Levi Boyles, and Eren Manavoglu. 2018. Modeling and Simultaneously Removing Bias via Adversarial Neural Networks. arXiv:1804.06909 [cs.LG]
[18] D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. Hidden Technical Debt in Machine Learning Systems. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2. MIT Press, Cambridge, MA, USA, 2503–2511.
[19] Rafet Sifa, Raheel Yawar, Rajkumar Ramamurthy, Christian Bauckhage, and Kristian Kersting. 2020. Matrix- and Tensor Factorization for Game Content Recommendation. KI - Künstliche Intelligenz 34, 1 (2020), 57–67.
[20] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 World Wide Web Conference. ACM, New York, NY, USA, 1835–1844.
[21] Robert Williams. 2020. Mobile games sparked 60% of 2019 global game revenue, study finds. Mobile Marketer. Retrieved January 2, 2020 from https://www.mobilemarketer.com/news/mobile-games-sparked-60-of-2019-global-game-revenue-study-finds/569658/
[22] Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. 2019. NPA: Neural news recommendation with personalized attention. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, NY, USA, 2576–2584.
[23] Hsin-Chang Yang and Zi-Rui Huang. 2019. Mining personality traits from social messages for game recommender systems. Knowledge-Based Systems 165 (2019), 157–168.
[24] Hsin-Chang Yang, Cathy S. Lin, Zi-Rui Huang, and Tsung-Hsing Tsai. 2017. Text mining on player personality for game recommendation. In Proceedings of the 4th Multidisciplinary International Social Networks Conference. Association for Computing Machinery, New York, NY, USA, 1–6.
[25] Chang Zhou, Jinze Bai, Junshuai Song, Xiaofei Liu, Zhengchao Zhao, Xiusi Chen, and Jun Gao. 2018. ATRank: An attention-based user behavior modeling framework for recommendation. In Thirty-Second AAAI Conference on Artificial Intelligence. AAAI Press, New Orleans, Louisiana, USA, 4564–4571.