<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Case Study on Data Debiasing Techniques in Mobile Games</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yixiong Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Paskevich</string-name>
          <email>maria.paskevich@king.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hui Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>King</institution>
          ,
          <addr-line>Malmskillnadsgatan 19, 111 57 Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>RobustRecSys: Design</institution>
          ,
          <addr-line>Evaluation, and Deployment of Robust Recom-</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The mobile gaming industry, particularly the free-to-play sector, has existed for more than a decade, yet it still experiences rapid growth. The concept of games-as-a-service requires game developers to pay much more attention to recommendations of content in their games. With recommender systems (RS), the inevitable problem of bias in the data comes hand in hand. Much research has been done on bias in RS for online retail or services, but much less is available for the specific case of the game industry. Also, in previous works, various debiasing techniques were tested on explicit feedback datasets, while it is much more common in mobile gaming data to only have implicit feedback. This case study aims to identify and categorize potential bias within datasets specific to model-based recommendations in mobile games, review debiasing techniques in the existing literature, and assess their effectiveness on real-world data gathered through implicit feedback. The effectiveness of these methods is then evaluated based on their debiasing quality, data requirements, and computational demands.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender systems</kwd>
        <kwd>In-game recommendation</kwd>
        <kwd>Debiasing</kwd>
        <kwd>Mobile games</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the context of mobile gaming, delivery of content to
players through recommendations plays an important role. These
recommendations can include, for example, in-game store
products or other pieces of game content. However, RSs used
within this context are susceptible to bias due to (1)
limited exposure: unlike in webshops (e.g. Amazon), available
placements for sellable products in mobile games are often
limited, and showing one product to a user means that
alternatives would not be displayed; (2) the common approach of
segmenting content through fixed heuristics before adopting
RS introduces biases in the training data, which influences
the development of these models. Traditionally, at King
we have been addressing these biases by either training
models on biased data, or by establishing holdout groups
of users who would receive random recommendations for
a period of time in order to collect a uniform dataset that
reflects user preference in an unbiased way. Although the
second approach allows the collection of unbiased data, it
could compromise user experience for a segment of players,
and may lead to significant operational costs and
potential revenue losses. In previous studies, researchers have
primarily focused on data derived from explicit feedback,
where users rate items using a numerical scale, and
various debiasing techniques are tested on this data. However,
within the realm of mobile gaming, obtaining explicit
feedback detracts from the user experience, making it challenging to
collect. As an alternative, data is often collected through
implicit feedback [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where user preferences are inferred
from behaviors such as impressions, purchases, and other
interactions. Given these challenges, our objectives in this
study are: (1) to identify and categorize potential bias within
our datasets; (2) to conduct a review of existing literature
on debiasing techniques and assess their effectiveness on
real-world data collected through implicit feedback.
      </p>
      <p>
        Existing literature provides comprehensive surveys of debiasing methodologies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It suggests that the selection of particular
debiasing techniques should depend on the specific types
of bias present in the data, as well as on the availability of
unbiased data samples. In recommender systems for mobile
games, various types of bias can arise, including but not
limited to selection bias, exposure bias, position bias, and
conformity bias. Some of the relevant methods to debias the
data in these cases include the Inverse Propensity Scoring
(IPS) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] method, which deals with selection and exposure
biases by weighting observations inversely to their selection
probability, and does so without need for unbiased data. Yet
the method could potentially result in high variance due
to the challenges in accurately estimating propensities.
Potential solutions to the high variance issue of IPS method
include, for example, using Doubly Robust (DR) learning
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that introduces a novel approach to loss functions as
a combination of IPS-based models with imputation-based
models. The combination of two models assures doubly
robustness property when either of the two components
(propensity estimation or imputed data) remains accurate.
This method, though, relies on having an unbiased data
sample to work. Another option is model-agnostic and
bias-agnostic solutions like AutoDebias [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which are based on
meta-learning to dynamically assign weights within the RS,
aiming to neutralize biases across the board. A potential
benefit of such a solution is that it doesn’t require knowing the
types of bias present in the data, but as a downside, it also
relies on randomized samples. In addition, the process of
fitting multiple models makes training more computationally
demanding. Despite the advances and variety of available
debiasing techniques, applying recommender systems to
mobile gaming content remains a relatively untapped area,
which can make implementation and maintenance
challenging. Moreover, it requires constant feedback loops over
time, and the model’s performance is highly dependent on
the quality and recency of the training data.
      </p>
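To make the IPS idea above concrete, the following is a minimal, hypothetical sketch of an IPS-weighted squared-error loss (not the production implementation); propensity clipping is shown as one common way to tame the variance issue discussed above, and all numbers are invented:

```python
import numpy as np

def ips_weighted_mse(preds, labels, propensities, clip=0.05):
    """Inverse Propensity Scoring: weight each observed interaction
    inversely to its estimated probability of being observed.
    Propensities are clipped from below to limit variance blow-up
    from near-zero propensity estimates."""
    weights = 1.0 / np.clip(propensities, clip, 1.0)
    return float(np.mean(weights * (preds - labels) ** 2))

# Toy usage: two observed interactions, the second one shown
# to users only 10% of the time, so it is up-weighted 10x.
preds = np.array([0.8, 0.3])
labels = np.array([1.0, 0.0])
props = np.array([0.9, 0.1])
loss = ips_weighted_mse(preds, labels, props)
```

The clipping threshold trades bias for variance: a higher `clip` gives more stable but more biased estimates.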
    </sec>
    <sec id="sec-2">
        <title>3. Methodology</title>
      <sec id="sec-2-1">
        <title>3.1. Datasets</title>
        <p>
          Our study utilized two public datasets (COAT[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ],
yahooR3![
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]) to validate theoretical results and three
proprietary datasets from King (Set A, Set B, Set C) that are
focused on user-item interactions in game shops within
Match-3 Game A and Match-3 Game B (Fig.1). The sizes of
each dataset, along with their respective feedback types, are
provided in Table 1. We aimed to observe the effectiveness
of different techniques on datasets collected with explicit
feedback (public datasets), and those with implicit feedback
(King’s datasets). Explicit feedback is typically collected by
asking users to rate items on a numerical scale, for example
from 1 to 5, where 1 indicates disinterest, 2 signifies
dissatisfaction, and 5 shows a preference. In contrast, implicit
feedback (as in the proprietary datasets) involves a binary
response from users: purchase or non-purchase. This setup
makes it harder to accurately measure user preferences. As
discussed in the Introduction, mobile games often have
limited space for displaying sellable products, which is the case
for all three proprietary datasets. This limitation leads to
exposure bias in the data. Additionally, the placement of different
products within the game shop creates position bias.
(Table 1 caption: The sizes and feedback types of all datasets
used in this study. A key difference is that the open datasets
(COAT and YahooR3!) provide explicit feedback, while the
proprietary datasets (A, B, and C) offer only implicit feedback
(purchase/no purchase). Set A, a proprietary dataset, lacks
randomized data, limiting debiasing options.)
        </p>
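As a hypothetical illustration of the implicit-feedback setup described above (all user/item identifiers and counts are invented, not King data), a purchase log can be encoded as a binary user-item matrix:

```python
import numpy as np

# Hypothetical purchase log: (user_id, item_id) pairs of observed purchases.
purchases = [(0, 2), (0, 3), (1, 2), (2, 0)]
n_users, n_items = 3, 4

# Implicit feedback matrix: 1 = purchase, 0 = no purchase observed.
# Note that a 0 is ambiguous (never shown vs. shown and ignored),
# which is exactly the exposure problem discussed in the text.
Y = np.zeros((n_users, n_items), dtype=int)
for u, i in purchases:
    Y[u, i] = 1
```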
        <sec id="sec-2-1-1">
          <title>Dataset</title>
        </sec>
        <sec id="sec-2-1-2">
          <title>COAT</title>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Selection of Debiasing techniques</title>
        <p>The primary reason for the selection of debiasing
techniques for this study was based on a literature review, and
included the applicability of each method to the specific
biases present in the proprietary datasets—namely,
selection bias, exposure bias, and position bias. Further, it was
imperative to evaluate techniques across two dimensions:
those that require randomized datasets and those that do
not, as well as to examine methodologies that are agnostic
to any particular type of bias. Given the identified biases in
the datasets, we adopted Matrix Factorisation (MF) as a
baseline model and several debiasing techniques: (1) Inverse
Propensity Scoring (IPS), a method that does not require
randomized data collection and primarily addresses
selection and exposure biases; (2) Doubly Robust (DR) learning,
which tackles the same biases but, unlike IPS, requires a
randomized dataset; and (3) AutoDebias, a bias-agnostic
technique that also needs randomized data. Each method
was tested across all datasets to evaluate model performance
and complexity. We initially applied MF to the biased dataset
to establish baseline metrics for comparison (we denote this
baseline model as MF(biased)), and then compared these
outcomes with the results from the debiasing methods.</p>
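For illustration only, a minimal matrix-factorisation baseline in the spirit of MF(biased) might look as follows; the hyper-parameters and the tiny toy matrix are arbitrary assumptions, not the study’s actual configuration:

```python
import numpy as np

def train_mf(Y, mask, k=8, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Minimal matrix-factorisation baseline: fit user/item latent
    factors by SGD on observed entries only (mask == True)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = Y.shape
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for u, i in zip(rows, cols):
            u_old = U[u].copy()
            err = Y[u, i] - u_old @ V[i]          # prediction error on one entry
            U[u] += lr * (err * V[i] - reg * u_old)
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V

# Toy usage on a tiny binary interaction matrix (every entry observed).
Y = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float)
mask = np.ones_like(Y, dtype=bool)
U, V = train_mf(Y, mask)
preds = U @ V.T
```

Training on biased observations only, as this baseline does, is precisely what the debiasing techniques above aim to correct.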
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Evaluation metrics</title>
        <p>For model evaluation, we use metrics that assess both the
predictive power of the models (RMSE and AUC), the quality of
ranking (NDCG@5), and the inequality and diversity of the
recommendations (Gini index and Entropy):
• NDCG@5 assesses the model’s ability to rank relevant
items in the recommendation list, where the list represents
items ordered by their relevance up to position k.
• AUC measures how well positive samples are ranked above
negative ones:
AUC = ( ∑_{(u,i) ∈ T_te⁺} rank_{u,i} − |T_te⁺| (|T_te⁺| + 1) / 2 ) / ( |T_te⁺| ⋅ |T_te⁻| ),
where |T_te⁺| is the number of positive samples in the test
set T_te, and rank_{u,i} denotes the position of a positive
feedback (u, i). In experimentation, AUC mainly served as a
metric to prevent overfitting and to help fine-tuning in the
validation phase.
• Gini index measures inequality in the recommendation
distribution; a higher coefficient indicates higher inequality:
Gini = ∑_{i=1}^{n} (2i − n − 1) p(i) / ( n ⋅ ∑_{i=1}^{n} p(i) ),
where p(i) is the popularity score of the i-th item, with the
scores p(i) arranged in ascending order (p(i) ≤ p(i+1)), and
n represents the total number of items.
• Entropy measures the diversity in the distribution of
recommended items, with higher values indicating higher
diversity:
H = − ∑_{i=1}^{n} p_i log(p_i),
where n is the total number of items in a dataset and p_i
is the probability of an item being recommended.</p>
        <p>(Figure caption: evaluation metrics (AUC, RMSE, NDCG@5,
Gini, and Entropy) for various models relative to MF(biased).
AUC is plotted against other metrics to demonstrate the
trade-off between diversity gains in recommendation systems
and potential compromises in predictive power. Different
models are represented by colors, training times are indicated
by point sizes, and dataset types are distinguished by
shapes.)</p>
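The NDCG@k, Gini, and Entropy definitions above can be sketched directly in Python; this is an illustrative reimplementation from the formulas, not the evaluation code used in the study:

```python
import numpy as np

def ndcg_at_k(relevances, k=5):
    """NDCG@k for one ranked list: DCG of the predicted order divided
    by the DCG of the ideal (relevance-sorted) order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(np.sum(rel * discounts))
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[: ideal.size]))
    return dcg / idcg if idcg > 0 else 0.0

def gini(popularity):
    """Gini coefficient of the item-popularity distribution
    (higher = more unequal), matching the formula in the text."""
    p = np.sort(np.asarray(popularity, dtype=float))  # ascending p(i)
    n = p.size
    i = np.arange(1, n + 1)
    return float(np.sum((2 * i - n - 1) * p) / (n * np.sum(p)))

def entropy(popularity):
    """Shannon entropy of the recommendation distribution
    (higher = more diverse)."""
    p = np.asarray(popularity, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```

For example, a perfectly uniform popularity vector yields a Gini of 0 and maximal entropy, while concentrating all recommendations on one item drives Gini toward 1 and entropy toward 0.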
        <p>Additionally, we include Training Time, defined as the
time required for each model to reach saturation, measured
in seconds. This metric provides insights into the
computational complexity and the resources required by different
methodologies.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experimentation</title>
      <p>
        We regard biased data as training set,   . When it comes
to randomized data, following the strategies as mentioned
in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], we split it into 3 parts: 5% for randomised set  
to help training as required by DR and Autodebias, 5% for
validation set   to tune hyper-parameters and incur
earlystopping mechanism to prevent overfitting, the rest 90% for
test set    to evaluate the model. For conformity reasons,
the data split strategy mentioned above is applied to both
open datasets and proprietary datasets. For this project, we
deploy a training pipeline on Vertex AI [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], integrating
components such as data transformation powered by
BigQuery, model training and evaluation, as well as experiment
tracking. The training pipeline retrieves data from the data
warehouse to train models and produces artifacts that are
later integrated into an experiment tracker. By adopting
this artifact-based approach, we address the inherent
challenge of reproducibility in operationalizing ML projects, as
it provides all the necessary components to reproduce
experiments. Each experiment is run up to 10 times on Vertex
AI with the same hyper-parameters but varying random
seeds to estimate the variability of the results.
      </p>
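The 5%/5%/90% split of the randomized data described above can be sketched as follows (an illustrative helper under the stated proportions, not the actual Vertex AI pipeline component):

```python
import numpy as np

def split_randomized(records, seed=0):
    """Split uniformly collected (randomized) data as described in the
    text: 5% to aid training (for DR / AutoDebias), 5% for validation
    and early stopping, and the remaining 90% as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(records))
    n = len(records)
    n_rand = int(0.05 * n)
    n_val = int(0.05 * n)
    rand_set = [records[i] for i in idx[:n_rand]]
    val_set = [records[i] for i in idx[n_rand:n_rand + n_val]]
    test_set = [records[i] for i in idx[n_rand + n_val:]]
    return rand_set, val_set, test_set

# Toy usage with a varying random seed, mirroring the repeated runs.
data = list(range(1000))
r, v, t = split_randomized(data, seed=42)
```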
    </sec>
    <sec id="sec-4">
      <title>5. Experimentation results</title>
      <p>The absolute results of all experiments, including
confidence intervals, are presented in Table 4. In this section,
we report the percentage improvement of various
debiasing techniques compared to the baseline model, which was
trained on biased data (MF(biased) model).</p>
      <sec id="sec-4-1">
        <title>5.1. Open Datasets</title>
        <p>For the COAT dataset, the results show varying degrees of
improvement across different metrics (Table 2). The top
performing method, AutoDebias, exhibited the best
improvements in RMSE (-5.06%), AUC (0.39%), and NDCG@5 (3.73%),
with low changes in Gini (0.16%) and no improvement in
Entropy. DR also provided higher gains in NDCG@5 (2.75%),
and performed better in Gini (-18.88%) and Entropy (6.16%),
but at the cost of higher RMSE (3.86%) and lower AUC (-1.57%).
While AutoDebias outperformed other techniques when it
comes to improving the predictive power of the model (AUC,
RMSE), it was not very efficient in terms of Gini and
Entropy, and had a significantly higher computational cost.
This highlights a trade-off between improved accuracy and
increased resource requirements.</p>
        <p>For the YahooR3! dataset, AutoDebias again results in
the highest improvement in RMSE (-36.89%), AUC (1.79%),
NDCG@5 (20.70%), as well as Gini (-58.15%) and Entropy
(4.26%), but did so also with dramatically increased
computational cost (3216%). IPS provides a balanced performance
with improvements in RMSE (-29.70%) and Entropy (0.82%)
at a lower computational cost (-22.98%), making it a practical
choice for resource-constrained environments.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Internal Datasets</title>
        <p>For the internal datasets, the results are less consistent
across the datasets and debiasing techniques (Table 3). This
may be due to the fact that internal datasets employed
implicit feedback when collecting data, where user preferences
are inferred from their impression and purchase records.
This can introduce biases due to the lack of negative
samples and overrepresentation of user interactions, potentially
skewing the models towards popular items.</p>
        <p>Set A is a relatively small dataset (Table 1), and the lack of
randomized data limits our options to only using IPS. As a
result, some metrics, such as RMSE and AUC, actually worsen
(Table 3), which we might accept as a trade-off to achieve
better balance in recommendations. However, NDCG@5
also does not improve. On the positive side, IPS enhances
diversity metrics, with Gini improving by 3.06% and
Entropy by 0.41%, while also reducing computational cost by
4.27%. Overall, applying this method increases model
diversity with comparable training time, but comes at the cost of
accuracy.</p>
        <p>Set B demonstrates substantial improvements with DR,
including a 45.40% reduction in RMSE, a 7.07% increase in
AUC, and gains in NDCG@5 (0.68%) and Gini (-0.54%),
making the model perform better in both accuracy and diversity.
However, this comes at a significant computational cost,
increasing training time by 386.46%. Given the total number
of samples being 318k, this leads to a considerably longer
training process. AutoDebias ranks second in RMSE
improvement (-26.46%), while IPS shows a positive gain in
AUC (3.18%). However, DR is the only method that
consistently improves outcomes of NDCG@5, Gini, and Entropy.</p>
        <p>For Set C, the largest dataset with nearly 2.2 million
samples, AutoDebias achieves the highest improvement
in AUC (2.61%) and maintains stable NDCG@5. However,
it underperforms compared to the baseline and other
techniques.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion and Future work</title>
      <p>
        Implementing more accurate and less biased models is
crucial to avoiding the perpetuation of negative feedback loops
and the overexposure of certain items caused by
segmentation heuristics in retraining data. This approach also
enhances data quality, which is essential for fine-tuning
models. A recommender system that diversifies content
exposure improves user experience by ensuring that
visibility is not limited to only the most popular items. In
our experiments, Inverse Propensity Scoring (IPS) stands
out for its simplicity and model-agnostic nature, requiring
no randomized data collection and fewer training epochs.
However, the improvements it offers are somewhat limited.
AutoDebias excels in improving accuracy metrics, but at
substantially higher computational costs and sometimes
poorer performance in Gini and Entropy. DR still offers
strong improvement in observed metrics, including Gini
and Entropy. So while each debiasing method has its own
trade-offs, significant performance gains still depend on the
challenging task of collecting randomized datasets, as
highlighted in our introduction. Potential future work includes:
(1) adopting an online reinforcement learning approach such
as Multi-Armed Bandit (MAB) [
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
        ] for data
collection, including contextual bandit models, and (2) developing
and testing combined debiasing models, which can
combine the strengths of different debiasing techniques to mitigate
various biases simultaneously while optimizing for
computational efficiency.
      </p>
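As a sketch of future-work direction (1), a Bernoulli Thompson-sampling bandit illustrates how randomized-but-adaptive data collection could replace pure random holdout groups; the toy purchase rates below are invented for illustration:

```python
import random

def thompson_step(successes, failures, rng=random):
    """One Thompson-sampling step for a Bernoulli bandit: sample a
    conversion-rate estimate per item from its Beta posterior and
    recommend the item with the highest sample. Exploration is thus
    randomized, but shifts toward better items as evidence accumulates."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Toy loop: item 1 has the higher true purchase rate, so it should
# be recommended increasingly often over time.
random.seed(0)
true_rates = [0.05, 0.2]
succ, fail = [0, 0], [0, 0]
pulls = [0, 0]
for _ in range(2000):
    arm = thompson_step(succ, fail)
    pulls[arm] += 1
    if random.random() < true_rates[arm]:
        succ[arm] += 1   # purchase observed
    else:
        fail[arm] += 1   # no purchase
```

Unlike a fixed random holdout, this scheme bounds the revenue cost of exploration because poorly performing items are shown less often as the posterior sharpens.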
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Oard</surname>
            ,
            <given-names>Douglas W.&amp; Jinmook</given-names>
          </string-name>
          <string-name>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Implicit feedback for recommender systems</article-title>
          .
          <source>Proceedings of the AAAI workshop on recommender systems</source>
          . Vol.
          <volume>83</volume>
          .
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Jiawei</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Hande</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>Xiang</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Fuli</given-names> <surname>Feng</surname></string-name>,
          <string-name><given-names>Meng</given-names> <surname>Wang</surname></string-name>, and
          <string-name><given-names>Xiangnan</given-names> <surname>He</surname></string-name>,
          <article-title>Bias and Debias in Recommender System: A Survey and Future Directions</article-title>.
          <source>ACM Trans. Inf. Syst.</source>
          <volume>41</volume>,
          <issue>3</issue>,
          Article 67 (
          <year>2023</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Harald</given-names>
            <surname>Steck</surname>
          </string-name>
          ,
          <article-title>Training and testing of recommender systems on data missing not at random</article-title>
          .
          <source>In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD</source>
          <year>2010</year>
          ).
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <fpage>713</fpage>
          -
          <lpage>722</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Schnabel</surname>
          </string-name>
          , Adith Swaminathan, Ashudeep Singh,
          <string-name>
            <given-names>Navin</given-names>
            <surname>Chandak</surname>
          </string-name>
          , and Thorsten Joachims,
          <article-title>Recommendations as Treatments: Debiasing Learning and Evaluation</article-title>
          .
          <source>In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48 (ICML</source>
          <year>2016</year>
          ). JMLR.org,
          <volume>1670</volume>
          -
          <fpage>1679</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Quanyu</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Haoxuan</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peng</given-names>
            <surname>Wu</surname>
          </string-name>
          , Zhenhua Dong,
          <string-name>
            <given-names>Xiao-Hua</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Rui Zhang, Rui Zhang, and
          <string-name>
            <given-names>Jie</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>A Generalized Doubly Robust Learning Framework for Debiasing Post-Click Conversion Rate Prediction</article-title>
          .
          <source>In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD</source>
          <year>2022</year>
          ).
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <fpage>252</fpage>
          -
          <lpage>262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jiawei</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hande</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Qiu</surname>
          </string-name>
          , Xiangnan He, Xin Xin, Liang Chen,
          <string-name>
            <given-names>Guli</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Keping</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>AutoDebias: Learning to Debias for Recommendation</article-title>
          .
          <source>In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2021</year>
          ).
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Andrés</given-names>
            <surname>Villa</surname>
          </string-name>
          , Vladimir Araujo, Francisca Cattan, and Denis Parra,
          <article-title>Interpretable Contextual Team-aware Item Recommendation: Application in Multiplayer Online Battle Arena Games</article-title>
          .
          <source>In Proceedings of the 14th ACM Conference on Recommender Systems (RecSys</source>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Qilin</given-names>
            <surname>Deng</surname>
          </string-name>
          , Kai Wang,
          <string-name>
            <surname>Minghao Zhao</surname>
            ,
            <given-names>Zhene</given-names>
          </string-name>
          <string-name>
            <surname>Zou</surname>
          </string-name>
          , Runze Wu, Jianrong Tao, Changjie Fan, and
          Liang Chen.
          <article-title>Personalized Bundle Recommendation in Online Games</article-title>
          .
          <source>In Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management (CIKM</source>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Meng</given-names>
            <surname>Wu</surname>
          </string-name>
          , John Kolen, Navid Aghdaie, and
          <string-name>
            <surname>Kazi</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Zaman</surname>
          </string-name>
          .
          <article-title>Recommendation Applications and Systems at Electronic Arts</article-title>
          .
          <source>In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Lele</given-names>
          </string-name>
          &amp; Asadi, Sahar &amp; Biasielli, Matteo &amp; Sjöberg, Michael.
          <source>Debiasing Few-Shot Recommendation in Mobile Games. Workshop of ACM Conference on Recommender Systems (RecSys</source>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Liu</surname>
          </string-name>
          , Dugang and Cheng, Pengxiang and Dong, Zhenhua and He, Xiuqiang and Pan, Weike and Ming, Zhong.
          <article-title>A general knowledge distillation framework for counterfactual recommendation via uniform data</article-title>
          .
          <source>Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          Google.
          <year>2024</year>
          .
          <source>Vertex AI</source>
          . Retrieved December 1,
          <year>2023</year>
          , from https://cloud.google.com/vertex-ai
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Marlin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Benjamin</surname>
            <given-names>M.</given-names>
          </string-name>
          , and Richard S. Zemel.
          <article-title>Collaborative prediction and ranking with non-random missing data</article-title>
          .
          <source>Proceedings of the third ACM conference on Recommender systems (RecSys</source>
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Felício</surname>
            ,
            <given-names>Crícia Z.</given-names>
          </string-name>
          ,
          <source>Klérisson VR Paixão</source>
          , Celia AZ Barcelos, and
          <string-name>
            <given-names>Philippe</given-names>
            <surname>Preux</surname>
          </string-name>
          .
          <article-title>A multi-armed bandit model selection for cold-start user recommendation</article-title>
          .
          <source>In Proceedings of the 25th conference on user modeling, adaptation and personalization</source>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>40</lpage>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Wang</surname>
            , Lu, Chengyu Wang,
            <given-names>Keqiang</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            , and
            <given-names>Xiaofeng</given-names>
          </string-name>
          <string-name>
            <surname>He</surname>
          </string-name>
          .
          <article-title>Biucb: A contextual bandit algorithm for cold-start and diversified recommendation</article-title>
          .
          <source>In 2017 IEEE International Conference on Big Knowledge (ICBK)</source>
          , pp.
          <fpage>248</fpage>
          -
          <lpage>253</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Wang</surname>
            , Qing, Chunqiu Zeng, Wubai Zhou,
            <given-names>Tao</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S. Sitharama</given-names>
          </string-name>
          <string-name>
            <surname>Iyengar</surname>
          </string-name>
          , Larisa Shwartz, and Genady Ya Grabarnik.
          <article-title>Online interactive collaborative filtering using multi-armed bandit with dependent arms</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>31</volume>
          , no.
          <issue>8</issue>
          (
          <year>2018</year>
          ):
          <fpage>1569</fpage>
          -
          <lpage>1580</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>