<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Meta-Learning and MAB Approach for Context-Specific Multi-Objective Recommendation Optimization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tiago Cunha</string-name>
          <email>tsacunha@expediagroup.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Marchini</string-name>
          <email>amarchini@expediagroup.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Expedia Group</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Expedia Group</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recommender systems in online marketplaces face the challenge of balancing multiple objectives to satisfy various stakeholders, including customers, providers, and the platform itself. This paper introduces Juggler-MAB, a hybrid approach that combines meta-learning with Multi-Armed Bandits (MAB) to address the limitations of existing multi-stakeholder recommendation systems. Our method extends the Juggler framework, which uses meta-learning to predict optimal weights for utility and compensation adjustments, by incorporating a MAB component for real-time, context-specific refinements. We present a two-stage approach where Juggler provides initial weight predictions, followed by MAB-based adjustments that adapt to rapid changes in user behavior and market conditions. Our system leverages contextual features such as device type and brand to make fine-grained weight adjustments based on specific segments. To evaluate our approach, we developed a simulation framework using a dataset of 0.6 million searches from Expedia's lodging booking platform. Results show that Juggler-MAB outperforms the original Juggler model across all metrics, with NDCG improvements of 2.9%, a 13.7% reduction in regret, and a 9.8% improvement in best arm selection rate.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recommender systems often focus solely on user satisfaction. However, in many real-world applications,
particularly in online marketplaces, multiple stakeholders’ interests need to be considered [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. These
stakeholders typically include users (customers), item providers (e.g., hotel owners), and the platform.
Multi-stakeholder recommenders aim to balance these diverse and often conflicting objectives [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        The Juggler framework [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] was introduced to address this multi-stakeholder recommendation problem
by using meta-learning [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] to predict optimal weights for utility and compensation adjustments in
real-time scoring. Deployed in production, Juggler has been an integral part of the Lodging Ranking
stack at Expedia. However, Juggler’s reliance on a pre-configured set of five options for relevance and
compensation limits its ability to fine-tune recommendations for specific contexts. Additionally, its
infrequent training cycles make it less responsive to rapid changes in traffic patterns across segments.
      </p>
      <p>To address these limitations, we propose a two-step approach that combines meta-learning (Juggler)
with Multi-Armed Bandits (MAB) for real-time weight adjustments in multi-stakeholder recommendations.
This approach aims to: 1) provide more granular weight adjustments based on specific segments (e.g.,
device type, brand) and 2) adapt quickly to changes in traffic patterns without requiring frequent
retraining of the main Juggler model. Our research questions are:
• Can the integration of MAB with Juggler improve the performance and adaptability of
multi-stakeholder recommendations in online marketplaces?
• Are contextual features useful to improve the MAB’s effectiveness at making the right decisions?</p>
      <p>SURE workshop held in conjunction with the 18th ACM Conference on Recommender Systems (RecSys), 2024, in Bari, Italy.</p>
      <p>The rest of this paper is organized as follows: Section 2 presents the related work, while Section 3
introduces the proposed hybrid solution. Section 4 covers the experimental setup used to validate the
proposal, while Section 5 reports the results for the research questions. Lastly, Section 6 highlights
the main conclusions and avenues for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>
        The Juggler framework [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] was introduced as a meta-learning approach to address the multi-stakeholder
recommendation problem. It dynamically predicts the ideal weights for utility (user relevance) and
compensation (platform revenue) for each search query. The meta-model leverages a collection of
historical search queries and learns the mapping between the search context and the ideal utility and
compensation weights, learned via offline simulations. Juggler selects from five pre-configured options,
each representing a different balance between relevance and compensation: 1) Lower relevance, lower
compensation, 2) Lower relevance, higher compensation, 3) Neutral relevance, neutral compensation,
4) Higher relevance, lower compensation and 5) Higher relevance, higher compensation. The
pre-configured options refer to sections of the search space which are explored to identify different
directions of improvement, while reducing the number of options to ultimately choose from. It is noteworthy
that although the pre-configured options are fixed, the actual instantiation of weights for each option
depends on the ranking problem characteristics and Juggler framework hyper-parameters. While
Juggler has shown success in production, its reliance on these fixed options and infrequent training
cycles limits its adaptability to rapid changes in user behavior and market conditions.
      </p>
      <p>
        Multi-Armed Bandits (MAB) are a class of reinforcement learning algorithms that balance exploration
and exploitation in decision-making processes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In the context of recommenders, MABs have been
used to address the exploration-exploitation dilemma and to adapt to changing user preferences [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>
        The integration of meta-learning and bandit algorithms has been explored in other domains, such as
algorithm selection [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and hyperparameter optimization [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Our work extends these ideas to the
realm of multi-stakeholder recommendations, addressing the unique challenges of online marketplaces.
      </p>
      <p>
        Several studies have addressed the challenge of balancing multiple objectives in recommender
systems. Rodriguez et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] proposed a multi-objective optimization approach for job recommendations.
Nguyen et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] introduced a multi-objective learning to re-rank approach to optimize online
marketplaces for multiple stakeholders. Sürer et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] explored multi-stakeholder recommendation with
provider constraints. These approaches provide valuable insights into balancing multiple objectives, but
our proposed method aims to extend their capabilities by combining meta-learning with multi-armed
bandits for enhanced adaptability in dynamic online marketplaces.
      </p>
      <p>
        Recent developments in industry have led to the creation of self-service platforms for deploying
contextual bandits, such as AdaptEx [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. These platforms provide powerful tools for optimizing user
experiences at scale, which we leverage in our hybrid approach to combine the strengths of meta-learning
and MAB algorithms. To evaluate our approach, we utilized a custom simulation framework based on
real-world data from an online travel marketplace. This allowed us to assess the performance of our
system in a controlled yet realistic setting, similar to other sophisticated simulation environments [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Juggler with MAB</title>
      <p>
        We present a hybrid approach that combines the Juggler framework’s meta-learning capabilities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
with a MAB system powered by the AdaptEx SDK [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This approach, which we call “Juggler-MAB”, aims to address the limitations of the original Juggler
system while leveraging the adaptive capabilities of contextual bandits. The Juggler-MAB system operates in two stages:
1. Juggler Stage: The meta-learning model predicts initial utility and compensation weights based
on the search context.
2. MAB Stage: A contextual MAB refines these weights in real-time based on user interactions and
search features.
      </p>
      <p>
        The Juggler framework selects from five pre-configured options for utility and compensation weights,
providing a coarse adjustment of the recommendation strategy based on the search context. These
options range from lower relevance and compensation to higher relevance and compensation, as
described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and aim to tackle the main issues in multi-objective optimization.
      </p>
      <p>The MAB component introduces fine-grained adjustments to the Juggler-predicted weights. Each
arm of the bandit represents a small corrective measure to be applied to the utility and compensation
weights to improve relevance.</p>
      <p>The key features of our MAB implementation include:
1. Contextual arms: The contextual bandits consider contextual features (e.g., device type, brand)
when selecting arms.
2. Reward function: We use Normalized Discounted Cumulative Gain (NDCG) as a proxy for
Conversion Rate, allowing for offline simulation and evaluation.</p>
      <p>
        3. Exploration strategy: We employ epsilon-greedy and Thompson Sampling for its ability to
balance exploration and exploitation effectively [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>The integration of Juggler and MAB is achieved through an additive approach in the scoring function:
w′_u = w_u + δ_u (1)
w′_c = w_c + δ_c (2)
score = w′_u ⋅ utility + w′_c ⋅ compensation
where w_u and w_c are the weights predicted by Juggler, and δ_u and δ_c are the
corrective weights determined by the MAB.</p>
      <p>We formulate our contextual MAB problem as follows: let A be the set of arms, where each arm
a ∈ A represents a pair of corrective weights (δ_u, δ_c). The context x_t ∈ X at time t includes
features such as device or brand. The reward r_t is defined as the NDCG of the resulting ranking. The
goal is to find a policy π : X → A that maximizes the expected cumulative reward:
max_π E[ Σ_{t=1}^{T} r_t(x_t, π(x_t)) ]
where T is the time horizon.</p>
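To make the policy π : X → A concrete, here is a minimal contextual epsilon-greedy sketch that keeps one running mean reward per (context, arm) pair. It is an illustration only; the experiments use the AdaptEx SDK rather than this code.

```python
import random
from collections import defaultdict

# Minimal contextual epsilon-greedy policy (illustrative, not AdaptEx).
class EpsilonGreedyPolicy:
    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # pulls per (context, arm)
        self.means = defaultdict(float)   # running mean reward per (context, arm)

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore uniformly
        # exploit: arm with the best estimated reward in this context
        return max(self.arms, key=lambda a: self.means[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]
```

With epsilon = 0.1, one call in ten explores a random arm; the rest exploit the best known arm for the current context.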
      <p>We explored various methods to combine Juggler’s predictions with MAB corrections, ultimately
settling on the additive approach described above. We carefully selected contextual variables that
would help identify under-performing segments in the Juggler model, such as device type and brand.
Balancing multiple objectives in a single reward function required careful consideration. We chose
NDCG as an initial approach due to its widely accepted usage, with plans to explore more complex
multi-objective reward functions in future work.</p>
      <p>To evaluate our hybrid approach, we developed a custom simulator that allows us to test various
configurations offline using historical data. The simulator, built on Expedia data, enables us to:
1. Replay historical searches and user interactions. Data is loaded on a daily basis, consisting of
data for each property in each search and the respective user clicks and bookings.
2. Apply the Juggler-MAB model to generate new rankings. The MAB is sampled (potentially
using contextual data) and the retrieved arm is included in the ranking formula, yielding the
simulated score and the final ranking.
3. Evaluate the performance using both immediate (e.g., clicks) and delayed (e.g., bookings) feedback.</p>
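The three simulator steps can be sketched as a daily replay loop. Every name below is a hypothetical stand-in; the data access, ranking, and reward logic are injected as callables rather than drawn from the real simulator's API.

```python
# Day-by-day replay loop following the three simulator steps above.
# All names are illustrative; callables are injected for self-containment.
def replay(days, mab, load_day, rank_with_arm, reward_fn):
    total, n = 0.0, 0
    for day in days:
        for search in load_day(day):                 # 1. replay logged searches
            context = (search["device"], search["brand"])
            arm = mab.select(context)                # 2. sample a corrective arm
            ranking = rank_with_arm(search, arm)     #    and re-rank with it
            reward = reward_fn(ranking, search)      # 3. score clicks/bookings
            mab.update(context, arm, reward)         #    and update the bandit
            total, n = total + reward, n + 1
    return total / max(n, 1)                         # average reward over replay
```

In the paper's setup the injected `reward_fn` would be NDCG over the logged feedback.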
      <p>The reward function evaluates the simulated rankings; the sampled arm, the reward, and the contextual
information (if any) are then provided to the MAB to update its internal state.</p>
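The NDCG reward can be sketched as a standard NDCG@k over the logged relevance labels (e.g. clicks/bookings) reordered by the simulated ranking; this is a textbook formulation, not the simulator's exact code.

```python
import math

# Standard NDCG@k: relevances are logged labels in simulated-ranking order.
def ndcg(relevances, k=10):
    def dcg(rels):
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))  # best possible ordering
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A ranking that places the clicked property first scores 1.0; pushing it down discounts the reward logarithmically.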
      <p>The simulation framework provides a safe environment to test and refine our approach before
considering online deployment.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <p>We used a dataset of 0.6 million searches from Expedia’s lodging booking platform, covering a period of
31 consecutive days. The data contains over 600,000 distinct properties across approximately 41,000 distinct
destinations, with feedback sparsity over 96%. The dataset includes features such as device type, brand,
destination, and historical user interactions.</p>
      <p>
        We compared several variants of the proposed Juggler-MAB hybrid approach against the original
Juggler model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We tested several MAB algorithms, ranging from classical (i.e. no contextual features)
to contextual bandits:
      </p>
      <p>• Gaussian Thompson (GT): a classical bandit using Thompson Sampling, assuming a Gaussian
distribution of the reward value.
• ε-greedy: a classical bandit using a vanilla implementation of the canonical algorithm. We used
ε = 0.1 and ε = 0.3.
• Recursive Least Squares with Thompson Sampling (RLS): a contextual bandit using a linear model
with a vector of means and a matrix of variances-covariances.</p>
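A minimal sketch of the Gaussian Thompson (GT) variant: one Gaussian posterior per arm, and selection by highest posterior sample. This is an illustrative context-free version with a unit-variance likelihood, not the AdaptEx implementation.

```python
import math
import random

# Illustrative Gaussian Thompson Sampling bandit (context-free).
class GaussianThompson:
    def __init__(self, arms, seed=0):
        self.rng = random.Random(seed)
        self.stats = {arm: (0.0, 0) for arm in arms}  # (posterior mean, pulls)

    def select(self):
        def sample(arm):
            mean, n = self.stats[arm]
            # posterior std shrinks as the arm is pulled more often
            return self.rng.gauss(mean, 1.0 / math.sqrt(n + 1))
        return max(self.stats, key=sample)

    def update(self, arm, reward):
        mean, n = self.stats[arm]
        self.stats[arm] = ((mean * n + reward) / (n + 1), n + 1)
```

Early on, wide posteriors make every arm plausible (exploration); as pulls accumulate, the posteriors concentrate and the best arm dominates (exploitation).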
      <p>
        The experiments use the actual production Juggler model predictions for each search. This improves
the reliability of Juggler’s predictions, which in turn leads to more robust estimates of the MAB’s
effect. We then implemented the MAB component using the AdaptEx SDK [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], with the following
configuration:
• Arm space: we explore 3 different values for each corrective weight, namely δ_u ∈ {−0.3, 0.0, 0.3} and
δ_c ∈ {−0.2, 0.0, 0.2}. The selected weights are determined via domain knowledge, also ensuring
non-zero weights.
• Contextual features: several low-cardinality categorical search features were tested, with 3 being
identified as the most important: brand, user device and geographical categorization of the search
destination, e.g. neighborhood vs city.
• Exploration strategy: Thompson Sampling and ε-greedy.
• Reward: Normalized Discounted Cumulative Gain (NDCG), to determine how well MAB
algorithms can correct towards relevance and expected conversion rate improvement.
      </p>
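For concreteness, the 3×3 arm space described above can be enumerated as the cross product of the two delta sets (variable names here are illustrative):

```python
from itertools import product

# The arm space from the configuration above: 3 utility deltas x 3
# compensation deltas = 9 arms; (0.0, 0.0) keeps Juggler's weights unchanged.
UTILITY_DELTAS = (-0.3, 0.0, 0.3)
COMPENSATION_DELTAS = (-0.2, 0.0, 0.2)
ARMS = list(product(UTILITY_DELTAS, COMPENSATION_DELTAS))
```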
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>Our Juggler-MAB hybrid approach outperformed the Juggler baseline across all metrics for all bandits
proposed. The NDCG improvements range from +0.8% for the GT bandit all the way to +2.9% for several
RLS bandits. In terms of regret, we achieve a reduction of 13.7% and an improvement in best arm
selection rate of 9.8%.</p>
      <p>The ε-greedy algorithms provide very strong baselines, especially when ε = 0.1. The GT bandit is clearly
the worst bandit, yet still useful since it outperforms the baseline. Among the contextual bandits, the
best one across all metrics is the RLS. Interestingly, when using more contextual features, we did
not achieve better performance. Further investigation is required to identify what matters to define
the context.</p>
      <p>
        We performed Wilcoxon signed-rank tests and observed no statistical difference between all RLS
bandits. The Critical Difference [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] diagram for the remaining bandits is shown in Figure 1. The results
show no statistically significant difference between RLS and ε = 0.1, hinting that contextual features are
not meaningful. However, all RLS bandits are better than the baselines. Note as well how all bandits
are better than the Juggler baseline - this is a testament to the value of the hybrid approach proposed.
      </p>
      <p>Figure 2 shows the learning dynamics for all bandits across all days in the data sample. To improve
interpretation, we include only the best contextual bandit. The Juggler-MAB demonstrated fast
adaptation to changing conditions. We observed that the MAB component was able to make fine-grained
adjustments to the Juggler predictions, resulting in improved performance.</p>
      <p>We now inspect Juggler-MAB’s effect on lodging ranking top-10 average statistics in Table 2. The
results are reported as differences to the Juggler baseline, as we cannot expose the sensitive raw data.</p>
      <p>Table 2 reports the following metrics: daily price, guest rating, star rating, margin %, and margin $.</p>
      <p>The results show a clear pattern for all bandits: the average daily price decreases and guest and star
ratings increase as NDCG improves. On the contrary, margin % and margin $ decrease, which could
pose problems for the marketplace objectives and long-term health. The expectation, to be validated
via an A/B test, is that the increase in relevance will lead to an improvement in conversion rate which can
offset the impact on profit per transaction.</p>
      <p>Diving deeper into the arm selection per bandit, we present Figure 3. The results show a clear
and expected preference towards arms with lower compensation weights, as these are not aligned with the
NDCG reward. However, it is interesting to observe that the best bandit has learned that not only is it
ideal to decrease compensation, but also to increase or decrease relevance depending on the context.</p>
      <p>
        Despite the overall positive results, we identified two limitations. First, the reward function considers
only a single dimension of the problem (i.e. relevance), which explains the impact on the
compensation component. Future work will address this limitation by using multi-objective optimization
techniques [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Second, our current simulations use historical interactions with a deterministic logging
policy, introducing bias. To address this, we will implement off-policy evaluation techniques [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>In this paper, we presented a novel hybrid approach combining Meta-Learning with Multi-Armed
Bandits for multi-stakeholder recommendations in online travel marketplaces. Our Juggler-MAB
system demonstrated significant improvements over existing methods. Key contributions of our work
include 1) an integration of meta-learning and contextual bandits for recommendation systems and 2)
empirical evidence of the effectiveness of our approach in a large-scale, real-world setting. Based on
our findings and the limitations identified, we propose the following directions for future research:
1. Online testing: Conduct A/B tests in a production environment to validate the performance of
Juggler-MAB under real-world conditions and user behaviors.</p>
      <p>
2. Dynamic arm space: Explore methods for dynamically adjusting the arm space of the MAB
component based on observed performance and changing market conditions.
3. Fairness considerations: Incorporate explicit fairness constraints or objectives into the MAB
formulation to ensure equitable treatment of different provider segments [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
4. Long-term value optimization: Extend the approach to consider long-term user value, potentially
using reinforcement learning techniques for sequential decision-making.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          , G. Adomavicius,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Guy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krasnodebski</surname>
          </string-name>
          , L. Pizzato,
          <article-title>Multistakeholder recommendation: Survey and research directions</article-title>
          ,
          <source>User Modeling and User-Adapted Interaction 30</source>
          (
          <year>2020</year>
          )
          <fpage>127</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          <article-title>, Multi-stakeholder recommendation and its connection to multi-sided fairness</article-title>
          , in: Workshop on Recommendation in Multi-stakeholder Environments
          (RMSE'19),
          <source>in Conjunction with the 13th ACM Conference on Recommender Systems, RecSys'19</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carterette</surname>
          </string-name>
          ,
          <article-title>Recommendations in a marketplace</article-title>
          ,
          <source>in: Proceedings of the 13th ACM Conference on Recommender Systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>580</fpage>
          -
          <lpage>581</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          , G. Adomavicius,
          <article-title>Recommendations with a purpose</article-title>
          ,
          <source>in: Proceedings of the 10th ACM Conference on Recommender Systems</source>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Cunha</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Partalas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <article-title>Juggler: Multi-stakeholder ranking with meta-learning</article-title>
          ,
          <source>in: Proceedings of the MORS workshop at the 15th ACM Conference on Recommender Systems, CEUR Workshop Proceedings</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Cunha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Soares</surname>
          </string-name>
          , A. C. de Carvalho,
          <article-title>Metalearning and recommender systems: A literature review and empirical study on the algorithm selection problem for collaborative filtering</article-title>
          ,
          <source>Information Sciences 423</source>
          (
          <year>2018</year>
          )
          <fpage>128</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Cunha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Soares</surname>
          </string-name>
          , A. C. de Carvalho,
          <article-title>Cf4cf: Recommending collaborative filtering algorithms using collaborative filtering</article-title>
          ,
          <source>in: RecSys 2018 - Proceedings of the 12th ACM Conference on Recommender Systems</source>
          ,
          <year>2018</year>
          , p.
          <fpage>357</fpage>
          -
          <lpage>361</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lattimore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szepesvári</surname>
          </string-name>
          , Bandit algorithms, Cambridge University Press (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Langford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          ,
          <article-title>A contextual-bandit approach to personalized news article recommendation</article-title>
          ,
          <source>in: Proceedings of the 19th international conference on World wide web</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>661</fpage>
          -
          <lpage>670</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Factorization bandits for interactive recommendation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>31</volume>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Hoos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Leyton-Brown</surname>
          </string-name>
          ,
          <article-title>Sequential model-based optimization for general algorithm configuration</article-title>
          ,
          <source>International conference on learning and intelligent optimization</source>
          (
          <year>2011</year>
          )
          <fpage>507</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Falkner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Bohb: Robust and efficient hyperparameter optimization at scale</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1437</fpage>
          -
          <lpage>1446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Posse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Multiple objective optimization in recommender systems</article-title>
          ,
          <source>in: Proceedings of the sixth ACM conference on Recommender systems</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krasnodebski</surname>
          </string-name>
          ,
          <article-title>A multi-objective learning to re-rank approach to optimize online marketplaces for multiple stakeholders</article-title>
          ,
          <source>arXiv preprint arXiv:1708.00651</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Sürer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Malthouse</surname>
          </string-name>
          ,
          <article-title>Multistakeholder recommendation with provider constraints</article-title>
          ,
          <source>in: Proceedings of the 12th ACM Conference on Recommender Systems</source>
          , ACM,
          <year>2018</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>W.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ilhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Markeviciute</surname>
          </string-name>
          ,
          <article-title>Adaptex: A self-service contextual bandit platform</article-title>
          ,
          <source>in: Proceedings of the 17th ACM Conference on Recommender Systems</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>839</fpage>
          -
          <lpage>842</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Navrekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-T.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Boutilier</surname>
          </string-name>
          ,
          <article-title>Recsim: A configurable simulation platform for recommender systems</article-title>
          ,
          <source>arXiv preprint arXiv:1909.04847</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Demšar</surname>
          </string-name>
          ,
          <article-title>Statistical comparisons of classifiers over multiple data sets</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>7</volume>
          (
          <year>2006</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dudík</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Langford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Doubly robust policy evaluation and learning</article-title>
          ,
          <source>ICML'11</source>
          , Omnipress, Madison, WI, USA,
          <year>2011</year>
          , pp.
          <fpage>1097</fpage>
          -
          <lpage>1104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Swaminathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          ,
          <article-title>The self-normalized estimator for counterfactual learning</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sonboli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ordoñez-Gauger</surname>
          </string-name>
          ,
          <article-title>Balanced neighborhoods for multi-sided fairness in recommendation</article-title>
          ,
          <source>in: Conference on Fairness, Accountability and Transparency, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>202</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>