<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Hybrid Meta-Learning and MAB Approach for Context-Specific Multi-Objective Recommendation Optimization</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tiago</forename><surname>Cunha</surname></persName>
							<email>tsacunha@expediagroup.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Expedia Group</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Marchini</surname></persName>
							<email>amarchini@expediagroup.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Expedia Group</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department" key="dep1">18th</orgName>
								<orgName type="department" key="dep2">ACM Conference on Recommender Systems (RecSys)</orgName>
								<address>
									<postCode>2024</postCode>
									<settlement>Bari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Hybrid Meta-Learning and MAB Approach for Context-Specific Multi-Objective Recommendation Optimization</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">B207510EB70EC56E9E741A29BFA14BA4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:44+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Multi-Stakeholder</term>
					<term>Multi-Armed bandits</term>
					<term>Meta-Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Recommender systems in online marketplaces face the challenge of balancing multiple objectives to satisfy various stakeholders, including customers, providers, and the platform itself. This paper introduces Juggler-MAB, a hybrid approach that combines meta-learning with Multi-Armed Bandits (MAB) to address the limitations of existing multi-stakeholder recommendation systems. Our method extends the Juggler framework, which uses meta-learning to predict optimal weights for utility and compensation adjustments, by incorporating a MAB component for real-time, context-specific refinements. We present a two-stage approach where Juggler provides initial weight predictions, followed by MAB-based adjustments that adapt to rapid changes in user behavior and market conditions. Our system leverages contextual features such as device type and brand to make fine-grained weight adjustments based on specific segments. To evaluate our approach, we developed a simulation framework using a dataset of 0.6 million searches from Expedia's lodging booking platform. Results show that Juggler-MAB outperforms the original Juggler model across all metrics, with NDCG improvements of 2.9%, a 13.7% reduction in regret, and a 9.8% improvement in best arm selection rate.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Recommender systems often focus solely on user satisfaction. However, in many real-world applications, particularly in online marketplaces, multiple stakeholders' interests need to be considered <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. These stakeholders typically include users (customers), item providers (e.g., hotel owners), and the platform. Multi-stakeholder recommenders aim to balance these diverse and often conflicting objectives <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>.</p><p>The Juggler framework <ref type="bibr" target="#b4">[5]</ref> was introduced to address this multi-stakeholder recommendation problem by using meta-learning <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref> to predict optimal weights for utility and compensation adjustments in real-time scoring. Deployed in production, Juggler has been an integral part of the Lodging Ranking stack at Expedia. However, Juggler's reliance on a pre-configured set of five options for relevance and compensation limits its ability to fine-tune recommendations for specific contexts. Additionally, its infrequent training cycles make it less responsive to rapid changes in traffic patterns across segments.</p><p>To address these limitations, we propose a two-step approach that combines meta-learning (Juggler) with Multi-Armed Bandits (MAB) for real-time weight adjustments in multi-stakeholder recommendations. This approach aims to: 1) provide more granular weight adjustments based on specific segments (e.g., device type, brand) and 2) adapt quickly to changes in traffic patterns without requiring frequent retraining of the main Juggler model. Our research questions are:</p><p>• Can the integration of MAB with Juggler improve the performance and adaptability of multi-stakeholder recommendations in online marketplaces? 
• Are contextual features useful for improving the MAB's effectiveness at making the right decisions?</p><p>The rest of this paper is organized as follows: Section 2 presents the related work, while Section 3 introduces the proposed hybrid solution. Section 4 covers the experimental setup used to validate the proposal, while Section 5 reports the results for the research questions. Lastly, Section 6 highlights the main conclusions and avenues for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and Related Work</head><p>The Juggler framework <ref type="bibr" target="#b4">[5]</ref> was introduced as a meta-learning approach to address the multi-stakeholder recommendation problem. It dynamically predicts the ideal weights for utility (user relevance) and compensation (platform revenue) for each search query. The meta-model leverages a collection of historical search queries and learns the mapping between the search context and the ideal utility and compensation weights, which are obtained via offline simulations. Juggler selects from five pre-configured options, each representing a different balance between relevance and compensation: 1) Lower relevance, lower compensation, 2) Lower relevance, higher compensation, 3) Neutral relevance, neutral compensation, 4) Higher relevance, lower compensation and 5) Higher relevance, higher compensation. The pre-configured options refer to sections of the search space which are explored to identify different directions of improvement, while reducing the number of options to ultimately choose from. It is noteworthy that although the pre-configured options are fixed, the actual instantiation of weights for each option depends on the ranking problem characteristics and the Juggler framework's hyper-parameters. While Juggler has shown success in production, its reliance on these fixed options and infrequent training cycles limit its adaptability to rapid changes in user behavior and market conditions.</p><p>Multi-Armed Bandits (MAB) are a class of reinforcement learning algorithms that balance exploration and exploitation in decision-making processes <ref type="bibr" target="#b7">[8]</ref>. 
In the context of recommenders, MABs have been used to address the exploration-exploitation dilemma and to adapt to changing user preferences <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>.</p><p>The integration of meta-learning and bandit algorithms has been explored in other domains, such as algorithm selection <ref type="bibr" target="#b10">[11]</ref> and hyperparameter optimization <ref type="bibr" target="#b11">[12]</ref>. Our work extends these ideas to the realm of multi-stakeholder recommendations, addressing the unique challenges of online marketplaces.</p><p>Several studies have addressed the challenge of balancing multiple objectives in recommender systems. Rodriguez et al. <ref type="bibr" target="#b12">[13]</ref> proposed a multi-objective optimization approach for job recommendations. Nguyen et al. <ref type="bibr" target="#b13">[14]</ref> introduced a multi-objective learning to re-rank approach to optimize online marketplaces for multiple stakeholders. Sürer et al. <ref type="bibr" target="#b14">[15]</ref> explored multi-stakeholder recommendation with provider constraints. These approaches provide valuable insights into balancing multiple objectives, but our proposed method aims to extend their capabilities by combining meta-learning with multi-armed bandits for enhanced adaptability in dynamic online marketplaces.</p><p>Recent developments in industry have led to the creation of self-service platforms for deploying contextual bandits, such as AdaptEx <ref type="bibr" target="#b15">[16]</ref>. These platforms provide powerful tools for optimizing user experiences at scale, which we leverage in our hybrid approach to combine the strengths of meta-learning and MAB algorithms. To evaluate our approach, we utilized a custom simulation framework based on real-world data from an online travel marketplace. 
This allowed us to assess the performance of our system in a controlled yet realistic setting, similar to other sophisticated simulation environments <ref type="bibr" target="#b16">[17]</ref>.</p></div>
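The exploration-exploitation trade-off discussed above can be illustrated with a minimal Gaussian Thompson Sampling loop. The class below is an illustrative sketch only (not the paper's or AdaptEx's implementation), using a simplified posterior width that shrinks with the pull count:

```python
import random

class GaussianThompsonBandit:
    """Toy Gaussian Thompson Sampling: keep a running mean per arm,
    sample a plausible mean reward from a Gaussian whose width shrinks
    with the number of pulls, and pull the arm with the best sample."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select_arm(self):
        # Wide posteriors (few pulls) encourage exploration;
        # narrow posteriors (many pulls) encourage exploitation.
        samples = [
            random.gauss(self.means[a], 1.0 / (self.counts[a] + 1))
            for a in range(len(self.counts))
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Incremental running-mean update.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

With two arms whose true rewards are 0.2 and 0.8, the loop quickly concentrates its pulls on the better arm while still occasionally exploring the other.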
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Juggler with MAB</head><p>We present a hybrid approach that combines the Juggler framework's meta-learning capabilities <ref type="bibr" target="#b4">[5]</ref> with a MAB system powered by the AdaptEx SDK <ref type="bibr" target="#b15">[16]</ref>. This approach, which we call "Juggler-MAB", aims to address the limitations of the original Juggler system while leveraging the adaptive capabilities of contextual bandits. The Juggler-MAB system operates in two stages:</p><p>1. Juggler Stage: The meta-learning model predicts initial utility and compensation weights based on the search context.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">MAB Stage:</head><p>A contextual MAB refines these weights in real-time based on user interactions and search features.</p><p>The Juggler framework selects from five pre-configured options for utility and compensation weights, providing a coarse adjustment of the recommendation strategy based on the search context. These options range from lower relevance and compensation to higher relevance and compensation, as described in <ref type="bibr" target="#b4">[5]</ref>, and aim to tackle the main issues in multi-objective optimization.</p><p>The MAB component introduces fine-grained adjustments to the Juggler-predicted weights. Each arm of the bandit represents a small corrective measure to be applied to the utility and compensation weights to improve relevance.</p><p>The key features of our MAB implementation include:</p><p>1. Contextual arms: The contextual bandits consider contextual features (e.g., device type, brand) when selecting arms. 2. Reward function: We use Normalized Discounted Cumulative Gain (NDCG) as a proxy for Conversion Rate, allowing for offline simulation and evaluation. 3. Exploration strategy: We employ epsilon-greedy and Thompson Sampling for their ability to balance exploration and exploitation effectively <ref type="bibr" target="#b7">[8]</ref>.</p><p>The integration of Juggler and MAB is achieved through an additive approach in the scoring function:</p><formula xml:id="formula_0">sortScore = (w_{utility}^{Juggler} + w_{utility}^{MAB}) ⋅ utilityScore + (w_{comp}^{Juggler} + w_{comp}^{MAB}) ⋅ compensationScore<label>(1)</label></formula><p>where w_{utility}^{Juggler} and w_{comp}^{Juggler} are the weights predicted by Juggler, and w_{utility}^{MAB} and w_{comp}^{MAB} are the corrective weights determined by the MAB.</p><p>We formulate our contextual MAB problem as follows: let 𝒜 be the set of arms, where each arm 𝑎 ∈ 𝒜 represents a pair of corrective weights (w_{utility}^{MAB}, w_{comp}^{MAB}). The context 𝑥_𝑡 ∈ 𝒳 at time 𝑡 includes features such as device or brand. The reward 𝑟_𝑡 is defined as the NDCG of the resulting ranking. The goal is to find a policy 𝜋 ∶ 𝒳 → 𝒜 that maximizes the expected cumulative reward:</p><formula xml:id="formula_1">max_𝜋 𝔼[∑_{t=1}^{T} 𝑟_𝑡(𝑥_𝑡, 𝜋(𝑥_𝑡))]<label>(2)</label></formula><p>where 𝑇 is the time horizon. We explored various methods to combine Juggler's predictions with MAB corrections, ultimately settling on the additive approach described above. We carefully selected contextual variables that could help identify under-performing segments in the Juggler model, such as device type and brand. Balancing multiple objectives in a single reward function required careful consideration. We chose NDCG as an initial approach due to its wide acceptance, with plans to explore more complex multi-objective reward functions in future work.</p><p>To evaluate our hybrid approach, we developed a custom simulator that allows us to test various configurations offline using historical data. The simulator, built on Expedia data, enables us to:</p><p>1. Replay historical searches and user interactions. Data is loaded on a daily basis, consisting of data for each property in each search and the respective user clicks and bookings. 2. Apply the Juggler-MAB model to generate new rankings. The MAB is sampled (potentially using contextual data) and the retrieved arm is included in the ranking formula, yielding the simulated score and the final ranking. 3. 
Evaluate the performance using both immediate (e.g., clicks) and delayed (e.g., bookings) feedback.</p><p>The reward function evaluates the simulated rankings; the sampled arm, the reward, and the contextual information (if any) are then provided to the MAB to update its internal state.</p><p>The simulation framework provides a safe environment to test and refine our approach before considering online deployment.</p></div>
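The simulation steps above can be sketched as follows. This is an illustrative toy, not the production simulator: the bandit is a simple 𝜖-greedy over the corrective-weight arm space from Section 4, the search and property field names are hypothetical, and a click-based top-k reward stands in for the NDCG reward used in the paper:

```python
import itertools
import random

# Corrective-weight arm space: 3 utility values x 3 compensation values.
ARMS = list(itertools.product([-0.3, 0.0, 0.3], [-0.2, 0.0, 0.2]))

class EpsilonGreedy:
    """Toy epsilon-greedy bandit over the corrective-weight arms."""
    def __init__(self, arms, epsilon=0.1):
        self.arms, self.epsilon = arms, epsilon
        self.counts = [0] * len(arms)
        self.values = [0.0] * len(arms)

    def select_arm(self):
        if random.random() < self.epsilon:       # explore
            return random.randrange(len(self.arms))
        return max(range(len(self.arms)),        # exploit
                   key=self.values.__getitem__)

    def update(self, idx, reward):
        self.counts[idx] += 1
        self.values[idx] += (reward - self.values[idx]) / self.counts[idx]

def sort_score(prop, w_u_juggler, w_c_juggler, w_u_mab, w_c_mab):
    # Additive combination of Juggler and MAB weights, as in equation (1).
    return ((w_u_juggler + w_u_mab) * prop["utility_score"]
            + (w_c_juggler + w_c_mab) * prop["compensation_score"])

def replay_search(bandit, search):
    """One simulator step: sample an arm, re-rank, score, update the MAB."""
    idx = bandit.select_arm()
    w_u_mab, w_c_mab = bandit.arms[idx]
    ranked = sorted(
        search["properties"],
        key=lambda p: sort_score(p, search["w_utility"], search["w_comp"],
                                 w_u_mab, w_c_mab),
        reverse=True,
    )
    # Stand-in reward: fraction of the top-3 results that were clicked
    # (the paper uses NDCG of the simulated ranking instead).
    top = ranked[:3]
    reward = sum(p["clicked"] for p in top) / len(top)
    bandit.update(idx, reward)
    return ranked, reward
```

Replaying one day of logged searches then amounts to calling `replay_search` once per search, letting the bandit's value estimates drift toward the corrective weights that yield the best rankings.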
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Setup</head><p>We used a dataset of 0.6 million searches from Expedia's lodging booking platform, covering a period of 31 consecutive days. The data contains over 600,000 distinct properties across approximately 41,000 distinct destinations, with feedback sparsity over 96%. The dataset includes features such as device type, brand, destination, and historical user interactions.</p><p>We compared several variants of the proposed Juggler-MAB hybrid approach against the original Juggler model <ref type="bibr" target="#b4">[5]</ref>. We tested several MAB algorithms, ranging from classical (i.e., no contextual features) to contextual bandits:</p><p>• Gaussian Thompson (GT): a classical bandit using Thompson Sampling, assuming a Gaussian distribution of the reward values. • 𝜖-greedy: a classical bandit using a vanilla implementation of the canonical algorithm. We used 𝜖 = 0.1 and 𝜖 = 0.3. • Recursive Least Squares with Thompson Sampling (RLS): a contextual bandit using a linear model with a vector of means and a variance-covariance matrix.</p><p>The experiments use the actual production Juggler model predictions for each search. This grounds Juggler's predictions in real production behavior, which in turn leads to more robust estimates of the MAB's effect. We then implemented the MAB component using the AdaptEx SDK <ref type="bibr" target="#b15">[16]</ref>, with the following configuration:</p><p>• Arm space: we explore 3 different values for each corrective weight, namely w_{utility}^{MAB} ∈ {−0.3, 0.0, 0.3} and w_{comp}^{MAB} ∈ {−0.2, 0.0, 0.2}. The candidate values were determined via domain knowledge, while also ensuring the combined weights remain non-zero.</p><p>• Contextual features: several low-cardinality categorical search features were tested, with 3 being identified as the most important: brand, user device and geographical categorization of the search destination, e.g., neighborhood vs. city. 
• Exploration strategy: Thompson Sampling and 𝜖-greedy. • Reward: Normalized Discounted Cumulative Gain (NDCG), to determine how well MAB algorithms can correct towards relevance and expected conversion-rate improvement.</p></div>
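The NDCG reward above can be computed directly from the relevance labels of the simulated ranking. A minimal sketch with binary relevance and the standard log2 position discount follows (the paper does not specify its exact NDCG variant, so this is one common formulation):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with the standard log2 position discount:
    a relevant item at position i contributes rel / log2(i + 2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """NDCG: DCG of the observed ranking divided by the DCG of the ideal
    ranking (relevant items first). Returns 0 when nothing is relevant."""
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking with all relevant items on top scores exactly 1.0, while pushing a relevant item down the list lowers the score, which is what makes NDCG a usable per-search reward signal for the bandit.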
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results and Discussion</head><p>Table <ref type="table" target="#tab_0">1</ref> summarizes the main results. For each bandit, we report the average reward, the average regret and the percentage of best arm selections across all searches, aggregated for all bandits and baselines. The best results per metric are highlighted in bold. Notice that regret is best when lowest, while the remaining metrics are better when maximized.</p><p>Our Juggler-MAB hybrid approach outperformed the Juggler baseline across all metrics for all proposed bandits. The NDCG improvements range from +0.8% for the GT bandit up to +2.9% for several RLS bandits. In terms of regret, we achieve a reduction of 13.7% and an improvement in best arm selection rate of 9.8%.</p><p>The 𝜖-greedy algorithms provide very strong baselines, especially when 𝜖 = 0.1. The GT bandit is clearly the worst, yet still useful since it outperforms the baseline. Among the contextual bandits, the best one across all metrics is RLS_{brand}. Interestingly, using more contextual features did not yield better performance. Further investigation is required to identify which features matter when defining the context.</p><p>We performed Wilcoxon signed-rank tests and observed no statistically significant difference among the RLS bandits. The Critical Difference <ref type="bibr" target="#b17">[18]</ref> diagram for the remaining bandits is shown in Figure <ref type="figure" target="#fig_0">1</ref>. The results show no statistically significant difference between RLS and 𝜖-greedy (0.1), hinting that the contextual features are not meaningful. However, all RLS bandits are better than the baselines. Note as well that all bandits are better than the Juggler baseline, a testament to the value of the proposed hybrid approach. 
Figure <ref type="figure" target="#fig_1">2</ref> shows the learning dynamics for all bandits across all days in the data sample. To improve interpretation, we include only the best contextual bandit. Juggler-MAB demonstrated fast adaptation to changing conditions, and we observed that the MAB component was able to make fine-grained adjustments to the Juggler predictions, resulting in improved performance.</p><p>We now inspect Juggler-MAB's effect on average top-10 lodging ranking statistics in Table <ref type="table" target="#tab_1">2</ref>. The results are reported as differences from the Juggler baseline, as we cannot expose the sensitive raw data.</p></div>
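The three aggregates reported in Table 1 can be computed from a per-search log of the obtained reward and the best achievable reward. A sketch follows, assuming regret on each search is the gap to the best arm's reward and "best arm selected" means that gap is zero (the paper does not give the exact formulas, so these definitions are an assumption):

```python
def aggregate_metrics(plays):
    """plays: list of (chosen_reward, best_reward) pairs, one per search.
    Returns (avg_reward, avg_regret, best_arm_rate)."""
    n = len(plays)
    avg_reward = sum(r for r, _ in plays) / n
    # Per-search regret: gap between the best achievable and obtained reward.
    avg_regret = sum(best - r for r, best in plays) / n
    # Best-arm rate: fraction of searches with zero gap (exact match here;
    # a tolerance could be used for noisy rewards).
    best_rate = sum(r == best for r, best in plays) / n
    return avg_reward, avg_regret, best_rate
```

Under these definitions, avg_reward + avg_regret equals the average best achievable reward, which makes the two columns of Table 1 easy to sanity-check against each other.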
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The results show a clear pattern for all bandits: the average daily price decreases while guest and star ratings increase as NDCG improves. On the contrary, margin % and margin $ decrease, which could pose problems for the marketplace's objectives and long-term health. The expectation, to be validated via an A/B test, is that the increase in relevance will lead to an improvement in conversion rate that can offset the impact on profit per transaction.</p><p>Diving deeper into the arm selections per bandit, we present Figure <ref type="figure" target="#fig_2">3</ref>. The results show a clear and expected preference towards arms with lower compensation weights, as compensation is not aligned with the NDCG reward. However, it is interesting to observe that the best bandit has learned not only that it is ideal to decrease compensation, but also to increase or decrease relevance depending on the context.</p><p>Despite the overall positive results, we identified two limitations. First, the reward function considers only a single dimension of the problem (i.e., relevance), which explains the impact on the compensation component. Future work will address this limitation by using multi-objective optimization techniques <ref type="bibr" target="#b12">[13]</ref>. Second, our current simulations use historical interactions with a deterministic logging policy, introducing bias. To address this, we will implement off-policy evaluation techniques <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Future Work</head><p>In this paper, we presented a novel hybrid approach combining Meta-Learning with Multi-Armed Bandits for multi-stakeholder recommendations in online travel marketplaces. Our Juggler-MAB system demonstrated significant improvements over existing methods. Key contributions of our work include 1) an integration of meta-learning and contextual bandits for recommendation systems and 2) empirical evidence of the effectiveness of our approach in a large-scale, real-world setting. Based on our findings and the limitations identified, we propose the following directions for future research:</p><p>1. Online testing: Conduct A/B tests in a production environment to validate the performance of Juggler-MAB under real-world conditions and user behaviors. 2. Dynamic arm space: Explore methods for dynamically adjusting the arm space of the MAB component based on observed performance and changing market conditions. 3. Fairness considerations: Incorporate explicit fairness constraints or objectives into the MAB formulation to ensure equitable treatment of different provider segments <ref type="bibr" target="#b20">[21]</ref>. 4. Long-term value optimization: Extend the approach to consider long-term user value, potentially using reinforcement learning techniques for sequential decision-making.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Critical Difference diagram shows superiority of bandits over multiple baselines, including simpler bandits.</figDesc><graphic coords="5,159.40,267.26,276.48,53.28" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Multiple metrics per bandit over time.</figDesc><graphic coords="6,77.32,65.61,440.64,245.81" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Arm pulls per bandit over time.</figDesc><graphic coords="6,177.60,349.86,240.08,153.97" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell>Bandit</cell><cell cols="3">avg(reward) avg(regret) best arm %</cell></row><row><cell>Juggler</cell><cell>0.1776</cell><cell>0.0373</cell><cell>0.7515</cell></row><row><cell>GT</cell><cell>0.1791</cell><cell>0.0358</cell><cell>0.7866</cell></row><row><cell>𝜖-greedy (0.3)</cell><cell>0.1811</cell><cell>0.0339</cell><cell>0.8095</cell></row><row><cell>𝜖-greedy (0.1)</cell><cell>0.1824</cell><cell>0.0325</cell><cell>0.8218</cell></row><row><cell>𝑅𝐿𝑆 𝑏𝑟𝑎𝑛𝑑</cell><cell>0.1827</cell><cell>0.0322</cell><cell>0.8252</cell></row><row><cell>𝑅𝐿𝑆 𝑑𝑒𝑣𝑖𝑐𝑒</cell><cell>0.1822</cell><cell>0.0327</cell><cell>0.8200</cell></row><row><cell>𝑅𝐿𝑆 𝑔𝑒𝑜</cell><cell>0.1825</cell><cell>0.0325</cell><cell>0.8228</cell></row><row><cell>𝑅𝐿𝑆 𝑔𝑒𝑜, 𝑏𝑟𝑎𝑛𝑑</cell><cell>0.1827</cell><cell>0.0323</cell><cell>0.8246</cell></row><row><cell>𝑅𝐿𝑆 𝑑𝑒𝑣𝑖𝑐𝑒, 𝑏𝑟𝑎𝑛𝑑</cell><cell>0.1827</cell><cell>0.0322</cell><cell>0.8228</cell></row><row><cell>𝑅𝐿𝑆 𝑔𝑒𝑜, 𝑑𝑒𝑣𝑖𝑐𝑒</cell><cell>0.1827</cell><cell>0.0322</cell><cell>0.8247</cell></row><row><cell>𝑅𝐿𝑆 𝑔𝑒𝑜, 𝑑𝑒𝑣𝑖𝑐𝑒, 𝑏𝑟𝑎𝑛𝑑</cell><cell>0.1826</cell><cell>0.0323</cell><cell>0.8246</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Differences in several metrics in top-10 positions.</figDesc><table><row><cell></cell><cell></cell><cell>3) 𝜖-greedy (0.1)</cell><cell>RLS</cell></row><row><cell>daily price</cell><cell>-0.7278</cell><cell>-0.8324</cell><cell>-0.8595</cell></row><row><cell>guest rating</cell><cell>0.0416</cell><cell>0.0572</cell><cell>0.0604</cell></row><row><cell>star rating</cell><cell>0.0499</cell><cell>0.0747</cell><cell>0.0796</cell></row><row><cell>margin %</cell><cell>-0.0034</cell><cell>-0.0045</cell><cell>-0.0048</cell></row><row><cell>margin $</cell><cell>-0.6285</cell><cell>-0.8222</cell><cell>-0.8633</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Multistakeholder recommendation: Survey and research directions</title>
		<author>
			<persName><forename type="first">H</forename><surname>Abdollahpouri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Adomavicius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Burke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Guy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jannach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kamishima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Krasnodebski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pizzato</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">User Modeling and User-Adapted Interaction</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="127" to="158" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Multi-stakeholder recommendation and its connection to multi-sided fairness</title>
		<author>
			<persName><forename type="first">H</forename><surname>Abdollahpouri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Burke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conjunction with the 13th ACM Conference on Recommender Systems, RecSys&apos;19</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>Workshop on Recommendation in Multi-stakeholder Environments (RMSE&apos;19)</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Recommendations in a marketplace</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mehrotra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Carterette</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th ACM Conference on Recommender Systems</title>
				<meeting>the 13th ACM Conference on Recommender Systems</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="580" to="581" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Recommendations with a purpose</title>
		<author>
			<persName><forename type="first">D</forename><surname>Jannach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Adomavicius</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th ACM Conference on Recommender Systems</title>
				<meeting>the 10th ACM Conference on Recommender Systems</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="7" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Juggler: Multi-stakeholder ranking with meta-learning</title>
		<author>
			<persName><forename type="first">T</forename><surname>Cunha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Partalas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nguyen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the MORS workshop at the 15th ACM Conference on Recommender Systems, CEUR Workshop Proceedings</title>
				<meeting>the MORS workshop at the 15th ACM Conference on Recommender Systems, CEUR Workshop Proceedings</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Metalearning and recommender systems: A literature review and empirical study on the algorithm selection problem for collaborative filtering</title>
		<author>
			<persName><forename type="first">T</forename><surname>Cunha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Soares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>De Carvalho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Sciences</title>
		<imprint>
			<biblScope unit="volume">423</biblScope>
			<biblScope unit="page" from="128" to="144" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Cf4cf: Recommending collaborative filtering algorithms using collaborative filtering</title>
		<author>
			<persName><forename type="first">T</forename><surname>Cunha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Soares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>De Carvalho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">RecSys 2018 -Proceedings of the 12th ACM Conference on Recommender Systems</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="357" to="361" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Lattimore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Szepesvári</surname></persName>
		</author>
		<title level="m">Bandit algorithms</title>
				<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A contextual-bandit approach to personalized news article recommendation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Langford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Schapire</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th International Conference on World Wide Web</title>
				<meeting>the 19th International Conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="661" to="670" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Factorization bandits for interactive recommendation</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">31</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Sequential model-based optimization for general algorithm configuration</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Hoos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Leyton-Brown</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on learning and intelligent optimization</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="507" to="523" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">BOHB: Robust and efficient hyperparameter optimization at scale</title>
		<author>
			<persName><forename type="first">S</forename><surname>Falkner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1437" to="1446" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Multiple objective optimization in recommender systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Posse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth ACM Conference on Recommender Systems</title>
				<meeting>the Sixth ACM Conference on Recommender Systems</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="11" to="18" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dines</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Krasnodebski</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1708.00651</idno>
		<title level="m">A multi-objective learning to re-rank approach to optimize online marketplaces for multiple stakeholders</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Multistakeholder recommendation with provider constraints</title>
		<author>
			<persName><forename type="first">Ö</forename><surname>Sürer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Burke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Malthouse</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th ACM Conference on Recommender Systems</title>
				<meeting>the 12th ACM Conference on Recommender Systems</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="54" to="62" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Adaptex: A self-service contextual bandit platform</title>
		<author>
			<persName><forename type="first">W</forename><surname>Black</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ilhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Marchini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Markeviciute</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th ACM Conference on Recommender Systems</title>
				<meeting>the 17th ACM Conference on Recommender Systems</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="839" to="842" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Ie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Navrekar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-T</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chandra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Boutilier</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.04847</idno>
		<title level="m">RecSim: A configurable simulation platform for recommender systems</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Statistical comparisons of classifiers over multiple data sets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Demšar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Doubly robust policy evaluation and learning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dudík</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Langford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICML&apos;11</title>
				<meeting><address><addrLine>Madison, WI, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Omnipress</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1097" to="1104" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">The self-normalized estimator for counterfactual learning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Swaminathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Joachims</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Balanced neighborhoods for multi-sided fairness in recommendation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Burke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Sonboli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ordoñez-Gauger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference on Fairness, Accountability and Transparency, PMLR</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="202" to="214" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
