<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Enhancing Prediction Models with Reinforcement Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Karol</forename><surname>Radziszewski</surname></persName>
							<email>karol.radziszewski@ringieraxelspringer.pl</email>
							<affiliation key="aff0">
								<orgName type="department">Ringier Axel Springer Polska</orgName>
								<address>
									<settlement>Warsaw, Kraków</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Warsaw University of Technology</orgName>
								<address>
									<settlement>Warsaw</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Piotr</forename><surname>Ociepka</surname></persName>
							<email>piotr.ociepka@ringieraxelspringer.pl</email>
							<affiliation key="aff0">
								<orgName type="department">Ringier Axel Springer Polska</orgName>
								<address>
									<settlement>Warsaw, Kraków</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Enhancing Prediction Models with Reinforcement Learning</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">0CF17538B53FA24976683E875A705B79</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Personalization</term>
					<term>News recommendations</term>
					<term>Reinforcement Learning</term>
					<term>Deep Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present a large-scale news recommendation system implemented at Ringier Axel Springer Polska, focusing on enhancing prediction models with reinforcement learning techniques. The system, named Aureus, integrates a variety of algorithms, including multi-armed bandit methods and deep learning models based on large language models (LLMs). We detail the architecture and implementation of Aureus, emphasizing the significant improvements in online metrics achieved by combining ranking prediction models with reinforcement learning. The paper further explores the impact of mixing different models on key business performance indicators. Our approach effectively balances the need for personalized recommendations with the ability to adapt to rapidly changing news content, addressing common challenges such as the cold start problem and content freshness. The results of online evaluation demonstrate the effectiveness of the proposed system in a real-world production environment.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Ringier Axel Springer Polska is among the largest media companies in Poland, operating the news website www.onet.pl. Onet.pl attracts approximately 6 million unique users monthly, representing about 20% of Polish internet users <ref type="bibr" target="#b0">[1]</ref>. According to SimilarWeb, Onet is the 18th largest news website globally <ref type="bibr" target="#b1">[2]</ref>. Our recommendation system, called Aureus, processes over a thousand requests per second, necessitating low latency to ensure users experience minimal wait times for website loading.</p><p>Aureus comprises a variety of recommendation components, including user segmentation and reinforcement learning (a popularity-based component). Over the past year, we implemented additional modules responsible for content similarity and deep learning models based on large language models (LLMs) to capture individual preferences.</p><p>In this article, we focus on describing the architecture of a real-world large-scale news recommendation system, in particular:</p><p>• we show that combining ranking prediction models with reinforcement learning significantly improves online metrics, • we further analyze different aspects of model configuration and training objectives concerning multiple business KPIs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The complexity of news recommendation systems exceeds that of other systems due to the rapid data updates and content obsolescence, with thousands of articles published daily. A key challenge is addressing cold start users, as many visitors rely on cookie IDs without logging in.</p><p>Traditional collaborative filtering methods, such as matrix factorization <ref type="bibr" target="#b2">[3]</ref>, face difficulties due to the cold start problem, requiring multiple observations for each user. To overcome this, models using external features have been proposed. Wang et al. introduced RippleNet, a deep learning model that leverages external data via knowledge graphs, enabling recommendations with minimal prior user interactions <ref type="bibr" target="#b3">[4]</ref>.</p><p>Reinforcement learning, particularly multi-armed bandit algorithms, offers another solution to the cold start problem. This approach has been successfully used in our systems for several years <ref type="bibr" target="#b4">[5]</ref>.</p><p>Content-based filtering, another core methodology for building recommendation systems, is particularly important in news recommendations. Recent Natural Language Processing (NLP) advances, such as pretrained models like GPT <ref type="bibr" target="#b5">[6]</ref> and Polbert <ref type="bibr" target="#b6">[7]</ref>, have enhanced the generation of personalized recommendations through embeddings.</p><p>However, these models are often large and costly. Wu et al. addressed this by introducing NewsBERT <ref type="bibr" target="#b7">[8]</ref>, a distilled version of BERT tailored for the news domain, reducing model size and complexity.</p><p>The core aim of recommendation systems is to boost user satisfaction, often measured by clicks or time spent on the platform. While click modeling is straightforward, time-based metrics are more complex. Covington et al. 
proposed a method to weight clicks based on time spent, implemented in YouTube's system <ref type="bibr" target="#b8">[9]</ref>.</p><p>Our approach uniquely integrates bandit algorithms with traditional ranking models, creating an adaptive news recommendation engine that combines the strengths of both multi-armed bandits and deep learning models within a unified architecture.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposed Approach</head><p>Over time, Aureus has expanded to incorporate a variety of recommendation algorithms and methods, each characterized by distinct capabilities and limitations. This section provides a detailed overview of several of these methods, followed by the introduction of a novel approach for aggregating multiple recommendations into a unified output. This approach leverages the unique strengths of each constituent algorithm while effectively mitigating the specific drawbacks associated with individual methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Reinforcement Learning</head><p>The initial application of Aureus was to automate the curation process for the Onet.pl news feed. The method required the capability to rapidly collect user feedback, identify both short-and long-term popularity trends, and recommend content that was both highly popular and engaging. Additionally, the system needed to adapt to emerging articles as well as those experiencing a decline in user engagement over time. Given the primary objective of automating the existing editorial workflow, the recommendations were designed to be population-wide, independent of individual user preferences or tastes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Multi-armed Bandits</head><p>Considering the outlined requirements, we selected multi-armed bandit algorithms as the foundation of our approach. This class of methods is particularly suited for balancing the trade-off between exploration (acquiring knowledge regarding each article's performance and popularity) and exploitation (recommending the highest-performing content). Moreover, multi-armed bandit algorithms possess the capability to optimize a wide range of business-related Key Performance Indicators (KPIs), including both continuous and discrete metrics. This flexibility makes them an ideal choice for the dynamic and demanding environment of the publishing industry. Following extensive offline and online evaluations, we identified Upper Confidence Bound <ref type="bibr" target="#b9">[10]</ref> and Thompson Sampling <ref type="bibr" target="#b10">[11]</ref> as the most effective bandit methods for this application.</p><p>Nevertheless, the exclusion of individual user preferences emerged as a significant limitation of the selected approach. To overcome this constraint, while preserving the robustness, simplicity, trend-responsiveness, and cost- and time-efficiency of the bandit-based recommender system, we introduced the concept of user segmentation.</p></div>
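To make the exploration-exploitation mechanics concrete, the following is a minimal Beta-Bernoulli Thompson Sampling sketch that ranks articles by sampled click-through rate. It is an illustrative simplification under assumed binary (click/no-click) feedback, not the production Aureus implementation, and the class and method names are hypothetical.

```python
import random

class ThompsonSamplingRecommender:
    """Minimal Beta-Bernoulli Thompson Sampling sketch (illustrative only)."""

    def __init__(self):
        # Per-article Beta posterior parameters, starting from a uniform prior.
        self.alpha = {}  # 1 + observed clicks
        self.beta = {}   # 1 + observed non-clicks

    def add_article(self, article_id):
        # New articles start with an uninformative Beta(1, 1) prior,
        # which naturally encourages exploration of fresh content.
        self.alpha.setdefault(article_id, 1.0)
        self.beta.setdefault(article_id, 1.0)

    def recommend(self, k):
        # Draw one plausible CTR per article from its posterior,
        # then rank articles by the sampled values.
        samples = {a: random.betavariate(self.alpha[a], self.beta[a])
                   for a in self.alpha}
        return sorted(samples, key=samples.get, reverse=True)[:k]

    def update(self, article_id, clicked):
        # Bayesian update of the article's click-rate posterior.
        if clicked:
            self.alpha[article_id] += 1
        else:
            self.beta[article_id] += 1
```

Popular articles accumulate high alpha counts and dominate the ranking, while uncertain (new) articles still occasionally sample high and get exposure.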
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">User Segmentation</head><p>Segmentation involves dividing the entire user population into smaller, more homogeneous groups, each consisting of users with similar tastes. By applying multi-armed bandit algorithms separately within each segment, the recommendation process remains primarily popularity-based. However, through segmentation, each user is presented with a set of articles that are most popular among individuals with comparable interests, thereby enhancing the overall user experience.</p><p>The initial approach to user segmentation was based on topic modeling <ref type="bibr" target="#b4">[5]</ref>. Specifically, each article was transformed into a simplified embedding using Latent Dirichlet Allocation (LDA). Subsequently, user interest profiles were generated by averaging the LDA embeddings of the articles read by each individual user. Then, user interest profiles were clustered using the k-Means algorithm.</p><p>Although successful and effective, this method was soon enhanced by substituting LDA modeling with Item2Vec embeddings <ref type="bibr" target="#b11">[12]</ref>. This modification significantly simplified and accelerated the segmentation process by eliminating the need for text analysis, thereby rendering the method language-agnostic. Consequently, this improvement allows for the deployment of Aureus across digital publishers regardless of the language in which they publish.</p></div>
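The two per-user steps of this pipeline (averaging article embeddings into an interest profile, then assigning the profile to a cluster) can be sketched as follows. The sketch assumes precomputed Item2Vec article embeddings and already-fitted k-Means centroids; in practice the embeddings and centroids would come from dedicated training jobs, and the helper names are illustrative.

```python
from math import dist

def user_profile(read_article_ids, article_embeddings):
    """Average the embeddings of the articles a user has read
    into a single interest-profile vector."""
    vecs = [article_embeddings[a] for a in read_article_ids]
    n = len(vecs)
    return [sum(component) / n for component in zip(*vecs)]

def assign_segment(profile, centroids):
    """Assign a user profile to the nearest k-Means centroid
    (Euclidean distance), i.e. pick the user's segment."""
    return min(range(len(centroids)),
               key=lambda i: dist(profile, centroids[i]))
```

Each segment then runs its own bandit instance, so recommendations stay popularity-based but are computed within groups of like-minded users.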
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Prediction Model</head><p>Our models are based on articles that a user has read within the last N days. In our experiments, we use an arbitrary value of N = 30 days. We calculate the user representation by averaging the embeddings of these articles, created with the pretrained Polbert language model <ref type="bibr" target="#b6">[7]</ref>. Subsequently, we develop two types of models:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Similarity Model</head><p>A simple model that compares user embedding to article embeddings using cosine similarity. This was our initial approach.</p></div>
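A minimal sketch of this similarity model, assuming the averaged user embedding and candidate article embeddings are already available; function names are illustrative.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_by_similarity(user_embedding, candidates):
    """candidates: {article_id: embedding}. Return article IDs
    ordered by cosine similarity to the user embedding."""
    return sorted(candidates,
                  key=lambda a: cosine(user_embedding, candidates[a]),
                  reverse=True)
```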
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Deep Model</head><p>We trained a model with user clicks as the target variable. Given our large and imbalanced dataset, we sampled an equal number of clicked and unclicked articles to ensure balanced data for evaluation. Using the neural network architecture shown in Figure <ref type="figure" target="#fig_1">2</ref>, we seamlessly integrated additional features into our recommender system, such as article length and other parameters. Since our business KPI is a continuous variable, we also trained models with clicks weighted by this KPI, similar to the approach described in <ref type="bibr" target="#b8">[9]</ref>. Weighting by the business KPI resulted in an increase in this KPI in online tests.</p></div>
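The KPI-weighting idea can be illustrated with a stand-in logistic regression trained by SGD on a weighted log-loss; the actual model (Figure 2) is a deeper neural network, so this is only a sketch of how a continuous KPI scales each click's contribution to the gradient, in the spirit of the weighting in [9].

```python
import math
import random

def train_weighted_click_model(samples, dim, epochs=5, lr=0.1):
    """Logistic-regression sketch of KPI-weighted click training.

    samples: list of (features, clicked, kpi_weight) tuples, where
    `clicked` is 0/1 and `kpi_weight` scales that sample's loss so
    high-KPI clicks influence the model more. Illustrative only.
    """
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        random.shuffle(samples)
        for x, y, weight in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            # Gradient of weighted log-loss: a click effectively
            # counts `weight` times.
            g = weight * (p - y)
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b
```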
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Model Ensemble Architecture</head><p>We previously outlined two key components of our recommendation system. The reinforcement learning module identifies popular and trending articles, while the prediction model captures individual user preferences. For optimal user satisfaction, the recommendation system must integrate both aspects. Relying solely on a popularity-based model neglects individual user preferences, whereas a user-preference model may overlook trending articles, which are crucial in the news domain.</p><p>We evaluated several methods for combining multiple recommendations, two of which advanced to the online testing phase and are now employed in daily operations:</p><p>• Proportional Random Mixer -In this approach, each recommendation method is assigned a target proportion within the final content set (e.g., 40% of recommendations from a bandit algorithm and 60% from a deep learning model). For the k-th position in the final recommendation list, an article is selected randomly from the k-th positions of the input recommendations, with the selection probability proportional to the assigned target share. • Weighted Average Mixer -In this method, each content item from the input recommendations is associated with a score from the corresponding model. These scores are normalized to the range [0.0, 1.0] to make them comparable across models. Each content item is then assigned a new score, which is a weighted average of the scores from the input models, and the final recommendation list is ordered based on these new scores.</p><p>Online testing proved that the weighted average mixer performed significantly better. Consequently, all results and conclusions presented in this paper are based on the weighted average mixer.</p></div>
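A minimal sketch of the Weighted Average Mixer, assuming each input model scores the same candidate set and the model weights sum to 1 (so the weighted sum coincides with a weighted average); min-max normalization is one natural reading of the [0.0, 1.0] scaling described above, and the function name is illustrative.

```python
def weighted_average_mix(model_scores, model_weights):
    """model_scores: {model_name: {article_id: raw_score}}
    model_weights: {model_name: weight}, assumed to sum to 1.

    Min-max normalize each model's scores to [0.0, 1.0], combine
    them per article with a weighted average, and return article
    IDs ordered by the combined score."""
    combined = {}
    for name, scores in model_scores.items():
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on constant scores
        for article, s in scores.items():
            norm = (s - lo) / span
            combined[article] = combined.get(article, 0.0) \
                + model_weights[name] * norm
    return sorted(combined, key=combined.get, reverse=True)
```

For example, with weights {bandit: 0.6, deep: 0.4}, an article topping the bandit ranking outranks one topping only the deep-model ranking.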
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Evaluation</head><p>In this section, we present a comprehensive evaluation of our proposed recommendation system. The evaluation is divided into three subsections: Offline Evaluation, Online Evaluation, and Results.</p><p>Offline Evaluation describes the performance metrics derived from historical data, allowing us to assess the model's predictive accuracy in a controlled environment. This process helps identify the best model for subsequent online testing. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The diagram of the segments calculation process.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The diagram of the deep model architecture. Input embeddings are calculated with pretrained models.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The diagram of the Aureus recommendation system illustrates the following components: Inputs consist of the user ID, a set of content items, and online business KPI metrics. The system integrates two submodels: a deep learning-based user interest model and a multi-armed bandit content popularity model. These submodels are combined using a specified combination strategy.</figDesc></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Online Evaluation involves deploying the model in a live setting on Onet.pl, where we measure its real-time effectiveness, user engagement metrics, and business KPI metrics.</p><p>Finally, the Results subsection synthesizes the findings from both offline and online evaluations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Offline Setup</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1.">Baselines</head><p>Each time we develop a new architecture or introduce a new feature, we evaluate the models against at least two baselines:</p><p>• random model -If our model does not outperform random recommendations, we conclude that it is unsuited for production deployment. • current production model -Our goal is to match or exceed the results of the current production model. If the new model achieves comparable or better results, we proceed to test it in the online environment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2.">Offline Evaluation Metrics</head><p>We implemented both standard and custom ranking evaluation metrics on historical data. Each metric is calculated with different values of k (primarily 3, 5, 10, 15, and 30). Our goal is to train a single model that can be deployed for an extended period. Therefore, we validate our model on three different days: one day, seven days, and thirty days after training. We utilize the following metrics:</p><p>• Standard Metrics -NDCG, Precision, Recall, Coverage and AUC • Custom Metrics -We aim to optimize a continuous business KPI with a click prediction model, so we calculate the average value of this KPI for a given ranking at k as our custom metric. These are:</p><p>-Average Label Value -This metric considers all articles in the list.</p><p>-Average Positive Label Value -This metric considers all articles that are clicked.</p></div>
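The two custom metrics can be sketched as follows, assuming a ranked list of article IDs, a per-article continuous KPI label, and the set of clicked articles; function names are illustrative.

```python
def average_label_value_at_k(ranked, labels, k):
    """Mean KPI label over the top-k ranked articles
    (Average Label Value at k)."""
    top = ranked[:k]
    return sum(labels[a] for a in top) / len(top)

def average_positive_label_value_at_k(ranked, labels, clicked, k):
    """Mean KPI label over the clicked articles within the top-k
    (Average Positive Label Value at k). Returns 0.0 if no top-k
    article was clicked."""
    top = [a for a in ranked[:k] if a in clicked]
    return sum(labels[a] for a in top) / len(top) if top else 0.0
```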
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Online AB Tests</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Testing Setup</head><p>A critical component of the Aureus system, alongside the recommendation models, is the A/B testing engine. This engine facilitates statistically significant and fair online testing of multiple recommendation approaches. Users are randomly and stably assigned to one of the testing variants, independent of user agent, demographic factors, or other variables that might influence the test results. During the test period, each user is exclusively presented with recommendations generated by the model associated with their assigned variant. Key performance indicator (KPI) values for content are collected and recorded according to the testing variants, enabling subsequent analysis and comparison.</p><p>It is important to note that the online tests presented in this paper were conducted on a curated sample of users and focused specifically on a designated section of the webpage (recommendations displayed beneath articles). As such, the results may not fully generalize to similar experiments conducted under different conditions or in other areas of the webpage.</p></div>
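Stable random assignment of this kind is commonly implemented by hashing the user ID together with an experiment-specific salt, so the split is uniform across users yet constant for any one user throughout the test. The source does not specify Aureus's exact mechanism, so the following is a sketch under that common assumption; the salt value is hypothetical.

```python
import hashlib

def assign_variant(user_id, variants, salt="experiment-1"):
    """Deterministically map a user ID to one of the test variants.

    Hashing (salt, user_id) yields an assignment that is effectively
    random across the population, independent of user agent or
    demographics, but stable for a given user within one experiment."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Changing the salt reshuffles users into fresh groups for the next experiment without any stored assignment table.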
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Online Monitoring</head><p>When the model is deployed in a production environment, we continuously monitor its performance with respect to business KPIs and latency. We enforce a stringent latency threshold, beyond which the recommendations generated by the model would not be utilized. To track these online metrics, we employ AWS QuickSight for business-related metrics and Grafana for technical metrics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Results</head><p>Table <ref type="table">1</ref> presents the offline performance metrics of three predictive models: the random baseline, the similarity model, and the deep learning model. The deep learning model demonstrates superior performance, surpassing the similarity model by approximately 65.7% in NDCG, around 16.3% in AvgLabelValue and around 0.9% in AvgPositiveLabelValue. This indicates that the deep learning model is more effective in terms of business KPIs and has been deployed in online tests as the user-to-item recommendation model. We implemented our models in two production environments: the Onet.pl homepage and article pages (with recommendations below each article). The determination of the model that achieves the status of "king of the hill" is based on the results of online testing. This approach allows for an evaluation of not only the model's performance but also its alignment with the actual needs of users. 
In the following section, we compare several models employed by Aureus:</p><p>• random sample from the set of articles, • Thompson Sampling bandit (our gold standard among recommenders), • Thompson Sampling bandit with user segmentation enabled, • items' cosine similarity to the currently read article, • segmented bandit mixed with item-to-item similarity model, • segmented bandit mixed with user-to-item deep model, • segmented bandit mixed with both item-to-item similarity and user-to-item deep model.</p><p>For comparison, we use two main metrics:</p><p>• uplift -the percentage difference between the average business KPI value of pieces of content returned by the tested model and that returned by the baseline model, • latency -the median response time, measured in milliseconds, of the tested model; this auxiliary metric serves as a sanity check to ensure that the news website provides users with reasonably responsive performance.</p><p>Table <ref type="table">2</ref> presents the results observed during our online testing process. The data clearly demonstrate the synergy effect of the ensembled models, which consistently outperform the individual models. It is also noteworthy that the offline evaluations differ slightly from the online test results, where the combination of similarity-based methods with bandits slightly outperformed the deep model mixed with bandits. From our experience, this discrepancy is common in the context of news and time-sensitive content, where deep models alone may struggle to capture the temporal dynamics.</p><p>In terms of latency, deep models substantially increase the response times of the Aureus recommender. However, this increase remains within acceptable limits and does not negatively impact the user experience. Furthermore, incorporating more than two models in the mixing process does not significantly extend response times.</p></div>
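The uplift metric defined above reduces to a percentage difference of means; a minimal sketch, assuming per-recommendation KPI observations collected for the tested and baseline variants:

```python
def uplift(test_kpi_values, baseline_kpi_values):
    """Percentage difference between the mean business-KPI value of
    content returned by the tested model and by the baseline model."""
    test_mean = sum(test_kpi_values) / len(test_kpi_values)
    base_mean = sum(baseline_kpi_values) / len(baseline_kpi_values)
    return 100.0 * (test_mean - base_mean) / base_mean
```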
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and Future Work</head><p>We demonstrated an enhancement of recommendation systems by integrating multiple models into a unified architecture. This hybrid approach facilitates the seamless incorporation of new recommendation scores, enabling the modeling of diverse recommendation aspects. Future work will involve the exploration of additional features, different mixing strategies and various embedding models to further refine the system.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><surname>Wirtualnemedia</surname></persName>
		</author>
		<ptr target="https://www.wirtualnemedia.pl/artykul/najpopularniejsze-serwisy-strony-glowne-wp-pl-onet-pl-interia-gazeta-pl" />
		<title level="m">Strony główne po zmianach w Mediapanelu. WP wyprzedziła Onet</title>
				<imprint>
			<date type="published" when="2024-08-30">2024. 2024-08-30</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><surname>Similarweb</surname></persName>
		</author>
		<ptr target="https://www.similarweb.com/top-websites/news-and-media/" />
		<title level="m">Top websites ranking. Most visited news &amp; media publishers websites</title>
				<imprint>
			<date type="published" when="2024-09-06">2024. 2024-09-06</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Collaborative filtering for implicit feedback datasets</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Koren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Volinsky</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICDM.2008.22</idno>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="263" to="272" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Ripplenet: Propagating user preferences on the knowledge graph for recommender systems</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guo</surname></persName>
		</author>
		<idno type="DOI">10.1145/3269206.3271739</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="417" to="426" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Trend-responsive user segmentation enabling traceable publishing insights. A case study of a real-world large-scale news recommendation system</title>
		<author>
			<persName><forename type="first">J</forename><surname>Misztal-Radecka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rusiecki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Żmuda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bujak</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2554/paper_08.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th International Workshop on News Recommendation and Analytics in conjunction with the 13th ACM Conference on Recommender Systems, INRA @ RecSys 2019</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the 7th International Workshop on News Recommendation and Analytics in conjunction with the 13th ACM Conference on Recommender Systems, INRA @ RecSys 2019<address><addrLine>Copenhagen, Denmark</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-09-20">September 20, 2019. 2019</date>
			<biblScope unit="volume">2554</biblScope>
			<biblScope unit="page" from="53" to="62" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="https://openai.com/index/new-embedding-models-and-api-updates/" />
		<title level="m">OpenAI, New embedding models and API updates</title>
				<imprint>
			<date type="published" when="2024-08-30">2024. 2024-08-30</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Polbert: Attacking polish nlp tasks with transformers</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kłeczek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the PolEval 2020 Workshop</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Ogrodniczuk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Łukasz</forename><surname>Kobyliński</surname></persName>
		</editor>
		<meeting>the PolEval 2020 Workshop</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
		<respStmt>
			<orgName>Institute of Computer Science, Polish Academy of Sciences</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">NewsBERT: Distilling pre-trained language model for intelligent news application</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.findings-emnlp.280</idno>
		<ptr target="https://aclanthology.org/2021.findings-emnlp.280" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Punta Cana, Dominican Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="3285" to="3295" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Deep neural networks for youtube recommendations</title>
		<author>
			<persName><forename type="first">P</forename><surname>Covington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Adams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sargin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th ACM Conference on Recommender Systems</title>
				<meeting>the 10th ACM Conference on Recommender Systems<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Using confidence bounds for exploitation-exploration trade-offs</title>
		<author>
			<persName><forename type="first">P</forename><surname>Auer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="page" from="397" to="422" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Analysis of thompson sampling for the multi-armed bandit problem</title>
		<author>
			<persName><forename type="first">S</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<ptr target="https://proceedings.mlr.press/v23/agrawal12.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th Annual Conference on Learning Theory</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Mannor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Srebro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Williamson</surname></persName>
		</editor>
		<meeting>the 25th Annual Conference on Learning Theory</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page">26</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Item2vec: Neural item embedding for collaborative filtering</title>
		<author>
			<persName><forename type="first">O</forename><surname>Barkan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Koenigstein</surname></persName>
		</author>
		<idno type="DOI">10.1109/MLSP.2016.7738886</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)</title>
				<imprint>
			<date type="published" when="2016">2016. 2016</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
