<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Contrastive Learning to Improve User Embeddings for Diverse News Recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zijie Tang</string-name>
          <email>z.tang3@student.vu.nl</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manel Slokom</string-name>
          <email>manel.slokom@cwi.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff1">
          <institution>Vrije Universiteit Amsterdam</institution>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Centrum Wiskunde en Informatica</institution>
          ,
          <addr-line>Amsterdam</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1906</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>News recommender systems (NRS) play a key role in delivering personalised content in fast-paced, high-volume environments. However, models optimised solely for accuracy often overlook important societal objectives such as fairness and diversity, leading to over-personalisation, biased exposure, and narrow content consumption. In this paper, we propose a contrastive learning framework for improving user representations in neural news recommendation. We build upon a bi-encoder architecture and introduce self-supervised objectives that group semantically related news items by theme, encouraging the model to bring similar items closer in the embedding space while pushing dissimilar ones apart. This strategy mitigates embedding collapse and guides the model toward producing recommendations with broader topical coverage.</p>
      </abstract>
      <kwd-group>
        <kwd>News recommendation</kwd>
        <kwd>Contrastive learning</kwd>
        <kwd>Multi-objective optimisation</kwd>
        <kwd>Diversity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recommender systems (RSs) are designed to provide personalised suggestions for items that users
are most likely to find relevant or appealing. By analysing users’ preferences and behaviours, these
systems extract valuable insights that help deliver suitable products and services [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As a result, RSs
play an important role in modern business strategies, enabling data-driven decisions by analysing users’
historical choices and behavioural patterns [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Despite impressive performance on accuracy-based metrics, many NRSs suffer from
over-personalisation and embedding bias, leading to narrow content exposure and poor representation
of user interests [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Prior work has identified that such models often converge to degenerate
embedding geometries, where user representations cluster toward dominant topics or highly clicked
content [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This behaviour not only limits diversity but also hinders fairness by disproportionately
amplifying popular content while marginalising niche or minority interests [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        To address these challenges, researchers have increasingly turned to “beyond-accuracy” objectives,
including diversity and fairness, as complementary goals in recommendation [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Diversity improves the
user experience by reducing content redundancy, while fairness ensures equitable exposure for both
users and content providers [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These goals are often intertwined: models that promote
diverse content also tend to improve fairness in exposure [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In this paper, we focus on news recommender systems (NRS), and specifically on improving user
representations in NRSs through the lens of contrastive learning (CL). (Our code is publicly available at:
https://github.com/tan9zj/xnrs-CL/tree/main. These authors contributed equally.) CL is a self-supervised
learning approach that pulls semantically similar instances closer in the embedding space and pushes
dissimilar ones apart [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. CL has recently gained attention in recommendation tasks, especially for
learning robust representations under sparse or implicit feedback [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In the news domain, CL-based
models such as SFCNR [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and SimGCL [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] have demonstrated effectiveness in improving
generalisation and capturing subtle user interests by introducing content-aware perturbations or graph-based
augmentations. We build upon the neural model XNRS [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which uses a bi-encoder architecture for
user-item matching and identifies embedding collapse as a critical limitation. To address this, we
introduce a contrastive learning approach that groups news articles based on their semantic themes
and uses these groupings to construct positive and negative pairs. By encouraging the user encoder to
discriminate between thematic clusters, our approach aims to produce user embeddings that are both
more expressive and better aligned with diverse topical interests.
      </p>
      <p>
        This consideration leads us to the research focus of our study. Our main research question is:
• Main RQ: How can we optimise user embeddings to improve recommendation accuracy and
diversity?
We have the following sub-research questions:
• RQ0: Can we reproduce the research in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]?
• RQ1: How can contrastive learning be applied to improve the quality of user embeddings?
• RQ2: What is the impact of improved user embeddings on recommendation accuracy?
• RQ3: How does our approach perform on beyond-accuracy measures such as diversity?
We summarise our contributions as follows:
• We reproduce the neural news recommendation model proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], verifying its effectiveness
and identifying its limitations regarding embedding bias.
• We introduce contrastive learning mechanisms into the user encoder to better learn users'
interests.
• We evaluate our model on both standard accuracy metrics (e.g., nDCG, MRR, AUC) and
beyond-accuracy metrics (e.g., KL divergence, JS divergence, fair-nDCG), demonstrating improvements
in recommendation quality, diversity, and fairness.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>In this section, we review existing research on news recommendation systems and methods involving
contrastive learning.</p>
      <sec id="sec-2-1">
        <title>2.1. News Recommendation Methods</title>
        <p>News recommender systems aim to provide users with personalised news content based on their
preferences and browsing behaviour. A typical workflow involves three key stages: candidate news
retrieval, personalised ranking, and feedback-based profile updating [14]. When a user visits a platform,
a small set of candidate articles is recalled and then ranked based on inferred interests from historical
interactions. Top-ranked articles are shown to the user, and their click behaviour is used to update
profiles [15]. However, user interests are diverse, context-dependent, and dynamic, making accurate
modelling challenging [16].</p>
        <p>
          Classical recommender systems are typically divided into content-based, collaborative filtering, and
hybrid methods [17]. Content-based systems analyse item features to recommend similar content [17],
while collaborative filtering uses the preferences of similar users [18]. Recent work uses deep learning
to improve NRS performance. Models like NRMS [19], NAML [20], LSTUR [21], and NPA [22] adopt
different neural architectures (e.g., CNNs [23], LSTMs [24], attention mechanisms [25]) to learn
high-quality representations of news content and user behaviour. These methods improve performance by
graph-based models [26] or user-news co-embedding strategies [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] to better capture structural and
semantic relationships in news consumption.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Contrastive Learning for News Recommendation</title>
        <p>
          Contrastive learning (CL) improves representation by contrasting positive and negative samples from
different data views [27]. In the domain of news recommendation, contrastive learning is used to
improve the robustness and expressiveness of user and item embeddings, especially under sparse or
implicit feedback [28, 29, 30]. For instance, SFCNR [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] applies contrastive learning to cold-start users
by aligning news representations under content-based perturbations; [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] shows superior performance
and better alignment of virtual and original features. SimGCL [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] uses directed noise to enforce
uniformity, outperforming graph-based augmentation models while reducing training time. However,
existing methods have limitations [28, 31]. Many rely on simplistic augmentation strategies, such as
random masking and item reordering, that may not capture meaningful variations in user behaviour.
In addition, different augmentations may be required for different datasets, depending on, e.g., their
properties and task types [32]. Such methods often overlook the temporal nature of user interests in
the news domain, where freshness and recency are critical [26]. Some methods also focus heavily on
user-side contrast but ignore item-side diversity or semantic drift, leading to suboptimal generalisation.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Fairness and Diversity in News Recommendations</title>
        <p>In top-N recommendation, fairness aims to ensure equitable treatment of users or groups of
items [33]. Fairness-aware techniques are classified as pre-, in-, or post-processing. Pre-processing
addresses bias in the data. For instance, data reweighting or augmentation can balance underrepresented
groups [34, 35, 36, 37]. Such strategies are beneficial when the model does not use a sensitive
attribute directly but the attribute is correlated with other features in the training data. In-processing
methods add fairness constraints into the learning objective or model architecture, often using
regularisation [38, 39, 40]. For example, ℓ1 and ℓ2 norm regularisation [38] or graph-based relational
modelling [39] help control bias. These techniques enable the model to dynamically balance accuracy
and fairness but may increase training complexity. Post-processing modifies ranked outputs to satisfy
fairness constraints without changing the model, often through re-ranking [41, 42, 43]. FA*IR [41]
guarantees minimum group representation, while LLM-based methods such as IFairLRS [43] adjust for
semantic bias post-hoc. While post-processing offers practical deployment advantages, it typically
requires access to group membership or sensitive attributes to evaluate and enforce fairness constraints.</p>
        <p>With respect to diversity, the goal is to provide a varied selection of recommendations that reduces
redundancy, enriches the user experience, and prevents users from being confined to a narrow range of
content, often referred to as the filter bubble [44]. In recommender systems, diversity generally refers
to the recommended items differing from one another, either in terms of content, category, or user
intent [45, 46]. Promoting diversity in recommendations is important in news domains, where repeated
exposure to similar viewpoints may lead to confirmation bias and reduced information plurality [47, 48].
Diverse recommendations can help users discover novel content, improve user satisfaction, and even
enhance long-term engagement [49]. Several strategies have been proposed to improve diversity in
recommendation outputs. These include pre-processing methods that modify user profiles by adding or
removing items [37]; re-ranking approaches based on pairwise item dissimilarity [45]; topic modelling
techniques to promote topical variety [49]; and multi-objective optimisation approaches that balance
accuracy and diversity [50]. Some approaches explicitly include diversity-aware regularisation terms in
the model's objective function, while others operate in a post-processing stage, adjusting the output
list after ranking. Despite its benefits, improving diversity often comes at the cost of recommendation
accuracy, leading to a trade-off that must be carefully managed depending on the application context [51].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Model Design</title>
      <p>This section describes our proposed news recommendation approach, which integrates a content-based
bi-encoder architecture with a contrastive learning objective to optimise user representation quality.
Then, we describe the training procedure, including the joint loss formulation and optimisation strategy.</p>
      <p>[Figure 1: Overview of the proposed model. A pre-trained language model, additive attention, and a multi-layer perceptron encode the clicked and candidate news titles (news encoder); the user encoder aggregates the clicked-news embeddings with attention and an MLP; click prediction takes the dot product of the user and candidate embeddings and is trained with an MSE loss, while an auxiliary InfoNCE loss over positive and negative samples shapes the user embeddings (contrastive learning).]</p>
      <sec id="sec-3-1">
        <title>3.1. Recommendation Model</title>
        <p>
          The overall structure of our model is shown in Figure 1, which consists of a news encoder, a user
encoder, and a scoring module. We build upon the work of [
          <xref ref-type="bibr" rid="ref4">4, 52, 20</xref>
          ], which proposes a content-based
news recommender using a bi-encoder architecture. The model computes the relevance score between
a user u and a candidate news article v by taking the inner product of their vector representations:
s = u⊤v.    (1)
        </p>
        <sec id="sec-3-1-1">
          <title>News modeling</title>
          <p>
            We use the Siamese sentence-transformer (S-BERT), which shows the best
performance among RoBERTa, NewsBERT, and FastText [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. Considering each news title as a sequence
of d-dimensional token embeddings t_i, with i indexing individual tokens, the news representation
is computed as:
v = MLP(∑_i α_i t_i),    (2)
α_i = softmax(c⊤ tanh(W t_i + b)),    (3)
          </p>
          <p>where α_i are the attention weights for the individual token embeddings t_i. They are predicted from the
respective embeddings as shown in Equation 3, in which W, c, and b are learnable parameters. The
softmax function ensures that all weights sum to one. After the attention mechanism, the model further
transforms the embeddings through a small multi-layer perceptron (MLP).</p>
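          <p>To make the attention computation concrete, the following NumPy sketch implements the additive attention of Equations 2 and 3 for a single title. The dimensions, random initialisation, and function name are illustrative assumptions, not the authors' implementation.</p>

```python
import numpy as np

def additive_attention(T, W, b, c):
    """Additive attention over token embeddings T (n_tokens x d).

    Computes alpha_i = softmax(c^T tanh(W t_i + b)) (Equation 3) and
    returns the weighted sum sum_i alpha_i t_i (Equation 2, before the MLP).
    """
    scores = np.tanh(T @ W.T + b) @ c          # one score per token
    scores = scores - scores.max()             # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax: weights sum to one
    return alpha @ T, alpha                    # pooled title vector, weights

# Toy example: one title with 5 tokens, embedding dimension 8
rng = np.random.default_rng(0)
d, n = 8, 5
T = rng.normal(size=(n, d))                    # token embeddings
W = rng.normal(size=(d, d))
b = np.zeros(d)
c = rng.normal(size=d)
v, alpha = additive_attention(T, W, b, c)
```
          <p>The MLP of Equation 2 would then be applied to the pooled vector v; it is omitted here for brevity.</p>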
        </sec>
        <sec id="sec-3-1-2">
          <title>User modeling</title>
          <p>For each user, we use the same additive attention mechanism over the user's history
news embeddings v_j for the N most recently read news:
u = MLP(∑_j β_j v_j).    (4)</p>
          <p>Here β_j are the attention weights obtained similarly to α_i through Equation 3.</p>
          <p>Contrastive learning</p>
          <p>In our approach, we extend the standard news recommendation pipeline by
integrating contrastive learning to improve user representation learning. While the candidate and
historical news are encoded as usual via the news encoder and aggregated into a user embedding using
the user encoder, we add an auxiliary contrastive loss on the user embeddings. This loss encourages
user embeddings with similar interests (e.g., sharing the same category or theme) to be closer in the
embedding space, while pushing dissimilar ones apart.</p>
          <p>Given a batch of user embeddings u_1, u_2, ..., u_N and their corresponding class labels y_1, y_2, ..., y_N, the
similarity matrix is:
S_ij = (u_i⊤ u_j) / τ,    (5)
where τ is a temperature scaling hyperparameter, and the embeddings are ℓ2-normalised before
computing dot-product similarity. For each anchor user i, the contrastive loss is defined as:
ℓ(i) = − (1 / |P(i)|) ∑_{p∈P(i)} log( exp(S_ip) / ∑_{a≠i} exp(S_ia) ),    (6)
where P(i) denotes the set of indices in the batch that share the same label as i (excluding i itself). The
final loss is averaged over all users in the batch:
L_CL = (1 / N) ∑_{i=1}^{N} ℓ(i).    (7)</p>
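          <p>Equations 5–7 correspond to a supervised contrastive (InfoNCE-style) loss over the batch. A minimal NumPy sketch is given below; the batch size, labels, and function name are illustrative assumptions rather than the authors' code.</p>

```python
import numpy as np

def supervised_contrastive_loss(U, labels, tau=0.08):
    """Equations 5-7: l2-normalise the embeddings, build the similarity
    matrix S_ij = u_i . u_j / tau, and average the per-anchor loss over
    the positives P(i) (same label as the anchor, excluding the anchor)."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    S = U @ U.T / tau
    N = len(labels)
    losses = []
    for i in range(N):
        pos = [p for p in range(N) if labels[p] == labels[i] and p != i]
        if not pos:
            continue                       # anchors without positives are skipped
        denom = sum(np.exp(S[i, a]) for a in range(N) if a != i)
        li = -np.mean([np.log(np.exp(S[i, p]) / denom) for p in pos])
        losses.append(li)
    return float(np.mean(losses))

# Toy batch: six user embeddings with three thematic labels
rng = np.random.default_rng(1)
U = rng.normal(size=(6, 16))
labels = [0, 0, 1, 1, 2, 2]
loss = supervised_contrastive_loss(U, labels)
```
          <p>As a sanity check, a batch whose same-label embeddings coincide and whose groups are orthogonal yields a near-zero loss, lower than that of random embeddings.</p>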
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model training</title>
        <p>The proposed model is trained with a joint loss function that combines the standard click prediction
objective and the contrastive learning objective described earlier.</p>
        <sec id="sec-3-2-1">
          <title>Click prediction loss</title>
          <p>The primary training objective is to predict whether a user will click on a
candidate news article. Given a user embedding u and a candidate news embedding v, the predicted
relevance score is computed as in Equation 1 and the click probability is computed by applying a sigmoid
function to their dot product score:</p>
          <p>ŷ = σ(u⊤v),    (8)
where σ(⋅) denotes the sigmoid function. Our approach employs mean squared error (MSE) loss to
model user-news interaction scores as continuous relevance values.</p>
          <p>
            Let y ∈ [0, 1] be the target label representing the interaction (e.g., 1 for clicked, 0 for not clicked); the
click prediction loss is defined as:
L_MSE = (1 / N) ∑_n (y_n − ŷ_n)²,    (9)
where N is the number of training samples in the batch.
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Combined training objective</title>
          <p>To leverage both supervised click signals and self-supervised
contrastive signals, we combine the click prediction loss with the contrastive learning loss:
L = L_MSE + λ ⋅ L_CL,    (10)
where λ is a hyperparameter that controls the relative weight of the contrastive loss. In practice, λ is
chosen based on validation performance to balance accuracy and representation robustness.</p>
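          <p>A small sketch of the joint objective (Equations 9 and 10), with λ = 0.01 taken from the hyperparameter study in Section 4.3; the function names are illustrative assumptions.</p>

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Equation 9: mean squared error over the batch."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def joint_loss(y_true, y_pred, cl_loss, lam=0.01):
    """Equation 10: L = L_MSE + lambda * L_CL."""
    return mse_loss(y_true, y_pred) + lam * cl_loss

# Toy batch of two impressions, with a precomputed contrastive loss value
total = joint_loss([1.0, 0.0], [0.8, 0.1], cl_loss=2.0)
```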
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <p>This section presents the experimental design used to evaluate the effectiveness of our proposed
approach. We describe the datasets employed, detail the experimental settings, and report key metrics
to analyse model performance. For reproducibility, our code is publicly available at: https://github.com/
tan9zj/xnrs-CL/tree/main.</p>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>We run experiments on a real-world news recommendation dataset Microsoft News Dataset
(MIND) [15]. MIND is a large-scale English news recommendation dataset constructed from anonymised
user behaviour logs on the Microsoft News platform. It contains user click histories, impression logs,
and detailed information about the news articles, including titles, abstracts, categories, and entities
of the title and abstract. For our experiments, we use the MIND-small subset, which is a randomly
sampled portion of the full dataset. The detailed statistics are summarised in Table 1.</p>
        <p>To better align with the objectives of our method, we reorganise the original category labels into a set
of high-level thematic groups specifically designed for our contrastive learning approach. The mapping
between themes and their corresponding categories is presented in Table 2. This reorganisation serves
more than one purpose: first, it facilitates the learning of more generalised user representations by
grouping semantically similar categories; second, it encourages the model to consider a broader range
of content types during training; third, we note that contrastive learning on raw category labels implicitly
forces each user embedding to align strongly with a single content category, which may reinforce narrow
interest profiles and cause the filter bubble effect. We aim to promote diversity by encouraging the
inclusion of multiple distinct categories in the final ranked list. By grouping categories into broader
themes, we aim to guide the contrastive learning process to be sensitive to semantic-level distinctions.
This allows user embeddings to be aligned with higher-level semantic themes rather than overly specific
labels, increasing the likelihood that users are exposed to a diverse set of categories. As a result, the
model generates recommendations that are both relevant and semantically varied, contributing to
improved fairness and diversity.</p>
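        <p>The reorganisation amounts to a simple lookup from fine-grained categories to theme labels used as contrastive classes. The sketch below shows the mechanism only; the groups are invented examples, since the actual theme-to-category mapping is the one given in Table 2.</p>

```python
# Illustrative only: the real theme-to-category mapping is defined in Table 2.
THEME_OF = {
    # hypothetical example groupings over MIND-style category labels
    "leisure": ["entertainment", "music", "movies", "games", "tv"],
    "hard_news": ["news", "finance", "weather"],
    "lifestyle": ["lifestyle", "foodanddrink", "travel"],
}

# Invert to a flat category -> theme lookup table
CATEGORY_TO_THEME = {cat: theme for theme, cats in THEME_OF.items() for cat in cats}

def theme_label(category):
    """Map a fine-grained category to its high-level theme (the contrastive label)."""
    return CATEGORY_TO_THEME.get(category, "other")
```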
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baselines</title>
        <p>
          We follow the work of Möller et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In previous work, they compared the proposed model, XNRS,
with several established baselines, including LSTUR [21], NPA [22], NAML [20], NRMS [19], CAUM [53],
and a late fusion (LF) approach based on NRMS introduced by Iana et al. [52]. XNRS outperformed
most baselines overall, although LSTUR and NAML yielded better results on specific metrics. Based on
these findings, we select XNRS, LSTUR, and NAML as baseline models for our experiments, as they have
previously demonstrated competitive performance.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Hyperparameter tuning of CL</title>
        <p>There are two key hyperparameters in our contrastive learning setup: the temperature scaling factor (τ)
and the contrastive loss weight (λ). To assess the sensitivity of model performance to these parameters,
we perform a grid search over a predefined range.</p>
        <p>For the temperature τ, we evaluate values in {0.08, 0.1, 0.9}, and for the contrastive loss weight λ,
we explore values in {0.005, 0.01, 0.012, 0.02}. Based on the experimental results, we find that setting
τ = 0.08 and λ = 0.01 yields the best overall performance, and we use these values for all subsequent
experiments.</p>
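        <p>The grid search can be sketched as an exhaustive sweep over the two ranges above; the `evaluate` function below is a toy stand-in for a full train-and-validate run, not the authors' pipeline.</p>

```python
import itertools

def grid_search(evaluate, taus=(0.08, 0.1, 0.9), lams=(0.005, 0.01, 0.012, 0.02)):
    """Exhaustive grid over the temperature tau and contrastive weight lambda.

    `evaluate(tau, lam)` stands in for training the model and returning a
    validation score (e.g. AUC); the best configuration maximises it.
    """
    return max(itertools.product(taus, lams), key=lambda p: evaluate(*p))

# Toy score peaking at the paper's reported optimum (tau = 0.08, lambda = 0.01)
score = lambda tau, lam: -abs(tau - 0.08) - abs(lam - 0.01)
best = grid_search(score)
```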
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Metrics</title>
        <p>Click-Prediction Evaluation For accuracy metrics, we use the well-established ranking metrics
normalised discounted cumulative gain (nDCG), mean reciprocal rank (MRR), click-through rate (CTR),
and area under the receiver operating characteristic curve (AUC). These metrics evaluate the quality of
ranked recommendations from multiple perspectives: nDCG assesses ranking quality by considering
both relevance and position of recommended items [54]. The highly relevant items are more useful
when appearing earlier in the ranking than the less relevant items. MRR evaluates the ranking quality
based on the position of the first relevant item in the recommendation list. A higher MRR indicates that
relevant items appear earlier in the ranking list. CTR measures user engagement by calculating the ratio
of clicked recommendations to the total number of displayed recommendations. AUC measures the
probability that a randomly chosen relevant item is ranked higher than a randomly chosen irrelevant
item. A higher value indicates the effectiveness of a recommender system in distinguishing relevant
items from irrelevant ones.</p>
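        <p>For reference, minimal NumPy versions of MRR and nDCG on a single ranked impression are sketched below (per-impression scores; averaging over all impressions is omitted, and the function names are our own).</p>

```python
import numpy as np

def mrr(relevance):
    """Reciprocal rank of the first relevant item in a ranked list
    (0 if no item is relevant)."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg(relevance, k=5):
    """nDCG@k: DCG with a log2 position discount, normalised by the ideal DCG."""
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[: len(ideal)]).sum())
    return dcg / idcg if idcg > 0 else 0.0
```
        <p>For example, a list whose only relevant item sits at rank 2 yields MRR 0.5, while a list led by its relevant item attains the maximal nDCG of 1.</p>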
        <p>Diversity Evaluation To evaluate the diversity of the recommended results, we adopt
distribution-based metrics that compare the statistical difference between the category distributions of the
recommended items and a reference distribution, typically the user’s historical reading distribution. We use
Kullback–Leibler (KL) divergence and Jensen–Shannon (JS) divergence to quantify this dissimilarity [55].
Fairness Evaluation To evaluate the relevance and category-level fairness of recommendations,
particularly toward minority categories, we adopt a modified version of the normalised Discounted
Cumulative Gain (nDCG) called fair-nDCG [56].</p>
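        <p>The KL and JS computations over category distributions can be sketched as follows; the example distributions are invented, whereas in our evaluation the reference is the user's historical reading distribution.</p>

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two category distributions
    (a small epsilon avoids division by zero for empty categories)."""
    p = np.asarray(p, float) + eps; p /= p.sum()
    q = np.asarray(q, float) + eps; q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Jensen-Shannon divergence: symmetrised KL via the mixture m,
    bounded above by log(2) in nats."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    m = 0.5 * (p / p.sum() + q / q.sum())
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

hist = [0.7, 0.2, 0.1]   # toy historical category distribution (reference)
recs = [0.3, 0.4, 0.3]   # toy category distribution of the recommended list
```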
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>In this section, we analyse our model performance across multiple aspects, including recommendation
accuracy, representation geometry, fairness and diversity of recommendations, and robustness.</p>
      <sec id="sec-5-1">
        <title>5.1. Recommendation performance</title>
        <p>
          We start by evaluating the click-prediction performance of our proposed model against several strong
neural baselines. Our method, XNRS+CL, extends the original XNRS model [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] by incorporating
contrastive learning. The goal is to improve user representations by encouraging semantically similar news
items to be embedded closer in the latent space. Table 3 shows our experimental results.
        </p>
        <p>Overall, our results in Table 3 show that both contrastive learning variants of XNRS (XNRS+CL(theme)
and XNRS+CL(category)) consistently outperform the original XNRS model across all metrics, nDCG,
MRR, CTR, and AUC. This could be explained by the fact that contrastive learning contributes to better
user-item matching, likely by improving the quality of user embeddings.</p>
        <p>Notably, XNRS+CL(category) shows slightly stronger performance than the theme-based variant
on several metrics, including nDCG@5, MRR, and CTR@1. Although the differences are modest, this
suggests that the type of semantic grouping used to define positive contrastive pairs may affect specific
aspects of model behaviour. Further exploration of grouping strategies could provide additional insight
into the relationship between content semantics and recommendation effectiveness.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Geometry of the embeddings</title>
        <p>
          To gain insight into how contrastive learning shapes the user representation space, we visualise the
user embeddings before and after applying our theme-based contrastive learning objective. Specifically,
we project the high-dimensional user embeddings onto a polar coordinate plot to enable qualitative
comparison, following the approach of Möller et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>As illustrated in Figure 2, the contrastive learning objective encourages the user embeddings to
expand and become more geometrically structured. In the original embedding space (Figure 2, before CL),
user embeddings are relatively concentrated and show limited separation. After contrastive learning
(Figure 2, after CL), the embeddings are more widely dispersed. This geometric expansion suggests that
contrastive learning promotes better separation among users. Figure 3 provides an additional t-SNE
visualisation of user embeddings. After applying contrastive learning, the embeddings show clearer
clustering structures and are more directionally aligned.</p>
        <p>[Figure 3: t-SNE of user embeddings before CL (left) and after CL (right).]</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Fairness and Diversity of the Recommendations</title>
        <p>In addition to accuracy, we evaluate our model on fairness and diversity criteria to assess its ability to
provide balanced exposure across categories, especially minority ones.</p>
        <p>Fairness To evaluate the fairness of the recommendation lists, we adopt the fair-nDCG metric, a
modification of the traditional nDCG that assigns relevance only to items belonging to predefined
underrepresented or minority categories. In our experiments, we define the minority categories as: tv,
entertainment, music, kids, movies, middleeast, games, weather, and autos.</p>
        <p>As shown in Figure 4, we compare the fairness performance across top-k recommendations between
different model variants. The green line represents the ground-truth distribution from the test set,
while the orange and blue lines correspond to models trained with contrastive learning (CL) based on
theme-level and category-level grouping, respectively.</p>
        <p>We observe that both contrastive variants outperform the baseline in terms of minority category
coverage, especially as the value of k increases. The curves for CL+Theme and CL+Category almost
entirely overlap, indicating that both variants contribute similarly to improved fairness, despite using
different semantic groupings during training. This trend is also reflected in Table 3, where the nDCG
scores of these two variants are nearly identical. This consistency may suggest that the introduction of
contrastive signals, rather than the specific grouping schema, is the primary factor driving improved
fairness in recommendation ranking.</p>
        <p>Diversity We quantify the diversity of the recommendations using statistical divergence measures. To
evaluate the topical diversity of the recommendation lists, we compute the average Kullback–Leibler (KL)
divergence and Jensen–Shannon (JS) divergence between the category distributions of the recommended
items and a reference distribution, either derived from the user’s historical reading behaviour or the
global news category distribution. Here, we consider the user’s reading history distribution as the
reference one. We observe that both contrastive learning variants, whether based on category or theme,
produce remarkably similar diversity outcomes. Specifically, the average KL divergence across all users
is 2.2192, and the mean JS divergence is 0.4941 for both variants. Figure 5 illustrates the distribution of
divergence scores alongside the user count. These findings suggest that, despite differences in how user
representations are constructed, the overall category diversity of the generated recommendation lists
remains comparable.</p>
        <p>However, when we restrict the evaluation to the top-5 or top-10 recommended items, the
theme-based variant shows better diversity. Table 4 presents a comparison between XNRS+CL(theme) and
XNRS+CL(category) under Top-5 and Top-10 cutoffs.</p>
        <p>These results suggest that our theme-based methods lead to a more diverse set of recommended news
articles, particularly in the top-ranked positions where user attention is highest. Although absolute
differences are relatively small, the consistent advantage of the theme-based variant at the top position
indicates that higher semantic granularity in contrastive learning, such as grouping by theme rather
than a single category, can encourage more varied and semantically rich recommendation lists.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Ablation study</title>
        <p>
To further understand the impact of different model components, we conduct an ablation study
comparing multiple architectural and training variations, including different user representation strategies,
scoring functions, and contrastive grouping granularity (themes vs. categories). We name these
variants following the setup from previous work [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          First, we remove the MLP inside the user encoder (Equation 4), obtaining the base model. Then,
we replace the attention mechanisms in both the news and user encoders (Equations 2 and 4) with a
simple average pooling over the input embeddings, naming this variant Mean. For the final matching
layer between the user representation u and candidate news items c, we experiment with two types of
scoring functions:
• Dot Product (dot): This baseline computes the interaction score as the dot product between the
user and item embeddings:
s = u⊤c (11)
Optionally, the embeddings can be ℓ2-normalised before computing the similarity. This is efficient
and widely used in many recommendation models.
• Bilinear (bilin): Inspired by prior work [
          <xref ref-type="bibr" rid="ref4">57, 4</xref>
          ] on learning user-item interactions, we introduce a
trainable bilinear transformation via a learnable matrix W ∈ ℝ^{d×d}:
s = u⊤Wc (12)
        </p>
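The two scoring functions can be sketched in a few lines; the embedding dimension and the random W here are placeholders (in the model, W is a trained parameter):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                       # embedding dimension (illustrative)
u = rng.normal(size=d)      # user representation from the user encoder
c = rng.normal(size=d)      # candidate news embedding

def score_dot(u, c, normalise=False):
    """Dot-product scoring: no extra parameters."""
    if normalise:  # optional l2-normalisation turns this into cosine similarity
        u = u / np.linalg.norm(u)
        c = c / np.linalg.norm(c)
    return float(u @ c)

def score_bilin(u, c, W):
    """Bilinear scoring: a learnable matrix W mediates the interaction."""
    return float(u @ W @ c)

W = rng.normal(size=(d, d))  # stand-in for the trained matrix
```

Note that the bilinear form reduces to the dot product when W is the identity, so it strictly generalises the baseline at the cost of d×d extra parameters.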
        <p>
Our results are summarised in Table 5. We observe that base+dot+theme consistently
outperforms base+bilin+theme. This indicates that the dot-product scoring function is more effective
than the bilinear alternative in our contrastive learning setup. Among the model variants, the standard
configuration outperforms the mean variant across most metrics, consistent with prior findings [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] that
identify the standard model as the strongest among the three.
        </p>
        <p>We note that when contrastive learning is applied to the lightweight base model, its performance
improves substantially, outperforming even NAML on several metrics. In the case of the base model,
contrastive learning is applied directly to the user embeddings produced by the encoder, without
additional architectural complexity. Surprisingly, this straightforward setup yields strong performance.
However, the reasons behind its effectiveness remain unclear and warrant further investigation.</p>
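The kind of contrastive supervision applied to the embeddings can be illustrated with a generic InfoNCE-style loss in which items sharing a group label (theme or category) act as positives. This is an assumed formulation for illustration only; the function name, temperature value, and grouping input are placeholders rather than the paper's exact objective:

```python
import numpy as np

def contrastive_loss(embeddings, groups, temperature=0.1):
    """Supervised InfoNCE sketch: rows of `embeddings` with the same entry in
    `groups` are pulled together; all other rows act as negatives."""
    # Cosine similarities, scaled by temperature.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)          # exclude self-similarity
    # Row-wise log-softmax over all other items.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    loss, n = 0.0, 0
    for i, g in enumerate(groups):
        pos = [j for j, h in enumerate(groups) if h == g and j != i]
        if pos:                             # items with no positive are skipped
            loss -= log_prob[i, pos].mean()
            n += 1
    return loss / max(n, 1)
```

Minimising this pulls same-group embeddings closer while pushing different-group embeddings apart, which is the intuition behind using theme or category labels as the grouping signal.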
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>Our results demonstrate that incorporating contrastive learning into a bi-encoder news recommendation
architecture improves user representation quality and overall recommendation accuracy. Both
theme-based and category-based grouping strategies outperform the base model, with particularly strong
performance from the theme-based variant.</p>
      <p>Beyond accuracy, we observe measurable gains in diversity and fairness metrics, especially at deeper
ranks (e.g., top-30). However, these improvements are less pronounced at top-5 and top-10, where
real-world user attention is typically concentrated. This discrepancy highlights a key challenge in
designing recommendation models that balance fairness and diversity with real-world usability.</p>
      <p>Another important observation is the strong performance of the lightweight base model when
combined with contrastive learning. Despite its architectural simplicity, it competes closely with, and
sometimes outperforms, more complex models like NAML. This suggests that contrastive learning
can be a powerful technique even in low-complexity setups and motivates further exploration into
contrastive signals and pairing strategies.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and Future Work</title>
      <p>In this paper, we propose a contrastive learning framework for content-based news recommendation
using a bi-encoder architecture.</p>
      <p>
        Outlook Our method builds upon the XNRS model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and introduces self-supervised learning
objectives that group semantically similar news items by themes or categories. The resulting user
embeddings lead to improved performance across standard ranking metrics and beyond-accuracy
objectives. Extensive experiments on the MIND dataset show that our contrastive learning variants
achieve competitive or superior performance compared to strong neural baselines such as LSTUR and
NAML. We found that even lightweight model variants benefit from contrastive learning supervision.
      </p>
      <p>In addition to improvements in accuracy, our method contributes positively to diversity and fairness.
While gains are more prominent at deeper ranks, the theme-based contrastive setup shows promise for
encouraging a more balanced exposure of content.</p>
      <p>Future Work One key limitation of the current model is its reliance solely on news titles for content
encoding. Other textual fields, such as subtitles, abstracts, and metadata (e.g., keywords, subcategories),
offer rich semantic information that could improve representation learning. Incorporating these
modalities may help capture finer-grained topical signals and disambiguate semantically similar content.</p>
      <p>Future work should also address fairness in the top-ranked positions, where user engagement is
highest. Although our method improves fairness at top-30 positions, the impact is limited at
top-5 or top-3 ranks. Fairness-aware re-ranking techniques or adaptive contrastive loss functions that
prioritise exposure fairness at higher ranks may help bridge this gap. Another promising direction is the
exploration of alternative contrastive learning strategies. These may include hierarchical or dynamic
positive pair generation, temporal-aware contrastive signals, or dual user-item contrastive frameworks.
Such extensions could further enhance both the performance and generalisability of contrastive learning
in news recommendation systems.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This publication is part of the AI, Media &amp; Democracy Lab. For more information about the lab and its
further activities, visit https://www.aim4dem.nl/. The authors thank Bo-Chan Jack, Sven Lankester, and
Natalie Halaskova for their valuable advice.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
<p>The authors declare that GenAI is only used to identify and correct grammatical errors, typos, and other
writing mistakes. This helps improve the clarity and professionalism of the writing.</p>
<p>[14] C. Wu, F. Wu, Y. Huang, X. Xie, Personalized news recommendation: Methods and challenges,
ACM Transactions on Information Systems 41 (2023) 1–50.
[15] F. Wu, Y. Qiao, J.-H. Chen, C. Wu, T. Qi, J. Lian, D. Liu, X. Xie, J. Gao, W. Wu, et al., MIND: A
large-scale dataset for news recommendation, in: Proceedings of the 58th annual meeting of the
association for computational linguistics, 2020, pp. 3597–3606.
[16] C. Feng, M. Khan, A. U. Rahman, A. Ahmad, News recommendation systems-accomplishments,
challenges &amp; future directions, IEEE Access 8 (2020) 16702–16725.
[17] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of
the state-of-the-art and possible extensions, IEEE transactions on knowledge and data engineering
17 (2005) 734–749.
[18] J. B. Schafer, D. Frankowski, J. Herlocker, S. Sen, Collaborative filtering recommender systems, in:
The adaptive web: methods and strategies of web personalization, Springer, 2007, pp. 291–324.
[19] C. Wu, F. Wu, S. Ge, T. Qi, Y. Huang, X. Xie, Neural news recommendation with multi-head
self-attention, in: Proceedings of the 2019 conference on empirical methods in natural language
processing and the 9th international joint conference on natural language processing
(EMNLP-IJCNLP), 2019, pp. 6389–6394.
[20] C. Wu, F. Wu, M. An, J. Huang, Y. Huang, X. Xie, Neural news recommendation with attentive
multi-view learning, arXiv preprint arXiv:1907.05576 (2019).
[21] M. An, F. Wu, C. Wu, K. Zhang, Z. Liu, X. Xie, Neural news recommendation with long-and
short-term user representations, in: Proceedings of the 57th annual meeting of the association for
computational linguistics, 2019, pp. 336–345.
[22] C. Wu, F. Wu, M. An, J. Huang, Y. Huang, X. Xie, NPA: Neural news recommendation with
personalized attention, in: Proceedings of the 25th ACM SIGKDD international conference on
knowledge discovery &amp; data mining, 2019, pp. 2576–2584.
[23] R. Yamashita, M. Nishio, R. K. G. Do, K. Togashi, Convolutional neural networks: an overview and
application in radiology, Insights into imaging 9 (2018) 611–629.
[24] A. Graves, A. Graves, Long short-term memory, Supervised sequence labelling with recurrent
neural networks (2012) 37–45.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin,
Attention is all you need, Advances in neural information processing systems 30 (2017).
[26] S. Ge, C. Wu, F. Wu, T. Qi, Y. Huang, Graph enhanced representation learning for news
recommendation, in: Proceedings of the web conference 2020, 2020, pp. 2863–2869.
[27] J. Giorgi, O. Nitski, B. Wang, G. Bader, DeCLUTR: Deep contrastive learning for unsupervised
textual representations, in: C. Zong, F. Xia, W. Li, R. Navigli (Eds.), Proceedings of the 59th
Annual Meeting of the Association for Computational Linguistics and the 11th International
Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for
Computational Linguistics, 2021, pp. 879–895. doi:10.18653/v1/2021.acl-long.72.
[28] L. Wu, H. Lin, C. Tan, Z. Gao, S. Z. Li, Self-supervised learning on graphs: Contrastive, generative,
or predictive, IEEE Transactions on Knowledge and Data Engineering (2021).
[29] A. Iana, G. Glavaš, H. Paulheim, Train once, use flexibly: A modular framework for multi-aspect
neural news recommendation, arXiv preprint arXiv:2307.16089 (2023).
[30] X. Ren, W. Wei, L. Xia, C. Huang, A comprehensive survey on self-supervised learning for
recommendation, ACM Computing Surveys (2024).
[31] W. He, G. Sun, J. Lu, X. S. Fang, Candidate-aware graph contrastive learning for recommendation,
in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development
in Information Retrieval, SIGIR, 2023, p. 1670–1679. doi:10.1145/3539618.3591647.
[32] W. Jin, T. Derr, H. Liu, Y. Wang, S. Wang, Z. Liu, J. Tang, Self-supervised learning on graphs: Deep
insights and new direction, arXiv preprint arXiv:2006.10141 (2020).
[33] E. Pitoura, K. Stefanidis, G. Koutrika, Fairness in rankings and recommendations: an overview,
The VLDB Journal (2022) 1–28.
[34] F. Calmon, D. Wei, B. Vinzamuri, K. Natesan Ramamurthy, K. R. Varshney, Optimized
preprocessing for discrimination prevention, Advances in neural information processing systems 30
(2017).
[35] L. Chen, L. Wu, K. Zhang, R. Hong, D. Lian, Z. Zhang, J. Zhou, M. Wang, Improving
recommendation fairness via data augmentation, in: Proceedings of the ACM Web Conference 2023, 2023, pp.
1012–1020.
[36] K. Balasubramanian, A. Alshabanah, E. Markowitz, G. Ver Steeg, M. Annavaram, Biased user
history synthesis for personalized long-tail item recommendation, in: Proceedings of the 18th
ACM Conference on Recommender Systems, 2024, pp. 189–199.
[37] M. Slokom, S. Daniil, L. Hollink, How to diversify any personalized recommender?, in:
Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025,
Lucca, Italy, April 6–10, 2025, Proceedings, Part IV, 2025, p. 307–323. URL: https://doi.org/10.1007/
978-3-031-88717-8_23. doi:10.1007/978-3-031-88717-8_23.
[38] R. Burke, N. Sonboli, A. Ordonez-Gauger, Balanced neighborhoods for multi-sided fairness in
recommendation, in: Conference on fairness, accountability and transparency, PMLR, 2018, pp.
202–214.
[39] B. Yang, D. Liu, T. Suzumura, R. Dong, I. Li, Going beyond local: Global graph-enhanced
personalized news recommendations, in: Proceedings of the 17th ACM Conference on Recommender
Systems, 2023, pp. 24–34.
[40] L. Boratto, F. Fabbri, G. Fenu, M. Marras, G. Medda, Fair augmentation for graph collaborative
filtering, in: Proceedings of the 18th ACM Conference on Recommender Systems, 2024, pp.
158–168.
[41] M. Zehlike, F. Bonchi, C. Castillo, S. Hajian, M. Megahed, R. Baeza-Yates, FA*IR: A fair top-k
ranking algorithm, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge
Management, 2017, pp. 1569–1578.
[42] S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-aware ranking in search &amp; recommendation systems
with application to linkedin talent search, in: Proceedings of the 25th acm sigkdd international
conference on knowledge discovery &amp; data mining, 2019, pp. 2221–2231.
[43] M. Jiang, K. Bao, J. Zhang, W. Wang, Z. Yang, F. Feng, X. He, Item-side fairness of large language
model-based recommendation system, in: Proceedings of the ACM on Web Conference 2024, 2024,
pp. 4717–4726.
[44] T. T. Nguyen, P.-M. Hui, F. M. Harper, L. Terveen, J. A. Konstan, Exploring the filter bubble: the
effect of using recommender systems on content diversity, in: Proceedings of the 23rd ACM
international conference on World wide web, 2014, pp. 677–686.
[45] S. Vargas, P. Castells, Rank and relevance in novelty and diversity metrics for recommender
systems, in: Proceedings of the 5th ACM conference on Recommender systems, 2011, pp. 109–116.
[46] M. Kunaver, T. Požrl, Diversity in recommender systems–a survey, Knowledge-Based Systems
123 (2017) 154–162.
[47] E. Bozdag, Bias in algorithmic filtering and personalization, Ethics and Information Technology
15 (2013) 209–227.
[48] N. Helberger, K. Karppinen, L. D’Acunto, Exposure diversity as a design principle for recommender
systems, Information, Communication &amp; Society 21 (2018) 191–207.
[49] C.-N. Ziegler, S. M. McNee, J. A. Konstan, G. Lausen, Improving recommendation lists through
topic diversification, in: Proceedings of the 14th international conference on World Wide Web,
2005, pp. 22–32.
[50] M. Zhang, N. Hurley, J. Peng, Avoiding monotony: improving the diversity of recommendation
lists, Proceedings of the 2008 ACM conference on Recommender systems (2008) 123–130.
[51] P. Castells, S. Vargas, J. Wang, Novelty and diversity in recommender systems, Recommender</p>
      <p>Systems Handbook (2021) 845–884.
[52] A. Iana, G. Glavas, H. Paulheim, Simplifying content-based neural news recommendation: On user
modeling and training objectives, in: Proceedings of the 46th international ACM SIGIR conference
on research and development in information retrieval, 2023, pp. 2384–2388.
[53] T. Qi, F. Wu, C. Wu, Y. Huang, News recommendation with candidate-aware user modeling, in:
Proceedings of the 45th international ACM SIGIR conference on research and development in
information retrieval, 2022, pp. 1917–1921.
[54] K. Järvelin, J. Kekäläinen, IR evaluation methods for retrieving highly relevant documents, in:
ACM SIGIR Forum, volume 51, 2017, pp. 243–250.
[55] H. Steck, Calibrated recommendations, in: Proceedings of the 12th ACM Conference on
Recommender Systems, RecSys ’18, 2018, p. 154–162.
[56] E. Pitoura, K. Stefanidis, G. Koutrika, Fairness in rankings and recommendations: an overview,
The VLDB Journal 31 (2021) 431–458.
[57] L. Möller, S. Padó, Understanding the relation of user and news representations in content-based
neural news recommendation, The 10th International Workshop on News Recommendation and
Analytics (INRA) (2022).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. E.</given-names>
            <surname>Zaizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qassimi</surname>
          </string-name>
          ,
<string-name>
            <given-names>S.</given-names>
            <surname>Rakrak</surname>
          </string-name>
          ,
          <article-title>Multi-objective optimization with recommender systems: A systematic review</article-title>
          ,
          <source>Information Systems</source>
          <volume>117</volume>
          (
          <year>2023</year>
          )
          <fpage>102233</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kuo</surname>
          </string-name>
, C.-K. Chen,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Keng</surname>
          </string-name>
          ,
          <article-title>Application of hybrid metaheuristic with perturbation-based k-nearest neighbors algorithm and densest imputation to collaborative filtering in recommender systems</article-title>
          ,
          <source>Information Sciences 575</source>
          (
          <year>2021</year>
          )
          <fpage>90</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
<string-name>
            <given-names>E.</given-names>
            <surname>Pariser</surname>
          </string-name>
          ,
          <article-title>The filter bubble: What the Internet is hiding from you</article-title>
          ,
          <source>Penguin UK</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Padó</surname>
          </string-name>
          ,
          <article-title>Explaining neural news recommendation with attributions onto reading histories</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>16</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , X. Cheng, C. C. Aggarwal, T. Derr,
          <article-title>Fairness and diversity in recommender systems: a survey</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>16</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McNee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <article-title>Being accurate is not enough: how accuracy metrics have hurt recommender systems</article-title>
          ,
          <source>in: CHI '06 Extended Abstracts on Human Factors in Computing Systems, CHI EA '06</source>
          ,
Association for Computing Machinery,
          <year>2006</year>
          , p.
          <fpage>1097</fpage>
          -
          <lpage>1101</lpage>
          . doi:10.1145/1125451.1125659.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          , G. Adomavicius,
          <article-title>Recommendations with a purpose</article-title>
          ,
          <source>in: Proceedings of the 10th ACM conference on recommender systems</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          , Profairrec:
          <article-title>Provider fairness-aware news recommendation</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1164</fpage>
          -
          <lpage>1173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          ,
<string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <source>Fairness in Recommender Systems</source>
          , Springer US,
          <year>2022</year>
          , pp.
          <fpage>679</fpage>
          -
          <lpage>707</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kornblith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Norouzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>A simple framework for contrastive learning of visual representations</article-title>
          ,
          <source>in: International conference on machine learning</source>
          ,
<source>PMLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1597</fpage>
          -
          <lpage>1607</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          , T.-S. Chua,
          <article-title>Contrastive learning for cold-start recommendation</article-title>
          ,
          <source>in: Proceedings of the 29th ACM international conference on multimedia</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>5382</fpage>
          -
          <lpage>5390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Self-supervised contrastive enhancement with symmetric few-shot learning towers for cold-start news recommendation</article-title>
          ,
          <source>in: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>945</fpage>
          -
          <lpage>954</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <article-title>Are graph augmentations necessary? simple graph contrastive learning for recommendation</article-title>
          ,
          <source>in: Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1294</fpage>
          -
          <lpage>1303</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>