<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Comparing Recommendation Losses under Negative Sampling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giulia Di Teodoro</string-name>
          <email>giulia.di.teodoro@ing.unipi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Siciliano</string-name>
          <email>siciliano@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Tonellotto</string-name>
          <email>nicola.tonellotto@unipi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Silvestri</string-name>
          <email>fsilvestri@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer, Control and Management Engineering, Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Information Engineering Department, University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Loss functions, such as categorical cross-entropy (CCE), binary cross-entropy (BCE), and Bayesian personalized ranking (BPR), play a central role in training modern recommender systems. Although evaluations are often based on ranking metrics, such as Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR), a direct understanding of how these losses relate to target metrics remains incomplete. Furthermore, full-item training is computationally prohibitive, which has led to the widespread use of negative sampling. In this extended abstract, we (i) derive theoretical equivalences and bounds relating these loss functions under negative sampling; (ii) prove that BPR and CCE become identical under a single negative sample; and (iii) show that BCE provides the tightest bound on NDCG and MRR when negative sampling is used. We complement our theoretical findings with empirical results on five datasets and four neural architectures, which consistently validate the theory.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Loss Functions</kwd>
        <kwd>Negative Sampling</kwd>
        <kwd>Ranking Metrics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Theoretical Analysis</title>
      <p>We consider a user $u$ and a set of items $\mathcal{I}$, where $\mathcal{P} \subset \mathcal{I}$ are the positive (interacted) items and $\mathcal{N}$ are the negative (non-interacted) items. Let $s(u, i)$ denote the model's score for item $i$.</p>
      <sec id="sec-2-1">
        <title>2.1. Loss Definitions under Sampling</title>
        <p>For a positive item $i^{+} \in \mathcal{P}$ and $m$ sampled negative items $i_{j}^{-} \in \mathcal{N}$, the losses are defined as follows.</p>
        <p>CCE - Categorical cross-entropy:
$$\mathcal{L}_{\mathrm{CCE}} = -\log \frac{e^{s(u, i^{+})}}{e^{s(u, i^{+})} + \sum_{j=1}^{m} e^{s(u, i_{j}^{-})}}$$</p>
        <p>BCE - Binary cross-entropy:
$$\mathcal{L}_{\mathrm{BCE}} = -\log \sigma\big(s(u, i^{+})\big) - \sum_{j=1}^{m} \log\Big(1 - \sigma\big(s(u, i_{j}^{-})\big)\Big)$$</p>
        <p>BPR - Bayesian personalized ranking:
$$\mathcal{L}_{\mathrm{BPR}} = -\sum_{j=1}^{m} \log \sigma\Big(s(u, i^{+}) - s(u, i_{j}^{-})\Big)$$</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ranking Metrics</title>
        <p>The Normalized Discounted Cumulative Gain (NDCG) is a widely used recommendation metric that accounts for the graded relevance of items depending on their position in the ranked list. If there is only one relevant item and $r^{+}$ is its rank position, it reduces to
$$\mathrm{NDCG}(r^{+}) = \frac{1}{\log_{2}(1 + r^{+})}$$</p>
        <p>Another key ranking measure is the Mean Reciprocal Rank (MRR), which computes the inverse of the rank position $r^{+}$ of the first relevant item in the recommendations:
$$\mathrm{MRR}(r^{+}) = \frac{1}{r^{+}}$$</p>
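        <p>In this single-relevant-item case both metrics are simple functions of the rank $r^{+}$; a small sketch (our own naming, not from the paper):</p>
        <preformat>
import math

def ndcg_single(rank):
    # NDCG(r+) = 1 / log2(1 + r+) with exactly one relevant item
    return 1.0 / math.log2(1 + rank)

def mrr_single(rank):
    # MRR(r+) = 1 / r+, reciprocal rank of the first relevant item
    return 1.0 / rank

for r in (1, 2, 5, 10):
    print(r, round(ndcg_single(r), 3), round(mrr_single(r), 3))
        </preformat>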
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Equivalence of BPR and CCE</title>
        <p>Under a single negative sample ($m = 1$), we prove that $\mathcal{L}_{\mathrm{BPR}}$ is equivalent to $\mathcal{L}_{\mathrm{CCE}}$.</p>
        <p>Proposition 1. $\mathcal{L}_{\mathrm{BPR}} = \mathcal{L}_{\mathrm{CCE}}$ if one negative item ($m = 1$) is sampled for each user.</p>
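        <p>The identity follows because $-\log \sigma\big(s^{+} - s^{-}\big) = \log\big(1 + e^{-(s^{+} - s^{-})}\big) = -s^{+} + \log\big(e^{s^{+}} + e^{s^{-}}\big)$, which is exactly the two-class softmax cross-entropy. A quick numerical check (a sketch in our own notation):</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    s_pos, s_neg = rng.normal(size=2)
    cce = -s_pos + np.log(np.exp(s_pos) + np.exp(s_neg))   # CCE with m = 1
    bpr = np.log1p(np.exp(-(s_pos - s_neg)))               # -log sigma(s+ - s-)
    assert np.isclose(cce, bpr)   # identical, as Proposition 1 states
print("BPR and CCE coincide for m = 1")
        </preformat>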
        <p>This highlights that, when sampling only one negative per positive, optimizing CCE or BPR leads to the same parameter updates.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Equivalence of Global Minima</title>
        <p>We now present a result that establishes the equivalence of the global minima of the three loss functions when a single negative is sampled and item scores are bounded.</p>
        <p>Proposition 2. If $s^{+}, s^{-} \in [-s_{M}, s_{M}]$ for some $s_{M} > 0$, then:
$$\arg\min_{s^{+}} \mathcal{L}_{\mathrm{CCE}} = \arg\min_{s^{+}} \mathcal{L}_{\mathrm{BCE}} = \arg\min_{s^{+}} \mathcal{L}_{\mathrm{BPR}} = s_{M}, \qquad \arg\min_{s^{-}} \mathcal{L}_{\mathrm{CCE}} = \arg\min_{s^{-}} \mathcal{L}_{\mathrm{BCE}} = \arg\min_{s^{-}} \mathcal{L}_{\mathrm{BPR}} = -s_{M}$$</p>
        <p>This proposition implies that, under bounded scores and single negative sampling, BPR, BCE, and CCE converge to the same optimal solution. Practically, it means that the choice of loss function does not affect the ideal parameter configuration.</p>
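        <p>A brute-force grid search illustrates the proposition; the bound $s_{M} = 3$ and the grid resolution below are our own assumptions for the sketch:</p>
        <preformat>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

s_max = 3.0                                  # assumed score bound s_M
grid = np.linspace(-s_max, s_max, 61)        # candidate scores in [-s_M, s_M]
losses = {
    "CCE": lambda p, n: -p + np.log(np.exp(p) + np.exp(n)),
    "BCE": lambda p, n: -np.log(sigmoid(p)) - np.log(1.0 - sigmoid(n)),
    "BPR": lambda p, n: -np.log(sigmoid(p - n)),
}
for name, loss in losses.items():
    values = np.array([[loss(p, n) for n in grid] for p in grid])
    i, j = np.unravel_index(values.argmin(), values.shape)
    # every loss reports the same minimizer: s+ = s_M, s- = -s_M
    print(name, "minimized at s+ =", grid[i], "s- =", grid[j])
        </preformat>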
        <p>However, in deep neural networks, these extreme score values are rarely reached due to regularization, early stopping, and model inductive biases, which prevent overfitting and favour generalization [18, 19]. Hence, while useful, this result has limited applicability to real-world RS training scenarios.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Bounding Ranking Metrics</title>
        <p>We now turn to the comparison of ranking losses from the perspective of their ability to upper bound ranking metrics, particularly $-\log(\mathrm{NDCG})$, under uniform negative sampling.</p>
        <p>Theorem 1. When uniformly sampling $m$ negative items, in the worst-case scenario, and $s^{+} \ge 0$:
$$\mathbb{P}\big(-\log \mathrm{NDCG}(r^{+}) \le \mathcal{L}_{\mathrm{BCE}}\big) \ge \mathbb{P}\big(-\log \mathrm{NDCG}(r^{+}) \le \mathcal{L}_{\mathrm{BPR}}\big) \ge \mathbb{P}\big(-\log \mathrm{NDCG}(r^{+}) \le \mathcal{L}_{\mathrm{CCE}}\big)$$</p>
        <p>This result shows that BCE offers the tightest bound on NDCG among the three losses, followed by BPR and then CCE. While the exact behaviour depends on the rank $r^{+}$ of the positive item and the number of sampled negatives $m$, BCE consistently exhibits more favourable properties, especially when item embeddings remain well-distributed, avoiding embedding collapse [20].</p>
        <p>That said, practical dynamics during training, such as changing item ranks, differences in optimization behaviour between losses, and embedding concentration due to popularity bias, can affect these bounds. Thus, while BCE is theoretically preferable, its advantage may vary in real-world scenarios.</p>
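        <p>The ordering in Theorem 1 can be probed with a small Monte Carlo sketch: draw item scores, compute the rank $r^{+}$ of the positive item in the full catalog, sample $m$ negatives uniformly, and estimate how often each loss upper-bounds $-\log \mathrm{NDCG}(r^{+})$. The Gaussian score distribution and all names below are our own assumptions, not the paper's setup:</p>
        <preformat>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
n_items, m, trials = 1000, 5, 10000
hits = {"CCE": 0, "BCE": 0, "BPR": 0}

for _ in range(trials):
    catalog = rng.normal(size=n_items)        # scores of the non-interacted items
    s_pos = abs(rng.normal())                 # positive score, s+ >= 0
    rank = 1 + np.sum(catalog > s_pos)        # rank r+ of the positive item
    target = np.log(np.log2(1 + rank))        # -log NDCG(r+)
    s_neg = rng.choice(catalog, size=m, replace=False)   # uniform negatives
    losses = {
        "CCE": -s_pos + np.log(np.exp(s_pos) + np.sum(np.exp(s_neg))),
        "BCE": -np.log(sigmoid(s_pos)) - np.sum(np.log(1.0 - sigmoid(s_neg))),
        "BPR": -np.sum(np.log(sigmoid(s_pos - s_neg))),
    }
    for name, value in losses.items():
        hits[name] += int(value >= target)

for name, h in hits.items():                  # BCE >= BPR >= CCE, as in Theorem 1
    print(name, h / trials)
        </preformat>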
      <p>Additional theorems, full proofs, and the extension to MRR can be found in the original work [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Empirical Evaluation</title>
      <p>We validate our theoretical insights on five benchmarks (MovieLens-1M [21], Amazon-Beauty [22], Amazon-Books [22], Yelp [23], and Foursquare NYC [24]) and four architectures (matrix factorization [25], Self-Attentive Sequential Recommendation (SASRec) [26], GRU4Rec [27], and LightGCN [28]). For each setting, we vary $m \in \{1, 5, 10, 20\}$ negatives per positive and measure NDCG@10 and MRR [29].</p>
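      <p>For reference, a minimal sketch of the uniform negative sampling protocol (function and variable names are ours, not the original evaluation code):</p>
      <preformat>
import numpy as np

def sample_negatives(n_items, positives, m, rng):
    """Uniformly draw m item ids the user has not interacted with."""
    negatives = []
    while m > len(negatives):
        candidate = int(rng.integers(n_items))
        if candidate not in positives:
            negatives.append(candidate)
    return np.array(negatives)

rng = np.random.default_rng(0)
user_positives = {3, 17, 42}              # toy interaction set
for m in (1, 5, 10, 20):                  # the settings varied in this section
    print(m, sample_negatives(1000, user_positives, m, rng))
      </preformat>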
      <sec id="sec-3-1">
        <title>3.1. Effect of Negative Sampling</title>
        <p>Figure 1. NDCG@10 versus training epoch (log scale) for BCE with varying numbers of negatives: (a) BCE – GRU4Rec, (b) BCE – SASRec, (c) BCE – GRU4Rec (Foursquare).</p>
        <p>We analyze how varying the number of negative items affects training on ML-1M using BCE. As shown in Fig. 1, fewer negatives yield faster improvements in early epochs, while a larger number (e.g., 100) leads to slower starts but better final performance. This reflects a trade-off: fewer negatives ease early learning, but more negatives improve generalization by providing harder contrasts. For BPR and CCE, we observe similar trends with slightly more stable early-phase training (see complete results in the original paper).</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Loss Comparison: 1 vs 100 Negatives</title>
        <p>Figures 2 and 3. NDCG@10 versus training epoch (log scale), comparing the three losses: (a) ML-1M (1 negative), (b) ML-1M (100 negatives), (c) Foursquare (100 negatives).</p>
        <p>Figs. 2 and 3 compare loss functions using 1 and 100 negative samples. With a single negative, BPR and CCE perform identically on SASRec, as predicted by theory. BCE shows superior final performance, confirming its tighter bound on ranking metrics. On GRU4Rec, differences between losses are smaller.</p>
        <p>When using 100 negatives, CCE generally performs better than BPR early in training, while BCE starts slower but steadily improves, surpassing both losses in later epochs. On Foursquare (Figs. 2c and 3c), BCE again starts behind but shows strong late-phase gains. However, given BCE's slower convergence, CCE often remains the most stable choice in early-to-mid training.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>We presented a unified theoretical framework that (i) links popular recommendation losses under negative sampling, (ii) uncovers an equivalence between BPR and CCE for a single negative, and (iii) establishes BCE as the preferred surrogate for ranking metrics. Future directions include extending the analysis to dynamic sampling schemes and to gradient descent dynamics.</p>
    </sec>
    <sec id="sec-2">
      <title>Acknowledgments</title>
      <p>This work was partially supported by projects FAIR (PE0000013) and SERICS (PE00000014), under the
MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU, and
project NEREO (Neural Reasoning over Open Data), funded by the Italian Ministry of Education and
Research (PRIN) Grant no. 2022AEFHAZ.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4 and DeepL for grammar and spelling checking. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Di Teodoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Siciliano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <article-title>A theoretical analysis of recommendation loss functions under negative sampling</article-title>
          ,
          <source>2025 International Joint Conference on Neural Networks (IJCNN)</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Freudenthaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gantner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          ,
          <article-title>BPR: Bayesian personalized ranking from implicit feedback</article-title>
          ,
          <source>in: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence</source>
          , UAI '09, AUAI Press, Arlington, Virginia, USA,
          <year>2009</year>
          , p.
          <fpage>452</fpage>
          -
          <lpage>461</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <article-title>gSASRec: Reducing overconfidence in sequential recommendation trained with negative sampling</article-title>
          ,
          <source>in: Proceedings of the 17th ACM Conference on Recommender Systems</source>
          , RecSys '23, Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          , p.
          <fpage>116</fpage>
          -
          <lpage>128</lpage>
          . URL: https://doi.org/10.1145/3604915.3608783. doi:10.1145/3604915.3608783.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Understanding the role of cross-entropy loss in fairly evaluating large language model-based recommendation</article-title>
          ,
          <year>2024</year>
          . arXiv:2402.06216.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <article-title>On the effectiveness of sampled softmax loss for item recommendation</article-title>
          ,
          <source>ACM Trans. Inf. Syst.</source>
          <volume>42</volume>
          (
          <year>2024</year>
          ). URL: https://doi.org/10.1145/3637061. doi:10.1145/3637061.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bruch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bendersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Najork</surname>
          </string-name>
          ,
          <article-title>An analysis of the softmax cross entropy loss for learning-to-rank with binary relevance</article-title>
          ,
          <source>in: Proceedings of the 2019 ACM SIGIR international conference on theory of information retrieval</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Learning-efficient yet generalizable collaborative filtering for item recommendation</article-title>
          ,
          <source>in: Forty-first International Conference on Machine Learning (ICML)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>PSL: Rethinking and improving softmax loss from pairwise perspective for recommendation</article-title>
          ,
          <source>in: The Thirty-eighth Annual Conference on Neural Information Processing Systems</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Siciliano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <article-title>Integrating item relevance in training loss for sequential recommender systems</article-title>
          ,
          <source>in: Proceedings of the 17th ACM Conference on Recommender Systems</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1114</fpage>
          -
          <lpage>1119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Siciliano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lagziel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gamzu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tolomei</surname>
          </string-name>
          ,
          <article-title>Robust training of sequential recommender systems with missing input data</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          ,
          <article-title>Cumulated gain-based evaluation of ir techniques</article-title>
          ,
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>20</volume>
          (
          <year>2002</year>
          )
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          . URL: https://doi.org/10.1145/582415.582418. doi:10.1145/582415.582418.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <article-title>Wsabie: scaling up to large vocabulary image annotation</article-title>
          ,
          <source>in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence -</source>
          Volume Three,
          <source>IJCAI'11</source>
          , AAAI Press,
          <year>2011</year>
          , p.
          <fpage>2764</fpage>
          -
          <lpage>2770</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C. J. C.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <article-title>From RankNet to LambdaRank to LambdaMART: An Overview</article-title>
          ,
          <source>Technical Report, Microsoft Research</source>
          ,
          <year>2010</year>
          . URL: http://research.microsoft.com/en-us/um/people/cburges/tech_reports/MSR-TR-2010-82.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>LambdaFM: Learning optimal ranking with factorization machines using lambda surrogates</article-title>
          ,
          <source>in: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management</source>
          , CIKM '16, Association for Computing Machinery, New York, NY, USA,
          <year>2016</year>
          , p.
          <fpage>227</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>