<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhancing Recommendation Systems' Performance with Network-Oriented Negative Sampling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elif Ece Erdem</string-name>
          <email>eceerdem@gsu.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Günce Keziban Orman</string-name>
          <email>korman@gsu.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Galatasaray University</institution>
          ,
          <addr-line>Istanbul</addr-line>
          ,
          <country country="TR">Türkiye</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recommendation systems often struggle with data sparsity and class imbalance, affecting their ability to suggest relevant items. An efficient solution to these problems is negative sampling. This study introduces three novel network-oriented negative sampling strategies, Path-Length (PL-NS), Diffusion Distance (DD-NS), and Path-Density (PD-NS), to enhance the performance of GCN-based recommendation systems. Integrated into the LightGCN framework and tested on the Last.fm dataset, our methods showed up to a 5% improvement in Precision@10, along with efficiency gains.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommendation Systems</kwd>
        <kwd>Negative Sampling</kwd>
        <kwd>Network Topology</kwd>
        <kwd>Data Sparsity</kwd>
      </kwd-group>
      <conference>
        <conf-name>Discovery Science - Late Breaking Contributions 2024</conf-name>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recommendation systems (RS) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are essential for aiding user decisions in various applications, from
e-commerce to entertainment [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The effectiveness of these systems largely depends on the quality
and quantity of user interaction data. In recent years, collaborative filtering (CF) algorithms, which
utilize user-item interactions, have become popular for their superior RS performance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], relying on
explicit interactions like ratings or implicit ones like clicks to infer user preferences [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        However, two key challenges hinder RS performance: data sparsity and class imbalance. Data sparsity
occurs when users interact with only a small subset of available items, leading to incomplete data
that limits the system’s ability to provide personalized recommendations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Class imbalance, where
interacted items are vastly outnumbered by non-interacted ones, further complicates accurate learning
of user preferences [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Negative sampling, the task of selecting items a user is unlikely to be interested in, has become
a practical solution, especially in Graph Convolutional Network (GCN)-based CF methods [6]. The
most common approach, uniform random negative sampling (RNS), treats all non-interacted items
equally; because irrelevant items vastly outnumber informative ones, it often selects samples that
contribute little to learning [6].</p>
      <p>Our study introduces novel negative sampling techniques that leverage the structure of user-item
bipartite networks, aiming to identify more informative negative examples and improve the accuracy
of GCN-based recommendation systems. The key contributions include: (1) proposing ten new methods
that address RNS inefficiencies in sparse, imbalanced data; (2) integrating these methods into LightGCN,
one of the top-performing GCN-based CF models [7]; (3) evaluating their impact on a well-known RS
dataset, focusing on accuracy, convergence time, and comparisons with two baselines; and (4) analyzing
the effects of combining our techniques with RNS at different levels of uniform selection.</p>
      <p>The paper is organized as follows: Section 2 introduces our methods: Path-Length (PL-NS),
Diffusion Distance (DD-NS), and Path-Density (PD-NS) negative sampling. Section 3 presents our
experiments and the results. Finally, Section 4 concludes with a summary and future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <p>We introduce three novel strategies, based on different aspects of network topology, for finding
more informative negative examples than traditional methods.</p>
      <p>Path-Length Negative Sampling (PL-NS): This strategy prioritizes negative samples based on the
shortest path distance within the user-item bipartite network. Items farther away from the user are
assumed to be less relevant and more informative. We use two methods for selecting these negative
samples. In Long-Distance Weighted Selection, items are weighted according to their distance from
the user, with longer paths being prioritized. In Quartile-Based Selection, items are selected based on
specific quartiles of the distance distribution, targeting items that are neither too close nor too far from
the user.</p>
      <p>Diffusion Distance Negative Sampling (DD-NS): This approach measures node similarity using
diffusion distance [8], which reflects how information propagates through the network via random
walks. The diffusion distance provides a nuanced measure of similarity that considers both direct and
indirect connections. We propose several strategies for selecting negative samples based on diffusion
distance.</p>
      <p>In Farthest Node Selection and Nearest Node Selection, we select the items with the largest and
smallest diffusion distance from the user, respectively. Long-Distance Weighted Selection and
Quartile-Based Selection work as in PL-NS but use diffusion distances instead of path lengths.</p>
      <p>Path-Density Negative Sampling (PD-NS): This strategy focuses on the density of paths between
user and item nodes within the bipartite network. Path density is estimated using random walks, where
the number of successful walks reaching an item from a user node indicates the path density. Two
selection strategies are employed.</p>
      <p>In Max Path Density Selection and Min Path Density Selection, we prioritize items with the
highest and lowest path density, respectively, to identify less relevant and more informative negative
samples.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <p>We integrate our proposed negative sampling strategies into the LightGCN framework, which is known
for its simplicity and efficiency in recommendation tasks. We tested these methods on the Last.fm
dataset, a standard benchmark in recommendation systems, using Precision@10, Recall@10, and
NDCG@10, comparing our techniques with traditional RNS, which selects negative samples randomly,
and PNS (popularity-based negative sampling) [9], which favors frequently interacted items as negatives.
The LightGCN model was configured as in its original implementation, with a maximum of 10 positive
training examples per user, randomly selected for users with more interactions. Table 1 summarizes
our experimental results, showing that our negative sampling strategies outperformed the baselines
across all metrics, effectively improving recommendation performance.</p>
      <p>The Long-Distance approach achieved the highest performance among the PL-NS methods, with a
4% improvement in Precision@10 and Recall@10 over the PNS baseline. The Quartile-Based Selection
performed slightly lower, likely due to excluding informative negatives at other path lengths. Among the
DD-NS methods, the Long-Distance approach also led, with up to a 2.3% improvement in Precision@10 and
a 2.5% boost in NDCG@10 compared to RNS. However, farthest-node and nearest-node selections
underperformed, suggesting that extreme diffusion distances may be less effective. The PD-NS methods
generally underperformed against the baselines. Although the PD-Max approach showed some potential,
it was not as effective as the other network-oriented strategies, indicating that path density alone may not
strongly indicate negative sample relevance in this dataset. When combining the proposed strategies with
RNS, hybrid approaches showed mixed results: random negatives sometimes improved performance slightly,
but often diluted the benefits of the network-oriented methods. This suggests that while some randomness
can be helpful, most negative samples should be selected based on network topology for optimal
performance. Moreover, our methods consistently achieved faster convergence than the RNS baseline,
reducing training time by 30-35%, demonstrating that more informative negative sampling improves
both recommendation accuracy and training efficiency. Our code is available at
https://github.com/tripledoubleE/network_oriented_NS.</p>
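      <p>One simple way to realize such a hybrid, sketched below under our own assumptions (the mixing parameter name and the toy candidate set are illustrative), is to draw a fixed fraction of the negatives uniformly at random and fill the rest from a network-oriented ranking.</p>
      <preformat>
```python
import random

def hybrid_negatives(network_ranked, all_candidates, k, random_frac, seed=0):
    """Draw k negatives: a `random_frac` share uniformly at random (RNS)
    and the remainder from the top of a network-oriented ranking."""
    rng = random.Random(seed)
    n_random = round(k * random_frac)
    picked = rng.sample(all_candidates, n_random)
    topo = [i for i in network_ranked if i not in picked][: k - n_random]
    return picked + topo

# Hypothetical network-oriented ranking (most informative first).
negs = hybrid_negatives(["i9", "i7", "i5", "i3"],
                        ["i3", "i5", "i7", "i9"], k=4, random_frac=0.25)
```
      </preformat>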
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this study, we introduced ten novel methods leveraging the user-item bipartite network to prioritize
more informative negatives. Integrating these into the LightGCN framework significantly improved
recommendation accuracy and model convergence. Specifically, our best-performing method, Diffusion
Distance-based Long-Distance Negative Sampling, achieved up to a 5% improvement in Precision@10
and 6.7% in NDCG@10 compared to the traditional static negative sampling baseline, PNS.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is supported by the Galatasaray University Research Fund (BAP) within the scope of project
number FBA-2024-1240, titled Bağlantı Tahmini Yöntemleri ve Sınıf Dengesizliği Sorununa Çözüm
Yaklaşımları (Link Prediction Methods and Solution Approaches to the Class Imbalance Problem), and
the bilateral project of the Scientific and Technological Research Council of Türkiye (TÜBİTAK), under
grant number 122N701, with Campus France, within the scope of the Hubert Curien Partnerships (PHC),
project number 49032VB.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] D. Kosiur, Understanding Policy-Based Networking, 2nd ed., Wiley, New York, NY, 2001.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] F. Ricci, L. Rokach, B. Shapira, Recommender Systems Handbook, 3rd ed., Springer New York, NY, 2022.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Wu, F. Sun, W. Zhang, X. Xie, B. Cui, Graph neural networks in recommender systems: A survey, ACM Comput. Surv. 55 (2022). doi:10.1145/3535101.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] S. Natarajan, S. Vairavasundaram, S. Natarajan, A. H. Gandomi, Resolving data sparsity and cold start problem in collaborative filtering recommender system using linked open data, Expert Systems with Applications 149 (2020) 113248. doi:10.1016/j.eswa.2020.113248.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] J. M. Johnson, T. M. Khoshgoftaar, Survey on deep learning with class imbalance, Journal of Big Data (2019). doi:10.1186/s40537-019-0192-5.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Z. Yang, M. Ding, C. Zhou, H. Yang, J. Zhou, J. Tang, Understanding negative sampling in graph representation learning, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1666-1676. doi:10.1145/3394486.3403218.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, M. Wang, LightGCN: Simplifying and powering graph convolution network for recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 639-648. doi:10.1145/3397271.3401063.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] G. Bertagnolli, M. De Domenico, Diffusion geometry of multiplex and interdependent systems, Phys. Rev. E 103 (2021) 042301. doi:10.1103/PhysRevE.103.042301.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Z. Yang, M. Ding, X. Zou, J. Tang, B. Xu, C. Zhou, H. Yang, Region or global? A principle for negative sampling in graph-based recommendation, IEEE Transactions on Knowledge and Data Engineering 35 (2023) 6264-6277. doi:10.1109/TKDE.2022.3155155.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>