<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identification of the Relevance of Comments in Codes Using Graph Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Durairaj Thenmozhi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aadit P</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adithya S</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harshil Malisetty</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rohan R</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, SSN College of Engineering</institution>
          ,
          <addr-line>Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>This study presents our submission to the FIRE 2025 IRSE shared task, which focuses on automatically determining whether a comment within source code is genuinely helpful or redundant. Unlike prior approaches that relied primarily on bag-of-words features or large transformer-based models, we propose a graph-oriented approach in which each code-comment pair is transformed into a graph. Each node corresponds to a token from the pair, and edges capture different kinds of relationships between tokens: the sequential order of tokens, skip connections for longer-range dependencies, and dedicated bridge links that explicitly connect code elements to comment tokens. Nodes are represented by 53-dimensional vectors obtained from pretrained embeddings, and we train a compact Graph Neural Network with only 9,762 learnable parameters. To assess the robustness of this approach, we also evaluate it on synthetically constructed datasets, where the trained model attains 62.5% accuracy on a smaller set of 16 samples and 83% on a larger set of 100 samples.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Comprehending source code is a fundamental prerequisite in modern software development. When
engineers are tasked with updating or extending an application, they frequently rely on accompanying
comments to understand the purpose of implementations and the rationale behind design decisions.
However, the quality and usefulness of such comments in real-world projects can vary considerably
depending on multiple factors.</p>
      <p>The Forum for Information Retrieval Evaluation (FIRE) 2022 edition introduced a shared task aimed at
automatically classifying whether code comments are useful or not useful for given code snippets written
in the C programming language. As part of this initiative, a dataset of manually labeled code–comment
pairs was built, where domain experts assessed the usefulness of each comment.</p>
      <p>
        Previous research addressing this problem has explored a wide range of computational approaches.
Majumdar et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] developed CommentProbe, which employs custom embeddings called SWVec, trained on
Stack Overflow data, combined with LSTM neural networks, achieving an F1 score of 0.8634 on this
task. The FIRE 2022 shared task [
        <xref ref-type="bibr" rid="ref1 ref6">1, 6</xref>
        ] included submissions comparing traditional bag-of-words models
with transformer-based approaches such as BERT, RoBERTa, and ALBERT. Work by Sruthi and Basu [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
demonstrated that, in certain cases, simpler bag-of-words models with TF-IDF weighting outperformed
more complex transformer models.
      </p>
      <p>Most existing studies treat source code purely as natural language text, applying standard natural
language processing techniques without accounting for the inherent structural properties of code.
A graph based formulation can potentially enable a model to better determine whether a comment
is genuinely useful. Graph Neural Networks (GNNs) have demonstrated strong performance across
various code analysis tasks but, to the best of our knowledge, have not yet been applied to this specific
problem.</p>
      <p>This work makes three primary contributions. First, we propose a graph-based representation for
each code-comment pair that explicitly connects tokens from both sides through multiple types of
links. Second, we develop and train a lightweight GNN classifier that utilizes 53-dimensional node
features with only 9,762 trainable parameters. Finally, we evaluate the model’s generalization capability
using synthetically generated datasets of varying sizes.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. FIRE 2022 Shared Task Results</title>
        <p>
          The FIRE 2022 IRSE shared task attracted several participants who explored diverse computational
approaches for assessing the usefulness of code comments. Related studies [
          <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
          ] approach this
task from a different perspective, using metadata to provide richer context. Sruthi and Basu [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] submitted
five experimental runs that evaluated both classical machine learning and deep learning methods.
Their classical models relied on TF-IDF and entropy-based feature weighting schemes, combined with
classifiers such as Support Vector Machines (SVMs), Random Forests, and Logistic Regression. They
also fine-tuned transformer-based architectures, including BERT, RoBERTa, and ALBERT. Their findings
revealed that, in some cases, simpler models outperformed more complex ones. The best bag-of-words
approach achieved an F1 score of 0.72 on the training set, whereas ALBERT attained only 0.67. Both
approaches exhibited noticeable performance degradation on the test set, with their best submission
reaching 53% accuracy.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Graph Representations and Comment Quality</title>
        <p>
          Graph-based representations have gained increasing attention in recent years for a variety of
code-analysis tasks. Chen and Monperrus [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] provided a comprehensive survey of embedding techniques
for source code, highlighting the use of Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs),
Data Flow Graphs (DFGs), and Program Dependence Graphs (PDGs). Our approach differs in that it
constructs custom graphs that directly connect code structures with corresponding natural language
comments, thereby enabling the model to learn associations between structural code patterns and
comment usefulness.
        </p>
        <p>
          Research on the quality and consistency of code comments has been active for over a decade. Tan et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
developed iComment, a tool that integrates natural language processing, machine learning, and program
analysis to detect inconsistencies between code and comments, achieving accuracy rates ranging from
90.8% to 100%. Majumdar et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] introduced CommentProbe, a framework designed specifically for comment
usefulness classification. Their approach utilized specialized embeddings, termed SWVec, and achieved
an F1 score of 0.8634 on C language codebases. More recently, Majumdar et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] explored the use of
large language models (LLMs) for code comprehension, investigating how retrieval and reasoning
mechanisms can be leveraged to generate meaningful and contextually appropriate comments.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Problem Formulation and Dataset</title>
        <p>
          The dataset employed in this study was introduced as part of the FIRE 2022 IRSE shared task [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
It comprises 11,452 code-comment pairs written in the C programming language, each manually
annotated by domain experts as either useful or not useful. The dataset includes code snippets extracted
from open-source repositories such as glibc, FFmpeg, and Linux. Prior to model development, the data
were preprocessed by removing comment markers, standardizing indentation, and converting all tokens
to lowercase to ensure uniformity.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Graph Construction from Code-Comment Pairs</title>
        <p>Each code-comment pair is represented as an undirected, weighted graph using the NetworkX library.
The graph construction process consists of three primary stages: tokenization, node creation, and edge
formation.</p>
        <list list-type="bullet">
          <list-item>
            <p>Tokenization: For the source code, text is segmented into tokens using the regular expression
<monospace>[+[]\w]+</monospace>, which retains words, operators, and punctuation as separate entities. All tokens are
converted to lowercase. For comments, comment markers are removed, and alphabetic words are
extracted using <monospace>[a-zA-Z][a-zA-Z0-9]*</monospace>. Tokens consisting of a single character are discarded
to reduce noise.</p>
          </list-item>
          <list-item>
            <p>Node Creation: Each token is represented as a node that stores its textual content, source
(code or comment), and positional index within the sequence. Code tokens are positioned before
comment tokens to preserve a logical structure.</p>
          </list-item>
          <list-item>
            <p>Edge Formation: Multiple types of edges are added to capture both local sequential dependencies
and long-range relationships. Sequential edges with a weight of 1.0 connect adjacent tokens
within the same sequence. Additional edges include skip links for non-adjacent dependencies,
bridge edges connecting related code and comment tokens, and self-loops for all nodes to retain
self-referential information.</p>
          </list-item>
        </list>
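        <p>The three stages above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the code-token pattern is a stand-in (the expression printed in the text did not survive extraction), the single-character filter is applied to comment words only, the skip distance of 2, the skip and bridge weights, and the exact-match rule for bridge edges are all hypothetical choices. Only the comment pattern and the sequential weight of 1.0 come from the text.</p>
        <preformat>
```python
import re

# The comment pattern is quoted from the text; the code pattern is an
# assumption, since the printed expression did not survive extraction.
CODE_TOKEN = re.compile(r"\w+|[^\w\s]")            # assumed: words or single symbols
COMMENT_TOKEN = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def build_graph(code, comment, skip=2, skip_w=0.5, bridge_w=1.0):
    """Return (nodes, edges) for one code-comment pair.

    nodes: list of dicts holding text, source and position.
    edges: dict mapping an unordered node pair (i, j) to a weight.
    skip, skip_w and bridge_w are illustrative values, not the paper's.
    """
    code_toks = [t.lower() for t in CODE_TOKEN.findall(code)]
    # Assumption: the single-character filter applies to comment words.
    comment_toks = [t.lower() for t in COMMENT_TOKEN.findall(comment) if len(t) > 1]

    # Code tokens come first, then comment tokens.
    nodes = [{"text": t, "source": "code", "pos": i} for i, t in enumerate(code_toks)]
    offset = len(nodes)
    nodes += [{"text": t, "source": "comment", "pos": i} for i, t in enumerate(comment_toks)]

    edges = {}

    def add(i, j, w):
        key = (min(i, j), max(i, j))        # undirected: store each pair once
        edges.setdefault(key, w)            # the first edge type added wins

    for start, n in ((0, len(code_toks)), (offset, len(comment_toks))):
        for k in range(n - 1):
            add(start + k, start + k + 1, 1.0)          # sequential edges
        for k in range(n - skip):
            add(start + k, start + k + skip, skip_w)    # skip links
    for i, code_tok in enumerate(code_toks):            # bridge edges on exact match
        for j, comment_tok in enumerate(comment_toks):
            if code_tok == comment_tok:
                add(i, offset + j, bridge_w)
    for i in range(len(nodes)):
        add(i, i, 1.0)                                  # self-loops
    return nodes, edges


nodes, edges = build_graph("int count = 0;", "/* reset count */")
print(len(nodes), len(edges))
```
        </preformat>
        <p>The edge dictionary can be loaded into a NetworkX graph with <monospace>G.add_weighted_edges_from((i, j, w) for (i, j), w in edges.items())</monospace>; the sketch itself avoids the dependency so that it runs with the standard library alone.</p>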
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Node Feature Extraction</title>
        <p>Each node is encoded as a 53-dimensional feature vector. These embeddings are pretrained on the
training corpus and capture token identity, contextual information from neighboring tokens, origin
type (code or comment), and normalized positional attributes. The embeddings are saved and reused
across all subsequent experiments to maintain consistency and reduce computational overhead.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Graph Neural Network Architecture</title>
        <p>
          The proposed model builds upon the Graph Convolutional Network (GCN) framework introduced
by Kipf and Welling [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The first GCN layer transforms the 53-dimensional token embeddings into
64-dimensional feature representations using ReLU activation, while the second layer preserves the
same dimensionality. Following node level updates, a global mean pooling operation aggregates the
node embeddings into a single graph level representation. Dropout with a rate of 0.3 is applied to
mitigate overfitting.
        </p>
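        <p>The pooled representation is then passed through a fully connected layer that reduces its
dimensionality from 64 to 32, followed by a final linear layer that outputs two logits corresponding to the binary
classes. Overall, the model comprises 9,762 trainable parameters, emphasizing its lightweight design.
The GCN layer updates each node representation according to the following formulation:
          <disp-formula>
            <tex-math>h_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{d_i d_j}} \, W^{(l)} h_j^{(l)} \right),</tex-math>
          </disp-formula>
where <inline-formula><tex-math>h_i^{(l)}</tex-math></inline-formula> denotes the feature vector of node <inline-formula><tex-math>i</tex-math></inline-formula> at layer <inline-formula><tex-math>l</tex-math></inline-formula>, <inline-formula><tex-math>\mathcal{N}(i)</tex-math></inline-formula> represents the set of neighboring nodes,
<inline-formula><tex-math>d_i</tex-math></inline-formula> is the degree of node <inline-formula><tex-math>i</tex-math></inline-formula>, <inline-formula><tex-math>W^{(l)}</tex-math></inline-formula> is the learnable weight matrix, and <inline-formula><tex-math>\sigma</tex-math></inline-formula> is the ReLU activation function.</p>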
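        <p>As a rough illustration of the update rule and of the parameter budget, the sketch below applies one symmetric-normalized GCN step to a toy graph in plain Python and then counts weights and biases for the layer sizes given above (53 to 64 to 64 for the GCN layers, 64 to 32 to 2 for the linear layers). It assumes that degrees include the added self-loop, as in Kipf and Welling, and that every layer carries a bias vector; the function names are hypothetical and bias terms are omitted from the toy update.</p>
        <preformat>
```python
import math


def gcn_layer(h, neighbors, weight):
    """One GCN update: h_i' = ReLU(W * sum_j h_j / sqrt(d_i * d_j)).

    h: list of node feature vectors.
    neighbors: adjacency lists without self-loops.
    weight: matrix stored as a list of rows (out_dim x in_dim).
    Degrees count the added self-loop (an assumption of this sketch).
    """
    deg = [len(nbrs) + 1 for nbrs in neighbors]
    out = []
    for i, nbrs in enumerate(neighbors):
        agg = [0.0] * len(h[i])
        for j in list(nbrs) + [i]:                   # N(i) plus the node itself
            c = 1.0 / math.sqrt(deg[i] * deg[j])
            agg = [a + c * x for a, x in zip(agg, h[j])]
        # Linear map followed by ReLU.
        out.append([max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight])
    return out


def count_parameters(layer_dims):
    """Weights plus biases for a chain of GCN or dense layers."""
    return sum(i * o + o for i, o in zip(layer_dims, layer_dims[1:]))


# Path graph 0 - 1 - 2 with 2-dimensional features and an identity weight matrix.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(h, neighbors, identity)[0])

# 53 -> 64 -> 64 (GCN layers), then 64 -> 32 -> 2 (linear layers).
print(count_parameters([53, 64, 64, 32, 2]))   # 9762
```
        </preformat>
        <p>Under these assumptions the per-layer counts are 3,456, 4,160, 2,080, and 66, which sum to the 9,762 parameters stated above.</p>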
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Training Configuration and Class Balancing</title>
        <p>The dataset was divided into 80% for training (9,161 samples) and 20% for testing (2,291 samples). The
training subset exhibited a class imbalance, with 3,510 not useful samples and 5,651 useful samples. To
address this issue, an upsampling strategy was adopted by generating synthetic variants of the not
useful samples. Gaussian noise with a standard deviation of σ = 0.05 was added to the 53-dimensional
embeddings, producing a balanced training set with 5,651 examples per class.</p>
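        <p>A minimal sketch of this kind of noise-based upsampling is shown below. It assumes that each sample is a list of 53-dimensional node vectors, that originals are drawn with replacement, and that every node vector of the drawn sample is jittered independently; the function name, the seed, and the random stand-in data are hypothetical. Only σ = 0.05 and the class counts come from the text.</p>
        <preformat>
```python
import random


def upsample_with_noise(minority, target_size, sigma=0.05, seed=0):
    """Grow `minority` to `target_size` samples by jittering random originals.

    Each sample is a list of node feature vectors; Gaussian noise with
    standard deviation `sigma` is added to every feature of the copy.
    """
    rng = random.Random(seed)
    augmented = list(minority)                       # keep the originals unchanged
    while target_size > len(augmented):
        base = rng.choice(minority)                  # sample with replacement
        augmented.append([[x + rng.gauss(0.0, sigma) for x in node] for node in base])
    return augmented


# Stand-in data: 3,510 minority samples, each with three 53-dimensional nodes.
data_rng = random.Random(1)
not_useful = [[[data_rng.random() for _ in range(53)] for _ in range(3)]
              for _ in range(3510)]
balanced = upsample_with_noise(not_useful, target_size=5651)
print(len(balanced), len(balanced) - len(not_useful))   # 5651 2141
```
        </preformat>
        <p>Matching the 5,651 useful samples requires 2,141 synthetic not useful samples in addition to the 3,510 originals.</p>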
        <p>Model training was performed using the Adam optimizer with a learning rate of 0.001 and a weight
decay of 10<sup>−4</sup>. A batch size of 64 was used, and training proceeded for 30 epochs. The binary
cross-entropy loss function was employed as the objective. After each epoch, the model checkpoint was
retained if it yielded the highest validation accuracy so far, and this checkpoint was used for evaluation.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Synthetic Data Generation</title>
        <p>To evaluate the model’s generalization capability beyond the original training distribution, two synthetic
test sets were constructed. The smaller set contained 16 examples evenly divided between useful and not
useful classes, while the larger set consisted of 100 similarly balanced examples. The useful comments
described non-trivial programming concepts such as binary search algorithms or memory management,
whereas the not useful comments corresponded to trivial statements such as “set x to 1” or “loop through
array.” Each synthetic code-comment pair was processed using the same graph construction pipeline
as the original dataset to ensure methodological consistency.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <sec id="sec-4-1">
        <title>4.1. Performance on Original Test Set</title>
        <p>The proposed model achieved an accuracy of 80.8% on the original test set comprising 2,291 samples.
The weighted F1 score was 0.807, indicating well-balanced performance across both classes. For the not
useful class, precision, recall, and F1 scores were 0.77, 0.71, and 0.74, respectively. For the useful class,
precision was 0.83, recall was 0.87, and F1 was 0.85.</p>
        <p>The confusion matrix revealed that, among 879 not useful samples, 626 (71%) were correctly classified,
while 253 were misclassified as useful. Conversely, for 1,412 useful samples, 1,226 (87%) were correctly
identified, and 186 were misclassified as not useful. Table 1 presents a comparison of model performance
under different experimental configurations. The baseline configuration, which utilized 6-dimensional
features, achieved 79.8% accuracy. Increasing the feature dimensionality to 53 improved accuracy to
80.4%, and incorporating upsampling further enhanced it to 80.8%.</p>
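        <p>The reported scores can be reproduced from the four confusion-matrix counts quoted above. The short check below does this with the standard definitions of precision, recall, and F1, weighting the per-class F1 scores by class support; the variable and function names are chosen for illustration only.</p>
        <preformat>
```python
def prf(tp, fp, fn):
    """Precision, recall and F1 for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Confusion-matrix counts reported in the text (grouped by true class).
nu_correct, nu_as_useful = 626, 253        # 879 not useful samples
u_correct, u_as_not_useful = 1226, 186     # 1,412 useful samples

n_nu = nu_correct + nu_as_useful
n_u = u_correct + u_as_not_useful
total = n_nu + n_u

accuracy = (nu_correct + u_correct) / total
p_nu, r_nu, f1_nu = prf(nu_correct, u_as_not_useful, nu_as_useful)
p_u, r_u, f1_u = prf(u_correct, nu_as_useful, u_as_not_useful)
weighted_f1 = (n_nu * f1_nu + n_u * f1_u) / total   # support-weighted average

print(f"accuracy    {accuracy:.3f}")                              # 0.808
print(f"not useful  P={p_nu:.2f} R={r_nu:.2f} F1={f1_nu:.2f}")    # 0.77 0.71 0.74
print(f"useful      P={p_u:.2f} R={r_u:.2f} F1={f1_u:.2f}")       # 0.83 0.87 0.85
print(f"weighted F1 {weighted_f1:.3f}")                           # 0.807
```
        </preformat>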
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Performance on Synthetic Test Sets</title>
        <p>When evaluated on the small synthetic dataset containing 16 samples, the model attained an accuracy
of 62.5% (10 of 16 correct), 18.3 percentage points below the original test set. For the larger synthetic
dataset of 100 samples, accuracy increased to 83.0%. With only 16 samples, each prediction shifts accuracy
by 6.25 percentage points, so the smaller estimate is considerably noisier. These findings suggest that
measured performance is sensitive to the size and variability of the synthetic dataset.</p>
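        <p>The arithmetic behind these two figures is simple enough to check directly. The helper below, whose name and return format are hypothetical, converts each reported accuracy into a count of correct predictions, the accuracy change caused by a single sample, and the gap in percentage points to the 80.8% obtained on the original test set.</p>
        <preformat>
```python
def summarize(accuracy_pct, n_samples, reference_pct=80.8):
    """Return (correct predictions, points per sample, gap to reference)."""
    correct = round(accuracy_pct / 100 * n_samples)
    step = 100 / n_samples                      # accuracy change per sample
    gap = round(accuracy_pct - reference_pct, 1)
    return correct, step, gap


print(summarize(62.5, 16))     # (10, 6.25, -18.3)
print(summarize(83.0, 100))    # (83, 1.0, 2.2)
```
        </preformat>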
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Training Dynamics</title>
        <p>During training, the model demonstrated stable convergence behavior. The training loss decreased
from 0.607 in the first epoch to 0.421 by the 30th epoch, while validation accuracy exhibited steady
improvement during the initial 15–20 epochs before plateauing around epochs 25–26. No significant
overfitting was observed, as validation accuracy remained consistent and did not deteriorate toward
the end of training. These trends indicate that the model effectively generalized to unseen data without
sacrificing stability.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Advantages of Graph-Based Representation</title>
        <p>The proposed graph-based representation offers distinct advantages over treating code and comments
purely as plain text. By incorporating multiple types of connections, the graph structure enables
the model to better capture relationships between code tokens and corresponding comment tokens.
Sequential edges preserve the inherent order of tokens, while skip connections allow the model to
recognize long-range dependencies across non-adjacent tokens. Bridge connections, on the other hand,
explicitly link code and comment elements, allowing the model to associate linguistic context with
program structure. Collectively, these connections help the model discern whether a comment provides
meaningful information or simply restates the code.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Limitations and Future Work</title>
        <p>The proposed approach has certain limitations. First, the graph construction process relies on manually
defined rules, rather than being learned adaptively from data. Second, the 53-dimensional embeddings,
though effective, may not fully capture the semantic richness of complex programming constructs or
natural-language comments. Incorporating larger pre-trained representations such as CodeBERT or
GraphCodeBERT could provide more expressive features.</p>
        <p>Future extensions of this work may focus on: (1) learning optimal graph structures and edge weights
directly from data, (2) integrating attention-based mechanisms for more effective information
propagation, and (3) employing large-scale pre-trained models trained on diverse code repositories to enhance
generalization across programming languages and comment styles.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Comparison with Related Work</title>
        <p>
          On the FIRE 2022 IRSE dataset, the proposed model achieved an accuracy of 80.8% and a weighted F1
score of 0.807, somewhat below CommentProbe [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which reported an F1 score of 0.863.
This difference may be attributable to the specialized embeddings used by CommentProbe, which were
trained on extensive Stack Overflow data. When compared to the submissions by Sruthi and Basu [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
our model scores higher, achieving a weighted F1 of 0.807 on the test set compared to the reported 0.72 for the
best bag-of-words model and 0.67 for transformer-based approaches. Furthermore, our model maintains
a substantially smaller parameter count than transformer-based models, underscoring its efficiency and
suitability for resource-constrained environments.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper presented a graph-based framework for classifying the usefulness of code comments using
Graph Neural Networks (GNNs). The proposed model achieved an accuracy of 80.8% on the FIRE 2022
IRSE shared task, while remaining lightweight with only 9,762 trainable parameters. Evaluation on
synthetic datasets demonstrated that accuracy decreased to 62.5% on a smaller set of 16 samples, but
improved to 83.0% on a larger set of 100 samples. These findings suggest that the proposed graph-based
approach effectively captures structural relationships between code and comments, striking a balance
between the simplicity of bag-of-words models and the representational power of transformer-based
architectures.</p>
      <p>The results underscore the potential of graph-oriented models as efficient alternatives for comment
usefulness classification, especially in scenarios where computational resources are limited. Future
research may explore scaling this framework with larger pre-trained embeddings and automated graph
construction techniques to further enhance generalization.</p>
      <p>All source code and experimental configurations are publicly available at: https://github.com/CrimsonCoderAadit/code_comment_project_clean</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We thank the organizers of the FIRE 2025 IRSE shared task for providing the dataset and evaluation
framework. The authors would also like to express their gratitude to the faculty of the Department
of Computer Science and Engineering, SSN College of Engineering, for their valuable guidance and
support throughout this work.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>In the course of preparing this manuscript, the author(s) employed the generative AI tool ChatGPT. Its
use was limited to performing checks for grammar and spelling. Following this, the author(s) conducted
a thorough review and revision of the text and assume full responsibility for the final published content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          et al., “
          <article-title>Overview of the IRSE track at FIRE 2022: Information Retrieval in Software Engineering</article-title>
          ,” ACM,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          et al., “
          <article-title>Automated evaluation of comments to aid software maintenance</article-title>
          ,”
          <source>Journal of Software: Evolution and Process</source>
          , vol.
          <volume>34</volume>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          et al., “
          <article-title>Comprehending C Codes with LLMs</article-title>
          ,”
          <source>Pattern Recognition Letters</source>
          , Elsevier,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          et al., “
          <article-title>Efficiency of Large Language Models to scale up Ground Truth: Overview of the IRSE Track at Forum for Information Retrieval 2023</article-title>
          ,”
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          et al., “
          <article-title>Overview of the 'Information Retrieval in Software Engineering' (IRSE) track at Forum for Information Retrieval 2024</article-title>
          ,”
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>S. S</surname>
          </string-name>
          , T. Basu, “
          <article-title>Identification of the Relevance of Comments in Codes Using Bag of Words and Transformer Based Models</article-title>
          ,”
          <source>FIRE 2022 Working Notes</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tan</surname>
          </string-name>
          et al., “
          <article-title>/*icomment: Bugs or bad comments?*/</article-title>
          ,”
          <source>SIGOPS Oper. Syst. Rev.</source>
          , vol.
          <volume>41</volume>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>158</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          , M. Monperrus, “
          <article-title>A literature study of embeddings on source code</article-title>
          ,” arXiv:1904.03061,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Kipf</surname>
          </string-name>
          , M. Welling, “
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          ,”
          <source>ICLR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mondol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          , “
          <article-title>The Code-LLM Handshake: Smarter Maintenance Through AI</article-title>
          ,” in
          <source>Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukhopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          , “
          <article-title>Operationalizing Large Language Models with Design-Aware Contexts for Code Comment Generation</article-title>
          ,” arXiv preprint arXiv:2510.22338,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          , “
          <article-title>Smart Knowledge Transfer using Google-like Search</article-title>
          ,”
          <source>arXiv preprint arXiv:2308.06653</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>