<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Empirical Study on Synthetic Data Augmentation for Code Comment Quality Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lakshay Khurana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Goa, 403401</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Code comments are important for software maintenance tasks, but developing an automated assessment of the quality of code comments has proven more difficult than expected, often due to a shortage of large annotated datasets. In this paper, we assess the utility of data augmentation using synthetically labeled comments for classifying comments as useful or not useful. In particular, we use GPT-3.5-turbo to label a second dataset, which we use to augment a manually labeled dataset. We develop and evaluate a baseline Support Vector Machine model that achieves an F1 score of 0.80 on the original dataset and find that, although we expected the synthetic data to improve on this performance, the change is negligible. The takeaway is that simple data augmentation with synthetically generated labels has clear limitations for our task, whereas the baseline model already finds signal in the data given sufficient human-annotated examples.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>GPT-3.5</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>Comment Classification</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>Qualitative Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In today’s increasingly digital world, software has become vital across a wide range of industries,
including finance, healthcare, and transportation. The need for rapid innovation causes organizations to
constantly update existing software and design new applications. Increased software updates lead to
increasingly complex code, which makes managing substantial software systems a critical part of the
software development life cycle (SDLC).</p>
      <p>Time pressure in these development processes can degrade code quality: developers rush to fix
bugs, bug fixes take precedence over the development of new features, and untested changes are
sometimes released. Developers also produce documentation, such as design specifications, that can be
a critical reference for team members who join a project later. This documentation may age or lose
relevance as the software and its language evolve, and its original authors may leave the project and
no longer be in a position to field questions from the team members who inherit it. Quality-oriented
processes are therefore critical for building the understanding and domain knowledge needed to
maintain a project's existing code.</p>
      <p>Information sources that are more reliable than word of mouth include test executions, the results
of static code analysis, comments left by developers, and even the repository history, which can tell a
story from one team member to another. This paper focuses on code comments, which can save time
and provide context for the code being analyzed. Comments written by earlier developers are
acknowledged to be useful to later members of a software development team: they document the
rationale and goals of the code and thereby aid its understanding and maintenance. However,
comments vary greatly in quality, so an automated method for evaluating them is important.</p>
      <p>One major obstacle to developing such assessment tools is the lack of well-annotated
datasets that cover the range of comments across different programming contexts. In this
work, we evaluate the use of synthetic data augmentation to improve model performance. We start
with a manually labeled dataset and augment it with synthetic data labeled by GPT-3.5-turbo.</p>
      <p>In this paper, we examine a binary classification problem for source code comments in the C
programming language, classifying them as ‘Useful’ or ‘Not Useful’. We first establish a baseline using a
Support Vector Machine model trained on a dataset of over 11,000 manually labeled comments. We then
supplement this dataset with over 200 additional samples labeled by GPT-3.5-turbo and assess any
improvement in overall performance. We found that the performance of the model remained essentially
unchanged, with an F1 score of roughly 0.80 both with and without the synthetic data.</p>
      <p>This research offers the community a practical evaluation of LLM-based synthetic data
augmentation for code comment classification. We hope to address a few of the current
challenges and guide the future creation of more robust and adaptable models in light of the growing
software engineering workforce.</p>
      <p>The paper is organized as follows. Section 2 reviews related work in comment classification. Section
3 details the task and dataset. Our methodology is presented in Section 4. The results are analyzed in
Section 5, and Section 6 concludes the study.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Automated program comprehension is a well-established research area in software
engineering. Various tools and techniques have been proposed to extract knowledge from software
metadata, including runtime traces and structural code properties [
        <xref ref-type="bibr" rid="ref1 ref10 ref11 ref12 ref13 ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13</xref>
        ].
      </p>
      <p>
        Code comments are a primary resource for developers, yet their utility varies. Consequently, the
automatic classification of source code comments has been a focus of extensive research. Oman and Hagemeister [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
proposed organizing factors of software maintainability into hierarchical structures with measurable
attributes, allowing for a consolidated maintainability index. Fluri et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] investigated the
coevolution of source code and comments, finding that 97% of comment changes occur in the same
revision as the associated code changes in open-source systems like ArgoUML. Deissenboeck et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
introduced a two-dimensional maintainability model linking system properties to maintenance activities,
creating a structured quality knowledge base for industrial application. Other studies have explored the
role of specific comment types, such as TODO annotations, in developers’ task management [17].
      </p>
      <p>
        The impact of comments on readability has also been empirically validated. An experiment by
Tenny [18] demonstrated that programs without comments were the least readable. More recent
work has focused on fine-grained classification. Yu et al. [19] classified comments into four
quality tiers (unqualified, qualified, good, and excellent) and improved results by aggregating basic
classification algorithms. Majumdar et al. [20] proposed "CommentProbe," an automatic classification
mechanism for evaluating the quality of comments in C codebases. Despite these varied approaches [
        <xref ref-type="bibr" rid="ref13">20,
21, 22, 23, 13, 24</xref>
        ], the automatic quality evaluation of source code comments remains a challenging and
active area of research.
      </p>
      <p>
        With the advent of large language models (LLMs) [25], it has become pertinent to compare their
assessment of code comment quality with human interpretation, as in [26, 27]. The IRSE
track at FIRE 2024 [
        <xref ref-type="bibr" rid="ref10">28, 10</xref>
        ] extends methodologies from [
        <xref ref-type="bibr" rid="ref11">20, 11, 29</xref>
        ] to explore vector space models [30]
for this task. This track specifically evaluates model performance when incorporating GPT-generated
labels, aligning closely with the objectives of our study.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Description</title>
      <p>In this study, we tackle the problem of establishing a binary classification system for source code
comments. Specifically, the objective is to assign a code comment, along with the relevant source
code context, a label indicating whether the comment is useful or not useful. We formulate the two
classification categories as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>Useful: The comment accurately describes or provides relevant, non-obvious information about
the associated code.</p>
        </list-item>
        <list-item>
          <p>Not Useful: The comment is redundant, inaccurate, or fails to convey meaningful information
about the code.</p>
        </list-item>
      </list>
      <p>Our main dataset includes over 15,000 pairs of C code and comments. These pairs have been annotated
by a group of 14 human annotators who have labeled each comment as either useful or not useful. To
explore the possibilities of synthetic data, we created a secondary dataset of 233 code-comment pairs
taken from GitHub. These pairs were automatically annotated using GPT-3.5-turbo. This secondary
dataset follows the same structure as the original dataset and serves as an augmentation corpus
for our experiments.</p>
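      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>#</th>
              <th>Description</th>
              <th>Surrounding Code</th>
              <th>Class</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>2</td>
              <td>/*Example 1 for data augmentation*/</td>
              <td>-12. void setup() {<break/>-11. int val = 0;<break/>-2. #endif // AUGMENT<break/>/*Example 1 for data augmentation*/<break/>1. print(val);</td>
              <td>Unnecessary</td>
            </tr>
            <tr>
              <td>4</td>
              <td>/*Calculate the square root of the distance vector magnitude*/</td>
              <td>-8. if (delta &gt; 0)<break/>-3. double dist_sq = (x*x + y*y);<break/>-2. if (dist_sq &gt; 0)</td>
              <td>Informative</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>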
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>Our methodology employs a Support Vector Machine model for binary classification. The model accepts
a set of input features created from both the code comment and the code snippet that surrounds it. We
use a pre-trained Universal Sentence Encoder to compute one vector embedding for the comment and,
independently, one for the code context. The two embeddings are then combined to form the model’s
feature vector.</p>
      <p>The joint dataset was separated into training and testing datasets, using 80% of the instances for
training and 20% for testing, which was consistent for both experiments (one where we used data
augmentation and one where we did not).</p>
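      <p>The feature construction and data split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: <monospace>toy_embed</monospace> is a hypothetical hashing-based stand-in for the pre-trained Universal Sentence Encoder (chosen so that the sketch runs without external models), concatenation is assumed as the way the two embeddings are combined, and the example pairs, the embedding size, and the random seed are invented for illustration.</p>
      <preformat>
```python
import hashlib
import random


def toy_embed(text, dim=8):
    """Hypothetical stand-in for a sentence encoder.

    Hashes whitespace-separated tokens into a fixed-size count vector.
    The paper uses a pre-trained Universal Sentence Encoder instead.
    """
    vec = [0.0] * dim
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def build_features(comment, code_context, dim=8):
    """Embed comment and code independently, then concatenate (assumed)."""
    return toy_embed(comment, dim) + toy_embed(code_context, dim)


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Invented example triples: (comment, surrounding code, label).
pairs = [
    ("/* compute squared distance */", "double d = x*x + y*y;", 1),
    ("/* increment i */", "i++;", 0),
    ("/* guard against division by zero */", "if (n == 0) return;", 1),
    ("/* loop */", "for (i = 0; i != n; i++) {}", 0),
    ("/* free buffer allocated by caller */", "free(buf);", 1),
]
dataset = [(build_features(c, code), label) for c, code, label in pairs]
train, test = train_test_split(dataset)
print(len(train), len(test), len(dataset[0][0]))
```
</preformat>
      <p>In a real pipeline the stand-in would be replaced by the actual encoder, and the same split procedure would be applied in both the baseline and the augmented experiment so that the two runs stay comparable.</p>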
      <sec id="sec-4-1">
        <title>4.1. Support Vector Machine Model</title>
        <p>The Support Vector Machine (SVM) is a binary classifier that finds an optimal hyperplane maximizing
the margin between classes, defined by support vectors.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Model Equations</title>
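        <p>The linear output score <inline-formula><tex-math>z</tex-math></inline-formula>, which is proportional to the signed distance from the hyperplane, is calculated as:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>z = \mathbf{w}^{\top}\mathbf{x} + b</tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>\mathbf{w}</tex-math></inline-formula> is the weight vector, <inline-formula><tex-math>\mathbf{x}</tex-math></inline-formula> is the feature vector, and <inline-formula><tex-math>b</tex-math></inline-formula> is the bias.
The classification is determined by the sign of <inline-formula><tex-math>z</tex-math></inline-formula>:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>\text{Class} = \begin{cases} +1 &amp; \text{if } z \geq 0 \\ -1 &amp; \text{if } z &lt; 0 \end{cases}</tex-math>
        </disp-formula>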
        <p>The model is trained by minimizing the hinge loss and can use the kernel trick to handle data
that are not linearly separable.</p>
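        <p>The decision rule and the hinge loss can be illustrated with a small linear example. The sketch below is an assumption-laden illustration rather than the training procedure used in the study: the paper does not state the solver, so plain subgradient descent on an L2-regularised hinge loss is swapped in, labels are taken to be in {-1, +1}, and the values of <monospace>epochs</monospace>, <monospace>lr</monospace>, and <monospace>reg</monospace> as well as the toy data are invented.</p>
        <preformat>
```python
def decision_score(w, x, b):
    """Linear output score: dot(w, x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Sign rule: +1 if the score is non-negative, otherwise -1."""
    return 1 if decision_score(w, x, b) >= 0 else -1


def hinge_loss(w, b, data):
    """Mean of max(0, 1 - y * score) over (x, y) pairs with y in {-1, +1}."""
    losses = [max(0.0, 1.0 - y * decision_score(w, x, b)) for x, y in data]
    return sum(losses) / len(losses)


def train_linear_svm(data, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised hinge loss.

    epochs, lr and reg are illustrative values, not taken from the paper.
    """
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * decision_score(w, x, b) >= 1.0:
                # Margin satisfied: only the regulariser contributes.
                w = [wi - lr * reg * wi for wi in w]
            else:
                # Margin violated: the hinge term moves w and b towards y * x.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


# Tiny linearly separable toy set, invented for illustration.
toy = [([2.0, 2.0], 1), ([3.0, 1.5], 1), ([-2.0, -1.0], -1), ([-1.5, -2.5], -1)]
w, b = train_linear_svm(toy)
predictions = [predict(w, x, b) for x, _ in toy]
print(predictions, round(hinge_loss(w, b, toy), 3))
```
</preformat>
        <p>A kernelised variant would replace the dot product in <monospace>decision_score</monospace> with a kernel evaluated against the support vectors; the linear form is kept here because it maps directly onto Equations (1) and (2).</p>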
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We conducted two experiments to evaluate the impact of synthetic data augmentation. In the first
experiment, the Support Vector Machine model was trained and tested solely on the original
human-annotated dataset of 11,452 samples. In the second experiment, the training data was augmented with
the 233 GPT-labeled samples.</p>
      <p>The performance of the model in both scenarios is presented in Table 2.</p>
      <p>The results indicate that the model had an F1 score close to 0.80 on the original dataset. After
augmenting the training data with the samples labeled by GPT-3.5-turbo, the performance metrics were
similar, with only a very small change in the F1 score.</p>
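      <p>For reference, the F1 score is the harmonic mean of precision and recall. The sketch below shows the computation for a single positive class. It assumes that the reported score refers to the ‘Useful’ class, since the averaging scheme is not stated here, and the label vectors are invented toy values rather than results from the study.</p>
      <preformat>
```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class (binary setting)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy labels: 1 = Useful, 0 = Not Useful.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```
</preformat>
      <p>Comparing this quantity on the same held-out test set, once for the baseline and once for the augmented model, is what makes a statement such as "only a very small change" checkable.</p>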
      <p>Since the change was small, we infer that the synthetic data did not introduce new patterns
that the Support Vector Machine model could exploit to improve classification. The augmented
samples appear to follow a statistical distribution similar to that of the original data, so they did not
meaningfully change the model’s decision boundary. They are also few in number: the 233 GPT-labeled
samples amount to roughly 2% of the 11,452 human-annotated ones. The results indicate that adding
LLM-labeled examples did not improve performance for this model architecture and task. It is also
possible that performance did not improve because the original dataset was already large and diverse
enough for the Support Vector Machine model to exploit.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this research, we explored the potential of using synthetic data to support the task of code comment
quality classification. A solid baseline was established with a Support Vector Machine model that led to
an F1 score of 0.80 on an extensive human-annotated dataset. We primarily focused on the efficacy of
synthetic data augmentation using GPT-3.5-turbo, and this initial analysis suggests that synthetic
labels did not improve performance.</p>
      <p>These results suggest that the synthetic labels did not introduce enough novelty and variance to
improve the model, that the high-quality human-annotated data had already pushed the baseline model
towards its performance ceiling, or both. Our findings indicate that synthetic data is of limited help
for certain low-N tasks when adequate high-quality human-annotated data is already available, so
naive synthetic data augmentation yields diminishing returns.</p>
      <p>Future work may adopt stronger architectures, such as transformer-based models, to
take advantage of subtle variations in the data. Another avenue is to consider more elaborate data
augmentation approaches that introduce greater semantic variation, which may outperform plain
label generation.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT in order to perform grammar and
spelling checks. After using this tool, the author(s) reviewed and edited the content as needed and take
full responsibility for the publication’s content.</p>
      <p>maintainability, in: 2007 IEEE International Conference on Software Maintenance, IEEE, 2007, pp.
184–193.</p>
      <p>[17] M.-A. Storey, J. Ryall, R. I. Bull, D. Myers, J. Singer, Todo or to bug, in: 2008 ACM/IEEE 30th
International Conference on Software Engineering, IEEE, 2008, pp. 251–260.</p>
      <p>[18] T. Tenny, Program readability: Procedures versus comments, IEEE Transactions on Software
Engineering 14 (1988) 1271.</p>
      <p>[19] H. Yu, B. Li, P. Wang, D. Jia, Y. Wang, Source code comments quality assessment method based on
aggregation of classification algorithms, Journal of Computer Applications 36 (2016) 3448.</p>
      <p>[20] S. Majumdar, A. Bansal, P. P. Das, P. D. Clough, K. Datta, S. K. Ghosh, Automated evaluation of
comments to aid software maintenance, Journal of Software: Evolution and Process 34 (2022)
e2463.</p>
      <p>[21] S. Majumdar, S. Papdeja, P. P. Das, S. K. Ghosh, Comment-mine—a semantic search approach to
program comprehension from code comments, in: Advanced Computing and Systems for Security,
Springer, 2020, pp. 29–42.</p>
      <p>[22] S. Majumdar, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough, P. Majumder, Overview
of the irse track at fire 2022: Information retrieval in software engineering, in: FIRE (Working
Notes), 2022, pp. 1–9.</p>
      <p>[23] S. Majumdar, A. Bandyopadhyay, P. P. Das, P. Clough, S. Chattopadhyay, P. Majumder, Can
we predict useful comments in source codes?-analysis of findings from information retrieval in
software engineering track@ fire 2022, in: Proceedings of the 14th Annual Meeting of the Forum
for Information Retrieval Evaluation, 2022, pp. 15–17.</p>
      <p>[24] P. Chakraborty, S. Dutta, D. K. Sanyal, S. Majumdar, P. P. Das, Bringing order to chaos:
Conceptualizing a personal research knowledge graph for scientists, IEEE Data Eng. Bull. 46 (2023)
43–56.</p>
      <p>[25] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam,
G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in neural information
processing systems 33 (2020) 1877–1901.</p>
      <p>[26] A. Deshpande, A. Maji, D. Mondol, P. P. Das, P. D. Clough, S. Majumdar, The code–llm handshake:
Smarter maintenance through ai, in: Proceedings of the 17th annual meeting of the Forum for
Information Retrieval Evaluation, 2025, pp. 9–12.</p>
      <p>[27] A. Mitra, S. Majumdar, A. Mukhopadhyay, P. P. Das, P. D. Clough, P. P. Chakrabarti,
Operationalizing large language models with design-aware contexts for code comment generation, arXiv
preprint arXiv:2510.22338 (2025).</p>
      <p>[28] S. Paul, S. Majumdar, R. Shah, S. Das, M. Ghosh, D. Ganguly, G. Calikli, D. Sanyal, P. P. Das,
P. D. Clough, A. Bandyopadhyay, S. Chattopadhyay, Generative ai for code metadata quality
assessment, in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval
Evaluation, 2024.</p>
      <p>[29] S. Majumdar, S. Paul, D. Paul, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough,
P. Majumder, Generative ai for software metadata: Overview of the information retrieval in
software engineering track at fire 2023, arXiv preprint arXiv:2311.03374 (2023).</p>
      <p>[30] S. Majumdar, A. Varshney, P. P. Das, P. D. Clough, S. Chattopadhyay, An effective low-dimensional
software code representation using bert and elmo, in: 2022 IEEE 22nd International Conference
on Software Quality, Reliability and Security (QRS), IEEE, 2022, pp. 763–774.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>C. B. de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Anquetil</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. de Oliveira</surname>
          </string-name>
          ,
          <article-title>A study of the documentation essential to software maintenance</article-title>
          ,
          <source>Conference on Design of communication, ACM</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>68</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          <string-name>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Smartkt: a search framework to assist program comprehension using smart knowledge transfer</article-title>
          ,
          <source>in: 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
          </string-name>
          ,
          <article-title>Debugging multi-threaded applications using pin-augmented gdb (pgdb)</article-title>
          ,
          <source>in: International conference on software engineering research and practice (SERP)</source>
          . Springer,
          <year>2015</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
          </string-name>
          ,
          <article-title>D-cube: tool for dynamic design discovery from multi-threaded applications using pin</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>A mathematical framework for design discovery from multi-threaded applications using neural sequence solvers</article-title>
          ,
          <source>Innovations in Systems and Software Engineering</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>289</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pratim Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Dcube_NN: Tool for dynamic design discovery from multi-threaded applications using neural sequence models</article-title>
          ,
          <source>Advanced Computing and Systems for Security:</source>
          Volume
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>75</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Siegmund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Peitek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Parnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Apel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofmeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kästner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Begel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bethmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brechmann</surname>
          </string-name>
          ,
          <article-title>Measuring neural efficiency of program comprehension</article-title>
          ,
          <source>in: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Parallelc-assist: Productivity accelerator suite based on dynamic instrumentation</article-title>
          , IEEE Access (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>P. P.</given-names>
          </string-name>
          <string-name>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Comprehending c codes with llms: Effective comment generation through retrieval and reasoning</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Calikli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>P. D.</given-names>
          </string-name>
          <string-name>
            <surname>Clough</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the “information retrieval in software engineering”(irse) track at forum for information retrieval 2024</article-title>
          ,
          <source>in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. Das</surname>
            ,
            <given-names>P. D.</given-names>
          </string-name>
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Efficiency of large language models to scale up ground truth: Overview of the irse track at forum for information retrieval 2023</article-title>
          ,
          <source>in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Tool assisted agile approach for legacy application migration</article-title>
          ,
          <source>International Journal of System Assurance Engineering and Management</source>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Smart knowledge transfer using Google-like search</article-title>
          ,
          <source>arXiv preprint arXiv:2308.06653</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Oman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hagemeister</surname>
          </string-name>
          ,
          <article-title>Metrics for assessing a software system's maintainability</article-title>
          ,
          <source>in: Proceedings Conference on Software Maintenance 1992</source>
          , IEEE Computer Society,
          <year>1992</year>
          , pp.
          <fpage>337</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fluri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Würsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Gall</surname>
          </string-name>
          ,
          <article-title>Do code and comments co-evolve? on the relation between source code and comment changes</article-title>
          ,
          <source>in: 14th Working Conference on Reverse Engineering (WCRE 2007)</source>
          , IEEE,
          <year>2007</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Deissenboeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pizka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Teuchert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Girard</surname>
          </string-name>
          ,
          <article-title>An activity-based quality model for</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>