<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Categorizing Code Comments Based on Their Relevance for Code Readability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kashvi Aggarwal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Goa</addr-line>
          ,
          <country country="IN">India -</country>
          <addr-line>403401</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>In software development, code comments are a valuable source of information, which calls for structured approaches to assess their usefulness objectively. This study augments a manually labelled dataset for code comment usefulness classification with synthetic samples produced through data augmentation. Additional labelled comment samples were generated with GPT-3.5-turbo and added to the training dataset. A baseline for predicting comment usefulness was then built using Logistic Regression and Random Forest models. The F1 score remained at about 0.79 to 0.80 in all conditions, whether or not the model was trained with synthetic data. This study adds evidence for the potential of synthetically generated data augmentation in code comment usefulness classification.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>GPT-3.5</kwd>
        <kwd>Random Forests</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>Comment Classification</kwd>
        <kwd>Qualitative Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In software engineering, code is critical to many domains, such as finance, healthcare, and infrastructure.
As software systems evolve to meet new challenges, their codebases grow more complex, and
documentation often becomes inconsistent or outdated, making maintenance difficult. For this reason,
code comments become one of the most reliable sources of information for developers and
automated tools. Comments carry valuable metadata about the intent and logic of program segments.</p>
      <p>However, comments vary widely in quality and clarity, and their usefulness is difficult to grade
reliably. To address this issue, we present a method that enhances a manually labeled
dataset of C language code comments with synthetic examples generated by GPT-3.5-turbo, a
large language model. We evaluate the resulting dataset by examining how synthetic data
produced by a language model affects the performance of comment usefulness classification.
The code comments were classified using a Random Forest model as the baseline classifier.
F1 scores remained stable at approximately 0.80 on both the original and the augmented
datasets, suggesting that language-model-based synthetic data augmentation can complement
human-annotated code comments without substantially changing classification performance.</p>
      <p>This research contributes to the study of how manual annotations and language-model-generated
data interact, and to the practical use of synthetic data augmentation for
comment classification in dynamic software environments. The rest of this paper is structured as
follows: Section 2 reviews related work; Section 3 introduces the task and dataset; Section 4 details the
methodology; Section 5 discusses the results; and Section 6 provides concluding insights.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Automated program understanding is a well-recognized area of research
in software engineering. Various tools extract knowledge from software metadata,
including runtime traces and structure [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">1, 2, 3, 4, 5, 6, 7, 8, 9</xref>
        ]. Researchers have developed various
methods to mine and evaluate code comments, focusing on analyzing comment quality through
code-comment pair comparisons. In assessing code comment quality, authors [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref15 ref16 ref9">10, 11, 12, 13, 14, 15, 9, 16</xref>
        ]
employ techniques such as word similarity measures (e.g., Levenshtein distance) and comment length
analysis to filter out trivial and non-informative comments. Rahman et al. [17] detect useful and
non-useful code review comments (logged in review portals) based on attributes identified from a survey
conducted with developers at Microsoft [18].
      </p>
      <p>
        Assessing the relevance of source code comments is a critical step for ensuring program
comprehension, yet not all comments are helpful. This has led to extensive research in automatic comment quality
evaluation. Foundational studies established metrics for software maintainability [19] and confirmed
the tight coupling between code and comment evolution [20]. More recent work has focused on creating
multi-level classification systems [21] and automated frameworks like "CommentProbe" for C code [22].
While these and other studies [
        <xref ref-type="bibr" rid="ref13 ref14 ref16 ref8 ref9">22, 23, 14, 13, 9, 16, 24, 25, 8, 26, 27, 28, 29</xref>
        ] have made significant progress,
the automatic evaluation of comment quality continues to be an important research challenge.
      </p>
      <p>
        With the advent of Large Language Models (LLMs) such as GPT-3.5 [30], a key question emerges:
how does their assessment of comment quality compare to human judgment? The IRSE track at FIRE
2024 [31, 25] tackles this question directly. It extends prior work [
        <xref ref-type="bibr" rid="ref13">22, 26, 32, 13</xref>
        ] by applying various
vector space models [33] to the task of comment classification. Furthermore, the track explicitly evaluates
the performance of predictive models when the training data is augmented with labels generated by
GPT, providing insights into the utility of synthetic data for this software engineering task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Description</title>
      <p>This study addresses a binary classification task: evaluating the utility of source code comments
by labeling them as either useful or not useful. The model analyzes a comment together with its corresponding
code snippet to determine its relevance. Our methodology relies on a dataset of over 11,000 C
code-comment pairs annotated by a team of 14 experts. To further enhance
this dataset, we generated and manually validated more than 200 additional synthetic samples using
GPT-3.5-turbo. The combined dataset serves as the foundation for training machine learning models, such as
logistic regression, to perform the classification task.</p>
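      <p>As a minimal sketch of how the manually annotated pairs and the validated synthetic samples might be combined, the following Python fragment merges the two sources and holds out a test portion. The record fields, the provenance tag, the duplicate check, and the function names are illustrative assumptions and are not taken from the study; the 80/20 proportion follows the split stated in Section 4.</p>
      <preformat>
```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One code-comment pair (hypothetical record layout)."""
    comment: str
    code: str
    useful: bool   # True = useful, False = not useful
    source: str    # assumed provenance tag: "manual" or "synthetic"


def build_dataset(manual, synthetic):
    """Concatenate both sources, dropping exact (comment, code) duplicates."""
    seen = set()
    merged = []
    for sample in list(manual) + list(synthetic):
        key = (sample.comment, sample.code)
        if key not in seen:
            seen.add(key)
            merged.append(sample)
    return merged


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then cut into training and test portions."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```
      </preformat>
      <p>Keeping a provenance tag on each record is one way to rerun the same experiment with and without the synthetic portion; whether the study tracked provenance in this manner is not stated.</p>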
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>Our approach to classifying code comments uses embeddings derived from both the comment
and the associated code snippet, which are then fed into a classification model.</p>
      <p>4.1. Methodology Block Diagram</p>
      <p>Evaluation metrics, including accuracy, precision, recall, and F1 score, are used to assess the effectiveness of
our classification model.</p>
      <p>The logistic regression model applies a logistic function to constrain its output between
0 and 1. It first computes a linear combination of the input features,
<inline-formula><tex-math>z = w^{\top}x + b</tex-math></inline-formula>, where <inline-formula><tex-math>x</tex-math></inline-formula> is the feature vector, <inline-formula><tex-math>w</tex-math></inline-formula> the weight vector, and <inline-formula><tex-math>b</tex-math></inline-formula> the bias,
and then applies the logistic function <inline-formula><tex-math>\sigma(z) = \frac{1}{1 + \exp(-z)}</tex-math></inline-formula> to produce
a probability score. A threshold of 0.6 is set to favor predictions toward the useful comment category.
Each training example is represented with a three-dimensional feature vector, and the Cross-Entropy
loss function is used to optimize the model parameters. In training, 80% of the dataset is utilized, with the
remaining 20% reserved for testing.</p>
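      <p>The logistic regression steps described above can be illustrated with a small, dependency-free Python sketch. It is not the implementation used in the study: the learning rate, the number of epochs, the plain batch gradient descent, and the toy three-dimensional feature values are all assumptions made for illustration. Only the logistic function, the cross-entropy loss, the 0.6 threshold, and the three-dimensional feature vectors come from the text.</p>
      <preformat>
```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to the open interval (0, 1)."""
    z = max(z, -700.0)  # guard against overflow in math.exp for very negative z
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the 'useful' class for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(weights, bias, features, threshold=0.6):
    """Return 1 (useful) when the probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


def cross_entropy(weights, bias, X, y):
    """Mean binary cross-entropy loss over a dataset."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for features, label in zip(X, y):
        p = predict_proba(weights, bias, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(X)


def fit(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the cross-entropy loss (assumed optimizer)."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            err = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


if __name__ == "__main__":
    # Toy three-dimensional feature vectors (hypothetical values).
    X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3]]
    y = [1, 1, 0, 0]
    w, b = fit(X, y)
    print([predict(w, b, features) for features in X])
```
      </preformat>
      <p>In this sketch a sample is labeled useful only when its predicted probability is at least 0.6, so an untrained model, which outputs 0.5 for every input, labels everything as not useful.</p>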
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We trained our Random Forest model independently on the original dataset and on an augmented
dataset that adds GPT-generated samples. The original dataset consisted of 11,452
labeled samples, and the GPT augmentation contributed an additional 233 samples. In the first
experiment, we used only the original dataset; in the second, we used the original dataset
augmented with the GPT-generated samples. The metrics for both experiments are as follows:</p>
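      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Dataset</th>
              <th>Accuracy</th>
              <th>Precision</th>
              <th>Recall</th>
              <th>F1 Score</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Original Dataset</td>
              <td>81.05679%</td>
              <td>0.7913</td>
              <td>0.8035</td>
              <td>0.7967</td>
            </tr>
            <tr>
              <td>Augmented Dataset</td>
              <td>81.53476%</td>
              <td>0.7945</td>
              <td>0.8078</td>
              <td>0.7913</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>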
      <p>The minor differences in the metrics across the two datasets indicate that the GPT-generated
samples are close to the original data in quality, supporting the effectiveness of synthetic data
augmentation in this setting.</p>
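      <p>For reference, the four reported metrics can be computed from predicted and true labels as in the sketch below. The function name and the choice to score a single positive class are assumptions; the text does not state whether the reported precision, recall, and F1 values are per-class or averaged across classes, so this sketch is not expected to reproduce the reported figures exactly.</p>
      <preformat>
```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```
      </preformat>
      <p>Running the same function on the test predictions of the model trained on the original data and of the model trained on the augmented data gives a like-for-like comparison of the two conditions.</p>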
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we presented a binary classification model, based on a Random Forest classifier,
to evaluate the utility of code comments. Our results indicate that synthetic data
generated by GPT-3.5-turbo is very close in quality to manually labeled data, which demonstrates
the potential of augmenting training datasets with synthetic data when resources are constrained.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author used ChatGPT for grammar and spelling
checks. After using this tool, the author reviewed and edited the content as needed and
takes full responsibility for the publication’s content.</p>
      <p>[17] M. M. Rahman, C. K. Roy, R. G. Kula, Predicting usefulness of code review comments using textual
features and developer experience, International Conference on Mining Software Repositories
(MSR), IEEE, 2017, pp. 215–226.</p>
      <p>[18] A. Bosu, M. Greiler, C. Bird, Characteristics of useful code reviews: An empirical study at Microsoft,
Working Conference on Mining Software Repositories, IEEE, 2015, pp. 146–156.</p>
      <p>[19] P. Oman, J. Hagemeister, Metrics for assessing a software system’s maintainability, in: Proceedings
Conference on Software Maintenance 1992, IEEE Computer Society, 1992, pp. 337–338.</p>
      <p>[20] B. Fluri, M. Wursch, H. C. Gall, Do code and comments co-evolve? on the relation between source
code and comment changes, in: 14th Working Conference on Reverse Engineering (WCRE 2007),
IEEE, 2007, pp. 70–79.</p>
      <p>[21] H. Yu, B. Li, P. Wang, D. Jia, Y. Wang, Source code comments quality assessment method based on
aggregation of classification algorithms, Journal of Computer Applications 36 (2016) 3448.</p>
      <p>[22] S. Majumdar, A. Bansal, P. P. Das, P. D. Clough, K. Datta, S. K. Ghosh, Automated evaluation of
comments to aid software maintenance, Journal of Software: Evolution and Process 34 (2022)
e2463.</p>
      <p>[23] S. Majumdar, S. Papdeja, P. P. Das, S. K. Ghosh, Comment-mine—a semantic search approach to
program comprehension from code comments, in: Advanced Computing and Systems for Security,
Springer, 2020, pp. 29–42.</p>
      <p>[24] S. Majumdar, A. Deshpande, P. P. Das, P. P. Chakrabarti, Comprehending c codes with llms:
Effective comment generation through retrieval and reasoning, Pattern Recognition Letters (2025).</p>
      <p>[25] S. Paul, S. Majumdar, R. Shah, S. Das, M. Ghosh, D. Ganguly, G. Calikli, D. Sanyal, P. P. Das,
P. D. Clough, A. Bandyopadhyay, S. Chattopadhyay, Overview of the irse track at fire 2024:
Information retrieval in software engineering, in: FIRE (Working Notes), 2024.</p>
      <p>[26] S. Paul, S. Majumdar, A. Bandyopadhyay, B. Dave, S. Chattopadhyay, P. Das, P. D. Clough, P.
Majumder, Efficiency of large language models to scale up ground truth: Overview of the irse track
at forum for information retrieval 2023, in: Proceedings of the 15th Annual Meeting of the Forum
for Information Retrieval Evaluation, 2023, pp. 16–18.</p>
      <p>[27] N. Chatterjee, S. Majumdar, P. P. Das, A. Chakrabarti, Tool assisted agile approach for legacy
application migration, International Journal of System Assurance Engineering and Management
(2025) 1–16.</p>
      <p>[28] A. Deshpande, A. Maji, D. Mondol, P. P. Das, P. D. Clough, S. Majumdar, The code–llm handshake:
Smarter maintenance through ai, in: Proceedings of the 17th annual meeting of the Forum for
Information Retrieval Evaluation, 2025, pp. 9–12.</p>
      <p>[29] A. Mitra, S. Majumdar, A. Mukhopadhyay, P. P. Das, P. D. Clough, P. P. Chakrabarti,
Operationalizing large language models with design-aware contexts for code comment generation, arXiv
preprint arXiv:2510.22338 (2025).</p>
      <p>[30] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam,
G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in neural information
processing systems 33 (2020) 1877–1901.</p>
      <p>[31] S. Paul, S. Majumdar, R. Shah, S. Das, M. Ghosh, D. Ganguly, G. Calikli, D. Sanyal, P. P. Das,
P. D. Clough, A. Bandyopadhyay, S. Chattopadhyay, Generative ai for code metadata quality
assessment, in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval
Evaluation, 2024.</p>
      <p>[32] S. Majumdar, S. Paul, D. Paul, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough,
P. Majumder, Generative ai for software metadata: Overview of the information retrieval in
software engineering track at fire 2023, arXiv preprint arXiv:2311.03374 (2023).</p>
      <p>[33] S. Majumdar, A. Varshney, P. P. Das, P. D. Clough, S. Chattopadhyay, An effective low-dimensional
software code representation using bert and elmo, in: 2022 IEEE 22nd International Conference
on Software Quality, Reliability and Security (QRS), IEEE, 2022, pp. 763–774.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>C. B. de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Anquetil</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. de Oliveira</surname>
          </string-name>
          ,
          <article-title>A study of the documentation essential to software maintenance</article-title>
          ,
          <source>Conference on Design of communication, ACM</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>68</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          <string-name>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Smartkt: a search framework to assist program comprehension using smart knowledge transfer</article-title>
          ,
          <source>in: 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
          </string-name>
          ,
          <article-title>Debugging multi-threaded applications using pin-augmented gdb (pgdb)</article-title>
          ,
          <source>in: International conference on software engineering research and practice (SERP)</source>
          . Springer,
          <year>2015</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
          </string-name>
          ,
          <article-title>D-cube: tool for dynamic design discovery from multi-threaded applications using pin</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>A mathematical framework for design discovery from multi-threaded applications using neural sequence solvers</article-title>
          ,
          <source>Innovations in Systems and Software Engineering</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>289</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pratim Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Dcube_ nn d cube nn: Tool for dynamic design discovery from multi-threaded applications using neural sequence models</article-title>
          ,
          <source>Advanced Computing and Systems for Security:</source>
          Volume
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>75</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Siegmund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Peitek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Parnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Apel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofmeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kästner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Begel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bethmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brechmann</surname>
          </string-name>
          ,
          <article-title>Measuring neural efficiency of program comprehension</article-title>
          ,
          <source>in: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Parallelc-assist: Productivity accelerator suite based on dynamic instrumentation</article-title>
          , IEEE Access (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
          </string-name>
          ,
          <article-title>Smart knowledge transfer using google-like search</article-title>
          ,
          <source>arXiv preprint arXiv:2308.06653</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Hotcomments: how to make program comments more useful?, in: Conference on Programming language design and implementation (SIGPLAN)</article-title>
          , ACM,
          <year>2007</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Gotmare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Bui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Hoi</surname>
          </string-name>
          , Codet5+:
          <article-title>Open code large language models for code understanding and generation</article-title>
          ,
          <source>arXiv preprint arXiv:2305.07922</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Steidl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hummel</surname>
          </string-name>
          , E. Juergens,
          <article-title>Quality analysis of source code comments</article-title>
          ,
          <source>International Conference on Program Comprehension (ICPC)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Chattopadhyay</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Can we predict useful comments in source codes?-analysis of findings from information retrieval in software engineering track@ fire 2022</article-title>
          ,
          <source>in: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>P. D.</given-names>
          </string-name>
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the irse track at fire 2022: Information retrieval in software engineering</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Freitas</surname>
          </string-name>
          , D. da Cruz,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Henriques</surname>
          </string-name>
          ,
          <article-title>A comment analysis approach for program comprehension</article-title>
          ,
          <source>Annual Software Engineering Workshop</source>
          (SEW), IEEE,
          <year>2012</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dutta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
          </string-name>
          ,
          <article-title>Bringing order to chaos: Conceptualizing a personal research knowledge graph for scientists</article-title>
          .,
          <source>IEEE Data Eng. Bull</source>
          .
          <volume>46</volume>
          (
          <year>2023</year>
          )
          <fpage>43</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>