<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Seetharam Killivalavan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Durairaj Thenmozhi</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Chennai, Tamil Nadu- 603110</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper explores a method for enhancing binary classification models that assess code comment quality, leveraging Generative Artificial Intelligence to improve model performance. By integrating 1,437 newly generated code-comment pairs, labeled as "Useful" or "Not Useful" and sourced from various GitHub repositories, into an existing C-language dataset of 9,048 pairs, we demonstrate substantial model improvements. Using a Large Language Model, our approach yields a 5.78 percentage-point precision increase in the Support Vector Machine (SVM) model, improving from 0.79 to 0.8478, and a 2.17 percentage-point recall increase in the Artificial Neural Network (ANN) model, rising from 0.731 to 0.7527. These results underscore Generative AI's value in advancing code comment classification models, offering significant potential for enhanced accuracy in software development and quality control. This study provides a promising outlook on the integration of generative techniques for refining machine learning models in practical software engineering settings.</p>
      </abstract>
      <kwd-group>
        <kwd>Code Comment Quality Classification</kwd>
        <kwd>Generative Artificial Intelligence</kwd>
        <kwd>Support Vector Machines</kwd>
        <kwd>Artificial Neural Networks</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Code comments are essential in software development, enhancing understanding, supporting team
collaboration, and facilitating long-term code maintenance, as discussed by de Souza et al. (2005) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However,
manually evaluating comments poses challenges due to its time-intensive and subjective nature, as noted
by Haouari et al. (2011) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To address these limitations, this study explores the use of Generative AI to
automate comment quality assessment, as proposed by Ebert et al. (2023) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], presenting a significant
advancement for optimizing code review processes and expediting the Software Development Life
Cycle (SDLC).
      </p>
      <p>
        Incorporating comments effectively within the SDLC can benefit developers by accelerating
troubleshooting, providing essential documentation, and establishing a robust groundwork for future
development phases, as suggested by Majumdar et al. (2020) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This paper details our methods,
experimental design, and the transformative potential of this AI-based approach for the software engineering field,
as previously highlighted by Roehm et al. (2012) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Following this introduction, we review existing
studies on comment classification and explain our process for generating a new dataset using Large
Language Models (LLMs).
      </p>
      <sec id="sec-1-2">
        <title>1.1. CODE COMMENT CLASSIFICATION: CURRENT LANDSCAPE AND CHALLENGES</title>
        <p>
          Code comments are used to clarify logic, design decisions, and development challenges [6]. However, manual
evaluation remains inconsistent, time-consuming, and subjective [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Automated classification, labeling
comments as "Useful" or "Not Useful," offers a more efficient approach to streamline code review [7]. This
study examines how Generative AI can enhance these classification models [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], potentially transforming
comment quality assessment. By prioritizing essential comments, resource management can improve.
        </p>
        <p>
          This introduction sets up a discussion on how Large Language Models (LLMs) are advancing code
comment classification and software development practices [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>1.2. IMPACT OF LLM ON THE QUALITY OF COMMENTS</title>
        <p>
          Leveraging Large Language Models (LLMs) represents a major advancement in evaluating the quality of
code comments [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. These models move beyond syntactic comprehension, capturing the deeper
semantics of the code and generating insightful comments that streamline assessment processes. By doing so,
they significantly enhance the relevance and clarity of comments across the Software Development Life
Cycle (SDLC). Beyond mere classification, LLMs redefine developer interaction with code, fostering
clearer communication and strengthening collaboration. This transformative impact underscores the
essential role LLMs are set to play in the future of code comment quality evaluation.
The application of Generative AI within the IRSE@FIRE-2024 task [8] is set to transform code quality
evaluation, streamlining the Software Development Life Cycle (SDLC) and promoting more effective
resource distribution and collaborative development efforts among teams.
        </p>
        <p>The subsequent sections are organized as follows: Section 2 provides an overview of comment
classification and the foundations of Generative AI. Section 3 describes the task setup and dataset used.
Our methodology is detailed in Section 4. In Section 5, we present the results, while Section 6 offers
a comparative analysis of our models and embeddings against established approaches in code
comment quality assessment, underscoring their unique contributions. Lastly, Section 7 concludes with a
summary of our findings and discusses possible avenues for future research.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Automated program understanding is a recognized research area among professionals in the software
domain. Various tools have been developed to facilitate the extraction of knowledge from software
metadata, encompassing components such as runtime traces and structural attributes of code [
        <xref ref-type="bibr" rid="ref1">1,
9, 10, 11, 12, 13, 14, 15</xref>
        ]. Researchers have developed various methods to mine and evaluate code
comments, focusing on analyzing comment quality through code-comment pair comparisons. In
assessing code comment quality, authors [16, 17, 18, 19, 20, 21, 22, 23] employ techniques such as word
similarity measures (e.g., Levenshtein distance) and comment length analysis to filter out trivial and
non-informative comments. Rahman et al. [24] detect useful and non-useful code review comments
(logged-in review portals) based on attributes identified from a survey conducted with developers of
Microsoft [25].
      </p>
      <p>
        New programmers often rely on existing comments to comprehend code flow. However, not all
comments contribute effectively to program comprehension, necessitating a relevancy assessment
of source code comments prior to their use. Numerous researchers have focused on the automatic
classification of source code comments in terms of quality evaluation. For instance, Oman and Hagemeister [26] noted
that factors influencing software maintainability can be organized into hierarchical structures. The
authors defined measurable attributes in the form of metrics for each factor, enabling the assessment of
software characteristics, which can then be consolidated into a single index of software maintainability.
Fluri et al. [27] examined whether source code and its associated comments are changed together across
multiple versions. They investigated three open-source systems, ArgoUML, Azureus, and
JDT Core, and found that 97% of the comment changes are made in the same revision as the associated
source code changes. Yu et al. [28] classified source code comments into four classes: unqualified,
qualified, good, and excellent. Aggregating basic classification algorithms further improved the
classification result. Majumdar et al. [7] proposed an automatic classification
mechanism, "CommentProbe", for quality evaluation of code comments in C codebases. Source code
comments have thus been studied from different aspects [
        <xref ref-type="bibr" rid="ref4">7, 4, 20, 19, 22, 23</xref>
        ], but still, automatic
quality evaluation of source code comments is an important area and demands more research.
      </p>
      <p>With the advent of large language models [29], it is important to compare the quality assessment
of code comments by standard models such as GPT-3.5 or Llama with human interpretation. The
IRSE track at FIRE 2024 [30, 31] builds upon the methodologies proposed in [7, 32, 8, 19] to investigate
various vector space models [33] and features for binary classification and evaluation of comments
in relation to code comprehension. This track also assesses the performance of the predictive model
by incorporating GPT-generated labels for the quality of code and comment snippets extracted from
open-source software.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Description</title>
      <p>This section outlines the IRSE@FIRE-2024 task [8], focused on improving a binary code comment
quality classification model. The task involves integrating newly generated code-comment pairs for
enhanced accuracy. It comprises an initial dataset of 9048 labeled code-comment pairs in C, out of
which 5378 were classified as "Useful" and 3670 were classified as "Not Useful", along with additional
pairs generated using a Large Language Model (LLM), each labeled.</p>
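      <p>As a quick consistency check on the figures above, the following minimal Python sketch recomputes the class balance of the seed dataset. The two counts come from the task description; the variable names are illustrative only and are not part of the task materials.</p>
      <preformat><![CDATA[
```python
# Counts reported in the task description for the seed C dataset.
useful = 5378
not_useful = 3670
total = useful + not_useful

# Share of each class among all labeled code-comment pairs.
useful_share = useful / total
not_useful_share = not_useful / total

print(f"total pairs: {total}")
print(f"Useful: {useful_share:.1%}, Not Useful: {not_useful_share:.1%}")
```
]]></preformat>
      <p>The two class counts add up to the stated 9048 pairs, with the "Useful" class forming the majority of the seed data.</p>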
      <p>The desired output includes two versions of the classification model: one with the added generated
pairs and labels, and another without. The starting dataset encompasses 9048 comments from GitHub,
each with the comment text, surrounding code, and a corresponding usefulness label (Table 1).</p>
      <p>To establish the ground truth, 14 annotators assessed each comment independently, resulting in
substantial agreement (Cohen’s kappa value of 0.734). The annotation process involved the assessment
of a comprehensive set of 16,000 comments.</p>
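      <p>For readers unfamiliar with the agreement statistic, the sketch below shows how Cohen's kappa can be computed for one pair of annotators. It is a hedged illustration only: the labels are invented toy data, the function name is an assumption, and the text does not state how ratings from the 14 annotators were combined into a single value.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels (U = "Useful", N = "Not Useful"); not taken from the dataset.
annotator_1 = ["U", "U", "N", "U", "N", "N", "U", "U", "N", "U"]
annotator_2 = ["U", "U", "N", "N", "N", "N", "U", "U", "U", "U"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```
]]></preformat>
      <p>In this toy example the annotators agree on 8 of 10 items, and kappa discounts the agreement that would be expected by chance given how often each annotator uses each label.</p>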
      <p>Participants are also tasked with generating an additional dataset of labeled code-comment pairs
from GitHub using an LLM. This dataset is to be submitted alongside the task.</p>
      <p>In summary, the objective is to refine the code comment quality classification model by integrating
newly generated pairs, ultimately enhancing accuracy and effectiveness.</p>
      <p>For further details, please refer to the task description provided at IRSE@FIRE-2024.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>Our approach combines several methods: Support Vector Machine
(SVM) models for classification and Artificial Neural Networks (ANN) with diverse activation functions
for capturing complex data relationships [34]. Additionally, we leverage Large Language Models (LLMs)
via the OpenAI API and utilize GitHub repositories to generate a diverse dataset of
code-comment pairs. The following subsections detail these methods: implementing SVM
models, exploring ANN models, and generating datasets using the OpenAI API and GitHub repositories.
Figure 1 shows the architecture that underpins our approach.</p>
      <sec id="sec-4-1">
        <title>4.1. Support Vector Machines</title>
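        <p>A linear Support Vector Machine (SVM) is a powerful classification technique that finds the optimal
hyperplane for effective data separation, expressed as <inline-formula><tex-math>y = mx + c</tex-math></inline-formula>, where <inline-formula><tex-math>y</tex-math></inline-formula> is the predicted class label,
<inline-formula><tex-math>x</tex-math></inline-formula> is the input data, <inline-formula><tex-math>m</tex-math></inline-formula> is the slope and <inline-formula><tex-math>c</tex-math></inline-formula> is the y-intercept. It maximizes the margin, which is the distance
between the hyperplane and the nearest data points. This margin (M) can be calculated as:
<disp-formula><label>(1)</label><tex-math>M = \frac{2}{\lVert m \rVert}</tex-math></disp-formula>
where <inline-formula><tex-math>\lVert m \rVert</tex-math></inline-formula> is the length of the weight vector <inline-formula><tex-math>m</tex-math></inline-formula>.</p>
        <p>SVM aims to minimize the square of the length of the weight vector (<inline-formula><tex-math>\lVert m \rVert^{2}</tex-math></inline-formula>) while ensuring that each
data point <inline-formula><tex-math>x_i</tex-math></inline-formula> is correctly classified:
<disp-formula><label>(2)</label><tex-math>y_i (m \cdot x_i + c) \geq 1</tex-math></disp-formula>
Equation 2 states that the product must be greater than or equal to 1 for all data points, emphasizing the
importance of well-defined class separation in SVM classification. This condition is central to SVM’s
goal of locating an optimal hyperplane, maximizing the margin, and guaranteeing accurate data point
classification. Support vectors, those closest to the hyperplane, are pivotal in margin definition, thereby
influencing SVM’s overall performance.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Artificial Neural Networks</title>
        <p>Artificial Neural Networks (ANNs) are adaptable machine learning models that draw inspiration from
the architecture and operation of the human brain. They excel at discerning complex data relationships,
making them highly effective for tasks like code comment quality classification. The mathematical
representation of a single neuron in an ANN is given by:
<disp-formula><label>(3)</label><tex-math>Z = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n + b</tex-math></disp-formula>
where <inline-formula><tex-math>x_i</tex-math></inline-formula> are input features, <inline-formula><tex-math>w_i</tex-math></inline-formula> are corresponding weights and <inline-formula><tex-math>b</tex-math></inline-formula> is the bias term.
The weighted sum (Z) is then passed through an activation function, which introduces non-linearity
into the model. Different activation functions yield different learning behaviours.</p>
        <p>Here are a few common activation functions and their formulas:</p>
        <p>i) Logistic Function:
<disp-formula><tex-math>f(Z) = \frac{1}{1 + e^{-Z}}</tex-math></disp-formula></p>
        <p>ii) Rectified Linear Unit (ReLU):
<disp-formula><tex-math>f(Z) = \max(0, Z)</tex-math></disp-formula></p>
        <p>iii) Hyperbolic Tangent (tanh):
<disp-formula><tex-math>f(Z) = \frac{e^{Z} - e^{-Z}}{e^{Z} + e^{-Z}}</tex-math></disp-formula></p>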
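        <p>The quantities described in Sections 4.1 and 4.2 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in our experiments: the function names, the example weights, and the assumption that SVM class labels are encoded as -1 and +1 are all introduced here for illustration.</p>
        <preformat><![CDATA[
```python
import math


def weighted_sum(weights, inputs, bias):
    """Return w1*x1 + ... + wn*xn + b (the neuron input Z, or the SVM score)."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def svm_margin(weights):
    """Return the margin 2 / ||m|| for a weight vector m."""
    norm = math.sqrt(sum(w * w for w in weights))
    return 2.0 / norm


def satisfies_margin_constraint(weights, bias, inputs, label):
    """Check label * (m . x + c) >= 1; label is assumed to be -1 or +1."""
    return label * weighted_sum(weights, inputs, bias) >= 1.0


def logistic(z):
    """Logistic activation: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)


def tanh(z):
    """Hyperbolic tangent written out from its exponential definition."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))


# Hypothetical weights, bias and data point, chosen for easy arithmetic.
weights = [3.0, 4.0]
bias = -1.0
point = [1.0, 0.5]

z = weighted_sum(weights, point, bias)
print("weighted sum:", z)
print("margin:", svm_margin(weights))
print("constraint holds for label +1:", satisfies_margin_constraint(weights, bias, point, 1))
print("activations:", logistic(z), relu(z), tanh(z))
```
]]></preformat>
        <p>The same weighted sum serves as the SVM decision score and as the neuron input; only the neuron passes it through an activation function.</p>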
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Leveraging LLM for Generation of Dataset</title>
        <p>Our methodology encompasses a multi-step approach to dataset generation. Initially, we leveraged both
the OpenAI API, powered by the Curie Model, and GitHub repositories to diversify our dataset. The
API simulated real-world coding scenarios, producing authentic code-comment pairs and substantially
augmenting our dataset. Complementing this, we extracted additional pairs from various open-source
projects on GitHub, ensuring relevance and utility. This combined strategy significantly broadened
the dataset’s coverage while upholding high quality standards. Subsequently, the code-comment pairs
underwent processing using OpenAI’s Curie Model in conjunction with BERT for label generation,
signifying comment usefulness. This involved presenting prompts with both code and comment, and
employing the LLM to generate a label. Finally, the dataset was meticulously assembled, each entry
comprising code, comment, and the corresponding generated label. This rigorous methodology serves
as a robust foundation for our code comment quality classification model.</p>
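        <p>The labeling step described above can be sketched as follows. This is a hedged outline under stated assumptions rather than our actual pipeline: the prompt wording, the function names, and the <monospace>complete</monospace> callable (a stand-in for whatever LLM client is used) are hypothetical, and a stub replaces the real model call so that the example runs offline.</p>
        <preformat><![CDATA[
```python
def build_prompt(code, comment):
    """Assemble a prompt that shows the model both the code and the comment."""
    return (
        "Decide whether the comment helps a developer understand the C code.\n"
        "Answer with exactly one label: Useful or Not Useful.\n\n"
        f"Code:\n{code}\n\nComment:\n{comment}\n\nLabel:"
    )


def parse_label(raw_text):
    """Map a raw completion to one of the two labels, or None if unclear."""
    text = raw_text.strip().lower()
    if text.startswith("not useful"):
        return "Not Useful"
    if text.startswith("useful"):
        return "Useful"
    return None


def label_pairs(pairs, complete):
    """Label (code, comment) pairs with a caller-supplied completion function."""
    dataset = []
    for code, comment in pairs:
        label = parse_label(complete(build_prompt(code, comment)))
        if label is not None:  # drop entries the model did not label cleanly
            dataset.append({"code": code, "comment": comment, "label": label})
    return dataset


def fake_complete(prompt):
    """Offline stub standing in for an LLM call (hypothetical behaviour)."""
    comment = prompt.split("Comment:\n")[1].split("\n\nLabel:")[0]
    return " Useful" if len(comment) > 20 else "Not Useful\n"


pairs = [
    ("int i = 0;", "/* counter */"),
    ("free(buf);", "/* release the buffer allocated by read_block */"),
]
dataset = label_pairs(pairs, fake_complete)
for entry in dataset:
    print(entry["label"], "|", entry["comment"])
```
]]></preformat>
        <p>Passing the completion function in as a parameter keeps the prompt construction and label parsing testable without network access.</p>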
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Analysis of Results</title>
      <p>Evaluating our code comment quality classification model is a crucial step in validating its effectiveness.
We utilized a combination of Support Vector Machines (SVM) and Artificial Neural Networks (ANN) with
various activation functions, including ReLU, identity, logistic, and tanh, to conduct a comprehensive
analysis of the model’s performance. This multidimensional approach offered valuable insights into the
model’s adaptability, revealing its robustness across diverse scenarios. Additionally, integrating these
methodologies resulted in a significant improvement in precision, underscoring the model’s ability
to categorize code comments accurately based on practical value. These findings align with previous
research that demonstrates the reliability of SVM and ANN models for comment quality assessment.
The use of diverse activation functions further highlights the flexibility of our approach, reinforcing the
model’s potential applicability in real-world software development.</p>
      <sec id="sec-5-1">
        <title>5.1. Classification Models</title>
        <p>The evaluation of our code comment quality classification models yielded insightful findings, showcasing
the impact of integrating LLM-generated data into our seed dataset of 9048 entries. This initial dataset
was thoughtfully partitioned into training, testing and validation sets, with the testing set comprising
1718 entries. With the seed data, SVM achieved a precision of 0.79, while ANN with ReLU
activation achieved a recall of 0.731. Models
with tanh and logistic activation functions showed similar precision scores of 0.726 and 0.73.</p>
        <p>After integrating the 1437 LLM-generated entries into the seed data, SVM’s
precision increased by 5.78 percentage points, from 0.79 to 0.8478, highlighting the value
of incorporating generative AI. Using ReLU, ANN achieved a 2.17 percentage-point rise in its recall,
from 0.731 to 0.7527, while the tanh and logistic functions yielded marginal changes. We experimented
extensively with varied SVM models and ANN activation functions, and the results
demonstrate the effectiveness of our approach and the importance of careful experimentation
in fine-tuning models for code comment quality analysis.</p>
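        <p>The reported gains can be checked with a short calculation. The sketch below uses only the before and after scores stated in the text; the helper names are illustrative. It separates the absolute difference (in percentage points) from the relative change, since the two are easily confused.</p>
        <preformat><![CDATA[
```python
def absolute_gain(before, after):
    """Difference between two scores, as a fraction (0.01 = 1 percentage point)."""
    return after - before


def relative_gain(before, after):
    """Change relative to the starting score."""
    return (after - before) / before


# Scores reported in the text: (seed data only, seed plus generated data).
svm_precision = (0.79, 0.8478)
ann_recall = (0.731, 0.7527)

for name, (before, after) in [("SVM precision", svm_precision), ("ANN recall", ann_recall)]:
    points = absolute_gain(before, after) * 100
    percent = relative_gain(before, after) * 100
    print(f"{name}: +{points:.2f} percentage points, +{percent:.2f}% relative")
```
]]></preformat>
        <p>The 5.78 and 2.17 figures quoted above match the absolute differences; expressed as relative changes, the same gains are roughly 7.3% and 3.0%.</p>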
        <p>For detailed numerical results, please refer to Table 2, which compares
the model performance and gives the classification report of our top-performing models. It serves as a
comprehensive reference for our findings and allows the comparison of test accuracies and F1 scores
before and after integration.</p>
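        <p>For reference, the metrics named in this section can be derived from predicted and true labels as in the sketch below. The labels are toy data invented for illustration, the function name is an assumption, and "Useful" is treated as the positive class, which the text does not state explicitly.</p>
        <preformat><![CDATA[
```python
def classification_metrics(y_true, y_pred, positive="Useful"):
    """Precision, recall, F1 and accuracy for a binary labeling task."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": correct / len(pairs),
    }


# Toy labels only; these are not results from our experiments.
U, N = "Useful", "Not Useful"
y_true = [U, U, U, N, N, U, N, U]
y_pred = [U, N, U, U, N, U, N, U]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```
]]></preformat>
        <p>Precision penalizes comments wrongly predicted as "Useful", while recall penalizes useful comments that the model misses, which is why the two can move independently when the training data changes.</p>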
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Analysis of Dataset Generated using LLM</title>
        <p>The integration of data generated by OpenAI’s Large Language Model (LLM), in conjunction with the
utilization of the Curie model, and the inclusion of diverse datasets from various GitHub repositories and
open-source projects represents a significant stride in elevating our code comment quality classification
model. By meticulously adding 1437 new entries to our original dataset, we substantially enriched the
diversity of our training corpus. This augmentation in data diversity led to a marked improvement in the
accuracy of our classification model, benefiting both Support Vector Machine (SVM) and Artificial Neural
Network (ANN) models. The heightened sensitivity achieved through this amalgamation enhances the
model’s generalization and prediction capabilities, underscoring the value of incorporating external
data sources. Furthermore, the integration of BERT embeddings and the Curie model empowered our
model to adeptly capture the intricacies of code commentary, notably enhancing its ability to distinguish
between "Useful" and "Not Useful" comments. This capability proves crucial in real-world scenarios,
where precise comment assessment plays a pivotal role in influencing the effectiveness of software
development and maintenance processes.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>In this section, we conduct a thorough comparative analysis of our models and embeddings in relation
to previous studies on code comment classification. Our deliberate emphasis on Support Vector Machine
(SVM) and Artificial Neural Network (ANN) models, each with specific activation functions, allows
for an in-depth exploration of their efficacy. This focused investigation provides nuanced insights into
their performance in code comment quality assessment, contrasting with the broader set of classifiers
utilized by Majumdar et al. (2022a) [7].</p>
      <p>
        Additionally, our research methodology diverges from the work of Majumdar et al. (2020) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which
primarily centers on the extraction of knowledge domains from code comments for addressing developer
queries during maintenance. In contrast, our focus centers on the development and evaluation of code
comment quality classification models. This includes the integration of LLM-generated data, resulting
in significant enhancements in classification precision.
      </p>
      <p>Concerning embeddings, Majumdar et al. (2022b) [33] emphasize contextualized word representations
fine-tuned on software development texts. In our case, we utilized both BERT and custom embeddings
specifically tailored for software development concepts. This approach provided high-dimensional
semantic representations, catering to a wide array of natural language processing tasks. Note
that for labeling, we used the Curie model. This distinction underscores the versatility
and broader applicability of our embeddings compared to the contextualized embeddings discussed by
Majumdar et al. (2022b) [33].</p>
      <p>In sum, our work focuses on specific models and customized embeddings, providing
detailed insights into their effectiveness for assessing code comment quality and
distinguishing it from the broader, contextually focused techniques utilized in prior research [33].</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>Building on these foundational advancements, our study highlights the practicality and scalability
of Generative AI for real-world applications. By generating and integrating new data into existing
datasets, we demonstrated that Generative AI could enhance the performance of traditional models in
code comment quality assessment. This approach not only elevated our models’ precision and recall
but also underscored the potential of Generative AI to provide robust solutions for improving software
documentation practices, making it an impactful tool for future development cycles.</p>
      <p>The integration of LLM-generated data notably improved model performance, with precision for the
SVM model increasing by 5.78 percentage points and recall for the ANN model improving by 2.17 percentage points. These enhancements
raised the test accuracies to 81.1% for SVM and 75% for ANN, marking a clear advancement from their
pre-augmentation baselines. These quantifiable gains underscore the effectiveness of data augmentation
via Generative AI, illustrating how even modest dataset expansions can yield substantial improvements
in model accuracy and reliability, particularly for complex classification tasks in software development.</p>
      <p>Looking ahead, the impact of this work can extend well beyond code comment classification. The
methodologies introduced here establish a versatile framework that can be adapted for a wide range of
tasks in software development and quality assurance. By leveraging generative AI, specifically through
Large Language Models (LLMs), we highlight a powerful approach that could redefine code analysis
and documentation practices. As the software industry evolves, this research stands as evidence of the
substantial value in adopting advanced technologies, reinforcing the importance of innovative solutions
in enhancing efficiency and precision in practical engineering applications.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT in order to: Grammar and spelling
check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
      <p>[6] P. Rani, S. Panichella, M. Leuenberger, A. Di Sorbo, O. Nierstrasz, How to identify class comment
types? a multi-language approach for class comment classification, Journal of systems and software
181 (2021) 111047.
[7] S. Majumdar, A. Bansal, P. P. Das, P. D. Clough, K. Datta, S. K. Ghosh, Automated evaluation of
comments to aid software maintenance, Journal of Software: Evolution and Process 34 (2022)
e2463.
[8] S. Majumdar, S. Paul, D. Paul, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough,
P. Majumder, Generative ai for software metadata: Overview of the information retrieval in
software engineering track at fire 2023, arXiv preprint arXiv:2311.03374 (2023).
[9] S. Majumdar, S. Papdeja, P. P. Das, S. K. Ghosh, Smartkt: a search framework to assist program
comprehension using smart knowledge transfer, in: 2019 IEEE 19th International Conference on
Software Quality, Reliability and Security (QRS), IEEE, 2019, pp. 97–108.
[10] N. Chatterjee, S. Majumdar, S. R. Sahoo, P. P. Das, Debugging multi-threaded applications using
pin-augmented gdb (pgdb), in: International conference on software engineering research and
practice (SERP). Springer, 2015, pp. 109–115.
[11] S. Majumdar, N. Chatterjee, S. R. Sahoo, P. P. Das, D-cube: tool for dynamic design discovery
from multi-threaded applications using pin, in: 2016 IEEE International Conference on Software
Quality, Reliability and Security (QRS), IEEE, 2016, pp. 25–32.
[12] S. Majumdar, N. Chatterjee, P. P. Das, A. Chakrabarti, A mathematical framework for design
discovery from multi-threaded applications using neural sequence solvers, Innovations in Systems
and Software Engineering 17 (2021) 289–307.
[13] S. Majumdar, N. Chatterjee, P. Pratim Das, A. Chakrabarti, Dcube_ nn d cube nn: Tool for dynamic
design discovery from multi-threaded applications using neural sequence models, Advanced
Computing and Systems for Security: Volume 14 (2021) 75–92.
[14] J. Siegmund, N. Peitek, C. Parnin, S. Apel, J. Hofmeister, C. Kästner, A. Begel, A. Bethmann,
A. Brechmann, Measuring neural efficiency of program comprehension, in: Proceedings of the
2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 140–150.
[15] N. Chatterjee, S. Majumdar, P. P. Das, A. Chakrabarti, Parallelc-assist: Productivity accelerator
suite based on dynamic instrumentation, IEEE Access (2023).
[16] L. Tan, D. Yuan, Y. Zhou, Hotcomments: how to make program comments more useful?, in:
Conference on Programming language design and implementation (SIGPLAN), ACM, 2007, pp.
20–27.
[17] Y. Wang, H. Le, A. D. Gotmare, N. D. Bui, J. Li, S. C. Hoi, Codet5+: Open code large language
models for code understanding and generation, arXiv preprint arXiv:2305.07922 (????).
[18] D. Steidl, B. Hummel, E. Juergens, Quality analysis of source code comments, International
Conference on Program Comprehension (ICPC), IEEE, 2013, pp. 83–92.
[19] S. Majumdar, A. Bandyopadhyay, P. P. Das, P. Clough, S. Chattopadhyay, P. Majumder, Can
we predict useful comments in source codes?-analysis of findings from information retrieval in
software engineering track@ fire 2022, in: Proceedings of the 14th Annual Meeting of the Forum
for Information Retrieval Evaluation, 2022, pp. 15–17.
[20] S. Majumdar, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough, P. Majumder, Overview
of the irse track at fire 2022: Information retrieval in software engineering., in: FIRE (Working
Notes), 2022, pp. 1–9.
[21] J. L. Freitas, D. da Cruz, P. R. Henriques, A comment analysis approach for program comprehension,
Annual Software Engineering Workshop (SEW), IEEE, 2012, pp. 11–20.
[22] S. Majumdar, P. P. Das, Smart knowledge transfer using google-like search, arXiv preprint
arXiv:2308.06653 (2023).
[23] P. Chakraborty, S. Dutta, D. K. Sanyal, S. Majumdar, P. P. Das, Bringing order to chaos:
Conceptualizing a personal research knowledge graph for scientists., IEEE Data Eng. Bull. 46 (2023)
43–56.
[24] M. M. Rahman, C. K. Roy, R. G. Kula, Predicting usefulness of code review comments using textual
features and developer experience, International Conference on Mining Software Repositories
(MSR), IEEE, 2017, pp. 215–226.
[25] A. Bosu, M. Greiler, C. Bird, Characteristics of useful code reviews: An empirical study at microsoft,
Working Conference on Mining Software Repositories, IEEE, 2015, pp. 146–156.
[26] P. Oman, J. Hagemeister, Metrics for assessing a software system’s maintainability, in: Proceedings
Conference on Software Maintenance 1992, IEEE Computer Society, 1992, pp. 337–338.
[27] B. Fluri, M. Wursch, H. C. Gall, Do code and comments co-evolve? on the relation between source
code and comment changes, in: 14th Working Conference on Reverse Engineering (WCRE 2007),
IEEE, 2007, pp. 70–79.
[28] H. Yu, B. Li, P. Wang, D. Jia, Y. Wang, Source code comments quality assessment method based on
aggregation of classification algorithms, Journal of Computer Applications 36 (2016) 3448.
[29] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam,
G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in neural information
processing systems 33 (2020) 1877–1901.
[30] S. Paul, S. Majumdar, R. Shah, S. Das, M. Ghosh, D. Ganguly, G. Calikli, D. Sanyal, P. P. Das,
P. D Clough, A. Bandyopadhyay, S. Chattopadhyay, Generative ai for code metadata quality
assessment, in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval
Evaluation, 2024.
[31] S. Paul, S. Majumdar, R. Shah, S. Das, M. Ghosh, D. Ganguly, G. Calikli, D. Sanyal, P. P. Das,
P. D Clough, A. Bandyopadhyay, S. Chattopadhyay, Overview of the irse track at fire 2024:
Information retrieval in software engineering, in: FIRE (Working Notes), 2024.
[32] S. Paul, S. Majumdar, A. Bandyopadhyay, B. Dave, S. Chattopadhyay, P. Das, P. D. Clough, P.
Majumder, Efficiency of large language models to scale up ground truth: Overview of the irse track
at forum for information retrieval 2023, in: Proceedings of the 15th Annual Meeting of the Forum
for Information Retrieval Evaluation, 2023, pp. 16–18.
[33] S. Majumdar, A. Varshney, P. P. Das, P. D. Clough, S. Chattopadhyay, An effective low-dimensional
software code representation using bert and elmo, in: 2022 IEEE 22nd International Conference
on Software Quality, Reliability and Security (QRS), IEEE, 2022, pp. 763–774.
[34] L. Igual, S. Seguí, Introduction to data science, Springer, 2017.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>C. B. de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Anquetil</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. de Oliveira</surname>
          </string-name>
          ,
          <article-title>A study of the documentation essential to software maintenance</article-title>
          ,
          <source>Conference on Design of communication, ACM</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>68</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahraoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Langlais</surname>
          </string-name>
          ,
          <article-title>How good is your comment? a study of comments in java programs</article-title>
          , in:
          <year>2011</year>
          <article-title>International symposium on empirical software engineering and measurement</article-title>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>137</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ebert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Louridas</surname>
          </string-name>
          ,
          <article-title>Generative ai for software practitioners</article-title>
          ,
          <source>IEEE Software 40</source>
          (
          <year>2023</year>
          )
          <fpage>30</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          <string-name>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Comment-mine-a semantic search approach to program comprehension from code comments</article-title>
          ,
          <source>in: Advanced Computing and Systems for Security</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Roehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tiarks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Koschke</surname>
          </string-name>
          , W. Maalej,
          <article-title>How do professional developers comprehend software?</article-title>
          ,
          <source>in: 2012 34th International Conference on Software Engineering (ICSE)</source>
          , IEEE,
          <year>2012</year>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>