<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Binary Classification of Source Code Comments using Machine Learning Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lisa Sarkar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Kharagpur, West Bengal, Kol-721302</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>5</fpage>
      <lpage>18</lpage>
      <abstract>
<p>This paper reports a detailed analysis of the viability of a classification framework that classifies a comment based on its usefulness within the source code. Such classification helps new developers comprehend source code correctly. Three machine learning models, logistic regression, support vector machine, and multinomial naive Bayes, are trained on an initial dataset called the seed dataset. Each comment is classified into one of two categories, useful and not useful. Accuracies of 82.92%, 83.92%, and 50.75% respectively are achieved from the initial training of the three models. The dataset is then augmented with a new set of data extracted from several online resources, with the corresponding classes for the new set generated using the ChatGPT large language model (LLM). The augmented dataset is then used to train the three machine learning models again. It is observed that, for the augmented dataset, the accuracy drops for all three models due to the inclusion of noise and bias in the LLM-generated data.</p>
      </abstract>
      <kwd-group>
<kwd>Logistic Regression</kwd>
        <kwd>Support vector machine</kwd>
        <kwd>Comment classification</kwd>
        <kwd>Qualitative analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Software is emerging as the backbone of modern technology, enabling many promising
applications through its integration into electronics and appliances. It simplifies the
challenges of daily life in many aspects; for example, GPS software facilitates driving from
one place to another. Constant modification of existing software and the building of new
software are key to improving software functionality, which leads to an increase in the amount
of source code. Maintaining this large body of source code is a crucial phase of the Software
Development Life Cycle (SDLC). In most cases, developers face numerous challenges during
source code maintenance, including comprehending a large code base in a short period of time,
outdated and incomplete documentation, and the unavailability of knowledge from previous
developers, to name a few.</p>
      <p>
        This type of scenario can be tackled by following a systematic process flow. New developers
generally have the source code, sample test cases, requirement documents, and a debugger with
which to implement new functionality. To modify the code further, a developer must understand
the existing source code, so they repeatedly run the current application on the sample test cases
to identify execution patterns, understand the design, and comprehend the program. But this
whole process is time-consuming, effort-intensive, monotonous, and sometimes becomes
unmanageable. To overcome these bottlenecks, developers often follow shortcut methods which
introduce errors that are difficult to filter out. This degrades software quality and developer
efficiency. Such situations demand a systematic, quality-controlled development process that is
easy for developers to use. Program comprehension is one such process for maintaining existing
source code in a better way. This reverse engineering process is beneficial for reuse, inspection,
maintenance, and many other activities in the context of software engineering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
<p>Inserting comments within a program is essential for better understanding of the source
code. Many developers work on the same code base and describe it differently when inserting
comments. Different descriptions of the same code base decrease the program’s readability.
Therefore, a standardized procedure for writing code and comments is imperative in order
to enhance readability. Still, this approach is not effective for understanding already-written
code. An intuitive understanding of source code comments may be a good way to solve the
readability problem. In recent years, researchers have been exploring this domain to develop
new applications aiming to enhance the efficiency of new programmers through understanding
of the existing code.</p>
<p>In this paper, we present a classification framework applied to a dataset of code and comment
pairs written in the C language. The work is done in three stages: training the classification
framework on a seed dataset, augmenting the dataset using a large language model, and
retraining the classification framework on the augmented dataset. In the first stage, the
framework takes a code and comment pair as input and classifies it into one of two classes,
useful and not useful. Logistic regression, support vector machine (SVM), and multinomial
naive Bayes techniques are employed for comment classification. A training dataset of 9000
samples and a test dataset of 1001 samples are used for this purpose. The models are validated
using a five-fold cross-validation process. We employ a linear kernel for the SVM and L2
regularization for logistic regression. In the next stage, another set of code-comment pairs is
gathered from online sources such as GitHub, and the ChatGPT-4 large language model is used
to categorise the newly gathered pairs into the two classes, useful and not useful. This generated
dataset is merged with the seed dataset, and the classification frameworks are trained again on
it. A small reduction in all F1 scores and accuracies is observed, which may be due to noise
introduced as part of the newly generated dataset.</p>
<p>The rest of the paper is organized as follows. Section 2 discusses background work in the
domain of comment classification. Section 3 describes the task and the dataset. We discuss the
proposed method in section 4. Results are addressed in section 5. Section 6 concludes the
paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Software metadata [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] plays a crucial role in the maintenance of code and its subsequent
understanding. Numerous tools have been developed to assist in extracting knowledge from
software metadata, which includes runtime traces and structural attributes of code [
        <xref ref-type="bibr" rid="ref10 ref11 ref3 ref4 ref5 ref6 ref7 ref8 ref9">3, 4, 5, 6, 7,
8, 9, 10, 11</xref>
        ].
      </p>
      <p>
        In the realm of mining code comments and assessing their quality, several authors have
conducted research. Steidl et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] employ techniques such as Levenshtein distance and
comment length to gauge the similarity of words in code-comment pairs, effectively filtering
out trivial and non-informative comments. Rahman et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] focus on distinguishing useful
from non-useful code review comments within review portals, drawing insights from attributes
identified in a survey conducted with Microsoft developers [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Majumdar et al. [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16, 17, 18</xref>
        ]
have introduced a framework for evaluating comments based on concepts crucial for code
comprehension. Their approach involves the development of textual and code correlation
features, utilizing a knowledge graph to semantically interpret the information within comments.
These approaches employ both semantic and structural features to address the prediction
problem of distinguishing useful from non-useful comments, ultimately contributing to the
process of decluttering codebases.
      </p>
      <p>
        In light of the emergence of large language models, such as GPT-3.5 or LLaMA [19], it becomes
crucial to assess the quality of code comments and compare it to human interpretation. The
IRSE track at FIRE 2023 [20] expands upon the approach presented in a prior work [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. It delves
into the exploration of various vector space models [21] and features for binary classification
and evaluation of comments, specifically in the context of their role in comprehending code.
Furthermore, this track conducts a comparative analysis of the prediction model’s performance
when GPT-generated labels for code and comment quality, extracted from open-source software,
are included.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Description</title>
<p>The task of implementing the binary classification framework is accomplished in three
consecutive steps: designing the classification framework with a seed dataset, augmenting the
seed dataset using a large language model, and training the same framework on the new
augmented dataset. The source code and comment pairs are classified into two classes, useful
and not useful, using the trained framework. The procedure takes a comment description with
its associated lines of code as input and generates a label, useful or not useful, for each
code-comment pair. The classification system was developed using classical machine learning
models: logistic regression, naive Bayes, and SVM.</p>
      <p>• Useful - The specified comment is appropriate for the corresponding source code.
• Not Useful - The specified comment is not appropriate for the corresponding source code.</p>
<p>The seed dataset has 9000 code-comment pairs written in the C language. Each entry
contains the comment text, the surrounding code snippet, and a label that describes its
usefulness. The whole dataset is gathered from GitHub and is annotated by a group of 14
annotators. An example from the dataset is presented in table 1. Another set of code-comment
pairs is collected from different online resources and merged with the above-mentioned dataset.
This set of code-comment pairs is categorized into the two above-mentioned classes using a
large language model, and the newly generated dataset is then added to the seed dataset.</p>
<p>The classification model is then trained again on this augmented dataset in order to
understand the effect of augmentation. Different factors that cause the change in accuracy
when training the classification framework with the augmented dataset are analysed, including
noise inclusion and the distribution of the dataset.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Working Principle</title>
<p>A binary classification system was implemented with the help of three machine learning
models: logistic regression, support vector machine, and multinomial naive Bayes. The system
took both the source code and the corresponding comments as input. Considering the task
criteria, we did not use any deep learning frameworks in our classification system. The
comments are first tokenized and lemmatized using an English word lemmatizer. The resulting
tokens are then vectorized using a TF-IDF vectorizer. The TF-IDF matrix generated in the
vectorization step, along with the class labels, was used as the feature input to the classification
models. These models are trained on the primary seed data and tested on the test dataset.
Variability in the training data was controlled using five-fold cross-validation. We briefly
describe each machine learning model in the subsequent subsections.</p>
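The vectorization step above can be sketched with a minimal hand-rolled TF-IDF (the classic tf × log(N/df) variant, not the exact formula of any particular library; the toy tokenized comments are assumed examples, not entries from the paper's dataset):

```python
from math import log

# Toy tokenized comments (as they might look after lemmatization); illustrative only.
docs = [
    ["good", "comment", "explains", "loop"],
    ["todo", "fix", "later"],
    ["good", "explains", "return", "value"],
]

# Document frequency: in how many documents each token appears.
df = {}
for doc in docs:
    for tok in set(doc):
        df[tok] = df.get(tok, 0) + 1

def tfidf(doc):
    """Classic TF-IDF: term frequency * log(N / document frequency)."""
    n = len(docs)
    weights = {}
    for tok in set(doc):
        tf = doc.count(tok) / len(doc)
        weights[tok] = tf * log(n / df[tok])
    return weights

w = tfidf(docs[0])
# "loop" occurs in only one document, so it outweighs "good",
# which occurs in two of the three documents.
```

Each comment thus becomes a sparse weight vector, and the matrix of these vectors (with the class labels) is what feeds the classifiers.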
      <sec id="sec-4-1">
        <title>4.1. Logistic Regression</title>
<p>We use logistic regression for the binary comment classification task, where a logistic function
is used to keep the output between 0 and 1. The linear and logistic functions are defined as follows:</p>
        <p>z = w · x + b (1)</p>
        <p>σ(z) = 1 / (1 + e^(−z)) (2)</p>
        <p>Equation (1) is the linear regression equation whose output z is passed to the logistic
function, which is defined in equation (2). The binary class is predicted from the probability
value generated by the logistic function based on an acceptance threshold. The threshold is
set to 0.6, which is in favor of the useful comment class. A three-dimensional input feature
vector is extracted from each training instance and passed to the regression function. During
training, the cross-entropy loss function is used for hyper-parameter tuning.</p>
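A minimal sketch of equations (1) and (2) with the 0.6 acceptance threshold described above (the weights and inputs are illustrative stand-ins; in the actual framework w and b are learned from the TF-IDF features):

```python
from math import exp

def sigmoid(z):
    """Logistic function of equation (2): maps any real z into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def predict(w, x, b, threshold=0.6):
    """Linear score z = w . x + b (equation (1)), then threshold the probability.

    threshold=0.6 mirrors the acceptance threshold stated in the paper.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "useful" if sigmoid(z) >= threshold else "not useful"

# sigmoid(0) == 0.5, which falls below the 0.6 threshold -> "not useful"
```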
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Support Vector Machine</title>
<p>In the next step, a support vector machine model is implemented for the binary classification
task. Classification is based on the output of the linear function (equation 1): if the output is
greater than 1, the sample is assigned to one class, and if it is less than −1, it is assigned to the
other. We train the SVM model using the hinge loss function, shown below:</p>
        <p>L(y, ŷ) = max(0, 1 − y · ŷ) (3)</p>
        <p>As the loss function shows, the cost is 0 if the predicted and actual values have the same
sign; if they have different signs, the loss is 1 − y · ŷ. The hinge loss function is used for
SVM model hyper-parameter tuning.</p>
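The hinge loss of equation (3) can be sketched directly (labels encoded as +1 for useful and −1 for not useful; the values here are illustrative):

```python
def hinge_loss(y, y_hat):
    """Hinge loss of equation (3): 0 when y * y_hat >= 1, else 1 - y * y_hat.

    y is the true label (+1 or -1); y_hat is the linear model's raw score.
    """
    return max(0.0, 1.0 - y * y_hat)

# Correct and outside the margin: zero loss.
# hinge_loss(+1, 2.5) == 0.0
# Wrong sign: positive loss that grows with the margin violation.
# hinge_loss(+1, -0.5) == 1.5
```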
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Multinomial Naive Bayes</title>
<p>The multinomial naïve Bayes model is also used in this task, mainly for text classification.
This model uses Bayes’ theorem:</p>
        <p>P(y|X) = P(X|y) · P(y) / P(X) (4)</p>
        <p>where P(y|X) is the posterior probability of class y given features X, P(X|y) is the likelihood,
representing the probability of observing features X given class y, P(y) is the prior probability
of class y, and P(X) is the probability of observing features X, which acts as a normalization
constant.</p>
<p>Multinomial naive Bayes operates on the assumption that each feature is conditionally
independent of the others given the class.</p>
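A toy illustration of equation (4) under the naive independence assumption (the per-class token likelihoods and priors below are assumed numbers for illustration, not estimates from the paper's dataset):

```python
from math import prod  # Python 3.8+

# Assumed per-class token likelihoods P(token | y), for illustration only.
likelihood = {
    "useful":     {"explains": 0.30, "todo": 0.05},
    "not useful": {"explains": 0.05, "todo": 0.40},
}
prior = {"useful": 0.5, "not useful": 0.5}

def posterior(tokens):
    """Bayes' theorem (equation (4)) with the naive independence assumption:
    P(y | X) is proportional to P(y) * product over tokens of P(x_i | y)."""
    scores = {y: prior[y] * prod(likelihood[y][t] for t in tokens)
              for y in prior}
    z = sum(scores.values())  # P(X), the normalization constant
    return {y: s / z for y, s in scores.items()}

p = posterior(["explains"])
# P(useful | "explains") = 0.5*0.30 / (0.5*0.30 + 0.5*0.05) = 6/7 ≈ 0.857
```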
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
<p>A system with an Intel i5 processor and 32 GB of RAM is employed for the task
implementation. The whole task has three steps, as mentioned earlier. At first, the seed dataset
is divided into two segments: training data (90%) and validation data (10%). The training
dataset is used to train the three ML models: logistic regression, support vector machine, and
multinomial naive Bayes. The test dataset contains 1001 instances, of which 719 are labeled
not useful and 282 useful. All three models are tested on this test dataset, and overall accuracies
of 82.92%, 83.92%, and 50.75% are achieved for the logistic regression, support vector machine,
and multinomial naive Bayes models respectively. The corresponding confusion matrices are
plotted in figure 1. We notice that the naive Bayes model fails to predict the not useful class
efficiently, which degrades its overall accuracy.</p>
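For context on these accuracy figures, overall accuracy follows directly from the confusion-matrix cells; the class counts below are those of the paper's test set, while the majority-class baseline is our own illustrative computation, not a result reported in the paper:

```python
def accuracy(tp, tn, fp, fn):
    """Overall accuracy computed from the four confusion-matrix cells."""
    return (tp + tn) / (tp + tn + fp + fn)

# The test set has 719 'not useful' and 282 'useful' instances, so a
# degenerate classifier that always predicts 'not useful' already scores:
baseline = accuracy(tp=0, tn=719, fp=0, fn=282)  # 719/1001 ≈ 0.718
```

This is why the roughly 50% accuracy of naive Bayes indicates a real failure on the majority class rather than a near-chance but balanced classifier.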
<p>Another dataset, consisting of 311 useful and 21 not useful samples, is generated using the
large language model and merged with the seed data. This new dataset is again divided into
two parts, training and validation, which are used to retrain the same classification models.
The newly trained models are tested on the same test data, and overall accuracies of 82.92%,
84.12%, and 50.05% are achieved for the three models respectively. The individual confusion
matrices for all three models are displayed in figure 2, and the evaluation results are
summarized in table 2. The models trained with the augmented dataset experience a slight
decrease in accuracy compared to that previously achieved with the seed dataset. This may be
attributed to noise incorporated into the seed data by the large language model; this noise
arises mainly from the imperfection of large language models, ChatGPT-4 in our case. Still,
we can argue that the augmented dataset remains suitable for machine learning model training
and yields accuracy similar to that of the initial seed dataset.</p>
<p>[Figure panels: (a) logistic regression, (b) support vector machine, (c) multinomial naive Bayes.]</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
<p>This paper proposes a framework for source code comment classification which classifies a
comment based on its usefulness within the source code. Three machine learning models,
logistic regression, support vector machine, and multinomial naive Bayes, are implemented
and trained on a seed dataset. These classifiers place each comment into one of two categories,
useful and not useful, and exhibit accuracies of 82.92%, 83.92%, and 50.75% respectively.
Subsequently, the seed dataset is augmented with a newly generated dataset gathered from
online sources, whose class labels are generated using the ChatGPT large language model
(LLM). The augmented dataset is again used to train all the models. It is observed that the
new augmented dataset reduces the accuracy of all three models due to the inclusion of noise
and bias in the LLM-generated data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Berón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Henriques</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Varanda</surname>
          </string-name>
          <string-name>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Uzal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Montejano</surname>
          </string-name>
          ,
          <article-title>A language processing tool for program comprehension</article-title>
          , in: XII Congreso
          <string-name>
            <surname>Argentino de Ciencias de la Computación</surname>
          </string-name>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>C. B. de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Anquetil</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. de Oliveira</surname>
          </string-name>
          ,
          <article-title>A study of the documentation essential to software maintenance</article-title>
          ,
          <source>Conference on Design of communication, ACM</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>68</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Hotcomments: how to make program comments more useful?, in: Conference on Programming language design and implementation (SIGPLAN)</article-title>
          , ACM,
          <year>2007</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          <string-name>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Smartkt: a search framework to assist program comprehension using smart knowledge transfer</article-title>
          ,
          <source>in: 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
          </string-name>
          ,
          <article-title>Debugging multi-threaded applications using pin-augmented gdb (pgdb)</article-title>
          ,
          <source>in: International conference on software engineering research and practice (SERP)</source>
          . Springer,
          <year>2015</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
          </string-name>
          ,
          <article-title>D-cube: tool for dynamic design discovery from multi-threaded applications using pin</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. P. Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>A mathematical framework for design discovery from multi-threaded applications using neural sequence solvers</article-title>
          ,
          <source>Innovations in Systems and Software Engineering</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>289</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pratim Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Dcube_ nn d cube nn: Tool for dynamic design discovery from multi-threaded applications using neural sequence models</article-title>
          ,
          <source>Advanced Computing and Systems for Security:</source>
          Volume
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>75</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Siegmund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Peitek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Parnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Apel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofmeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kästner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Begel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bethmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brechmann</surname>
          </string-name>
          ,
          <article-title>Measuring neural eficiency of program comprehension</article-title>
          ,
          <source>in: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Gotmare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Bui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Hoi</surname>
          </string-name>
          , Codet5+:
          <article-title>Open code large language models for code understanding and generation</article-title>
          ,
          <source>arXiv preprint arXiv:2305.07922</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Freitas</surname>
          </string-name>
          , D. da Cruz,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Henriques</surname>
          </string-name>
          ,
          <article-title>A comment analysis approach for program comprehension</article-title>
          ,
          <source>Annual Software Engineering Workshop</source>
          (SEW), IEEE,
          <year>2012</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Steidl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hummel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Juergens</surname>
          </string-name>
          ,
          <article-title>Quality analysis of source code comments</article-title>
          ,
          <source>International Conference on Program Comprehension (ICPC)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Kula</surname>
          </string-name>
          ,
          <article-title>Predicting usefulness of code review comments using textual features and developer experience</article-title>
          ,
          <source>International Conference on Mining Software Repositories (MSR)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Greiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <article-title>Characteristics of useful code reviews: An empirical study at microsoft</article-title>
          ,
          <source>Working Conference on Mining Software Repositories</source>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>146</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Automated evaluation of comments to aid software maintenance</article-title>
          ,
          <source>Journal of Software: Evolution and Process</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <elocation-id>e2463</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          ,
          <article-title>Comment-mine-a semantic search</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>