<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Source Code Comment Classification using machine learning algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Trisha Datta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sri Ramaswamy Memorial Institute of Science and Technology</institution>
          ,
          <addr-line>Kattankulathur</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>5</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>This paper proposes a framework for source code comment classification, which classifies a comment based on its usefulness within the source code. This qualitative classification helps new developers comprehend the source code correctly. We implement three machine learning models: logistic regression, support vector machine, and multinomial naive Bayes, which are trained using an initial seed dataset. These classifiers assign each comment to one of two categories: useful and not useful. The initially trained models achieve accuracies of 83.42%, 84.72%, and 50.45%, respectively. The dataset is then augmented with a new set of data extracted from several online resources. The class labels for the new set are generated using the ChatGPT large language model (LLM). All three models are trained again with the augmented dataset and tested with the same test dataset. We observe that all three models yield slightly lower accuracy, which indicates that the LLM-generated data introduces noise and bias.</p>
      </abstract>
      <kwd-group>
        <kwd>Logistic Regression</kwd>
        <kwd>Support vector machine</kwd>
        <kwd>Multinomial naive Bayes</kwd>
        <kwd>Large language model</kwd>
        <kwd>Comment classification</kwd>
        <kwd>Qualitative analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In today’s technology-driven world, programming plays a crucial role in all aspects of life, be
it medicine, communication, or transportation. As the number of applications increases, the
quantity of source code grows as well. Revisions to pre-existing code also contribute to
this growth. Managing and debugging a large body of code in a limited time is difficult, and
attempting it under time pressure can lead to poor documentation of the code. This makes reusing
old code to build something new and improved nearly impossible.</p>
      <p>The programmer must execute the source code repeatedly in order to understand how it works.
This is a slow, tedious process that takes a lot of effort. Programmers may therefore try to
speed up the process, which leads to errors and reduces the functionality of the new code
or software. One way of tackling this issue is through code comments. They provide a
shortcut to understanding the original programmer’s thought process while developing the
code. However, programmers have different personal styles of writing comments,
which can be confusing. Therefore, it is essential to follow certain conventions while programming,
including for comments.</p>
      <p>For existing source code with no proper comments or documentation, however,
we need a different solution. Building software that improves the readability of such
existing source code is an active area of research. Such efforts can help
programmers save time and develop improved applications.</p>
      <p>In this paper, we implement a classification framework to categorize a code-comment pair
into two classes: useful and not useful. A seed dataset with almost 10,000 code-comment pairs
written in the C language is used for training the framework. Three machine learning models,
namely support vector machine, logistic regression, and multinomial naive Bayes, are used to
implement the binary classification framework. We also collect a set of code-comment pairs
written in C from internet resources. A large language model, ChatGPT 4,
is used to label them into the two classes. The seed dataset is augmented with the new data, and
the classification models are trained again. The new F1 score and accuracy are compared with the
previous results. This comparison highlights the advantages and drawbacks of the new dataset
for our classification model.</p>
      <p>The rest of the paper is organized as follows. Section 2 discusses the background work done
in the domain of comment classification. Section 3 describes the task and the dataset.
We discuss the proposed method in Section 4. Results are presented in Section 5. Section 6
concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Understanding a program automatically is a well-known research area among people working
in the software domain. A significant number of tools [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1, 2, 3, 4, 5, 6</xref>
        ] have been proposed
to aid in extracting knowledge from software metadata [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] like runtime traces or structural
attributes of code. New programmers generally check existing comments to understand the
flow of the code. However, not every comment is helpful for program comprehension, which calls for
a relevancy check of source code comments beforehand. Many researchers have worked on the
automatic classification of source code comments in terms of quality evaluation. For example,
Oman et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] discussed how the factors influencing software maintainability can be organized
into hierarchical structures. The authors defined measurable attributes in the form of metrics for
each factor, which help measure software characteristics, and those metrics can be combined
into a single index of software maintainability. Fluri et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] examined whether source code
and associated comments are changed together across multiple versions. They investigated
three open-source systems (ArgoUML, Azureus, and JDT Core) and found that 97% of
the comment changes are made in the same revision as the associated source code changes.
Another work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], published in 2007, proposed a two-dimensional maintainability model
that explicitly associates system properties with the activities carried out during maintenance.
The authors claimed that this approach transforms the quality model into a structured quality
knowledge base that is usable in industrial environments. Storey et al. conducted an empirical study of
task annotations embedded within source code and how they play a vital role in a developer’s
task management [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The paper described how task management is negotiated between formal
issue tracking systems and manual annotations that programmers include within their source
code. Tenny [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] performed a 3 × 2 experiment to compare the effects of procedure format with
those of comments on the readability of a PL/I program. Readability was measured
by questioning students about the program after they had read it. The results showed that the program
without comments was the least readable. Yu et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] classified source code comments into
four classes: unqualified, qualified, good, and excellent. The aggregation of basic classification
algorithms further improved the classification result. Another work [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] proposed
an automatic classification mechanism, "CommentProbe", for quality evaluation
of code comments in C codebases. Researchers have thus studied source code comments from
different perspectives [
        <xref ref-type="bibr" rid="ref14 ref15 ref16 ref17">14, 15, 16, 17</xref>
        ], but automatic quality evaluation of source code comments
remains an important area that demands more research.
      </p>
      <p>
        With the advent of large language models [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], it is important to compare the quality
assessment of code comments by standard models such as GPT-3.5 or Llama with human
interpretation. The IRSE track at FIRE 2023 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] extends the approach proposed in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to
explore various vector space models [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and features for binary classification and evaluation
of comments in the context of their use in understanding the code. This track also compares
the performance of the prediction model with the inclusion of the GPT-generated labels for the
quality of code and comment snippets extracted from open-source software.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Description</title>
      <p>This section describes the task addressed in this paper. The task is divided into three
modules: designing the classification framework with the seed dataset, augmenting the seed dataset using
a large language model, and training the same classification framework with the new dataset.
We aim to implement a binary classification framework that classifies source code and comment
pairs into two classes: useful and not useful. The procedure takes a comment with its
associated lines of code as input and outputs a label, useful or not useful, for
the corresponding comment. This framework is designed to help developers comprehend the
associated code. Classical machine learning algorithms, namely naive Bayes, logistic regression,
and SVM, are employed to develop the classification system. The two classes of source code
comments can be described as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>Useful - The given comment is relevant to the corresponding source code.</p>
        </list-item>
        <list-item>
          <p>Not Useful - The given comment is not relevant to the corresponding source code.</p>
        </list-item>
      </list>
      <p>The seed dataset, consisting of over 9000 code-comment pairs written in the C language, is used
in our work. Each data instance consists of the comment text, a surrounding code snippet, and a
label that specifies whether the comment is useful or not. The whole dataset was collected from
GitHub and annotated by a team of 14 annotators. Sample data are shown in Table 1. We
augment this dataset with another set of code-comment pairs collected from different online
resources. This set of code-comment pairs is categorized into the two above-mentioned classes
using a large language model. The generated subset is then adjoined to the seed dataset.</p>
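      <p>As an illustration of the data layout described above, the following minimal Python sketch models one data instance and the augmentation step. The class name <monospace>CommentInstance</monospace>, the <monospace>origin</monospace> field, the shortened code snippets, and the rule of skipping exact duplicates are assumptions made for this sketch and are not taken from the dataset itself.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CommentInstance:
    comment: str  # comment text
    code: str     # surrounding code snippet
    label: str    # "Useful" or "Not Useful"
    origin: str   # hypothetical field: "seed" (human-annotated) or "llm" (LLM-labeled)


def augment(seed, llm_labeled):
    """Adjoin LLM-labeled pairs to the seed data.

    Skipping exact (comment, code) duplicates is an assumption of this sketch.
    """
    seen = {(item.comment, item.code) for item in seed}
    merged = list(seed)
    for item in llm_labeled:
        key = (item.comment, item.code)
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged


def label_distribution(data):
    """Count instances per class label."""
    return Counter(item.label for item in data)


# Toy instances; the code snippets are shortened for illustration.
seed = [
    CommentInstance("/*cr to cr,nul*/", "newline = 0;", "Not Useful", "seed"),
    CommentInstance("/*convert minor status code (underlying routine error) to text*/",
                    "msg_ctx = 0;", "Useful", "seed"),
]
llm_labeled = [
    CommentInstance("/* reset the counter */", "count = 0;", "Useful", "llm"),
    # Exact duplicate of a seed pair, so it is skipped.
    CommentInstance("/*cr to cr,nul*/", "newline = 0;", "Not Useful", "llm"),
]

augmented = augment(seed, llm_labeled)
print(len(augmented), dict(label_distribution(augmented)))
```
]]></preformat>
      <p>Keeping the origin of each label makes it possible to inspect the class distribution of the seed and LLM-labeled subsets separately when analyzing the effect of augmentation.</p>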
      <p>This augmented dataset is used to train the classification model a second time to understand
the effect of augmentation. We analyze noise inclusion, the distribution of the dataset, and other
factors that affect the change in accuracy when training with the augmented dataset.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Sample data instances from the seed dataset.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>#</th>
              <th>Comment</th>
              <th>Code</th>
              <th>Label</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>1</td>
              <td>/*test 529*/</td>
              <td><preformat>-10. int res = 0;
-9. CURL *curl = NULL;
-8. FILE *hd_src = NULL;
-7. int hd;
-6. struct_stat file_info;
-5. CURLM *m = NULL;
-4. int running;
-3. start_test_timing();
-2. if(!libtest_arg2) {
-1. #ifdef LIB529
/*test 529*/
1. fprin</preformat></td>
              <td>Not Useful</td>
            </tr>
            <tr>
              <td>2</td>
              <td>/*cr to cr,nul*/</td>
              <td><preformat>-1. else
/*cr to cr,nul*/
1. newline = 0;
2. }
3. else {
4. if(test-&gt;rcount) {
5. c = test-&gt;rptr[0];
6. test-&gt;rptr++;
7. test-&gt;rcount--;
8. }
9. else
10. break;</preformat></td>
              <td>Not Useful</td>
            </tr>
            <tr>
              <td>3</td>
              <td>/*convert minor status code (underlying routine error) to text*/</td>
              <td><preformat>-10. break;
-9. }
-8. gss_release_buffer(&amp;min_stat, &amp;status_string);
-7. }
-6. if(sizeof(buf) &gt; len + 3) {
-5. strcpy(buf + len, ".\n");
-4. len += 2;
-3. }
-2. msg_ctx = 0;
-1. while(!msg_ctx) {
/*con</preformat></td>
              <td>Useful</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>4. Working Principle</title>
      <p>We train three machine learning models, Logistic Regression, Support Vector Machine,
and Multinomial Naïve Bayes, to implement the binary classification functionality. Note that we
do not implement any deep learning frameworks because of the constraints of the task criteria.
The system takes the comments and corresponding code segments as input. Each comment is
lemmatized and tokenized using an English word tokenizer before being transformed into a numerical
vector space. We assume that all comments are written in English. We use a TF-IDF
vectorizer to transform each English keyword into a numerical value. It generates a TF-IDF
matrix for all keywords present in the comment set. This matrix is used as the feature set in
our classification framework. The entire dataset is divided into two disjoint subsets: a training
set (90%) and a test set (10%). We extract features from the training data using the above-mentioned
techniques. The TF-IDF matrix, along with the class labels, is fed into the machine learning models
for training. These models are then used to automatically assign each code-comment pair to one of the two
classes. We briefly discuss the three machine learning classification models in the
subsequent subsections.</p>
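      <p>The preprocessing steps above can be sketched in a few lines of standard-library Python. This is a simplified illustration rather than the pipeline used in this work: the regular-expression tokenizer stands in for the English tokenizer and lemmatizer, the smoothed IDF formula is one common variant, and the ten comments and all function names are invented for the example.</p>
      <preformat><![CDATA[
```python
import math
import random
import re
from collections import Counter


def tokenize(comment):
    """Lowercase and keep alphabetic tokens (stand-in for tokenization and lemmatization)."""
    return re.findall(r"[a-z]+", comment.lower())


def fit_idf(token_lists):
    """Smoothed inverse document frequency of every term in the training comments."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf, vocab):
    """One row of the TF-IDF matrix; terms outside the training vocabulary are ignored."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[term] / total * idf[term] for term in vocab]


def split(items, test_fraction=0.1, seed=0):
    """Shuffle and divide the data into disjoint training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Ten toy (comment, label) pairs, invented for illustration.
data = [
    ("/* convert minor status code to text */", "Useful"),
    ("/* test 529 */", "Not Useful"),
    ("/* release the status buffer */", "Useful"),
    ("/* cr to cr,nul */", "Not Useful"),
    ("/* copy the status text into the buffer */", "Useful"),
    ("/* todo */", "Not Useful"),
    ("/* start the test timer */", "Useful"),
    ("/* end */", "Not Useful"),
    ("/* read the next byte from the buffer */", "Useful"),
    ("/* xxx */", "Not Useful"),
]

train, test = split(data)
train_tokens = [tokenize(comment) for comment, _ in train]
idf = fit_idf(train_tokens)  # fitted on the training comments only
vocab = sorted(idf)
X_train = [tfidf_vector(tokens, idf, vocab) for tokens in train_tokens]
X_test = [tfidf_vector(tokenize(comment), idf, vocab) for comment, _ in test]
print(len(X_train), len(X_test), len(vocab))
```
]]></preformat>
      <p>Fitting the IDF weights on the training subset only, and reusing them for the test subset, keeps the two subsets disjoint at the feature level as well.</p>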
      <sec id="sec-4-1">
        <title>4.1. Logistic Regression</title>
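        <p>Logistic Regression is used for the binary comment classification. We use a function to keep
the regression output between 0 and 1. The linear regression equation and the logistic function are defined as given below:</p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math><![CDATA[z = wx + b]]></tex-math>
        </disp-formula>
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math><![CDATA[\sigma(z) = \frac{1}{1 + e^{-z}}]]></tex-math>
        </disp-formula>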
        <p>The output of the linear regression equation (Equation 1) is passed to the logistic function
(Equation 2). The probability value generated by the logistic function is used to predict the
binary class based on an acceptance threshold. A threshold value of 0.6 is kept in favor of
the useful comment class. A three-dimensional input feature is extracted from each training
instance and passed to the regression function. The Cross-Entropy Loss function is used
during training for Logistic Regression hyper-parameter tuning.</p>
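        <p>A minimal sketch of this prediction step is given below. The weights, bias, and feature values are hypothetical, and the decision rule (predict useful only when the probability reaches the 0.6 threshold) is one reading of the threshold described above, not a confirmed detail of the implementation.</p>
        <preformat><![CDATA[
```python
import math


def linear(weights, features, bias):
    """Linear regression output z (Equation 1)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def logistic(z):
    """Logistic function (Equation 2), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cross_entropy(probability, target, eps=1e-12):
    """Cross-entropy loss for one instance; target is 1 (useful) or 0 (not useful)."""
    p = min(max(probability, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def classify(weights, features, bias, threshold=0.6):
    """Assumed decision rule: 'Useful' only when the probability reaches the threshold."""
    probability = logistic(linear(weights, features, bias))
    return "Useful" if probability >= threshold else "Not Useful"


# Hypothetical weights, bias, and three-dimensional feature vectors.
weights, bias = [1.5, -0.5, 2.0], -1.0
for features in ([0.9, 0.1, 0.4], [0.2, 0.8, 0.1]):
    p = logistic(linear(weights, features, bias))
    print(features, round(p, 3), classify(weights, features, bias))
```
]]></preformat>
        <p>Under this reading, a probability of exactly 0.5 is still classified as not useful, because it falls below the threshold.</p>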
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Support Vector Machine</title>
        <p>We next use a Support Vector Machine model to do the binary classification. We take the output
of the linear function, and if the output is greater than 1, we classify it with one class and if the
output is less than -1, we classify it with the other class. We train the SVM model using the
Hinge Loss function, as given below:</p>
        <disp-formula id="eq3">
          <label>(3)</label>
          <tex-math><![CDATA[c(x, y, f(x)) = \begin{cases} 0, & \text{if } y \cdot f(x) \geq 1 \\ 1 - y \cdot f(x), & \text{otherwise} \end{cases}]]></tex-math>
        </disp-formula>
        <p>The loss function indicates that the cost is 0 when the predicted and actual values have the same sign
and the margin is at least 1. Otherwise, including when they have different signs, a positive loss value is calculated.
The Hinge Loss function is used for the
Support Vector Machine model hyper-parameter tuning.</p>
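        <p>The behavior of the hinge loss can be checked with a short Python sketch. The labels are encoded as +1 and -1, and the scores are hypothetical outputs of the linear function, chosen only to show the three possible cases.</p>
        <preformat><![CDATA[
```python
def hinge_loss(label, score):
    """Hinge loss for one instance; label is +1 or -1, score is the linear output f(x)."""
    return max(0.0, 1.0 - label * score)


def mean_hinge_loss(labels, scores):
    """Average hinge loss over a set of instances."""
    return sum(hinge_loss(y, s) for y, s in zip(labels, scores)) / len(labels)


# Hypothetical (label, score) pairs covering the cases of the loss.
examples = [
    (+1, 2.0),   # correct sign, outside the margin: no loss
    (+1, 0.5),   # correct sign, inside the margin: small loss
    (-1, 0.5),   # wrong sign: larger loss
    (-1, -3.0),  # correct sign, outside the margin: no loss
]
for label, score in examples:
    print(label, score, hinge_loss(label, score))

labels, scores = zip(*examples)
print(mean_hinge_loss(labels, scores))
```
]]></preformat>
        <p>The second example shows that a prediction with the correct sign still incurs a loss when its score lies inside the margin.</p>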
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Multinomial Naive Bayes</title>
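        <p>We also use a Multinomial Naïve Bayes model, which is used mostly in text classification, to
implement the binary classification. This model uses Bayes’ theorem, given below:</p>
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math><![CDATA[P(y \mid X) = \frac{P(X \mid y) \, P(y)}{P(X)}]]></tex-math>
        </disp-formula>
        <p>where</p>
        <list list-type="bullet">
          <list-item>
            <p>P(y|X) is the posterior probability of class y given features X.</p>
          </list-item>
          <list-item>
            <p>P(X|y) is the likelihood, representing the probability of observing features X given class y.</p>
          </list-item>
          <list-item>
            <p>P(y) is the prior probability of class y.</p>
          </list-item>
          <list-item>
            <p>P(X) is the probability of observing features X, which acts as a normalization constant.</p>
          </list-item>
        </list>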
        <p>Multinomial Naive Bayes operates on the assumption that the features are
conditionally independent of one another given the class.</p>
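        <p>The following sketch shows how Bayes’ theorem and the independence assumption combine in a multinomial naive Bayes classifier. It is an illustration under stated assumptions: it works on raw term counts rather than TF-IDF weights, uses Laplace smoothing with a hypothetical parameter <monospace>alpha</monospace>, and is trained on four invented token lists.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Minimal multinomial naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {term for doc in docs for term in doc}
        # Prior P(y): relative frequency of each class.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.term_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.total = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def log_joint(self, doc, c):
        """log P(y) + sum of log P(term | y); the sum reflects conditional independence."""
        denominator = self.total[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for term in doc:
            if term in self.vocab:  # unseen terms are ignored
                score += math.log((self.term_counts[c][term] + self.alpha) / denominator)
        return score

    def posterior(self, doc):
        """Posterior P(y | X); dividing by the sum plays the role of P(X)."""
        logs = {c: self.log_joint(doc, c) for c in self.classes}
        peak = max(logs.values())
        weights = {c: math.exp(value - peak) for c, value in logs.items()}
        normalizer = sum(weights.values())
        return {c: weight / normalizer for c, weight in weights.items()}

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_joint(doc, c))


# Invented training data: token lists with their class labels.
docs = [
    ["convert", "status", "code", "to", "text"],
    ["test"],
    ["release", "buffer", "status"],
    ["todo"],
]
labels = ["Useful", "Not Useful", "Useful", "Not Useful"]

model = MultinomialNaiveBayes().fit(docs, labels)
print(model.predict(["status", "text"]), model.posterior(["status", "text"]))
print(model.predict(["todo"]), model.posterior(["todo"]))
```
]]></preformat>
        <p>Working with log probabilities avoids numerical underflow when a comment contains many terms, and normalizing the two joint scores recovers the posterior without computing P(X) explicitly.</p>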
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>The task was implemented on a system with an Intel i5 processor and 32 GB of RAM.
As mentioned in Section 3, the whole task has three steps. Initially, the seed dataset is
randomly divided into two parts: training data (90%) and validation data (10%). The three
classification models (logistic regression, support vector machine, and multinomial naive Bayes)
are trained using the same training dataset. The test dataset consists of 1001 data instances,
of which 719 are labeled not useful and 282 are labeled useful. We test all
three models on this test dataset and achieve overall accuracies of 83.42%, 84.72%, and 50.45%,
respectively. The corresponding confusion matrices are displayed in Figure 1. It is evident that the naive
Bayes algorithm does not predict not useful data well. This leads to a low overall accuracy for
naive Bayes, unlike the other two algorithms.</p>
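      <p>For reference, the sketch below shows how accuracy, precision, recall, and F1 are derived from a binary confusion matrix, taking useful as the positive class. The confusion matrix counts are hypothetical and are not read from Figure 1; only the class counts of the test set (719 and 282) come from the text above.</p>
      <preformat><![CDATA[
```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 with 'useful' as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical confusion matrix, not taken from Figure 1.
toy = binary_metrics(tp=8, fp=2, fn=4, tn=6)
print({name: round(value, 3) for name, value in toy.items()})

# Class counts of the test set as reported in the text.
not_useful, useful = 719, 282
majority_baseline = not_useful / (not_useful + useful)
print(f"always predicting 'not useful': {majority_baseline:.2%}")
```
]]></preformat>
      <p>A classifier that always predicts not useful would reach about 71.83% accuracy on this test set, which gives a reference point for reading the three reported accuracies.</p>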
      <p>We augment the seed data with data labeled by a large language model, consisting of 309
useful samples and 25 not useful samples. The overall dataset is then divided into training and
validation datasets. The same classification models are trained again with the new training and
validation datasets. The newly trained models are tested with the same test data. These models
achieve overall accuracies of 83.32%, 84.12%, and 49.95%, respectively. The individual confusion
matrices are displayed in Figure 2. We notice that the models trained on the augmented
dataset show a small decrease in accuracy compared with those trained on the seed data. The
evaluation metrics for all three models are given in Table 2. This suggests that the large
language model introduces some noise into the seed data, which affects the overall accuracy of
all three models. This noise arises from the imperfection of large language models
such as ChatGPT 4 in our case. Still, we can argue that the augmented dataset is well-balanced
for machine learning model training.</p>
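      <p>The size of the accuracy change can be expressed in test instances. The sketch below assumes that each reported percentage is plain accuracy on the 1001-instance test set rounded to two decimals; under that assumption each percentage corresponds to exactly one count of correct predictions. It also computes the class share of the 334 added samples.</p>
      <preformat><![CDATA[
```python
TEST_SIZE = 1001  # test instances reported in the text

# Reported accuracy (%) with seed-only and augmented training data.
reported = {
    "logistic regression": (83.42, 83.32),
    "support vector machine": (84.72, 84.12),
    "multinomial naive Bayes": (50.45, 49.95),
}


def correct_count(accuracy_percent, n=TEST_SIZE):
    """Number of correct predictions implied by an accuracy percentage."""
    return round(accuracy_percent / 100 * n)


drops = {}
for model, (seed_acc, augmented_acc) in reported.items():
    before, after = correct_count(seed_acc), correct_count(augmented_acc)
    # Each count reproduces the reported percentage when rounded to two decimals.
    assert round(100 * before / TEST_SIZE, 2) == seed_acc
    assert round(100 * after / TEST_SIZE, 2) == augmented_acc
    drops[model] = before - after
    print(f"{model}: {before} -> {after} correct ({drops[model]} fewer)")

# Class counts of the LLM-labeled samples added to the seed data.
added_useful, added_not_useful = 309, 25
useful_share = added_useful / (added_useful + added_not_useful)
print(f"useful share of the added samples: {useful_share:.1%}")
```
]]></preformat>
      <p>Under this assumption the decreases amount to 1, 6, and 5 test instances for logistic regression, support vector machine, and multinomial naive Bayes, respectively, and about 92.5% of the added samples carry the useful label.</p>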
      <p>Confusion matrix panels: (a) Logistic regression, (b) Support vector machine, (c) Multinomial naive Bayes.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper has developed three binary classification models in the domain of code and comment
classification. The classification models are trained on seed data with two different
classes. All source code in the dataset is written in the C language, and all comments
are written in English. Each comment is tokenized and then vectorized using
the TF-IDF vectorizer. These numerical vectors are used as features in our classification models.
In addition, the initial seed dataset is augmented with new data extracted from online sources. The
newly extracted data are classified into the same two classes using a large language model,
ChatGPT. All three classification models are trained again with the augmented dataset. We
observe that models trained with augmented data produce slightly lower accuracy than
those trained with the initial seed data. This points to noisy data introduced as part of
the LLM-generated dataset. We have also done a comparative analysis of both results. We can
argue that the bias and noise present in the augmented dataset are degrading the accuracy.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Smartkt: a search framework to assist program comprehension using smart knowledge transfer</article-title>
          ,
          <source>in: 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Debugging multi-threaded applications using pin-augmented gdb (pgdb)</article-title>
          ,
          <source>in: International conference on software engineering research and practice (SERP)</source>
          . Springer,
          <year>2015</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>D-cube: tool for dynamic design discovery from multi-threaded applications using pin</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>A mathematical framework for design discovery from multi-threaded applications using neural sequence solvers</article-title>
          ,
          <source>Innovations in Systems and Software Engineering</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>289</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pratim Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Dcube_ nn d cube nn: Tool for dynamic design discovery from multi-threaded applications using neural sequence models</article-title>
          ,
          <source>Advanced Computing and Systems for Security:</source>
          Volume
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>75</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Siegmund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Peitek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Parnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Apel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofmeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kästner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Begel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bethmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brechmann</surname>
          </string-name>
          ,
          <article-title>Measuring neural efficiency of program comprehension</article-title>
          ,
          <source>in: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. C. B.</given-names>
            <surname>de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Anquetil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>de Oliveira</surname>
          </string-name>
          ,
          <article-title>A study of the documentation essential to software maintenance</article-title>
          ,
          <source>Conference on Design of communication, ACM</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>68</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Oman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hagemeister</surname>
          </string-name>
          ,
          <article-title>Metrics for assessing a software system's maintainability</article-title>
          ,
          <source>in: Proceedings Conference on Software Maintenance</source>
          <year>1992</year>
          , IEEE Computer Society,
          <year>1992</year>
          , pp.
          <fpage>337</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fluri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wursch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Gall</surname>
          </string-name>
          ,
          <article-title>Do code and comments co-evolve? on the relation between source code and comment changes</article-title>
          ,
          <source>in: 14th Working Conference on Reverse Engineering (WCRE</source>
          <year>2007</year>
          ), IEEE,
          <year>2007</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Deissenboeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pizka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Teuchert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Girard</surname>
          </string-name>
          ,
          <article-title>An activity-based quality model for maintainability</article-title>
          ,
          <source>in: 2007 IEEE International Conference on Software Maintenance, IEEE</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>184</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Storey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ryall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. I.</given-names>
            <surname>Bull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Singer</surname>
          </string-name>
          ,
          <article-title>Todo or to bug</article-title>
          ,
          <source>in: 2008 ACM/IEEE 30th International Conference on Software Engineering</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tenny</surname>
          </string-name>
          ,
          <article-title>Program readability: Procedures versus comments</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          <volume>14</volume>
          (
          <year>1988</year>
          )
          <fpage>1271</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Source code comments quality assessment method based on aggregation of classification algorithms</article-title>
          ,
          <source>Journal of Computer Applications</source>
          <volume>36</volume>
          (
          <year>2016</year>
          )
          <fpage>3448</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Automated evaluation of comments to aid software maintenance</article-title>
          ,
          <source>Journal of Software: Evolution and Process</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <elocation-id>e2463</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Comment-Mine: a semantic search approach to program comprehension from code comments</article-title>
          ,
          <source>in: Advanced Computing and Systems for Security</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the IRSE track at FIRE 2022: Information retrieval in software engineering</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation</source>
          , ACM,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Can we predict useful comments in source codes? Analysis of findings from Information Retrieval in Software Engineering track @ FIRE 2022</article-title>
          ,
          <source>in: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Generative AI for software metadata: Overview of the Information Retrieval in Software Engineering track at FIRE 2023</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation</source>
          , ACM,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Varshney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <article-title>An effective low-dimensional software code representation using BERT and ELMo</article-title>
          ,
          <source>in: 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>763</fpage>
          -
          <lpage>774</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>