<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title/>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring Transformer-Based Models for Automatic Useful Code Comments Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mithun Das</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subhadeep Chatterjee</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur</institution>
          ,
          <addr-line>West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Siemens EDA</institution>
          ,
          <addr-line>Kolkata, West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>Code commenting is a practice developers pursue to improve the readability of code. Hence, it is essential to evaluate comments based on whether they increase code understandability for software maintenance tasks. Although some studies have been conducted on detecting useful code comments, most of them exploit various handcrafted features, and the recent advancement of transformer-based models remains under-explored. Therefore, to fill this gap, we explore transformer-based models and propose a fusion-based solution for detecting Useful comments in the shared task "Information Retrieval in Software Engineering (IRSE)" at FIRE 2022. We observe that our fusion-based model BERT+CodeBERT(PP), which uses features from both code comments and code snippets, achieves the highest Macro F1 score of 90.739 among all the models and ranked first in this task.</p>
      </abstract>
      <kwd-group>
        <kwd>Comment Quality</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Classification</kwd>
        <kwd>Software Development</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Software development refers to a software deliverable’s design, documentation, programming,
testing, and ongoing maintenance. During development and ongoing maintenance of the product,
a developer has to write a lot of code to fix bugs and to extend or add features. Although during
the development phase a developer writes code in their own style, when a bug appears or a new
feature has to be added, it may not be the same developer who works on the code they wrote
earlier. Therefore, the new developer must understand the existing code before modifying it.
However, it is not always easy to understand existing code just by reading it without proper
documentation. Even when documentation exists, without adequate updates it becomes outdated
and may not benefit new developers. Therefore, developers must manually search and mine
source files and other knowledge sources, such as emails and defect trackers, to understand the
application’s overall design [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Nevertheless, this prolonged procedure decreases developer productivity,
introduces hidden defects resistant to regression testing, and lowers the quality of the code.
Hence, code commenting is a technique developers follow to increase the readability of the
code [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A code comment is a programmer-readable explanation or annotation in a computer
program’s source code, added to make the source code easier for humans to comprehend, and
is commonly ignored by compilers and interpreters.2
      </p>
      <p>
        Commenting code is not always as easy as the definition suggests. Depending on the
product and company guidelines, the technique of writing comments may differ. To a large
extent, however, the commonality among all the guidelines is that a comment should be informative
and meaningful and represent, without any ambiguity, whatever is written in the actual code.
Nevertheless, it is undeniable that comments can be noisy, unstable, and may not evolve with
the source code [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; still, source code and associated comments can play an essential role [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in
understanding the code rapidly for ongoing maintenance.
      </p>
      <p>
        In addition, merely placing irrelevant comments in the code does nothing to
enhance the readability of the source code. Thus, it is necessary to identify useful comments
in the source code given a code snippet. Therefore, to engage and facilitate research around
useful comment detection, the organizers of the “Information Retrieval in Software Engineering
(IRSE)" [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] shared task at FIRE 2022 have introduced a dataset for useful comment classification
given the associated code snippet. The objective of the shared task is to devise methodologies
to detect useful comments automatically. We show some examples of Useful and Not-Useful
comments in Table 1.
      </p>
      <p>
        To this end, several strategies have been explored to classify comments based on various
handcrafted features, such as explicit syntactic information, the presence of specific tags (e.g., @param,
@deprecated, etc.), words, and symbols; or implicit elements, such as comment length,
parts-of-speech of comment words, or the cosine similarity of words in code-comment pairs [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6, 7, 8, 9</xref>
        ];
however, the recent advancement of transformer-based models (e.g., BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], CodeBERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ])
is still under-explored.
      </p>
      <p>
        In this paper, we apply existing transformer-based models to our classification
task; these models have already been shown to outperform several baselines and stand as
state-of-the-art models for various downstream tasks [
        <xref ref-type="bibr" rid="ref12">12, 13, 14</xref>
        ] in Natural Language Processing. The rest of
the paper is organized as follows. In Section 2, we discuss some of the related work. Section 3
describes the dataset, Section 4 details our system, Section 5 presents the results, and Section 6
concludes the paper.
      </p>
      <sec id="sec-1-1">
        <title>2https://en.wikipedia.org/wiki/Comment_(computer_programming)</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>This section discusses some of the strategies proposed in the literature to investigate and
evaluate comment quality by detecting inconsistencies and classifying comments.</p>
      <sec id="sec-2-1">
        <title>2.1. Detecting inconsistencies with source code</title>
        <p>Tan et al. [15] devised a tool, iComment, to analyze comments written in natural language,
extract implicit program rules, and use those rules to automatically detect inconsistencies
between comments and source code, indicating either bugs or irrelevant comments. For this
purpose, the authors incorporate Machine Learning, Natural Language Processing (NLP),
Statistics, and Program Analysis methods and evaluate the tool on four large code bases: Linux,
Mozilla, Wine, and Apache. Ratol et al. [16] designed a new rule-based approach called Fraco to
detect fragile comments. It incorporates the identifier’s type, its morphology, the identifier’s
scope, and the location of the comment. The authors evaluated the method by comparing its
precision and recall against hand-annotated benchmarks created for six target Java systems
and compared the results against the performance of Eclipse’s automatic in-comment identifier
replacement feature.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Comment classification &amp; quality evaluation</title>
        <p>
          Haouari et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] empirically studied existing comments in different open-source Java projects
from both a quantitative and a qualitative point of view. The authors proposed a taxonomy of
comments based on comment scope (inline, method, constructor), comment type (application,
implementation, and the like), and comment style for their analysis. Padioleau et al. [17]
manually examined 1,050 randomly sampled comments from operating-system code written in C in
three open-source projects, Linux, FreeBSD, and OpenSolaris (which started as closed software), chosen
for their overwhelming complexity and the critical need for trustworthiness. The authors
studied the comments along several dimensions and categorized them as memory-, lock-, or
data-structure-related, errors, control flow, TODO or FIXME, etc. Aman et al. [18] collected Java
methods (programs) from six popular open-source products and analyzed the words
that appear in their comments. The authors showed that a method with longer comments (more
words) tends to be more change-prone and to require more fixes after release. Steidl et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
presented a semi-automatic approach for quality analysis and assessment of code comments. The
method furnishes a model for comment quality based on different comment categories (copyright,
header, member, inline, section, code, and application task). The authors explored machine
learning models to categorize comments in Java and C/C++ programs. Additionally, they
presented a quality model that filters out useless and uninformative comments by examining the
similarity of words in code-comment pairs using the Levenshtein distance and comment length.
Majumdar et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] proposed CommentProbe for automatic classification and quality evaluation
of code comments in C codebases based on how they can assist in understanding existing code.
For this purpose, the authors collected 20,206 comments from open-source GitHub projects
and annotated them with assistance from industry experts. The authors handcrafted several
features to analyze comments semantically and, using machine learning models, classified them
as Useful, Partially Useful, and Not Useful.
        </p>
        <p>Although the existing methodologies established several baselines for meaningful code
comment analysis, none of these approaches used the recent advancement of transformer-based
models. Hence, to fill this research gap, in this work we propose a fusion-based technique using
pre-trained transformer-based models to classify Useful comments on the dataset shared
by the organizers of IRSE.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset Description</title>
      <p>
        The shared task on Useful Comment Classification (given the surrounding code snippet) in
Information Retrieval in Software Engineering (IRSE) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] at FIRE 2022 is a classification
problem that evaluates the usefulness of code comments for improving the readability of code for
developers. The primary focus of the shared task is to devise methodologies for Useful
code comment detection. For this purpose, the organizers developed a dataset by labeling
code comments as Useful and Not-Useful based on the associated code. In total, a team of 14
annotators was assigned to the task, and two individual annotators labeled each code comment
as Useful or Not-Useful. To supervise the annotation process, the organizers conducted weekly
meetings with the annotators. To evaluate the quality of the annotated dataset, the organizers
used Cohen’s kappa and achieved a kappa (κ) value of 0.734, which indicates substantial
agreement among the annotators. We show the class distribution of the shared dataset in Table
2. The training set consists of 8,047 code comments (out of which 4,337 comments were labeled
as Useful), and the test set consists of 1,001 comments.
      </p>
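      <p>Cohen’s kappa compares the observed agreement between the two annotators with the agreement expected by chance. A minimal pure-Python sketch of the computation (the function and the toy labels are ours, not from the shared task):</p>

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items (sketch)."""
    n = len(labels_a)
    # observed agreement: fraction of items both annotators labeled identically
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement: product of each annotator's marginal label frequencies
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["Useful", "Useful", "Not-Useful", "Useful", "Not-Useful"]
b = ["Useful", "Not-Useful", "Not-Useful", "Useful", "Not-Useful"]
print(round(cohen_kappa(a, b), 3))  # 0.615
```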
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
      <p>This section discusses the methodology we followed for detecting Useful code comments. The
detail of the pipeline is shown in Figure 1.</p>
      <sec id="sec-4-1">
        <title>4.1. Pre-Processing</title>
        <p>While manually going through the dataset, we observed that the code comments contain many
special characters, blank spaces, newlines, etc. Therefore, we apply pre-processing steps to
remove all the non-English characters. Further, we observed that the comments appear within
the associated code snippet, so we removed the code comments from the code snippet as well.</p>
        <p>[Figure 1: model pipeline — pre-processed, cleaned code comments are encoded by BERT (768 × 1) and pre-processed, cleaned code snippets by CodeBERT (768 × 1); each branch passes through dense layers of size 256, 128, and 64, which are fused and fed to the classification head, followed by post-processing.]</p>
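        <p>The cleaning steps above can be sketched as follows; this is a minimal illustration, and the exact character set and helper names are our assumptions rather than the task’s specification:</p>

```python
import re

def clean_comment(comment: str) -> str:
    """Replace non-English (non-printable-ASCII) characters and collapse whitespace."""
    cleaned = re.sub(r"[^\x20-\x7E]", " ", comment)  # drop non-English characters
    cleaned = re.sub(r"\s+", " ", cleaned)           # collapse blanks and newlines
    return cleaned.strip()

def strip_comment_from_code(code: str, comment: str) -> str:
    """Remove the comment text from its surrounding code snippet."""
    return code.replace(comment, "").strip()

print(clean_comment("// checks   the\tbuffer\u00e9 state"))  # // checks the buffer state
print(strip_comment_from_code("int x = 0; // temp counter", "// temp counter"))  # int x = 0;
```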
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Uni-modal Text-based BERT Model</title>
        <p>
          As part of our initial experiments, we pose the problem as a unimodal text classification task.
Here, instead of using the comments along with their associated code, we use only the code
comments to determine whether a comment is useful. The idea is that although the code
associated with the comment is not utilized explicitly for classification, developers sometimes
write code snippets in the comment to make it more transparent, which can help the model
determine the usefulness of the comment. For this purpose, we use the transformer-based
model BERT [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>BERT3, which stands for Bidirectional Encoder Representations from Transformers, is
pre-trained on a large corpus of English data using masked language modeling (MLM) and
next sentence prediction (NSP) objectives in a self-supervised manner. It consists of a stack
of transformer encoder layers with 12 “attention heads," i.e., fully connected neural networks
augmented with a self-attention mechanism. The model can handle a maximum of 512 tokens
as input. To fine-tune BERT, we add a fully connected layer over the output corresponding to
the [CLS] token in the input. This [CLS] token output usually carries the representation of the
sentence provided to the model.</p>
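        <p>Concretely, fine-tuning adds a linear layer over the [CLS] position of BERT’s final hidden states. The sketch below uses a random tensor as a stand-in for the encoder output so it stays self-contained; in practice the 768-dimensional states come from the pre-trained model:</p>

```python
import torch
import torch.nn as nn

class ClsHead(nn.Module):
    """Fully connected classification layer over the 768-d [CLS] output (sketch)."""
    def __init__(self, hidden_size=768, num_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, encoder_output):
        # encoder_output: (batch, seq_len, hidden) -- take the [CLS] position (index 0)
        cls_vec = encoder_output[:, 0, :]
        return self.classifier(cls_vec)

head = ClsHead()
fake_bert_out = torch.randn(4, 100, 768)  # stand-in for BERT's last hidden states
logits = head(fake_bert_out)
print(logits.shape)  # torch.Size([4, 2])
```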
        <sec id="sec-4-2-1">
          <title>3https://huggingface.co/bert-base-uncased</title>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Fusion Model</title>
        <p>As discussed above, the uni-modal model does not explicitly consider the code
snippet associated with the code comments. However, to decide whether a comment is useful, the
surrounding code is crucial. Therefore, we design a fusion-based model that takes into
consideration both code comments and code snippets to better understand the relationship
between the comments and the surrounding code snippets.</p>
        <p>
          Although programming languages are primarily written using English words, the grammar
of programming languages does not follow the rules of natural language. Hence, a model
like BERT, which is pre-trained on natural language, is not an ideal choice to represent
code. Thus, we use the model CodeBERT 4 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], a multi-programming-lingual model
pre-trained on NL-PL pairs in six programming languages (Python, Java, JavaScript, PHP, Ruby,
Go). This model is initialized with RoBERTa-base and trained with the MLM and Replaced Token
Detection (RTD) objectives. Similar to BERT, the representation of the [CLS] token serves as the
aggregated sequence representation produced by the model.
        </p>
        <p>Here we extract the 768-dimensional feature vector from the last layer of the BERT and
CodeBERT models using the code comments and code snippets, respectively. These feature
vectors are then separately passed through three intermediate dense layers of size 256, 128, and
64, respectively. Finally, we concatenate the two branches (BERT+CodeBERT) and reduce the
result to a feature vector of length 2 (Useful or Not-Useful).</p>
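        <p>A minimal sketch of this fusion architecture in PyTorch; the layer and class names are ours, and the random 768-dimensional inputs stand in for the extracted BERT and CodeBERT features:</p>

```python
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    """Two 768-d branches (comment/BERT, code/CodeBERT), dense layers
    256 -> 128 -> 64 each, concatenated and reduced to 2 logits."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Linear(768, 256), nn.ReLU(),
                nn.Linear(256, 128), nn.ReLU(),
                nn.Linear(128, 64), nn.ReLU(),
            )
        self.comment_branch = branch()       # BERT features
        self.code_branch = branch()          # CodeBERT features
        self.classifier = nn.Linear(128, 2)  # 64 + 64 concatenated -> Useful / Not-Useful

    def forward(self, bert_feat, codebert_feat):
        fused = torch.cat([self.comment_branch(bert_feat),
                           self.code_branch(codebert_feat)], dim=1)
        return self.classifier(fused)

model = FusionModel()
logits = model(torch.randn(8, 768), torch.randn(8, 768))
print(logits.shape)  # torch.Size([8, 2])
```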
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Post-Processing</title>
        <p>We further apply the following post-processing step to improve the classification performance
of the models. We interviewed two software developers and asked how they judge
short comments. Based on their prior experience, both of them observed that shorter
code comments are mostly Not-Useful. Thus, as a post-processing step, we relabeled all
comments with fewer than five tokens as Not-Useful.</p>
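        <p>The relabeling rule is straightforward; a sketch follows, where whitespace tokenization is our assumption, since the paper does not state how tokens are counted:</p>

```python
def post_process(comment: str, predicted_label: str) -> str:
    """Relabel short comments as Not-Useful (threshold of five tokens, per the paper)."""
    tokens = comment.split()
    if len(tokens) >= 5:
        return predicted_label
    return "Not-Useful"

print(post_process("// fix", "Useful"))  # Not-Useful
print(post_process("// release the file handle after the last reader exits", "Useful"))  # Useful
```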
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Experimental Setup</title>
        <p>We trained both the uni-modal and fusion-based models for ten epochs with binary
cross-entropy loss. We used the Adam optimizer with an initial learning rate of 2e-5
and an epsilon of 1e-8. For the unimodal text-based BERT model, we used a batch size of 16
and a maximum token length of 100; for the fusion-based model, we used a batch size of
32. Additionally, as no validation set was given for the experiments, we split the training
data points 85%/15% and used the 15% as a validation set. We generate predictions on the test
set using the checkpoint with the best validation performance. We used HuggingFace [19] and PyTorch [20] for
implementing all the models.</p>
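        <p>The training configuration can be sketched as follows; a toy linear head stands in for the actual fine-tuned model, we run two steps instead of ten epochs, and a two-class cross entropy is used here as the stand-in for the binary cross-entropy loss described above:</p>

```python
import torch

# Hyper-parameters from the paper; the linear layer is a placeholder for the model.
model = torch.nn.Linear(768, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, eps=1e-8)
loss_fn = torch.nn.CrossEntropyLoss()

features = torch.randn(16, 768)  # batch size 16, as in the uni-modal setting
labels = torch.randint(0, 2, (16,))

model.train()
for step in range(2):            # the paper trains for ten epochs
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
print(loss.item() >= 0.0)  # True
```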
        <sec id="sec-4-5-1">
          <title>4https://huggingface.co/microsoft/codebert-base</title>
          <p>Model accuracies — BERT: 89.810; BERT(PP): 92.607; BERT+CodeBERT: 90.909; BERT+CodeBERT(PP): 92.807.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Table 3 demonstrates the performance of each model. As expected, the BERT+CodeBERT
model (Accuracy: 90.909, Macro F1: 88.804), which utilizes features from code comments
and snippets, performs better than the standalone BERT model (Accuracy: 89.810, Macro F1: 87.595).
Further, we observe that post-processing improves the models’ performance. In
conclusion, the BERT+CodeBERT(PP) model performed best in terms of accuracy (92.807) and Macro
F1 (90.739) score. We plot the confusion matrix in Figure 2 to further assess the models. Although
post-processing corrects the majority of the Not-Useful predictions, some
test data points of the Useful class get misclassified, which reduces the recall for the Useful
class, as shown in Table 3. This observation holds for both the unimodal text-based model
and the fusion-based model. Nonetheless, the post-processing technique improves the overall
performance.</p>
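      <p>For reference, the Macro F1 reported above is the unweighted mean of the per-class F1 scores; a short sketch of the computation (the function name and toy labels are ours):</p>

```python
def macro_f1(y_true, y_pred, labels=("Useful", "Not-Useful")):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)

y_true = ["Useful", "Useful", "Not-Useful", "Not-Useful"]
y_pred = ["Useful", "Not-Useful", "Not-Useful", "Not-Useful"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```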
      <p>
        While Majumdar et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] achieved a macro F1 score of 86.34, our fusion-based technique
achieves a Macro F1 score of 90.73. One thing to keep in mind, however, is that Majumdar et
al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] performed their analysis with three classes, whereas we have two; therefore,
the comparison is not entirely precise.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this shared task, we address a novel problem of classifying Useful code comments given
the surrounding code snippet. We evaluated state-of-the-art transformer-based models. We
found that the fusion-based model BERT+CodeBERT, which uses features from code comments
and snippets, performs better than the standalone text-based BERT model. In the future, we
plan to explore other transformer-based models, such as RoBERTa [21] and CodeT5 [22], for
code understanding and useful comment detection. We also plan to explore knowledge graphs
relevant to Software Engineering to improve the classification performance.</p>
      <p>
[13] M. Das, P. Saha, R. Dutt, P. Goyal, A. Mukherjee, B. Mathew, You too brutus! trapping
hateful users in social media: Challenges, solutions &amp; insights, in: Proceedings of the 32nd
ACM Conference on Hypertext and Social Media, 2021, pp. 79–89.
[14] M. Das, S. Banerjee, P. Saha, Abusive and threatening language detection in urdu
using boosting based and bert based models: A comparative approach, arXiv preprint
arXiv:2111.14830 (2021).
[15] L. Tan, D. Yuan, G. Krishna, Y. Zhou, /*icomment: Bugs or bad comments?*/, in: Proceedings
of twenty-first ACM SIGOPS symposium on Operating systems principles, 2007, pp. 145–
158.
[16] I. K. Ratol, M. P. Robillard, Detecting fragile comments, in: 2017 32nd IEEE/ACM
International Conference on Automated Software Engineering (ASE), IEEE, 2017, pp. 112–122.
[17] Y. Padioleau, L. Tan, Y. Zhou, Listening to programmers—taxonomies and characteristics
of comments in operating system code, in: 2009 IEEE 31st International Conference on
Software Engineering, IEEE, 2009, pp. 331–341.
[18] H. Aman, S. Amasaki, T. Yokogawa, M. Kawahara, Empirical analysis of words in comments
written for java methods, in: 2017 43rd Euromicro Conference on Software Engineering
and Advanced Applications (SEAA), IEEE, 2017, pp. 375–379.
[19] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao,
S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Huggingface’s transformers: State-of-the-art
natural language processing, 2020. arXiv:1910.03771.
[20] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin,
N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani,
S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, Pytorch: An imperative style,
high-performance deep learning library, 2019. arXiv:1912.01703.
[21] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer,
V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv preprint
arXiv:1907.11692 (2019).
[22] Y. Wang, W. Wang, S. Joty, S. C. Hoi, Codet5: Identifier-aware unified pre-trained
encoder-decoder models for code understanding and generation, in: Proceedings of the 2021
Conference on Empirical Methods in Natural Language Processing, 2021, pp. 8696–8708.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Comment-mine-a semantic search approach to program comprehension from code comments</article-title>
          ,
          <source>in: Advanced Computing and Systems for Security</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Etzkorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Bowen</surname>
          </string-name>
          ,
          <article-title>The language of comments in computer software: A sublanguage of english</article-title>
          ,
          <source>Journal of Pragmatics</source>
          <volume>33</volume>
          (
          <year>2001</year>
          )
          <fpage>1731</fpage>
          -
          <lpage>1756</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z. M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <article-title>Examining the evolution of code comments in postgresql</article-title>
          ,
          <source>in: Proceedings of the 2006 international workshop on Mining software repositories</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Stroustrup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sutter</surname>
          </string-name>
          ,
          <article-title>C++ core guidelines</article-title>
          ,
          <source>Web. Last accessed February</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the IRSE track at FIRE 2022: Information Retrieval in Software Engineering</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation</source>
          , ACM,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Automated evaluation of comments to aid software maintenance</article-title>
          ,
          <source>Journal of Software: Evolution and Process</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <article-title>e2463</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pascarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacchelli</surname>
          </string-name>
          ,
          <article-title>Classifying code comments in java open-source software systems</article-title>
          ,
          <source>in: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahraoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Langlais</surname>
          </string-name>
          ,
          <article-title>How good is your comment? a study of comments in java programs</article-title>
          , in: 2011
          <source>International Symposium on Empirical Software Engineering and Measurement</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>137</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Steidl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hummel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Juergens</surname>
          </string-name>
          ,
          <article-title>Quality analysis of source code comments</article-title>
          ,
          <source>in: 2013 21st international conference on program comprehension (icpc)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: NAACL</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , et al.,
          <article-title>Codebert: A pre-trained model for programming and natural languages</article-title>
          ,
          <source>arXiv preprint arXiv:2002.08155</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. Das</surname>
          </string-name>
          ,
          <article-title>Exploring transformer based models to identify hate speech and offensive content in english and indo-aryan languages</article-title>
          ,
          <source>arXiv preprint arXiv:2111.13974</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>