<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Binary Code Comment Quality Classification with Augmented Code-Comment Pairs Using Generative AI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rohith Arumugam S</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Angel Deborah S</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Assistant Professor, Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Chennai, Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Chennai, Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This report focuses on enhancing a binary code comment quality classification model by integrating generated code-comment pairs to improve model accuracy. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code-comment pairs are generated using a Large Language Model (LLM) and labeled to indicate their utility. The outcome of this effort consists of two classification models: one trained on the original dataset and another trained on the dataset augmented with the newly generated code-comment pairs and labels.</p>
      </abstract>
      <kwd-group>
        <kwd>Generative AI</kwd>
        <kwd>Software Metadata Classification</kwd>
        <kwd>Code Comment Quality</kwd>
        <kwd>Binary Classification</kwd>
        <kwd>C Programming</kwd>
        <kwd>Large Language Model</kwd>
        <kwd>Data Augmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Efficiently assessing the quality of code comments is crucial for improving software maintainability and
reliability, a growing necessity in today’s software development landscape [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This paper addresses this
challenge by enhancing an existing binary classification model for code comment quality through the
integration of generated code-comment pairs, with the goal of improving both accuracy and efficiency.
      </p>
      <p>
        In modern software development, where the focus on increasing code maintainability, readability,
and overall system reliability is paramount, evaluating the quality of code and its associated comments
has become essential [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This research revisits current approaches to code comment quality evaluation,
emphasizing the limitations of traditional manual assessments, which are inherently subjective and
prone to individual bias.
      </p>
      <p>
        The key objective of this study is to improve an existing model by leveraging a dataset
containing 9048 code-comment pairs, each categorized as either "Useful" or "Not Useful." By pursuing
this goal, we aim to contribute to the advancement of automated code comment quality evaluation,
offering a meaningful enhancement to contemporary software development workflows [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Understanding a program automatically is a well-known research area among people working in the
software domain. Numerous tools have been developed to aid in the extraction of knowledge from
software metadata, including elements such as runtime traces and structural attributes of code [
        <xref ref-type="bibr" rid="ref10 ref11 ref4 ref5 ref6 ref7 ref8 ref9">4, 5, 6,
7, 8, 9, 10, 11</xref>
        ].
      </p>
      <p>
        New programmers generally consult existing comments to understand code flow. However,
not every comment helps program comprehension, so source code comments need a relevance check
beforehand. Many researchers have worked on automatically classifying source code
comments by quality. For example, Omal et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] observed that the factors
influencing software maintainability can be organized into hierarchical structures. The authors defined
measurable attributes, in the form of metrics, for each factor; these metrics help measure software
characteristics and can be combined into a single index of software maintainability. Fluri et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
examined whether source code and its associated comments change together across multiple
versions. They investigated three open source systems (ArgoUML, Azureus, and JDT Core) and
found that 97% of comment changes are made in the same revision as the associated source code
changes. Another work [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], published in 2007, proposed a two-dimensional maintainability model
that explicitly associates system properties with the activities carried out during maintenance. The
authors claimed that this approach transforms the quality model into a structured quality knowledge
base that is usable in industrial environments. Storey et al. conducted an empirical study of task
annotations embedded in source code and the vital role they play in a developer’s task management [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
The paper described how task management is negotiated between formal issue tracking systems and
manual annotations that programmers include within their source code. Ted et al.[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] performed a 3 × 2
experiment to compare the effects of procedure format with those of comments on the readability of a
PL/I program. Readability was measured by questioning students about the program after they had
read it. The results showed that the program without comments was the least readable. Yu Hai et al. [17]
classified source code comments into four classes: unqualified, qualified, good, and excellent.
Aggregating basic classification algorithms further improved the classification result. Another work
[18] proposed an automatic classification mechanism, "CommentProbe", for quality evaluation of code
comments in C codebases. Researchers have thus studied source code comments from different
aspects [18, 19, 20, 21, 22, 23], but automatic quality evaluation of source code comments remains an
important area that demands more research.
      </p>
      <p>The advent of large language models (LLMs) [24] necessitates a comparison between the quality
assessment of code comments performed by established models, such as GPT-3.5 and LLaMA, and
evaluations based on human interpretation. The IRSE track at FIRE 2024 [25, 26] extends the approach
proposed in [18, 27, 28, 21] to explore various vector space models [29] and features for binary
classification and evaluation of comments in the context of their use in understanding the code. This track
also compares the performance of the prediction model with the inclusion of the GPT-generated labels
for the quality of code and comment snippets extracted from open-source software.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Data Collection and Code Comment Pair Extraction</title>
        <p>Data was collected through the GitHub API, using an API token for authentication and for
access to GitHub repositories. Suitable repositories were found with a search query specifically
targeting repositories written in the C programming language, and the GitHub API returned the
relevant repository information.</p>
        <p>Upon identifying potential repositories, the script proceeded to access the contents of these
repositories. This was accomplished by sending requests to the respective GitHub endpoints. The response
from these requests, received in JSON format, contained detailed metadata about the files within the
repositories.</p>
        <p>Further refinement was necessary to focus exclusively on C files. This involved parsing the JSON
response and filtering files based on their file extensions. Specifically, files with the ’.c’ extension were
selected for subsequent processing, ensuring that only C programming files were included in the dataset.</p>
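        <p>The extension filter described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name <monospace>select_c_files</monospace> and the sample listing are hypothetical, and the listing is assumed to carry <monospace>name</monospace>, <monospace>path</monospace>, and <monospace>type</monospace> fields in the style of a GitHub repository contents response.</p>
        <preformat>
```python
import json
from typing import List


def select_c_files(contents_json: str) -> List[str]:
    """Return the paths of regular files whose name ends in '.c'."""
    entries = json.loads(contents_json)
    return [
        entry["path"]
        for entry in entries
        if entry.get("type") == "file" and entry.get("name", "").endswith(".c")
    ]


# Hypothetical directory listing used only for illustration.
sample_listing = json.dumps([
    {"name": "main.c", "path": "src/main.c", "type": "file"},
    {"name": "util.h", "path": "src/util.h", "type": "file"},
    {"name": "docs", "path": "docs", "type": "dir"},
    {"name": "README.md", "path": "README.md", "type": "file"},
])

print(select_c_files(sample_listing))
```
        </preformat>
        <p>Checking the entry type as well as the extension keeps directories whose names happen to end in ".c" out of the result.</p>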
        <p>For each qualifying C file, the script parsed the file content line by line, allowing
comments and code sections to be identified precisely. The parsing process distinguished between
single-line and multi-line comments so that both types were extracted accurately. Comments were
identified by the standard C commenting conventions: ‘//’ for single-line comments, ‘/*’ for the
beginning of a multi-line comment, and ‘*/’ for its end.</p>
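        <p>A line-by-line scan of this kind might look like the sketch below. It is an assumption-laden simplification rather than the authors' parser: the names <monospace>extract_comments</monospace> and <monospace>clean</monospace> are hypothetical, comment markers inside string literals are not handled, and any code following a closing marker on the same line is ignored.</p>
        <preformat>
```python
from typing import List


def clean(text: str) -> str:
    """Strip whitespace and the leading '*' often used inside block comments."""
    return text.strip().lstrip("*").strip()


def extract_comments(source: str) -> List[str]:
    """Collect '//' and '/* ... */' comments by scanning the source line by line."""
    comments = []
    block = None  # lines of a multi-line comment that is still open
    for line in source.splitlines():
        if block is not None:
            end = line.find("*/")
            if end == -1:
                block.append(clean(line))
            else:
                block.append(clean(line[:end]))
                comments.append(" ".join(part for part in block if part))
                block = None
            continue
        start_block = line.find("/*")
        start_line = line.find("//")
        if start_line != -1 and (start_block == -1 or start_block > start_line):
            # Single-line comment: everything after '//'.
            comments.append(line[start_line + 2:].strip())
        elif start_block != -1:
            rest = line[start_block + 2:]
            end = rest.find("*/")
            if end != -1:
                comments.append(clean(rest[:end]))  # opened and closed on one line
            else:
                block = [clean(rest)]  # comment continues on later lines
    return comments


sample_source = "int x = 0; // counter\n/* first line\n * second line\n */\nint y; /* inline */\n"
print(extract_comments(sample_source))
```
        </preformat>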
        <p>The extracted code-comment pairs were organized into a structured format, enabling seamless storage
and subsequent analysis. These pairs constituted the foundational dataset upon which the subsequent
phases of the research were built.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Manual Labeling Process</title>
        <p>To facilitate the supervised learning aspect of the research, a portion of the acquired code-comment pairs
underwent manual labeling. Specifically, the initial 100 rows of the dataset were meticulously reviewed
and labeled as either "Useful" or "Not Useful." This manual labeling process ensured the presence of a
high-quality labeled subset, vital for training and evaluating machine learning models.</p>
        <p>The manual labeling process involved a meticulous examination of the contextual relevance and
informativeness of comments within the code context. Comments deemed to significantly enhance
the understanding of the code, improve readability, or provide valuable insights were categorized as
"Useful." Conversely, comments lacking relevance, clarity, or informativeness were categorized as "Not
Useful." This manual curation ensured the creation of a reliable ground truth dataset, crucial for training
and validating machine learning algorithms.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Machine Learning Model Training and Evaluation</title>
        <p>The machine learning model utilized for this research was BERT (Bidirectional Encoder Representations
from Transformers), a state-of-the-art transformer-based architecture. The model training process
commenced with the preprocessing of the labeled dataset. This involved the concatenation of comments
and their surrounding code context, creating cohesive textual sequences. These sequences were
tokenized using the ’bert-base-uncased’ tokenizer, ensuring compatibility with the pre-trained BERT
model.</p>
        <p>The dataset was divided into training and test sets using a standard 80-20 split.
The BERT model was then fine-tuned on the training data with a reduced learning rate
of 1e-6 to aid convergence. To handle the dataset effectively, a batch size of 8 was
used with gradient accumulation over 4 batches, giving an effective batch size of 32. This approach
allowed for efficient processing and optimization of the model’s performance.</p>
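        <p>The arithmetic behind the split and the accumulation schedule can be sketched as below. The helper names, the fixed random seed, and the choice to apply the split to all 9048 pairs are assumptions made for illustration; the text does not state how the split was drawn.</p>
        <preformat>
```python
import math
import random


def split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle item indices and cut them into train and test portions."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(n_items * train_fraction)
    return indices[:cut], indices[cut:]


def optimizer_steps_per_epoch(n_train, batch_size=8, accumulation_steps=4):
    """Count weight updates per epoch when gradients are accumulated over batches."""
    batches = math.ceil(n_train / batch_size)
    return math.ceil(batches / accumulation_steps)


train_idx, test_idx = split_indices(9048)
print("train examples:", len(train_idx))
print("test examples:", len(test_idx))
print("effective batch size:", 8 * 4)
print("optimizer steps per epoch:", optimizer_steps_per_epoch(len(train_idx)))
```
        </preformat>
        <p>Accumulating gradients over 4 batches of 8 examples updates the weights once per 32 examples while keeping only 8 examples in memory at a time.</p>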
        <p>The fine-tuned BERT model was subsequently evaluated on the test set to gauge its efficacy in
classifying comments as either "Useful" or "Not Useful." Model predictions were generated, and
accuracy was calculated using the scikit-learn library. Classification accuracy served as the
primary measure of the model’s effectiveness in code comment quality assessment.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Predictive Analysis and Result Interpretation</title>
        <p>The final phase of the research involved leveraging the fine-tuned BERT model to make predictions on
a distinct dataset. These predictions, indicative of the model’s classification prowess, were meticulously
analyzed and interpreted. The output, consisting of predicted labels for each code-comment pair, was
organized into a structured format for comprehensive analysis.</p>
        <p>Additionally, a comparative analysis was conducted between the manual labels and the model
predictions. Discrepancies, if any, were scrutinized to discern patterns and insights into the model’s
decision-making process. This meticulous analysis facilitated a deeper understanding of the model’s
strengths and potential areas for enhancement, contributing valuable insights to the research findings.</p>
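        <p>One simple way to organize such a comparison is to group the disagreements by error type, as in the sketch below. The function name and the encoding of "Useful" as 1 and "Not Useful" as 0 are assumptions for illustration; the sample labels are made up.</p>
        <preformat>
```python
def find_discrepancies(manual_labels, predicted_labels):
    """Return the positions where predictions disagree with manual labels."""
    false_positives, false_negatives = [], []
    for index, (manual, predicted) in enumerate(zip(manual_labels, predicted_labels)):
        if manual == predicted:
            continue
        if predicted == 1:
            false_positives.append(index)  # predicted Useful, labeled Not Useful
        else:
            false_negatives.append(index)  # predicted Not Useful, labeled Useful
    return {"false_positives": false_positives, "false_negatives": false_negatives}


# Made-up labels: 1 = "Useful", 0 = "Not Useful".
manual = [1, 0, 1, 1, 0]
predicted = [1, 1, 0, 1, 0]
print(find_discrepancies(manual, predicted))
```
        </preformat>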
        <p>This comprehensive and detailed methodology encompassed every stage of the research process,
ensuring meticulous data collection, manual curation, machine learning model training, and rigorous
evaluation. The intricate interplay between manual expertise and advanced machine learning techniques
formed the foundation of this research, culminating in a robust and reliable code comment quality
assessment framework.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment Design</title>
      <sec id="sec-4-1">
        <title>4.1. Problem Definition</title>
        <p>The primary objective of our experiment is to enhance code comment quality assessment using a
combination of automated code-comment pair extraction from GitHub repositories and
state-of-the-art machine learning techniques, specifically the BERT (Bidirectional Encoder
Representations from Transformers) model. We aim to categorize code comments as "Useful" or
"Not Useful" based on their contextual relevance, clarity, and informativeness [30].</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Data Collection and Preprocessing</title>
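        <p>We obtain code-comment pairs by querying GitHub repositories written in C. These pairs are
then parsed and tokenized for further analysis. The resulting dataset is represented as
<inline-formula><tex-math>\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}</tex-math></inline-formula>, where <inline-formula><tex-math>x_i</tex-math></inline-formula> represents the comment and <inline-formula><tex-math>y_i</tex-math></inline-formula> represents its label ("Useful"
or "Not Useful").</p>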
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Manual Labeling Process</title>
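        <p>The first 100 code-comment pairs are manually labeled based on predefined criteria. The
manually labeled set is <inline-formula><tex-math>\{(x_1, y_1), (x_2, y_2), \ldots, (x_{100}, y_{100})\}</tex-math></inline-formula>, where <inline-formula><tex-math>y_i \in \{0, 1\}</tex-math></inline-formula>.</p>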
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Model Architecture</title>
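        <p>We employ the BERT model, a transformer-based architecture, for sequence classification. The model is
trained to predict the usefulness label (<inline-formula><tex-math>y_i</tex-math></inline-formula>) of a given code comment (<inline-formula><tex-math>x_i</tex-math></inline-formula>). The BERT model transforms
each comment into an embedding vector <inline-formula><tex-math>z_i</tex-math></inline-formula>.</p>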
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Loss Function</title>
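        <p>The model is trained using the cross-entropy loss function, which computes the loss <inline-formula><tex-math>L</tex-math></inline-formula> as follows:</p>
        <disp-formula><tex-math>L = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \cdot \log \sigma(z_i) + (1 - y_i) \cdot \log\left(1 - \sigma(z_i)\right) \right)</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>y_i</tex-math></inline-formula> is the label of the <inline-formula><tex-math>i</tex-math></inline-formula>-th comment, <inline-formula><tex-math>z_i</tex-math></inline-formula> is the model’s output for that comment, <inline-formula><tex-math>N</tex-math></inline-formula> is the number of comments, and <inline-formula><tex-math>\sigma(\cdot)</tex-math></inline-formula> is the sigmoid activation function.</p>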
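        <p>A minimal numeric sketch of a sigmoid-based binary cross-entropy loss, averaged over examples, is given below. The function names, the use of raw scores (<monospace>logits</monospace>) as input, and the clamping constant <monospace>eps</monospace> are assumptions introduced for illustration; a training framework would normally supply its own numerically stable version.</p>
        <preformat>
```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(labels, logits, eps=1e-12):
    """Average binary cross-entropy over paired labels (0 or 1) and raw scores."""
    total = 0.0
    for y, z in zip(labels, logits):
        # Clamp the probability so that log never receives exactly 0.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# A score of 0 gives probability 0.5, so each example contributes log(2).
print(binary_cross_entropy([1, 0], [0.0, 0.0]))
```
        </preformat>
        <p>The loss shrinks as the predicted probability moves toward the true label and grows when a confident prediction is wrong.</p>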
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Training Procedure</title>
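        <p>The model’s performance is evaluated using accuracy (Acc), precision (P), recall (R), and F1-score (F1).
These metrics are calculated as follows:</p>
        <disp-formula><tex-math>\mathrm{Acc} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}</tex-math></disp-formula>
        <disp-formula><tex-math>P = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}</tex-math></disp-formula>
        <disp-formula><tex-math>R = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}</tex-math></disp-formula>
        <disp-formula><tex-math>F1 = \frac{2 \times P \times R}{P + R}</tex-math></disp-formula>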
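        <p>The four metrics can be computed directly from paired labels and predictions, as in the following sketch. The function name, the encoding of "Useful" as 1, the convention of returning 0.0 when a denominator is zero, and the sample labels are all assumptions for illustration; the paper reports using scikit-learn for its accuracy calculation.</p>
        <preformat>
```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```
        </preformat>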
      </sec>
      <sec id="sec-4-7">
        <title>4.7. Experimental Workflow</title>
        <p>The process begins with data collection, where code-comment pairs are obtained from random GitHub
repositories. Subsequently, the first 100 pairs are manually labeled based on predefined criteria.
Following manual labeling, the entire dataset undergoes tokenization and preprocessing. The
preprocessed data is then used to train a BERT model. After training, the model’s performance is
evaluated on a test dataset, and metrics such as accuracy, precision, recall, and F1-score are
calculated. The final step involves interpreting the results, analyzing model predictions, and
identifying false positives and false negatives to gain insight into the model’s performance and
effectiveness in understanding code comments.</p>
        <p>Refer to Figure 1 for the detailed architecture diagram illustrating the entire process.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>For the code comment quality assessment task, experimental results were analyzed on two
datasets: the original dataset (Seed Data) and an augmented dataset comprising additional comments
generated using a Large Language Model (LLM) (Seed Data + LLM Generated Data). The evaluation
involved various machine learning algorithms, including Decision Tree Classifier [31], Artificial
Neural Network (ANN) [32], Support Vector Machine (SVM) [33], Random Forest Classifier [34],
Gradient Boosting Classifier [35], Logistic Regression [36], Naive Bayes [37], LightGBM
Classifier [30], k-Nearest Neighbors (KNN) Classifier [38], and Recurrent Neural Network (RNN) [39].
Precision, recall, and F1-score were used for the assessment. The
detailed results of these experiments can be found in Table 1, Table 2 and Figure 2.</p>
      <sec id="sec-5-1">
        <title>5.1. Key Observations and Insights</title>
        <p>The results demonstrate that the combination of Seed Data and LLM Generated Data consistently
improved performance metrics such as precision, recall, and F1-score across most algorithms.</p>
        <p>Notably, ANN and SVM exhibited impressive performance on both datasets, with high precision and
recall values. These models effectively balanced precision and recall, which is crucial for code
comment quality assessment.</p>
        <p>The introduction of LLM-generated comments enhanced the performance of most algorithms, with
the RNN as a notable exception. This highlights the utility of synthetic data in improving model
generalization and robustness.</p>
        <p>Decision Tree and Logistic Regression, although achieving reasonable results, demonstrated a more
significant improvement when exposed to LLM Generated Data. This suggests that these models might
benefit significantly from increased and diverse training data.</p>
        <p>Models such as Naive Bayes achieved high recall values but at the expense of precision. This trade-off
emphasizes the challenge of striking a balance between minimizing false positives (precision) and
capturing all relevant instances (recall).</p>
        <p>The RNN model exhibited a perfect recall on Seed Data but showed a notable decrease in precision
and recall when applied to Seed Data + LLM Generated Data. This indicates potential challenges in
adapting RNN architectures to mixed datasets.</p>
        <p>Different algorithms might be preferred depending on the specific use case. For instance, if minimizing
false positives is critical, models with higher precision, such as ANN and SVM, could be the preferred
choice.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Outlook</title>
      <p>In summary, this in-depth study has navigated the intricate field of assessing code comment quality. We
began by carefully designing experiments that merged state-of-the-art machine learning methodologies,
prominently featuring the BERT model, with a carefully curated dataset sourced from GitHub
repositories. The detailed experimental process—from gathering data to training models—was structured to
provide a solid groundwork for rigorous analysis.</p>
      <p>The experimentation phase was both varied and insightful, involving algorithms from Decision Trees
to advanced Neural Networks, each bringing out distinct strengths and limitations. By combining
original seed data with LLM-generated examples, we observed notable improvements across various
evaluation metrics, underscoring the promise of hybrid approaches in practical scenarios.</p>
      <p>Our findings revealed subtle differences in algorithm performance: from the high precision of Artificial
Neural Networks to the strong recall of Support Vector Machines, each algorithm demonstrated distinct
tendencies. Comparing seed data with LLM-generated data offered a comprehensive perspective,
underlining the balance between precision and recall that is essential in assessing code comment
quality.</p>
      <p>Yet, this exploration only scratches the surface of a promising field. Future directions may involve
integrating cutting-edge natural language processing methods to better contextualize code comments.
Exploring transformer models beyond BERT, such as GPT, could unlock new potential in understanding
and evaluating code documentation.</p>
      <p>Ethical considerations are also crucial. Maintaining unbiased data practices, addressing potential
biases within algorithms, and safeguarding privacy remain essential as AI ethics evolve.</p>
      <p>Moreover, collaboration between academia and industry will be vital. Industry expertise can inform
academic research to develop impactful, real-world solutions, while academic innovation can inspire
industry to adopt novel practices. This synergy is likely to accelerate advancements in the field.</p>
      <p>Ultimately, this study represents a foundational step in the expansive domain of code comment
quality assessment. As technology evolves and new challenges arise, the fusion of human insight
and machine learning will be pivotal in unraveling the nuances of evaluating code comments. With
sustained commitment, collaborative efforts, and ethical rigor, the future of code comment assessment
promises to be efficient, precise, deeply meaningful, and responsive to real-world needs.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT for grammar and spelling
checks. After using this tool, the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
      <ref-list>
        <ref><mixed-citation>[17] H. Yu, B. Li, P. Wang, D. Jia, Y. Wang, Source code comments quality assessment method based on aggregation of classification algorithms, Journal of Computer Applications 36 (2016) 3448.</mixed-citation></ref>
        <ref><mixed-citation>[18] S. Majumdar, A. Bansal, P. P. Das, P. D. Clough, K. Datta, S. K. Ghosh, Automated evaluation of comments to aid software maintenance, Journal of Software: Evolution and Process 34 (2022) e2463.</mixed-citation></ref>
        <ref><mixed-citation>[19] S. Majumdar, S. Papdeja, P. P. Das, S. K. Ghosh, Comment-mine—a semantic search approach to program comprehension from code comments, Advanced Computing and Systems for Security: Volume Twelve (2020) 29–42.</mixed-citation></ref>
        <ref><mixed-citation>[20] S. Majumdar, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough, P. Majumder, Overview of the irse track at fire 2022: Information retrieval in software engineering, in: FIRE (Working Notes), 2022, pp. 1–9.</mixed-citation></ref>
        <ref><mixed-citation>[21] S. Majumdar, A. Bandyopadhyay, P. P. Das, P. Clough, S. Chattopadhyay, P. Majumder, Can we predict useful comments in source codes?-analysis of findings from information retrieval in software engineering track@ fire 2022, in: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, 2022, pp. 15–17.</mixed-citation></ref>
        <ref><mixed-citation>[22] S. Majumdar, P. P. Das, Smart knowledge transfer using google-like search, arXiv preprint arXiv:2308.06653 (2023).</mixed-citation></ref>
        <ref><mixed-citation>[23] P. Chakraborty, S. Dutta, D. K. Sanyal, S. Majumdar, P. P. Das, Bringing order to chaos: Conceptualizing a personal research knowledge graph for scientists, IEEE Data Eng. Bull. 46 (2023) 43–56.</mixed-citation></ref>
        <ref><mixed-citation>[24] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in neural information processing systems 33 (2020) 1877–1901.</mixed-citation></ref>
        <ref><mixed-citation>[25] S. Paul, S. Majumdar, R. Shah, S. Das, M. Ghosh, D. Ganguly, G. Calikli, D. Sanyal, P. P. Das, P. D. Clough, A. Bandyopadhyay, S. Chattopadhyay, Generative ai for code metadata quality assessment, in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation, 2024.</mixed-citation></ref>
        <ref><mixed-citation>[26] S. Paul, S. Majumdar, R. Shah, S. Das, M. Ghosh, D. Ganguly, G. Calikli, D. Sanyal, P. P. Das, P. D. Clough, A. Bandyopadhyay, S. Chattopadhyay, Overview of the irse track at fire 2024: Information retrieval in software engineering, in: FIRE (Working Notes), 2024.</mixed-citation></ref>
        <ref><mixed-citation>[27] S. Paul, S. Majumdar, A. Bandyopadhyay, B. Dave, S. Chattopadhyay, P. Das, P. D. Clough, P. Majumder, Efficiency of large language models to scale up ground truth: Overview of the irse track at forum for information retrieval 2023, in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023, pp. 16–18.</mixed-citation></ref>
        <ref><mixed-citation>[28] S. Majumdar, S. Paul, D. Paul, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough, P. Majumder, Generative ai for software metadata: Overview of the information retrieval in software engineering track at fire 2023, arXiv preprint arXiv:2311.03374 (2023).</mixed-citation></ref>
        <ref><mixed-citation>[29] S. Majumdar, A. Varshney, P. P. Das, P. D. Clough, S. Chattopadhyay, An effective low-dimensional software code representation using bert and elmo, in: 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), IEEE, 2022, pp. 763–774.</mixed-citation></ref>
        <ref><mixed-citation>[30] J. Keim, A. Kaplan, A. Koziolek, M. Mirakhorli, Does bert understand code?–an exploratory study on the detection of architectural tactics in code, in: European Conference on Software Architecture, Springer, 2020, pp. 220–228.</mixed-citation></ref>
        <ref><mixed-citation>[31] J. R. Quinlan, Learning decision tree classifiers, ACM Computing Surveys (CSUR) 28 (1996) 71–72.</mixed-citation></ref>
        <ref><mixed-citation>[32] A. K. Jain, J. Mao, K. M. Mohiuddin, Artificial neural networks: A tutorial, Computer 29 (1996) 31–44.</mixed-citation></ref>
        <ref><mixed-citation>[33] P.-H. Chen, C.-J. Lin, B. Schölkopf, A tutorial on ν-support vector machines, Applied Stochastic Models in Business and Industry 21 (2005) 111–136.</mixed-citation></ref>
        <ref><mixed-citation>[34] N. L. Afanador, A. Smolinska, T. N. Tran, L. Blanchet, Unsupervised random forest: a tutorial with case studies, Journal of Chemometrics 30 (2016) 232–241.</mixed-citation></ref>
        <ref><mixed-citation>[35] A. Natekin, A. Knoll, Gradient boosting machines, a tutorial, Frontiers in neurorobotics 7 (2013) 21.</mixed-citation></ref>
        <ref><mixed-citation>[36] A. DeMaris, A tutorial in logistic regression, Journal of Marriage and the Family (1995) 956–968.</mixed-citation></ref>
        <ref><mixed-citation>[37] C. Haruechaiyasak, A tutorial on naive bayes classification, Last update 16 (2008).</mixed-citation></ref>
        <ref><mixed-citation>[38] P. Cunningham, S. J. Delany, k-nearest neighbour classifiers-a tutorial, ACM computing surveys (CSUR) 54 (2021) 1–25.</mixed-citation></ref>
        <ref><mixed-citation>[39] G. Chen, A gentle tutorial of recurrent neural network with error backpropagation, arXiv preprint arXiv:1610.02583 (2016).</mixed-citation></ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rani</surname>
          </string-name>
          ,
          <article-title>Speculative analysis for quality assessment of code comments</article-title>
          ,
          <source>in: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liping</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fengrong</surname>
          </string-name>
          ,
          <article-title>A survey on research of code comment</article-title>
          ,
          <source>in: Proceedings of the 2019 3rd International Conference on Management Engineering, Software Engineering and Service Sciences</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacchelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <article-title>Expectations, outcomes, and challenges of modern code review</article-title>
          ,
          <source>in: 2013 35th International Conference on Software Engineering (ICSE)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>712</fpage>
          -
          <lpage>721</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. C. B.</given-names>
            <surname>de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Anquetil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>de Oliveira</surname>
          </string-name>
          ,
          <article-title>A study of the documentation essential to software maintenance</article-title>
          ,
          <source>Conference on Design of communication, ACM</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>68</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Smartkt: a search framework to assist program comprehension using smart knowledge transfer</article-title>
          ,
          <source>in: 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Debugging multi-threaded applications using pin-augmented gdb (pgdb)</article-title>
          ,
          <source>in: International conference on software engineering research and practice (SERP)</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>D-cube: tool for dynamic design discovery from multi-threaded applications using pin</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>A mathematical framework for design discovery from multi-threaded applications using neural sequence solvers</article-title>
          ,
          <source>Innovations in Systems and Software Engineering</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>289</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Dcube_NN: Tool for dynamic design discovery from multi-threaded applications using neural sequence models</article-title>
          ,
          <source>Advanced Computing and Systems for Security</source>
          , Volume
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>75</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Siegmund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Peitek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Parnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Apel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofmeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kästner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Begel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bethmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brechmann</surname>
          </string-name>
          ,
          <article-title>Measuring neural efficiency of program comprehension</article-title>
          ,
          <source>in: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Parallelc-assist: Productivity accelerator suite based on dynamic instrumentation</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Oman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hagemeister</surname>
          </string-name>
          ,
          <article-title>Metrics for assessing a software system's maintainability</article-title>
          ,
          <source>in: Proceedings Conference on Software Maintenance 1992</source>
          , IEEE Computer Society,
          <year>1992</year>
          , pp.
          <fpage>337</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fluri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Würsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Gall</surname>
          </string-name>
          ,
          <article-title>Do code and comments co-evolve? on the relation between source code and comment changes</article-title>
          ,
          <source>in: 14th Working Conference on Reverse Engineering (WCRE 2007)</source>
          , IEEE,
          <year>2007</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Deissenboeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pizka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Teuchert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Girard</surname>
          </string-name>
          ,
          <article-title>An activity-based quality model for maintainability</article-title>
          ,
          <source>in: 2007 IEEE International Conference on Software Maintenance</source>
          , IEEE,
          <year>2007</year>
          , pp.
          <fpage>184</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Storey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ryall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. I.</given-names>
            <surname>Bull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Singer</surname>
          </string-name>
          ,
          <article-title>Todo or to bug</article-title>
          ,
          <source>in: 2008 ACM/IEEE 30th International Conference on Software Engineering</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tenny</surname>
          </string-name>
          ,
          <article-title>Program readability: Procedures versus comments</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          <volume>14</volume>
          (
          <year>1988</year>
          )
          <fpage>1271</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>