<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automated Classification of C Code Comment Quality with SVM and Naïve Bayes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kumar Shresth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Kharagpur (IIT-KGP), West Bengal-721302</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Code comments are essential for effective software development, yet their quality often suffers, particularly among inexperienced programmers, leading to a high volume of unhelpful annotations. This study investigates the effectiveness of two machine learning models, the Support Vector Machine (SVM) and the Naïve Bayes classifier, for automatically classifying the utility of comments in C source code. The results of these experiments provide a foundational benchmark for future research in this area. This work demonstrates that these models can serve as a starting point for developing more advanced machine learning solutions to improve the accuracy of comment quality assessment.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>SVM</kwd>
        <kwd>Naïve Bayes Classifier</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This work incorporates AI-generated content into the machine learning pipeline for this task. This exploration of the synergy between human-written and AI-generated data forms a significant component of our research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The importance of software metadata for code maintenance and comprehension is well-established. A
variety of tools have been developed to extract knowledge from different forms of software metadata,
including code structure and runtime traces [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1, 2, 3, 4, 5, 6</xref>
        ].
      </p>
      <p>The specific area of mining code comments and assessing their quality has been investigated by
numerous authors. For instance, Steidl et al. [7] proposed methods to filter out irrelevant and
uninformative comments by analyzing word similarity in code-comment pairs using techniques like Levenshtein
distance and comment length. In a different approach, Rahman et al. [8] focused on distinguishing
between helpful and unhelpful code review comments by leveraging features identified in a survey of
developers at Microsoft [9].</p>
      <p>More recently, Majumdar et al. [10, 11, 12, 13, 14, 15, 16] have introduced a framework for evaluating
comments based on concepts central to code comprehension. Their approach utilizes a knowledge
network to semantically assess the information within comments, developing features based on the
correlation between the text and the code. Ultimately, these methodologies contribute to cleaning
codebases by using both semantic and structural information to classify comments based on their utility.</p>
      <p>The advent of large language models such as GPT-5.0 and Llama has introduced a new dimension
to this field, making it crucial to evaluate the quality of automatically generated code comments
against human standards. The IRSE track at FIRE 2023 [17, 18, 19, 20, 21, 22, 23] expanded upon
the methodology from previous work [10] to address this. This track explored various vector space
models [24] and features for the binary classification of comments, particularly concerning their role in
code comprehension. A key aspect of this track was comparing the prediction model’s performance
with labels generated by GPT for both code and comment quality from open-source projects.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Description</title>
      <p>This section outlines the specifics of the experimental task and the dataset provided for it. The core
objective is as follows:</p>
      <p>To develop a binary classification model for assessing the quality of code comments, with the model’s
accuracy being potentially enhanced through data augmentation using generated code and comment pairs.</p>
      <p>The dataset provided for this task was divided into two main parts:</p>
      <list list-type="bullet">
        <list-item><p>A training dataset containing 8048 entries.</p></list-item>
        <list-item><p>A testing dataset containing 1000 entries.</p></list-item>
      </list>
      <p>For the purpose of model development, the training dataset was shuffled and then partitioned, with 70% allocated for training the models and the remaining 30% reserved for cross-validation. Each comment in the dataset is categorized with one of two labels:</p>
      <list list-type="bullet">
        <list-item><p>Useful: comments that contribute positively to code comprehension.</p></list-item>
        <list-item><p>Not Useful: comments that do not aid in understanding the code.</p></list-item>
      </list>
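      <p>The shuffle-and-split step described above can be sketched with scikit-learn's train_test_split; the comment strings and labels below are illustrative placeholders, not entries from the actual dataset:</p>

```python
# Sketch of the shuffle and 70/30 split described above; the comment strings
# and labels are placeholders, not entries from the actual dataset.
from sklearn.model_selection import train_test_split

comments = [f"placeholder comment {i}" for i in range(10)]
labels = ["Useful", "Not Useful"] * 5

# shuffle=True mirrors the shuffling step; random_state makes the split repeatable
X_train, X_val, y_train, y_val = train_test_split(
    comments, labels, test_size=0.30, shuffle=True, random_state=42
)

print(len(X_train), len(X_val))  # 7 3
```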
    </sec>
    <sec id="sec-4">
      <title>4. Dataset Augmentation</title>
      <p>To enhance the training dataset, we employed a data augmentation strategy utilizing the large language
model, GPT-5.0-turbo. The primary objective of this augmentation was to increase both the size and
diversity of the dataset. By leveraging the natural language generation capabilities of GPT-5.0, we
produced additional comment data.</p>
      <p>This process was designed to introduce a broader spectrum of writing styles, formats, and topics
into the training set. The inclusion of this GPT-generated data allowed us to evaluate its potential for
improving the performance of machine learning models on the task of comment quality classification.
This approach facilitated an investigation into how AI-generated content can supplement human-written data, with the goal of creating a more comprehensive and robust dataset to ultimately enhance model performance.</p>
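      <p>A minimal sketch of the merging step, assuming the data is held in pandas DataFrames with hypothetical comment and label columns; the GPT generation call itself is elided and hard-coded stand-in rows are used in its place:</p>

```python
# Sketch: merging LLM-generated comment/label pairs into the training set.
# The "comment"/"label" column names and the generated rows are stand-ins;
# the actual GPT generation call is omitted.
import pandas as pd

train_df = pd.DataFrame({
    "comment": ["/* increment the retry counter */", "// i"],
    "label": ["Useful", "Not Useful"],
})

# Stand-in for rows produced by the language model
augmented_df = pd.DataFrame({
    "comment": ["/* release the socket before returning an error code */"],
    "label": ["Useful"],
})

# Concatenate, then reshuffle so generated rows mix with the originals
full_df = pd.concat([train_df, augmented_df], ignore_index=True)
full_df = full_df.sample(frac=1.0, random_state=0).reset_index(drop=True)

print(len(full_df))  # 3
```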
    </sec>
    <sec id="sec-5">
      <title>5. System Description</title>
      <p>This section details the methodology used to build and evaluate the comment classification models,
including the text preprocessing pipeline, feature extraction techniques, and the machine learning
models themselves.</p>
      <p>Workflow: Start → Text Preprocessing → Feature Extraction → Train ML Models (SVM &amp; Naive Bayes) → Evaluate Performance → End.</p>
      <sec id="sec-5-1">
        <title>5.1. Text Preprocessing</title>
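        <p>A minimal preprocessing sketch for C comment text, assuming conventional normalization steps (lowercasing, stripping comment delimiters, removing punctuation, collapsing whitespace); the exact pipeline used in this work may differ:</p>

```python
import re

def preprocess(comment: str) -> str:
    """Normalize a C comment string for vectorization (assumed steps)."""
    text = comment.lower()
    # strip C comment delimiters: /* */ and //
    text = re.sub(r"/\*|\*/|//", " ", text)
    # drop remaining punctuation, keeping letters and digits
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("/* Increment the counter! */"))  # increment the counter
```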
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Feature Extraction</title>
        <p>To convert the preprocessed text into a numerical format suitable for machine learning, the TfidfVectorizer from the scikit-learn library was employed. This technique creates a matrix of TF-IDF (Term Frequency-Inverse Document Frequency) features, which reflect the importance of a word in a document relative to the entire corpus. The Keras library’s tokenizer was also utilized in this process.</p>
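        <p>The feature-extraction step can be sketched as follows with TfidfVectorizer on a toy corpus (the example comments are illustrative):</p>

```python
# Sketch of the TF-IDF feature extraction step; the corpus is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "increment the loop counter",
    "free the allocated buffer",
    "todo fix this later",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix: rows are documents, columns are vocabulary terms

print(X.shape)  # (3, 11)
```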
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Machine Learning Models</title>
        <p>Two distinct machine learning models were implemented for this classification task: a Support Vector Machine (SVM) and a Naïve Bayes classifier. Both models were built using the scikit-learn library. The SVM model was configured with the following parameters:</p>
        <list list-type="bullet">
          <list-item><p>C: the regularization parameter was set to 1.</p></list-item>
          <list-item><p>kernel: a 'linear' kernel was used for the classification.</p></list-item>
        </list>
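        <p>A minimal sketch of training the two classifiers with the stated SVM settings (C=1, linear kernel); MultinomialNB is an assumption for the Naïve Bayes variant, since the specific variant is not stated, and the toy data is illustrative:</p>

```python
# Sketch of the two classifiers; the SVM uses the stated settings (C=1, linear kernel).
# MultinomialNB is an assumption for the Naive Bayes variant; the data is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

comments = [
    "increment the loop counter before retrying",
    "free the buffer allocated above",
    "todo",
    "temp variable",
]
labels = ["Useful", "Useful", "Not Useful", "Not Useful"]

# TF-IDF features as in Section 5.2
X = TfidfVectorizer().fit_transform(comments)

svm = SVC(C=1, kernel="linear").fit(X, labels)  # regularization C=1, linear kernel
nb = MultinomialNB().fit(X, labels)             # NB over non-negative TF-IDF weights

print(svm.predict(X))
print(nb.predict(X))
```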
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Findings</title>
      <p>This section presents the performance of the SVM and Naïve Bayes models on the comment classification
task. The results are presented in two parts: first without data augmentation, and then with the inclusion
of GPT-5.0 generated data.</p>
      <sec id="sec-6-1">
        <title>6.1. Performance Without Data Augmentation</title>
        <p>On the original, un-augmented validation set, the Support Vector Machine (SVM) model achieved
an accuracy of 77.27% with a corresponding Macro F1 score of 0.771. The Naive Bayes classifier, in
comparison, yielded a 60.99% accuracy score and a Macro F1 score of 0.686. The detailed performance
metrics, including precision and recall, are presented in Table 2.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Performance With Data Augmentation</title>
        <p>After augmenting the training data with comments generated by GPT-5.0, both models showed an
improvement in performance. The SVM model’s accuracy increased to 77.65%, with a corresponding
Macro F1 score of 0.778. The Naive Bayes classifier demonstrated a more significant improvement,
reaching an accuracy of 64.03% and a Macro F1 score of 0.730. A comprehensive breakdown of these
results is provided in Table 3.</p>
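        <p>The reported metrics can be reproduced for any prediction vector with scikit-learn's accuracy_score and f1_score; the label vectors below are illustrative, not the actual validation predictions:</p>

```python
# Computing accuracy and macro-averaged F1 as reported above.
# The label vectors are illustrative, not the actual validation predictions.
from sklearn.metrics import accuracy_score, f1_score

y_true = ["Useful", "Useful", "Not Useful", "Not Useful"]
y_pred = ["Useful", "Not Useful", "Not Useful", "Not Useful"]

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(acc)                 # 0.75
print(round(macro_f1, 3))  # 0.733
```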
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This study successfully employed two fundamental machine learning models, the Support Vector
Machine and the Naïve Bayes classifier, to address the task of classifying C code comments. The results
obtained from the SVM classifier are promising and indicate that there is significant potential for further
improvement. These findings serve as a solid baseline and justify the exploration of more sophisticated
models that can better capture the nuances of the problem domain and achieve higher accuracy. It is
worth noting that prior research by Majumdar et al. [25] has already demonstrated the effectiveness
of neural networks for this task, and we anticipate that future work will continue to build upon these
successes.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this manuscript, the author utilized ChatGPT for assistance with grammar
and spelling checks. Following the use of this tool, the author thoroughly reviewed and edited the
content to ensure its accuracy and originality, and takes full responsibility for the final content of this
publication.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>The author would like to extend sincere thanks to the organizers of the IRSE track at FIRE for providing this excellent opportunity to engage with such a compelling research problem. Their continuous technical support throughout the duration of the project was invaluable.</p>
    </sec>
    <sec id="sec-10">
      <title>References</title>
      <p>[7] D. Steidl, B. Hummel, E. Juergens, Quality analysis of source code comments, in: International Conference on Program Comprehension (ICPC), IEEE, 2013, pp. 83–92.</p>
      <p>[8] M. M. Rahman, C. K. Roy, R. G. Kula, Predicting usefulness of code review comments using textual features and developer experience, in: International Conference on Mining Software Repositories (MSR), IEEE, 2017, pp. 215–226.</p>
      <p>[9] A. Bosu, M. Greiler, C. Bird, Characteristics of useful code reviews: An empirical study at Microsoft, in: Working Conference on Mining Software Repositories, IEEE, 2015, pp. 146–156.</p>
      <p>[10] S. Majumdar, A. Bansal, P. P. Das, P. D. Clough, K. Datta, S. K. Ghosh, Automated evaluation of comments to aid software maintenance, Journal of Software: Evolution and Process 34 (2022) e2463.</p>
      <p>[11] S. Majumdar, S. Papdeja, P. P. Das, S. K. Ghosh, Comment-Mine—a semantic search approach to program comprehension from code comments, in: Advanced Computing and Systems for Security, Springer, 2020, pp. 29–42.</p>
      <p>[12] S. Majumdar, A. Bandyopadhyay, S. Chattopadhyay, P. P. Das, P. D. Clough, P. Majumder, Overview of the IRSE track at FIRE 2022: Information retrieval in software engineering, in: Forum for Information Retrieval Evaluation, ACM, 2022.</p>
      <p>[13] S. Majumdar, A. Bandyopadhyay, P. P. Das, P. Clough, S. Chattopadhyay, P. Majumder, Can we predict useful comments in source codes? Analysis of findings from the Information Retrieval in Software Engineering track at FIRE 2022, in: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, 2022, pp. 15–17.</p>
      <p>[14] S. Majumdar, P. P. Das, Smart knowledge transfer using Google-like search, arXiv preprint arXiv:2308.06653 (2023).</p>
      <p>[15] A. Mitra, S. Majumdar, A. Mukhopadhyay, P. P. Das, P. D. Clough, P. P. Chakrabarti, Operationalizing large language models with design-aware contexts for code comment generation, arXiv preprint arXiv:2510.22338 (2025).</p>
      <p>[16] A. Deshpande, A. Maji, D. Mondol, P. P. Das, P. D. Clough, S. Majumdar, The code–LLM handshake: Smarter maintenance through AI, in: Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation, 2025, pp. 9–12.</p>
      <p>[17] S. Majumdar, S. Paul, D. Paul, A. Bandyopadhyay, B. Dave, S. Chattopadhyay, P. P. Das, P. D. Clough, P. Majumder, Generative AI for software metadata: Overview of the Information Retrieval in Software Engineering track at FIRE 2023, in: Forum for Information Retrieval Evaluation, ACM, 2023.</p>
      <p>[18] S. Majumdar, A. Deshpande, P. P. Das, P. P. Chakrabarti, Comprehending C codes with LLMs: Effective comment generation through retrieval and reasoning, Pattern Recognition Letters (2025).</p>
      <p>[19] S. Paul, S. Majumdar, R. Shah, S. Das, M. Ghosh, D. Ganguly, G. Calikli, D. Sanyal, P. P. Das, P. D. Clough, et al., Overview of the “Information Retrieval in Software Engineering” (IRSE) track at the Forum for Information Retrieval 2024, in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation, 2024, pp. 18–21.</p>
      <p>[20] N. Chatterjee, S. Majumdar, P. P. Das, A. Chakrabarti, ParallelC-Assist: Productivity accelerator suite based on dynamic instrumentation, IEEE Access 11 (2023) 73599–73612.</p>
      <p>[21] S. Paul, S. Majumdar, A. Bandyopadhyay, B. Dave, S. Chattopadhyay, P. Das, P. D. Clough, P. Majumder, Efficiency of large language models to scale up ground truth: Overview of the IRSE track at the Forum for Information Retrieval 2023, in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023, pp. 16–18.</p>
      <p>[22] P. Chakraborty, S. Dutta, D. K. Sanyal, S. Majumdar, P. P. Das, Bringing order to chaos: Conceptualizing a personal research knowledge graph for scientists, IEEE Data Eng. Bull. 46 (2023) 43–56.</p>
      <p>[23] N. Chatterjee, S. Majumdar, P. P. Das, A. Chakrabarti, Tool assisted agile approach for legacy application migration, International Journal of System Assurance Engineering and Management (2025) 1–16.</p>
      <p>[24] S. Majumdar, A. Varshney, P. P. Das, P. D. Clough, S. Chattopadhyay, An effective low-dimensional software code representation using BERT and ELMo, in: 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), IEEE, 2022, pp. 763–774.</p>
      <p>[25] S. Majumdar, A. Bansal, P. P. Das, P. D. Clough, K. Datta, S. K. Ghosh, Automated evaluation of comments to aid software maintenance, Journal of Software: Evolution and Process 34 (2022) e2463. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/smr.2463. doi:10.1002/smr.2463.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Hotcomments: how to make program comments more useful?</article-title>
          ,
          <source>in: Conference on Programming Language Design and Implementation (SIGPLAN)</source>
          , ACM,
          <year>2007</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papdeja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Smartkt: a search framework to assist program comprehension using smart knowledge transfer</article-title>
          ,
          <source>in: 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Debugging multi-threaded applications using pin-augmented gdb (pgdb)</article-title>
          ,
          <source>in: International conference on software engineering research and practice (SERP)</source>
          . Springer,
          <year>2015</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>D-cube: tool for dynamic design discovery from multi-threaded applications using pin</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>A mathematical framework for design discovery from multi-threaded applications using neural sequence solvers</article-title>
          ,
          <source>Innovations in Systems and Software Engineering</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>289</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Dcube_nn: Tool for dynamic design discovery from multi-threaded applications using neural sequence models</article-title>
          ,
          <source>Advanced Computing and Systems for Security:</source>
          Volume
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>75</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>