<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Usefulness of C Comments using SVM and Naïve Bayes Classifier</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aritra Mitra</string-name>
          <email>aritramitra2002@gmail.com</email>
          <uri>https://cse.iitkgp.ac.in/~aritra.mitra/</uri>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Kharagpur (IIT-KGP), West Bengal-721302</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>Comments play an important role in code development. As code becomes increasingly common in everyday life, commenting becomes a hassle for novice programmers, who often do not even consider it part of the development process. As a result, comment quality tends to degrade, and a considerable number of useless comments are found in such code. In these experiments, the usefulness of C comments is evaluated using a Support Vector Machine (SVM) and a Naïve Bayes classifier. The results establish a baseline that future research can improve upon. Building on these findings, more complex machine learning models can be developed to improve the accuracy achieved on this task.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>SVM</kwd>
        <kwd>Naïve Bayes Classifier</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Comments are an integral part of code development [<xref ref-type="bibr" rid="ref1">1</xref>], and developers spend a considerable amount of time writing them to make code more readable [<xref ref-type="bibr" rid="ref2">2</xref>]. However, not all comments serve this purpose. With coding becoming more and more commonplace, novice programmers are neglecting the art of commenting, and both the quality and the quantity of comments are degrading [<xref ref-type="bibr" rid="ref3">3</xref>]. Many comments are useless, and reading through a long comment only to find out that it is useless is frustrating and a waste of time.
      </p>
      <p>
        The quantity of comments can be increased using various automatic comment generation techniques [<xref ref-type="bibr" rid="ref4">4</xref>]. Unfortunately, comparatively little research has addressed the poor quality of comments. This problem is now being addressed: to improve the quality of human-written comments, machine learning models are being developed that recognize and label comments based on their usefulness.
      </p>
      <p>
        The author has explored various Machine Learning (ML) models to approach this problem. In this paper, the author seeks answers to the following questions as part of a shared task, the Information Retrieval in Software Engineering (IRSE) track at the Forum for Information Retrieval Evaluation (FIRE) 2022 [<xref ref-type="bibr" rid="ref5">5</xref>], which was completed under the team name FaultySegment:
• How complex does a Machine Learning model need to be to reliably separate useful comments from useless ones?
• How do SVM and the Naïve Bayes Classifier, two models known to even the most novice Machine Learning students, fare on this problem?
      </p>
      <p>
        The paper aims to show that models like these can be good starting points for approaching such a problem, and that more complex models can be built upon them, keeping the risk of overfitting in mind.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Several research works have classified code comments, and a number of papers on the topic have been published.</p>
      <p>Yoann Padioleau et al. showed that there exist many different classes of comments, each serving a different purpose [6].</p>
      <p>Luca Pascarella et al. classified Java comments based on their use cases and types [7], using random forests and a multinomial naïve Bayes classifier to distinguish between the different commonly occurring types of code comments.</p>
      <p>Similarly, Jingyi Zhang et al. classified Python comments based on their use cases and types [8], using decision trees and a multinomial naïve Bayes classifier to distinguish between the different commonly occurring types of code comments.</p>
      <p>Yusuke Shinyama et al. analysed different types of code comments in Java and Python to obtain details about the working of the code at a microscopic level, using methods such as decision trees to identify explanatory comments with 60% precision and 80% recall [9].</p>
      <p>Srijoni Majumdar et al. evaluated the quality and usefulness of comments in making the relevant code more comprehensible [10], using neural networks to achieve precision and recall scores of 86.27% and 86.42%, respectively.</p>
      <p>Mohammad Masudur Rahman et al. evaluated the usefulness of comments in code review [11] using textual features along with developer experience.</p>
      <p>Pooja Rani et al. classified comments written in multiple programming languages and studied the differences among the comment types found in those languages [12].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task and Dataset Description</title>
      <p>This section describes the task and the dataset provided. The task at IRSE, FIRE 2022 was as follows: given a source code comment and its associated code as an input pair, classify the comment as Useful or Not Useful (a binary classification task).</p>
      <p>The dataset was provided in two parts:
• a training dataset with 8048 data points, and
• a testing dataset with 1001 data points.</p>
      <p>The training dataset was randomly split into 70% for training the models and 30% held out for validation. The data was labelled as follows:
• Useful: comments that are useful for code comprehension
• Not Useful: comments that are not useful for code comprehension</p>
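      <p>The paper does not state how the random 70/30 split was drawn. A minimal sketch of such a hold-out split, assuming a seeded shuffle of indices (the function name <monospace>holdout_split</monospace> and the default seed are hypothetical), could look like this:</p>

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(data: Sequence[T], val_frac: float = 0.3,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle indices with a fixed seed and return disjoint (train, val) lists."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)     # seeded, so the split is reproducible
    n_val = math.ceil(val_frac * len(data))  # validation share, rounded up
    val = [data[i] for i in indices[:n_val]]
    train = [data[i] for i in indices[n_val:]]
    return train, val
```

      <p>Applied to 8048 items, this sketch yields 5633 training and 2415 validation items. With scikit-learn, <monospace>train_test_split(X, y, test_size=0.3, random_state=0)</monospace> performs the same kind of split; passing <monospace>stratify=y</monospace> would additionally keep the Useful/Not Useful proportions equal in both parts, which the paper does not mention doing.</p>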
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
      <sec id="sec-4-1">
        <title>4.1. Text Preprocessing</title>
        <p>All links, punctuation, numbers, and stop words are removed. Lemmatization is then used to group the different inflected forms of a word into a single base form, using the NLTK WordNet lemmatizer [13]. The training and testing datasets go through the same preprocessing steps.</p>
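      <p>The preprocessing steps above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the author's code: the regular expressions, the lower-casing, and the tiny <monospace>STOP_WORDS</monospace> set are hypothetical choices, and the lemmatizer is passed in as a function so that the NLTK WordNet lemmatizer [13] can be plugged in where it is installed.</p>

```python
import re
from typing import Callable, Iterable

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the paper does not say which list was used.
STOP_WORDS = frozenset({
    "a", "an", "the", "is", "are", "to", "of", "and", "or",
    "in", "this", "that", "it", "for", "be",
})

URL_RE = re.compile(r"(https?://|www\.)\S+")   # links
NON_ALPHA_RE = re.compile(r"[^A-Za-z\s]")       # punctuation and digits


def identity(word: str) -> str:
    """Placeholder lemmatizer that leaves words unchanged."""
    return word


def preprocess(comment: str,
               lemmatize: Callable[[str], str] = identity,
               stop_words: Iterable[str] = STOP_WORDS) -> str:
    """Remove links, punctuation, numbers and stop words, then lemmatize."""
    stop = set(stop_words)
    text = URL_RE.sub(" ", comment)        # 1. drop links first, while intact
    text = NON_ALPHA_RE.sub(" ", text)     # 2. drop punctuation and numbers
    tokens = text.lower().split()          # lower-casing is an assumption
    # 3. drop stop words, 4. map each remaining word to its base form
    return " ".join(lemmatize(tok) for tok in tokens if tok not in stop)
```

      <p>With NLTK and its <monospace>wordnet</monospace> and <monospace>stopwords</monospace> data downloaded, the corresponding call would be <monospace>preprocess(text, WordNetLemmatizer().lemmatize, stopwords.words("english"))</monospace>, using <monospace>nltk.stem.WordNetLemmatizer</monospace> and <monospace>nltk.corpus.stopwords</monospace>.</p>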
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Feature Extraction</title>
        <p>TfidfVectorizer [14] from the scikit-learn library is used to convert the preprocessed text into numerical features, together with the Tokenizer from the Keras library [15].</p>
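      <p>To show what the TF-IDF features contain, the following dependency-free sketch reproduces the weighting that scikit-learn's TfidfVectorizer applies by default: raw term counts multiplied by the smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each row. It is an illustration, not the author's pipeline: tokens are simply split on whitespace (TfidfVectorizer's default tokenizer keeps only tokens of two or more word characters), and the paper does not specify how the Keras Tokenizer was combined with it.</p>

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, rows), one L2-normalized TF-IDF row per document.

    Mirrors TfidfVectorizer's defaults (smooth_idf=True, norm="l2",
    sublinear_tf=False) but tokenizes by plain whitespace splitting.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf: as if one extra document contained every term once.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)                              # raw term frequency
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0    # avoid dividing by 0
        rows.append([v / norm for v in row])
    return vocab, rows
```

      <p>For lower-case tokens of two or more characters, <monospace>TfidfVectorizer().fit_transform(docs)</monospace> should give the same values in sparse form, with columns in the same alphabetical vocabulary order. Rarer terms receive larger weights, which is what lets distinctive words in a comment stand out.</p>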
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Machine Learning Models</title>
        <p>Two runs were submitted for the task: one using a Support Vector Machine (SVM) model and another using a Naïve Bayes classifier. The scikit-learn library was used for both models, with the following parameters for the SVM model:
• C (regularization parameter) = 1
• kernel (kernel type) = 'linear'</p>
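      <p>For reference, the soft-margin objective minimized by a linear-kernel SVM is written below; the setting C = 1 weights the summed hinge loss equally against the margin term. The Naïve Bayes decision rule is shown in its multinomial form with additive smoothing α (scikit-learn's MultinomialNB defaults to α = 1); the multinomial variant is an assumption, since the paper does not name the variant used. Here x is a TF-IDF feature vector with d entries, y_i ∈ {−1, +1} encodes the two labels, N_cj is the total weight of feature j over the training comments of class c, and N_c is the sum of N_cj over all j.</p>

```latex
\begin{gathered}
\min_{\mathbf{w},\, b}\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
  + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)\bigr),
  \qquad C = 1, \quad y_i \in \{-1, +1\} \\
\hat{y}_{\mathrm{SVM}}(\mathbf{x}) = \operatorname{sign}\bigl(\mathbf{w}^{\top}\mathbf{x} + b\bigr) \\
\hat{y}_{\mathrm{NB}}(\mathbf{x}) = \operatorname*{arg\,max}_{c}\,
  \Bigl[\log P(c) + \sum_{j=1}^{d} x_j \log \theta_{cj}\Bigr],
  \qquad \theta_{cj} = \frac{N_{cj} + \alpha}{N_c + \alpha d}
\end{gathered}
```

      <p>A larger C penalizes margin violations more heavily and fits the training data more closely, while a smaller C favours a wider margin; C = 1 is scikit-learn's default value.</p>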
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Findings</title>
      <p>Table: Macro F1 score, macro precision, macro recall, and accuracy (%) of the two submitted runs (SVM and Naïve Bayes).</p>
      <p>With these parameters, the SVM model achieves an accuracy of 77.27% and an F1 score of 0.7865 on the validation set.</p>
      <p>The Naïve Bayes classifier achieves an accuracy of 60.99% and an F1 score of 0.6992 on the validation set.</p>
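      <p>Metrics such as accuracy and the macro-averaged F1 score, precision, and recall can be computed as in the following sketch, which is illustrative rather than the author's evaluation code (the function name <monospace>classification_scores</monospace> is hypothetical). Macro averaging takes the unweighted mean of the per-class scores over Useful and Not Useful, so both classes count equally regardless of their size; macro F1 here is the mean of the per-class F1 scores, which is how scikit-learn's <monospace>f1_score(average="macro")</monospace> defines it.</p>

```python
from typing import Dict, List, Sequence


def classification_scores(y_true: Sequence[str],
                          y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # A score is set to 0.0 when its denominator is 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        # Macro averaging: unweighted mean over classes.
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }
```

      <p>Because the mean of per-class F1 scores generally differs from the harmonic mean of macro precision and macro recall, it matters which definition a reported macro F1 score uses.</p>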
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The task was completed using elementary machine learning models, namely SVM and a Naïve Bayes classifier. The SVM results form a baseline that can be improved upon: more complex models that better suit the problem statement should give better results. Srijoni Majumdar et al. have already obtained better results using neural networks [10], and the author hopes that results on this task will continue to improve over time.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
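      <p>The author thanks the organizers of IRSE at FIRE 2022 for the opportunity to work on this project and for their constant technical support throughout.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>B.</given-names> <surname>Fluri</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Würsch</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Gall</surname></string-name>, <article-title>Do code and comments co-evolve? On the relation between source code and comment changes</article-title>, <year>2007</year>, pp. <fpage>70</fpage>–<lpage>79</lpage>. doi:<pub-id pub-id-type="doi">10.1109/WCRE.2007.21</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>M.</given-names> <surname>Kajko-Mattsson</surname></string-name>, <article-title>A survey of documentation practice within corrective maintenance</article-title>, <source>Empirical Software Engineering</source> <volume>10</volume> (<year>2005</year>) <fpage>31</fpage>–<lpage>55</lpage>. doi:<pub-id pub-id-type="doi">10.1023/B:LIDA.0000048322.42751.ca</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>J.</given-names> <surname>Raskin</surname></string-name>, <article-title>Comments are more important than code</article-title>, <source>ACM Queue</source> <volume>3</volume> (<year>2005</year>) <fpage>64</fpage>. doi:<pub-id pub-id-type="doi">10.1145/1053331.1053354</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>E.</given-names> <surname>Wong</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Tan</surname></string-name>, <article-title>AutoComment: Mining question and answer sites for automatic comment generation</article-title>, in: <source>2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE)</source>, <year>2013</year>, pp. <fpage>562</fpage>–<lpage>567</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ASE.2013.6693113</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>S.</given-names> <surname>Majumdar</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bandyopadhyay</surname></string-name>, <string-name><given-names>P. P.</given-names> <surname>Das</surname></string-name>, <string-name><given-names>P. D.</given-names> <surname>Clough</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Chattopadhyay</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Majumder</surname></string-name>, <article-title>Overview of the IRSE subtrack at FIRE 2022: Information Retrieval in Software Engineering</article-title>, in: <source>Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation</source>, <publisher-name>ACM</publisher-name>, <year>2022</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>Y.</given-names> <surname>Padioleau</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhou</surname></string-name>, <article-title>Listening to programmers — taxonomies and characteristics of comments in operating system code</article-title>, in: <source>2009 IEEE 31st International Conference on Software Engineering</source>, <year>2009</year>, pp. <fpage>331</fpage>–<lpage>341</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICSE.2009.5070533</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><given-names>L.</given-names> <surname>Pascarella</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Bruntink</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bacchelli</surname></string-name>, <article-title>Classifying code comments in Java software systems</article-title>, <source>Empirical Software Engineering</source> <volume>24</volume> (<year>2019</year>) <fpage>1499</fpage>–<lpage>1537</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10664-019-09694-w</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <article-title>Classifying Python code comments based on supervised learning</article-title>, in: X. Meng, R. Li, K. Wang, B. Niu, X. Wang, G. Zhao (Eds.), <source>Web Information Systems and Applications</source>, <publisher-name>Springer International Publishing</publisher-name>, <publisher-loc>Cham</publisher-loc>, <year>2018</year>, pp. <fpage>39</fpage>–<lpage>47</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] <string-name><given-names>Y.</given-names> <surname>Shinyama</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Arahori</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Gondow</surname></string-name>, <article-title>Analyzing code comments to boost program comprehension</article-title>, in: <source>2018 25th Asia-Pacific Software Engineering Conference (APSEC)</source>, <year>2018</year>, pp. <fpage>325</fpage>–<lpage>334</lpage>. doi:<pub-id pub-id-type="doi">10.1109/APSEC.2018.00047</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] <string-name><given-names>S.</given-names> <surname>Majumdar</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bansal</surname></string-name>, <string-name><given-names>P. P.</given-names> <surname>Das</surname></string-name>, <string-name><given-names>P. D.</given-names> <surname>Clough</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Datta</surname></string-name>, <string-name><given-names>S. K.</given-names> <surname>Ghosh</surname></string-name>, <article-title>Automated evaluation of comments to aid software maintenance</article-title>, <source>Journal of Software: Evolution and Process</source> <volume>34</volume> (<year>2022</year>) <elocation-id>e2463</elocation-id>. doi:<pub-id pub-id-type="doi">10.1002/smr.2463</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] <string-name><given-names>M. M.</given-names> <surname>Rahman</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Roy</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Kula</surname></string-name>, <article-title>Predicting usefulness of code review comments using textual features and developer experience</article-title>, <year>2017</year>. doi:<pub-id pub-id-type="doi">10.1109/MSR.2017.17</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] <string-name><given-names>P.</given-names> <surname>Rani</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Panichella</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Leuenberger</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Di Sorbo</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Nierstrasz</surname></string-name>, <article-title>How to identify class comment types? A multi-language approach for class comment classification</article-title>, <source>Journal of Systems and Software</source> <volume>181</volume> (<year>2021</year>) <elocation-id>111047</elocation-id>. doi:<pub-id pub-id-type="doi">10.1016/j.jss.2021.111047</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] <string-name><given-names>E.</given-names> <surname>Loper</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bird</surname></string-name>, <article-title>NLTK: The Natural Language Toolkit</article-title>, <year>2002</year>. doi:<pub-id pub-id-type="doi">10.48550/ARXIV.CS/0205028</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] <string-name><given-names>V.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Subba</surname></string-name>, <article-title>A TfidfVectorizer and SVM based sentiment analysis framework for text data corpus</article-title>, in: <source>2020 National Conference on Communications (NCC)</source>, <year>2020</year>, pp. <fpage>1</fpage>–<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/NCC48643.2020.9056085</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] <string-name><given-names>N.</given-names> <surname>Ketkar</surname></string-name>, <article-title>Introduction to Keras</article-title>, <year>2017</year>, pp. <fpage>95</fpage>–<lpage>109</lpage>. doi:<pub-id pub-id-type="doi">10.1007/978-1-4842-2766-4_7</pub-id>.</mixed-citation>
      </ref>
    </ref-list>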
  </back>
</article>