Evaluating Usefulness of C Comments using SVM and
Naïve Bayes Classifier
Aritra Mitra1,∗
1 Indian Institute of Technology, Kharagpur (IIT-KGP), West Bengal-721302, India


Abstract
Comments are integral to the flow of code development. As programming becomes more commonplace, writing comments becomes a chore for novice coders, who often do not even treat commenting as part of the development process. This causes the quality of comments to degrade, and a considerable number of useless comments are found in such code. In these experiments, the usefulness of C comments is evaluated using a Support Vector Machine (SVM) and a Naïve Bayes classifier. The results establish a baseline that future research can improve upon: based on these findings, more complex and intricate machine learning models can be created to achieve higher accuracy on this task.

Keywords
Machine Learning, Natural Language Processing, SVM, Naïve Bayes Classifier




1. Introduction
Comments are an integral part of code development [1], and a lot of time is spent writing
them to make code more readable [2]. However, not all comments serve this purpose, and with
coding becoming more and more commonplace, novice coders are neglecting the art of
commenting, so both the quality and quantity of comments are degrading [3]. Many comments
are useless, and reading through a long comment only to find out that it is useless is
frustrating and a waste of time.
The quantity of comments can be increased using automatic commenting models based on deep
learning [4]. Unfortunately, there is a lack of research addressing the poor quality of these
comments. These problems are now being addressed: to improve the quality of human-written
comments, machine learning models are being developed to recognize and label comments
according to their usefulness.
The author has explored various Machine Learning (ML) models to approach this problem. In
this paper, the author tries to answer the following questions as part of a shared task, the
Information Retrieval in Software Engineering (IRSE) track at the Forum for Information
Retrieval Evaluation (FIRE) 2022 [5], which was completed under the team name FaultySegment:



Forum for Information Retrieval Evaluation, December 9-13, 2022, India
∗ Corresponding author.
aritramitra2002@gmail.com (A. Mitra)
https://cse.iitkgp.ac.in/~aritra.mitra/ (A. Mitra)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    • How complex does a Machine Learning model need to be to reliably separate useful
      comments from useless ones?
    • How do models like SVM and the Naïve Bayes classifier, two models known to even the
      most novice Machine Learning students, fare on this problem?

The paper aims to show that models like these can be good starting points for approaching a
problem like this, and that further, more complex models can be built upon them, while, of
course, keeping the risk of overfitting in mind.


2. Related Works
Several research works have classified code comments, and a number of papers have been
published on the topic.
Yoann Padioleau et al. showed that there exist many different classes of comments, each
serving a different purpose [6].

   Luca Pascarella et al. worked on classifying Java comments based on their use cases and
types [7], using random forests and a multinomial Naïve Bayes classifier to distinguish
between the commonly occurring types of code comments.
Similarly, Jingyi Zhang et al. classified Python comments based on their use cases and
types [8], using decision trees and a multinomial Naïve Bayes classifier for the same
purpose.

   Yusuke Shinyama et al. analysed different types of code comments in Java and Python to
extract details about the working of the code at a microscopic level, using methods like
decision trees to identify explanatory comments with 60% precision and 80% recall [9].

   Srijoni Majumdar et al. worked on evaluating the quality and usefulness of comments in
making the relevant code more comprehensible [10], using neural networks to achieve
precision and recall scores of 86.27% and 86.42%, respectively.

   Mohammed Masudur Rahman et al. worked on evaluating the usefulness of comments in code
review [11], using textual features along with developer experience.

   Pooja Rani et al. looked at classifying comments in multiple languages, and the
differences that appear among the types of comments across those languages [12].



3. Task and Dataset Description
This section describes the task at hand and the dataset provided. The task at IRSE, FIRE
2022 was as follows:
A binary classification task to classify source code comments as Useful or Not Useful for a given
                           comment and associated code pair as input.

The corresponding dataset was split into two:

    • The training dataset with 8048 data points, and
    • the testing dataset with 1001 data points.

The training dataset was randomly split into 70% for training the models and 30% for
validation. The data was labelled as follows:

    • Useful: Comments that are useful for code comprehension
    • Not Useful: Comments that are not useful for code comprehension


Table 1
Description of the Dataset for the Task
                 Label                               Example
                Useful      /*not interested in the downloaded bytes, return the size*/
                Useful                     /*Fill in the file upload part*/
               Not Useful    /*The following works both in 1.5.4 and earlier versions:*/
               Not Useful                            /*lock_time*/




4. System Description
4.1. Text Preprocessing
All links, punctuation, numbers, and stop words are removed. Lemmatization is used to group
the different inflected forms of a word into a single token; the NLTK WordNet lemmatizer
[13] is used for this. The same preprocessing steps are applied to both the training and
testing datasets; a sketch of the pipeline is given below.
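The following is a minimal sketch of such a preprocessing pipeline. It is an illustration
under stated assumptions: the exact regular-expression cleanup rules are not specified in
the paper, so the ones used here are simple stand-ins.

    import re

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    # One-time downloads of the required NLTK resources.
    nltk.download("stopwords")
    nltk.download("wordnet")
    nltk.download("omw-1.4")  # needed by recent NLTK versions for WordNet

    STOP_WORDS = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    def preprocess(comment: str) -> str:
        """Lower-case, strip links/punctuation/numbers, drop stop words, lemmatize."""
        text = comment.lower()
        text = re.sub(r"https?://\S+", " ", text)  # remove links
        text = re.sub(r"[^a-z\s]", " ", text)      # remove punctuation and numbers
        tokens = [t for t in text.split() if t not in STOP_WORDS]
        return " ".join(lemmatizer.lemmatize(t) for t in tokens)

    print(preprocess("/* Returns the size of the downloaded file */"))
    # -> "return size downloaded file"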

4.2. Feature Extraction
TfidfVectorizer [14] from the scikit-learn library is used to convert the preprocessed text
into numerical features, alongside the Tokenizer from the Keras [15] library. A sketch of
the TF-IDF step follows.
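A minimal sketch of the TF-IDF step, with placeholder comment strings and default
vectorizer settings (the actual parameters are not reported in the paper), is:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Placeholder corpora; in the actual runs these are the preprocessed comments.
    train_comments = ["fill file upload part", "lock time"]
    test_comments = ["return size downloaded byte"]

    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_comments)  # learn vocabulary and IDF weights
    X_test = vectorizer.transform(test_comments)        # reuse the training vocabulary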

4.3. Machine Learning Models
Two runs have been submitted for the task: one using a Support Vector Machine (SVM) model,
and another with a Naïve Bayes classifier model. The scikit-learn library has been used for
both models, with the parameters for the SVM model as follows (a training sketch is given
after the list):

    • C (regularization parameter) = 1
    • kernel (kernel type) = ’linear’
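Continuing from the feature-extraction sketch above, the two runs could be reproduced
roughly as follows. Here X and y stand for the TF-IDF feature matrix and the Useful/Not
Useful label vector; the multinomial Naïve Bayes variant and the random_state value are
assumptions, since the paper does not specify either.

    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import SVC

    # 70/30 split of the labelled training data, mirroring Section 3.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

    # Run 1: SVM with the parameters listed above.
    svm = SVC(C=1, kernel="linear")
    svm.fit(X_tr, y_tr)

    # Run 2: Naïve Bayes (multinomial variant assumed, a common pairing with TF-IDF).
    nb = MultinomialNB()
    nb.fit(X_tr, y_tr)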
Table 2
Results of Classifier Runs
               Run           Macro F1 Score   Macro Precision   Macro Recall   Accuracy (%)
              SVM               0.771718         0.772345         0.771381        77.2670
           Naïve Bayes          0.599571         0.609193         0.644289        63.9751


5. Findings
With these parameters set for the SVM model, the validation set gives a 77.26708074534162%
accuracy score, along with an F1 score of 0.786464410735123.
Also, with the Naïve Bayes Classifier, the validation set gives a 60.993788819875775% accuracy
score, along with an F1 score of 0.699233716475096.
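The reported macro-averaged scores can be computed with scikit-learn's metric helpers; a
short sketch, reusing the fitted SVM and the validation split from Section 4, is:

    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

    y_pred = svm.predict(X_val)
    print("Accuracy       :", accuracy_score(y_val, y_pred))
    print("Macro F1       :", f1_score(y_val, y_pred, average="macro"))
    print("Macro precision:", precision_score(y_val, y_pred, average="macro"))
    print("Macro recall   :", recall_score(y_val, y_pred, average="macro"))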


6. Conclusion
The task has been completed using elementary machine learning models like SVM and the Naïve
Bayes classifier, and the results for the SVM classifier show that this model can be
improved upon: more complex models can be created that better suit the problem statement
and give better results. Srijoni Majumdar et al. have already obtained better results using
neural networks [10], and the author hopes that these results will only improve over time.


Acknowledgments
Thanks to the organizers of IRSE at FIRE for the wonderful opportunity to work on such a
project, and for their constant technical support throughout.


References
 [1] B. Fluri, M. Würsch, H. Gall, Do code and comments co-evolve? On the relation between
     source code and comment changes, 2007, pp. 70–79. doi:10.1109/WCRE.2007.21.
 [2] M. Kajko-Mattsson, A survey of documentation practice within corrective maintenance,
     Empirical Software Engineering 10 (2005) 31–55. URL: https://doi.org/10.1023/B:LIDA.
     0000048322.42751.ca. doi:10.1023/B:LIDA.0000048322.42751.ca.
 [3] J. Raskin, Comments are more important than code, ACM Queue 3 (2005) 64.
     doi:10.1145/1053331.1053354.
 [4] E. Wong, J. Yang, L. Tan, Autocomment: Mining question and answer sites for automatic
     comment generation, in: 2013 28th IEEE/ACM International Conference on Automated
     Software Engineering (ASE), 2013, pp. 562–567. doi:10.1109/ASE.2013.6693113.
 [5] S. Majumdar, A. Bandyopadhyay, P. P. Das, P. D. Clough, S. Chattopadhyay, P. Majumder,
     Overview of the IRSE subtrack at FIRE 2022: Information Retrieval in Software
     Engineering, in: Working Notes of FIRE 2022 - Forum for Information Retrieval
     Evaluation, ACM, 2022.
 [6] Y. Padioleau, L. Tan, Y. Zhou, Listening to programmers — taxonomies and characteristics
     of comments in operating system code, in: 2009 IEEE 31st International Conference on
     Software Engineering, 2009, pp. 331–341. doi:10.1109/ICSE.2009.5070533.
 [7] L. Pascarella, M. Bruntink, A. Bacchelli, Classifying code comments in Java software
     systems, Empirical Software Engineering 24 (2019) 1499–1537. URL: https://doi.org/10.
     1007/s10664-019-09694-w. doi:10.1007/s10664-019-09694-w.
 [8] J. Zhang, L. Xu, Y. Li, Classifying Python code comments based on supervised learning,
     in: X. Meng, R. Li, K. Wang, B. Niu, X. Wang, G. Zhao (Eds.), Web Information Systems
     and Applications, Springer International Publishing, Cham, 2018, pp. 39–47.
 [9] Y. Shinyama, Y. Arahori, K. Gondow, Analyzing code comments to boost program
     comprehension, in: 2018 25th Asia-Pacific Software Engineering Conference (APSEC),
     2018, pp. 325–334. doi:10.1109/APSEC.2018.00047.
[10] S. Majumdar, A. Bansal, P. P. Das, P. D. Clough, K. Datta, S. K. Ghosh, Automated
     evaluation of comments to aid software maintenance, Journal of Software: Evolution and
     Process 34 (2022) e2463. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/smr.2463.
     doi:10.1002/smr.2463.
[11] M. M. Rahman, C. Roy, R. Kula, Predicting usefulness of code review comments using
     textual features and developer experience, 2017. doi:10.1109/MSR.2017.17.
[12] P. Rani, S. Panichella, M. Leuenberger, A. Di Sorbo, O. Nierstrasz, How to identify
     class comment types? A multi-language approach for class comment classification,
     Journal of Systems and Software 181 (2021) 111047. URL: https://www.sciencedirect.com/
     science/article/pii/S0164121221001448. doi:10.1016/j.jss.2021.111047.
[13] E. Loper, S. Bird, NLTK: The Natural Language Toolkit, 2002. URL: https://arxiv.org/
     abs/cs/0205028. doi:10.48550/ARXIV.CS/0205028.
[14] V. Kumar, B. Subba, A TfidfVectorizer and SVM based sentiment analysis framework for
     text data corpus, in: 2020 National Conference on Communications (NCC), 2020, pp. 1–6.
     doi:10.1109/NCC48643.2020.9056085.
[15] N. Ketkar, Introduction to Keras, 2017, pp. 95–109. doi:10.1007/978-1-4842-2766-4_7.