

Pragmatic Analysis of Classification Techniques based on Hyper-parameter Tuning for Sentiment Analysis

Charu Gupta¹, Prateek Agrawal², Rohan Ahuja¹, Kunal Vats¹, Chirag Pahuja¹, Tanuj Ahuja¹
¹ Department of Computer Science and Engineering, Bhagwan Parshuram Institute of Technology, India
² School of Computer Science Engineering, Lovely Professional University, Punjab, India
² Department of ITEC, University of Klagenfurt, Austria

Abstract
The evolution of technology and strong social networks have empowered the online user community to share their views on almost every product, event or issue. This has led to a large amount of unstructured, user-generated online data. Furthermore, every company selling products online analyses the demand for its products and the corresponding user reviews. This online user data needs to be analyzed for effective decision making, whether by the user or by the manufacturer. Sentiment analysis plays a vital role here and is extremely useful in social media monitoring, as it gives insight into wider public opinion. In the present study, an Amazon product review dataset is used to perform sentiment analysis. The proposed model is trained with four different classifiers, namely Naive Bayes, Support Vector Machine, Logistic Regression and Random Forest, under different hyper-parameter settings. The model achieved a maximum accuracy of 91% using Logistic Regression. A comparative analysis of the algorithms is also presented. The study highlights the importance of hyper-parameter tuning when training a classifier, which helps achieve better results than previous approaches.
Keywords
Amazon Product Reviews, Classification, Sentiment Analysis, Social Media, Hyper-parameter Tuning, Machine Learning Classification, SVM, Naïve Bayes, Random Forest, Logistic Regression.

ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, Delhi, India
EMAIL: charu.wa1987@gmail.com (C. Gupta); prateek061186@mail.com (P. Agrawal); access.2287@gmail.com (R. Ahuja); kunal.vats.bpit@gmail.com (K. Vats); chirag.bpit@gmail.com (C. Pahuja); tanuj.bpit@gmail.com (T. Ahuja)
ORCID: 0000-0002-1703-7040 (C. Gupta); 0000-0001-6861-0698 (P. Agrawal)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction
Sentiment analysis, or opinion mining, is a field of natural language processing that analyses the positive, negative or neutral sentiments (emotions) associated with text, speech or both. It extracts subjective information from a text corpus to provide valuable insights, which in turn supply decision-making rules to businesses. Sentiment mining is a significant research area because of the sharp increase in online user data on e-commerce sites, where understanding an individual's opinions is an important criterion. Around 90% of the users' information has been generated during the most recent two years. Hence, there is a pressing need to carefully analyse this plethora of information.

Although sentiment analysis is one of the most widely used techniques for finding sentiment in text, it faces numerous challenges [9]. Firstly, online text contains slang, abbreviations, typos, poor punctuation and poor grammar, which make it difficult for the classifier to predict accurate results. Secondly, sarcasm in text data is a major problem in identifying the polarity of a statement [11]. Thirdly, there is anaphora resolution, the process of resolving the reference of a pronoun or noun phrase in a sentence [3]. For example, in "We went to play cricket and watched the movie, it was awful.", what does "it" refer to? This is a significant hurdle in the process of sentiment analysis. Furthermore, correctly interpreting the context in which certain words are used remains a challenge.

In this paper, an online user review analysis system (based on text only) is designed to create an easy-to-use environment that companies and manufacturers can use to analyse the impact (good or bad) of their products in the market. The proposed methodology is evaluated with four different classifiers, namely Naive Bayes, Support Vector Machine (SVM), K-Nearest Neighbors, and
Random Forest (RF) [6] on the Amazon earphone review dataset. The motivation behind the proposed methodology is to critically examine various classifiers with hyper-parameter tuning in order to find the one that best predicts the polarity of a text. The results are further compared with existing works in the literature.

The rest of the paper is organized as follows. Section 2 discusses the related work and Section 3 explains the methodology of the proposed work. Sections 4, 5 and 6 present the implementation, the experimental results and the comparative analysis, respectively. Section 7 concludes with a critical examination of the results and directions for future work.

2 Related Work

Nowadays, every company wants to analyse how well its products are doing in the online market, whether it is an online store or simply an organization that wants to gauge its employees' satisfaction. Opinion mining and sentiment analysis have long been proposed as techniques for solving this problem and have become a field of interest for many researchers. Sentiment analysis has been tackled at various levels of detail, including document-level classification in [1], sentence-level in [2], and phrase-level in [3]. The methodology in [4] integrates existing sentiment analysis approaches and increases the accuracy of the system. In [5], it is shown that support vector machines (SVM) perform better than Naive Bayes, which agrees with the proposed results. In [6], a lexicon-based technique for opinion mining of plain-text Twitter data using R is proposed. However, none of the above approaches has shown the importance of hyper-parameter tuning while training a classifier. To understand the effect of hyper-parameter tuning, the proposed framework performs a comparative analysis of the following classifiers: Naive Bayes, SVM, Logistic Regression and RF. The motivation of the proposed study is to help and guide decision-makers in choosing the most appropriate classifier for a given dataset.

3 Proposed Methodology

A flowchart is a diagrammatic depiction of a work procedure or process [4]. The flowchart in Figure 1 depicts the proposed methodology.

Figure 1: Flowchart of the Proposed System

3.1 Data Preprocessing and Dataset

Data pre-processing is mainly carried out to remove inconsistent, noisy and incomplete data from the training set. It consists of the following steps: tokenization, stop-word removal, stemming and lemmatization [10].

Tokenization: This is the process of recognizing the basic units inside a sentence that need not be broken down further in subsequent processing. The individual units that result from tokenization are known as tokens. These tokens are the input to the next step(s) of the pre-processing stage.
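The paper does not specify which tokenizer was used (Section 5 lists nltk among its packages). As a minimal, dependency-free sketch, the hypothetical `tokenize` helper below assumes a simple regular-expression tokenizer: it lowercases a review and keeps runs of letters, digits and apostrophes as tokens, treating everything else as a separator.

```python
import re
from typing import List

# Assumed token pattern (our choice, not the authors'): runs of lowercase
# letters, digits or apostrophes. Spaces, punctuation and emoticons separate tokens.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(review: str) -> List[str]:
    """Lowercase a review and split it into word tokens."""
    return TOKEN_PATTERN.findall(review.lower())


print(tokenize("Great bass!! But the mic isn't good :("))
# ['great', 'bass', 'but', 'the', 'mic', "isn't", 'good']
```

Keeping the apostrophe preserves negations such as "isn't", which matter for polarity; slang and typos, noted as challenges in the Introduction, pass through unchanged and would need separate normalization.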


Stop-word removal: Most words in a sentence or paragraph are connecting words that contribute little to the polarity. In this step, these unnecessary words are removed from the text. In the proposed framework this step is not optional: without stop-word removal, the feature space might become too large, which can significantly affect the performance of the algorithm(s).

Stemming: In this step, characters are removed from a word to reduce it to its root. In the proposed work, the Porter stemmer is used to perform this task. It works by removing the common morphological and inflectional endings from English words.

Lemmatization: The objective of lemmatization is similar to that of stemming. It reduces inflectional forms and derivationally related forms of a word to a common base form [3]. Unlike stemming, which simply strips characters from the word, lemmatization takes the meaning of the word into consideration.

3.2 Hyper-parameter Tuning
Hyper-parameters are values used by machine learning algorithms that are set before the learning process begins [12]. Tuning the hyper-parameters means finding the values that work best for each algorithm. Every algorithm has its own hyper-parameters to be tuned. The hyper-parameters used for each algorithm are shown in Table 1.

Table 1: Hyper-parameters tuned for the various classification algorithms

S.No | Algorithm | Formula | Parameters
1 | Logistic Regression | $V_i = \beta_0 + \beta_1 Y_1 + \varepsilon_1$ | C = 0.7 (from hyper-parameter tuning); others as default
2 | Naive Bayes | $P(c \mid t) \propto P(c)\,P(t \mid c)$ | var_smoothing = $10^{-12}$; others as default
3 | k-NN | $S(d, c_i) = \sum_{d_j \in \mathrm{kNN}(d)} \mathrm{sim}(d, d_j)\,\delta(d_j, c_i)$ | n_neighbors = 10; others as default
4 | Random Forest | $G(t) = 1 - \sum_k p^2(k \mid t)$ | all as default

Data Set Features:
ReviewTitle: Title of the review
ReviewBody: Body of the review
ReviewStar: Stars given by the customer
Product: Product name

4 Design and Implementation
The aim of the proposed work is to understand the use of online product reviews taken from a well-known dataset repository (Kaggle). The steps of the proposed methodology are as follows:

Step 1: Collection of product review data.
Step 2: Data cleaning, such as stop-word removal.
    Step 2.1: Tokenization of each review.
    Step 2.2: Lemmatization of each word.
Step 3: Conversion of text to numerical features using Bag-of-Words.
Step 4: Splitting of the data into training and test sets.
Step 5: Analysis of different algorithms.
    Step 5.1: Application of different machine learning algorithms to the cleaned text and analysis of the accuracy of each model.
    Step 5.2: Hyper-parameter tuning of the algorithm with the best accuracy on the given dataset.

As can be seen from Steps 1 to 5, the proposed model fetches the data, cleans it, removes stop words, classifies the reviews and obtains their polarity. Almost any machine learning method can be applied to the task of text classification; the most often used and well-proven ones are SVM, the Bayes method, the nearest-neighbor method, neural networks, decision trees and the Rocchio classifier. The proposed work, however, develops a method for classifying online user review text using four classic algorithms: SVM, Logistic Regression, Naïve
Bayes, and RF. These algorithms are easy to understand and widely used in the literature.

5 Experimental Results
Python is used to implement the above-mentioned steps for sentiment analysis. The packages and modules used include CGI, counter, accuracy_score, model_selection, nltk, stopwords, WordNetLemmatizer, train_test_split, RandomizedSearchCV and Logistic Regression. The experimental results show that SVM and Logistic Regression have better average performance than RF and Naïve Bayes. Initially, Logistic Regression reached 89% using a feature representation combined with prior pre-processing (tokenization, filtering, normalization and root stemming). TF-IDF is used as the feature representation, with or without feature selection. SVM reached 81.00% using tokenization and filtering as pre-processing, and TF-IDF as the feature representation with information gain for feature selection. Further, simple stemming is observed to be the best word-truncation technique: compared with linguistic stemming, it takes the least pre-processing time and gives excellent average classification accuracy. It is also observed that the choice of hyper-parameters is very important for improving classification accuracy.

The confusion matrix of Logistic Regression is shown in Table 2 and that of SVM in Table 4. The precision and recall of Logistic Regression are shown in Table 3 and those of SVM in Table 5.

Table 2: Confusion matrix of Logistic Regression predictions

Predicted \ Actual | 1 (positive class) | 0 (negative class) | Total
1 (positive class) | 29 (TP) | 5 (FP) | 34
0 (negative class) | 4 (FN) | 62 (TN) | 66
Total | 33 | 67 | 100

Table 3: Precision, recall and F1-score of the Logistic Regression model

Class | Precision | Recall | F1-score | Support
0 | 0.60 | 0.62 | 0.60 | 34
1 | 0.925 | 0.939 | 0.932 | 66
Accuracy | | | 0.89285 | 100
Macro avg | 0.77 | 0.78 | 0.77 | 100
Weighted avg | 0.89 | 0.89 | 0.89 | 100

Table 4: Confusion matrix of SVM predictions

Predicted \ Actual | 1 (positive class) | 0 (negative class) | Total
1 (positive class) | 261 (TP) | 122 (FP) | 383
0 (negative class) | 146 (FN) | 871 (TN) | 1017
Total | 407 | 993 | 1400
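The accuracy, precision, recall and F1-score values in Tables 3 and 5 are derived from confusion-matrix counts such as those in Tables 2 and 4. The plain-Python sketch below shows the standard formulas; `binary_metrics` is a hypothetical helper introduced here for illustration, not part of the authors' code. With the SVM counts of Table 4 it gives accuracy = (261 + 871) / 1400 ≈ 0.81, the value reported in Table 5.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp, fp, fn and tn are counted with label 1 as the positive class.
    A zero denominator yields 0.0 instead of raising ZeroDivisionError.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: fraction of predicted positives that are actually positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of actual positives that the model finds.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts taken from Table 4 (SVM), with label 1 as the positive class.
svm = binary_metrics(tp=261, fp=122, fn=146, tn=871)
print(f"SVM accuracy: {svm['accuracy']:.2f}")  # SVM accuracy: 0.81
```

In practice, scikit-learn's `confusion_matrix` and `classification_report` compute the same quantities directly from the true and predicted labels.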
Table 5: Precision, recall and F1-score of the SVM model

Class | Precision | Recall | F1-score | Support
0 | 0.64 | 0.68 | 0.66 | 383
1 | 0.88 | 0.86 | 0.87 | 1017
Accuracy | | | 0.81 | 1400
Macro avg | 0.76 | 0.77 | 0.76 | 1400
Weighted avg | 0.81 | 0.81 | 0.81 | 1400

Table 6: Comparative analysis of the accuracy of various learning models (product reviews)

Model | Accuracy
Naive Bayes | 70.5%
Random Forest | 78.5%
SVM | 83%
Logistic Regression | 91%

6 Comparative Analysis
In this section, the proposed methodology is compared with existing works in the literature. Table 7 compares the proposed methodology with similar works [13,14,15].

Table 7: Comparison with State-of-the-Art Methods [13,14,15]

Paper title | Dataset | Method | Accuracy
Amazon Reviews, business analytics with sentiment analysis [13] | Amazon Product Reviews (Mobile Reviews) | MNB | 72.95%
 | | SVM | 80.11%
Feature Selection Methods in Sentiment Analysis and Sentiment Classification of Amazon Product Reviews [14] | Amazon data for Books | Phrase Level | 70.00%
 | | Single Word | 70.00%
 | | Multi Word | 80.00%
 | Amazon data for Music | Phrase Level | 62.00%
 | | Single Word | 80.00%
 | | Multi Word | 68.00%
 | Amazon data for Camera | Phrase Level | 62.00%
 | | Single Word | 80.00%
 | | Multi Word | 68.00%
Sentiment Analysis in Amazon Reviews Using Probabilistic Machine Learning [15] | Data of reviews of books | | 84.44%
 | Data of reviews of Kindle | | 87.33%
Proposed Model | Amazon Earphone Review Dataset | Naive Bayes | 70.5%
 | | Random Forest | 78.5%
 | | SVM | 83%
 | | Logistic Regression | 91%

As Table 7 shows, the compared approaches do not explicitly address the values of hyper-parameters during training, whereas in the proposed work, hyper-parameter tuning of the Logistic Regression model gave the best accuracy when the sample models
were trained. The empirical analysis suggests that these parameters play a vital role in improving the resulting accuracy, because tuning hyper-parameters helps to avoid under-fitting and over-fitting of the model. Hyper-parameter tuning can reduce the loss by a large margin, as the parameters are fine-tuned with respect to the training data. For example, in Logistic Regression, finding the right separating hyperplane depends on learning appropriate weights for each feature, and the regularization hyper-parameter C controls how strongly those weights are penalized. This can be tested by tuning the hyper-parameters, and the same holds for the other algorithms.

7 Conclusion and Future Work
With people's increasing interest in online shopping, tweeting and writing opinions, there is a need to analyze these opinions, which contain a large amount of decision-making information. This information is useful for both customers and manufacturers. With the proposed methodology, these opinions are analyzed using various classification algorithms, and the importance of product reviews is also analyzed. The classification of the reviews is discussed with an emphasis on the importance of hyper-parameter tuning. Empirical testing shows that hyper-parameter tuning is of great significance and can considerably improve the accuracy of a classification algorithm. The experimental results show that Logistic Regression outperforms the other algorithms in classifying the reviews, with an accuracy of 91%. This study can be further used to understand the effect of the parameters and hyper-parameters used in various classification algorithms. The proposed methodology can also be studied with soft computing techniques.

References

[1] Pang, B. and Lee, L., 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, p. 271.
[2] Liu, B., 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), pp. 1-167.
[3] Nguyen, H., Veluchamy, A., Diop, M. and Iqbal, R., 2018. Comparative Study of Sentiment Analysis with Product Reviews Using Machine Learning and Lexicon-Based Approaches. SMU Data Science Review, 1(4), p. 7.
[4] Bhatt, A., Patel, A., Chheda, H. and Gawande, K., 2015. Amazon review classification and sentiment analysis. International Journal of Computer Science and Information Technologies, 6(6), pp. 5107-5110.
[5] Shivaprasad, T.K. and Shetty, J., 2017, March. Sentiment analysis of product reviews: a review. In 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 298-301. IEEE.
[6] Ray, P. and Chakrabarti, A., 2017, February. Twitter sentiment analysis for product review using lexicon method. In 2017 International Conference on Data Management, Analytics and Innovation (ICDMAI), pp. 211-216. IEEE.
[7] onnx.ai, https://onnx.ai/ [Last accessed: 26-12-2019]
[8] es.scribd.com, https://www.scribd.com/ [Last accessed: 23-12-2019]
[9] Gautam, A., Bhateja, V., Tiwari, A. and Satapathy, S.C., 2018. An improved mammogram classification approach using back propagation neural network. In Data Engineering and Intelligent Computing, pp. 369-376. Springer, Singapore.
[10] Nandal, N., Tanwar, R. and Pruthi, J., 2020. Machine learning based aspect level sentiment analysis for Amazon products. Spatial Information Research, pp. 1-7.
[11] Verma, P.K. and Agrawal, P., 2020. Study and Detection of Fake News: P2C2-Based Machine Learning Approach. In International Conference on Data Management, Analytics and Innovation, pp. 261-278.
[12] Madaan, V. and Goyal, A., 2020. Predicting Ayurveda Based Constituent Balancing in Human Body Using Machine Learning Methods. IEEE Access, 8(1), pp. 65060-65070.
[13] Elli, M.S. and Wang, Y.F., 2016. Amazon Reviews, business analytics with sentiment analysis.
[14] Shaikh, T. and Deshpande, D., 2016. Feature selection methods in sentiment analysis and sentiment classification of Amazon product reviews. Int J Comput Trends Technol, 36(4), pp. 225-230.
[15] Rain, C., 2013. Sentiment analysis in Amazon reviews using probabilistic machine learning. Swarthmore College.