Pragmatic Analysis of Classification Techniques based on Hyper-parameter Tuning for Sentiment Analysis

Charu Gupta1, Prateek Agrawal2, Rohan Ahuja1, Kunal Vats1, Chirag Pahuja1, Tanuj Ahuja1

1 Department of Computer Science and Engineering, Bhagwan Parshuram Institute of Technology, India
2 School of Computer Science Engineering, Lovely Professional University, Punjab, India
2 Department of ITEC, University of Klagenfurt, Austria

Abstract
The evolution of technology and strong social networks has empowered the online user community to share its views on almost every product, event or issue. This has led to a large amount of unstructured, user-generated online data. Furthermore, every company selling products online analyses the demand for its products and pays attention to the corresponding user reviews. This online user data needs to be analyzed for effective decision making, either by the user or by the manufacturer. Sentiment analysis plays a vital role here and is extremely useful in social media monitoring, as it gives insight into wider public opinion. In the present study, an Amazon product review dataset is used to perform sentiment analysis. The proposed model is trained with four different classifiers: Naive Bayes, Support Vector Machine, Logistic Regression, and Random Forest, each with different hyper-parameter tuning. The model achieved a maximum accuracy of 91% using Logistic Regression. Furthermore, a comparative analysis of the algorithms is discussed. The study focuses on the importance of hyper-parameter tuning while training a classifier, which helps in achieving better results than previous approaches.

Keywords
Amazon Product Reviews, Classification, Sentiment Analysis, Social Media, Hyper-parameter tuning, machine learning classification, SVM, Naïve Bayes, Random forest, Logistic regression.

1.
Introduction

Sentiment analysis, or opinion mining, is a field of natural language processing that analyses the positive, negative or neutral sentiments (emotions) expressed in text, speech or both. It extracts subjective information from a text corpus to provide insights that support business decision-making. Sentiment mining is a significant research area because of the rapid growth of user-generated data on e-commerce sites, where understanding an individual's opinions is an important criterion. Around 90% of this user data has been generated during the most recent two years. Hence, there is a dire need to carefully analyse this plethora of information.

Although sentiment analysis is one of the most widely used techniques for finding sentiment in text, it faces numerous challenges [9]. Firstly, online text contains slang, abbreviations, typos, poor punctuation and poor grammar, which makes it difficult for a classifier to predict accurate results. Secondly, sarcasm is a major problem when identifying the polarity of a statement [11]. Thirdly, there is anaphora resolution, the process of resolving what a pronoun or noun phrase in a sentence refers to [3]. For example, in "We went to play cricket and watched the movie, it was awful.", what does "it" refer to? This is a significant hurdle in the process of sentiment analysis. Furthermore, identifying the correct interpretation of the context in which certain words are used remains a challenge.

In this paper, an online user review analysis system (based on text only) is designed to create an easy-to-use environment that companies and manufacturers can use to analyse the impact (good or bad) of their products in the market. The proposed methodology is evaluated with four different classifiers, namely Naive Bayes, Support Vector Machine (SVM), K-Nearest Neighbors, and Random Forest (RF) [6], on the Amazon earphone review dataset. The motivation behind the proposed methodology is to critically examine various classifiers with hyper-parameter tuning in order to find the one that best predicts the polarity of the text. The results are further compared with existing works in the literature.

The rest of the paper is organized as follows. Section 2 discusses the related work and Section 3 explains the methodology of the proposed work. Sections 4, 5 and 6 present the implementation, experimental results and comparative analysis, respectively. Section 7 concludes the proposed methodology with a critical examination of the results and future work.

ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, Delhi, India
EMAIL: charu.wa1987@gmail.com (C. Gupta); prateek061186@mail.com (P. Agrawal); access.2287@gmail.com (R. Ahuja); kunal.vats.bpit@gmail.com (K. Vats); chirag.bpit@gmail.com (C. Pahuja); tanuj.bpit@gmail.com (T. Ahuja)
ORCID: 0000-0002-1703-7040 (C. Gupta); 0000-0001-6861-0698 (P. Agrawal)
©️ 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

2 Related Work

Nowadays, every company wants to analyse how well its products are received in the online market, be it an online store or simply an organization that wants to gauge its employees' satisfaction. Opinion mining and sentiment analysis have long been proposed as techniques for solving this problem and have become a field of interest for many researchers. Sentiment analysis has been tackled at various levels of detail, including document-level classification in [1], sentence-level in [2], and phrase-level in [3]. In [4], the methodology integrates existing sentiment analysis approaches and increases the accuracy of the system. In [5], it is shown that SVM performs better than Naive Bayes, which agrees with the results of the proposed work. In [6], a technique for opinion mining using R on plain text data from Twitter, based on a lexicon approach, is proposed. However, none of the above approaches has shown the importance of hyper-parameter tuning while training a classifier. To understand the effect of hyper-parameter tuning, the proposed framework performs a comparative analysis of the following classifiers: Naive Bayes, SVM, Logistic Regression and RF. The motivation of the proposed study is to help and guide the decision-maker in choosing the most appropriate classifier for a given dataset.

3 Proposed Methodology

A flowchart is a diagrammatic depiction of a work procedure, process or method [4]. The flowchart in Figure 1 depicts the methodology proposed in this paper.

Figure 1: Flowchart of the Proposed System

3.1 Data Preprocessing and Dataset

Data pre-processing is mainly carried out to remove inconsistent, noisy, and incomplete data from the training set. It consists of the following steps: tokenization, stop words removal, stemming, and lemmatization [10].

Tokenization: It is the process of recognizing the basic units inside a sentence that need not be broken down further in subsequent processing. The individual units that result from tokenization are known as tokens. These tokens are the input to the next step(s) in the pre-processing stage.

Stop words removal: Most words in a sentence or a paragraph are connecting words which do not contribute much towards the polarity. In this process, these unnecessary words are removed from the text. According to the proposed framework, this step is not optional. In the absence of stop-word removal, the feature space might get too large, which can significantly affect the performance of the algorithm(s).

Stemming: In this process, characters are removed from a word to reduce it to its root. In the proposed work, Porter stemming is used to perform this task. It works by removing the more common morphological and inflexional endings from words in English.

Lemmatization: The objective of lemmatization is similar to that of stemming. It reduces inflectional forms and derivationally related forms of a word to a common base form [3]. Unlike stemming, which only strips characters from the word, lemmatization takes the meaning of the word into consideration.

Data Set Features:
ReviewTitle : Title of the Review
ReviewBody : Body of the Review
ReviewStar : Stars given by Customer
Product : Product Name

3.2 Hyper Parameter Tuning

Hyper-parameters are the values used by machine learning algorithms that are set before the learning process begins [12]. Tuning the hyper-parameters means finding the values that work best for each algorithm. Every algorithm has its own hyper-parameters to be tuned. The respective hyper-parameters for each algorithm are shown in Table 1.

Table 1: Hyper-parameter tuned with respect to various classification algorithms

| S.No | Algorithm | Formula | Parameters |
|---|---|---|---|
| 1 | Logistic Regression | $V_i = \beta_0 + \beta_1 Y_1 + \varepsilon_1$ | C = 0.7 (for hyper-parameter tuning); others as default |
| 2 | Naive Bayes | $P(c \mid t) \propto P(c) \cdot P(t \mid c)$ | var_smoothing = 10^-12; others as default |
| 3 | k-NN | $S(d, c_i) = \sum_j \mathrm{sim}(d, d_j)\,\delta(d_j, c_i)$ | n_neighbors = 10; others as default |
| 4 | Random Forest | $G(t) = 1 - \sum_k p^2(k \mid t)$ | All as default |

4 Design and Implementation

The idea of the proposed work is to understand the usage of online product reviews taken from a well-known dataset repository (Kaggle). The steps in the proposed methodology are as follows:

Step 1 : Data collection of reviews for products.
Step 2 : Data cleaning like stop-words removal.
Step 3.1 : Tokenize each review.
Step 3.2 : Lemmatize each word.
Step 3 : Converting text to numerical features using Bag-of-Words.
Step 4 : Splitting data into train and test data.
Step 5 : Analysing different algorithms.
Step 5.1 : Apply different machine learning algorithms on the cleaned text and analyse the accuracy of the respective model.
Step 5.2 : Hyper-parameter tuning for the algorithm with the best accuracy on the given dataset.

As can be seen from Step 1 to Step 5, the proposed model fetches the data, performs cleaning (such as stop-word removal), classifies the reviews, and obtains their polarity. Almost all machine learning methods can be applied to the task of classifying texts. The most often used and well-proven ones are SVM, the Bayes method, the nearest neighbor method, neural networks, decision trees, and the Rocchio classifier. However, the proposed work develops a method for the classification of online user review text using four classic algorithms: SVM, Logistic Regression, Naïve Bayes, and RF. These algorithms are easy to understand and widely used in the literature.

5 Experimental Results

In order to implement the above-mentioned steps, Python is used for sentiment analysis. The packages utilized include CGI, counter, accuracy_score, model_selection, nltk, stopwords, WordNetLemmatizer, train_test_split, RandomizedSearchCV, and Logistic Regression. The experimental results show that SVM and Logistic Regression have better average performance than RF and Naïve Bayes. Initially, Logistic Regression reached 89% using a combination of tokenization, filtering, normalization, and root stemming as pre-processing, with TF-IDF as the feature representation, with or without feature selection. SVM reached 81.00% using a combination of tokenization and filtering as pre-processing, TF-IDF as the feature representation, and information gain as the feature selection criterion. Further, it is observed that easy stemming is the best truncation technique. This is because easy stemming performs better than linguistically based stemming. From the semantic point of view, it takes the least pre-processing time and has excellent average classification accuracy. Also, it is observed that the tuning of indicators (hyper-parameters) is very important for improving the accuracy of the classification.

The confusion matrix of Logistic Regression is shown in Table 2 and that of SVM prediction is shown in Table 4. Also, the precision and recall of Logistic Regression are shown in Table 3 and those of SVM are depicted in Table 5.

Table 2: Confusion matrix of Logistic prediction effect

| | Actual value: 1 (positive class) | Actual value: 0 (negative class) | Total |
|---|---|---|---|
| Predictive value: 1 (positive class) | 29 (TP) | 5 (FP) | 34 |
| Predictive value: 0 (negative class) | 4 (FN) | 62 (TN) | 66 |
| Total | 33 | 67 | 100 |

Table 3: Precision - recall with F1-score matrix of Logistic model used

| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.60 | 0.62 | 0.60 | 34 |
| 1 | 0.925 | 0.939 | 0.932 | 66 |
| Accuracy | | | 0.89285 | 100 |
| Macro avg | 0.77 | 0.78 | 0.77 | 100 |
| Weighted avg | 0.89 | 0.89 | 0.89 | 100 |

Table 4: Confusion matrix of SVM prediction effect

| | Actual value: 1 (positive class) | Actual value: 0 (negative class) | Total |
|---|---|---|---|
| Predictive value: 1 (positive class) | 261 (TP) | 122 (FP) | 383 |
| Predictive value: 0 (negative class) | 146 (FN) | 871 (TN) | 1017 |
| Total | 407 | 993 | 1400 |

Table 5: Precision - recall with F1-score matrix of SVM model

| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.64 | 0.68 | 0.66 | 383 |
| 1 | 0.88 | 0.86 | 0.87 | 1017 |
| Accuracy | | | 0.81 | 1400 |
| Macro avg | 0.76 | 0.77 | 0.76 | 1400 |
| Weighted avg | 0.81 | 0.81 | 0.81 | 1400 |

Table 6: Comparative Analysis of Accuracy in Various Learning Models (Product Reviews)

| Model | Accuracy |
|---|---|
| Naive Bayes | 70.5% |
| Random Forest | 78.5% |
| SVM | 83% |
| Logistic Regression | 91% |

6 Comparative Analysis

In this section, the proposed methodology is compared with the existing works in the literature. The comparative analysis is shown in Table 7, which examines the proposed methodology alongside other similar works in the literature [13,14,15].

Table 7: Comparison with State-of-the Art Methods [13,14,15]

| Paper Title | Dataset | Model or feature type | Accuracy |
|---|---|---|---|
| Amazon Reviews, business analytics with sentiment analysis [13] | Amazon Product Reviews (Mobile Reviews) | MNB | 72.95% |
| | | SVM | 80.11% |
| Feature Selection Methods in Sentiment Analysis and Sentiment Classification of Amazon Product Reviews [14] | Amazon data for Books | Phrase Level | 70.00% |
| | | Single Word | 70.00% |
| | | Multi Word | 80.00% |
| | Amazon data for Music | Phrase Level | 62.00% |
| | | Single Word | 80.00% |
| | | Multi Word | 68.00% |
| | Amazon data for Camera | Phrase Level | 62.00% |
| | | Single Word | 80.00% |
| | | Multi Word | 68.00% |
| Sentiment Analysis in Amazon Reviews Using Probabilistic Machine Learning [15] | Data of reviews of books | | 84.44% |
| | Data of reviews of Kindle | | 87.33% |
| Proposed Model | Amazon Earphone Review Dataset | Naive Bayes | 70.5% |
| | | Random Forest | 78.5% |
| | | SVM | 83% |
| | | Logistic Regression | 91% |

From Table 7, the approaches compared do not explicitly concentrate on the values of hyper-parameters during
the process of training; however, in the proposed work, hyper-parameter tuning on the Logistic Regression model gave the best accuracy when sample models were trained. The empirical analysis suggests that these parameters play a vital role in improving the resulting accuracy. This is because tuning hyper-parameters helps to reduce under-fitting and over-fitting of the model. Hyper-parameter tuning also helps to reduce the loss by a large margin, as the parameters are fine-tuned in correspondence with the training data. For example, in Logistic Regression, obtaining the right separating plane depends on obtaining appropriate weights for each of the features. This can be easily tested by tuning the hyper-parameters, and the same holds for the other algorithms.

7 Conclusion and Future Work

With people's increased interest in online shopping, tweeting, and writing opinions, there is a need to analyze these opinions, which contain a large amount of decision-making information. This information is useful for both customers and manufacturers. With the proposed methodology, these opinions are analyzed using various classification algorithms. Also, the importance of product reviews is analyzed. The classification of the reviews is discussed with an emphasis on the importance of hyper-parameter tuning. Through empirical testing it is observed that hyper-parameter tuning is of great significance and can substantially improve the accuracy of a classification algorithm. From the experimental results obtained, it is observed that Logistic Regression outperforms the other algorithms in classifying the reviews, with an accuracy of 91%. This study can be further utilized to understand the effect of parameters and hyper-parameters used in various classification algorithms. The proposed methodology can be studied with soft computing techniques as well.

References

[1] Pang, B. and Lee, L., 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics, pp. 271.
[2] Liu, B., 2012. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1), pp.1-167.
[3] Nguyen, H., Veluchamy, A., Diop, M. and Iqbal, R., 2018. Comparative Study of Sentiment Analysis with Product Reviews Using Machine Learning and Lexicon-Based Approaches. SMU Data Science Review, 1(4), p.7.
[4] Bhatt, A., Patel, A., Chheda, H. and Gawande, K., 2015. Amazon review classification and sentiment analysis. International Journal of Computer Science and Information Technologies, 6(6), pp.5107-5110.
[5] Shivaprasad, T.K. and Shetty, J., 2017, March. Sentiment analysis of product reviews: a review. In 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 298-301. IEEE.
[6] Ray, P. and Chakrabarti, A., 2017, February. Twitter sentiment analysis for product review using lexicon method. In 2017 International Conference on Data Management, Analytics and Innovation (ICDMAI), pp. 211-216. IEEE.
[7] onnx.ai, https://onnx.ai/ [Last Accessed on: 26-12-2019]
[8] es.scribd.com, https://www.scribd.com/ [Last Accessed on: 23/12/2019]
[9] Gautam, A., Bhateja, V., Tiwari, A. and Satapathy, S.C., 2018. An improved mammogram classification approach using back propagation neural network. In Data Engineering and Intelligent Computing (pp. 369-376). Springer, Singapore.
[10] Nandal, N., Tanwar, R. and Pruthi, J., 2020. Machine learning based aspect level sentiment analysis for Amazon products. Spatial Information Research, pp.1-7.
[11] Verma, Pawan K. and Agrawal, Prateek, "Study and Detection of Fake News: P2C2-Based Machine Learning Approach", International Conference on Data Management, Analytics and Innovation, pp. 261-278, 2020.
[12] Madaan, V. and Goyal, A., 2020. Predicting Ayurveda Based Constituent Balancing in Human Body Using Machine Learning Methods, IEEE ACCESS, 8(1), pp. 65060-65070.
[13] Elli, M.S. and Wang, Y.F., 2016. Amazon Reviews, business analytics with sentiment analysis.
[14] Shaikh, T. and Deshpande, D., 2016. Feature selection methods in sentiment analysis and sentiment classification of amazon product reviews. Int J Comput Trends Technol, 36(4), pp.225-230.
[15] Rain, C., 2013. Sentiment analysis in amazon reviews using probabilistic machine learning. Swarthmore College.
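The cleaning and Bag-of-Words steps described in Sections 3.1 and 4 can be illustrated with a small, self-contained sketch. It is not the authors' code: the paper lists nltk, stopwords and WordNetLemmatizer among its packages, whereas this version uses only the Python standard library, a tiny hand-written stop-word list (an assumption for illustration), and skips stemming and lemmatization. The function names `tokenize`, `clean` and `bag_of_words` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real run would use a
# fuller list such as the NLTK stopwords corpus named in the paper.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "it", "and", "this"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def clean(text):
    """Tokenize a review and drop stop words."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def bag_of_words(reviews):
    """Return (vocabulary, count vectors) for a list of review strings."""
    docs = [clean(review) for review in reviews]
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        vectors.append(row)
    return vocab, vectors


vocab, vectors = bag_of_words(["The sound is great, great bass", "Bass was awful"])
print(vocab)    # ['awful', 'bass', 'great', 'sound']
print(vectors)  # [[0, 1, 2, 1], [1, 1, 0, 0]]
```

Each review becomes one count vector over the shared vocabulary, which is the numerical input that the classifiers in Step 5 would consume.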
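The idea of hyper-parameter tuning from Section 3.2 can be sketched as a random search over the regularization parameter C of a logistic regression model. This is only a minimal illustration under several assumptions: the paper names RandomizedSearchCV and reports C = 0.7, whereas the code below is a hand-written gradient-descent model with an L2 penalty of strength 1/C, scored on a single held-out validation split rather than by cross-validation, and run on made-up toy data. All function names and the search range are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, C, epochs=200, lr=0.1):
    """Batch gradient descent on log-loss with an L2 penalty of strength 1/C."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Smaller C means a stronger pull of the weights towards zero.
            w[j] -= lr * (grad_w[j] / n + w[j] / (C * n))
        b -= lr * grad_b / n
    return w, b


def accuracy(model, X, y):
    hits = sum((predict_proba(model, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)


def random_search(X_tr, y_tr, X_val, y_val, n_iter=5, seed=0):
    """Try n_iter log-uniform samples of C and keep the best validation score."""
    rng = random.Random(seed)
    best_C, best_score = None, -1.0
    for _ in range(n_iter):
        C = 10 ** rng.uniform(-2, 2)  # assumed search range: 0.01 to 100
        score = accuracy(train_logreg(X_tr, y_tr, C), X_val, y_val)
        if score > best_score:
            best_C, best_score = C, score
    return best_C, best_score


# Toy, made-up data purely for illustration.
X_train = [[0.0, 0.2], [0.3, 0.1], [0.9, 1.0], [1.0, 0.8]]
y_train = [0, 0, 1, 1]
X_val = [[0.1, 0.0], [0.8, 0.9]]
y_val = [0, 1]

best_C, best_score = random_search(X_train, y_train, X_val, y_val)
print(best_C, best_score)
```

Sampling C on a logarithmic scale is a common choice because useful values of a regularization strength often span several orders of magnitude.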
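The per-class figures in Table 5 can be cross-checked against the cell counts in Table 4 with a short calculation. The sketch below assumes that each row of Table 4 holds the reviews of one class, in the same order as the rows of Table 5 (the reading under which the row totals 383 and 1017 match the reported support values); the function name `class_report` is hypothetical.

```python
# Cell counts copied from Table 4 (SVM). Assumption: row k holds the reviews
# whose class is k in Table 5, and column k counts the predictions of class k.
ROWS = {0: (261, 122), 1: (146, 871)}


def class_report(rows):
    """Return ({label: (precision, recall, f1, support)}, accuracy)."""
    total = sum(sum(row) for row in rows.values())
    correct = sum(rows[label][label] for label in rows)
    report = {}
    for label in rows:
        hit = rows[label][label]
        predicted = sum(row[label] for row in rows.values())  # column total
        support = sum(rows[label])                            # row total
        precision = hit / predicted
        recall = hit / support
        f1 = 2 * precision * recall / (precision + recall)
        report[label] = (round(precision, 2), round(recall, 2), round(f1, 2), support)
    return report, round(correct / total, 2)


report, acc = class_report(ROWS)
print(report[0])  # (0.64, 0.68, 0.66, 383)
print(report[1])  # (0.88, 0.86, 0.87, 1017)
print(acc)        # 0.81
```

Under this reading, the computed values agree with the precision, recall, F1-score and accuracy listed in Table 5.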