
Amazon review classification and sentiment analysis. International Journal of Computer Science and Information Technologies

Pragmatic Analysis of Classification Techniques based on Hyper-parameter Tuning for Sentiment Analysis

Charu Gupta

charu.wa1987@gmail.com 0

Prateek Agrawal

prateek061186@mail.com 1

Rohan Ahuja

Kunal Vats

kunal.vats.bpit@gmail.com 0

Chirag Pahuja

chirag.bpit@gmail.com 0

Tanuj Ahuja

tanuj.bpit@gmail.com 0

0 Department of Computer Science and Engineering, Bhagwan Parshuram Institute of Technology, India

1 Department of ITEC, University of Klagenfurt, Austria

2020

6 6 5107 5110

The evolution of technology and strong social networks has empowered the online user community to share its views on almost every product, event or issue. This has led to a large amount of unstructured, user-generated online data. Furthermore, every company selling products online analyses the demand for its products and pays attention to the corresponding user reviews. This online user data needs to be analyzed for effective decision making by both the user and the manufacturer. Sentiment analysis plays a vital role here and is extremely useful in social media monitoring, as it gives insight into wider public opinion. In the present study, an Amazon product review dataset is used to perform sentiment analysis. The proposed model is trained with four different classifiers, Naive Bayes, Support Vector Machine, Logistic Regression, and Random Forest, each with different hyper-parameter settings. The model achieved a maximum accuracy of 91% using Logistic Regression. A comparative analysis of the algorithms is also discussed. The study focuses on the importance of hyper-parameter tuning while training a classifier, which helps achieve better results than previous approaches.

Keywords: Amazon Product Reviews, Classification, Sentiment Analysis, Social Media, Hyper-parameter tuning, machine learning, classification, SVM, Naïve Bayes, Random forest, Logistic regression

1 Introduction

Sentiment analysis, or opinion mining, is a field of natural language processing that analyses the positive, negative or neutral sentiments (emotions) expressed in text, speech or both. It extracts subjective information from a text corpus to provide insights that support business decision making. Sentiment mining is a significant research area because of the large increase in online user data on e-commerce sites, where understanding an individual's opinions is an important criterion. Around 90% of this user data has been generated during the most recent two years. Hence, there is a dire need to analyse this plethora of information carefully. Although sentiment analysis is one of the most widely used techniques for finding sentiment in text, it faces numerous challenges [9]. Firstly, online text contains slang, abbreviations, typos, poor punctuation and poor grammar, which make it difficult for a classifier to predict accurate results. Secondly, sarcasm in text is a major problem when identifying the polarity of a statement [11]. Thirdly, there is anaphora resolution, the process of resolving what a pronoun or noun phrase in a sentence refers to [3]. For example, consider "We went to play cricket and watched the movie, it was awful." What does "it" refer to here? This is a significant hurdle in the process of sentiment analysis. Furthermore, identifying the correct interpretation of the context in which certain words are used remains a challenge.

In this paper, an online user review analysis system (based on text only) is designed to provide an easy-to-use environment in which companies and manufacturers can analyse the impact (good or bad) of their products in the market. The proposed methodology is evaluated with four different classifiers, namely Naive Bayes, Support Vector Machine (SVM), Logistic Regression, and Random Forest (RF) [6], on the Amazon earphone review dataset. The motivation behind the proposed methodology is to critically examine various classifiers with hyper-parameter tuning in order to obtain the best result when predicting the polarity of the text. The results are further compared with existing works in the literature.

The rest of the paper is organized as follows. Section 2 discusses the related work and Section 3 explains the methodology of the proposed work. Sections 4, 5 and 6 present the implementation, experimental results and comparative analysis respectively. Section 7 concludes the paper with a critical examination of the results and future work.

2 Related Work

Nowadays, every company wants to analyse how well its products are received in the online market, whether it is an online store or simply an organization that wants to gauge its employees' satisfaction. Opinion mining and sentiment analysis have long been proposed as techniques for this problem and have become a field of interest for many researchers. Sentiment analysis has been tackled at various levels of detail, including document-level classification in [1], sentence-level in [2], and phrase-level in [3]. The methodology used in [4] integrates existing sentiment analysis approaches and increases the accuracy of the system. In [5], it is shown that support vector machines (SVM) perform better than Naive Bayes, which agrees with the results of the proposed work. In [6], a technique for opinion mining using R on plain text data from Twitter with a lexicon approach is proposed. However, none of the above approaches have shown the importance of hyper-parameter tuning while training a classifier. To understand the effect of hyper-parameter tuning, the proposed framework performs a comparative analysis of the following classifiers: Naive Bayes, SVM, Logistic Regression and RF. The motivation of the proposed study is to help the decision-maker choose the most appropriate classifier for a given dataset.

3 Proposed Methodology

A flowchart is a diagrammatic depiction of a process or method [4]. The flowchart in Figure 1 depicts the methodology proposed in this paper.

3.1 Data Preprocessing and Dataset

Data pre-processing is mainly carried out to remove inconsistent, noisy, and incomplete data from the training set. It consists of the following steps: tokenization, stop-word removal, stemming, and lemmatization [10].

Tokenization: It is the process of recognizing the basic units inside a sentence which need not be decomposed further in subsequent processing. The individual units that result from tokenization are known as tokens. These tokens are the input to the next step(s) of the pre-processing stage.

Stop words removal: Many words in a sentence or a paragraph are connecting words which do not contribute much towards the polarity. In this process, these unnecessary words are removed from the text. In the proposed framework, this step is not optional. In the absence of stop-word removal, the feature space might become too large, which can significantly affect the performance of the algorithm(s).

Stemming: In this process, characters are removed from a word to reduce it to its root. In the proposed work, Porter stemming is used to perform this task. It works by removing the more common morphological and inflectional endings from words in English.

Lemmatization: The objective of lemmatization is the same as that of stemming. It reduces the inflectional forms and derivationally related forms of a word to a common base form [3]. Unlike stemming, which only strips characters from the word, lemmatization takes the meaning of the word into consideration.
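The pre-processing steps described in this subsection can be illustrated with a minimal, standard-library-only sketch. The stop-word list and the suffix rules below are hypothetical stand-ins chosen for illustration: the suffix stripper is a crude approximation and is not the Porter algorithm, and no lemmatization is attempted.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real system would use a
# full list such as the one shipped with NLTK).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "and", "but", "it", "this", "to", "of"}

# Crude suffix rules standing in for a real stemmer. This is NOT Porter stemming.
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop connecting words that carry little polarity information."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The earphones stopped working and the sound was amazing"))
# ['earphone', 'stopp', 'work', 'sound', 'amaz']
```

The output shows why stems are not always dictionary words ("stopp", "amaz"), which is the gap lemmatization is meant to close.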

3.2 Hyper Parameter Tuning

Hyper-parameters are values that control a machine learning algorithm and are set before the learning process begins [12]. Tuning the hyper-parameters means finding the values that work best for each algorithm. Every algorithm has its own hyper-parameters to be tuned. The respective hyper-parameters for each algorithm are shown in Table 1.
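As an illustration of what tuning involves, the following sketch implements a plain random search: it samples candidate configurations and keeps the one with the best validation score. It is a simplified stand-in for the idea behind tools such as RandomizedSearchCV, not the procedure used in this study. The search space, the hyper-parameter names `C` and `penalty`, and the scoring function are all hypothetical.

```python
import random


def random_search(evaluate, search_space, n_iter=20, seed=0):
    """Sample n_iter configurations and return the best one with its score.

    evaluate: callable mapping a dict of hyper-parameters to a validation score.
    search_space: dict mapping each hyper-parameter name to its candidate values.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyper-parameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a linear classifier.
space = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}


def toy_validation_score(params):
    """Stand-in for 'train on the training split, score on a validation split'."""
    penalty_cost = 0.05 if params["penalty"] == "l1" else 0.0
    return 1.0 - abs(params["C"] - 1.0) / 10.0 - penalty_cost


best, score = random_search(toy_validation_score, space, n_iter=50)
print(best, score)
```

In practice `evaluate` would train the classifier with the given settings and return its accuracy on held-out data, so that the chosen values are not fitted to the test set.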

Random Forest: $G(t) = 1 - \sum_{k} p^2(k \mid t)$
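Reading the Random Forest expression as the Gini impurity of a tree node t, that is, one minus the sum over classes k of the squared class proportion p(k|t), a small worked example may help. The label lists below are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum_k p(k)^2 for the class labels that reach a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini_impurity(["pos", "pos", "neg", "neg"]))  # 0.5 (evenly mixed node)
print(gini_impurity(["pos", "pos", "pos", "pos"]))  # 0.0 (pure node)
```

A split is preferred when it lowers the impurity of the resulting child nodes, so a pure node (impurity 0) needs no further splitting.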

Data Set Features:

ReviewTitle: Title of the Review

ReviewBody: Body of the Review

ReviewStar: Stars given by Customer

Product: Product Name

4 Design and Implementation

The idea of the proposed work is to understand the usage of online product reviews taken from a well-known dataset repository (Kaggle). The steps in the proposed methodology are as follows:

Step 1: Data collection of reviews for products.

Step 2: Data cleaning, such as stop-word removal.

Step 3: Converting text to numerical features using Bag-of-Words.

Step 3.1: Tokenize each review.

Step 3.2: Lemmatize each word.

Step 4: Splitting data into train and test data.

Step 5: Analysing different algorithms.

Step 5.1: Apply different machine learning algorithms on the cleaned text and analyse the accuracy of the respective model.

Step 5.2: Hyper-parameter tuning for the algorithm with the best accuracy on the given dataset.
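Steps 3 and 4 can be sketched with the standard library alone. The reviews, labels, split fraction and function names below are hypothetical, and the documents are assumed to be already cleaned and tokenized.

```python
import random
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map every distinct token to a column index (alphabetical order)."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(tokens, vocab):
    """Turn one tokenized document into a vector of token counts."""
    vector = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:  # tokens unseen when building the vocabulary are ignored
            vector[vocab[tok]] = count
    return vector


def split_train_test(rows, labels, test_fraction=0.25, seed=42):
    """Shuffle the indices once and split rows and labels consistently."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical cleaned reviews with polarity labels (1 = positive, 0 = negative).
docs = [["good", "sound", "good", "bass"], ["bad", "sound"], ["good", "fit"], ["bad", "cable", "bad"]]
labels = [1, 0, 1, 0]

vocab = build_vocabulary(docs)
X = [bag_of_words(d, vocab) for d in docs]
X_train, X_test, y_train, y_test = split_train_test(X, labels)

print(vocab)  # {'bad': 0, 'bass': 1, 'cable': 2, 'fit': 3, 'good': 4, 'sound': 5}
print(X[0])   # [0, 1, 0, 0, 2, 1]
```

In a real experiment the vocabulary would be built from the training split only, so that no information from the test reviews leaks into the features.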

As can be seen from step 1 to step 5, the proposed model fetches the data, cleans it (for example, by removing stop words), classifies the reviews, and obtains their polarity. Almost all machine learning methods can be applied to the task of classifying texts. The most often used and well-proven ones are SVM, the Bayes method, the nearest neighbor method, neural networks, decision trees, and the Rocchio classifier. The proposed work, however, develops a method for the classification of online user review text using four classic algorithms: SVM, Logistic Regression, Naïve Bayes, and RF. These algorithms are easy to understand and widely used in the literature.

5 Experimental Results

Python is used to implement the above-mentioned steps for sentiment analysis. The packages and modules utilized include CGI, counter, accuracy_score, model_selection, nltk, stopwords, WordNetLemmatizer, train_test_split, RandomizedSearchCV, and Logistic Regression. The experimental results show that SVM and Logistic Regression have better average performance than RF and Naïve Bayes. Initially, Logistic Regression reached 89% accuracy using a combination of feature representation and pre-processing (tokenization, filtering, normalization, and root stemming). TF-IDF is used as the feature representation, with or without feature selection. SVM reached 81.00% using tokenization and filtering as pre-processing, TF-IDF as the feature representation, and information gain as the feature selection criterion. Further, it is observed that easy stemming is the best stemming technique: from a semantic point of view it is better than linguistic stemming, it takes the least pre-processing time, and it has excellent average classification accuracy. It is also observed that tuning the hyper-parameters is very important for improving classification accuracy.

The confusion matrix of Logistic Regression is shown in Table 2 and that of SVM in Table 4. The precision and recall of Logistic Regression are shown in Table 3 and those of SVM in Table 5.
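For readers unfamiliar with how precision, recall and F1-score follow from a confusion matrix, a minimal sketch is given here. The counts used are hypothetical and are not the values reported in Tables 2 to 5.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for the positive class of a binary sentiment classifier.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Precision penalizes false positives and recall penalizes false negatives, so reporting both alongside accuracy gives a fuller picture when the classes are imbalanced.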

6 Comparative Analysis

In this section, the proposed methodology is compared with the existing works in the literature. A comparative analysis of this study is shown in Table 7, which examines the proposed methodology alongside other similar works in the literature [13, 14, 15].

[Table 7: comparison with existing works (F1-score and accuracy). Extracted values, row and column labels not recoverable: 72.95%, 80.11%, 70.00%, 80.00%, 62.00%, 80.00%, 68.00%, 62.00%, 80.00%, 68.00%, 70.5%, 78.5%, 83%, 91%; Phrase Level: 70.00%]

From Table 7, the existing approaches do not explicitly consider the values of hyper-parameters during training. In the proposed work, however, hyper-parameter tuning on the Logistic Regression model gave the best accuracy among the models that were trained. The empirical analysis suggests that these parameters play a vital role in improving the resulting accuracy. This is because tuning hyper-parameters helps to avoid both under-fitting and over-fitting of the model. Hyper-parameter tuning also helps to reduce the loss by a large margin, since the settings are adjusted to suit the training data. For example, in Logistic Regression, obtaining the right separating plane depends on learning appropriate weights for each of the features, and how well these weights are learned can be tested by tuning the hyper-parameters. The same holds for other algorithms.
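To make the link between hyper-parameters and the learned weights concrete, one common formulation of Logistic Regression with an L2 penalty is shown below. It is given as an assumption for illustration, since the specific hyper-parameters tuned in this study are those listed in Table 1. Here $w$ and $b$ are the learned weights and bias, $x_i$ is the feature vector of review $i$, $y_i \in \{-1, +1\}$ is its label, and $C > 0$ is the hyper-parameter.

```latex
\[
\min_{w,\,b}\; \frac{1}{2}\,\lVert w \rVert_2^2
\;+\; C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\bigl(-y_i\,(w^{\top} x_i + b)\bigr)\right)
\]
```

A small $C$ makes the penalty term dominate and shrinks the weights, which can lead to under-fitting, while a large $C$ lets the model follow the training data closely, which can lead to over-fitting. Tuning $C$ on held-out data is one way to balance the two.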

7 Conclusion and Future Work

With people's increased interest in online shopping, tweeting and writing opinions, there is a need to analyze these opinions, which contain a large amount of decision-making information. This information is useful for customers as well as for manufacturers. With the proposed methodology, these opinions are analyzed using various classification algorithms. The importance of product reviews is also analyzed. The classification of the reviews is discussed with an emphasis on the importance of hyper-parameter tuning. Through empirical testing it is observed that hyper-parameter tuning is of great significance and can improve the accuracy of a classification algorithm. From the experimental results obtained, it is observed that Logistic Regression outperforms the other algorithms in classifying the reviews, with an accuracy of 91%. This study can be further utilized to understand the effect of the parameters and hyper-parameters used in various classification algorithms. The proposed methodology can be studied with soft computing techniques as well.

References

[1] Pang, B. and Lee, L., 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p. 271.

[2] Liu, B., 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), pp. 1-167.

[3] Nguyen, H., Veluchamy, A., Diop, M. and Iqbal, R., 2018. Comparative Study of Sentiment Analysis with Product Reviews Using Machine Learning and Lexicon-Based Approaches. SMU Data Science Review, 1(4), p. 7.

[5] Shivaprasad, T.K. and Shetty, J., 2017, March. Sentiment analysis of product reviews: a review. In 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 298-301. IEEE.

[6] Ray, P. and Chakrabarti, A., 2017, February. Twitter sentiment analysis for product review using lexicon method. In 2017 International Conference on Data Management, Analytics and Innovation (ICDMAI), pp. 211-216. IEEE.

[7] onnx.ai, https://onnx.ai/ [Last Accessed on: 26-12-2019]

[8] es.scribd.com, https://www.scribd.com/ [Last Accessed on: 23/12/2019]

[9] Gautam, A., Bhateja, V., Tiwari, A. and Satapathy, S.C., 2018. An improved mammogram classification approach using back propagation neural network. In Data Engineering and Intelligent Computing (pp. 369-376). Springer, Singapore.

[10] Nandal, N., Tanwar, R. and Pruthi, J., 2020. Machine learning based aspect level sentiment analysis for Amazon products. Spatial Information Research, pp. 1-7.

[14] Shaikh, T. and Deshpande, D., 2016. Feature selection methods in sentiment analysis and sentiment classification of amazon product reviews. Int J Comput Trends Technol, 36(4), pp. 225-230.

[15] Rain, C., 2013. Sentiment analysis in amazon reviews using probabilistic machine learning.