Building a fuzzy system for opinion classification across different domains

Matheus Cardoso
State University of Feira de Santana (UEFS) and Federal University of Bahia (UFBA)
Salvador, Brazil
matheus.mcas@gmail.com

Angelo Loula, Matheus Giovanni Pires
State University of Feira de Santana (UEFS)
Feira de Santana, Brazil
{angelocl, mgpires}@ecomp.uefs.br

Abstract

Opinions are central in almost all human activities, because they are a relevant influence on people's behavior. The internet and the web have created mechanisms that make it possible for people to share their opinions, and for other people and organizations to find out more about the opinions and experiences of individuals and be helped in decision making. Still, opinions involve sentiments that are vague and imprecise textual descriptions. Hence, due to the nature of the data, Fuzzy Logic can be a promising approach. This paper proposes a fuzzy system to perform opinion classification across different domains. Almost 70 features were extracted from documents, and multiple feature selection algorithms were applied to select the features best fitted to classify documents. Over the selected features, the Wang-Mendel (WM) method was used to generate fuzzy rules and classify documents. The WM-based fuzzy system achieved 71.25% accuracy in a 10-fold cross-validation.

1 Introduction

Opinions are central in human lives. In almost all of their daily tasks, people ask for or seek other people's opinions to help them make decisions, such as what movie to watch, what car, book or notebook to buy, or to learn the political standpoint of their neighborhood on a certain issue. The internet and the web have created mechanisms that make it possible for people to share their opinions and for organizations to find out more about the opinions and experiences of other individuals, most of them unknown persons. These mechanisms have, over time, created a huge amount of opinionated sources, too large for a person to process alone. Hence, an automated opinion mining system is required; one that could identify and extract opinions from text would be an enormous help to someone sifting through the vast amounts of news and web data [Wilson et al., 2004].

Opinion mining is the process that seeks to predict the overall sentiment orientation conveyed in a piece of text such as a user review of a movie or product, a blog post or an editorial [Ohana et al., 2011]. Attached to opinions there are sentiments. Sentiments are intrinsically subjective, and to identify them in phrases and documents we have to deal with vague and imprecise terms, such as "good", "very nice" and "bad", among others. Due to the nature of this data, Fuzzy Logic [Zadeh, 1965] can be a promising approach to deal with it.

Given the importance of opinions in human lives, their commercial and political relevance, the huge amount of generated data that has to be handled automatically, and the vague and imprecise nature of the data, this paper aims to propose and evaluate an automated fuzzy opinion mining system to classify the overall sentiment orientation of a document's text. Our proposal differs from others in that it generates fuzzy rules based on the best-fitted features among almost 70 features extracted from documents, introducing the use of the Wang-Mendel method [Wang and Mendel, 1992]. We apply those rules to perform opinion classification across different domains.

The next section discusses related works on opinion mining and applications of fuzzy logic. The following section outlines the opinion mining process, specifying all the stages involved in the opinion mining workflow. Results from our approach are then shown and discussed. The last section concludes the paper, pointing out our contributions and some future improvements to this research.

2 Related works

Research in opinion mining began with subjectivity detection, dating back to the late 1990s with [Wiebe, 1990; 1994]. This task involves separating non-opinionated, neutral and objective sentences from subjective sentences carrying heavy sentiments. Starting in the 2000s, the overall research focus shifted to dividing language units into three categories: negative, positive and neutral. From there, many works on this task, also known as sentiment analysis or sentiment classification, among other names, have appeared.
One of the first research studies on unsupervised opinion mining was [Turney, 2002]. Similar to the task of classifying documents as positive or negative, [Turney, 2002] proposed to classify reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted from the average semantic orientation of the phrases in the review that contain adjectives or adverbs. He obtained an average of 74% accuracy across domains.

On the other hand, [Pang et al., 2002] was one of the first works to propose using classic machine learning techniques in opinion mining. Comparing the performance of Naive Bayes, Maximum Entropy and Support Vector Machines (SVM), this work showed that such techniques produce high accuracy levels, achieving 82.9% accuracy using only isolated words (called unigrams) with SVM. It also showed that supervised techniques achieve better results than unsupervised approaches. However, they are domain dependent, producing much poorer results on other kinds of data and demanding another training round of the classifier, which increases the cost and time needed to classify documents.

Related to our work are [Wilson et al., 2005], [Taboada et al., 2008] and [Ohana and Tierney, 2009], which use a wide range of document features. These features range from counts of adjectives or adverbs in a phrase or in the whole document and tuples of words (called bigrams if two words, trigrams if three), such as adverb-adjective pairs, to sums of polarities and many other features. [Taboada et al., 2008] and [Ohana and Tierney, 2009] use a semantic lexicon, SentiWordNet [Esuli and Sebastiani, 2006], to assign numeric values to the semantic orientation of words. In classifying documents as negative or positive, the accuracies obtained were 65.7% in [Wilson et al., 2005], 80.6% in [Taboada et al., 2008] and 69.35% in [Ohana and Tierney, 2009].
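The decision rule in [Turney, 2002] is simple enough to sketch. Assuming each extracted adjective/adverb phrase already carries a semantic-orientation score (Turney estimated these from web co-occurrence statistics; the numbers below are purely illustrative), the review-level classifier is just an average and a sign test:

```python
def classify_review(phrase_scores):
    """Thumbs up/down by the average semantic orientation of the
    adjective/adverb phrases extracted from one review."""
    if not phrase_scores:
        return "not recommended"  # no opinionated phrases found
    avg = sum(phrase_scores) / len(phrase_scores)
    return "recommended" if avg > 0 else "not recommended"

# Hypothetical phrase scores for one review:
print(classify_review([0.8, 0.3, -0.2]))  # recommended
```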
Although it has been shown that Fuzzy Logic is suitable for handling imprecise and vague data [Zadeh, 1996; Wang, 2003], we found only a few works applying fuzzy concepts, such as fuzzy sets or fuzzy inference systems, to opinion mining. One of the few papers found was [Nadali et al., 2010]. It proposes a fuzzy logic model to perform semantic classification of customer reviews into five classes: very weak, weak, moderate, strong and very strong. It also introduces a methodology based on a fuzzy inference system, fuzzy sets that model the five classes and manually created IF-THEN rules. However, the paper does not describe results or provide further discussion.

Another paper was [Ballhysa and Asilkan, 2012], which proposes a fuzzy approach for discovering the underlying opinion in blog entries, determining their overall polarity. The authors present fuzzy concepts such as fuzzy sets and fuzzy set operations. They propose a set of fuzzy measures (from counting manually chosen keywords) and a single fuzzy aggregation of these measures, but a fuzzy inference system is not used. Moreover, the proposed measures actually seem to correspond to crisp values, so there is no real application of fuzzy logic. Finally, there is only a superficial description of results, obtained on their own dataset with no comparison to other works.

This paper differs from previous work in applying fuzzy systems to opinion mining. We model fuzzy variables and build a fuzzy inference system based on document features. We run our tests on datasets already used in previous works, allowing direct comparison. Besides, we propose a feature extraction and selection stage, in which we extract a great number of features from documents, based on previous works and extended with our own features, and perform feature selection with different algorithms. The next section presents the opinion mining process that we used, describing each stage and the relevant techniques applied in it.

3 The opinion mining process

Our opinion mining process is composed of five stages: domain definition, preprocessing and transformation, feature extraction and selection, classification, and evaluation. In the first stage, the kind of data to be handled by the system and the datasets to be used are defined. To evaluate our cross-domain proposal, we picked the widely used Cornell Movie Review Data 2.0 [Pang and Lee, 2004] and a mixed dataset of Amazon product reviews [Wang et al., 2011] covering cameras, mobile phones, TVs, laptops and tablets, among other products.

3.1 Preprocessing and Transformation

In the preprocessing stage, data filtering takes place and a document representation model is built. There are three basic levels of document analysis: document, sentence, and entities and their aspects [Liu, 2012]. The first level focuses on classifying opinions as positive or negative from the perspective of the whole document. The second seeks to classify the opinions of each sentence in a document, and the last level aims to classify opinions targeted at aspects of the entities found. We chose document-level analysis [Turney, 2002; Pang et al., 2002; Pang and Lee, 2004; Taboada et al., 2008].

As a first step, we remove every sentence of a document that contains modal words, such as "would" or "could". Modals indicate that the words appearing in a sentence might not be reliable for the purposes of sentiment analysis [Taboada et al., 2011]. Next, each word in every document is tagged with its grammatical class using a POS (Part of Speech) tagger [Brill, 1995].

The document model in our approach is the popular bag-of-words model, in which a document is represented as a vector whose entries correspond to individual terms of a vocabulary [Moraes et al., 2012]. These terms are generically called n-grams; they can be unigrams (one word), bigrams (two words) or trigrams (three words). For each document, one n-gram vector is passed on to the next step of the process.

We defined 7 types of n-grams: adjectives, adverbs and verbs as unigrams; adverb plus adjective (e.g. very good), adverb plus verb (e.g. truly recommend) and adverb plus adverb as bigrams; and one type of trigram, the combination of two adverbs with one adjective (e.g. not very nice) [Pang et al., 2002; Turney, 2002; Taboada et al., 2008; Karamibekr and Ghorbani, 2012].

We also look for special types of bigrams and trigrams: negated n-grams (e.g. not bad, nothing special). This technique is called negation detection and is by itself an entire line of research, beyond the scope of this work, so we use a simple version from [Taboada et al., 2011].

At this stage, each document has been transformed into an n-gram bag-of-words vector. Each n-gram is now associated with a numeric value, an opinion polarity degree, obtained from an opinion lexicon. Opinion lexicons are resources that associate words with sentiment orientation [Ohana and Tierney, 2009]. We decided to use an automatically built opinion lexicon, SentiWordNet [Baccianella et al., 2010].
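The n-gram extraction over POS-tagged text can be illustrated with a minimal sketch. The Penn-Treebank-style tags and the helper below are illustrative assumptions, not the paper's actual implementation, and only three of the seven n-gram types are shown:

```python
# Minimal sketch: extract adjective unigrams, adverb+adjective bigrams
# and adverb+adverb+adjective trigrams from a POS-tagged sentence.
def extract_ngrams(tagged):  # tagged: list of (token, tag) pairs
    is_adj = lambda t: t.startswith("JJ")
    is_adv = lambda t: t.startswith("RB")
    unigrams, bigrams, trigrams = [], [], []
    for i, (tok, tag) in enumerate(tagged):
        if is_adj(tag):
            unigrams.append(tok)
        if i + 1 < len(tagged) and is_adv(tag) and is_adj(tagged[i + 1][1]):
            bigrams.append((tok, tagged[i + 1][0]))
        if (i + 2 < len(tagged) and is_adv(tag)
                and is_adv(tagged[i + 1][1]) and is_adj(tagged[i + 2][1])):
            trigrams.append((tok, tagged[i + 1][0], tagged[i + 2][0]))
    return unigrams, bigrams, trigrams

sent = [("not", "RB"), ("very", "RB"), ("nice", "JJ"), ("movie", "NN")]
# yields the unigram 'nice', the bigram ('very', 'nice') and the
# trigram ('not', 'very', 'nice')
print(extract_ngrams(sent))
```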
SentiWordNet (SWN) is a lexical resource explicitly devised for supporting sentiment classification. SWN provides positive, negative and objective scores (ranging from 0 to 1) for each sense of English words. Since words can have multiple senses, we apply the approach proposed by [Guerini et al., 2013], called prior polarities, to derive a positive or negative polarity for each word.

To determine polarity degrees for bigrams and trigrams, we consider adverbs as modifiers, subdivided into amplifiers (e.g. very) and downtoners (e.g. slightly), which respectively increase or decrease adjective (unigram) values [Quirk et al., 1985]. Downtoners and amplifiers have sub-levels, each with an associated modifier value, such as -0.5 for "lowest" downtoners and 0.25 for "high" amplifiers, among other sub-levels. The final score s of a bigram is defined by s(bigram) = s(unigram) + s(unigram) · s(modifier), and the score s of a trigram by s(trigram) = s(bigram) + s(bigram) · s(modifier).

The special cases among bigrams and trigrams are the negated ones. For these, instead of using modifiers, we apply an approach similar to that of [Taboada et al., 2011], shifting the n-gram polarity toward the opposite sign by a fixed amount (0.5, empirically defined). [Taboada et al., 2011] also showed that shifting the polarity is better than simply inverting the sign of the n-gram polarity.

Another technique used was attenuation by n-gram frequency, in which a term's polarity is decreased according to the number of times it appears in the document. The nth appearance of a word in the text receives the new score s' defined by s'(word) = s(word)/n. The repetition of an adjective, for instance, suggests that the writer lacks additional substantive commentary and is simply reusing a generic positive word [Taboada et al., 2011].

We also applied a bias compensation to negative term polarities. Lexicon-based sentiment classifiers generally show a positive bias [Alistair and Diana, 2005], likely the result of a universal human tendency to favor positive language [Boucher and Osgood, 1969]. So we increase the final n-gram degree of any negative expression (after the other modifiers have been applied) by a fixed amount (currently 50%). At the end of this stage, we have, for each document of the dataset, a vector of n-grams associated with polarity degrees.
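The polarity transformations of this stage (modifier composition, negation shift, frequency attenuation and negative-bias compensation) can be sketched as small functions. The word scores and modifier values below are illustrative:

```python
def modify(score, modifier):
    """Amplifier/downtoner composition: s' = s + s * m."""
    return score + score * modifier

def negate(score, shift=0.5):
    """Shift the polarity toward the opposite sign by a fixed amount."""
    return score - shift if score > 0 else score + shift

def attenuate(score, nth):
    """The nth occurrence of a term is worth s / n."""
    return score / nth

def compensate_negative(score, factor=1.5):
    """Increase the degree of negative expressions by 50%."""
    return score * factor if score < 0 else score

# Illustrative values: "good" = +0.6, amplifier "very" = +0.25
print(modify(0.6, 0.25))             # "very good": 0.6 -> 0.75
print(round(negate(0.6), 2))         # "not good":  0.6 -> 0.1
print(attenuate(0.6, 2))             # 2nd "good":  0.6 -> 0.3
print(compensate_negative(-0.4))     # bias compensation: -0.4 -> -0.6
```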
3.2 Feature extraction

In this step, we extract document features, intended to be domain independent, from the numerical n-gram vectors of the previous step. We chose this approach because it is effective: it takes reviews, inspects document features and decides their semantic orientation considering only those characteristics, rather than their specific contents. The features we use are not specific to a domain and should be easily applicable to other domains [Pang et al., 2002]. This also reduces dimensionality, since the resulting feature vector is significantly smaller than a regular bag-of-words vector.

On the other hand, corpus-based machine learning methods applied to opinion mining are able to obtain high accuracy rates, up to 95%, by feeding word vectors directly to classifiers, which learn from the given document corpus which words are related to positive and negative contexts. However, to reach their full potential, most of these approaches need immense annotated training datasets and huge amounts of training time, and still produce much poorer results across domains without full retraining.

Different studies have proposed various features to describe documents or discriminate among them in order to identify their polarities [Wilson et al., 2005; Ohana and Tierney, 2009; Taboada et al., 2011]. In order to capture diverse aspects of documents, we decided to extract a great number of features, so we used features presented in these works and derived many others, obtaining a total of 67 features.

Three kinds of features were defined: sum, count and maximum-value features. Sum features involve the numerical sum of polarity degrees for the different types of n-grams, such as the sum of the adjectives of a document, the sum of adverbs, of verbs, of bigrams composed of an adverb and an adjective, the sum of trigrams, among others. Count features proceed in a similar way for the different types of n-grams, counting the number of positive or negative polarity values.

Maximum-value features refer to the maximum value of a given type of n-gram in a document. For instance, if the maximum absolute value among the unigrams is positive, the feature has the value 1; if it is negative, the feature has the value -1. This feature was obtained for unigrams, bigrams and trigrams.

More features were derived from the three kinds described above by applying normalization or subtraction of features. For instance, the difference between the positive and negative bigrams of a document and the normalized sum of positive adjectives are two such derived features.

After the feature extraction step, the vectors of n-grams and polarity values are replaced by feature vectors: each document in the dataset is now represented by a feature vector of size 67.

3.3 Feature selection

This stage is commonly found in opinion mining approaches. It can make classifiers more efficient and effective by reducing feature vector dimensionality and the amount of data to be analyzed, as well as by identifying the relevant features to be considered [Moraes et al., 2012]. To choose among the extracted features and reduce the number of features to be analyzed by the classifier, we used two feature selection algorithms: Correlation-based Feature Selection (CFS) and feature selection based on the C4.5 decision tree [Cintra et al., 2008].

CFS evaluates subsets of features on the premise that suitable feature subsets contain features highly correlated with the classification yet uncorrelated with each other [Hall, 1999]. C4.5, on the other hand, is an algorithm that generates a decision tree usable for classification [Quinlan, 1993]. To build that tree, C4.5 needs to select the best features among those provided; hence, we also use C4.5 as a feature selection algorithm.
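The three feature kinds of Section 3.2 can be sketched over one document's polarity-scored n-grams of a single type. The scores below are illustrative, and `diff` mimics the kind of derived difference feature described above:

```python
# Minimal sketch of the sum, count and maximum-value feature kinds
# (Sec. 3.2) over one document's polarity-scored n-grams of one type.
def sum_feature(scores, positive=True):
    """Sum of the positive (or negative) polarity degrees."""
    return sum(s for s in scores if (s > 0) == positive)

def count_feature(scores, positive=True):
    """Number of positive (or negative) polarity degrees."""
    return sum(1 for s in scores if (s > 0) == positive)

def max_feature(scores):
    """+1 if the n-gram of maximum absolute value is positive, else -1."""
    return 1 if max(scores, key=abs) > 0 else -1

adjectives = [0.6, -0.3, 0.2, -0.8]         # illustrative polarity degrees
positives = sum_feature(adjectives)          # 0.6 + 0.2
negatives = sum_feature(adjectives, False)   # -0.3 + -0.8
diff = positives + negatives                 # a derived difference feature
print(positives, count_feature(adjectives, False), max_feature(adjectives), diff)
```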
3.4 Classification

In the classification stage, we build a rule-based fuzzy classifier to predict the overall sentiment orientation, positive or negative, of each document in the dataset. Building such a classifier involves creating a set of rules based on the extracted features, modeling these features as linguistic variables with fuzzy sets and, lastly, defining an inference system.

In order for the fuzzy sets to appropriately model the data, we first identify outlier values in the features and limit the range of feature values. To do this, we used the three-sigma rule [Kazmier, 2004] to select the outlier values that lie beyond three standard deviations from the mean of a feature, an interval within which 99.73% of the values of a normal distribution fall. Outlier values outside this range were clipped to the extreme values of the accepted range.

With the input range standardized for every feature, we can define the fuzzy sets [Zadeh, 1965] that model our input and output variables. We decided to use triangular fuzzy sets. The first approach was to use three fuzzy sets for the input (low, medium and high) and two sets for the output (negative and positive), uniformly distributed along the feature value range. Another approach was to use only two sets for the input, removing the medium fuzzy set.

Once the fuzzy variables were modeled with fuzzy sets, the next step was to build our fuzzy rule base using Wang-Mendel fuzzy rule generation [Wang and Mendel, 1992]. Given the previously specified fuzzy sets, this rule generation method takes each data instance in the dataset, determines its membership degrees in all fuzzy sets and builds a rule from the fuzzy sets with the highest membership degrees for each input-output pair.

The generated fuzzy rule base, together with the specified fuzzy sets, is then used by a fuzzy inference mechanism to determine the polarity class of a document. The mechanisms used were the General Fuzzy Reasoning Method (GFRM) and the Classic Fuzzy Reasoning Method (CFRM) [Cordon et al., 1999].

In this classification process, the feature vector of each document is evaluated by all fuzzy rules, and a compatibility degree is produced for each rule. CFRM picks the rule with the maximum compatibility degree and assigns that rule's output class to the document. GFRM, on the other hand, takes the maximum average compatibility degree between the two possible classes, positive and negative: it calculates the average degree over all rules with negative output and over all rules with positive output, and assigns to the document the class with the maximum average compatibility degree.

3.5 Evaluation

In order to evaluate our opinion classification approach, we apply 10-fold cross-validation. As measures of classification performance, accuracy, recall, precision and the F1 score were chosen. Accuracy is the ratio of the documents classified correctly to the total number of documents classified. Recall is the ratio of the documents correctly classified into a category to the total number of documents truly belonging to that category; it indicates the ability to recall the items of the category. Precision is the ratio of the documents correctly classified into a category to the total number of documents classified into that category. The F1 score considers both precision and recall, and is often regarded as a weighted average of the two [Chaovalit and Zhou, 2005].
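The classification machinery of Section 3.4 can be condensed into a sketch: triangular memberships, Wang-Mendel rule generation over a single feature, and the two reasoning methods. The set shapes and the data are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of Sec. 3.4 on a single feature: triangular "low"/"high" sets,
# Wang-Mendel rule generation, then CFRM/GFRM inference. Illustrative data.
def tri(x, a, b, c):
    """Triangular membership with peak b over support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Two input sets over the standardized range [-1, 1] (peaks at the ends).
SETS = {"low": lambda x: tri(x, -2.0, -1.0, 1.0),
        "high": lambda x: tri(x, -1.0, 1.0, 2.0)}

def wang_mendel(data):
    """data: (feature_value, class_label) pairs. One candidate rule per
    example; for each antecedent set keep the rule of highest degree."""
    best = {}  # antecedent set name -> (degree, class)
    for x, label in data:
        name, mu = max(((n, f(x)) for n, f in SETS.items()), key=lambda p: p[1])
        if mu > best.get(name, (0.0, None))[0]:
            best[name] = (mu, label)
    return {name: label for name, (_, label) in best.items()}

def cfrm(rules, x):
    """Classic reasoning: class of the single most compatible rule."""
    return max(rules.items(), key=lambda r: SETS[r[0]](x))[1]

def gfrm(rules, x):
    """General reasoning: class whose rules have the highest average degree."""
    by_class = {}
    for name, label in rules.items():
        by_class.setdefault(label, []).append(SETS[name](x))
    return max(by_class, key=lambda c: sum(by_class[c]) / len(by_class[c]))

train = [(0.8, "positive"), (-0.7, "negative")]
rules = wang_mendel(train)       # high -> positive, low -> negative
print(cfrm(rules, 0.4), gfrm(rules, -0.4))
```

With one rule per class, as here, both methods coincide; they differ when several rules share an output class, since GFRM then averages the degrees of that class's rules instead of trusting the single strongest rule.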
4 Results

In this section we describe and discuss our experiments and their results. We aim not only to compare the best classification accuracies but also to discuss the contexts in which the classifiers produce better or worse results.

4.1 Datasets

We performed our experiments on the two datasets described before. Each dataset consists of 2000 reviews previously classified, in terms of overall orientation, as either positive or negative (1000 positive and 1000 negative reviews). For the Amazon dataset, the ground truth was obtained from the customers' 5-star ratings: reviews with more than 3 stars were defined as positive and reviews with less than 3 stars as negative, while reviews with exactly 3 stars were not included in our analysis. In the movie reviews dataset, all documents were already tagged as positive or negative.

4.2 Design of experiments

We focus on comparing GFRM and CFRM, varying the configuration settings and comparing classification accuracy. We also evaluate the influence of the feature selection algorithms, of the inference systems themselves and of the number of fuzzy sets in the input.

For each dataset, we performed the preprocessing, transformation and feature creation stages as described. Starting at feature selection, however, each stage was performed only on the training folds. For example, fold 1 is used as the test fold in the classification and evaluation stages, while the remaining folds are used for feature selection and for building the combined fuzzy rule base of that fold. The same process is repeated for the other folds, and our results are reported as the average over the test folds. Consequently, all kinds of n-grams, combined with all the transformation techniques described in this work, pass through the feature selection stage, which finds out which features are best fitted to represent documents.

Feature selection algorithms evaluation

To evaluate the feature selection algorithms, we start with the following settings: 3 fuzzy sets in the input and CFRM, for both datasets. With the two other parameters unchanged, we can assess the performance of the feature selection algorithms. Besides recall, precision, accuracy and F1, we also recorded the average number of features selected by each algorithm. Table 1 shows the results for the movie reviews dataset and Table 2 for the Amazon dataset.

                    CFS                C4.5
Precision           55.69% ± 8.52%     82.85% ± 20.00%
Recall              79.40% ± 31.15%    37.70% ± 39.16%
Accuracy            53.50% ± 2.16%     55.70% ± 2.46%
F1                  59.08% ± 15.84%    35.40% ± 23.04%
Features selected   3.5 ± 0.5          1

Table 1: Results from the movie reviews dataset

                    CFS                C4.5
Precision           70.52% ± 8.91%     63.20% ± 18.45%
Recall              69.80% ± 12.86%    73.80% ± 39.17%
Accuracy            68.75% ± 5.91%     53.50% ± 2.09%
F1                  68.79% ± 5.51%     54.56% ± 19.95%
Features selected   6.2 ± 1.66         1

Table 2: Results from the Amazon reviews dataset

As we can see, feature selection with C4.5 (using CFRM and 3 fuzzy sets in the input) obtained better overall precision and accuracy on the movie reviews dataset while using almost four times fewer features. The inverse occurs on the Amazon dataset, where CFS with CFRM performs better than C4.5. On the Amazon dataset, however, CFS uses even more features, creating rules with six antecedents on average and making the rules less human readable. So, since C4.5 needed just one feature, generating more readable rules, and considering accuracy as the main reference of performance, despite the less balanced behavior indicated by the lower F1 measure, we decided to use C4.5 for both datasets.

Inference system evaluation

In this subsection we evaluate the performance of the chosen inference systems, CFRM and GFRM. As in the last subsection, we fixed the remaining parameters, maintaining the C4.5 algorithm and 3 fuzzy sets in the input, to better evaluate the performance of the inference systems. Table 3 shows the results for the movie reviews dataset and Table 4 for the Amazon dataset.

                    CFRM               GFRM
Precision           82.85% ± 20.00%    79.32% ± 15.54%
Recall              37.70% ± 39.16%    45.70% ± 31.71%
Accuracy            55.70% ± 2.46%     60.90% ± 2.55%
F1                  35.40% ± 23.04%    48.27% ± 16.01%

Table 3: Inference system results from the movie reviews dataset

                    CFRM               GFRM
Precision           63.22% ± 18.45%    65.14% ± 15.51%
Recall              73.80% ± 39.17%    75.70% ± 33.27%
Accuracy            53.50% ± 2.09%     59.65% ± 1.98%
F1                  54.56% ± 19.95%    60.97% ± 14.74%

Table 4: Inference system results from the Amazon reviews dataset

The results show that the General Fuzzy Reasoning Method improves accuracy over the Classic Fuzzy Reasoning Method when feature selection and fuzzy sets are kept unchanged. The F1 score also shows a better balance between precision and recall with GFRM. In this classification task with only two classes, considering the entire set of rules of a class is a better approach than using the single rule with the highest degree. Hence, GFRM is our choice for achieving better results in this work.

Evaluation of fuzzy sets quantity

In the previous subsections we have seen the results using 3 fuzzy sets to model our linguistic variables. Following the decision to pick C4.5 to reduce the complexity of the rules and make them more human readable, we also tried to reduce the number of fuzzy sets, using only the "low" and "high" sets. Table 5 shows the results obtained for the movie reviews dataset and Table 6 for the Amazon dataset.

                    3 fuzzy sets       2 fuzzy sets
Precision           79.32% ± 15.54%    72.09% ± 4.28%
Recall              45.70% ± 31.71%    69.50% ± 8.46%
Accuracy            60.90% ± 2.55%     71.25% ± 4.43%
F1                  48.27% ± 16.01%    70.53% ± 5.55%

Table 5: Fuzzy set quantity results from the movie reviews dataset

                    3 fuzzy sets       2 fuzzy sets
Precision           65.14% ± 15.51%    73.32% ± 3.08%
Recall              75.70% ± 33.27%    62.50% ± 4.58%
Accuracy            59.65% ± 1.98%     69.90% ± 3.02%
F1                  60.97% ± 14.74%    67.43% ± 3.68%

Table 6: Fuzzy set quantity results from the Amazon reviews dataset

Table 5 shows that accuracy and F1 were significantly improved by removing a fuzzy set, specifically the "medium" one, keeping the "low" and "high" fuzzy sets. Also, between the movies and Amazon datasets, even if only slightly, the best overall results are obtained on movies. This is especially interesting because movie reviews are often reported as the most difficult type of review to classify [Turney, 2002; Pang and Lee, 2004; Chaovalit and Zhou, 2005; Ohana and Tierney, 2009].

In both datasets, the single feature selected by C4.5 was the difference between the sums of positive and negative unigrams and bigrams composed of adjectives and adverbs. With this single feature, we could classify close to 70% of the movie reviews and of the Amazon reviews with two simple, human-readable rules generated by the Wang-Mendel method:

• IF the difference between the sums of positive and negative unigrams and bigrams composed of adjectives and adverbs is HIGH, THEN POLARITY is POSITIVE

• IF the difference between the sums of positive and negative unigrams and bigrams composed of adjectives and adverbs is LOW, THEN POLARITY is NEGATIVE

More results

Although we used the Amazon dataset presented in [Wang et al., 2011] to test and evaluate our work, the evaluation in that paper concerned rating prediction rather than classification, which makes any comparison improper. The same cannot be said of the Cornell Movie Reviews dataset, which was pre-processed by its authors and has been used in that form by many other papers. Hence, we compare our results with papers that have used the Cornell Movie Reviews dataset.

Our work is comparable to [Ohana and Tierney, 2009] and [Taboada et al., 2008], which used strictly the same dataset and are likewise not domain dependent. They reported 69.35% and 76% accuracy, respectively. It is important to note that these works do not apply a fuzzy approach, and that [Taboada et al., 2008] uses many steps different from ours, such as a manually created opinion lexicon and an entirely different set of intensifiers, among others. [Ohana and Tierney, 2009], on the other hand, uses many elements related to this work, such as SentiWordNet and many similar or identical document features. We can also cite papers that used a previous version of this movie dataset (differing in size), such as [Ohana et al., 2011], which reported 69.9% accuracy. Concerning papers that present a fuzzy approach, we could not find any that reported results closely related to this work.
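The two Wang-Mendel rules reported above can be written directly as a tiny classifier. The symmetric triangular "low"/"high" memberships over a standardized range [-1, 1] are an assumption here; with two such symmetric sets the decision effectively reduces to the sign of the feature, which is what makes the rule pair so readable:

```python
# The two Wang-Mendel rules above as a direct classifier. 'diff' is the
# single selected feature (positive minus negative polarity sums of
# adjective/adverb unigrams and bigrams), assumed clipped to [-1, 1].
def low(x):
    return max(0.0, min(1.0, (1.0 - x) / 2.0))

def high(x):
    return max(0.0, min(1.0, (x + 1.0) / 2.0))

def classify(diff):
    # IF diff is HIGH THEN positive; IF diff is LOW THEN negative
    return "positive" if high(diff) > low(diff) else "negative"

print(classify(0.35), classify(-0.1))  # positive negative
```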
5 Conclusion and further works

This work proposed and evaluated an automated fuzzy opinion mining system to classify the overall sentiment orientation of a document's text. Our proposal uses the Wang-Mendel method [Wang and Mendel, 1992] to generate fuzzy rules based on the best-fitted features among almost 70 features extracted and selected from documents. We achieved promising results, reaching 71.25% accuracy in a 10-fold cross-validation.

Our work is probably the first to apply Fuzzy Logic and the Wang-Mendel method to opinion mining while evidencing results on datasets from previous works. Besides, our results are comparable to those of previous works that apply non-fuzzy techniques. Also, we classified documents with human-readable rules using simple fuzzy sets, such as low, high, positive and negative. We contribute as well to the investigation of features that can be relevant to describe and discriminate documents.

We have reported initial results of an ongoing research effort. As future work, we see many points for improvement, such as:

• Build a better set of intensifiers and evaluate their influence on the final results;

• Improve negation detection and how to better apply it;

• Improve how the fuzzy sets are modeled for the inputs obtained from document features;

• Investigate more features that could better represent and distinguish documents;

• Experiment with other feature selection techniques, to investigate the influence of the selected features on fuzzy rule generation.

References

[Alistair and Diana, 2005] Kennedy Alistair and Inkpen Diana. Sentiment classification of movie and product reviews using contextual valence shifters. In Proceedings of FINEXIN, 2005.

[Baccianella et al., 2010] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204, 2010.

[Ballhysa and Asilkan, 2012] Elton Ballhysa and Ozcan Asilkan. A fuzzy approach for blog opinion mining: an application to Albanian language. AWERProcedia Information Technology and Computer Science, 1, 2012.

[Boucher and Osgood, 1969] Jerry Boucher and Charles E. Osgood. The Pollyanna hypothesis. Journal of Verbal Learning and Verbal Behavior, 8(1):1–8, 1969.

[Brill, 1995] Eric Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543–565, 1995.

[Chaovalit and Zhou, 2005] Pimwadee Chaovalit and Lina Zhou. Movie review mining: A comparison between supervised and unsupervised classification approaches. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05), pages 112c–112c. IEEE, 2005.

[Cintra et al., 2008] Marcos Evandro Cintra, C. H. de Arruda, and Maria Carolina Monard. Fuzzy feature subset selection using the Wang & Mendel method. In Eighth International Conference on Hybrid Intelligent Systems (HIS'08), pages 590–595. IEEE, 2008.

[Cordon et al., 1999] Oscar Cordon, Maria Jose del Jesus, and Francisco Herrera. A proposal on reasoning methods in fuzzy rule-based classification systems. International Journal of Approximate Reasoning, 20(1):21–45, 1999.

[Esuli and Sebastiani, 2006] Andrea Esuli and Fabrizio Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, pages 417–422, 2006.

[Guerini et al., 2013] Marco Guerini, Lorenzo Gatti, and Marco Turchi. Sentiment analysis: How to derive prior polarities from SentiWordNet. arXiv preprint arXiv:1309.5843, 2013.

[Hall, 1999] Mark A. Hall. Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato, 1999.

[Karamibekr and Ghorbani, 2012] Mostafa Karamibekr and Ali A. Ghorbani. Verb oriented sentiment classification. In 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), volume 1, pages 327–331. IEEE, 2012.

[Kazmier, 2004] Leonard J. Kazmier. Schaum's Outline of Business Statistics. McGraw-Hill, 2004.

[Liu, 2012] Bing Liu. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1):1–167, 2012.

[Moraes et al., 2012] Rodrigo Moraes, João Francisco Valiati, and Wilson P. Gavião Neto. Document-level sentiment classification: An empirical comparison between SVM and ANN. Expert Systems with Applications, 2012.

[Nadali et al., 2010] S. Nadali, M. A. A. Murad, and R. A. Kadir. Sentiment classification of customer reviews based on fuzzy logic. In 2010 International Symposium in Information Technology (ITSim), volume 2, pages 1037–1044. IEEE, 2010.

[Ohana and Tierney, 2009] Bruno Ohana and Brendan Tierney. Sentiment classification of reviews using SentiWordNet. In 9th IT&T Conference, page 13, 2009.

[Ohana et al., 2011] Bruno Ohana, Brendan Tierney, and S. Delany. Domain independent sentiment classification with many lexicons. In 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA), pages 632–637. IEEE, 2011.

[Pang and Lee, 2004] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 271. Association for Computational Linguistics, 2004.

[Pang et al., 2002] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, volume 10, pages 79–86. Association for Computational Linguistics, 2002.

[Quinlan, 1993] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, USA, 1993.

[Quirk et al., 1985] Randolph Quirk, David Crystal, and Pearson Education.

[Wiebe, 1990] Janyce M. Wiebe. Identifying subjective characters in narrative. In Proceedings of the 13th Conference on Computational Linguistics, volume 2, pages 401–406. Association for Computational Linguistics, 1990.

[Wiebe, 1994] Janyce M. Wiebe. Tracking point of view in narrative. Computational Linguistics, 20(2):233–287, 1994.

[Wilson et al., 2004] Theresa Wilson, Janyce Wiebe, and Rebecca Hwa. Just how mad are you? Finding strong and weak opinion clauses. In AAAI, volume 4, pages 761–769, 2004.

[Wilson et al., 2005] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 347–354. Association for Computational Linguistics, 2005.

[Zadeh, 1965] Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.

[Zadeh, 1996] Lotfi A. Zadeh. Fuzzy logic = computing with words. IEEE Transactions on Fuzzy Systems, 4(2):103–111, 1996.
A comprehensive grammar of the En- glish language, volume 397. Cambridge Univ Press, 1985. [Taboada et al., 2008] Maite Taboada, Kimberly Voll, and Julian Brooke. Extracting sentiment as a function of dis- course structure and topicality. Simon Fraser Univeristy School of Computing Science Technical Report, 2008. [Taboada et al., 2011] Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon- based methods for sentiment analysis. Computational lin- guistics, 37(2):267–307, 2011. [Turney, 2002] Peter D Turney. Thumbs up or thumbs down?: semantic orientation applied to unsupervised clas- sification of reviews. In Proceedings of the 40th an- nual meeting on association for computational linguistics, pages 417–424. Association for Computational Linguis- tics, 2002. [Wang and Mendel, 1992] L-X Wang and Jerry M Mendel. Generating fuzzy rules by learning from examples. Sys- tems, Man and Cybernetics, IEEE Transactions on, 22(6):1414–1427, 1992. [Wang et al., 2011] Hongning Wang, Yue Lu, and ChengX- iang Zhai. Latent aspect rating analysis without aspect keyword supervision. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discov- ery and data mining, pages 618–626. ACM, 2011. [Wang, 2003] L-X Wang. The wm method completed: a flexible fuzzy system approach to data mining. Fuzzy Sys- tems, IEEE Transactions on, 11(6):768–782, 2003.
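To make the Wang-Mendel rule-generation step discussed above concrete, the sketch below illustrates the core of the method for classification: each training example is mapped, feature by feature, to the fuzzy region with the highest membership, yielding one candidate rule whose degree is the product of memberships, and conflicting rules (same antecedent, different class) are resolved by keeping the one with the highest degree. The fuzzy sets, feature values and labels here are toy assumptions for illustration only, not the paper's actual features or dataset.

```python
# Illustrative sketch of Wang-Mendel rule generation for classification.
# All fuzzy partitions, data and labels are hypothetical toy examples.

def tri(x, a, b, c):
    """Triangular membership function with peak at b and support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Three fuzzy regions per feature, on a [0, 1] normalized domain.
FUZZY_SETS = {
    "low":    lambda x: tri(x, -0.5, 0.0, 0.5),
    "medium": lambda x: tri(x, 0.0, 0.5, 1.0),
    "high":   lambda x: tri(x, 0.5, 1.0, 1.5),
}

def best_set(x):
    """Return the fuzzy set name with the highest membership for value x."""
    return max(((n, f(x)) for n, f in FUZZY_SETS.items()),
               key=lambda p: p[1])

def wang_mendel(samples):
    """One candidate rule per sample; conflicts kept by highest degree."""
    rules = {}  # antecedent tuple -> (class label, rule degree)
    for features, label in samples:
        antecedent, degree = [], 1.0
        for x in features:
            name, mu = best_set(x)
            antecedent.append(name)
            degree *= mu  # product of memberships as the rule's degree
        key = tuple(antecedent)
        if key not in rules or degree > rules[key][1]:
            rules[key] = (label, degree)
    return rules

def classify(rules, features):
    """Fire every rule with the product t-norm; return the winning class."""
    best_label, best_strength = None, -1.0
    for antecedent, (label, _) in rules.items():
        strength = 1.0
        for name, x in zip(antecedent, features):
            strength *= FUZZY_SETS[name](x)
        if strength > best_strength:
            best_label, best_strength = label, strength
    return best_label

# Toy training data: (normalized feature vector, sentiment class).
train = [((0.9, 0.8), "positive"), ((0.1, 0.2), "negative"),
         ((0.8, 0.9), "positive"), ((0.2, 0.1), "negative")]
rules = wang_mendel(train)
print(classify(rules, (0.85, 0.75)))  # -> positive
```

The learned rules stay human readable, e.g. "IF feature1 is high AND feature2 is high THEN class is positive", which mirrors the interpretability the paper claims for its fuzzy sets.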