<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Building a fuzzy system for opinion classification across different domains</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Matheus</forename><surname>Cardoso</surname></persName>
							<email>matheus.mcas@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">de Santana (UEFS)</orgName>
								<orgName type="institution" key="instit1">State University of Feira</orgName>
								<orgName type="institution" key="instit2">Federal University of Bahia (UFBA)</orgName>
								<address>
									<settlement>Salvador</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Angelo</forename><forename type="middle">Loula</forename><surname>Matheus</surname></persName>
							<email>angelocl@ecomp.uefs.br</email>
							<affiliation key="aff1">
								<orgName type="institution">State University of Feira de Santana (UEFS) Feira de Santana</orgName>
								<address>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giovanni</forename><surname>Pires</surname></persName>
							<email>mgpires@ecomp.uefs.br</email>
							<affiliation key="aff1">
								<orgName type="institution">State University of Feira de Santana (UEFS) Feira de Santana</orgName>
								<address>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Building a fuzzy system for opinion classification across different domains</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">34A5149589EE48C32B89B02CAA23F36D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T22:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Opinions are central in almost all human activities, because they are a relevant influence on peoples behavior. The internet and the web have created mechanisms that made possible for people to share their opinions and for other people and organizations to find out more about opinions and experiences from individuals and help in decision making. Still, opinions involve sentiments that are vague and inaccurate textual descriptions. Hence, due to data's nature, Fuzzy Logic can be a promising approach. This paper proposes a fuzzy system to perform opinion classification across different domains. Almost 70 features were extracted from documents and multiple feature selection algorithms were applied to select the most fitted features to classify documents. Over the selected features, the Wang-Mendel (WM) method was used to generate fuzzy rules and classify documents. The WM fuzzy system based achieved 71,25% of accuracy in a 10-fold cross-validation.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Opinions are central in human lives. In almost all people's daily tasks, they ask or seek other people's opinions to help them make decisions, such as what movie to watch, what car, book or notebook, for instance, to buy or to know what are the political standpoint of their neighborhood about a certain issue. The internet and the web have created mechanisms that made possible for people to share their opinions and also organizations to find out more about opinions and experiences from other individuals, most of them unknown persons. This mechanisms, over the time, have created a huge amount of opinative sources, hard for a person to process by itself. Hence, an automated opinion mining system is required, one that could identify and extract opinions from text would be an enormous help to someone sifting through the vast amounts of news and web data <ref type="bibr">[Wilson et al., 2004]</ref>.</p><p>Opinion mining is the process that seeks to predict the overall sentiment orientation conveyed in a piece of text such as a user review of a movie or product, blog post or editorial <ref type="bibr" target="#b5">[Ohana et al., 2011]</ref>. Attached to opinions, there are sentiments. Sentiments are intrinsically subjective and to identify them in phrases and documents we have to deal with vague and imprecise terms, such as "good", "very nice", "bad", among others. Due to the nature of this data, Fuzzy Logic <ref type="bibr" target="#b9">[Zadeh, 1965]</ref> can be a promising approach to deal with this.</p><p>Given the importance of opinions in human lives, the commercial and political relevance, the huge amount of generated data that has to be automatically handled, besides the vague and imprecise nature of the data, this paper aims to propose and evaluate an automated fuzzy opinion mining system to classify the overall sentiment orientation of a document's text. 
Our proposal differs from others because it generates fuzzy rules based on most fitted features among almost 70 features that were extracted from documents, introducing the use of the Wang-Mendel method <ref type="bibr" target="#b7">[Wang and Mendel, 1992]</ref>. We apply those rules to perform opinion classification across different domains.</p><p>The next section writes about related works, describing previous works on opinion mining and applications of fuzzy logic. The following section outlines the opinion mining process, specifying all stages involved in opinion mining work flow. Results from our approach are shown and discussed next. The last section concludes this paper pointing out our contributions and some future improvements to this research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related works</head><p>The research in opinion mining began with subjectivity detection, dating back to the late 1990s, with <ref type="bibr" target="#b8">[Wiebe, 1990;</ref><ref type="bibr" target="#b9">1994]</ref>. This task involves separating non-opinionated, neutral and objective sentences from subjective sentences carrying heavy sentiments. Following the years, starting at 2000s the overall research focus has shifted to divide the language units into three categories: negative, positive and neutral. From there many works on this task, also known as sentiment analysis or sentiment classification, among other naming, has arrived.</p><p>One of the first research studies on unsupervised opinion mining was <ref type="bibr" target="#b7">[Turney, 2002]</ref>. Similar to the task of classifying documents as positive or negative, <ref type="bibr" target="#b7">[Turney, 2002]</ref> proposed to classify reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. He got a average of 74% accuracy across domains.</p><p>On the other hand, <ref type="bibr" target="#b5">[Pang et al., 2002]</ref> was one of the first to propose using classic machine learning techniques in opinion mining. Comparing the performance between Naive Bayes, Maximum Entropy and Support Vector Machine (SVM), this work showed that such techniques produces high accuracy levels, achieving 82.9% of accuracy using only isolated words (called unigrams) with SVM. It showed as well that supervised techniques shows better results than unsupervised approaches. 
However, they are domain dependent, producing even poorest results in other kind of data, demanding another training round of the classifier, increasing cost and time to classify documents.</p><p>Related to our work, there are the works from <ref type="bibr" target="#b9">[Wilson et al., 2005]</ref>, <ref type="bibr" target="#b6">[Taboada et al., 2008]</ref> and <ref type="bibr" target="#b5">[Ohana and Tierney, 2009]</ref> that use a wide range of document's features. These features range from the count of adjectives, adverbs in phrase or whole document, tuples of words (called bigrams if two words; trigrams if three words), such as adverbs and adjectives, to the sum of polarities and many others features. <ref type="bibr" target="#b6">[Taboada et al., 2008]</ref> and <ref type="bibr" target="#b5">[Ohana and Tierney, 2009</ref>] use a semantic lexicon, Sentiwordnet <ref type="bibr">[Esuli and Sebastiani, 2006]</ref>, to assign numeric values to word's semantic orientation. In classifying documents as negative or positive, the results obtained were 65,7% of accuracy in <ref type="bibr" target="#b9">[Wilson et al., 2005]</ref>, 80.6% in <ref type="bibr" target="#b6">[Taboada et al., 2008], and</ref><ref type="bibr">69.35% in [Ohana and</ref><ref type="bibr" target="#b5">Tierney, 2009]</ref>.</p><p>Although it has been shown that Fuzzy Logic is suitable to handle imprecise and vague data <ref type="bibr" target="#b10">[Zadeh, 1996;</ref><ref type="bibr" target="#b8">Wang, 2003]</ref>, we found only a few works applying fuzzy concepts to opinion mining, such as fuzzy sets or fuzzy inference systems. One of the few papers found was <ref type="bibr" target="#b5">[Nadali et al., 2010]</ref>. It proposes a fuzzy logic model to perform semantic classifications of customers review into five classes: very weak, weak, moderate, very strong and strong. 
Also introduces a methodology that implies use of a fuzzy inference system, fuzzy sets that models the five classes and manually created IF-THEN rules. However, the paper did not describe results or further discussion.</p><p>Another paper was [Ballhysa and Asilkan, 2012] that proposes a fuzzy approach for discovering the underlying opinion in entries in blogs, determining the overall polarity. The authors presented fuzzy concepts such as fuzzy sets and fuzzy sets operations. They proposed a set of fuzzy measures (from counting manually chosen keywords) and a single fuzzy aggregation of these measures, but a fuzzy inference system is not used. However, the proposed measures seem to actually correspond crisp value, so there is no actual application of fuzzy logic. Moreover, there is only a superficial description of results, obtained on their own dataset with no comparison with other works.</p><p>This paper differs from previous work on applying fuzzy systems for opinion mining. We model fuzzy variables and build a fuzzy inference system based on document features. We run our tests in datasets already used in previous works, allowing direct comparison. Besides, we propose a feature extraction and selection stage, where we extract a great number of features from documents, based on previous works and extended with our own features, and perform feature selection based on different algorithms. The next section presents the opinion mining process that we used, describing each stage and the relevant techniques used on them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The opinion mining process</head><p>Our opinion mining process is composed by five stages: domain definition, preprocessing and transformation, feature extraction and selection, classification and evaluation. In the first stage it is defined what kind of data will be handled by the system and what datasets will be used. We picked up the widely used Cornell Movie Review Data 2.0 [Pang and <ref type="bibr" target="#b5">Lee, 2004</ref>] and a mixed dataset containing Amazon products <ref type="bibr" target="#b7">[Wang et al., 2011]</ref>, such as camera, mobile phone, TV, laptop, tablet, among others to evaluate our cross domain proposal.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Preprocessing and Transformation</head><p>In the preprocessing stage, data filtering takes place and a document representation model is built. There are three basic levels of document analysis: document, sentence and entities and its aspects <ref type="bibr" target="#b5">[Liu, 2012]</ref>. The first level focus on classify opinions as positive or negative from the whole document perspective. The second one seeks to classify opinions of each sentence in a document and the last level looks to classify opinions targeted to aspects of the found entities. We chose the document level analysis <ref type="bibr" target="#b7">[Turney, 2002;</ref><ref type="bibr" target="#b5">Pang et al., 2002;</ref><ref type="bibr" target="#b5">Pang and Lee, 2004;</ref><ref type="bibr" target="#b6">Taboada et al., 2008]</ref>.</p><p>As a first step, we remove all the sentences in a document that has modal words on it, such as "would", "could", among others. Modals indicates that the words appearing in a sentence might not be reliable for the purposes of sentiment analysis <ref type="bibr" target="#b6">[Taboada et al., 2011]</ref>. Next, all words in each document are tagged with its grammatical class using a POS (Part of Speech) tagger <ref type="bibr" target="#b1">[Brill, 1995]</ref>.</p><p>The document model in our approach is the popular bag-ofwords model in which a document is represented as a vector, whose entries correspond to individual terms of a vocabulary <ref type="bibr" target="#b5">[Moraes et al., 2012]</ref>. These terms are called generically as n-grams. They can be unigrams (only one word), bigrams (two words) and trigrams (three words). For each document, one n-gram vector leaves to the next step of the process.</p><p>We defined 7 types of n-grams: adjectives, adverbs, verbs as unigrams ; adverbs with adjectives (e.g. very good) , adverbs with verbs (e.g. 
truly recommend), adverbs with adverbs as bigrams and one type of trigram, the combination of two adverbs with one adjective (e.g. not very nice) <ref type="bibr" target="#b5">[Pang et al., 2002;</ref><ref type="bibr" target="#b7">Turney, 2002;</ref><ref type="bibr" target="#b6">Taboada et al., 2008;</ref><ref type="bibr" target="#b3">Karamibekr and Ghorbani, 2012]</ref>.</p><p>We also look for special types of bigrams and trigrams: the negated n-grams (e.g. not bad, nothing special). This technique is called negation detection and by itself it is a entire line of research, going beyond this work scope, but we use a simple version from <ref type="bibr" target="#b6">[Taboada et al., 2011]</ref>.</p><p>At this point stage, each document was transformed into a n-gram bag-of-words vector. Each n-gram is now associated with a numeric value, an opinion polarity degree, using an opinion lexicon. Opinion lexicons are resources that associate words with sentiment orientation <ref type="bibr" target="#b5">[Ohana and Tierney, 2009]</ref>. Hence, we decided to use a automatically built opinion lexicon, the Sentiwornet <ref type="bibr" target="#b0">[Baccianella et al., 2010]</ref>.</p><p>SentiWordNet (SWN) is a lexical resource explicitly devised for supporting sentiment classification. SWN provides positive, negative and objective scores (ranging from 0 to 1) for each sense of English words. Since words can have multiple senses, we apply the approach proposed by <ref type="bibr">[Guerini et al., 2013]</ref>, called prior priorities, to derive positive and negative polarity for words.</p><p>To determine polarity degrees for bigrams and trigrams, we consider adverbs as modifiers, subdivided into amplifiers (e.g. very) and downtoners (e.g. slightly) to increase or decrease adjective (unigram) values, respectively <ref type="bibr" target="#b6">[Quirk et al., 1985]</ref>. 
Downtoners and amplifiers have sub-levels, each of them has a modifier value associated, such as -0.5 to "lowest" downtoners and 0.25 to "high" amplifiers, among others sub-levels. The final score s for a bigram is defined by</p><formula xml:id="formula_0">s(bigram) = s(unigram) + s(unigram) • s(modif ier) and score s for trigram by s(trigram) = s(bigram) + s(bigram) • s(modif ier).</formula><p>The special case among bigrams and trigrams are the negated ones. For these, instead of use modifiers, we apply a similar approach made by <ref type="bibr" target="#b6">[Taboada et al., 2011]</ref>, shifting the n-gram polarity to the opposite sign by a fixed amount (0.5, empirically defined). <ref type="bibr" target="#b6">[Taboada et al., 2011]</ref> has shown as well that shift polarity is better than just invert the n-gram polarity sign.</p><p>Other technique was the attenuation by n-gram frequency, in which a term polarity is decreased by the number of times that it appears in the document. The nth appearance of a word in text will have the new score s' defined by s (word) = s(word)/n. The repetition of an adjective, for instance, suggests that the writer lacks additional substantive commentary, and is simply using a generic positive word <ref type="bibr" target="#b6">[Taboada et al., 2011]</ref>. Also we have used a bias compensation to negative term polarities. Lexicon-based sentiment classifiers generally show a positive bias <ref type="bibr" target="#b0">[Alistair and Diana, 2005]</ref>, likely the result of a universal human tendency to favor positive language <ref type="bibr" target="#b0">[Boucher and Osgood, 1969]</ref>. So, we increased the final ngram degree of any negative expression (after other modifiers have applied) by a fixed amount (currently 50%). In the end of this stage, we have a vector of n-grams associated with polarity degrees for each document of the dataset.</p></div>
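<div xmlns="http://www.tei-c.org/ns/1.0"><p>The polarity arithmetic above can be sketched as follows. The lexicon scores and modifier values in the example (0.6 for "good", 0.25 for the amplifier "very") are illustrative, not the authors' exact tables:</p>

```python
NEGATION_SHIFT = 0.5   # fixed shift toward the opposite sign for negated n-grams
NEGATIVE_BIAS = 1.5    # 50% increase applied to negative polarities

def score_bigram(unigram_score, modifier_value):
    # s(bigram) = s(unigram) + s(unigram) * s(modifier)
    return unigram_score + unigram_score * modifier_value

def score_trigram(bigram_score, modifier_value):
    # s(trigram) = s(bigram) + s(bigram) * s(modifier)
    return bigram_score + bigram_score * modifier_value

def shift_negated(score):
    # negated n-grams: shift the polarity toward the opposite sign
    return score - NEGATION_SHIFT if score > 0 else score + NEGATION_SHIFT

def attenuate(score, nth_occurrence):
    # nth appearance of a term: s'(word) = s(word) / n
    return score / nth_occurrence

def compensate_bias(score):
    # boost negative polarities by 50%, after other modifiers
    return score * NEGATIVE_BIAS if 0 > score else score

# "very good": amplifier "very" (0.25) applied to "good" (0.6)
s_vg = score_bigram(0.6, 0.25)   # 0.6 + 0.6 * 0.25 = 0.75
# "not very good": the negated n-gram is shifted toward the negative side
s_nvg = shift_negated(s_vg)
```

<p>Note that shifting keeps a strongly positive expression mildly positive after negation, whereas inverting the sign would make it strongly negative.</p></div>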
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Feature extraction</head><p>In this step, we extract document features from the previous numerical n-grams vectors to intent to be domain independent. We decided this approach, because is effective as it takes reviews, checks documents features and decides its semantic orientation considering only its characteristics, instead of its specific contents. The features we use are not specific to a domain, and should be easily applicable to other domains <ref type="bibr" target="#b5">[Pang et al., 2002]</ref>. Also, this reduces features dimensionality, since the resultant features vector is significantly smaller than a regular bag-of-words vector.</p><p>On the other hand, corpus-based machine learning methods applied to opining mining are able to obtain high accuracy rates, up to 95%, feeding word vectors directly to classifiers, which will learn from the given document corpus which words are related to positive and negative contexts. However, in order to reach their full potential, most of these approaches need immense annotated training datasets, huge amount of time for training and still produces poorest results across domain without full retraining.</p><p>Different studies proposed many various features to describe or discriminate documents among themselves to identify their polarities <ref type="bibr" target="#b9">[Wilson et al., 2005;</ref><ref type="bibr" target="#b5">Ohana and Tierney, 2009;</ref><ref type="bibr" target="#b6">Taboada et al., 2011]</ref>. In order to capture diverse aspects from documents, we decided to extract a great number of features, so we used features presented in these works and derived many others, obtaining a total of 67 features.</p><p>Three kinds of features were defined: sum, count and maximum values. 
Sum features involves the numerical sum of polarity degrees for different types of n-grams, such as sum of adjectives of a document, sum of adverbs, verbs, bigrams composed by adverb and adjective, sum of trigrams, among others. The count features proceeds in a similar way for different types of n-grams, counting the number of positive or negative polarity values.</p><p>The maximum values features refer to the maximum value of a given type of n-gram in a document. For instance, if the maximum absolute value among the unigrams is positive, this feature has the value 1. On the other side, if maximum value is negative, this feature has the value -1. This feature was obtained for unigrams, bigrams and trigrams.</p><p>More features were derived from the three kinds described above by applying normalization or subtraction of features. For instance, the difference between positives and negatives bigrams of a document and the normalized sum of positive adjectives are one of these derived features.</p><p>After the feature extraction step, vectors of n-grams and polarity values are replaced by feature vectors. Each document in the dataset is now represented by a 67 size feature vector.</p></div>
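<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough sketch of the three feature kinds, assuming a document has already been reduced to a list of scored unigrams (the handful of features and their names below are illustrative; the full system computes 67):</p>

```python
def extract_features(unigram_scores):
    """Build sum, count and maximum-value features from scored unigrams."""
    feats = {}
    # sum feature: total polarity of this n-gram type
    feats["sum_unigrams"] = sum(unigram_scores)
    # count features: number of positive and of negative polarity values
    nonzero = [s for s in unigram_scores if s != 0]
    feats["count_positive"] = sum(1 for s in nonzero if s > 0)
    feats["count_negative"] = len(nonzero) - feats["count_positive"]
    # maximum-value feature: +1 if the largest absolute score is positive, else -1
    peak = max(unigram_scores, key=abs)
    feats["max_unigram_sign"] = 1 if peak > 0 else -1
    # derived feature: difference between positive and negative counts
    feats["count_diff"] = feats["count_positive"] - feats["count_negative"]
    return feats

f = extract_features([0.6, -0.2, 0.3, -0.9])
```

<p>A document is then carried forward as this fixed-size dictionary (vector) rather than its raw bag-of-words.</p></div>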
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Feature selection</head><p>This stage is commonly found in opinion mining approaches. It can make classifiers more efficient/effective by reducing feature vector dimensionality, the amount of data to be analyzed as well as identifying relevant features to be considered <ref type="bibr" target="#b5">[Moraes et al., 2012]</ref>. To choose the features among the ones extracted and reduce the amount of features to be analyzed by the classifier, we used two algorithms for feature selection, the Correlation−based Feature Selection (CFS) and feature selection from C4.5 decision tree <ref type="bibr" target="#b2">[Cintra et al., 2008]</ref>.</p><p>CFS evaluates subsets of features on the basis that a suitable feature subsets contain features highly correlated with the classification, yet uncorrelated to each other <ref type="bibr" target="#b3">[Hall, 1999]</ref>. C4.5, in other hand, is an algorithm that generates a decision tree that can be used to a classification task <ref type="bibr" target="#b6">[Quinlan, 1993]</ref>. But, to build that tree, c4.5 needs to select the best features among the provided. Hence, we also use c4.5 as our feature selection algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Classification</head><p>In the classification stage, we build a rule-based fuzzy system classifier to predict the overall sentiment orientation, as positive or negative, of each document in the dataset. Building such classifier involves creating a set of rules based on the extracted features, modeling these features as linguistic variables with fuzzy sets, lastly, defining an inference system.</p><p>In order for fuzzy sets to appropriately model data, we first identify outliers values in features and limit the range of feature values. To do this, we used the three-sigma rule <ref type="bibr" target="#b4">[Kazmier, 2004]</ref> to select outliers values that lie after three standard deviations from the mean of a feature, an interval where 99.73% of the values in a normal distribution stand in. Outliers values left out this range were modified to the extreme value of the accepted range. Now, with the input range standardized for every feature, we can define the fuzzy sets <ref type="bibr" target="#b9">[Zadeh, 1965]</ref> to model our input and output variables. We decided to use triangular fuzzy sets. The first approach was use three fuzzy sets in the input (low, medium and high) and two sets for the output (negative and positive), uniformly distributed along the feature value range. Another approach was to use only two sets in the input, removing the medium fuzzy set.</p><p>Once fuzzy variables were modeled from fuzzy sets, the next step was to build our fuzzy rule base using the Wang-Mendel Fuzzy Rule generation <ref type="bibr" target="#b7">[Wang and Mendel, 1992]</ref>. 
With previously specified fuzzy sets, this fuzzy rule generation method takes each data instance in the dataset, determines pertinence degrees in all fuzzy sets and builds rules using the fuzzy sets with highest pertinence degrees, for each input-output pair.</p><p>The generated fuzzy rule base along with the specified fuzzy sets are then used by a fuzzy inference mechanism to determine document polarity class. The mechanism used were the General Fuzzy Reasoning Method (GFRM) and the Classic Fuzzy Reasoning Method (CFRM) <ref type="bibr" target="#b2">[Cordon et al., 1999]</ref>.</p><p>In this classification process, each document feature vector is evaluated by all fuzzy rules and a compatibility degree is produced for each rule. The CFRM picks up the rule with the maximum compatibility degree and assigns the rule output class to document. In the other side, GFRM takes the maximum average compatibility degree between the two possible classes, positive and negative. In other words, GFRM calculates the average degree among all rules with negative and positive output and assigns to document the class from the maximum average compatibility degree.</p></div>
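<div xmlns="http://www.tei-c.org/ns/1.0"><p>A compact sketch of this pipeline with a single input feature: two illustrative triangular sets over an assumed normalized range, Wang-Mendel rule generation, and both inference methods. The set layout and conflict handling are simplified relative to the actual system:</p>

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular set rising over [a, b], falling over [b, c]."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

# Two input sets ("low", "high") over an assumed feature range of [-1, 1]
SETS = {"low": (-3.0, -1.0, 1.0), "high": (-1.0, 1.0, 3.0)}

def memberships(x):
    return {name: triangular(x, *abc) for name, abc in SETS.items()}

def wang_mendel(training_pairs):
    """Each (feature value, class) pair yields one candidate rule: the input
    set with the highest membership -> the pair's class. On conflicting
    antecedents, the rule generated with the higher degree wins."""
    best = {}
    for x, label in training_pairs:
        mu = memberships(x)
        s = max(mu, key=mu.get)
        if s not in best or mu[s] > best[s][1]:
            best[s] = (label, mu[s])
    return {s: label for s, (label, _) in best.items()}

def classify_cfrm(rules, x):
    """Classic method: fire only the single most compatible rule."""
    mu = memberships(x)
    return rules[max(rules, key=lambda s: mu[s])]

def classify_gfrm(rules, x):
    """General method: average compatibility per class, pick the maximum."""
    mu = memberships(x)
    by_class = {}
    for s, label in rules.items():
        by_class.setdefault(label, []).append(mu[s])
    return max(by_class, key=lambda c: sum(by_class[c]) / len(by_class[c]))

rules = wang_mendel([(0.8, "positive"), (-0.7, "negative")])
```

<p>With more features, an antecedent becomes a tuple of one set per feature, but the generation and inference steps are otherwise the same.</p></div>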
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Evaluation</head><p>In order to evaluate our opinion classification approach, we apply a 10-fold cross-validation. As measures of classification performance, accuracy, recall, precision and F1 score were chosen. Accuracy is a measure of the ratio between documents that has been classified correctly to the total number of documents being classified. Recall measures the ratio of documents correctly classified into a category to the total number of documents truly belonging to that category. This measurement indicates the ability to recall items in the category. Precision measures the ratio of the number of documents correctly classified into a category to the total number of documents classified into category. And F1 score is a measure that considers both the Precision and the Recall to compute the score. F1 is often considered as a weighted average of the precision and recall. <ref type="bibr" target="#b1">[Chaovalit and Zhou, 2005]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results</head><p>In this section we describe and discuss our experiments and its results. We aim to not only compare best classification accuracy but also discuss contexts in which the classifiers produce better or worse results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Datasets</head><p>We performed our experiments on two datasets, as described before. Each dataset consists of 2000 reviews that were previously classified in terms of the overall orientation as being either positive or negative (1000 positive and 1000 negative reviews). For the Amazon dataset, the ground truth was obtained according to the customer 5-stars rating. Reviews with more than 3 stars were defined as being positive and reviews with less than 3 stars were labeled as being negative. Reviews with 3 stars were not included in our analysis. In the movie reviews dataset, all documents were already tagged as positive or negative.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Design of experiments</head><p>We focus on comparing GFRM and CFRM varying the configuration settings and comparing classification accuracy. We also evaluate the influence of the feature selection algorithms, the inference systems themselves and the quantity of fuzzy sets in the input system.</p><p>For each dataset, we performed the preprocessing, transformation and feature creation stages as we described. But, starting at feature selection we performed each stage only on training folds. For example, the fold 1 is used as test fold in classification and evaluation stages, but the remaining folds are used to feature selection and build the combined fuzzy rule base for that fold. The same process is repeated for the rest of the folds and our results are reported as the average of the test folds. Consequently, all kinds of n-grams combined with all transformation techniques described in this work pass to feature selection stage to find out which features would be more fitted to represent documents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Feature selection algorithms evaluation</head><p>To evaluate the feature selection algorithms we start with the following settings: 3 fuzzy sets in the input and CFRM for both datasets. Hence, using the two other parameters unchanged, we can evaluate the feature selection algorithms performance. Besides recall, precision, accuracy and F1 we also verified the average quantity of selected features for each algorithm. accuracy in movie reviews dataset, using almost four times less features. However, the inverse occurs in Amazon dataset, where CFS with CFRM performs better than c4.5. But, in Amazon dataset, CFS uses even more features, creating rules with six antecedents, on average, turning the rules less human readable. So, since c4.5 just needed one feature, generating more readable rules, and also considering accuracy as the main reference of performance, despite of less balanced performance showed with lower F1 measure, we decided to use c4.5 in both datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Inference system evaluation</head><p>In this subsection we evaluate the performance of the chosen inference systems, CFRM and GFRM. As we did in the last subsection, we fixed the remaining parameters, to better evaluate the inference systems performances, maintaining the c4.5 algorithm and 3 fuzzy sets in the input. The results shows that General Fuzzy Reasoning Method improves accuracy over the Classical Fuzzy Reasoning Method, maintaining feature selection and fuzzy sets unchanged.Also the F1 score shows better balance between precision and recall with GFRM. In this classification task with two classes only, to consider the entire set of rules of a class is a better approach than use only one rule with the highest degree. Hence, GFRM is our choice to achieve better results in this work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Evaluation of fuzzy sets quantity</head><p>Through the last subsections, we have seen the results using 3 fuzzy sets to model our linguistic variables. Following the decision to pick up c4.5 to reduce the complexity of the rules and make them more human readable, we tried to reduce the fuzzy sets, using only the "Low" and "High" fuzzy sets.  <ref type="bibr">(5)</ref> shows that the accuracy and F1 were significantly improved by removing a fuzzy set, more specifically the "medium", remaining the "low" and "high" fuzzy sets. Also, between movies and amazon datasets, even though very slightly, the best overall results is in movies. This is specially interesting because movie reviews are often reported as the most difficult type of reviews to be classified <ref type="bibr" target="#b7">[Turney, 2002;</ref><ref type="bibr" target="#b5">Pang and Lee, 2004;</ref><ref type="bibr" target="#b1">Chaovalit and Zhou, 2005;</ref><ref type="bibr" target="#b5">Ohana and Tierney, 2009]</ref>.</p><p>In both datasets, the single feature selected by c4.5 was the difference between the sum of positive and negative unigrams and bigrams composed by adjectives and adverbs. With this only feature, we could classify close to 70% of the movies reviews and Amazon reviews with two simple and human readable rules generated by Wang-Mendel method:</p><p>• IF the difference between the sum of positive and negative unigrams and bigrams composed by adjectives and adverbs is HIGH then POLARITY is POSITIVE • IF the difference between the sum of positive and negative unigrams and bigrams composed by adjectives and adverbs is LOW then POLARITY is NEGATIVE</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>More results</head><p>Although we have used the Amazon dataset presented in <ref type="bibr" target="#b7">[Wang et al., 2011]</ref> to test and evaluate our work, the evaluation in that paper concerned rating prediction rather than classification, making any direct comparison improper. The same cannot be said about the Cornell Movie Reviews dataset: it was pre-processed by its authors and has been used in that form in many other papers. Hence, we compare our results with papers that have used the Cornell Movie Reviews dataset.</p><p>Our work is comparable to <ref type="bibr" target="#b5">[Ohana and Tierney, 2009</ref>] and <ref type="bibr" target="#b6">[Taboada et al., 2008]</ref>, which used strictly the same dataset and are likewise not domain dependent. They reported 69.35% and 76% accuracy, respectively. It is important to note that these works do not apply a fuzzy approach, and that <ref type="bibr" target="#b6">[Taboada et al., 2008]</ref> differs from our work in many steps, such as the opinion lexicon (they manually created their own) and an entirely different intensifier set, among others. <ref type="bibr" target="#b5">[Ohana and Tierney, 2009]</ref>, on the other hand, shares many elements with this work, such as SentiWordNet and many similar or identical document features.</p><p>We can also cite papers that used a previous version of this movie dataset (which differs in size), such as <ref type="bibr" target="#b5">[Ohana et al., 2011]</ref>, which reported 69.9% accuracy. Concerning papers that present a fuzzy approach, we could not find any that reported results closely related to this work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and further works</head><p>This work proposed and evaluated an automated fuzzy opinion mining system to classify the overall sentiment orientation of a document's text. Our proposal uses the Wang-Mendel method <ref type="bibr" target="#b7">[Wang and Mendel, 1992]</ref> to generate fuzzy rules based on the best-fitted among almost 70 features extracted and selected from documents. We achieved promising results, reaching 71.25% accuracy in a 10-fold cross-validation.</p><p>Our work is probably the first to apply Fuzzy Logic and the Wang-Mendel method to opinion mining, presenting results on datasets from previous works. Moreover, our results are comparable to those of previous works that apply non-fuzzy techniques. We also classified documents with human readable rules using simple fuzzy sets, such as low, high, positive and negative. We contribute as well to the investigation of features that can be relevant to describe and discriminate documents.</p><p>We have reported initial results from an ongoing research. As future work, we see several points for improvement:</p><p>• Build a better set of intensifiers and evaluate their influence on the final results;</p><p>• Improve negation detection and how to best apply it;</p><p>• Improve how the fuzzy sets are modeled for inputs from document features;</p><p>• Investigate more features that could better represent and distinguish documents;</p><p>• Experiment with other feature selection techniques, to investigate the influence of the selected features on fuzzy rule generation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Table (1) shows the results from the movies dataset and table (2) from the Amazon dataset. 
Results from movie reviews dataset. As we can see, feature selection with c4.5 using CFRM and 3 fuzzy sets in the input obtained better overall precision and</figDesc><table><row><cell>Movies</cell><cell>CFS</cell><cell>c4.5</cell></row><row><cell>Precision</cell><cell>55.69% ± 8.52%</cell><cell>82.85% ± 20.00%</cell></row><row><cell>Recall</cell><cell>79.40% ± 31.15%</cell><cell>37.7% ± 39.16%</cell></row><row><cell>Accuracy</cell><cell>53.5% ± 2.16%</cell><cell>55.7% ± 2.46%</cell></row><row><cell>F1</cell><cell>59.08% ± 15.84%</cell><cell>35.40% ± 23.04%</cell></row><row><cell>Features selected</cell><cell>3.5 ± 0.5</cell><cell>1</cell></row></table></figure>
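The Wang-Mendel rule generation used throughout this work can be sketched for the single-feature, two-set case discussed above. This is a simplified illustration under assumed membership functions and placeholder data, not the paper's implementation: each training example is assigned to the fuzzy set where it has the highest membership, producing one candidate rule, and conflicts on the same antecedent are resolved by keeping the rule with the highest degree.

```python
def low(x):
    # Trapezoidal 'Low' set with illustrative breakpoints 0.3 / 0.7.
    return max(0.0, min(1.0, (0.7 - x) / 0.4))

def high(x):
    return 1.0 - low(x)

SETS = {"LOW": low, "HIGH": high}

def wang_mendel(samples):
    """samples: list of (feature_value, class_label) pairs.
    Returns one rule per antecedent set; on conflicting consequents,
    only the rule with the highest membership degree survives."""
    rules = {}  # antecedent set name -> (consequent label, degree)
    for x, label in samples:
        # Fuzzify: pick the set where this example has maximum membership.
        name, mu = max(((n, f(x)) for n, f in SETS.items()), key=lambda t: t[1])
        if name not in rules or mu > rules[name][1]:
            rules[name] = (label, mu)
    return {ant: cons for ant, (cons, _) in rules.items()}

# Placeholder training data: (normalized difference feature, polarity).
data = [(0.9, "POSITIVE"), (0.8, "POSITIVE"), (0.1, "NEGATIVE"), (0.2, "NEGATIVE")]
print(wang_mendel(data))  # {'HIGH': 'POSITIVE', 'LOW': 'NEGATIVE'}
```

On separable data like this toy set, the procedure recovers exactly the two human-readable rules reported in the paper: HIGH → POSITIVE, LOW → NEGATIVE.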
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Results from Amazon reviews dataset</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Inference systems results from movies reviews dataset. Table (3) shows results from movie reviews and table (4) from the Amazon dataset.</figDesc><table><row><cell></cell><cell>CFRM</cell><cell>GFRM</cell></row><row><cell>Precision</cell><cell>82.85% ± 20.0%</cell><cell>79.32% ± 15.54%</cell></row><row><cell>Recall</cell><cell>37.7% ± 39.16%</cell><cell>45.7% ± 31.71%</cell></row><row><cell>Accuracy</cell><cell>55.7% ± 2.46%</cell><cell>60.9% ± 2.55%</cell></row><row><cell>F1</cell><cell>35.40% ± 23.04%</cell><cell>48.27% ± 16.01%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Inference systems results from Amazon reviews dataset</figDesc><table><row><cell></cell><cell>CFRM</cell><cell>GFRM</cell></row><row><cell>Precision</cell><cell>63.22% ± 18.45%</cell><cell>65.14% ± 15.51%</cell></row><row><cell>Recall</cell><cell>73.8% ± 39.17%</cell><cell>75.7% ± 33.27%</cell></row><row><cell>Accuracy</cell><cell>53.5% ± 2.09%</cell><cell>59.65% ± 1.98%</cell></row><row><cell>F1</cell><cell>54.56% ± 19.95%</cell><cell>60.97% ± 14.74%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Inference systems results from movie reviews dataset. Table (5) shows the results obtained for movies and table (6) for the Amazon dataset.</figDesc><table><row><cell></cell><cell>3 fuzzy sets</cell><cell>2 fuzzy sets</cell></row><row><cell>Precision</cell><cell>79.32% ± 15.54%</cell><cell>72.09% ± 4.28%</cell></row><row><cell>Recall</cell><cell>45.7% ± 31.71%</cell><cell>69.50% ± 8.46%</cell></row><row><cell>Accuracy</cell><cell>60.9% ± 2.55%</cell><cell>71.25% ± 4.43%</cell></row><row><cell>F1</cell><cell>48.27% ± 16.01%</cell><cell>70.53% ± 5.55%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 :</head><label>6</label><figDesc>Inference systems results from Amazon reviews dataset</figDesc><table><row><cell></cell><cell>3 fuzzy sets</cell><cell>2 fuzzy sets</cell></row><row><cell>Precision</cell><cell>65.14% ± 15.51%</cell><cell>73.32% ± 3.08%</cell></row><row><cell>Recall</cell><cell>75.7% ± 33.27%</cell><cell>62.5% ± 4.58%</cell></row><row><cell>Accuracy</cell><cell>59.65% ± 1.98%</cell><cell>69.9% ± 3.02%</cell></row><row><cell>F1</cell><cell>60.97% ± 14.74%</cell><cell>67.43% ± 3.68%</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</title>
		<author>
			<persName><forename type="first">Diana</forename><forename type="middle">;</forename><surname>Alistair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kennedy</forename><surname>Alistair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Inkpen</forename><surname>Diana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Baccianella</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of FINEXIN</title>
		<title level="s">AWERProcedia Information Technology and Computer Science</title>
		<editor>
			<persName><forename type="first">Osgood</forename><surname>Boucher</surname></persName>
		</editor>
		<meeting>FINEXIN</meeting>
		<imprint>
			<date type="published" when="1969">2005. 2005. 2010. 2010. 2012. 2012. 1969. 1969</date>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
	<note>The pollyanna hypothesis</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging</title>
		<author>
			<persName><forename type="first">Eric</forename><surname>Brill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pimwadee</forename><surname>Brill ; Chaovalit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lina</forename><surname>Chaovalit</surname></persName>
		</author>
		<author>
			<persName><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">HICSS&apos;05. Proceedings of the 38th Annual Hawaii International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="1995">1995. 1995. 2005. 2005</date>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="112" to="112" />
		</imprint>
	</monogr>
	<note>System Sciences</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Esuli and Fabrizio Sebastiani. Sentiwordnet: A publicly available lexical resource for opinion mining</title>
		<author>
			<persName><surname>Cintra</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1309.5843</idno>
	</analytic>
	<monogr>
		<title level="m">Sentiment analysis: How to derive prior polarities from sentiwordnet</title>
				<editor>
			<persName><forename type="first">Marco</forename><surname>Guerini</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Lorenzo</forename><surname>Gatti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Marco</forename><surname>Turchi</surname></persName>
		</editor>
		<meeting><address><addrLine>Francisco Herrera</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="1999">2008. 2008. 2008. 1999. 1999. 2006. 2006. 2013. 2013</date>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="417" to="422" />
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
	<note>Eighth International Conference on</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Correlation-based feature selection for machine learning</title>
		<author>
			<persName><forename type="first">Mark</forename><forename type="middle">A</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ghorbani</forename><surname>Karamibekr</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Web Intelligence and Intelligent Agent Technology (WI-IAT)</title>
				<editor>
			<persName><forename type="first">Mostafa</forename><surname>Karamibekr</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Ali</forename><forename type="middle">A</forename><surname>Ghorbani</surname></persName>
		</editor>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="1999">1999. 1999. 2012. 2012. 2012</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="327" to="331" />
		</imprint>
		<respStmt>
			<orgName>The University of Waikato</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
	<note>IEEE/WIC/ACM International Conferences on</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">Leonard</forename><forename type="middle">J</forename><surname>Kazmier</surname></persName>
		</author>
		<author>
			<persName><surname>Kazmier</surname></persName>
		</author>
		<title level="m">Schaum&apos;s outline of business statistics</title>
				<imprint>
			<publisher>McGraw-Hill</publisher>
			<date type="published" when="2004">2004. 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts</title>
		<author>
			<persName><forename type="first">Bing</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><surname>Moraes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advanced Information Networking and Applications (WAINA), 2011 IEEE Workshops of International Conference on</title>
				<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2002">2012. 2012. 2012. 2012. 2010. 2010. 2010. 2009. 2009. 2011. 2011. 2004. 2004. 2002. 2002</date>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="79" to="86" />
		</imprint>
	</monogr>
	<note>Proceedings of the ACL-02 conference on Empirical methods in natural language processing-</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">5: Programs for machine learning morgan kaufmann publishers inc</title>
		<author>
			<persName><surname>Quinlan ; Quirk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Randolph Quirk, David Crystal, and Pearson Education. A comprehensive grammar of the English language</title>
				<meeting><address><addrLine>San Francisco, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Cambridge Univ Press</publisher>
			<date type="published" when="1985">1993. 1993. 1985. 1985. 2008. 2011. 2011</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="267" to="307" />
		</imprint>
		<respStmt>
			<orgName>Simon Fraser Univeristy School of Computing Science Technical Report</orgName>
		</respStmt>
	</monogr>
	<note>RC Quinlan</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews</title>
		<author>
			<persName><forename type="first">Peter</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Turney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mendel ; L-X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jerry</forename><forename type="middle">M</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><surname>Mendel</surname></persName>
		</author>
		<author>
			<persName><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 17th ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1992">2002. 2002. 1992. 1992. 2011</date>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="618" to="626" />
		</imprint>
	</monogr>
	<note>Proceedings of the 40th annual meeting on association for computational linguistics</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The wm method completed: a flexible fuzzy system approach to data mining. Fuzzy Systems</title>
		<author>
			<persName><forename type="first">;</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L-X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Wiebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Janyce</forename><forename type="middle">M</forename><surname>Wiebe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th conference on Computational linguistics</title>
				<meeting>the 13th conference on Computational linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="1990">2003. 2003. 1990. 1990</date>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="401" to="406" />
		</imprint>
	</monogr>
	<note>Identifying subjective characters in narrative</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Recognizing contextual polarity in phrase-level sentiment analysis</title>
		<author>
			<persName><forename type="first">Janyce</forename><forename type="middle">M</forename><surname>Wiebe</surname></persName>
		</author>
		<author>
			<persName><surname>Wiebe ; Wilson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the conference on human language technology and empirical methods in natural language processing</title>
				<meeting>the conference on human language technology and empirical methods in natural language processing</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="1965">1994. 1994. 2004. 2004. 2005. 2005. 1965. 1965</date>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="338" to="353" />
		</imprint>
	</monogr>
	<note>Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Fuzzy logic= computing with words. Fuzzy Systems</title>
		<author>
			<persName><forename type="first">Lotfi</forename><forename type="middle">A</forename><surname>Zadeh</surname></persName>
		</author>
		<author>
			<persName><surname>Zadeh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="103" to="111" />
			<date type="published" when="1996">1996. 1996</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
