Building a fuzzy system for opinion classification across different domains

Matheus Cardoso
State University of Feira de Santana (UEFS) and Federal University of Bahia (UFBA)
Salvador, Brazil
matheus.mcas@gmail.com

Angelo Loula, Matheus Giovanni Pires
State University of Feira de Santana (UEFS)
Feira de Santana, Brazil
{angelocl, mgpires}@ecomp.uefs.br

Abstract

Opinions are central in almost all human activities, because they are a relevant influence on people's behavior. The internet and the web have created mechanisms that make it possible for people to share their opinions, and for other people and organizations to find out more about the opinions and experiences of individuals and be helped in decision making. Still, opinions involve sentiments that are vague and imprecise textual descriptions. Hence, due to the nature of the data, Fuzzy Logic can be a promising approach. This paper proposes a fuzzy system to perform opinion classification across different domains. Almost 70 features were extracted from documents, and multiple feature selection algorithms were applied to select the features best fitted to classify documents. Over the selected features, the Wang-Mendel (WM) method was used to generate fuzzy rules and classify documents. The WM-based fuzzy system achieved 71.25% accuracy in a 10-fold cross-validation.

1 Introduction

Opinions are central in human lives. In almost all of their daily tasks, people ask for or seek other people's opinions to help them make decisions, such as what movie to watch, what car, book or notebook to buy, or to learn the political standpoint of their neighborhood on a certain issue. The internet and the web have created mechanisms that make it possible for people to share their opinions and for organizations to find out more about the opinions and experiences of other individuals, most of them unknown persons. These mechanisms have, over time, created a huge amount of opinionated sources, too large for a person to process alone. Hence, an automated opinion mining system is required; one that could identify and extract opinions from text would be an enormous help to someone sifting through the vast amounts of news and web data [Wilson et al., 2004].

Opinion mining is the process that seeks to predict the overall sentiment orientation conveyed in a piece of text such as a user review of a movie or product, a blog post or an editorial [Ohana et al., 2011]. Attached to opinions there are sentiments. Sentiments are intrinsically subjective, and to identify them in phrases and documents we have to deal with vague and imprecise terms, such as "good", "very nice" and "bad", among others. Due to the nature of this data, Fuzzy Logic [Zadeh, 1965] can be a promising approach to deal with it.

Given the importance of opinions in human lives, their commercial and political relevance, the huge amount of generated data that has to be handled automatically, and the vague and imprecise nature of the data, this paper aims to propose and evaluate an automated fuzzy opinion mining system to classify the overall sentiment orientation of a document's text. Our proposal differs from others in that it generates fuzzy rules based on the best-fitted features among almost 70 features extracted from documents, introducing the use of the Wang-Mendel method [Wang and Mendel, 1992]. We apply those rules to perform opinion classification across different domains.

The next section discusses related works on opinion mining and applications of fuzzy logic. The following section outlines the opinion mining process, specifying all the stages involved in the opinion mining workflow. Results from our approach are then shown and discussed. The last section concludes the paper, pointing out our contributions and some future improvements to this research.

2 Related works

Research in opinion mining began with subjectivity detection, dating back to the late 1990s with [Wiebe, 1990; 1994]. This task involves separating non-opinionated, neutral and objective sentences from subjective sentences carrying heavy sentiments. Starting in the 2000s, the overall research focus shifted to dividing language units into three categories: negative, positive and neutral. From there, many works on this task, also known as sentiment analysis or sentiment classification, among other names, have appeared.
One of the first research studies on unsupervised opinion mining was [Turney, 2002]. Similar to the task of classifying documents as positive or negative, [Turney, 2002] proposed to classify reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted from the average semantic orientation of the phrases in the review that contain adjectives or adverbs. He obtained an average of 74% accuracy across domains.

On the other hand, [Pang et al., 2002] was one of the first works to propose using classic machine learning techniques in opinion mining. Comparing the performance of Naive Bayes, Maximum Entropy and Support Vector Machines (SVM), this work showed that such techniques produce high accuracy levels, achieving 82.9% accuracy using only isolated words (called unigrams) with SVM. It also showed that supervised techniques achieve better results than unsupervised approaches. However, they are domain dependent, producing much poorer results on other kinds of data and demanding another training round of the classifier, which increases the cost and time needed to classify documents.

Related to our work are [Wilson et al., 2005], [Taboada et al., 2008] and [Ohana and Tierney, 2009], which use a wide range of document features. These features range from counts of adjectives or adverbs in a phrase or in the whole document and tuples of words (called bigrams if two words, trigrams if three), such as adverb-adjective pairs, to sums of polarities and many other features. [Taboada et al., 2008] and [Ohana and Tierney, 2009] use a semantic lexicon, SentiWordNet [Esuli and Sebastiani, 2006], to assign numeric values to the semantic orientation of words. In classifying documents as negative or positive, the accuracies obtained were 65.7% in [Wilson et al., 2005], 80.6% in [Taboada et al., 2008] and 69.35% in [Ohana and Tierney, 2009].
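The decision rule in [Turney, 2002] is simple enough to sketch. Assuming each extracted adjective/adverb phrase already carries a semantic-orientation score (Turney estimated these from web co-occurrence statistics; the numbers below are purely illustrative), the review-level classifier is just an average and a sign test:

```python
def classify_review(phrase_scores):
    """Thumbs up/down by the average semantic orientation of the
    adjective/adverb phrases extracted from one review."""
    if not phrase_scores:
        return "not recommended"  # no opinionated phrases found
    avg = sum(phrase_scores) / len(phrase_scores)
    return "recommended" if avg > 0 else "not recommended"

# Hypothetical phrase scores for one review:
print(classify_review([0.8, 0.3, -0.2]))  # recommended
```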
Although it has been shown that Fuzzy Logic is suitable for handling imprecise and vague data [Zadeh, 1996; Wang, 2003], we found only a few works applying fuzzy concepts, such as fuzzy sets or fuzzy inference systems, to opinion mining. One of the few papers found was [Nadali et al., 2010]. It proposes a fuzzy logic model to perform semantic classification of customer reviews into five classes: very weak, weak, moderate, strong and very strong. It also introduces a methodology based on a fuzzy inference system, fuzzy sets that model the five classes and manually created IF-THEN rules. However, the paper does not describe results or provide further discussion.

Another paper was [Ballhysa and Asilkan, 2012], which proposes a fuzzy approach for discovering the underlying opinion in blog entries, determining their overall polarity. The authors present fuzzy concepts such as fuzzy sets and fuzzy set operations. They propose a set of fuzzy measures (from counting manually chosen keywords) and a single fuzzy aggregation of these measures, but a fuzzy inference system is not used. Moreover, the proposed measures actually seem to correspond to crisp values, so there is no real application of fuzzy logic. Finally, there is only a superficial description of results, obtained on their own dataset with no comparison to other works.

This paper differs from previous work in applying fuzzy systems to opinion mining. We model fuzzy variables and build a fuzzy inference system based on document features. We run our tests on datasets already used in previous works, allowing direct comparison. Besides, we propose a feature extraction and selection stage, in which we extract a great number of features from documents, based on previous works and extended with our own features, and perform feature selection with different algorithms. The next section presents the opinion mining process that we used, describing each stage and the relevant techniques applied in it.

3 The opinion mining process

Our opinion mining process is composed of five stages: domain definition, preprocessing and transformation, feature extraction and selection, classification, and evaluation. In the first stage, the kind of data to be handled by the system and the datasets to be used are defined. To evaluate our cross-domain proposal, we picked the widely used Cornell Movie Review Data 2.0 [Pang and Lee, 2004] and a mixed dataset of Amazon product reviews [Wang et al., 2011] covering cameras, mobile phones, TVs, laptops and tablets, among other products.

3.1 Preprocessing and Transformation

In the preprocessing stage, data filtering takes place and a document representation model is built. There are three basic levels of document analysis: document, sentence, and entities and their aspects [Liu, 2012]. The first level focuses on classifying opinions as positive or negative from the perspective of the whole document. The second seeks to classify the opinions of each sentence in a document, and the last level aims to classify opinions targeted at aspects of the entities found. We chose document-level analysis [Turney, 2002; Pang et al., 2002; Pang and Lee, 2004; Taboada et al., 2008].

As a first step, we remove every sentence of a document that contains modal words, such as "would" or "could". Modals indicate that the words appearing in a sentence might not be reliable for the purposes of sentiment analysis [Taboada et al., 2011]. Next, each word in every document is tagged with its grammatical class using a POS (Part of Speech) tagger [Brill, 1995].

The document model in our approach is the popular bag-of-words model, in which a document is represented as a vector whose entries correspond to individual terms of a vocabulary [Moraes et al., 2012]. These terms are generically called n-grams; they can be unigrams (one word), bigrams (two words) or trigrams (three words). For each document, one n-gram vector is passed on to the next step of the process.

We defined 7 types of n-grams: adjectives, adverbs and verbs as unigrams; adverb plus adjective (e.g. very good), adverb plus verb (e.g. truly recommend) and adverb plus adverb as bigrams; and one type of trigram, the combination of two adverbs with one adjective (e.g. not very nice) [Pang et al., 2002; Turney, 2002; Taboada et al., 2008; Karamibekr and Ghorbani, 2012].

We also look for special types of bigrams and trigrams: negated n-grams (e.g. not bad, nothing special). This technique is called negation detection and is by itself an entire line of research, beyond the scope of this work, so we use a simple version from [Taboada et al., 2011].

At this stage, each document has been transformed into an n-gram bag-of-words vector. Each n-gram is now associated with a numeric value, an opinion polarity degree, obtained from an opinion lexicon. Opinion lexicons are resources that associate words with sentiment orientation [Ohana and Tierney, 2009]. We decided to use an automatically built opinion lexicon, SentiWordNet [Baccianella et al., 2010].
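The n-gram extraction over POS-tagged text can be illustrated with a minimal sketch. The Penn-Treebank-style tags and the helper below are illustrative assumptions, not the paper's actual implementation, and only three of the seven n-gram types are shown:

```python
# Minimal sketch: extract adjective unigrams, adverb+adjective bigrams
# and adverb+adverb+adjective trigrams from a POS-tagged sentence.
def extract_ngrams(tagged):  # tagged: list of (token, tag) pairs
    is_adj = lambda t: t.startswith("JJ")
    is_adv = lambda t: t.startswith("RB")
    unigrams, bigrams, trigrams = [], [], []
    for i, (tok, tag) in enumerate(tagged):
        if is_adj(tag):
            unigrams.append(tok)
        if i + 1 < len(tagged) and is_adv(tag) and is_adj(tagged[i + 1][1]):
            bigrams.append((tok, tagged[i + 1][0]))
        if (i + 2 < len(tagged) and is_adv(tag)
                and is_adv(tagged[i + 1][1]) and is_adj(tagged[i + 2][1])):
            trigrams.append((tok, tagged[i + 1][0], tagged[i + 2][0]))
    return unigrams, bigrams, trigrams

sent = [("not", "RB"), ("very", "RB"), ("nice", "JJ"), ("movie", "NN")]
# yields the unigram 'nice', the bigram ('very', 'nice') and the
# trigram ('not', 'very', 'nice')
print(extract_ngrams(sent))
```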
SentiWordNet (SWN) is a lexical resource explicitly devised for supporting sentiment classification. SWN provides positive, negative and objective scores (ranging from 0 to 1) for each sense of English words. Since words can have multiple senses, we apply the approach proposed by [Guerini et al., 2013], called prior polarities, to derive a positive or negative polarity for each word.

To determine polarity degrees for bigrams and trigrams, we consider adverbs as modifiers, subdivided into amplifiers (e.g. very) and downtoners (e.g. slightly), which respectively increase or decrease adjective (unigram) values [Quirk et al., 1985]. Downtoners and amplifiers have sub-levels, each with an associated modifier value, such as -0.5 for "lowest" downtoners and 0.25 for "high" amplifiers, among other sub-levels. The final score s of a bigram is defined by s(bigram) = s(unigram) + s(unigram) · s(modifier), and the score s of a trigram by s(trigram) = s(bigram) + s(bigram) · s(modifier).

The special cases among bigrams and trigrams are the negated ones. For these, instead of using modifiers, we apply an approach similar to that of [Taboada et al., 2011], shifting the n-gram polarity toward the opposite sign by a fixed amount (0.5, empirically defined). [Taboada et al., 2011] also showed that shifting the polarity is better than simply inverting the sign of the n-gram polarity.

Another technique used was attenuation by n-gram frequency, in which a term's polarity is decreased according to the number of times it appears in the document. The nth appearance of a word in the text receives the new score s' defined by s'(word) = s(word)/n. The repetition of an adjective, for instance, suggests that the writer lacks additional substantive commentary and is simply reusing a generic positive word [Taboada et al., 2011].

We also applied a bias compensation to negative term polarities. Lexicon-based sentiment classifiers generally show a positive bias [Alistair and Diana, 2005], likely the result of a universal human tendency to favor positive language [Boucher and Osgood, 1969]. So we increase the final n-gram degree of any negative expression (after the other modifiers have been applied) by a fixed amount (currently 50%). At the end of this stage, we have, for each document of the dataset, a vector of n-grams associated with polarity degrees.
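The polarity transformations of this stage (modifier composition, negation shift, frequency attenuation and negative-bias compensation) can be sketched as small functions. The word scores and modifier values below are illustrative:

```python
def modify(score, modifier):
    """Amplifier/downtoner composition: s' = s + s * m."""
    return score + score * modifier

def negate(score, shift=0.5):
    """Shift the polarity toward the opposite sign by a fixed amount."""
    return score - shift if score > 0 else score + shift

def attenuate(score, nth):
    """The nth occurrence of a term is worth s / n."""
    return score / nth

def compensate_negative(score, factor=1.5):
    """Increase the degree of negative expressions by 50%."""
    return score * factor if score < 0 else score

# Illustrative values: "good" = +0.6, amplifier "very" = +0.25
print(modify(0.6, 0.25))             # "very good": 0.6 -> 0.75
print(round(negate(0.6), 2))         # "not good":  0.6 -> 0.1
print(attenuate(0.6, 2))             # 2nd "good":  0.6 -> 0.3
print(compensate_negative(-0.4))     # bias compensation: -0.4 -> -0.6
```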
3.2 Feature extraction

In this step, we extract document features, intended to be domain independent, from the numerical n-gram vectors of the previous step. We chose this approach because it is effective: it takes reviews, inspects document features and decides their semantic orientation considering only those characteristics, rather than their specific contents. The features we use are not specific to a domain and should be easily applicable to other domains [Pang et al., 2002]. This also reduces dimensionality, since the resulting feature vector is significantly smaller than a regular bag-of-words vector.

On the other hand, corpus-based machine learning methods applied to opinion mining are able to obtain high accuracy rates, up to 95%, by feeding word vectors directly to classifiers, which learn from the given document corpus which words are related to positive and negative contexts. However, to reach their full potential, most of these approaches need immense annotated training datasets and huge amounts of training time, and still produce much poorer results across domains without full retraining.

Different studies have proposed various features to describe documents or discriminate among them in order to identify their polarities [Wilson et al., 2005; Ohana and Tierney, 2009; Taboada et al., 2011]. In order to capture diverse aspects of documents, we decided to extract a great number of features, so we used features presented in these works and derived many others, obtaining a total of 67 features.

Three kinds of features were defined: sum, count and maximum-value features. Sum features involve the numerical sum of polarity degrees for the different types of n-grams, such as the sum of the adjectives of a document, the sum of adverbs, of verbs, of bigrams composed of an adverb and an adjective, the sum of trigrams, among others. Count features proceed in a similar way for the different types of n-grams, counting the number of positive or negative polarity values.

Maximum-value features refer to the maximum value of a given type of n-gram in a document. For instance, if the maximum absolute value among the unigrams is positive, the feature has the value 1; if it is negative, the feature has the value -1. This feature was obtained for unigrams, bigrams and trigrams.

More features were derived from the three kinds described above by applying normalization or subtraction of features. For instance, the difference between the positive and negative bigrams of a document and the normalized sum of positive adjectives are two such derived features.

After the feature extraction step, the vectors of n-grams and polarity values are replaced by feature vectors: each document in the dataset is now represented by a feature vector of size 67.

3.3 Feature selection

This stage is commonly found in opinion mining approaches. It can make classifiers more efficient and effective by reducing feature vector dimensionality and the amount of data to be analyzed, as well as by identifying the relevant features to be considered [Moraes et al., 2012]. To choose among the extracted features and reduce the number of features to be analyzed by the classifier, we used two feature selection algorithms: Correlation-based Feature Selection (CFS) and feature selection based on the C4.5 decision tree [Cintra et al., 2008].

CFS evaluates subsets of features on the premise that suitable feature subsets contain features highly correlated with the classification yet uncorrelated with each other [Hall, 1999]. C4.5, on the other hand, is an algorithm that generates a decision tree usable for classification [Quinlan, 1993]. To build that tree, C4.5 needs to select the best features among those provided; hence, we also use C4.5 as a feature selection algorithm.
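The three feature kinds of Section 3.2 can be sketched over one document's polarity-scored n-grams of a single type. The scores below are illustrative, and `diff` mimics the kind of derived difference feature described above:

```python
# Minimal sketch of the sum, count and maximum-value feature kinds
# (Sec. 3.2) over one document's polarity-scored n-grams of one type.
def sum_feature(scores, positive=True):
    """Sum of the positive (or negative) polarity degrees."""
    return sum(s for s in scores if (s > 0) == positive)

def count_feature(scores, positive=True):
    """Number of positive (or negative) polarity degrees."""
    return sum(1 for s in scores if (s > 0) == positive)

def max_feature(scores):
    """+1 if the n-gram of maximum absolute value is positive, else -1."""
    return 1 if max(scores, key=abs) > 0 else -1

adjectives = [0.6, -0.3, 0.2, -0.8]         # illustrative polarity degrees
positives = sum_feature(adjectives)          # 0.6 + 0.2
negatives = sum_feature(adjectives, False)   # -0.3 + -0.8
diff = positives + negatives                 # a derived difference feature
print(positives, count_feature(adjectives, False), max_feature(adjectives), diff)
```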
3.4 Classification

In the classification stage, we build a rule-based fuzzy classifier to predict the overall sentiment orientation, positive or negative, of each document in the dataset. Building such a classifier involves creating a set of rules based on the extracted features, modeling these features as linguistic variables with fuzzy sets and, lastly, defining an inference system.

In order for the fuzzy sets to appropriately model the data, we first identify outlier values in the features and limit the range of feature values. To do this, we used the three-sigma rule [Kazmier, 2004] to select the outlier values that lie beyond three standard deviations from the mean of a feature, an interval within which 99.73% of the values of a normal distribution fall. Outlier values outside this range were clipped to the extreme values of the accepted range.

With the input range standardized for every feature, we can define the fuzzy sets [Zadeh, 1965] that model our input and output variables. We decided to use triangular fuzzy sets. The first approach was to use three fuzzy sets for the input (low, medium and high) and two sets for the output (negative and positive), uniformly distributed along the feature value range. Another approach was to use only two sets for the input, removing the medium fuzzy set.

Once the fuzzy variables were modeled with fuzzy sets, the next step was to build our fuzzy rule base using Wang-Mendel fuzzy rule generation [Wang and Mendel, 1992]. Given the previously specified fuzzy sets, this rule generation method takes each data instance in the dataset, determines its membership degrees in all fuzzy sets and builds a rule from the fuzzy sets with the highest membership degrees for each input-output pair.

The generated fuzzy rule base, together with the specified fuzzy sets, is then used by a fuzzy inference mechanism to determine the polarity class of a document. The mechanisms used were the General Fuzzy Reasoning Method (GFRM) and the Classic Fuzzy Reasoning Method (CFRM) [Cordon et al., 1999].

In this classification process, the feature vector of each document is evaluated by all fuzzy rules, and a compatibility degree is produced for each rule. CFRM picks the rule with the maximum compatibility degree and assigns that rule's output class to the document. GFRM, on the other hand, takes the maximum average compatibility degree between the two possible classes, positive and negative: it calculates the average degree over all rules with negative output and over all rules with positive output, and assigns to the document the class with the maximum average compatibility degree.

3.5 Evaluation

In order to evaluate our opinion classification approach, we apply 10-fold cross-validation. As measures of classification performance, accuracy, recall, precision and the F1 score were chosen. Accuracy is the ratio of the documents classified correctly to the total number of documents classified. Recall is the ratio of the documents correctly classified into a category to the total number of documents truly belonging to that category; it indicates the ability to recall the items of the category. Precision is the ratio of the documents correctly classified into a category to the total number of documents classified into that category. The F1 score considers both precision and recall, and is often regarded as a weighted average of the two [Chaovalit and Zhou, 2005].
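The classification machinery of Section 3.4 can be condensed into a sketch: triangular memberships, Wang-Mendel rule generation over a single feature, and the two reasoning methods. The set shapes and the data are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of Sec. 3.4 on a single feature: triangular "low"/"high" sets,
# Wang-Mendel rule generation, then CFRM/GFRM inference. Illustrative data.
def tri(x, a, b, c):
    """Triangular membership with peak b over support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Two input sets over the standardized range [-1, 1] (peaks at the ends).
SETS = {"low": lambda x: tri(x, -2.0, -1.0, 1.0),
        "high": lambda x: tri(x, -1.0, 1.0, 2.0)}

def wang_mendel(data):
    """data: (feature_value, class_label) pairs. One candidate rule per
    example; for each antecedent set keep the rule of highest degree."""
    best = {}  # antecedent set name -> (degree, class)
    for x, label in data:
        name, mu = max(((n, f(x)) for n, f in SETS.items()), key=lambda p: p[1])
        if mu > best.get(name, (0.0, None))[0]:
            best[name] = (mu, label)
    return {name: label for name, (_, label) in best.items()}

def cfrm(rules, x):
    """Classic reasoning: class of the single most compatible rule."""
    return max(rules.items(), key=lambda r: SETS[r[0]](x))[1]

def gfrm(rules, x):
    """General reasoning: class whose rules have the highest average degree."""
    by_class = {}
    for name, label in rules.items():
        by_class.setdefault(label, []).append(SETS[name](x))
    return max(by_class, key=lambda c: sum(by_class[c]) / len(by_class[c]))

train = [(0.8, "positive"), (-0.7, "negative")]
rules = wang_mendel(train)       # high -> positive, low -> negative
print(cfrm(rules, 0.4), gfrm(rules, -0.4))
```

With one rule per class, as here, both methods coincide; they differ when several rules share an output class, since GFRM then averages the degrees of that class's rules instead of trusting the single strongest rule.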
4 Results

In this section we describe and discuss our experiments and their results. We aim not only to compare the best classification accuracies but also to discuss the contexts in which the classifiers produce better or worse results.

4.1 Datasets

We performed our experiments on the two datasets described before. Each dataset consists of 2000 reviews previously classified, in terms of overall orientation, as either positive or negative (1000 positive and 1000 negative reviews). For the Amazon dataset, the ground truth was obtained from the customers' 5-star ratings: reviews with more than 3 stars were defined as positive and reviews with less than 3 stars as negative, while reviews with exactly 3 stars were not included in our analysis. In the movie reviews dataset, all documents were already tagged as positive or negative.

4.2 Design of experiments

We focus on comparing GFRM and CFRM, varying the configuration settings and comparing classification accuracy. We also evaluate the influence of the feature selection algorithms, of the inference systems themselves and of the number of fuzzy sets in the input.

For each dataset, we performed the preprocessing, transformation and feature creation stages as described. Starting at feature selection, however, each stage was performed only on the training folds. For example, fold 1 is used as the test fold in the classification and evaluation stages, while the remaining folds are used for feature selection and for building the combined fuzzy rule base of that fold. The same process is repeated for the other folds, and our results are reported as the average over the test folds. Consequently, all kinds of n-grams, combined with all the transformation techniques described in this work, pass through the feature selection stage, which finds out which features are best fitted to represent documents.

Feature selection algorithms evaluation

To evaluate the feature selection algorithms, we start with the following settings: 3 fuzzy sets in the input and CFRM, for both datasets. With the two other parameters unchanged, we can assess the performance of the feature selection algorithms. Besides recall, precision, accuracy and F1, we also recorded the average number of features selected by each algorithm. Table 1 shows the results for the movie reviews dataset and Table 2 for the Amazon dataset.

                    CFS                C4.5
Precision           55.69% ± 8.52%     82.85% ± 20.00%
Recall              79.40% ± 31.15%    37.70% ± 39.16%
Accuracy            53.50% ± 2.16%     55.70% ± 2.46%
F1                  59.08% ± 15.84%    35.40% ± 23.04%
Features selected   3.5 ± 0.5          1

Table 1: Results from the movie reviews dataset

                    CFS                C4.5
Precision           70.52% ± 8.91%     63.20% ± 18.45%
Recall              69.80% ± 12.86%    73.80% ± 39.17%
Accuracy            68.75% ± 5.91%     53.50% ± 2.09%
F1                  68.79% ± 5.51%     54.56% ± 19.95%
Features selected   6.2 ± 1.66         1

Table 2: Results from the Amazon reviews dataset

As we can see, feature selection with C4.5 (using CFRM and 3 fuzzy sets in the input) obtained better overall precision and accuracy on the movie reviews dataset while using almost four times fewer features. The inverse occurs on the Amazon dataset, where CFS with CFRM performs better than C4.5. On the Amazon dataset, however, CFS uses even more features, creating rules with six antecedents on average and making the rules less human readable. So, since C4.5 needed just one feature, generating more readable rules, and considering accuracy as the main reference of performance, despite the less balanced behavior indicated by the lower F1 measure, we decided to use C4.5 for both datasets.

Inference system evaluation

In this subsection we evaluate the performance of the chosen inference systems, CFRM and GFRM. As in the last subsection, we fixed the remaining parameters, maintaining the C4.5 algorithm and 3 fuzzy sets in the input, to better evaluate the performance of the inference systems. Table 3 shows the results for the movie reviews dataset and Table 4 for the Amazon dataset.

                    CFRM               GFRM
Precision           82.85% ± 20.00%    79.32% ± 15.54%
Recall              37.70% ± 39.16%    45.70% ± 31.71%
Accuracy            55.70% ± 2.46%     60.90% ± 2.55%
F1                  35.40% ± 23.04%    48.27% ± 16.01%

Table 3: Inference system results from the movie reviews dataset

                    CFRM               GFRM
Precision           63.22% ± 18.45%    65.14% ± 15.51%
Recall              73.80% ± 39.17%    75.70% ± 33.27%
Accuracy            53.50% ± 2.09%     59.65% ± 1.98%
F1                  54.56% ± 19.95%    60.97% ± 14.74%

Table 4: Inference system results from the Amazon reviews dataset

The results show that the General Fuzzy Reasoning Method improves accuracy over the Classic Fuzzy Reasoning Method when feature selection and fuzzy sets are kept unchanged. The F1 score also shows a better balance between precision and recall with GFRM. In this classification task with only two classes, considering the entire set of rules of a class is a better approach than using the single rule with the highest degree. Hence, GFRM is our choice for achieving better results in this work.

Evaluation of fuzzy sets quantity

In the previous subsections we have seen the results using 3 fuzzy sets to model our linguistic variables. Following the decision to pick C4.5 to reduce the complexity of the rules and make them more human readable, we also tried to reduce the number of fuzzy sets, using only the "low" and "high" sets. Table 5 shows the results obtained for the movie reviews dataset and Table 6 for the Amazon dataset.

                    3 fuzzy sets       2 fuzzy sets
Precision           79.32% ± 15.54%    72.09% ± 4.28%
Recall              45.70% ± 31.71%    69.50% ± 8.46%
Accuracy            60.90% ± 2.55%     71.25% ± 4.43%
F1                  48.27% ± 16.01%    70.53% ± 5.55%

Table 5: Fuzzy set quantity results from the movie reviews dataset

                    3 fuzzy sets       2 fuzzy sets
Precision           65.14% ± 15.51%    73.32% ± 3.08%
Recall              75.70% ± 33.27%    62.50% ± 4.58%
Accuracy            59.65% ± 1.98%     69.90% ± 3.02%
F1                  60.97% ± 14.74%    67.43% ± 3.68%

Table 6: Fuzzy set quantity results from the Amazon reviews dataset

Table 5 shows that accuracy and F1 were significantly improved by removing a fuzzy set, specifically the "medium" one, keeping the "low" and "high" fuzzy sets. Also, between the movies and Amazon datasets, even if only slightly, the best overall results are obtained on movies. This is especially interesting because movie reviews are often reported as the most difficult type of review to classify [Turney, 2002; Pang and Lee, 2004; Chaovalit and Zhou, 2005; Ohana and Tierney, 2009].

In both datasets, the single feature selected by C4.5 was the difference between the sums of positive and negative unigrams and bigrams composed of adjectives and adverbs. With this single feature, we could classify close to 70% of the movie reviews and of the Amazon reviews with two simple, human-readable rules generated by the Wang-Mendel method:

• IF the difference between the sums of positive and negative unigrams and bigrams composed of adjectives and adverbs is HIGH, THEN POLARITY is POSITIVE

• IF the difference between the sums of positive and negative unigrams and bigrams composed of adjectives and adverbs is LOW, THEN POLARITY is NEGATIVE

More results

Although we used the Amazon dataset presented in [Wang et al., 2011] to test and evaluate our work, the evaluation in that paper concerned rating prediction rather than classification, which makes any comparison improper. The same cannot be said of the Cornell Movie Reviews dataset, which was pre-processed by its authors and has been used in that form by many other papers. Hence, we compare our results with papers that have used the Cornell Movie Reviews dataset.

Our work is comparable to [Ohana and Tierney, 2009] and [Taboada et al., 2008], which used strictly the same dataset and are likewise not domain dependent. They reported 69.35% and 76% accuracy, respectively. It is important to note that these works do not apply a fuzzy approach, and that [Taboada et al., 2008] uses many steps different from ours, such as a manually created opinion lexicon and an entirely different set of intensifiers, among others. [Ohana and Tierney, 2009], on the other hand, uses many elements related to this work, such as SentiWordNet and many similar or identical document features. We can also cite papers that used a previous version of this movie dataset (differing in size), such as [Ohana et al., 2011], which reported 69.9% accuracy. Concerning papers that present a fuzzy approach, we could not find any that reported results closely related to this work.
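The two Wang-Mendel rules reported above can be written directly as a tiny classifier. The symmetric triangular "low"/"high" memberships over a standardized range [-1, 1] are an assumption here; with two such symmetric sets the decision effectively reduces to the sign of the feature, which is what makes the rule pair so readable:

```python
# The two Wang-Mendel rules above as a direct classifier. 'diff' is the
# single selected feature (positive minus negative polarity sums of
# adjective/adverb unigrams and bigrams), assumed clipped to [-1, 1].
def low(x):
    return max(0.0, min(1.0, (1.0 - x) / 2.0))

def high(x):
    return max(0.0, min(1.0, (x + 1.0) / 2.0))

def classify(diff):
    # IF diff is HIGH THEN positive; IF diff is LOW THEN negative
    return "positive" if high(diff) > low(diff) else "negative"

print(classify(0.35), classify(-0.1))  # positive negative
```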
5 Conclusion and further works

This work proposed and evaluated an automated fuzzy opinion mining system to classify the overall sentiment orientation of a document's text. Our proposal uses the Wang-Mendel method [Wang and Mendel, 1992] to generate fuzzy rules based on the best-fitted features among almost 70 features extracted and selected from documents. We achieved promising results, reaching 71.25% accuracy in a 10-fold cross-validation.

Our work is probably the first to apply Fuzzy Logic and the Wang-Mendel method to opinion mining while evidencing results on datasets from previous works. Besides, our results are comparable to those of previous works that apply non-fuzzy techniques. Also, we classified documents with human-readable rules using simple fuzzy sets, such as low, high, positive and negative. We contribute as well to the investigation of features that can be relevant to describe and discriminate documents.

We have reported initial results of an ongoing research effort. As future work, we see many points for improvement, such as:

• Build a better set of intensifiers and evaluate their influence on the final results;

• Improve negation detection and how to better apply it;

• Improve how the fuzzy sets are modeled for the inputs obtained from document features;

• Investigate more features that could better represent and distinguish documents;

• Experiment with other feature selection techniques, to investigate the influence of the selected features on fuzzy rule generation.

References

[Alistair and Diana, 2005] Kennedy Alistair and Inkpen Diana. Sentiment classification of movie and product reviews using contextual valence shifters. In Proceedings of FINEXIN, 2005.

[Baccianella et al., 2010] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204, 2010.

[Ballhysa and Asilkan, 2012] Elton Ballhysa and Ozcan Asilkan. A fuzzy approach for blog opinion mining: an application to Albanian language. AWERProcedia Information Technology and Computer Science, 1, 2012.

[Boucher and Osgood, 1969] Jerry Boucher and Charles E. Osgood. The Pollyanna hypothesis. Journal of Verbal Learning and Verbal Behavior, 8(1):1–8, 1969.

[Brill, 1995] Eric Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543–565, 1995.

[Chaovalit and Zhou, 2005] Pimwadee Chaovalit and Lina Zhou. Movie review mining: A comparison between supervised and unsupervised classification approaches. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05), pages 112c–112c. IEEE, 2005.

[Cintra et al., 2008] Marcos Evandro Cintra, C. H. de Arruda, and Maria Carolina Monard. Fuzzy feature subset selection using the Wang & Mendel method. In Eighth International Conference on Hybrid Intelligent Systems (HIS'08), pages 590–595. IEEE, 2008.

[Cordon et al., 1999] Oscar Cordon, Maria Jose del Jesus, and Francisco Herrera. A proposal on reasoning methods in fuzzy rule-based classification systems. International Journal of Approximate Reasoning, 20(1):21–45, 1999.

[Esuli and Sebastiani, 2006] Andrea Esuli and Fabrizio Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, pages 417–422, 2006.

[Guerini et al., 2013] Marco Guerini, Lorenzo Gatti, and Marco Turchi. Sentiment analysis: How to derive prior polarities from SentiWordNet. arXiv preprint arXiv:1309.5843, 2013.

[Hall, 1999] Mark A. Hall. Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato, 1999.

[Karamibekr and Ghorbani, 2012] Mostafa Karamibekr and Ali A. Ghorbani. Verb oriented sentiment classification. In 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), volume 1, pages 327–331. IEEE, 2012.

[Kazmier, 2004] Leonard J. Kazmier. Schaum's Outline of Business Statistics. McGraw-Hill, 2004.

[Liu, 2012] Bing Liu. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1):1–167, 2012.

[Moraes et al., 2012] Rodrigo Moraes, João Francisco Valiati, and Wilson P. Gavião Neto. Document-level sentiment classification: An empirical comparison between SVM and ANN. Expert Systems with Applications, 2012.

[Nadali et al., 2010] S. Nadali, M. A. A. Murad, and R. A. Kadir. Sentiment classification of customer reviews based on fuzzy logic. In 2010 International Symposium in Information Technology (ITSim), volume 2, pages 1037–1044. IEEE, 2010.

[Ohana and Tierney, 2009] Bruno Ohana and Brendan Tierney. Sentiment classification of reviews using SentiWordNet. In 9th IT&T Conference, page 13, 2009.

[Ohana et al., 2011] Bruno Ohana, Brendan Tierney, and S. Delany. Domain independent sentiment classification with many lexicons. In 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA), pages 632–637. IEEE, 2011.

[Pang and Lee, 2004] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 271. Association for Computational Linguistics, 2004.

[Pang et al., 2002] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, volume 10, pages 79–86. Association for Computational Linguistics, 2002.

[Quinlan, 1993] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, USA, 1993.

[Quirk et al., 1985] Randolph Quirk, David Crystal, and Pearson Education.

[Wiebe, 1990] Janyce M. Wiebe. Identifying subjective characters in narrative. In Proceedings of the 13th Conference on Computational Linguistics, volume 2, pages 401–406. Association for Computational Linguistics, 1990.

[Wiebe, 1994] Janyce M. Wiebe. Tracking point of view in narrative. Computational Linguistics, 20(2):233–287, 1994.

[Wilson et al., 2004] Theresa Wilson, Janyce Wiebe, and Rebecca Hwa. Just how mad are you? Finding strong and weak opinion clauses. In AAAI, volume 4, pages 761–769, 2004.

[Wilson et al., 2005] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 347–354. Association for Computational Linguistics, 2005.

[Zadeh, 1965] Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.

[Zadeh, 1996] Lotfi A. Zadeh. Fuzzy logic = computing with words. IEEE Transactions on Fuzzy Systems, 4(2):103–111, 1996.
A comprehensive grammar of the En- glish language, volume 397. Cambridge Univ Press, 1985. [Taboada et al., 2008] Maite Taboada, Kimberly Voll, and Julian Brooke. Extracting sentiment as a function of dis- course structure and topicality. Simon Fraser Univeristy School of Computing Science Technical Report, 2008. [Taboada et al., 2011] Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon- based methods for sentiment analysis. Computational lin- guistics, 37(2):267–307, 2011. [Turney, 2002] Peter D Turney. Thumbs up or thumbs down?: semantic orientation applied to unsupervised clas- sification of reviews. In Proceedings of the 40th an- nual meeting on association for computational linguistics, pages 417–424. Association for Computational Linguis- tics, 2002. [Wang and Mendel, 1992] L-X Wang and Jerry M Mendel. Generating fuzzy rules by learning from examples. Sys- tems, Man and Cybernetics, IEEE Transactions on, 22(6):1414–1427, 1992. [Wang et al., 2011] Hongning Wang, Yue Lu, and ChengX- iang Zhai. Latent aspect rating analysis without aspect keyword supervision. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discov- ery and data mining, pages 618–626. ACM, 2011. [Wang, 2003] L-X Wang. The wm method completed: a flexible fuzzy system approach to data mining. Fuzzy Sys- tems, IEEE Transactions on, 11(6):768–782, 2003.
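To make the Wang-Mendel rule-generation step discussed above concrete, the sketch below illustrates the core of the method for classification: each training example is mapped, feature by feature, to the fuzzy region with the highest membership, yielding one candidate rule whose degree is the product of memberships, and conflicting rules (same antecedent, different class) are resolved by keeping the one with the highest degree. The fuzzy sets, feature values and labels here are toy assumptions for illustration only, not the paper's actual features or dataset.

```python
# Illustrative sketch of Wang-Mendel rule generation for classification.
# All fuzzy partitions, data and labels are hypothetical toy examples.

def tri(x, a, b, c):
    """Triangular membership function with peak at b and support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Three fuzzy regions per feature, on a [0, 1] normalized domain.
FUZZY_SETS = {
    "low":    lambda x: tri(x, -0.5, 0.0, 0.5),
    "medium": lambda x: tri(x, 0.0, 0.5, 1.0),
    "high":   lambda x: tri(x, 0.5, 1.0, 1.5),
}

def best_set(x):
    """Return the fuzzy set name with the highest membership for value x."""
    return max(((n, f(x)) for n, f in FUZZY_SETS.items()),
               key=lambda p: p[1])

def wang_mendel(samples):
    """One candidate rule per sample; conflicts kept by highest degree."""
    rules = {}  # antecedent tuple -> (class label, rule degree)
    for features, label in samples:
        antecedent, degree = [], 1.0
        for x in features:
            name, mu = best_set(x)
            antecedent.append(name)
            degree *= mu  # product of memberships as the rule's degree
        key = tuple(antecedent)
        if key not in rules or degree > rules[key][1]:
            rules[key] = (label, degree)
    return rules

def classify(rules, features):
    """Fire every rule with the product t-norm; return the winning class."""
    best_label, best_strength = None, -1.0
    for antecedent, (label, _) in rules.items():
        strength = 1.0
        for name, x in zip(antecedent, features):
            strength *= FUZZY_SETS[name](x)
        if strength > best_strength:
            best_label, best_strength = label, strength
    return best_label

# Toy training data: (normalized feature vector, sentiment class).
train = [((0.9, 0.8), "positive"), ((0.1, 0.2), "negative"),
         ((0.8, 0.9), "positive"), ((0.2, 0.1), "negative")]
rules = wang_mendel(train)
print(classify(rules, (0.85, 0.75)))  # -> positive
```

The learned rules stay human readable, e.g. "IF feature1 is high AND feature2 is high THEN class is positive", which mirrors the interpretability the paper claims for its fuzzy sets.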