<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Building a fuzzy system for opinion classification across different domains</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matheus Cardoso</string-name>
          <email>matheus.mcas@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Angelo Loula</string-name>
          <email>angelocl@ecomp.uefs.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matheus Giovanni Pires</string-name>
          <email>mgpires@ecomp.uefs.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>State University of Feira de Santana (UEFS)</institution>
          ,
          <addr-line>Feira de Santana</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>State University of Feira de Santana (UEFS), and Federal University of Bahia (UFBA)</institution>
          ,
          <addr-line>Salvador</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Opinions are central to almost all human activities because they strongly influence people's behavior. The internet and the web have created mechanisms that make it possible for people to share their opinions, and for other people and organizations to learn about the opinions and experiences of individuals and use them in decision making. Still, opinions involve sentiments, which are expressed in vague and imprecise textual terms. Given the nature of this data, Fuzzy Logic is a promising approach. This paper proposes a fuzzy system to perform opinion classification across different domains. Almost 70 features were extracted from documents, and multiple feature selection algorithms were applied to select the features best suited to classifying documents. Over the selected features, the Wang-Mendel (WM) method was used to generate fuzzy rules and classify documents. The WM-based fuzzy system achieved 71.25% accuracy in a 10-fold cross-validation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Opinions are central in human lives. In almost all daily tasks, people ask for or seek out other people's opinions to help them make decisions: what movie to watch; what car, book, or notebook to buy; or what the political standpoint of their neighborhood is on a certain issue. The internet and the web have created mechanisms that make it possible for people to share their opinions, and for other people and organizations to learn about the opinions and experiences of individuals, most of them strangers. Over time, these mechanisms have created a huge amount of opinionated sources, far too much for any person to process alone. Hence, an automated opinion mining system is required; one that could identify and extract opinions from text would be an enormous help to someone sifting through the vast amounts of news and web data [Wilson et al., 2004].</p>
      <p>Opinion mining is the process that seeks to predict the overall sentiment orientation conveyed in a piece of text such as a user review of a movie or product, a blog post, or an editorial [Ohana et al., 2011]. Attached to opinions there are sentiments. Sentiments are intrinsically subjective, and to identify them in phrases and documents we have to deal with vague and imprecise terms, such as "good", "very nice", and "bad", among others. Due to the nature of this data, Fuzzy Logic [Zadeh, 1965] is a promising approach to deal with it.</p>
      <p>Given the importance of opinions in human lives, their commercial and political relevance, the huge amount of generated data that has to be handled automatically, and the vague and imprecise nature of that data, this paper aims to propose and evaluate an automated fuzzy opinion mining system to classify the overall sentiment orientation of a document's text. Our proposal differs from others because it generates fuzzy rules based on the best-suited among almost 70 features extracted from documents, introducing the use of the Wang-Mendel method [Wang and Mendel, 1992]. We apply those rules to perform opinion classification across different domains.</p>
      <p>The next section covers related works, describing previous work on opinion mining and applications of fuzzy logic. The following section outlines the opinion mining process, specifying all stages involved in the opinion mining workflow. Results from our approach are then shown and discussed. The last section concludes this paper, pointing out our contributions and some future improvements to this research.</p>
    </sec>
    <sec id="sec-2">
      <title>Related works</title>
      <p>Research in opinion mining began with subjectivity detection, dating back to the late 1990s, with [Wiebe, 1990; 1994]. This task involves separating non-opinionated, neutral, and objective sentences from subjective sentences carrying strong sentiments. Starting in the 2000s, the overall research focus shifted to dividing language units into three categories: negative, positive, and neutral. From there, many works on this task, also known as sentiment analysis or sentiment classification, among other names, have appeared.</p>
      <p>One of the first research studies on unsupervised opinion mining was [Turney, 2002]. Similar to the task of classifying documents as positive or negative, [Turney, 2002] proposed to classify reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. He obtained an average of 74% accuracy across domains.</p>
      <p>On the other hand, [Pang et al., 2002] was one of the first to propose using classic machine learning techniques in opinion mining. Comparing the performance of Naive Bayes, Maximum Entropy, and Support Vector Machines (SVM), this work showed that such techniques produce high accuracy levels, achieving 82.9% accuracy using only isolated words (called unigrams) with SVM. It also showed that supervised techniques produce better results than unsupervised approaches. However, they are domain dependent, producing even poorer results on other kinds of data and demanding another training round of the classifier, which increases the cost and time to classify documents.</p>
      <p>Related to our work are the works of [Wilson et al., 2005], [Taboada et al., 2008], and [Ohana and Tierney, 2009], which use a wide range of document features. These features range from the count of adjectives or adverbs in a phrase or whole document, and tuples of words (called bigrams if two words, trigrams if three), such as adverb-adjective pairs, to the sum of polarities and many other features. [Taboada et al., 2008] and [Ohana and Tierney, 2009] use a semantic lexicon, SentiWordNet [Esuli and Sebastiani, 2006], to assign numeric values to a word's semantic orientation. In classifying documents as negative or positive, the accuracies obtained were 65.7% in [Wilson et al., 2005], 80.6% in [Taboada et al., 2008], and 69.35% in [Ohana and Tierney, 2009].</p>
      <p>Although it has been shown that Fuzzy Logic is suitable for handling imprecise and vague data [Zadeh, 1996; Wang, 2003], we found only a few works applying fuzzy concepts, such as fuzzy sets or fuzzy inference systems, to opinion mining. One of the few papers found was [Nadali et al., 2010]. It proposes a fuzzy logic model to perform semantic classification of customer reviews into five classes: very weak, weak, moderate, strong, and very strong. It also introduces a methodology involving a fuzzy inference system, fuzzy sets that model the five classes, and manually created IF-THEN rules. However, the paper does not describe results or provide further discussion.</p>
      <p>Another paper was [Ballhysa and Asilkan, 2012], which proposes a fuzzy approach for discovering the underlying opinion in blog entries, determining the overall polarity. The authors presented fuzzy concepts such as fuzzy sets and fuzzy set operations. They proposed a set of fuzzy measures (from counting manually chosen keywords) and a single fuzzy aggregation of these measures, but a fuzzy inference system is not used. Moreover, the proposed measures actually seem to correspond to crisp values, so there is no real application of fuzzy logic, and there is only a superficial description of results, obtained on their own dataset with no comparison with other works.</p>
      <p>This paper differs from previous work in applying fuzzy systems to opinion mining. We model fuzzy variables and build a fuzzy inference system based on document features. We run our tests on datasets already used in previous works, allowing direct comparison. Besides, we propose a feature extraction and selection stage, in which we extract a large number of features from documents, based on previous works and extended with our own features, and perform feature selection with different algorithms. The next section presents the opinion mining process that we used, describing each stage and the relevant techniques used in them.</p>
    </sec>
    <sec id="sec-3">
      <title>The opinion mining process</title>
      <p>Our opinion mining process is composed of five stages: domain definition, preprocessing and transformation, feature extraction and selection, classification, and evaluation. The first stage defines what kind of data will be handled by the system and what datasets will be used. We picked the widely used Cornell Movie Review Data 2.0 [Pang and Lee, 2004] and a mixed dataset containing Amazon products [Wang et al., 2011], such as cameras, mobile phones, TVs, laptops, and tablets, among others, to evaluate our cross-domain proposal.</p>
      <sec id="sec-3-1">
        <title>Preprocessing and Transformation</title>
        <p>In the preprocessing stage, data filtering takes place and a document representation model is built. There are three basic levels of document analysis: document, sentence, and entities and their aspects [Liu, 2012]. The first level focuses on classifying opinions as positive or negative from the whole-document perspective. The second seeks to classify the opinions of each sentence in a document, and the last level looks to classify opinions targeted at aspects of the entities found. We chose document-level analysis [Turney, 2002; Pang et al., 2002; Pang and Lee, 2004; Taboada et al., 2008].</p>
        <p>As a first step, we remove all sentences in a document that contain modal words, such as "would" and "could", among others. Modals indicate that the words appearing in a sentence might not be reliable for the purposes of sentiment analysis [Taboada et al., 2011]. Next, all words in each document are tagged with their grammatical class using a POS (Part of Speech) tagger [Brill, 1995].</p>
        <p>The document model in our approach is the popular bag-of-words model, in which a document is represented as a vector whose entries correspond to individual terms of a vocabulary [Moraes et al., 2012]. These terms are generically called n-grams. They can be unigrams (one word), bigrams (two words), or trigrams (three words). For each document, one n-gram vector is passed on to the next step of the process.</p>
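        <p>To illustrate this representation, the sketch below is our own minimal example (not the authors' code), assuming Penn Treebank POS tags (JJ = adjective, RB = adverb, VB = verb) as produced by a Brill-style tagger. It collects the unigrams and adverb-led bigrams described in the next paragraph.</p>

```python
# Sketch: build an n-gram "bag-of-words" list from a POS-tagged document.
# Penn Treebank tag prefixes: JJ = adjective, RB = adverb, VB = verb.

def extract_ngrams(tagged):
    """tagged: list of (word, pos) pairs for one document."""
    ngrams = []
    for i, (word, pos) in enumerate(tagged):
        if pos.startswith(("JJ", "RB", "VB")):
            ngrams.append((word,))                      # unigram
        if pos.startswith("RB") and i + 1 != len(tagged):
            nxt_word, nxt_pos = tagged[i + 1]
            if nxt_pos.startswith(("JJ", "VB", "RB")):  # adverb + adj/verb/adverb
                ngrams.append((word, nxt_word))         # bigram
    return ngrams

doc = [("very", "RB"), ("good", "JJ"), ("movie", "NN")]
print(extract_ngrams(doc))
```

Trigram collection (two adverbs followed by an adjective) would extend the same loop one position further.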
        <p>We defined 7 types of n-grams: adjectives, adverbs, and verbs as unigrams; adverbs with adjectives (e.g. very good), adverbs with verbs (e.g. truly recommend), and adverbs with adverbs as bigrams; and one type of trigram, the combination of two adverbs with one adjective (e.g. not very nice) [Pang et al., 2002; Turney, 2002; Taboada et al., 2008; Karamibekr and Ghorbani, 2012].</p>
        <p>We also look for special types of bigrams and trigrams: negated n-grams (e.g. not bad, nothing special). This technique is called negation detection; by itself it is an entire line of research, going beyond the scope of this work, so we use a simple version from [Taboada et al., 2011].</p>
        <p>At this stage, each document has been transformed into an n-gram bag-of-words vector. Each n-gram is now associated with a numeric value, an opinion polarity degree, using an opinion lexicon. Opinion lexicons are resources that associate words with sentiment orientation [Ohana and Tierney, 2009]. We decided to use an automatically built opinion lexicon, SentiWordNet [Baccianella et al., 2010].</p>
        <p>SentiWordNet (SWN) is a lexical resource explicitly devised for supporting sentiment classification. SWN provides positive, negative, and objective scores (ranging from 0 to 1) for each sense of English words. Since words can have multiple senses, we apply the approach proposed by [Guerini et al., 2013], called prior polarities, to derive positive and negative polarity for words.</p>
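        <p>As a rough sketch of deriving a single prior polarity from per-sense scores, the snippet below uses a harmonic 1/rank weighting of the senses. This particular weighting is an illustrative assumption of ours; [Guerini et al., 2013] evaluate several such formulae, and the one actually used may differ.</p>

```python
# Sketch: derive one prior polarity for a word from its per-sense
# SentiWordNet scores. The 1/rank harmonic weighting is an illustrative
# assumption; [Guerini et al., 2013] compare several such formulae.

def prior_polarity(senses):
    """senses: list of (pos_score, neg_score), ordered by sense frequency."""
    num, den = 0.0, 0.0
    for rank, (pos, neg) in enumerate(senses, start=1):
        weight = 1.0 / rank           # earlier (more frequent) senses weigh more
        num += weight * (pos - neg)   # signed polarity of this sense
        den += weight
    return num / den

# "good": first sense strongly positive, second mildly positive
print(prior_polarity([(0.75, 0.0), (0.25, 0.0)]))
```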
        <p>To determine polarity degrees for bigrams and trigrams, we consider adverbs as modifiers, subdivided into amplifiers (e.g. very) and downtoners (e.g. slightly), which increase or decrease adjective (unigram) values, respectively [Quirk et al., 1985]. Downtoners and amplifiers have sub-levels, each with an associated modifier value, such as -0.5 for "lowest" downtoners and 0.25 for "high" amplifiers, among other sub-levels. The final score s for a bigram is defined by s(bigram) = s(unigram) + s(unigram) * s(modifier), and the score s for a trigram by s(trigram) = s(bigram) + s(bigram) * s(modifier).</p>
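        <p>The two formulas above can be sketched directly; the unigram score 0.6 for "good" below is an arbitrary example value, while the modifier values 0.25 ("high" amplifier) and -0.5 ("lowest" downtoner) are the two sub-levels quoted in the text.</p>

```python
# Sketch of the bigram/trigram scoring:
#   s(bigram)  = s(unigram) + s(unigram) * s(modifier)
#   s(trigram) = s(bigram)  + s(bigram)  * s(modifier)

def modified_score(base, modifier):
    return base + base * modifier

s_good = 0.6                                    # example unigram score for "good"
s_very_good = modified_score(s_good, 0.25)      # amplifier "very" raises it
s_slightly_good = modified_score(s_good, -0.5)  # downtoner "slightly" lowers it
print(s_very_good, s_slightly_good)
```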
        <p>The special case among bigrams and trigrams is the negated ones. For these, instead of using modifiers, we apply an approach similar to [Taboada et al., 2011], shifting the n-gram polarity toward the opposite sign by a fixed amount (0.5, empirically defined). [Taboada et al., 2011] has also shown that shifting polarity is better than simply inverting the n-gram polarity sign.</p>
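        <p>A minimal sketch of this shift, using the fixed 0.5 amount from the text:</p>

```python
# Sketch of the polarity shift applied to negated n-grams: the score moves
# toward the opposite sign by a fixed amount (0.5, empirically defined),
# rather than simply flipping its sign.

SHIFT = 0.5

def negate(score):
    if score > 0:
        return score - SHIFT
    return score + SHIFT

# A strongly positive term under negation becomes mildly positive instead of
# strongly negative, as in "not excellent".
print(negate(0.8))
```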
        <p>Another technique is attenuation by n-gram frequency, in which a term's polarity is decreased by the number of times it appears in the document: the nth appearance of a word in the text has a new score s' defined by s'(word) = s(word) / n. The repetition of an adjective, for instance, suggests that the writer lacks additional substantive commentary and is simply using a generic positive word [Taboada et al., 2011]. We also used a bias compensation for negative term polarities. Lexicon-based sentiment classifiers generally show a positive bias [Alistair and Diana, 2005], likely the result of a universal human tendency to favor positive language [Boucher and Osgood, 1969]. So we increased the final n-gram degree of any negative expression (after other modifiers have been applied) by a fixed amount (currently 50%). At the end of this stage, we have, for each document in the dataset, a vector of n-grams associated with polarity degrees.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Feature extraction</title>
        <p>In this step, we extract document features from the previous numerical n-gram vectors, with the intent of being domain independent. We chose this approach because it is effective: it takes reviews, checks document features, and decides their semantic orientation considering only their characteristics, rather than their specific contents. The features we use are not specific to a domain and should be easily applicable to other domains [Pang et al., 2002]. This also reduces feature dimensionality, since the resulting feature vector is significantly smaller than a regular bag-of-words vector.</p>
        <p>On the other hand, corpus-based machine learning methods applied to opinion mining are able to obtain high accuracy rates, up to 95%, by feeding word vectors directly to classifiers, which learn from the given document corpus which words are related to positive and negative contexts. However, in order to reach their full potential, most of these approaches need immense annotated training datasets and a huge amount of training time, and they still produce poorer results across domains without full retraining.</p>
        <p>Different studies have proposed various features to describe or discriminate documents in order to identify their polarities [Wilson et al., 2005; Ohana and Tierney, 2009; Taboada et al., 2011]. To capture diverse aspects of documents, we decided to extract a large number of features, so we used features presented in these works and derived many others, obtaining a total of 67 features.</p>
        <p>Three kinds of features were defined: sums, counts, and maximum values. Sum features involve the numerical sum of polarity degrees for different types of n-grams, such as the sum of a document's adjectives, the sums of its adverbs, verbs, and adverb-adjective bigrams, and the sum of its trigrams, among others. The count features proceed in a similar way for the different types of n-grams, counting the number of positive or negative polarity values.</p>
        <p>The maximum value features refer to the maximum absolute value of a given type of n-gram in a document. For instance, if the maximum absolute value among the unigrams is positive, this feature has the value 1; if it is negative, the feature has the value -1. This feature was obtained for unigrams, bigrams, and trigrams.</p>
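        <p>The three kinds of features can be sketched over a list of scored n-grams of one type (the feature names in the returned dictionary are our own labels):</p>

```python
# Sketch of the three feature kinds computed over one n-gram type's polarity
# degrees: the polarity sum, positive/negative counts, and the sign of the
# value with the largest magnitude.

def extract_features(scores):
    """scores: list of polarity degrees for one n-gram type."""
    positives = [s for s in scores if s > 0]
    negatives = [s for s in scores if s < 0]
    peak = max(scores, key=abs)          # value with the largest magnitude
    return {
        "sum": sum(scores),
        "count_pos": len(positives),
        "count_neg": len(negatives),
        "max_sign": 1 if peak > 0 else -1,
    }

print(extract_features([0.6, -0.2, 0.3]))
```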
        <p>More features were derived from the three kinds described above by applying normalization or subtraction of features. For instance, the difference between the positive and negative bigrams of a document and the normalized sum of positive adjectives are examples of these derived features.</p>
        <p>After the feature extraction step, the vectors of n-grams and polarity values are replaced by feature vectors. Each document in the dataset is now represented by a feature vector of size 67.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Feature selection</title>
        <p>This stage is commonly found in opinion mining approaches. It can make classifiers more efficient and effective by reducing feature vector dimensionality and the amount of data to be analyzed, as well as by identifying the relevant features to be considered [Moraes et al., 2012]. To choose among the extracted features and reduce the amount of data analyzed by the classifier, we used two feature selection algorithms: Correlation-based Feature Selection (CFS) and feature selection from the C4.5 decision tree [Cintra et al., 2008].</p>
        <p>CFS evaluates subsets of features on the basis that a suitable feature subset contains features highly correlated with the classification, yet uncorrelated with each other [Hall, 1999]. C4.5, on the other hand, is an algorithm that generates a decision tree that can be used for a classification task [Quinlan, 1993]. To build that tree, C4.5 needs to select the best features among those provided; hence, we also use C4.5 as a feature selection algorithm.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Classification</title>
        <p>In the classification stage, we build a rule-based fuzzy system classifier to predict the overall sentiment orientation, positive or negative, of each document in the dataset. Building such a classifier involves creating a set of rules based on the extracted features, modeling these features as linguistic variables with fuzzy sets, and, lastly, defining an inference system.</p>
        <p>In order for the fuzzy sets to model the data appropriately, we first identify outlier values in the features and limit the range of feature values. To do this, we used the three-sigma rule [Kazmier, 2004] to select outlier values lying beyond three standard deviations from the mean of a feature, an interval within which 99.73% of the values of a normal distribution lie. Outlier values outside this range were clipped to the nearest extreme of the accepted range.</p>
        <p>Now, with the input range standardized for every feature, we can define the fuzzy sets [Zadeh, 1965] that model our input and output variables. We decided to use triangular fuzzy sets. The first approach was to use three fuzzy sets for the input (low, medium, and high) and two sets for the output (negative and positive), uniformly distributed along the feature value range. Another approach was to use only two sets for the input, removing the medium fuzzy set.</p>
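        <p>A minimal sketch of triangular membership functions and the two-set partition, assuming a feature range normalized to [0, 1]:</p>

```python
# Sketch: triangular membership functions and a uniform two-set input
# partition ("Low" peaking at the range minimum, "High" at the maximum).

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a <= b <= c)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b) if c > b else 1.0

def low(x):   # peak at 0, vanishes at 1
    return tri(x, 0.0, 0.0, 1.0)

def high(x):  # peak at 1, vanishes at 0
    return tri(x, 0.0, 1.0, 1.0)

print(low(0.25), high(0.25))
```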
        <p>Once the fuzzy variables were modeled from fuzzy sets, the next step was to build our fuzzy rule base using Wang-Mendel fuzzy rule generation [Wang and Mendel, 1992]. With the previously specified fuzzy sets, this rule generation method takes each data instance in the dataset, determines its membership degrees in all fuzzy sets, and builds a rule from the fuzzy sets with the highest membership degrees for each input-output pair.</p>
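        <p>The Wang-Mendel procedure can be sketched as follows; the product used as the rule degree and the toy linear fuzzifier are simplifying assumptions of ours, while the set-picking and conflict-resolution-by-degree steps follow the method as described.</p>

```python
# Sketch of Wang-Mendel rule generation: for each training pair, pick the
# fuzzy set with the highest membership for every input variable, and keep
# only the highest-degree rule when antecedents conflict.

def wang_mendel(samples, fuzzify):
    """samples: list of (feature_vector, class_label).
    fuzzify: maps a value to a {set_name: membership} dict."""
    rules = {}
    for features, label in samples:
        antecedent, degree = [], 1.0
        for value in features:
            memberships = fuzzify(value)
            best = max(memberships, key=memberships.get)
            antecedent.append(best)
            degree *= memberships[best]    # rule degree: product of memberships
        key = tuple(antecedent)
        if key not in rules or degree > rules[key][1]:
            rules[key] = (label, degree)   # conflict resolution by degree
    return {k: v[0] for k, v in rules.items()}

def fuzz(v):  # toy two-set fuzzifier on [0, 1]
    return {"Low": 1.0 - v, "High": v}

print(wang_mendel([([0.9], "positive"), ([0.1], "negative")], fuzz))
```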
        <p>The generated fuzzy rule base, along with the specified fuzzy sets, is then used by a fuzzy inference mechanism to determine the document polarity class. The mechanisms used were the General Fuzzy Reasoning Method (GFRM) and the Classic Fuzzy Reasoning Method (CFRM) [Cordon et al., 1999].</p>
        <p>In this classification process, each document feature vector is evaluated by all fuzzy rules, and a compatibility degree is produced for each rule. CFRM picks the rule with the maximum compatibility degree and assigns that rule's output class to the document. GFRM, on the other hand, takes the maximum average compatibility degree between the two possible classes, positive and negative: it calculates the average degree among all rules with negative output and among all rules with positive output, and assigns to the document the class with the maximum average compatibility degree.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Evaluation</title>
        <p>In order to evaluate our opinion classification approach, we apply 10-fold cross-validation. As measures of classification performance, accuracy, recall, precision, and F1 score were chosen. Accuracy is the ratio of documents classified correctly to the total number of documents being classified. Recall is the ratio of documents correctly classified into a category to the total number of documents truly belonging to that category; it indicates the ability to recall items in the category. Precision is the ratio of documents correctly classified into a category to the total number of documents classified into that category. The F1 score considers both precision and recall, and is often regarded as a weighted average of the two [Chaovalit and Zhou, 2005].</p>
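        <p>The four measures can be sketched for one reference category (here "positive"):</p>

```python
# Sketch of the evaluation measures for one reference category:
# accuracy, precision, recall, and F1.

def scores(y_true, y_pred, positive="positive"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

y_true = ["positive", "positive", "negative", "negative"]
y_pred = ["positive", "negative", "negative", "positive"]
print(scores(y_true, y_pred))
```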
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>In this section we describe and discuss our experiments and their results. We aim not only to compare the best classification accuracies but also to discuss the contexts in which the classifiers produce better or worse results.</p>
      <sec id="sec-4-1">
        <title>Datasets</title>
        <p>We performed our experiments on the two datasets described earlier. Each dataset consists of 2000 reviews previously classified in terms of overall orientation as either positive or negative (1000 positive and 1000 negative reviews). For the Amazon dataset, the ground truth was obtained from the customer 5-star rating: reviews with more than 3 stars were labeled positive and reviews with less than 3 stars were labeled negative; reviews with exactly 3 stars were not included in our analysis. In the movie reviews dataset, all documents were already tagged as positive or negative.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Design of experiments</title>
        <p>We focus on comparing GFRM and CFRM, varying the configuration settings and comparing classification accuracy. We also evaluate the influence of the feature selection algorithms, of the inference systems themselves, and of the number of fuzzy sets in the input.</p>
        <p>For each dataset, we performed the preprocessing, transformation, and feature creation stages as described. However, from feature selection onward, each stage was performed only on the training folds. For example, fold 1 is used as the test fold in the classification and evaluation stages, while the remaining folds are used for feature selection and to build the combined fuzzy rule base for that fold. The same process is repeated for the remaining folds, and our results are reported as the average over the test folds. Consequently, all kinds of n-grams, combined with all the transformation techniques described in this work, are passed to the feature selection stage to find out which features are better suited to represent the documents.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Feature selection algorithms evaluation</title>
        <p>To evaluate the feature selection algorithms, we start with the following settings: 3 fuzzy sets for the input and CFRM, for both datasets. Keeping these two parameters unchanged, we can evaluate the performance of the feature selection algorithms. Besides recall, precision, accuracy, and F1, we also verified the average number of features selected by each algorithm. Table (1) shows the results for the movies dataset and table (2) for the Amazon dataset.</p>
        <p>As we can see, feature selection with C4.5, using CFRM and 3 fuzzy sets in the input, obtained better overall precision and accuracy on the movie reviews dataset, using almost four times fewer features. The inverse occurs on the Amazon dataset, where CFS with CFRM performs better than C4.5. However, on the Amazon dataset CFS uses even more features, creating rules with six antecedents on average, making the rules less human-readable. So, since C4.5 needed just one feature, generating more readable rules, and considering accuracy as the main reference of performance despite the less balanced performance shown by the lower F1 measure, we decided to use C4.5 for both datasets.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Inference system evaluation</title>
        <p>In this subsection we evaluate the performance of the chosen inference systems, CFRM and GFRM. As in the last subsection, we fixed the remaining parameters, maintaining the C4.5 algorithm and 3 fuzzy sets in the input, to better evaluate the performance of the inference systems. Table (3) shows results for the movie reviews and table (4) for the Amazon dataset.</p>
        <p>The results show that the General Fuzzy Reasoning Method improves accuracy over the Classic Fuzzy Reasoning Method, with feature selection and fuzzy sets kept unchanged. The F1 score also shows a better balance between precision and recall with GFRM. In this classification task with only two classes, considering the entire set of rules of a class is a better approach than using only the one rule with the highest degree. Hence, GFRM is our choice to achieve better results in this work.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Evaluation of fuzzy sets quantity</title>
        <p>Throughout the last subsections, we have seen the results using 3 fuzzy sets to model our linguistic variables. Following the decision to pick C4.5 to reduce the complexity of the rules and make them more human-readable, we tried to reduce the number of fuzzy sets, using only the "Low" and "High" fuzzy sets. Table (5) shows the results obtained for the movies dataset and table (6) for the Amazon dataset.</p>
        <p>Table (5) shows that accuracy and F1 were significantly improved by removing a fuzzy set, specifically the "medium" one, leaving the "low" and "high" fuzzy sets. Also, between the movies and Amazon datasets, even if only slightly, the best overall results are on movies. This is especially interesting because movie reviews are often reported as the most difficult type of review to classify [Turney, 2002; Pang and Lee, 2004; Chaovalit and Zhou, 2005; Ohana and Tierney, 2009].</p>
        <p>For both datasets, the single feature selected by C4.5 was the difference between the sum of positive and the sum of negative unigrams and bigrams composed of adjectives and adverbs. With this single feature, we could classify close to 70% of the movie reviews and Amazon reviews with two simple, human-readable rules generated by the Wang-Mendel method:</p>
        <p>IF the difference between the sum of positive and negative unigrams and bigrams composed of adjectives and adverbs is HIGH, THEN POLARITY is POSITIVE.
IF the difference between the sum of positive and negative unigrams and bigrams composed of adjectives and adverbs is LOW, THEN POLARITY is NEGATIVE.</p>
      </sec>
      <sec id="sec-4-6">
        <title>More results</title>
        <p>Although we used the Amazon dataset presented in [Wang et al., 2011] to test and evaluate our work, the evaluation in that paper concerned rating prediction rather than classification, making any comparison improper. The same cannot be said of the Cornell Movie Reviews dataset: it was pre-processed by its authors and has been used in that form by many other papers. Hence, we compare our results with those papers that have used the Cornell Movie Reviews dataset.</p>
        <p>Our work is comparable to [Ohana and Tierney, 2009] and [Taboada et al., 2008], which used exactly the same dataset and, like ours, are not domain dependent. They reported 69.35% and 76% accuracy, respectively. It is important to note that these works do not apply a fuzzy approach, and [Taboada et al., 2008] uses many steps different from our work, such as its opinion lexicon (they manually created their own) and an entirely different intensifier set, among others. [Ohana and Tierney, 2009], on the other hand, uses many elements related to this work, such as SentiWordNet and many similar or identical document features.</p>
        <p>We can cite other papers that used a previous version of this movie dataset (differing in size), such as [Ohana et al., 2011], which presented 69.9% accuracy. Concerning papers that present a fuzzy approach, we could not find any that report results closely related to this work.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and further works</title>
      <p>This work proposed and evaluated an automated fuzzy opinion mining system to classify the overall sentiment orientation of a document's text. Our proposal uses the Wang-Mendel method [Wang and Mendel, 1992] to generate fuzzy rules based on the best-suited among almost 70 features extracted and selected from documents. We achieved promising results, reaching 71.25% accuracy in a 10-fold cross-validation.</p>
      <p>Our work is probably the first to apply Fuzzy Logic and the Wang-Mendel method to opinion mining, presenting results on datasets from previous works. Moreover, our results are comparable to previous works that apply non-fuzzy techniques. We also classified documents with human-readable rules using simple fuzzy sets, such as low, high, positive, and negative. We contribute as well to the investigation of features that can be relevant to describe and discriminate documents.</p>
      <p>We have reported initial results from an ongoing research effort.
As future work, we plan several improvements, such
as:</p>
      <p>Build a better set of intensifiers and evaluate their
influence on the final results;
Improve negation detection and how it is applied;
Improve how the fuzzy sets are modeled for inputs from
document features;
Investigate more features that could better represent and
distinguish documents;
Experiment with other feature selection techniques, to
investigate the influence of the selected features on fuzzy
rule generation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Alistair and Diana, 2005]
          <string-name>
            <given-names>Alistair</given-names>
            <surname>Kennedy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Diana</given-names>
            <surname>Inkpen</surname>
          </string-name>
          .
          <article-title>Sentiment classification of movie and product reviews using contextual valence shifters</article-title>
          .
          <source>Proceedings of FINEXIN</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Baccianella et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Baccianella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Esuli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <article-title>Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</article-title>
          .
          <source>In LREC</source>
          , volume
          <volume>10</volume>
          , pages
          <fpage>2200</fpage>
          -
          <lpage>2204</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2a">
        <mixed-citation>
          [Ballhysa and Asilkan, 2012]
          <string-name>
            <given-names>Elton</given-names>
            <surname>Ballhysa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ozcan</given-names>
            <surname>Asilkan</surname>
          </string-name>
          .
          <article-title>A fuzzy approach for blog opinion mining - an application to Albanian language</article-title>
          .
          <source>AWERProcedia Information Technology and Computer Science</source>
          ,
          <volume>1</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2b">
        <mixed-citation>
          [Brill, 1995]
          <string-name>
            <given-names>Eric</given-names>
            <surname>Brill</surname>
          </string-name>
          .
          <article-title>Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging</article-title>
          .
          <source>Computational linguistics</source>
          ,
          <volume>21</volume>
          (
          <issue>4</issue>
          ):
          <fpage>543</fpage>
          -
          <lpage>565</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2c">
        <mixed-citation>
          [Chaovalit and Zhou, 2005]
          <string-name>
            <given-names>Pimwadee</given-names>
            <surname>Chaovalit</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lina</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Movie review mining: A comparison between supervised and unsupervised classification approaches</article-title>
          .
          <source>In System Sciences, 2005. HICSS'05. Proceedings of the 38th Annual Hawaii International Conference on</source>
          , pages
          <fpage>112c</fpage>
          -
          <lpage>112c</lpage>
          . IEEE,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Cintra et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Marcos Evandro</given-names>
            <surname>Cintra</surname>
          </string-name>
          , CH de Arruda, and Maria Carolina Monard.
          <article-title>Fuzzy feature subset selection using the wang &amp; mendel method</article-title>
          .
          <source>In Hybrid Intelligent Systems</source>
          ,
          <year>2008</year>
          . HIS'08. Eighth International Conference on, pages
          <fpage>590</fpage>
          -
          <lpage>595</lpage>
          . IEEE,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Cordon et al.,
          <year>1999</year>
          ]
          <string-name>
            <given-names>Oscar</given-names>
            <surname>Cordon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Maria Jose</given-names>
            <surname>del Jesus</surname>
          </string-name>
          , and Francisco Herrera.
          <article-title>A proposal on reasoning methods in fuzzy rule-based classification systems</article-title>
          .
          <source>International Journal of Approximate Reasoning</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <fpage>21</fpage>
          -
          <lpage>45</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Esuli and Sebastiani, 2006]
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Esuli</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <article-title>Sentiwordnet: A publicly available lexical resource for opinion mining</article-title>
          .
          <source>In Proceedings of LREC</source>
          , volume
          <volume>6</volume>
          , pages
          <fpage>417</fpage>
          -
          <lpage>422</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Guerini et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Guerini</surname>
          </string-name>
          , Lorenzo Gatti, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Turchi</surname>
          </string-name>
          .
          <article-title>Sentiment analysis: How to derive prior polarities from sentiwordnet</article-title>
          .
          <source>arXiv preprint arXiv:1309.5843</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Hall, 1999]
          <string-name>
            <given-names>Mark A</given-names>
            <surname>Hall</surname>
          </string-name>
          .
          <article-title>Correlation-based feature selection for machine learning</article-title>
          .
          <source>PhD thesis</source>
          , The University of Waikato,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Karamibekr and Ghorbani, 2012]
          <string-name>
            <given-names>Mostafa</given-names>
            <surname>Karamibekr</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ali A</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          .
          <article-title>Verb oriented sentiment classification</article-title>
          .
          <source>In Web Intelligence and Intelligent Agent Technology (WIIAT)</source>
          ,
          <year>2012</year>
          IEEE/WIC/ACM International Conferences on, volume
          <volume>1</volume>
          , pages
          <fpage>327</fpage>
          -
          <lpage>331</lpage>
          . IEEE,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Kazmier, 2004]
          <string-name>
            <given-names>Leonard J</given-names>
            <surname>Kazmier</surname>
          </string-name>
          .
          <article-title>Schaum's outline of business statistics</article-title>
          .
          <source>McGraw-Hill</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Liu, 2012]
          <string-name>
            <given-names>Bing</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Sentiment analysis and opinion mining</article-title>
          .
          <source>Synthesis Lectures on Human Language Technologies</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>167</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Moraes et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Rodrigo</given-names>
            <surname>Moraes</surname>
          </string-name>
          , João Francisco Valiati, and
          <string-name>
            <given-names>Wilson P</given-names>
            <surname>Gavião Neto</surname>
          </string-name>
          .
          <article-title>Document-level sentiment classification: An empirical comparison between svm and ann</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Nadali et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>S</given-names>
            <surname>Nadali</surname>
          </string-name>
          , MAA Murad, and
          <string-name>
            <given-names>RA</given-names>
            <surname>Kadir</surname>
          </string-name>
          .
          <article-title>Sentiment classification of customer reviews based on fuzzy logic</article-title>
          .
          <source>In Information Technology (ITSim)</source>
          , 2010 International Symposium in, volume
          <volume>2</volume>
          , pages
          <fpage>1037</fpage>
          -
          <lpage>1044</lpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Ohana and Tierney, 2009]
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Ohana</surname>
          </string-name>
          and
          <string-name>
            <given-names>Brendan</given-names>
            <surname>Tierney</surname>
          </string-name>
          .
          <article-title>Sentiment classification of reviews using sentiwordnet</article-title>
          .
          <source>In 9th IT &amp; T Conference</source>
          , page
          <fpage>13</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Ohana et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Ohana</surname>
          </string-name>
          , Brendan Tierney, and
          <string-name>
            <given-names>S</given-names>
            <surname>Delany</surname>
          </string-name>
          .
          <article-title>Domain independent sentiment classification with many lexicons</article-title>
          .
          <source>In Advanced Information Networking and Applications (WAINA)</source>
          ,
          <year>2011</year>
          IEEE Workshops of International Conference on, pages
          <fpage>632</fpage>
          -
          <lpage>637</lpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Pang and Lee, 2004]
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts</article-title>
          .
          <source>In Proceedings of the 42nd annual meeting on Association for Computational Linguistics, page 271. Association for Computational Linguistics</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Pang et al.,
          <year>2002</year>
          ]
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shivakumar</given-names>
            <surname>Vaithyanathan</surname>
          </string-name>
          .
          <article-title>Thumbs up?: sentiment classification using machine learning techniques</article-title>
          .
          <source>In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-</source>
          Volume
          <volume>10</volume>
          , pages
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . Association for Computational Linguistics,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Quinlan, 1993]
          <string-name>
            <given-names>J Ross</given-names>
            <surname>Quinlan</surname>
          </string-name>
          .
          <article-title>C4.5: Programs for machine learning</article-title>
          .
          <source>Morgan Kaufmann Publishers Inc.</source>
          , San Francisco, USA,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Quirk et al.,
          <year>1985</year>
          ]
          <string-name>
            <given-names>Randolph</given-names>
            <surname>Quirk</surname>
          </string-name>
          , David Crystal, and Pearson Education
          .
          <article-title>A comprehensive grammar of the English language</article-title>
          , volume
          <volume>397</volume>
          . Cambridge Univ Press,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Taboada et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Maite</given-names>
            <surname>Taboada</surname>
          </string-name>
          , Kimberly Voll, and
          <string-name>
            <given-names>Julian</given-names>
            <surname>Brooke</surname>
          </string-name>
          .
          <article-title>Extracting sentiment as a function of discourse structure and topicality</article-title>
          .
          <source>Simon Fraser Univeristy School of Computing Science Technical Report</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Taboada et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Maite</given-names>
            <surname>Taboada</surname>
          </string-name>
          , Julian Brooke, Milan Tofiloski, Kimberly Voll, and
          <string-name>
            <given-names>Manfred</given-names>
            <surname>Stede</surname>
          </string-name>
          .
          <article-title>Lexiconbased methods for sentiment analysis</article-title>
          .
          <source>Computational linguistics</source>
          ,
          <volume>37</volume>
          (
          <issue>2</issue>
          ):
          <fpage>267</fpage>
          -
          <lpage>307</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Turney, 2002]
          <string-name>
            <given-names>Peter D</given-names>
            <surname>Turney</surname>
          </string-name>
          .
          <article-title>Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews</article-title>
          .
          <source>In Proceedings of the 40th annual meeting on association for computational linguistics</source>
          , pages
          <fpage>417</fpage>
          -
          <lpage>424</lpage>
          . Association for Computational Linguistics,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Wang and Mendel, 1992]
          <string-name>
            <given-names>L-X</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jerry M</given-names>
            <surname>Mendel</surname>
          </string-name>
          .
          <article-title>Generating fuzzy rules by learning from examples</article-title>
          .
          <source>Systems, Man and Cybernetics</source>
          , IEEE Transactions on,
          <volume>22</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1414</fpage>
          -
          <lpage>1427</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Wang et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Hongning</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yue</given-names>
            <surname>Lu</surname>
          </string-name>
          , and ChengXiang Zhai.
          <article-title>Latent aspect rating analysis without aspect keyword supervision</article-title>
          .
          <source>In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <fpage>618</fpage>
          -
          <lpage>626</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Wang,
          <year>2003</year>
          ]
          <string-name>
            <given-names>L-X</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>The wm method completed: a flexible fuzzy system approach to data mining</article-title>
          .
          <source>Fuzzy Systems</source>
          , IEEE Transactions on,
          <volume>11</volume>
          (
          <issue>6</issue>
          ):
          <fpage>768</fpage>
          -
          <lpage>782</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Wiebe, 1990]
          <string-name>
            <given-names>Janyce M</given-names>
            <surname>Wiebe</surname>
          </string-name>
          .
          <article-title>Identifying subjective characters in narrative</article-title>
          .
          <source>In Proceedings of the 13th conference on Computational linguistics-Volume</source>
          <volume>2</volume>
          , pages
          <fpage>401</fpage>
          -
          <lpage>406</lpage>
          . Association for Computational Linguistics,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Wiebe, 1994]
          <string-name>
            <given-names>Janyce M</given-names>
            <surname>Wiebe</surname>
          </string-name>
          .
          <article-title>Tracking point of view in narrative</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ):
          <fpage>233</fpage>
          -
          <lpage>287</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Wilson et al.,
          <year>2004</year>
          ] Theresa Wilson, Janyce Wiebe, and
          <string-name>
            <given-names>Rebecca</given-names>
            <surname>Hwa</surname>
          </string-name>
          .
          <article-title>Just how mad are you? finding strong and weak opinion clauses</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>4</volume>
          , pages
          <fpage>761</fpage>
          -
          <lpage>769</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [Wilson et al.,
          <year>2005</year>
          ] Theresa Wilson, Janyce Wiebe, and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          .
          <article-title>Recognizing contextual polarity in phrase-level sentiment analysis</article-title>
          .
          <source>In Proceedings of the conference on human language technology and empirical methods in natural language processing</source>
          , pages
          <fpage>347</fpage>
          -
          <lpage>354</lpage>
          . Association for Computational Linguistics,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [Zadeh, 1965]
          <string-name>
            <given-names>Lotfi A</given-names>
            <surname>Zadeh</surname>
          </string-name>
          .
          <article-title>Fuzzy sets</article-title>
          .
          <source>Information and control</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>338</fpage>
          -
          <lpage>353</lpage>
          ,
          <year>1965</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [Zadeh, 1996]
          <string-name>
            <given-names>Lotfi A</given-names>
            <surname>Zadeh</surname>
          </string-name>
          .
          <article-title>Fuzzy logic = computing with words</article-title>
          .
          <source>Fuzzy Systems</source>
          , IEEE Transactions on,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>103</fpage>
          -
          <lpage>111</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>