<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Truthful and Useful Consumer Reviews for Products using Opinion Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kalpana Algotar</string-name>
          <email>kalgotar@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ajay Bansal</string-name>
          <email>ajay.bansal@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Arizona State University</institution>
          ,
          <addr-line>Mesa AZ 85212</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>63</fpage>
      <lpage>72</lpage>
      <abstract>
        <p>Individuals and organizations rely heavily on social media these days for consumer reviews in their decision-making on purchases. However, for personal gains such as profit or fame, people post fake reviews to promote or demote certain target products as well as to deceive the reader. To get genuine user experiences and opinions, there is a need to detect such spam or fake reviews. This paper presents a study that aims to detect truthful, useful reviews and ranks them. An effective supervised learning technique is proposed to detect truthful and useful reviews and rank them, using a 'deceptive' classifier, 'useful' classifier, and a 'ranking' model respectively. Deceptive and nonuseful consumer reviews from online review communities such as amazon.com and Epinions.com are used. The proposed method first uses the 'deceptive' classifier to find truthful reviews followed by the 'useful' classifier to find whether a review is useful or not. Manually labeling individual reviews is very difficult and time consuming. We incorporate a dictionary that makes it easy to label reviews. We present the experimental results of our proposed approach using our dictionary with 'deceptive' classifier and 'useful' classifier.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Classification</kwd>
        <kwd>Spam Review Detection</kwd>
        <kwd>Opinion Mining</kwd>
        <kwd>Supervised Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Nowadays, consumers looking to buy a product increasingly rely on user-generated
online reviews to make or reverse their purchase decisions. Positive reviews of a
product greatly influence a person’s decision to buy it, whereas someone who sees
many negative reviews will most likely choose a different product. Positive reviews
bring significant profit and advertising to the seller and their organization. This in
turn creates a market for incentivizing opinion spam, and has resulted in more and
more people trying to game the system by writing fake reviews to harm or promote
certain products or services. A fake review is either a positive review written by the
business owners themselves (or people they contract to write reviews) or a negative
review written by a business’s competitors. Such fake reviews deliberately mislead
readers about certain entities (e.g., products) in order to promote them or to damage
their reputation.</p>
      <p>
        Opinion spamming refers to writing fake reviews that try to deliberately mislead
human readers. Spam research in the context of online reviews has focused primarily
on detection. Cornell University developed a model to spot fake and non-fake
reviews for hotels [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and other researchers have also worked on detecting fake reviews and spam
reviewers. Recent studies, however, show
that opinion spam is not easily identified by human readers [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In particular, humans
have a difficult time identifying deceptive messages in consumer reviews. We
decided to work on the same issue for products, taking a different approach to make the
process easier. In this approach, we chose the Cornell model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] as a base to prepare our
own dictionary for fake and non-fake reviews. Our automated approach reliably
labels reviews as truthful vs. deceptive, and a second approach labels them as
useful vs. not-useful using readers’ ratings of consumers’ reviews. We train an SVM text
classifier using a corpus of truthful and deceptive as well as useful and not-useful
reviews from Amazon and Epinions. We applied our approach to the domain of camera
reviews and present the results.
      </p>
      <p>The rest of the document is organized as follows: Section 2 presents related work.
Background material related to this project is presented in Section 3. Our proposed
approach and its implementation is presented in Section 4. Section 5 presents the
experiments and analysis followed by conclusions and future work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Web spam and email spam have been investigated extensively. The objective of
Web spam is to make search engines rank the target pages high in order to attract
people to visit these pages. Web spam can be categorized into two main types: content
spam and link spam. Link spam is spam on hyperlinks that are placed between pages,
which does not exist in reviews as usually there are no links within them. Content
spam tries to add irrelevant or only remotely relevant words to target pages to fool search
engines into ranking them high. Another related research area is email spam [
        <xref ref-type="bibr" rid="ref14 ref5 ref8">5, 8, 14</xref>
        ],
which is also quite different from review spam. Email spam usually refers to
unsolicited commercial advertisements. Although such advertisements exist in reviews,
they are not as frequent as in emails, and they are also relatively easy to detect. Deceptive
opinion spam is much harder to deal with. Below, we present the different approaches
taken to opinion spam detection.
      </p>
      <sec id="sec-2-1">
        <title>2.1 Review Spam Detection</title>
        <p>
          A preliminary study was reported in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] to study spam reviews and spam detection
based on finding duplicates and classification. That study proposed to treat duplicate
reviews as positive training examples (labeled fake), and the rest of the reviews as
negative training examples (labeled non-fake). The remaining spam (fake)
reviews were detected using 2-class classification (spam vs. non-spam). In addition,
they found that 52% of the highly ranked non-duplicate reviews had more than 1800
words, much longer than the average length of a normal review, and were regarded as
spam reviews. A more in-depth investigation was given in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] where three types of
spam review were identified, namely untruthful reviews (reviews that promote or
demote products), reviews on brands but not products, and non-reviews (e.g.,
advertisements). By representing a review using a set of review-, reviewer-, and
product-level features, classification techniques were used to assign spam (fake) labels to
reviews. In particular, untruthful review detection was performed by using duplicate
reviews as the positive training examples (fake) and the rest of the reviews as negative
training examples (non-fake); for the remaining types, manual labeling was done. In
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], a neural-network-based model was used for representation learning of reviews.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Reviewer Spam Detection</title>
        <p>
          Some of the related research addresses the problem of review spammer detection, or
finding users who are the source of spam reviews. Reviews usually come with ratings.
Detecting unfair ratings has been studied in several works including [
          <xref ref-type="bibr" rid="ref10 ref4">4, 10</xref>
          ]. The
techniques used include: (a) clustering ratings into unfairly high ratings and unfairly
low ratings, and (b) using third-party ratings on the producers of ratings, where ratings
from less reputable producers are deemed unfair. Once unfair ratings are found,
they can be removed to restore a fair item evaluation system. These works did not
address review spammer detection directly on the reviews. They usually did not
conduct evaluation of their techniques on real data.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Helpful Review Detection and Prediction</title>
        <p>
          Review helpfulness prediction is closely related to the review spam detection described
above. A helpful review is one that is informative and useful to readers. The
purpose of predicting review helpfulness is to facilitate review sites in providing feedback
to review contributors and to help readers choose and read high-quality reviews. A
classification approach to solving helpfulness prediction using review content and
meta-data features was developed in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The meta-data features used are review's
rating and the difference between the review rating and the average rating of all
reviews of the product. Liu et al. propose deriving features from review content that
correspond to the informativeness, readability, and subjectivity aspects of the review
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. These features are then used to train a review helpfulness classification method.
        </p>
        <p>
          Amazon.com allows users to vote if a review is helpful or not. These helpfulness
votes are manually assigned and are thus subjective and possibly abused.
Danescu-Niculescu-Mizil et al. found a strong correlation between the proportion of helpful
votes for reviews and the deviation of review ratings from the average ratings of
products [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This correlation illustrates that helpful votes are generally consistent with
average ratings. The study is however conducted at the collection level and does not
provide evidence to link spam and helpfulness votes. Ott and others [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] presented a
framework for estimating the prevalence of deception in online review communities. In
this task, they paid one US dollar ($1) to each of 400 unique Mechanical Turk workers
to write a fake positive (5-star) review for one of the 20 most heavily-reviewed
Chicago hotels on TripAdvisor. For consistency with labeled deceptive review data,
they simply labeled as truthful all positive (5-star) reviews of the 20 previously chosen
Chicago hotels.
        </p>
        <p>Detecting spam and predicting helpfulness are two separate problems since
not-useful reviews are not necessarily fake. A poorly written review may be not-useful but
is not fake. Spam reviews usually target specific products while not-useful votes may
be given to any products. Given the motive driven nature of spamming activities,
review spam detection will therefore require an approach different from not-useful
review detection. Our proposed technique aims to detect truthful, useful reviews and
provide a ranking of the reviews.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Background</title>
      <sec id="sec-3-1">
        <title>3.1 Supervised Learning Methods:</title>
        <p>A computer system learns from training data that represents some “past
experiences” of an application domain. In this section, we briefly describe the various
classification methods used in order to categorize reviews into deceptive, truthful and
useful, not-useful. Classification involves labeling of the data (observations,
measurements) with pre-defined classes. We have used three supervised learning
algorithms: Support Vector Machine, Naïve Bayes, and K-Nearest Neighbor.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Support Vector Machines:</title>
        <p>
          Support Vector Machines [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] are supervised learning methods used for classification,
as well as regression. The advantage of Support Vector Machines is that they can make
use of certain kernels to transform the problem, such that linear
classification techniques can be applied to non-linear data. Applying the kernel equations arranges the
data instances within the multi-dimensional space such that there is a
hyperplane separating data instances of one kind from those of another. The kernel
equation may be any function that transforms the linearly non-separable data in one
domain into another domain where the instances become linearly separable. Kernel
equations may be linear, quadratic, Gaussian, or anything else that achieves this
purpose. Once the data is divided into two distinct categories, the
aim is to find the best hyperplane separating the two types of instances. This
hyperplane is important because it decides the target variable value for future predictions.
We should choose a hyperplane that maximizes the margin between the support
vectors on either side of the plane, as displayed in Figure 1.
        </p>
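The kernel idea can be illustrated in miniature without any SVM library: the XOR-labeled points below are not linearly separable in the original two dimensions, but adding the product feature x1*x2 lifts them into a space where a single hyperplane separates the classes. This is a hand-rolled sketch of the concept, not our RapidMiner SVM setup; the feature map and hyperplane coefficients are illustrative.

```python
# XOR-labeled points: not linearly separable in the original 2-D space.
points = {(0, 0): -1, (1, 1): -1, (1, 0): +1, (0, 1): +1}

def feature_map(x1, x2):
    # lift each point into 3-D by adding the product feature x1*x2
    return (x1, x2, x1 * x2)

def linear_classifier(z):
    # a single plane in the lifted space: x1 + x2 - 2*(x1*x2) - 0.5 = 0
    x1, x2, x1x2 = z
    return +1 if x1 + x2 - 2 * x1x2 - 0.5 > 0 else -1
```

After the lift, every XOR point falls on the correct side of the plane, which is exactly what a kernel lets an SVM do implicitly.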
      </sec>
      <sec id="sec-3-3">
        <title>Naïve Bayes Classifier:</title>
        <p>
          The Naïve Bayes classifier [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is based on the Bayes rule of conditional probability. It
makes use of all the attributes contained in the data, and analyses them individually as
though they are equally important and independent of each other. For example,
consider that the training data consists of various animals (for example: elephants,
monkeys, and giraffes), and our classifier has to classify any new instance that it
encounters. We know that elephants have attributes like they have a trunk, huge tusks,
a short tail, are extremely big, etc. Monkeys are short in size, jump around a lot, and
can climb trees; whereas giraffes are tall, and have a long neck and short ears.
        </p>
        <p>The Naïve Bayes classifier will consider each of these attributes separately when
classifying a new instance. So, when checking to see if the new instance is an elephant,
the Naïve Bayes classifier will not check whether it has a trunk and has huge tusks and
is large. Rather, it will separately check whether the new instance has a trunk, whether
it has tusks, whether it is large, etc. It works under the assumption that one attribute
works independently of the other attributes contained by the sample. In our
experiments, it is seen that the Naïve Bayes classifier shows a drop in performance,
when compared with K-NN and Support Vector Machines.</p>
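A tiny categorical Naive Bayes classifier in the spirit of the animal example above can be sketched as follows; the training data, attribute names, and the use of Laplace smoothing are illustrative assumptions, not details from our experiments.

```python
from collections import Counter

# Illustrative training data: each sample is (attributes, class label).
train = [
    ({"trunk": "yes", "size": "big",   "climbs": "no"},  "elephant"),
    ({"trunk": "yes", "size": "big",   "climbs": "no"},  "elephant"),
    ({"trunk": "no",  "size": "small", "climbs": "yes"}, "monkey"),
    ({"trunk": "no",  "size": "small", "climbs": "yes"}, "monkey"),
    ({"trunk": "no",  "size": "tall",  "climbs": "no"},  "giraffe"),
]

def predict(sample):
    classes = Counter(label for _, label in train)
    best_class, best_score = None, float("-inf")
    for cls, cls_count in classes.items():
        score = cls_count / len(train)          # class prior P(class)
        for attr, value in sample.items():
            # each attribute is treated independently (the "naive" assumption)
            matches = sum(1 for feats, label in train
                          if label == cls and feats[attr] == value)
            score *= (matches + 1) / (cls_count + 2)   # Laplace smoothing
        if score > best_score:
            best_class, best_score = cls, score
    return best_class
```

Each attribute contributes its own conditional probability to the product, mirroring the separate trunk/tusks/size checks described above.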
      </sec>
      <sec id="sec-3-4">
        <title>K-Nearest Neighbor:</title>
        <p>
          The K-nearest neighbor [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] algorithm is a method for classifying objects based on
          ] algorithm is a method for classifying objects based on the
closest training examples in the feature space. Unlike the previous learning
methods, K-NN does not build a model from the training data. No explicit model of
the probability density of the classes is formed; each point is estimated locally from the
surrounding points. The k-nearest-neighbor classifier is commonly based on the
Euclidean distance between a test sample and the training samples. Given a
test instance, a distance metric is computed between the test instance and all training
instances, and the instance's k nearest neighbors are then selected from the training data,
as depicted in the following figure.
We chose SVM because it is an immensely powerful classifier and is well suited to
2-class problems. In addition, we experimentally compared the performance of SVM, Naïve Bayes, and
K-NN and concluded that SVM has very good predictive power.
        </p>
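A minimal k-nearest-neighbor classifier following the distance-then-vote procedure above might look like this; the training points, labels, and choice of k are illustrative.

```python
import math
from collections import Counter

# Illustrative training data: ((x, y) point, class label).
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.8), "B")]

def knn_predict(point, k=3):
    # compute the Euclidean distance from the test instance to every
    # training instance and keep the k nearest
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    # majority vote among the k nearest neighbors decides the class
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```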
      </sec>
      <sec id="sec-3-5">
        <title>3.2 RapidMiner and Rapid Analytics:</title>
        <p>
          The Community Edition of RapidMiner [
          <xref ref-type="bibr" rid="ref12 ref2">2, 12</xref>
          ] (formerly known as "Yale") is an
open source toolkit for data mining. It provides the ability to easily define analytical
steps and generate graphs. It is an environment for machine learning and data mining
experiments. RapidMiner provides a GUI which generates an XML (eXtensible
Markup Language) file that defines the analytical processes the user wishes to apply to
the data. This file is then read by RapidMiner to run the analyses automatically. While
these are running, the GUI can also be used to interactively control and inspect running
processes. RapidMiner can be used for text mining, multimedia mining, feature
engineering, data stream mining and tracking drifting concepts, development of
ensemble methods, and distributed data mining. RapidMiner provides data loading and
transformation (ETL), data preprocessing and visualization, modeling, evaluation, and
deployment. RapidMiner was rated as the fifth most used text mining software (6%) by
Rexer’s Annual Data Miner Survey in 2010. It is implemented in JAVA and available
under GPL among other licenses. Internal XML representation ensures standardized
interchange format of data mining experiments. GUI, command-line mode, and JAVA
API allow invoking RapidMiner from other programs. In RapidMiner, several plugins
are available for text processing, web mining, etc., as well as a broad collection of data
mining algorithms such as SVM, decision trees, and self-organizing maps.
        </p>
        <p>
          Rapid Analytics [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is the first open source business analytics server available.
Rapid Analytics was built around the most widely used data mining solution
RapidMiner and adds features like remote execution, scheduled processes, quick web
service definitions, and a complete web-based report designer. Rapid Analytics is the
new data mining server solution that uses RapidMiner both as a data mining engine
and as a front-end to design data mining processes. We chose RapidMiner and Rapid
Analytics for our implementation, described in the next section. First, they contain a broad
collection of plugins as well as a large number of supervised learning methods. Second,
classification engines created in RapidMiner can be stored in a remote repository and
executed remotely on the Rapid Analytics server at regular time intervals.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Proposed Technique</title>
      <p>In this section, we present our approach, which includes (i) preparing a custom
dictionary to label reviews as truthful or deceptive; (ii) a ‘deceptive’ classifier to
predict whether testing data is deceptive or truthful; (iii) a PHP script to label reviews as useful
or not-useful; (iv) a ‘useful’ classifier to predict whether testing data is useful or not-useful;
and (v) a ‘ranking’ model to rank the reviews.</p>
      <sec id="sec-4-1">
        <title>4.1 Spam Review Detection:</title>
        <p>In general, spam review detection can be regarded as a classification problem with
two classes, fake and non-fake. Machine learning models can be built to classify each
review as deceptive or truthful. To build a classification model, we need labeled
training examples of both classes. There was no labeled dataset for product opinion
spam prior to this project. Recognizing whether a review is deceptive opinion spam is
extremely difficult to do by manually reading the review, because one can
carefully craft a spam review that reads just like any other genuine review. We prepared
the dictionary of fake and non-fake reviews by adding knowledge from the dataset
available at http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip and
by using the Cornell model. To prepare the dictionary, we passed reviews through the Cornell model,
which tokenizes words based on special characters (such as space, full stop, exclamation
mark, and question mark) in each sentence and puts each word into an appropriate
category along with a weight: high positive (+3), moderate positive (+2), low
positive (+1), neutral (0), high negative (-3), moderate negative (-2), or low negative
(-1). Some words from the neutral category of the Cornell model are important for our
domain, and we placed those important words into a positive or negative category with a
weight from http://www.cs.uic.edu/~liub/FBS/CustomerReviewData. After putting
each word of each sentence into one of the six categories along with a weight, we
calculated the final weight for each unique word based on our formula as follows:
Weight of each word = (count of the word in a category) ∗ (category weight), summed over categories.</p>
        <p>More precisely, we can say that
Weight of each non-fake word = Chp ∗ 3 + Cmp ∗ 2 + Clp ∗ 1</p>
        <p>where Chp is the count of a particular word in the high positive category, Cmp is the count
of a particular word in the moderate positive category, and Clp is the count of a particular word in
the low positive category.</p>
        <p>Weight of each fake word = Chn ∗ (−3) + Cmn ∗ (−2) + Cln ∗ (−1)</p>
        <p>where Chn is the count of a particular word in the high negative category,
Cmn is the count of a particular word in the moderate negative category, and
Cln is the count of a particular word in the low negative category.
Using the above formula, we prepared two wordlists for fake and non-fake reviews along
with their corresponding weights. We invoke that dictionary through a PHP script to
label each review as fake or non-fake based on the final summation of the weights of all
words in the review. If the final summation of weights for the fake and non-fake words of
a review is positive, the review is labeled “non-fake”; otherwise, it is labeled “fake”.</p>
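A minimal sketch of this weighting-and-labeling scheme (in Python rather than PHP; the example words and category counts are hypothetical, not entries from our actual wordlists):

```python
# Per-category weights: high/moderate/low positive and negative.
CATEGORY_WEIGHTS = {"hp": 3, "mp": 2, "lp": 1, "hn": -3, "mn": -2, "ln": -1}

def word_weight(category_counts):
    # final weight of one word: sum of (count in category * category weight)
    return sum(CATEGORY_WEIGHTS[c] * n for c, n in category_counts.items())

# Hypothetical dictionary entries (words and counts are illustrative).
dictionary = {
    "excellent": word_weight({"hp": 4, "mp": 1}),   # 4*3 + 1*2 = 14
    "decent":    word_weight({"lp": 3}),            # 3*1 = 3
    "terrible":  word_weight({"hn": 2, "mn": 1}),   # 2*(-3) + 1*(-2) = -8
}

def label_review(text):
    # sum the weights of all dictionary words appearing in the review
    total = sum(dictionary.get(word, 0) for word in text.lower().split())
    return "non-fake" if total > 0 else "fake"
```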
      </sec>
      <sec id="sec-4-2">
        <title>Building Models Using LibSVM</title>
        <p>The first component of the framework is the ‘deception’ classifier, which predicts
whether each unlabeled review is non-fake (truthful) or fake (deceptive). As mentioned
previously, we labeled training reviews as deceptive or truthful so that we could train
the ‘deception’ classifier using a supervised learning algorithm. We tried three supervised
learning algorithms, support vector machine (SVM), K-NN, and Naive Bayes, to classify
product reviews using two pre-classified training sets: deceptive and truthful. Our work
has shown that SVM trains and performs well in deception detection tasks. We found
that SVM creates a hyperplane that best separates the two classes, and it outperforms the
other two classifiers. We trained SVM classifiers using the RapidMiner software
package. Results of the evaluation are presented in the next section.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.2 Useful Review Detection:</title>
        <p>In general, useful review detection can be regarded as a classification problem with
two classes, useful and not-useful. Machine learning models can be built to classify
each review as useful or not-useful. To build a classification model, we need labeled
training examples of both useful and not-useful class. There was no labeled dataset for
product opinions as useful and not-useful at the time of this project (to the best of our
knowledge). However, to recognize whether a review is useful or not, we considered readers’
ratings of a consumer’s review. Using PHP, we labeled a review as useful if the readers’ rating
is greater than 40%, or as not-useful if the readers’ rating is less than 40%.</p>
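This threshold rule is simple enough to sketch directly (the function name is illustrative; ratings exactly at 40% are treated as not-useful here, a detail our description leaves open):

```python
def label_usefulness(reader_rating_percent):
    # a review is labeled useful when the readers' helpfulness rating
    # exceeds the 40% threshold, and not-useful otherwise
    return "useful" if reader_rating_percent > 40 else "not-useful"
```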
      </sec>
      <sec id="sec-4-4">
        <title>Building Model Using LibSVM</title>
        <p>The second component of the framework is ‘useful’ classifier, which predicts
whether each unlabeled review is Useful or Not-Useful. As mentioned above, we
labeled training data, so that we can train ‘useful’ classifiers using a supervised
learning algorithm. We tried different supervised algorithms like Naïve Bayes, K-NN,
and SVM. Our work has shown that SVM trains and performs well in useful or
not-useful detection tasks compared to the other algorithms. We trained SVM classifiers using
the RapidMiner software package. Results of the evaluation are presented in the
next section.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.3 Ranking Reviews:</title>
        <p>The last component of the framework is the ‘Ranking’ Model. This model takes the
output from the ‘deceptive’ classifier and ‘useful’ classifier as input to rearrange the
reviews based on weight (confidence) of fake, non-fake, useful, and not-useful. Higher
sort priority is given to deceptive/truthful reviews and then to useful/not-useful
reviews. Results of evaluation of the ‘ranking’ model are presented in the next section.</p>
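The ranking step can be sketched as a sort over the two classifiers' confidences, with the truthful/deceptive confidence taking priority over the useful/not-useful confidence; the review records and confidence values below are hypothetical classifier outputs.

```python
# Hypothetical classifier outputs: confidence of being truthful and useful.
reviews = [
    {"id": 1, "truthful_conf": 0.9, "useful_conf": 0.4},
    {"id": 2, "truthful_conf": 0.9, "useful_conf": 0.8},
    {"id": 3, "truthful_conf": 0.6, "useful_conf": 0.9},
]

# Higher sort priority on the truthful/deceptive confidence,
# breaking ties with the useful/not-useful confidence.
ranked = sorted(reviews,
                key=lambda r: (r["truthful_conf"], r["useful_conf"]),
                reverse=True)
```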
      </sec>
      <sec id="sec-4-6">
        <title>4.4 Implementation:</title>
        <p>For the implementation of our approach we used RapidMiner, XAMPP, Rapid
Analytics tools. We created a PHP script to collect product (e.g. camera) reviews
from amazon and Epinion sites. To label training data, we prepared dictionary of
words for deceptive/truthful reviews and labeled the reviews by using the dictionary
in the PHP script. We created another PHP script to label the training set as useful or
not-useful based on readers’ ratings. We utilized the RapidMiner tool and its supervised
learning methods, e.g. SVM, for building the “deceptive” classification model and “useful”
classification model as well as “ranking” model. For testing purpose, we designed
HTML page to enter a product review. This review is stored in a database and when
the RapidMiner process is executed, it will fetch reviews from the database and based
on the classifier it is processed and results (reviews with classification) are stored in
the database. Using the HTML page, the result of both classifiers can be displayed.
</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Experimental Results</title>
      <p>For evaluation, we trained both our models using different types of datasets such as
balanced and imbalanced. The training dataset for ‘deceptive’ classifier had 1348
reviews in the imbalanced dataset and 140 reviews in the balanced dataset. The
training dataset for the ‘useful’ classifier had 5003 reviews in the balanced dataset and
5103 in the imbalanced dataset. The following experimental results show that the
‘deceptive’ classifier gives better performance on the imbalanced dataset while the ‘useful’
classifier performs well on the balanced dataset with the SVM classification algorithm. We
calculated the performance of our models using the following formula.</p>
      <p>Performance, G = √(Sn ∗ Sp)
where Sn is the sensitivity and Sp is the specificity:</p>
      <p>Sn = TP / (TP + FN) and Sp = TN / (TN + FP)</p>
      <p>where TP is the number of true positives,
TN is the number of true negatives,
FP is the number of false positives, and
FN is the number of false negatives.
We observed that SVM trained and performed well in deception detection tasks.
We found that SVM creates a hyperplane that best separates the two classes, and it
outperforms the other two classifiers, with accuracy peaking at about 66%.
Cross-validated classifier performance results are presented in Table 1.</p>
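As a sketch, the G measure can be computed directly from the confusion-matrix counts; the counts in the test below are illustrative, not our experimental results.

```python
import math

def g_performance(tp, tn, fp, fn):
    # G = sqrt(Sn * Sp): geometric mean of sensitivity and specificity
    sn = tp / (tp + fn)   # sensitivity: recall on the positive class
    sp = tn / (tn + fp)   # specificity: recall on the negative class
    return math.sqrt(sn * sp)
```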
      <p>We tried different supervised algorithms like Naïve Bayes, K-NN, and SVM for
“Useful” classifier. Evaluation results show that SVM trained and performed well
in useful or not-useful detection tasks compared to the other algorithms. This
approach was evaluated to be nearly 78% accurate at detecting useful or
not-useful reviews in a balanced dataset. Cross-validated classifier performance results are
presented in Table 2. Results of the ranking model are presented in Table 3.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Summary and Future Work</title>
      <p>As individuals and businesses increasingly use reviews in their
decision-making, it is critical to detect spam reviews. We presented our approach for detecting
spam and not-useful reviews and for prioritizing reviews based on their weight
(confidence). The evaluation shows that the ‘deceptive’ classifier and the ‘useful’ classifier are
nearly 66% and 78% accurate, respectively. Various supervised learning methods were
used, and we observed that SVM worked best, as it is an immensely powerful classifier
that is well suited to 2-class problems. In addition, we experimentally compared
SVM, Naïve Bayes, and K-NN in performance and concluded that SVM has very
good predictive power. Online reviews are worthless if they are not honest opinions.
Our models can give users an idea of which reviews are non-fake and useful, as well
as which reviews should be completely ignored in product purchase decision-making,
thereby helping them choose the right product. Future work might explore other methods
for labeling online reviews, and will focus on improving accuracy and on more
sophisticated techniques for detecting spam reviews.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Kotsiantis</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Zaharakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pintelas</surname>
          </string-name>
          .
          <article-title>"Supervised machine learning: A review of classification techniques." Emerging artificial intelligence applications in computer engineering 160 (</article-title>
          <year>2007</year>
          ):
          <fpage>3</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          , R. Klinkenberg, eds. RapidMiner:
          <article-title>Data mining use cases and business analytics applications</article-title>
          . CRC Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Danescu-Niculescu-Mizil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kossinets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>How opinions are received by online communities: a case study on amazon.com helpfulness votes</article-title>
          .
          <source>In 18th international conference on World Wide Web (WWW)</source>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>150</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dellarocas</surname>
          </string-name>
          .
          <article-title>Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior</article-title>
          .
          <source>In ACM Conference on Electronic Commerce (EC)</source>
          , pp.
          <fpage>150</fpage>
          -
          <lpage>157</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Fette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sadeh-Koniecpol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomasic</surname>
          </string-name>
          .
          <article-title>Learning to Detect Phishing Emails</article-title>
          .
          <source>In Proceedings of International Conference on World Wide Web (WWW)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jindal</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Opinion spam and analysis</article-title>
          .
          <source>In WSDM</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.-M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pantel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chklovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pennacchiotti</surname>
          </string-name>
          .
          <article-title>Automatically assessing review helpfulness</article-title>
          .
          <source>In Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pp.
          <fpage>423</fpage>
          -
          <lpage>430</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jindal</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Analyzing and Detecting Review Spam</article-title>
          .
          <source>In IEEE Intl. Conference on Data Mining (ICDM)</source>
          , pp.
          <fpage>547</fpage>
          -
          <lpage>552</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>Low-quality product review detection in opinion summarization</article-title>
          .
          <source>In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Smyth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          .
          <article-title>Distortion as a validation criterion in the identification of suspicious reviews</article-title>
          .
          <source>In Proceedings of the First Workshop on Social Media Analytics (SOMA) at SIGKDD</source>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>13</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cardie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hancock</surname>
          </string-name>
          .
          <article-title>Estimating the prevalence of deception in online review communities</article-title>
          .
          <source>In Proceedings of the 21st international conference on World Wide Web (WWW)</source>
          , pp.
          <fpage>201</fpage>
          -
          <lpage>210</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] RapidMiner: http://www.softwarenhardware.com/tag/rapidminer-tutorial/ [Last Accessed: Mar 2018]</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] RapidMiner &amp; RapidAnalytics [Online] http://www.rapidi.com/downloads/brochures/RapidMiner_Fact_Sheet.pdf [Last Accessed: March 2018]</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.-M.</given-names>
            <surname>Popescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Extracting Product Features and Opinions from Reviews</article-title>
          .
          <source>In Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Seidl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          .
          <article-title>Optimal multi-step k-nearest neighbor search</article-title>
          .
          <source>ACM SIGMOD Record</source>
          , Vol.
          <volume>27</volume>
          , No.
          <issue>2</issue>
          , ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Document representation and feature combination for deceptive spam review detection</article-title>
          .
          <source>Neurocomputing</source>
          , Volume
          <volume>254</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>