<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ensuring the Integrity of Wikipedia: A Data Science Approach</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, Boise State University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present our research on the problem of ensuring the integrity of Wikipedia, the world's biggest free encyclopedia. As anyone can edit Wikipedia, many malicious users take advantage of this situation to make edits that compromise pages' content quality. Specifically, we present DePP, the state-of-the-art tool that detects article pages to protect with an accuracy of 93%, and we introduce our research on identifying spam users. We show that we are able to distinguish spammers from benign users with 80.8% accuracy and 0.88 mean average precision.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Wikipedia is the world's biggest free encyclopedia read by many users every day.
Thanks to the mechanism by which anyone can edit, its content grows and is kept
constantly updated. However, malicious users can take advantage of this open
editing mechanism to seriously compromise the quality of Wikipedia articles.</p>
      <p>
        The main form of content damage is vandalism, defined by Wikipedia itself
as "any addition, removal, or change of content, in a deliberate attempt to
compromise the integrity of Wikipedia" [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Other forms of damaging edits are
page spamming [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and dissemination of false information, e.g. through hoax
articles [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>In this paper, we discuss two research efforts which have the common goal
of ensuring the content integrity of Wikipedia.</p>
      <p>
        First, we introduce DePP, the state-of-the-art tool detecting article pages
to protect [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Page protection is a mechanism used by Wikipedia to place
restrictions on the type of users that can make edits to prevent vandalism, libel,
or edit wars [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our DePP system achieves an accuracy of 93% and significantly
improves over baselines.
      </p>
      <p>
        Second, we present our work on spam user identification [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We formulate
the problem as a binary classification task and propose a set of features based on
user editing behavior to separate spam users from benign ones. Our results show
that we reach 80.8% classification accuracy and 0.88 mean average precision and
outperform ORES, the most recent tool developed by Wikimedia to assign damaging
scores to edits.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Detecting damaging edits. Plenty of work has been done on detecting
damaging edits, particularly vandalism (see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for a survey). Currently, ClueBot
NG [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and STiki [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are the state-of-the-art tools used by Wikipedia to detect
vandalism. ClueBot NG is a bot based on an artificial neural network which
scores edits and reverts the worst-scoring edits. STiki is an intelligent routing
tool which suggests potential vandalism to humans for definitive classification.
It works by scoring edits on the basis of metadata and reverts, and computing
a reputation score for each user.
      </p>
      <p>
        Recently, the Wikimedia Foundation launched a new machine learning-based
service, called Objective Revision Evaluation Service (ORES) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which measures
the level of general damage each edit causes. More specifically, given an edit,
ORES provides three probabilities predicting (i) whether or not it causes
damage, (ii) whether it was saved in good faith, and (iii) whether the edit will eventually be
reverted. These scores are available through the ORES public API<sup>1</sup>.
      </p>
      <p>
        Regarding spam edit detection specifically, previous work concentrated on
the problem of predicting whether a link contained in an edit is spam or not,
whereas, in this paper, we predict whether a user is a spammer or not by
considering her edit history. West et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] created the first Wikipedia link-spam corpus,
identified Wikipedia's link spam vulnerabilities, and proposed mitigation strategies
based on explicit edit approval, refinement of account privileges, and detecting
potential spam edits through a machine learning framework. The latter
strategy, described by the same authors in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], relies on features based on (i) article
metadata and link/URL properties, (ii) HTML landing site analysis, and (iii)
third-party services used to discern spam landing sites. This tool was
implemented as part of STiki and has been used on Wikipedia since 2011. Nowadays,
this STiki component is inactive due to the monetary cost of third-party services.
      </p>
      <p>
        An Early Warning System for Vandals. In our previous work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we
addressed the problem of vandalism in Wikipedia from a different perspective.
We studied for the first time the problem of early prediction of vandal users.
The proposed system, called VEWS (Vandal Early Warning System)<sup>2</sup>, leverages
differences in the editing behavior of vandals vs. benign users and detects vandals
with an accuracy of over 85%, outperforming both ClueBot NG and STiki.
Moreover, as an early warning system, VEWS detects vandals, on average, 2.39
edits before ClueBot NG. The combination of VEWS and ClueBot NG results
in a fully automated system that does not leverage any human input (e.g. edit
reversion) and further increases performance.
      </p>
      <p>Page protection. When an article page is heavily vandalized, administrators
may decide to protect the page by restricting access to it. To the best of our
knowledge, little research has been done on the topic of page protection in Wikipedia.</p>
      <sec id="sec-2-1">
        <title>1 http://ores.wikimedia.org</title>
        <p>2 Dataset and code are available at http://www.cs.umd.edu/~vs/vews/</p>
        <p>
          Hill and Shaw [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] studied the impact of page protection on user patterns of
editing. They also created a dataset of protected pages (which they acknowledge may
not be complete) to perform their analysis. There are currently no bots on Wikipedia that
can search for pages that may need to be protected. Wikimedia does have a
script<sup>3</sup> with which administrative users can protect a set of pages all at
once. However, this program requires that the user supply the pages, or the
category of pages, to be protected and is only intended for protecting a large group
of pages at once. There are some bots on Wikipedia that can help with some of
the wiki-work that goes along with adding or removing page protection. This
includes adding or removing a template on a page that is marked as protected or
no longer marked as protected. These bots can automatically update templates
if a page protection has expired.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Detecting Pages to Protect</title>
      <p>
        The first problem we address consists in deciding whether or not a page should
be protected by Wikipedia administrators. Page protection consists in placing
restrictions on the type of users that can edit a Wikipedia page. Common
motivations that an administrative user may have in protecting a page include
(i) consistent vandalism or libel from one or more users, and (ii) avoiding edit
wars [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. There are different levels of page protection for which different levels
of users can make edits (or, in general, perform actions on the page): fully
protected pages can be edited (or moved) only by administrators, semi-protected
pages can be edited only by autoconfirmed users, while move protection does
not allow pages to be moved to a new title, except by an administrator. Page
protections can also be set for different amounts of time, including 24 or 36
hours, or indefinitely.
      </p>
      <p>Currently, English Wikipedia contains over five million pages. Only a small
percentage of those pages, less than 0.2 percent, are currently protected.
However, around 17 pages become protected every day (according to the number of
protected pages from May 6 through Aug 6, 2016). This rate shows how
difficult it is for administrative users to monitor all Wikipedia pages to
determine if any need to be protected. Users can request pages to be protected or
unprotected, but an administrative user would have to analyze the page to
determine if it should be protected, what level of protection to give, and for how long
the protection should last, if not indefinitely. All this work is currently done
manually by administrators.</p>
      <p>
        To overcome this problem, we propose DePP, the first automated tool to
detect pages to protect in Wikipedia [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. DePP is a machine learning-based tool
that works with two novel sets of features based on (i) users' page revision behavior
and (ii) page categories. More specifically, the first group of features includes the
following six base features:
      </p>
      <sec id="sec-3-1">
        <title>E1 Total average time between revisions;</title>
        <sec id="sec-3-1-1">
          <title>3 https://www.mediawiki.org/wiki/Manual:Pywikibot/protect.py</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>E2 Total number of users making 5 or more revisions;</title>
      </sec>
      <sec id="sec-3-3">
        <title>E3 Total average number of revisions per user;</title>
      </sec>
      <sec id="sec-3-4">
        <title>E4 Total number of revisions by non-registered users;</title>
      </sec>
      <sec id="sec-3-5">
        <title>E5 Total number of revisions made from mobile devices;</title>
      </sec>
      <sec id="sec-3-6">
        <title>E6 Total average size of revisions.</title>
        <p>In addition to the above base features, we also include an additional set of
features taking into account the page editing pattern over time. We define
these features by leveraging the features E1-E6 as follows. For each page, we
consider the edits made in the latest 10 weeks and we split this time interval
into time frames of two weeks (last two weeks, second-to-last two weeks, etc.).
Then, we compute features E1 to E6 within each time frame. The idea behind these
features is to monitor features E1-E6 over time to see if some anomaly starts to
happen at some point. For instance, if a page is new we may observe a lot of
edits of larger size in a short time after the page is created, as users are building
the content of the page. Later, when the content is stable, we may observe fewer
edits of smaller size representing small changes in the page. On the other hand,
if the content of the page was stable and suddenly we observe a lot of edits from
many users, it may indicate that the page topic became controversial and the page
may need protection.</p>
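        <p>The time-frame features described above can be sketched as follows. This is a minimal illustration and not the DePP implementation: the revision schema (keys timestamp, user, size, anonymous, mobile) and the function names base_features and windowed_features are assumptions introduced here, and E1 is measured in seconds by choice.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean


def base_features(revs):
    """Compute E1-E6 for a list of revision dicts (hypothetical schema)."""
    if not revs:
        return {"E1": 0.0, "E2": 0, "E3": 0.0, "E4": 0, "E5": 0, "E6": 0.0}
    times = sorted(r["timestamp"] for r in revs)
    # Gaps between consecutive revisions, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    per_user = Counter(r["user"] for r in revs)
    return {
        "E1": mean(gaps) if gaps else 0.0,                   # avg time between revisions
        "E2": sum(1 for n in per_user.values() if n >= 5),   # users with 5+ revisions
        "E3": len(revs) / len(per_user),                     # avg revisions per user
        "E4": sum(1 for r in revs if r["anonymous"]),        # non-registered revisions
        "E5": sum(1 for r in revs if r["mobile"]),           # mobile revisions
        "E6": mean(r["size"] for r in revs),                 # avg revision size
    }


def windowed_features(revs, now, weeks=10, frame_weeks=2):
    """Split the last `weeks` weeks into frames and compute E1-E6 in each."""
    frames = []
    for k in range(weeks // frame_weeks):
        end = now - timedelta(weeks=k * frame_weeks)
        start = end - timedelta(weeks=frame_weeks)
        in_frame = [r for r in revs if start < r["timestamp"] <= end]
        frames.append(base_features(in_frame))
    return frames  # frames[0] covers the most recent two weeks
```
]]></preformat>
        <p>With the default arguments the sketch yields five frames of six values each, so a sudden jump in, say, E4 between consecutive frames is directly visible to the classifier.</p>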
        <p>The second group of features uses information about page categories and
includes:</p>
      </sec>
      <sec id="sec-3-7">
        <title>NC Number of categories a page is marked under;</title>
      </sec>
      <sec id="sec-3-8">
        <title>PC Probability of protecting the page given its categories</title>
        <p>Given all the pages
in the training set T and a page category c, we compute the probability
pr(c) that pages in category c are protected as the percentage of pages in T
having category c that are protected. Then, given a page p having categories
c<sub>1</sub>, …, c<sub>n</sub>, we compute this feature as the probability that the page is in at
least one category whose pages have a high probability to be protected:
<disp-formula><tex-math><![CDATA[PC(p) = 1 - \prod_{i=1}^{n} \bigl(1 - pr(c_i)\bigr).]]></tex-math></disp-formula></p>
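        <p>As a small worked sketch of the PC feature, the code below estimates pr(c) from a toy training set and then evaluates one minus the product of (1 - pr(c)) over the categories of a page. The function names and the input layout are hypothetical, and treating a category never seen in training as pr(c) = 0 is an assumption made here, not something stated in the text.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from math import prod


def category_protection_rates(training_pages):
    """training_pages: iterable of (categories, is_protected) pairs.

    Returns pr(c): the fraction of training pages in category c
    that are protected.
    """
    total = defaultdict(int)
    protected = defaultdict(int)
    for categories, is_protected in training_pages:
        for c in set(categories):
            total[c] += 1
            protected[c] += int(is_protected)
    return {c: protected[c] / total[c] for c in total}


def pc_feature(page_categories, pr):
    """PC(p) = 1 - prod_i (1 - pr(c_i)).

    Assumption: categories unseen in training contribute pr(c) = 0.
    """
    return 1.0 - prod(1.0 - pr.get(c, 0.0) for c in set(page_categories))
```
]]></preformat>
        <p>For example, if half of the training pages in category A are protected and none of those in category C are, a page in both A and C gets PC = 1 - (0.5 × 1.0) = 0.5.</p>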
        <p>In addition to the above two features, we define another group of features
that shows how much features E1-E6 vary for a page p w.r.t. the average of
these values among all the pages in the same categories as p. Specifically, given
the set of pages in the training set T, we computed the set C of the top-100
most frequent categories. Additionally, for each category c ∈ C, we averaged the
features E1-E6 among all the pages (denoted by T<sub>c</sub>) having category c in the
training set. Then, for each page p we computed 600 features (6 times 100), one
for each feature E<sub>i</sub> (1 ≤ i ≤ 6) and for each category c ∈ C, as follows:
<disp-formula><tex-math><![CDATA[C(E_i, c) = \begin{cases} \left| E_i(p) - \operatorname{avg}_{p' \in T_c} E_i(p') \right| & \text{if } p \text{ is in category } c \\ 0 & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
where E<sub>i</sub>(p) is the value of the feature E<sub>i</sub> for the page p. The aim of this group
of features is to understand if a page is anomalous w.r.t. other pages in the same
category.</p>
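        <p>The category-deviation features can be illustrated with the following sketch, which computes the absolute difference between a page's feature value and the per-category training average, and zero when the page is not in the category. The page representation (a dict with categories and features) and both function names are hypothetical and chosen only for this illustration.</p>
        <preformat><![CDATA[
```python
from statistics import mean


def category_averages(training_pages, top_categories, feature_names):
    """Average each feature over the training pages T_c of every category c.

    training_pages: list of dicts with 'categories' (set of str) and
    'features' (dict: feature name -> value). Hypothetical schema.
    """
    averages = {}
    for c in top_categories:
        members = [p for p in training_pages if c in p["categories"]]
        for f in feature_names:
            averages[(f, c)] = (
                mean(p["features"][f] for p in members) if members else 0.0
            )
    return averages


def deviation_features(page, averages):
    """C(E_i, c) = |E_i(p) - avg over T_c of E_i| if p is in c, else 0."""
    return {
        (f, c): abs(page["features"][f] - avg) if c in page["categories"] else 0.0
        for (f, c), avg in averages.items()
    }
```
]]></preformat>
        <p>With six base features and the 100 most frequent categories, deviation_features returns the 600 values described above; most of them are zero for any given page, since a page typically belongs to only a few of the top categories.</p>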
        <p>Table 1. Accuracy of DePP: 93.237%.</p>
        <p>All the features that we propose are language independent as they do not
consider page content. As a consequence, DePP is general and able to work on
any version of Wikipedia.</p>
        <p>To test our DePP system we built a balanced dataset<sup>4</sup> containing all
edit-protected articles up to Apr. 7, 2016 (6,799 pages) and an almost equal number
of randomly selected unprotected pages (6,824), for a total of 13.6K article pages,
and up to the 500 most recent revisions for each selected page. For protected
pages, we only gathered the revisions up until the most recent protection. If
there was more than one recent protection, we gathered the revision information
between the two protections. This allowed us to focus on the revisions leading
up to the most recent page protection. Revision information that we collected
included the user who made the revision, the timestamp of the revision, the
size of the revision, the categories of the page, and any comments, tags or flags
associated with the revision.</p>
        <p>The DePP accuracy in the prediction task on 10-fold cross validation is
reported for Random Forest (the best performing algorithm as compared to Logistic
Regression, SVM, and K-Nearest Neighbor) in Table 1. As we can see, DePP is
able to distinguish pages to protect from pages that do not need protection with
an accuracy of 93.237%. As no automated tool detecting which pages to protect
exists in Wikipedia, we defined some baselines to compare our results against. One of
the main reasons for protecting a page on Wikipedia is to stop edit wars,
vandalism or libel from happening, or continuing to happen, on a page. Thus, we
used the following baselines: [B1] Number of revisions tagged as "Possible libel
or vandalism"; [B2] Number of revisions that ClueBot NG or STiki reverted as
possible vandalism; [B3] Number of edit wars between two users in the page.</p>
        <p>As we can see in Table 1, DePP significantly outperforms each individual baseline
and the combination of all three.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Spam Users Identification</title>
      <p>Another problem that compromises the content quality of Wikipedia articles is
spamming. Wikipedia, like most forms of online social media, receives
continuous spamming attempts every day. Since all non-protected pages are open for
editing by any type of user, it inevitably happens that malicious users have the
opportunity to post spam messages into any open page. These messages remain
on the page until they are discovered and removed by another user.</p>
      <sec id="sec-4-1">
        <title>4 Dataset available at http://bit.ly/wiki_depp</title>
        <p>
          Wikipedia recognizes three main types of spam, namely "advertisements
masquerading as articles, external link spamming, and adding references with the
aim of promoting the author or the work being referenced" [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>Currently, no specific tool is available on Wikipedia to identify either spam
edits or spam users. Tools like ClueBot NG and STiki are tailored toward
vandalism detection, while ORES is designed to detect damaging edits in general.
As in the case of page protection, the majority of the work to protect Wikipedia
from spammers is done manually by Wikipedia users (patrollers, watchlisters,
and readers) who monitor recent changes in the encyclopedia and, eventually,
report suspicious spam users to administrators for definitive account blocking.</p>
        <p>
          To fight spammers on Wikipedia, we study the problem of distinguishing spam
users from benign ones [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Our work is closer in spirit to [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] as the aim is to
classify users by using their editing behavior instead of classifying a single edit
as vandalism [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ], spam [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] or generally damaging [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          We propose a machine learning-based framework using a set of features which
are based on research that has been done regarding typical behaviors exhibited
by spammers: similarity in edit size and links used in revisions, similar
time-sensitive behavior in edits, social involvement of a user in the community through
contribution to Wikipedia's talk page system, and chosen username. We did not
consider any feature related to edit content so that our system would be language
independent and capable of working for all Wikipedia versions. Moreover, we do
not rely on third-party services, so there is no overhead cost as in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>The features we considered in our system are as follows:</p>
        <p>User edit size based features: average size of edits, standard deviation of
edit sizes, and variance significance (previous feature normalized by user
average edit size).</p>
        <p>Edit timing behavior based features: average and standard deviation of
time between edits.</p>
        <p>Links in edits based features: unique link rating (the ratio of unique links
posted by a user to the total number of links posted by the user) and link
ratio in edits (number of edits that a user makes which contain links).</p>
        <p>Talk page edit ratio: the ratio of talk pages edited by a user that
correspond to the main article pages that the user edits.</p>
        <p>
          Username based features: Zafarani and Liu [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] showed that aspects of
users' usernames themselves contain information that is useful in detecting
malicious users. Thus, in addition to the features based on users' edit
behaviors, we also considered four additional username related features: number of
digits in a username, ratio of digits in a username, number of leading digits
in a username, and unique character ratio in a username.
        </p>
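        <p>Most of the per-user features listed above can be sketched in a few lines of Python. The snippet is illustrative only: all function and key names are hypothetical, the population standard deviation is an arbitrary choice, timestamps are assumed to be numeric seconds, and "link ratio in edits" is interpreted here as the fraction of a user's edits that contain links, which is an assumption. The talk page edit ratio is omitted because it needs page-level data.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev

DIGITS = "0123456789"


def edit_size_features(sizes):
    """Average, standard deviation, and std normalized by the average size."""
    avg, std = mean(sizes), pstdev(sizes)
    return {"avg_size": avg, "std_size": std,
            "variance_significance": std / avg if avg else 0.0}


def timing_features(timestamps):
    """Average and standard deviation of the gaps between consecutive edits."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return {"avg_gap": 0.0, "std_gap": 0.0}
    return {"avg_gap": mean(gaps), "std_gap": pstdev(gaps)}


def link_features(links_per_edit):
    """links_per_edit: one list of URLs per edit made by the user."""
    all_links = [url for links in links_per_edit for url in links]
    with_links = sum(1 for links in links_per_edit if links)
    return {
        # Unique links over total links posted by the user.
        "unique_link_ratio": len(set(all_links)) / len(all_links) if all_links else 0.0,
        # Assumption: fraction of the user's edits that contain a link.
        "link_ratio_in_edits": with_links / len(links_per_edit) if links_per_edit else 0.0,
    }


def username_features(name):
    """The four username features: digit count/ratio, leading digits, unique chars."""
    digits = sum(1 for ch in name if ch in DIGITS)
    leading = len(name) - len(name.lstrip(DIGITS))
    return {
        "num_digits": digits,
        "digit_ratio": digits / len(name) if name else 0.0,
        "leading_digits": leading,
        "unique_char_ratio": len(set(name)) / len(name) if name else 0.0,
    }
```
]]></preformat>
        <p>None of these functions inspects the text of an edit, which is consistent with the language-independence goal stated above.</p>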
        <p>To test our framework, we built a new dataset<sup>5</sup> containing 4.2K users (half
spammers and half benign) and 75.6K edits as follows. We collected all Wikipedia
users (up to Nov. 17, 2016) who were blocked for spamming from two lists
maintained on Wikipedia: "Wikipedians who are indefinitely blocked for spamming"<sup>6</sup> and
"Wikipedians who are indefinitely blocked for link spamming"<sup>7</sup>. The first list
contains all spam users blocked before Mar 12, 2009, while the second one
includes all link-spammers blocked after Mar 12, 2009 up to today. We gathered a total of
2,087 spam users (we only included users who made at least one edit) between the
two lists considered.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5 Dataset available at http://bit.ly/wiki_spammers</title>
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th /><th>Our Features</th><th>ORES</th><th>Our Features + ORES</th></tr>
            </thead>
            <tbody>
              <tr><td>Accuracy</td><td>80.8%</td><td>69.7%</td><td>82.1%</td></tr>
              <tr><td>MAP</td><td>0.88</td><td>0.695</td><td>0.886</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>In order to create a balanced dataset of spam/benign users, we randomly
selected a sample of benign Wikipedia users of roughly the same size as the spammer
user set (2,119 users). To ensure these were genuine users, we cross-checked their
usernames against the entire list of blocked users provided by Wikipedia<sup>8</sup>. This
list contains all users in Wikipedia who have been blocked for any reason,
spammers included. For each user in our dataset, we collected up to their 500 most
recent edits. For each edit we gathered the following information: edit content,
timestamp, whether or not the edit was made on a Talk page, and the damaging
score provided by ORES.</p>
        <p>We ran 10-fold cross validation on several machine learning algorithms, namely
SVM, Logistic Regression, K-Nearest Neighbor, Random Forest, and XGBoost,
to test the performance of our features. Experimental results are shown in
Table 2 for the best performing algorithm (XGBoost). Here we can see that our
system is able to distinguish spammers from benign users with 80.8% accuracy
and that it is a valuable tool for suggesting potential spammers to Wikipedia
administrators for further investigation, as shown by a mean average precision of
0.88.</p>
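        <p>For readers less familiar with the ranking metric, the sketch below shows one common way to compute average precision over a ranked list of users and to average it across folds. How exactly MAP was aggregated in the experiments is not stated in the text, so the per-fold averaging, the function names, and the label encoding (1 for spammer, 0 for benign) are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
def average_precision(scores, labels):
    """Average precision of a ranking induced by classifier scores.

    scores: higher means more likely spammer; labels: 1 = spammer, 0 = benign.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this spammer's rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(folds):
    """folds: list of (scores, labels) pairs, one per cross-validation fold."""
    return sum(average_precision(s, l) for s, l in folds) / len(folds)
```
]]></preformat>
        <p>A value close to 1 means that true spammers are concentrated at the top of the ranked list, which is the property that matters when the list is handed to administrators for review.</p>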
        <p>
          We compared our tool with ORES only, as the tool proposed in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is no
longer used and ClueBot NG and STiki are designed specifically for vandalism
and not spam. To compare our system with ORES, we considered the edit
damaging score. More specifically, given a user and all her edits, we computed both
the average and maximum damaging score provided by ORES and used these as
features for classification. Results on 10-fold cross validation with XGBoost (the
best performing classifier) are reported in Table 2, as well. As we can see, ORES
performs poorly on the task of spammer detection (69.7% accuracy
and a mean average precision of 0.695). However, combining our features with
ORES further increases the accuracy to 82.1%.
        </p>
        <p>6 http://en.wikipedia.org/wiki/Category:Wikipedians_who_are_indefinitely_blocked_for_spamming</p>
        <p>7 http://en.wikipedia.org/wiki/Category:Wikipedians_who_are_indefinitely_blocked_for_link-spamming</p>
        <p>8 http://en.wikipedia.org/wiki/Special:BlockList</p>
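        <p>The ORES-based comparison features amount to two aggregates per user. A minimal sketch, assuming the per-edit damaging scores have already been retrieved (the function name and the zero default for users without scored edits are hypothetical):</p>
        <preformat><![CDATA[
```python
def ores_user_features(damaging_scores):
    """Aggregate per-edit ORES damaging scores into two per-user features.

    damaging_scores: list of floats in [0, 1], one per edit by the user.
    Assumption: users with no scored edits get 0.0 for both features.
    """
    if not damaging_scores:
        return {"avg_damaging": 0.0, "max_damaging": 0.0}
    return {
        "avg_damaging": sum(damaging_scores) / len(damaging_scores),
        "max_damaging": max(damaging_scores),
    }
```
]]></preformat>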
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper, we addressed the problem of ensuring the integrity of Wikipedia
pages and presented our research on detecting pages to protect and identifying
spam users. Our experimental results show that we are able to distinguish (i) article
pages to protect with an accuracy of 93% and (ii) spammers from benign users
with 80.8% accuracy and 0.88 mean average precision.</p>
      <p>Neither of the proposed methods looks at edit content and, as a consequence,
both are generally applicable to all versions of Wikipedia.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. http://en.wikipedia.org/wiki/Wikipedia:Vandalism.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. http://en.wikipedia.org/wiki/Wikipedia:Spam.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. http://en.wikipedia.org/wiki/Wikipedia:Editwarring.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. http://en.wikipedia.org/wiki/User:ClueBot_NG.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. http://en.wikipedia.org/wiki/Wikipedia:STiki.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. http://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><given-names>B. Thomas</given-names> <surname>Adler</surname></string-name>,
          <string-name><given-names>Luca</given-names> <surname>de Alfaro</surname></string-name>,
          <string-name><given-names>Santiago Moises</given-names> <surname>Mola-Velasco</surname></string-name>,
          <string-name><given-names>Paolo</given-names> <surname>Rosso</surname></string-name>, and
          <string-name><given-names>Andrew G.</given-names> <surname>West</surname></string-name>.
          <article-title>Wikipedia vandalism detection: Combining natural language, metadata, and reputation features</article-title>.
          In <source>Computational Linguistics and Intelligent Text Processing</source>,
          pages <fpage>277</fpage>–<lpage>288</lpage>,
          <year>2011</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><given-names>Thomas</given-names> <surname>Green</surname></string-name> and
          <string-name><given-names>Francesca</given-names> <surname>Spezzano</surname></string-name>.
          <article-title>Spam users identification in Wikipedia via editing behavior</article-title>.
          In <source>International AAAI Conference on Web and Social Media</source>,
          pages <fpage>532</fpage>–<lpage>535</lpage>,
          <year>2017</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><given-names>Benjamin Mako</given-names> <surname>Hill</surname></string-name> and
          <string-name><given-names>Aaron D.</given-names> <surname>Shaw</surname></string-name>.
          <article-title>Page protection: another missing dimension of Wikipedia research</article-title>.
          In <source>International Symposium on Open Collaboration</source>,
          pages <fpage>15:1</fpage>–<lpage>15:4</lpage>,
          <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><given-names>Srijan</given-names> <surname>Kumar</surname></string-name>,
          <string-name><given-names>Francesca</given-names> <surname>Spezzano</surname></string-name>, and
          <string-name><given-names>VS</given-names> <surname>Subrahmanian</surname></string-name>.
          <article-title>VEWS: A Wikipedia vandal early warning system</article-title>.
          In <source>ACM SIGKDD Conference on Knowledge Discovery and Data Mining</source>,
          pages <fpage>607</fpage>–<lpage>616</lpage>,
          <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><given-names>Srijan</given-names> <surname>Kumar</surname></string-name>,
          <string-name><given-names>Robert</given-names> <surname>West</surname></string-name>, and
          <string-name><given-names>Jure</given-names> <surname>Leskovec</surname></string-name>.
          <article-title>Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes</article-title>.
          In <source>International World Wide Web Conference</source>,
          pages <fpage>591</fpage>–<lpage>602</lpage>,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><given-names>Kelsey</given-names> <surname>Suyehira</surname></string-name> and
          <string-name><given-names>Francesca</given-names> <surname>Spezzano</surname></string-name>.
          <article-title>DePP: A system for detecting pages to protect in Wikipedia</article-title>.
          In <source>International Conference on Information and Knowledge Management</source>,
          pages <fpage>2081</fpage>–<lpage>2084</lpage>,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
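        <mixed-citation>
          13.
          <string-name><given-names>Andrew G.</given-names> <surname>West</surname></string-name>,
          <string-name><given-names>Avantika</given-names> <surname>Agrawal</surname></string-name>,
          <string-name><given-names>Phillip</given-names> <surname>Baker</surname></string-name>,
          <string-name><given-names>Brittney</given-names> <surname>Exline</surname></string-name>, and
          <string-name><given-names>Insup</given-names> <surname>Lee</surname></string-name>.
          <article-title>Autonomous link spam detection in purely collaborative environments</article-title>.
          In <source>International Symposium on Wikis and Open Collaboration</source>,
          pages <fpage>91</fpage>–<lpage>100</lpage>,
          <year>2011</year>.
        </mixed-citation>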
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><given-names>Andrew G.</given-names> <surname>West</surname></string-name>,
          <string-name><given-names>Jian</given-names> <surname>Chang</surname></string-name>,
          <string-name><given-names>Krishna</given-names> <surname>Venkatasubramanian</surname></string-name>,
          <string-name><given-names>Oleg</given-names> <surname>Sokolsky</surname></string-name>, and
          <string-name><given-names>Insup</given-names> <surname>Lee</surname></string-name>.
          <article-title>Link spamming Wikipedia for profit</article-title>.
          In <source>Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference</source>,
          pages <fpage>152</fpage>–<lpage>161</lpage>,
          <year>2011</year>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><given-names>Reza</given-names> <surname>Zafarani</surname></string-name> and
          <string-name><given-names>Huan</given-names> <surname>Liu</surname></string-name>.
          <article-title>10 bits of surprise: Detecting malicious users with minimum information</article-title>.
          In <source>International Conference on Information and Knowledge Management</source>,
          pages <fpage>423</fpage>–<lpage>431</lpage>,
          <year>2015</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>