<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bo Wang</string-name>
          <email>bo.wang@warwick.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arkaitz Zubiaga</string-name>
          <email>a.zubiaga@warwick.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Liakata</string-name>
          <email>m.liakata@warwick.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rob Procter</string-name>
          <email>rob.procter@warwick.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science University of Warwick Coventry</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>10</fpage>
      <lpage>16</lpage>
      <abstract>
        <p>Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques for social spam detection have focused primarily on the identification of spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam tweets, which minimises the amount of data that needs to be gathered by relying only on tweet-inherent features. This enables the application of the spam detection system to a large set of tweets in a timely fashion, making it potentially applicable in a real-time or near real-time setting. Using two large hand-labelled datasets of tweets containing spam, we study the suitability of five classification algorithms and four different feature sets for the social spam detection task. Our results show that, by using the limited set of features readily available in a tweet, we can achieve encouraging results which are competitive when compared against existing spammer detection systems that make use of additional, costly user features. Our study is the first that attempts to generalise conclusions on the optimal classifiers and sets of features for social spam detection over different datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>spam detection</kwd>
        <kwd>classification</kwd>
        <kwd>social media</kwd>
        <kwd>microblogging</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Social networking spam, or social spam, is increasingly
affecting social networking websites, such as Facebook,
Pinterest and Twitter.
      </p>
      <p>
        Copyright 2015 held by author(s)/owner(s); copying permitted only for private and academic purposes. Published as part of the #Microposts2015 Workshop proceedings, available online as CEUR Vol-1395 (http://ceur-ws.org/Vol-1395). #Microposts2015, May 18th, 2015, Florence, Italy.
      </p>
      <p>
        According to a study by the social media
security firm Nexgate [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], social media platforms experienced a 355% growth of social spam during the first half of
2013. Social spam can reach a surprisingly high visibility
even with a simple bot [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which detracts from a company’s
social media presence and damages their social marketing
ROI (Return On Investment). Moreover, social spam
exacerbates the amount of unwanted information that average
social media users receive in their timeline, and can
occasionally even affect the physical condition of vulnerable users
through the so-called “Twitter psychosis” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Social spam has different effects and therefore its
definition varies across major social networking websites. One of
the most popular social networking services, Twitter, has
published their definition of spamming as part of their “The
Twitter Rules”1 and provided several methods for users to
report spam, such as tweeting “@spam @username”, where
@username will be reported as a spammer. As a
business, Twitter is also generous with mainline bot-level access2
and allows some level of advertising as long as it
does not violate “The Twitter Rules”. In recent years we have
seen Twitter being used as a prominent knowledge base for
discovering hidden insights and predicting trends from
finance to public sector, both in industry and academia. The
ability to sort out the signal (or the information) from
Twitter noise is crucial, and one of the biggest effects of Twitter
spam is that it significantly reduces the signal-to-noise ratio.
Our work on social spam is motivated by the initial attempts
at harvesting a Twitter corpus around a specific topic with
a set of predefined keywords [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. This led to the
identification of a large amount of spam within those datasets. The
fact that certain topics are trending and therefore many are
tracking their contents encourages spammers to inject their
spam tweets using the keywords associated with these
topics to maximise the visibility of their tweets. These tweets
produce a significant amount of noise both to end users who
follow the topic as well as to tools that mine Twitter data.
      </p>
      <p>
        In previous works, the automatic detection of Twitter
spam has been addressed in two different ways. The first
way is to tackle the task as a user classification problem,
where a user can be deemed either a spammer or a
non-spammer. This approach, which has been used by the
majority of the works in the literature so far (see e.g., [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), makes use of numerous features that
require gathering historical details about a user, such as tweets
that a user posted in the past to explore what they usually
1https://support.twitter.com/articles/18311-the-twitter-rules
2http://www.newyorker.com/tech/elements/the-rise-of-twitter-bots
tweet about, or how the number of followers and followings
of a user has evolved in recent weeks to discover unusual
behaviour. While this is ideal as the classifier can make use of
extensive user data, it is often unfeasible due to restrictions
of the Twitter API. The second, alternative way, which has
not been as common in the literature (see e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), is to
define the task as a tweet classification problem, where a
tweet can be deemed spam or non-spam. In this case, the
classification task needs to assume that only the information
provided within a tweet is available to determine if it has to
be categorised as spam. Here, we delve into this approach to
Twitter spam classification, studying the categorisation of a
tweet as spam or not from its inherent features. While this
is more realistic for our scenario, it presents the extra
challenge that the available features are rather limited, which
we study here.
      </p>
      <p>In this work, after discussing the definition of social spam
and reviewing previous research in Twitter spam detection,
we present a comparative study of Twitter spam detection
systems. We investigate the use of different features inherent
to a tweet so as to identify the sets of features that do best in
categorising tweets as spam or not. Our study compares five
different classification algorithms over two different datasets.
The fact that we test our classifiers on two different datasets,
collected in different ways, enables us to validate the results
and claim repeatability. Our results suggest that a competitive
performance can be obtained using tree-based classifiers for
spam detection even with only tweet-inherent features,
compared to existing spammer detection studies. Moreover,
the combination of different feature sets generally leads to
improved performance, with User features + Bi &amp; Tri-gram
(Tf) giving the best results on both datasets.</p>
    </sec>
    <sec id="sec-2">
      <title>2. SOCIAL SPAM</title>
      <p>
        The detection of spam has now been studied for more
than a decade, starting with email spam [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the context of email
messages, spam has been widely defined as “unsolicited bulk
email” [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The term “spam” has since been extended to other
contexts, including “social spam” in the context of social
media. Similarly, social spam can be defined as the “unwanted
content that appears in online social networks”. It is, after
all, the noise produced by users whose behaviour differs from
what the system is intended for, and which has the
goal of grabbing attention by exploiting the social networks’
characteristics, including for instance the injection of
unrelated tweet content into timely topics and the sharing of malicious links
or fraudulent information. Social spam can hence appear in
many different forms, which poses the additional challenge of
having to identify very different types of noise for social spam
detection systems.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Social Spammer Detection</title>
      <p>
        As we said before, most of the previous work in the area
has focused on the detection of users that produce spam
content (i.e., spammers), using historical or network features of
the user rather than information inherent to the tweet. Early
work by [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] put together a set of different
features that can be obtained by looking at a user’s previous
behaviour. These include some aggregated statistics from
a user’s past tweets such as average number of hashtags,
average number of URL links and average number of user
mentions that appear in their tweets. They combine these
with other non-historical features, such as number of
followers, number of followings and age of the account, which can
be obtained from a user’s basic metadata, also inherent to
each tweet they post. Some of these features, such as the
number of followers, can be gamed by purchasing additional
followers to make the user look like a regular user account.
      </p>
      <p>
        Lee et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Yang et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] employed different
techniques for collecting data that includes spam (more details
are discussed in Section 3.1) and performed
comprehensive studies of the spammers’ behaviour. They both relied
on the tweets posted in the past by the users and on their social
networks, with features such as tweeting rate, following rate, percentage
of bidirectional friends and the local clustering coefficient of a user’s
network graph, aiming to combat spammers’ evasion tactics,
as these features are difficult or costly to simulate. Ferrara
et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] used network, user, friends, timing, content and
sentiment features for detecting Twitter bots; their
performance evaluation is based on the social honeypots dataset
(from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Miller et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] treat spammer detection as an
anomaly detection problem, proposing clustering algorithms
that model normal Twitter users and treat outliers as spammers. They also
propose using 95 uni-gram counts along with user profile
attributes as features. The sets of features utilised in the
above works require the collection of historical and network
data for each user, which do not meet the requirements of
our scenario for spam detection.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Social Spam Detection</title>
      <p>
        Few studies have addressed the problem of spam
detection. Santos et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] investigated two different approaches,
namely compression-based text classification algorithms (i.e.
Dynamic Markov compression and Prediction by partial
matching) and using “bag of words” language model (also known
as uni-gram language model) for detecting spam tweets.
Martinez-Romo and Araujo [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] applied Kullback-Leibler
Divergence and examined the difference in language used among
a set of tweets related to a trending topic, suspicious tweets
(i.e. tweets that link to a web page) and the page linked
by the suspicious tweets. These language divergence
measures were used as their features for the classification. They
used several URL blacklists for identifying spam tweets in
their crawled dataset; therefore each of their labelled
spam tweets contains a URL link, and their approach cannot identify
other types of spam tweets. In our study we have
investigated and evaluated the discriminative power of four feature
sets on two Twitter datasets (which were previously in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]) using five different classifiers. We examine the
suitability of each of the features for the spam classification
task. Compared to [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], our system is able to detect
most known types of spam tweets, whether or not they contain a
link. Moreover, our system does not have to analyse a set
of tweets relating to each topic (which [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] did to create part
of their proposed features) or the external web page linked by
each suspicious tweet; therefore its computational cost does
not increase dramatically when applied to mass spam
detection with potentially many different topics in the data
stream.
      </p>
      <p>
        The few works that have dealt with spam detection are
mostly limited in terms of the sets of features that they
studied, and the experiments have only been conducted on
a single dataset (except in the case of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], where very
limited evaluation was conducted on a new and smaller set of
tweets), which does not allow for generalisability of the
results. To the best of our knowledge, our work is the first
study that evaluates a wide range of tweet-inherent features
(namely user, content, n-gram and sentiment features) over
two different datasets, obtained from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and with
more than 10,000 tweets each, for the task of spam detection.
The two datasets were collected using completely different
approaches (namely deploying social honeypots for
attracting spammers, and checking malicious URL links), which
helps us learn more about the nature of social spam and
further validates the results of different spam detection
systems.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. METHODOLOGY</title>
      <p>In this section we describe the Twitter spam datasets we
used, the text preprocessing techniques that we performed
on the tweets, and the four different feature sets we used for
training our spam vs. non-spam classifier.</p>
    </sec>
    <sec id="sec-6">
      <title>3.1 Datasets</title>
      <p>A labelled collection of tweets is crucial in a machine
learning task such as spam detection. We found no publicly
available spam dataset that specifically fulfils the
requirements of our task. Instead, the datasets we obtained include
Twitter users labelled as spammers or not. For our work, we
adapted these to our purposes by taking
out the features that would not be available in our scenario
of spam detection from tweet-inherent features. We used
two spammer datasets in this work, which were
created using different data collection techniques and are therefore
suitable for our purpose of testing the spam classifier in
different settings. To accommodate the datasets to our needs,
we sampled one tweet for each user in the dataset, so that
we only access one tweet per user and cannot
aggregate several tweets from the same user or use social network
features. In what follows we describe the two datasets we
use.</p>
      <sec id="sec-6-1">
        <title>Social Honeypot Dataset</title>
        <p>
          Lee et al. [8] created and
manipulated (by posting random messages and engaging in
none of the activities of legitimate users) 60 social honeypot
accounts on Twitter to attract spammers. Their dataset
consists of 22,223 spammers and 19,276 legitimate users
along with their most recent tweets. They used the
Expectation-Maximization (EM) clustering algorithm and then manually
grouped their harvested users into 4 categories: duplicate
spammers, duplicate @ spammers, malicious promoters and
friend infiltrators.
        </p>
        <p>
          1KS-10KN Dataset: Yang et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]
define a tweet that contains at least one malicious or
phishing URL as a spam tweet, and a user whose spam ratio is
higher than 10% as a spammer. Their dataset,
which contains 1,000 spammers and 10,000 legitimate users,
therefore represents only one major type of spammer (as discussed
in their paper).
        </p>
        <p>
          We used spammer vs. legitimate user datasets from [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
and [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. After removing duplicated users and the ones that
do not have any tweets in the dataset, we randomly selected
one tweet from each spammer or legitimate user to create our
labelled collection of spam vs. legitimate tweets, in order to
avoid overfitting and reduce our sampling bias. The
resulting datasets contain 20,707 spam tweets and 19,249 normal
tweets (named Social Honeypot dataset, as from [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]), and
1,000 spam tweets and 9,828 normal tweets (named the
1KS-10KN dataset, as from [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]) respectively.
        </p>
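        <p>
          The per-user sampling step described above can be sketched as follows (a minimal illustration with an assumed input format; the original sampling code is not published):
        </p>
        <p>
```python
import random
from collections import defaultdict

# Sketch of the sampling step: keep one randomly chosen tweet per
# (deduplicated) user, so no user contributes more than one tweet.
def one_tweet_per_user(tweets, seed=0):
    """tweets: iterable of (user_id, tweet_text) pairs."""
    by_user = defaultdict(list)
    for user_id, text in tweets:
        by_user[user_id].append(text)
    rng = random.Random(seed)  # fixed seed for reproducibility
    return {user: rng.choice(texts) for user, texts in by_user.items()}
```
        </p>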
        <p>Before we extract the features to be used by the classifier
from each tweet, we apply a set of preprocessing techniques
to the content of the tweets to normalise it and reduce the
noise in the classification phase. The preprocessing
techniques include decoding HTML entities, and expanding
contractions with apostrophes to standard spellings (e.g. “I’m”
-&gt; “I am”). More advanced preprocessing techniques, such as
spell-checking and stemming, were tested but later discarded
given the minimal effect we observed on the performance of
the classifiers.</p>
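        <p>
          A minimal sketch of this preprocessing, assuming a small illustrative contraction list (the full mapping used by the authors is not given):
        </p>
        <p>
```python
import html
import re

# Assumed, abbreviated contraction map; the full list used in the
# paper is not published. Output is lowercased for simplicity.
CONTRACTIONS = {
    "i'm": "i am",
    "don't": "do not",
    "it's": "it is",
    "can't": "cannot",
}

def preprocess(text):
    """Decode HTML entities and expand apostrophe contractions."""
    text = html.unescape(text)  # e.g. "&amp;" becomes "&"
    def expand(match):
        word = match.group(0)
        return CONTRACTIONS.get(word.lower(), word)
    return re.sub(r"\b[A-Za-z]+'[A-Za-z]+\b", expand, text)
```
        </p>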
        <p>For the specific case of the extraction of sentiment-based
features, we also remove hashtags, links, and user mentions
from tweet contents.
</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Features</title>
      <p>As spammers and legitimate users have different goals in
posting tweets or interacting with other users on Twitter,
we can expect that the characteristics of spam tweets are
quite different from those of normal tweets. The features inherent
to a tweet include, besides the tweet content itself, a set of
metadata including information about the user who posted
the tweet, which is also readily available in the stream of
tweets we have access to in our scenario. We analyse a wide
range of features that reflect user behaviour, which can be
computed straightforwardly and do not require high
computational cost, and also describe the linguistic properties that
are shown in the tweet content. We considered four feature
sets: (i) user features, (ii) content features, (iii) n-grams,
and (iv) sentiment features.</p>
      <p>
        User features include a list of 11 attributes about the
author of the tweet (as seen in Table 1) that are generated
from each tweet’s metadata, such as the reputation of the user
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which is defined as the ratio between the number of
followers and the total number of followers and followings, and
which has been used to measure user influence. Other candidate
features, such as the number of retweets and favourites
garnered by a tweet, were not used given that they are not readily
available at the time of posting, when a tweet has
no retweets or favourites yet.
      </p>
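      <p>
        The user attributes in Table 1 can be derived from the user object embedded in each tweet's metadata along these lines (field names follow the Twitter API v1.1; the implementation itself is an assumed sketch, not the authors' code):
      </p>
      <p>
```python
from datetime import datetime, timezone

# Sketch of the 11 user features in Table 1, computed from a tweet's
# embedded user metadata (Twitter API v1.1 field names assumed).
def user_features(user, now=None):
    now = now or datetime.now(timezone.utc)
    fi = user["friends_count"]    # number of followings (FI)
    fe = user["followers_count"]  # number of followers (FE)
    created = datetime.strptime(user["created_at"], "%a %b %d %H:%M:%S %z %Y")
    au = max((now - created).total_seconds() / 3600.0, 1.0)  # age in hours (AU)
    return {
        "len_profile_name": len(user["name"]),
        "len_profile_description": len(user.get("description") or ""),
        "followings": fi,
        "followers": fe,
        "tweets_posted": user["statuses_count"],
        "reputation": fe / (fi + fe) if fi + fe else 0.0,  # FE/(FI + FE)
        "following_rate": fi / au,                          # FI/AU
        "tweets_per_day": user["statuses_count"] / (au / 24.0),
        "tweets_per_week": user["statuses_count"] / (au / 168.0),
        "age_hours": au,
        "ff_ratio": fe / fi if fi else 0.0,                 # FE/FI
    }
```
      </p>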
      <p>
        Content features capture the linguistic properties from
the text of each tweet (Table 1) including a list of content
attributes and part-of-speech tags. Among the 17 content
attributes, number of spam words and number of spam words
per word are generated by matching a popular list of spam
words3. Part-of-speech (or POS) tagging provides
syntactic (or grammatical) information of a sentence and has been
used in the natural language processing community for
measuring text informativeness (e.g. Tan et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] used POS
counts as an informativeness measure for tweets). We have
used a Twitter-specific tagger [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and in the end our POS
feature consists of uni-gram and 2-skip-bi-gram
representations of POS tagging for each tweet in order to capture the
structure, and therefore the informativeness, of the text. We also
tried the Stanford tagger with standard Penn Treebank tags, which
made very little difference in the classification results.
      </p>
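      <p>
        The uni-gram and 2-skip-bi-gram representation of a tweet's POS tag sequence can be illustrated as follows (an assumed sketch of the representation, not the authors' implementation):
      </p>
      <p>
```python
# Uni-grams plus k-skip-bi-grams over a POS tag sequence: a k-skip
# bi-gram pairs two tags with at most k tags skipped between them.
def pos_features(tags, k=2):
    unigrams = list(tags)
    skip_bigrams = [
        (tags[i], tags[j])
        for i in range(len(tags))
        for j in range(i + 1, min(i + k + 2, len(tags)))
    ]
    return unigrams, skip_bigrams
```
      </p>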
      <p>N-gram models have long been used in natural language
processing for various tasks, including text classification.
Although often criticised for lacking any explicit
representation of long-range or semantic dependency, they are
surprisingly powerful for simple text classification given a reasonable
amount of training data.</p>
      <sec id="sec-7-1">
        <title>3https://github.com/splorp/wordpress-comment-blacklist/blob/master/blacklist.txt</title>
        <sec id="sec-7-1-1">
          <title>User features</title>
          <p>Length of profile name
Length of profile description
Number of followings (FI)
Number of followers (FE)
Number of tweets posted
Reputation of the user (FE/(FI + FE))
Following rate (FI/AU)
Number of tweets posted per day
Number of tweets posted per week
Age of the user account, in hours (AU)
Ratio of number of followings and followers (FE/FI)</p>
        </sec>
        <sec id="sec-7-1-2">
          <title>Content features</title>
          <p>Mean word length
Number of words
Number of characters
Number of white spaces
Number of capitalization words
Number of capitalization words per word
Maximum word length
Number of exclamation marks
Number of question marks
Number of URL links
Number of URL links per word
Number of hashtags
Number of hashtags per word
Number of mentions
Number of mentions per word
Number of spam words
Number of spam words per word
Part of speech tags of every tweet</p>
        </sec>
        <sec id="sec-7-1-3">
          <title>N-grams</title>
          <p>Uni + bi-gram or bi + tri-gram</p>
        </sec>
        <sec id="sec-7-1-4">
          <title>Sentiment features</title>
          <p>Automatically created sentiment lexicons
Manually created sentiment lexicons</p>
          <p>In order to give the best
classification results while being computationally efficient, we have
tried uni + bi-gram or bi + tri-gram with binary (i.e. 1 for
feature presence while 0 for absence), term-frequency (tf)
and tf-idf (i.e. Term Frequency times Inverse Document
Frequency) techniques.</p>
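          <p>
            These three weighting schemes can be illustrated with a small standard-library sketch (the experiments themselves used scikit-learn vectorizers; this minimal equivalent is for exposition only):
          </p>
          <p>
```python
import math
from collections import Counter
from itertools import chain

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bi_tri_counts(text):
    """Term-frequency (tf) counts of bi + tri-grams; the binary
    variant would map every present n-gram to 1 instead."""
    tokens = text.lower().split()
    return Counter(chain(ngrams(tokens, 2), ngrams(tokens, 3)))

def tf_idf(docs):
    """tf-idf weights: tf times log(N / document frequency)."""
    counts = [bi_tri_counts(d) for d in docs]
    df = Counter(chain.from_iterable(c.keys() for c in counts))
    n_docs = len(docs)
    return [
        {g: tf * math.log(n_docs / df[g]) for g, tf in c.items()}
        for c in counts
    ]
```
          </p>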
          <p>
            Sentiment features: Ferrara et al. [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] used tweet-level
sentiment as part of their feature set for the purpose of
detecting Twitter bots. We have used the same list of
lexicons from [
            <xref ref-type="bibr" rid="ref13">13</xref>
        ] (which has been shown to achieve top
performance in the SemEval-2014 Task 9 Twitter sentiment
analysis competition) for generating our sentiment features,
including manually generated sentiment lexicons: AFINN
lexicon [
            <xref ref-type="bibr" rid="ref15">15</xref>
        ], Bing Liu lexicon [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], MPQA lexicon [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]; and
automatically generated sentiment lexicons: NRC Hashtag
Sentiment lexicon [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] and Sentiment140 lexicon [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4. EVALUATION</title>
    </sec>
    <sec id="sec-9">
      <title>4.1 Selection of Classifier</title>
      <p>During the classification and evaluation stage, we tested
5 classification algorithms implemented using scikit-learn4:
Bernoulli Naive Bayes, K-Nearest Neighbour (KNN),
Support Vector Machines (SVM), Decision Tree, and Random
Forests. These algorithms were chosen as being the most
commonly used in the previous research on spammer
detection. We evaluate using the standard information retrieval
metrics of recall (R), precision (P) and F1-measure. Recall
in this case refers to the ratio obtained from dividing the
number of correctly classified spam tweets (i.e. True Positives)
by the number of tweets that are actually spam (i.e. True
Positives + False Negatives). Precision is the ratio of the
number of correctly classified spam tweets (i.e. True
Positives) to the total number of tweets that are classified as
spam (i.e. True Positives + False Positives). F1-measure
can be interpreted as a harmonic mean of the precision and
recall, where its score reaches its best value at 1 and worst
at 0. It is defined as:</p>
      <p>
F1 = 2 * (precision * recall) / (precision + recall)
In order to select the best classifier for our task, we have
used a subset of each dataset (20% for 1KS-10KN dataset
and 40% for the Social Honeypot dataset, due to the different
sizes of the two datasets) to run a 10-fold cross validation for
optimising the hyperparameters of each classifier. By doing
so it minimises the risk of over-fitting in model selection and
hence subsequent selection bias in performance evaluation.
Such optimisation was conducted using all 4 feature sets
(each feature was normalised to fit the range of values [-1,
1]; we also selected the 30% highest-scoring features using
chi-square feature selection for tuning the SVM, as this is computationally more
efficient and gives better classification results). Then we
evaluated our algorithm on the rest of the data (i.e. 80% for
1KS-10KN dataset and 60% for Social Honeypot dataset),
again using all 4 feature sets in a 10-fold cross validation
setting (as in the grid search, each feature was normalised
and chi-square feature selection was used for the SVM).
      </p>
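      <p>
        The recall, precision and F1 definitions above can be computed directly from true and predicted labels, as in this sketch (1 = spam, 0 = non-spam):
      </p>
      <p>
```python
# Recall, precision and F1 for the spam class, from the confusion
# counts defined in the text (TP, FP, FN).
def prf1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```
      </p>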
      <p>As shown in Table 2, tree-based classifiers achieved very
promising performance, among which Random Forests
outperform all the others when we look at the F1-measure. This
is largely due to the high precision
values of 99.3% and 94.1% obtained by the Random
Forest classifier. While Random Forests show a clear
superiority in terms of precision, their performance in terms of recall
varies between the two datasets: recall is high for the
Social Honeypot dataset, while it drops substantially for the
1KS-10KN dataset due to its approximately 1:10
spam/non-spam ratio. These results are consistent with the conclusions
of most spammer detection studies; our results extend these
conclusions to the spam detection task.</p>
      <p>
        When we compare the performance values for the different
datasets, it is worth noting that with the Social Honeypot
dataset the best result is more than 10% higher than the
best result on the 1KS-10KN dataset. This is caused by the
different spam/non-spam ratios in the two datasets: the
Social Honeypot dataset has a roughly 50:50 ratio, while in
1KS-10KN it is roughly 1:10, which is a more realistic ratio
reflecting the amount of spam tweets existing on Twitter
(in its 2014 Q2 earnings report, Twitter said that less than
5% of its accounts are spam5, but independent researchers
believe the number is higher). In comparison to the original
papers, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] reported a best 0.983 F1-score and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] reported
a best 0.884 F1-score. Our results are only about 4% lower
than their results, which make use of historical and
network-based data, not readily available in our scenario. Our results
suggest that a competitive performance can also be obtained
for spam detection where only tweet-inherent features can
be used.
      </p>
    </sec>
    <sec id="sec-10">
      <title>4.2 Evaluation of Features</title>
      <p>We trained our best classifier (i.e. Random Forests) with
different feature sets, as well as combinations of the feature
sets, using the two datasets (i.e. the whole corpora), and
under a 10-fold cross validation setting. We report our results
in Table 3. As seen in the 1KS-10KN dataset, the F1-measure
for different feature sets ranges from 0.718 to 0.820 when
using a single feature set. All feature set combinations
except C + S (content + sentiment features) score higher
than 0.810 in terms of F1-measure, reflecting that feature
combinations have more discriminative power than a single
feature set.</p>
      <p>
        For the Social Honeypot dataset, we can clearly see User
features (U) having the most discriminative power, as they yield
a 0.940 F1-measure. Results without using User features
(U) have significantly worse performance, and feature
combinations with U give very little improvement with respect
to the original 0.940 (except for U + Uni &amp; Bi-gram (Tf) +
S). This means U is dominating the discriminative power of
these feature combinations and other feature sets contribute
very little in comparison to U. This is potentially caused
by the data collection approach (i.e. by using social
honeypots) adopted by [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which meant that most of the
spammers they attracted have user profile information that clearly
distinguishes them from legitimate users. On the
other hand, Yang et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] collected their spammer data by checking for malicious or
phishing URL links, and this collection method gives more
discriminative power to Content and N-gram features than that of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] (although U is still
a very significant feature set in 1KS-10KN). Note that U +
Bi &amp; Tri-gram (Tf) resulted in the best performance in both
datasets, showing that these two feature sets are the most
beneficial to each other irrespective of the different nature
of the datasets.</p>
      <sec id="sec-10-1">
        <title>5 http://www.webcitation.org/6VyBTJ7vt</title>
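As an illustration of how such feature-set combinations can be assembled, the sketch below concatenates a dense user-feature matrix with sparse bi- and tri-gram term-frequency counts. It assumes scikit-learn and SciPy; the tweets and the user features are toy placeholders, not the datasets used here.

```python
# Sketch of the "U + Bi & Tri-gram (Tf)" combination: stack a dense
# user-feature matrix with sparse n-gram counts column-wise.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

tweets = ["Win a FREE iPhone now http://spam.example",
          "Lovely walk in the park this morning"]
# Hypothetical User features per tweet: (followers, friends, statuses).
user_features = np.array([[10, 2000, 50],
                          [300, 280, 1200]])

# Bi- and tri-gram term frequencies.
vectorizer = CountVectorizer(ngram_range=(2, 3))
ngram_tf = vectorizer.fit_transform(tweets)

# Combined representation: one row per tweet, user columns first.
combined = hstack([csr_matrix(user_features), ngram_tf])
print(combined.shape)
```

The same pattern extends to any of the other combinations in Table 3 by stacking additional feature blocks.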
        <p>Another important aspect to take into account when
choosing the features to be used is the computation time,
especially when one wants to apply the spam classifier in
real time. Table 4 shows an efficiency comparison for
generating each feature from 1000 tweets, using a machine with a
2.8 GHz Intel Core i7 processor and 16 GB of memory. Some
of the features, such as the User features, can be computed
quickly and require minimal computational cost, as most
of them can be straightforwardly inferred from a tweet’s
metadata. Other features, such as N-grams and
part-of-speech counts (from Content features), can be
affected by the size of the vocabulary in the training set. On
the other hand, some of the features are computationally
more expensive, and it is therefore worth studying their
applicability. This is the case for Sentiment features, which
require string matching between our training documents and
the sentiment lexica we used. We keep the Sentiment features
since they have shown added value in the performance
evaluation of feature set combinations. Similarly, Content features
such as Number of spam words and Number of spam words
per word also require string matching between our training
documents and a dictionary containing 11,529 spam words.
However, given that the latter did not provide significant
improvements in terms of accuracy, most probably because
the spam words were extracted from blogs, we conclude that
Number of spam words and Number of spam words per word
can be taken out of the representation for the sake of the
classifier’s efficiency.</p>
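The two spam-word features discussed above amount to dictionary lookups per token. A minimal sketch, assuming the dictionary is loaded into a Python set so membership tests are constant-time; the tiny word list here is a placeholder for the 11,529-word dictionary:

```python
# Sketch of the Number-of-spam-words features. SPAM_WORDS is a
# placeholder for the 11,529-word dictionary mentioned in the text;
# a set makes each membership test O(1).
import re

SPAM_WORDS = {"free", "winner", "viagra", "casino"}  # placeholder list

def spam_word_features(tweet: str) -> tuple[int, float]:
    """Return (number of spam words, spam words per word)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    n_spam = sum(1 for t in tokens if t in SPAM_WORDS)
    per_word = n_spam / len(tokens) if tokens else 0.0
    return n_spam, per_word

print(spam_word_features("FREE entry, you are a winner!"))
# → (2, 0.3333333333333333)
```

Even with a fast lookup structure, the per-token scan is what makes these features comparatively costly at scale, which motivates dropping them when they add little accuracy.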
      </sec>
    </sec>
    <sec id="sec-11">
      <title>DISCUSSION</title>
      <p>Our study looks at different classifiers and feature sets
over two spam datasets to pick the settings that perform
best. First, our study on spam classification buttresses
previous findings for the task of spammer classification, where
Random Forests were found to be the most accurate
classifier. Second, our comparison of four feature sets reveals the
features that, being readily available in each tweet, perform
best in identifying spam tweets. While different features
perform better for each of the datasets when used alone,
our comparison shows that the combination of different
features leads to improved performance on both datasets.
We believe that the use of multiple feature sets increases
the likelihood of capturing different spam types, and makes
it more difficult for spammers to evade all the feature sets
used by the spam detection system. For example, spammers
might buy more followers to look more legitimate, but it is
still very likely that their spam tweets will be detected, as
the tweet content will give away their spam nature.</p>
      <p>Due to practical limitations, we have generated our spam
vs. non-spam data from two spammer vs. non-spammer
datasets that were collected in 2011. For future work, we
plan to generate a labelled spam/non-spam dataset crawled
in 2014. This will not only give us a purpose-built corpus
of spam tweets to reduce the possible effect of sampling
bias in the two datasets that we used, but will also give us
insights into how the nature of Twitter spam changes over
time and how spammers have evolved since 2011 (as
spammers do evolve, and their spam content is manipulated to
look more and more like normal tweets). Furthermore, we
will investigate the feasibility of cross-dataset spam
classification using domain adaptation methods, and also whether
unsupervised approaches work well enough in the domain of
Twitter spam detection.</p>
      <sec id="sec-11-1">
        <p>A caveat of the approach we relied on for dataset
generation is that we have considered spam tweets posted
by users who were deemed spammers. This was done based
on the assumption that the majority of social spam tweets
on Twitter are shared by spam accounts. However, the
dataset could also be complemented with spam tweets that
are occasionally posted by legitimate users, which our work
did not deal with. An interesting study to complement our
work would be to look at these spam tweets posted by
legitimate users, both to quantify this type of tweet and to
analyse whether they present different features from those
in our datasets, especially when it comes to the user-based
features, as such users might have different characteristics.
For future work, we plan to conduct further evaluation of
how our features would function for spam tweets shared by
legitimate users, in order to fully understand the effects of
the bias of pursuing our approach to corpus construction.</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>CONCLUSION</title>
      <p>In this paper we focus on the detection of spam tweets,
solely making use of the features inherent to each tweet.
This differs from most previous research works, which
classified Twitter users as spammers instead, and represents a
real scenario where either a user is tracking an event on
Twitter, or a tool is collecting tweets associated with an
event. In these situations, the spam removal process cannot
afford to retrieve historical and network-based features for
all the tweets involved with the event, due to the high
number of requests to the Twitter API that this represents.
We have tested five different classifiers and four different
feature sets on two Twitter spam datasets with different
characteristics, which allows us to validate our results and
claim repeatability. While the task is more difficult and has
access to less data than a spammer classification task, our
results show competitive performance. Moreover, our system
can be applied to detecting spam tweets in real time and
does not require any feature not readily available in a tweet.</p>
      <p>Here we have conducted the experiments on two different
datasets, which were originally collected in 2011. While this
allows us to validate the results with two datasets collected
using very different methods, our plan for future work includes
the application of the spam detection system to more recent
events, to assess the validity of the classifier with recent
data, as Twitter and spammers may have evolved.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Aiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Deplano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schifanella</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Ruffo</surname>
          </string-name>
          .
          <article-title>People are strange when you're a stranger: Impact and influence of bots on social networks</article-title>
          .
          <source>CoRR, abs/1407.8134</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Magno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Almeida</surname>
          </string-name>
          .
          <article-title>Detecting spammers on twitter</article-title>
          .
          <source>In Proceedings of CEAS</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Blanzieri</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bryl</surname>
          </string-name>
          .
          <article-title>A survey of learning-based techniques of email spam filtering</article-title>
          .
          <source>Artificial Intelligence Review</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ):
          <fpage>63</fpage>
          -
          <lpage>92</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Carreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Marquez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Salgado</surname>
          </string-name>
          .
          <article-title>Boosting trees for anti-spam email filtering</article-title>
          .
          <source>In Proceedings of RANLP. Citeseer</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Varol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Menczer</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Flammini</surname>
          </string-name>
          .
          <article-title>The rise of social bots</article-title>
          .
          <source>CoRR, abs/1407.5225</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mills</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heilman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yogatama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Flanigan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>Part-of-speech tagging for twitter: Annotation, features, and experiments</article-title>
          .
          <source>In Proceedings of ACL, HLT '11</source>
          , pages
          <fpage>42</fpage>
          -
          <lpage>47</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalbitzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bermpohl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rapp</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Heinz</surname>
          </string-name>
          .
          <article-title>Twitter psychosis: a rare variation or a distinct syndrome?</article-title>
          .
          <source>Journal of Nervous and Mental Disease</source>
          ,
          <volume>202</volume>
          (
          <issue>8</issue>
          ):
          <fpage>623</fpage>
          ,
          <year>August 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Eoff</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Caverlee</surname>
          </string-name>
          .
          <article-title>Seven months with the devils: A long-term study of content polluters on twitter</article-title>
          . In
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Adamic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Counts</surname>
          </string-name>
          , editors,
          <source>ICWSM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Sentiment analysis: a multifaceted problem</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>25</volume>
          (
          <issue>3</issue>
          ):
          <fpage>76</fpage>
          -
          <lpage>80</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Romo</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          .
          <article-title>Detecting malicious tweets in trending topics using a statistical analysis of language</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>40</volume>
          (
          <issue>8</issue>
          ):
          <fpage>2992</fpage>
          -
          <lpage>3000</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>McCord</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Chuah</surname>
          </string-name>
          .
          <article-title>Spam detection on twitter using traditional classifiers</article-title>
          . In
          <string-name>
            <given-names>J. M. A.</given-names>
            <surname>Calero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. G.</given-names>
            <surname>Mármol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>García-Villalba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. A.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , editors,
          <source>ATC</source>
          , volume
          <volume>6906</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>175</fpage>
          -
          <lpage>186</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dickinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Deitrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Twitter spammer detection using data stream clustering</article-title>
          .
          <source>Information Sciences</source>
          ,
          <volume>260</volume>
          (
          <issue>0</issue>
          ):
          <fpage>64</fpage>
          -
          <lpage>73</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kiritchenko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>Nrc-canada: Building the state-of-the-art in sentiment analysis of tweets</article-title>
          .
          <source>In Proceedings of the seventh international workshop on Semantic Evaluation Exercises (SemEval-2013)</source>
          , Atlanta, Georgia, USA,
          <year>June 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          .
          <article-title>Research report: 2013 state of social media spam</article-title>
          . http://nexgate.com/wp-content/uploads/2013/09/Nexgate-2013-State-of-SocialMedia-Spam-Research-Report.pdf,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F. Å.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          .
          <article-title>A new anew: Evaluation of a word list for sentiment analysis in microblogs</article-title>
          .
          <source>arXiv preprint arXiv:1103.2903</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>I.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Miñambres-Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Laorden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galán-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Santamaría-Ibirika</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Bringas</surname>
          </string-name>
          .
          <article-title>Twitter content-based spam filtering</article-title>
          . In
          <source>International Joint Conference SOCO'13-CISIS'13-ICEUTE'13</source>
          , pages
          <fpage>449</fpage>
          -
          <lpage>458</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Pang</surname>
          </string-name>
          .
          <article-title>The effect of wording on message propagation: Topic- and author-controlled natural experiments on twitter</article-title>
          .
          <source>CoRR, abs/1405.1438</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Don't follow me - spam detection in twitter</article-title>
          . In S. K. Katsikas and P. Samarati, editors,
          <source>SECRYPT</source>
          , pages
          <fpage>142</fpage>
          -
          <lpage>151</lpage>
          . SciTePress,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wiebe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          .
          <article-title>Recognizing contextual polarity in phrase-level sentiment analysis</article-title>
          .
          <source>In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05</source>
          , pages
          <fpage>347</fpage>
          -
          <lpage>354</lpage>
          , Stroudsburg, PA, USA,
          <year>2005</year>
          .
          Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Harkreader</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gu</surname>
          </string-name>
          .
          <article-title>Die free or live hard? empirical evaluation and new design for fighting evolving twitter spammers</article-title>
          .
          <source>In Proceedings of RAID, RAID'11</source>
          , pages
          <fpage>318</fpage>
          -
          <lpage>337</lpage>
          , Berlin, Heidelberg,
          <year>2011</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Procter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Tolmie</surname>
          </string-name>
          .
          <article-title>Towards detecting rumours in social media</article-title>
          .
          <source>In AAAI Workshop on AI for Cities</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>