<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Behavioral Tracing of Twitter Accounts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Neel Guha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Stanford University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>"Trolls", individuals who engage in malicious behavior, are a common occurrence within online communities. Yet simply banning accounts associated with trolls is often ineffective, as individuals may register new accounts under pseudonyms and resume their activity. In this paper, we demonstrate how this can be addressed through a behavioral trace. Specifically, we show that by analyzing the posts of an account, we can derive a semantic signature unique to the account's owner. By comparing the signatures of two accounts, we can determine whether they belong to the same user. We validate our techniques on a dataset of Twitter users, and explore different properties of our methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, online communities have increasingly struggled with the
emergence of malicious accounts. These accounts engage in adversarial behavior,
often spreading harmful content or attacking other individuals on the platform.
This is especially prominent on Twitter, where most interactions are public and
accounts are not required to correspond to real life identities (unlike Facebook).</p>
      <p>Eliminating these malicious accounts is difficult for many reasons. Firstly, the
process of banning accounts is resource intensive and arduous, occurring rarely
and often too late. Platforms like Twitter often rely on some human validation
before banning accounts, resulting in a backlog of flagged accounts. Additionally,
once an account is banned, it is trivial for the individual to create a new account
under a pseudonym. They can resume their malicious behavior through this new
account, thus creating a perpetual cycle.</p>
      <p>Though the process of banning accounts will likely remain arduous and
time-consuming due to legal and corporate policies and procedures, it should be possible
to prevent banned individuals from creating new accounts, or at least detect
when an account may have been created by an individual previously banned.</p>
      <p>
        In theory, this could be achieved through phone verification or IP address
blacklisting. However, these can have unintended consequences. Asking users
to validate their accounts with phone numbers may expose individuals who live
in oppressive countries and have a legitimate need for privacy. Such measures
hamper their ability to use mechanisms like Tor to access Twitter [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Banning
accounts on the basis of IP address is also ineffective, as an individual could
merely switch to a different network to create or access their account. If an
individual uses a public machine (in a cybercafe or library), banning on IP address
may prevent large numbers of other individuals from accessing their accounts on
the same machine.
      </p>
      <p>In this paper, we present behavioral tracing: a method by which accounts
created by the same individual can be identified and linked on the basis of the
content of the accounts. Intuitively, an account's posts represent the topical
interests and idiosyncrasies of its owner. Thus, in examining an account's posts,
we should be able to derive a signature unique to the account's owner. We refer
to this as a trace, and demonstrate it can be constructed. By comparing the
traces of two different accounts, we can predict whether or not they were owned
by the same individual. Applying this in the context presented above, we can
use a trace to examine newly created accounts and determine if they resemble a
banned account.</p>
      <p>Our work is novel in its focus on semantic signatures. Unlike prior work, we
formulate an authorship model based primarily on the content produced by a
user (as opposed to the user's relations in the network graph, or lexical clues).
Rather than constructing user-specific classifiers to identify accounts belonging
to the same user, we introduce a single method applicable to all users.
Specifically, we derive a vector space representation for each account (based on the
account's posts) where the distance between two accounts is indicative of the
likelihood that they originate from the same user.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        There is a wealth of literature on different techniques for establishing authorship
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] presents methods for authorship identification of postings in online
communities. The authors experimented with a variety of features (lexical, structural,
content based, etc.) and models (neural networks, support vector machines, etc.)
to establish authorship of different posts. Though this is similar to our work,
there are several key differences. The posts analyzed in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] were on average over
150 words, well over the 140-character limit of Twitter. Additionally, the goal of
the work was to establish the authorship of single posts, and not a collection of
posts (corresponding to a single account). Though there has been work on
establishing authorship in a Twitter context, it has primarily focused on using lexical
and syntactic features [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In our paper, we demonstrate how authorship can
also be determined by extracting a semantic (topic based) signature for
every user. To the best of our knowledge, this is a novel approach.
      </p>
      <p>It is important to distinguish prior literature on spam and troll detection from
our work. We focus on "linking" Twitter accounts to establish when two accounts
were launched by the same user. Though a primary application of this work may
be in detecting trolls, it could also be used to detect when a single individual
is seeking to influence a discussion through the creation of multiple accounts.
Much of the prior work on spamming and trolling focuses on leveraging network
or language characteristics to identify common traits of banned accounts.</p>
      <p>
        There has been significant prior work on the role of spammers within social
networks like Twitter. Many, like [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], have focused on characterizing the nature
of spamming Twitter accounts. These works have demonstrated the techniques
spammers use to promote content, and various approaches that could be used
to detect them. It is important to clarify the distinction between spam
detection and the focus of our paper. Spammers primarily use platforms like Twitter
to propagate commercial content, and convince users to take certain actions
(clicking a link, downloading some software, purchasing a product). Spam
accounts tend to be "fake" accounts that aren't tied to any real individual, and
are often controlled by bots. In contrast, we focus on "real" accounts that are
controlled by real individuals, and represent their interests. These individuals
are thus significantly less likely to follow the behavioral patterns of
fake spam accounts. Our work is partially inspired by our prior work in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which
presents several methods for identifying web users across different browser
sessions. Though we incorporate some prior techniques, both our approaches and
the nature of the problem are very different.
      </p>
      <p>
        Prior work has also focused on identifying "trolls" or adversaries within
social networks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] presents techniques for detecting trolls within social
media networks. However, they assume that trolling individuals create fake troll
accounts in addition to their real account. Further, the fake account is followed
by the real account, and regularly interacts with the real account. On a limited
sample of accounts, they present techniques for identifying the authorship of
individual tweets. Our work doesn't make these assumptions. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] analyzes
"antisocial users" to determine characteristics of banned users. However, their work
focuses less on identifying specific users, and more on analyzing the behavior of
banned users on numerous internet forums.
      </p>
      <p>
        Similarly, there has been significant work on de-anonymizing social network
users by utilizing information about network relationships [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In
particular, [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] demonstrates how anonymized users with accounts on both Flickr and
Twitter can be identified using graph topology. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] attempts to identify Twitter
accounts on the basis of browsing histories. By analyzing the t.co URLs visited
by a user, they can determine the combination of accounts the user must have
been following, which in turn can be used to identify the user's account.
However, this approach fails to derive a fingerprint based on the user's interests, a
critical contribution of our work. Furthermore, they require the browsing history
of a user, a data source that is not often available.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Behavioral Trace</title>
      <p>In this paper, we introduce behavioral tracing, a method to identify when posts
from two Twitter accounts were authored by the same individual. There are
many cases where this technique may be applicable. For example, we could apply
it to determine when a previously banned user has returned to a platform (e.g.,
Twitter) and continued their activity under a pseudonym. Alternatively, a user
may decide to operate two accounts within a particular community (to reinforce
their opinions or create a perception of popularity). Behavioral tracing would
allow us to identify such cases.</p>
      <p>Our intuition is that a user's tweets are drawn from a fixed distribution
governed by the user's interests. Given enough tweets from a single user, we can
derive some approximation of the original distribution (a user's behavioral trace,
also referred to as a user's trace). By comparing the extracted approximations
from two accounts, we can determine when two accounts are in fact run by the
same user. Thus, if we compare the trace from a banned account to the trace of
an active account, we can determine when a user reenters the platform under a
pseudonym. In this section we formalize the notion of a behavioral trace, describe
how it can be used, and where limitations exist.</p>
      <p>This approach also assumes that users maintain a consistent interest
distribution and that a significant fraction of tweets posted by a user (regardless of the
account used) are drawn from this distribution. If this condition were violated,
for example, if a user had different interest distributions for different accounts,
then it would be significantly harder to extract a meaningful trace. Thus, we
assume that when a user has been banned from the platform and returns under
a pseudonym, their tweets continue to be drawn from their original distribution.
In other words, a user's interests are maintained between both accounts, and
their behavior does not significantly alter.</p>
      <p>There are, however, several important limitations to acknowledge. Over time,
a user's interests are likely to change. Hence we can expect that in the longer
term, a user's interest distribution will gradually shift, making it harder to
identify a user. This is something we hope to explore in future work. Additionally, if
an individual creates two accounts but uses them for significantly different aims
(e.g., professional and personal), the traces extracted won't be similar enough to link the accounts.</p>
      <p>We now formalize the notion of a behavioral trace. We imagine a user u
having a set of topical interests characterized by a distribution B over all possible
interests/topics. Furthermore, we assume that every tweet t<sub>i</sub> authored by u is
sampled at random from B. Thus, we should expect that as a user posts more
tweets, their collection of tweets grows more representative of their interests (the
distribution of the t<sub>i</sub> should resemble B).</p>
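      <p>The sampling assumption above can be illustrated with a small sketch. It is not part of the method itself: the explicit topic labels, the example distribution B, and the helper names sample_topics, empirical_distribution, and total_variation are assumptions made purely for illustration, since the traces in this paper are built from embeddings rather than labeled topics.</p>
      <preformat>
```python
import random
from collections import Counter
from typing import Dict, List


def sample_topics(b: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Draw n tweet topics at random from the interest distribution b."""
    rng = random.Random(seed)
    topics = list(b)
    weights = [b[t] for t in topics]
    return rng.choices(topics, weights=weights, k=n)


def empirical_distribution(topics: List[str]) -> Dict[str, float]:
    """Fraction of observed tweets that fall on each topic."""
    counts = Counter(topics)
    return {topic: count / len(topics) for topic, count in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in support)


# Hypothetical interest distribution B for a single user (illustrative values).
B = {"sports": 0.6, "music": 0.3, "politics": 0.1}
for n in (10, 100, 10000):
    estimate = empirical_distribution(sample_topics(B, n))
    print(n, round(total_variation(B, estimate), 3))
```
      </preformat>
      <p>With more sampled tweets, the printed distance between the estimate and B typically shrinks, which is the sense in which a larger collection of tweets is more representative of a user's interests.</p>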
      <p>Underlying our approach is the assumption that with high probability, any
two users u<sub>1</sub> and u<sub>2</sub> will have different interest distributions (B<sub>1</sub> and B<sub>2</sub>). We
reason that individuals tend to be quite diverse in their interests. Though most
users undoubtedly share common interests (sports teams, hobbies, etc.), the ways
in which individuals process or share information tend to be highly personalized.
When examined at a highly granular level, most individuals are distinguishable
from one another. Thus, in our approach we seek to construct a trace for each
user: an approximation of that user's interest distribution inferred from their
tweets. We treat the trace as a signature, and use it to fingerprint users.</p>
      <p>We frame our problem as follows. Given two sets of tweets (T<sub>1</sub> and T<sub>2</sub>) from
two different accounts, our goal is to extract a trace for each (referred to as b̂<sub>1</sub> and
b̂<sub>2</sub>) that approximates the interest distribution of each account. If the traces
are sufficiently similar, then we can determine that they must correspond to the
same interest distribution, and that the same user is responsible for writing both
sets of tweets. However, if they are sufficiently different, then we can determine
that they refer to different interest distributions, and that the two sets of tweets were
written by different users.</p>
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>We formulate the task as follows. Given a set of tweets from n users, we partition
each user's tweets into 2 separate accounts (giving a total of 2n accounts). Our
goal is to re-identify the accounts by determining which originate from the same
user.</p>
      <sec id="sec-4-1">
        <title>Approach</title>
        <p>
          We attempt to map each account to a vector based on its
interests/behavior/topics. Importantly, we seek to do so in a manner such that accounts
corresponding to the same user are close to each other in this vector space. Prior
work has demonstrated how word embeddings (e.g. Word2Vec) can capture rich
semantic meaning in a way that traditional bag-of-words models cannot [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
By constructing models to predict a word from its context (or vice versa), these
models allow us to map words/phrases to vectors. Most notably, words that are
"close" to each other in the vector space are likely to share similar contexts (and
thus meaning).
        </p>
        <p>
          In this work, we draw on Doc2Vec[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], an extension of the Word2Vec model
that allows us to construct representations of variable-length text (i.e., documents).
Our approach is motivated by the intuition that we can effectively construct
a trace for each user by relying on word embeddings. In doing so, we can
derive a vector for each account where the distance between accounts reflects the
likelihood that they originate from the same author.
        </p>
        <p>
          In this work, we collate all tweets from an account and treat the account like
a single "document". We run Doc2Vec on the collection of accounts to derive
a vector representation for each account [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Rather than compute a similarity
score between every pair of accounts, we run k-means clustering to sort the
accounts into different clusters (on the basis of their inferred vectors). In doing
so, we're able to learn the "neighborhood" of an account: other accounts that
look similar and are thus more likely to originate from the same user. Relying
on this intuition, we only calculate a pairwise similarity score for accounts
within the same cluster. We assume that accounts in different clusters correspond
to different users. We find that this is a relatively safe assumption which allows
us to significantly reduce the run time.
        </p>
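        <p>As a minimal sketch of this pruning step, the snippet below takes cluster labels as given and lists only the account pairs that share a cluster. The function name candidate_pairs and the example labels are hypothetical; in practice the labels would come from k-means run on the Doc2Vec account vectors, which is not reproduced here.</p>
        <preformat>
```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def candidate_pairs(cluster_of: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return the account pairs that share a cluster; only these get scored."""
    members = defaultdict(list)
    for account, cluster in cluster_of.items():
        members[cluster].append(account)
    pairs = []
    for accounts in members.values():
        pairs.extend(combinations(sorted(accounts), 2))
    return sorted(pairs)


# Hypothetical cluster labels for five accounts (illustrative only).
labels = {"a1": 0, "a2": 0, "c1": 0, "b1": 1, "b2": 1}
pairs = candidate_pairs(labels)
all_pairs = len(labels) * (len(labels) - 1) // 2
print(len(pairs), "of", all_pairs, "pairs need a similarity score")
```
        </preformat>
        <p>In this toy example only 4 of the 10 possible pairs are scored. The saving grows with the number of clusters, at the cost of never linking accounts that land in different clusters.</p>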
        <p>After deriving a location for each account in the vector space, we seek to
identify the accounts in its neighborhood that could originate from the same user.
For two accounts represented as the vectors a<sub>i</sub> and a<sub>j</sub>, we calculate Score(a<sub>i</sub>, a<sub>j</sub>)
in the following manner.</p>
        <p>Score(a<sub>i</sub>, a<sub>j</sub>) =</p>
        <p>Cosine(a<sub>i</sub>, a<sub>j</sub>)</p>
        <p>We describe this as a "weighted similarity" function, which weighs the similarity
of two accounts by how dissimilar they are to other accounts. It is not sufficient to say that two
accounts are similar. Rather, we can only be confident that two accounts
correspond to the same user if they are both similar to each other and dissimilar to
other accounts. If we have two accounts a<sub>i</sub> and a<sub>j</sub> such that a<sub>i</sub> is similar to a<sub>j</sub>
but both a<sub>i</sub> and a<sub>j</sub> are similar to the bulk of the accounts in our data set, we are
less confident that a<sub>i</sub> and a<sub>j</sub> originate from the same user. It is probable that a<sub>i</sub>
and a<sub>j</sub> (and the accounts they are similar to) belong to a mass of users whose
behavior is too shallow or generic to discern. Conversely, if a<sub>i</sub> and a<sub>j</sub> were similar
to each other but different from other accounts, we would be significantly more
confident that both accounts originated from the same user. Hence, our scoring
function is weighted by both account similarity and account dissimilarity.</p>
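        <p>For calculating the similarity between two accounts a<sub>i</sub> and a<sub>j</sub>, we use the
cosine similarity metric, a common measure in information retrieval. For two
n-dimensional vectors s<sub>i</sub> and s<sub>j</sub>, the cosine similarity is calculated by</p>
        <disp-formula>
          <tex-math>\mathrm{Cosine}(s_i, s_j) = \frac{s_i \cdot s_j}{\lVert s_i \rVert \, \lVert s_j \rVert}</tex-math>
        </disp-formula>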
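        <p>The cosine similarity above can be computed directly from its definition. The following is a minimal sketch using only the Python standard library; the function name cosine and the choice to return 0.0 for a zero vector are assumptions of this example, not details taken from our implementation.</p>
        <preformat>
```python
import math
from typing import Sequence


def cosine(s_i: Sequence[float], s_j: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors: dot product over norms."""
    if len(s_i) != len(s_j):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(s_i, s_j))
    norm_i = math.sqrt(sum(x * x for x in s_i))
    norm_j = math.sqrt(sum(y * y for y in s_j))
    if norm_i == 0.0 or norm_j == 0.0:
        # Assumption: a zero vector has no direction, so report no similarity.
        return 0.0
    return dot / (norm_i * norm_j)


print(cosine([1.0, 2.0], [2.0, 4.0]))   # parallel vectors, close to 1
print(cosine([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors, 0
```
        </preformat>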
        <p>For each account, we deem the account with the highest score that exceeds
a cutoff threshold to be from the same author. If no accounts have a score above the
threshold, then the account in question is deemed not to share an author with
any other account in the dataset. If multiple other accounts have scores which
exceed the threshold, we only pick the account with the highest score. As we
discuss in the next section, this approach is highly flexible, allowing us to achieve
different types of results by varying the cutoff score used.</p>
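        <p>The selection rule described above can be sketched as follows. The scores are taken as given (any pairwise scoring function could supply them), and the function name best_matches, the account names, and the example values are hypothetical. Ties are broken by keeping the first pair encountered, which is an assumption of this sketch.</p>
        <preformat>
```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]


def best_matches(scores: Dict[Pair, float], threshold: float) -> Dict[str, Optional[str]]:
    """For each account, return its highest-scoring partner above the threshold, or None."""
    best: Dict[str, Tuple[float, Optional[str]]] = {}
    for (a, b), score in scores.items():
        # Each pair score counts for both of its accounts.
        for account, other in ((a, b), (b, a)):
            current = best.setdefault(account, (threshold, None))
            if score > current[0]:
                best[account] = (score, other)
    return {account: partner for account, (_, partner) in best.items()}


# Hypothetical pairwise scores for four accounts.
example_scores = {
    ("a1", "a2"): 0.9,
    ("a1", "b1"): 0.4,
    ("a2", "b1"): 0.7,
    ("b1", "b2"): 0.3,
}
print(best_matches(example_scores, threshold=0.5))
```
        </preformat>
        <p>Raising the threshold leaves more accounts unmatched, while lowering it links more accounts, which is the lever the cutoff score provides.</p>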
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation</title>
        <p>We measure the success of our approach using the precision-recall framework.
Precision is defined as the proportion of account pairings we identify that are
correct.</p>
        <disp-formula>
          <tex-math>\mathrm{Precision} = \frac{|S_t \cap S_p|}{|S_p|}</tex-math>
        </disp-formula>
        <p>where S<sub>p</sub> is the set of account pairings we predict and S<sub>t</sub> is the set containing all
pairs of accounts that originate from the same user (truth). Recall is defined as
the proportion of same-user account pairs that are identified by our methodology,
or</p>
        <disp-formula>
          <tex-math>\mathrm{Recall} = \frac{|S_t \cap S_p|}{|S_t|}</tex-math>
        </disp-formula>
        <p>In the context of our application, precision is the proportion of identified account
pairings that do correspond to the same user. Recall is the proportion of
same-user account pairings that we do identify.</p>
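        <p>As a worked illustration of these two definitions, the sketch below computes precision and recall from a set of predicted pairings and a set of true pairings. The helper names and the example pairs are hypothetical. Pairs are treated as unordered, and an empty denominator is reported as 0.0, which is a convention chosen for this example.</p>
        <preformat>
```python
from typing import FrozenSet, Iterable, Set, Tuple


def as_pairs(pairs: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Treat (a, b) and (b, a) as the same unordered pairing."""
    return {frozenset(p) for p in pairs}


def precision_recall(predicted: Iterable[Tuple[str, str]],
                     truth: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Precision and recall of predicted pairings against the true pairings."""
    s_p = as_pairs(predicted)
    s_t = as_pairs(truth)
    correct = len(s_t.intersection(s_p))
    precision = correct / len(s_p) if s_p else 0.0
    recall = correct / len(s_t) if s_t else 0.0
    return precision, recall


# Hypothetical example: four users, each split into two accounts.
truth = [("a1", "a2"), ("b1", "b2"), ("c1", "c2"), ("d1", "d2")]
predicted = [("a2", "a1"), ("b1", "c1"), ("c1", "c2")]
print(precision_recall(predicted, truth))
```
        </preformat>
        <p>Here two of the three predicted pairings are correct (precision 2/3), and two of the four true pairings are recovered (recall 1/2).</p>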
        <p>Using the precision-recall framework to evaluate our approach allows us to
modulate the type of result achieved. Depending on the context in which
we're applying the methodology, the desired result can differ. Sometimes, we may
require a strategy that delivers high precision. This would be preferable, for
example, if we chose to be conservative in our identification of accounts.
Alternatively, we may want to flag as many accounts as possible. In this case, we would
prefer a strategy which delivered a high recall (even at the cost of precision).</p>
      </sec>
      <sec id="sec-4-3">
        <title>Baseline</title>
        <p>To establish a baseline, we simulate an adversary randomly guessing accounts
as pairs. We do this by randomly generating a score in [0, 1] for each pair
of accounts. We pick the cutoff that maximizes the F1 score and report results
at that threshold.</p>
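        <p>A rough sketch of this random baseline is given below. It assumes, for simplicity, that every pair whose score is at or above the cutoff is predicted as a match, and it searches the observed scores for the cutoff with the best F1. The function names, the seed parameter, and the example accounts are hypothetical. Note that random.random draws from the half-open interval [0, 1).</p>
        <preformat>
```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def random_pair_scores(accounts: Iterable[str], seed: int = 0) -> Dict[Pair, float]:
    """Assign every unordered account pair a uniformly random score."""
    rng = random.Random(seed)
    return {frozenset(p): rng.random() for p in combinations(sorted(accounts), 2)}


def f1_at(scores: Dict[Pair, float], truth: Set[Pair], cutoff: float) -> float:
    """F1 score when all pairs scoring at or above the cutoff are predicted."""
    predicted = {pair for pair, s in scores.items() if s >= cutoff}
    correct = len(predicted.intersection(truth))
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(truth)
    return 2 * precision * recall / (precision + recall)


def best_f1_cutoff(scores: Dict[Pair, float], truth: Set[Pair]) -> Tuple[float, float]:
    """Return (cutoff, F1) for the observed score that maximizes F1."""
    best_cutoff, best_f1 = 1.0, 0.0
    for cutoff in sorted(set(scores.values()), reverse=True):
        f1 = f1_at(scores, truth, cutoff)
        if f1 > best_f1:
            best_cutoff, best_f1 = cutoff, f1
    return best_cutoff, best_f1


# Hypothetical accounts: two users, each split into two accounts.
accounts = ["a1", "a2", "b1", "b2"]
true_pairs = {frozenset(("a1", "a2")), frozenset(("b1", "b2"))}
baseline_scores = random_pair_scores(accounts, seed=0)
print(best_f1_cutoff(baseline_scores, true_pairs))
```
        </preformat>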
        <p>In addition, we offer a more advanced baseline by running k-means clustering
directly on the generated Doc2Vec vectors for each account. Specifically, we set
the number of desired clusters equal to the number of users. If two accounts are
contained in the same cluster, we predict those two accounts to originate from
the same user.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Data</title>
      <p>Using the Twitter API, we collected 1,270,999 tweets from 1849 users. Of these,
678,403 were retweets and 592,596 were original tweets by users. Figure 1 shows
a cumulative histogram of the number of tweets for every account in the dataset.
The vast majority have fewer than 2000 tweets. Figure 2 is a histogram of the
proportion of retweets for all accounts (the fraction of an account's tweets that
are retweets). The majority of accounts in our dataset are regularly active, with
half posting at least 1.84 times per day.</p>
      <p>Given that the focus of this work was on using semantic clues to develop
unique identifiers for different Twitter accounts, we took care to clean tweets
so that the algorithm would not identify accounts on the basis of their network
properties. Specifically, we removed all account handles from the text of every
tweet (e.g., "@exampleAccount").</p>
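      <p>One way such cleaning could be done is with a regular expression, sketched below. The pattern and the function name strip_handles are assumptions of this example rather than the exact cleaning code used. The pattern treats a handle as an "@" followed by letters, digits, or underscores, and the leading \B keeps it from matching inside strings such as email addresses.</p>
      <preformat>
```python
import re

# "@" followed by word characters, but only when the "@" does not follow a word character.
HANDLE = re.compile(r"\B@\w+")


def strip_handles(tweet: str) -> str:
    """Remove account handles and collapse the whitespace left behind."""
    without_handles = HANDLE.sub("", tweet)
    return " ".join(without_handles.split())


print(strip_handles("Great talk by @exampleAccount today"))
print(strip_handles("mail me@example.com"))
```
      </preformat>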
      <p>We evaluated our algorithm as follows. We split our dataset of users into
two groups: a "training" set and a "testing" set. Within each set, we split each
user into two separate accounts (with each account containing half of the user's
tweets). We applied our algorithm on the training set, and identified the cutoff
distance at which the F1 score was maximized. We then applied our algorithm
to the test set, using the cutoff score derived from the training set to predict
which accounts belonged to the same user. We repeated this process 25 different
times (sampling a different training and testing set each time).</p>
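      <p>The account-splitting step can be sketched as follows. How the tweets are divided between the two halves is not specified above, so this example assumes a seeded random shuffle. The function names and the "#1"/"#2" account suffixes are likewise hypothetical.</p>
      <preformat>
```python
import random
from typing import Dict, List, Tuple


def split_user(tweets: List[str], rng: random.Random) -> Tuple[List[str], List[str]]:
    """Divide one user's tweets into two halves (assumption: random assignment)."""
    shuffled = list(tweets)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def make_accounts(users: Dict[str, List[str]], seed: int = 0) -> Dict[str, List[str]]:
    """Turn n users into 2n synthetic accounts, two per user."""
    rng = random.Random(seed)
    accounts: Dict[str, List[str]] = {}
    for user, tweets in users.items():
        first, second = split_user(tweets, rng)
        accounts[user + "#1"] = first
        accounts[user + "#2"] = second
    return accounts


# Hypothetical toy data: two users with a handful of tweets each.
users = {"u": ["t1", "t2", "t3", "t4"], "v": ["s1", "s2", "s3"]}
accounts = make_accounts(users)
print(sorted(accounts))
```
      </preformat>
      <p>The true pairings are then known by construction (the two accounts sharing a user prefix), which is what makes precision and recall measurable on both the training and testing sets.</p>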
      <p>This particular procedure allows us to justify the final cutoff used to identify
accounts. We can imagine that in different contexts, different cutoffs might be
necessary. "Learning" it in this manner allows us to better approximate an
optimal cutoff.</p>
      <p>Additionally, we experimented with the effect of retweets on our approach's
performance. We thus ran two variations of our strategy. In the first, we ignored
all retweets by accounts, using only "original" tweets to construct account traces.
In the second variation, we used all of an account's tweets (including retweets)
to construct the trace. Table 5 presents the results of these two versions, along
with the baseline performance.</p>
      <p>We experimented by varying the number of tweets each account was
generated from. We see that as the number of tweets per account increases, the
algorithm's performance improves (Figure 4). However, we observe that the overall
performance of the algorithm appears to level off after roughly 200 tweets.</p>
      <p>We also experimented by varying the number of users targeted by our
approach. We find that generally, as the number of users analyzed increases, the
algorithm's ability to extract a uniquely identifiable fingerprint decreases.</p>
      <p>The results in Table 5 demonstrate the effectiveness of our approach. Our
algorithm exceeds both the randomized and naive clustering baselines, suggesting
that our methods are capable of both successfully constructing unique traces,
and using these traces to identify when tweets from two accounts are authored
by the same user.</p>
      <p>Our techniques demonstrate significantly improved results when we use an
account's retweets to derive a semantic signature. There are several ways to
interpret this result. It's possible that by using an account's retweets, our extracted
semantic signature is influenced by the user's location in the Twitter network.
Users are more likely to retweet accounts that they are following or followed by.
Thus, when the majority of a user's tweets are retweets, the extracted semantic
signature is effectively a reflection of the network structure surrounding the user.</p>
      <p>However, a user's retweets are likely to reflect their interest profile.
Furthermore, every account they retweet is also likely to have an extractable semantic
signature (given enough tweets). Thus, we can view the extracted semantic
signature for a user not solely as their own, but as a composition of the semantic
signatures of the accounts they frequently retweet.</p>
      <p>We also find that performance in general improves as the number of tweets
sampled for each account increases. Intuitively, this follows. As we gather more
tweets from an account, we're able to better approximate the user's profile, and
thus build a better trace. After a while, however, there appear to be diminishing
returns.</p>
      <p>Additionally, we find that as the number of accounts we run our algorithm
on increases, performance tends to decrease. As we grow our sample, we can
imagine that accounts grow less distinguishable, and tend towards a more
general, "average" interest profile. In these cases, it becomes hard for us to extract
a unique trace for each account. However, the results in Figure 5 suggest that
our strategy still finds success for larger samples of users. It's likely that our
approach is conducting a variant of "outlier detection", in effect identifying users
who are sufficiently different from all others.</p>
      <p>Additionally, we find that when the sample of users is small, the cutoff
learned on the training set results in poorer performance on the testing set
(farther away from the optimal point). When the cutoff is large, however, we find
that the performance on the training set is comparable to the test set.</p>
      <p>The methods we present can be extended beyond the problem posed in this
paper. The ability to construct fingerprints for users on the basis of their
behavior has wide ranging implications for privacy and security. Broadly speaking,
behavioral tracing is applicable in any domain where individuals take actions
consistent with a set of interests, habits, or tasks. It could be used, for example,
to identify someone on the basis of online purchases. Equivalently, it could also
be used to disambiguate between multiple individuals using a single account (e.g.,
on Netflix). At its core, behavioral tracing offers a way of uniquely identifying
individuals on the basis of their behavior. Because behavior is hard to mask or
alter, behavioral tracing is especially potent.</p>
      <p>In summary, the primary contribution of our work is behavioral tracing, a
topical authorship model for Twitter. In framing a user's tweets as samples from
their interest distribution, we demonstrate how users can be fingerprinted on
the basis of a semantic signature. Validating our approach on real world Twitter
data, we demonstrate how it can find success at identifying users across different
accounts.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We'd like to thank Anand Shukla, Ramakrishnan Srikant, Dan Boneh,
Ramanathan Guha, Mehran Sahami, Lea Kissner, Scott Ellis, and Jonathan Mayer
for their advice and guidance on this project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Deep learning with paragraph2vec</article-title>
          . https://radimrehurek.com/gensim/models/doc2vec.html.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bhargava</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehndiratta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Asawa</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Stylometric Analysis for Authorship Attribution on Twitter</article-title>
          . Springer International Publishing, Cham,
          <year>2013</year>
          , pp.
          <volume>37</volume>
          –
          <fpage>47</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Cheng, J.,
          <string-name>
            <surname>Danescu-Niculescu-Mizil</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Antisocial behavior in online discussion communities</article-title>
          .
          <source>CoRR abs/1504</source>
          .00680 (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Galán-García</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de la Puerta</surname>
            ,
            <given-names>J. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>C. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bringas</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          <article-title>Supervised Machine Learning for the Detection of Troll Profiles in Twitter Social Network: Application to a Real Case of Cyberbullying</article-title>
          . Springer International Publishing, Cham,
          <year>2014</year>
          , pp.
          <volume>419</volume>
          –
          <fpage>428</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gibbs</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>The Problem With Twitters New Abuse Strategy</article-title>
          . https://www.theguardian.com/technology/2015/mar/04/twitters-new-bid-to-end-online-abuse-could-endanger-dissidents-analysis.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Acquisti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Information revelation and privacy in online social networks</article-title>
          .
          <source>In Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society</source>
          (New York, NY, USA,
          <year>2005</year>
          ),
          <source>WPES '05</source>
          , ACM, pp.
          <volume>71</volume>
          –
          <fpage>80</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Guha</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <article-title>Semantic identification of web browsing sessions</article-title>
          .
          <source>CoRR abs/1704.03138</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hay</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miklau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Towsley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Weis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Resisting structural re-identification in anonymized social networks</article-title>
          .
          <source>Proc. VLDB Endow.</source>
          <volume>1</volume>
          ,
          <issue>1</issue>
          (Aug.
          <year>2008</year>
          ),
          <volume>102</volume>
          –
          <fpage>114</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Houvardas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <article-title>N-Gram Feature Selection for Authorship Identification</article-title>
          . Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2006</year>
          , pp.
          <volume>77</volume>
          –
          <fpage>86</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kunegis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lommatzsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>The slashdot zoo: Mining a social network with negative edges</article-title>
          .
          <source>In Proceedings of the 18th International Conference on World Wide Web</source>
          (New York, NY, USA,
          <year>2009</year>
          ),
          <source>WWW '09</source>
          , ACM, pp.
          <volume>741</volume>
          –
          <fpage>750</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Layton</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watters</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dazeley</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Authorship attribution for twitter in 140 characters or less</article-title>
          .
          <source>In 2010 Second Cybercrime and Trustworthy Computing Workshop</source>
          (July
          <year>2010</year>
          ), pp.
          <volume>1</volume>
          –
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>CoRR abs/1405.4053</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>CoRR abs/1310.4546</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shmatikov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>De-anonymizing social networks</article-title>
          .
          <source>In 2009 30th IEEE Symposium on Security and Privacy</source>
          (May
          <year>2009</year>
          ), pp.
          <volume>173</volume>
          –
          <fpage>187</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ortega</surname>
            ,
            <given-names>F. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troyano</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>F. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vallejo</surname>
            ,
            <given-names>C. G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Enríquez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>Propagation of trust and distrust for the detection of trolls in a social network</article-title>
          .
          <source>Computer Networks</source>
          <volume>56</volume>
          ,
          <issue>12</issue>
          (
          <year>2012</year>
          ),
          <volume>2884</volume>
          –
          <fpage>2895</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shukla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>De-anonymizing web browsing data with social networks</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web</source>
          (Republic and Canton of Geneva, Switzerland,
          <year>2017</year>
          ), WWW '17, International World Wide Web Conferences Steering Committee, pp.
          <volume>1261</volume>
          –
          <fpage>1269</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Paxson</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Suspended accounts in retrospect: An analysis of twitter spam</article-title>
          .
          <source>In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference</source>
          (New York, NY, USA,
          <year>2011</year>
          ), IMC '11, ACM, pp.
          <volume>243</volume>
          –
          <fpage>258</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <article-title>A framework for authorship identification of online messages: Writing-style features and classification techniques</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>57</volume>
          ,
          <issue>3</issue>
          (
          <year>2006</year>
          ),
          <volume>378</volume>
          –
          <fpage>393</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Preserving privacy in social networks against neighborhood attacks</article-title>
          .
          <source>In 2008 IEEE 24th International Conference on Data Engineering</source>
          (April
          <year>2008</year>
          ), pp.
          <volume>506</volume>
          –
          <fpage>515</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>