<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Behavioral Tracing of Twitter Accounts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Neel</forename><surname>Guha</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Stanford University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Behavioral Tracing of Twitter Accounts</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">41E7FDCC36EA75DB206F60F0A6A1EC4C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>"Trolls", individuals who engage in malicious behavior, are a common occurrence within online communities. Yet simply banning accounts associated with trolls is often ineffective, as individuals may register new accounts under pseudonyms and resume their activity. In this paper, we demonstrate how this can be addressed through a behavioral trace. Specifically, we show that by analyzing the posts of an account, we can derive a semantic signature unique to the account's owner. By comparing the signatures of two accounts, we can determine whether they belong to the same user. We validate our techniques on a dataset of Twitter users, and explore different properties of our methods.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In recent years, online communities have increasingly struggled with the emergence of malicious accounts. These accounts engage in adversarial behavior, often spreading harmful content or attacking other individuals on the platform. This is especially prominent on Twitter, where most interactions are public and accounts are not required to correspond to real-life identities (unlike Facebook).</p><p>Eliminating these malicious accounts is difficult for many reasons. Firstly, the process of banning accounts is resource-intensive and arduous, occurring rarely and often too late. Platforms like Twitter often rely on some human validation before banning accounts, resulting in a backlog of flagged accounts. Additionally, once an account is banned, it is trivial for the individual to create a new account under a pseudonym. They can resume their malicious behavior through this new account, thus creating a perpetual cycle.</p><p>Though the process of banning accounts will likely remain arduous and time-consuming due to legal/corporate policies and procedures, it should be possible to prevent banned individuals from creating new accounts, or at least detect when an account may have been created by an individual previously banned.</p><p>In theory, this could be achieved through phone verification, or IP address blacklisting. However, these can have unintended consequences. Asking users to validate their accounts with phone numbers may expose individuals who live in oppressive countries and have a legitimate need for privacy. Such measures hamper their ability to use mechanisms like Tor to access Twitter <ref type="bibr">[5]</ref>. Banning accounts on the basis of IP address is also ineffective, as an individual could merely switch to a different network to create/access their account. 
If an individual uses a public machine (in a cybercafe or library), banning on IP address may prevent large numbers of other individuals from accessing their accounts on the same machine.</p><p>In this paper, we present behavioral tracing: a method by which accounts created by the same individual can be identified and linked on the basis of the content of the accounts. Intuitively, an account's posts represent the topical interests and idiosyncrasies of its owner. Thus, in examining an account's posts, we should be able to derive a signature unique to the account's owner. We refer to this as a trace, and demonstrate that it can be constructed. By comparing the traces of two different accounts, we can predict whether or not they were owned by the same individual. Applying this in the context presented above, we can use a trace to examine newly created accounts and determine if they resemble a banned account.</p><p>Our work is novel in our focus on semantic signatures. Unlike prior work, we formulate an authorship model based primarily on the content produced by a user (as opposed to the user's relations in the network graph, or lexical clues). Rather than constructing user-specific classifiers to identify accounts belonging to the same user, we introduce a single method applicable to all users. Specifically, we derive a vector space representation for each account (based on the account's posts) where the distance between two accounts is indicative of the likelihood that they originate from the same user.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>There is a wealth of literature on different techniques for establishing authorship <ref type="bibr" target="#b9">[9]</ref>. <ref type="bibr" target="#b18">[18]</ref> presents methods for authorship identification of postings in online communities. The authors experimented with a variety of features (lexical, structural, content based, etc.) and models (neural networks, support vector machines, etc.) to establish authorship of different posts. Though this is similar to our work, there are several key differences. The posts analyzed in <ref type="bibr" target="#b18">[18]</ref> were on average over 150 words, well over the 140-character limit of Twitter. Additionally, the goal of the work was to establish the authorship of single posts, and not a collection of posts (corresponding to a single account). Though there has been work on establishing authorship in a Twitter context, it has primarily focused on using lexical and syntactic features <ref type="bibr" target="#b11">[11]</ref> <ref type="bibr" target="#b1">[2]</ref>. In our paper, we demonstrate how authorship can also be determined by extracting a semantic (topic-based) signature for every user. To the best of our knowledge, this is a novel approach.</p><p>It is important to distinguish prior literature on spam and troll detection from our work. We focus on "linking" Twitter accounts to establish when two accounts were launched by the same user. Though a primary application of this work may be in detecting trolls, it could also be used to detect when a single individual is seeking to influence a discussion through the creation of multiple accounts. Much of the prior work on spamming and trolling focuses on leveraging network or language characteristics to identify common traits of banned accounts.</p><p>There has been significant prior work on the role of spammers within social networks like Twitter. 
Many, like <ref type="bibr" target="#b17">[17]</ref>, have focused on characterizing the nature of spamming Twitter accounts. These works have demonstrated the techniques spammers use to promote content, and various approaches that could be used to detect them. It is important to clarify the distinction between spam detection and the focus of our paper. Spammers primarily use platforms like Twitter to propagate commercial content, and convince users to take certain actions (clicking a link, downloading some software, purchasing a product). Spam accounts tend to be "fake" accounts that aren't tied to any real individual, and are often controlled by bots. In contrast, we focus on "real" accounts that are controlled by real individuals, and represent their interests. These individuals are thus significantly less likely to follow the behavioral patterns of fake spam accounts. Our work is partially inspired by our prior work in <ref type="bibr" target="#b7">[7]</ref>, which presents several methods for identifying web users across different browser sessions. Though we incorporate some prior techniques, both our approaches and the nature of the problem are very different.</p><p>Prior work has also focused on identifying "trolls" or adversaries within social networks <ref type="bibr" target="#b15">[15]</ref> <ref type="bibr" target="#b10">[10]</ref>. [4] presents techniques for detecting trolls within social media networks. However, they assume that trolling individuals create fake troll accounts in addition to their real account. Further, the fake account is followed by the real account, and regularly interacts with the real account. On a limited sample of accounts, they present techniques for identifying the authorship of individual tweets. Our work doesn't make these assumptions. <ref type="bibr" target="#b2">[3]</ref> analyzes "antisocial users" to determine characteristics of banned users. 
However, their work focuses less on identifying specific users, and more on analyzing the behavior of banned users on numerous internet forums.</p><p>Similarly, there has been significant work on de-anonymizing social network users by utilizing information about network relationships <ref type="bibr" target="#b8">[8]</ref> [6] <ref type="bibr" target="#b19">[19]</ref>. In particular, <ref type="bibr" target="#b14">[14]</ref> demonstrates how anonymized users with accounts on both Flickr and Twitter can be identified using graph topology. <ref type="bibr" target="#b16">[16]</ref> attempts to identify Twitter accounts on the basis of browsing histories. By analyzing the t.co URLs visited by a user, they can determine the combination of accounts the user must have been following, which in turn can be used to identify the user's account. However, this approach fails to derive a fingerprint based on the user's interests, a critical contribution of our work. Furthermore, they require the browsing history of a user, a data source that is not often available.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Behavioral Trace</head><p>In this paper, we introduce behavioral tracing, a method to identify when posts from two Twitter accounts were authored by the same individual. There are many cases where this technique may be applicable. For example, we could apply it to determine when a previously banned user has returned to a platform (e.g. Twitter) and continued their activity under a pseudonym. Alternatively, a user may decide to operate two accounts within a particular community (to reinforce their opinions or create a perception of popularity). Behavioral tracing would allow us to identify such cases.</p><p>Our intuition is that a user's tweets are drawn from a fixed distribution governed by the user's interests. Given enough tweets from a single user, we can derive some approximation of the original distribution (a user's behavioral trace, also referred to as a user's trace). By comparing the extracted approximations from two accounts, we can determine when two accounts are in fact run by the same user. Thus, if we compare the trace from a banned account to the trace of an active account, we can determine when a user reenters the platform under a pseudonym. In this section we formalize the notion of a behavioral trace, describe how it can be used, and where limitations exist.</p><p>This approach also assumes that users maintain a consistent interest distribution and that a significant fraction of tweets posted by a user (regardless of the account used) are drawn from this distribution. If this condition were violated (for example, if a user had different interest distributions for different accounts), then it would be significantly harder to extract a meaningful trace. Thus, we assume that when a user has been banned from the platform and returns under a pseudonym, their tweets continue to be drawn from their original distribution. 
In other words, a user's interests are maintained between both accounts, and their behavior does not significantly alter.</p><p>There are, however, several important limitations to acknowledge. Over time, a user's interests are likely to change. Hence we can expect that in the longer term, a user's interest distribution will gradually shift, making it harder to identify a user. This is something we hope to explore in future work. Additionally, if an individual creates two accounts but uses them for significantly different aims (professional and personal), the traces extracted won't be similar enough.</p><p>We now formalize the notion of a behavioral trace. We imagine a user u having a set of topical interests characterized by a distribution B over all possible interests/topics. Furthermore, we assume that every tweet (t_i) authored by u is sampled at random from B. Thus, we should expect that as a user posts more tweets, their collection of tweets grows more representative of their interests (the distribution of the t_i should resemble B).</p><p>Underlying our approach is the assumption that with high probability, any two users u_1 and u_2 will have different interest distributions (B_1 and B_2). We reason that individuals tend to be quite diverse in their interests. Though most users undoubtedly share common interests (sports teams, hobbies, etc.), the ways in which individuals process or share information tend to be highly personalized. When examined at a highly granular level, most individuals are distinguishable from one another. Thus, in our approach we seek to construct a trace for each user, an approximation of that user's interest distribution inferred from their tweets. We treat the trace as a signature, and use it to fingerprint users.</p><p>We frame our problem as follows. 
Given two sets of tweets (T_1 and T_2) from two different accounts, our goal is to extract a trace (referred to as b_1 and b_2) that approximates the interest distribution of each account. If the traces are sufficiently similar, then we can determine that they must correspond to the same interest distribution, and that the same user is responsible for writing both sets of tweets. However, if they are sufficiently different, then we can determine that they refer to different interest distributions, and that the two sets of tweets were written by different users.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Methodology</head><p>We formulate the task as follows. Given a set of tweets from n users, we partition each user's tweets into 2 separate accounts (giving a total of 2n accounts). Our goal is to re-identify the accounts by determining which originate from the same user.</p></div>
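<div xmlns="http://www.tei-c.org/ns/1.0"><p>The partitioning step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name split_into_accounts, the random shuffle, and the even split are all assumptions, since the text only states that each user's tweets are divided into two accounts.</p>
```python
import random


def split_into_accounts(tweets_by_user, seed=0):
    """Split each user's tweets into two pseudo-accounts.

    Assumption: tweets are shuffled and divided as evenly as possible.
    Keys of the result are (user, 0) and (user, 1).
    """
    rng = random.Random(seed)
    accounts = {}
    for user, tweets in tweets_by_user.items():
        shuffled = list(tweets)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        accounts[(user, 0)] = shuffled[:half]
        accounts[(user, 1)] = shuffled[half:]
    return accounts


# n users yield 2n accounts; the ground truth pairs (user, 0) with (user, 1).
example = split_into_accounts({"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]})
print(len(example))
```
<p>Keeping the user identifier in each account key makes the set of true same-user pairs easy to recover during evaluation.</p></div>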
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Approach</head><p>We attempt to map each account to a vector based on its interests/behavior/topics. Importantly, we seek to do so in a manner such that accounts corresponding to the same user are close to each other in this vector space. Prior work has demonstrated how word embeddings (e.g. Word2Vec) can capture rich semantic meaning in a way that traditional bag-of-words models cannot <ref type="bibr" target="#b13">[13]</ref>. By constructing models to predict a word from its context (or vice versa), these models allow us to map words/phrases to vectors. Most notably, words that are "close" to each other in the vector space are likely to share similar contexts (and thus meaning).</p><p>In this work, we draw on Doc2Vec <ref type="bibr" target="#b12">[12]</ref>, an extension of the Word2Vec model that allows us to construct representations of variable-length texts (i.e. documents). Our approach is motivated by the intuition that we can effectively construct a trace for each user by relying on word embeddings. In doing so, we can derive a vector for each account where the distance between accounts reflects the likelihood that they originate from the same author.</p><p>In this work, we collate all tweets from an account and treat the account like a single "document". We run Doc2Vec on the collection of accounts to derive a vector representation for each account <ref type="bibr" target="#b0">[1]</ref>. Rather than compute a similarity score between every pair of accounts, we run k-means clustering to sort the accounts into different clusters (on the basis of their inferred vectors). In doing so, we're able to learn the "neighborhood" of an account, that is, the other accounts that look similar and are thus more likely to originate from the same user. Relying on this intuition, we thus only calculate a pairwise similarity score for accounts within the same cluster. 
We assume that accounts in different clusters correspond to different users. We find that this is a relatively safe assumption which allows us to significantly reduce the run time.</p><p>After deriving a location for each account in the vector space, we seek to identify the accounts in its neighborhood that could originate from the same user. For two accounts represented as the vectors a_i and a_j, we calculate Score(a_i, a_j) in the following manner.</p><formula xml:id="formula_0">\mathrm{Score}(a_i, a_j) = \mathrm{Cosine}(a_i, a_j) \left( \sum_{k=0} \frac{1}{\mathrm{Cosine}(a_i, a_k)} + \sum_{k=0} \frac{1}{\mathrm{Cosine}(a_j, a_k)} \right) \qquad (1)</formula><p>We describe this as a "weighted similarity" function, which weighs the similarity of two accounts by how dissimilar they are to the other accounts. It is not sufficient to say that two accounts are similar. Rather, we can only be confident that two accounts correspond to the same user if they are both similar to each other and dissimilar to other accounts. If we have two accounts a_i and a_j such that a_i is similar to a_j but both a_i and a_j are similar to the bulk of the accounts in our data set, we are less confident that a_i and a_j originate from the same user. It is probable that a_i and a_j (and the accounts they are similar to) belong to a mass of users whose behavior is too shallow or generic to discern. Conversely, if a_i and a_j were similar to each other but different from other accounts, we would be significantly more confident that both accounts originated from the same user. Hence, our scoring function is weighted by both account similarity and account dissimilarity.</p><p>For calculating the similarity between two accounts a_i and a_j, we use the cosine similarity metric, a common measure in information retrieval. 
For two n-dimensional vectors, the cosine similarity is calculated by</p><formula xml:id="formula_1">\mathrm{Cosine}(s_i, s_j) = \frac{s_i \cdot s_j}{\|s_i\| \, \|s_j\|}</formula><p>For each account, we deem the account with the highest score that exceeds the threshold to be from the same author. If no accounts have a score above the threshold, then the account in question is deemed not to share an author with any other account in the dataset. If multiple other accounts have scores which exceed the threshold, we only pick the account with the highest score. As we discuss in the next section, this approach is highly flexible, allowing us to achieve different types of results by varying the cutoff score used.</p></div>
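<div xmlns="http://www.tei-c.org/ns/1.0"><p>The matching rule described above can be illustrated with a short sketch. It assumes account vectors are plain lists of floats and, for simplicity, passes plain cosine similarity as the scoring function in place of the weighted score of Equation 1; the names cosine and best_match are hypothetical and do not come from the original implementation.</p>
```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def best_match(target, candidates, vectors, score, cutoff):
    """Return the candidate whose score with `target` is highest and
    strictly exceeds `cutoff`, or None if no candidate qualifies."""
    best, best_score = None, cutoff
    for other in candidates:
        s = score(vectors[target], vectors[other])
        if s > best_score:
            best, best_score = other, s
    return best


# Toy vectors: "b" points in nearly the same direction as "a", "c" does not.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(best_match("a", ["b", "c"], vectors, cosine, cutoff=0.8))    # b
print(best_match("a", ["b", "c"], vectors, cosine, cutoff=0.999))  # None
```
<p>In the pipeline described in this section, the candidates would be restricted to accounts in the same k-means cluster as the target, and raising the cutoff trades recall for precision.</p></div>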
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Evaluation</head><p>We measure the success of our approach using the precision-recall framework. Precision is defined as the proportion of account pairings we identify that are correct.</p><formula xml:id="formula_2">\mathrm{Precision} = \frac{|S_t \cap S_p|}{|S_p|}</formula><p>where S_p is the set of account pairings we predict and S_t is the set containing all pairs of accounts that originate from the same user (truth). Recall is defined as the proportion of same-user account pairs that are identified by our methodology, or</p><formula xml:id="formula_3">\mathrm{Recall} = \frac{|S_t \cap S_p|}{|S_t|}</formula><p>In the context of our application, precision is the proportion of identified account pairings that do correspond to the same user. Recall is the proportion of same-user account pairings that we do identify.</p><p>Using the precision-recall framework to evaluate our approach allows us to modulate the type of result achieved. Depending on the context in which we're applying the methodology, this can differ. In some settings, we may require a strategy that delivers high precision. This would be preferable, for example, if we chose to be conservative in our identification of accounts. Alternatively, we may want to flag as many accounts as possible. In this case, we would prefer a strategy which delivered a high recall (even at the cost of precision).</p></div>
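<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a worked illustration of these definitions, the sketch below computes precision and recall over sets of unordered account pairs, together with the F1 score (the harmonic mean of the two) that is reported in the results. The helper names pair and precision_recall_f1 are hypothetical, and representing each pair as a frozenset is an assumption made so that (a, b) and (b, a) count as the same pairing.</p>
```python
def pair(a, b):
    """Unordered account pair, so pair(a, b) == pair(b, a)."""
    return frozenset((a, b))


def precision_recall_f1(predicted, truth):
    """Precision, recall and F1 of predicted pairs S_p against true pairs S_t."""
    correct = len(predicted.intersection(truth))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Two true same-user pairs; one of the two predictions is correct.
truth = {pair("a1", "a2"), pair("b1", "b2")}
predicted = {pair("a2", "a1"), pair("b1", "c1")}
print(precision_recall_f1(predicted, truth))  # (0.5, 0.5, 0.5)
```
<p>Empty prediction or truth sets are mapped to 0.0 here to avoid division by zero; that convention is a choice of this sketch rather than something the text specifies.</p></div>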
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Baseline</head><p>To establish a baseline, we simulate an adversary randomly guessing accounts as pairs. We do this by randomly generating a score in [0, 1] for each pair of accounts. We pick the cutoff that maximizes the F1 score and report results at that threshold.</p><p>In addition, we offer a more advanced baseline by running k-means clustering directly on the generated Doc2Vec vectors for each account. Specifically, we set the number of desired clusters equal to the number of users. If two accounts are contained in the same cluster, we predict those two accounts to originate from the same user.</p></div>
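<div xmlns="http://www.tei-c.org/ns/1.0"><p>The randomized baseline can be sketched roughly as follows. The function name random_baseline, the fixed seed, and the explicit list of candidate cutoffs are assumptions for illustration; the text only specifies that each pair receives a random score and that the cutoff maximizing F1 is reported.</p>
```python
import itertools
import random


def random_baseline(accounts, truth, cutoffs, seed=0):
    """Score every account pair uniformly at random, then return the
    (cutoff, F1) combination with the highest F1 among `cutoffs`."""
    rng = random.Random(seed)
    scores = {
        frozenset(p): rng.random()
        for p in itertools.combinations(accounts, 2)
    }
    best_cutoff, best_f1 = None, -1.0
    for cutoff in cutoffs:
        predicted = {p for p, s in scores.items() if s > cutoff}
        correct = len(predicted.intersection(truth))
        if correct == 0:
            f1 = 0.0
        else:
            precision = correct / len(predicted)
            recall = correct / len(truth)
            f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_cutoff, best_f1 = cutoff, f1
    return best_cutoff, best_f1


accounts = ["a1", "a2", "b1", "b2", "c1", "c2"]
truth = {frozenset(("a1", "a2")), frozenset(("b1", "b2")), frozenset(("c1", "c2"))}
print(random_baseline(accounts, truth, cutoffs=[0.0, 0.25, 0.5, 0.75]))
```
<p>Because the number of candidate pairs grows quadratically with the number of accounts while the number of true pairs grows only linearly, a random guesser's F1 shrinks quickly as the dataset grows.</p></div>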
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Data</head><p>Using the Twitter API, we collected 1,270,999 tweets from 1,849 users. Of these, 678,403 were retweets and 592,596 were original tweets by users. Figure <ref type="figure" target="#fig_0">1</ref> shows a cumulative histogram of the number of tweets for every account in the dataset. The vast majority have fewer than 2000 tweets. Figure <ref type="figure" target="#fig_1">2</ref> is a histogram of the proportion of retweets for all accounts (the fraction of an account's tweets that are retweets). The majority of accounts in our dataset are regularly active, with half posting at least 1.84 times per day.</p><p>Given that the focus of this work was on using semantic clues to develop unique identifiers for different Twitter accounts, we took care to clean tweets so that the algorithm would not identify accounts on the basis of their network properties. Specifically, we removed all account handles from the text of every tweet (e.g., "@exampleAccount"). We evaluated our algorithm as follows. We split our dataset of users into two groups: a "training" set and a "testing" set. Within each set, we split each user into two separate accounts (with each account containing half of the user's tweets). This particular procedure allows us to justify the final cutoff used to identify accounts. We can imagine that in different contexts, different cutoffs might be necessary. "Learning" it in this manner allows us to better approximate an optimal cutoff. Additionally, we experimented with the effect of retweets on our approach's performance. We thus ran two variations of our strategy. In the first, we ignored all retweets by accounts, using only "original" tweets to construct account traces. In the second variation, we used all of an account's tweets (including retweets) to construct the trace. Table <ref type="table">5</ref> presents the results of these two versions, along with the baseline performance.</p></div>
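<div xmlns="http://www.tei-c.org/ns/1.0"><p>The handle-removal step might look like the sketch below. It assumes that handles take the form of an at-sign followed by letters, digits, or underscores; the pattern, the function name strip_handles, and the whitespace normalization are illustrative choices rather than the cleaning code actually used.</p>
```python
import re

# Assumed handle format: "@" followed by word characters.
HANDLE = re.compile(r"@\w+")


def strip_handles(text):
    """Remove account handles and collapse the leftover whitespace."""
    without_handles = HANDLE.sub("", text)
    return " ".join(without_handles.split())


print(strip_handles("hello @exampleAccount how are you"))  # hello how are you
```
<p>Removing handles before building each account's document keeps the derived vector from simply memorizing who an account talks to, which is the network signal the cleaning step is meant to exclude.</p></div>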
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>We experimented by varying the number of tweets each account was generated from. We see that as the number of tweets per account increases, the algorithm's performance improves (Figure <ref type="figure" target="#fig_2">4</ref>). However, we observe that the overall performance of the algorithm appears to level off after roughly 200 tweets. The results in Table <ref type="table">5</ref> demonstrate the effectiveness of our approach. Our algorithm exceeds both the randomized and naive clustering baselines, suggesting that our methods are capable of both successfully constructing unique traces and using these traces to identify when tweets from two accounts are authored by the same user.</p><p>Our techniques demonstrate significantly improved results when we use an account's retweets to derive a semantic signature. There are several ways to interpret this result. It's possible that by using an account's retweets, our extracted semantic signature is influenced by the user's location in the Twitter network. Users are more likely to retweet accounts that they are following/followed by. Thus, when the majority of a user's tweets are retweets, the extracted semantic signature is effectively a reflection of the network structure surrounding the user.</p><p>However, a user's retweets are likely to reflect their interest profile. Furthermore, every account they retweet is also likely to have an extractable semantic signature (given enough tweets). Thus, we can view the extracted semantic signature for a user not solely as their own, but as a composition of the semantic signatures of the accounts they frequently retweet.</p><p>We also find that performance in general improves as the number of tweets sampled for each account increases. Intuitively, this follows. As we gather more tweets from an account, we're able to better approximate the user's profile, and thus build a better trace. After a while, however, there appear to be diminishing returns.</p><p>Fig. <ref type="figure">3</ref>. Proportion of accounts grouped into the same cluster for different numbers of clusters.</p>
<p>Additionally, we find that as the number of accounts we run our algorithm on increases, performance tends to decrease. As we grow our sample, we can imagine that accounts grow less distinguishable, and tend towards a more general, "average" interest profile. In these cases, it becomes hard for us to extract a unique trace for each account. However, the results in Figure <ref type="figure">5</ref> suggest that our strategy still finds success for larger samples of users. It's likely that our approach is conducting a variant of "outlier detection", in effect identifying users who are sufficiently different from all others.</p><p>Additionally, we find that when the sample of users is small, the cutoff learned on the training set results in poorer performance on the testing set (farther away from the optimal point). When the cutoff is large, however, we find that the performance on the training set is comparable to the test set.</p><p>The methods we present can be extended beyond the problem posed in this paper. The ability to construct fingerprints for users on the basis of their behavior has wide-ranging implications for privacy and security. Broadly speaking, behavioral tracing is applicable in any domain where individuals take actions consistent with a set of interests, habits, or tasks. It could be used, for example, to identify someone on the basis of online purchases. Equivalently, it could also be used to disambiguate between multiple individuals using a single account (e.g. on Netflix). At its core, behavioral tracing offers a way of uniquely identifying individuals on the basis of their behavior. Because behavior is hard to mask or alter, behavioral tracing is especially potent. 
In summary, the primary contribution of our work is behavioral tracing, a topical authorship model for Twitter. In framing a user's tweets as samples from their interest distribution, we demonstrate how users can be fingerprinted on the basis of a semantic signature. Validating our approach on real-world Twitter data, we demonstrate how it can succeed at identifying users across different accounts.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Cumulative histogram of the number of tweets for every user in our data set</figDesc><graphic coords="8,134.77,115.84,288.00,216.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Histogram of the proportion of tweets that were retweets for every user in our dataset</figDesc><graphic coords="9,134.77,115.84,288.00,216.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Algorithm performance as the tweet sample size varies</figDesc><graphic coords="11,134.77,115.84,369.00,228.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="10,134.77,115.84,300.00,185.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="12,134.77,115.84,300.00,185.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Results of different approaches</figDesc><table><row><cell>Method</cell><cell>F1 Score</cell><cell>Precision</cell><cell>Recall</cell></row><row><cell>Behavioral Tracing (No Retweets)</cell><cell>0.54</cell><cell>0.54</cell><cell>0.54</cell></row><row><cell>Behavioral Tracing (All Tweets)</cell><cell>0.69</cell><cell>0.70</cell><cell>0.69</cell></row><row><cell>Raw K-Means</cell><cell>0.45</cell><cell>0.45</cell><cell>0.45</cell></row><row><cell>Randomized Baseline</cell><cell>0.00045</cell><cell>0.00086</cell><cell>0.000345</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We also experimented by varying the number of users targeted by our approach. We find that generally, as the number of users analyzed increases, the algorithm's ability to extract a uniquely identifiable fingerprint decreases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Acknowledgments</head><p>We'd like to thank Anand Shukla, Ramakrishnan Srikant, Dan Boneh, Ramanathan Guha, Mehran Sahami, Lea Kissner, Scott Ellis, and Jonathan Mayer for their advice and guidance on this project.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://radimrehurek.com/gensim/models/doc2vec.html" />
		<title level="m">Deep learning with paragraph2vec</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Stylometric Analysis for Authorship Attribution on Twitter</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bhargava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mehndiratta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Asawa</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>Springer International Publishing</publisher>
			<biblScope unit="page" from="37" to="47" />
			<pubPlace>Cham</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Antisocial behavior in online discussion communities</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Danescu-Niculescu-Mizil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
		<idno>CoRR abs/1504.00680</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title/>
		<imprint/>
	</monogr>
	<note type="figure">Fig. Algorithm performance (for train and test sets) as the number of users in the sample varies</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Supervised Machine Learning for the Detection of Troll Profiles in Twitter Social Network: Application to a Real Case of Cyberbullying</title>
		<author>
			<persName><forename type="first">P</forename><surname>Galán-García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>De La Puerta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bringas</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>Springer International Publishing</publisher>
			<biblScope unit="page" from="419" to="428" />
			<pubPlace>Cham</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Gibbs</surname></persName>
		</author>
		<ptr target="https://www.theguardian.com/technology/2015/mar/04/twitters-new-bid-to-end-online-abuse-could-endanger-dissidents-analysis" />
		<title level="m">The Problem With Twitter&apos;s New Abuse Strategy</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Information revelation and privacy in online social networks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Acquisti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society</title>
				<meeting>the 2005 ACM Workshop on Privacy in the Electronic Society<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="71" to="80" />
		</imprint>
	</monogr>
	<note>WPES &apos;05</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Semantic identification of web browsing sessions</title>
		<author>
			<persName><forename type="first">N</forename><surname>Guha</surname></persName>
		</author>
		<idno>CoRR abs/1704.03138</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Resisting structural re-identification in anonymized social networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Miklau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Towsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Weis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endow.</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="102" to="114" />
			<date type="published" when="2008-08">Aug. 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">N-Gram Feature Selection for Authorship Identification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Houvardas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>Springer Berlin Heidelberg</publisher>
			<biblScope unit="page" from="77" to="86" />
			<pubPlace>Berlin, Heidelberg</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The slashdot zoo: Mining a social network with negative edges</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kunegis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lommatzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bauckhage</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th International Conference on World Wide Web</title>
				<meeting>the 18th International Conference on World Wide Web<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="741" to="750" />
		</imprint>
	</monogr>
	<note>WWW &apos;09</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Authorship attribution for twitter in 140 characters or less</title>
		<author>
			<persName><forename type="first">R</forename><surname>Layton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Watters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dazeley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Second Cybercrime and Trustworthy Computing Workshop</title>
				<imprint>
			<date type="published" when="2010-07">July 2010</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Distributed representations of sentences and documents</title>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno>CoRR abs/1405.4053</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno>CoRR abs/1310.4546</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">De-anonymizing social networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Narayanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Shmatikov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">30th IEEE Symposium on Security and Privacy</title>
				<imprint>
			<date type="published" when="2009-05">May 2009</date>
			<biblScope unit="page" from="173" to="187" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Propagation of trust and distrust for the detection of trolls in a social network</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Ortega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Troyano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">L</forename><surname>Cruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">G</forename><surname>Vallejo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Enríquez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Networks</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page" from="2884" to="2895" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">De-anonymizing web browsing data with social networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shukla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Narayanan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th International Conference on World Wide Web</title>
				<meeting>the 26th International Conference on World Wide Web<address><addrLine>Republic and Canton of Geneva, Switzerland</addrLine></address></meeting>
		<imprint>
			<publisher>International World Wide Web Conferences Steering Committee</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1261" to="1269" />
		</imprint>
	</monogr>
	<note>WWW &apos;17</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Suspended accounts in retrospect: An analysis of twitter spam</title>
		<author>
			<persName><forename type="first">K</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Grier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Paxson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference</title>
				<meeting>the 2011 ACM SIGCOMM Conference on Internet Measurement Conference<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="243" to="258" />
		</imprint>
	</monogr>
	<note>IMC &apos;11</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A framework for authorship identification of online messages: Writing-style features and classification techniques</title>
		<author>
			<persName><forename type="first">R</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="378" to="393" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Preserving privacy in social networks against neighborhood attacks</title>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 24th International Conference on Data Engineering</title>
				<imprint>
			<date type="published" when="2008-04">April 2008</date>
			<biblScope unit="page" from="506" to="515" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
